Translation Limits

Some of the limits chosen represent interesting compromises. The goal was to allow reasonably large portable programs to be written, without placing excessive burdens on reasonably small implementations, some of which might run on machines with only 64 KB of memory. In C99, the minimum amount of memory for the target machine was raised to 512 KB. In addition, the Committee recognized that smaller machines rarely serve as a host for a C compiler: programs for embedded systems or small machines are almost always developed using a cross compiler running on a personal computer or workstation. This allows for a great increase in some of the translation limits.

Nesting of blocks

Nesting of blocks is part of the language syntax and is usually implemented with a table-driven syntax analyzer. Table-driven syntax analyzers maintain their own stack of information, often of a predefined fixed size. A very large number of nested blocks is likely to cause this parser stack to overflow.
C99 – 127 nesting levels of blocks
C90 – 15 nesting levels of compound statements, iteration control structures, and selection control structures
Blocks are created by a number of different kinds of statements. The one most commonly thought of is a compound statement. The limit does not depend on which kind of statement causes a block to be created: there could be any combination of if/while/for/switch, or simply a compound statement with no associated header. This limit might be reached in automatically generated code, but it would be considered an extreme case.
The number of constructs that could create a block increased between C90 and C99, including selection statements and their associated sub-statements, and iteration statements and their associated bodies. Although use of these constructs doubles the number of blocks created in C99, the limit on the nesting of blocks has increased by a factor of more than eight (from 15 to 127), which is still roughly a factor of four once the doubling is taken into account. So, the conformance status of a program will not be adversely affected.
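As a rough illustration of how the kinds of statement can be mixed, the sketch below nests an if, a for, a while, a switch and a bare compound statement. The function name nested_depth_demo is hypothetical and the code is only meant to show the shape of such nesting, not a realistic program.

```c
/* Each construct below opens at least one block; the kinds can be
   mixed in any order. Requires C99 (declaration in the for header). */
int nested_depth_demo(int n)
{
    int total = 0;

    if (n > 0) {
        for (int i = 0; i < n; i++) {
            while (total < 100) {
                switch (i) {
                default:
                    {   /* compound statement with no associated header */
                        total += i + 1;
                    }
                }
                break;  /* leave the while loop after one pass */
            }
        }
    }
    return total;
}
```

Calling nested_depth_demo(3) adds 1, 2 and 3 in turn, so the nesting has no effect on the result beyond what the control flow says.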

Nesting levels of conditional inclusion

Conditional inclusion is performed as part of pre-processing. As such, it is independent of the syntax processing performed by subsequent translation phases and is given its own limit.
C99 – 63 nesting levels of conditional inclusion
C90 – 8 nesting levels of conditional inclusion
The value of this limit is consistent with other limit values. It is something of a fortunate coincidence, because the same ratios applied in C90, where the following rationale did not apply. The value is half the limit value for nesting of blocks. This difference occurs because the C if statement is defined to create two blocks. Nesting if statements 64 deep would be sufficient to exceed the block limit and 64 nested #if directives would exceed the above limit.
Several different techniques are used to write preprocessors. Given the relatively simple C preprocessor syntax, many have a flat view of the nesting of these constructs. They simply maintain a count of the current nesting level. A full parser/syntax-based approach faces the same type of problems found in handling the C syntax limits discussed here. But, such an approach is rarely seen in C preprocessor implementations.
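The flat, counter-based view described above can be sketched as follows. This is a simplified illustration, not how any particular preprocessor works: the function name max_conditional_depth is hypothetical, and the code assumes the input contains no comments, string literals or spliced lines that could hide or fake a directive.

```c
#include <string.h>

/* Hypothetical helper: returns the deepest #if/#ifdef/#ifndef nesting
   seen in src, or -1 if an #endif has no matching opening directive. */
int max_conditional_depth(const char *src)
{
    int depth = 0, max_depth = 0;
    const char *p = src;

    while (*p != '\0') {
        /* Skip horizontal white space before a possible '#'. */
        while (*p == ' ' || *p == '\t')
            p++;
        if (*p == '#') {
            p++;
            while (*p == ' ' || *p == '\t')
                p++;
            /* #if, #ifdef and #ifndef all start with "if". */
            if (strncmp(p, "if", 2) == 0) {
                if (++depth > max_depth)
                    max_depth = depth;
            } else if (strncmp(p, "endif", 5) == 0) {
                if (--depth < 0)
                    return -1;
            }
        }
        /* Advance to the start of the next line. */
        while (*p != '\0' && *p != '\n')
            p++;
        if (*p == '\n')
            p++;
    }
    return max_depth;
}
```

No stack is needed because #else and #elif do not change the depth; a single integer is enough to check the source against the limit of 63.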
C99 – 63 nesting levels of parenthesized declarators within a full declarator
C90 – 31 nesting levels of parenthesized declarators within a full declarator
The limit of 12 modifiers on a declaration is likely to be reached before this limit of 63 is reached on a full declarator (unless redundant ( ) are used, or some very rarely seen structure declarations). This limit is unlikely to be reached, even in automatically generated code.
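To make the two cases concrete, here is a small sketch contrasting redundant parentheses with parentheses that are actually needed. All of the names (redundant, get_row, first_row, second_element) are made up for illustration.

```c
/* Redundant parentheses: each pair adds a level of declarator nesting
   without changing the declared type, which is plain int. */
int (((redundant))) = 42;

/* Needed parentheses: get_row is a pointer to a function returning a
   pointer to an array of 3 int. Only two levels of nesting are used. */
int (*(*get_row)(void))[3];

static int row[3] = { 1, 2, 3 };

/* A function returning a pointer to an array of 3 int. */
static int (*first_row(void))[3]
{
    return &row;
}

int second_element(void)
{
    get_row = first_row;
    return (*get_row())[1];
}
```

Even a declaration as awkward as get_row uses only two levels of parentheses, which suggests how far ordinary code is from the limit of 63.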
Nesting of parentheses is part of the language syntax and is usually implemented with a table-driven syntax analyzer. The same implementation details as for nesting of blocks apply. The nesting of parentheses can occur within a nested block, slightly increasing the chances of reaching an internal implementation limit.
C99 – 63 nesting levels of parenthesized expressions within a full expression
C90 – 31 nesting levels of parenthesized expressions within a full expression
While it is possible to keep within this limit in an expression containing one instance of every operator (C contains 47 unique operators), an expression containing more than one instance of two operators may need to exceed this limit; for instance, (((( (a0/x+a1) /x+a2) /x+a3) /x+ …. This limit is rarely reached except in automatically generated code, and even there it is rare.
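A hypothetical five-coefficient instance of this pattern shows how each extra term adds one more level of parentheses, and how a code generator could avoid the nesting with a loop. The names nested_eval and loop_eval are assumptions made for this sketch.

```c
/* Fully parenthesized form: nesting depth grows with the number of terms. */
double nested_eval(const double a[5], double x)
{
    return ((((a[0] / x + a[1]) / x + a[2]) / x + a[3]) / x + a[4]);
}

/* Same operations in the same order, with no nested parentheses. */
double loop_eval(const double *a, int n, double x)
{
    double acc = a[0];

    for (int i = 1; i < n; i++)
        acc = acc / x + a[i];
    return acc;
}
```

With 64 or more coefficients the first form would pass the C99 minimum limit of 63, while the loop form has the same nesting depth however many terms there are.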
The same implementation details as for nesting of declarators apply, with the difference that nesting of expressions is more common and likely to be deeper. Although some uses of parentheses may be technically redundant, they may be used to simplify the visual appearance of an expression, or to divide an expression into meaningful chunks. While minimizing the number of parentheses in an expression may be an interesting math problem, minimization is not a desirable goal when writing source code.
C99 – 63 significant initial characters in an internal identifier or a macro name (each universal character name or extended source character is considered a single character)
C90 – 31 significant initial characters in an internal identifier or a macro name
An internal identifier is one whose name is never visible outside of the source file in which it is declared. Because it is not necessary to worry about external representation issues, it is possible to count one UCN as one character. This limit may be reached in automatically generated code. This minimum limit may be increased in a future revision of the standard.
This is one area where translators are likely to use a fixed-size data structure (usually an array). Using a linked list of characters to represent an identifier name would be a significant overhead. Having a fixed-size data structure that grows once the available free space is filled is an alternative used by some implementations.
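A minimal sketch of the fixed-size approach, assuming a translator that keeps exactly the 63 characters C99 requires and ignores the rest. The type struct ident and the functions ident_set and ident_same are hypothetical, and the sketch treats every char as one character, so it does not model universal character names.

```c
#include <string.h>

#define SIG_CHARS 63    /* C99 minimum for internal identifiers */

struct ident {
    char name[SIG_CHARS + 1];   /* fixed-size storage plus terminator */
};

/* Store at most SIG_CHARS characters; anything beyond is dropped. */
void ident_set(struct ident *id, const char *spelling)
{
    strncpy(id->name, spelling, SIG_CHARS);
    id->name[SIG_CHARS] = '\0';
}

/* Two identifiers are the same if their stored prefixes match. */
int ident_same(const struct ident *a, const struct ident *b)
{
    return strcmp(a->name, b->name) == 0;
}
```

Under this scheme two spellings that differ only after the 63rd character compare equal, which is exactly the behavior a program relying on longer identifiers has to allow for.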
C99 – 31 significant initial characters in an external identifier
C90 – 6 significant initial characters in an external identifier
Each universal character name specifying a short identifier of 0000FFFF or less is considered 6 characters, each universal character name specifying a short identifier of 00010000 or more is considered 10 characters, and each extended source character is considered the same number of characters as the corresponding universal character name, if any.
Information on externally visible identifiers needs to be stored in the files (usually object files) created by a translator. This information is compared against identifiers declared in other translation units when linking to build a program image. The predefined format of such files (not always within the control of the translator writer) may have limitations on what characters are acceptable in an identifier. The values of 6 and 10 were chosen so that the encodings \u1234 and \U12345678 could be used.
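The counting rule can be sketched as a small function that walks the spelling of an identifier. The names hex_run and external_ident_cost are hypothetical, and the sketch only handles universal character names written out as \u or \U sequences; extended source characters are not modeled.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns the number of hex digits (at most max) at the start of s. */
static size_t hex_run(const char *s, size_t max)
{
    size_t i = 0;

    while (i < max && isxdigit((unsigned char)s[i]))
        i++;
    return i;
}

/* Counts characters in an external identifier: a universal character
   name of 0000FFFF or less costs 6, a larger one costs 10, and any
   other character costs 1. */
size_t external_ident_cost(const char *s)
{
    size_t cost = 0;

    while (*s != '\0') {
        if (s[0] == '\\' && s[1] == 'u' && hex_run(s + 2, 4) == 4) {
            cost += 6;      /* four hex digits: always 0000FFFF or less */
            s += 6;
        } else if (s[0] == '\\' && s[1] == 'U' && hex_run(s + 2, 8) == 8) {
            /* The value is 0000FFFF or less when the top four digits are 0. */
            int small = s[2] == '0' && s[3] == '0' && s[4] == '0' && s[5] == '0';
            cost += small ? 6 : 10;
            s += 10;
        } else {
            cost += 1;
            s += 1;
        }
    }
    return cost;
}
```

An identifier is within the C99 external limit when this count, taken over its significant characters, does not exceed 31.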
Historically, the number of significant characters in an external identifier was driven by the behavior of the host vendor-supplied linker. Only since the success of MS-DOS have developers become used to translator vendors supplying their own linker. Previously, most linkers tended to be supplied by the hardware vendor.
C90 had a six character limit. Such a limit is very low and is an ideal that only a few, ultra-portable programs should still aspire to. However, it is possible that some C90 translators never migrate to the C99 limit (it being uneconomical to upgrade them).
C99 – 4095 external identifiers in one translation unit
C90 – 511 external identifiers in one translation unit
This limit may appear to be generous. But, it includes identifiers declared both by the developer and the implementation (when a system header is included). This limit may be reached in automatically generated code. The standard does not define a per program limit. This is mainly because some linkers are not provided by the translator vendor and are in many ways outside of these vendors’ control.
Most vendors include a large number of identifiers in their system headers. This is particularly true on workstations where the total number of identifiers declared in system headers can exceed 15,000. Developers have no control over the contents of these headers.
C99 – 511 identifiers with block scope declared in one block
C90 – 127 identifiers with block scope declared in one block
This limit may be reached in automatically generated code. In human-written code more than 10 identifiers declared in block scope is uncommon.
Having a large number of objects defined in the same block may be an indicator that a function definition has grown too large and needs to be split up, or an indicator that a structure type needs to be created. Although this is a design issue, there is a potential impact on comprehension effort. However, your author knows of no method of comparing the comprehension effort required for the various cases and so is silent on the subject.
C99 – 4095 macro identifiers simultaneously defined in one preprocessing translation unit
C90 – 1024 macro identifiers simultaneously defined in one translation unit
This limit may appear to be generous. But, it includes macro identifiers declared both by the developer and the implementation (when a system header is included). The standard does not specify limits on the bodies of macro definitions. This is something that usually occupies much more storage than the identifier itself.
There are several public domain preprocessors that might be of use if this translator limit on the number of macro identifiers is encountered. However, if the problem is caused by lack of storage on the host where the translation is performed, such a tool may not be of practical use. Using a preprocessor other than the one provided as part of the implementation also introduces the problem of ensuring that any macro names predefined by one preprocessor are also defined, with the same bodies, when the other preprocessor is used.
C99 – 127 parameters in one function definition
C90 – 31 parameters in one function definition
This limit is rarely reached except in automatically generated code, and even there it is rare.
Some coding guideline documents recommend that use of file scope objects be minimized. This has the consequence of increasing the number of parameters in function definitions. Other guideline documents recommend keeping the number of parameters below a certain limit to reduce the possibility of developers making mistakes.
C99 – 127 arguments in one function call
C90 – 31 arguments in one function call
Functions declared using the ellipsis notation can be called with arguments that exceed this limit, while their definitions do not exceed the limit on the number of parameters.
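A short sketch of such a function: the definition has a single named parameter, so it is nowhere near the parameter limit, yet a call may supply as many arguments as the argument limit allows. The name sum_ints is hypothetical.

```c
#include <stdarg.h>

/* One named parameter in the definition; the number of int arguments
   that follow is chosen at each call. */
long sum_ints(int count, ...)
{
    va_list ap;
    long total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

A call such as sum_ints(5, 1, 2, 3, 4, 5) passes six arguments to a definition with one parameter; it is the call, not the definition, that would run into the limit of 127.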
Few hosted implementations place restrictions on the number of arguments passed in one function call. However, storage-limited execution environments (invariably freestanding) sometimes have limits on the number of bytes available on the function call stack.
C99 – 127 parameters in one macro definition
C90 – 31 parameters in one macro definition
Function-like macro definitions are sometimes used to provide an alternative to an actual function call. These limits ensure that such definitions can handle at least as many parameters as function definitions.
In the case where the macro body is not syntactically a function body, a large number of parameters may be the most reliable method of ensuring that the intended objects are accessed. Because macro bodies are expanded at the point of reference, the objects visible at that point (not the point of definition) are accessed.
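The effect of expansion at the point of reference can be seen in the following sketch, where the same macro picks up two different objects. The macro names SCALED and SCALED_BY and the object name scale are made up for the example.

```c
/* Hypothetical macro: scale is looked up wherever the macro is
   expanded, not where it is defined. */
#define SCALED(v) ((v) * scale)

/* Passing the object as a parameter makes the access explicit. */
#define SCALED_BY(v, s) ((v) * (s))

static int scale = 2;       /* file scope object */

int scaled_outer(int v)
{
    return SCALED(v);       /* uses the file scope scale */
}

int scaled_inner(int v)
{
    int scale = 10;         /* hides the file scope object */

    return SCALED(v);       /* uses the local scale */
}
```

Each object that is turned into a macro parameter in this way adds to the parameter count, which is one way a macro definition can end up with a long parameter list.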
C99 – 127 arguments in one macro invocation
C90 – 31 arguments in one macro invocation
It is now possible, in C99, to define macros taking a variable number of arguments, using a similar principle to that used in function definitions. However, inside the body of the macro definition the arguments corresponding to the ellipsis notation are treated as a single parameter, not as individual ones.
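A small sketch of a C99 variable-argument macro: the identifier __VA_ARGS__ in the body stands for everything matched by the ellipsis, commas included, and cannot be split into separate arguments by the macro itself. The names FORMAT_INTO and format_pair_matches are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* buf must be an array so that sizeof gives its full size.
   C99 requires at least one argument to match the ellipsis. */
#define FORMAT_INTO(buf, fmt, ...) \
    snprintf((buf), sizeof (buf), (fmt), __VA_ARGS__)

/* Hypothetical helper: returns 1 if formatting a and b as "a-b"
   produces the expected text. */
int format_pair_matches(int a, int b, const char *expected)
{
    char buf[32];

    FORMAT_INTO(buf, "%d-%d", a, b);
    return strcmp(buf, expected) == 0;
}
```

Here the invocation has four arguments, but the macro body sees only three parameters: buf, fmt and the single unit __VA_ARGS__.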
Some implementations limit the size (e.g., the number of characters) of an argument (an early version of Microsoft C had a 256-character limit).
C99 – 4095 characters in a logical source line
C90 – 509 characters in a logical source line
A logical line is created from a physical line after any line splicing has taken place in translation phase 2. Line splicing is only really needed in macro definitions. This limit can really be thought of as applying to the number of characters in a macro definition. Note that this limit does not apply to the result of any macro expansion. The C Standard defines a token-based preprocessor; characters and line length need not enter into the macro expansion process.
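As an illustration, the macro below occupies three physical lines but forms one logical line once the backslash and new-line pairs have been deleted. The names CLAMP and clamp_percent are hypothetical.

```c
/* Three physical lines, one logical line after line splicing in
   translation phase 2. The 4095 character limit applies to the
   spliced result. */
#define CLAMP(v, lo, hi) \
    ((v) < (lo) ? (lo)   \
     : (v) > (hi) ? (hi) : (v))

int clamp_percent(int v)
{
    return CLAMP(v, 0, 100);
}
```

The expansion of CLAMP inside clamp_percent is a sequence of tokens, so however long it becomes it is not measured against the logical line limit.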
Some implementations use a fixed-length buffer for handling single logical source lines. Others use a fixed-length buffer and handle those situations where the length of a logical line exceeds the length of that buffer as a special case. The environmental limit on the minimum number of characters that may be supported on a physical line may affect translators written in C.
C99 – 4095 characters in a character string literal or wide string literal (after concatenation)
C90 – 509 characters in a character string literal or wide string literal (after concatenation)
This limit applies after translation phase 6. If the limit on the number of characters in a logical line is taken into account then, allowing for the delimiting quote characters, the only way of reaching or exceeding this limit without exceeding any other limits is via concatenation. Strings longer than this limit can be created by copying character values into object storage. But, these would not be string literals.
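A brief sketch of building one literal from several pieces; the text and the names usage_text and usage_length are invented for the example.

```c
#include <string.h>

/* Adjacent string literals are concatenated in translation phase 6,
   so each source line stays short while the resulting literal grows. */
static const char usage_text[] =
    "usage: tool [options] file\n"
    "  -h  show this help\n"
    "  -v  verbose output\n";

size_t usage_length(void)
{
    return strlen(usage_text);
}
```

It is the length of the single literal produced by concatenation, not the length of any one piece, that is compared against the limit of 4095 characters.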
If a very long string literal is needed by the application, it makes sense to try to create it as a single entity. String literals containing more than a few hundred characters are rare enough not to be worth a coding guideline.
C99 – 65535 bytes in an object (in a hosted environment only)
C90 – 32767 bytes in an object (in a hosted environment only)
Many processors have an efficient 16-bit addressing mode to access objects. It is often possible to create and access objects that exceed this addressing range, but the implementation and execution time overhead can be much higher. In many ways this limit can be seen as giving permission for implementations to stay within the natural addressing structure of their target processor (should it be a 16-bit one).
The standard does not say anything about the storage duration of objects of this size; does it apply to all of them or at least one of them? There is no specification requiring that it be possible to define more than one of these objects, or for several smaller objects whose total size is 65,535 bytes to be supported. Many freestanding environments don’t even have 64 K bytes of memory in total. However, the standard does not specify a minimum object size that must be supported in these environments.
Although the standard may require that it be possible to define an object of the specified size, it is silent on the circumstances in which a program containing such a definition must be capable of executing. The standard does not provide any mechanism for verifying that a particular function invocation will successfully start to execute. In the case of static storage allocation, the program will either start to execute, or fail to start executing. Use of dynamic allocation for objects does provide a degree of developer control of the situation where the allocation request fails. The disadvantage of such allocation methods is that it puts more responsibility for getting things right onto the developers’ shoulders. Handling execution environment object storage limitations is a design and algorithmic issue that is outside the scope of these coding guidelines.
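The difference between the two kinds of allocation can be sketched as follows. The names BIG_OBJECT_BYTES, static_buffer, static_buffer_size and fill_buffer are hypothetical, and the recovery action taken on failure (returning -1) is only a placeholder for whatever the application would really do.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define BIG_OBJECT_BYTES 65535u     /* C99 minimum, hosted environment */

/* Static storage: if this cannot be provided, the program does not
   start executing and has no chance to react. */
static unsigned char static_buffer[BIG_OBJECT_BYTES];

size_t static_buffer_size(void)
{
    return sizeof static_buffer;
}

/* Allocated storage: failure is reported through a null pointer, so
   the developer can (and must) decide what happens next. */
int fill_buffer(size_t bytes)
{
    unsigned char *p = malloc(bytes);

    if (p == NULL)
        return -1;                  /* allocation request failed */
    memset(p, 0xFF, bytes);
    free(p);
    return 0;
}
```

The check on the value returned by malloc is the extra responsibility referred to above; nothing comparable can be written for static_buffer.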
C99 – 15 nesting levels for #included files
C90 – 8 nesting levels for #included files
This limit makes no distinction between system headers and developer-written header files. However, an implementation is required to support its own system headers whose contents are defined by the standard. If a particular implementation chooses to use nested #includes, then it is responsible for ensuring that these do not prevent a translator from meeting its obligations with regard to this limit.
Developers have no control over the nesting used by system headers. These headers are essentially black boxes, so it is permissible to ignore any nesting that occurs within them, from the point of view of calculating the maximum #include nesting level.
In a traditional development environment the source code editor does not usually have any knowledge of the character sequences it is displaying, although some editors do support a tags facility (enabling a database of identifier tags and the files that reference them to be built). Syntax highlighting is also becoming common and C++ style class browsers are growing in popularity, but structured support for displaying header file contents is still uncommon.
C99 – 1023 case labels for a switch statement (excluding those for any nested switch statements)
C90 – 257 case labels for a switch statement (excluding those for any nested switch statements)
The limit chosen in C90 had a rationale behind it. The value used for C99 has been increased in line with the percentage increases seen in other limits.
The number of case labels in a switch statement may affect the generated code. For instance, a processor instruction designed to indirectly jump based on an index into a jump table may have a 256 (bytes or addresses) limit on the size of jump table supported.
C99 – 1023 members in a single structure or union
C90 – 127 members in a single structure or union
This limit does not include the members of structure tags, or typedef names that have been defined elsewhere and are referenced in a structure or union definition.
The organization of the contents of data structures is generally determined by the application and algorithms used. While it may be difficult to imagine a structure definition containing a large number of members without some subset sharing a common characteristic, which enables them to be split into separate definitions; this does not mean that structures with large numbers of members cannot occur for a good reason.
C99 – 1023 enumeration constants in a single enumeration
C90 – 127 enumeration constants in a single enumeration
This C99 limit has a value comparable to the increased limits of other constructs sharing the same C90 limit.
Would an application ever demand a large number of enumeration constants in a single enumeration? There are some cases where a large number do occur; for instance, specifying the tokens in the SQL/2 grammar requires 318 enumeration constants. Limiting the number of enumeration constants in an enumeration would not appear to offer any benefits, especially since there is no mechanism (like there is for structure members) for creating hierarchies of enumeration constants.
C99 – 63 levels of nested structure or union definitions in a single struct-declaration-list
C90 – 15 levels of nested structure or union definitions in a single struct-declaration-list
The specification of definitions, rather than types, suggests that this limit does not include nested structures and unions that occur via the use of typedef names.
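Under that reading, the two arrangements below differ in how they count against the limit even though they describe equivalent data. All the type and function names are made up for this sketch.

```c
/* One struct-declaration-list with definitions nested inside it:
   struct inner is defined two levels below struct outer. */
struct outer {
    struct middle {
        struct inner {
            int value;
        } in;
    } mid;
};

/* An equivalent arrangement built from typedef names: each definition
   stands alone, so no definition is nested inside another. */
typedef struct { int value; } inner_t;
typedef struct { inner_t in; } middle_t;
typedef struct { middle_t mid; } outer_t;

int read_nested(const struct outer *o) { return o->mid.in.value; }
int read_flat(const outer_t *o)        { return o->mid.in.value; }
```

Member access looks the same in both cases; only the first form adds to the nesting depth of a single struct-declaration-list.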
Nesting of definitions is part of the language syntax and is usually implemented using a table-driven syntax analyzer. Most nested structure and union definitions occur at file scope; that is, they are not nested within other constructs. However, even if a declaration occurs in block scope, it is likely that any internal table limits will not be exceeded by a definition nested to 63 levels. This limit is only half that of nested block levels. Even if creating a new level of structure nesting consumes 2 to 3 times as many table entries as a new level of nested block, there are likely to be sufficient table entries remaining to handle a deeply nested structure definition.