Linkages of identifiers

Most programmers are unfamiliar with the details of the linking process. When discussing the issues covered by the C term linkage, developers tend to use terms such as externally visible (or just external) and not externally visible (or not external, meaning visible only within one translation unit). From the developer's point of view, the most important property is whether an object can be referenced from more than one translation unit. The common terms external and not external effectively describe the two states that are of interest to developers. Unless a developer wants to become an expert on the C Standard, learning how to apply the technically correct terminology (linkage) is rarely worth the cost.

What does a linker do?

It’s simple: a linker converts object files into executables and shared libraries. Let’s look at what that means. Where a linker is used, the software development process starts with writing program code in some language, e.g., C, C++, or Fortran (but typically not Java). A compiler translates this program code, which is human-readable text, into another form of human-readable text known as assembly code. Assembly code is a readable form of the machine language that the computer can execute directly. An assembler then turns this assembly code into an object file. For completeness, note that some compilers include an assembler internally and produce an object file directly.

History

The earliest computers, in the 1940s, were programmed entirely in machine language. Programmers would write out the symbolic programs on sheets of paper, hand-assemble them into machine code, and then toggle the machine code into the computer, or perhaps punch it on paper tape or cards. If any instruction had to be added or deleted, the entire program had to be inspected and re-analysed, or perhaps rewritten from start to end.
Between 1940 and 1952, assembly languages were developed. They are also known as second-generation programming languages. One of the early developments was the symbolic assembler. Instead of writing down a series of binary numbers, the programmer would write down a list of machine instructions using human-readable symbols. A special program, the assembler, would convert these symbolic instructions into object or machine code. Assembly languages have the advantage that they are easier to understand than raw machine code, but still give access to all of the power of the computer (as each assembler symbol translates directly into a specific machine instruction). Their disadvantage is that they are still very close to machine language: they can be difficult for a human to follow and understand, and time-consuming to write. Also, programs written in assembly are tied to specific computer hardware and can’t be reused on another kind of computer. The human-readable version of assembly code is known as source code (it is the source that the assembler converts into object code).
After 1952, third-generation programming languages started to appear; they are also known as high-level programming languages. A third-generation programming language (3GL) is a refinement of a second-generation programming language. The second generation of programming languages brought logical structure to software. The third generation brought refinements to make the languages more programmer-friendly. These include features like improved support for aggregate data types, and expressing concepts in a way that favours the programmer, not the computer (e.g. no longer needing to state the length of multi-character (string) literals in Fortran). A third-generation language improves over a second-generation language by having the computer, not the programmer, take care of non-essential details. “High-level language” is a synonym for third-generation programming language. First introduced in the late 1950s, Fortran, ALGOL, and COBOL are early examples of this sort of language. Most popular general-purpose languages today, such as C, C++, C#, Java, BASIC and Delphi, are also third-generation languages.

Basic Linker Data Types

The linker operates on a small number of basic data types: symbols, relocations, and contents. These are defined in the input object files. Here is an overview of each of these.
A symbol is basically a name and a value. Many symbols represent static objects in the original source code–that is, objects which exist in a single place for the duration of the program. For example, in an object file generated from C code, there will be a symbol for each function and for each global and static variable. The value of such a symbol is simply an offset into the contents. This type of symbol is known as a defined symbol. It’s important not to confuse the value of the symbol representing the variable my_global_var with the value of my_global_var itself. The value of the symbol is roughly the address of the variable: the value you would get from the expression &my_global_var in C. Symbols are also used to indicate a reference to a name defined in a different object file. Such a reference is known as an undefined symbol. During the linking process, the linker will assign an address to each defined symbol, and will resolve each undefined symbol by finding a defined symbol with the same name.
A relocation is a computation to perform on the contents. Most relocations refer to a symbol and to an offset within the contents. Many relocations will also provide an additional operand, known as the addend. A simple, and commonly used, relocation is “set this location in the contents to the value of this symbol plus this addend.” The types of computations that relocations do are inherently dependent on the architecture of the processor for which the linker is generating code. For example, RISC processors which require two or more instructions to form a memory address will have separate relocations to be used with each of those instructions; for example, “set this location in the contents to the lower 16 bits of the value of this symbol.” During the linking process, the linker will perform all of the relocation computations as directed. A relocation in an object file may refer to an undefined symbol. If the linker is unable to resolve that symbol, it will normally issue an error (but not always: for some symbol types or some relocation types an error may not be appropriate).
The contents are what memory should look like during the execution of the program. Contents have a size, an array of bytes, and a type. They contain the machine code generated by the compiler and assembler (known as text). They contain the values of initialized variables (data). They contain static unnamed data like string constants and switch tables (read-only data or rdata). They contain uninitialized variables, in which case the array of bytes is generally omitted and assumed to contain only zeroes (bss). The compiler and the assembler work hard to generate exactly the right contents, but the linker really doesn’t care about them except as raw data. The linker reads the contents from each file, concatenates them all together sorted by type, applies the relocations, and writes the result into the executable file.

Basic Linker Operation

At this point we already know enough to understand the basic steps used by every linker:
  • Read the input object files. Determine the length and type of the contents. Read the symbols.
  • Build a symbol table containing all the symbols, linking undefined symbols to their definitions.
  • Decide where all the contents should go in the output executable file, which means deciding where they should go in memory when the program runs.
  • Read the contents data and the relocations. Apply the relocations to the contents. Write the result to the output file.
  • Optionally write out the complete symbol table with the final values of the symbols.

C – International Standard

The International Standard Committee Draft of June 25, 2010 defines the linkage of identifiers as follows:
  1. An identifier declared in different scopes or in the same scope more than once can be made to refer to the same object or function by a process called linkage. There are three kinds of linkage: external, internal, and none.
  2. In the set of translation units and libraries that constitutes an entire program, each declaration of a particular identifier with external linkage denotes the same object or function. Within one translation unit, each declaration of an identifier with internal linkage denotes the same object or function. Each declaration of an identifier with no linkage denotes a unique entity.
  3. If the declaration of a file scope identifier for an object or a function contains the storage class specifier static, the identifier has internal linkage.
  4. For an identifier declared with the storage-class specifier extern in a scope in which a prior declaration of that identifier is visible, if the prior declaration specifies internal or external linkage, the linkage of the identifier at the later declaration is the same as the linkage specified at the prior declaration. If no prior declaration is visible, or if the prior declaration specifies no linkage, then the identifier has external linkage.
  5. If the declaration of an identifier for a function has no storage-class specifier, its linkage is determined exactly as if it were declared with the storage-class specifier extern. If the declaration of an identifier for an object has file scope and no storage-class specifier, its linkage is external.
  6. The following identifiers have no linkage: an identifier declared to be anything other than an object or a function; an identifier declared to be a function parameter; a block scope identifier for an object declared without the storage-class specifier extern.
  7. If, within a translation unit, the same identifier appears with both internal and external linkage, the behavior is undefined.
Rule 1 describes what linkage is and names its kinds; rules 2 to 7 give the conditions for external, internal, and no linkage. Let us first look at the three kinds of linkage:

External Linkage

The extern keyword tells the compiler that a variable is defined elsewhere, typically in another source module. The linker then finds the actual definition and makes every extern reference point to the correct location. A declaration with extern and no initializer does not allocate any space for the variable, because the variable should be properly defined elsewhere. If a variable is declared extern and the linker finds no definition of it, the link fails with an error such as “Unresolved external symbol”.

Internal Linkage

A function declared inside a block will usually have external linkage. An object declared inside a block will usually have external linkage if it is specified extern. If a variable is declared with the static storage-class specifier outside any function, it has internal linkage and is available from the point where it is declared to the end of the current translation unit. Internal linkage limits access to the data or function to the current file. The following kinds of identifiers have internal linkage:
  • Objects, references, or functions explicitly declared static.
  • In C++, objects or references declared in namespace scope with the specifier const and neither explicitly declared extern nor previously declared to have external linkage. (In C, a const object at file scope still has external linkage unless it is also declared static.)
  • Data members of an anonymous union.

No Linkage

The following kinds of identifiers have no linkage:
  • Names that have neither external nor internal linkage
  • Names declared in local scopes (with exceptions like certain entities declared with the extern keyword)
  • Identifiers that do not represent an object or a function, including labels, enumerators, typedef names that refer to entities with no linkage, type names, function parameters, and template names
You cannot use a name with no linkage to declare an entity with linkage. For example, you cannot use the name of a class or enumeration or a typedef name referring to an entity with no linkage to declare an entity with linkage. The following example demonstrates this:
int main() {
    struct A {};       // extern A a1;
    typedef A myA;     // extern myA a2;
}
The compiler will not allow the declaration of a1 with external linkage, because class A has no linkage. Likewise, it will not allow the declaration of a2 with external linkage: the typedef name myA has no linkage because A has no linkage.

Important points about the extern keyword, with examples

1. External linkage is the default for all global (file scope) variables and for all functions:
#include <stdio.h>
int i;                  //External linkage by default
int main(){
    printf("%d",i);
    return 0;
}
Output: 0
2. When the extern modifier is used without an initializer, the statement is only a declaration: no memory is allocated for the variable. If the variable is not defined anywhere in the program, the build fails with an undefined symbol error, reported by the linker rather than the compiler:
#include <stdio.h>
extern int i;                  //Declaration only, no definition
int main(){
    printf("%d",i);
    return 0;
}
Output:  Link error: undefined symbol i
An extern declaration that includes an initializer is a definition, so memory is allocated for the variable. (Many compilers warn about this form; writing int i=10; without extern is more conventional.) For example:
#include <stdio.h>
extern int i=10;                  //extern variable
int main(){
    printf("%d",i);
    return 0;
}
Output:  10
3. Global variables defined without an initializer are initialized automatically: arithmetic types are set to zero and pointers are set to null. For example:
#include <stdio.h>
char c;
int i;
float f;
char *str;
int main(){
    //Passing a null pointer to %s is undefined, so substitute a literal
    printf("%d %d %f %s", c, i, f, str ? str : "(null)");
    return 0;
}
Output:  0 0 0.000000 (null)
4. An extern variable cannot be initialized locally, that is, inside a block: a block scope extern declaration may not include an initializer. An extern variable can be initialized only at global scope. For example:
#include <stdio.h>
int main(){
    extern int i = 10;     //Trying to initialize extern var locally
    printf("%d", i);
    return 0;
}
Output:  Compilation error: Cannot initialize extern variable
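5. A block scope extern declaration does not create a variable; it refers to a definition with external linkage that must exist elsewhere in the program, for example later in the same file. That definition may be written with or without the extern keyword. If no definition exists, the linker reports an error. A file scope static definition does not satisfy the declaration: the identifier would then have both external and internal linkage, which is undefined behavior under rule 7 above. For example:
#include <stdio.h>
int main(){
    extern int i;         //Declaration; the definition is found below
    printf("%d", i);
    return 0;
}
int i =20;
Output:  20
#include <stdio.h>
int main(){
    extern int i;        //Declaration; the definition is found below
    printf("%d", i);
    return 0;
}
extern int i=20;         //Definition of extern var
Output:  20
#include <stdio.h>
int main(){
    extern int i;        //Declares i with external linkage
    printf("%d", i);
    return 0;
}
static int i=20;        //Internal linkage conflicts with the declaration above
Output:  Undefined behavior (rule 7). Some older compilers accept the program and print 20; GCC, for example, rejects it because the static declaration follows a non-static one.
#include <stdio.h>
int main(){
    extern int i;       //Declared but never defined
    printf("%d", i);
    return 0;
}
Output: Link error: undefined symbol i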
6. A particular extern variable can be declared many times, but it can be initialized (defined) only once. For example:
extern int i;              //Declaration of variable
int i=25;                  //Initialization of var
extern int i;              // Again declaration
#include <stdio.h>
int main(){
    extern int i;          //Again declaration
    printf("%d", i);
    return 0;
}
Output:  25
extern int i;               //Declaration of variable
int i=25;                   //Initialization of var
#include <stdio.h>
int main(){
    printf("%d", i);
    return 0;
}
int i =20;                  //Again Initialization
Output:  Compilation error: redefinition of variable i
7. An assignment statement cannot appear at global scope. Giving a variable a value as part of its declaration is known as initialization, while giving it a value anywhere else is known as assignment. For example:
#include <stdio.h>
extern int i;              //Declaration of variable
int i=10;                  //Initialization of var
i = 25;                    //Assignment statement
int main(){
    printf("%d", i);
    return 0;
}
Output:  Compilation error
#include <stdio.h>
extern int i;              //Declaration of variable
int main(){
    i=25;                  //Assignment Statement
    printf("%d", i);
    return 0;
}
int i = 10;               //Initialization of var
Output:  25