Data Types

The C language provides various data types for holding different kinds of values. There are several integral data types, a character data type, floating-point data types for holding real numbers, and more. In addition, you can define your own data types as aggregations of the native types.
The ISO C standard, section 6.2.5 paragraph 1, states:

The meaning of a value stored in an object or returned by a function is determined by the type of the expression used to access it. (An identifier declared to be an object is the simplest such expression; the type is specified in the declaration of the identifier.) Types are partitioned into object types (types that describe objects) and function types (types that describe functions). At various points within a translation unit an object type may be incomplete (lacking sufficient information to determine the size of objects of that type) or complete (having sufficient information).

The C standard gives detailed specifications for the language; implementations of C must conform to the standard. An “implementation” is basically the combination of the compiler and the hardware platform. As you will see, the C standard usually does not specify exact sizes for the native types, preferring to set a minimum that all conforming implementations must meet. Implementations are free to go above these minimum requirements, which means that different implementations can have different sizes for the same type. For example, a C compiler from the fictitious company ABC running on a 64-bit Intel processor might define an int to be 64 bits, while another C compiler from the fictitious XYZ organization for a 16-bit TI embedded processor might define an int to be 16 bits. Both meet the minimum requirements, so both are valid.

_Bool Data type

The ISO C standard, section 6.2.5 paragraph 2, states:

An object declared as type _Bool is large enough to store the values 0 and 1.

The C language did not have a boolean type until the C99 version of the standard. C99 added the boolean type as _Bool. Additionally, a new header, stdbool.h, was added for compatibility reasons. It defines the familiar names as macros: bool is defined as _Bool, true as 1, and false as 0. Additionally, __bool_true_false_are_defined is defined as 1.
/*
 * stdbool.h - Standard Library C - _Bool
 */
#ifndef _STDBOOL_H_
    #define	_STDBOOL_H_
    #define	__bool_true_false_are_defined	1
    #ifndef __cplusplus
        #define	false	0
        #define	true	1
        #define	bool	_Bool
        #if __STDC_VERSION__ < 199901L && __GNUC__ < 3
            typedef	int	_Bool;
        #endif
    #endif          /* !__cplusplus */
#endif              /* !_STDBOOL_H_ */
The code above is a typical implementation of stdbool.h. _Bool is an unsigned integer type that holds only the values 0 and 1: when any scalar value is converted to _Bool, the result is 0 if the value compares equal to 0, and 1 otherwise. stdbool.h also defines false as 0, true as 1, and bool as _Bool. Take the example below:
#include <stdio.h>

int main(void){
    int a = 3944;
    long b = -199020930;
    double c = 7.534e-10;
    double * d = &c;

    /* Each conversion to _Bool yields 1 for a non-zero value, 0 otherwise. */
    _Bool ba = a;
    _Bool bb = b;
    _Bool bc = c;              /* tiny, but not zero */
    _Bool bd = d;              /* a non-null pointer */
    _Bool be = ( 1 == 2 );     /* a false comparison */

    printf( "ba = %d\n", ba );
    printf( "bb = %d\n", bb );
    printf( "bc = %d\n", bc );
    printf( "bd = %d\n", bd );
    printf( "be = %d\n", be );
    return 0;
}
Output:
ba = 1
bb = 1
bc = 1
bd = 1
be = 0
At the source code level, the difference in the specification of behavior for boolean and the other arithmetic types can be a small one. In all but the first case, C defines the result of the following operations to have an integer type (which differs from C++):
  • Cast of an expression to type _Bool.
  • The result of a relational, equality, or logical operator.
  • The definition of an enumerated type containing two enumeration constants.
  • The definition of a bit-field of width one.
  • The result of the remainder operator when the denominator has a value of 2.

Question: Why is _Bool defined as int and not char?

What I see as a major drawback of a byte-sized _Bool is that it gives users the impression that one byte per _Bool is the final solution. They might then be tempted to (ab)use the fundamentally incorrect assumption that a pointer to _Bool can be freely converted to any other pointer type, and a codebase built on this assumption might fail later, when an implementation moves to true bits.
Where _Bool occupies one byte, a conversion to _Bool tests the value for non-zeroness and sets the result to 1 or 0. This is the only special case _Bool requires. The _Bool can then be used directly in any arithmetic expression without a further cast. When reading a _Bool, the implementation either reads the whole byte or reads just the lowest bit. This gives a correct implementation of _Bool.

Char Data Type

The ISO C standard, section 6.2.5 paragraph 3, states:

An object declared as type char is large enough to store any member of the basic execution character set. If a member of the basic execution character set is stored in a char object, its value is guaranteed to be non-negative. If any other character is stored in a char object, the resulting value is implementation-defined but shall be within the range of values that can be represented in that type.

A character is a graphic sign (glyph) that may be represented differently in the different representation schemes developed and used on different machines. On most computers the translation between characters and numbers follows the ASCII scheme. There are other schemes, such as EBCDIC on IBM computers; in that scheme the letters do not all have successive values. The “char” type is usually used for ASCII data, more commonly known as text. The text you are reading was originally written on a computer with a word processor that stored the words in the computer one character per byte.
An object of the char data type represents a single character of the character set used by your computer. For example, ‘A’ is a character, and so is ‘a’. Characters comprise a variety of symbols such as the letters of the alphabet (upper case A-Z and lower case a-z), the numeric digits (0 to 9), and punctuation (. , + – etc.). But a computer can only store numeric codes. Therefore, characters such as ‘A’, ‘a’, ‘B’, ‘b’, and so on each have a unique numeric code that the computer uses to represent them. Usually, a character takes 8 bits (that is, 1 byte) to store its numeric code. The original ASCII character set has only 128 characters because it uses the lower 7 bits, which can represent 2^7 (that is, 128) characters, with established values from 0 to 127. For the values 128 to 255, systems usually use one of the extended ASCII character sets.
When we hit the capital A on the keyboard, the keyboard sends a byte with the bit pattern equal to the integer 65. When the byte is sent from memory to the monitor, the integer value 65 is converted back into the symbol of the capital A for display. The following code prints the ASCII letters and digits from their numeric codes:
#include <stdio.h>

int main(void){
    int i;

    /* Codes 65 to 90 are the upper-case letters in ASCII. */
    printf("Upper Case: ");
    for(i=65;i<=90;i++){
        printf("%c ",i);
    }
    printf("\n");

    /* Codes 97 to 122 are the lower-case letters. */
    printf("Lower Case: ");
    for(i=97;i<=122;i++){
        printf("%c ",i);
    }
    printf("\n");

    /* Codes 48 to 57 are the decimal digits. */
    printf("Numbers: ");
    for(i=48;i<=57;i++){
        printf("%c ",i);
    }
    printf("\n");
    return 0;
}
Output:
Upper Case: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Lower Case: a b c d e f g h i j k l m n o p q r s t u v w x y z
Numbers: 0 1 2 3 4 5 6 7 8 9

Question: Does int i; i=’A’; generate an error? If not, why?

The statement does not generate any error, because characters are numbers that are displayed as characters according to the ASCII chart. In fact, in C a character constant such as ‘A’ has type int. An unsigned char takes values between 0 and 255 (with 8-bit bytes); a larger value wraps around to 0 and starts again (the math is value % 256, if you’re interested). Chars can be assigned integer values and vice versa, but if you don’t want to have the ASCII table sitting in front of you at all times, you can simply assign the character representation.
#include <stdio.h>

int main(void){
    int val;

    /* 'A' and 'Z' are int values, so they can bound an int loop directly. */
    for(val='A';val<='Z';val++){
        printf("ASCII Value %d Char=%c\n", val, val);
    }
    return 0;
}
Output:
ASCII Value 65 Char=A
ASCII Value 66 Char=B
ASCII Value 67 Char=C
ASCII Value 68 Char=D
ASCII Value 69 Char=E
ASCII Value 70 Char=F
ASCII Value 71 Char=G
ASCII Value 72 Char=H
ASCII Value 73 Char=I
ASCII Value 74 Char=J
ASCII Value 75 Char=K
ASCII Value 76 Char=L
ASCII Value 77 Char=M
ASCII Value 78 Char=N
ASCII Value 79 Char=O
ASCII Value 80 Char=P
ASCII Value 81 Char=Q
ASCII Value 82 Char=R
ASCII Value 83 Char=S
ASCII Value 84 Char=T
ASCII Value 85 Char=U
ASCII Value 86 Char=V
ASCII Value 87 Char=W
ASCII Value 88 Char=X
ASCII Value 89 Char=Y
ASCII Value 90 Char=Z

Signed integer type

The ISO C standard, section 6.2.5 paragraph 4, states:

There are five standard signed integer types, designated as signed char, short int, int, long int and long long int. (These and other types may be designated in several additional ways, as described in 6.7.2.) There may also be implementation-defined extended signed integer types. The standard and extended signed integer types are collectively called signed integer types.

The difference between signed and unsigned is this: a signed variable can store positive and negative integers, while an unsigned variable stores only zero or positive integers.
Having five different types does not mean that there are five different representations. Multiple integer types, based on processor implementation characteristics, are part of the fabric of C. An implementation could choose to implement several of them with the same representation. They would still be different types, irrespective of the underlying representation. Each type can be named in several equivalent ways; the equivalent names for each type are:
//Short
 short;
 short int;
 signed short;
 signed short int;
//Int
 int;
 signed int;
//Long
 long;
 long int;
 signed long;
 signed long int;
//long long
 long long;
 long long int;
 signed long long;
 signed long long int;
The precise range of values representable by a signed type depends not only on the number of bits used in the representation, but also on the encoding technique. By far the most common technique used for integers is called two’s complement notation, in which a signed integer represented with n bits ranges from -2^(n-1) to 2^(n-1)-1.
Standard C requires implementations to document the ranges of the integer types in the header file <limits.h>; it also specifies the minimum representable range a C programmer can assume for each integer type. The following code prints the ranges of the integer types on the implementation it is compiled for:
#include <stdio.h>
#include <limits.h>

int main(void){
    printf("Character Types\n");
    printf("===============\n");
    printf("1. Number of bits in char: %d\n", CHAR_BIT);
    printf("2. Size of char type is %d byte\n", (int)sizeof(char));
    printf("3. Signed char min: %d max: %d\n", SCHAR_MIN, SCHAR_MAX);
    printf("4. Unsigned char min: 0 max: %u\n", (unsigned int)UCHAR_MAX);
    printf("5. Default char: ");
    if (CHAR_MIN< 0){
        printf("signed\n");
    } else if (CHAR_MIN== 0){
        printf("unsigned\n");
    } else {
        printf("non-standard\n");
    }
    int i;
    i=(int)sizeof(short);
    i=i*CHAR_BIT;

    printf("\nShort Int Types\n");
    printf("===============\n");
    printf("1. Number of bits in short int: %d\n", i);
    printf("2. Size of short int type is %d byte\n", (int)sizeof(short));
    printf("3. Signed short int min: %d max: %d\n", SHRT_MIN, SHRT_MAX);
    printf("4. Unsigned short int min: 0 max: %u\n", (unsigned int)USHRT_MAX);

    i=(int)sizeof(int);
    i=i*CHAR_BIT;

    printf("\nInt Types\n");
    printf("=========\n");
    printf("1. Number of bits in int: %d\n", i);
    printf("2. Size of int type is %d byte\n", (int)sizeof(int));
    printf("3. Signed int min: %d max: %d\n", INT_MIN, INT_MAX);
    printf("4. Unsigned int min: 0 max: %u\n", (unsigned int)UINT_MAX);

    i=(int)sizeof(long);
    i=i*CHAR_BIT;

    printf("\nLong Int Types\n");
    printf("==============\n");
    printf("1. Number of bits in long int: %d\n", i);
    printf("2. Size of long int type is %d byte\n", (int)sizeof(long));
    printf("3. Signed long int min: %ld max: %ld\n", LONG_MIN, LONG_MAX);
    printf("4. Unsigned long int min: 0 max: %lu\n", ULONG_MAX);

    i=(int)sizeof(long long);
    i=i*CHAR_BIT;

    printf("\nLong Long Int Types\n");
    printf("===================\n");
    printf("1. Number of bits in long long int: %d\n", i);
    printf("2. Size of long long int type is %d byte\n", (int)sizeof(long long));
    printf("3. Signed long long int min: %lld max: %lld\n", LLONG_MIN, LLONG_MAX);
    printf("4. Unsigned long long int min: 0 max: %llu\n", ULLONG_MAX);

    return 0;
}
Output:
Character Types
===============
1. Number of bits in char: 8
2. Size of char type is 1 byte
3. Signed char min: -128 max: 127
4. Unsigned char min: 0 max: 255
5. Default char: signed
Short Int Types
===============
1. Number of bits in short int: 16
2. Size of short int type is 2 byte
3. Signed short int min: -32768 max: 32767
4. Unsigned short int min: 0 max: 65535
Int Types
=========
1. Number of bits in int: 32
2. Size of int type is 4 byte
3. Signed int min: -2147483648 max: 2147483647
4. Unsigned int min: 0 max: 4294967295
Long Int Types
==============
1. Number of bits in long int: 32
2. Size of long int type is 4 byte
3. Signed long int min: -2147483648 max: 2147483647
4. Unsigned long int min: 0 max: 4294967295
Long Long Int Types
===================
1. Number of bits in long long int: 64
2. Size of long long int type is 8 byte
3. Signed long long int min: -9223372036854775808 max: 9223372036854775807
4. Unsigned long long int min: 0 max: 18446744073709551615
Important Point: Standard C does not fix the sizes of char, short, int, long and long long in bits. It defines only the minimum range each type must be able to represent (for int, at least -32767 to 32767, that is, at least 16 bits) and requires sizeof(char) to be 1. The actual size of int depends on the compiler implementation and the environment: on 16-bit processors int is typically 16 bits, and on 32-bit processors it is typically 32 bits.

signed char Data type

The ISO C standard, section 6.2.5 paragraph 5, states:

An object declared as type signed char occupies the same amount of storage as a “plain” char object. A “plain” int object has the natural size suggested by the architecture of the execution environment (large enough to contain any value in the range INT_MIN to INT_MAX as defined in the header <limits.h>).

and section 6.2.5 paragraph 15 states:

The three types char, signed char, and unsigned char are collectively called the character types. The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.

C has no character type that is separate from its integer types. char is an integer type, the same (in that regard) as int, short and the other integer types; char just happens to be the smallest integer type. So, just like any other integer type, it can be signed or unsigned. It is true that (as the name suggests) char is mostly intended to be used to represent characters. But characters in C are represented by their integer “codes”, so there’s nothing unusual in the fact that the integer type char is used to serve that purpose. The only general difference between char and the other integer types is that plain char is a distinct type from signed char and may be either signed or unsigned, while with the other integer types the signed modifier is optional/implied.
Historically the “plain” int type was usually the same size as either the types short or long. It is rare to find an int type having a size that is between the sizes of these two types. This existing practice has affected existing code, which is often found to contain implicit assumptions about various integer types having the same widths. The choice of representation need not be decided purely on how the available processor instructions manipulate their operands. In function calls the most commonly passed argument type is “plain” int. The organization of the function call stack is an important design issue.
Whether plain char is signed is implementation-defined, as the C standard does NOT define the signedness of “char”. Depending on the platform, char may be signed or unsigned, so you need to explicitly ask for “signed char” or “unsigned char” if your code depends on it. Just use “char” if you intend to represent characters from strings, as this will match what your platform puts in the string.
The difference between signed char and unsigned char is as you’d expect. On most platforms, signed char will be an 8-bit two’s complement number ranging from -128 to 127, and unsigned char will be an 8-bit unsigned integer (0 to 255). Note the standard does NOT require that char types have 8 bits, only that sizeof(char) return 1. You can get at the number of bits in a char with CHAR_BIT in limits.h. There are few if any platforms today where this will be something other than 8, though.
An unsigned char is an (unsigned) byte value (0 to 255). You may be thinking of “char” in terms of being a “character”, but it is really a numerical value. Where the regular “char” is signed, it has 128 non-negative values (0 to 127), and these values map to characters using ASCII encoding. But in either case, what you are storing in memory is a byte value.

Unsigned integer types corresponding to the signed integer types

The ISO C standard, section 6.2.5 paragraph 6, states:

For each of the signed integer types, there is a corresponding (but different) unsigned integer type (designated with the keyword unsigned) that uses the same amount of storage (including sign information) and has the same alignment requirements. The type _Bool and the unsigned integer types that correspond to the standard signed integer types are the standard unsigned integer types. The unsigned integer types that correspond to the extended signed integer types are the extended unsigned integer types. The standard and extended unsigned integer types are collectively called unsigned integer types.

This is another case of C providing a construct that mirrors a data type, and associated operations, commonly found in hardware processors. The unsigned integer types can generally represent positive values twice as great as their corresponding signed counterparts.

Question: What is Alignment?

In English, the ‘Alignment’ means, “The process of adjusting a mechanism such that its parts are aligned; the condition of having its parts so adjusted; the spatial property possessed by an arrangement or position of things in a straight line or in parallel lines.” According to Section 3.2 alignment means:

requirement that objects of a particular type be located on storage boundaries with addresses that are particular multiples of a byte address

Alignment is a very important issue for implementations. Internally it usually involves trade-offs between storage and runtime efficiency. Externally it is necessary to deal with parameter passing interfaces and potentially the layout of structure members. The C language is fortunate in that many system interfaces are specified in terms of C. This means that other languages have to interface to its way of doing things, not the other way around. In an ideal world there would be no alignment requirements. In practice the designers of some processors have placed restrictions on the fetching of variously sized objects from storage. The underlying reason for these restrictions is to simplify (reduce cost) and improve the performance of the processor. Some processors do not have alignment requirements, but may access storage more quickly if the object is aligned on a particular address boundary.
The requirement that a pointer to an object behave the same as a pointer to an array of that object’s type forces the requirement that sizeof(T) be a multiple of the alignment of T. If two objects, having different types, have the same alignment requirements then their addresses will be located on the same multiple of a byte address. Objects of character type have the least restrictive alignment requirements, compared to objects of other types. Alignment and padding are also behind the assumptions that need to be made for the common initial sequence concept to work. Thus, in practice, alignment means rounding up: an object’s address is rounded up to the next multiple of its type’s alignment, and its size is padded to a multiple of that alignment. Section data sizes are likewise rounded up for efficiency, because the OS moves data around in chunks anyway.