Data
Characteristics
Location: where the data is stored
  Global/static: location set at compile time
  Automatic variables: location set at runtime (offset from stack position)
Data type: defines acceptable values and their interpretation
  char, integer, float, Boolean, pointer/reference
Structure: contents and layout
  Primitive, array, record
Size: memory consumed
  Fixed or variable
  Bounds: lower and upper
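The difference between static and automatic location can be sketched in C. This is a minimal illustration, not part of the original slides; the names `global_calls`, `next_ticket`, and `fresh_local` are hypothetical.

```c
/* Global: one fixed location for the whole life of the program. */
int global_calls = 0;

/* `count` is static: it also has a single fixed location,
   so it keeps its value from one call to the next. */
int next_ticket(void) {
    static int count = 0;
    global_calls++;
    return ++count;
}

/* `local` is automatic: each call gets a fresh copy in its own
   stack frame, so nothing carries over between calls. */
int fresh_local(void) {
    int local = 0;
    return ++local;
}
```

Calling `next_ticket()` twice yields 1 and then 2, while `fresh_local()` yields 1 every time.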
Primitives: Integers
Usually a range of integer values determined by the number of bits used to store the value
  C: char, short, int, long, long long
Some languages support arbitrary-sized integers
  Java: BigInteger
  Python: int (long in Python 2)
Primitives: Integers
C supports signed and unsigned integers
Java's byte, short, int, and long are all signed (its only unsigned integral type is the 16-bit char)
This limitation is at the JVM level, so languages like Clojure and Scala have the same restriction
Primitives: Integers
Most hardware represents signed integers using two's-complement notation
Any integer whose most significant bit is a one is considered negative
To negate a number, invert all of the bits and add one
  11101111b is -17
  Inverted: 00010000b; plus one: 00010001b = 17
The advantage is that the same adder logic can be used for all integers
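The invert-and-add-one rule can be checked with a short C sketch. The helper names `negate_bits` and `as_signed` are hypothetical, and the code assumes 8-bit patterns as in the slide's example.

```c
#include <stdint.h>

/* Negate an 8-bit two's-complement bit pattern:
   invert every bit, then add one. */
uint8_t negate_bits(uint8_t bits) {
    return (uint8_t)(~bits + 1u);
}

/* Interpret an 8-bit pattern as a signed value. Patterns with the
   top bit set stand for (pattern - 256). */
int as_signed(uint8_t bits) {
    return (bits & 0x80u) ? (int)bits - 256 : (int)bits;
}
```

With these helpers, `as_signed(0xEF)` is -17 and `negate_bits(0xEF)` is `0x11`, which is 17.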
Primitives: Floating Point
A means of approximating real numbers in a fixed amount of space
IEEE 754 defines binary formats of 16, 32, 64, 128, and 256 bits
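Primitives: Floating Point
Three fields: sign bit, significand, exponent
For normal numbers, the significand is interpreted as having an assumed one as its most significant bit (the one is not stored)
The granularity (the gap between adjacent representable values) is a function of the size of the value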
Primitives: Floating Point
Take the 16-bit float as an example
For numbers from 256 up to 512, adjacent values are 0.25 apart; above 512 the gap is larger
Consider 300: its nearest neighbors are 299.75 and 300.25
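The 0.25 figure follows from the 16-bit layout, which stores 10 significand bits. A worked sketch, where e denotes the unbiased exponent and f the stored fraction bits (notation chosen here, not taken from the slides):

```latex
\begin{align*}
x &= (-1)^{s} \times 1.f \times 2^{e}, \qquad f \text{ has 10 stored bits} \\
\text{gap}(x) &= 2^{\,e-10} \\
256 \le x < 512 &\;\Longrightarrow\; e = 8 \;\Longrightarrow\; \text{gap}(x) = 2^{-2} = 0.25 \\
300 &= 1.171875 \times 2^{8} = 1.0010110000_{2} \times 2^{8}
\end{align*}
```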
Primitives: Decimal
IBM mainframes provided efficient decimal operations
PL/I and COBOL provided decimal primitives
Binary-Coded Decimal (BCD)
  Either eight bits or four bits per digit
Primitives: Decimal
Advantage: accurate representation of money
Binary floating point cannot represent 0.1 exactly
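The 0.1 problem is easy to observe in C. The sketch below is illustrative only; it stands in for decimal arithmetic by counting whole cents in an integer, and the function names `sum_repeated` and `sum_cents` are hypothetical. It assumes `double` is an IEEE 754 64-bit float.

```c
/* Add `value` to itself `n` times using binary floating point. */
double sum_repeated(double value, int n) {
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        total += value;
    }
    return total;
}

/* The same sum carried out in whole cents, which integers
   represent exactly. */
long sum_cents(long cents, int n) {
    long total = 0;
    for (int i = 0; i < n; i++) {
        total += cents;
    }
    return total;
}
```

Ten dimes added as `sum_repeated(0.1, 10)` do not compare equal to 1.0, while `sum_cents(10, 10)` is exactly 100.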
Primitives: Character
Historically, an unsigned 8-bit value
Some character sets and protocols supported only seven-bit characters
  Example: SMTP (Simple Mail Transfer Protocol)
Primitives: Character
Eight-bit characters are not adequate for representing all of the world's character sets
Unicode provides code points for many languages
It is better to think of a Unicode encoding as applying to the entire string rather than to individual characters
The most popular encoding, UTF-8, uses a different number of bytes per code point depending on its value
  English (7-bit ASCII): one byte
  Korean: three bytes
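The variable width of UTF-8 can be expressed as a function of the code point's value. A minimal sketch; the function name `utf8_length` is hypothetical and it returns 0 for values that are not valid Unicode scalar values.

```c
#include <stdint.h>

/* Number of bytes UTF-8 uses to encode one code point,
   or 0 if the value cannot be encoded. */
int utf8_length(uint32_t cp) {
    if (cp < 0x80) return 1;                      /* 7-bit ASCII */
    if (cp < 0x800) return 2;
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* surrogates are not characters */
    if (cp < 0x10000) return 3;                   /* includes Korean Hangul syllables */
    if (cp <= 0x10FFFF) return 4;
    return 0;                                     /* beyond the Unicode range */
}
```

For example, `utf8_length('A')` is 1 and `utf8_length(0xD55C)` (the Hangul syllable U+D55C) is 3.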
Primitives: Integer Subranges
Allow you to specify the minimum and maximum value of an integer
Pascal provided this
  type T = 0..51;
From a type theory perspective, these are a bit problematic
We usually like to think of integer types as closed under addition, but the sum of two variables of type T should be stored in a bigger type
  type TT = 0..102;
In general, these types require runtime checks to be maintained
This is a really simple version of a dependent type (about which we may say more later)
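C has no subrange types, but the runtime check that such a type implies can be written by hand. A minimal sketch of the idea, assuming the 0..51 range from the Pascal example; `T_MIN`, `T_MAX`, and `t_assign` are hypothetical names.

```c
#include <stdbool.h>

enum { T_MIN = 0, T_MAX = 51 };

/* Store `value` in *out only if it lies in 0..51.
   Returns true when the assignment happened. */
bool t_assign(int value, int *out) {
    if (value < T_MIN || value > T_MAX) {
        return false;   /* a Pascal runtime would raise a range error here */
    }
    *out = value;
    return true;
}
```

Assigning 51 succeeds, but assigning the sum 51 + 51 is rejected, which is the closure problem described above.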
Primitives: Enumerations
A version of integer subranges that names each of the available values
  enum workdays { Monday, Tuesday, Wednesday, Thursday, Friday };
They are implemented as integers "under the hood"
They are particularly useful with C's switch statement
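A short sketch of the enumeration used with `switch`. The function `workday_name` is a hypothetical helper added for illustration.

```c
enum workdays { Monday, Tuesday, Wednesday, Thursday, Friday };

/* Map each enumerator to its name. The enumerators are plain
   integers starting at 0, so they work directly as case labels. */
const char *workday_name(enum workdays d) {
    switch (d) {
    case Monday:    return "Monday";
    case Tuesday:   return "Tuesday";
    case Wednesday: return "Wednesday";
    case Thursday:  return "Thursday";
    case Friday:    return "Friday";
    }
    return "unknown";   /* reached only for out-of-range integers */
}
```

Because the enumerators are integers, `Monday == 0` and `Friday == 4` both hold.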
Complex Data
Strings
Arrays
Associative arrays
Records
Unions
Strings
An ordered collection of characters
Options
  Mutable?
    C, C++: yes
    Java, Python: no
  Size stored as metadata?
    C: no
    C++: yes (std::string)
    Java: yes
Strings in C
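The two properties listed for C strings (mutable, no stored size) can be sketched as follows. This is an illustration, not a reconstruction of the slide's missing figure; `scan_length` and `upcase_ascii` are hypothetical names.

```c
#include <stddef.h>

/* No length is stored with a C string: it is found by scanning
   for the terminating '\0' byte. */
size_t scan_length(const char *s) {
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}

/* A C string held in a char array is mutable: convert its
   ASCII lowercase letters to uppercase in place. */
void upcase_ascii(char *s) {
    for (; *s != '\0'; s++) {
        if (*s >= 'a' && *s <= 'z') {
            *s = (char)(*s - 'a' + 'A');
        }
    }
}
```

An array declared as `char buf[] = "data";` occupies five bytes (four characters plus the terminator), and `scan_length(buf)` returns 4.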
Strings in Java
Arrays
A collection of one or more data elements
Dimensions may be fixed or dynamic
Fixed dimensions may be known at compile time or at runtime
In C99, a function may declare an array whose size is computed from the function's parameters
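The C99 feature mentioned above is the variable-length array. A minimal sketch, assuming a compiler that supports VLAs (they became optional in C11); the function `sum_of_squares` is hypothetical.

```c
#include <stddef.h>

/* Both the parameter `values` and the local `scratch` are sized
   by the parameter `n`, which is only known at runtime. */
long sum_of_squares(size_t n, const int values[n]) {
    long scratch[n > 0 ? n : 1];   /* variable-length array on the stack */
    for (size_t i = 0; i < n; i++) {
        scratch[i] = (long)values[i] * values[i];
    }
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += scratch[i];
    }
    return total;
}
```

The array's dimension is fixed once it is created, but it is chosen when the function is called rather than at compile time.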
Dynamic Arrays
Grow as needed
Two implementations
  Contiguous memory with resize: C++ std::vector
  Segmented: C++ std::deque
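The contiguous-with-resize approach can be sketched in C with `realloc`. This is not how any particular std::vector is implemented; the doubling factor, the initial capacity of 4, and the names `int_vec`, `int_vec_push`, and `int_vec_free` are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct int_vec {
    int *data;         /* one contiguous block */
    size_t size;       /* elements in use */
    size_t capacity;   /* elements allocated */
};

/* Append a value, doubling the block when it is full.
   Returns false if memory could not be obtained. */
bool int_vec_push(struct int_vec *v, int value) {
    if (v->size == v->capacity) {
        size_t new_cap = v->capacity ? v->capacity * 2 : 4;
        int *p = realloc(v->data, new_cap * sizeof *p);
        if (p == NULL) {
            return false;   /* the old block is still valid */
        }
        v->data = p;        /* the elements may have moved */
        v->capacity = new_cap;
    }
    v->data[v->size++] = value;
    return true;
}

void int_vec_free(struct int_vec *v) {
    free(v->data);
    v->data = NULL;
    v->size = 0;
    v->capacity = 0;
}
```

Starting from a zero-initialized `struct int_vec`, ten pushes trigger three allocations (capacities 4, 8, and 16). Because a resize may move the block, pointers to elements taken before a push cannot be trusted afterwards.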
Dynamic Arrays: std::vector
Dynamic Arrays: std::deque
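Multi-Dimensional Arrays
Guaranteed rectangular (solid)
In C, int a[2][3] is stored row by row (row-major), so the last index varies fastest:
  0,0 0,1 0,2 1,0 1,1 1,2
In Fortran, the same 2-by-3 array is stored column by column (column-major), so the first index varies fastest (shown with zero-based indices for comparison):
  0,0 1,0 0,1 1,1 0,2 1,2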
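The two layouts differ only in how an index pair is turned into a position in memory. A sketch for the 2-by-3 case; `ROWS`, `COLS`, and the three function names are hypothetical.

```c
#include <stddef.h>

enum { ROWS = 2, COLS = 3 };

/* Position of element (i, j) when rows are stored one after another (C). */
size_t row_major_index(size_t i, size_t j) {
    return i * COLS + j;
}

/* Position of element (i, j) when columns are stored one after another (Fortran). */
size_t column_major_index(size_t i, size_t j) {
    return j * ROWS + i;
}

/* Where the C compiler actually placed a[i][j], counted in
   elements from the start of the array. */
size_t actual_index(int a[ROWS][COLS], size_t i, size_t j) {
    const char *base = (const char *)&a[0][0];
    const char *elem = (const char *)&a[i][j];
    return (size_t)(elem - base) / sizeof(int);
}
```

For element (1, 0), the row-major position is 3 and the column-major position is 1; `actual_index` agrees with `row_major_index` for every element of a C array.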
Arrays of Arrays
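Associative Array
A collection of key-value pairs, also called a map or dictionary
In principle any object can be used as a key, provided it supports the comparison or hashing that the implementation needs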
Associative Array
C++ provides two versions
  std::map
    Requires that keys provide a < (less-than) operator
    Typically implemented with a red-black tree
  std::unordered_map
    Implemented with a hash table
Record
A data structure composed of a fixed number of elements
Each element is at a known offset from the beginning of the structure
The elements may have different data types
Record
In C, these are structs
  struct data { char a; int b; short c; float d; double e; };
How much memory does this consume?
Nominally: 1 + 4 + 2 + 4 + 8 = 19 bytes
But most architectures perform better on values that are aligned in memory according to their size, so the compiler inserts padding (typically 3 bytes after a and 2 bytes after c)
Actual size on a typical platform: 24 bytes!
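The gap between the nominal and actual size can be measured with `sizeof` and `offsetof`. The exact numbers depend on the platform's ABI; the helper names `nominal_size` and `padding_bytes` are hypothetical.

```c
#include <stddef.h>

struct data {
    char a;
    int b;
    short c;
    float d;
    double e;
};

/* Sum of the member sizes, ignoring alignment. */
size_t nominal_size(void) {
    return sizeof(char) + sizeof(int) + sizeof(short)
         + sizeof(float) + sizeof(double);
}

/* Bytes the compiler added so that each member is aligned. */
size_t padding_bytes(void) {
    return sizeof(struct data) - nominal_size();
}
```

On a platform with 4-byte `int` and `float`, 2-byte `short`, and 8-byte `double`, `nominal_size()` is 19 and `sizeof(struct data)` is commonly 24, so `padding_bytes()` is 5. `offsetof(struct data, b)` shows where each member actually starts.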
Union types
In C, a union is like a structure, but its elements overlap in memory
  union U { float floatVal; int intVal; char charVal; };
  union U u;
Elements are accessed as u.floatVal or u.intVal
Only consumes four bytes (the size of the largest member)
No type checking is done! You can write as an integer and read as a float!
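Writing one member and reading another reinterprets the same bytes. The sketch below uses a variant of the slide's union with a fixed-width integer so the result is predictable; the union name `U32` and both function names are hypothetical, and the code assumes `float` is a 32-bit IEEE 754 value.

```c
#include <stdint.h>

union U32 {
    float floatVal;
    uint32_t intVal;
    unsigned char charVal;
};

/* Write as a float, read back the same four bytes as an integer. */
uint32_t float_bits(float f) {
    union U32 u;
    u.floatVal = f;
    return u.intVal;
}

/* Write as an integer, read back the same four bytes as a float. */
float bits_to_float(uint32_t bits) {
    union U32 u;
    u.intVal = bits;
    return u.floatVal;
}
```

Under those assumptions `float_bits(1.0f)` is `0x3F800000`, the bit pattern of 1.0 rather than the integer 1. The compiler accepts both directions without complaint, which is the missing type check noted above.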
Algebraic Data Types
Available in languages like ML, Haskell, etc.
Based on building data types out of the operators + (a choice between alternatives) and * (a combination of fields)
A record of name, age, and favorite color would be String * integer * color
Algebraic Data Types
They are useful for building structures without resorting to the use of null pointers
Consider a binary tree
  data BinTree:
    | leaf
    | node(value :: Number, left :: BinTree, right :: BinTree)
  end
Every element in the binary tree must be either a node or a leaf
The only way to access the value and left/right fields of a node is through a test that ensures the BinTree actually is a node rather than a leaf
No null-pointer exceptions! Ever!
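C has no algebraic data types, but a tagged structure gives a rough approximation of the BinTree above and shows what the language-level guarantee replaces. Everything here (`bin_tree`, `bin_tree_tag`, `tree_sum`) is a hypothetical sketch; unlike a real ADT, C does not force the tag to be checked and cannot rule out a null pointer.

```c
#include <assert.h>
#include <stddef.h>

enum bin_tree_tag { LEAF, NODE };

struct bin_tree {
    enum bin_tree_tag tag;
    /* Meaningful only when tag == NODE. */
    struct {
        double value;
        const struct bin_tree *left;
        const struct bin_tree *right;
    } node;
};

/* Sum every value in the tree. The switch plays the role of the
   node-or-leaf test that an ADT language would require. */
double tree_sum(const struct bin_tree *t) {
    assert(t != NULL);   /* checked by hand here; an ADT makes this unnecessary */
    switch (t->tag) {
    case LEAF:
        return 0.0;
    case NODE:
        return t->node.value + tree_sum(t->node.left) + tree_sum(t->node.right);
    }
    return 0.0;
}
```

Empty subtrees are represented by an explicit `LEAF` value instead of a null pointer, so a well-formed tree never needs `NULL` at all.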