Data Types: simple and compound, abstraction and implementation
A brief history
- simple types
- arrays: compounds of the same type
- strings
- records: compounds of different types
- pointers and references
- user-defined types
- abstract data types
- objects
Simple types
- not composed of other types
- hardware or software implemented
- integer, floating point, binary-coded decimal, character, boolean, user-defined types
- some are usually implemented in hardware, others usually in software
Integer
- 2's complement (signed) or unsigned
- operations are exact within range
- range depends on size of virtual cell; typical sizes: 1, 2, 4, 8 bytes
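A minimal sketch of how the range follows from the cell size. The function names are hypothetical, chosen for illustration only; the formulas are the standard ones for n-bit 2's complement and unsigned integers.

```c
#include <stdint.h>

/* Smallest value of a bits-wide 2's complement integer (1 <= bits <= 63). */
int64_t twos_min(int bits) { return -((int64_t)1 << (bits - 1)); }

/* Largest value of a bits-wide 2's complement integer (1 <= bits <= 63). */
int64_t twos_max(int bits) { return ((int64_t)1 << (bits - 1)) - 1; }

/* Largest value of a bits-wide unsigned integer (1 <= bits <= 63). */
uint64_t unsigned_max(int bits) { return ((uint64_t)1 << bits) - 1; }

/* Outside the range, unsigned arithmetic in a 1-byte cell wraps modulo 256. */
uint8_t add_u8(uint8_t a, uint8_t b) { return (uint8_t)(a + b); }
```

For a 1-byte cell this gives -128..127 signed and 0..255 unsigned; add_u8(250, 10) wraps around to 4.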
Floating Point
- based on scientific notation
- representations and operations are approximate
- range and precision depend on size of virtual cell (usually 4 or 8 bytes), i.e. on the number of bits available for exponent and fraction
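The approximate nature of floating point can be seen in a short sketch. It assumes IEEE 754 float and double, which is what nearly all current hardware provides; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Exact comparison of a + b against an expected value. */
bool sum_equals(double a, double b, double expected) {
    volatile double sum = a + b;   /* volatile: force the sum to be rounded to double */
    return sum == expected;
}

/* Comparison within a tolerance, the usual remedy for rounding error. */
bool nearly_equal(double a, double b, double tolerance) {
    double diff = a - b;
    if (diff < 0) diff = -diff;
    return diff <= tolerance;
}

/* Round-trip an integer through a 4-byte float (24 bits of precision). */
int32_t through_float(int32_t n) {
    volatile float f = (float)n;
    return (int32_t)f;
}
```

Under that assumption, sum_equals(0.1, 0.2, 0.3) is false because none of the three values has an exact binary representation, while through_float(16777217) comes back as 16777216 because a 4-byte float cannot hold 25 significant bits.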
Binary Coded Decimal
- 'exact' decimal arithmetic
- decimal digits in 4-bit code, 2 digits per byte
- range and precision depend on size of virtual cell
- defined decimal point
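A small sketch of packed BCD, two digits per byte. The helper names are hypothetical, and the addition is done digit by digit in software, which is only one possible implementation.

```c
#include <stdint.h>

/* Pack two decimal digits (each 0..9) into one byte: tens in the high nibble. */
uint8_t bcd_pack(unsigned tens, unsigned ones) {
    return (uint8_t)((tens << 4) | ones);
}

/* Decode one packed BCD byte to its value 0..99. */
unsigned bcd_value(uint8_t b) {
    return (b >> 4) * 10u + (b & 0x0Fu);
}

/* Add two packed BCD bytes digit by digit; *carry receives the decimal carry out. */
uint8_t bcd_add(uint8_t a, uint8_t b, unsigned *carry) {
    unsigned lo = (a & 0x0Fu) + (b & 0x0Fu);
    unsigned hi = (unsigned)(a >> 4) + (unsigned)(b >> 4);
    if (lo > 9) { lo -= 10; hi += 1; }       /* carry from ones into tens */
    *carry = 0;
    if (hi > 9) { hi -= 10; *carry = 1; }    /* carry out of the byte */
    return (uint8_t)((hi << 4) | lo);
}
```

Adding 0x58 and 0x67 (decimal 58 + 67) yields 0x25 with a carry of 1, i.e. 125, with no binary rounding involved.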
Character
- ASCII: 128-character set, 1 byte
- Unicode: 2-byte extension
- usually coded as unsigned integer
Boolean
- 1 bit is sufficient, but there is no bit-wise addressability in hardware
- store one per byte: space inefficient
- store 8 per byte: execution inefficient
- C: 0 = false, non-zero = true
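The "8 per byte" option can be sketched as follows. Every access needs a shift and a mask, which is where the execution cost comes from. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Store one boolean in bit i (0..7) of a byte that holds eight of them. */
void flag_set(uint8_t *cell, unsigned i, bool value) {
    if (value)
        *cell |= (uint8_t)(1u << i);     /* turn bit i on */
    else
        *cell &= (uint8_t)~(1u << i);    /* turn bit i off */
}

/* Read the boolean stored in bit i (0..7). */
bool flag_get(uint8_t cell, unsigned i) {
    return (cell >> i) & 1u;
}
```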
User-defined types
- implemented (like character and boolean usually are) as a coding of unsigned integer
- enumerated type (Pascal example):
    type suit = (club, diamond, heart, spade);
    var lead: suit;
    lead := heart;
- internally represented as { 0, 1, 2, 3 }
- operations:
User-defined types
- implemented as a restricted range of integer
- subrange type (Ada example):
    subtype CENTURY20 is INTEGER range ;
    BIRTHYEAR: CENTURY20;
    BIRTHYEAR := 1981;
User-defined types Type compatibility issues: -can two enumerated types contain same constant? -can defined types be coerced with integer, with each other?
Memory management intro
- The parser creates a symbol table of identifiers, including variables.
- Some information (the name plus more) is bound at this time, and more as the program is compiled, by storage in the symbol table.
- e.g. int x; --> symbol table entry: name: x, type: int, address: offset
Strings
- first use: output formatting only
- quasi-primitive type in most languages (not just arrays of character)
- operations: initialization, substring, catenation, comparison
- the length problem: fixed or varying?
- no standard string model
Strings - examples
C:
    char *s = "abc";
    int len = strlen(s);
- array of char with terminator: a b c \0
- extended syntax
- library of functions (string.h)
Java:
    String s = "abc" + x;
    s = s.substring(0, 2);
- fixed length array
- extended syntax
- class with 70 methods
Strings - representations
- fixed length and content (static)
- fixed length and varying content (FORTRAN)
- varying length and content by reallocation (Java String)
- varying length and content by extension (Java StringBuffer)
- varying length and content (C)
In symbol table:
- static string: length, address
- dynamic string: max length, current length, address
- char*: address
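The dynamic string descriptor (max length, current length, address) can be sketched in C as below. This is an illustration under assumptions, not how any particular language runtime does it: the names DynString, dyn_init, dyn_append and dyn_free are hypothetical, and doubling the capacity on overflow is just one common growth policy.

```c
#include <stdlib.h>
#include <string.h>

/* Descriptor of a varying-length string. */
typedef struct {
    size_t max_length;    /* capacity of the buffer, not counting the terminator */
    size_t curr_length;   /* characters currently stored */
    char  *address;       /* heap buffer holding the characters */
} DynString;

/* Create an empty string with room for max_length characters. Returns 0 on success. */
int dyn_init(DynString *s, size_t max_length) {
    s->address = malloc(max_length + 1);
    if (!s->address) return -1;
    s->max_length = max_length;
    s->curr_length = 0;
    s->address[0] = '\0';
    return 0;
}

/* Append text, reallocating the buffer when the current one is too small. */
int dyn_append(DynString *s, const char *text) {
    size_t n = strlen(text);
    if (s->curr_length + n > s->max_length) {
        size_t new_max = 2 * (s->curr_length + n);
        char *p = realloc(s->address, new_max + 1);
        if (!p) return -1;
        s->address = p;
        s->max_length = new_max;
    }
    memcpy(s->address + s->curr_length, text, n + 1);   /* copies the terminator too */
    s->curr_length += n;
    return 0;
}

/* Release the buffer and reset the descriptor. */
void dyn_free(DynString *s) {
    free(s->address);
    s->address = NULL;
    s->max_length = s->curr_length = 0;
}
```

Appending within the capacity only changes the current length; appending past it changes the address and the maximum length as well.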
Compound
(1) Arrays
- collection of elements of one type
- access to individual elements is computed at execution time by position: O(1), or O(dim)
Arrays - design decisions
- indexing:
  - dimensions: limit? recursive?
  - types: int, other, user defined?
  - first index: 0, 1, variable
  - range checking: no (C), yes (Java)
- lexemes for subscripts: ( ) or [ ]?
Arrays - design decisions
- binding times of: type and index type; index range (i.e. array size); space
- categories: static, fixed stack-dynamic, stack-dynamic, heap-dynamic
- initial values of elements at storage allocation? e.g. int[] x = {1,2,3};
Arrays - operations
- on elements: based on element type
- on entire arrays as variables: vector and matrix operations, e.g. APL
- subarrays (~ substring): slices along subarray dimensions
Arrays - storage
- array descriptor: element type and size; index type; index lower bound; index upper bound; address
- the address locates the block of elements, stored from lower bound to upper bound
Arrays - element access
- array descriptor: element type and size; index type; index lower bound; index upper bound; address
- address of a[i] = address + (i - lower bound) * size
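The access formula can be written directly against a descriptor. The struct and function names below are hypothetical; the range check is included to show where a language like Java would raise an error and where C would simply compute an address.

```c
#include <stddef.h>

/* Run-time descriptor of a one-dimensional array. */
typedef struct {
    size_t element_size;   /* size of one element in bytes */
    long   lower_bound;    /* index of the first element */
    long   upper_bound;    /* index of the last element */
    char  *address;        /* address of the first element */
} ArrayDescriptor;

/* address of a[i] = address + (i - lower bound) * size; NULL if i is out of range. */
void *element_address(const ArrayDescriptor *a, long i) {
    if (i < a->lower_bound || i > a->upper_bound)
        return NULL;                                   /* range check */
    return a->address + (size_t)(i - a->lower_bound) * a->element_size;
}
```

With a lower bound of 1, element_address(&d, 1) is the start of the storage block; with a lower bound of 0 the subtraction disappears, which is one argument for 0-based indexing.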
Arrays - multidimensional
- contiguous or not
- row major or column major order
- computed location of element
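For a contiguous two-dimensional array with 0-based indices, the computed location differs only in which index is scaled. A minimal sketch with hypothetical function names; the results are element offsets, to be multiplied by the element size and added to the base address.

```c
#include <stddef.h>

/* Row major (C, Java rows): all of row 0, then all of row 1, ... */
size_t row_major_offset(size_t i, size_t j, size_t rows, size_t cols) {
    (void)rows;              /* the row count is not needed for the offset */
    return i * cols + j;
}

/* Column major (FORTRAN): all of column 0, then all of column 1, ... */
size_t col_major_offset(size_t i, size_t j, size_t rows, size_t cols) {
    (void)cols;              /* the column count is not needed for the offset */
    return j * rows + i;
}
```

In a 3 x 4 array, element [1][2] sits at offset 6 in row major order and at offset 7 in column major order.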
Jagged arrays
- implemented as arrays of arrays
- each array has its own descriptor: index type, index lower bound, index upper bound, address
- (figure: five array descriptors, labelled 4, 3, 7, 4 and 5)
(2) Associative Arrays - maps
- values accessed by keys, not indices
- no order of elements
- automatic growth of capacity
- operations: add/set, get, remove
- fast search for individual data
- slower for batch processing than array
- Java classes; Perl data structure
Associative Arrays - implementation
- hash tables based on key value
- most operations 'near O(1)'
- expanding capacity may be O(n)
- for a Java class that combines features of array and associative array, see LinkedHashMap
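A compact sketch of a hash-table map, to show where 'near O(1)' and the O(n) expansion come from. Everything here is an assumption made for illustration: string keys, int values, linear probing, growth at half full, keys borrowed rather than copied, and no remove operation (which would need extra bookkeeping with this probing scheme). It is not how Java or Perl implement their maps.

```c
#include <stdlib.h>
#include <string.h>

typedef struct { const char *key; int value; } Slot;   /* key == NULL marks an empty slot */
typedef struct { Slot *slots; size_t capacity, count; } Map;

/* Simple multiplicative string hash. */
static size_t hash_key(const char *s) {
    size_t h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Probe from the hashed position until the key or an empty slot is found. */
static Slot *find_slot(Slot *slots, size_t capacity, const char *key) {
    size_t i = hash_key(key) % capacity;
    while (slots[i].key && strcmp(slots[i].key, key) != 0)
        i = (i + 1) % capacity;
    return &slots[i];
}

/* capacity must be at least 1. Returns 0 on success. */
int map_init(Map *m, size_t capacity) {
    m->slots = calloc(capacity, sizeof(Slot));
    m->capacity = capacity;
    m->count = 0;
    return m->slots ? 0 : -1;
}

/* Expanding capacity: every stored entry is rehashed, hence O(n). */
static int map_grow(Map *m) {
    size_t new_cap = m->capacity * 2;
    Slot *fresh = calloc(new_cap, sizeof(Slot));
    if (!fresh) return -1;
    for (size_t i = 0; i < m->capacity; i++)
        if (m->slots[i].key)
            *find_slot(fresh, new_cap, m->slots[i].key) = m->slots[i];
    free(m->slots);
    m->slots = fresh;
    m->capacity = new_cap;
    return 0;
}

/* add/set: keeps the table at most half full so probing always terminates. */
int map_set(Map *m, const char *key, int value) {
    if ((m->count + 1) * 2 > m->capacity && map_grow(m) != 0) return -1;
    Slot *s = find_slot(m->slots, m->capacity, key);
    if (!s->key) { s->key = key; m->count++; }
    s->value = value;
    return 0;
}

/* get: pointer to the stored value, or NULL if the key is absent. */
const int *map_get(const Map *m, const char *key) {
    Slot *s = find_slot(m->slots, m->capacity, key);
    return s->key ? &s->value : NULL;
}
```

Lookups touch only a few slots near the hashed position, while map_grow visits every slot, which matches the costs listed above.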
(3) Records
- multiple elements of any type
- elements accessed by field name
- design issues:
  - hierarchical definition (records within records)
  - syntax of naming
  - scopes for elliptical (incomplete) reference to fields
Records - implementation
    type course = record
        dept : array[1..4] of char;
        code : integer;
    end
- record descriptor: for each field its name, type and offset, plus the address of the record
  - dept: array [1..4] of char, offset 0 (the array type has its own descriptor: element type and size, index type, index lower bound, index upper bound)
  - code: integer, offset 4
- example contents: dept = COSC, code = 3127
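The same record can be sketched as a C struct, with offsetof reporting the field offsets the compiler chose. The struct name Course and the two helper functions are hypothetical; the exact offset of code depends on the platform's size and alignment of int.

```c
#include <stddef.h>

/* C counterpart of the Pascal record: 4 characters followed by an integer. */
typedef struct {
    char dept[4];
    int  code;
} Course;

/* Offset of each field from the start of the record, as laid out by the compiler. */
size_t dept_offset(void) { return offsetof(Course, dept); }
size_t code_offset(void) { return offsetof(Course, code); }

/* Field access as the implementation sees it: record address + field offset. */
int read_code(const Course *c) {
    return *(const int *)((const char *)c + code_offset());
}
```

On common platforms with a 4-byte int, dept_offset() is 0 and code_offset() is 4, matching the descriptor on the slide.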
(4) Pascal variant records (unions)
    type coord = (polar, cart);
         point = record
             case rep : coord of
                 polar: (radians : boolean; radius : real; angle : real);
                 cart:  (x : real; y : real);
         end;
Note:
- varying space requirements
- discriminant field (rep) is optional
- type checking loopholes; Ada has a similar variant record but closed these loopholes
Other unions
- Fortran EQUIVALENCE
- C union: not inside records; no type checking
* unions do not cause type coercion - data is reinterpreted
Sebesta's C example:
    union flexType { int intE1; float floatE1; };
    union flexType ell;
    float x;
    ell.intE1 = 27;
    x = ell.floatE1;
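The difference between reinterpretation and coercion can be made concrete. This sketch reuses the field names from the example above but fixes the integer at 32 bits and assumes a 32-bit IEEE 754 float, which the slide does not state; the two function names are hypothetical.

```c
#include <stdint.h>

union flexType {
    int32_t intE1;
    float   floatE1;
};

/* Store an integer, read the same bits back as a float: no conversion happens. */
float reinterpret_as_float(int32_t bits) {
    union flexType el;
    el.intE1 = bits;
    return el.floatE1;
}

/* Ordinary coercion, for comparison: the value is converted, not the bits. */
float coerce_to_float(int32_t n) {
    return (float)n;
}
```

Under those assumptions coerce_to_float(27) is 27.0, whereas reinterpret_as_float(27) is a tiny denormalized number, and the bit pattern 0x3F800000 reads back as 1.0.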
(5) Sets (Pascal)
- defined on one (discrete) base type
- implementation imposes maximum size (set of integer is not possible)
    type day = (M, Tu, W, Th, F, Sa, Su);
         dayset = set of day;
    var work, wknd : dayset;
        today : day;
    today := F;
    work := [M, Tu, W, Th, F];
    wknd := [Sa, Su, F];
    if today in (work * wknd) then ...    { * is set intersection }
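One common way to implement such sets is a bit vector with one bit per value of the base type, which also explains the maximum-size restriction. The sketch below assumes that implementation; the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { M, Tu, W, Th, F, Sa, Su } Day;
typedef uint8_t DaySet;    /* one bit per day: 7 values fit in one byte */

/* Build a set from a list of n days. */
DaySet set_of(const Day *days, int n) {
    DaySet s = 0;
    for (int i = 0; i < n; i++)
        s |= (DaySet)(1u << days[i]);
    return s;
}

/* Membership test (Pascal: d in s). */
bool set_contains(DaySet s, Day d) { return (s >> d) & 1u; }

/* Intersection (Pascal: a * b) and union (Pascal: a + b) are single bitwise operations. */
DaySet set_intersection(DaySet a, DaySet b) { return a & b; }
DaySet set_union(DaySet a, DaySet b)        { return a | b; }
```

With work = [M, Tu, W, Th, F] and wknd = [Sa, Su, F], the intersection contains only F.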
(6) Pointers and references
- references are dereferenced pointers (whatever that means)
- primary purpose: dynamic memory access
- secondary purpose: indirect addressing as in machine instructions
Pointers (and references)
- a pointer is a data type that stores an address in the format of the machine (usually 4 bytes, or 8 on 64-bit machines) or a "null"
- a pointer must be dereferenced to get the data at the address it contains
- a reference is a pointer data type that is automatically dereferenced
Dereferencing example
In C++:
    double x, y;
    Point p(0.0, 0.0);
    Point *pref;
    pref = &p;
    x = p.X;
    y = (*pref).Y;    // dereferencing (*pref), then field access (.Y)
In Java:
    Point2D.Double p;
    p = new Point2D.Double(0.0, 0.0);
    double xCoord = p.x;    // dereferencing and field access combined
Pointers hold addresses
Indirect addressing
In C: pointer to statically allocated memory
    int a, b;
    int *iptr, *jptr;
    a = 100;
    iptr = &a;
    jptr = iptr;
    b = *jptr;
In C: pointer into an array
    int x, y, arr[4];
    int *iptr;
    iptr = arr;
    arr[2] = 33;
    x = iptr[2];
    y = *(iptr + 2);
Security loophole...
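The equivalence of subscripting and pointer arithmetic can be checked with a few one-line functions. The names are hypothetical; the behaviour is standard C.

```c
#include <stddef.h>

/* Element i via subscript syntax. */
int read_by_index(const int *base, size_t i) { return base[i]; }

/* The same element via explicit pointer arithmetic. */
int read_by_arithmetic(const int *base, size_t i) { return *(base + i); }

/* Adding i to an int pointer advances the address by i * sizeof(int) bytes. */
size_t byte_distance(const int *base, size_t i) {
    return (size_t)((const char *)(base + i) - (const char *)base);
}
```

Neither form checks i against the array bounds, which is the loophole the slide refers to.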
Pointer arithmetic
Arithmetic operations on addresses
    int x;
    int *iptr;
    iptr = &x;
    for (;;) {
        iptr++;
    }
Scan through memory starting at x
Basic dynamic memory management model
- heap manager keeps a list of available memory cells
- "allocate" operation transfers a cell from the list in the heap to the program
- "deallocate" transfers a cell from the program back to the list in the heap
- tradeoffs of fixed or variable sized cells
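The model with fixed-size cells can be sketched as a free list threaded through the unused cells. The names Heap, Cell, HEAP_CELLS and the three functions are hypothetical, and a real heap manager would of course obtain its memory differently.

```c
#include <stddef.h>

#define HEAP_CELLS 4

/* A cell is either on the free list (next is used) or owned by the program (payload is used). */
typedef union Cell {
    union Cell *next;
    double      payload;
} Cell;

typedef struct {
    Cell  cells[HEAP_CELLS];
    Cell *free_list;          /* list of available cells */
} Heap;

/* Put every cell on the free list, in address order. */
void heap_init(Heap *h) {
    h->free_list = NULL;
    for (int i = HEAP_CELLS - 1; i >= 0; i--) {
        h->cells[i].next = h->free_list;
        h->free_list = &h->cells[i];
    }
}

/* "Allocate": transfer the first available cell to the program; NULL when the heap is empty. */
Cell *heap_allocate(Heap *h) {
    Cell *c = h->free_list;
    if (c) h->free_list = c->next;
    return c;
}

/* "Deallocate": transfer the cell back to the front of the list. */
void heap_deallocate(Heap *h, Cell *c) {
    c->next = h->free_list;
    h->free_list = c;
}
```

Both operations are O(1) because all cells are interchangeable; variable-sized cells give up that property.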
Problems with pointers and dynamic memory: 1
Dangling reference: pointer points to de-allocated memory
    Point *q;
    Point *p = new Point(0,0);
    q = p;
    delete p;
    // q is dangling - dereferencing q should cause
    // an error - 'tombstones' will do the error check
Problems with pointers and dynamic memory: 2
Memory leakage: memory cell with no reference to it
    Point *p = new Point(0,0);
    p = new Point(3,4);
    // memory containing Point(0,0) object
    // is inaccessible - counting references will help
Cause of reference problems
- multiple references to a memory cell
- deallocation of memory cells
Where is responsibility?
- automatic deallocation (garbage collection) OR
- user responsibility (explicit 'delete')
User management of memory
- dangling references can be detected as errors but not prevented
  - tombstones
  - lock and key
- memory leakage is a continuing problem
- example (pseudocode): int *p = *q = 6; p = null;
  p and q both point to a cell holding 6; after p = null, only q still references the cell
Garbage Collection
1. Reference counting: ongoing, "eager"
   - memory cells are returned to the heap as soon as all references are removed
2. Garbage collection: occasional, "lazy"
   - let unreferenced memory cells 'leak' till the heap is nearly empty, then collect them
Reference counting
- reference count is stored in the cell
- count 0 -> return cell to heap
- example (pseudocode): int *p = *q = 6; gives the cell a count of 2; p = null; leaves 1; q = null; leaves 0
- classic problem: circular linked lists
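The counting example (2, then 1, then 0) can be sketched with explicit share and release calls. In a language with built-in reference counting these calls would be generated automatically; here they are hypothetical functions written by hand, and live_cells exists only so the effect can be observed.

```c
#include <stdlib.h>

/* A heap cell that carries its own reference count. */
typedef struct {
    int ref_count;
    int value;
} Cell;

static int live_cells = 0;    /* cells currently allocated, for observation only */

/* Allocate a cell with one reference to it. */
Cell *cell_new(int value) {
    Cell *c = malloc(sizeof *c);
    if (!c) return NULL;
    c->ref_count = 1;
    c->value = value;
    live_cells++;
    return c;
}

/* A second pointer is made to refer to the cell (q = p). */
Cell *cell_share(Cell *c) {
    c->ref_count++;
    return c;
}

/* A pointer is set to null: drop one reference, free the cell when the count reaches 0. */
Cell *cell_release(Cell *c) {
    if (--c->ref_count == 0) {
        free(c);
        live_cells--;
    }
    return NULL;
}

int cells_alive(void) { return live_cells; }
```

Two cells that refer to each other never reach a count of 0 under this scheme, which is the circular-list problem named above.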
Garbage Collection (mark-sweep)
- 'accessible' marker in each cell
1. All cells in memory are marked inaccessible (f)
2. Follow all references in the program and mark the cells reached accessible (t)
3. Return inaccessible cells to the heap
- classic problem: effect on program performance
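The three steps can be sketched over a small array of cells. This is a simplified model under assumptions: each cell holds at most two references, the program's own pointers are passed in as an explicit roots array, and "returning to the heap" is modelled by clearing an in_use flag. The names Node, MAX_REFS and collect are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_REFS 2

typedef struct Node {
    bool accessible;                 /* the 'accessible' marker in the cell */
    bool in_use;                     /* false once the cell is back in the heap */
    struct Node *refs[MAX_REFS];     /* references held by this cell */
} Node;

/* Step 2 helper: mark a cell and everything reachable from it. */
static void mark(Node *n) {
    if (!n || n->accessible) return;
    n->accessible = true;
    for (int i = 0; i < MAX_REFS; i++)
        mark(n->refs[i]);
}

/* Run one collection; returns the number of cells returned to the heap. */
size_t collect(Node *heap, size_t heap_size, Node **roots, size_t root_count) {
    size_t freed = 0;
    for (size_t i = 0; i < heap_size; i++)        /* 1. mark all cells inaccessible */
        heap[i].accessible = false;
    for (size_t r = 0; r < root_count; r++)       /* 2. follow references from the program */
        mark(roots[r]);
    for (size_t i = 0; i < heap_size; i++) {      /* 3. return inaccessible cells */
        if (heap[i].in_use && !heap[i].accessible) {
            heap[i].in_use = false;
            for (int k = 0; k < MAX_REFS; k++)
                heap[i].refs[k] = NULL;
            freed++;
        }
    }
    return freed;
}
```

Unlike reference counting, this reclaims two cells that only refer to each other, because neither is reached from a root. The cost is that every collection visits the whole heap.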
A sloppy Java example from Main (Data Structures)
    public class ObjectStack {
        private Object[] data;
        private int manyItems;
        ....
        public Object pop() {
            if (manyItems == 0)
                throw new EmptyStackException();
            return data[--manyItems];    // leaves reference in data
        }
Managing heap of variable-sized cells
- necessary for objects with different space requirements
- problem: tracking cell size
- problem: heap defragmentation
  - keep blocks list in size order?
  - keep blocks list in sequence order?