Download presentation
Presentation is loading. Please wait.
1
1 Chapter 6: Data types A data type defines a collection of data objects and a set of predefined operations on those objects Definition: A descriptor is the collection of the attributes of a variable
2
2 Primitive Data Types Those not defined in terms of other data types 1. Integer –Almost always an exact reflection of the hardware, so the mapping is trivial –There may be as many as eight different integer types in a language
3
3 Primitive Data Types 2. Floating Point –Model real numbers, but only as approximations –Languages for scientific use support at least two floating-point types; sometimes more –Usually exactly like the hardware, but not always
4
4 IEEE Floating Point Formats Double precision Single precision
5
5 Primitive Data Types 3. Decimal –For business applications (money) –Store a fixed number of decimal digits (coded) –Advantage: accuracy –Disadvantages: limited range, wastes memory
6
6 Character String Types Values are sequences of characters Design issues: 1. Is it a primitive type or just a special kind of array? 2. Is the length of objects static or dynamic?
7
7 Character String Types Perl and JavaScript –Patterns are defined in terms of regular expressions –A very powerful facility –e.g., /[A-Za-z][A-Za-z\d]+/ Java - String class (not arrays of char ) –Strings cannot be changed (immutable) –StringBuffer is a class for changeable string objects
8
8 Character String Types String Length Options: 1. Static –fixed length 2. Limited Dynamic Length - C and C++ actual length is indicated by a null character 3. Dynamic - SNOBOL4, Perl, JavaScript
9
9 Character String Types Evaluation –Aid to writability –As a primitive type with static length, they are inexpensive to provide—why have them? –Dynamic length is nice, but is it worth the expense?
10
10 Character String Types Implementation: –Static length - compile-time descriptor –Limited dynamic length - may need a run-time descriptor for length (but not in C and C++) –Dynamic length - need run-time descriptor; allocation/deallocation is the biggest implementation problem
11
11 User-Defined Ordinal Types An ordinal type is one in which the range of possible values can be easily associated with the set of positive integers
12
12 User-Defined Ordinal Types 1. Enumeration Types - one in which the user enumerates all of the possible values, which are symbolic constants Design Issue: Should a symbolic constant be allowed to be in more than one type definition? Ex: orange in Fruits orange in Colors
13
13 User-Defined Ordinal Types Evaluation (of enumeration types): a. Aid to readability--e.g. no need to code a color as a number b. Aid to reliability--e.g. compiler can check: i. operations (don’t allow colors to be added) ii. ranges of values (if you allow 7 colors and code them as the integers, 1..7, then could flag 9 as illegal
14
14 User-Defined Ordinal Types Implementation of user-defined ordinal types –Enumeration types are implemented as integers –Subrange types are the parent types with code inserted (by the compiler) to restrict assignments to subrange variables
15
15 Arrays Design Issues: 1. What types are legal for subscripts? 2. Are subscripting expressions in element references range checked? 3. When are subscript ranges bound? 4. When does allocation take place? 5. What is the maximum number of subscripts? 6. Can array objects be initialized? 7. Are any kind of slices allowed?
16
16 Arrays Indexing is a mapping from indices to elements map(array_name, index_value_list) an element
17
17 Arrays Subscript Types: –FORTRAN, C - integer only –Pascal - any ordinal type (integer, boolean, char, enum) –Ada - integer or enum (includes boolean and char) –Java - integer types only
18
18 Arrays Categories of arrays (based on subscript binding and binding to storage) 1. Static - range of subscripts and storage bindings are static e.g. FORTRAN 77, some arrays in Ada –Advantage: execution efficiency (no allocation or deallocation)
19
19 Arrays 2. Fixed stack dynamic - range of subscripts is statically bound, but storage is bound at elaboration time (at function call time) –e.g. Most Java locals, and C locals that are not static –Advantage: space efficiency
20
20 Arrays 3. Stack-dynamic - range and storage are dynamic (decided at run time), but fixed after initial creation on for the variable’s lifetime –e.g. Ada declare blocks declare STUFF : array (1..N) of FLOAT; begin... end; –Advantage: flexibility - size need not be known until the array is about to be used
21
21 Arrays 4. Heap-dynamic – stored on heap and sizes decided at run time. e.g. (FORTRAN 90) INTEGER, ALLOCATABLE, ARRAY (:,:) :: MAT (Declares MAT to be a dynamic 2-dim array) ALLOCATE (MAT (10,NUMBER_OF_COLS)) (Allocates MAT to have 10 rows and NUMBER_OF_COLS columns) DEALLOCATE MAT (Deallocates MAT ’s storage)
22
22 Arrays 4. Heap-dynamic (continued) –Truly dynamic: In APL, Perl, and JavaScript, arrays grow and shrink as needed –Fixed dynamic: Java, once you declare the size, it doesn’t change
23
23 Arrays Number of subscripts –FORTRAN I allowed up to three –FORTRAN 77 allows up to seven –Others - no limit Array Initialization –Usually just a list of values that are put in the array in the order in which the array elements are stored in memory
24
24 Arrays Array Operations 1. APL - many, see book (p. 240-241) 2. Ada –Assignment; RHS can be an aggregate constant or an array name –Catenation; for all single-dimensioned arrays –Relational operators (= and /= only) 3. FORTRAN 90 –Intrinsics (subprograms) for a wide variety of array operations (e.g., matrix multiplication, vector dot product)
25
25 Arrays Slices –A slice is some substructure of an array; nothing more than a referencing mechanism –Slices are only useful in languages that have array operations
26
26 Arrays Slice Examples: 1. FORTRAN 90 INTEGER MAT (1:4, 1:4) MAT(1:4, 1) - the first column MAT(2, 1:4) - the second row
27
27 Example Slices in FORTRAN 90
28
28 Arrays Implementation of Arrays –Access function maps subscript expressions to an address in the array –Row major (by rows) or column major order (by columns)
29
29 Locating an Element
30
30 Compile-Time Descriptors Single-dimensioned array Multi-dimensional array
31
31 Accessing Formulas – 1D Address(A[i]) = BegAdd + (i-lb)*size + VirtualOrigin +i*size lb: lower bound size: number of bytes of one element Virtual origin allows us to do math once, so don’t have to repeat each time. You must check for valid subscript before you use this formula, as obviously, it doesn’t care what subscript you use.
32
32 Accessing Formulas Multiple Dimensions ub i : upper bound in i th dimension lb i : lower bound in i th dimension length i = ub i –lb i +1 In row-major order Address(A[i,j]) = begAdd + size((i-lb i )*length j + j-lb j ) = VO + i*mult i + j*mult j
33
33 Address(A[i,j]) = begAdd + size((i-lb i )*length j + j-lb j ) = VO + i*mult i + j*mult j = 40 +28i+20j For Example: array of floats A[0..6, 3..7] beginning at location 100 begAdd = 100 size = 4 (if floats take 4 bytes) lb i = 0 ub i = 6 length i = 7 lb j = 3 ub j = 7 length j = 5 VO = 100 + 4*(-3)*5 = 40 mult i = 28 mult j = 20 VO40 lb i 0 ub i 6 mult i 28 lb j 3 ub j 7 mult j 20 repeated for each dimension
34
34 Accessing Formulas Multiple Dimensions In column-major order Address(A[i,j]) = begAdd + size((i-lb i ) + (j-lb j )*length i ) In 3D in row major: Addr(A[I,j,k]) = begAdd + size*((i-lb i )*length j *length k ) + (j-lb j )length k + k- lb k )
35
35 Accessing Formulas Slices Suppose we want only the second row of our previous example. We would need array descriptor to look like a normal 1D array (as when you pass the slice, the receiving function can’t be expected to treat it any differently)
36
36 Address(A[i,j]) = begAdd + size((i-lb i )*length j + j-lb j ) = VO + i*mult i + j*mult j = 40 +28i+20j For Example: array of floats A[0..6, 3..7] beginning at location 100 If we want only the second row, it is like we have hardcoded the j=2 in the accessing formula, A[0..6,2] The accessing formula is simple Just replace j with 2, adjust the VO, and remove the ub,lb, and length associated with j so 40 +28i+20j = 80+28i and the table is changed accordingly (and looks just like the descriptor for a regular 1D array) VO80 lb i 0 ub i 6 mult i 28
37
37 Associative Arrays An associative array is an unordered collection of data elements that are indexed by an equal number of values called keys. You know this as a hash table. Design Issues: 1. What is the form of references to elements? 2. Is the size static or dynamic?
38
38 Associative Arrays Structure and Operations in Perl –Names begin with % –Literals are delimited by parentheses %hi_temps = ("Monday" => 77, "Tuesday" => 79,…); –Subscripting is done using braces and keys $hi_temps{"Wednesday"} = 83; –Elements can be removed with delete delete $hi_temps{"Tuesday"};
39
39 Records A record is a possibly heterogeneous aggregate of data elements in which the individual elements are identified by names. Like a class in C++ or Java, but doesn’t allow for coding operations in the class – it is just the grouping of data. Design Issues: 1. What is the form of references? 2. What unit operations are defined?
40
40 Records Fully qualified references must include all record names Elliptical references allow leaving out record names as long as the reference is unambiguous Pascal provides a with clause to abbreviate references
41
41 Records A compile-time descriptor for a record Needs to record where each element begins
42
42 Records Comparing records and arrays 1. Access to array elements is much slower than access to record fields, because subscripts are dynamic and hence addreses must be computed (field locations are static) 2. Dynamic subscripts could be used with record field access, but it would disallow type checking and it would be much slower
43
43 Unions A union is a type whose variables are allowed to store different type values at different times during execution Design Issues for unions: 1. What kind of type checking, if any, must be done? 2. Should unions be integrated with records?
44
44 Unions Examples: 1. FORTRAN - with EQUIVALENCE –No type checking 2. Pascal - both discriminated and nondiscriminated unions e.g. type intreal = record tagg : Boolean of true : (blint : integer); false : (blreal : real); end; –Problem with Pascal’s design: type checking is ineffective
45
45 Unions A discriminated union of three shape variables
46
46 Unions Reasons why Pascal’s unions cannot be type checked effectively: a.User can create inconsistent unions (because the tag can be individually assigned) b.The tag is optional! var blurb : intreal; x : real; blurb.tagg := true; { it is an integer } blurb.blint := 47; { ok } blurb.tagg := false; { it is a real } x := blurb.blreal; { assigns an integer to real }
47
47 Unions 3. Ada - discriminated unions Reasons they are safer than Pascal: a. Tag must be present b. It is impossible for the user to create an inconsistent union (because tag cannot be assigned by itself--All assignments to the union must include the tag value, because they are aggregate values)
48
48 Unions 5. C and C++ - free unions (no tags) –Not part of their records –No type checking of references 6. Java has neither records nor unions Evaluation - potentially unsafe in most languages (not Ada)
49
49 Sets A set is a type whose variables can store unordered collections of distinct values from some ordinal type Design Issue: –What is the maximum number of elements in any set base type?
50
50 Pointers Problems with pointers: 1. Dangling pointers (dangerous) –A pointer points to a heap-dynamic variable that has been deallocated –Creating one (with explicit deallocation): a. Allocate a heap-dynamic variable and set a pointer to point at it b. Set a second pointer to the value of the first pointer c. Deallocate the heap-dynamic variable, using the first pointer
51
51 Pointers 2. Garbage: Lost Heap-Dynamic Variables A heap-dynamic variable that is no longer referenced by any program pointer –Creating one: a. Pointer p1 is set to point to a newly created heap-dynamic variable b. p1 is later set to point to another newly created heap-dynamic variable –The process of losing heap-dynamic variables is called memory leakage
52
52 Pointers C++ Reference Types –Constant pointers that are implicitly dereferenced –Used for parameters Advantages of both pass-by-reference and pass- by-value
53
53 Pointers Evaluation of pointers: 1. Dangling pointers and dangling objects are problems, as is heap management 2. Pointers are like goto's--they widen the range of cells that can be accessed by a variable 3. Pointers or references are necessary for dynamic data structures--so we can't design a language without them
54
54 Pointers Dangling pointer problem 1. Tombstone: extra heap cell that is a pointer to the heap-dynamic variable –The actual pointer variable points only at tombstones –When heap-dynamic variable deallocated, tombstone remains but set to nil –Expensive it time and space – not actually used by any popular language
55
55 Implementing Dynamic Variables
56
56 Benefit (of system controlled storage management): ability to delay the binding of a storage segment's size and/or location reuse of a storage segment for different jobs (from system supervisor point of view) reuse of storage for different data structures increased generality, not have to specify maximum data structure size dynamic data structures recursive procedures - garbage collection is automatic Benefits of programmer controlled storage management Disadvantage: burden on programmer & may interfere with necessary system control May lead to subtle errors May interfere with system-controlled storage management Advantage: difficult for system to determine when storage may be most effectively allocated and freed
57
57 Heap management –Single-size cells vs. variable-size cells –Reference counters (eager approach) vs. garbage collection (lazy approach) 1. Reference counters: maintain a counter in every cell that store the number of pointers currently pointing at the cell –Disadvantages: space required, complications for cells connected circularly Expensive - when making a pointer assignment p=q 1.decrement count for old value of p 2.if 0, return to free storage. Check if contains references to other blocks. Could be recursive 3.do pointer assignment 4.Increment reference count for q
58
58 One-bit reference counting Another variation on reference counting, called one-bit reference counting, uses a single bit flag to indicate whether each object has either "one" or "many" references. If a reference to an object with "one" reference is removed, then the object can be recycled. If an object has "many" references, then removing references does not change this, and that object will never be recycled. It is possible to store the flag as part of the pointer to the object, so no additional space is required in each object to store the count. One-bit reference counting is effective in practice because most actual objects have a reference count of one.
59
59 2. Garbage collection: allocate until all available cells allocated; then begin gathering all garbage –Every heap cell has an extra bit used by collection algorithm –All cells initially set to garbage –All pointers traced into heap, and reachable cells marked as not garbage –All garbage cells returned to list of available cells
60
60 Disadvantages: when you need it most, it works worst (takes most time when program needs most of cells in heap)
61
61 Mark-Sweep - Java uses In a mark-sweep collection, the collector first examines the program variables; any blocks of memory pointed to are added to a list of blocks to be examined. For each block on that list, it sets a flag (the mark) on the block to show that it is still required, and also that it has been processed. It also adds to the list any blocks pointed to by that block that have not yet been marked. In this way, all blocks that can be reached by the program are marked. In the second phase, the collector sweeps all allocated memory, searching for blocks that have not been marked. If it finds any, it returns them to the allocator for reuse Can find circular references. Easy if regular use of pointers (Like in LISP) All elements must be reachable by a chain of pointers which begins outside the heap Have to be able to know where all pointers are - both inside the heap and outside. How can a chain be followed from a pointer if there is no predefined location for that pointer in the pointed-to cell?
62
62 Conservative garbage collection Although garbage collection was first invented in 1958, many languages have been designed and implemented without the possibility of garbage collection in mind. It is usually difficult to add normal garbage collection to such a system, but there is a technique, known as conservative garbage collection, that can be used. The usual problem with such a language is that it doesn't provide the collector with information about the data types, and the collector cannot therefore determine what is a pointer and what isn't. A conservative collector assumes that anything might be a pointer. It regards any data value that looks like a pointer to or into a block of allocated memory as preventing the recycling of that block. You might think that conservative garbage collection could easily perform quite poorly, leaving a lot of garbage uncollected. In practice, it does quite well, and there are refinements that improve matters further. Because references are ambiguous, some objects may be retained despite being actually unreachable. In practice, this happens rarely, and refinements such as black-listing can further reduce the odds.
63
63 Incremental Garbage Collection Older garbage collection algorithms relied on being able to start collection and continue working until the collection was complete, without interruption. This makes many interactive systems pause during collection, and makes the presence of garbage collection obtrusive. Fortunately, there are modern techniques (known as incremental collection) to allow garbage collection to be performed in a series of small steps while the program is never stopped for long. In this context, the program that uses and modifies the blocks is sometimes known as the mutator. While the collector is trying to determine which blocks of memory are reachable by the mutator, the mutator is busily allocating new blocks, modifying old blocks, and changing the set of blocks it is actually looking at.
64
64 Heap Storage Management - Variable Sized Elements Memory operations Initially one large block Free space list, as space is recovered allocate from free list: first fit, best fit, worst fit compact: maintain a length of each block recover via explicit, garbage collection or reference counts Need length of each to locate pieces and coalesce fragmentation partial compaction (coalescing of adjacent free blocks) full compaction (move blocks) Show how
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.