Published by James Winnett. Modified over 9 years ago.
1
CSCI 330: Programming Language Concepts Instructor: Pranava K. Jha Data Types-II: Composite Data Types
2
Agenda
1. Records and Variant Records
2. Arrays
3. Strings
4. Sets
5. Pointers and Recursive Types
6. Lists
3
Records
Allow related data of heterogeneous types to be stored and manipulated together
Usually laid out contiguously
Possible holes for alignment reasons
Smart compilers may rearrange fields to minimize holes (C compilers promise not to)
Different languages use different terms:
– Algol 68, C, C++, and Common Lisp: struct
– Java, C++, C#: class
– Pascal: record
– ML, Python, Ruby: lists (no keyword for the declaration)
4
Examples
In Pascal:
    type two_chars = packed array [1..2] of char;
    type element = record
        name : two_chars;
        atomic_number : integer;
        atomic_weight : real;
        metallic : Boolean
    end;
In C:
    #include <stdbool.h>   /* needed for bool in C99 and later */
    struct element {
        char name[2];
        int atomic_number;
        double atomic_weight;
        bool metallic;
    };
5
Memory Layout of Records Likely layout in memory for objects on a 32-bit machine Alignment restrictions lead to the shaded “holes.”
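The holes can be observed directly in C. The following is a minimal sketch, assuming a C99 compiler; the helper names element_padding and print_element_layout are hypothetical, and the actual offsets depend on the target ABI, so no particular output is claimed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* The element record from the Examples slide. */
struct element {
    char name[2];
    int atomic_number;
    double atomic_weight;
    bool metallic;
};

/* Bytes of padding ("holes"): total size minus the sum of the field sizes. */
size_t element_padding(void) {
    size_t used = sizeof(char[2]) + sizeof(int) + sizeof(double) + sizeof(bool);
    return sizeof(struct element) - used;
}

/* Print where the compiler placed each field on this machine. */
void print_element_layout(void) {
    printf("name          at offset %zu\n", offsetof(struct element, name));
    printf("atomic_number at offset %zu\n", offsetof(struct element, atomic_number));
    printf("atomic_weight at offset %zu\n", offsetof(struct element, atomic_weight));
    printf("metallic      at offset %zu\n", offsetof(struct element, metallic));
    printf("total size %zu bytes, of which %zu are padding\n",
           sizeof(struct element), element_padding());
}
```

Calling print_element_layout() on different machines is a quick way to see how alignment rules change the layout.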
6
Packed Records
Pascal allows the programmer to specify that a record type (or an array, set, or file type) should be packed:
    type element = packed record
        name : two_chars;
        atomic_number : integer;
        atomic_weight : real;
        metallic : Boolean
    end;
7
Memory Layout of Packed Records Likely memory layout for packed records. The atomic_number and atomic_weight fields are nonaligned, and can only be read or written via multi-instruction sequences.
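Standard C has no packed keyword, so the sketch below imitates a packed record with a raw byte buffer. The offsets and the names NUMBER_OFFSET, PACKED_SIZE, put_number, and get_number are assumptions made for illustration. Copying through memcpy is the portable way to access a field at an unaligned address; on machines that require alignment, the compiler expands the copy into several instructions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical packed layout: name in bytes 0-1, atomic_number in bytes 2-5. */
enum { NAME_OFFSET = 0, NUMBER_OFFSET = 2, PACKED_SIZE = 6 };

/* Store a 32-bit field at an offset that is not a multiple of 4. */
void put_number(unsigned char *rec, int32_t value) {
    memcpy(rec + NUMBER_OFFSET, &value, sizeof value);
}

/* Load the same field back without ever forming a misaligned int pointer. */
int32_t get_number(const unsigned char *rec) {
    int32_t value;
    memcpy(&value, rec + NUMBER_OFFSET, sizeof value);
    return value;
}
```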
8
Memory Layout of Rearranged Records Rearranging record fields to minimize holes. By sorting fields according to the size of their alignment constraint, a compiler can minimize the space devoted to holes, while keeping the fields aligned.
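Because C compilers keep fields in declaration order, the programmer can do the sorting by hand. This sketch declares the same four fields twice, once in the original order and once sorted by decreasing alignment; the type names element_declared and element_sorted and the helper bytes_saved are hypothetical. The exact sizes depend on the ABI, but the sorted version is not expected to be larger on common platforms.

```c
#include <stdbool.h>
#include <stddef.h>

/* Fields in the order used on the earlier slides. */
struct element_declared {
    char name[2];
    int atomic_number;
    double atomic_weight;
    bool metallic;
};

/* Same fields, largest alignment requirement first. */
struct element_sorted {
    double atomic_weight;
    int atomic_number;
    char name[2];
    bool metallic;
};

/* Bytes recovered by reordering on the current platform. */
size_t bytes_saved(void) {
    return sizeof(struct element_declared) - sizeof(struct element_sorted);
}
```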
9
Variant Records
A variant record provides two or more alternative fields or collections of fields, only one of which is valid at any given time.
    type element = record
        name : two_chars;
        atomic_number : integer;
        atomic_weight : real;
        metallic : Boolean;
        case naturally_occurring : Boolean of
            true : (
                source : string_ptr;
                prevalence : real;
            );
            false : (
                lifetime : real;
            )
    end;
10
Memory Layout of Variants Likely memory layouts for element variants. The value of the naturally_occurring field (shown here with a double border) determines which of the interpretations of the remaining space is valid. Type string_ptr is assumed to be represented by a (four-byte) pointer to dynamically allocated storage.
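A rough C counterpart of the Pascal variant record is a struct holding a tag plus a union. This is a sketch, not a translation the slides provide: the names element_variant, info, natural, synthetic, and prevalence_or are hypothetical, and string_ptr is approximated by const char *. Unlike Pascal, C does not check the tag, so the accessor below does it by hand.

```c
#include <stdbool.h>

struct element_variant {
    char name[2];
    int atomic_number;
    double atomic_weight;
    bool metallic;
    bool naturally_occurring;        /* the tag: selects the valid variant */
    union {                          /* both variants share the same storage */
        struct {
            const char *source;
            double prevalence;
        } natural;                   /* valid when naturally_occurring is true */
        struct {
            double lifetime;
        } synthetic;                 /* valid when naturally_occurring is false */
    } info;
};

/* Read prevalence only when the tag says that variant is the live one. */
double prevalence_or(const struct element_variant *e, double fallback) {
    return e->naturally_occurring ? e->info.natural.prevalence : fallback;
}
```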
11
Arrays Arrays are the most common and important composite data types Unlike records, which group related fields of disparate types, arrays are usually homogeneous Semantically, they can be thought of as a mapping from an index type to a component or element type A slice or section is a rectangular portion of an array.
12
Arrays Array slices (sections) in Fortran 90. Much like the values in the header of an enumeration-controlled loop (Section 6.5.1), a:b:c in a subscript indicates positions a, a+c, a+2c, ... through b. If a or b is omitted, the corresponding bound of the array is assumed. If c is omitted, 1 is assumed. It is even possible to use negative values of c in order to select positions in reverse order. The slashes in the second subscript of the lower right example delimit an explicit list of positions.
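C has no slice syntax, but the meaning of a:b:c can be sketched as a loop. The function name slice and its signature are hypothetical; positions are 1-based as in Fortran, all three values must be supplied (no omitted bounds), and c must be nonzero.

```c
#include <stddef.h>

/* Copy the elements at 1-based positions a, a+c, a+2c, ... through b
 * from src into dst. Returns how many elements were copied.
 * A negative c walks the positions in reverse order. */
size_t slice(const int *src, long a, long b, long c, int *dst) {
    size_t n = 0;
    if (c > 0) {
        for (long i = a; i <= b; i += c) dst[n++] = src[i - 1];
    } else if (c < 0) {
        for (long i = a; i >= b; i += c) dst[n++] = src[i - 1];
    }
    return n;
}
```

For a ten-element array holding 1 through 10, slice(v, 2, 8, 3, out) selects positions 2, 5, and 8, and slice(v, 10, 1, -4, out) selects positions 10, 6, and 2.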
13
Arrays: Dimensions, Bounds, and Allocation
– Global lifetime, static shape: allocate space for the array in static global memory
– Local lifetime, static shape: space can be allocated in the subroutine’s stack frame at run time
– Local lifetime, shape bound at elaboration time: an extra level of indirection is required to place the space for the array in the stack frame of its subroutine (Ada, C)
– Arbitrary lifetime, shape bound at elaboration time: at elaboration time either space is allocated or a preexistent reference from another array is assigned (Java, C#)
– Arbitrary lifetime, dynamic shape: must generally be allocated from the heap. A pointer to the array still resides in the fixed-size portion of the stack frame (if local lifetime).
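Three of these categories map onto familiar C idioms. The sketch below is illustrative only; the names global_table, sum_local, and make_squares are hypothetical, and the heap case stands in for any array whose shape is not known until run time.

```c
#include <stdlib.h>

/* Global lifetime, static shape: lives in static memory. */
static int global_table[8];

/* Local lifetime, static shape: lives in the stack frame of sum_local. */
int sum_local(void) {
    int local[4] = {1, 2, 3, 4};
    int sum = 0;
    for (int i = 0; i < 4; i++) sum += local[i];
    return sum;
}

/* Shape chosen at run time, lifetime outlasting the call: heap allocation.
 * Only the pointer sits in a stack frame. The caller must free the result. */
int *make_squares(size_t n) {
    int *a = malloc(n * sizeof *a);
    if (a == NULL) return NULL;
    for (size_t i = 0; i < n; i++) a[i] = (int)(i * i);
    return a;
}
```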
14
Memory Layout of Arrays Arrays in most language implementations are stored in contiguous locations in memory. Like records, arrays may contain “holes” due to alignment requirements. Some languages (e.g., Pascal) allow the programmer to specify that an array be packed. For multidimensional arrays, there are two layouts: row-major order and column-major order – In row-major order, consecutive locations in memory hold elements that differ by one in the final subscript (except at the ends of rows). – In column-major order, consecutive locations hold elements that differ by one in the initial subscript (except at the ends of columns).
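For a two-dimensional array with zero-based subscripts, the two layouts differ only in how (i, j) is turned into a linear offset. A minimal sketch follows; the function names are hypothetical, and offsets are counted in elements rather than bytes.

```c
#include <stddef.h>

/* Row-major: stepping j by one moves to the adjacent memory location. */
size_t row_major_index(size_t i, size_t j, size_t rows, size_t cols) {
    (void)rows;                 /* not needed for the row-major formula */
    return i * cols + j;
}

/* Column-major: stepping i by one moves to the adjacent memory location. */
size_t col_major_index(size_t i, size_t j, size_t rows, size_t cols) {
    (void)cols;                 /* not needed for the column-major formula */
    return j * rows + i;
}
```

C itself uses row-major order, so for int m[3][4] the element m[1][2] sits at element offset row_major_index(1, 2, 3, 4) from the start of the array. Fortran uses column-major order.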
15
Row- and Column-major Layout
16
Strings Strings are really just arrays of characters. They are often special-cased, to give them flexibility (like polymorphism or dynamic sizing) that is not available for arrays in general. – It's easier to provide these things for strings than for arrays in general because strings are one-dimensional and (more important) non-circular.
17
Strings In some languages, strings have special status, with operations that are not available for arrays of other sorts. – It is easier to provide special features for strings than for arrays in general, because strings are one-dimensional. – Manipulation of variable-length strings is fundamental to a huge number of computer applications. Particularly powerful string facilities are found in various scripting languages such as Perl, Python and Ruby. C, Pascal, and Ada require that the length of a string-valued variable be bound no later than elaboration time, allowing the variable to be implemented as a contiguous array of characters in the current stack frame. Lisp, Icon, ML, Java, C# allow the length of a string-valued variable to change over its lifetime, requiring that the variable be implemented as a block or chain of blocks in the heap.
18
Sets A set is an unordered collection of an arbitrary number of distinct values of a common type. Sets were introduced by Pascal and are found in many more recent languages as well. There are many ways to implement sets, including arrays, hash tables, and various forms of trees. The most common implementation employs a bit vector whose length (in bits) is the number of distinct values of the base type. – Operations on bit-vector sets can make use of fast logical instructions on most machines. – Union is bit-wise or; intersection is bit-wise and; difference is bit-wise not, followed by bit-wise and.
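The bit-vector scheme can be sketched in a few lines of C. The type name small_set and the function names are hypothetical, and the base type is assumed to be the integers 0 through 31 so that one 32-bit word holds the whole set.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t small_set;     /* bit x is 1 when x is a member; x in 0..31 */

small_set set_add(small_set s, unsigned x) { return s | (UINT32_C(1) << x); }
bool      set_has(small_set s, unsigned x) { return ((s >> x) & 1u) != 0; }

/* Each set operation is one or two logical instructions. */
small_set set_union(small_set a, small_set b)        { return a | b; }
small_set set_intersection(small_set a, small_set b) { return a & b; }
small_set set_difference(small_set a, small_set b)   { return a & ~b; }  /* not, then and */
```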
19
Pointers And Recursive Types A recursive type is one whose objects may contain one or more references to other objects of the type. Pointers serve two purposes: – Efficient (and sometimes intuitive) access to elaborated objects (as in C). – Dynamic creation of linked data structures, in conjunction with a heap storage manager. In languages like C, Pascal, or Ada, which use a value model of variables, recursive types require the notion of a pointer. (Pointers aren't needed with a reference model.) In some languages (e.g., Pascal, Ada 83, and Modula-3), pointers are restricted to point only to objects in the heap.
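In C, which uses a value model, a recursive type has to go through a pointer: a struct cannot contain itself, but it can contain a pointer to its own type. The sketch below shows a singly linked list built on the heap; the names node, push, and free_list are hypothetical.

```c
#include <stdlib.h>

/* A recursive type: each node refers to another object of the same type. */
struct node {
    int value;
    struct node *next;          /* NULL marks the end of the list */
};

/* Allocate a node on the heap and link it in front of head.
 * If allocation fails, the list is returned unchanged. */
struct node *push(struct node *head, int value) {
    struct node *n = malloc(sizeof *n);
    if (n == NULL) return head;
    n->value = value;
    n->next = head;
    return n;
}

/* Reclaim every node. Any pointer still held to a node dangles afterwards. */
void free_list(struct node *head) {
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}
```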
20
Pointers (contd.) A dangling reference is a live pointer that no longer points to a valid object. Two sources of dangling pointers: – A pointer in a wider scope still refers to a local object of a subroutine that has returned. – The programmer reclaims an object to which pointers still refer. Two implementation mechanisms to catch dangling pointers: – Tombstones – Locks and Keys
21
Tombstones Tombstones are a mechanism for detecting dangling pointers, which can appear in certain programming languages (e.g., C, C++, and assembly languages), and for containing their dangerous effects. – The idea is simple: Rather than have a pointer refer to an object directly, introduce an extra level of indirection. – When an object is allocated, the language run-time system allocates a tombstone. – The pointer contains the address of the tombstone; the tombstone contains the address of the object. – When the object is reclaimed, the tombstone is modified to contain a value that cannot be a valid address.
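The indirection can be imitated by hand in C. This is only a sketch of the idea, not how any particular run-time system implements it: the names tombstone, ts_ptr, ts_alloc, ts_free, and ts_deref are hypothetical, NULL plays the role of the value that cannot be a valid address, and tombstones are never reclaimed here.

```c
#include <stdlib.h>

typedef struct {
    void *object;               /* address of the object, or NULL once reclaimed */
} tombstone;

typedef tombstone *ts_ptr;      /* a "pointer" holds the tombstone's address */

/* Allocate an object together with its tombstone. */
ts_ptr ts_alloc(size_t size) {
    tombstone *t = malloc(sizeof *t);
    if (t == NULL) return NULL;
    t->object = malloc(size);
    if (t->object == NULL) { free(t); return NULL; }
    return t;
}

/* Reclaim the object but keep the tombstone, marked as invalid. */
void ts_free(ts_ptr p) {
    free(p->object);
    p->object = NULL;
}

/* Every access goes through the tombstone; NULL means the pointer dangles. */
void *ts_deref(ts_ptr p) {
    return p->object;
}
```

Copies of a ts_ptr all share one tombstone, so reclaiming the object through one copy is visible through every other copy.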
22
Locks and Keys Every pointer is a tuple consisting of an address and a key. – Every object in the heap begins with a lock. – A pointer to an object in the heap is valid only if the key in the pointer matches the lock in the object. – When the run-time system allocates a new heap object, it generates a new key value. – When an object is reclaimed, its lock is changed to some arbitrary value (e.g., zero) so that the keys in any remaining pointers will not match.
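The scheme can be sketched in C as follows. All names (lk_object, lk_ptr, lk_alloc, lk_release, lk_valid) are hypothetical, keys come from a simple counter, and lk_release only resets the lock without returning the storage, so that the check in lk_valid never reads freed memory. A real run-time system would reuse the space.

```c
#include <stdlib.h>

typedef struct {
    unsigned long lock;         /* every heap object begins with a lock */
    int payload;
} lk_object;

typedef struct {
    lk_object *addr;            /* a pointer is an (address, key) tuple */
    unsigned long key;
} lk_ptr;

static unsigned long next_key = 1;   /* zero is reserved for reclaimed objects */

/* Allocate an object, generate a fresh key, and store it as the lock. */
lk_ptr lk_alloc(int payload) {
    lk_ptr p = { malloc(sizeof(lk_object)), 0 };
    if (p.addr != NULL) {
        p.key = next_key++;
        p.addr->lock = p.key;
        p.addr->payload = payload;
    }
    return p;
}

/* Reclaim: change the lock so no outstanding key matches it. */
void lk_release(lk_ptr p) {
    p.addr->lock = 0;
}

/* A pointer is valid only if its key matches the object's lock. */
int lk_valid(lk_ptr p) {
    return p.addr != NULL && p.addr->lock == p.key;
}
```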
23
Garbage Collection The language implementation notices when objects are no longer useful and reclaims them automatically. More or less essential for functional languages – delete is a very imperative sort of operation – The ability to construct and return arbitrary objects from functions requires unlimited extent and hence heap allocation to accommodate it. Popular for imperative languages as well; e.g., in Clu, Cedar, Modula-3, Java, C#, and all the major scripting languages. A typical tradeoff between convenience and safety on the one hand and performance on the other.
24
Lists Defined recursively as either the empty list or a pair consisting of an object (which may be either a list or an atom) and another (shorter) list Ideally suited to programming in functional and logic languages. Several scripting languages, notably Perl and Python, provide extensive list support