Intro to Data Structures and ADTs Chapter 2
Goal of Data Structures
Organize data to facilitate efficient storage, retrieval, and manipulation of data. Select and design appropriate data types. This is the real essence of OOP.
Simplicity Tradeoff
Simplicity of data organization versus simplicity/elegance of algorithms. A simple (unsophisticated) data structure may require much work for processing data. A more complex data organization may yield nicer algorithms for the basic operations.
Issues
Amount of data: phone book lookup (Hallsville vs. Dallas). Is linear search adequate?
Number of accesses and use required: a compiler's lookup of an identifier's type in a symbol table. Linear search, binary search, or hash table?
Static vs. dynamic nature of the data: consider a text processor. Array or vector?
Abstract Data Types (ADT)
Definition: a collection of related data items together with an associated set of operations.
Why "abstract"? Data, operations, and relations are studied independently of implementation. The focus is on what, not how.
Implementation of an ADT
Definition: storage (data) structures that store the data items, together with algorithms for the basic operations. These data structures are either provided by the language or built from the language's constructs (user-defined).
Implementation of an ADT (continued)
Successful software design uses data abstraction: we separate the definition of a data type from its implementation.
C++ Types
Memory
Memory is made of 2-state devices ↔ bits 0 and 1, organized into bytes (8 bits) and words (machine dependent, e.g., 4 bytes). Each byte (or word) has an address used to store and retrieve the contents of that memory location.
Simple Data Types
The most basic form of data: sequences of bits. Their values are atomic (they can't be subdivided), and they are ADTs. Implementations have:
Storage structures: memory locations.
Algorithms: system hardware/software to do the basic operations.
Simple Data Types: Boolean and Character
Boolean: values { false, true }. They could be stored in single bits, but usually a byte is used. Operations: &&, ||.
Character: a byte for ASCII or EBCDIC, 2 bytes for Unicode (Java). Operations: ==, <, >, etc., using the numeric code.
Simple Data Types: Integers
Unsigned integer data: non-negative integers stored in base two in a fixed number of bits.
Signed integer data: stored in a fixed number of bits. Representations: sign-magnitude, two's complement.
Sign-magnitude Representation
Save one bit (usually the most significant) for the sign (0 = +, 1 = -). Use base-two representation in the other bits.
88 = 0000000001011000
-88 = 1000000001011000 (sign bit 1, same magnitude bits)
Cumbersome for arithmetic computations.
Two's Complement Representation
For nonnegative n: use the ordinary base-two representation, with leading (sign) bit 0.
For n < 0: find the w-bit base-two representation of |n|, complement each bit, then add 1.
Two's Complement Representation: Example -88
88 as a 16-bit base-two number: 0000000001011000
Complement this bit string: 1111111110100111
Add 1: 1111111110101000
WHY?
Two's Complement Representation
Works well for arithmetic computations. Example 5 + (-6):
  0000000000000101
+ 1111111111111010
What gets done to the bits to give this answer?
  1111111111111111
Biased Representation
Add a constant bias to the number, typically 2^(w-1) (where w is the number of bits), then find its base-two representation.
Example: 88 using w = 16 bits and a bias of 2^15 = 32768. Add the bias to 88, giving 32856. Represent the result in base-two notation: 1000000001011000
Biased Representation
Example: -88 using w = 16 bits and a bias of 2^15 = 32768. Add the bias to -88, giving 32680. Represent the result in base-two notation: 0111111110101000
Good for comparisons (a larger number always has a larger bit pattern when the patterns are read as unsigned values), so it is commonly used for exponents in the floating-point representation of reals.
Problems with Integer Representation
Limited capacity: a finite number of bits. An operation can produce a value that requires more bits than the maximum number allowed; this is called overflow.
None of these is a perfect representation of (mathematical) integers; each can store only a finite (sub)range of them.
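Real Data Types
float and double in C++. A float typically uses IEEE 754 single precision (32 bits); a double typically uses IEEE 754 double precision (64 bits). Single precision stores:
the sign of the mantissa in the leftmost bit (0 = +, 1 = -);
the biased binary representation of the exponent in the next 8 bits (bias = 127);
bits b2 b3 ... b24 of the mantissa in the rightmost 23 bits. b1 need not be stored, since it is known to be 1.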
Real Data Example
22.625 = (10110.101)_2
Floating-point form: (1.0110101)_2 × 2^4
Problems with Real Representation
Exponent overflow and underflow.
Round-off error: most reals do not have terminating binary representations. Example: 0.7 = (0.10110011001100110011001100...)_2
Problems with Real Representation
Round-off error may be compounded in a sequence of operations; recall the sums of calculated currency values.
Be careful comparing reals with == and !=. Instead, compare for closeness:
if (std::fabs(x - 12.34) < 0.001) …
C-Style Data Structures: Arrays
Single dimension: int numList[30];
Multi-dimension: float realList[10][10]; int numTable[3][4][5];
All elements are of the same type.
Elements are accessed by name and the [ ] operator, numList[5], or by name, offset, and dereference, *(numList + 5).
The name of the array acts as a constant pointer to its first element.
Arrays as Parameters
Note: you must also pass the number of elements used, because the array does not carry its size.
Formal parameter: void doIt(int list[], int count); or void doIt(int *list, int count);
Actual parameter: doIt(numList, numUsed);
The same call works for either style of parameter declaration.
Problems with C-Style Arrays
Capacity cannot change.
Solution 1 (non-OOP): use a "run-time array". Construct B to have the required capacity, copy the elements of A into B, then deallocate A.
Solution 2 (OOP): use vector (covered later).
Problems with C-Style Arrays
Virtually no predefined operations for non-char arrays.
The deeper problem: C-style arrays aren't self-contained.
Basic Principle of OOP
An object should be autonomous (self-contained): it should carry within itself all of the information needed to describe and operate upon itself.
Aggregate Data Types
Predefined types are not always adequate to model the problem: when objects have multiple attributes, or when objects have collections of heterogeneous elements. C++ provides structs and classes to create new types with multiple attributes.
Structures
Characteristics: a struct has a fixed size and is ordered; its elements may be of different types and sizes; elements are accessed directly by name (not by index).
struct Date
{
  int month, day, year;
  char dayOfWeek[12];
};
FAQs about Structures
structs can be nested (they can contain struct objects).
Access members with: the name of the struct object, the dot (member selector operator) ., and the name of the struct member.
Date today = { 3, 4, 2005, "Friday" };
cout << today.month;
A Commercial for OOP: Two Programming Paradigms
Procedural (C, FORTRAN, and Pascal): action-oriented; concentrates on the verbs. Programmers:
identify the basic tasks needed to solve the problem;
implement actions to do the tasks as subprograms (procedures/functions/subroutines);
group subprograms into programs/modules/libraries, which together make up a complete system for solving the problem.
Object-oriented (C++, Java, and Smalltalk): focuses on the nouns of the problem specification. Programmers:
determine the objects needed for the problem;
determine how they should work together to solve the problem;
create types called classes, made up of data members and function members that operate on the data.
Instances of a type (class) are called objects.