Data Structure & Abstract Data Type
C and Data Structures
Baojian Hua
Data Types
A data type consists of:
- a collection of data elements (a type)
- a set of operations on these data elements
Data types in languages:
- predefined: every language defines a group of predefined data types (in C: int, char, float, double, …)
- user-defined: programmers can define their own (new) data types (in C: struct, union, …)
Data Type Examples
Predefined:
- type: int
- elements: …, -2, -1, 0, 1, 2, …
- operations: +, -, *, /, %, …
User-defined:
- type: complex
- elements: 1+3i, -5+8i, …
- operations: newComplex, add, sub, distance, …
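Concrete Data Types (CDT)
A concrete data type: both the data type declaration and its concrete representation are available to the programmer.
Almost all C predefined types are CDTs. For instance, on most current platforms “int” is represented as a 32-bit word (a “double word” in x86 terminology), and +, -, … operate directly on that representation.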
Abstract Data Types (ADT)
An abstract data type:
- separates the data type declaration from its representation
- separates function declarations (prototypes) from their implementations
Examples of abstract data types in languages:
- interfaces in Java
- signatures in ML
- header files & typedef in C (roughly)
Data Structures
Data structures study the organization of data in computers, consisting of:
- the (abstract) data types (definition and representation)
- the relationships between elements of these types
- the operations on these data types
Algorithms: operations on data structures
- tradeoffs: efficiency, simplicity, etc.
- subtle interplay with data structure design
Slogan: programs = data structures + algorithms
What will this part cover?
- Linear structures: linked list, stack, queue, extensible array, descriptor-based string
- Tree & forest: binary tree, binary search tree
- Graph
- Hash
- Searching
More on Modules, CDT and ADT
Suppose we need a data type to represent complex numbers c:
- a data type “complex”
- elements: 3+4i, -5-8i, …
- operations: newComplex, add, sub, distance, …
How should we represent this data type in C (CDT, ADT or …)?
Complex Number
// Recall the definition of a complex number c:
// c = x + yi, where x, y are real numbers and i = sqrt(-1)
// Some typical operations:
complex newComplex (double x, double y);
complex complexAdd (complex c1, complex c2);
complex complexSub (complex c1, complex c2);
complex complexMult (complex c1, complex c2);
double complexDistance (complex c1, complex c2);  // |c1 - c2|, a real number
double complexModus (complex c);                  // modulus |c|, a real number
complex complexDivide (complex c1, complex c2);
// Next, we discuss two representations: CDT and ADT.
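For reference, these are the standard formulas behind the operations above, writing c1 = x1 + y1 i and c2 = x2 + y2 i (the subscripted names are ours, not from the slides). They also show why distance and modulus yield real numbers, i.e. a C return type of double.

```latex
\begin{aligned}
c_1 + c_2 &= (x_1 + x_2) + (y_1 + y_2)\,i \\
c_1 - c_2 &= (x_1 - x_2) + (y_1 - y_2)\,i \\
c_1 c_2 &= (x_1 x_2 - y_1 y_2) + (x_1 y_2 + x_2 y_1)\,i \\
\frac{c_1}{c_2} &= \frac{(x_1 x_2 + y_1 y_2) + (x_2 y_1 - x_1 y_2)\,i}{x_2^2 + y_2^2}, \qquad c_2 \neq 0 \\
|c| &= \sqrt{x^2 + y^2} \\
\operatorname{dist}(c_1, c_2) &= |c_1 - c_2| = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}
\end{aligned}
```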
CDT of Complex: Interface (Types)
// In a file "complex.h":
#ifndef COMPLEX_H
#define COMPLEX_H

struct complexStruct {
  double x;
  double y;
};
typedef struct complexStruct complex;

complex newComplex (double x, double y);
// other function prototypes are similar …

#endif
Client Code
// With this interface, we can write client code
// that manipulates complex numbers. File "main.c":
#include "complex.h"

int main () {
  complex c1, c2, c3;
  c1 = newComplex (3.0, 4.0);
  c2 = newComplex (7.0, 6.0);
  c3 = complexAdd (c1, c2);
  complexOutput (c3);
  return 0;
}
Do we know the concrete representation of c1, c2, c3? How?
CDT Complex: Implementation
// In a file "complex.c":
#include "complex.h"

complex newComplex (double x, double y) {
  complex c;
  c.x = x;
  c.y = y;
  return c;
}
// other functions are similar. See Lab2
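A minimal single-file sketch of the CDT approach, assuming the struct from complex.h. The bodies of complexAdd, complexSub, complexMult and complexOutput are our own illustrative versions of the prototypes listed earlier, not the Lab2 code. Because complex is a plain struct, values are passed and returned by copy and no heap allocation is needed.

```c
#include <stdio.h>

/* CDT: the representation is visible to every client. */
struct complexStruct {
  double x;  /* real part */
  double y;  /* imaginary part */
};
typedef struct complexStruct complex;

complex newComplex (double x, double y) {
  complex c;
  c.x = x;
  c.y = y;
  return c;  /* returned by value: the caller gets a copy of the struct */
}

complex complexAdd (complex c1, complex c2) {
  return newComplex (c1.x + c2.x, c1.y + c2.y);
}

complex complexSub (complex c1, complex c2) {
  return newComplex (c1.x - c2.x, c1.y - c2.y);
}

complex complexMult (complex c1, complex c2) {
  /* (x1 + y1 i)(x2 + y2 i) = (x1 x2 - y1 y2) + (x1 y2 + x2 y1) i */
  return newComplex (c1.x * c2.x - c1.y * c2.y,
                     c1.x * c2.y + c2.x * c1.y);
}

/* Prints e.g. "10+10i"; %+g always shows the sign of the imaginary part. */
void complexOutput (complex c) {
  printf ("%g%+gi\n", c.x, c.y);
}
```

With the client main.c shown earlier, complexOutput (c3) would print 10+10i.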
Problem #1
int main () {
  complex c;
  c = newComplex (3.0, 4.0);
  // We want to do this: c = c + (5+6i);
  // Ooooops, but this is legal:
  c.x += 5;
  c.y += 6;
  return 0;
}
Problem #2
#ifndef COMPLEX_H
#define COMPLEX_H

struct complexStruct {
  // change to a fancier one? That angers "main"…
  double a[2];
};
typedef struct complexStruct complex;

complex newComplex (double x, double y);
// other function prototypes are similar …

#endif
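To see what Problem #2 costs, here is a sketch of the same CDT rewritten for the array representation (our illustration). Only the function bodies change, but any client that wrote c.x or c.y, like the main of Problem #1, stops compiling because those fields no longer exist.

```c
#include <stdio.h>

/* Alternative CDT representation: a[0] is the real part, a[1] the imaginary part. */
struct complexStruct {
  double a[2];
};
typedef struct complexStruct complex;

complex newComplex (double x, double y) {
  complex c;
  c.a[0] = x;
  c.a[1] = y;
  return c;
}

complex complexAdd (complex c1, complex c2) {
  return newComplex (c1.a[0] + c2.a[0], c1.a[1] + c2.a[1]);
}

void complexOutput (complex c) {
  printf ("%g%+gi\n", c.a[0], c.a[1]);
}

/* A client line such as "c.x += 5;" is now a compile error:
   struct complexStruct has no member named x. */
```

Clients that only call newComplex, complexAdd and complexOutput keep working, yet they still have to be recompiled, because the struct layout lives in the header they include.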
Problems with CDT?
Operations are transparent:
- user code has no idea of the algorithm. Good!
Data representation dependence:
- Problem #1: user code can access the data directly, bypassing the interface. Is that safe?
- Problem #2: it makes the code rigid. Is it easy to change or evolve?
ADT of Complex: Interface (Types)
// In file "complex.h":
#ifndef COMPLEX_H
#define COMPLEX_H

// note that "struct complexStruct" is not given here
typedef struct complexStruct *complex;

complex newComplex (double x, double y);
// other function prototypes are similar …

#endif
Client Code
// With this interface, we can write client code
// that manipulates complex numbers. File "main.c":
#include "complex.h"

int main () {
  complex c1, c2, c3;
  c1 = newComplex (3.0, 4.0);
  c2 = newComplex (7.0, 6.0);
  c3 = complexAdd (c1, c2);
  complexOutput (c3);
  return 0;
}
Can we still know the concrete representation of c1, c2, c3? Why?
ADT Complex: Implementation #1 (Types)
// In a file "complex.c":
#include "complex.h"

// We may choose to define the complex type as:
struct complexStruct {
  double x;
  double y;
};
// which is hidden in the implementation.
ADT Complex: Implementation Continued
// In a file "complex.c":
#include <stdlib.h>   // malloc
#include "complex.h"

complex newComplex (double x, double y) {
  complex c;
  c = (complex)malloc (sizeof (*c));
  c->x = x;
  c->y = y;
  return c;
}
// other functions are similar. See Lab2
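A sketch of the ADT version collapsed into one file for testing; in a real project the part marked complex.c would be a separate translation unit, so clients never see struct complexStruct. The accessors complexReal and complexImag and the destructor complexFree are hypothetical additions of ours, not part of the interface shown above.

```c
#include <stdio.h>
#include <stdlib.h>

/* --- complex.h: clients see only an opaque pointer type --- */
typedef struct complexStruct *complex;
complex newComplex (double x, double y);
complex complexAdd (complex c1, complex c2);
double complexReal (complex c);   /* hypothetical accessor */
double complexImag (complex c);   /* hypothetical accessor */
void complexOutput (complex c);
void complexFree (complex c);     /* hypothetical: releases the heap cell */

/* --- complex.c: the representation is private to this file --- */
struct complexStruct {
  double x;
  double y;
};

complex newComplex (double x, double y) {
  complex c = malloc (sizeof (*c));
  if (c == NULL) {
    fprintf (stderr, "newComplex: out of memory\n");
    exit (EXIT_FAILURE);
  }
  c->x = x;
  c->y = y;
  return c;
}

/* Returns a freshly allocated result; the arguments are not modified. */
complex complexAdd (complex c1, complex c2) {
  return newComplex (c1->x + c2->x, c1->y + c2->y);
}

double complexReal (complex c) { return c->x; }
double complexImag (complex c) { return c->y; }

void complexOutput (complex c) {
  printf ("%g%+gi\n", c->x, c->y);
}

void complexFree (complex c) {
  free (c);
}
```

Because every newComplex (and therefore every complexAdd) allocates on the heap, a pointer-based ADT also needs some way to release values; complexFree is one possible choice.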
ADT Summary
Yes, that's an ADT!
- The algorithm is hidden.
- The data representation is hidden: user code cannot access it, so client code is independent of the implementation.
See Lab2 for another data type, “nat”, as a CDT or an ADT.
Polymorphism
To explain polymorphism, we start with a new data type “tuple”.
A tuple is of the form (x, y), where x ∈ A and y ∈ B (a.k.a. A × B):
- A and B are unknown in advance and may be different
Examples:
- A = int, B = int: (2, 3), (4, 6), (9, 7), …
- A = char *, B = double: ("Bob", 145.8), ("Alice", 90.5), …
Polymorphism
From the data type point of view:
- two types: A, B
- operations:
  newTuple (x, y);  // create a new tuple with x and y
  equals (t1, t2);  // equality testing
  first (t);        // get the first element of t
  second (t);       // get the second element of t
  …
How do we represent this type in computers (using C)?
Monomorphic Version
Next, we first consider a monomorphic tuple type called “intTuple”: both the first and second components are of type “int”: (2, 3), (8, 9), …
The intTuple ADT:
- type: intTuple
- elements: (2, 3), (8, 9), …
- operations:
  intTuple newIntTuple (int x, int y);
  int first (intTuple t);
  int second (intTuple t);
  int equals (intTuple t1, intTuple t2);
  …
“intTuple” CDT
// in a file "intTuple.h"
#ifndef INT_TUPLE_H
#define INT_TUPLE_H

struct intTupleStruct {
  int x;
  int y;
};
typedef struct intTupleStruct intTuple;

intTuple newIntTuple (int n1, int n2);
int first (intTuple t);
…

#endif
Or the “intTuple” ADT
// in a file "intTuple.h"
#ifndef INT_TUPLE_H
#define INT_TUPLE_H

typedef struct intTupleStruct *intTuple;

intTuple newIntTuple (int n1, int n2);
int first (intTuple t);
int tupleEquals (intTuple t1, intTuple t2);
…

#endif
// We only discuss "tupleEquals ()". All other
// functions are left to you.
tupleEquals()
// in a file "intTuple.c"
int tupleEquals (intTuple t1, intTuple t2) {
  return ((t1->x == t2->x) && (t1->y == t2->y));
}
[Figure: t1 and t2 each point to a struct with fields x and y.]
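A compact sketch of the whole intTuple ADT in one file, assuming the header above. The struct layout reuses the fields of the CDT version, and second follows the operation list of the monomorphic intTuple; both fill in parts the slides leave to the reader, so treat them as our guess.

```c
#include <stdio.h>
#include <stdlib.h>

/* intTuple.h (opaque) */
typedef struct intTupleStruct *intTuple;
intTuple newIntTuple (int n1, int n2);
int first (intTuple t);
int second (intTuple t);
int tupleEquals (intTuple t1, intTuple t2);

/* intTuple.c (private representation, same fields as the CDT version) */
struct intTupleStruct {
  int x;
  int y;
};

intTuple newIntTuple (int n1, int n2) {
  intTuple t = malloc (sizeof (*t));
  if (t == NULL) {
    fprintf (stderr, "newIntTuple: out of memory\n");
    exit (EXIT_FAILURE);
  }
  t->x = n1;
  t->y = n2;
  return t;
}

int first (intTuple t) { return t->x; }
int second (intTuple t) { return t->y; }

/* Structural equality: compares the int components, not the pointers. */
int tupleEquals (intTuple t1, intTuple t2) {
  return (t1->x == t2->x) && (t1->y == t2->y);
}
```

Two tuples built from the same numbers are distinct heap cells (different pointers), yet tupleEquals reports them equal because it compares contents. Here the types of x and y are known to be int, which is exactly what the polymorphic version loses.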
Polymorphism
Now, we consider a polymorphic tuple type called “tuple”:
- “poly”: may take various forms
- each component of a tuple may be of a different type: (2, 3.14), ("8", 'a'), ('\0', 99), …
The “tuple” ADT:
- type: tuple
- elements: (2, 3.14), ("8", 'a'), ('\0', 99), …
The Tuple ADT
What about the operations?
tuple newTuple (??? x, ??? y);
??? first (tuple t);
??? second (tuple t);
int equals (tuple t1, tuple t2);
…
Polymorphic Type
To cure this, C offers a polymorphic type, “void *”:
- “void *” is a pointer that can point to any object type (i.e., it is compatible with any object pointer type): very poly…
- think of it as a box or a mask
- it cannot be used directly: it must be cast back to a concrete pointer type (ugly!)
- similar to constructs in other languages, such as “Object” in Java
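A small illustration of the box-and-cast discipline (our own example). Any object pointer converts to void * and back without loss, but the value inside the box can only be read after casting back to its real type. The kind tag passed to printBoxed is a hypothetical convention: C itself records nothing about what a void * points to.

```c
#include <stdio.h>

/* Print a boxed value; the caller must say what is inside the box. */
void printBoxed (void *box, char kind) {
  if (kind == 'i') {
    int *ip = (int *)box;        /* cast back before dereferencing */
    printf ("int: %d\n", *ip);
  } else if (kind == 'd') {
    double *dp = (double *)box;
    printf ("double: %g\n", *dp);
  }
  /* Using *box as a value is an error: void * has no pointee type. */
}

/* Round trip through void *: converting back yields the original pointer. */
int roundTrip (int *p) {
  void *box = p;
  return (int *)box == p;
}
```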
The Tuple ADT
What about the operations?
tuple newTuple (void *x, void *y);
void *first (tuple t);
void *second (tuple t);
int equals (tuple t1, tuple t2);
…
“tuple” Interface
// in a file "tuple.h"
#ifndef TUPLE_H
#define TUPLE_H

typedef void *poly;
typedef struct tupleStruct *tuple;

tuple newTuple (poly x, poly y);
poly first (tuple t);
poly second (tuple t);
int equals (tuple t1, tuple t2);

#endif // TUPLE_H
Client Code
// in a file "main.c"
#include <stdlib.h>   // malloc
#include "complex.h"  // need the ADT version
#include "tuple.h"

int main () {
  complex c1 = newComplex (1.0, 2.0);
  int *ip = (int *)malloc (sizeof (*ip));
  tuple t1 = newTuple (c1, ip);
  return 0;
}
“tuple” ADT Implementation
// in a file "tuple.c"
#include <stdlib.h>
#include "tuple.h"

struct tupleStruct {
  poly x;
  poly y;
};

tuple newTuple (poly x, poly y) {
  tuple t = (tuple)malloc (sizeof (*t));
  t->x = x;
  t->y = y;
  return t;
}
[Figure: t points to a struct whose fields x and y are poly pointers.]
“tuple” ADT Implementation
// in a file "tuple.c"
#include <stdlib.h>
#include "tuple.h"

struct tupleStruct {
  poly x;
  poly y;
};

poly first (tuple t) {
  return t->x;
}
[Figure: t points to a struct with fields x and y; first returns x.]
Client Code
#include <stdlib.h>   // malloc
#include "complex.h"  // ADT version
#include "tuple.h"

int main () {
  complex c1 = newComplex (1.0, 2.0);
  int *ip = (int *)malloc (sizeof (*ip));
  tuple t1 = newTuple (c1, ip);
  complex c2 = (complex)first (t1);  // type cast
  return 0;
}
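Putting the pieces together, here is a single-file sketch of the polymorphic tuple: the header, the private struct, newTuple, first and second. To keep the block self-contained, the tests box an int and a double on the heap in place of the complex ADT; that substitution is ours.

```c
#include <stdio.h>
#include <stdlib.h>

/* tuple.h */
typedef void *poly;
typedef struct tupleStruct *tuple;
tuple newTuple (poly x, poly y);
poly first (tuple t);
poly second (tuple t);

/* tuple.c */
struct tupleStruct {
  poly x;
  poly y;
};

tuple newTuple (poly x, poly y) {
  tuple t = malloc (sizeof (*t));
  if (t == NULL) {
    fprintf (stderr, "newTuple: out of memory\n");
    exit (EXIT_FAILURE);
  }
  t->x = x;  /* stores the pointers; the pointed-to data is not copied */
  t->y = y;
  return t;
}

poly first (tuple t) { return t->x; }
poly second (tuple t) { return t->y; }
```

Nothing stops a client from writing *(double *)first (t) on a tuple whose first component is an int. That cast compiles without complaint but has undefined behavior, because C performs no check on void * conversions.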
“equals”?
struct tupleStruct {
  poly x;
  poly y;
};

// The #1 try:
int equals (tuple t1, tuple t2) {
  return ((t1->x == t2->x) && (t1->y == t2->y));  // Wrong!! compares addresses, not values
}
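Why the #1 try fails: == on two poly values compares addresses. In this small demonstration of our own, two separately allocated boxes hold the same value but are different pointers, so a tuple built from one would be reported unequal to a tuple built from the other.

```c
#include <stdlib.h>

/* Box an int on the heap, as a client of the tuple ADT must. */
int *boxInt (int n) {
  int *p = malloc (sizeof (*p));
  if (p != NULL)
    *p = n;
  return p;
}

/* The comparison used by the #1 try, isolated: pointer identity. */
int sameAddress (void *a, void *b) {
  return a == b;
}
```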
“equals”?
struct tupleStruct {
  poly x;
  poly y;
};

// The #2 try:
int equals (tuple t1, tuple t2) {
  return (*(t1->x) == *(t2->x) && *(t1->y) == *(t2->y));  // Problem?
}
“equals”?
struct tupleStruct {
  poly x;
  poly y;
};

// The #3 try:
int equals (tuple t1, tuple t2) {
  return (equalsXXX (t1->x, t2->x) && equalsYYY (t1->y, t2->y));
  // but what are "equalsXXX" and "equalsYYY"?
}
Functions as Arguments
// So in the body of the "equals" function, instead
// of guessing the types of t->x and t->y, we
// require the callers of "equals" to supply the
// necessary equality testing functions.
// The #4 try:
typedef int (*tf)(poly, poly);

int equals (tuple t1, tuple t2, tf eqx, tf eqy) {
  return (eqx (t1->x, t2->x) && eqy (t1->y, t2->y));
}
Change to “tuple” Interface
// in file "tuple.h"
#ifndef TUPLE_H
#define TUPLE_H

typedef void *poly;
typedef int (*tf)(poly, poly);
typedef struct tupleStruct *tuple;

tuple newTuple (poly x, poly y);
poly first (tuple t);
poly second (tuple t);
int equals (tuple t1, tuple t2, tf eqx, tf eqy);

#endif // TUPLE_H
Client Code
// in file "main.c"
#include <stdlib.h>   // malloc
#include "complex.h"
#include "tuple.h"

int main () {
  complex c = newComplex (1.0, 2.0);
  int *ip = (int *)malloc (sizeof (int));
  tuple t1 = …, t2 = …;
  equals (t1, t2, complexEquals, intEquals);
  return 0;
}
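A runnable sketch of the #4 try. The client code passes complexEquals and intEquals without showing their definitions, so the intEquals and doubleEquals below are hypothetical comparators of type tf. A double stands in for the complex ADT to keep the block self-contained.

```c
#include <stdio.h>
#include <stdlib.h>

typedef void *poly;
typedef int (*tf)(poly, poly);
typedef struct tupleStruct *tuple;

struct tupleStruct {
  poly x;
  poly y;
};

tuple newTuple (poly x, poly y) {
  tuple t = malloc (sizeof (*t));
  if (t == NULL) {
    fprintf (stderr, "newTuple: out of memory\n");
    exit (EXIT_FAILURE);
  }
  t->x = x;
  t->y = y;
  return t;
}

/* The caller supplies one comparator per component. */
int equals (tuple t1, tuple t2, tf eqx, tf eqy) {
  return eqx (t1->x, t2->x) && eqy (t1->y, t2->y);
}

/* Hypothetical comparators: each knows the real type behind its poly arguments. */
int intEquals (poly a, poly b) {
  return *(int *)a == *(int *)b;
}

int doubleEquals (poly a, poly b) {
  return *(double *)a == *(double *)b;
}
```

Passing the comparators in the wrong order, for example equals (t1, t2, doubleEquals, intEquals), still compiles: every comparator has the same type tf, so the compiler cannot tell them apart.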
Moral
void * serves as the polymorphic type in C:
- it masks all pointer types (think of the Object type in Java)
Pros:
- code reuse: write once, use in arbitrary contexts
- we'll see more examples later in this course
Cons:
- polymorphism doesn't come for free
- boxed data: data must be heap-allocated (to cope with void *)
- no static or runtime checking (at least in C)
- clumsy code: extra function pointer arguments
Data Carrying Functions
Why can't we make use of data of type “void *”, e.g., when it is passed as a function argument? Because at that point its real type, and hence its operations, are unknown.
Better idea:
- make the data carry their functions themselves, instead of making external function calls
- such data are called objects
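Function Pointer in Data
int equals (tuple t1, tuple t2) {
  // note that if t1->x and t1->y carried their own
  // equality testing functions, then the code
  // could just be written as follows (conceptually:
  // in C, t1->x is a void * and must first be cast
  // to a type that has an "equals" field):
  return (t1->x->equals (t1->x, t2->x)
          && t1->y->equals (t1->y, t2->y));
}
[Figure: t1's fields x and y point to objects that carry their own equality functions, equals_x and equals_y.]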
Function Pointer in Data
// To cope with this, we should modify other
// modules. For instance, the "complex" ADT:
struct complexStruct {
  int (*equals) (poly, poly);
  double a[2];
};

complex newComplex (double x, double y) {
  complex c = (complex)malloc (sizeof (*c));
  c->equals = complexEquals;
  …;
  return c;
}
[Figure: c points to a struct holding the equals pointer followed by x and y.]
The Call
int equals (tuple t1, tuple t2) {
  return (t1->x->equals (t1->x, t2->x)
          && t1->y->equals (t1->y, t2->y));
}
[Figure: tuples t1 and t2, whose fields x and y point to complex objects, each holding an equals pointer and a[0], a[1].]
Client Code
// in file "main.c"
#include "complex.h"
#include "tuple.h"

int main () {
  complex c1 = newComplex (1.0, 2.0);
  complex c2 = newComplex (1.0, 2.0);
  tuple t1 = newTuple (c1, c2);
  tuple t2 = newTuple (c1, c2);
  equals (t1, t2);  // dead simple! :-P
  return 0;
}
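As written, t1->x->equals does not compile in C, because t1->x is a void *. One common way to make the idea concrete (our sketch, not necessarily the Lab #2 design) is a convention: every object struct stores its equality function as its first member. C guarantees that a pointer to a struct, suitably converted, points to its initial member, so the hypothetical helper equalsOf can read that member through the poly.

```c
#include <stdio.h>
#include <stdlib.h>

typedef void *poly;
typedef int (*eqfn)(poly, poly);

/* Convention: every object struct starts with its equality function. */
static eqfn equalsOf (poly obj) {
  /* A pointer to a struct, converted, points to its first member. */
  return *(eqfn *)obj;
}

/* --- complex objects (layout following the slides) --- */
typedef struct complexStruct *complex;
struct complexStruct {
  eqfn equals;   /* must be the first member */
  double a[2];
};

static int complexEquals (poly p, poly q) {
  complex c1 = p, c2 = q;
  return c1->a[0] == c2->a[0] && c1->a[1] == c2->a[1];
}

complex newComplex (double x, double y) {
  complex c = malloc (sizeof (*c));
  if (c == NULL) {
    fprintf (stderr, "newComplex: out of memory\n");
    exit (EXIT_FAILURE);
  }
  c->equals = complexEquals;
  c->a[0] = x;
  c->a[1] = y;
  return c;
}

/* --- tuple: equals no longer needs function arguments --- */
typedef struct tupleStruct *tuple;
struct tupleStruct {
  poly x;
  poly y;
};

tuple newTuple (poly x, poly y) {
  tuple t = malloc (sizeof (*t));
  if (t == NULL) {
    fprintf (stderr, "newTuple: out of memory\n");
    exit (EXIT_FAILURE);
  }
  t->x = x;
  t->y = y;
  return t;
}

int equals (tuple t1, tuple t2) {
  /* Each component carries its own equality test. */
  return equalsOf (t1->x) (t1->x, t2->x)
      && equalsOf (t1->y) (t1->y, t2->y);
}
```

This works only if every type stored in a tuple follows the first-member convention; a plain int * does not, and passing one would be undefined behavior.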
Object
Data elements with function pointers are the simplest form of objects:
- object = virtual functions + private data
With such facilities, we can in principle model object-oriented programming.
- In fact, early C++ compilers compiled to C.
- That's partly why I don't love object-oriented languages.
See Lab #2 for a more production-quality implementation of objects.
Summary
Abstract data types enable modular programming:
- clear separation between interface and implementation
- interface and implementation should be designed and evolved together
Polymorphism enables code reuse.
Object = data + function pointers