91.102 - Computing II: Modularity, Information Hiding and Abstract Data Types.

Difficulty: Programs that solve “Real World” problems can get very large, up to millions of lines of code. Nobody can understand or remember that many lines of code.
Problem: How do we make it possible for normal human beings to contribute to such large endeavors?
Note: I did not say “How do we remove the difficulty”, because we do not KNOW how to really remove it. All we have are some techniques that help us get close enough to have, much of the time, usable programs within feasible budgets and time requirements.

There are two major ideas:
1) Break the task up into well-defined pieces;
2) Hide the implementation details of the pieces whenever possible.
What do they imply?

The first idea requires that we find how to “glue together” the pieces, after we have decided on what the pieces are: the finished “small products” must be put together in such a way that the “big product” can be delivered. The second idea allows us to recover from occasional bad implementation decisions without having to start all over again. A side benefit is that good implementations might be reusable by other projects, cutting their costs down.

What is the mechanism we provide to implement the policies just described? The MODULE. This consists, physically, of two files: the Interface File and the Implementation File. The Interface File provides the PUBLIC information, i.e. the information a USER needs to make use of the functionality. The Implementation File contains the PRIVATE information, i.e. the implementation of the functionality advertised in the Interface File.

[Figure: a module is made of an Implementation (private information, the code file) and an Interface (public information, the header file). The user program, or another module, sees only the Interface.]

What IS a Module? A set of declarations placed into service inside a program. (A quote from the text…) In somewhat more detail, a module is a unit of organization of a software system that
A) packages together a collection of entities (data and operations) that provide a set of capabilities useful in the solution of a class of problems;
B) carefully controls what external users of the module can see and use.

A common characteristic of Modules through all the languages that support this mechanism is Separate Compilation. This simply means that collections of functions and data can be compiled independently of one another and of any program that may use them. Any change to the user program, or to any of the collections, requires recompiling only a minimal number of modules. This may not be important with programs of a few hundred or a few thousand lines, but is crucial to the development of software that takes hundreds of thousands or millions of lines to deliver.

How we create a Module:
ModuleInterface.h: a file that contains all the entities that must be visible to the user of the module: any constants, type definitions, variable definitions, and functions (i.e., function prototypes) that the user’s program is allowed to have explicit access to, to use or modify, depending on the entity.
ModuleImplementation.c: a file that contains all the private entities: the implementation code for the functions, and all those constants, variables and functions which the module user may NOT have direct access to.

[Figure: both the implementation (private information, the code file) and the user program contain the line #include "ModuleInterface.h"; the file ModuleInterface.h sits between them.]

How we use a Module: The USER PROGRAM must request the interface file via an include directive. Example:
    #include <stdio.h>              /* include system file */
    /* .... Other system inclusions .... */
    #include "ModuleInterface.h"    /* include non-system module */
    /* .... Other modules .... */
    /* .... User Program .... */
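The split between interface and implementation can be sketched in a few lines. This is only an illustration: the counter module and its names (CounterReset, CounterIncrement, CounterValue, CountTo) are hypothetical and do not come from the course text. The three parts are shown in one listing so that it compiles as a single unit; in practice they would live in three separate files.

```c
#include <assert.h>

/* ---- CounterInterface.h (hypothetical): all that the user may see ---- */
void CounterReset(void);
void CounterIncrement(void);
int  CounterValue(void);

/* ---- CounterImplementation.c (hypothetical): the private details ---- */
/* 'static' keeps this variable invisible outside the implementation file. */
static int count = 0;

void CounterReset(void)     { count = 0; }
void CounterIncrement(void) { ++count; }
int  CounterValue(void)     { return count; }

/* ---- User code: it uses only the names listed in the interface ---- */
int CountTo(int n)
{
    assert(n >= 0);
    CounterReset();
    for (int i = 0; i < n; ++i)
        CounterIncrement();
    return CounterValue();
}
```

The user code never touches the variable count directly, so the implementation could change how it stores the value without the user program having to change.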

An Example: Priority Queues. A priority queue is a finite collection of items of the same type, each associated with a number called “the priority of the item”, for which the following operations are defined:
1) Initialization - returns an empty PQ.
2) Check for Empty - is the PQ empty or not?
3) Check for Full - is there any more room in the PQ?
4) Insert a new item into an existing (not Full) PQ.
5) If the PQ is not Empty, remove from it an item X of highest priority.

Notice that we have said NOTHING YET about implementation: the Priority Queue is an ABSTRACT DATA TYPE. The implementor will have to decide on implementation details. A good “user interface” (Module Interface) - and a good design - will make the implementation details invisible to the user.

What are Priority Queues used for? Every time you run a task on a computer, the task ends up on some kind of priority queue. The Operating System manages a number of such queues, to provide all kinds of services: printing your output, scheduling your task to be run, saving your files to disk, etc. Any time you attempt to use a “shared resource”, you end up on some kind of PQ, waiting for your turn… So they are important and they are very common.

Remember the list functions Head, Tail and Cons? They allowed us to construct a list starting with an empty list and items from some universe. The construction added one item at a time. We could find the item at the head of the list by just applying the function Head to the list; we could find the remainder of the list by applying the function Tail. In particular:
    Tail(Cons(&info, L)) = L
    Head(Cons(&info, L)) = &info
and Cons(Head(L), Tail(L)) returns a list with the same contents as L.

We could, by analogy, introduce functions PQCons, PQHead and PQTail. What should they do? Let PQ be a variable pointing to the Priority Queue:
    PQHead(PQ) = &highestPriorityItemInPQ
    PQTail(PQ) = &aPriorityQueue, which contains all the items in PQ EXCEPT FOR the highest priority one.
    PQCons(&info, PQ) = PQ', (the address of) a new priority queue, containing all the old elements plus the new one.

The main thing to observe is that the order of insertion is unrelated to the order of extraction. The order of extraction depends on a property of the information field which we call the PRIORITY.

You may observe that a LIST - as defined by Head, Tail and Cons - is just a priority queue in which the latest item inserted has priority higher than that of any item already in the priority queue... There are a number of reasons why this is NOT the most convenient way to look at Priority Queues - we will now turn to a slightly more conventional approach. We are also interested in constructing MODULES rather than just showing how to write a few functions...

An immediate problem is: who decides the type of the objects that will make up the priority queue? It should be obvious from the previous discussions that NO module designer can cover ALL the possible types of objects that could be put into a priority queue - i.e., for which the notion of priority could make sense. This decision must be made by the application programmer who is the module user: only she can know what she is trying to prioritize… PQInterface.h needs to contain an “include directive” to provide the necessary type definitions: this is true for C - it need not be true for other languages.

    // PQInterface.h
    #include <stdbool.h>        // for bool (C99)
    #include "PQUserTypes.h"    // defines PQItem
    // See the next two slides for the choices that go here...
    extern void   PQInitialize(PriorityQueue *);      // init empty
    extern bool   PQEmpty(PriorityQueue *);           // check if empty
    extern bool   PQFull(PriorityQueue *);            // check if full
    extern int    PQSize(PriorityQueue *);            // how many items
    extern void   PQInsert(PQItem, PriorityQueue *);  // insert
    extern PQItem PQRemove(PriorityQueue *);          // remove highest
This tells the user what the “user interface” is: the functions, the types of objects expected as function parameters, and the types of objects returned by the functions.

PQInterface.h - Linked List Implementation.
    typedef struct PQNodeTag {
        PQItem            NodeItem;
        struct PQNodeTag *Link;
    } PQListNode;
    typedef struct {
        int         Count;
        PQListNode *ItemList;
    } PriorityQueue;

PQInterface.h - Array Implementation.
    typedef PQItem PQArray[MAXCOUNT];
    typedef struct {
        int     Count;
        PQArray ItemArray;
    } PriorityQueue;

PQUserTypes.h - User decision on the type of objects in the Priority Queue and the maximum size of queue expected.
    #define MAXCOUNT 10    // Just 10?
    typedef int PQItem;    // User defined
    /* this is where we stop the explicit user knowledge */
Although YOU don’t need to know any more, your program - actually the compiler compiling your program - does. This is why more detail about the representation is available in the other header file (interface part of the module).

Example of use: sorting an array.
    // Define the types
    typedef int PQItem;
    typedef PQItem SortingArray[10];
    // Declare the array
    SortingArray A;
    // Define the sorting function
    void PriorityQueueSort(SortingArray A)
    {
        int i;
        PriorityQueue PQ;
        PQInitialize(&PQ);
        for (i = 0; i < 10; ++i) PQInsert(A[i], &PQ);
        for (i = 9; i >= 0; --i) A[i] = PQRemove(&PQ);
    }
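To see the sorting example run end to end, here is a minimal sketch of one possible implementation behind the interface, using the unsorted array representation. It assumes that the priority of an int item is simply its value (larger means higher priority); the slides do not state this. The function bodies are one illustrative choice, not the textbook's code.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXCOUNT 10                 /* from PQUserTypes.h */
typedef int PQItem;                 /* from PQUserTypes.h */

typedef PQItem PQArray[MAXCOUNT];   /* array representation */
typedef struct {
    int     Count;
    PQArray ItemArray;
} PriorityQueue;

void PQInitialize(PriorityQueue *PQ) { PQ->Count = 0; }
bool PQEmpty(PriorityQueue *PQ)      { return PQ->Count == 0; }
bool PQFull(PriorityQueue *PQ)       { return PQ->Count == MAXCOUNT; }
int  PQSize(PriorityQueue *PQ)       { return PQ->Count; }

/* Unsorted array: insertion just appends at the end. */
void PQInsert(PQItem Item, PriorityQueue *PQ)
{
    assert(!PQFull(PQ));
    PQ->ItemArray[PQ->Count++] = Item;
}

/* Removal scans for the largest item, then moves the last item into the hole. */
PQItem PQRemove(PriorityQueue *PQ)
{
    assert(!PQEmpty(PQ));
    int best = 0;
    for (int i = 1; i < PQ->Count; ++i)
        if (PQ->ItemArray[i] > PQ->ItemArray[best])
            best = i;
    PQItem result = PQ->ItemArray[best];
    PQ->ItemArray[best] = PQ->ItemArray[--PQ->Count];
    return result;
}

typedef PQItem SortingArray[MAXCOUNT];

/* Insert everything, then fill the array from the back with the largest first. */
void PriorityQueueSort(SortingArray A)
{
    PriorityQueue PQ;
    PQInitialize(&PQ);
    for (int i = 0; i < MAXCOUNT; ++i) PQInsert(A[i], &PQ);
    for (int i = MAXCOUNT - 1; i >= 0; --i) A[i] = PQRemove(&PQ);
}
```

Because the largest item is removed first and stored at the highest index, the array ends up in ascending order.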

Problem: how can we perform comparisons if we don’t know WHAT kind of items we need to compare and HOW we can compare them? The Priority Queue, AS GIVEN, does not require the definition of a user provided comparison function, and does not accept such a function as a parameter to any of the Initialize, Insert or Remove functions. It appears to require such a comparison as a “built in”, which would seriously limit the generality of the Module. We leave this with the statement that it can be done within C - we won’t pursue this at this point, because it would further complicate our discussion.
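One common way to do this in C is to pass a pointer to a comparison function. The sketch below is an assumption about how that could look, not the approach of the text: it changes the signature of PQInitialize to accept the function, and the names PQCompare and LongerFirst are made up for the illustration.

```c
#include <assert.h>
#include <string.h>

#define MAXCOUNT 10
typedef const char *PQItem;                     /* user's choice: strings */

/* Hypothetical: returns > 0 when a has higher priority than b. */
typedef int (*PQCompare)(PQItem a, PQItem b);

typedef struct {
    int       Count;
    PQItem    ItemArray[MAXCOUNT];
    PQCompare Compare;                          /* supplied by the user */
} PriorityQueue;

void PQInitialize(PriorityQueue *PQ, PQCompare Compare)
{
    PQ->Count = 0;
    PQ->Compare = Compare;
}

void PQInsert(PQItem Item, PriorityQueue *PQ)
{
    assert(PQ->Count < MAXCOUNT);
    PQ->ItemArray[PQ->Count++] = Item;
}

/* The module never looks inside an item; it only calls the user's function. */
PQItem PQRemove(PriorityQueue *PQ)
{
    assert(PQ->Count > 0);
    int best = 0;
    for (int i = 1; i < PQ->Count; ++i)
        if (PQ->Compare(PQ->ItemArray[i], PQ->ItemArray[best]) > 0)
            best = i;
    PQItem result = PQ->ItemArray[best];
    PQ->ItemArray[best] = PQ->ItemArray[--PQ->Count];
    return result;
}

/* Example user comparison: longer strings have higher priority. */
int LongerFirst(PQItem a, PQItem b)
{
    return (int)strlen(a) - (int)strlen(b);
}
```

The module code stays the same for any item type; only the typedef for PQItem and the comparison function change.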

What are the trade-offs?
1) The linked-list implementation uses only the space it needs, while the array one must allocate all of its space at the beginning.
2) The Sorted linked list implementation is less efficient than the Unsorted array one at inserting an item.
3) The Sorted - or Unsorted - array implementation is less efficient than the Sorted linked list one at removing an item.
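The insertion and removal costs of the sorted linked list can be seen in a short sketch. It assumes int items whose value is their priority and keeps the list in decreasing order; the function bodies are illustrative only.

```c
#include <assert.h>
#include <stdlib.h>

typedef int PQItem;

typedef struct PQNodeTag {
    PQItem            NodeItem;
    struct PQNodeTag *Link;
} PQListNode;

typedef struct {
    int         Count;
    PQListNode *ItemList;
} PriorityQueue;

void PQInitialize(PriorityQueue *PQ)
{
    PQ->Count = 0;
    PQ->ItemList = NULL;
}

/* Insertion walks the list to find the right place: up to Count steps. */
void PQInsert(PQItem Item, PriorityQueue *PQ)
{
    PQListNode *N = malloc(sizeof *N);
    assert(N != NULL);
    N->NodeItem = Item;
    PQListNode **P = &PQ->ItemList;
    while (*P != NULL && (*P)->NodeItem >= Item)
        P = &(*P)->Link;
    N->Link = *P;
    *P = N;
    PQ->Count++;
}

/* Removal is a single step: the highest priority item is always first. */
PQItem PQRemove(PriorityQueue *PQ)
{
    assert(PQ->ItemList != NULL);
    PQListNode *N = PQ->ItemList;
    PQItem result = N->NodeItem;
    PQ->ItemList = N->Link;
    free(N);
    PQ->Count--;
    return result;
}
```

Compare this with an unsorted array, where insertion is a single store and removal needs a scan of all the items.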

The idea of modularization can be extended to what is called Work Breakdown Structure: the division of a software project into subprojects, tasks, subtasks, deliverables, etc. The example given by the text is that of the design and implementation of a simple calculator.

In this case, the decisions are “fairly simple”: there is a reasonably clear “user interface module” and a reasonably clear “computation module”. The functions of the two are easily separable. The interface BETWEEN the two can consist of strings of characters: the user interface sends the string containing an expression to the compute engine, which determines the legality of the expression, translates from character form to one suitable for arithmetic, performs the arithmetic, translates the result into character form and returns the result string, to be displayed.

    /* CalculatorModuleInterface.h */
    char *Expression, *Value;
    extern void InitializeAndDisplayCalculator(void);
    extern void GetAndProcessOneEvent(void);
    extern int  UserSubmittedAnExpression(void);
    extern void Display(char *);
    extern int  UserWantsToQuit(void);
    extern void ShutDown(void);

    /* YourCalculatorModuleInterface.h */
    extern char *Evaluate(char *);

    #include <stdio.h>
    #include "CalculatorModuleInterface.h"
    #include "YourCalculatorModuleInterface.h"

    int main(void)
    {
        InitializeAndDisplayCalculator();
        do {
            GetAndProcessOneEvent();
            if (UserSubmittedAnExpression()) {
                Value = Evaluate(Expression);
                Display(Value);
            }
        } while (!UserWantsToQuit());
        ShutDown();
    }
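To make the string-in, string-out interface concrete, here is a deliberately tiny sketch of an Evaluate function. It is not the textbook's compute engine: it assumes the expression has the form "number operator number", and it returns the text "error" for anything else. The static result buffer and the error text are choices made for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns a pointer to a static buffer holding the result as text. */
char *Evaluate(char *Expression)
{
    static char Result[64];
    double left, right, value;
    char op, extra;

    /* Legality check: exactly two numbers around one operator, nothing after. */
    if (sscanf(Expression, " %lf %c %lf %c", &left, &op, &right, &extra) != 3) {
        strcpy(Result, "error");
        return Result;
    }

    /* Character form has been translated to doubles; do the arithmetic. */
    switch (op) {
    case '+': value = left + right; break;
    case '-': value = left - right; break;
    case '*': value = left * right; break;
    case '/':
        if (right == 0) {
            strcpy(Result, "error");
            return Result;
        }
        value = left / right;
        break;
    default:
        strcpy(Result, "error");
        return Result;
    }

    /* Translate the result back into character form for display. */
    snprintf(Result, sizeof Result, "%g", value);
    return Result;
}
```

The user interface module needs to know nothing about how the string is parsed; it only passes a string in and displays the string that comes back.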

More Ideas about Information Hiding and Modularization. We cannot "really" implement "Abstract Data Types", since implementing them requires that we make representational decisions based on multiple considerations. A reasonable question is: How close can we get to a representation independent notation, so that our "approximation" to an abstract data type is as good as we can manage?

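Example: Three Implementations of LINKED LISTS.
First Implementation: based on C pointers as links.
[Figure: L points to a chain of nodes, each with an Item field and a Link field.]
Second Implementation: array of Node (Info and Link) structs.
[Figure: an array indexed 0 to 9 whose elements have .Item and .Link fields; it holds the items x1, x2, x3, x4, and the list starts at L = 0.]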

Third Implementation: array of Info AND array of Link.
[Figure: two parallel arrays, Item and Link, both indexed 0 to 9; they hold the items x1, x2, x3, x4, and the list starts at L = 0.]
In the second and third implementation, the Link is just an integer used to index into the array.

Some of these implementations involve pointer variables, some involve structs, some involve integers. How can we design an interface that will work THE SAME WAY regardless of the underlying implementation?
1) It can’t explicitly deal with the underlying pointer variables;
2) It can’t explicitly deal with the underlying structs;
3) It can’t explicitly deal with the underlying arrays.

4) We assume the Item field can be managed as a struct (early FORTRAN had no structs, so the multiple fields of a struct would have required one full array each);
5) We have to introduce a NULL that can be made meaningful in all three representations: call it null (lower case - no conflict with the built-in) and be prepared to initialize it to the correct value for each implementation.

The textbook provides a set of functions, all ready for us. Question: how did the author get those functions? Miraculous inspiration? The author is SO experienced that he was able to figure them out by just having the problem presented? Neither alternative: pick a function that you think is fairly representative of the kind of operation you will need to support, code it in all three representations and see what THAT tells you. If that’s not enough, pick a function that will require manipulation of most of the representation features you missed on the first pass and try again. If you need several successive tries, so be it...

A Reverse for Normal Linked Lists. Use a function where the new list is returned in the single parameter passed by reference, since this is the accepted way to manage two-way communication via the parameter list. L is a NodePointer which could be either a true pointer or an integer used as an index into the array.

    void Reverse(NodePointer *L)
    /* L is the address of a pointer to a Node */
    {
        NodePointer R, N;
        R = null;               // NULL in this case
        while (*L != null) {    // there is something
            N = *L;             // save it
            *L = (*L)->Link;    // find the next one
            N->Link = R;        // re-link the saved one
            R = N;              // new head of part-reversed list
        }
        *L = R;                 // new head of reversed list
    }
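For readers who want to run the pointer version, the sketch below adds the declarations it needs. The item type (int), the definition of null as NULL, and the helper ReverseDemo are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef int ListItem;                 /* assumed item type */

typedef struct NodeTag {
    ListItem        Item;
    struct NodeTag *Link;
} Node;

typedef Node *NodePointer;
#define null NULL                     /* lower-case null for this representation */

void Reverse(NodePointer *L)
{
    NodePointer R = null, N;
    while (*L != null) {
        N = *L;                       /* save the current head */
        *L = (*L)->Link;              /* advance to the next node */
        N->Link = R;                  /* re-link the saved node */
        R = N;                        /* new head of the part-reversed list */
    }
    *L = R;                           /* new head of the reversed list */
}

/* Hypothetical helper: builds 1 -> 2 -> 3, reverses it, and packs the
   items of the reversed list into one number, read from head to tail. */
int ReverseDemo(void)
{
    Node c = {3, null}, b = {2, &c}, a = {1, &b};
    NodePointer L = &a;
    Reverse(&L);
    int digits = 0;
    for (NodePointer p = L; p != null; p = p->Link)
        digits = digits * 10 + p->Item;
    return digits;
}
```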

A Reverse for Linked Lists as arrays of struct. The NodePointer is just an integer: the array index of the next structure. We need something like:
    Node ListMemory[MAXPOINTER];    // for the array of nodes

    void Reverse(NodePointer *L)
    /* L is the address of a pointer to a Node */
    {
        NodePointer R, N;
        R = null;                       // -1 in this case
        while (*L != null) {            // there is something
            N = *L;                     // save it
            *L = ListMemory[*L].Link;   // next one
            ListMemory[N].Link = R;     // re-link saved one
            R = N;                      // new head of part-reversed list
        }
        *L = R;                         // new head of reversed list
    }

A Reverse for Linked Lists as double arrays. The NodePointer is just an integer. We also need something like:
    ListItem    Item[MAXPOINTER];    // for the array of items
    NodePointer Link[MAXPOINTER];    // for the array of links

    void Reverse(NodePointer *L)
    /* L is the address of a pointer to a Node */
    {
        NodePointer R, N;
        R = null;               // -1 in this case
        while (*L != null) {    // there is something
            N = *L;             // save it
            *L = Link[*L];      // find the next one
            Link[N] = R;        // re-link the saved one
            R = N;              // new head of part-reversed list
        }
        *L = R;                 // new head of reversed list
    }

Reverse differs in only two lines from definition to definition:
Normal Linked Lists:
    *L = (*L)->Link;            // get the next one
    N->Link = R;                // re-link the saved one
Array of Nodes:
    *L = ListMemory[*L].Link;   // get the next one
    ListMemory[N].Link = R;     // re-link the saved one
Double Array:
    *L = Link[*L];              // get the next one
    Link[N] = R;                // re-link the saved one

They differ in the syntax for GETTING the value of the link and for SETTING the value of a link. Those two operations are candidates for “hiding”: introduce intermediate functions that hide the details.
Getting the Link:
    NodePointer GetLink(NodePointer N)
    { return (N->Link); }               // normal linked lists
    NodePointer GetLink(NodePointer N)
    { return (ListMemory[N].Link); }    // arrays of struct
    NodePointer GetLink(NodePointer N)
    { return (Link[N]); }               // double arrays

Setting the Link:
    void SetLink(NodePointer N, NodePointer L)
    { N->Link = L; }               /* normal linked lists */
    void SetLink(NodePointer N, NodePointer L)
    { ListMemory[N].Link = L; }    /* arrays of struct */
    void SetLink(NodePointer N, NodePointer L)
    { Link[N] = L; }               /* double arrays */
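With GetLink and SetLink in place, Reverse can be written once, with no mention of the representation. The sketch below pairs that single Reverse with the array-of-struct versions of the two functions; the item type, the value -1 for null, and the helper ReverseDemo are assumptions made for the illustration.

```c
#include <assert.h>

#define MAXPOINTER 10
#define null (-1)                     /* null for the array representations */

typedef int ListItem;                 /* assumed item type */
typedef int NodePointer;              /* an index into ListMemory */

typedef struct {
    ListItem    Item;
    NodePointer Link;
} Node;

static Node ListMemory[MAXPOINTER];

/* Only these two functions know how a link is stored. */
NodePointer GetLink(NodePointer N)         { return ListMemory[N].Link; }
void SetLink(NodePointer N, NodePointer L) { ListMemory[N].Link = L; }

/* The body of Reverse is now the same for all three representations. */
void Reverse(NodePointer *L)
{
    NodePointer R = null, N;
    while (*L != null) {
        N = *L;                       /* save the current head */
        *L = GetLink(*L);             /* advance to the next node */
        SetLink(N, R);                /* re-link the saved node */
        R = N;                        /* new head of the part-reversed list */
    }
    *L = R;                           /* new head of the reversed list */
}

/* Hypothetical helper: stores 1 -> 2 -> 3 in cells 0, 1, 2, reverses the
   list, and packs the items of the result into one number. */
int ReverseDemo(void)
{
    ListMemory[0] = (Node){1, 1};
    ListMemory[1] = (Node){2, 2};
    ListMemory[2] = (Node){3, null};
    NodePointer L = 0;
    Reverse(&L);
    int digits = 0;
    for (NodePointer p = L; p != null; p = GetLink(p))
        digits = digits * 10 + ListMemory[p].Item;
    return digits;
}
```

Switching to pointers or to parallel arrays would mean replacing the typedefs and the two one-line functions, while Reverse stays untouched.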

Another area of potential problems is in the allocation and deallocation of nodes. In the Normal Linked List version we could define (in preparation for the fact that ALL implementations will need the same function calls):
    void AllocateNewNode(NodePointer *N)
    { *N = (NodePointer)malloc(sizeof(Node)); }
    void FreeNode(NodePointer N)
    { free(N); }    /* no safety - YOU set to null */
where the OS takes care of managing space...

When lists are implemented via arrays, we must keep track of which array elements are in use and which are free.
    NodePointer Avail;    // points to the head of the FREE LIST

    void AllocateNewNode(NodePointer *N)
    {
        *N = Avail;       /* for arrays of struct */
        Avail = ListMemory[Avail].Link;
    }

    void AllocateNewNode(NodePointer *N)
    {
        *N = Avail;       /* for double arrays */
        Avail = Link[Avail];
    }

Unfortunately, this requires an initialization:
    Avail = 0;
    for (I = 0; I < MAXPOINTER - 1; I++)
        ListMemory[I].Link = I + 1;
    ListMemory[MAXPOINTER - 1].Link = null;
Or:
    Avail = 0;
    for (I = 0; I < MAXPOINTER - 1; I++)
        Link[I] = I + 1;
    Link[MAXPOINTER - 1] = null;

After the initialization, with MAXPOINTER = 10, the links in the array of structs are:
    index:  0  1  2  3  4  5  6  7  8  9
    .Link:  1  2  3  4  5  6  7  8  9  null
    Avail = 0;  /* and all the nodes are empty */
The parallel-array version holds the same values in Link[0] to Link[9]. Calls to AllocateNewNode will simply return the first free node.

Now to FREE nodes: attach the node to the current head of the FREE list and update Avail.
    void FreeNode(NodePointer N)
    {
        ListMemory[N].Link = Avail;
        Avail = N;
    }
Or:
    void FreeNode(NodePointer N)
    {
        Link[N] = Avail;
        Avail = N;
    }
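The free-list pieces fit together as in the sketch below, written for the array-of-struct representation. Wrapping the initialization in a function called InitializeFreeList, and the check for an exhausted free list in AllocateNewNode, are additions made for this illustration; the slides show neither.

```c
#include <assert.h>

#define MAXPOINTER 10
#define null (-1)                     /* null for the array representations */

typedef int ListItem;                 /* assumed item type */
typedef int NodePointer;

typedef struct {
    ListItem    Item;
    NodePointer Link;
} Node;

static Node        ListMemory[MAXPOINTER];
static NodePointer Avail;             /* head of the free list */

/* Chain every cell into the free list: 0 -> 1 -> ... -> 9 -> null. */
void InitializeFreeList(void)
{
    Avail = 0;
    for (int i = 0; i < MAXPOINTER - 1; i++)
        ListMemory[i].Link = i + 1;
    ListMemory[MAXPOINTER - 1].Link = null;
}

/* Hand out the first free cell; *N is null when no cell is left. */
void AllocateNewNode(NodePointer *N)
{
    *N = Avail;
    if (Avail != null)                /* added guard: do not index with null */
        Avail = ListMemory[Avail].Link;
}

/* Put the cell back at the head of the free list. */
void FreeNode(NodePointer N)
{
    ListMemory[N].Link = Avail;
    Avail = N;
}
```

A freed cell is the next one to be handed out, because it goes back on the front of the free list.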

We are, essentially, done: a bit more cleanup, a few more functions, and we have a successful interface. The header files must contain some of the information about all this, but all the user needs to do is include the correct header files and the program will run correctly, regardless of the underlying list implementation. The user must provide definitions for:
ItemType (only the user knows what will be put into the lists);
MAXPOINTER (in the case of the array implementations - again, only the user will have any idea of how big the lists can become).

Header File for Normal Lists:
    typedef ItemType ListItem;
    typedef struct NodeTag {
        ListItem        Item;
        struct NodeTag *Link;
    } Node;
    typedef Node *NodePointer;

Header File for Parallel Arrays:
    typedef int      NodePointer;
    typedef ItemType ListItem;
    NodePointer Avail;
    ListItem    Item[MAXPOINTER];
    NodePointer Link[MAXPOINTER];

Header File for Array of node struct:
    typedef int      NodePointer;
    typedef ItemType ListItem;
    typedef struct {
        ListItem    Item;
        NodePointer Link;
    } Node;
    NodePointer Avail;
    Node        ListMemory[MAXPOINTER];

We now have (let's imagine we have completed all the work) three distinct implementations of lists. How can we use them? First of all, we need to set up the header file where ItemType is defined - without it, there is not much we can do. We also need to define functions that are specific to the exact ItemType we need: reading functions, printing functions, assignments, etc… Those will be made available as an Implementation File.

Header File for items of type int: ItemInterface.h
    typedef int ItemType;
    extern void PrintItem(ItemType *);
    extern void AssignItem(ItemType *, ItemType);
Implementation File for items of type int: ItemImplementation.c
    #include <stdio.h>
    #include "ItemInterface.h"
    void PrintItem(ItemType *i)
    { printf("%d", *i); }
    void AssignItem(ItemType *Left, ItemType Right)
    { *Left = Right; }

Header File for items of type AirportCode: ItemInterface.h
    typedef char ItemType[4];
    extern void PrintItem(ItemType *);
    extern void AssignItem(ItemType *, ItemType);
Implementation File for AirportCodes: ItemImplementation.c
    #include <stdio.h>
    #include "ItemInterface.h"
    void PrintItem(ItemType *i)
    { printf("%s", *i); }
    void AssignItem(ItemType *Left, ItemType Right)
    { *Left = Right; }    // Careful - what does this do??!!
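One answer to the question in that last comment: in C an array cannot be the target of an assignment, so with ItemType defined as char[4] the statement *Left = Right is rejected by the compiler. The characters have to be copied one by one. The sketch below shows one possible AssignItem for airport codes; the use of strncpy and the forced terminator are choices made for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef char ItemType[4];             /* three letters plus the final '\0' */

void PrintItem(ItemType *i)
{
    printf("%s", *i);
}

/* Right arrives as a pointer to the first character (arrays decay when
   passed), so we copy the characters into the array that Left points to. */
void AssignItem(ItemType *Left, ItemType Right)
{
    strncpy(*Left, Right, sizeof *Left - 1);
    (*Left)[sizeof *Left - 1] = '\0';     /* always terminate the string */
}
```

The user program still calls AssignItem in exactly the same way as for int items; only the implementation file differs, which is the point of hiding the operation behind a function.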

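[Figure: the Main Program sits on top of one of three interchangeable list implementations (Parallel Arrays, Arrays of Structs, Linked Lists), which in turn rest on one of two item modules: Types and Specialized Functions for int, or Types and Specialized Functions for AirportCodes.]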

Abstraction:
Procedural Abstraction - or how to replace a (long) sequence of operations (the HOW) by a NAME and an interface. This is embodied in the idea of FUNCTION with a well defined parameter list and a well defined return type.
Data Abstraction - or how to separate the details of the WHAT from the details of the HOW while hiding both. This is embodied in the idea of Abstract Data Type where details of representation AND manipulation are hidden from the user.

The combination of these two ideas - extended to the best of our understanding - has provided many of the tools that allow us to manage the design and construction of today's large programs. They allow us to replace "spaghetti bowl" programs, where almost every item depends on some other item, with clean, hierarchically structured programs where the dependencies are minimized and the interfaces among modules are well specified. The latter claim is only approximated in reality: it is the ATTEMPT at approximating it, and the considerable efforts expended towards the approximation, that make large programs (e.g.: 11.5 M lines of Windows 95) possible at all.
