Data Abstraction Gang Qian Department of Computer Science University of Central Oklahoma.

Objectives Specification of Data Abstractions Implementation Issues Abstraction Function and Rep Invariant Designing Issues

Motivations of Data Abstraction Allows us to extend the programming language with new data types Allows us to focus on the behaviors of data objects rather than the implementation of data objects Incorporates abstractions both by parameterization and by specification  Abstraction by parameterization is achieved the same way as procedures  Abstraction by specification is achieved by making the operations part of the new data type

Implementation of a data type is mainly to select a storage representation for the data/objects  Without data abstraction, all programs that use the data type should be implemented based on the storage representation Not easy to modify  If we combine data types and operations, users only need to use the operations, without knowing the storage representation

Specification for Data Abstraction The focus of the specification is to explain the operations of a data type Our specification is based on classes in Java, but the same idea can be used if a language employs a different mechanism Each class defines a type name and the following:  Constructors  Instance methods (or methods) As opposed to static methods or procedures

Data Abstraction Specification Template
/** OVERVIEW: A brief description of the behavior of the type's objects goes here.
    Mutability. Bounded or not for collection types */
visibility class dname {
    /** specs for constructors */
    /** specs for methods */
}

Notes:  dname is the class name  The visibility of most classes is public  The overview part of the specification describes the data abstraction in terms of “well-understood” concepts  All constructors and methods that appear in the specification should be public  Since constructors and methods are just special procedures, they use the same notation as stand-alone procedures REQUIRES, MODIFIES and EFFECTS Still need to be very careful about exceptions Usually no static methods May use this to reference the current object in the specification

Example: IntSet
/** OVERVIEW: IntSets are mutable, unbounded sets of integers.
    A typical IntSet is {x1,...,xn}. */
public class IntSet {
    /** EFFECTS: Constructor. Initializes this to be empty. */
    public IntSet ()
    /** MODIFIES: this
        EFFECTS: Adds x to the elements of this, i.e., this_post = this + { x }. */
    public void insert (int x)

    /** MODIFIES: this
        EFFECTS: Removes x from this, i.e., this_post = this - { x }. */
    public void remove (int x)
    /** EFFECTS: If x is in this returns true else returns false. */
    public boolean isIn (int x)
    /** EFFECTS: Returns the cardinality of this. */
    public int size ()
    /** EFFECTS: If this is empty, throws EmptyException;
        else returns an arbitrary element of this. */
    public int choose () throws EmptyException
}

Note:  The object is referred to as this in the specification  Since a constructor always modifies this, we do not have to include a MODIFIES clause for it The modification is transparent to the user anyway  Mutators: methods that modify this, e.g., insert and remove Note the use of this_post  Observers: methods that return information about the state of the object, e.g., isIn and size  Method choose is underdetermined  EmptyException: checked or unchecked?

 Method insert does not throw an exception if there is a duplicate int in the set Method remove has a similar situation It depends on the application May provide additional methods that can throw exceptions  insertNonDup and removeIfIn  The specification of IntSet requires that users know the mathematical concept of sets A problem with informal specification It is usually reasonable to expect such knowledge from the users If some concepts are not well-known, more descriptions and/or explanations are needed, including the use of examples, figures or other tools
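A minimal sketch of what the stricter variants insertNonDup and removeIfIn might look like. The exception names (DuplicateException, NotFoundException), the class name StrictIntSet, and the choice to make the exceptions unchecked are assumptions made for illustration; the slides name only the two methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical unchecked exceptions; the slides do not name them.
class DuplicateException extends RuntimeException {
    DuplicateException(String msg) { super(msg); }
}

class NotFoundException extends RuntimeException {
    NotFoundException(String msg) { super(msg); }
}

// A trimmed-down set holding only what this sketch needs.
class StrictIntSet {
    private final List<Integer> els = new ArrayList<>();

    /** MODIFIES: this
        EFFECTS: If x is in this throws DuplicateException; else adds x to this. */
    void insertNonDup(int x) {
        if (els.contains(x)) throw new DuplicateException("StrictIntSet.insertNonDup");
        els.add(x);
    }

    /** MODIFIES: this
        EFFECTS: If x is not in this throws NotFoundException; else removes x from this. */
    void removeIfIn(int x) {
        // Integer.valueOf selects remove(Object) rather than remove(int index).
        if (!els.remove(Integer.valueOf(x)))
            throw new NotFoundException("StrictIntSet.removeIfIn");
    }

    boolean isIn(int x) { return els.contains(x); }
    int size() { return els.size(); }
}
```

Whether such exceptions should be checked or unchecked depends on how likely callers are to trigger them, which is the same question the slides raise for EmptyException.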

Example: Poly
/** OVERVIEW: Polys are immutable polynomials with integer coefficients.
    A typical Poly is c0 + c1x + c2x^2 +... */
public class Poly {
    /** EFFECTS: Constructor. Initializes this to be the zero polynomial. */
    public Poly ()
    /** EFFECTS: If n < 0 throws NegativeExponentException;
        else initializes this to be the Poly cx^n. */
    public Poly (int c, int n) throws NegativeExponentException

    /** EFFECTS: Returns the degree of this, i.e., the largest exponent
        with a non-zero coefficient. Returns 0 if this is the zero Poly. */
    public int degree ()
    /** EFFECTS: Returns the coefficient of the term of this whose exponent is d. */
    public int coeff (int d)
    /** EFFECTS: If q is null throws NullPointerException;
        else returns the Poly this + q. */
    public Poly add (Poly q) throws NullPointerException

    /** EFFECTS: If q is null throws NullPointerException;
        else returns the Poly this * q. */
    public Poly mul (Poly q) throws NullPointerException
    /** EFFECTS: If q is null throws NullPointerException;
        else returns the Poly this - q. */
    public Poly sub (Poly q) throws NullPointerException
    /** EFFECTS: Returns the Poly -this. */
    public Poly minus ()
}

Note:  Poly is an immutable class There are no mutator methods  NegativeExponentException: checked or unchecked?

Using Data Abstractions Programs should be written solely based on the specification of the data abstraction  Implementation of the data abstraction should NOT be utilized by the using code

/** EFFECTS: If p is null throws NullPointerException;
    else returns the Poly obtained by differentiating p. */
public static Poly diff (Poly p) throws NullPointerException {
    Poly q = new Poly ();
    for (int i = 1; i <= p.degree(); i++)
        q = q.add(new Poly(p.coeff(i) * i, i - 1));
    return q;
}

/** EFFECTS: If a is null throws NullPointerException;
    else returns a set containing an entry for each distinct element of a. */
public static IntSet getElements (int[] a) throws NullPointerException {
    IntSet s = new IntSet();
    for (int i = 0; i < a.length; i++)
        s.insert(a[i]);
    return s;
}

Implementing Data Abstraction Select a representation or rep to store the state of an object  In Java, the rep is the set of instance variables in the class  E.g., you may use an array to implement a set. Then the array is the rep of the set object Constructors and methods of the object should operate based on the rep The rep should support all operations of the object  Usually a rep may not support all operations efficiently  Therefore, multiple implementations of the same data type may be needed

Example:  We may use ArrayList as the representation of IntSet objects The elements in an IntSet object can be stored in an ArrayList  We can choose between two representations: (1) allow duplicate element values of the set in the ArrayList; (2) let each element of the set occur exactly once in the ArrayList  The first way is better for insert, since no search for an existing copy is needed  The second way is better for remove and isIn, since the array list is shorter If isIn is used more frequently than the other operations, then the second way is favorable
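The trade-off between the two representations can be sketched side by side. The class names DupIntSet and UniqueIntSet are hypothetical, and only the operations needed for the comparison are included.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

// Rep choice 1: duplicates are allowed in the list.
class DupIntSet {
    private final List<Integer> els = new ArrayList<>();

    void insert(int x) { els.add(x); }                  // no search needed
    void remove(int x) { els.removeIf(e -> e == x); }   // must delete every copy
    boolean isIn(int x) { return els.contains(x); }     // scans a possibly longer list
    int size() { return new HashSet<>(els).size(); }    // must count distinct values
}

// Rep choice 2: each element is stored exactly once.
class UniqueIntSet {
    private final List<Integer> els = new ArrayList<>();

    void insert(int x) { if (!els.contains(x)) els.add(x); }  // search before adding
    void remove(int x) { els.remove(Integer.valueOf(x)); }    // at most one copy to delete
    boolean isIn(int x) { return els.contains(x); }
    int size() { return els.size(); }
}
```

Both classes satisfy the same IntSet-style specification; only the cost of each operation differs.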

Implementing Data Abstraction in Java A representation typically has a number of instance variables  The constructors and methods access and manipulate the instance variables From an implementation point of view, objects have both methods and instance variables However, as a data abstraction, instance variables should be invisible ( private ) to users  It is generally a bad idea to make instance variables public  Record data types are an exception E.g., LinkedListNode

Example: Implementing IntSet
import java.util.Vector;
/** OVERVIEW: IntSets are unbounded, mutable sets of integers.
    A typical IntSet is {x1,...,xn}. */
public class IntSet {
    private Vector<Integer> els; // the rep
    /** EFFECTS: Constructor. Initializes this to be empty. */
    public IntSet () { els = new Vector<Integer>(); }

    /** MODIFIES: this
        EFFECTS: Adds x to the elements of this. */
    public void insert (int x) {
        Integer y = x;
        if (getIndex(y) < 0) els.add(y);
    }
    /** MODIFIES: this
        EFFECTS: Removes x from this. */
    public void remove (int x) {
        int i = getIndex(x);
        if (i < 0) return;
        els.set(i, els.lastElement()); // overwrite x with the last element
        els.remove(els.size() - 1);    // then drop the last slot
    }

    /** EFFECTS: Returns true if x is in this; else returns false. */
    public boolean isIn (int x) {
        return getIndex(x) >= 0; // index 0 is a valid position
    }
    /** EFFECTS: If x is in this returns the index where x appears;
        else returns -1. */
    private int getIndex (Integer x) {
        for (int i = 0; i < els.size(); i++)
            if (x.equals(els.get(i))) return i;
        return -1;
    }
    /** EFFECTS: Returns the cardinality of this. */
    public int size () { return els.size(); }

    /** EFFECTS: If this is empty throws EmptyException;
        else returns an arbitrary element of this. */
    public int choose () throws EmptyException {
        if (els.size() == 0) throw new EmptyException("IntSet.choose");
        return els.lastElement();
    }

Note:  Why does getIndex not use exceptions?  Method insert guarantees the uniqueness of elements in the vector els This condition is essential to the implementation of methods size and remove  Implementation using an int array is ok but less favorable  Underdetermined method choose gets a determined implementation

Example: Implementing Poly Since Poly is immutable, an array can be used as its rep (coefficient array)  The i-th element of the array stores the coefficient of the term with exponent i This makes sense only if the Poly is dense [Figure: Example 1, a dense Poly; Example 2, a sparse Poly with a coefficient stored in the 1001st element]

Example: Implementing Poly  The zero Poly can be represented by either an empty array or a one-element array containing zero The latter is used in the implementation  For convenience, an instance variable is used to store the degree of the Poly

/** OVERVIEW: Polys are immutable polynomials with integer coefficients.
    A typical Poly is c0 + c1x + c2x^2 +... */
public class Poly {
    private int[] trms;
    private int deg;
    /** EFFECTS: Constructor. Initializes this to be the zero polynomial */
    public Poly () { trms = new int[1]; deg = 0; }

    /** EFFECTS: If n < 0 throws NegativeExponentException;
        else initializes this to be the Poly cx^n */
    public Poly (int c, int n) throws NegativeExponentException {
        if (n < 0)
            throw new NegativeExponentException("Poly(int, int) constructor");
        if (c == 0) { trms = new int[1]; deg = 0; return; }
        trms = new int[n + 1];
        for (int i = 0; i < n; i++) trms[i] = 0;
        trms[n] = c;
        deg = n;
    }

    /** EFFECTS: Initializes this to be the poly 0x^n */
    private Poly (int n) { trms = new int[n + 1]; deg = n; }
    /** EFFECTS: Returns the degree of this, i.e., the largest exponent
        with a non-zero coefficient. Returns 0 if this is the zero Poly. */
    public int degree () { return deg; }
    /** EFFECTS: Returns the coefficient of the term of this whose exponent is d */
    public int coeff (int d) {
        if (d < 0 || d > deg) return 0;
        else return trms[d];
    }

    /** EFFECTS: If q is null throws NullPointerException;
        else returns the Poly this - q */
    public Poly sub (Poly q) throws NullPointerException {
        return add(q.minus());
    }
    /** EFFECTS: Returns the Poly -this. */
    public Poly minus () {
        Poly r = new Poly(deg);
        for (int i = 0; i <= deg; i++) r.trms[i] = - trms[i]; // Note here
        return r;
    }
    //...
    // See textbook p. 92 and WebCT for complete code
}
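The add operation is not shown on these slides. The sketch below illustrates the idea on bare coefficient arrays only; the class name PolyAddSketch and the static signature are assumptions, and it is not the textbook's code. It assumes both arrays have at least one element, as the Poly rep does.

```java
import java.util.Arrays;

class PolyAddSketch {
    /** EFFECTS: Returns the coefficient array of a + b, trimmed so that the last
        entry is non-zero unless the result is the zero polynomial ([0]). */
    static int[] add(int[] a, int[] b) {
        int[] longer = a.length >= b.length ? a : b;
        int[] shorter = a.length >= b.length ? b : a;
        int[] sum = Arrays.copyOf(longer, longer.length);
        for (int i = 0; i < shorter.length; i++) sum[i] += shorter[i];
        // Leading terms may cancel, so recompute the true degree.
        int deg = sum.length - 1;
        while (deg > 0 && sum[deg] == 0) deg--;
        return Arrays.copyOf(sum, deg + 1);
    }
}
```

The trimming step matters because the sum of two polynomials of the same degree can have a lower degree, and the array length is tied to the degree in this representation.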

Records A record is a collection of fields  E.g., struct in C/C++  Java does not have struct. A class has to be used  Visibility of instance variables in a record class can be either public or package visible
/** Overview: A record type */
class Pair {
    int coeff;
    int exp;
    Pair(int c, int n) { coeff = c; exp = n; }
}
No specification is needed for a record other than to indicate that it is a record type

Additional Methods Class Object is the ancestor of all Java classes  Depending on the class, some methods of Object need to be overridden  We will discuss equals, clone and toString Method equals  Conceptually, two objects are equal if they are behaviorally equivalent Behaviorally equivalent: Two objects are indistinguishable by using any sequence of calls to the objects’ methods

 For mutable objects, all distinct objects are distinguishable Objects are equal only when they are the same object  Immutable objects with the same state are equivalent  The equals method of Object tests whether two objects are the same object For mutable objects, there is no need to override the equals method of Object The equals method of immutable objects should be overridden

 Similarity is a weaker equality notion Two objects are similar if they are not distinguishable by using any observers of their type If necessary, you may implement a similar method  similar and equals are the same for immutable objects  similar is weaker than equals for mutable types IntSet s = new IntSet(); IntSet t = new IntSet(); if (s.similar(t))...; else...;
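A minimal sketch of a similar method for a mutable set. The class name SimIntSet is hypothetical and the class is cut down to what the example needs; equals is deliberately not overridden, so it keeps the identity test inherited from Object.

```java
import java.util.Vector;

class SimIntSet {
    private final Vector<Integer> els = new Vector<>();

    void insert(int x) { if (!els.contains(x)) els.add(x); }
    boolean isIn(int x) { return els.contains(x); }
    int size() { return els.size(); }

    /** EFFECTS: Returns true if this and t currently contain the same elements. */
    boolean similar(SimIntSet t) {
        if (t == null || size() != t.size()) return false;
        for (Integer x : els)
            if (!t.isIn(x)) return false;
        return true;
    }
}
```

Two distinct sets holding the same elements are similar but not equal: a later insert on one of them would make them distinguishable.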

Method clone  clone creates a new object that is a copy of the object on which clone is invoked  The clone method of Object assigns the instance variables of the old object to those of the new one It may create a sharing problem if any of the instance variables is a reference  E.g., IntSet, Poly (immutable, so acceptable)  The default implementation of clone is usually acceptable for immutable types If the default clone can be used, it can be inherited by putting implements Cloneable in the class header  clone needs to be implemented for mutable types
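The sharing problem can be seen in a small sketch. The class Bag and its method names are hypothetical; shallowClone shows what the clone inherited from Object does, and deepClone shows the kind of copy a mutable type needs.

```java
import java.util.ArrayList;
import java.util.List;

class Bag implements Cloneable {
    List<Integer> items = new ArrayList<>();   // mutable reference in the rep

    /** Default behavior: copies the reference, so both objects share one list. */
    Bag shallowClone() {
        try {
            return (Bag) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);       // cannot happen: Bag implements Cloneable
        }
    }

    /** Copies the list as well, so the two objects are independent. */
    Bag deepClone() {
        Bag b = shallowClone();
        b.items = new ArrayList<>(items);
        return b;
    }
}
```

After a shallow clone, adding to the original's list is visible through the copy; after a deep clone it is not.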

/** OVERVIEW:... */
public class Poly implements Cloneable {
    public boolean equals (Poly q) { // Optimized
        if (q == null || deg != q.deg) return false;
        for (int i = 0; i <= deg; i++)
            if (trms[i] != q.trms[i]) return false;
        return true;
    }
    public boolean equals (Object z) {
        if (!(z instanceof Poly)) return false;
        return equals((Poly) z);
    }
The definitions of the equals method can be used as a template The first equals method is an overload, while the second one is the overriding method, since it has the same signature as that in class Object

/** OVERVIEW:... */
public class IntSet {...
    private IntSet (Vector<Integer> v) {
        els = new Vector<Integer>();
        for (int i = 0; i < v.size(); i++) els.add(v.get(i));
    }
    public Object clone ( ) { return new IntSet(els); }
Note: There is no specification for clone or equals since their meanings are well-understood

 CloneNotSupportedException is thrown if clone is called on an object that neither implements Cloneable nor declares its own clone method  The signature of clone of a subtype is identical to the signature of clone for Object Object clone(); A cast is needed when clone is used IntSet t = (IntSet) s.clone(); May not be so for user-defined generic classes  Discussed in Polymorphism

Method toString  Produces a string that represents the current state of its object and indicates its type E.g., IntSet: {1, 7, 3} E.g., Poly: 2 + 3x + 5x^2  toString of Object only provides the type name and its hash code  It is advisable to provide a customized toString method for each new type

toString method for IntSet
public String toString () {
    if (els.size() == 0) return "IntSet: { }";
    String s = "IntSet: {" + els.elementAt(0);
    for (int i = 1; i < els.size(); i++)
        s = s + ", " + els.elementAt(i);
    return s + "}";
}

Aids to Understanding Implementations Abstraction function describes the implementer's choice of a particular representation for the data type  About how instance variable values are mapped to the state of the abstract object that they represent Rep invariant describes the common assumptions on which constructors and methods are implemented  It allows the implementation of each operation without worrying about those of the others

The abstraction function and rep invariant capture why the code is the way it is  E.g., choose and size of IntSet  They are valuable to both implementers and other readers of the code But not the user of the data abstraction  Note that they are NOT specification Written by implementers

Designer: Understand the application requirements; design and write the specification
Implementer: Understand the specification; decide the rep; write the AF and RI; implement the constructors and methods

Abstraction Function The implementation of a data abstraction decides a relationship between the rep and the abstract objects The relationship can be defined as a function called the abstraction function (AF)  It maps from the instance variables (rep of an object) to the abstract object being represented  E.g., the IntSet uses a vector els  Specifically, AF maps from the concrete values of the instance variables to the abstract state of the abstract object

The following example shows the mapping of concrete states of a real object to abstract states of the abstract object  E.g.: Vector [1, 2] maps to the integer set {1, 2}, and so does Vector [2, 1] Thus, AFs are often many-to-one mappings

AF should be described in a comment in the implementation of an abstract object Since informal specification is used, the range of an AF is not mathematically defined To overcome the problem, we give a description of a typical abstract object in the specification  E.g., in IntSet, we have “A typical IntSet is {x1,...,xn}”  E.g., in Poly, we have “A typical Poly is c0 + c1x + c2x^2 +...”

Example: Based on the typical abstract IntSet object, we can write the AF for IntSet as follows: // The abstraction function is // AF(c) = { c.els[i] | 0 <= i < c.els.size }  The notation {x | p(x) } describes the set of all x such that the predicate p(x) is true  Note that convenient abbreviations are used c.els[i] stands for c.els.get(i) It is fine as long as the readers can clearly understand what it means

Note that you can also choose to write the abstraction function in plain English  E.g., AF of IntSet implementation can be “All elements in the rep els correspond to the elements in the abstract IntSet.”
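The IntSet abstraction function can also be written as executable code, which makes its many-to-one nature easy to check. This is only an illustrative sketch: the class name IntSetAF is hypothetical, and java.util.Set stands in for the abstract mathematical set.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class IntSetAF {
    /** AF(c) = { c.els[i] | 0 <= i < c.els.size }, with a java.util.Set
        standing in for the abstract set. */
    static Set<Integer> af(List<Integer> els) {
        return new TreeSet<>(els);
    }
}
```

Applying af to the lists [1, 2] and [2, 1] yields equal sets, so two different concrete states represent the same abstract IntSet.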

Example:
// A typical Poly is c0 + c1x + c2x^2 +...
// The abstraction function is:
// AF(c) = c0 + c1x + c2x^2 +...
// where
// ci = c.trms[i] if 0 <= i < c.trms.length
// ci = 0 otherwise
Or in plain English: The elements in the rep trms correspond to the coefficients of the polynomial object. The index of each element/coefficient in trms corresponds to the exponent of each term in the polynomial

You do not need to provide an abstraction function for a record type  A record type provides no abstraction over its rep: both the real object and the abstract object are collections of fields that correspond to each other

Representation Invariant Not all syntactically correct values of instance variables are semantically correct to represent the state of the abstract object  E.g., suppose we do not allow duplicate values in Vector els of an IntSet. Then any els containing duplicate values is not a legitimate representation of an IntSet, although the compiler will accept it Representation (rep) invariant is a statement of a property that all legitimate objects satisfy  A rep invariant is a predicate that is true of legitimate objects  If it is violated, then the object is corrupted

Example: for IntSet, we have:
// The rep invariant is
// I(c) = c.els != null &&
//   for all int i, j (0 <= i, j < c.els.size &&
//     i != j => c.els[i] != c.els[j])
The rep invariant is written using predicate calculus notation

Predicate calculus notation:  &&: and, conjunction  ||: or, disjunction  =>: implication  for all: universal quantifier  there exists: existential quantifier

You may also choose to write the rep invariant in an informal way using plain English Example: for IntSet, we have: // The rep invariant is: // I(c) = c.els != null && // there are no duplicates in c.els

Example: Consider an alternative representation of IntSet that consists of an array of 100 boolean values plus a Vector
private boolean[] els = new boolean[100];
private Vector<Integer> otherEls;
private int sz;
Based on the above rep, if an integer i between 0 and 99 is in the set, we just set els[i] to be true  All integers outside the range 0 to 99 are stored in otherEls  For efficiency, we store the size of the set in sz  This will be an efficient rep if almost all integers that appear are between 0 and 99

The abstraction function:
// The abstraction function is
// AF(c) = { c.otherEls[i] | 0 <= i < c.otherEls.size }
//         + { j | 0 <= j < 100 && c.els[j] }
The rep invariant:
// The rep invariant is
// I(c) = c.els != null && c.otherEls != null &&
//   all elements in c.otherEls are not in the range 0 to 99 &&
//   there are no duplicates in c.otherEls &&
//   c.sz = c.otherEls.size + (count of true entries in c.els)

 The Abstraction function in plain English: // The abstraction function: // The set of int in the IntSet are the union // of the indices of true elements in els // and all the elements in otherEls
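A minimal sketch of this alternative rep with a repOk that checks the invariant, including the redundant sz. The class name SmallIntSet is hypothetical, only insert, isIn, and size are shown, and the fields are made final here so the null checks are unnecessary.

```java
import java.util.Vector;

class SmallIntSet {
    private final boolean[] els = new boolean[100];
    private final Vector<Integer> otherEls = new Vector<>();
    private int sz = 0;

    void insert(int x) {
        if (isIn(x)) return;
        if (0 <= x && x < 100) els[x] = true;
        else otherEls.add(x);
        sz++;                       // keep the redundant count in step with the rep
    }

    boolean isIn(int x) {
        if (0 <= x && x < 100) return els[x];
        return otherEls.contains(x);
    }

    int size() { return sz; }

    boolean repOk() {
        int count = 0;
        for (boolean b : els) if (b) count++;
        for (int i = 0; i < otherEls.size(); i++) {
            int x = otherEls.get(i);
            if (0 <= x && x < 100) return false;          // belongs in els instead
            if (otherEls.indexOf(x) != i) return false;   // duplicate in otherEls
        }
        return sz == count + otherEls.size();
    }
}
```

Membership tests for values between 0 and 99 are a single array access, which is where this rep gains its efficiency.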

Note that sz is redundant  Whenever there is redundant information in the rep, the relationship of the redundant info to the rest of the rep should be explained in the rep invariant
Example: Poly
// The rep invariant is
// I(c) = c.trms != null && c.trms.length >= 1 &&
//   c.deg = c.trms.length - 1 &&
//   c.deg > 0 => c.trms[c.deg] != 0

If all syntactically correct states of the concrete object are legal representations, we simply have: // The rep invariant is // I(c) = true  It is so for all record types Since using code can access the rep directly, there is no way for the implementation code of a record type to constrain it  Thus, rep invariants need not be given for record types They must be given for all other types It helps the implementers and code readers

It is always possible that there is some strong relationship among the fields of a record. That relationship should be expressed in the rep invariant of the using code  Assume that another type of rep is used for Poly:
/** Overview: A record type */
class Pair { int exp; int coeff; }
public class Poly {
    Pair[] trms; // only used for non-zero terms
    ...
}
Then we have the rep invariant as follows:
// for all elements e of c.trms,
// e.exp >= 0 and e.coeff != 0
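A sketch of how the using code might check this invariant. The class name SparsePoly, its constructor, and the null checks are assumptions added for illustration; the constructor stores its argument directly only to keep the demonstration short.

```java
class Pair {                 // record type, as on the slide
    int exp;
    int coeff;
    Pair(int c, int n) { coeff = c; exp = n; }
}

class SparsePoly {
    Pair[] trms;             // only used for non-zero terms

    // Demonstration only: a real constructor would copy the array.
    SparsePoly(Pair[] trms) { this.trms = trms; }

    /** EFFECTS: Returns true if every term has exp >= 0 and coeff != 0. */
    boolean repOk() {
        if (trms == null) return false;
        for (Pair e : trms)
            if (e == null || e.exp < 0 || e.coeff == 0) return false;
        return true;
    }
}
```

The constraint lives in SparsePoly rather than in Pair, because Pair itself cannot restrict what its fields hold.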

Implementing the Abstraction Function and Rep Invariant Besides providing the abstraction function and rep invariant as comments, you usually also provide methods to implement them  Not for record type The toString method is used to implement the abstraction function The method that checks the rep invariant is called repOk

repOk specification: /** EFFECTS: Returns true if the rep invariant holds for this; otherwise returns false */ public boolean repOk()  repOk is public so that using code can use it  Since the specification of repOk is clear and always the same, it is not necessary to write it

Examples
// for Poly:
public boolean repOk() {
    if (trms == null || deg != trms.length - 1 || trms.length == 0)
        return false;
    if (deg == 0) return true;
    return trms[deg] != 0;
}
// for IntSet:
public boolean repOk() {
    if (els == null) return false;
    for (int i = 0; i < els.size(); i++) {
        Integer x = els.get(i);
        for (int j = i + 1; j < els.size(); j++)
            if (x.equals(els.get(j))) return false;
    }
    return true;
}

There are two ways to use repOk  Using code can call it to check the implementation  Call it inside constructors and methods that modify the rep Call it right before they return  If repOk is costly, these calls can be disabled when the program is in production
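One way to wire repOk into the mutators is sketched below. The class name CheckedIntSet, the helper checkRep, and the CHECK_REP switch are assumptions for illustration; the slides do not prescribe how the checks are disabled.

```java
import java.util.Vector;

class CheckedIntSet {
    // Hypothetical switch: set to false to skip the checks in production.
    static boolean CHECK_REP = true;

    private final Vector<Integer> els = new Vector<>();

    void insert(int x) {
        if (!els.contains(x)) els.add(x);
        checkRep();                          // right before returning
    }

    void remove(int x) {
        els.remove(Integer.valueOf(x));
        checkRep();
    }

    int size() { return els.size(); }

    boolean repOk() {
        for (int i = 0; i < els.size(); i++)
            for (int j = i + 1; j < els.size(); j++)
                if (els.get(i).equals(els.get(j))) return false;
        return true;
    }

    private void checkRep() {
        if (CHECK_REP && !repOk())
            throw new IllegalStateException("CheckedIntSet: rep invariant violated");
    }
}
```

Since this repOk is quadratic in the size of the set, calling it after every mutation is affordable during testing but may be worth switching off later.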

Discussion  AF and RI are NOT the specification of the data abstraction. They are written by the implementers rather than designers  Rep invariant holds whenever an object is used outside its implementation It need not hold all the time in an object operation However, it must be true whenever the operations return to their callers  The abstraction function only makes sense when the rep invariant holds

 A rep invariant should express all constraints on which the object operations depend You may imagine that the operations are to be implemented by different people The rep invariant should be sufficient to support the scenario  When implementing a data abstraction, AF and RI should be completed before the implementation of any operation  All operations of the data abstraction should be implemented such that RI is preserved

Properties of Data Abstraction Implementation The rep of an immutable abstraction need not be immutable  E.g., Poly Benevolent side effects are rep modifications that are not visible outside the implementation  Example: Change the order of the elements in els in an IntSet  Example: Suppose rational numbers are represented as a pair of integers (fraction): int num, denom;

The abstraction function is:
// A typical rational is n / d
// The abstraction function is
// AF(c) = c.num / c.denom
Given the rep, there are several issues:  Zero denominator? No  Negative rational? Negative numerator  Reduced form (no common factor)? No Based on these decisions, we have a rep invariant:
// The rep invariant is
// c.denom > 0

However, to test equality of two rationals, a reduced form is needed (common factors are removed)  See code in WebCT or on Page 110 in the textbook The equals method will compute the reduced form of the two rationals first, before deciding the equality The reduced form of the rep replaces the original rep but the abstract object is the same  Benevolent side effect Benevolent side effects are often performed for efficiency reasons  They are possible whenever the abstraction function is many-to-one
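A minimal sketch of the rational example, not the textbook's code. The constructor behavior (rejecting a zero denominator, moving the sign to the numerator), the helper names reduce and gcd, and the hashCode are assumptions; only the idea that equals reduces both reps comes from the slides.

```java
class Rational {
    private int num;
    private int denom;                       // rep invariant: denom > 0

    Rational(int n, int d) {
        if (d == 0) throw new IllegalArgumentException("zero denominator");
        if (d < 0) { n = -n; d = -d; }       // keep the sign in the numerator
        num = n;
        denom = d;
    }

    // Benevolent side effect: the rep changes, the abstract value num/denom does not.
    private void reduce() {
        int g = gcd(Math.abs(num), denom);
        num /= g;
        denom /= g;
    }

    private static int gcd(int a, int b) {
        while (b != 0) { int t = a % b; a = b; b = t; }
        return a;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Rational)) return false;
        Rational r = (Rational) o;
        reduce();
        r.reduce();
        return num == r.num && denom == r.denom;
    }

    @Override
    public int hashCode() {
        reduce();                            // equal rationals must hash alike
        return 31 * num + denom;
    }
}
```

The type stays immutable from the user's point of view: 2/4 and 1/2 denote the same abstract rational before and after reduce runs.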

Exposing the Rep  It is very important that the rep of a data abstraction cannot be modified outside its implementation  Even if all instance variables are declared private, it is still possible to expose the rep /** EFFECTS: Returns a vector containing the elements of this, each exactly once, in arbitrary order */ public Vector allEls () { return els; }

Exposing the rep is an implementation error: A method returns a mutable object in the rep A constructor or method makes a mutable argument object part of the rep
/** EFFECTS: If elms is null throws NullPointerException;
    else initializes this to contain all elements in elms */
public IntSet (Vector<Integer> elms) throws NullPointerException {
    if (elms == null) throw new NullPointerException("IntSet.IntSet(Vectors)");
    els = elms; // exposes the rep: the caller still holds a reference to elms
}
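Both kinds of exposure can be avoided by copying at the boundary. The sketch below uses a hypothetical class SafeIntSet; dropping duplicates while copying is an added assumption so that the no-duplicates rep invariant holds.

```java
import java.util.Vector;

class SafeIntSet {
    private final Vector<Integer> els = new Vector<>();

    /** EFFECTS: If elms is null throws NullPointerException; else initializes this
        to contain all elements in elms. */
    SafeIntSet(Vector<Integer> elms) {
        if (elms == null) throw new NullPointerException("SafeIntSet(Vector)");
        for (Integer x : elms)
            if (!els.contains(x)) els.add(x);   // copy, dropping duplicates
    }

    /** EFFECTS: Returns a new vector containing the elements of this,
        each exactly once, in arbitrary order. */
    Vector<Integer> allEls() {
        return new Vector<>(els);               // a copy, never the rep itself
    }

    int size() { return els.size(); }
}
```

Changes the caller makes to its own vector, or to the vector returned by allEls, no longer reach the set's rep.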

Design Issues Mutability  In general, a type should be immutable if its objects would naturally have unchanging values E.g., mathematical objects such as Poly, Rational, etc.  A type should usually be mutable if it is modeling something from the real world, where it is natural for values of objects to change over time E.g., Employee, IntSet, etc.

 There is a trade-off between efficiency and safety Immutable abstractions are safer: no problem for sharing; Immutable abstractions are less efficient: objects may be created and discarded frequently  Mutability is a property of the type and not of its implementation Implementation should support the property

Operation Categories  Creators: Operations that create objects of their types from scratch All creators are constructors  Producers: Take objects of their type as inputs and create other objects of their type E.g., add of Poly  Mutators: Modify objects of their type Only for mutable types  Observers: Take objects of their type as inputs and return results of other types

 Creators usually create some but not all objects of the data type E.g., Poly constructors only create single-term polynomials, while IntSet constructor only creates the empty set Other objects are created by producers or mutators  E.g., add of Poly, insert of IntSet, etc  Mutators play the same role in mutable types as what producers play in immutable ones  A mutable type can have producers as well as mutators E.g., clone of IntSet  Sometimes observers are combined with producers or mutators E.g., we may have a chooseAndRemove method for IntSet

Adequacy  A data type is adequate if it provides enough operations so that everything users need to do with its objects can be done both conveniently and with reasonable efficiency There is no precise definition of adequacy, but an inadequate type can usually be recognized  E.g., an IntSet without isIn

 A basic notion of adequacy can be obtained by considering the operation categories In general, for immutable types, it must have creators, observers and producers For mutable types, it must have creators, observers and mutators  A type must be fully populated Using its creators, mutators and producers, it must be possible to obtain every possible abstract object state  A type that is intended for general use must have a rich enough set of operations for its intended uses

 But do not offer irrelevant operations E.g., sum or sort for IntSet  If a type is adequate, its operations can be augmented by standalone procedures that are outside the type’s implementation (i.e., static methods of some other class)

Locality and Modifiability Revisited Locality: the ability to reason about a module by just looking at its specification  It requires that the rep be modified only within its type’s implementation Modifiability: the ability to re-implement an abstraction without having to modify any using code  All access to a rep must occur within its implementation
