Automatic Extraction of Object-Oriented Component Interfaces John Whaley Michael C. Martin Monica S. Lam Computer Systems Laboratory Stanford University July 24, 2002
July 24, 2002ISSTA 2002Slide 2 Motivation Component programming is widespread. Interface specifications are important! Misunderstanding the API is a common source of error Misunderstanding the API is a common source of error Ideally, we want formal specifications. However, many components don’t have any specifications, formal or informal! However, many components don’t have any specifications, formal or informal! Our goal: automatic generation of interface specifications For large, object-oriented programs For large, object-oriented programs Partial specifications Partial specifications
July 24, 2002ISSTA 2002Slide 3 Why Automatic Extraction? Documentation Based on the actual code, so no divergence Based on the actual code, so no divergence Rules for static or dynamic checkers Find errors in API usage Find errors in API usage Find API bugs Discrepancy between code & intended API Discrepancy between code & intended API Dynamic extraction: Evaluation of test coverage Evaluation of test coverage
July 24, 2002ISSTA 2002Slide 4 Overview Component Model Product of Finite State Machines Product of Finite State Machines Static Analysis Dynamic Analysis and Checker Implemented for Java Analyzed >1 million lines of code Java class libraries Java class libraries Java 2 Enterprise Edition Java 2 Enterprise Edition Java network libraries Java network libraries joeq virtual machine joeq virtual machine
July 24, 2002ISSTA 2002Slide 5 Example: File Use a Finite State Machine (FSM) to express ordering constraints. openS TART read write close E ND
July 24, 2002ISSTA 2002Slide 6 A Simple OO Component Model Each object follows an FSM model. One state per method, plus S TART & E ND states. Method call causes a transition to a new state. open S TART read write close E ND m1 ; m2 is legal, new state is m2 m1m2
July 24, 2002ISSTA 2002Slide 7 Problem 1 An object has two fields, a and b. Each field must be set before being read. Solution: a product of FSMs, one for each field. set_a get_a set_b get_b S TART set_aget_aset_bget_bE ND S TART set_a get_b
July 24, 2002ISSTA 2002Slide 8 Splitting by fields set_a get_a set_b get_b Separate by fields into different, independent submodels. S TART set_aget_aset_bget_bE ND S TART set_bget_bE ND S TART set_a get_a E ND
July 24, 2002ISSTA 2002Slide 9 Problem 2 getFileDescriptor is state-preserving. Solution: distinguish between state-modifying and state-preserving. start S TART create E ND connect close getFileDescriptor S TART getFileDescriptorconnect Model for Socket
July 24, 2002ISSTA 2002Slide 10 State-preserving methods start S TART create E ND connect close S TART getFileDescriptor m1 is state-modifying m2 is state-preserving m1 ; m2 is legal, new state is m1 m1m2
July 24, 2002ISSTA 2002Slide 11 Summary of Model Product of FSMs Per-thread, per-instance Per-thread, per-instance One submodel per field Interprocedural mod-ref analysis Interprocedural mod-ref analysis Identifies methods belonging to submodelIdentifies methods belonging to submodel Separates state-modifying and state-preserving methods.Separates state-modifying and state-preserving methods. One submodel per Java interface Implementation not required. Implementation not required.
July 24, 2002ISSTA 2002Slide 12 Extraction Techniques StaticDynamic For all possible program executions For one particular program execution Conservative Exact (for that execution) Analyze implementation Analyze component usage Detect illegal transitions Detect legal transitions Superset of ideal model (upper bound) Subset of ideal model (lower bound)
July 24, 2002ISSTA 2002Slide 13 Static Model Extractor Defensive programming Implementation throws exceptions (user or system defined) on illegal input. Implementation throws exceptions (user or system defined) on illegal input. public void connect() { connection = new Socket(); } public void read() { if (connection == null) throw new IOException(); } S TART connectread connection
July 24, 2002ISSTA 2002Slide 14 Detecting Illegal Transitions Only support simple predicates Comparisons with constants, implicit null pointer checks Comparisons with constants, implicit null pointer checks Find pairs such that: Source must execute: Source must execute: field = const ;field = const ; Target must execute: Target must execute: if (field == const) throw exception;if (field == const) throw exception;
July 24, 2002ISSTA 2002Slide 15 Algorithm Source method: Constant propagation Constant at exit node Constant at exit node Target method: Control dependence Throw of exception is control dependent on predicate Throw of exception is control dependent on predicate
July 24, 2002ISSTA 2002Slide 16 Dynamic Extractor Goal: find the legal transitions that occur during an execution of the program Java bytecode instrumentation For each thread, each instance of a class: Track last state-modifying method for each submodel. Track last state-modifying method for each submodel. Same mechanism for dynamic checking Instead of adding to model, flag exception. Instead of adding to model, flag exception.
July 24, 2002ISSTA 2002Slide 17 Experiences We applied our tool to several real-life applications. ProgramDescription Lines of code Java.net Networking library 12,000 Java libraries General purpose library 300,000 J2EE Business platform 900,000 joeq Java virtual machine 65,000
July 24, 2002ISSTA 2002Slide 18 Automatic documentation java.util.AbstractList.ListItr slice on lastRet field (static) next, previous S TART set remove add
July 24, 2002ISSTA 2002Slide 19 start S TART begin E ND suspend resume rollbackcommit Automatic documentation J2EE TransactionManager (dynamic)
July 24, 2002ISSTA 2002Slide 20 Test coverage increaseRecursionDepth decreaseRecursionDepth increaseRecursionDepth simpleWriteObject S TART E ND J2EE IIOPOutputStream (dynamic) No self-edges implies a max recursion depth of 1
July 24, 2002ISSTA 2002Slide 21 Upper/lower bound of model start S TART create E ND connect available getInputStream getOutputStream close SocketImpl model (dynamic) (+static) getFileDescriptor
July 24, 2002ISSTA 2002Slide 22 Finding API bugs Applied our tool to the joeq virtual machine S TART load prepare compile Expected API for jq_Method: S TART load prepare compile Actual API for jq_Method: setOffset
July 24, 2002ISSTA 2002Slide 23 Related Work Dynamic Daikon (Ernst99) Daikon (Ernst99) DIDUCE (Hangal02) DIDUCE (Hangal02) K-limited FSM extraction (Reiss01) K-limited FSM extraction (Reiss01) Machine-learning (Ammons02) Machine-learning (Ammons02) Static Metal (Engler00) Metal (Engler00) Vault (DeLine01), NIL, Hermes (Strom86) Vault (DeLine01), NIL, Hermes (Strom86) SLAM toolkit (Ball01) SLAM toolkit (Ball01) ESC (Detlefs98) ESC (Detlefs98) ESC + Daikon (Flanagan01, Nimmer02)
July 24, 2002ISSTA 2002Slide 24 Conclusion Product of FSM Model is simple, but useful Model is simple, but useful Upper/lower bound: static/dynamic Useful for: Documentation generation Documentation generation Test coverage Test coverage Rules for automatic checkers Rules for automatic checkers Finding API bugs Finding API bugs