Public PhD Defense A Formal Foundation for Object-Oriented Software Evolution Tom Mens DWIS / DINF Faculty of Sciences Vrije Universiteit Brussel September 24, 1999
Ph.D. Thesis “A formal foundation for reuse contracts allows us to deal with software evolution in a domain-independent and scalable way.”
Overview of Presentation - Introduction Motivation Reuse Contracts Example Graph Rewriting Formalism Reuse Contract Formalism Scalability Issues Domain-Independence Experiments Conclusion
Motivation Address lack of tool support for software evolution CASE tools, IDEs, version control tools Provide formal foundation for reuse contracts clear and unambiguous definitions better insight in evolution process facilitate tool support improve scalability Joint research effort methodology aspect [Steyaert&al96, Lucas97] tool aspect [DeHondt98] formal aspect It is important to address the lack of support for evolution in current software development environments and CASE tools because software evolution is unavoidable and necessary in all phases of the software life-cycle. Interesting formal properties: confluence, parallel and sequential independence, etc... Until now, research in graph rewriting has mostly focussed on theoretical results. This dissertation gives a practical application area of many of these theoretical results.
Motivation ctd. Lack of formal approaches to software evolution very few formal approaches available always domain-specific architectural evolution evolving specifications usually focus on anticipated changes e.g. run-time software reconfiguration Need for domain-independent approach to unanticipated software evolution In the context of evolution, one of the most important benefits of reuse contracts is that they provide help when merging parallel evolutions of the same software artifact. In this respect, they can be considered a kind of merge tool. Many commercial merge tools and research prototypes exist. They can be distinguished based on the following characteristics: 2-way versus 3-way merging. 3-way merging is the more powerful, and is also the approach taken by RCs. Textual, syntactic or semantic merging. Semantic merging detects the largest number of possible conflicts. Evolution conflicts of RCs are a kind of semantic conflict, while applicability conflicts are syntactic conflicts. State-based versus operation-based merging. State-based merging only compares the results without knowing how they were achieved. Operation-based merging also looks at the primitive operations that have been applied to achieve the result. This makes it more flexible and allows it to detect more conflicts. Again, RCs belong to the latter category.
Reuse Contracts Document unanticipated evolution in a disciplined way Allow to detect conflicts When upgrading to new versions of software When merging parallel evolutions of same software artifact (collaborative software development) evolution merge conflict upgrade reuse conflict We chose reuse contracts because they have already demonstrated their practical use, and because they have already been defined for different kinds of software artifacts (software requirements in OBA, class inheritance hierarchies in Smalltalk and Java, collaboration diagrams in UML). Document unanticipated reuse and evolution in a disciplined way. By formally documenting unanticipated evolution, RCs make it possible to solve evolution problems. Allow conflicts to be detected. Two kinds of conflicts can be distinguished: Upgrade conflicts occur when a certain software artifact, which is already reused or incrementally modified (e.g., by means of an inheritance mechanism), is upgraded to a new version. This focus has been taken in [Lucas97] and earlier papers on reuse contracts (so-called “base class exchange”). Merge conflicts occur when the same software artifact is changed in parallel by different software developers for different reasons, and inconsistencies turn up when these parallel changes are merged. This often occurs in collaborative software development, where many developers simultaneously work on the same code. This is the focus taken in this dissertation. Allow the impact of a change to be assessed: To assess the impact of a change, a “what-if” scenario is followed. The change is made, and then we see how many conflicts arise because of this change. If the conflicts turn out to be too severe, or if the number of conflicts is too large, one can decide not to make the change.
Provide help when actually making the change Based on the contract type we obtain more specific information about the actual conflict that occurs, which makes it easier to see how the conflict can be resolved. Lightweight approach simple ideas, easy to implement & customise
Example Evolution of UML class diagrams Point Circle Triangle Geo center vertices Circle radius Triangle 1 3 Geo area() circumference() Point center vertices Circle radius area() circumference() Triangle 1 3 {*radius2}
Overview of Presentation - Introduction Graph Rewriting Formalism Graphs Graph Rewriting Application Conditions Reuse Contract Formalism Scalability Issues Domain-Independence Experiments Conclusion
Graphs Example: UML class diagram Internal Graph Representation intersects(c: Circle) radius Circle distanceTo(p: Point) x y Point Triangle area() circumference() Geo center vertices 3 Example: UML class diagram G Triangle «class» Circle «class» «isa» intersects «operation» «assoc» center radius «attribute» «hasa» vertices {3} Point «class» Geo «class» area «operation» circumference «operation» x «attribute» distanceTo «operation» y «attribute» Internal Graph Representation Node types: «class» «attribute» «operation» «interface» Edge types: «assoc» «hasa» (aggregation) «isa» (generalisation) «implements»
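The internal graph representation above can be sketched in a few lines of Python. This is only an illustration, not the representation used in the dissertation: the class name `TypedGraph`, the tuple layout of edges and the helper `build_example` are assumptions made here. Multiplicities such as {3} and the nesting of attributes and operations inside classes are left out.

```python
from dataclasses import dataclass, field


@dataclass
class TypedGraph:
    """Hypothetical typed graph: every node and every edge carries a type."""
    nodes: dict = field(default_factory=dict)  # node name -> node type
    edges: set = field(default_factory=set)    # (label, source, target, edge type)

    def add_node(self, name, node_type):
        self.nodes[name] = node_type

    def add_edge(self, label, source, target, edge_type):
        # Both endpoints must already exist, otherwise the edge would dangle.
        if source not in self.nodes or target not in self.nodes:
            raise ValueError(f"dangling edge {label!r}: {source} -> {target}")
        self.edges.add((label, source, target, edge_type))


def build_example():
    """Build the nodes and edges named on the slide (assumed encoding)."""
    g = TypedGraph()
    for name in ("Geo", "Circle", "Triangle", "Point"):
        g.add_node(name, "class")
    for name in ("area", "circumference", "intersects", "distanceTo"):
        g.add_node(name, "operation")
    for name in ("radius", "x", "y"):
        g.add_node(name, "attribute")
    g.add_edge("", "Circle", "Geo", "isa")
    g.add_edge("", "Triangle", "Geo", "isa")
    g.add_edge("center", "Circle", "Point", "assoc")
    g.add_edge("vertices", "Triangle", "Point", "hasa")
    return g
```

Keeping the type next to each element makes it straightforward to check a graph against a type graph later on.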
Type Graph Node & edge subtype hierarchy Additional constraints needed implements nested operation attribute interface class assoc, hasa, isa isa uses invokes v e node type edge type Node & edge subtype hierarchy Additional constraints needed e.g. inheritance hierarchy is acyclic Need to specify additional domain-specific well-formedness constraints: «isa»-edges must connect nodes of the same type «isa»-hierarchy should be acyclic
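The two well-formedness constraints on «isa»-edges named above can be checked mechanically. The following is a minimal sketch under assumed encodings: nodes are a dict from name to type, edges are (source, target, type) triples, and the function name `isa_violations` is hypothetical.

```python
def isa_violations(nodes, edges):
    """Return a list of violated constraints for the isa-edges of a graph.

    nodes: dict mapping node name -> node type
    edges: iterable of (source, target, edge_type) triples
    """
    problems = []
    isa = [(s, t) for (s, t, ty) in edges if ty == "isa"]

    # Constraint 1: isa-edges must connect nodes of the same type.
    for s, t in isa:
        if nodes[s] != nodes[t]:
            problems.append(f"isa edge {s} -> {t} connects a {nodes[s]} to a {nodes[t]}")

    # Constraint 2: the isa-hierarchy must be acyclic (depth-first search).
    successors = {}
    for s, t in isa:
        successors.setdefault(s, []).append(t)
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {n: WHITE for n in nodes}

    def has_cycle_from(n):
        colour[n] = GREY
        for m in successors.get(n, []):
            if colour[m] == GREY:
                return True  # back edge: m is still on the current path
            if colour[m] == WHITE and has_cycle_from(m):
                return True
        colour[n] = BLACK
        return False

    if any(colour[n] == WHITE and has_cycle_from(n) for n in nodes):
        problems.append("isa hierarchy contains a cycle")
    return problems
```

An empty result means both constraints hold for the given graph.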
(Conditional) Graph Rewriting represents evolution of arbitrary software artifacts Algebraic single-pushout approach L’ area «operation» radius «attribute» C s use application conditions: more expressive and concise L area «operation» radius «attribute» P m R «uses» G Circle «class» circumference «operation» H Circle «class» area «operation» «uses» circumference «operation» radius «attribute» pushout construction We use a category-theoretical, or algebraic, approach to graph rewriting. More specifically, we have chosen the single-pushout (SPO) approach. An alternative would have been the double-pushout (DPO) approach, but productions in SPO are significantly simpler, which substantially shortens many proofs. Moreover, SPO is more general than DPO [Lowe93]. A disadvantage of SPO compared to DPO is that there can be unintuitive side-effects: dangling edge conflict, identification conflict. However, our primitive productions will be chosen in such a way that these side-effects will not occur.
Overview of Presentation - Introduction Graph Rewriting Formalism Reuse Contract Formalism Detecting merge conflicts Primitive contract types Applicability conflicts Evolution conflicts Scalability Issues Domain-Independence Experiments Conclusion
Detecting Merge Conflicts Two kinds of merge conflicts Structural or applicability conflicts Behavioural or evolution conflicts Conservative approach: only detect conflict warnings Approach Provide general formal definition in terms of graph rewriting formalism Complete fine-grained characterisation As mentioned before, the idea of RCs is to use the contract type (which documents the modifications that are being made in a precise way) to detect conflicts when merging parallel modifications of the same software artifact. Two kinds of merge conflicts can be distinguished: Applicability conflicts correspond to structural or syntactic problems. Only (a subset of) these conflicts can be detected by commercially available merge tools. Evolution conflicts are more important since they correspond to behavioural or semantic inconsistencies. These cannot be detected by commercially available merge tools. For both kinds of conflicts we can provide a general formal definition based on RCs, as well as a complete fine-grained characterisation based on the contract types that are involved. In this way, conflict tables can be set up to detect the conflicts more efficiently and to give detailed feedback about the kind of conflict that has occurred.
Primitive Contract Types Use a restricted set of possible graph productions Extension Cancellation Refinement Coarsening NodeRetyping EdgeRetyping Refinement (e,area,radius,«uses») R area «operation» radius «attribute» «uses» L Cancellation (area,«operation») «operation» area We have defined an orthogonal set of primitive contract types. The possible modifications correspond to: adding/removing a node in a graph adding/removing an edge in a graph changing the type of a node or edge in a graph
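To make the six primitive contract types concrete, here is a small sketch of how they could act on a typed graph. The tuple encoding, the function name `apply_primitive` and the exact preconditions are assumptions for illustration. In particular, requiring that a cancelled node has no incident edges is one way to avoid the dangling-edge side-effect of SPO, not necessarily the condition used in the dissertation.

```python
def require(condition, message):
    """Raise if an application condition of a primitive contract type fails."""
    if not condition:
        raise ValueError(message)


def apply_primitive(nodes, edges, op):
    """Return new (nodes, edges) after applying one primitive contract type.

    nodes: dict node -> type; edges: dict (label, source, target) -> type.
    op: tuple whose first element names the contract type (assumed encoding).
    """
    nodes, edges = dict(nodes), dict(edges)  # leave the inputs untouched
    kind, *args = op
    if kind == "Extension":            # add a node
        v, t = args
        require(v not in nodes, f"node {v} already exists")
        nodes[v] = t
    elif kind == "Cancellation":       # remove a node
        v, t = args
        require(nodes.get(v) == t, f"no node {v} of type {t}")
        require(all(v not in (s, d) for (_, s, d) in edges),
                f"node {v} still has incident edges")
        del nodes[v]
    elif kind == "Refinement":         # add an edge
        e, s, d, t = args
        require(s in nodes and d in nodes, f"undefined source or target for {e}")
        require((e, s, d) not in edges, f"edge {e} already exists")
        edges[(e, s, d)] = t
    elif kind == "Coarsening":         # remove an edge
        e, s, d, t = args
        require(edges.get((e, s, d)) == t, f"no edge {e} of type {t}")
        del edges[(e, s, d)]
    elif kind == "NodeRetyping":       # change the type of a node
        v, old, new = args
        require(nodes.get(v) == old, f"no node {v} of type {old}")
        nodes[v] = new
    elif kind == "EdgeRetyping":       # change the type of an edge
        e, s, d, old, new = args
        require(edges.get((e, s, d)) == old, f"no edge {e} of type {old}")
        edges[(e, s, d)] = new
    else:
        raise ValueError(f"unknown contract type {kind}")
    return nodes, edges
```

For instance, `apply_primitive(nodes, edges, ("Refinement", "e", "area", "radius", "uses"))` corresponds to Refinement(e,area,radius,«uses») on the slide.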
Structural Conflicts Applicability conflict if P1 and P2 not parallel independent Gives rise to ill-formed result graph (syntactic conflict) G Circle «class» area «operation» circumference «operation» radius «attribute» «uses» G1 Circle «class» area «operation» circumference «operation» radius «attribute» «uses» Refinement (e,area,radius,«uses») P1 Cancellation (area,«operation») Undefined source conflict Refinement Cancellation P2 Applicability conflicts are conflicts that would lead to an ill-formed result graph when two parallel modifications of the same software artifact are merged. Using conditional graph rewriting these conflicts can be detected very easily, if one of the two productions is not applicable after the other. Formally this means that P1 and P2 are not parallel independent. <<uses>> G2 Circle «class» circumference «operation» radius «attribute» «uses»
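The detection rule stated above (one production is no longer applicable after the other) can be tried out on the slide's example. The sketch below covers only Extension, Cancellation and Refinement; the function names, the tuple encoding and the precondition that a cancelled node has no incident edges are assumptions. The existing «uses» edge is assumed to run from circumference to radius.

```python
def applicable(nodes, edges, op):
    """Check the application condition of a production (assumed conditions)."""
    kind = op[0]
    if kind == "Extension":
        _, v, _type = op
        return v not in nodes
    if kind == "Cancellation":
        _, v, t = op
        return nodes.get(v) == t and all(v not in (s, d) for (_, s, d) in edges)
    if kind == "Refinement":
        _, e, s, d, _type = op
        return s in nodes and d in nodes and (e, s, d) not in edges
    raise ValueError(f"unsupported contract type {kind}")


def apply_op(nodes, edges, op):
    """Apply a production that is known to be applicable; inputs stay untouched."""
    nodes, edges = dict(nodes), dict(edges)
    kind = op[0]
    if kind == "Extension":
        nodes[op[1]] = op[2]
    elif kind == "Cancellation":
        del nodes[op[1]]
    else:  # Refinement
        edges[(op[1], op[2], op[3])] = op[4]
    return nodes, edges


def applicability_conflict(nodes, edges, p1, p2):
    """True if one production stops being applicable once the other is applied."""
    for first, second in ((p1, p2), (p2, p1)):
        if not applicable(nodes, edges, first):
            raise ValueError(f"{first} does not apply to the base graph")
        after = apply_op(nodes, edges, first)
        if not applicable(*after, second):
            return True
    return False


# Base graph G from the slide (edge label "u" is a placeholder).
G_NODES = {"Circle": "class", "area": "operation",
           "circumference": "operation", "radius": "attribute"}
G_EDGES = {("u", "circumference", "radius"): "uses"}
P1 = ("Refinement", "e", "area", "radius", "uses")
P2 = ("Cancellation", "area", "operation")
```

Here `applicability_conflict(G_NODES, G_EDGES, P1, P2)` reports the undefined source conflict: after P2 removes area, P1 has no source node for its new edge.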
Applicability Conflict Table Complete fine-grained characterisation of applicability conflicts AC3 Extend (v,) Cancel (v,) Refine (e,v,w,) Refine (e,u,v,) Coarsen (e,v,w,) Coarsen (e,u,v,) Nretype (v,,1) ERetype (e,v,w,,1) ERetype (e,u,v,,1) Extension (v,) AC1 Cancellation (v,) AC2 AC4 AC9 Refinement (e,v,w,) AC5 Refinement (e,u,v,) Coarsening (e,v,w,) AC6 AC10 if = Coarsening (e,u,v,) NodeRetype (v,,2) AC7 EdgeRetype (e,v,w,,2) AC8 if = EdgeRetype (e,u,v,,2) The example of the undefined source conflict on the previous slide occurs when a Refinement is combined with a Cancellation. In the symmetric conflict table above it is shown by means of filled red rectangles (conflict AC3).
Alternative Conflict Table w e v v w e type(v)= type(e,v,w)=t AC1 v AC2 AC3 AC9 v AC3 AC5 v w e AC6 AC10 v w e AC9 AC7 type(v)= AC10 AC8 type(e,v,w)=t
Evolution Conflicts Pullback construction Pushout construction L1 R1 G area «operation» radius «attribute» R1 «uses» G Circle «class» circumfer «operation» G1 m1 Refinement (e,area,radius,«uses») L area «operation» Pullback construction L2 area «operation» circumfer «operation» R2 «uses» G2 Circle «class» radius «attribute» Refinement (e,area,circumference,«uses») m2 H Circle «class» area «operation» circumfer «operation» radius «attribute» «uses» Pushout construction Formally, an evolution conflict can be detected between two primitive productions P1 and P2 if the pullback of m1 and m2 is not empty. This indicates potential problems in the pushout of P1* and P2*. In the example, area is a point of interaction between both independent evolutions P1* and P2*. It leads to two different (maybe incompatible) paths from area to radius. Hence it gives rise to a potential evolution conflict. It is only “potential” because the conflict depends on the precise behaviour that is associated with the graph (i.e. the software artifact). Because the evolution conflicts that are detected are only potential, reuse contracts can only give an upper-bound approximation of the conflicts. The more domain-specific information is known, the better the approximation of the conflicts will be.
Evolution Conflict Detection Finer-grained characterisation Compare pairs of primitive contract types double reachability conflict cycle introduction conflict Detect graph patterns in result of merge more general more scalable «uses» area radius {evolver 1} {evolver 2} Detect graph patterns: The pattern that is detected should correspond to newly introduced edges! (This can be achieved by making use of modification tags.)
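The graph pattern approach with modification tags can be illustrated for the double reachability conflict. The sketch below is a simplified reading of the pattern, not the definition from the dissertation: it reports a pair of nodes when two different simple paths between them contain edges introduced by different evolvers. The function names and the edge encoding (source, target, tag) with tag `None` for unmodified edges are assumptions. Enumerating all simple paths is exponential in general, so this is only suitable for small examples.

```python
from itertools import combinations


def simple_paths(edges, start, goal, visited=None):
    """Yield every simple path from start to goal as a tuple of edges."""
    visited = (visited or frozenset()) | {start}
    for edge in edges:
        src, tgt, _tag = edge
        if src != start or tgt in visited:
            continue
        if tgt == goal:
            yield (edge,)
        else:
            for rest in simple_paths(edges, tgt, goal, visited):
                yield (edge,) + rest


def double_reachability(edges):
    """Return node pairs reachable along paths tagged by different evolvers.

    edges: list of (source, target, tag); tag is None for edges of the base
    graph and an evolver name for newly introduced edges.
    """
    nodes = sorted({n for s, t, _ in edges for n in (s, t)})
    conflicts = []
    for a in nodes:
        for b in nodes:
            if a == b:
                continue
            # For each path, collect the evolvers that contributed an edge.
            tags = [{tag for _, _, tag in path if tag}
                    for path in simple_paths(edges, a, b)]
            for t1, t2 in combinations(tags, 2):
                if any(x != y for x in t1 for y in t2):
                    conflicts.append((a, b))
                    break
    return conflicts


# Merge result of the running example: circumference already uses radius,
# evolver 1 adds area -> radius, evolver 2 adds area -> circumference.
MERGED = [("circumference", "radius", None),
          ("area", "radius", "evolver 1"),
          ("area", "circumference", "evolver 2")]
```

On `MERGED` the function reports the pair (area, radius), the point of interaction discussed for the running example. A cycle introduction pattern could be handled in the same style by looking for cycles that contain edges from both evolvers.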
Overview of Presentation - Introduction Graph Rewriting Formalism Reuse Contract Formalism Scalability Issues Composite Contract Types Normalisation Domain-Independence Experiments Conclusion
Composite Contract Types G0 Triangle «class» area «operation» Circle «class» «assoc» center circumference «operation» radius «attribute» «hasa» vertices Point «class» G1 Triangle «class» area «operation» Circle «class» circumference «operation» «assoc» center radius «attribute» «hasa» vertices Point «class» Geo «class» «isa» CreateSuperclass (Geo,[Circle,Triangle]) Extension(Geo,«class») Refinement(,Circle,Geo,«isa») Refinement(,Triangle,Geo,«isa») Coarsening(center,Circle,Point,«assoc») Coarsening(center,Triangle,Point,«assoc») Refinement(center,Geo,Point,«assoc»)
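The expansion of CreateSuperclass into primitive contract types shown above can be written as a small generator. This is a sketch: the function name, the tuple encoding and the `shared_edges` parameter (the edges that are moved from the subclasses to the new superclass) are assumptions introduced here.

```python
def create_superclass(superclass, subclasses, shared_edges=()):
    """Expand a CreateSuperclass composite into a sequence of primitives.

    shared_edges: iterable of (label, target, edge_type) describing edges
    that are removed from every subclass and re-attached to the superclass.
    """
    ops = [("Extension", superclass, "class")]
    # One unlabelled isa-edge from each subclass to the new superclass.
    ops += [("Refinement", "", sub, superclass, "isa") for sub in subclasses]
    for label, target, edge_type in shared_edges:
        ops += [("Coarsening", label, sub, target, edge_type) for sub in subclasses]
        ops.append(("Refinement", label, superclass, target, edge_type))
    return ops


SLIDE_SEQUENCE = create_superclass(
    "Geo", ["Circle", "Triangle"], shared_edges=[("center", "Point", "assoc")])
```

`SLIDE_SEQUENCE` reproduces the six primitive contract types listed on the slide, in the same order.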
Composite Contract Types ctd. can be domain-independent can be domain-specific (e.g. CreateSuperclass) are defined as composite productions Conflicts can be detected directly using alternative applicability conflict table using graph pattern approach Advantages more practical in use atomic, more efficient remove unnecessary conflict warnings Composite contract types are more intuitive in use: software developers can specify frequently recurring combinations of primitive contract types which have a more intuitive meaning than the corresponding sequence of primitive contract types. Composite contract types are a kind of atomic transaction. Either they are applied as a whole, or they are not applied at all. This would not be the case when considering them as mere sequences of primitive contract types. Composite contract types are more efficient, because the application preconditions of their primitive constituents can be translated into a usually smaller number of application conditions of the composite contract type. The same holds for postconditions. Composite contract types sometimes allow us to reduce conflicts. For certain composite contract types (such as Factorisation in the dissertation), certain evolution conflicts at the lowest level may be disregarded because of the way the primitive contract types are combined at a higher level. Sometimes evolution conflicts with primitive contract types in the beginning of the sequence become irrelevant because of other primitive contract types later in the sequence.
Normalisation algorithm Remove redundancy in evolution sequence remove redundant couples Extension; Cancellation absorb couples of primitive contract types Refinement; EdgeRetyping = Refinement Rearrange primitive contract types based on sequential independence canonical form Extensions; NodeRetypings; Refinements, ... The normalisation algorithm uses a kind of bubble-sort algorithm -> O(n^2)
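The three normalisation steps can be sketched as follows. Several points are assumptions made for this illustration: the tuple encoding, the approximation of sequential independence by "the two operations touch no common node or edge", and the tail of the canonical order (only Extensions; NodeRetypings; Refinements is given above). Instead of bubbling a Cancellation next to its Extension, the sketch looks ahead for the first later operation that touches the same element, which keeps the quadratic flavour of the bubble sort.

```python
# Assumed canonical order; only the first three positions come from the slide.
RANK = {"Extension": 0, "NodeRetyping": 1, "Refinement": 2,
        "EdgeRetyping": 3, "Coarsening": 4, "Cancellation": 5}


def elements(op):
    """Graph elements an operation reads or writes."""
    if op[0] in ("Extension", "Cancellation", "NodeRetyping"):
        return {op[1]}
    label, src, tgt = op[1:4]
    return {(label, src, tgt), src, tgt}


def independent(a, b):
    """Approximate sequential independence: no shared graph element."""
    return not (elements(a) & elements(b))


def simplify_once(ops):
    """Cancel or absorb one couple in place; return True if ops changed."""
    for i, a in enumerate(ops):
        # First later operation that shares an element with a, if any.
        j = next((k for k in range(i + 1, len(ops))
                  if not independent(a, ops[k])), None)
        if j is None:
            continue
        b = ops[j]
        # Redundant couple: Extension(v,t); Cancellation(v,t)
        if a[0] == "Extension" and b == ("Cancellation",) + a[1:]:
            del ops[j], ops[i]
            return True
        # Absorption: Refinement(e,s,d,t); EdgeRetyping(e,s,d,t,t2) = Refinement(e,s,d,t2)
        if a[0] == "Refinement" and b[0] == "EdgeRetyping" and b[1:5] == a[1:5]:
            ops[i] = a[:4] + (b[5],)
            del ops[j]
            return True
    return False


def bubble_once(ops):
    """One bubble-sort pass: swap adjacent independent operations out of order."""
    swapped = False
    for i in range(len(ops) - 1):
        a, b = ops[i], ops[i + 1]
        if RANK[a[0]] > RANK[b[0]] and independent(a, b):
            ops[i], ops[i + 1] = b, a
            swapped = True
    return swapped


def normalise(ops):
    """Return a normalised copy of an evolution sequence."""
    ops = list(ops)
    while simplify_once(ops) or bubble_once(ops):
        pass
    return ops
```

The loop terminates because every successful step either shortens the sequence or removes at least one out-of-order adjacent pair.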
Normalisation ctd. Advantages compacts evolution sequence (reduces complexity) removes unnecessary conflict warnings makes evolution process easier to understand finding specific modifications comparing differences between parallel evolutions To do improve efficiency of normalisation algorithm rely on canonical form for merging normalised evolution sequences
Overview of Presentation - Introduction Graph Rewriting Formalism Reuse Contract Formalism Scalability Issues Domain-Independence Possible Customisations Customising the formalism Case studies Experiments Conclusion
Possible Customisations class collaborations extension and generalisation of [Lucas97] similar to UML collaboration diagrams [Mens&al99] UML class diagrams software architectures [Romero99] others other UML diagrams non-OO paradigms?
Case study: UML class diagrams Specify type graph & type constraints Specify domain-specific modifications Primitive contract types Extension: AddOperation, AddAttribute, AddClassifier Cancellation: DropOperation, DropAttribute, DropClassifier Refinement: AddGeneralisation, AddAssociation Composite contract types (e.g. CreateSuperclass) Fine-tune conflict detection type graph gives rise to new applicability conflicts wf-constraints capture some evolution conflict warnings e.g. cycle introduction for inheritance-edges use domain-specific knowledge to ignore conflict warnings e.g. ignore cycle introduction for associations
Domain-specific normalisation AddClass(B); AddClass(A); AddOperation(A.m); DropClass(B); AddAttribute(A.a); DropOperation(A.m) Domain-specific customisation AddClass(A); AddAttribute(A.a) translation translation Extension(B,«class»); Extension(A,«class»); Extension(A.m,«operation»); Cancellation(B,«class»); Extension(A.a,«attribute»); Cancellation(A.m,«operation») normalisation Extension(A,«class»); Extension(A.a, «attribute») Domain-independent framework (not for composite contract types)
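The two translation steps in the diagram above (from domain-specific modifications to primitive contract types and back) amount to a table lookup. The sketch below assumes a pair encoding (name, argument) for domain-specific modifications and a triple encoding for primitives. The normalisation step in between is not repeated here, so the back-translation is applied to the normalised sequence shown on the slide.

```python
# Assumed mapping for the class diagram customisation (names as on the slide).
TO_PRIMITIVE = {
    "AddClass": ("Extension", "class"),
    "DropClass": ("Cancellation", "class"),
    "AddOperation": ("Extension", "operation"),
    "DropOperation": ("Cancellation", "operation"),
    "AddAttribute": ("Extension", "attribute"),
    "DropAttribute": ("Cancellation", "attribute"),
}
FROM_PRIMITIVE = {primitive: name for name, primitive in TO_PRIMITIVE.items()}


def translate(sequence):
    """Domain-specific modifications -> primitive contract types."""
    return [(TO_PRIMITIVE[name][0], arg, TO_PRIMITIVE[name][1])
            for name, arg in sequence]


def translate_back(ops):
    """Primitive contract types -> domain-specific modifications."""
    return [(FROM_PRIMITIVE[(kind, node_type)], node)
            for kind, node, node_type in ops]


DOMAIN_SEQUENCE = [("AddClass", "B"), ("AddClass", "A"), ("AddOperation", "A.m"),
                   ("DropClass", "B"), ("AddAttribute", "A.a"),
                   ("DropOperation", "A.m")]
NORMALISED = [("Extension", "A", "class"), ("Extension", "A.a", "attribute")]
```

`translate(DOMAIN_SEQUENCE)` gives the primitive sequence on the slide, and `translate_back(NORMALISED)` gives AddClass(A); AddAttribute(A.a).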
Case study: software architectures need to detect more high-level conflicts introduce derived edges in «gate» out «link» Parser «component» Pipe2 «connector» Semantor Coder Pipe3 SequentialCompiler «architecture» Lexer Pipe1 «binding» «pipe»
Overview of Presentation - Introduction Graph Rewriting Formalism Reuse Contract Formalism Scalability Issues Domain-Independence Experiments Conclusion
Experiments Implementation of basic formalism primitive contract types, conflict detection implemented in PROLOG rapid prototyping expressing conflict detection rules directly unification mechanism for detecting graph patterns use SOUL to access and reason about Smalltalk code Scalability normalisation algorithm no validation of composite contract types lack of adequate (large-scale) industry case Basic formalism: Our original experiments were implemented in Mathematica (including the normalisation algorithm), but later we switched to PROLOG because it turned out to be better suited to our purposes. Scalability experiments: We were unable to do any satisfactory experiments concerning scalability and composite contract types because we did not have an adequate case study in which to perform them. The case with class diagrams at WANG Global turned out not to be a good one because the evolution steps performed there were too small to be useful. Therefore, further experiments are needed to find out how we can reduce the number of relevant conflicts to a manageable number.
Experiments ctd. Domain-specific customisation customisations of PROLOG framework UML class diagrams small experiments with UML CASE tools identify basic conflicts for class diagrams (based on small changes in industry case study) software architectures [Romero99] Further experiments needed develop and implement efficient algorithms validate scalability aspects integrate in CASE tool and version control tool
RCs for version control Current-day version control systems use version graph textual merging only research prototypes for structural/behavioural merging Next generation version control tools Use normalisation to compact version graph to compare between alternative variants easily refactor commonalities in different variants More sophisticated merge tools structural and behavioural merging domain-independent More general kind of refactoring: instead of refactoring commonalities between subclasses to a common superclass within the same framework, we can scale this up by trying to refactor commonalities between different customisations or variants of the same framework!
Overview of Presentation - Introduction Graph Rewriting Formalism Reuse Contract Formalism Scalability Issues Domain-Independence Experiments Conclusion Contribution Future Work
Contribution Domain-independent formalism for evolution can be applied in all phases of software life-cycle Reuse contract formalism enables formal distinction between structural & behavioural conflicts complete fine-grained characterisation of conflicts scalability (composite contract types, normalisation) Simple yet general model for evolution easy to implement in tools simple ideas with large practical impact Better support for evolution in tools CASE tools, IDEs, version control systems
Future Work Focus on conflict resolution More scalability issues techniques to reduce potential conflicts use more sophisticated conflict detection techniques in presence of composite contract types modify normalisation algorithm factorisation algorithm for generating composite contract types formal properties (e.g. commutativity, inverse, ...) Co-evolution between different models / between model and metamodel Enhancing underlying graph formalism nested hyperedges, encapsulation mechanism, parameterisation mechanism, more complex application conditions, ... extension needs to preserve formal properties Focus on conflict resolution: In our dissertation we only looked at techniques for conflict detection. However, an equally important topic is to resolve the conflicts in a semi-automated way after they have been detected. Scalability issues: Another important issue is how the normalisation algorithm can be changed to work in the presence of composite contract types. Techniques to reduce the number of detected conflicts: Impact analysis techniques make use of sophisticated search algorithms that take more factors into account than just plain dependencies. They can make use of heuristics that suggest which paths could be avoided or use stochastic probabilities to determine the likelihood of an impact. A conflict never comes alone. Often, a problem situation can give rise to a whole set of conflicts. By solving the problem, all of these conflicts are solved at the same time. This is for example the case with transitive closure conflicts: a first-order conflict gives rise to a second-order conflict somewhere else, and so on. By using application conditions to specify evolution conditions (on a graph) and evolution invariants (on the rewriting system), many of the evolution conflicts can be avoided since the set of possible evolutions that can be made to a graph is reduced.