1
CASE/Re-factoring and program slicing
COMP319 © University of Liverpool
2
CASE tool construction
CASE tool construction responds to three kinds of issue:
- File level: the programming environment
- Language level: program representation, compiling, testing
- Work flow level: stages in the software engineering process itself (specification, design, development, verification, validation, management)

File-level issues concern the physical structures of the SE environment: data, programs and algorithms, maths and statistics. Thus we have file comparators, program versioning, line counting and program execution timing tools. It is impossible to generalise on how these types of tool are constructed.

Language-level issues deal with program representation (pictures and text) and transformation into object, executable, optimisable forms, etc. Because programs are more or less sequential blocks of diagrams or text, it is possible to comment on how these tools are constructed. We will concentrate on this aspect of CASE tools.

Finally, at the work flow level the entire software engineering process and its work flow are represented and captured. Typically the processes are modelled as graphs, and the construction of the tools closely follows this representation. Often the graph is no more than a script used to automate a sequence of steps that capture the processes and tasks that need to be performed.
3
Program Language Level
Tool construction at the language level exploits form, which is usually either:
- Grammar: to capture the notion of text-based instructions
- Graph: to deal with the concept of sequence

When dealing with CASE tools for programming, two program language issues are exploited.

1. Language grammar. The “language” basis for software engineering comes from an attempt to capture the notion of written instructions (to a machine) in a formal way. These instructions capture the basic types of machine operation, crudely summarised as iteration, condition, and process. The formality is structured for human consumption by conceiving these instructions within an algebra or grammar consisting of an alphabet (of ‘structured bits’, to give numbers and symbols) and operations or productions. Language rewriting (compiling, production systems, etc.) then permits one representation to be converted into another using the rules of the grammar. (We return to this later.)

2. Graphs. Languages also lead to a representation of the instruction sequence itself. Turing in his famous work (Turing, 1953 ….) was the first to note that this was not straightforward, did not need to be temporally linear (but was easier to deal with if it were), and was similar to the mathematical tool for abstracting processes: the graph. Graphs come in various forms but are essentially nodes, labels, and arcs linking nodes. And so we find that graphs (nets) of various kinds pepper computer science and help us represent the underlying language processes that software engineering involves. Brooks, in NSB, notes that the programming task is complex because it does not lend itself to two or even three dimensional representation and that “… several … graphs …” may be needed to represent the different activities: “… flow of control, flow of data, dependency, time sequence, name space relationships, hierarchy …” [MMM p 185].
4
Refactoring: why CASE?
A re-factoring tool must:
- not alter functionality (must be correct)
- cover all instances (i.e. be complete)
- keep code tidy and to a standard format
- be quick
5
Re-factor types
- Encapsulate field
    public int getLength() { return length; }
- Re-name method, field
    String pw -> String password
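As a minimal sketch of the encapsulate-field re-factor, the class below hides a field behind accessor methods. The class name Rod and the setter setLength are assumptions made for illustration; only the getter appears on the slide.

```java
// Hypothetical class, used only to illustrate "encapsulate field".
class Rod {
    // Before the re-factor this would have been: public int length;
    private int length;

    // Readers of the field now go through the getter.
    public int getLength() {
        return length;
    }

    // Writers of the field now go through the setter (an assumed name).
    public void setLength(int length) {
        this.length = length;
    }

    public static void main(String[] args) {
        Rod rod = new Rod();
        rod.setLength(5);                     // was: rod.length = 5;
        System.out.println(rod.getLength());  // was: rod.length
    }
}
```

A tool has to rewrite every read and write of the field in the same step, which is why completeness matters as much as correctness.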
6
Generalisation of type
Before:
    class Customer { }
After:
    class Person { }
    class Customer extends Person { }
7
Code breaking up re-factors
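Extract method
Before:
    void setLength(int length) {
        if (length < 0) {
            throw new BadArgumentException();
        }
        this.length = length;
    }
After:
    // the validation has been extracted into its own method
    void validateLength(int length) {
        if (length < 0) {
            throw new BadArgumentException();
        }
    }
    void setLength(int length) {
        validateLength(length);
        this.length = length;
    }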
8
Graphs
A diagram depicting a network:
- Points at the end of arcs
- Nodes at the junction of arcs
- Regions enclosed by arcs
Used for representing:
- Solid figures (vertices, edges, faces)
- Electrical circuits
- Relationships between entities
Graph or network theory

The idea that things that are related can be depicted by drawing a network has been with us for a long time, with maps of road, rail, and river communication being typical examples. The formal use of graphs consisting of points, nodes, arcs and areas comes from topology (the study of those properties of geometric configurations that are preserved under continuous transformation), and one thinks of Euler’s formula F + V - E = 2, relating faces, vertices and edges. Indeed much of the language of graphs comes from this relationship to geometry and solid figures (edge synonymous with arc, etc.). The computer science usage has its origins in electrical circuits and the abstraction of that idea to show not only flow of current/data, but also flow of control and change of state.
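As a quick check of Euler's formula, take two familiar solids: a cube has 6 faces, 8 vertices and 12 edges, and a tetrahedron has 4 faces, 4 vertices and 6 edges.

```latex
\begin{align*}
\text{cube:}        \quad & F + V - E = 6 + 8 - 12 = 2 \\
\text{tetrahedron:} \quad & F + V - E = 4 + 4 - 6  = 2
\end{align*}
```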
9
Dependence graphs
All computing systems have dependencies:
- Control dependence: one method calling another
- Data dependence: one expression affecting another
    A = B * 2
    C = A * 4
- Control/data dependence
    if (age <= 18) {
        println("Age invalid");
    }

Dependencies are unavoidable in computing; they are everywhere, but are rarely described as such. Their essential feature is that they capture time. The current state is dependent on a sequence that has happened in the past, and this is true for every state in that sequence. The secret is the ability to capture this in graphical form and/or to exploit the structure. However, time need not be linear; as on the London Tube Map, distance (or the time to get between stations) is not represented at all. Sequence is what is being shown.
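The data dependence between A = B * 2 and C = A * 4 can be found mechanically. The sketch below is one possible approach for straight-line code only, assuming each statement is described by the variable it defines and the variables it uses; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: finds data dependences in straight-line code.
class DataDependence {
    // defs[i] is the variable assigned by statement i;
    // uses[i] lists the variables read by statement i.
    // Returns pairs {i, j}: statement j uses the value defined by statement i.
    static int[][] flowEdges(String[] defs, String[][] uses) {
        Map<String, Integer> lastDef = new HashMap<>();
        List<int[]> edges = new ArrayList<>();
        for (int j = 0; j < defs.length; j++) {
            for (String variable : uses[j]) {
                Integer i = lastDef.get(variable);   // most recent definition, if any
                if (i != null) {
                    edges.add(new int[] {i, j});
                }
            }
            lastDef.put(defs[j], j);                 // statement j now defines defs[j]
        }
        return edges.toArray(new int[0][]);
    }

    public static void main(String[] args) {
        String[] text = {"A = B * 2", "C = A * 4"};
        String[] defs = {"A", "C"};
        String[][] uses = {{"B"}, {"A"}};
        for (int[] e : flowEdges(defs, uses)) {
            System.out.println(text[e[0]] + "  ->  " + text[e[1]]);
        }
    }
}
```

B has no definition in the fragment, so the only edge reported is from the first statement to the second.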
10
Program dependency graphs
- The term appears in a paper by Kuck, Muraoka & Chen (1981), although the idea is in Turing’s early description of “algorithms” in 1936
- Captures sequence/time between entities (compare connection/distance)
- Control dependence
- Data dependence

It is with Program Dependence Graphs (PDGs) that we are primarily concerned. The term was introduced by Kuck et al. [Kuck, D.J., Muraoka, Y., & Chen, S.C. (1981) Dependence graphs and computer optimisation. Conference Record of the 8th ACM Symposium on Principles of Programming Languages. pp ]. Program dependence graphs capture time as a sequence of entities. When the entities are control states we have control dependence; when they are data states (e.g. of variables) we have data dependence. When both control and data dependence are captured in one graph depicting a single procedure we have a program dependence graph, and where many procedures or modules are connected we have the notion of a system dependence graph (SDG). As noted before, dependence graphs in general are used in many parts of computer science, e.g. in program compilation following the syntax tree derivation stage, and implicitly in make systems.
11
Example Program and its Dependence Graph
Most example PDGs work with a language restricted to scalar variables (arrays etc. make the diagrams more complicated), assignment statements, if-then-else statements, while loops, and output statements. Although input statements are possible, procedures are assumed to run from an initial state, so variables can be used without being defined (they all have an initial value).

“The program dependence graph (or PDG) for a program P, denoted by Gp, is a directed graph whose vertices are connected by several kinds of edge. The vertices in Gp represent the assignment statement and predicates of P. In addition, Gp includes a special Entry vertex, and includes one Initial definition vertex for every variable x that may be used before being defined. (This vertex represents an assignment to the variable from the initial state.)”

In this procedure we calculate the circumference and area of a circle, given the radius and a value for pi.
12
Example Program and its Dependence Graph
The edges represent control dependence (straight lines) and data dependence (curved lines).

Control dependence: The source of a control dependence edge is always either the Entry vertex or a predicate vertex, and so in this example all ‘statements’ are in the inner circle surrounding the entry point. Control dependence edges are labelled either “true” or “false” (omitted here for clarity). For an edge v -> w, if the program component at vertex v is evaluated during execution and its value matches the edge label, then, assuming that the program eventually terminates normally, the component represented by w will eventually execute. If it does not match, then the component represented by w may never execute. (By definition the Entry vertex always evaluates to true.) Control dependence edges reflect the nesting structure of the program: there is one edge between a while predicate and every statement in its block (none here), all labelled “true”; one between an if condition (if DEBUG) and each statement in the then block, labelled “true”; and one between the condition and each statement in the else block, labelled “false”.

Data dependence: These edges are of two kinds, flow dependence (solid lines) and def-order dependence (dotted lines). Flow dependence edges represent the possible flow of values; i.e. there is a flow dependence edge v -> w if vertex v represents a program component that assigns a value to some variable x, vertex w represents a component that uses x, and control can reach w after v without an intervening definition of x. Def-order dependence edges deal with the same variable being given alternative values, e.g. rad=3 and rad=4, where it is possible for program flow to get from one to the other, as here if DEBUG is true. There is one def-order edge for each program component (witness component) that uses the variable; here the witnesses are area=P*(rad*rad) and circ=2*P*rad, and each edge is labelled with its witness node number (omitted).
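The edge kinds described above can be sketched as plain data. The procedure below is a hypothetical reconstruction of the circle example (the original listing is not reproduced here, and the value given to P is an assumption), and the def-order rule is simplified: it ignores the requirement that both definitions sit in the same branch of any enclosing conditional.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical PDG for an assumed version of the circle procedure.
class CirclePdg {
    static final String[] VERTEX = {
        "Entry",               // 0
        "P = 3.14",            // 1  (value assumed)
        "rad = 3",             // 2
        "if (DEBUG)",          // 3
        "rad = 4",             // 4
        "area = P*(rad*rad)",  // 5
        "circ = 2*P*rad"       // 6
    };
    // Variable defined at each vertex (null where nothing is assigned).
    static final String[] DEF = {null, "P", "rad", null, "rad", "area", "circ"};
    // Control dependence {source, target}: the source is Entry or a predicate.
    static final int[][] CONTROL = {{0, 1}, {0, 2}, {0, 3}, {3, 4}, {0, 5}, {0, 6}};
    // Flow dependence {definition, use}.
    static final int[][] FLOW = {{1, 5}, {1, 6}, {2, 5}, {2, 6}, {4, 5}, {4, 6}};

    static boolean hasFlow(int from, int to) {
        for (int[] e : FLOW) {
            if (e[0] == from && e[1] == to) return true;
        }
        return false;
    }

    // Def-order edges {v, w, witness}: v and w define the same variable, v comes
    // first, and both have a flow edge to the witness. Simplified rule.
    static List<int[]> defOrderEdges() {
        List<int[]> result = new ArrayList<>();
        for (int v = 0; v < VERTEX.length; v++) {
            for (int w = v + 1; w < VERTEX.length; w++) {
                if (DEF[v] == null || !DEF[v].equals(DEF[w])) continue;
                for (int u = 0; u < VERTEX.length; u++) {
                    if (hasFlow(v, u) && hasFlow(w, u)) result.add(new int[] {v, w, u});
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (int[] e : CONTROL) {
            System.out.println("control:   " + VERTEX[e[0]] + " -> " + VERTEX[e[1]]);
        }
        for (int[] e : defOrderEdges()) {
            System.out.println("def-order: " + VERTEX[e[0]] + " -> " + VERTEX[e[1]]
                + "  (witness: " + VERTEX[e[2]] + ")");
        }
    }
}
```

Under these assumptions the sketch finds two def-order edges from rad = 3 to rad = 4, one per witness, matching the description of area and circ as the witness components.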
13
Dependency graph usage
Optimisation
- Multiple independent statements can run in parallel
- Code that never runs can be removed
    boolean skip = true;
    if (!skip) { /* never runs */ }
- Loop invariance
    for (k = 1; k < max_items; k++) {
        sum = sum + a * b;
    }
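The loop-invariance case can be made concrete. In the loop above, a * b depends on neither k nor sum, so a dependence graph shows it can be computed once outside the loop. The sketch below compares the two forms; the class and method names are hypothetical.

```java
// Hypothetical illustration of hoisting a loop-invariant expression.
class LoopInvariant {
    // As on the slide: a * b is recomputed on every iteration.
    static int sumNaive(int a, int b, int maxItems) {
        int sum = 0;
        for (int k = 1; k < maxItems; k++) {
            sum = sum + a * b;
        }
        return sum;
    }

    // After hoisting: the invariant product is computed once, before the loop.
    static int sumHoisted(int a, int b, int maxItems) {
        int sum = 0;
        int product = a * b;
        for (int k = 1; k < maxItems; k++) {
            sum = sum + product;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumNaive(3, 4, 6));
        System.out.println(sumHoisted(3, 4, 6));
    }
}
```

Both methods return the same value for the same arguments; only the amount of work inside the loop differs.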
14
Program slicing
- Interactive method for:
  - Debugging
  - Program understanding
  - Program maintenance
- Program reduction technique (highlighter)
- A demonstration of an SDG allowing:
  - Control flow analysis
  - Data flow analysis

Slicing is an interactive technique. It has also been used to support reuse, parallelisation of code, software re-factoring, the calculation of metrics, reverse engineering, and program integration. The system dependence graph produced allows you to analyse what code affects what variables. These uses stem from the way slices capture both control and data flow, and thus allow us to analyse both.
15
How does it work
- Choose v, a variable or set of variables
- Choose n, a point of interest
- Using the dependence graph, the slice of v at n is constructed
- The slice of v at n can be compiled and studied separately
- Slices may be forward or backward from n

Program slicing works by finding the parts of a program that affect a chosen set of variables at some chosen point in a program. A slice is constructed by deleting the parts of the program that are irrelevant to those values. The point of interest is usually identified by annotating the program with line numbers which identify each primitive statement and each branch point. The term ‘slicing criterion’ is used for the point of interest together with the set of variables whose value the slice must preserve. For a slicing criterion consisting of a variable v and a point of interest n, the slice is constructed for v at n.

Having picked a slicing criterion, one of two forms of slice can be constructed: a backward slice or a forward slice. A backward slice contains the statements of the program which can have some effect on the slicing criterion, whereas a forward slice contains those statements of the program which are affected by the slicing criterion. Backward slices can assist a developer by helping to locate the parts of the program which contain a bug. Forward slicing can be used to predict the parts of a program that will be affected by a modification.
16
Backward Slicing

Original program:
    x = 1;
    y = 2;
    z = y-2;
    r = x;
    z = x+y;
    /* the slice point is the end of the program */

Backward slice:
    x = 1;
    y = 2;
    z = x+y;

A backward slice, which is simply a version of the original program with some parts missing, can be compiled and executed. An important property of any backward slice is that it preserves the effect of the original program on the variable chosen at the selected point of interest within the program. A backward slice on z at the end of the program can be built to focus attention on this aspect of the fragment.
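A backward slice of the fragment above can be sketched as a walk backwards over dependence edges, starting from the statement that gives z its final value. The dependence table is written out by hand here, and the class and method names are hypothetical; a real slicer would derive the edges from the program.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.TreeSet;

// Hypothetical backward slicer over a hand-written dependence table.
class BackwardSlice {
    static final String[] STMT = {"x = 1;", "y = 2;", "z = y-2;", "r = x;", "z = x+y;"};
    // DEPENDS_ON[i] lists the statements whose values statement i uses.
    static final int[][] DEPENDS_ON = {{}, {}, {1}, {0}, {0, 1}};

    // Collects every statement reachable backwards from the criterion statement.
    static TreeSet<Integer> slice(int criterion) {
        TreeSet<Integer> inSlice = new TreeSet<>();
        Deque<Integer> work = new ArrayDeque<>();
        work.push(criterion);
        while (!work.isEmpty()) {
            int s = work.pop();
            if (inSlice.add(s)) {                    // first visit only
                for (int d : DEPENDS_ON[s]) work.push(d);
            }
        }
        return inSlice;
    }

    public static void main(String[] args) {
        // z at the end of the program is defined by statement 4 (z = x+y;).
        for (int i : slice(4)) {
            System.out.println(STMT[i]);
        }
    }
}
```

The statements z = y-2; and r = x; are never reached by the walk, which is why they drop out of the slice.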
17
Debugging with a slice
    Pass = 0;
    Fail = 0;
    Count = 0;
    while (!eof()) {
        TotalMarks = 0;
        scanf("%d", &Marks);
        if (Marks >= 40) Pass = Pass + 1;
        if (Marks < 40) Fail = Fail + 1;
        Count = Count + 1;
        TotalMarks = TotalMarks + Marks;
    }
    printf("Out of %d, %d passed and %d failed\n", Count, Pass, Fail);
    average = TotalMarks / Count;  /* point of interest */
    printf("The average was %d\n", average);
    PassRate = Pass / Count * 100;
    printf("This is a pass rate of %d\n", PassRate);

Sadly, when the program is executed the average is extremely low; a bug …
18
Bug location with backward slicing
    Count = 0;
    while (!eof()) {
        TotalMarks = 0;
        scanf("%d", &Marks);
        Count = Count + 1;
        TotalMarks = TotalMarks + Marks;
    }
    average = TotalMarks / Count;
    printf("The average was %d\n", average);
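With the pass/fail bookkeeping stripped away, the slice makes the fault easy to see: TotalMarks is reset to zero on every pass through the loop, so only the last mark survives. The sketch below contrasts the two placements of the reset. It is a Java rendering that assumes the marks arrive in a non-empty array rather than from scanf, and the method names are hypothetical.

```java
// Hypothetical Java rendering of the sliced loop, with marks supplied as an array.
class MarksAverage {
    // Mirrors the slice: the total is reset inside the loop (the fault).
    static int averageWithResetInLoop(int[] marks) {
        int count = 0;
        int totalMarks = 0;
        for (int mark : marks) {
            totalMarks = 0;                  // wipes out the running total
            count = count + 1;
            totalMarks = totalMarks + mark;
        }
        return totalMarks / count;           // assumes at least one mark
    }

    // The reset happens once, before the loop.
    static int averageWithResetBeforeLoop(int[] marks) {
        int count = 0;
        int totalMarks = 0;
        for (int mark : marks) {
            count = count + 1;
            totalMarks = totalMarks + mark;
        }
        return totalMarks / count;           // assumes at least one mark
    }

    public static void main(String[] args) {
        int[] marks = {50, 60, 70};
        System.out.println(averageWithResetInLoop(marks));
        System.out.println(averageWithResetBeforeLoop(marks));
    }
}
```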
19
Forward Slicing

Original program:
    x = 1;  /* considering changing this line */
    y = 3;
    p = x + y;
    z = y - 2;
    if (p == 0)
        r++;

Forward slice:
    /* Change to first line will affect */
    p = x + y;
    if (p == 0)
        r++;

Suppose the first line of the program needs altering. As this line assigns a value to the variable x, any subsequent part of the program which ultimately depends upon the value of x may behave differently after the modification. The forward slice contains the lines of the program which are affected by a change to the first line. Notice that the line r++; has to be included in the forward slice, as its execution is controlled by the predicate p==0, which is affected by the slicing criterion.
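The forward slice above can be sketched as a walk forwards over dependence edges from the changed statement. The table below is written out by hand and mixes flow edges with the one control edge from the predicate to r++; the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.TreeSet;

// Hypothetical forward slicer over a hand-written dependence table.
class ForwardSlice {
    static final String[] STMT = {
        "x = 1;", "y = 3;", "p = x + y;", "z = y - 2;", "if (p == 0)", "r++;"
    };
    // AFFECTS[i] lists the statements that depend directly on statement i.
    static final int[][] AFFECTS = {
        {2},     // x = 1;      -> p = x + y;              (flow, via x)
        {2, 3},  // y = 3;      -> p = x + y;, z = y - 2;  (flow, via y)
        {4},     // p = x + y;  -> if (p == 0)             (flow, via p)
        {},      // z = y - 2;
        {5},     // if (p == 0) -> r++;                    (control)
        {}       // r++;
    };

    // Collects the changed statement and everything reachable forwards from it.
    static TreeSet<Integer> slice(int changed) {
        TreeSet<Integer> inSlice = new TreeSet<>();
        Deque<Integer> work = new ArrayDeque<>();
        work.push(changed);
        while (!work.isEmpty()) {
            int s = work.pop();
            if (inSlice.add(s)) {                  // first visit only
                for (int t : AFFECTS[s]) work.push(t);
            }
        }
        return inSlice;
    }

    public static void main(String[] args) {
        for (int i : slice(0)) {
            System.out.println(STMT[i]);
        }
    }
}
```

The result includes the changed line itself; y = 3; and z = y - 2; are untouched because nothing leads to them from x = 1;.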
20
Maintenance - Example
    n = 0;
    product = 1;
    sum = 1;
    scanf("%d", &x);
    while (x >= 0) {
        sum = sum + x;
        product = product * x;
        n = n + 1;
        scanf("%d", &x);
    }
    average = (sum - 1) / n;
    printf("The total is %d\n", sum);
    printf("The product is %d\n", product);
    printf("The average is %d\n", average);
21
Maintenance – backward slice
    sum = 1;
    scanf("%d", &x);
    while (x >= 0) {
        sum = sum + x;
        scanf("%d", &x);
    }
    printf("The total is %d\n", sum);

The bug occurs because sum was initialised to 1 rather than zero, so the printed total is always 1 more than it should be. In maintenance, we want to see what effect changing the initial value from 1 to 0 will have; this is the “ripple” effect associated with correcting this bug.
22
Maintenance – forward slice
    n = 0;
    product = 1;
    sum = 0;
    scanf("%d", &x);
    while (x >= 0) {
        sum = sum + x;                        /* AFFECTED */
        product = product * x;
        n = n + 1;
        scanf("%d", &x);
    }
    average = (sum - 1) / n;                  /* AFFECTED */
    printf("The total is %d\n", sum);         /* AFFECTED */
    printf("The product is %d\n", product);
    printf("The average is %d\n", average);   /* AFFECTED */
23
Types of slicing
- Static: described above; slices are constructed at compile time
- Dynamic: slices are constructed once the input is known
- Conditional: slicing done at breakpoints during execution
- Inter-modular: slicing across complex systems

Once the basic idea of slicing is present we can use it in a number of ways. Static slicing, involving backward and forward slicing, has been described above. Dynamic slicing is more tricky and depends on having knowledge of the input data. Conditional slicing is what most slicing software provides: an interactive slicing program pauses at points of interest (breakpoints), and at each one an appropriate program slice is constructed. Most software now allows inter-modular slicing, although this begins to fail when slices are thick and involve several variables and recursive searching to resolve dependencies.
24
The Horwitz, Prins & Reps (HPR) algorithm (merging)
Step 1. Determine changed and preserved slices, e.g. adding a diameter calculation.
Step 2. Form the merged graph, using the idea of ‘graph union’.
Step 3. Test for interference, i.e. check that the merged graph preserves all the slices of all the variants.
Step 4. Construct source from the merged graph.

Slicing is powerful because one can use slices in a number of ways. For example, Susan Horwitz et al. [Horwitz, S., Prins, J. & Reps, T. (1990) Interprocedural slicing using dependence graphs. ACM Transactions on Programming Languages and Systems 12(1) pp 26-60] developed an algorithm which can be used to solve the “Program Integration Problem”. This is where we seek to merge two or more variants of a base program if they are compatible. If they are not compatible, the idea is to identify the source of the interference. By hand this is done using a “diff” program to extract the differences, followed by a laborious line by line comparison between the variants.

The HPR algorithm provides a slicing solution which is remarkably elegant. It is described in Horwitz & Reps (1990), and a summary of the method appears in one of their subsequent papers, which is available as the course resource HR90.pdf. It was subsequently improved in the YHR algorithm by Yang, Horwitz and Reps (1992), in which limited slicing (partial slices which actually change the affected point) is employed (see HR90.pdf). The key to the HPR algorithm is that slices can be merged and, as in debugging, incompatible slices can be identified.
25
Why richer constructs?
- More useful slicing
- SDG and inter-module slicing
- SDG from parse trees
- Other methods of generating an SDG
- Calls and variable scope handling
- Pointers, aliases, classes

Why richer constructs? Because these take slicing out of the laboratory and into the real world. The single-procedure Program Dependence Graph has stepped up to become the System Dependence Graph, which implies inter-modular slicing. However, this introduces a whole new raft of issues to be handled. These include how to generate the SDG (using the compiler’s parse tree seems to be the best bet, but better methods are always welcome), how to handle the scope of variables (e.g. handling global declarations by considering the variable a parameter to any procedure that uses it), and how to handle procedure calls and parameters (adding edge types to the dependence graph). Finally (well, at least on this slide), research is underway to handle pointer systems (e.g. in C), aliases (for both data and procedure constructs), and more complex data constructs such as classes (which are usually implemented as pointer structures).
26
JSlice
- Java slicing software developed by the National University of Singapore
- Performs dynamic slicing
- Uses a compressed trace which records:
  - Flow control instructions
  - Data manipulation
- JVM: Kaffe, a clean room implementation of Java