Interprocedural Shape Analysis for Recursive Programs Noam Rinetzky Mooly Sagiv
Shape Analysis Static program analysis Determines information about dynamically allocated storage –A pointer variable is not NULL –Two data structures are disjoint The algorithm is Conservative
Applications of Shape Analysis Cleanness –Dor, Rodeh, Sagiv [SAS2000] Parallelization –Assmann, Weinhardt [PMMPC93] –Hendren, Nicolau [TPDS90] –Larus, Hilfinger [PLDI88]
Current State Good Intraprocedural analyses Sagiv, Reps, Wilhelm [TOPLAS 1998] –Analyze body of list manipulation procedures: reverse, insert, delete –Expensive, imprecise interprocedural analyses of recursive procedures
Main Results Interprocedural shape analysis algorithm for programs manipulating linked lists –Handles recursive procedures Prototype implementation –Successfully analyzed several list manipulating procedures: insert, delete, reverse, reverse_append –Properties verified: An acyclic list remains acyclic, No memory leaks, No NULL dereference
Running Example
#include <stdlib.h>

typedef struct List { int data; struct List *n; } *L;

/* Builds the list s, s-1, ..., 1 recursively. */
L create(int s) {
  L t = NULL;
  if (s <= 0) return NULL;
  t = (L) malloc(sizeof(*t));   /* size of the node, not of the pointer type */
  t->data = s;
  l2: t->n = create(s-1);
  return t;
}

void main() {
  L r = NULL; int k;
  …
  l1: r = create(k);
}
Selected Memory States [Figure sequence: the activation-record stack and the heap while main executes l1: r = create(k) with k=3] –Before the call: the record of main holds k=3 and r = NULL –Deepest recursion: records of create invoked from l1 (s=3) and from l2 (s=2, s=1, s=0) are stacked on the record of main; t = NULL in the s=0 record, and the other instances of t point to the cells holding 3, 2 and 1 –As each invocation returns, its record is popped and the cells are linked into the list 3, 2, 1, NULL –After the return to main: r points to the list 3, 2, 1, NULL
Where is the Challenge? Dynamic allocation –Unbounded number of objects Recursion –Unbounded number of activation records Properties of: –Invisible instances of local variables –Dynamically allocated objects [Figure: the stack and heap of the running example at the deepest recursion, k=3]
Our Approach Reduce the interprocedural shape analysis problem to an intraprocedural problem (program with procedures to program without procedures) Represent the activation record stack as a linked list: –Control information –Invisible instances of local variables Explicit manipulation of the stack
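To make the reduction concrete, here is a minimal sketch of the running example with the recursion removed and the activation-record stack kept as an explicit linked list. It is an illustration under assumptions, not the transformation used by the analysis itself: the names Frame, push, create_explicit, CS_L1 and CS_L2 are hypothetical, and only the fields cs (call-site), t, s and pr (the invoking record) mirror concepts from the slides.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct List { int data; struct List *n; } *L;

/* Call-sites of create in the running example (hypothetical encoding). */
enum CallSite { CS_L1, CS_L2 };

/* One activation record of create; pr points to the invoking record. */
typedef struct Frame {
    enum CallSite cs;   /* control information: where to return to */
    int s;              /* parameter of this invocation */
    L t;                /* this invocation's instance of local t */
    struct Frame *pr;
} Frame;

static Frame *push(Frame *top, enum CallSite cs, int s) {
    Frame *f = malloc(sizeof *f);
    assert(f != NULL);
    f->cs = cs; f->s = s; f->t = NULL; f->pr = top;
    return f;
}

/* Procedure-free equivalent of "l1: r = create(k);". */
L create_explicit(int k) {
    Frame *top = push(NULL, CS_L1, k);
    L ret = NULL;

    /* Call phase: each iteration runs create's body up to the call at l2. */
    while (top->s > 0) {
        top->t = malloc(sizeof *top->t);
        assert(top->t != NULL);
        top->t->data = top->s;
        top = push(top, CS_L2, top->s - 1);
    }

    /* Return phase: pop a record, then resume the caller at its call-site. */
    for (;;) {
        Frame *callee = top;
        enum CallSite cs = callee->cs;
        top = callee->pr;
        free(callee);
        if (cs == CS_L1)
            return ret;          /* back in main: r = ret */
        top->t->n = ret;         /* finish "t->n = create(s-1)" at l2 */
        ret = top->t;            /* "return t" */
    }
}
```

Calling create_explicit(3) walks through the same stack shapes as the recursive version: every record below top holds an invisible instance of t, which is exactly what the analysis has to reason about.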
Our Algorithm Abstract Interpretation –Concrete Semantics: Concrete representation of memory states Effect of program statements –Abstract Semantics: Abstract representation of memory states Transfer functions Finds abstract representation of memory states at every program point
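Concrete Memory Descriptors [Figure: the memory state of the running example at the deepest recursion (k=3), drawn as a graph. Stack nodes carry call-site labels (cs exit, cs l1, cs l2) and the current record is marked top; pr edges connect activation records; t edges lead from records to heap cells]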
Concrete Memory Descriptors Relationships between memory elements: –value of local variables: t, r –n-successor: n –invoked by: pr Properties of memory elements: –“type”: stack, heap –“visibility”: top –“call-site”: cs exit, cs l1, cs l2
Bounding the Representation Concrete Memory Descriptors represent memory states –Every object is represented uniquely Abstract Memory Descriptors –Conservatively represent Concrete Memory Descriptors –A bounded representation
3-Valued Properties True, False, Don’t Know (written 1/2, as in top=1/2) [Figure: example nodes with t edges, one of them labeled top=1/2]
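The three truth values can be sketched with Kleene's 3-valued logic, which is the standard reading of True, False and 1/2. The encoding below is an assumption made for illustration (the names Val3, and3, or3, not3 and join3 are hypothetical): ordering the values False < 1/2 < True lets min and max act as conjunction and disjunction, and join3 gives 1/2 whenever two values disagree.

```c
#include <assert.h>

/* False < Don't Know (1/2) < True, so min/max implement and/or. */
typedef enum { V_FALSE = 0, V_HALF = 1, V_TRUE = 2 } Val3;

static Val3 and3(Val3 a, Val3 b) { return a < b ? a : b; }
static Val3 or3(Val3 a, Val3 b)  { return a > b ? a : b; }
static Val3 not3(Val3 a)         { return (Val3)(V_TRUE - a); }

/* Information join: agreeing values are kept, disagreement becomes 1/2. */
static Val3 join3(Val3 a, Val3 b) { return a == b ? a : V_HALF; }
```

For example, and3(V_FALSE, V_HALF) is still definitely false, while join3(V_TRUE, V_FALSE) loses the information and yields V_HALF.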
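Abstraction [Figure: a concrete memory descriptor with activation records labeled cs exit, cs l1, cs l2 and cs l2, top, connected by pr edges and pointing to heap cells through t, shown next to its abstraction, in which the records are labeled cs exit, cs l1, cs l2 and cs l2, top]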
Bounding the Representation Summarize nodes according to their unary properties Join values of relationships Convert a Concrete Memory Descriptor of arbitrary size into an Abstract Memory Descriptor of bounded size Does the Abstract Memory Descriptor contain enough information?
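The two steps above, merging nodes with equal unary properties and joining relationship values, can be sketched as follows. This is a simplified illustration under assumptions, not TVLA's implementation: Descriptor, abstract_descriptor, MAX_NODES and the bitmask encoding of unary properties are hypothetical, and only a single binary relationship (for instance pr) is modeled.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 16

typedef enum { V_FALSE = 0, V_HALF = 1, V_TRUE = 2 } Val3;

static Val3 join3(Val3 a, Val3 b) { return a == b ? a : V_HALF; }

typedef struct {
    int n;                            /* number of nodes */
    unsigned unary[MAX_NODES];        /* bitmask of unary properties per node */
    int summary[MAX_NODES];           /* 1 if the node stands for several elements */
    Val3 rel[MAX_NODES][MAX_NODES];   /* one binary relationship, e.g. pr */
} Descriptor;

/* Builds in *a the bounded abstraction of *c. */
void abstract_descriptor(const Descriptor *c, Descriptor *a) {
    int map[MAX_NODES];               /* concrete node -> abstract node */
    int seen[MAX_NODES][MAX_NODES];   /* has a->rel[p][q] been set yet? */
    assert(c->n >= 0 && c->n <= MAX_NODES);
    memset(a, 0, sizeof *a);
    memset(seen, 0, sizeof seen);

    /* Step 1: nodes with the same unary properties share one abstract node. */
    for (int i = 0; i < c->n; i++) {
        int k = 0;
        while (k < a->n && a->unary[k] != c->unary[i]) k++;
        if (k == a->n) {
            a->unary[k] = c->unary[i];
            a->summary[k] = c->summary[i];
            a->n++;
        } else {
            a->summary[k] = 1;        /* a second element was merged in */
        }
        map[i] = k;
    }

    /* Step 2: join the relationship values of all merged pairs. */
    for (int i = 0; i < c->n; i++)
        for (int j = 0; j < c->n; j++) {
            int p = map[i], q = map[j];
            a->rel[p][q] = seen[p][q] ? join3(a->rel[p][q], c->rel[i][j])
                                      : c->rel[i][j];
            seen[p][q] = 1;
        }
}
```

Since the number of distinct bitmasks is fixed by the set of unary properties, the result has a bounded number of nodes however deep the recursion was. With a stack of records labeled exit, l1, l2, l2 and l2+top, the two plain l2 records collapse into one summary node and the pr edges touching it become 1/2.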
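Problem [Figure: an abstract memory descriptor and a concrete memory descriptor of the running example, with activation records labeled exit, cs l1, cs l2 and cs l2, top, connected by pr edges and pointing to heap cells through t]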
Observing Properties of Invisible Variables Explicitly track universal properties of invisible-variables –Different invisible instances of t cannot point to the same heap cell Instrumentation properties –Track derived properties of memory elements
Some Instrumentation Properties –Pointed-to by an invisible instance of t –Pointed-to by more than one invisible instance of t –t is not NULL
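On a concrete state, the three properties listed above could be evaluated roughly as below. This is a sketch under assumptions: the stack is modeled as a linked list of records of a single procedure, every record below top counts as invisible, and the function names are hypothetical rather than taken from the prototype.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct List { int data; struct List *n; } *L;

/* Activation record: its instance of t and the invoking record. */
typedef struct Frame { L t; struct Frame *pr; } Frame;

/* Counts the invisible instances of t (records below top) that point to cell. */
static int invisible_refs(const Frame *top, const struct List *cell) {
    int count = 0;
    assert(cell != NULL);
    if (top == NULL) return 0;
    for (const Frame *f = top->pr; f != NULL; f = f->pr)
        if (f->t == cell) count++;
    return count;
}

/* Heap-cell property: pointed-to by an invisible instance of t. */
bool pointed_by_invisible_t(const Frame *top, const struct List *cell) {
    return invisible_refs(top, cell) >= 1;
}

/* Heap-cell property: pointed-to by more than one invisible instance of t. */
bool shared_by_invisible_t(const Frame *top, const struct List *cell) {
    return invisible_refs(top, cell) >= 2;
}

/* Stack-record property: this record's t is not NULL. */
bool t_is_not_null(const Frame *f) {
    return f->t != NULL;
}
```

In the running example no heap cell satisfies shared_by_invisible_t; recording that fact per node is what lets it survive when the records are summarized.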
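Memory Descriptors with Instrumentation [Figure: a concrete memory descriptor and an abstract memory descriptor of the running example, with activation records labeled exit, cs l1, cs l2 and cs l2, top, connected by pr edges and pointing to heap cells through t]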
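Problem - solved [Figure: the abstract and concrete memory descriptors of the running example, with activation records labeled exit, cs l1, cs l2 and cs l2, top, connected by pr edges and pointing to heap cells through t]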
Why Does It Work? Shape analysis handles linked lists quite precisely (Sagiv, Reps, Wilhelm [TOPLAS98]) The (intraprocedural) 3-valued logic framework of Sagiv, Reps and Wilhelm [POPL99] is used to analyze the resulting intraprocedural problem
Prototype Implementation Implemented in TVLA [Lev-Ami, Sagiv SAS 2000] Analyzed some recursive list manipulating programs Verified cleanness properties: –No memory leaks –No NULL dereferences
Prototype Implementation Table of analysis results. Columns: Procedure, Time (sec), Number of (3VL) Structures. Rows: create (the running example), delAll, insert, delete, search, append, reverse, reverse_append, reverse_append_r
Conclusion Need to know more than potential values of invisible variables Tracking properties of invisible variables helps to overcome the (necessary) imprecision introduced by summarizing their values Instrumentation –Generic: sharing by different instances of a local variable –List specific
Conclusion Storing the call-site improves information propagation to return-sites Shows how the intraprocedural framework of Sagiv, Reps and Wilhelm can be used for interprocedural analyses Analysis of a complex data structure
Limitations Small programs No mutual recursion (Implementation) Predefined instrumentation library Easy to use, no need for user intervention –Might not be good for all programs
Further Work Scaling the algorithm –Distinguishing between “relevant context” and “irrelevant” context –Analysis of programs manipulating Abstract Data Types
The End Interprocedural Shape Analysis for Recursive Programs Noam Rinetzky and Mooly Sagiv Compiler Construction