Modular Heap Analysis of Higher-Order Programs
Ravichandhran Madhavan (+, *), Ganesan Ramalingam (*), Kapil Vaswani (*)
* Microsoft Research India; + EPFL, Switzerland
Goal 1: Analyze Modularly. Compute succinct summaries for procedures. Summaries: total functions approximating the relational semantics, mapping an input state to output states.
Goal 2: Track Heap Information. The summary of a procedure should capture the transformation of the input mutable heap.
Goal 3: Analyze HO Programs. Should be able to summarize higher-order procedures. Input state includes data as well as code.
Challenge: Indirect procedure calls, especially call backs. Examples: virtual method calls, function pointer calls, lambda expressions.
    Foo(PTR* p, FP* fp) { *p = (**fp)(0); }
    Count() { iter = this.iterator(); i = 0; while (iter.HasNext()) { iter.next(); i++; } }
Challenge: All widely used languages support higher-order constructs. But how do existing modular analyses handle them?
A Common Hack: Estimate the targets of the indirect calls through an inexpensive analysis, e.g. CHA (class hierarchy analysis) or RTI analysis for OO programs, or a lightweight pointer analysis. Construct a conservative call graph. Analyze bottom-up.
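To make the target-estimation step concrete, here is a minimal Python sketch of class hierarchy analysis. The class names and the `PARENT` and `DEFINES` tables are hypothetical, chosen only to echo the iterator example; the sketch shows the general CHA idea, not the tooling used in this work.

```python
from typing import Dict, List, Optional, Set

# Hypothetical hierarchy (an assumption for illustration): class -> superclass.
PARENT: Dict[str, Optional[str]] = {
    "Iterator": None,
    "ListIterator": "Iterator",
    "TreeIterator": "Iterator",
    "SortedTreeIterator": "TreeIterator",
}

# Hypothetical table of the methods each class defines itself.
DEFINES: Dict[str, Set[str]] = {
    "Iterator": set(),  # abstract: declares HasNext/next without bodies
    "ListIterator": {"HasNext", "next"},
    "TreeIterator": {"HasNext", "next"},
    "SortedTreeIterator": {"next"},  # inherits HasNext from TreeIterator
}


def is_subtype(cls: Optional[str], ancestor: str) -> bool:
    """True if `cls` equals `ancestor` or inherits from it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT[cls]
    return False


def lookup(cls: Optional[str], method: str) -> Optional[str]:
    """Nearest definition of `method`, walking up from `cls`."""
    while cls is not None:
        if method in DEFINES[cls]:
            return f"{cls}.{method}"
        cls = PARENT[cls]
    return None


def cha_targets(static_type: str, method: str) -> List[str]:
    """All methods a call `receiver.method()` may dispatch to under CHA."""
    targets = set()
    for cls in PARENT:
        if is_subtype(cls, static_type):
            target = lookup(cls, method)
            if target is not None:
                targets.add(target)
    return sorted(targets)
```

In this sketch, `cha_targets("Iterator", "next")` returns all three definitions of `next`, regardless of which iterator a particular caller actually supplies; that is the conservative call graph the bottom-up analysis then runs over.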
Limitations of the Hack: Over-approximated targets. A call graph is necessarily context-insensitive for HO programs. [Figure: call graph over nodes A, B, C, D, E, annotated with A's context and B's context]
Limitations of the Hack: Inability to construct client-independent summaries.
    Foo(FP* fp) { (*fp)(…); }
    m1() { … }
    C1() { Foo(m1); }
    m2() { … }
    C2() { Foo(m2); }
Annotation on the slide: resolved to m1.
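The effect of summarizing Foo against a call graph can be sketched in a few lines of Python. The names Foo, m1, m2, C1 and C2 come from the slide; the side effects assigned to m1 and m2 are assumptions made up for this illustration.

```python
from typing import Dict, Set

# Side effects of each possible call-back target (hypothetical values).
EFFECTS: Dict[str, Set[str]] = {
    "m1": set(),                 # assume m1 is side-effect free
    "m2": {"write to global"},   # assume m2 writes a global
}

# Each client calls Foo once, passing the named procedure as `fp`.
CLIENTS: Dict[str, str] = {"C1": "m1", "C2": "m2"}


def foo_targets() -> Set[str]:
    """Call-graph view: one target set for (*fp)(...) shared by all callers."""
    return set(CLIENTS.values())


def effects_via_call_graph(client: str) -> Set[str]:
    """Effects charged to `client` when Foo is summarized against the call graph.

    The result deliberately ignores which client is asking: the call graph
    has a single node for Foo, so every caller sees every target.
    """
    effects: Set[str] = set()
    for target in foo_targets():
        effects |= EFFECTS[target]
    return effects


def effects_per_context(client: str) -> Set[str]:
    """Effects when the call back is resolved in the client's own context."""
    return set(EFFECTS[CLIENTS[client]])
```

Under these assumptions, C1 is charged with m2's write when the call graph is used, although C1 only ever passes m1. The summary of Foo also changes whenever a new client is added, which is why it cannot be computed once and reused.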
Limitations of the Hack: Reuse of summaries is possible only within an analysis. Need to analyze libraries together with clients. Need to reanalyze libraries for each new client. Doesn't allow library compositional analysis.
Our approach: Use existing techniques for summarizing first-order code segments: [Whaley, Salcianu, Rinard, OOPSLA 99, VMCAI 04], [Madhavan et al., SAS 11]. Retain the call backs in the summaries.
Our approach: Perform as much simplification as possible without knowledge of the calling context. Eliminate fully resolved calls from the summaries. Enables efficient library compositional analysis.
Illustration [Figure not recoverable; surviving labels: 1, 3, 5, 7, *fp(a,b)]
Exploiting Local Context [Figure not recoverable; surviving labels: 3, 5, *fp(a,b), Frame Rule]
Flow Insensitive Abstraction [Figure not recoverable; surviving labels: 3, 5, *fp(a,b)]
Composition Operation [Figure not recoverable; surviving label: ID]
Handling Direct Calls: Handle direct calls via summary composition. Call backs in the callee are inlined in the caller.
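As a rough illustration of composing summaries at a direct call, the Python sketch below uses a deliberately simplified toy model. The `Summary` shape (a set of written parameters plus a tuple of retained call backs), the `compose` function, and the caller `Bar` are all assumptions for this example; the actual summaries in this work are heap abstractions, which the sketch does not attempt to reproduce.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class Summary:
    """Toy summary: parameters whose targets may be written, plus retained call backs."""
    writes: FrozenSet[str] = frozenset()
    callbacks: Tuple[str, ...] = ()  # function-valued parameters that get called


def compose(caller: Summary, callee: Summary, binding: Dict[str, str]) -> Summary:
    """Apply `callee` at a direct call site inside `caller`.

    `binding` maps the callee's formal parameters to the caller-side names
    passed for them. The callee's call backs are not resolved here; they are
    renamed and inlined into the caller's summary.
    """
    def rename(name: str) -> str:
        return binding.get(name, name)

    writes = caller.writes | frozenset(rename(w) for w in callee.writes)
    callbacks = caller.callbacks + tuple(rename(cb) for cb in callee.callbacks)
    return Summary(writes, callbacks)


# Foo(PTR* p, FP* fp) { *p = (**fp)(0); } writes through p and calls back fp.
FOO = Summary(writes=frozenset({"p"}), callbacks=("fp",))

# Hypothetical caller: Bar(q, g) { Foo(q, g); }
BAR = compose(Summary(), FOO, {"p": "q", "fp": "g"})
```

Here `BAR` records a write through `q` and a retained call back `g`: the call to Foo has disappeared from the summary, while the call back it contained is now the caller's to resolve.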
Indirect Call Resolution [Figure not recoverable; surviving labels: A, B, C, ….., Fixed point]
Eliminating Resolved Calls [Figure not recoverable; surviving labels: *fp1, *fp2, …, Foo, Bar, Resolved calls]
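The interplay of resolution, fixed-point iteration and elimination can be sketched in Python under a strongly simplified model. The `Summary` shape (a set of written locations plus a list of call targets), the `resolve` function, and the names `m1`, `m2`, `g`, `h` in the usage note are assumptions for illustration only; parameter renaming and the heap abstraction are left out.

```python
from typing import Dict, List, Set, Tuple

# Toy summary: (locations possibly written, call targets still to be applied).
Summary = Tuple[Set[str], List[str]]


def resolve(summary: Summary, known: Dict[str, Summary]) -> Summary:
    """Apply every call whose target has a known summary, up to a fixed point.

    Applying a target may expose further calls, so a worklist is used.
    Each known target is applied at most once, which is enough here because
    the toy effects are flow-insensitive sets; it also makes recursion
    terminate. Resolved calls are dropped from the result, and targets
    without a known summary stay behind as unresolved.
    """
    writes = set(summary[0])
    pending = list(summary[1])
    applied: Set[str] = set()
    unresolved: Set[str] = set()
    while pending:
        target = pending.pop()
        if target not in known:
            unresolved.add(target)
        elif target not in applied:
            applied.add(target)
            callee_writes, callee_calls = known[target]
            writes |= callee_writes
            pending.extend(callee_calls)
    return writes, sorted(unresolved)
```

For example, with `known = {"m1": (set(), []), "m2": ({"g"}, ["h"])}`, resolving `({"p"}, ["m2"])` yields writes `{"p", "g"}` and leaves only `h` unresolved: the call to `m2` has been eliminated, and the call it exposed is retained for a later context.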
Experimental Evaluation: Applied to purity/side-effects analysis for C# libraries. Every method is classified as:
Pure: no side-effects
Conditionally Pure: purity depends on the calling context
Impure: has side-effects
Impure and Incomplete: has side-effects and can have more depending on the calling context
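The four categories can be read as the combinations of two yes/no facts about a method's summary. The Python sketch below assumes that "depends on the calling context" corresponds to the summary still containing unresolved call backs; that mapping, and the function name `classify`, are assumptions for this illustration rather than the tool's actual interface.

```python
def classify(has_side_effects: bool, has_unresolved_calls: bool) -> str:
    """Map two facts about a method summary to one of the four purity classes.

    has_side_effects: the summary already records a side effect.
    has_unresolved_calls: the summary retains call backs whose effects are
    only known once a calling context supplies their targets (assumed
    reading of "depends on the calling context").
    """
    if has_side_effects:
        return "Impure and Incomplete" if has_unresolved_calls else "Impure"
    return "Conditionally Pure" if has_unresolved_calls else "Pure"
```

Under this reading, a method like Count from the earlier slide, which has no side effects of its own but calls `HasNext` and `next` on an unknown iterator, would land in Conditionally Pure.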
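Experimental Results
Columns: Benchmark, LOC, Pure, C-Pure, Impure, I-Impure, Time. Surviving values are listed per benchmark; the column of each percentage is not recoverable.
DocX: LOC 10K, Time ~1 min
FB APIs: 2.2%, 32%
Data Disp.: 57%
Test APIs: (no values)
Json Libs: (no values)
Quickgraph: (no values)
Refactory libs: 30%, 8%
Utility Libs: 32%, 8%
PDF libs: 28.4%
GPS libs: LOC 250K, Time ~2 hrs
Ranges shown on the slide: 10–20%, 15–30%, 20–30%, 2–27 min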
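Analysis Statistics
Columns: Benchmark, Unresolved Calls, Non-Escaping Abs. Objects. Surviving values are listed per benchmark; their column is not recoverable.
DocX: (no values)
FB APIs: 9%
Data Disp.: (no values)
Test APIs: (no values)
Json Libs: 7.3
Quickgraph: (no values)
Refactory libs: (no values)
Utility Libs: (no values)
PDF libs: 37%
GPS libs: 5.9
Ranges shown on the slide: 2–4, 10–33%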
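Comparison with CHA Call-Graph-Based Bottom-Up Analysis
Columns: Benchmark, Time, # of SCCs, Avg. SCC size. Rows with fewer than three surviving values are listed without column assignment.
DocX: Time 12x, # of SCCs 0, Avg. SCC size NA
FB APIs: Time 11x, # of SCCs 3x, Avg. SCC size 1.5x
Data Disp.: 6x
Test APIs: Time 6x, # of SCCs 2x, Avg. SCC size 1.25x
Json Libs: 2x, 6x
Quickgraph: 11x, 33x
Refactory libs: 1.4x, 5.6x
Utility Libs: Time 30x, # of SCCs 4x, Avg. SCC size 12x
PDF libs: Time 2x, # of SCCs 3.5x, Avg. SCC size 1.5x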
Conclusion: A principled approach, formalized as an abstract interpretation. A generic theory agnostic to the underlying compositional heap analysis. Go to www.rise4fun.com/seal for a hands-on experience.