Hongtao Yu Wei Huo ZhaoQing Zhang XiaoBing Feng

Slides:



Advertisements
Similar presentations
Dataflow Analysis for Datarace-Free Programs (ESOP 11) Arnab De Joint work with Deepak DSouza and Rupesh Nasre Indian Institute of Science, Bangalore.
Advertisements

Semantics Static semantics Dynamic semantics attribute grammars
1 CS 201 Compiler Construction Lecture 3 Data Flow Analysis.
Context-Sensitive Interprocedural Points-to Analysis in the Presence of Function Pointers Presentation by Patrick Kaleem Justin.
Compilation 2011 Static Analysis Johnni Winther Michael I. Schwartzbach Aarhus University.
Programming Languages and Paradigms The C Programming Language.
Pointer Analysis – Part I Mayur Naik Intel Research, Berkeley CS294 Lecture March 17, 2009.
Data-Flow Analysis Framework Domain – What kind of solution is the analysis looking for? Ex. Variables have not yet been defined – Algorithm assigns a.
Graph Coverage for Design Elements 1.  Use of data abstraction and object oriented software has increased importance on modularity and reuse.  Therefore.
Parameterized Object Sensitivity for Points-to Analysis for Java Presented By: - Anand Bahety Dan Bucatanschi.
Interprocedural analysis © Marcelo d’Amorim 2010.
Recap from last time We were trying to do Common Subexpression Elimination Compute expressions that are available at each program point.
Next Section: Pointer Analysis Outline: –What is pointer analysis –Intraprocedural pointer analysis –Interprocedural pointer analysis (Wilson & Lam) –Unification.
Program analysis Mooly Sagiv html://
U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Emery Berger University of Massachusetts, Amherst Advanced Compilers CMPSCI 710.
4/25/08Prof. Hilfinger CS164 Lecture 371 Global Optimization Lecture 37 (From notes by R. Bodik & G. Necula)
1 Control Flow Analysis Mooly Sagiv Tel Aviv University Textbook Chapter 3
Range Analysis. Intraprocedural Points-to Analysis Want to compute may-points-to information Lattice:
Run time vs. Compile time
Validating High-Level Synthesis Sudipta Kundu, Sorin Lerner, Rajesh Gupta Department of Computer Science and Engineering, University of California, San.
Intraprocedural Points-to Analysis Flow functions:
Overview of program analysis Mooly Sagiv html://
Comparison Caller precisionCallee precisionCode bloat Inlining context-insensitive interproc Context sensitive interproc Specialization.
Reps Horwitz and Sagiv 95 (RHS) Another approach to context-sensitive interprocedural analysis Express the problem as a graph reachability query Works.
1 Pointers, Dynamic Data, and Reference Types Review on Pointers Reference Variables Dynamic Memory Allocation –The new operator –The delete operator –Dynamic.
Pointer analysis. Pointer Analysis Outline: –What is pointer analysis –Intraprocedural pointer analysis –Interprocedural pointer analysis Andersen and.
1 Procedural Concept The main program coordinates calls to procedures and hands over appropriate data as parameters.
Procedure Optimizations and Interprocedural Analysis Chapter 15, 19 Mooly Sagiv.
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University STATIC ANALYSES FOR JAVA IN THE PRESENCE OF DISTRIBUTED COMPONENTS AND.
Control Flow Resolution in Dynamic Language Author: Štěpán Šindelář Supervisor: Filip Zavoral, Ph.D.
Fast Points-to Analysis for Languages with Structured Types Michael Jung and Sorin A. Huss Integrated Circuits and Systems Lab. Department of Computer.
Slide: 1 Copyright © AdaCore Subprograms Presented by Quentin Ochem university.adacore.com.
Static Program Analysis of Embedded Software Ramakrishnan Venkitaraman Graduate Student, Computer Science Advisor: Dr. Gopal Gupta.
Static Program Analysis of Embedded Software Ramakrishnan Venkitaraman Graduate Student, Computer Science Advisor: Dr. Gopal Gupta
Level by Level: Making Flow- and Context-Sensitive Pointer Analysis Scalable for Millions of Lines of Code Hongtao Yu Zhaoqing Zhang Xiaobing Feng Wei.
Data-Flow Analysis (Chapter 8). Outline What is Data-Flow Analysis? Structure of an optimizing compiler An example: Reaching Definitions Basic Concepts:
An Undergraduate Course on Software Bug Detection Tools and Techniques Eric Larson Seattle University March 3, 2006.
ESEC/FSE-99 1 Data-Flow Analysis of Program Fragments Atanas Rountev 1 Barbara G. Ryder 1 William Landi 2 1 Department of Computer Science, Rutgers University.
CS 343 presentation Concrete Type Inference Department of Computer Science Stanford University.
PLC '06 Experience in Testing Compiler Optimizers Using Comparison Checking Masataka Sassa and Daijiro Sudo Dept. of Mathematical and Computing Sciences.
Static Techniques for V&V. Hierarchy of V&V techniques Static Analysis V&V Dynamic Techniques Model Checking Simulation Symbolic Execution Testing Informal.
1 Software Testing & Quality Assurance Lecture 13 Created by: Paulo Alencar Modified by: Frank Xu.
/ PSWLAB Evidence-Based Analysis and Inferring Preconditions for Bug Detection By D. Brand, M. Buss, V. C. Sreedhar published in ICSM 2007.
Inter-procedural analysis
Credible Compilation With Pointers Martin Rinard and Darko Marinov Laboratory for Computer Science Massachusetts Institute of Technology.
Ghent University Veerle Desmet Lieven Eeckhout Koen De Bosschere Using Decision Trees to Improve Program-Based and Profile-Based Static Branch Prediction.
Manuel Fahndrich Jakob Rehof Manuvir Das
Functional Programming
Memory Management Functions
Control Flow Testing Handouts
Handouts Software Testing and Quality Assurance Theory and Practice Chapter 4 Control Flow Testing
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Compositional Pointer and Escape Analysis for Java Programs
Outline of the Chapter Basic Idea Outline of Control Flow Testing
Interprocedural Analysis Chapter 19
Programmazione I a.a. 2017/2018.
Chapter 9 :: Subroutines and Control Abstraction
Paul Ammann & Jeff Offutt
Chap. 8 :: Subroutines and Control Abstraction
Chap. 8 :: Subroutines and Control Abstraction
Pointers, Dynamic Data, and Reference Types
Chapter 6 Intermediate-Code Generation
Demand-Driven Context-Sensitive Alias Analysis for Java
Data Flow Analysis Compiler Design
Pointer analysis.
Languages and Compilers (SProg og Oversættere)
Hongtao Yu Wei Huo ZhaoQing Zhang XiaoBing Feng
CSE P 501 – Compilers SSA Hal Perkins Autumn /31/2019
Pointers, Dynamic Data, and Reference Types
Software Testing and QA Theory and Practice (Chapter 5: Data Flow Testing) © Naik & Tripathy 1 Software Testing and Quality Assurance Theory and Practice.
Presentation transcript:

Aggressive Program Analysis Framework for Static Error Checking in Open64 Hongtao Yu Wei Huo ZhaoQing Zhang XiaoBing Feng Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences { htyu, huowei, zqzhang, fxb }@ict.ac.cn

Outline Introduction Framework Field-sensitive pointer analysis Flow- and context-sensitive dataflow analysis Conclusion

Introduction Open64 is a high performance compiler However, existed scalar analysis of Open64 is not precise enough to serve for static error checking. The original interprocedural framework is Flow-insensitive Field-insensitive Context-insensitive for some problems

Improvement Our aim is to improve the interprocedural phase, to gain more precision in analysis To gain flow-sensitivity, we integrate the intraprocedural analysis phase into interprocedural phase To gain context-sensitivity, transfer functions modeling procedure effects are computed for each procedure To gain field-sensitivity, fields are distinguished by <base, offset, size>.

Error checking Several error checking problems can be abstract as a common dataflow problem and be solved using the fix point theory on semi-lattices Uninitialized reference Null pointer deference File I/O behavior

Contributions We have designed and implemented: A flow- and context-sensitive interprocedural framework, under which kinds of error checking are performed. Two efficient field-sensitive pointer analysis. One is a directly implementation of Steensgaard, B. Points-to Analysis by Type Inference of Programs with Structures and Unions. In Proceedings of the Proceedings of the 6th International Conference on Compiler Construction, 1996. The other is an improvement of it

Interprocedural Analyzer Framework IPL summay phase IPA_LINK Interprocedural Analyzer FICI pointer analysis FICS pointer analysis Interprocedural control flow optimization Build Call Graph Construct SSA Form for each procedure FICI = Flow- and Context-insensitive. FICS= Flow-insensitive but Context-sensitive. FSCS= Flow- and FICS and FSCS have not been implemented yet. FSCS pointer analysis Static error checker

Pointer analysis architecture Employs several field-sensitive algorithms that differ in precision and efficiency. Pointer analysis are performed in the increasing order of precision. The first is a field-sensitive unification-based pointer analysis. Each analysis is performed on the base of the former analysis so that we can obtain higher efficiency than performing the analysis separately. Up to now, only the field-sensitive unification-based pointer analysis has been implemented.

Interprocedural control flow optimization Dead Function Elimination (DFE) Delete uninvoked functions Fake Control Flow Elimination (FCFE) Recognizes the program points where control flow must terminate Flow- and context-sensitive problem Based on Gated Single Assignment Form (GSA).

Fake Control Flow Elimination Taken from HyperSAT-1.7 #define error(str) report(__FILE__, __LINE__, str, ERROR) void parseClusterFile(preproc_t *p, char *name) { ….. tmp_fsize = ftell(fp); if (tmp_fsize > INT_MAX) { L1: error("File too large!"); } else { fsize = (size_t)tmp_fsize; } rewind(fp); buf = xmalloc(fsize); void report(char *file, unsigned line, char *str, char sev) { if (sev == ERROR) { … L2: exit(EXIT_UNKNOWN); } else if (sev == WARRNING) { printf(…); } else if (sev == NOTE) { tmp_fsize = ftell(fp); B1 tmp_fsize > INT_MAX B2 N B3 Y fsize = (size_t) tmp_fsize error ("File too large!") B4 rewind(fp); buf = xmalloc(fsize) B5

Fake Control Flow Elimination Taken from HyperSAT-1.7 #define error(str) report(__FILE__, __LINE__, str, ERROR) void parseClusterFile(preproc_t *p, char *name) { ….. tmp_fsize = ftell(fp); if (tmp_fsize > INT_MAX) { L1: error("File too large!"); } else { fsize = (size_t)tmp_fsize; } rewind(fp); buf = xmalloc(fsize); void report(char *file, unsigned line, char *str, char sev) { if (sev == ERROR) { … L2: exit(EXIT_UNKNOWN); } else if (sev == WARRNING) { printf(…); } else if (sev == NOTE) { tmp_fsize = ftell(fp); B1 tmp_fsize > INT_MAX B2 N B3 Y fsize = (size_t) tmp_fsize rewind(fp); buf = xmalloc(fsize) B5 exit

Unification-based pointer analysis We have implemented two efficient field-sensitive pointer analysis. One is a directly implementation of Steensgaard, B. Points-to Analysis by Type Inference of Programs with Structures and Unions. In Proceedings of the Proceedings of the 6th International Conference on Compiler Construction, 1996. The other is an improvement of it

Steensgaard Classification Example KLOC Field Ops Field-insensitive Steensgaard Classification Field-sensitive Classes Max Min Average Time (secs) mcf 0.9 562 7 527 1 80.28 0.02 34 376 16.52 bzip2 2.4 87 2 69 18 43.5 0.03 gzip 3.6 244 4 222 61 0.05 37 72 6.59 parser 7.4 2375 2115 3 131.94 0.13 2003 38.93 0.11 vpr 8.7 1649 23 1408 71.69 0.17 159 955 10.37 0.14 crafty 9.7 830 6 268 30 138.33 0.5 74 11.21 0.22 twolf 15 6457 0.48 80 4495 80.71 0.23 vortex 25 8062 90 7631 89.57 1.17 274 7616 29.42 0.60 gap 28 11480 9 11459 1275.5 1.04 134 11023 85.67

Experiment explanation Example : The benchmark name ; KLOC: The size of the benchmark (line numbers counted by kilo lines); Field OPs: The number of indirect memory access to fields of structural objects; Classes: The number of alias classes of the total Field OPs above; memory access operations in the same alias class are regarded as aliased Max: the maximal number of Field OPs in the same alias class; Min: the minimal number of Field OPs in the same alias class; Average: the average number of Field OPs in the same alias class Time: the time for analyzing the benchmark;

Example KLOC Field Ops Field-sensitive Steensgaard Classification Aggressively Field-sensitive Classification Classes Max Min Average Time (secs) mcf 0.9 562 34 376 1 16.52 0.02 69 39 8.14 bzip2 2.4 87 2 18 43.5 0.03 7 26 6 12.42 gzip 3.6 244 37 72 6.59 0.05 44 49 5.54 parser 7.4 2375 61 2003 38.93 0.11 88 1989 26.98 0.13 The main idea of improvement is to consider memory layout in high-level analysis in order to precisely distinguish fields of structure objects.

Field-sensitive Steensgaard Classification τ1 τ7 τ2 τ3 τ4 τ5 τ6 τ8 τ9 τ10 τ11 int i1, *i2, **i3, **i4; float f1, **f2; struct { int a, *b, *c; } s1, *s2; struct { int d, *e; float f, *g } s3, *s4; s2 = &s1; s4 = &s3; f2 = &s4->g; *f2 = &f1; i3 = &s2->b; i4 = &s2->c; *i4 = &i1; i2 = (int*) s2; i2 = (int*) s4; i2: τ1 s1.a: τ8 s2: τ2 s3.d: τ8 s4: τ3 s1.b: τ9 i3: τ4 s3.e: τ9 i4: τ5 s1.c: τ10 f2: τ6 s3.f: τ10 s1: τ7 s3.g: τ10 s3: τ7 i1: τ11 f1: τ11

Aggressive Field-sensitive Classification i2: τ1 s1.a: τ8 s2: τ2 s3.d: τ8 s4: τ3 s1.b: τ9 i3: τ4 s3.e: τ9 i4: τ5 s1.c: τ10 f2: τ6 s3.f: τ11 s1: τ7 s3.g: τ11 s3: τ7 i1: τ12 f1: τ13 τ1 τ7 τ2 τ3 τ4 τ5 τ6 τ8 τ9 τ10 τ11 τ12 τ13

(b) The improved type hierarchy Another Improvement Improving the type hierarchy object simple struct blank object simple struct blank (a) Original Type hierarchy (b) The improved type hierarchy We make a change for the type hierarchy and corresponding change in type system that enables the result of joining simple type and struct type is another struct type with only fields possibly overlapped with the scalar joined.

Flow- and context-sensitive dataflow analysis A transfer function evaluator Computes transfer functions for each procedure Traverse the procedure call graph in a bottom-up order, from callees to callers. To handle recursions, we reduced the call graph to a SCC-DAG in each SCC. A dataflow value propagator A data flow value is an element of the semi-lattice Propagates dataflow value from the entry to the exit of each procedure’s local CFG in a top-down manner Traverse the procedure call graph in a top-down manner

Transfer functions A single transfer function has the form x1…xn is a list of formal-in parameters, either a declared formal parameter or a location whose value at the procedure entry may be accessed by the procedure or the procedures it invokes y is a formal-out parameter, include not only the return value of this procedure but also all the locations whose value at the procedure exit may be accessed out of the procedure. The function body of f gives the mapping relations between inputs x1…xn and the output y.

Dataflow value propagator At a procedure entry Perform the “meet” operation on the data flow values from difference call sites. At each callsite Propagate value of each actual-in parameter to the corresponding formal-in parameter of the callee Obtain the value of each formal-out parameter by applying the transfer function and propagate it to the corresponding actual-out parameter. Need to perform iterations for loops and recursions

Checking uninitialized reference Abstract the task as solving a dataflow problem Any memory object in any reference site has a dataflow value “define”, “may define” or “undefine”. To determine which values they have. A memory object is initialized on all paths from the program entry to the current reference site, the memory object in this site has the value “define”. If on some path the memory object is not initialized, the vaule will be “undefine”. The initialization through indirect memory operations (e.g. the deference of pointers) results in a value “may define”.

Checking uninitialized reference (2) First compute a transfer function for each procedure, the transfer function says that which global variables are modified and how to be modified by this procedure. If on every path in the procedure form entry to exit a global variable is modified by direct assignments, the variable is “must mod”. if in every path variable is modified by either direct or indirect assignments, the variable is “may mod”. Otherwise the variable must not be modified on some of the paths, so it is “may not mod” The transfer function evaluator performs an interprocedural modified side effect analysis to the whole program.

Checking uninitialized reference (3) Then propagates the modification property at procedure entry to the exit. The “must mod” and “may mod” value of a transfer function are both regarded as “define” The “may not mod” value is regarded as “undefine” Only propagate for scalar variables that are either local or global currently.

Example Kloc Time (sec) Reports Bugs FP Rate mcf 0.9 0.01 3 0% bzip2 2.4 0.05 bftpd-2.3 2.8 0.08 2 1 50% gzip 3.6 0.11 HyperSAT-1.7 6 100% parser 7.4 0.7 TOTAL 23.1 1.06 7 5 28.6% Time: the time for checking the benchmark; Reports: the total number of warnings produced on the benchmark; Bugs: the number of true bugs found; FP Rate: the false positive rate; Compare with GCC with the warning option –Wuninitialized. No warnings are reported by GCC, because GCC can only check uninitialized reference for auto variables intraprocedurally.

Conclusion We have introduced our work of constructing an aggressive framework for program analysis in order to do error checking in Open64. We Integrates intraprocedural analysis into interprocedural phase in order to do flow- and context-sensitive whole program analysis. Improves the original alias analysis to be field-sensitive and compared the three unification-based methods. An error checking instance, checking uninitialized reference is displayed.

Thank You ! www.themegallery.com