Mechanizing Program Analysis With Chord
Mayur Naik, Intel Labs Berkeley



About Chord
- An extensible static/dynamic analysis framework for Java
- Started in 2006 as static “Checker of Races and Deadlocks”
- Portable: mostly written in Java, works on Java bytecode
  - independent of OS, JVM, Java version
  - works at least on Linux, MacOS, Windows/Cygwin
  - few dependencies (e.g. not Eclipse-based)
- Open-source, available at http://code.google.com/p/jchord
- Primarily used in Intel Labs and academia
  - by researchers in program analysis, systems, and machine learning
  - for applying program analyses to parallel/cloud computing problems
  - for advancing program analyses driven by these applications

Research Using Chord
Application to Parallel Computing
- static deadlock checker (ICSE’09). M. Naik, C. Park, D. Gay, K. Sen
- static race checker (PLDI’06, POPL’07). M. Naik, A. Aiken, J. Whaley
- static atomic set serializability checker. Z. Lai, S. C. Cheung, M. Naik
- CheckMate: generalized dynamic deadlock checker (FSE’10). P. Joshi, K. Sen, M. Naik, D. Gay
Application to Cloud Computing
- Mantis: estimating performance and resource usage of systems software. B. Chun, L. Huang, M. Naik, P. Maniatis
- CloneCloud: partitioning and migration of apps between phone and cloud. B. Chun, S. Ihm, P. Maniatis, M. Naik
- debugging configuration options in systems software (e.g. Hadoop). A. Rabkin, R. Katz
Advanced Program Analyses
- dynamically evaluating precision of static heap abstractions (OOPSLA’10). P. Liang, O. Tripp, M. Naik, M. Sagiv
- scalable client-driven static heap analyses (e.g. points-to, thread-escape). M. Naik, M. Sagiv, Z. Anderson, D. Gay

Mantis: Estimating Program Running Time*
[Figure: Mantis architecture, split into an offline component and an online component. Offline boxes: feature instrumentor, profiler, and model generator (dynamic analysis component); static program slicer (static analysis component). Online boxes: final feature evaluator (executable slice) and running time function over final features, which map program input to estimated running time. Labeled data: program, input, feature schemas, instrumented program, feature values and running time, running time function over chosen features, program bytecode, feature evaluation costs.]
*Joint work with B. Chun, S. Ihm, P. Maniatis (Intel)

Primary Goal of Chord
Enable users to productively prototype a broad class of program analyses ⇒ mechanize program analysis

Kinds of Program Analyses in Chord
- static analysis written imperatively in Java
- dynamic analysis written imperatively in Java
- static or dynamic analysis written declaratively in Datalog and solved using BDDs
All of these kinds are seamlessly integrated!

Static vs. Dynamic Uses of Chord
Legend (color-coded on the slide): only static; only dynamic; static + dynamic
Application to Parallel Computing
- static deadlock checker (ICSE’09). M. Naik, C. Park, D. Gay, K. Sen
- static race checker (PLDI’06, POPL’07). M. Naik, A. Aiken, J. Whaley
- static atomic set serializability checker. Z. Lai, S. C. Cheung, M. Naik
- CheckMate: generalized dynamic deadlock checker (FSE’10). P. Joshi, K. Sen, M. Naik, D. Gay
Application to Cloud Computing
- Mantis: estimating performance and resource usage of systems software. B. Chun, L. Huang, M. Naik, P. Maniatis
- CloneCloud: partitioning and migration of apps between phone and cloud. B. Chun, S. Ihm, P. Maniatis, M. Naik
- debugging configuration options in systems software (e.g. Hadoop). A. Rabkin, R. Katz
Advanced Program Analyses
- dynamically evaluating precision of static heap abstractions (OOPSLA’10). P. Liang, O. Tripp, M. Naik, M. Sagiv
- scalable client-driven static heap analyses (e.g. points-to, thread-escape). M. Naik, M. Sagiv, Z. Anderson, D. Gay

Unusual Uses of Dynamic Analysis
- Guide choice of approximation aspects of static analysis
  - obtain lower bounds on precision of different approximation aspects by simulating each of them dynamically
- Optimize static analysis
  - property fails on run ⇒ do not attempt to prove it holds on all runs
- Guess abstraction to be used by static analysis
  - property holds on run ⇒ generalize reason why it holds to all runs
Projects using these ideas:
- dynamically evaluating precision of static heap abstractions (OOPSLA’10). P. Liang, O. Tripp, M. Naik, M. Sagiv
- scalable client-driven static heap analyses (e.g. points-to, thread-escape). M. Naik, M. Sagiv, Z. Anderson, D. Gay

Leveraging Dynamic Analysis for Static Analysis*
- Parameterize given sound, precise, but non-scalable whole-program analysis with an abstraction hint
- Obtain abstraction hint by path-program analysis
- Obtain path program by running program on some input
- Simulate analysis instantiated using most precise abstraction hint on path program
- Group queries having same abstraction hint
- Use multiple path programs for improved precision and scalability
[Figure: analysis pipeline. Input data Dj for W feeds program execution monitoring, which yields path program Pj. Path-program analysis (abstraction A⊥) yields a proof or a counterex. Abstraction hint inferrer I yields abstraction hint Hk. Whole-program analysis (abstraction Ak) of whole program W for program query Qi yields a proof (Qi ⊢ W) or a counterex. (Qi ⊬ W).]
*Joint work with M. Sagiv, Z. Anderson, D. Gay
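The bookkeeping implied by this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Chord's implementation: the function name `plan_whole_program_runs`, the encoding of path-program results as `("proof", hint)` or `("counterex", None)`, and the query names are all hypothetical. It shows queries refuted on a run being dropped, and the remaining queries being grouped so that each distinct abstraction hint needs only one whole-program run.

```python
from collections import defaultdict

def plan_whole_program_runs(queries, path_results):
    """Decide which queries still need whole-program analysis.

    path_results maps each query to ("counterex", None) or ("proof", hint),
    where hint is a frozenset of allocation sites (hypothetical encoding).
    """
    refuted = []
    groups = defaultdict(list)
    for q in queries:
        verdict, hint = path_results[q]
        if verdict == "counterex":
            refuted.append(q)       # fails on this run: do not try to prove it
        else:
            groups[hint].append(q)  # same hint: share one whole-program run
    return refuted, dict(groups)

# Made-up results of path-program analysis for four queries.
path_results = {
    "Q1": ("proof", frozenset({"h3", "h4"})),
    "Q2": ("counterex", None),
    "Q3": ("proof", frozenset({"h3", "h4"})),
    "Q4": ("proof", frozenset({"h1"})),
}
refuted, groups = plan_whole_program_runs(["Q1", "Q2", "Q3", "Q4"], path_results)
```

Here Q1 and Q3 share a hint, so the four queries need at most two whole-program runs.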

Our Thread-Escape Analysis
- Flow-sensitive, top-down summary-based context-sensitive analysis
  - sound and precise
  - not scalable: O(2^(|H|^2·|F|)) contexts/method, O(|P|·2^(|H|^2·|F|)) abstract heaps
- Abstraction hint Hk = set of object allocation sites in program W that are relevant to query Qi
[Figure: the analysis pipeline from "Leveraging Dynamic Analysis for Static Analysis", repeated: input data Dj for W, program execution monitoring, path program Pj, path-program analysis (abstraction A⊥), abstraction hint inferrer I, abstraction hint Hk, whole-program analysis (abstraction Ak) of whole program W for program query Qi, with proof (Qi ⊢ W) or counterex. (Qi ⊬ W) outcomes.]

Abstraction Hint for Our Thread-Escape Analysis
Hk = { h3, h4 }
W =
  v1 = new h1
  v2 = new h2
  v1.f1 = v2
  p1: … v2.f2 …
  g = v1
  p2: … v2.f2 …
  if (*)
    v3 = new h3
    v4 = new h4
    v3.f3 = v4
  else
    v4 = new h5
  p3: … v4.f4 …
Ak =
  v1 = new h
  v2 = new h
  v1.f1 = v2
  p1: … v2.f2 …
  g = v1
  p2: … v2.f2 …
  if (*)
    v3 = new h3
    v4 = new h4
    v3.f3 = v4
  else
    v4 = new h
  p3: … v4.f4 …
[Figure: two heap graphs at p3, one per listing, over variables g, v1, v2, v3, v4, fields f1 and f3, and allocation sites h1 to h5 (sites h3 and h4 in the abstracted graph).]
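Read this way, the hint acts as a filter on allocation sites: sites in Hk stay distinct and every other site collapses into a single summary site h. A minimal Python sketch of that mapping follows. The helper `abstract_site` and the labels `v4_then` and `v4_else` (used to tell apart the two assignments to v4) are hypothetical names for illustration only.

```python
def abstract_site(site, hint, summary="h"):
    """Keep sites named in the hint; merge all others into one summary site."""
    return site if site in hint else summary

# Allocation sites assigned in the example program W.
program_sites = {
    "v1": "h1",
    "v2": "h2",
    "v3": "h3",
    "v4_then": "h4",   # v4 = new h4 in the if-branch
    "v4_else": "h5",   # v4 = new h5 in the else-branch
}
H_k = {"h3", "h4"}

abstracted = {var: abstract_site(site, H_k) for var, site in program_sites.items()}
```

Only h3 and h4 survive as separate sites, which matches the second listing on the slide.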

Our Thread-Escape Analysis
- Flow-sensitive, top-down summary-based context-sensitive analysis
  - sound and precise
  - not scalable: O(2^(|H|^2·|F|)) contexts/method, O(|P|·2^(|H|^2·|F|)) abstract heaps
- Abstraction hint Hk = set of object allocation sites in program W that are relevant to query Qi
- For our benchmarks: average |H| = 2600, average |Hk| = 3.2, so our approach is scalable!
[Figure: the analysis pipeline from "Leveraging Dynamic Analysis for Static Analysis", repeated: input data Dj for W, program execution monitoring, path program Pj, path-program analysis (abstraction A⊥), abstraction hint inferrer I, abstraction hint Hk, whole-program analysis (abstraction Ak) of whole program W for program query Qi, with proof (Qi ⊢ W) or counterex. (Qi ⊬ W) outcomes.]
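To get a feel for why the hint matters, the exponent in the bound 2^(|H|^2·|F|) can be worked out for both cases. The slide gives |H| = 2600 and |Hk| = 3.2 but no value for |F|, so the sketch below assumes |F| = 10 purely for illustration. It also rounds |Hk| to 3 and assumes one extra summary site for everything outside the hint.

```python
def heap_exponent(num_sites, num_fields):
    """Exponent e in the bound 2^e, where e = |H|^2 * |F|."""
    return num_sites ** 2 * num_fields

NUM_FIELDS = 10                            # assumed; not given on the slide

full = heap_exponent(2600, NUM_FIELDS)     # all allocation sites kept distinct
hinted = heap_exponent(3 + 1, NUM_FIELDS)  # about 3 hinted sites plus 1 summary site
```

Under these assumptions the exponent drops from 67,600,000 to 160. The bound is still exponential, but over a vastly smaller set of sites.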

Dynamic Analysis Implementation Space for Java
Approaches compared:
- Implement inside a JVM
- Use JVMTI
- Instrument bytecode at load-time
- Instrument bytecode offline (used in Chord)
Criteria compared: portability, efficiency, flexibility, other issues. Notes recorded on the slide:
- Portability: dependency on specific version of specific JVM; not supported by some JVMs (e.g. Android)
- Flexibility: no support for what is doable by bytecode instrumentation; can only change method bytecode after class loaded
- Other issues: not trivial to modify production JVM; event handling code must be written in C/C++; must run program twice to find which classes to instrument; bytecode verifier may fail at runtime even using -Xverify:none (except IBM J9 VM)

Architecture of Dynamic Analysis in Chord
Analysis writer specifies kinds of events and code to handle them:
- enter/leave method: m t
- before/after method call: i t o
- getfield/putfield: e t b f o
- enter quad: p t
- enter/leave/iteration loop: w t
- thread start/join/wait/notify: i t o
- enter basic block: b t
- new/newarray: h t o
- acquire/release lock: l t o
Analysis writer chooses kind of event handling:
- online, in JVM running instrumented program
  - Pro: can inspect state
  - Con: either exclude JDK from instrumentation or do not use it in event handling code, to avoid correctness and performance issues
- offline, in separate JVM after JVM running instrumented program finishes
  - Con: infeasible for long-running programs generating lots of events, since all events are stored in a file on disk
- online, in separate JVM in parallel with JVM running instrumented program
  - Best option: uses buffered POSIX pipe to communicate events between event-generating JVM and event-handling JVM
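The third option, streaming events through a buffered pipe to a separate handler, can be sketched in Python. This is a rough illustration rather than Chord's code. It uses two threads in one process instead of two JVMs, and the fixed-size record layout (`kind`, thread id, object id) and the handler table are assumptions made for the example.

```python
import os
import struct
import threading

# Hypothetical fixed-size event record: kind (1 byte), thread id, object id.
EVENT = struct.Struct("<BII")
GETFIELD, PUTFIELD = 1, 2

def generate_events(write_fd, events):
    """Event-generating side: write packed events through a buffered writer."""
    with os.fdopen(write_fd, "wb") as out:
        for ev in events:
            out.write(EVENT.pack(*ev))
    # Leaving the with-block flushes and closes the pipe, signalling end of stream.

def handle_events(read_fd, handlers):
    """Event-handling side: decode each record and dispatch to its handler."""
    with os.fdopen(read_fd, "rb") as inp:
        while True:
            chunk = inp.read(EVENT.size)
            if len(chunk) < EVENT.size:
                break
            kind, thread_id, obj_id = EVENT.unpack(chunk)
            handlers[kind](thread_id, obj_id)

log = []
handlers = {
    GETFIELD: lambda t, o: log.append(("getfield", t, o)),
    PUTFIELD: lambda t, o: log.append(("putfield", t, o)),
}

read_fd, write_fd = os.pipe()
producer = threading.Thread(
    target=generate_events,
    args=(write_fd, [(GETFIELD, 1, 7), (PUTFIELD, 2, 7)]),
)
producer.start()
handle_events(read_fd, handlers)
producer.join()
```

Because the handler consumes events as they arrive, nothing has to be stored on disk, which is the drawback the slide attributes to the offline option.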

Example Datalog Analysis
# program domains
.include "E.dom"
.include "F.dom"
.include "T.dom"
# BDD variable ordering
.bddvarorder E0xE1_T0_T1_F0
# input, intermediate, output program relations represented as BDDs
field(e:E0, f:F0) input
write(e:E0) input
reach(t:T0, e:E0) input
alias(e1:E0, e2:E1) input
escape(e:E0) input
unguarded(t1:T0, e1:E0, t2:T1, e2:E1) input
hasWrite(e1:E0, e2:E1)
candidate(e1:E0, e2:E1)
datarace(t1:T0, e1:E0, t2:T1, e2:E1) output
# analysis constraints (Horn Clauses) solved via BDD operations
hasWrite(e1, e2) :- write(e1).
hasWrite(e1, e2) :- write(e2).
candidate(e1, e2) :- field(e1,f), field(e2, f), hasWrite(e1, e2), e1 <= e2.
datarace(t1, e1, t2, e2) :- candidate(e1, e2), reach(t1, e1), reach(t2, e2), alias(e1, e2), escape(e1), escape(e2), unguarded(t1, e1, t2, e2).
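For readers unfamiliar with Datalog, the rules above can be read as set comprehensions. The Python sketch below evaluates them over explicit sets instead of BDDs, so it illustrates only the meaning of the rules, not how bddbddb solves them. The function name `dataraces` and the toy input relations are made up for the example.

```python
def dataraces(field, write, reach, alias, escape, unguarded):
    """Evaluate the datarace rules over explicit sets (no BDDs)."""
    # hasWrite(e1, e2) :- write(e1).   hasWrite(e1, e2) :- write(e2).
    def has_write(e1, e2):
        return e1 in write or e2 in write

    # candidate(e1, e2) :- field(e1, f), field(e2, f), hasWrite(e1, e2), e1 <= e2.
    candidate = {
        (e1, e2)
        for (e1, f1) in field
        for (e2, f2) in field
        if f1 == f2 and has_write(e1, e2) and e1 <= e2
    }

    # datarace(t1, e1, t2, e2) :- candidate, reach, reach, alias, escape, escape, unguarded.
    return {
        (t1, e1, t2, e2)
        for (e1, e2) in candidate
        for (t1, a) in reach if a == e1
        for (t2, b) in reach if b == e2
        if (e1, e2) in alias
        and e1 in escape and e2 in escape
        and (t1, e1, t2, e2) in unguarded
    }

# Toy relations: accesses 0 and 1 touch field "f", access 2 touches field "g".
races = dataraces(
    field={(0, "f"), (1, "f"), (2, "g")},
    write={0},
    reach={("tA", 0), ("tB", 1), ("tA", 2)},
    alias={(0, 1)},
    escape={0, 1},
    unguarded={("tA", 0, "tB", 1)},
)
```

The constraint `e1 <= e2` keeps only one ordering of each pair of accesses, so a race between accesses 0 and 1 is reported once rather than twice.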

Pros and Cons of Datalog/BDDs
- Good for rapidly crafting initial versions of an analysis with focus on false positive/negative rate instead of scalability
  - initial versions tend to have intolerable false positive/negative rate
- Good for analyses …
  - whose constraint solving strategy is not obvious (e.g. best known alternative is chaotic iteration)
  - involving data with lots of redundancy and so large as to be impossible to compute/store/read using Java if represented explicitly (e.g. cloning-based analyses)
  - involving few simple rules (e.g. transitive closure)
- Bad for analyses …
  - with more complicated formulations (e.g. summary-based analyses)
  - over domains not known exactly in advance (i.e. on-the-fly analyses)
  - involving many interdependent rules (e.g. points-to analyses)
- Unintuitive effects of BDDs on performance (e.g. smaller non-uniform k values in k-CFA worse than larger uniform k values)
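Transitive closure is the standard example of an analysis with few simple rules: in Datalog it is just `path(x,z) :- edge(x,z).` and `path(x,z) :- path(x,y), edge(y,z).` For comparison, here is a hand-written version in Python using naive fixpoint iteration over an explicit set of pairs. The function name and the sample graph are illustrative only, and this explicit representation is exactly what stops scaling when the relation becomes very large.

```python
def transitive_closure(edges):
    """Naive fixpoint: keep applying the rules until no new pair is derived."""
    path = set(edges)                      # path(x, z) :- edge(x, z).
    changed = True
    while changed:
        changed = False
        for (x, y) in list(path):          # path(x, z) :- path(x, y), edge(y, z).
            for (y2, z) in edges:
                if y == y2 and (x, z) not in path:
                    path.add((x, z))
                    changed = True
    return path

closure = transitive_closure({(1, 2), (2, 3), (3, 4)})
```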

Expressing Analysis Dependencies Using CnC*
- step instance ti is “enabled” when tag ti arrives in T
- gets block until an item with tag ti arrives in each of C1, …, Cn
- analysis is performed
- an item with tag ti is put in each of P1, …, Pm
Step code:
  c1i = C1.get(ti); … cni = Cn.get(ti);
  p1i…pmi = analysis(c1i…cni);
  P1.put(ti, p1i); … Pm.put(ti, pmi);
[Figure: a step collection with control collection T, input data collections C1 … Cn, and output data collections P1 … Pm.]
*Joint work with V. Sarkar and Habanero team (Rice U.)
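The get/put behaviour described on this slide can be mimicked with a small thread-safe map. The Python sketch below is not the CnC or Habanero Java API. `ItemCollection` and `step` are hypothetical names, and the only properties modelled are a blocking `get` and a write-once `put`.

```python
import threading

class ItemCollection:
    """Data collection with blocking get and write-once put (sketch)."""

    def __init__(self):
        self._items = {}
        self._cond = threading.Condition()

    def put(self, tag, value):
        with self._cond:
            if tag in self._items:
                raise ValueError(f"tag {tag!r} already assigned")
            self._items[tag] = value
            self._cond.notify_all()

    def get(self, tag):
        with self._cond:
            # Block until some step has put an item with this tag.
            self._cond.wait_for(lambda: tag in self._items)
            return self._items[tag]

def step(tag, inputs, outputs, analysis):
    """One step instance: get from C1..Cn, run the analysis, put into P1..Pm."""
    args = [c.get(tag) for c in inputs]
    results = analysis(*args)
    for collection, value in zip(outputs, results):
        collection.put(tag, value)

C1, C2, P1 = ItemCollection(), ItemCollection(), ItemCollection()
worker = threading.Thread(
    target=step, args=("t1", [C1, C2], [P1], lambda a, b: (a + b,))
)
worker.start()       # the step is enabled and blocks in get
C1.put("t1", 2)
C2.put("t1", 3)
worker.join()
```

The step starts before its inputs exist and simply waits, so the order in which producers run does not change the value that ends up in P1.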

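Example Datalog Analysis Using CnC
Generic step code:
  c1i = C1.get(ti); … cni = Cn.get(ti);
  p1i…pmi = analysis(c1i…cni);
  P1.put(ti, p1i); … Pm.put(ti, pmi);
Datalog analysis to be expressed as a step:
  .include "D1.dom"
  .include "D2.dom"
  R1(d1:D1) input
  R12(d1:D1, d2:D2) input
  R2(d2:D2) output
  R2(d2) :- R1(d1), R12(d1,d2).
[Figure: a step with control collection T, input data collections C1 … Cn, and output data collections P1 … Pm.]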

Example Datalog Analysis Using CnC
Datalog analysis:
  .include "D1.dom"
  .include "D2.dom"
  R1(d1:D1) input
  R12(d1:D1, d2:D2) input
  R2(d2:D2) output
  R2(d2) :- R1(d1), R12(d1,d2).
The same analysis as step code:
  D1i = D1.get(programi);
  D2i = D2.get(programi);
  R1i = R1.get(programi);
  R12i = R12.get(programi);
  R2i(d2) :- R1i(d1), R12i(d1, d2).
  R2.put(programi, R2i);
[Figure: control collection program; input data collections domain D1, domain D2, relation R1, relation R12; output data collection relation R2.]

Seamless Integration of Analyses in Chord
[Figure: an example program analysis running on the CnC/Habanero Java Runtime. Data: Java program (program source, program bytecode, program inputs), program quadcode, domain D1, domain D2, relation R1, relation R12, relation R2, analysis result in XML, analysis result in HTML. Steps: bytecode to quadcode (joeq), bytecode instrumentor (javassist), domain D1 analysis, domain D2 analysis, relation R12 analysis, dynamic analysis, Datalog analysis (bddbddb, BuDDy), static analysis, Java2HTML, saxon XSLT.]

Executing an Analysis in Chord
[Figure: the diagram from "Seamless Integration of Analyses in Chord", annotated with execution order. One step is marked "user demands this to run". Other annotations on steps: "starts, runs to finish"; "starts, blocks on D1"; "starts, blocks on D1, D2, R1, R12"; "starts, blocks on R2, D2"; "resumes, runs to finish".]

Benefits of Using CnC in Chord
- Modularity
  - analyses (steps) are written independently
- Flexibility
  - analyses can be made to interact in powerful ways with other analyses (by specifying data/control dependencies)
- Efficiency
  - analyses are executed in demand-driven fashion
  - results computed by each analysis are automatically cached for reuse by other analyses without re-computation
  - independent analyses are automatically executed in parallel
- Reliability
  - CnC’s “dynamic single assignment” property ensures result is same regardless of order in which analyses are executed
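The demand-driven and caching points can be illustrated with a tiny scheduler. The Python sketch below is an assumption-laden toy, not Chord's runtime: `AnalysisRunner`, `register`, and `demand` are hypothetical names, execution is sequential rather than parallel, and the targets D1, D2, R1, R12, R2 borrow their names from the earlier Datalog example with made-up contents.

```python
class AnalysisRunner:
    """Runs an analysis only when its result is demanded, and caches the result."""

    def __init__(self):
        self.producers = {}   # target name -> (function, dependency names)
        self.cache = {}
        self.run_log = []     # order in which analyses actually executed

    def register(self, name, deps, fn):
        self.producers[name] = (fn, deps)

    def demand(self, name):
        if name not in self.cache:
            fn, deps = self.producers[name]
            inputs = [self.demand(d) for d in deps]   # pull dependencies first
            self.run_log.append(name)
            self.cache[name] = fn(*inputs)
        return self.cache[name]

runner = AnalysisRunner()
runner.register("D1", [], lambda: {"a", "b"})
runner.register("D2", [], lambda: {"x", "y"})
runner.register("R1", ["D1"], lambda d1: {"a"})
runner.register("R12", ["D1", "D2"], lambda d1, d2: {("a", "x"), ("b", "y")})
# R2(d2) :- R1(d1), R12(d1, d2).
runner.register("R2", ["R1", "R12"], lambda r1, r12: {d2 for (d1, d2) in r12 if d1 in r1})

result = runner.demand("R2")
```

Demanding R2 pulls in exactly the analyses it depends on, and D1 runs once even though both R1 and R12 need it.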

Intended Audience of Chord
- Initial focus: analysis specialists
  - Researchers prototyping program analysis algorithms
- Current focus: system builders
  - Researchers with limited program analysis background prototyping systems having program analysis parts
- Ultimate goal: programmers
  - Users with no background in program analysis using it as a black box

Classification of Chord Uses
Legend (color-coded on the slide): only program analysis; program analysis + systems; program analysis + ML
Application to Parallel Computing
- static deadlock checker (ICSE’09). M. Naik, C. Park, D. Gay, K. Sen
- static race checker (PLDI’06, POPL’07). M. Naik, A. Aiken, J. Whaley
- static atomic set serializability checker. Z. Lai, S. C. Cheung, M. Naik
- CheckMate: generalized dynamic deadlock checker (FSE’10). P. Joshi, K. Sen, M. Naik, D. Gay
Application to Cloud Computing
- Mantis: estimating performance and resource usage of systems software. B. Chun, L. Huang, M. Naik, P. Maniatis
- CloneCloud: partitioning and migration of apps between phone and cloud. B. Chun, S. Ihm, P. Maniatis, M. Naik
- debugging configuration options in systems software (e.g. Hadoop). A. Rabkin, R. Katz
Advanced Program Analyses
- dynamically evaluating precision of static heap abstractions (OOPSLA’10). P. Liang, O. Tripp, M. Naik, M. Sagiv
- scalable client-driven static heap analyses (e.g. points-to, thread-escape). M. Naik, M. Sagiv, Z. Anderson, D. Gay

Why Cater to Non-Specialists?
- Gain fresh perspectives for program analysis
  - New program analysis problems, e.g. Mantis project: estimating program execution time on given input (in contrast to WCET and asymptotic worst case bounds)
  - New variants of known program analysis problems, e.g. Mantis project: new definitions of program slice: executable and approximate (in contrast to debuggable and exact)
- Others (esp. systems) need program analysis solutions
- Program analysis needs solutions from others (esp. ML)
- Experiment for each area: see if its “systematic” solutions are necessary to solve problems in other areas, e.g. ML solutions used in program analysis are heuristics

Chord Usage Statistics
3,881 visits came from 961 cities (Oct 1, 2008 – May 18, 2010)

Acknowledgments
- Intel Labs Berkeley: Byung-Gon Chun, David Gay, Ling Huang, Petros Maniatis
- UC Berkeley: Koushik Sen, Pallavi Joshi, Chang-Seo Park, Zachary Anderson, Percy Liang, Ariel Rabkin
- Tel-Aviv U.: Mooly Sagiv, Omer Tripp
- CnC/Habanero team at Rice U.: Vivek Sarkar, Kath Knobe (Intel), Zoran Budimlic, Michael Burke, Dragos Sbirlea, Alina Simion, Sagnak Tasirlar
- Open-source software in Chord: joeq and bddbddb, by John Whaley; javassist, by Shigeru Chiba