Continuously Reasoning about Programs

Presentation transcript:

Continuously Reasoning about Programs using Differential Bayesian Inference
Kihong Heo · Mukund Raghothaman · Xujie Si · Mayur Naik
University of Pennsylvania

Traditional Goals of Static Analyses
Metrics: accuracy, running time, memory use.
Solutions: designing better abstractions, developing efficient algorithms, learning bug patterns.

Verifying Continuously Evolving Programs
Programs are continuously being updated: new features, bug fixes. Individual commits touch only a small portion of the entire program.
[Diagram: the old version of the source code is analyzed to produce alarms 𝐴 old; a commit Δ yields the new version, which is analyzed to produce alarms 𝐴 new. "Does this commit introduce any bugs?!"]

Verifying Continuously Evolving Programs
“We only display results for most analyses on changed lines by default; this keeps analysis results relevant to the code review at hand.” ―Sadowski et al., ICSE 2015
“… is the ability to analyze a commit rather than the entire codebase. This functionality can help developers assess the quality and impact of a change.” ―Christakis et al., ASE 2016
“The vast majority of Infer’s impact to this point is attributable to continuous reasoning at diff time.” ―O’Hearn, LICS 2018
“… verification must continue to work with low effort as developers change the code. Neither of these approaches would work for Amazon as s2n is under continuous development.” ―Chudnov et al., CAV 2018

Central Question of this Talk
[Diagram: 𝑃 old is analyzed to produce alarms 𝐴 old; a change Δ yields 𝑃 new, which is analyzed to produce 𝐴 new.]
Outline: problem description, differential derivation graph, experimental effectiveness, conclusion.
How do we prioritize the alarms in 𝐴 new by relevance to Δ? Which alarms are relevant to Δ, and which are not? And how do we identify the shades of grey in between?

Tentative Solution: Syntactic Masking
What if we only report "syntactically new" alarms, 𝐴 new ∖ 𝐴 old?
Problem #1: Might miss real bugs!
Problem #2: How do we prioritize alarms?
Problem #3: How do we transfer feedback?

Tentative Solution: Syntactic Masking
What if we only report "syntactically new" alarms, 𝐴 new ∖ 𝐴 old? Problem #1: Might miss real bugs!
Use diff to identify common syntactic elements:

Old version:
O1: int getSize() {
O2:   int samples = readInt();
O3:   int size = min(samples, 529200);
O4:   int extra = 32;
O5:   int total = size + extra;
O6:   return total;
O7: }
O8: malloc(2 * getSize());

New version:
N1: int getSize() {
N2:   int samples = readInt();
N3:   int size = min(samples, 529200);
N4:   int shift = readInt();     (added)
N5:   int extra = 32 + shift;    (added)
N6:   int total = size + extra;
N7:   return total;
N8: }
N9: malloc(2 * getSize());

Because the allocation on line N9 is textually unchanged from line O8, diff matches Alarm(O8) with Alarm(N9), and syntactic masking suppresses the alarm, even though the newly added, unbounded shift now lets the allocation size overflow.
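To make this baseline concrete, here is a minimal sketch of syntactic masking as Datalog rules (assumed Souffle-style syntax; the relation matched, standing for diff's line alignment, is hypothetical):

  .decl alarm_old(l: symbol)            // input: alarms on the old version
  .decl alarm_new(l: symbol)            // input: alarms on the new version
  .decl matched(o: symbol, n: symbol)   // input: diff aligned old line o with new line n
  .decl masked(n: symbol)
  .decl report(n: symbol)

  // An alarm is masked if the same alarm also fired on the matched old line.
  masked(n) :- alarm_new(n), matched(o, n), alarm_old(o).

  // Report only the syntactically new alarms: A_new \ A_old.
  report(n) :- alarm_new(n), !masked(n).

On the example above, matched(O8, N9) and alarm_old(O8) both hold, so Alarm(N9) is masked and the real bug goes unreported.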

Question: Which alarms do we report?
Solution: Consider the alarm's full "provenance", and prioritize alarms with new derivation trees.

Derivation tree common to both versions:
Src(N2), edge(N2, N3), path(N2, N3), edge(N3, N6), path(N2, N6), edge(N6, N7), path(N2, N7), edge(N7, N9), path(N2, N9), Sink(N9) ⇒ Alarm(N9)

Derivation tree unique to the new version:
Src(N4), edge(N4, N5), path(N4, N5), edge(N5, N6), path(N4, N6), edge(N6, N7), path(N4, N7), edge(N7, N9), path(N4, N9), Sink(N9) ⇒ Alarm(N9)

The value read at the new line N4 (shift = readInt()) flows through N5, N6, and N7 into the allocation at N9, so Alarm(N9) acquires a derivation tree that did not exist in the old version and should be reported.
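For reference, the facts above are conclusions of a Datalog-style reachability analysis. A minimal sketch of its rules (the transitive rule appears verbatim on a later slide; the base and alarm rules are assumptions consistent with the facts shown):

  // Base case: a single dataflow edge is a path.
  path(l1, l2) :- edge(l1, l2).

  // Transitive case (shown later in the talk).
  path(l1, l3) :- path(l1, l2), edge(l2, l3).

  // An alarm fires when a source value reaches a sink.
  Alarm(l2) :- Src(l1), path(l1, l2), Sink(l2).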

Contributions
Solution: Consider the alarm's full "provenance", and prioritize alarms with new derivation trees.
Contribution 1: An algorithm which (1) identifies alarms with new derivation trees, (2) interactively ranks alarms by relevance, and (3) enables feedback transfer across versions.
Contribution 2: A publicly available implementation named Drake.

Drake System Architecture
[Diagram: the old and new versions of the source code are each run through the analysis to produce derivation graphs; together with the diff Δ, these are combined into a differential derivation graph. Marginal inference over this graph, seeded with partially labelled alarms (✔/✘), produces a ranked list of alarms. The marginal-inference back end is Bingo (PLDI 2018).]

Our Approach
[Diagram: 𝑃 old and 𝑃 new are analyzed to produce 𝐴 old and 𝐴 new, related by the change Δ.]
Outline: problem description, differential derivation graph, experimental effectiveness, conclusion.
How do we prioritize the alarms in 𝐴 new by relevance to Δ? Which alarms are relevant to Δ, and which are not? And how do we identify the shades of grey in between?

The Differential Derivation Graph
"Explanations" of each alarm are obtained by instrumenting the analysis.
[Diagram: the same architecture as before; the derivation graphs of the old and new versions, together with Δ, feed marginal inference (Bingo, PLDI 2018), which combines them with partially labelled alarms (✔/✘) to produce a ranked list.]

Prioritizing Alarms with Changed Provenance
Solution: Consider the alarm's full "provenance", and prioritize alarms with new derivation trees.
Each step of a derivation applies an inference rule R: for example, from path(N2, N3) and edge(N3, N6), rule R derives path(N2, N6).
Inference rule: path(𝑙₁, 𝑙₃) :- path(𝑙₁, 𝑙₂), edge(𝑙₂, 𝑙₃)

Prioritizing Alarms with Changed Provenance
Solution: Consider the alarm's full "provenance", and prioritize alarms with new derivation trees.
Split every conclusion reached by the analysis into two variants:
α: derivations using only elements common to both versions
β: derivations using at least one element of the new version
Correspondingly, the rule R deriving path(N2, N6) from path(N2, N3) and edge(N3, N6) splits into four variants Rαα, Rαβ, Rβα, and Rββ over the split facts pathα(N2, N3) / pathβ(N2, N3) and edgeα(N3, N6) / edgeβ(N3, N6). The conclusion pathα(N2, N6) is derived using exclusively old means, while pathβ(N2, N6) indicates the use of a new element: pathβ(N2, N6) is derivable iff path(N2, N6) admits at least one new derivation tree.
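A sketch of the split transitive rule in Datalog (assumed Souffle-style syntax, with _alpha/_beta suffixes standing in for α/β; the paper gives the general transformation for arbitrary rules):

  // R_aa: both premises derivable by old means, so the conclusion is too.
  path_alpha(l1, l3) :- path_alpha(l1, l2), edge_alpha(l2, l3).

  // R_ab, R_ba, R_bb: if any premise involves the new version, so does the conclusion.
  path_beta(l1, l3) :- path_alpha(l1, l2), edge_beta(l2, l3).
  path_beta(l1, l3) :- path_beta(l1, l2), edge_alpha(l2, l3).
  path_beta(l1, l3) :- path_beta(l1, l2), edge_beta(l2, l3).

In the running example, the input fact edge(N4, N5) is classified as β because lines N4 and N5 are new, so the β-variants propagate all the way to path_beta(N4, N9) and a β-variant of Alarm(N9) becomes derivable.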

Prioritizing Alarms with Changed Provenance
Solution: Consider the alarm's full "provenance", and prioritize alarms with new derivation trees.
The split derivation graph (α/β facts and the rule variants Rαα, Rαβ, Rβα, Rββ) is then handed to Bingo (PLDI 2018): marginal inference over this graph produces a ranked list of alarms, with ✔/✘ feedback incorporated between rounds.
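As a sketch of the ranking step (Bingo-style marginal inference; the exact probabilistic model is given in the Bingo and Drake papers): the derivation graph is interpreted as a Bayesian network, each alarm a is scored by its posterior probability given the feedback collected so far, and alarms are shown in decreasing order of that score:

  \[ \mathrm{rank}(a) = \Pr(a \mid e), \qquad e = \{\text{alarms already labelled true or false}\} \]

After each round, the newly labelled alarm is added to e and the marginals are recomputed.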

Central Question of this Talk
[Diagram: 𝑃 old and 𝑃 new are analyzed to produce 𝐴 old and 𝐴 new, related by the change Δ.]
Outline: problem description, differential derivation graph, experimental effectiveness, conclusion.
How do we prioritize the alarms in 𝐴 new by relevance to Δ? Which alarms are relevant to Δ, and which are not? And how do we identify the shades of grey in between?

Experimental Setup
Instrumented the Sparrow static analyzer for C programs (Oh et al., PLDI 2012); checkers: integer overflow, buffer overrun, format string.
Corpus of 10 popular Unix command-line programs, 13–112 KLOC, containing 26 historical bugs, including 4 CVEs.
Baseline 1: syntactic masking, 𝐴 new ∖ 𝐴 old.
Baseline 2: batch-mode Bingo (PLDI 2018).

Experimental Effectiveness
The Case of grep (v2.18 → v2.19): 68 KLOC, 7% changed during the version bump; a buffer overrun vulnerability was introduced (CVE-2015-1345). “Successful attacks will allow attackers to execute arbitrary code …” ―Symantec
Alarms reported by the Sparrow static analyzer (Oh et al., PLDI 2012), under batch-mode interactive prioritization, versus alarms after syntactic masking (𝐴 new ∖ 𝐴 old): masking reports 78% fewer alarms, but the real bug is missed!
Drake (this paper): the bug is discovered within just 9 rounds of interaction!

Experimental Effectiveness
Average number of alarms per program (non-interactive): batch 563; syntactic alarm masking 118 (+ 4 ).
Average rounds of interaction to find all bugs (interactive): Bingo (batch) 85; Drake (relevance-aware) 30.

Our Approach
[Diagram: 𝑃 old and 𝑃 new are analyzed to produce 𝐴 old and 𝐴 new, related by the change Δ.]
Outline: problem description, differential derivation graph, experimental effectiveness, conclusion.
How do we prioritize the alarms in 𝐴 new by relevance to Δ? Which alarms are relevant to Δ, and which are not? And how do we identify the shades of grey in between?

Conclusion
New challenges arise when applying static analysis tools to large software: measures beyond accuracy matter, such as relevance and severity.
Drake is a system to prioritize alarms during continuous integration, with dramatic effectiveness in reducing the alarm-inspection burden.
In the paper: a general alarm-differencing framework for Datalog, formal descriptions of the algorithm and proofs, and a detailed experimental evaluation.