On-Demand Dynamic Software Analysis Joseph L. Greathouse Ph.D. Candidate Advanced Computer Architecture Laboratory University of Michigan November 29,

Slides:



Advertisements
Similar presentations
Transactional Memory Parag Dixit Bruno Vavala Computer Architecture Course, 2012.
Advertisements

1 Chao Wang, Yu Yang*, Aarti Gupta, and Ganesh Gopalakrishnan* NEC Laboratories America, Princeton, NJ * University of Utah, Salt Lake City, UT Dynamic.
Exploring P4 Trace Cache Features Ed Carpenter Marsha Robinson Jana Wooten.
Thread Criticality Predictors for Dynamic Performance, Power, and Resource Management in Chip Multiprocessors Abhishek Bhattacharjee Margaret Martonosi.
5.1 Silberschatz, Galvin and Gagne ©2009 Operating System Concepts with Java – 8 th Edition Chapter 5: CPU Scheduling.
Recording Inter-Thread Data Dependencies for Deterministic Replay Tarun GoyalKevin WaughArvind Gopalakrishnan.
Hybrid Transactional Memory Nir Shavit MIT and Tel-Aviv University Joint work with Alex Matveev (and describing the work of many in this summer school)
Thread-Level Transactional Memory Decoupling Interface and Implementation UW Computer Architecture Affiliates Conference Kevin Moore October 21, 2004.
An Case for an Interleaving Constrained Shared-Memory Multi- Processor CS6260 Biao xiong, Srikanth Bala.
Concurrency.
S. Narayanasamy, Z. Wang, J. Tigani, A. Edwards, B. Calder UCSD and Microsoft PLDI 2007.
Continuously Recording Program Execution for Deterministic Replay Debugging.
[ 1 ] Agenda Overview of transactional memory (now) Two talks on challenges of transactional memory Rebuttals/panel discussion.
Yuanyuan ZhouUIUC-CS Architectural Support for Software Bug Detection Yuanyuan (YY) Zhou and Josep Torrellas University of Illinois at Urbana-Champaign.
Unbounded Transactional Memory Paper by Ananian et al. of MIT CSAIL Presented by Daniel.
RCDC SLIDES README Font Issues – To ensure that the RCDC logo appears correctly on all computers, it is represented with images in this presentation. This.
Qin Zhao (MIT) Derek Bruening (VMware) Saman Amarasinghe (MIT) Umbra: Efficient and Scalable Memory Shadowing CGO 2010, Toronto, Canada April 26, 2010.
Rapid Identification of Architectural Bottlenecks via Precise Event Counting John Demme, Simha Sethumadhavan Columbia University
Rahul Sharma (Stanford) Michael Bauer (NVIDIA Research) Alex Aiken (Stanford) Verification of Producer-Consumer Synchronization in GPU Programs June 15,
15-740/ Oct. 17, 2012 Stefan Muller.  Problem: Software is buggy!  More specific problem: Want to make sure software doesn’t have bad property.
Parallelizing Security Checks on Commodity Hardware E.B. Nightingale, D. Peek, P.M. Chen and J. Flinn U Michigan.
- 1 - Dongyoon Lee †, Mahmoud Said*, Satish Narayanasamy †, Zijiang James Yang*, and Cristiano L. Pereira ‡ University of Michigan, Ann Arbor † Western.
Reduced Hardware NOrec: A Safe and Scalable Hybrid Transactional Memory Alexander Matveev Nir Shavit MIT.
Parallelizing Security Checks on Commodity Hardware Ed Nightingale Dan Peek, Peter Chen Jason Flinn Microsoft Research University of Michigan.
A Case for Unlimited Watchpoints Joseph L. Greathouse †, Hongyi Xin*, Yixin Luo †‡, Todd Austin † † University of Michigan ‡ Shanghai Jiao Tong University.
COMP 111 Threads and concurrency Sept 28, Tufts University Computer Science2 Who is this guy? I am not Prof. Couch Obvious? Sam Guyer New assistant.
Colorama: Architectural Support for Data-Centric Synchronization Luis Ceze, Pablo Montesinos, Christoph von Praun, and Josep Torrellas, HPCA 2007 Shimin.
Shared Memory Consistency Models. SMP systems support shared memory abstraction: all processors see the whole memory and can perform memory operations.
On-Demand Dynamic Software Analysis Joseph L. Greathouse Ph.D. Candidate Advanced Computer Architecture Laboratory University of Michigan December 12,
Accelerating Dynamic Software Analyses Joseph L. Greathouse Ph.D. Candidate Advanced Computer Architecture Laboratory University of Michigan December 1,
Highly Scalable Distributed Dataflow Analysis Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan Chelsea LeBlancTodd.
Drinking from Both Glasses: Adaptively Combining Pessimistic and Optimistic Synchronization for Efficient Parallel Runtime Support Man Cao Minjia Zhang.
Adaptive GPU Cache Bypassing Yingying Tian *, Sooraj Puthoor†, Joseph L. Greathouse†, Bradford M. Beckmann†, Daniel A. Jiménez * Texas A&M University *,
Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.
Testudo: Heavyweight Security Analysis via Statistical Sampling Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan Ilya.
Hardware Support for On-Demand Software Analysis Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan December 8, 2011.
Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.
HARD: Hardware-Assisted lockset- based Race Detection P.Zhou, R.Teodorescu, Y.Zhou. HPCA’07 Shimin Chen LBA Reading Group Presentation.
Grigore Rosu Founder, President and CEO Professor of Computer Science, University of Illinois
The Potential of Sampling for Dynamic Analysis Joseph L. GreathouseTodd Austin Advanced Computer Architecture Laboratory University of Michigan PLAS, San.
Demand-Driven Software Race Detection using Hardware Performance Counters Joseph L. Greathouse †, Zhiqiang Ma ‡, Matthew I. Frank ‡ Ramesh Peri ‡, Todd.
Eraser: A dynamic Data Race Detector for Multithreaded Programs Stefan Savage, Michael Burrows, Greg Nelson, Patrick Sobalvarro, Thomas Anderson Presenter:
Flashback : A Lightweight Extension for Rollback and Deterministic Replay for Software Debugging Sudarshan M. Srinivasan, Srikanth Kandula, Christopher.
G. Venkataramani, I. Doudalis, Y. Solihin, M. Prvulovic HPCA ’08 Reading Group Presentation 02/14/2008.
Advanced Pipelining 7.1 – 7.5. Peer Instruction Lecture Materials for Computer Architecture by Dr. Leo Porter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike.
KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association SYSTEM ARCHITECTURE GROUP DEPARTMENT OF COMPUTER.
E Virtual Machines Lecture 6 Topics in Virtual Machine Management Scott Devine VMware, Inc.
LASER: Light, Accurate Sharing dEtection and Repair Liang Luo, Akshitha Sriraman, Brooke Fugate, Shiliang Hu, Chris J Newburn, Gilles Pokam, Joseph Devietti.
Kendo: Efficient Deterministic Multithreading in Software M. Olszewski, J. Ansel, S. Amarasinghe MIT to be presented in ASPLOS 2009 slides by Evangelos.
Memory Protection through Dynamic Access Control Kun Zhang, Tao Zhang and Santosh Pande College of Computing Georgia Institute of Technology.
A Case for Unlimited Watchpoints
On-Demand Dynamic Software Analysis
Presented by Mike Marty
Lazy Preemption to Enable Path-Based Analysis of Interrupt-Driven Code
Tong Zhang, Dongyoon Lee, Changhee Jung
Minh, Trautmann, Chung, McDonald, Bronson, Casper, Kozyrakis, Olukotun
PHyTM: Persistent Hybrid Transactional Memory
CSC 591/791 Reliable Software Systems
Olatunji Ruwase* Shimin Chen+ Phillip B. Gibbons+ Todd C. Mowry*
APEx: Automated Inference of Error Specifications for C APIs
Effective Data-Race Detection for the Kernel
runtime verification Brief Overview Grigore Rosu
Hardware Mechanisms for Distributed Dynamic Software Analysis
Introduction to Operating Systems
OS Virtualization.
Jihyun Park, Changsun Park, Byoungju Choi, Gihun Chang
All You Ever Wanted to Know About Dynamic Taint Analysis & Forward Symbolic Execution (but might have been afraid to ask) Edward J. Schwartz, Thanassis.
On-Demand Dynamic Software Analysis
Sampling Dynamic Dataflow Analyses
Presentation transcript:

On-Demand Dynamic Software Analysis Joseph L. Greathouse Ph.D. Candidate Advanced Computer Architecture Laboratory University of Michigan November 29, 2011

NIST: SW errors cost U.S. ~$60 billion/year as of 2002 FBI CCS: Security Issues $67 billion/year as of 2005  >⅓ from viruses, network intrusion, etc. Software Errors Abound 2 Cataloged Software Vulnerabilities

In spite of proposed solutions Hardware Plays a Role 3 Hardware Data Race Recording Deterministic Execution/Replay Atomicity Violation Detectors Bug-Free Memory Models Bulk Memory Commits TRANSACTIONAL MEMORY IBM BG/Q AMD ASF ?

Example of a Modern Bug 4 Thread 2 mylen=large Thread 1 mylen=small ptr ∅ Nov OpenSSL Security Flaw if(ptr == NULL) { len=thread_local->mylen; ptr=malloc(len); memcpy(ptr, data, len); }

Example of a Modern Bug 5 if(ptr==NULL) memcpy(ptr, data2, len2) ptr LEAKED TIME Thread 2 mylen=large Thread 1 mylen=small ∅ len2=thread_local->mylen; ptr=malloc(len2); len1=thread_local->mylen; ptr=malloc(len1); memcpy(ptr, data1, len1)

Dynamic Software Analysis Analyze the program as it runs + System state, find errors on any executed path – LARGE runtime overheads, only test one path 6 Developer Instrumented Program In-House Test Machine(s) LONG run time Analysis Results Analysis Instrumentation Program

Taint Analysis (e.g.TaintCheck) Dynamic Bounds Checking Data Race Detection (e.g. Inspector XE) Memory Checking (e.g. MemCheck) Runtime Overheads: How Large? x 10-80x 5-50x 2-300x Symbolic Execution x

Could use Hardware Data Race Detection: HARD, CORD, etc. Taint Analysis: Raksha, FlexiTaint, etc. Bounds Checking: HardBound – None Currently Exist; Bugs Are Here Now – Single-Use Specialization – Won’t be built due to HW, power, verification costs – Unchangeable algorithms locked in HW 8

Goals of this Talk Accelerate SW Analyses Using Existing HW Run Tests On Demand: Only When Needed Explore Future Generic HW Additions 9

Outline Problem Statement Background Information  Demand-Driven Dynamic Dataflow Analysis Proposed Solutions  Demand-Driven Data Race Detection  Unlimited Hardware Watchpoints 10

Dynamic Dataflow Analysis Associate meta-data with program values Propagate/Clear meta-data while executing Check meta-data for safety & correctness Forms dataflows of meta/shadow information 11

a += y z = y * 75 y = x * 1024 w = x + 42 Check w Example: Taint Analysis 12 validate(x) x = read_input() Clear a += y z = y * 75 y = x * 1024 x = read_input() Propagate Associate Input Check a Check z Data Meta-data

Demand-Driven Dataflow Analysis 13 Only Analyze Shadowed Data Native Application Native Application Instrumented Application Instrumented Application Instrumented Application Meta-Data Detection Non- Shadowed Data Shadowed Data No meta-data

Finding Meta-Data 14 No additional overhead when no meta-data  Needs hardware support Take a fault when touching shadowed data Solution: Virtual Memory Watchpoints V→P FAULT

Results by Ho et al. From “Practical Taint-Based Protection using Demand Emulation” 15

Outline Problem Statement Background Information  Demand-Driven Dynamic Dataflow Analysis Proposed Solutions  Demand-Driven Data Race Detection  Unlimited Hardware Watchpoints 16

Software Data Race Detection Add checks around every memory access Find inter-thread sharing events Synchronization between write-shared accesses?  No? Data race. 17

Thread 2 mylen=large Thread 1 mylen=small if(ptr==NULL) len1=thread_local->mylen; ptr=malloc(len1); memcpy(ptr, data1, len1) len2=thread_local->mylen; ptr=malloc(len2); memcpy(ptr, data2, len2) Example of Data Race Detection 18 ptr write-shared? Interleaved Synchronization? TIME

SW Race Detection is Slow 19 PhoenixPARSEC

Inter-thread Sharing is What’s Important “ Data races... are failures in programs that access and update shared data in critical sections” – Netzer & Miller, if(ptr==NULL) len1=thread_local->mylen; ptr=malloc(len1); memcpy(ptr, data1, len1) len2=thread_local->mylen; ptr=malloc(len2); memcpy(ptr, data2, len2) Thread-local data NO SHARING Shared data NO INTER-THREAD SHARING EVENTS TIME

Very Little Inter-Thread Sharing 21 PhoenixPARSEC

Use Demand-Driven Analysis! 22 Multi-threaded Application Multi-threaded Application Software Race Detector Local Access Inter-thread sharing Inter-thread Sharing Monitor

Finding Inter-thread Sharing Virtual Memory Watchpoints? –~100% of accesses cause page faults Granularity Gap Per-process not per-thread Must go through the kernel on faults Syscalls for setting/removing meta-data 23 FAULT Inter-Thread Sharing

Hardware Sharing Detector Hardware Performance Counters Intel’s HITM event: W→R Data Sharing 24 S S M M S S I I HITM Pipeline Cache Perf. Ctrs FAULT Core 1Core PEBS Armed Debug Store EFLAGS EIP RegVals MemInfo Precise Fault

Potential Accuracy & Perf. Problems Limitations of Performance Counters  HITM only finds W→R Data Sharing  Hardware prefetcher events aren’t counted Limitations of Cache Events  SMT sharing can’t be counted  Cache eviction causes missed events  False sharing, etc… PEBS events still go through the kernel 25

Demand-Driven Analysis on Real HW 26 Execute Instruction SW Race Detection Enable Analysis Disable Analysis HITM Interrupt? HITM Interrupt? Sharing Recently? Analysis Enabled? NO YES

Performance Increases 27 PhoenixPARSEC 51x

Demand-Driven Analysis Accuracy 28 1/1 2/4 3/3 4/4 3/3 4/4 2/4 4/4 2/4 Accuracy vs. Continuous Analysis: 97%

Outline Problem Statement Background Information  Demand-Driven Dynamic Dataflow Analysis Proposed Solutions  Demand-Driven Data Race Detection  Unlimited Hardware Watchpoints 29

Watchpoints Globally Useful Byte/Word Accurate and Per-Thread 30

Watchpoint-Based Software Analyses Taint analysis / Dynamic dataflow analysis Data Race Detection Deterministic Execution Canary-Based Bounds Checking Speculative Program Optimization Hybrid Transactional Memory 31

Challenges Some analyses require watchpoint ranges  Better stored as base + length Some need large # of small watchpoints  Better stored as bitmaps Need a large number 32

The Best of Both Worlds Store Watchpoints in Main Memory Cache watchpoints on-chip 33

Demand-Driven Taint Analysis 34

Watchpoint-Based Data Race Detection 35 PhoenixPARSEC

Watchpoint-Based Grace System 36 PhoenixSPEC OMP2001

BACKUP SLIDES 37

Performance Difference 38 PhoenixPARSEC

Accuracy on Real Hardware 39 kmeansfacesimferretfreqminevipsx264streamcluster W→W1/1 (100%) 0/1 (0%) --1/1 (100%) - R→W-0/1 (0%) 2/2 (100%) 1/1 (100%) 3/3 (100%) 1/1 (100%) W→R-2/2 (100%) 1/1 (100%) 2/2 (100%) 1/1 (100%) 3/3/ (100%) 1/1 (100%) Spider Monkey-0 Spider Monkey-1 Spider Monkey-2 NSPR-1Memcached-1Apache-1 W→W9/9 (100%) 1/1 (100%) 3/3 (100%) -1/1 (100%) R→W3/3 (100%) -1/1 (100%) 7/7 (100%) W→R8/8 (100%) 1/1 (100%) 2/2 (100%) 4/4 (100%) -2/2 (100%) kmeansfacesimferretfreqminevipsx264streamcluster W→W1/1 (100%) 0/1 (0%) --1/1 (100%) - R→W-0/1 (0%) 2/2 (100%) 1/1 (100%) 3/3 (100%) 1/1 (100%) W→R-2/2 (100%) 1/1 (100%) 2/2 (100%) 1/1 (100%) 3/3/ (100%) 1/1 (100%)

Width Test 40