On-Demand Dynamic Software Analysis
Joseph L. Greathouse, Ph.D. Candidate
Advanced Computer Architecture Laboratory, University of Michigan
December 12, 2011

Presentation transcript:

On-Demand Dynamic Software Analysis Joseph L. Greathouse Ph.D. Candidate Advanced Computer Architecture Laboratory University of Michigan December 12, 2011

Software Errors Abound
NIST: SW errors cost U.S. ~$60 billion/year as of 2002
FBI CCS: Security Issues $67 billion/year as of 2005
  – >⅓ from viruses, network intrusion, etc.
[Chart: Cataloged Software Vulnerabilities]

Hardware Plays a Role
In spite of proposed solutions
Hardware Data Race Recording
Deterministic Execution/Replay
Atomicity Violation Detectors
Bug-Free Memory Models
Bulk Memory Commits
TRANSACTIONAL MEMORY: IBM BG/Q, AMD ASF, ?

Example of a Modern Bug
Nov OpenSSL Security Flaw
Thread 1: mylen=small
Thread 2: mylen=large
ptr: ∅

    if(ptr == NULL) {
        len=thread_local->mylen;
        ptr=malloc(len);
        memcpy(ptr, data, len);
    }

Example of a Modern Bug
ptr: ∅ at the start; ptr LEAKED during the interleaving (TIME runs downward on the slide)

    if(ptr==NULL)

Thread 1 (mylen=small):
    len1=thread_local->mylen;
    ptr=malloc(len1);
    memcpy(ptr, data1, len1)

Thread 2 (mylen=large):
    len2=thread_local->mylen;
    ptr=malloc(len2);
    memcpy(ptr, data2, len2)
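The interleaving on this slide can be replayed deterministically. The following is a minimal Python sketch, not the OpenSSL code: the `Heap` model, the buffer sizes, and the exact step order are assumptions chosen to match the slide's labels (one buffer leaked, a large copy landing in a small buffer).

```python
# Hypothetical model of the slide's interleaving; sizes and step order are assumptions.
class Heap:
    def __init__(self):
        self.sizes = {}       # buffer id -> allocated size
        self.next_id = 1

    def malloc(self, size):
        buf = self.next_id
        self.next_id += 1
        self.sizes[buf] = size
        return buf


def replay():
    heap = Heap()
    mylen = {1: 16, 2: 4096}          # Thread 1: small, Thread 2: large
    shared = {"ptr": None}
    events = []

    # Both threads pass the NULL check before either one assigns ptr.
    passed = {tid: shared["ptr"] is None for tid in (1, 2)}
    assert all(passed.values())

    # Thread 2 allocates the large buffer first.
    len2 = mylen[2]
    shared["ptr"] = heap.malloc(len2)
    first = shared["ptr"]

    # Thread 1 overwrites ptr with a small buffer; the first buffer is unreachable.
    len1 = mylen[1]
    shared["ptr"] = heap.malloc(len1)
    events.append(("leaked", first))

    # Each memcpy uses the copier's own length against whatever ptr holds now.
    for tid, length in ((1, len1), (2, len2)):
        capacity = heap.sizes[shared["ptr"]]
        if length > capacity:
            events.append(("overflow", tid, length, capacity))
    return events
```

Calling `replay()` reports one leaked buffer and one overflow by Thread 2. Neither thread's code is wrong in isolation; the failure only appears under this ordering.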

Dynamic Software Analysis
Analyze the program as it runs
+ System state, find errors on any executed path
– LARGE runtime overheads, only test one path
[Diagram: Developer; Program + Analysis Instrumentation; Instrumented Program; In-House Test Machine(s), LONG run time; Analysis Results]

Runtime Overheads: How Large?
Taint Analysis (e.g. TaintCheck)
Dynamic Bounds Checking
Data Race Detection (e.g. Inspector XE)
Memory Checking (e.g. MemCheck)
Symbolic Execution
[Overhead labels: x, 10-80x, 5-50x, 2-300x, x]

Could use Hardware
Data Race Detection: HARD, CORD, etc.
Taint Analysis: Raksha, FlexiTaint, etc.
Bounds Checking: HardBound
– None Currently Exist; Bugs Are Here Now
– Single-Use Specialization
– Won’t be built due to HW, power, verification costs
– Unchangeable algorithms locked in HW

Goals of this Talk
Accelerate SW Analyses Using Existing HW
Run Tests On Demand: Only When Needed
Explore Future Generic HW Additions

Outline
Problem Statement
Background Information
  – Demand-Driven Dynamic Dataflow Analysis
Proposed Solutions
  – Demand-Driven Data Race Detection
  – Unlimited Hardware Watchpoints

Example Dynamic Dataflow Analysis
Data: x = read_input(); validate(x); y = x * 1024; w = x + 42; z = y * 75; a += y
Meta-data: Associate (on Input); Clear (on validate); Propagate; Check w; Check z; Check a
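The Associate / Propagate / Clear / Check steps of this example can be sketched as a tiny taint tracker. This is a minimal illustration in Python, not the implementation used by any tool named in the talk; the class `TaintTracker`, its method names, and the concrete input value are all hypothetical.

```python
class TaintTracker:
    """Keeps one bit of meta-data (tainted or not) next to each variable."""

    def __init__(self):
        self.values = {}
        self.tainted = set()

    def const(self, name, value):
        # Program constants carry no meta-data.
        self.values[name] = value
        self.tainted.discard(name)

    def read_input(self, name, value):
        # Associate: data from outside the program is marked.
        self.values[name] = value
        self.tainted.add(name)

    def validate(self, name):
        # Clear: validated data no longer carries the mark.
        self.tainted.discard(name)

    def assign(self, dest, func, *srcs):
        # Propagate: the result is marked if any source is marked.
        self.values[dest] = func(*(self.values[s] for s in srcs))
        if any(s in self.tainted for s in srcs):
            self.tainted.add(dest)
        else:
            self.tainted.discard(dest)

    def check(self, name):
        # Check: True means the value is safe to use.
        return name not in self.tainted


def demo(validate_input):
    t = TaintTracker()
    t.const("a", 0)
    t.read_input("x", 3)                      # assumed input value
    if validate_input:
        t.validate("x")
    t.assign("y", lambda x: x * 1024, "x")
    t.assign("w", lambda x: x + 42, "x")
    t.assign("z", lambda y: y * 75, "y")
    t.assign("a", lambda a, y: a + y, "a", "y")
    return {name: t.check(name) for name in ("w", "z", "a")}
```

Without `validate(x)`, every check fails because the mark flows from `x` through `y` into `z` and `a`; with it, all three checks pass.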

Demand-Driven Dataflow Analysis
Only Analyze Shadowed Data
[Diagram: Meta-Data Detection; Non-Shadowed Data: Native Application; Shadowed Data: Instrumented Application; No meta-data: Native Application]

Finding Meta-Data
No additional overhead when no meta-data
  – Needs hardware support
Take a fault when touching shadowed data
Solution: Virtual Memory Watchpoints
[Diagram: V→P translation, FAULT]
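A page-granularity watchpoint scheme of this kind can be modeled in a few lines. The sketch below is a simulation in Python, not a real virtual-memory mechanism: `PAGE_SIZE`, the class `DemandDrivenMonitor`, and its return strings are assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size


class DemandDrivenMonitor:
    """Simulates marking pages that hold shadowed data so that touching them faults."""

    def __init__(self):
        self.shadowed = set()          # byte addresses that carry meta-data
        self.protected_pages = set()   # pages whose accesses fault
        self.faults = 0

    def shadow(self, addr):
        self.shadowed.add(addr)
        self.protected_pages.add(addr // PAGE_SIZE)

    def access(self, addr):
        if addr // PAGE_SIZE not in self.protected_pages:
            return "native"            # no fault, no analysis overhead
        self.faults += 1
        if addr in self.shadowed:
            return "instrumented"      # hand the access to the analysis tool
        return "false fault"           # same page, but no meta-data on this byte
```

The third outcome shows the cost of coarse granularity: an access to unshadowed data still faults when it shares a page with shadowed data.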

Results by Ho et al.
From “Practical Taint-Based Protection using Demand Emulation”

Outline
Problem Statement
Background Information
  – Demand-Driven Dynamic Dataflow Analysis
Proposed Solutions
  – Demand-Driven Data Race Detection
  – Unlimited Hardware Watchpoints

Software Data Race Detection
Add checks around every memory access
Find inter-thread sharing events
Synchronization between write-shared accesses?
  – No? Data race.
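One common way to answer the "synchronization between write-shared accesses?" question is a happens-before check with vector clocks. The slide does not name an algorithm, so the following Python sketch is an assumption about how such a check could look; `RaceDetector` and its methods are hypothetical names.

```python
from collections import defaultdict


class RaceDetector:
    """Happens-before race check using one vector clock per thread."""

    def __init__(self, n_threads):
        self.clock = [[0] * n_threads for _ in range(n_threads)]
        for t in range(n_threads):
            self.clock[t][t] = 1
        self.lock_clock = {}                  # lock -> clock at last release
        self.last_write = {}                  # addr -> (tid, time)
        self.last_reads = defaultdict(dict)   # addr -> {tid: time}
        self.races = []

    def _ordered_before(self, tid, time, current):
        # The earlier access is ordered if the current thread has seen its time.
        return time <= self.clock[current][tid]

    def acquire(self, tid, lock):
        seen = self.lock_clock.get(lock)
        if seen:
            self.clock[tid] = [max(a, b) for a, b in zip(self.clock[tid], seen)]

    def release(self, tid, lock):
        self.lock_clock[lock] = list(self.clock[tid])
        self.clock[tid][tid] += 1

    def read(self, tid, addr):
        w = self.last_write.get(addr)
        if w and w[0] != tid and not self._ordered_before(w[0], w[1], tid):
            self.races.append((addr, w[0], tid))
        self.last_reads[addr][tid] = self.clock[tid][tid]

    def write(self, tid, addr):
        prior = list(self.last_reads[addr].items())
        w = self.last_write.get(addr)
        if w:
            prior.append(w)
        for other, time in prior:
            if other != tid and not self._ordered_before(other, time, tid):
                self.races.append((addr, other, tid))
        self.last_write[addr] = (tid, self.clock[tid][tid])
        self.last_reads[addr].clear()
```

Every `read` and `write` call stands for one of the checks added around a memory access, which is where the slowdown of continuous analysis comes from.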

Data Race Detection
Thread 1 (mylen=small), Thread 2 (mylen=large); TIME runs downward

    if(ptr==NULL)
    len1=thread_local->mylen;
    ptr=malloc(len1);
    memcpy(ptr, data1, len1)
    len2=thread_local->mylen;
    ptr=malloc(len2);
    memcpy(ptr, data2, len2)

Shared? Synchronized?

SW Race Detection is Slow
[Chart: Phoenix and PARSEC benchmarks]

Inter-thread Sharing is What’s Important
“Data races... are failures in programs that access and update shared data in critical sections” – Netzer & Miller,

    if(ptr==NULL)
    len1=thread_local->mylen;
    ptr=malloc(len1);
    memcpy(ptr, data1, len1)
    len2=thread_local->mylen;
    ptr=malloc(len2);
    memcpy(ptr, data2, len2)

Thread-local data: NO SHARING
Shared data: NO INTER-THREAD SHARING EVENTS
TIME

Very Little Inter-Thread Sharing
[Chart: Phoenix and PARSEC benchmarks]

Use Demand-Driven Analysis!
[Diagram: Multi-threaded Application; Inter-thread Sharing Monitor; Local Access: Multi-threaded Application; Inter-thread sharing: Software Race Detector]

Finding Inter-thread Sharing
Virtual Memory Watchpoints?
– ~100% of accesses cause page faults
Granularity Gap
Per-process not per-thread
Must go through the kernel on faults
Syscalls for setting/removing meta-data
[Diagram: FAULT on Inter-Thread Sharing]

Hardware Sharing Detector
HITM in Cache: W→R Data Sharing
Hardware Performance Counters
[Diagram: Core 1 (Write Y=5, line state M) and Core 2 (Read Y, HITM); line states S, I, M; Pipeline, Cache, Perf. Ctrs; PEBS Armed; Debug Store holding EFLAGS, EIP, RegVals, MemInfo; Precise Fault]
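The W→R behaviour of a HITM event can be illustrated with a toy coherence model. This Python sketch is a heavy simplification under stated assumptions: it tracks only M and S states per core, counts an event only when a read finds the line modified in another core's cache, and uses hypothetical names (`CoherenceModel`, `evict`). It is not a model of any specific processor.

```python
class CoherenceModel:
    """Toy per-line cache states; records a HITM when a read hits another core's M line."""

    def __init__(self, n_cores):
        self.state = [dict() for _ in range(n_cores)]   # per core: line -> "M" or "S"
        self.hitm_events = []

    def write(self, core, line):
        for other, cache in enumerate(self.state):
            if other != core:
                cache.pop(line, None)                   # invalidate other copies
        self.state[core][line] = "M"

    def read(self, core, line):
        for other, cache in enumerate(self.state):
            if other != core and cache.get(line) == "M":
                self.hitm_events.append((line, other, core))
                cache[line] = "S"                       # writer downgrades to shared
        self.state[core].setdefault(line, "S")

    def evict(self, core, line):
        self.state[core].pop(line, None)
```

In this model a write followed by a remote read produces an event, while read-then-write and write-then-write sharing do not, and an eviction between the write and the read hides the sharing entirely.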

Potential Accuracy & Perf. Problems
Limitations of Performance Counters
  – HITM only finds W→R Data Sharing
  – Hardware prefetcher events aren’t counted
Limitations of Cache Events
  – SMT sharing can’t be counted
  – Cache eviction causes missed events
  – False sharing, etc…
PEBS events still go through the kernel

On-Demand Analysis on Real HW
[Flowchart: Execute Instruction; Analysis Enabled? NO (> 97%): HITM Interrupt? then Enable Analysis; YES (< 3%): SW Race Detection, Sharing Recently? otherwise Disable Analysis]
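The flowchart amounts to a two-state controller. The Python sketch below is one possible reading of it: the class `OnDemandController` is hypothetical, and `quiet_limit` is an assumed stand-in for "sharing recently", which the slide does not quantify.

```python
class OnDemandController:
    """Enables race detection on a HITM interrupt, disables it after a quiet period."""

    def __init__(self, quiet_limit=3):
        self.enabled = False
        self.quiet_limit = quiet_limit   # assumed meaning of "recently"
        self.quiet = 0
        self.analyzed = 0

    def step(self, hitm_interrupt, sharing_seen=False):
        if not self.enabled:
            if hitm_interrupt:
                self.enabled = True      # Enable Analysis
                self.quiet = 0
            return "native"              # this instruction ran at full speed
        self.analyzed += 1               # SW Race Detection runs on this instruction
        if sharing_seen:
            self.quiet = 0
        else:
            self.quiet += 1
        if self.quiet >= self.quiet_limit:
            self.enabled = False         # Disable Analysis
        return "analyzed"
```

In this sketch the instruction that raises the interrupt itself runs natively and analysis starts with the next one; a real tool could equally replay the faulting access, which the slide does not specify.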

Performance Difference
[Chart: Phoenix and PARSEC benchmarks]

Performance Increases
[Chart: Phoenix and PARSEC benchmarks; labeled 51x]

Demand-Driven Analysis Accuracy
Accuracy vs. Continuous Analysis: 97%
[Chart labels: 1/1, 2/4, 3/3, 4/4, 3/3, 4/4, 2/4, 4/4, 2/4]

Outline
Problem Statement
Background Information
  – Demand-Driven Dynamic Dataflow Analysis
Proposed Solutions
  – Demand-Driven Data Race Detection
  – Unlimited Hardware Watchpoints

Watchpoints Globally Useful
Byte/Word Accurate and Per-Thread

Watchpoint-Based Software Analyses
Taint Analysis
Data Race Detection
Deterministic Execution
Canary-Based Bounds Checking
Speculative Program Optimization
Hybrid Transactional Memory

Challenges
Some analyses require watchpoint ranges
  – Better stored as base + length
Some need large # of small watchpoints
  – Better stored as bitmaps
Need a large number
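The two storage formats trade off differently, which a short sketch makes concrete. This is an illustrative Python model, not the proposed hardware: `RangeStore` and `BitmapStore` are hypothetical names, and the range store assumes that ranges never overlap.

```python
import bisect


class RangeStore:
    """Watchpoints kept as sorted (base, length) pairs; assumes non-overlapping ranges."""

    def __init__(self):
        self.bases = []
        self.lengths = []

    def add(self, base, length):
        i = bisect.bisect_left(self.bases, base)
        self.bases.insert(i, base)
        self.lengths.insert(i, length)

    def watched(self, addr):
        # Find the last range starting at or before addr, then test its extent.
        i = bisect.bisect_right(self.bases, addr) - 1
        return i >= 0 and addr < self.bases[i] + self.lengths[i]


class BitmapStore:
    """One bit per byte address; cheap for many small, scattered watchpoints."""

    def __init__(self, size):
        self.bits = bytearray((size + 7) // 8)

    def add(self, addr):
        self.bits[addr // 8] |= 1 << (addr % 8)

    def watched(self, addr):
        return bool(self.bits[addr // 8] & (1 << (addr % 8)))
```

A single large region costs one entry in `RangeStore` but one bit per byte in `BitmapStore`, while thousands of isolated bytes cost one entry each in `RangeStore` and nothing extra in an already allocated bitmap.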

The Best of Both Worlds
Store Watchpoints in Main Memory
Cache watchpoints on-chip

Demand-Driven Taint Analysis

Watchpoint-Based Data Race Detection
[Chart: Phoenix and PARSEC benchmarks]

Watchpoint Deterministic Execution
[Chart: Phoenix and SPEC OMP2001 benchmarks]

BACKUP SLIDES