On-Demand Dynamic Software Analysis


On-Demand Dynamic Software Analysis
Joseph L. Greathouse, Ph.D. Candidate
Advanced Computer Architecture Laboratory, University of Michigan
December 12, 2011
Notes: This talk was given for the AMD Tech Topics series on December 12, 2011 (thanks to Matthew Stringer and Martin Pohlack). It is basically a mix of my ISCA 2011 talk and some very preliminary slides from my ASPLOS 2012 talk. See the ISCA slides for detailed notes on what all the animations mean.

Software Errors Abound
- NIST: SW errors cost the U.S. ~$60 billion/year as of 2002
- FBI CCS: security issues cost $67 billion/year as of 2005
  - More than ⅓ from viruses, network intrusion, etc.
(Chart: Cataloged Software Vulnerabilities)

Hardware Plays a Role
In spite of proposed solutions:
- Bulk Memory Commits
- Hardware Data Race Recording
- Deterministic Execution/Replay
- Transactional Memory (AMD ASF? IBM BG/Q)
- Bug-Free Memory Models
- Atomicity Violation Detectors

Example of a Modern Bug
Nov. 2010 OpenSSL Security Flaw
Thread 1: mylen=small
Thread 2: mylen=large
Shared pointer: ptr = ∅
Code run by both threads:
    if(ptr == NULL) {
        len=thread_local->mylen;
        ptr=malloc(len);
        memcpy(ptr, data, len);
    }

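Example of a Modern Bug
Thread 1: mylen=small. Thread 2: mylen=large. ptr starts as ∅.
Interleaving, in TIME order:
1. Thread 1: if(ptr==NULL)
2. Thread 2: if(ptr==NULL)
3. Thread 2: len2=thread_local->mylen;
4. Thread 2: ptr=malloc(len2);
5. Thread 1: len1=thread_local->mylen;
6. Thread 1: ptr=malloc(len1);
7. Thread 1: memcpy(ptr, data1, len1)
8. Thread 2: memcpy(ptr, data2, len2)
Result: the buffer allocated in step 4 is LEAKED, and step 8 copies len2 (large) bytes into the len1 (small) buffer allocated in step 6.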
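The interleaving on this slide can be replayed deterministically. The following is a minimal sketch in Python (not the C of the original bug); the `Heap` class, the `run_interleaving` function, and the sizes 8 and 64 are assumptions made up for illustration. It models `malloc` as handing out numbered allocations and `memcpy` as a size comparison, which is enough to show both the leak and the overflow.

```python
class Heap:
    """Toy allocator: records the size of every allocation (hypothetical model)."""

    def __init__(self):
        self.sizes = {}    # allocation id -> size in bytes
        self.next_id = 0

    def malloc(self, size):
        self.next_id += 1
        self.sizes[self.next_id] = size
        return self.next_id


def run_interleaving(small=8, large=64):
    heap = Heap()
    ptr = None                      # shared pointer, initially NULL

    # Both threads evaluate if(ptr==NULL) before either assigns ptr.
    t1_saw_null = ptr is None
    t2_saw_null = ptr is None

    # Thread 2 reads its thread-local length and allocates first.
    len2 = large
    if t2_saw_null:
        ptr = heap.malloc(len2)

    # Thread 1 then allocates and overwrites the shared pointer.
    len1 = small
    if t1_saw_null:
        ptr = heap.malloc(len1)

    # Any allocation no longer reachable through ptr is leaked.
    leaked = [a for a in heap.sizes if a != ptr]

    # Each memcpy overflows if it copies more bytes than ptr's buffer holds.
    t1_overflow = len1 > heap.sizes[ptr]
    t2_overflow = len2 > heap.sizes[ptr]
    return {
        "leaked": leaked,
        "t1_overflow": t1_overflow,
        "t2_overflow": t2_overflow,
        "overflow_bytes": len2 - heap.sizes[ptr],
    }


result = run_interleaving()
print(result)
```

With the assumed sizes, Thread 2's allocation is the leaked one and its copy runs past the end of Thread 1's smaller buffer.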

Dynamic Software Analysis
- Analyze the program as it runs
  - Pro: system state, find errors on any executed path
  - Con: LARGE runtime overheads, only test one path
(Diagram: the developer's program passes through analysis instrumentation; the instrumented program runs on in-house test machine(s) with a LONG run time and produces analysis results.)

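Runtime Overheads: How Large?
Analyses shown:
- Data Race Detection (e.g. Inspector XE)
- Memory Checking (e.g. MemCheck)
- Taint Analysis (e.g. TaintCheck)
- Dynamic Bounds Checking
- Symbolic Execution: 10-200x
Slowdowns shown for the first four analyses (the pairing is not preserved in this transcript): 2-300x, 2-200x, 5-50x, 10-80x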

Could use Hardware
- Data Race Detection: HARD, CORD, etc.
- Taint Analysis: Raksha, FlexiTaint, etc.
- Bounds Checking: HardBound
- None Currently Exist; Bugs Are Here Now
- Single-Use Specialization
  - Won't be built due to HW, power, verification costs
  - Unchangeable algorithms locked in HW

Goals of this Talk
- Accelerate SW Analyses Using Existing HW
- Run Tests On Demand: Only When Needed
- Explore Future Generic HW Additions

Outline
- Problem Statement
- Background Information
  - Demand-Driven Dynamic Dataflow Analysis
- Proposed Solutions
  - Demand-Driven Data Race Detection
  - Unlimited Hardware Watchpoints

Example Dynamic Dataflow Analysis
Input and the meta-data operation applied at each step:
- x = read_input(): Associate
- validate(x): Clear
- y = x * 1024: Propagate
- w = x + 42: Check w
- a += y
- z = y * 75: Check z
- Check a
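The Associate, Clear, Propagate, and Check operations on this slide can be sketched as a tiny taint tracker. This is an illustrative Python sketch, not the tooling used in the talk; `TaintTracker` and `run` are hypothetical names, and meta-data is reduced to one bit per variable name.

```python
class TaintTracker:
    """One taint bit per variable name (a deliberately simplified model)."""

    def __init__(self):
        self.tainted = set()   # variables that currently carry meta-data
        self.alerts = []       # variables that failed a check

    def associate(self, dst):
        self.tainted.add(dst)

    def clear(self, name):
        self.tainted.discard(name)

    def propagate(self, dst, *srcs):
        # The destination is tainted exactly when some source is tainted.
        if any(s in self.tainted for s in srcs):
            self.tainted.add(dst)
        else:
            self.tainted.discard(dst)

    def check(self, name):
        if name in self.tainted:
            self.alerts.append(name)
            return True
        return False


def run(validate):
    t = TaintTracker()
    t.associate("x")              # x = read_input()
    if validate:
        t.clear("x")              # validate(x)
    t.propagate("y", "x")         # y = x * 1024
    t.propagate("w", "x")         # w = x + 42
    t.check("w")
    t.propagate("a", "a", "y")    # a += y
    t.propagate("z", "y")         # z = y * 75
    t.check("z")
    t.check("a")
    return t.alerts


print(run(validate=False))
print(run(validate=True))
```

Without the validation step every check fires, because the taint on `x` reaches `w`, `z`, and `a`; with it, none do.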

Demand-Driven Dataflow Analysis
- Only Analyze Shadowed Data
(Diagram: a meta-data detection step sits between the native application and the instrumented application. Non-shadowed data, with no meta-data, runs in the native application; shadowed data runs in the instrumented application.)

Finding Meta-Data
- No additional overhead when no meta-data
  - Needs hardware support
- Take a fault when touching shadowed data
- Solution: Virtual Memory Watchpoints
(Diagram: virtual-to-physical (V→P) translations; the translation for a shadowed page raises a FAULT.)
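A rough sketch of the virtual-memory-watchpoint idea, in Python: pages that hold shadowed data are marked as protected, so only accesses to those pages fault into the instrumented path. The class name `DemandDrivenMonitor`, the 4096-byte page size, and the per-access mode switch are assumptions for illustration; a real system would stay in the instrumented version for longer than a single access.

```python
PAGE_SIZE = 4096  # assumed page size


class DemandDrivenMonitor:
    """Page-granularity watchpoints guarding byte-granularity meta-data."""

    def __init__(self):
        self.shadowed_bytes = set()    # addresses that really carry meta-data
        self.protected_pages = set()   # pages whose accesses fault
        self.faults = 0
        self.false_faults = 0          # faults on bytes without meta-data

    def shadow(self, addr):
        self.shadowed_bytes.add(addr)
        self.protected_pages.add(addr // PAGE_SIZE)

    def access(self, addr):
        """Return which version of the application handles this access."""
        if addr // PAGE_SIZE not in self.protected_pages:
            return "native"            # no fault, no analysis overhead
        self.faults += 1
        if addr not in self.shadowed_bytes:
            self.false_faults += 1     # same page, but unshadowed byte
        return "instrumented"


m = DemandDrivenMonitor()
m.shadow(0x1000)
print(m.access(0x1000), m.access(0x1008), m.access(0x5000))
print(m.faults, m.false_faults)
```

The access to `0x1008` shows the cost of page granularity: it faults even though that byte has no meta-data.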

Results by Ho et al.
From “Practical Taint-Based Protection using Demand Emulation”

System                      Slowdown (normalized)
Taint Analysis              101.7x
On-Demand Taint Analysis    1.98x

Outline
- Problem Statement
- Background Information
  - Demand-Driven Dynamic Dataflow Analysis
- Proposed Solutions
  - Demand-Driven Data Race Detection
  - Unlimited Hardware Watchpoints

Software Data Race Detection
- Add checks around every memory access
- Find inter-thread sharing events
- Synchronization between write-shared accesses?
  - No? Data race.
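The check described on this slide can be sketched with vector clocks. This is a simplified happens-before detector written for illustration, not the algorithm of any tool named in the talk; `RaceDetector` and its method names are hypothetical, and locks are the only synchronization it models.

```python
from collections import defaultdict


class RaceDetector:
    """Simplified happens-before race detection using vector clocks."""

    def __init__(self, n_threads):
        self.clock = [[0] * n_threads for _ in range(n_threads)]
        for t in range(n_threads):
            self.clock[t][t] = 1
        self.lock_clock = defaultdict(lambda: [0] * n_threads)
        self.last_write = {}                  # addr -> (thread, clock copy)
        self.last_reads = defaultdict(dict)   # addr -> {thread: clock copy}
        self.races = []

    def _ordered(self, prev_thread, prev_clock, cur_thread):
        # The earlier access happens-before the current one if the current
        # thread has already observed the earlier thread's timestamp.
        return prev_clock[prev_thread] <= self.clock[cur_thread][prev_thread]

    def acquire(self, t, lock):
        merged = zip(self.clock[t], self.lock_clock[lock])
        self.clock[t] = [max(a, b) for a, b in merged]

    def release(self, t, lock):
        self.lock_clock[lock] = list(self.clock[t])
        self.clock[t][t] += 1

    def read(self, t, addr):
        w = self.last_write.get(addr)
        if w and w[0] != t and not self._ordered(w[0], w[1], t):
            self.races.append(("write-read", addr, w[0], t))
        self.last_reads[addr][t] = list(self.clock[t])

    def write(self, t, addr):
        w = self.last_write.get(addr)
        if w and w[0] != t and not self._ordered(w[0], w[1], t):
            self.races.append(("write-write", addr, w[0], t))
        for rt, rc in self.last_reads[addr].items():
            if rt != t and not self._ordered(rt, rc, t):
                self.races.append(("read-write", addr, rt, t))
        self.last_write[addr] = (t, list(self.clock[t]))
        self.last_reads[addr].clear()


# Unsynchronized check-then-allocate on a shared pointer.
d = RaceDetector(2)
d.read(0, "ptr")
d.read(1, "ptr")
d.write(1, "ptr")
d.write(0, "ptr")
print(d.races)
```

If each thread instead wraps its read and write of `ptr` in `acquire`/`release` on the same lock, the second thread's clock absorbs the first thread's timestamp and no race is reported.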

Data Race Detection
Questions asked at each access: Shared? Synchronized?
Thread 1 (mylen=small), over TIME:
    if(ptr==NULL)
    len1=thread_local->mylen;
    ptr=malloc(len1);
    memcpy(ptr, data1, len1)
Thread 2 (mylen=large), over TIME:
    if(ptr==NULL)
    len2=thread_local->mylen;
    ptr=malloc(len2);
    memcpy(ptr, data2, len2)

SW Race Detection is Slow
(Chart over the Phoenix and PARSEC benchmark suites.)

Inter-thread Sharing is What’s Important
“Data races ... are failures in programs that access and update shared data in critical sections” (Netzer & Miller, 1992)
Annotations on the timeline:
- Thread-local data: NO SHARING
- Shared data: NO INTER-THREAD SHARING EVENTS
Thread 1, over TIME:
    if(ptr==NULL)
    len1=thread_local->mylen;
    ptr=malloc(len1);
    memcpy(ptr, data1, len1)
Thread 2, over TIME:
    if(ptr==NULL)
    len2=thread_local->mylen;
    ptr=malloc(len2);
    memcpy(ptr, data2, len2)

Very Little Inter-Thread Sharing
(Chart over the Phoenix and PARSEC benchmark suites.)

Use Demand-Driven Analysis!
(Diagram: an inter-thread sharing monitor watches the multi-threaded application. Local accesses run in the multi-threaded application without analysis; inter-thread sharing is handed to the software race detector.)

Finding Inter-thread Sharing
- Virtual Memory Watchpoints?
  - ~100% of accesses cause page faults
- Granularity Gap
- Per-process not per-thread
- Must go through the kernel on faults
  - Syscalls for setting/removing meta-data
(Diagram: inter-thread sharing, with a FAULT on each access.)

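Hardware Sharing Detector
- HITM in Cache: W→R Data Sharing
- Hardware Performance Counters
(Diagram, as far as the transcript preserves it: Core 1 writes Y=5, so the line is Modified (M) in its cache and Invalid (I) in Core 2's. Core 2 then reads Y, which is a HITM, and both copies end up Shared (S). A performance counter armed at -1 counts the event; PEBS writes EFLAGS, EIP, RegVals, and MemInfo to the Debug Store, and the pipeline takes a precise FAULT.)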
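Why a HITM event signals write-to-read sharing can be seen in a toy coherence model. The sketch below is an assumption-laden simplification in Python: it uses only Modified/Shared/Invalid states, counts HITM on loads only, and the names `CoherenceModel`, `hitm_loads`, and `evict` are invented for illustration rather than taken from any real performance-counter interface.

```python
class CoherenceModel:
    """Toy MSI model: a load that hits a remote Modified line counts as HITM."""

    def __init__(self, n_cores=2):
        # Per-core map of line -> "M" or "S"; a missing entry means Invalid.
        self.state = [dict() for _ in range(n_cores)]
        self.hitm_loads = 0

    def write(self, core, line):
        for other, st in enumerate(self.state):
            if other != core:
                st.pop(line, None)        # invalidate remote copies
        self.state[core][line] = "M"

    def read(self, core, line):
        if line in self.state[core]:
            return                        # local hit: no coherence event
        for other, st in enumerate(self.state):
            if other != core and st.get(line) == "M":
                self.hitm_loads += 1      # load found the line Modified elsewhere
                st[line] = "S"            # remote copy is downgraded
        self.state[core][line] = "S"

    def evict(self, core, line):
        self.state[core].pop(line, None)  # write-back details are not modeled


def hitm_count(ops):
    """Run (op, core) pairs against line 'Y' and return the HITM count."""
    c = CoherenceModel()
    for op, core in ops:
        getattr(c, op)(core, "Y")
    return c.hitm_loads


print(hitm_count([("write", 0), ("read", 1)]))                  # W→R
print(hitm_count([("read", 0), ("write", 1)]))                  # R→W
print(hitm_count([("write", 0), ("evict", 0), ("read", 1)]))    # evicted first
```

In this model only the write-then-read sequence raises the count; read-then-write sharing and sharing that follows an eviction go unseen.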

Potential Accuracy & Perf. Problems
- Limitations of Performance Counters
  - HITM only finds W→R Data Sharing
  - Hardware prefetcher events aren’t counted
- Limitations of Cache Events
  - SMT sharing can’t be counted
  - Cache eviction causes missed events
  - False sharing, etc…
- PEBS events still go through the kernel

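On-Demand Analysis on Real HW
Flowchart, per executed instruction:
1. Execute Instruction.
2. Analysis Enabled?
   - NO: HITM Interrupt?
     - NO: go back to step 1 (> 97%).
     - YES: Enable Analysis, then run SW Race Detection.
   - YES: run SW Race Detection (< 3%).
3. After SW Race Detection: Sharing Recently?
   - YES: go back to step 1 with analysis still enabled.
   - NO: Disable Analysis, then go back to step 1.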
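The enable/disable loop of this flowchart can be sketched as a small state machine. This Python sketch makes several assumptions that are not in the slides: each access is reduced to a boolean saying whether it is an inter-thread sharing event, "sharing recently" means within the last `quiet_limit` analyzed accesses, and the access that raises the interrupt is not itself analyzed. `run_on_demand` is a hypothetical name.

```python
def run_on_demand(accesses, quiet_limit=3):
    """Count how many accesses run under the software race detector.

    accesses: iterable of booleans, True when the access is an
    inter-thread sharing event (seen as a HITM interrupt while analysis
    is off, or observed by the detector while analysis is on).
    """
    enabled = False
    since_sharing = 0
    analyzed = 0
    for shares in accesses:
        if not enabled:
            if shares:                 # HITM interrupt: turn analysis on
                enabled = True
                since_sharing = 0
            continue                   # this access ran natively
        analyzed += 1                  # software race detection runs here
        since_sharing = 0 if shares else since_sharing + 1
        if since_sharing >= quiet_limit:
            enabled = False            # no sharing recently: turn it off
    return analyzed


trace = [False, False, True, False, True, False, False, False, False, True]
print(run_on_demand(trace))
```

A trace with no sharing at all never enables the detector, which is the case the slide labels as the common one.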

Performance Difference
(Chart over the Phoenix and PARSEC benchmark suites.)

Performance Increases
(Chart over the Phoenix and PARSEC benchmark suites; one value is labeled 51x.)

Demand-Driven Analysis Accuracy
(Chart labels: 1/1, 2/4, 2/4, 2/4, 3/3, 4/4, 4/4, 3/3, 4/4, 4/4, 4/4)
Accuracy vs. Continuous Analysis: 97%

Outline
- Problem Statement
- Background Information
  - Demand-Driven Dynamic Dataflow Analysis
- Proposed Solutions
  - Demand-Driven Data Race Detection
  - Unlimited Hardware Watchpoints

Watchpoints Globally Useful
- Byte/Word Accurate and Per-Thread

Watchpoint-Based Software Analyses
- Taint Analysis
- Data Race Detection
- Deterministic Execution
- Canary-Based Bounds Checking
- Speculative Program Optimization
- Hybrid Transactional Memory

Challenges
- Some analyses require watchpoint ranges
  - Better stored as base + length
- Some need large # of small watchpoints
  - Better stored as bitmaps
- Need a large number
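The two storage forms named on this slide can sit side by side in one lookup structure. The following Python sketch is only an illustration of that idea, not the hardware design from the talk: `WatchpointStore`, the 4096-byte page size, and the linear scan over ranges are all assumptions.

```python
PAGE = 4096  # assumed page size; one bitmap bit per byte in a page


class WatchpointStore:
    """Ranges stored as (base, length); scattered bytes stored as bitmaps."""

    def __init__(self):
        self.ranges = []     # list of (base, length) tuples
        self.bitmaps = {}    # page number -> int used as a bit vector

    def watch_range(self, base, length):
        # One entry covers the whole range, however large it is.
        self.ranges.append((base, length))

    def watch_byte(self, addr):
        page, offset = divmod(addr, PAGE)
        self.bitmaps[page] = self.bitmaps.get(page, 0) | (1 << offset)

    def is_watched(self, addr):
        if any(base <= addr < base + length for base, length in self.ranges):
            return True
        page, offset = divmod(addr, PAGE)
        return bool((self.bitmaps.get(page, 0) >> offset) & 1)


s = WatchpointStore()
s.watch_range(0x10000, 1 << 20)    # one megabyte, one entry
s.watch_byte(0x2003)               # a single byte, one bit
print(s.is_watched(0x10000), s.is_watched(0x2003), s.is_watched(0x2004))
```

The megabyte range costs a single entry, where a bitmap would need a bit per byte, while the lone byte costs one bit, where a range list would need an entry per watchpoint.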

The Best of Both Worlds
- Store Watchpoints in Main Memory
- Cache watchpoints on-chip

Demand-Driven Taint Analysis

Watchpoint-Based Data Race Detection
(Chart over the Phoenix and PARSEC benchmark suites.)

Watchpoint Deterministic Execution
(Chart over the Phoenix and SPEC OMP2001 benchmark suites.)

BACKUP SLIDES