On-Demand Dynamic Software Analysis

Slides:



Advertisements
Similar presentations
Transactional Memory Parag Dixit Bruno Vavala Computer Architecture Course, 2012.
Advertisements

Software & Services Group PinPlay: A Framework for Deterministic Replay and Reproducible Analysis of Parallel Programs Harish Patil, Cristiano Pereira,
UW-Madison Computer Sciences Multifacet Group© 2011 Karma: Scalable Deterministic Record-Replay Arkaprava Basu Jayaram Bobba Mark D. Hill Work done at.
Exploring P4 Trace Cache Features Ed Carpenter Marsha Robinson Jana Wooten.
Thread Criticality Predictors for Dynamic Performance, Power, and Resource Management in Chip Multiprocessors Abhishek Bhattacharjee Margaret Martonosi.
Scalable Multi-Cache Simulation Using GPUs Michael Moeng Sangyeun Cho Rami Melhem University of Pittsburgh.
Data Marshaling for Multi-Core Architectures M. Aater Suleman Onur Mutlu Jose A. Joao Khubaib Yale N. Patt.
Recording Inter-Thread Data Dependencies for Deterministic Replay Tarun GoyalKevin WaughArvind Gopalakrishnan.
[ 1 ] Agenda Overview of transactional memory (now) Two talks on challenges of transactional memory Rebuttals/panel discussion.
Lock vs. Lock-Free memory Fahad Alduraibi, Aws Ahmad, and Eman Elrifaei.
Yuanyuan ZhouUIUC-CS Architectural Support for Software Bug Detection Yuanyuan (YY) Zhou and Josep Torrellas University of Illinois at Urbana-Champaign.
Unbounded Transactional Memory Paper by Ananian et al. of MIT CSAIL Presented by Daniel.
MemTracker Efficient and Programmable Support for Memory Access Monitoring and Debugging Guru Venkataramani, Brandyn Roemer, Yan Solihin, Milos Prvulovic.
RCDC SLIDES README Font Issues – To ensure that the RCDC logo appears correctly on all computers, it is represented with images in this presentation. This.
Rapid Identification of Architectural Bottlenecks via Precise Event Counting John Demme, Simha Sethumadhavan Columbia University
What is the Cost of Determinism?
15-740/ Oct. 17, 2012 Stefan Muller.  Problem: Software is buggy!  More specific problem: Want to make sure software doesn’t have bad property.
Parallelizing Security Checks on Commodity Hardware E.B. Nightingale, D. Peek, P.M. Chen and J. Flinn U Michigan.
- 1 - Dongyoon Lee †, Mahmoud Said*, Satish Narayanasamy †, Zijiang James Yang*, and Cristiano L. Pereira ‡ University of Michigan, Ann Arbor † Western.
Reduced Hardware NOrec: A Safe and Scalable Hybrid Transactional Memory Alexander Matveev Nir Shavit MIT.
NATIONAL INSTITUTE OF TECHNOLOGY KARNATAKA,SURATHKAL Presentation on ZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS Publisher’s:
A Case for Unlimited Watchpoints Joseph L. Greathouse †, Hongyi Xin*, Yixin Luo †‡, Todd Austin † † University of Michigan ‡ Shanghai Jiao Tong University.
COMP 111 Threads and concurrency Sept 28, Tufts University Computer Science2 Who is this guy? I am not Prof. Couch Obvious? Sam Guyer New assistant.
Colorama: Architectural Support for Data-Centric Synchronization Luis Ceze, Pablo Montesinos, Christoph von Praun, and Josep Torrellas, HPCA 2007 Shimin.
Aritra Sengupta, Swarnendu Biswas, Minjia Zhang, Michael D. Bond and Milind Kulkarni ASPLOS 2015, ISTANBUL, TURKEY Hybrid Static-Dynamic Analysis for Statically.
On-Demand Dynamic Software Analysis Joseph L. Greathouse Ph.D. Candidate Advanced Computer Architecture Laboratory University of Michigan December 12,
Accelerating Dynamic Software Analyses Joseph L. Greathouse Ph.D. Candidate Advanced Computer Architecture Laboratory University of Michigan December 1,
Highly Scalable Distributed Dataflow Analysis Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan Chelsea LeBlancTodd.
Drinking from Both Glasses: Adaptively Combining Pessimistic and Optimistic Synchronization for Efficient Parallel Runtime Support Man Cao Minjia Zhang.
An Integrated Framework for Dependable and Revivable Architecture Using Multicore Processors Weidong ShiMotorola Labs Hsien-Hsin “Sean” LeeGeorgia Tech.
Hardware Mechanisms for Distributed Dynamic Software Analysis Joseph L. Greathouse Advisor: Prof. Todd Austin May 10, 2012.
Sound and Precise Analysis of Parallel Programs through Schedule Specialization Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang Columbia University.
Testudo: Heavyweight Security Analysis via Statistical Sampling Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan Ilya.
Hardware Support for On-Demand Software Analysis Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan December 8, 2011.
Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia.
HARD: Hardware-Assisted lockset- based Race Detection P.Zhou, R.Teodorescu, Y.Zhou. HPCA’07 Shimin Chen LBA Reading Group Presentation.
Grigore Rosu Founder, President and CEO Professor of Computer Science, University of Illinois
On-Demand Dynamic Software Analysis Joseph L. Greathouse Ph.D. Candidate Advanced Computer Architecture Laboratory University of Michigan November 29,
The Potential of Sampling for Dynamic Analysis Joseph L. GreathouseTodd Austin Advanced Computer Architecture Laboratory University of Michigan PLAS, San.
Demand-Driven Software Race Detection using Hardware Performance Counters Joseph L. Greathouse †, Zhiqiang Ma ‡, Matthew I. Frank ‡ Ramesh Peri ‡, Todd.
Flashback : A Lightweight Extension for Rollback and Deterministic Replay for Software Debugging Sudarshan M. Srinivasan, Srikanth Kandula, Christopher.
KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association SYSTEM ARCHITECTURE GROUP DEPARTMENT OF COMPUTER.
E Virtual Machines Lecture 6 Topics in Virtual Machine Management Scott Devine VMware, Inc.
GangES: Gang Error Simulation for Hardware Resiliency Evaluation Siva Hari 1, Radha Venkatagiri 2, Sarita Adve 2, Helia Naeimi 3 1 NVIDIA Research, 2 University.
LASER: Light, Accurate Sharing dEtection and Repair Liang Luo, Akshitha Sriraman, Brooke Fugate, Shiliang Hu, Chris J Newburn, Gilles Pokam, Joseph Devietti.
Kendo: Efficient Deterministic Multithreading in Software M. Olszewski, J. Ansel, S. Amarasinghe MIT to be presented in ASPLOS 2009 slides by Evangelos.
A Case for Unlimited Watchpoints
On-Demand Dynamic Software Analysis
Irina Calciu Justin Gottschlich Tatiana Shpeisman Gilles Pokam
Presented by Mike Marty
Lazy Preemption to Enable Path-Based Analysis of Interrupt-Driven Code
Tong Zhang, Dongyoon Lee, Changhee Jung
Minh, Trautmann, Chung, McDonald, Bronson, Casper, Kozyrakis, Olukotun
PHyTM: Persistent Hybrid Transactional Memory
CSC 591/791 Reliable Software Systems
Olatunji Ruwase* Shimin Chen+ Phillip B. Gibbons+ Todd C. Mowry*
APEx: Automated Inference of Error Specifications for C APIs
Effective Data-Race Detection for the Kernel
runtime verification Brief Overview Grigore Rosu
Hardware Mechanisms for Distributed Dynamic Software Analysis
Introduction to Operating Systems
Jihyun Park, Changsun Park, Byoungju Choi, Gihun Chang
All You Ever Wanted to Know About Dynamic Taint Analysis & Forward Symbolic Execution (but might have been afraid to ask) Edward J. Schwartz, Thanassis.
CARP: Compression-Aware Replacement Policies
Mengjia Yan† , Jiho Choi† , Dimitrios Skarlatos,
Speculative execution and storage
Decomposing Hardware Lock Elision
TEE-Perf A Profiler for Trusted Execution Environments
Sampling Dynamic Dataflow Analyses
Presentation transcript:

On-Demand Dynamic Software Analysis Joseph L. Greathouse Ph.D. Candidate Advanced Computer Architecture Laboratory University of Michigan This talk was given at Intel Labs in Santa Clara on November 29, 2011. (Thanks to Gilles Pokam). This is basically a mix of my ISCA 2011 talk along with some preliminary slides about my ASPLOS 2012 paper. See those slides for detailed notes of what all the animations mean. November 29, 2011

Software Errors Abound NIST: SW errors cost U.S. ~$60 billion/year as of 2002 FBI CCS: Security Issues $67 billion/year as of 2005 >⅓ from viruses, network intrusion, etc. Cataloged Software Vulnerabilities

Hardware Plays a Role In spite of proposed solutions Bulk Memory Commits Hardware Data Race Recording Deterministic Execution/Replay TRANSACTIONAL MEMORY Bug-Free Memory Models Atomicity Violation Detectors AMD ASF? IBM BG/Q

Nov. 2010 OpenSSL Security Flaw Example of a Modern Bug Thread 1 Thread 2 mylen=small mylen=large if(ptr == NULL) { len=thread_local->mylen; ptr=malloc(len); memcpy(ptr, data, len); } Nov. 2010 OpenSSL Security Flaw ptr ∅

Example of a Modern Bug ptr ∅ TIME Thread 1 Thread 2 if(ptr==NULL) mylen=small mylen=large TIME if(ptr==NULL) if(ptr==NULL) len2=thread_local->mylen; ptr=malloc(len2); len1=thread_local->mylen; ptr=malloc(len1); memcpy(ptr, data1, len1) memcpy(ptr, data2, len2) ptr LEAKED ∅

Dynamic Software Analysis Analyze the program as it runs System state, find errors on any executed path LARGE runtime overheads, only test one path Analysis Instrumentation Developer In-House Test Machine(s) Program Instrumented Program Analysis Results LONG run time

Runtime Overheads: How Large? Data Race Detection (e.g. Inspector XE) Memory Checking (e.g. MemCheck) Taint Analysis (e.g.TaintCheck) Dynamic Bounds Checking 2-300x 2-200x 5-50x 10-80x Symbolic Execution 10-200x

Could use Hardware Data Race Detection: HARD, CORD, etc. Taint Analysis: Raksha, FlexiTaint, etc. Bounds Checking: HardBound None Currently Exist; Bugs Are Here Now Single-Use Specialization Won’t be built due to HW, power, verification costs Unchangeable algorithms locked in HW

Goals of this Talk Accelerate SW Analyses Using Existing HW Run Tests On Demand: Only When Needed Explore Future Generic HW Additions

Outline Problem Statement Background Information Proposed Solutions Demand-Driven Dynamic Dataflow Analysis Proposed Solutions Demand-Driven Data Race Detection Unlimited Hardware Watchpoints

Dynamic Dataflow Analysis Associate meta-data with program values Propagate/Clear meta-data while executing Check meta-data for safety & correctness Forms dataflows of meta/shadow information

Example: Taint Analysis Data Input Meta-data Associate x = read_input() x = read_input() validate(x) Clear Propagate y = x * 1024 y = x * 1024 w = x + 42 Check w Check w a += y z = y * 75 Check z Check z a += y z = y * 75 Check a Check a

Demand-Driven Dataflow Analysis Only Analyze Shadowed Data Instrumented Application Instrumented Application Native Application No meta-data Shadowed Data Non-Shadowed Data Meta-Data Detection

Finding Meta-Data No additional overhead when no meta-data Needs hardware support Take a fault when touching shadowed data Solution: Virtual Memory Watchpoints FAULT V→P V→P

Slowdown (normalized) Results by Ho et al. From “Practical Taint-Based Protection using Demand Emulation” System Slowdown (normalized) Taint Analysis 101.7x On-Demand Taint Analysis 1.98x

Outline Problem Statement Background Information Proposed Solutions Demand-Driven Dynamic Dataflow Analysis Proposed Solutions Demand-Driven Data Race Detection Unlimited Hardware Watchpoints

Software Data Race Detection Add checks around every memory access Find inter-thread sharing events Synchronization between write-shared accesses? No? Data race.

Example of Data Race Detection Thread 1 Thread 2 mylen=small mylen=large TIME if(ptr==NULL) len1=thread_local->mylen; Interleaved Synchronization? ptr write-shared? ptr=malloc(len1); memcpy(ptr, data1, len1) if(ptr==NULL) len2=thread_local->mylen; ptr=malloc(len2); memcpy(ptr, data2, len2)

SW Race Detection is Slow Phoenix PARSEC

Inter-thread Sharing is What’s Important “Data races ... are failures in programs that access and update shared data in critical sections” – Netzer & Miller, 1992 Thread-local data NO SHARING TIME if(ptr==NULL) len1=thread_local->mylen; Shared data NO INTER-THREAD SHARING EVENTS ptr=malloc(len1); memcpy(ptr, data1, len1) if(ptr==NULL) len2=thread_local->mylen; ptr=malloc(len2); memcpy(ptr, data2, len2)

Very Little Inter-Thread Sharing Phoenix PARSEC

Use Demand-Driven Analysis! Software Race Detector Software Race Detector Multi-threaded Application Inter-thread sharing Local Access Inter-thread Sharing Monitor

Finding Inter-thread Sharing Virtual Memory Watchpoints? ~100% of accesses cause page faults Granularity Gap Per-process not per-thread Must go through the kernel on faults Syscalls for setting/removing meta-data Inter-Thread Sharing FAULT FAULT

Hardware Sharing Detector Hardware Performance Counters Intel’s HITM event: W→R Data Sharing Perf. Ctrs PEBS Debug Store Pipeline 2 1 - EFLAGS EIP RegVals MemInfo FAULT -1 - Armed Cache 1 - Precise Fault - Core 1 Core 2 S S HITM M I

Potential Accuracy & Perf. Problems Limitations of Performance Counters HITM only finds W→R Data Sharing Hardware prefetcher events aren’t counted Limitations of Cache Events SMT sharing can’t be counted Cache eviction causes missed events False sharing, etc… PEBS events still go through the kernel

Demand-Driven Analysis on Real HW Execute Instruction NO HITM Interrupt? NO Analysis Enabled? Disable Analysis YES YES NO Sharing Recently? Enable Analysis SW Race Detection YES

Performance Increases Phoenix PARSEC 51x

Demand-Driven Analysis Accuracy 1/1 2/4 2/4 2/4 3/3 4/4 4/4 3/3 4/4 4/4 4/4 Accuracy vs. Continuous Analysis: 97%

Outline Problem Statement Background Information Proposed Solutions Demand-Driven Dynamic Dataflow Analysis Proposed Solutions Demand-Driven Data Race Detection Unlimited Hardware Watchpoints

Watchpoints Globally Useful Byte/Word Accurate and Per-Thread

Watchpoint-Based Software Analyses Taint analysis / Dynamic dataflow analysis Data Race Detection Deterministic Execution Canary-Based Bounds Checking Speculative Program Optimization Hybrid Transactional Memory

Challenges Some analyses require watchpoint ranges Better stored as base + length Some need large # of small watchpoints Better stored as bitmaps Need a large number

The Best of Both Worlds Store Watchpoints in Main Memory Cache watchpoints on-chip

Demand-Driven Taint Analysis

Watchpoint-Based Data Race Detection Phoenix PARSEC

Watchpoint-Based Grace System Phoenix SPEC OMP2001

BACKUP SLIDES

Performance Difference Phoenix PARSEC

Accuracy on Real Hardware kmeans facesim ferret freqmine vips x264 streamcluster W→W 1/1 (100%) 0/1 (0%) - R→W 2/2 (100%) 3/3 (100%) W→R 3/3/ (100%) kmeans facesim ferret freqmine vips x264 streamcluster W→W 1/1 (100%) 0/1 (0%) - R→W 2/2 (100%) 3/3 (100%) W→R 3/3/ (100%) Spider Monkey-0 Spider Monkey-1 Spider Monkey-2 NSPR-1 Memcached-1 Apache-1 W→W 9/9 (100%) 1/1 (100%) 3/3 (100%) - R→W 7/7 (100%) W→R 8/8 (100%) 2/2 (100%) 4/4 (100%)

Width Test In case the projector is clipping.