Parallelizing Dynamic Information Flow Tracking

Presentation transcript:

Parallelizing Dynamic Information Flow Tracking
Olatunji Ruwase*, Phillip B. Gibbons†, Todd C. Mowry*, Vijaya Ramachandran§, Shimin Chen†, Michael Kozuch†, Michael Ryan†
* Carnegie Mellon University, † Intel Research Pittsburgh, § University of Texas at Austin

Lifeguards: Pros and Cons
+ Monitors a running program in order to detect bugs and security attacks, e.g., detect any accesses to unallocated memory
+ Can run the lifeguard on a separate core
- Slows down the program: 3X to 30X program slowdown
Parallelize lifeguards to make them faster
[Diagram: program instructions 1 to 4 paired with the corresponding lifeguard work]
SPAA '08, June 14, 2008. Parallelizing Dynamic Information Flow Tracking. Olatunji Ruwase

TaintCheck: A Dynamic Information Flow Tracking Lifeguard
DIFT: propagation of taint status (TAINTED/UNTAINTED)
Catch security bugs [Newsome et al. NDSS '05]
Memcheck [Nethercote et al. PLDI '05]: memory bugs
Parallelism challenge: embarrassingly sequential lifeguards
[Diagram: an instruction sequence over PACKET, Mx, My, R1, and R2 that ends in a JMP]
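
The sequential baseline described on this slide can be sketched as follows. This is a minimal illustration, not the TaintCheck implementation: the event tuple format, the function name `taint_check`, and the sample trace are all assumptions made for the example. Only the rule itself (a destination takes on the taint of its sources, and a jump through a tainted location is flagged) comes from the slide.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical trace event: (op, dest, sources). Not a format from the talk.
Event = Tuple[str, str, Tuple[str, ...]]

def taint_check(trace: Iterable[Event], initially_tainted: Set[str]) -> List[int]:
    """Propagate taint one instruction at a time; return indices of tainted JMPs."""
    tainted = set(initially_tainted)
    alerts = []
    for i, (op, dest, sources) in enumerate(trace):
        if op == "jmp":
            # A jump through a tainted location is reported as a possible attack.
            if dest in tainted:
                alerts.append(i)
            continue
        # The destination becomes tainted iff at least one source is tainted.
        if any(s in tainted for s in sources):
            tainted.add(dest)
        else:
            tainted.discard(dest)
    return alerts

# Illustrative trace (assumed, not read off the slide diagram).
trace = [
    ("mov", "My", ("PACKET",)),
    ("mov", "R1", ("My",)),
    ("add", "R2", ("R1", "R2")),
    ("mov", "Mx", ("R2",)),
    ("jmp", "Mx", ()),
]
print(taint_check(trace, {"PACKET"}))
```

Each step reads the taint state left by the previous step, which is why the slide calls this kind of lifeguard embarrassingly sequential.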

A Parallel DIFT Algorithm
Symbolic Inheritance Tracking: O(n/p)
Inheritance Resolution: O(n/p)
Asymptotic linear speedup
[Diagram: an instruction stream of length n]

Symbolic Inheritance Tracking
Collapsed propagation chain
[Diagram: assignments among Mx, My, R1, R2, and R3 in segments j - 1, j, and j + 1]

Inheritance Resolution
Resolve segments in sequential order
Locations within a segment are resolved in parallel
[Diagram: inheritance among Mx, My, R1, R2, and R3 across segments j - 1, j, and j + 1]
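
The two phases on the last two slides can be sketched for the simple case where every instruction is a single-source copy. This is a minimal sketch under that assumption, not the authors' implementation: the names `track_segment` and `resolve`, the `(dest, src)` tuple format, and the sample segments are hypothetical. A thread pool stands in for the parallel workers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Copy = Tuple[str, str]  # hypothetical event: (dest, src) meaning dest = src

def track_segment(segment: List[Copy]) -> Dict[str, str]:
    """Phase 1: map each written location to the location, as of the start
    of the segment, that it ultimately inherits from."""
    inherits: Dict[str, str] = {}
    for dest, src in segment:
        # Collapse the chain: if src was written earlier in this segment,
        # inherit from whatever src itself inherited from.
        inherits[dest] = inherits.get(src, src)
    return inherits

def resolve(summaries: List[Dict[str, str]], taint: Dict[str, bool]) -> Dict[str, bool]:
    """Phase 2: apply the per-segment summaries in segment order."""
    taint = dict(taint)
    for inherits in summaries:
        # Every lookup reads the pre-segment state, so the entries are
        # independent of each other and could be resolved in parallel.
        updates = {dest: taint.get(src, False) for dest, src in inherits.items()}
        taint.update(updates)
    return taint

# Illustrative segments j - 1, j, j + 1 (assumed, not read off the slide).
segments = [
    [("R2", "R3")],
    [("R1", "R2"), ("Mx", "R1")],
    [("My", "Mx")],
]
with ThreadPoolExecutor() as pool:
    summaries = list(pool.map(track_segment, segments))  # segments are independent

final = resolve(summaries, {"R3": True})
print(summaries[1])
print(final)
```

In segment j the chain Mx = R1 = R2 collapses to a single entry saying that Mx inherits from R2, so resolution touches one entry per written location rather than one per instruction.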

Symbolic Inheritance Tracking (Harder Case)
Unary propagation [Costa et al. SOSP '05]
[Diagram: segments j - 1, j, and j + 1 over Mx, My, R1, and R2, including an addition and a JMP whose taint status is still unknown ("?")]

Inheritance Resolution (Harder Case)
Detect security attack
[Diagram: inheritance among Mx, My, R1, and R2 across segments j - 1, j, and j + 1, with the JMP check resolved]
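
One way to read the "JMP ?" on these slides is that a jump check cannot be decided during tracking and is deferred to resolution. The sketch below illustrates that idea only, and it is restricted to single-source moves, which is a simplification made for this example. How the talk handles the addition shown in the diagram is not recoverable from the transcript. The names `track`, `resolve_checks`, and the event format are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical event: ("mov", dest, src) or ("jmp", target, "").
Event = Tuple[str, str, str]
Summary = Tuple[Dict[str, str], List[str]]

def track(segment: List[Event]) -> Summary:
    """Phase 1: record inheritance plus, for each JMP, the segment-start
    location whose taint decides the check."""
    inherits: Dict[str, str] = {}
    pending: List[str] = []
    for op, a, b in segment:
        if op == "jmp":
            pending.append(inherits.get(a, a))  # taint unknown for now
        else:
            inherits[a] = inherits.get(b, b)
    return inherits, pending

def resolve_checks(summaries: List[Summary], taint: Dict[str, bool]) -> List[Tuple[int, str]]:
    """Phase 2: walk segments in order and settle the deferred JMP checks."""
    taint = dict(taint)
    attacks = []
    for j, (inherits, pending) in enumerate(summaries):
        # Deferred checks refer to the state at the start of segment j.
        for src in pending:
            if taint.get(src, False):
                attacks.append((j, src))
        updates = {dest: taint.get(src, False) for dest, src in inherits.items()}
        taint.update(updates)
    return attacks

# Illustrative segments (assumed, not read off the slide).
segments = [
    [("mov", "My", "PACKET")],
    [("mov", "R1", "My"), ("mov", "Mx", "R1"), ("jmp", "Mx", "")],
]
summaries = [track(s) for s in segments]
print(resolve_checks(summaries, {"PACKET": True}))
```

The jump in the second segment is reported once the taint of My at the start of that segment is known, which is only after the first segment has been resolved.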

Implementation: Parallel TaintCheck
Speedup achieved because inheritance information is smaller than the code segment
[Diagram: algorithm compared with implementation, using a master and parallel workers]

Achieving speedups with few workers
Constant factors
Inheritance info ~ 1/2 segment
Require up to 4 workers to match sequential performance
[Timeline diagram: sequential compared with 2 workers; time axis marked .5T, T, 1.5T; labels as extracted: Taint propagation, 1, Inheritance Tracking, 2, Inheritance Resolution]

Hybrid Parallelism
Use inheritance tracking as an accelerator for taint propagation
Achieves speedup even with 1 worker
[Timeline diagram: sequential, 1 worker, and 2 workers; time axis marked .5T, .75T, T, 1.5T]

Evaluation
Log Based Architectures [Chen et al. ISCA '08]
Simics simulation
16 core
64K execution window
10 SPEC 2000 integer benchmarks
[Diagram: application on Core 1 and lifeguard on Core 2, each running above the operating system; the log is captured and compressed, carried by log transport (e.g. L2 cache), then decompressed and dispatched]

Slowdown Improvement using Pure Parallelism
[Plot: slowdown against Number of Workers; 0 workers = Sequential]

gcc slowdown with few workers
[Plot]

Related Work
Sequential DIFT: [Suh et al. ASPLOS '04, Costa et al. SOSP '05, Newsome et al. NDSS '05, Nethercote et al. PLDI '07, Dalton et al. ISCA '07, Venkataramani et al. HPCA '08]
Parallel DIFT: Speck [Nightingale et al. ASPLOS '08]
  Parallel taint analysis lifeguard on commodity CMPs
  Parallel compression of code segments
  Sequential analysis of compressed segments
  Cannot achieve linear speedup (unary propagation not considered)
  Video decoder slowdown reduced from 18X to 9X using 9 lifeguard threads

Conclusion
Parallel DIFT algorithm
  Symbolic Inheritance Tracking
  Unary propagation
  Asymptotic linear speedup
Parallel TaintCheck lifeguard
  Program slowdown reduced from 3X-5X to 1.2X-3X with 8 worker threads
  Hybrid parallelism is useful with few workers