
Flexible Hardware Acceleration for Instruction-Grain Program Monitoring
Shimin Chen
Joint work with Michael Kozuch (1), Theodoros Strigkos (2), Babak Falsafi (3), Phillip B. Gibbons (1), Todd C. Mowry (1,2), Vijaya Ramachandran (4), Olatunji Ruwase (2), Michael Ryan (1), Evangelos Vlachos (2)
(1) Intel Research Pittsburgh, (2) CMU, (3) EPFL, (4) UT Austin

Instruction-Grain Monitoring
- Software often contains bugs: memory corruptions, data races, …, crashes
- Security attacks are often designed to exploit bugs
- Instruction-grain lifeguards can help
  - Dynamic monitoring: during application execution
  - Instruction-grain: e.g., memory access, data flow
  - Enables a wide range of powerful lifeguards
[Diagram: a lifeguard monitoring an application]
Speaker notes: It is difficult to write bug-free code (no need to say this). Reasons: added functionality over time, time-to-market pressures, parallel code (for CMPs). Dynamically monitoring instruction-grain events enables a wide range of lifeguards.

Example Instruction-Grain Lifeguards
- AddrCheck [Nethercote’04]: monitor malloc/free and memory accesses
  - Check that all memory accesses visit allocated memory regions
- MemCheck [Nethercote & Seward ’03 ’07]: AddrCheck + check uninitialized values
  - Copying partially uninitialized structures is not an error
  - Lazy error detection to avoid many false positives
  - Track propagation of uninitialized values
- TaintCheck [Newsome & Song’05]: detect overwrite-based security exploits
  - Tainted data: data from network or disk
  - Track propagation of tainted data to detect violations
- LockSet [Savage et al.’97]: detect data races in parallel programs
Speaker note: Let me briefly describe a number of representative examples.

Design Space of Support Platform
Axes: performance (poor to good) vs. generality (specific lifeguard to general purpose, i.e., a wide range of lifeguards)
- Lifeguard-specific hardware: good performance, but tied to a specific lifeguard [Crandall & Chong’04], [Dalton et al’07], [Shetty et al’06], [Shi et al’06], [Suh et al’04], [Venkataramani’07], [Venkataramani’08], [Zhou et al’07]
- General-purpose HW improving DBI: 3-8X slowdowns; LBA [Chen et al’06], DISE [Corliss’03]
- Dynamic binary instrumentation (DBI): 10-100X slowdowns [Bruening’04], [Luk et al’05], [Nethercote’04]
- This paper: general purpose with good performance
Speaker notes: Our contribution is to achieve … flexibility + performance. In this way, we hope the solution has a better chance to get into future hardware.

Outline
- Introduction
- Background
- Three Hardware Acceleration Techniques
- Experimental Evaluation
- Conclusion

Example Lifeguard: TaintCheck [Newsome & Song’05]
- Purpose: detect overwrite-based security exploits
- Metadata kept for application memory and registers
  - Tainted data: data from network or disk
  - Track taint propagation
  - Detect violation: e.g., tainted jump target address

  Application        TaintCheck lifeguard
  mov %eax <- A      taint(%eax) = taint(A)
  mov B <- %eax      taint(B) = taint(%eax)
  add %ebx <- D      taint(%ebx) |= taint(D)
  jmp *(F)           if (taint(F)==1) error;

- Detect exploit before attack code takes control
Speaker note: mention heap overflow and stack smashing at the beginning.
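The propagation rules on this slide can be written as a handful of small handlers. The following is only a minimal sketch: the handler names, the one-flag-per-byte layout, and the toy array sizes are assumptions made for illustration, not the lifeguard's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

enum { NUM_REGS = 8, MEM_BYTES = 256 };  /* toy sizes, assumed for the sketch */

static uint8_t reg_taint[NUM_REGS];      /* one taint flag per register */
static uint8_t mem_taint[MEM_BYTES];     /* one taint flag per memory byte */

/* Data arriving from the network or disk is marked tainted. */
void on_input(uint32_t addr) { mem_taint[addr] = 1; }

/* mov reg <- mem: taint(reg) = taint(mem) */
void on_load(int reg, uint32_t addr) { reg_taint[reg] = mem_taint[addr]; }

/* mov mem <- reg: taint(mem) = taint(reg) */
void on_store(uint32_t addr, int reg) { mem_taint[addr] = reg_taint[reg]; }

/* add reg <- mem: taint(reg) |= taint(mem) */
void on_binop_mem(int reg, uint32_t addr) { reg_taint[reg] |= mem_taint[addr]; }

/* jmp *(mem): returns true when the jump target is tainted (a violation). */
bool on_indirect_jump(uint32_t addr) { return mem_taint[addr] != 0; }
```

Calling `on_input(10)`, `on_load(0, 10)`, `on_store(20, 0)` and then `on_indirect_jump(20)` reports a violation, because the taint flowed from the input byte through register 0 into the jump target.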

TaintCheck w/ Detailed Tracking
- TaintCheck: detect violation
  - 1 taint bit / application byte
- TaintCheck w/ detailed tracking [Newsome & Song’05]: construct taint propagation trail (from input to violation)
  - More detailed metadata per application location:
    - PC of instruction that tainted this location
    - “tainted from” address
  - Not supported by previous lifeguard-specific HW

Instruction-Grain Lifeguard Metadata Characteristics
- Organization varies: per application byte/word
  - Size, format, semantics vary greatly
- Frequently updated: e.g., propagation tracking
- Frequently checked: e.g., memory accesses

Lifeguard Support
[Diagram: the unmodified application produces events; an event-capture and delivery mechanism (general-purpose HW improving DBI) passes them to event handlers in the software lifeguard, which update and check metadata.]
- Rare events: e.g., malloc/free, system calls
- Frequent events: e.g., memory access, data movement
- Performance bottlenecks: (1) metadata mapping, (2) metadata updates, (3) metadata checks

Our Contributions
[Diagram: the unmodified application produces rare events (e.g., malloc/free, system calls) and frequent events (e.g., memory access, data movement); event-capture and delivery passes them to event handlers in the software lifeguard, now assisted by M-TLB, IT, and IF.]
- Metadata-TLB (M-TLB) for metadata mapping
- Inheritance Tracking (IT) for metadata updates
- Idempotent Filters (IF) for metadata checks

Outline
- Introduction
- Background
- Three Hardware Acceleration Techniques
  - Metadata-TLB
  - Inheritance Tracking
  - Idempotent Filters
- Experimental Evaluation
- Conclusion

Metadata-TLB: Motivation
[Diagram: a level-1 index pointing to level-2 chunks]
- Metadata per app byte/word
  - Element size may vary
- Two-level structure: robustness & space efficiency
- Mapping: application address -> metadata address
  - Frequently used in almost every handler
  - Can be very costly

Example (TaintCheck)
void dest_reg_op_mem_4B(UINT32 src_addr /* %eax */, UINT32 dest_reg /* %edx */)
// app instruction type: dest_reg <- dest_reg op mem(src_addr)
// handler operation:    reg_taint(dest_reg) |= mem_taint(src_addr)

  C statement                                x86 instructions
  map *mp = level1_index[src_addr>>16];      mov  %eax, %ecx
                                             shr  $16, %ecx
                                             mov  level1_index(,%ecx,4), %ecx
  int idx = (src_addr & 0xffff)>>2;          and  $0xffff, %eax
                                             shr  $2, %eax
  UChar mem_taint = mp[idx];                 movzbl (%ecx,%eax,1), %eax
  reg_taint[dest_reg] |= mem_taint;          or   %al, reg_taint(%edx)
  nlba();                                    nlba

Metadata mapping takes 5 out of 8 instructions!
Speaker note: nlba is our model of how an event is delivered.
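The two-level mapping used by this handler can be tried out in plain C. The sketch below assumes one metadata byte per 4-byte application word and 64 KB of application memory per level-2 chunk, as the shifts in the handler suggest; allocating chunks lazily and the function name `metadata_addr` are assumptions added for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

#define L1_ENTRIES  (1u << 16)  /* indexed by the high 16 address bits */
#define CHUNK_ELEMS (1u << 14)  /* 64 KB of app memory / 4-byte words */

static uint8_t *level1_index[L1_ENTRIES];

/* Map an application address to the address of its metadata byte.
 * Level-2 chunks are allocated on first touch (an assumption of this
 * sketch). Returns NULL only if that allocation fails. */
uint8_t *metadata_addr(uint32_t app_addr)
{
    uint32_t hi = app_addr >> 16;               /* level-1 index */
    if (level1_index[hi] == NULL) {
        level1_index[hi] = calloc(CHUNK_ELEMS, 1);
        if (level1_index[hi] == NULL)
            return NULL;
    }
    uint32_t idx = (app_addr & 0xffffu) >> 2;   /* word within the chunk */
    return &level1_index[hi][idx];
}
```

Every address within the same 4-byte word maps to the same metadata byte, and the next word maps to the following byte, which is the property the handler relies on when it indexes `mp[idx]`.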

Our Solution: Metadata-TLB
- A TLB-like HW associative lookup table
- LMA (Load Metadata Address) instruction: application address -> lifeguard metadata address
- Managed by (user-mode) lifeguard software

Example (TaintCheck) w/ M-TLB
void dest_reg_op_mem_4B(UINT32 src_addr /* %eax */, UINT32 dest_reg /* %edx */)
// app instruction type: dest_reg <- dest_reg op mem(src_addr)
// handler operation:    reg_taint(dest_reg) |= mem_taint(src_addr)

  C statement                                x86 instructions
  UChar *p = LMA_macro(src_addr);            LMA  %eax, %ecx
  UChar mem_taint = *p;                      mov  (%ecx), %al
  reg_taint[dest_reg] |= mem_taint;          or   %al, reg_taint(%edx)
  nlba();                                    nlba

The single LMA instruction replaces the five mapping instructions of the handler without M-TLB (8 instructions down to 4).
Reduce handler size by half!
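A software model can illustrate what an LMA lookup does. The slides do not give the M-TLB entry format, size, or replacement policy, so everything below (a 4-entry fully associative table, round-robin replacement, the names `lma`, `mtlb`, and `slow_walk`) is an assumption for illustration rather than the hardware design.

```c
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SHIFT   16   /* app bytes covered by one entry: 64 KB (assumed) */
#define ELEM_SHIFT   2    /* one metadata byte per 4-byte word (assumed) */
#define MTLB_ENTRIES 4    /* toy size */

struct mtlb_entry { int valid; uint32_t app_page; uint8_t *meta_base; };

static struct mtlb_entry mtlb[MTLB_ENTRIES];
static unsigned mtlb_next;                 /* round-robin victim pointer */
static unsigned mtlb_misses;               /* counts slow-path walks */
static uint8_t *slow_table[1u << 16];      /* software-managed level-1 index */

/* Slow path: the lifeguard's own table walk, run only on a miss. */
static uint8_t *slow_walk(uint32_t page)
{
    if (slow_table[page] == NULL)
        slow_table[page] = calloc(1u << (PAGE_SHIFT - ELEM_SHIFT), 1);
    return slow_table[page];
}

/* Model of LMA: application address -> metadata address. */
uint8_t *lma(uint32_t app_addr)
{
    uint32_t page = app_addr >> PAGE_SHIFT;
    uint32_t off  = (app_addr & ((1u << PAGE_SHIFT) - 1)) >> ELEM_SHIFT;

    for (unsigned i = 0; i < MTLB_ENTRIES; i++)     /* associative lookup */
        if (mtlb[i].valid && mtlb[i].app_page == page)
            return mtlb[i].meta_base + off;

    mtlb_misses++;                                  /* miss: refill */
    uint8_t *base = slow_walk(page);
    if (base == NULL)
        return NULL;
    mtlb[mtlb_next] = (struct mtlb_entry){ 1, page, base };
    mtlb_next = (mtlb_next + 1) % MTLB_ENTRIES;
    return base + off;
}
```

In this model, repeated accesses to the same 64 KB region hit in the table and skip the walk entirely, which is where the handler saves its mapping instructions.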

Inheritance Tracking: Motivation
- Propagation tracking is expensive
  - Metadata updates for almost every app instruction
- Previous hardware solutions track propagation by automatically updating metadata in hardware
  - Problem: they only support simple metadata semantics, e.g., they do not support TaintCheck w/ detailed tracking
- Our goal: flexibility AND performance
- Idea: inheritance structure is common, so let’s track inheritance in hardware!
Speaker notes: Previous hardware makes simplifying assumptions about the metadata format. Tracking inheritance keeps the lifeguard flexible and at the same time removes a large fraction of metadata update calls.

Problem with General Inheritance Tracking

  Application        Propagation tracking         Inheritance tracking
  mov %eax <- A      taint(%eax) = taint(A)       %eax inherits from A
  mov B <- %eax      taint(B) = taint(%eax)       B inherits from %eax
  add %ebx <- D      taint(%ebx) |= taint(D)      insert D into %ebx’s inherit-from list

Problem: state explosion for binary operations!

Unary Inheritance Tracking
- Many lifeguards can take advantage of unary IT:
  - MemCheck
  - TaintCheck
- Large performance improvements if used
- Can be disabled if unary IT does not match the lifeguard

Tracking Register Inheritance
[Diagram: an original event looks up IT(%rs) and IT(%rd) in the IT table for registers; the state transition logic selects the transformed event to deliver.]
More details in the paper:
- IT table and state transition table details
- Conflict detection

Example

  Application        Before               With Inheritance Tracking
  mov %eax <- A      mem_to_reg
  mov B <- %eax      reg_to_mem           mem_to_mem

  mov %ebx <- C      mem_to_reg
  add %ebx <- D      dest_reg_op_mem
  mov E <- %ebx      reg_to_mem           imm_to_mem

Can significantly reduce metadata update events!
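The first sequence in this example (a load followed by a store collapsing into one mem_to_mem event) can be modelled in a few lines. This is a deliberately simplified sketch, not the paper's IT table or state transition table: the two-state table, the flush-on-conflict rule, and the choice to flush and deliver binary operations unchanged are all assumptions, and the deck's own handling of the add case may differ.

```c
#include <stdint.h>

enum ev_type { MEM_TO_REG, REG_TO_MEM, MEM_TO_MEM, DEST_REG_OP_MEM };
struct event { enum ev_type type; uint32_t dst, src; };

enum { NUM_REGS = 8, MAX_EVENTS = 64 };

/* Assumed IT table: does the register currently inherit from an address? */
static struct { int inherits; uint32_t from; } it_table[NUM_REGS];

static struct event delivered[MAX_EVENTS];  /* events sent to the lifeguard */
static int num_delivered;

static void deliver(enum ev_type t, uint32_t dst, uint32_t src)
{
    if (num_delivered < MAX_EVENTS)
        delivered[num_delivered++] = (struct event){ t, dst, src };
}

/* Turn a pending inheritance into an explicit mem_to_reg event. */
static void flush_reg(int r)
{
    if (it_table[r].inherits) {
        deliver(MEM_TO_REG, (uint32_t)r, it_table[r].from);
        it_table[r].inherits = 0;
    }
}

/* mov reg <- mem: record the inheritance, deliver nothing yet. */
void it_load(int rd, uint32_t addr)
{
    it_table[rd].inherits = 1;
    it_table[rd].from = addr;
}

/* mov mem <- reg */
void it_store(uint32_t addr, int rs)
{
    /* Conflict: registers inheriting from addr must be flushed before
     * the lifeguard's metadata for addr is overwritten. */
    for (int r = 0; r < NUM_REGS; r++)
        if (it_table[r].inherits && it_table[r].from == addr)
            flush_reg(r);

    if (it_table[rs].inherits)
        deliver(MEM_TO_MEM, addr, it_table[rs].from);  /* two events become one */
    else
        deliver(REG_TO_MEM, addr, (uint32_t)rs);
}

/* add reg <- mem: not unary, so this sketch flushes and delivers it as is. */
void it_binop_mem(int rd, uint32_t addr)
{
    flush_reg(rd);
    deliver(DEST_REG_OP_MEM, (uint32_t)rd, addr);
}
```

With this model, `it_load(0, 0xA0)` followed by `it_store(0xB0, 0)` delivers a single `MEM_TO_MEM` event from 0xA0 to 0xB0 instead of a mem_to_reg and a reg_to_mem.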

Idempotent Filters: Idea
- Typically, metadata checks give the same result if
  - event parameters are the same, and
  - metadata are the same
- Idea: filter out idempotent (redundant) events
- For example:
  - AddrCheck: after checking that a memory location is allocated, subsequent loads/stores to the same location are safe until the next free() event
  - LockSet (surprisingly): in between synchronization events (e.g., lock/unlock), check only the first load and the first store to a location
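The AddrCheck case can be sketched as a small cache of addresses that have already been checked, cleared whenever an event such as free() may change the metadata. The direct-mapped organization, the 32-entry size, and the names `if_access` and `if_invalidate_all` are assumptions for illustration; the filter in the talk is a hardware structure whose exact design is not given on the slide.

```c
#include <stdint.h>
#include <string.h>

enum { IF_ENTRIES = 32 };   /* toy size, assumed */

static struct { int valid; uint32_t addr; } filter[IF_ENTRIES];
static unsigned checks_delivered;   /* events that reached the lifeguard */

/* Returns 1 if the access must be delivered to the lifeguard for a
 * check, 0 if it is redundant and can be filtered out. */
int if_access(uint32_t addr)
{
    unsigned slot = addr % IF_ENTRIES;          /* direct-mapped (assumed) */
    if (filter[slot].valid && filter[slot].addr == addr)
        return 0;                               /* same event, same metadata */
    filter[slot].valid = 1;                     /* remember this check */
    filter[slot].addr = addr;
    checks_delivered++;
    return 1;
}

/* Called on events that may change metadata, e.g. free() for AddrCheck. */
void if_invalidate_all(void)
{
    memset(filter, 0, sizeof filter);
}
```

For LockSet, the same idea would need separate filters (or a load/store bit per entry) so that the first load and the first store to a location are each checked, with invalidation at every lock or unlock.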

Outline
- Introduction
- Background
- Three Hardware Acceleration Techniques
- Experimental Evaluation
  - Log-Based Architectures (LBA)
  - Simulation Study (w/ reduced input sets)
  - PIN-based Analysis (w/ full inputs)
- Conclusion

Log-Based Architectures
[Diagram: the unmodified application produces rare events (e.g., malloc/free, system calls) and frequent events (e.g., memory access, data movement); the Log-Based Architecture (LBA) provides event capture and delivery to the event handlers of the software lifeguard, which update and check metadata.]

Idea: Exploiting Chip Multiprocessors
[Diagram: LBA components]

Simulation Setup: Dual-Core LBA System
[Diagram: Core 1 runs the application (capture, compress); the log transport (e.g., L2 cache) carries the log to Core 2, which runs the lifeguard (decompress, dispatch) with M-TLB, IT & IF. Operating system: Fedora Core 5.]
- Extend Virtutech Simics
- Application and lifeguard are processes
- Application is stalled when log buffer is full
- Model a 2-level cache hierarchy

Overall Performance: TaintCheck
[Figure: TaintCheck slowdowns, LBA baseline vs. LBA optimized; labeled value: 1.36X]
Slowdown = (application execution time w/ lifeguard) / (application execution time w/o lifeguard)

Applying Our Techniques One by One
[Figure: bar chart of average slowdowns (y-axis 0.0 to 10.0) for AddrCheck, MemCheck, TaintCheck, TaintCheck w/ detailed tracking, and LockSet, each shown for BASE, MTLB, and MTLB combined with IT and/or IF; bar labels range from 1.02 to 7.80.]
- IT, IF, and M-TLB are indeed complementary
- Achieve dramatically better performance

PIN-Based Analysis: IT
- IT removes 35.8% to 82.0% of the propagation events

PIN-Based Analysis: IF
[Figure: reduced check events (%) vs. number of filter entries (8 to 256) for AddrCheck and LockSet, with filter associativity from 1-way to fully associative.]
- IF can effectively reduce check events
- 4-way works as well as fully-associative

Conclusion
- Our focus: instruction-grain lifeguards
- Three complementary hardware techniques:
  - Metadata-TLB (M-TLB)
  - Inheritance Tracking (IT)
  - Idempotent Filters (IF)
- Flexible to support a wide range of lifeguards
- Reducing overheads by 2-3X in our experiments
- Achieving 2-51% overheads for all but MemCheck

Thank you!

People Working on LBA Project
- Intel Research: Shimin Chen, Phillip B. Gibbons, Mike Kozuch, Michael Ryan
- University Faculty: Babak Falsafi (EPFL), Todd C. Mowry (CMU), Vijaya Ramachandran (UT Austin)
- CMU Students: Michelle Goodstein, Olatunji Ruwase, Theodoros Strigkos, Evangelos Vlachos
- Previous Contributors: Limor Fix (IRP), Steve Schlosser (IRP), Anastassia Ailamaki (CMU), Greg Ganger (CMU), Bin Lin (Northwestern), Radu Teodorescu (UIUC)