Heat Stroke: Power-Density-Based Denial of Service in SMT. Jahangir Hasan, Ankit Jalote, T. N. Vijaykumar, School of Electrical & Computer Engineering, Purdue University.


Heat Stroke: Power-Density-Based Denial of Service in SMT
Jahangir Hasan, Ankit Jalote, T. N. Vijaykumar, School of Electrical & Computer Engineering, Purdue University
Carla Brodley, Department of Computer Science, Tufts University

Denial of Service (DOS) Attacks
- Resource sharing is prevalent in current systems
- Malicious users can exploit the sharing
- DOS attacks maliciously hog a shared resource
- Can render the system practically inoperative
- For example:
  1. Fork bomb
  2. TCP SYN flood
- Can be detrimental to businesses and organizations
Must address DOS attacks to prevent financial loss

Vulnerability of SMT
- Multiple threads share pipeline resources in SMT
  1. Register file
  2. Cache
  3. Fetch bandwidth
  => Opportunity for DOS attacks
- Previously known attacks on SMT
  - Trace cache flushing via self-modifying code [Micro02]
Are there other unaddressed vulnerabilities?

Heat Stroke: A Novel Attack in SMT
- Repeatedly accessing pipeline resources creates hot spots
- Must stall/slow down to cool
- Resource shared => all threads suffer
- Heat Stroke exploits this vulnerability
- Persistently accesses a shared resource at a high rate
- Repeated hot spots cause repeated slowing/stalling
- Significantly degrades performance of all other threads
  - E.g., 1.2 ms to heat, 12 ms to cool => 90% slowdown
Must address this novel and detrimental attack
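The 90% figure on this slide follows from the heat/cool duty cycle: if a hot spot forms after 1.2 ms and the pipeline then needs 12 ms to cool, cooling takes 12 / (1.2 + 12) of every cycle. A minimal Python sketch of that arithmetic, assuming useful work happens only during the heat-up phase and the pipeline is fully stalled while cooling (the function name is illustrative and does not come from the slides):

```python
def stalled_fraction(heat_ms: float, cool_ms: float) -> float:
    """Fraction of time spent cooling when heat/cool cycles repeat back to back.

    Assumption (not stated on the slide): useful work happens only during
    the heat-up phase and the pipeline is fully stalled while cooling.
    """
    return cool_ms / (heat_ms + cool_ms)


if __name__ == "__main__":
    # Figures from the slide: 1.2 ms to heat, 12 ms to cool.
    print(f"{stalled_fraction(1.2, 12.0):.1%}")  # prints 90.9%
```

Under this simplified model, the short heat-up time relative to the long cooling time is what makes the attack cheap for the attacker and expensive for everyone else.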

Previous Schemes not Applicable
- Can heat stroke be solved by packaging?
  - Designed for avg. power, not local hot spots
  - Problem worsens with scaling
- Can heat stroke be solved by architecture?
  - Slow down/stall entire pipeline (clock and voltage scaling)
  - They address occasional hot spots
  - But heat stroke is persistent and prolonged
Heat Stroke causes large degradation

Contributions
- Identify Heat Stroke as a novel DOS attack in SMT
  - Does not exploit ICOUNT or monopolize a resource
  - Purely a power-density problem
- Propose Selective Sedation to address Heat Stroke
  - Identify culprit thread based on resource usage
  - Throttle only culprit thread
  - Allow other threads to continue execution
We successfully prevent Heat Stroke

Key Features of Selective Sedation
- We do not solve the general power-density problem
  - Stall only the thread that has the power-density problem
  - Prevent it from affecting non-problematic threads
- We do not separate malicious and non-malicious threads
  - Doing so would be hard
  - Must stall a thread if it creates a hot spot, malicious or not
  - Therefore unnecessary to determine its nature

Overview
- Introduction
- Heat Stroke Examples
- Our Solution to Heat Stroke
- Methodology
- Results
- Conclusions

An Example of Heat Stroke
    Label1: add $1, $2, $3
            br  Label1
- High-ILP program executes without stalls
- Repeatedly accesses register file at high rate
- Creates repeated hot spots at register file
- Heat-up time short (1.2 ms), cooling time long (12 ms)
Degrades CPU utilization to 10%, but is it due to hogging fetch bandwidth or due to heat?

Moderated Example of Heat Stroke
    Label1: add $1, $2, $3
            br  15*10^6 Label1      (high-IPC phase)
    Label2: ld  $4, addr1           (cache miss)
            ...
            ld  $4, addr9           (cache miss)
            br  3*10^3 Label2       (low-IPC phase)
- Net ILP low => does not hog fetch bandwidth
- Register file still accessed at high rate
- Cleverly moderated code still inflicts heat stroke
Heat stroke does not monopolize resources


Selective Sedation
- Solution based on two key observations
  1. Need to stall only the culprit thread, not the entire pipeline => avoid performance loss for normal threads
  2. Access rate of culprit thread is higher than that of others => easy to identify culprit
- Two steps in Selective Sedation
  1. Correctly and efficiently identify culprit thread
  2. Selectively sedate only culprit thread

Timely Detection of Heat Stroke
- Performance suffers due to long cooling time
- Damage already done if hot spot gets created
- Use temperature threshold just below emergency [HPCA01]
- Detect Heat Stroke in timely manner
- Launch Selective Sedation before actual hot spot

Identifying Culprit Thread
- Flat average of access rate can be misleading
- Need to track recent history
- Weighted-average counter for recent access-rate history
  - Details in paper
- When temperature threshold is exceeded => highest-value counter indicates culprit thread
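To illustrate why a flat average can mislead while a recency-weighted counter does not, here is a minimal Python sketch. The slides defer the counter design to the paper, so everything below is an assumption: an exponentially weighted moving average stands in for the weighted-average counter, and the class name, `decay` parameter, and access counts are hypothetical.

```python
from __future__ import annotations


class AccessRateTracker:
    """Per-thread weighted average of accesses to one shared resource.

    Assumption: an exponentially weighted moving average stands in for the
    weighted-average counter; the slides defer the real design to the paper.
    """

    def __init__(self, num_threads: int, decay: float = 0.5) -> None:
        if not 0.0 < decay < 1.0:
            raise ValueError("decay must be in (0, 1)")
        self.decay = decay                  # weight kept from older history
        self.weighted = [0.0] * num_threads
        self.total = [0] * num_threads      # flat totals, for comparison only

    def record_interval(self, accesses: list[int]) -> None:
        """Fold one sampling interval's access counts into the history."""
        for tid, count in enumerate(accesses):
            self.weighted[tid] = (self.decay * self.weighted[tid]
                                  + (1.0 - self.decay) * count)
            self.total[tid] += count

    def culprit(self) -> int:
        """Thread with the highest weighted counter."""
        return max(range(len(self.weighted)), key=self.weighted.__getitem__)

    def flat_leader(self) -> int:
        """Thread with the highest flat average (same as highest total)."""
        return max(range(len(self.total)), key=self.total.__getitem__)


def demo() -> tuple[int, int]:
    tracker = AccessRateTracker(num_threads=2)
    # Thread 0 was busy long ago; thread 1 is hammering the resource now.
    for _ in range(8):
        tracker.record_interval([100, 0])
    for _ in range(4):
        tracker.record_interval([0, 150])
    return tracker.flat_leader(), tracker.culprit()


if __name__ == "__main__":
    print(demo())  # prints (0, 1)
```

In this made-up trace the flat average still points at thread 0 (800 accesses in total versus 600), whereas the weighted counter points at thread 1, the thread that is actually heating the resource when the threshold is crossed.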

Selective Sedation
- Stall fetch of only culprit thread
- Remaining threads continue to execute
- Allow hot resource to cool
- Resume culprit's fetch when temperature returns to normal
  - To avoid starvation of culprit thread
- Can report repeat offender to OS
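The detect, sedate, and resume steps can be sketched as a small controller. This is an illustrative Python sketch under stated assumptions, not the authors' hardware design: a single monitored resource, a trigger threshold just below the emergency temperature, and a separate lower resume threshold standing in for "normal". The class name, method names, and temperature values are all hypothetical.

```python
from __future__ import annotations


class SedationController:
    """Decides which threads may fetch, given temperature and access counters.

    Assumptions: one monitored resource, a trigger threshold just below the
    emergency temperature, and a lower resume threshold ("normal").
    """

    def __init__(self, num_threads: int, trigger_temp: float,
                 resume_temp: float) -> None:
        if resume_temp >= trigger_temp:
            raise ValueError("resume_temp must be below trigger_temp")
        self.num_threads = num_threads
        self.trigger_temp = trigger_temp
        self.resume_temp = resume_temp
        self.sedated: int | None = None     # thread whose fetch is stalled
        self.offenses = [0] * num_threads   # could be reported to the OS

    def step(self, temp: float, counters: list[float]) -> list[bool]:
        """Return a per-thread fetch-enable mask for the next interval."""
        if self.sedated is None and temp >= self.trigger_temp:
            # Highest access-rate counter marks the culprit.
            self.sedated = max(range(self.num_threads),
                               key=counters.__getitem__)
            self.offenses[self.sedated] += 1
        elif self.sedated is not None and temp <= self.resume_temp:
            # Resource has cooled: resume fetch so the culprit does not starve.
            self.sedated = None
        return [tid != self.sedated for tid in range(self.num_threads)]


def demo() -> list[list[bool]]:
    # Hypothetical temperatures; the slides give no threshold values.
    ctrl = SedationController(num_threads=2, trigger_temp=84.0, resume_temp=80.0)
    trace = [(82.0, [10.0, 90.0]),   # below threshold: everyone fetches
             (84.5, [10.0, 90.0]),   # threshold crossed: thread 1 sedated
             (83.0, [10.0, 0.0]),    # still cooling: thread 1 stays sedated
             (79.5, [10.0, 0.0])]    # back to normal: thread 1 resumes
    return [ctrl.step(temp, counters) for temp, counters in trace]


if __name__ == "__main__":
    for mask in demo():
        print(mask)
```

Using two thresholds is a design choice of this sketch: it keeps the culprit from being resumed and sedated again on every interval while the temperature hovers near the trigger point. The `offenses` list mirrors the slide's note that a repeat offender can be reported to the OS.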

Experimental Methodology
- Extend Wattch to include Hot-Spot and SMT
- Base case stops pipeline upon a hot spot
- All simulations run for 500 million cycles
- Use a history of 0.5 million cycles for identifying culprit

Architectural Parameters
  Issue         6, out-of-order
  L1            64K 4-way, 2-cycle
  L2            2M 8-way, 12-cycle
  ROB           128

Power Density Parameters
  Clock         4 GHz
  Heat Sink     0.8 K/W
  Cooling time  10 ms


Inflicting and Sedating Heat Stroke
- Heat stroke causes repeated hot spots
- Selective Sedation drastically contains hot spots

Performance Impact of Heat Stroke and Selective Sedation
- Our realistic heat sink is reasonable
- Heat stroke does not hog resources
- Heat stroke causes huge performance loss
- Selective Sedation restores performance

Effect of Sedation on Normal Programs
- Selective Sedation has no adverse effect on normal programs


Conclusions
- Identified Heat Stroke as a novel DOS attack
- Proposed Selective Sedation to address Heat Stroke
- Identify and stall culprit thread, not entire pipeline
- Our results show that Selective Sedation:
  - Effectively prevents Heat Stroke
  - Is robust across heat-sink and threshold variations
  - Has no adverse effect on normal programs
We identified and solved a novel DOS attack in SMT


Symbiotic OS Scheduling solves heat stroke by ensuring fairness?
- Symbiotic OS Scheduling for SMT [SIGMETRICS02]:
  1. Assumes that degradation is due to incompatibility => continues to run malicious threads to find compatibility
  2. Heat stroke can cause long solo execution of threads => system utilization degraded to guarantee fairness
  3. Cleverly designed code can fool the symbiotic OS scheduler: test phase => behave normally, run phase => heat stroke
Symbiotic scheduling does not solve heat stroke

Once remote machine is accessed, other DOS attacks more severe than heat stroke are possible?
- Systems are patched for known DOS attack methods
  - Important to address every DOS attack method
  - Heat stroke unaddressed => system vulnerable
Heat stroke must be addressed irrespective of other attacks

Selective sedation is unfair to non-malicious high-resource-usage thread?
- Non-malicious high-resource-usage thread:
  - Performance is inherently power-density limited
  - Any scheme must throttle such a thread
- Selective sedation does nothing to worsen its performance
Selective sedation is not unfair

Variation Studies
- Effectiveness not sensitive to heat-sink or threshold precision