Application Slowdown Model

Presentation transcript:

Application Slowdown Model: Quantifying and Controlling Impact of Interference at Shared Caches and Main Memory
Lavanya Subramanian, Vivek Seshadri, Arnab Ghosh, Samira Khan, Onur Mutlu

Problem: Interference at Shared Resources
[Diagram: multiple cores contending for a shared cache and main memory]
1. High application slowdowns
2. Unpredictable application slowdowns

Problem: Interference at Shared Resources
[Diagram: multiple cores contending for a shared cache and main memory]
Our Goal: Achieve high and predictable performance

Our Approach
1. Build a model to accurately estimate slowdowns
2. Use slowdown estimates to build slowdown-aware resource management mechanisms

Challenge in Estimating Slowdown
Slowdown = Performance_Alone / Performance_Shared
The slowdown of an application is the ratio of its performance when it is run stand-alone on a system to its performance when it is sharing the system with other applications. An application's performance when it is sharing the system with other applications (Performance_Shared) can be measured in a straightforward manner; that part is easy. However, measuring an application's alone performance (Performance_Alone), without actually running it alone, is harder. This is where our first observation comes in.
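To make the ratio concrete, here is a minimal sketch (hypothetical numbers and function names, not from the talk) of how slowdown would be computed if both performance values were known, using instructions per cycle (IPC) as the performance metric:

```python
def slowdown(ipc_alone: float, ipc_shared: float) -> float:
    """Slowdown = Performance_Alone / Performance_Shared,
    with IPC (instructions per cycle) as the performance metric."""
    return ipc_alone / ipc_shared

# Hypothetical example: an application achieves 2.0 IPC running alone,
# but only 0.8 IPC when sharing the cache and memory with co-runners.
print(slowdown(ipc_alone=2.0, ipc_shared=0.8))  # 2.5x slowdown
```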

Our Model
Our model overcomes this challenge.
Our estimation error: 10%
Best previous model's error: 30%
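For reference, "estimation error" here is most naturally read as the relative difference between the model's estimated slowdown and the measured slowdown; this short sketch (my reading of the metric, not a definition from the slide) shows the computation:

```python
def estimation_error(estimated_slowdown: float, actual_slowdown: float) -> float:
    """Relative error of a slowdown estimate."""
    return abs(estimated_slowdown - actual_slowdown) / actual_slowdown

# Hypothetical: the model estimates 2.2x where the true slowdown is 2.5x.
print(estimation_error(2.2, 2.5))  # 0.12, i.e., 12% error
```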

Leveraging Our Slowdown Estimates

Leveraging Our Slowdown Estimates
[Diagram: cores sharing a cache and main memory]
Slowdown-aware cache capacity partitioning
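As an illustration of what slowdown-aware cache capacity partitioning could look like (an illustrative sketch of the idea, not the exact mechanism from the paper), one simple policy is to divide the shared cache's ways among cores in proportion to each application's estimated slowdown:

```python
def partition_cache_ways(slowdowns: list[float], total_ways: int) -> list[int]:
    """Assign each core a share of the shared cache's ways in
    proportion to its estimated slowdown (at least one way each)."""
    total = sum(slowdowns)
    ways = [max(1, round(total_ways * s / total)) for s in slowdowns]
    # Fix up rounding so the allocations sum exactly to total_ways.
    while sum(ways) > total_ways:
        ways[ways.index(max(ways))] -= 1
    while sum(ways) < total_ways:
        ways[ways.index(min(ways))] += 1
    return ways

# Hypothetical 4-core system with a 16-way shared cache: the most-slowed
# application receives the largest allocation.
print(partition_cache_ways([3.0, 1.2, 2.0, 1.8], total_ways=16))  # [6, 2, 4, 4]
```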

Leveraging Our Slowdown Estimates
[Diagram: cores sharing a cache and main memory]
Slowdown-aware memory bandwidth partitioning
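In the same spirit, a slowdown-aware memory bandwidth partitioning policy could give each application a share of memory bandwidth proportional to its estimated slowdown (again an illustrative sketch, not the paper's exact mechanism):

```python
def partition_bandwidth(slowdowns: list[float]) -> list[float]:
    """Return each application's fraction of memory bandwidth,
    proportional to its estimated slowdown."""
    total = sum(slowdowns)
    return [s / total for s in slowdowns]

# Hypothetical example: with estimated slowdowns of 3.0x, 1.2x, 2.0x,
# and 1.8x, the most-slowed application gets the largest share.
print(partition_bandwidth([3.0, 1.2, 2.0, 1.8]))  # [0.375, 0.15, 0.25, 0.225]
```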

Application Slowdown Model
Lavanya Subramanian, Vivek Seshadri, Arnab Ghosh, Samira Khan, Onur Mutlu
Talk at 2:40 pm in Tapa Ballroom 2