Energy-Efficient Data Compression for Modern Memory Systems Gennady Pekhimenko ACM Student Research Competition March, 2015.

Advisers: Todd C. Mowry & Onur Mutlu
Presentation transcript:

Energy-Efficient Data Compression for Modern Memory Systems Gennady Pekhimenko ACM Student Research Competition March, 2015

High Performance Computing Is Everywhere
- Applications today are data-intensive
- Modern memory systems are bandwidth constrained
- Energy efficiency is key across the board
- Data compression is a promising technique to address these challenges

Potential of Data Compression
Data often follows multiple simple patterns: zeros, repeated values, narrow values, pointers.
Example, a run of pointers: 0xC04039C0 0xC04039C8 0xC04039D0 0xC04039D8 …
Low dynamic range: the differences between values are significantly smaller than the values themselves.
Different algorithms exploit these patterns:
- BΔI [PACT'12], based on Base-Delta encoding
- Frequent Pattern Compression (FPC) [ISCA'04]
- C-Pack [Trans. on VLSI'12]
- Statistical Compression (SC²) [ISCA'14]
- …
These algorithms improve performance, but there are challenges…
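The low-dynamic-range observation is what Base-Delta encoding exploits. Below is a minimal C sketch of the idea, assuming one 8-byte base and 1-byte deltas for a 32-byte line; the published BΔI design also tries other base/delta widths and an implicit zero base, so this is an illustration rather than the real implementation.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustration only: try to encode a 32-byte line (four 64-bit words) as one
 * 8-byte base plus four 1-byte signed deltas (12 bytes total).  The published
 * B-Delta-I design additionally tries several base/delta widths, an implicit
 * zero base, and falls back to the uncompressed line when no size fits. */
static bool base_delta_compress(const uint64_t line[4],
                                uint64_t *base, int8_t delta[4])
{
    *base = line[0];
    for (int i = 0; i < 4; i++) {
        int64_t d = (int64_t)(line[i] - *base);
        if (d < INT8_MIN || d > INT8_MAX)
            return false;            /* delta too wide: leave line uncompressed */
        delta[i] = (int8_t)d;
    }
    return true;                     /* 32 bytes shrink to 8 + 4 = 12 bytes */
}

int main(void)
{
    /* Pointer-like values from the slide: the differences are tiny. */
    uint64_t line[4] = { 0xC04039C0, 0xC04039C8, 0xC04039D0, 0xC04039D8 };
    uint64_t base;
    int8_t delta[4];
    if (base_delta_compress(line, &base, delta))
        printf("base=0x%llx deltas=%d,%d,%d,%d\n", (unsigned long long)base,
               delta[0], delta[1], delta[2], delta[3]);
    return 0;
}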

Energy Efficiency: Bit Toggles
How energy is spent in data transfers: whenever a bit on a wire differs between the previously transferred data and the new data, the wire toggles, and each toggle costs roughly C·V² of energy.
The energy of data transfers (e.g., over the NoC or the DRAM bus) is therefore proportional to the number of bit toggles.
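To make this concrete, here is a minimal C sketch of how the toggle count of a transfer can be computed: XOR each flit with the previous one and count the set bits. The 128-bit (four 32-bit word) flit width is an assumption for illustration, and __builtin_popcount is a GCC/Clang intrinsic.

#include <stddef.h>
#include <stdint.h>

/* A flit is modeled as four 32-bit words (a 128-bit link is an assumption;
 * the talk does not fix a width).  Bits that change between consecutive
 * flits are toggles, and the dynamic energy of the transfer is roughly
 * proportional to their count (on the order of C*V^2 per toggle). */
#define FLIT_WORDS 4

unsigned count_toggles(const uint32_t flit[][FLIT_WORDS], size_t nflits)
{
    unsigned toggles = 0;
    for (size_t i = 1; i < nflits; i++)
        for (size_t w = 0; w < FLIT_WORDS; w++)
            toggles += (unsigned)__builtin_popcount(flit[i - 1][w] ^ flit[i][w]);
    return toggles;
}

For the regular, well-aligned data in the next slide's uncompressed line, consecutive flits differ in only a few low-order bits, so this count stays tiny; once a compressor bit-packs the same words, the alignment is lost and the count jumps.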

Excessive Number of Bit Toggles
Uncompressed cache line: 0x00003A00 0x8001D000 0x00003A01 0x8001D008 …
  Flit 0 XOR Flit 1 → # toggles = 2
Compressed cache line (FPC): 0x5 0x3A00, 0x7 8001D000, 0x5 0x3A01, 0x7 8001D008, …
  Flit 0 XOR Flit 1 → # toggles = 31
Because FPC packs variable-length codes back to back, the regular alignment of the original words is lost, and consecutive flits of the compressed line differ in far more bit positions.

Effect of Compression on Bit Toggles
NVIDIA application traces: 54 mobile-GPU and 167 discrete-GPU workloads in total.
Result: a significant increase in the number of toggles, and hence potentially an increase in consumed energy.

Toggle-Aware Data Compression
Problem: a 1.53× effective compression ratio comes with a 2.19× increase in toggle count.
Goal: find the optimal tradeoff between toggle count and compression ratio.
Key idea:
- Determine the toggle rate for compressed vs. uncompressed data
- Use a heuristic (Energy × Delay or Energy × Delay² metric) to estimate the tradeoff
- Throttle compression to reach the estimated tradeoff
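A rough back-of-the-envelope reading of these numbers (an interpretation, not a figure from the talk): if link energy scales with toggle count and transfer delay scales inversely with compression ratio, then always compressing gives Energy × Delay ≈ 2.19 × (1 / 1.53) ≈ 1.43 relative to sending data uncompressed, i.e., a net loss under the E×D metric; hence the need to throttle compression selectively.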

Energy Control (EC) Flow
The cache line is first compressed. The Energy Control logic then computes the toggle counts of the uncompressed and compressed versions (T0 and T1) together with the compression ratio, and a decision function selects whether to send the compressed or the uncompressed line.
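A minimal C sketch of the selection step under the Energy × Delay heuristic is given below; the actual decision function and any thresholds used in the hardware are not specified in the talk, so this is only an illustration under those assumptions.

#include <stdbool.h>

/* Sketch of the EC decision under the Energy x Delay heuristic:
 * energy is approximated by the toggle count of the transfer and delay by
 * the number of flits sent (the inverse of the compression ratio).
 * The real decision logic and thresholds are not given in the talk. */
bool ec_send_compressed(unsigned toggles_uncomp, unsigned flits_uncomp,
                        unsigned toggles_comp,   unsigned flits_comp)
{
    unsigned long ed_uncomp = (unsigned long)toggles_uncomp * flits_uncomp;
    unsigned long ed_comp   = (unsigned long)toggles_comp   * flits_comp;
    return ed_comp <= ed_uncomp;   /* compress only if E x D does not get worse */
}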

Energy Control: Effect on Bit Toggles
NVIDIA application traces: 54 mobile-GPU and 167 discrete-GPU workloads in total.
Result: a significant decrease in the number of toggles.

Energy Control: Effect on Compression Ratio
NVIDIA application traces: 54 mobile-GPU and 167 discrete-GPU workloads in total.
Result: a modest decrease in compression ratio.

Optimization: Metadata Consolidation
Compressed cache line with FPC, 4-byte flits, metadata interleaved with the data: 0x5, 0x3A00, 0x5, 0x3A01, 0x5, 0x3A02, 0x5, 0x3A03, … → # toggles = 18
Toggle-aware FPC with all metadata consolidated: 0x5, 0x5, 0x5, 0x5, …, then 0x3A00, 0x3A01, 0x3A02, 0x3A03, … → # toggles = 2
Metadata consolidation brings an additional 3.2%/2.9% reduction in toggles for FPC/C-Pack.
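The effect of consolidation can be seen from the packing order alone. The C sketch below builds the two layouts for a hypothetical FPC-style output (4-bit prefix plus 16-bit value per word); the real FPC bit format differs, so this only illustrates why consolidation restores alignment.

#include <stddef.h>
#include <stdint.h>

/* Two byte-level packings of the same FPC-style output, where every
 * compressed word is a 4-bit prefix (pattern id) plus a 16-bit value.
 * The real FPC bit layout differs; only the ordering matters here. */
#define NWORDS 8

/* Interleaved: prefix0, value0, prefix1, value1, ...  Values land at varying
 * offsets inside each flit, so consecutive flits look unrelated and many
 * wires toggle. */
size_t pack_interleaved(const uint8_t pfx[NWORDS], const uint16_t val[NWORDS],
                        uint8_t out[3 * NWORDS])
{
    size_t n = 0;
    for (size_t i = 0; i < NWORDS; i++) {
        out[n++] = pfx[i];
        out[n++] = (uint8_t)(val[i] >> 8);
        out[n++] = (uint8_t)val[i];
    }
    return n;
}

/* Consolidated: all prefixes first, then all values.  Similar values keep
 * the same byte offsets across flits, so far fewer bits flip. */
size_t pack_consolidated(const uint8_t pfx[NWORDS], const uint16_t val[NWORDS],
                         uint8_t out[3 * NWORDS])
{
    size_t n = 0;
    for (size_t i = 0; i < NWORDS; i++)
        out[n++] = pfx[i];
    for (size_t i = 0; i < NWORDS; i++) {
        out[n++] = (uint8_t)(val[i] >> 8);
        out[n++] = (uint8_t)val[i];
    }
    return n;
}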

Summary
- Bandwidth and energy efficiency are first-order concerns in modern systems
- Data compression is an attractive way to get higher effective bandwidth efficiently
- Problem: excessive toggles ('0' ↔ '1') waste power/energy
- Key idea: estimate the tradeoff between compression ratio and energy efficiency (Energy × Delay or Energy × Delay² metric), and throttle compression when the overall energy increases

Energy-Efficient Data Compression for Modern Memory Systems Gennady Pekhimenko ACM Student Research Competition March, 2015