Reducing Cache Traffic and Energy with Macro Data Load


Lei Jin and Sangyeun Cho*
Dept. of Computer Science, University of Pittsburgh

Motivation
- Data cache access is a frequent event: 20~40% of all instructions access the data cache.
- Data cache energy can be significant (~16% of total in the StrongARM chip [Montanaro et al. 1997]).
- Reducing cache traffic leads to energy savings.
- Existing approaches:
  - Store-to-load forwarding
  - Load-to-load forwarding
  - Use available resources to keep data for reuse: the LSQ [Nicolaescu et al. 2003], the reorder buffer [Önder and Gupta 2001]
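The forwarding ideas above can be sketched in a few lines. This is a minimal, illustrative Python model (all names are made up for this sketch, not from the talk): a load searches the LSQ from youngest to oldest, and an earlier store or load with the exact same address and access size can supply the data without touching the data cache.

```python
# Minimal model of store-to-load and load-to-load forwarding in an LSQ.
# An earlier entry can satisfy a later load to the same address with the
# same access size, avoiding a data cache access. Illustrative only.

class LSQEntry:
    def __init__(self, op, addr, size, data):
        self.op = op      # 'store' or 'load'
        self.addr = addr  # effective address
        self.size = size  # access size in bytes
        self.data = data  # value written (store) or previously loaded (load)

def forward(lsq, load_addr, load_size):
    """Search the LSQ from youngest to oldest for an exact match.

    Classical forwarding requires the same address and the same data
    size/type -- the limitation that Macro Data Load later relaxes.
    """
    for entry in reversed(lsq):
        if entry.addr == load_addr and entry.size == load_size:
            return entry.data  # hit: no cache access needed
    return None  # miss: must access the data cache

lsq = [LSQEntry('store', 0x1000, 4, 0xDEADBEEF),
       LSQEntry('load', 0x2000, 8, 0x1122334455667788)]

assert forward(lsq, 0x1000, 4) == 0xDEADBEEF            # store-to-load
assert forward(lsq, 0x2000, 8) == 0x1122334455667788    # load-to-load
assert forward(lsq, 0x1000, 1) is None  # size mismatch: no forwarding
```

The last assertion shows the exact-match limitation: a byte load inside an already-buffered word still misses, which is the opportunity ML targets.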

Macro Data Load (ML)
- Previous works are limited by exact data matching: same address and same data type.
- ML instead exploits spatial locality in cache-port-wide data:
  - Accessing port-wide data is free; it naturally fits the datapath and LSQ width.
  - Recent processors support 64-bit ports, yet many accesses are narrower than 64 bits.
[Figure: data access without ML vs. with ML]
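The key mechanism is that once a port-wide (64-bit) chunk is buffered, any narrower load falling inside it can be served by simple shift-and-mask alignment. A small Python sketch of that extraction, assuming a little-endian layout (the function and constant names are illustrative, not from the paper):

```python
# Sketch: with Macro Data Load, the LSQ keeps the full cache-port-wide
# (64-bit) data returned for a load. A later narrow load that falls
# anywhere inside that 8-byte chunk is served by shift-and-mask
# alignment, with no data cache access. Little-endian layout assumed.

PORT_BYTES = 8  # 64-bit cache port

def extract(chunk, chunk_addr, addr, size, signed=False):
    """Align a `size`-byte access at `addr` out of the port-wide `chunk`."""
    offset = addr - chunk_addr
    assert 0 <= offset and offset + size <= PORT_BYTES
    value = (chunk >> (8 * offset)) & ((1 << (8 * size)) - 1)
    if signed and value >= 1 << (8 * size - 1):
        value -= 1 << (8 * size)  # sign-extend narrow loads
    return value

# 64-bit chunk buffered for address 0x1000:
chunk = 0x1122334455667788

assert extract(chunk, 0x1000, 0x1000, 8) == 0x1122334455667788
assert extract(chunk, 0x1000, 0x1000, 1) == 0x88        # byte load
assert extract(chunk, 0x1000, 0x1004, 4) == 0x11223344  # upper word
assert extract(chunk, 0x1000, 0x1000, 1, signed=True) == -0x78
```

This is the same alignment logic a processor already has after the cache port; ML's observation is that applying it to LSQ-buffered data makes reuse possible across different addresses and data types within a chunk.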

ML Potential
- ML uncovers more reuse opportunities (results shown for CINT2k, CFP2k, and MiBench).
- ML is especially effective with limited resources.

ML Implementation
- Architectural changes:
  - Relocated data alignment logic
  - Sequential LSQ-cache access
- Net impact: the LSQ becomes a small fully associative cache with FIFO replacement.
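That net impact can be modeled directly. A minimal sketch, assuming 8-byte blocks matching the port width (the class and parameter names are hypothetical): a small fully associative cache where every entry is checked on lookup and the oldest entry is evicted first.

```python
# Sketch: under ML the LSQ behaves like a small fully associative cache
# of port-wide (8-byte) blocks with FIFO replacement. A deque gives the
# FIFO eviction order directly.
from collections import deque

class FIFOCache:
    def __init__(self, entries):
        self.entries = entries
        self.fifo = deque()  # (block_addr, data), oldest on the left

    def lookup(self, addr):
        block = addr & ~0x7  # align to the 8-byte port width
        for tag, data in self.fifo:  # fully associative: check every entry
            if tag == block:
                return data  # hit: served from the LSQ, no cache access
        return None

    def fill(self, addr, data):
        block = addr & ~0x7
        if len(self.fifo) == self.entries:
            self.fifo.popleft()  # FIFO replacement: evict the oldest
        self.fifo.append((block, data))

cache = FIFOCache(entries=2)
cache.fill(0x1000, b'8 bytes!')
cache.fill(0x2000, b'8 more..')
assert cache.lookup(0x1004) == b'8 bytes!'  # narrow access, same block
cache.fill(0x3000, b'evicts 1')             # third fill evicts 0x1000
assert cache.lookup(0x1000) is None
assert cache.lookup(0x2000) == b'8 more..'
```

FIFO fits naturally here because LSQ entries already retire in program order, so no extra replacement state is needed.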

Result: Energy Reduction
- Up to 35% energy reduction (on MiBench); results shown for CINT, CFP, and MiBench.
- More effective than previous techniques.