Presentation transcript:

2013/10/21 Yun-Chung Yang
An Energy-Efficient Adaptive Hybrid Cache
Jason Cong, Karthik Gururaj, Hui Huang, Chunyue Liu, Glenn Reinman, Yi Zou
Computer Science Department, University of California, Los Angeles
2011 International Symposium on Low Power Electronics and Design (ISLPED), pp. 67-72

Outline
- Abstract
- Related Work
- What's the Problem: run-time behavior, set balancing
- Proposed Method: Adaptive Hybrid Cache
- Experimental Results
- Conclusion

Abstract
By reconfiguring part of the cache as software-managed scratchpad memory (SPM), hybrid caches can handle both unknown and predictable memory access patterns. However, existing hybrid caches provide a flexible partitioning of cache and SPM without adapting to run-time cache behavior, and previous cache set balancing techniques are either energy-inefficient or require serial tag and data array access. This paper proposes an adaptive hybrid cache (AH-cache) that dynamically remaps SPM blocks from high-demand cache sets to low-demand cache sets, achieving 19%, 25%, 18%, and 18% energy-runtime product reductions over four representative previous techniques on a wide range of benchmarks.
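The headline metric throughout is the energy-runtime product (energy multiplied by execution time), analogous to the energy-delay product. A toy C sketch of how such a reduction composes, with placeholder inputs rather than numbers from the paper:

```c
#include <stdio.h>

/* reduction = 1 - (E_new * T_new) / (E_base * T_base) */
static double erp_reduction(double e_base, double t_base,
                            double e_new, double t_new)
{
    return 1.0 - (e_new * t_new) / (e_base * t_base);
}

int main(void)
{
    /* placeholder inputs: 10% less energy and 10% less runtime
     * compose to a 1 - 0.9 * 0.9 = 19% ERP reduction */
    printf("ERP reduction: %.0f%%\n",
           100.0 * erp_reduction(1.0, 1.0, 0.9, 0.9));
    return 0;
}
```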

Related Work (from the paper's comparison chart)
- Software-controlled partitioning of the cache, from ways down to blocks [2], [3]: column caching, FlexCache, reconfigurable cache, Virtual Local Store
- Set-balancing techniques requiring serial tag/data access [4], [5], [8]-[10]: Balanced Cache [12]
- Victim cache [11]: needs no serial tag/data access, but uses CAM memory
- This paper: targets energy efficiency via set utilization, without serial tag/data access or CAM

What's the Problem
Previous hybrid cache designs partition the cache and SPM without adapting to run-time cache behavior. Because SPM allocation is uniform while cache set demand is non-uniform, some sets become over-subscribed: the hot cache set problem.

Adaptive Hybrid Cache
Overview figure: (a) original code; (b) code transformed for AH-cache, which is the compiler's job; (c) memory space for AH-cache; (d) adaptive mapping of SPM blocks to the cache; (e) SPM mapping in the cache; (f) SPM mapping look-up table (SMLT).
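The SMLT in panel (f) is the key indirection: an SPM block's cache location is no longer fixed by its address but read from a small table. A minimal C sketch of that look-up, assuming the 2-way, 128-set geometry from the evaluation; the entry count and all names are illustrative, not the paper's actual hardware:

```c
#include <stdint.h>

#define NUM_SETS 128   /* from the evaluation: 2-way, 128 sets */
#define NUM_WAYS 2

typedef struct {
    uint8_t valid;  /* 1 if this SPM block is currently mapped   */
    uint8_t set;    /* cache set currently holding the SPM block */
    uint8_t way;    /* way within that set                       */
} smlt_entry_t;

static smlt_entry_t smlt[256];  /* entry count is illustrative */

/* Translate an SPM block number into its current cache location.
 * Returns 0 on success, -1 if the block is not mapped. */
int smlt_lookup(unsigned spm_block, unsigned *set, unsigned *way)
{
    smlt_entry_t e = smlt[spm_block];
    if (!e.valid)
        return -1;
    *set = e.set;   /* 6-bit index field in the real table */
    *way = e.way;   /* 2-bit way field                      */
    return 0;
}
```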

Hardware for AH-cache
The green part of the figure handles SPM accesses. Cache indexing and the SMLT look-up are performed in parallel with the virtual address calculation in the pipeline.

AH-cache dynamically remaps SPM blocks from high-demand cache sets to low-demand cache sets. The initial mapping of SPM blocks into the cache is random; blocks then migrate away from high-demand sets at run time.

Goal: the application requires P SPM blocks while the AH-cache can provide at most Q, leaving S = P - Q floating blocks to adaptively satisfy the high-demand cache sets.
Solution:
- Use a victim tag buffer (VTB) to capture the demand of each set.
- Use a floating block holder (FBH) queue to record which cache sets hold the floating blocks.
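A sketch of how a victim tag buffer can expose per-set demand: a miss that hits in the set's VTB means a recently evicted line was wanted again, so the set is over-subscribed. The VTB depth and the threshold are assumptions; only the 4-bit counter width comes from the storage-overhead slide:

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SETS  128
#define VTB_DEPTH 4    /* assumed: victim tags tracked per set */

static uint32_t vtb[NUM_SETS][VTB_DEPTH];  /* tags of recent victims   */
static uint8_t  demand[NUM_SETS];          /* 4-bit saturating counter */

/* On a cache miss, hitting in the set's VTB means a recently evicted
 * line is wanted again, i.e. the set is over-subscribed. */
void vtb_on_miss(unsigned set, uint32_t tag)
{
    for (int i = 0; i < VTB_DEPTH; i++) {
        if (vtb[set][i] == tag) {
            if (demand[set] < 15)   /* saturate at 4 bits */
                demand[set]++;
            return;
        }
    }
}

/* Floating SPM blocks migrate away from sets reported as high-demand. */
bool is_high_demand(unsigned set, uint8_t threshold)
{
    return demand[set] >= threshold;
}
```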

A re-insertion bit of 1 means the set is still highly demanded, so it is re-inserted into the FBH queue.
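A sketch of that FBH queue discipline as a ring buffer: an entry whose set is still hot is re-queued with its re-insertion bit set, while a cooled-down set surrenders its floating block for remapping. The depth and layout are assumptions, and it reuses the hypothetical is_high_demand from the previous sketch:

```c
#include <stdbool.h>
#include <stdint.h>

/* 128 entries, matching the 8 x 16 re-insertion bits on the
 * storage-overhead slide; the ring-buffer layout is an assumption. */
#define FBH_DEPTH 128

typedef struct {
    uint8_t set;       /* cache set currently holding a floating block   */
    uint8_t reinsert;  /* 1 = set was still hot when it reached the head */
} fbh_entry_t;

static fbh_entry_t fbh[FBH_DEPTH];
static int head, tail, count;

extern bool is_high_demand(unsigned set, uint8_t threshold);

/* Pop entries from the head; a still-hot set keeps its block and is
 * re-queued with its re-insertion bit set, while a cooled-down set
 * gives its floating block back for remapping. */
int fbh_next_victim(uint8_t threshold)
{
    int scanned = count;             /* visit each entry at most once */
    while (scanned-- > 0) {
        fbh_entry_t e = fbh[head];
        head = (head + 1) % FBH_DEPTH;
        count--;
        if (is_high_demand(e.set, threshold)) {
            e.reinsert = 1;                    /* still highly demanded */
            fbh[tail] = e;
            tail = (tail + 1) % FBH_DEPTH;
            count++;
        } else {
            return e.set;                      /* reclaim this block */
        }
    }
    return -1;                                 /* every holder is hot */
}
```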

Problem: searching the FBH queue takes up to S cycles in the worst case, where S is the maximum number of floating SPM blocks.
Solution:
- Store the re-insertion bits in a re-insertion bit table (RIBT).
- Search 16 re-insertion bits in parallel.
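A sketch of that 16-bit-parallel search: each RIBT word covers 16 FBH entries, and a zero bit marks a reclaimable one, so the worst-case scan drops from S cycles to roughly S/16. The GCC/Clang __builtin_ctz intrinsic stands in for a hardware priority encoder:

```c
#include <stdint.h>

/* 8 words x 16 bits = 128 re-insertion bits, one per FBH entry,
 * as on the storage-overhead slide. */
#define RIBT_WORDS 8

static uint16_t ribt[RIBT_WORDS];

/* Return the index of the first entry whose re-insertion bit is 0
 * (its floating block can be reclaimed), or -1 if none.  One word
 * per iteration models checking 16 bits per cycle. */
int ribt_find_reclaimable(void)
{
    for (int w = 0; w < RIBT_WORDS; w++) {
        uint16_t zeros = (uint16_t)~ribt[w];   /* 1 where the bit was 0 */
        if (zeros)
            return w * 16 + __builtin_ctz((unsigned)zeros);
    }
    return -1;
}
```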

Experimental Results
- Storage overhead
- Critical path of the SMLT in the pipeline stage
- Comparison with other designs (performance, miss rate, energy):
  - Non-adaptive hybrid cache (N)
  - Non-adaptive hybrid cache + balanced cache (B)
  - Non-adaptive hybrid cache + victim cache (Vp, Vs)
  - Phase-reconfigurable hybrid cache (R)
  - Adaptive hybrid cache (AH)
  - Statically optimized hybrid cache (S)

Storage Overhead
Configuration: 16 KB, 2-way associative, 128 sets, 64 B data blocks, 4 B tag entries.
- SPM blocks
- SMLT entries (1 valid bit + 6-bit set index + 2-bit way)
- Insertion flag + 4-bit counter
- FBH queue entries
- RIBT: 8 entries of 16 bits
- Total: 0.4 KB, about 3% of the hybrid cache size

In 32 nm technology (64 B cache blocks), the critical path through the SMLT is 0.2 ns, which fits within the 0.25 ns cycle time of a 4 GHz core.

Design R reduces cache misses by 34%, while the AH-cache reduces them by 52%. The AH-cache outperforms B because B allocates SPM uniformly across sets without considering per-set demand; the victim cache's benefit depends on its size.

The AH-cache outperforms designs B, Vp, Vs, and R by 3%, 4%, 8%, and 12%, respectively.

Although the proposed method adds hardware (the SMLT, VTB, and adaptive mapping unit), the AH-cache still achieves energy reductions of 16%, 22%, 10%, and 7% compared to designs B, Vp, Vs, and R, respectively.

Conclusion
The AH-cache dynamically remaps SPM blocks among cache sets based on run-time behavior, achieving energy-runtime product reductions of 19%, 25%, 18%, and 18% over designs B, Vp, Vs, and R.
My comments: the mechanism is explained in detail; the usage of the tag array while a block is in SPM mode also deserves mention.