Restrictive Compression Techniques to Increase Level 1 Cache Capacity

Restrictive Compression Techniques to Increase Level 1 Cache Capacity
Prateek Pujara, Aneesh Aggarwal
{prateek, aneesh}@binghamton.edu
State University of New York at Binghamton
Presented by: Prateek Pujara
ICCD'05, San Jose

OUTLINE
- Introduction
- Motivation
- Restrictive compression schemes
  - AWN: All Words Narrow
  - AHS: Additional Half-word Storage
- Enhanced techniques
  - AAHS: Adaptive AHS
  - OATS: Optimizing Address TagS
- Conclusion

Processor-Memory Gap
The performance gap between the processor and memory keeps increasing. Cache memory is used to bridge this gap.

Cache: Problems and Suggested Solutions
Problems: access latency, energy consumption, small size.
Suggested solutions: pipelining (for latency), decoupled tag and data access (for energy).

Pipelined Cache Access
Stages: decode set -> compare tag + byte-offset -> read data -> drive output.

Is that enough?
Pipelining the cache prevents a loss of throughput, and decoupling the tag and data access keeps energy consumption low. However, the small size can still cause a performance loss.

Alternative Solutions
Cache compression: many elaborate techniques have been proposed for the L2 cache and main memory. The L2 cache can tolerate the overhead. Can the L1 cache?

Previous Work
Frequent Value Cache (FVC): a small cache is provided to store the frequently seen values in a cache block. Insignificant higher-order bits are compressed to save energy.

Problems with L1 Cache Compression
- The L1 cache cannot tolerate an increase in latency, so elaborate techniques cannot be used.
- Compression should not require updating the byte-offset. For example, if a block is compressed by dropping all the insignificant higher-order bits/bytes, the byte-offset of each word comes to depend on the sizes of the words before it.

Contributions of This Work
We investigate techniques that exploit the narrow widths of data to increase L1 cache capacity. Our compression techniques: AWN (All Words Narrow), AHS (Additional Half-word Storage), and AAHS (Adaptive AHS). We also propose OATS to reduce the additional tag space requirement, which is inevitable with any compression technique.

Narrow-Width Data
Narrow word: a word that can be represented using half the number of bits (16 bits in the case of a 32-bit architecture).
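
To make the test concrete, here is a minimal C sketch, assuming (as the examples on the later AWN slides suggest) that a narrow word is one recoverable by sign-extending its lower 16 bits; the function name is mine, not the paper's.

    #include <stdint.h>
    #include <stdbool.h>

    /* A 32-bit word is "narrow" if sign-extending its lower 16 bits
     * recovers it, i.e. the upper 17 bits are all 0s or all 1s. */
    static bool is_narrow(uint32_t w)
    {
        uint32_t upper17 = w >> 15;
        return upper17 == 0 || upper17 == 0x1FFFF;
    }

Under this test, 000003af and ffff93af are narrow, while 00009401 is not, matching the examples on the AWN slides.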

TERMS USED IN THE PAPER
- Narrow cache block: a cache block that contains only narrow words, i.e., all the words are represented by half the number of bits.
- Normal cache block: a cache block in which the words are represented using the full number of bits.
- Physical cache block: the physical space provided in the cache to store a cache block.

AWN: All Words Narrow
All the words in the block must be narrow words. Each narrow word is then compressed into half its size, so the size of the cache block is reduced to half.

AWN example: in the block {000003af, 00000000, ffff93af, 00007401}, the upper bits of every word (..00 0000, ..00 0000, ..11 1001, ..00 0111) merely extend bit 15, so all four words are narrow.

The narrow cache block is stored compressed as {03af, 0000, 93af, 7401} in one half of the physical cache block, leaving space for another narrow cache block in the other half.

Counter-example: in the block {000003af, 00000000, ffff93af, 00009401}, the last word's upper bits (..00 1001) do not extend bit 15, so 00009401 is not narrow and the block must be stored uncompressed as a normal cache block.
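
A minimal C sketch of the AWN policy follows, under the same sign-extension assumption; the function names and the 4-word block size are illustrative only.

    #include <stdint.h>
    #include <stdbool.h>

    #define WORDS_PER_BLOCK 4

    /* Try to compress a block under AWN: succeed only if every word
     * is narrow, keeping just the lower half of each word. */
    static bool awn_compress(const uint32_t block[WORDS_PER_BLOCK],
                             uint16_t out[WORDS_PER_BLOCK])
    {
        for (int i = 0; i < WORDS_PER_BLOCK; i++) {
            uint32_t upper17 = block[i] >> 15;
            if (upper17 != 0 && upper17 != 0x1FFFF)
                return false;            /* one normal word => normal block */
            out[i] = (uint16_t)block[i];
        }
        return true;                     /* width bit would be set to 1 */
    }

    /* Decompression is just sign-extension of each half-word. */
    static void awn_decompress(const uint16_t in[WORDS_PER_BLOCK],
                               uint32_t block[WORDS_PER_BLOCK])
    {
        for (int i = 0; i < WORDS_PER_BLOCK; i++)
            block[i] = (uint32_t)(int32_t)(int16_t)in[i];
    }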

Additional Hardware
- Additional tag space is provided for each physical cache block.
- A width bit is provided for each physical cache block:
  width bit = 0: one normal cache block
  width bit = 1: two narrow cache blocks

Implementation Details: Word Selection
Figure: the same block contents (03af0000 93af7401 2794ffff 98f14000) and the same byte-offset (3) feed the byte-offset decoder in both cases. In the conventional case, or for a compressed block with width bit = 0, the decoder drives out a full 32-bit word. With width bit = 1, the same byte-offset selects a 16-bit half-word from one of the two narrow blocks. The byte-offset itself is never recomputed.
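
A sketch of the selection logic, with my own naming and the assumption that a half-word read is sign-extended on the way out (the slide shows only the 16-bit vs. 32-bit widths):

    #include <stdint.h>

    /* A 16-byte physical cache block viewed two ways. */
    union phys_block {
        uint32_t word[4];   /* one normal cache block           */
        int16_t  half[8];   /* two narrow cache blocks (4 each) */
    };

    static uint32_t read_word(const union phys_block *b, int width_bit,
                              int which_block, int word_off)
    {
        if (!width_bit)
            return b->word[word_off];   /* normal: full 32-bit word */
        /* Narrow: the same word offset selects a half-word, which is
         * sign-extended back to 32 bits (assumption, see above). */
        return (uint32_t)(int32_t)b->half[which_block * 4 + word_off];
    }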

Implementation Details: Replacement Policy
The replacement policy is still LRU. If the new cache block is a narrow cache block, the conventional LRU policy is used. If the new cache block is a normal cache block, MRU information is used for the replacement. This ensures that the technique does not perform worse than the conventional cache.

Figure: two physical cache blocks, each holding two narrow cache blocks (recency counters (5), (15) and (10), (12)) with width bits = 1. A new normal cache block (counter (0)) displaces one entire physical cache block and its width bit is cleared to 0; the surviving narrow cache blocks advance to counters (6) and (16).
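
One plausible reading of "MRU information is used" is that a whole physical block must be freed for a normal fill, so the victim is the physical block whose most recently used half is the oldest. The sketch below encodes that guess (the counters, naming, and larger-means-older convention are all my assumptions, not the paper's):

    #include <stdint.h>

    #define WAYS 4

    struct way {
        int width_bit;     /* 1 => holds two narrow cache blocks */
        unsigned age[2];   /* recency counters; larger = older   */
    };

    /* Hypothetical victim selection for an incoming normal block. */
    static int pick_victim_for_normal(const struct way set[WAYS])
    {
        int victim = 0;
        unsigned best = 0;
        for (int i = 0; i < WAYS; i++) {
            unsigned mru = set[i].age[0];   /* youngest block in this way */
            if (set[i].width_bit && set[i].age[1] < mru)
                mru = set[i].age[1];
            if (mru > best) { best = mru; victim = i; }
        }
        return victim;
    }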

AHS: Additional Half-word Storage
Limitation of AWN: even a single normal word makes the whole block a normal block. AHS provides additional half-word storage to convert such blocks into narrow blocks, along with an extra storage bit for each word in the physical cache block.

Figure: the narrow cache block {000003af, 00000000, ffff93af, 00007401} is stored as under AWN, leaving space for another narrow cache block; the 2 extra half-words per physical cache block remain unused (xxxx) and all the extra storage bits are 0.

Figure: the block {000003af, 00000000, ffff93af, 01f09401} contains one normal word (01f09401), so under AWN it would be a normal cache block. With AHS, its lower halves {03af, 0000, 93af, 9401} are stored compressed, the upper half 01f0 occupies one of the 2 extra half-words per physical cache block, and the extra storage bit of the last word is set to 1. Space remains for another narrow cache block.
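
A minimal C sketch of AHS compression, assuming the 2 extra half-words per physical cache block are split equally between its two potential narrow blocks (budget = 1 each, per the "Limitations of AHS" slide); the names are mine:

    #include <stdint.h>
    #include <stdbool.h>

    #define WORDS_PER_BLOCK 4

    /* Compress one block under AHS. 'extra' is this block's share of
     * the extra half-word storage, 'budget' its size in half-words. */
    static bool ahs_compress(const uint32_t block[WORDS_PER_BLOCK],
                             uint16_t lower[WORDS_PER_BLOCK],
                             uint16_t *extra, int budget,
                             uint8_t extra_bit[WORDS_PER_BLOCK])
    {
        int used = 0;
        for (int i = 0; i < WORDS_PER_BLOCK; i++) {
            uint32_t upper17 = block[i] >> 15;    /* narrow-word test */
            lower[i] = (uint16_t)block[i];
            if (upper17 == 0 || upper17 == 0x1FFFF) {
                extra_bit[i] = 0;                 /* narrow word */
            } else if (used < budget) {
                extra[used++] = (uint16_t)(block[i] >> 16);
                extra_bit[i] = 1;                 /* upper half in extra storage */
            } else {
                return false;                     /* store as a normal block */
            }
        }
        return true;
    }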

Increase in cache capacity

Limitations of AHS
The additional half-word storage space is not optimally utilized: the extra half-word space is divided equally among the potential narrow cache blocks that can occupy the physical cache block. Thus, with 4 extra half-words provided, the physical cache block cannot contain one block with 1 normal-sized word together with another block with 3 normal-sized words.

Adaptive AHS (AAHS)
An adaptive scheme that allows the blocks to take a varying number of extra half-words. Two extra storage bits per word are required, because a cache block can now use more than one extra half-word. To avoid a further increase in the extra storage bits, a cache block is restricted to at most 3 extra half-words when 4 extra half-words are provided.

Figure: AAHS example with 4 extra half-words per physical cache block. Narrow cache block A = {000003af, 00000000, ffff93af, 00ff7401} has 1 normal-sized word; narrow cache block B = {01003801, 00100000, fffff8b2, 00009401} has 3. Both fit in one physical cache block: the lower halves {03af, 0000, 93af, 7401} and {3801, 0000, f8b2, 9401} are stored compressed, the upper halves 00ff, 0100, 0010, 0000 fill the 4 extra half-words, and the per-word 2-bit extra storage fields record which extra half-word each normal-sized word uses.
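
A sketch of AAHS compression under one plausible encoding of the 2-bit field: 00 marks a narrow word, and 01..11 name up to 3 extra half-word slots, which would also explain the 3-half-word cap. This encoding is my assumption; the paper's may differ.

    #include <stdint.h>
    #include <stdbool.h>

    #define WORDS_PER_BLOCK 4
    #define MAX_EXTRAS 3   /* a 2-bit field can name at most 3 slots */

    /* Compress one block under AAHS. 'extra' is this block's region of
     * the shared extra half-word storage with 'avail' free slots. */
    static bool aahs_compress(const uint32_t block[WORDS_PER_BLOCK],
                              uint16_t lower[WORDS_PER_BLOCK],
                              uint16_t *extra, int avail,
                              uint8_t field[WORDS_PER_BLOCK])
    {
        int used = 0;
        for (int i = 0; i < WORDS_PER_BLOCK; i++) {
            uint32_t upper17 = block[i] >> 15;
            lower[i] = (uint16_t)block[i];
            if (upper17 == 0 || upper17 == 0x1FFFF) {
                field[i] = 0;                     /* 00: narrow word */
            } else if (used < avail && used < MAX_EXTRAS) {
                extra[used] = (uint16_t)(block[i] >> 16);
                field[i] = (uint8_t)(++used);     /* 01..11: slot number */
            } else {
                return false;                     /* store as a normal block */
            }
        }
        return true;
    }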

Optimizing Address TagS (OATS)
The AWN and AHS techniques require additional tag space and tag comparisons. Intuitively, the higher-order bits of the address tags within a set are expected to be the same. So, instead of providing the entire set of bits used for the address tag, only a small number of additional tag bits are provided for each physical cache block.

OATS example: a physical cache block with a 22-bit address tag in the conventional cache may instead be provided with 24 tag bits, partitioned into 3 parts: 20 higher-order bits common to both blocks, and 2 lower-order bits separate for each block. Such a physical cache block can hold 1 normal cache block, or 2 narrow cache blocks that share the same 20 higher-order tag bits.
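
A tag-check sketch for the slide's 20 + 2 + 2 split (the structure and names are mine):

    #include <stdint.h>
    #include <stdbool.h>

    /* OATS tag storage for one physical cache block: 24 bits total
     * instead of 2 x 22. */
    struct oats_tag {
        uint32_t high20;   /* shared by both resident blocks       */
        uint8_t  low2[2];  /* private 2 bits, one per narrow block */
    };

    static bool oats_match(const struct oats_tag *t, uint32_t tag22,
                           int which /* narrow block 0 or 1 */)
    {
        return (tag22 >> 2) == t->high20 &&
               (tag22 & 0x3) == t->low2[which];
    }

The corollary is that two narrow blocks can share a physical cache block only when their 20 higher-order tag bits match.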

Increase in cache capacity

CONCLUSION
- We proposed restrictive compression techniques that do not require updating the byte-offset and hence do not impact the cache access latency.
- Our basic technique, AWN, compresses a block only if all the words in the block are of small size, and results in a 20% increase in cache capacity.
- We extended AWN by providing some additional space for the upper half-words (AHS, AAHS), which resulted in about a 50% increase in cache capacity while incurring a 38% increase in storage space.
- To contain the additional tag requirement (which is inevitable with compression), we proposed the OATS technique, which reduces the overhead of AHS to about 30%.

Questions/Comments?
Prateek Pujara - prateek@binghamton.edu
Aneesh Aggarwal - aneesh@binghamton.edu
Electrical and Computer Engineering Department, State University of New York at Binghamton