Adaptive Cache Compression for High-Performance Processors
Alaa Alameldeen and David Wood
University of Wisconsin-Madison
Wisconsin Multifacet Project
ISCA 2004
Overview
Design of high-performance processors:
- Processor speed improves faster than memory
- Memory latency dominates performance
- Need more effective cache designs
On-chip cache compression:
+ Increases effective cache size
- Increases cache hit latency
Does cache compression help or hurt?
Does Cache Compression Help or Hurt?
Adaptive compression determines when compression is beneficial.
Outline
- Motivation
- Cache Compression Framework
  - Compressed Cache Hierarchy
  - Decoupled Variable-Segment Cache
- Adaptive Compression
- Evaluation
- Conclusions
Compressed Cache Hierarchy
[Diagram: the instruction fetcher and load-store queue access uncompressed L1 I- and D-caches (backed by an L1 victim cache); a compression pipeline sits between the L1s and the compressed L2, and a decompression pipeline sits on the L2-to-L1 path with an uncompressed-line bypass; the L2 connects to memory.]
Decoupled Variable-Segment Cache
Objective: pack more lines into the same space.
Starting point: a 2-way set-associative cache with 64-byte lines; each tag contains the address tag, permissions, and LRU (replacement) bits.
- Add two more tags per set (Addresses A, B, C, D)
- Add a compression status, a compressed size, and more LRU bits to each tag
- Divide the data area into 8-byte segments; data lines are composed of 1-8 segments
Resulting example set (CSize records a line's compressed size in segments even when it is stored uncompressed):
- Addr A: uncompressed, CSize 3
- Addr B: compressed, CSize 2
- Addr C: compressed, CSize 6
- Addr D: compressed, CSize 4 (tag is present but the line isn't: A, B, and C already fill the 16 data segments)
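A minimal C++ sketch (ours, not the paper's implementation) of the per-tag state this design implies; field names and widths are hypothetical:

```cpp
#include <cstdint>

// Hypothetical tag for a decoupled variable-segment cache set. A set
// holds four such tags but only 16 data segments (two uncompressed
// 64-byte lines), so a tag can be valid while its line is not resident.
struct SegmentedTag {
    uint64_t addr_tag;      // address tag
    uint8_t  permissions;   // coherence/permission bits
    uint8_t  lru;           // LRU / replacement bits
    bool     cstatus;       // true if the line is stored compressed
    uint8_t  csize;         // compressed size in 8-byte segments (1..8),
                            // recorded even for uncompressed lines
    bool     data_present;  // tag may be present while the line is not
};

// Segments the line occupies in the data area: csize if compressed,
// all 8 segments (64 bytes / 8 bytes per segment) if uncompressed.
inline int segmentsUsed(const SegmentedTag& t) {
    return t.cstatus ? t.csize : 8;
}
```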
Outline
- Motivation
- Cache Compression Framework
- Adaptive Compression
  - Key insight
  - Classification of L2 accesses
  - Global compression predictor
- Evaluation
- Conclusions
Adaptive Compression
Use the past to predict the future.
Key insight: the LRU stack [Mattson et al., 1970] indicates, for each reference, whether compression helps or hurts.
Decision rule: if Benefit(Compression) > Cost(Compression), compress future lines; otherwise, do not.
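As a concrete aid (ours, not from the talk), a minimal LRU-stack helper; the stack order it returns is the 1-based recency depth used by the classification on the following slides:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative LRU stack for one cache set (a real set would cap it at
// four tags). stackOrder() returns the 1-based recency depth of a tag
// (1 = most recently used) and promotes it to the top; it returns 0 and
// inserts the tag if the tag was not present.
class LruStack {
    std::vector<uint64_t> stack_;  // front = MRU, back = LRU
public:
    int stackOrder(uint64_t tag) {
        auto it = std::find(stack_.begin(), stack_.end(), tag);
        if (it == stack_.end()) {
            stack_.insert(stack_.begin(), tag);
            return 0;                         // not present
        }
        int depth = static_cast<int>(it - stack_.begin()) + 1;
        stack_.erase(it);                     // promote to MRU
        stack_.insert(stack_.begin(), tag);
        return depth;
    }
};
```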
Cost/Benefit Classification
Classify each cache reference against the example set: a four-way set-associative cache with space for two 64-byte lines, i.e., 16 available segments. The LRU stack, from most to least recently used: Addr A (uncompressed, CSize 3), Addr B (compressed, CSize 2), Addr C (compressed, CSize 6), Addr D (compressed, CSize 4).
An Unpenalized Hit
Read/write Address A: LRU stack order = 1 ≤ 2, so it hits regardless of compression. The line is uncompressed: no decompression penalty, so neither cost nor benefit.
A Penalized Hit
Read/write Address B: LRU stack order = 2 ≤ 2, so it hits regardless of compression. The line is compressed: a decompression penalty is incurred, a compression cost.
An Avoided Miss
Read/write Address C: LRU stack order = 3 > 2, so it hits only because of compression. Compression benefit: an off-chip miss is eliminated.
An Avoidable Miss
Read/write Address D: the line is not in the cache, but its tag exists at LRU stack order = 4. Since Sum(CSize) = 3 + 2 + 6 + 4 = 15 ≤ 16, all four lines would have fit had they all been compressed; the reference missed only because some lines are not compressed. Potential compression benefit.
An Unavoidable Miss
Read/write Address E: LRU stack order > 4; the line is not in the cache and no tag exists. Compression wouldn't have helped: neither cost nor benefit.
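Putting the five cases together, a sketch of the per-reference classification using this example's constants (four tags, two uncompressed ways, 16 data segments); the names and structure are illustrative, not the paper's implementation:

```cpp
#include <array>

enum class AccessClass {
    UnpenalizedHit,   // neither cost nor benefit
    PenalizedHit,     // cost: decompression latency
    AvoidedMiss,      // benefit: off-chip miss eliminated
    AvoidableMiss,    // potential benefit
    UnavoidableMiss   // neither cost nor benefit
};

struct TagState {
    bool valid;         // tag present in the set
    bool compressed;    // CStatus
    int  csize;         // CSize in segments (compressed size, always)
    bool data_present;  // line resident in the data area
};

// tags are ordered from most- to least-recently used; stack_order is the
// referenced tag's 1-based LRU stack depth, or 0 if no tag matches.
AccessClass classify(int stack_order,
                     const std::array<TagState, 4>& tags) {
    constexpr int kUncompressedWays = 2;   // space for two 64B lines
    constexpr int kTotalSegments    = 16;

    if (stack_order == 0) return AccessClass::UnavoidableMiss;

    const TagState& t = tags[stack_order - 1];
    if (t.data_present) {
        if (stack_order <= kUncompressedWays)
            return t.compressed ? AccessClass::PenalizedHit
                                : AccessClass::UnpenalizedHit;
        return AccessClass::AvoidedMiss;   // hit only due to compression
    }
    // Tag present but data not resident: would compressing every line
    // in the set have made them all fit?
    int sum = 0;
    for (const TagState& s : tags)
        if (s.valid) sum += s.csize;
    return (sum <= kTotalSegments) ? AccessClass::AvoidableMiss
                                   : AccessClass::UnavoidableMiss;
}
```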
Compression Predictor
Estimate: Benefit(Compression) − Cost(Compression).
Single counter: the Global Compression Predictor (GCP), a saturating up/down 19-bit counter.
GCP updated on each cache access:
- Benefit: increment by the memory latency
- Cost: decrement by the decompression latency
- Optimization: normalize to decompression latency = 1
Cache allocation:
- Allocate a compressed line if GCP ≥ 0
- Allocate an uncompressed line if GCP < 0
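A minimal sketch of the GCP as described, with the benefit already normalized so that the decompression latency equals 1:

```cpp
#include <algorithm>
#include <cstdint>

// Global Compression Predictor: one saturating up/down 19-bit counter.
class GlobalCompressionPredictor {
    static constexpr int32_t kMax = (1 << 18) - 1;   // 19-bit signed range
    static constexpr int32_t kMin = -(1 << 18);
    int32_t gcp_ = 0;

public:
    // Avoided or avoidable miss: compression saved (or would have saved)
    // a memory access; increment by the normalized memory latency.
    void creditBenefit(int32_t memory_latency_norm) {
        gcp_ = std::min(kMax, gcp_ + memory_latency_norm);
    }
    // Penalized hit: compression cost one decompression (normalized to 1).
    void chargeCost() { gcp_ = std::max(kMin, gcp_ - 1); }

    // Allocation policy: compress new lines while the predictor is
    // non-negative; allocate uncompressed lines otherwise.
    bool shouldCompress() const { return gcp_ >= 0; }
};
```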
Outline
- Motivation
- Cache Compression Framework
- Adaptive Compression
- Evaluation
  - Simulation setup
  - Performance
- Conclusions
Simulation Setup
Simics full-system simulator, augmented with:
- A detailed OoO processor simulator [TFSim; Mauer et al., 2002]
- A detailed memory timing simulator [Martin et al., 2002]
Workloads:
- Commercial workloads:
  - Database servers: OLTP and SPECJBB
  - Static web serving: Apache and Zeus
- SPEC2000 benchmarks:
  - SPECint: bzip, gcc, mcf, twolf
  - SPECfp: ammp, applu, equake, swim
System Configuration
A dynamically scheduled SPARC V9 uniprocessor. Configuration parameters:
- L1 cache: split I & D, 64KB each, 2-way SA, 64B lines, 2 cycles/access
- L2 cache: unified 4MB, 8-way SA, 64B lines, 20 cycles + decompression latency per access
- Memory: 4GB DRAM, 400-cycle access time, 128 outstanding requests
- Processor pipeline: 4-wide superscalar, 11-stage pipeline: fetch (3), decode (3), schedule (1), execute (1+), retire (3)
- Reorder buffer: 64 entries
Simulated Cache Configurations
- Always: all compressible lines are stored in compressed format; every compressed line incurs the decompression penalty.
- Never: all cache lines are stored in uncompressed format; the cache is 8-way set-associative with half the number of sets; no decompression penalty.
- Adaptive: our adaptive compression scheme.
Performance
[Figure: performance of the Never, Always, and Adaptive configurations across SPECint, SPECfp, and commercial workloads; compression yields up to a 35% speedup on some benchmarks and up to an 18% slowdown on others.]
Performance
Adaptive performs similar to the best of Always and Never.
Effective Cache Capacity
[Figure: effective cache capacity achieved by compression for each benchmark.]
Cache Miss Rates
[Figure: misses per 1000 instructions and penalized hits per avoided miss, per benchmark.]
Adapting to L2 Sizes
[Figure: misses per 1000 instructions and penalized hits per avoided miss as the L2 size varies.]
Conclusions
Cache compression increases cache capacity but slows down cache hits:
- Helps some benchmarks (e.g., apache, mcf)
- Hurts other benchmarks (e.g., gcc, ammp)
Our proposal: adaptive compression
- Uses the (LRU) replacement stack to determine whether compression helps or hurts
- Updates a single global saturating counter on cache accesses
Adaptive compression performs similar to the better of Always Compress and Never Compress.
Backup Slides
- Frequent Pattern Compression (FPC)
- Decoupled Variable-Segment Cache
- Classification of L2 Accesses
- (LRU) Stack Replacement
- Cache Miss Rates
- Adapting to L2 Sizes – mcf
- Adapting to L1 Size
- Adapting to Decompression Latency – mcf
- Adapting to Decompression Latency – ammp
- Phase Behavior – gcc
- Phase Behavior – mcf
- Can We Do Better Than Adaptive?
Decoupled Variable-Segment Cache
Each set contains four tags but space for only two uncompressed lines; the data area is divided into 8-byte segments.
Each tag is composed of:
- Address tag (same as an uncompressed cache)
- Permissions (same as an uncompressed cache)
- CStatus: 1 if the line is compressed, 0 otherwise
- CSize: size of the compressed line in segments
- LRU/replacement bits (same as an uncompressed cache)
Frequent Pattern Compression (FPC)
A significance-based compression algorithm combined with zero run-length encoding:
- Compresses each 32-bit word separately
- Suitable for short cache lines
- Compressible patterns: zero runs; sign-extended 4, 8, and 16 bits; zero-padded halfword; two sign-extended halfwords; repeated byte
- A 64-byte line is decompressed in a five-stage (five-cycle) pipeline
Related work:
- X-Match and X-RL algorithms [Kjelso et al., 1996]
- Address and data significance-based compression [Farrens and Park, 1991; Citron and Rudolph, 1995; Canal et al., 2000]
More details in the technical report: "Frequent Pattern Compression: A Significance-Based Compression Algorithm for L2 Caches," Alaa R. Alameldeen and David A. Wood, Dept. of Computer Sciences Technical Report CS-TR, April 2004 (available online).
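A rough illustration of the per-word pattern tests listed above; prefix encodings and zero-run handling are omitted, and the pattern definitions here are our reading of the slide, not the technical report's exact encoding:

```cpp
#include <cstdint>

// Can a 32-bit word be rebuilt from fewer stored bits?
bool fitsSignExtended(uint32_t w, int bits) {   // sign-ext. 4/8/16 bits
    int32_t v = static_cast<int32_t>(w);
    return v >= -(1 << (bits - 1)) && v <= (1 << (bits - 1)) - 1;
}
bool isZeroPaddedHalfword(uint32_t w) {         // low halfword is zero
    return (w & 0xFFFFu) == 0;
}
bool isTwoSignExtHalfwords(uint32_t w) {        // each half fits in a byte
    auto fits = [](uint16_t h) {
        int16_t v = static_cast<int16_t>(h);
        return v >= -128 && v <= 127;
    };
    return fits(static_cast<uint16_t>(w >> 16)) &&
           fits(static_cast<uint16_t>(w & 0xFFFFu));
}
bool isRepeatedByte(uint32_t w) {               // e.g., 0xABABABAB
    return w == (w & 0xFFu) * 0x01010101u;
}
bool isCompressibleWord(uint32_t w) {           // zero runs handled elsewhere
    return w == 0 || fitsSignExtended(w, 4) || fitsSignExtended(w, 8) ||
           fitsSignExtended(w, 16) || isZeroPaddedHalfword(w) ||
           isTwoSignExtHalfwords(w) || isRepeatedByte(w);
}
```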
Classification of L2 Accesses
Cache hits:
- Unpenalized hit: hit to an uncompressed line that would have hit without compression (neither cost nor benefit)
- Penalized hit (cost): hit to a compressed line that would have hit without compression
- Avoided miss (benefit): hit to a line that would NOT have hit without compression
Cache misses:
- Avoidable miss (potential benefit): miss to a line that would have hit with compression
- Unavoidable miss: miss to a line that would have missed even with compression (neither cost nor benefit)
(LRU) Stack Replacement
How to differentiate penalized hits from avoided misses? Only hits to the top half of the tags in the LRU stack are penalized hits.
How to differentiate avoidable from unavoidable misses? This does not depend on LRU replacement; the classification works with:
- Any replacement algorithm for the top half of the tags
- Any stack algorithm for the remaining tags
Cache Miss Rates
Adapting to L2 Sizes – mcf
[Figure: misses per 1000 instructions and penalized hits per avoided miss as the L2 size varies.]
Adapting to L1 Size
Adapting to Decompression Latency – mcf
Adapting to Decompression Latency – ammp
Phase Behavior – gcc
[Figure: predictor value (K) and cache size (MB) over time.]
Phase Behavior – mcf
[Figure: predictor value (K) and cache size (MB) over time.]
Can We Do Better Than Adaptive?
Optimal is an unrealistic configuration: Always, but with no decompression penalty.