1
Amoeba-Cache: Adaptive Blocks for Eliminating Waste in the Memory Hierarchy
Snehasish Kumar, Hongzhou Zhao†, Arrvindh Shriraman Eric Matthews∗, Sandhya Dwarkadas†, Lesley Shannon∗ School of Computing Sciences, Simon Fraser University †Department of Computer Science, University of Rochester ∗School of Engineering Science, Simon Fraser University IEEE/ACM 45th Annual International Symposium on Microarchitecture
2
Overview Problem being addressed Prior work Challenges encountered
Proposed solution Results Review of the paper
3
Problem Statement Current caches are designed for fixed block size
Block granularity is chosen based on the average spatial locality across general workloads
Many common applications exhibit considerably lower spatial locality than this design point
Unused words occupy between 17% and 80% of a 64K L1 cache and between 1% and 79% of a 1MB private LLC
Unused-word transfers account for 11% of on-chip cache hierarchy energy consumption
4
Prior Work Sector Cache: Aims at minimizing bandwidth
Fetches sub-blocks on demand
Works well for applications with low-to-moderate spatial locality
Reduces mispredicted spatial prefetches, thus reducing bandwidth usage and energy consumption
5
Prior Work
6
Challenges to static design
Small cache lines fetch fewer unused words
But they impose significant performance penalties by missing opportunities for spatial prefetching in applications with high spatial locality
Large cache lines effectively prefetch neighboring words
But they increase the number of unused words and the network bandwidth consumed
Determining a fixed optimal cache line granularity at hardware design time is therefore a challenge
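The tension above can be sketched with a simple model (an illustration, not the paper's methodology): count cold misses and unused fetched words for two line sizes on the same trace, assuming an infinite cache.

```python
# Illustrative sketch: cold-miss count vs. unused words for a given
# line size, assuming an infinite (never-evicting) cache.

def cold_misses_and_unused(trace, line_words):
    seen_lines, used = set(), {}
    misses = 0
    for w in trace:
        line = w // line_words
        if line not in seen_lines:
            seen_lines.add(line)
            misses += 1
        used.setdefault(line, set()).add(w % line_words)
    unused = sum(line_words - len(s) for s in used.values())
    return misses, unused

# Sequential trace: large lines prefetch well (fewer misses, no waste).
seq = list(range(32))
# Sparse trace: large lines waste most fetched words.
sparse = list(range(0, 256, 16))

print(cold_misses_and_unused(seq, 2), cold_misses_and_unused(seq, 16))
# (16, 0) (2, 0)
print(cold_misses_and_unused(sparse, 2), cold_misses_and_unused(sparse, 16))
# (16, 16) (16, 240)
```

No single line size wins both cases: 16-word lines cut sequential misses 8x but fetch 240 wasted words on the sparse trace, which is exactly why a fixed design point is hard to pick.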
7
Amoeba Cache A novel cache architecture that supports fine-grained (per-miss) dynamic adjustment of the cache block size and the # of blocks per set
Filters out unused words in a block and prevents them from being inserted into the cache, allowing the freed space to hold other useful blocks
Adapts to the available spatial locality
8
Amoeba Cache
9
Amoeba Cache How to grow and shrink the # of tags as the # of blocks per set varies with block granularity?
Eliminates the tag array
Tags and data are kept together in a single data array
A tag bitmap indicates which words in the data array are tags
Valid bits are also stored as a bitmap
10
Amoeba Cache
11
Amoeba Cache Data lookup:
The tag bitmap activates the words in the array that contain tags for comparison
The minimum size of an Amoeba block is 2 words (1 tag, 1 data), so adjacent words cannot both be tags
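A minimal sketch of this lookup within one set, assuming each tag word is modeled as a (region_tag, start, end) tuple; the field layout and function names are illustrative, not the paper's exact encoding.

```python
# Sketch of Amoeba-style lookup in one set: a flat word array plus a
# tag bitmap marking which words hold tags. Each tag covers the data
# words that immediately follow it.

def lookup(word_array, tag_bitmap, region_tag, word_offset):
    """Return the index of the matching data word, or None on miss."""
    for i, is_tag in enumerate(tag_bitmap):
        if not is_tag:
            continue
        rtag, start, end = word_array[i]
        # Hit if the region matches and the offset lies in [start, end).
        if rtag == region_tag and start <= word_offset < end:
            return i + 1 + (word_offset - start)  # data follows the tag
    return None

# One Amoeba block holding words 2..5 of region 7, placed at index 0.
words = [(7, 2, 6), "w2", "w3", "w4", "w5", None, None, None]
tags  = [1, 0, 0, 0, 0, 0, 0, 0]
print(lookup(words, tags, 7, 3))  # 2  (word "w3")
print(lookup(words, tags, 7, 6))  # None (offset outside the block)
```

In hardware the bitmap would gate which array words drive the comparators in parallel; the loop here is just the sequential analogue.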
12
Amoeba Cache Block Insertion:
The valid bitmap is used to determine empty slots within the set: 1 means allocated, 0 means empty
For an incoming block of m words, a run of m consecutive 0s is searched for
The replacement algorithm is triggered repeatedly until enough space is created
To reclaim space from an Amoeba block, the tag and valid bits corresponding to the block are cleared in the bitmaps
Uses an LRU policy to choose a way and picks a random candidate from within the set for block replacement
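The free-slot search over the valid bitmap can be sketched as a first-fit scan for m consecutive 0s (the first-fit choice is an assumption for illustration; on failure the real design would invoke replacement and retry):

```python
# Sketch of the free-slot search: scan the set's valid bitmap for m
# consecutive 0s, first fit. Returns the start index, or None if the
# caller must trigger replacement and retry.

def find_free_run(valid_bitmap, m):
    run_start, run_len = 0, 0
    for i, bit in enumerate(valid_bitmap):
        if bit == 0:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == m:
                return run_start
        else:
            run_len = 0
    return None

bitmap = [1, 1, 0, 1, 0, 0, 0, 1]  # 1 = allocated, 0 = empty
print(find_free_run(bitmap, 3))  # 4
print(find_free_run(bitmap, 4))  # None -> evict, then search again
```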
13
Partial Misses: Low probability (5 in 1K accesses)
Identify overlapping blocks
Evict them to the MSHR
Allocate space for the entire block
Issue the miss request
Copy in the returned block
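The steps above can be modeled abstractly as interval merging; this is a behavioral sketch only (blocks as (start, end) word ranges, MSHR handling collapsed into list bookkeeping), not the hardware flow.

```python
# Behavioral sketch of a partial miss: cached sub-blocks overlapping
# the requested range are pulled out (to an MSHR in hardware) and
# merged with the refill into one contiguous block.

def service_partial_miss(request, cached_blocks):
    lo, hi = request
    overlapping = [b for b in cached_blocks if b[0] < hi and b[1] > lo]
    remaining = [b for b in cached_blocks if b not in overlapping]
    # The merged block covers the request plus every evicted overlap.
    merged_lo = min([lo] + [b[0] for b in overlapping])
    merged_hi = max([hi] + [b[1] for b in overlapping])
    return (merged_lo, merged_hi), remaining

# Request words 2..6; block (0, 3) overlaps, block (8, 10) does not.
blk, rest = service_partial_miss((2, 6), [(0, 3), (8, 10)])
print(blk, rest)  # (0, 6) [(8, 10)]
```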
14
Results Result 1: Amoeba-Cache increases effective cache capacity by harvesting space from unused words and achieves an 18% reduction in both L1 and L2 miss rates. Result 2: Amoeba-Cache adaptively sizes the cache block granularity, reducing L1↔L2 bandwidth by 46% and L2↔Memory bandwidth by 38%.
15
Results
16
Results Result 3: Boosts performance by 10% on commercial applications, saving 11% of on-chip memory hierarchy energy. The off-chip L2↔memory interface sees a mean energy reduction of 41% across all workloads.
17
Review of the paper Connects proposed work with prior work
Builds on the proposed idea gradually with sufficient examples
Algorithms are explained well with control-flow diagrams
Plenty of comparative graphs support the results
The maximum region size (RMAX) is stated differently in the text (bytes) and in a diagram (words)
The reason for using the metric 1/(MissRate×Bandwidth) to determine block granularity could have been supported better
18
Thank you!