University of Toronto Department of Electrical and Computer Engineering Jason Zebchuk and Andreas Moshovos June 2006.

Presentation transcript:

RegionTracker: Using Dual-Grain Tracking for Energy Efficient Cache Lookup
Jason Zebchuk and Andreas Moshovos
University of Toronto, Department of Electrical and Computer Engineering
Workshop on Complexity-Effective Design, June 2006

Need for Energy Efficient L2 Lookups
[Figure: CPU with I$ and D$, backed by L2 TAGS and L2 DATA]
- Locate blocks in high level caches more efficiently
- Conventional tags are getting larger
  - Technology, microarchitectural and application trends
  - Larger caches use more energy
- Demonstrate lookup energy reductions up to 82%
  - Up to 38% average across SPEC
June 18, 2006, Zebchuk ©

Dual-Grain Tracking
[Figure: memory as a collection of REGIONS; memory as a collection of blocks]
- Region: 2^n sized, aligned memory area
- Similar concept already used by various structures
  - TLB, Page Table
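Because a region is a 2^n sized, aligned area, the region a given address belongs to can be found with shifts and masks. The following is a minimal sketch under stated assumptions: the 8 KB region size is taken from the example used later in the slides, while the 128-byte block size and the function name `split_address` are hypothetical choices made only for illustration.

```python
# Illustrative parameters, not taken from the authors' implementation.
REGION_BITS = 13  # 2**13 bytes = 8 KB region (size used in the slides' example)
BLOCK_BITS = 7    # 2**7 bytes = 128-byte block (assumed for this sketch)


def split_address(addr: int) -> tuple[int, int, int]:
    """Return (region number, block index within the region, byte offset in the block)."""
    region = addr >> REGION_BITS
    blocks_per_region_mask = (1 << (REGION_BITS - BLOCK_BITS)) - 1
    block_in_region = (addr >> BLOCK_BITS) & blocks_per_region_mask
    byte_offset = addr & ((1 << BLOCK_BITS) - 1)
    return region, block_in_region, byte_offset


region, block, offset = split_address(0x12345)
print(region, block, offset)
```

With these assumed sizes each region holds 64 blocks, so the block index needs 6 bits.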

Program Behavior / Motivation
[Figure panels: "In principle" and "In practice"; annotation: "And before is touched again"]
- Few active Regions
- "Bursty" access
- Mostly gone before accessed again
- RegionTracker:
  - Identify First Misses
  - Track block location for Few Regions
=> How can this reduce energy?

RegionTracker: Low Power Lookups
[Figure: CPU with I$ and D$, backed by L2 TAGS and L2 DATA, shown twice]
- Frequent case:
  - Few Active Regions
  - Macroscopically Transient
- RegionTracker:
  - Dynamically Identify Newly Touched Regions
  - Track block location using a compact structure

RegionTracker Organization
[Figure: CPU with I$ and D$, backed by L2 TAGS and L2 DATA]
- CRH for First Miss Detection: 5% of tags
- CBV for Tracking blocks within 128 regions: 17.5%
- 128 x 8 kB regions = 1 MB tracked (at most 25% of a 4 MB L2)

Which Regions are Cached?
[Figure: address split into Region Tag and offset; the Region Tag selects a counter]
- If we had as many counters as regions:
  - Block Allocation: counter[region]++
  - Block Eviction: counter[region]--
  - Region cached only if counter[region] non-zero
- Not Practical:
  - E.g., 8 KB Regions and 4 GB Memory => 512K counters
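The idealized one-counter-per-region scheme, and the arithmetic that makes it impractical, can be sketched as follows. This is only an illustration: the dictionary-based storage and the function names are assumptions for the sketch, not the hardware organization.

```python
from collections import defaultdict

REGION_BITS = 13  # 8 KB regions, as in the slide's example

# One counter per region: the idealized (impractical) scheme.
counter = defaultdict(int)


def region_of(addr: int) -> int:
    return addr >> REGION_BITS


def on_block_allocated(addr: int) -> None:
    counter[region_of(addr)] += 1


def on_block_evicted(addr: int) -> None:
    counter[region_of(addr)] -= 1


def region_cached(addr: int) -> bool:
    # A region is cached only if its counter is non-zero.
    return counter.get(region_of(addr), 0) != 0


# Why this is not practical: 4 GB of memory split into 8 KB regions.
NUM_REGIONS = (4 * 2**30) // (8 * 2**10)
print(NUM_REGIONS)  # 2**19 counters, i.e. 512K
```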

Which Regions are Cached? Cached Region Hash (CRH)
[Figure: address split into Region Tag and offset; hash() of the Region Tag selects a counter]
- Imprecise:
  - Records a superset of currently cached Regions
  - False positives: lost opportunity, correctness preserved
  - Small: e.g., 512 to 4K entries for a 2 MB or 4 MB cache
- First Miss:
  - Full location information for ALL BLOCKS
  - No need for temporal locality
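A small hashed counter table behaves as described above: it can report a region as possibly cached when it is not (a false positive), but it never reports a cached region as absent. The sketch below is a hedged illustration; the modulo hash, the class name `CachedRegionHash`, and the method names are assumptions, since the slides do not specify the hash function.

```python
REGION_BITS = 13   # 8 KB regions, as in the slides' example
CRH_ENTRIES = 512  # the slides mention 512 to 4K entries


class CachedRegionHash:
    """Counter table indexed by a hash of the region number (illustrative only)."""

    def __init__(self, entries: int = CRH_ENTRIES) -> None:
        self.counts = [0] * entries

    def _index(self, addr: int) -> int:
        # Hypothetical hash: region number modulo the table size.
        return (addr >> REGION_BITS) % len(self.counts)

    def block_allocated(self, addr: int) -> None:
        self.counts[self._index(addr)] += 1

    def block_evicted(self, addr: int) -> None:
        self.counts[self._index(addr)] -= 1

    def may_be_cached(self, addr: int) -> bool:
        # False means no block of this region is cached (a "first miss").
        # True may be a false positive caused by another region sharing the entry.
        return self.counts[self._index(addr)] != 0


crh = CachedRegionHash()
a = 5 << REGION_BITS          # region 5
b = (5 + 512) << REGION_BITS  # region 517, collides with region 5 under this hash
c = 6 << REGION_BITS          # region 6, different entry
crh.block_allocated(a)
print(crh.may_be_cached(a), crh.may_be_cached(b), crh.may_be_cached(c))
```

Region 517 is reported as possibly cached only because it shares an entry with region 5; that costs a lost opportunity, not correctness.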

CBV: Tracking Blocks within Regions
[Figure: address split into Region Tag, block, and offset; each CBV entry holds a Region Tag and Block info for Block #0 through Block #63; widths labeled 4 and 256]
- Which data way is the block cached at?
- Parallel lookup of Region Tag and Block Info
- Experiments with 64 and 128 entry, 8-way set-associative CBV
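One way to picture a CBV entry is as a region tag plus a small per-block record of which data way, if any, holds each block. The sketch below is a simplified illustration under loud assumptions: it uses an unbounded dictionary rather than the 8-way set-associative structure from the experiments, omits replacement entirely, and all names (`CachedBlockVector`, `track_region`, `record_fill`, `record_eviction`, `lookup`) are hypothetical. The 64 blocks per region match the figure; the 8 KB region and 128-byte block sizes are assumed.

```python
from typing import Optional

REGION_BITS = 13  # 8 KB regions (from the slides' example)
BLOCK_BITS = 7    # 128-byte blocks (assumed), giving 64 blocks per region
BLOCKS_PER_REGION = 1 << (REGION_BITS - BLOCK_BITS)


def region_and_block(addr: int) -> tuple[int, int]:
    region = addr >> REGION_BITS
    block = (addr >> BLOCK_BITS) & (BLOCKS_PER_REGION - 1)
    return region, block


class CachedBlockVector:
    """Maps a tracked region to the data way of each of its blocks (None = not cached)."""

    def __init__(self) -> None:
        self.entries: dict[int, list[Optional[int]]] = {}

    def track_region(self, region: int) -> None:
        # Intended for a first miss: no block of the region is cached yet,
        # so an all-empty vector is complete location information.
        self.entries[region] = [None] * BLOCKS_PER_REGION

    def record_fill(self, addr: int, way: int) -> None:
        region, block = region_and_block(addr)
        if region in self.entries:
            self.entries[region][block] = way

    def record_eviction(self, addr: int) -> None:
        region, block = region_and_block(addr)
        if region in self.entries:
            self.entries[region][block] = None

    def lookup(self, addr: int) -> tuple[str, Optional[int]]:
        region, block = region_and_block(addr)
        if region not in self.entries:
            return ("untracked", None)  # fall back to the conventional tag lookup
        way = self.entries[region][block]
        return ("hit", way) if way is not None else ("miss", None)


cbv = CachedBlockVector()
addr = (9 << REGION_BITS) | (6 << BLOCK_BITS)
cbv.track_region(9)
cbv.record_fill(addr, way=3)
print(cbv.lookup(addr))
```

In this sketch a lookup for a tracked region is answered from the small per-block record alone, and untracked regions fall back to the conventional tags.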

Conventional Solution
[Figure: CPU with I$ and D$, backed by L2 TAGS and L2 DATA]
- Tag Hierarchy
  - Requires Locality
    - Temporal
    - Spatial as long as L2 block size > L1 block size
  - Latency limited
  - Not very energy efficient
  - RegionTracker is Better

Tag Hierarchy
[Figure: address split into Set Tag, Block Tag, and Offset; each entry holds a Set Tag and Tag #0 through Tag #7, each with a comparator]
- Each access reads/writes 23 bytes
- Sequential Comparison of Set Tag AND Block Tag

Complexity Tradeoffs
- Tag Hierarchy
  - Read/Write 184 bits
    - Complex Wiring to transfer 184 bits
  - Updated on every Tag Hierarchy miss
- RegionTracker
  - Read/Write 4 bits
    - Only 4 bits transferred from tag array
  - Updated on L2 misses only
  - Flexible implementation (vertical/horizontal partitioning)
  - No modification to conventional cache policies/structures

Methodology
- Processor
  - Deeply-Pipelined
  - 128-entry window
  - 8-way superscalar
  - 32 kB L1 instruction and data caches
- SPEC CPU 2000 / Reference Inputs
- 10 Billion Committed Instr. Samples after 100B
- Used CACTI to estimate energy requirements

Energy Savings w/ 4MB L2
[Figure: results chart; legend "CRH/CBV:", direction marker "Better"]
- Average reduction of 38%
- Up to 82% reduction (gzip)
- Robust performance, significant power savings for most programs

Tag Hierarchy Savings
[Figure: results chart; legend "Sets:"]
- Only 2 configurations actually save power!
- Similar fraction of requests served by RegionTracker
- RegionTracker much better!

RegionTracker Summary
- Coarse-Grain tracking to capture first misses
- Dual-Grain tracking to track blocks
- Service many L2 Requests
- Reduce L2 Lookup Energy
- Does not require temporal locality
- Can exploit spatial locality much better than a tag hierarchy
- Significantly reduces L2 Lookup Power with minimal additional complexity

RegionTracker: Using Dual-Grain Tracking for Energy Efficient Cache Lookup
Jason Zebchuk and Andreas Moshovos
{zebchuk,
University of Toronto, Department of Electrical and Computer Engineering