Published by Julian Freeman. Modified over 9 years ago.
Slide 1: Integrating Adaptive On-Chip Storage Structures for Reduced Dynamic Power
Steve Dropsho, Alper Buyuktosunoglu, Rajeev Balasubramonian, David H. Albonesi, Sandhya Dwarkadas, Greg Semeraro, Grigorios Magklis, and Michael Scott
ECE and CS Departments, University of Rochester
Slide 2: Why Adaptive Structures?
- General-purpose microprocessors are “one size fits all”
- But needs vary across (and within) applications
- Considerable energy can be saved by matching resources to the application
- Objective: less energy for the same performance, by adapting storage structures to the application
Slide 3: Related Work
- Adaptable caches
  – Balasubramonian et al., MICRO 2000
  – Dhodapkar and Smith, ISCA 2002
- Adaptable issue logic
  – Buyuktosunoglu et al., GLSVLSI 2001
  – Folegnani and Gonzalez, ISCA 2000
Slide 4: Common Themes
- A single adaptive structure
- Use of global information for feedback
- Exploration-based (caches)
Slide 5: Related Work (cont.)
- Adaptable IQ, LSQ, and ROB
  – Ponomarev et al., MICRO 2001
  – Three adaptable structures
  – Reconfigurations based on local state
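Slide 6: Integrating Multiple Adaptive Structures
[Figure: processor block diagram showing the L1 I-cache, branch predictor, fetch queue (FetchQ), rename map, reorder buffer (ROB), integer issue queue (IIQ), floating-point queue (FPQ), load/store queue (LSQ), integer and floating-point register files (IPREG, FPREG), integer and FP functional units, L1 D-cache, and L2 unified cache, grouped into integer, memory, and floating-point sections.]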
Slide 7: Challenges
- Multiple (9) adaptive structures create a state-explosion problem
- Using global information makes it difficult to assign cause and effect
- Potential for additive performance effects among the structures
Slide 8: Approach: Local Management
- Local information for configuration decisions
- Tight control over performance variance
Slide 9: Part I: The Caches
[Figure: processor block diagram repeated; this part covers the L1 I-cache, L1 D-cache, and L2 unified cache.]
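Slide 10: The Accounting Cache
- The ways of each set are split into an A (primary) partition and a B (secondary) partition
- Accesses are sequential: A first, then B
- Energy is saved on an A hit, since B is not accessed
- Blocks are swapped between A and B on an A miss
[Figure: a 4-way set partitioned as A1/B3, A2/B2, A3/B1, or A4/B0 (A ways/B ways), with a block swap between A and B.]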
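The A-then-B lookup with swapping can be sketched as a single set kept in MRU order. This is an illustration under an assumption, not the authors' implementation: the A partition is taken to always hold the `a_ways` most recently used blocks (which is what swapping blocks on an A miss maintains), and the class name `AccountingSet` is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AccountingSet:
    """One set of an accounting cache, modeled as an MRU stack (a sketch).

    Assumption: the A partition always holds the `a_ways` most recently used
    blocks and B holds the rest, which is what swapping on an A miss maintains.
    """
    ways: int
    a_ways: int
    stack: list = field(default_factory=list)  # stack[0] is the MRU block

    def access(self, tag):
        """Return "A", "B", or "miss" for an access to block `tag`."""
        if tag in self.stack:
            pos = self.stack.index(tag)
            # A is probed first; only blocks deeper than a_ways need the B probe.
            result = "A" if pos < self.a_ways else "B"
            # Moving the block to the front models the swap: a B block enters A
            # and A's least recently used block drops into B.
            self.stack.pop(pos)
        else:
            result = "miss"
            if len(self.stack) == self.ways:
                self.stack.pop()  # evict the set's least recently used block
        self.stack.insert(0, tag)
        return result


one_way_a = AccountingSet(ways=4, a_ways=1)
print([one_way_a.access(t) for t in "ABAB"])  # ['miss', 'miss', 'B', 'B']
two_way_a = AccountingSet(ways=4, a_ways=2)
print([two_way_a.access(t) for t in "ABAB"])  # ['miss', 'miss', 'A', 'A']
```

With a one-way A, the alternating pattern keeps paying for the B probe; with a two-way A, both blocks fit in A and every hit is a cheap A hit.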
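Slide 11: Most-Recently-Used Statistics
- Each set keeps its lines in MRU order (positions 0 to 3 in a 4-way set)
- MRU state counters MRU[0] to MRU[3] count hits at each MRU position; a separate counter counts misses
[Figure: MRU state transitions for a 4-way set (ways 1 to 4 holding lines A to D) over an example sequence of accesses to lines A, B, and C, ending with MRU[0] = 3, MRU[1] = 2, MRU[2] = 1, MRU[3] = 0, Misses = 0.]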
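A minimal sketch of the MRU state counters: every hit increments the counter for the MRU position where the line was found, then the line moves to position 0. The function name `mru_counters` is hypothetical, and the access sequence below is an illustrative one chosen to produce the same counters as the slide; it is not necessarily the sequence shown there.

```python
def mru_counters(initial, accesses, ways):
    """Count hits per MRU position (0 = most recently used) plus misses."""
    stack = list(initial)  # stack[0] is the most recently used line
    counters = [0] * ways
    misses = 0
    for tag in accesses:
        if tag in stack:
            pos = stack.index(tag)
            counters[pos] += 1  # a hit at MRU position `pos`
            stack.pop(pos)
        else:
            misses += 1
            if len(stack) == ways:
                stack.pop()  # evict the LRU line
        stack.insert(0, tag)  # the accessed line becomes MRU
    return counters, misses


counters, misses = mru_counters("ABCD", "ABAACC", ways=4)
print(counters, misses)  # [3, 2, 1, 0] 0
```

Because the counters record where in the recency order each hit fell, one set of statistics is enough to estimate the cost of every possible A/B split.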
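Slide 12: Configuration Evaluation
Using the counters MRU[0] = 3, MRU[1] = 2, MRU[2] = 1, MRU[3] = 0, and Misses = 0 (MRU[0] is the most-recently-used position, MRU[3] the least-recently-used), each A/B split of the 4-way cache is costed as follows. Every access probes A; only hits in MRU positions beyond A's ways (and misses) go on to probe B. Here $D_A$ and $D_B$ are the A and B access delays and $E_k$ is the energy of an access to $k$ ways:
- A1/B3: Delay $= 6D_A + 3D_B$, Energy $= 6E_1 + 3E_3$
- A2/B2: Delay $= 6D_A + D_B$, Energy $= 7E_2$
- A3/B1: Delay $= 6D_A$, Energy $= 6E_3$
- A4/B0 (BASE): Delay $= 6D_A$, Energy $= 6E_4$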
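The configuration costs above can be derived mechanically from the MRU counters. In this sketch the function names `evaluate` and `cost` are hypothetical, and the per-way energies passed to `cost` are made-up values; the miss penalty to the next cache level is left out, which does not matter for the example because Misses = 0.

```python
def evaluate(counters, misses, ways):
    """For each A size a, return (a, number of A accesses, number of B accesses)."""
    total = sum(counters) + misses
    rows = []
    for a in range(1, ways + 1):
        # Every access probes A. Hits at MRU positions >= a, and misses, go on
        # to probe B (which has ways - a ways); with no B ways, misses skip it.
        n_b = sum(counters[a:]) + (misses if a < ways else 0)
        rows.append((a, total, n_b))
    return rows


def cost(a, n_a, n_b, ways, d_a, d_b, energy):
    """Delay and energy of one split; energy[k] = energy of a k-way access."""
    delay = n_a * d_a + n_b * d_b
    e = n_a * energy[a] + n_b * energy[ways - a]
    return delay, e


for a, n_a, n_b in evaluate([3, 2, 1, 0], misses=0, ways=4):
    print(f"A{a}/B{4 - a}: {n_a} A accesses, {n_b} B accesses")
```

The printed counts (6 and 3, 6 and 1, 6 and 0, 6 and 0) are the coefficients in the slide's formulas; for example, A2/B2 gives $6E_2 + 1E_2 = 7E_2$.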
Slide 13: Tolerance and the Bank Account
- Tolerance allows more delay than BASE:
  $D_{TOL} = D_{BASE}(1 + TOL)$, with $TOL \in \{0.015, 0.062, 0.25\}$ (1/64, 1/16, 1/4)
- The bank account allows unused tolerance to accumulate
  – Account credits are used in later intervals
  – Allows aggressive resizing
  – Amortizes mistakes over many intervals
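A minimal sketch of how the tolerance and bank account could drive a per-interval choice. The function name `choose_config`, the rule "pick the lowest-energy configuration whose delay fits the budget", and the numbers are all assumptions: the delays and energies come from the configuration-evaluation formulas with hypothetical values $D_A = 1$, $D_B = 2$, and $E_k = k$.

```python
def choose_config(configs, d_base, tol, account):
    """Pick the lowest-energy configuration that fits the delay budget.

    configs: (name, delay, energy) tuples estimated for the next interval.
    The budget is D_TOL = d_base * (1 + tol) plus any banked credit; slack
    left unused in this interval is banked for later ones.
    """
    budget = d_base * (1 + tol) + account
    # BASE (delay == d_base) always fits while the account is non-negative.
    feasible = [c for c in configs if c[1] <= budget]
    name, delay, _energy = min(feasible, key=lambda c: c[2])
    return name, budget - delay  # new account balance (never negative)


# Hypothetical estimates: D_A = 1, D_B = 2, E_k = k, counters 3/2/1/0.
configs = [("A1/B3", 12, 15), ("A2/B2", 8, 14), ("A3/B1", 6, 18), ("A4/B0", 6, 24)]
account = 0.0
for interval in range(4):
    name, account = choose_config(configs, d_base=6, tol=0.25, account=account)
    print(interval, name, account)
```

With TOL = 1/4 the per-interval budget is 7.5, so A2/B2 (delay 8) is out of reach at first; after one interval of A3/B1 banks 1.5 units of slack, the controller can switch to the lower-energy A2/B2 and spend the credit down over the following intervals.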
Slide 14: Memory Hierarchy
- L1 I-cache (A/B)
- L1 D-cache (A only, no B)
- L2 unified cache (A/B)
[Figure: one possible configuration of the A/B way partitions in each cache.]
Slide 15: Environment
- SimpleScalar simulator
- Microarchitecture similar to the Alpha 21264
- Benchmarks: a mix of SPEC95, SPEC2K, and Olden
- Energy models for buffers and caches from Buyuktosunoglu et al., GLSVLSI 2001, and Balasubramonian et al., MICRO 2000
Slide 16: Cache Results
Slide 17: Part II: Queues, Registers, and ROB
[Figure: processor block diagram repeated; this part covers the issue queues (IIQ, FPQ), LSQ, register files (IPREG, FPREG), and ROB.]
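Slide 18: Resizable Queues/Register File
- Each buffer is divided into N partitions (P1 to PN) of m elements each; the buffer is resized by enabling or disabling partitions
[Figure: a buffer of N partitions, P1 through PN, each holding m elements.]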
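A toy model of a buffer built from N partitions of m entries, where only some partitions are powered on. The class name `PartitionedBuffer` is hypothetical, and the sketch assumes live entries are kept compacted toward P1, so a partition can be turned off once the remaining partitions can hold every live entry; a real circular queue would need to wait until the partition itself drains.

```python
class PartitionedBuffer:
    """A resizable buffer of N partitions (P1..PN) with m entries each.

    Sketch assumptions: entries are kept compacted toward P1, and a partition
    may be powered off only when the remaining partitions can hold every
    live entry.
    """

    def __init__(self, n_partitions, m):
        self.n_partitions = n_partitions
        self.m = m
        self.active = n_partitions  # partitions currently powered on
        self.occupancy = 0

    @property
    def capacity(self):
        return self.active * self.m

    def insert(self):
        """Allocate one entry; False means the buffer is full (a stall)."""
        if self.occupancy == self.capacity:
            return False
        self.occupancy += 1
        return True

    def remove(self):
        """Release one entry (e.g. when an instruction issues or commits)."""
        if self.occupancy > 0:
            self.occupancy -= 1

    def resize(self, active):
        """Try to power on `active` partitions (clamped to 1..N)."""
        active = max(1, min(self.n_partitions, active))
        if self.occupancy > active * self.m:
            return False  # live entries would not fit in the smaller buffer
        self.active = active
        return True


buf = PartitionedBuffer(n_partitions=4, m=8)
buf.resize(2)
print(buf.capacity)  # 16
```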
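Slide 19: Buffer Sizing with Limited Histogramming
- Sizing decisions are based on buffer occupancy over an 8K-cycle period
- If the buffer is full, grow it; otherwise, shrink it precisely to the proper size
- Tolerances: 1.5% (1/64), 6.2% (1/16), 25.0% (1/4)
[Figure: distribution of buffer size (occupancy) over the period, marking the "Full: grow buffer" and "Proper size: precise shrink" cases.]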
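One plausible reading of "limited histogramming", sketched in Python: instead of a full occupancy histogram, keep one counter per partition boundary, counting the cycles in which occupancy exceeded that boundary. The function name `next_partitions`, the grow and shrink rules, and the sample data are assumptions for illustration, not the authors' exact hardware.

```python
def next_partitions(samples, n_partitions, m, active, tol):
    """Choose the number of active partitions for the next period.

    samples: per-cycle occupancy observed during the period (e.g. 8K cycles).
    Rather than a full histogram, keep one counter per partition boundary:
    over[k] = cycles in which more than k * m entries were occupied.
    """
    cycles = len(samples)
    over = [sum(1 for occ in samples if occ > k * m) for k in range(n_partitions + 1)]
    full = sum(1 for occ in samples if occ >= active * m)
    # Grow: the buffer was full in more than a tol fraction of cycles.
    if active < n_partitions and full > tol * cycles:
        return active + 1
    # Precise shrink: the smallest size whose overflow fraction stays within tol.
    for k in range(1, active + 1):
        if over[k] <= tol * cycles:
            return k
    return active


# Hypothetical 64-cycle period for a 4 x 8-entry buffer that is fully enabled.
samples = [5] * 60 + [12] * 3 + [20]
print(next_partitions(samples, 4, 8, active=4, tol=1 / 16))  # 1
print(next_partitions(samples, 4, 8, active=4, tol=1 / 64))  # 2
```

A looser tolerance accepts more cycles in which the smaller buffer would have overflowed, so it shrinks more aggressively.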
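Slide 20: Resizing the Register File
- Issue: it is not known when registers expire (when their values are no longer needed)
- Solution: to make the register file smaller, move values out of the partition (P) to be turned off
  – First, inhibit new assignments to P
  – Next, use a software interrupt routine to move values via the normal rename logic (mov r1, r1); because P is no longer assigned, each move places the value in a register outside P
  – Register mappings are updated automatically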
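The two-step drain can be illustrated with a toy rename table. The class `RenameState` and its methods are hypothetical. The sketch assumes that the interrupt routine issues mov r, r for every architectural register, since software cannot see which physical registers live in P. It also frees old mappings immediately, whereas real hardware frees them at commit.

```python
class RenameState:
    """Toy rename model with the physical register file split into partitions.

    Sketch assumptions: physical register p lives in partition p // per_partition,
    and an old mapping is freed immediately (real hardware frees it at commit).
    """

    def __init__(self, n_arch, n_partitions, per_partition):
        self.per_partition = per_partition
        self.enabled = set(range(n_partitions))
        # Architectural register r starts out mapped to physical register r.
        self.map = {r: r for r in range(n_arch)}
        self.free = list(range(n_arch, n_partitions * per_partition))

    def partition_of(self, preg):
        return preg // self.per_partition

    def allocate(self):
        """Take a free physical register, skipping disabled partitions."""
        for i, preg in enumerate(self.free):
            if self.partition_of(preg) in self.enabled:
                return self.free.pop(i)
        raise RuntimeError("no free physical register in an enabled partition")

    def rename_mov(self, r):
        """Rename 'mov r, r': new destination register, old one released."""
        old = self.map[r]
        self.map[r] = self.allocate()  # the value is copied into the new register
        self.free.append(old)

    def shrink(self, partition):
        """Empty `partition` so it can be turned off."""
        self.enabled.discard(partition)  # first, inhibit new assignments to P
        # Next, the (assumed) interrupt routine moves every architectural register.
        for r in list(self.map):
            self.rename_mov(r)


regs = RenameState(n_arch=8, n_partitions=4, per_partition=8)
regs.shrink(0)
print(sorted(regs.map.values()))  # [8, 9, 10, 11, 12, 13, 14, 15]
```

No special copy hardware is needed: ordinary renaming of the moves updates the mappings, and once no architectural register maps into P, the partition can be powered down.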
Slide 21: Floating-Point Application Results
Slide 22: Summary Results
Slide 23: Conclusion
- Simultaneous adaptation of all major regular structures
  – Accounting cache
  – Limited histogramming for buffers
  – Adaptable register file
- Local control with tolerable performance loss
- Future work
  – Augment local control with global control for bounded performance loss