1 Integrating Adaptive On-Chip Storage Structures for Reduced Dynamic Power Steve Dropsho, Alper Buyuktosunoglu, Rajeev Balasubramonian, David H. Albonesi,


1 Integrating Adaptive On-Chip Storage Structures for Reduced Dynamic Power
Steve Dropsho, Alper Buyuktosunoglu, Rajeev Balasubramonian, David H. Albonesi, Sandhya Dwarkadas, Greg Semeraro, Grigorios Magklis, and Michael Scott
ECE and CS Departments, University of Rochester

2 Why Adaptive Structures?
- General-purpose microprocessors are “one size fits all”
- But needs vary across (and within) applications
- Can save considerable energy by matching resources to the application
- Objective: less energy for the same performance by adapting storage structures to the application

3 Related Work
- Adaptable cache
  – Balasubramonian et al., MICRO 2000
  – Dhodapkar and Smith, ISCA 2002
- Adaptable issue logic
  – Buyuktosunoglu et al., GLS VLSI 2001
  – Folegnani and Gonzalez, ISCA 2000

4 Common Themes
- A single adaptive structure
- Use of global information for feedback
- Exploration-based (caches)

5 Related Work (cont)
- Adaptable IQ, LSQ, and ROB
  – Ponomarev et al., MICRO 2001
  – Three (3) adaptable structures
  – Reconfigurations based on local state

6 Integrating Multiple Adaptive Structures
[Figure: processor block diagram with labels L2 Unified Cache, L1 Icache, L1 Dcache, Branch predict, FetchQ, Rename map, ROB, IIQ, FPQ, LSQ, IPREG, FPREG, Int FUs, and FP FUs, plus section labels Integer, Memory, and Floating Pt]

7 Challenges
- Multiple (9) adaptive structures create a state-explosion problem
- Use of global information makes assigning cause and effect difficult
- Potential for additive performance effects among the structures

8 Approach: Local Management
- Local information for configuration decisions
- Tight control over performance variance

9 Part I: The Caches
[Figure: the same processor block diagram as on slide 6, shown to introduce the cache structures]

10 The Accounting Cache
- A access (primary)
- B access (secondary)
- Sequential accesses, A then B
- Save energy on A access hit
- Swap blocks on A access miss
[Figure: a four-way set shown under the A/B partitionings A1/B3, A2/B2, A3/B1, and A4/B0, with a swap between the A and B partitions]
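The A-then-B access order and the swap on an A miss can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `AccountingSet`, the most-recently-used-first list layout, and the choice of the least recently used A line as the swap victim are all hypothetical, and the fill policy on a full miss is left out because the slide does not describe it.

```python
class AccountingSet:
    """One cache set split into a primary (A) and a secondary (B) partition.

    Hypothetical model: each partition is a list of tags ordered from
    most recently used to least recently used.
    """

    def __init__(self, a_lines, b_lines):
        self.a = list(a_lines)
        self.b = list(b_lines)

    def access(self, tag):
        """Probe A first, then B; return where the tag was found."""
        if tag in self.a:
            # A hit: only the A partition is accessed, which saves energy.
            self.a.remove(tag)
            self.a.insert(0, tag)
            return "A"
        if tag in self.b:
            # A miss, B hit: swap the block into A (victim choice is an assumption).
            victim = self.a.pop()
            self.b.remove(tag)
            self.b.insert(0, victim)
            self.a.insert(0, tag)
            return "B"
        # Miss in both partitions: fill policy is not covered by this sketch.
        return "miss"


s = AccountingSet(["w", "x"], ["y", "z"])
print(s.access("x"), s.access("z"), s.a, s.b)
```

After the B hit, the requested block sits in A and the displaced A block sits in B, so a repeated access to the same block is served by the A partition alone.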

11 Most-Recently-Used Statistics
[Figure: MRU state transitions for a four-way set (ways 1-4 holding lines A-D, MRU positions 0-3) over a series of accesses to lines A, B, and C, and the resulting MRU state counters: MRU[0] = 3, MRU[1] = 2, MRU[2] = 1, MRU[3] = 0, Misses = 0]
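One way to picture the MRU state counters is as a histogram of the recency position at which each access hits. The sketch below is an assumption-laden illustration: the function name `mru_counters` is hypothetical, and the access sequence in the usage line was chosen only because it reproduces the counter values shown on the slide (3, 2, 1, 0, with 0 misses); it is not claimed to be the sequence in the original figure.

```python
def mru_counters(initial_order, accesses):
    """Count hits per MRU position for one set.

    initial_order: lines ordered from most to least recently used.
    accesses: sequence of line names.
    Returns (counters, misses), where counters[i] is the number of hits
    found at MRU position i.
    """
    order = list(initial_order)
    counters = [0] * len(order)
    misses = 0
    for line in accesses:
        if line in order:
            pos = order.index(line)
            counters[pos] += 1
            order.pop(pos)
        else:
            misses += 1
            order.pop()  # evict the least recently used line
        order.insert(0, line)  # the accessed line becomes most recently used
    return counters, misses


# Hypothetical access sequence that yields the slide's counter values.
print(mru_counters(["A", "B", "C", "D"], ["A", "A", "B", "A", "C", "C"]))
```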

12 Configuration Evaluation
MRU state counters: MRU[0] = 3, MRU[1] = 2, MRU[2] = 1, MRU[3] = 0, Misses = 0 (figure labels: (lru), (mru))
Delay of the candidate configurations: $6 D_A + 3 D_B$; $6 D_A + 1 D_B$; $6 D_A$
Energy of the candidate configurations: $6 E_1 + 3 E_3$; $7 E_2$; $6 E_3$; $6 E_4$ (BASE)
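The delay and energy expressions on this slide appear to follow from the MRU counters: with k ways in A, every access pays one A access, and each access that does not hit in the first k MRU positions also pays one B access over the remaining ways. The sketch below encodes that reading. It is an interpretation, not the authors' code: the function name `evaluate_configs`, the per-partition energy table `e_ways`, and the numeric values in the usage lines are hypothetical, and the cost of going to the next memory level on a miss is ignored.

```python
def evaluate_configs(counters, misses, d_a, d_b, e_ways):
    """Estimate (delay, energy) for every A/B split from one set of MRU counters.

    counters[i]: hits at MRU position i; misses: accesses that hit nowhere.
    d_a, d_b: delay of one A access and one B access (hypothetical units).
    e_ways[k]: energy of one access to a partition of k ways (hypothetical).
    """
    n_ways = len(counters)
    total = sum(counters) + misses
    results = {}
    for a_ways in range(1, n_ways + 1):
        b_ways = n_ways - a_ways
        a_hits = sum(counters[:a_ways])
        # With no B partition (the full-size BASE case) there is no second access.
        b_accesses = (total - a_hits) if b_ways > 0 else 0
        delay = total * d_a + b_accesses * d_b
        energy = total * e_ways[a_ways]
        if b_ways > 0:
            energy += b_accesses * e_ways[b_ways]
        results[a_ways] = (delay, energy)
    return results


# Counters from the slide; delays and energies are made-up integers.
configs = evaluate_configs([3, 2, 1, 0], 0, d_a=1, d_b=2,
                           e_ways={1: 1, 2: 2, 3: 3, 4: 4})
print(configs)
```

With these inputs the access counts match the slide's coefficients: 6 A accesses in every configuration, and 3, 1, 0, and 0 B accesses for A sizes of 1, 2, 3, and 4 ways.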

13 Tolerance and the Bank Account
- Tolerance allows more delay than BASE
  – $D_{TOL} = D_{BASE}\,(1 + TOL)$
  – TOL = {0.015, 0.062, 0.25} (1/64, 1/16, 1/4)
- Bank account allows accumulation of unused tolerance
- Use account credits in later intervals
  – Allows aggressive resizing
  – Amortizes mistakes over many intervals
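The tolerance rule and the bank account could be combined roughly as below. This is a sketch under assumptions: the names `ToleranceAccount` and `pick_config`, the idea of adding the whole balance to the next interval's budget, and the fallback to the largest configuration are hypothetical choices made for illustration; the slide only states the formula for D_TOL and that unused tolerance accumulates for later intervals.

```python
class ToleranceAccount:
    """Tracks unused delay tolerance across intervals (hypothetical model)."""

    def __init__(self, tol):
        self.tol = tol        # e.g. 1/64, 1/16, or 1/4 as on the slide
        self.balance = 0.0    # accumulated unused tolerance, in delay units

    def budget(self, d_base):
        # D_TOL = D_BASE * (1 + TOL), plus any saved credits (assumption).
        return d_base * (1.0 + self.tol) + self.balance

    def settle(self, d_base, d_actual):
        # Deposit tolerance that went unused; an overshoot withdraws credits.
        self.balance += d_base * (1.0 + self.tol) - d_actual


def pick_config(candidates, budget):
    """Choose the lowest-energy configuration whose delay fits the budget.

    candidates: {config_id: (delay, energy)}.
    """
    allowed = {k: v for k, v in candidates.items() if v[0] <= budget}
    if not allowed:
        # Assumption: fall back to the largest configuration id (BASE).
        return max(candidates)
    return min(allowed, key=lambda k: allowed[k][1])


acct = ToleranceAccount(0.25)
print(acct.budget(100.0))      # budget for the first interval
acct.settle(100.0, 110.0)      # the interval used less than its budget
print(acct.budget(100.0))      # later intervals may spend the saved credit
```

Because credits carry over, a conservative interval lets a later interval try a smaller, slower configuration, which is one reading of "allows aggressive resizing".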

14 Memory Hierarchy
[Figure: one possible configuration of the memory hierarchy: L1 I-Cache (A/B), L1 D-Cache (A, no B), L2 Unified Cache (A/B)]

15 Environment
- SimpleScalar simulator
- Microarchitecture is similar to the Alpha 21264
- Benchmarks are a mix of SPEC95, SPEC2K, and Olden
- Energy models for buffers and caches from Buyuktosunoglu et al., GLS VLSI 2001 and Balasubramonian et al., MICRO 2000

16 Cache Results

17 Part II: Queues, Regs, and ROB
[Figure: the same processor block diagram as on slide 6, shown to introduce the queues, register files, and ROB]

18 Resizable Queues/Reg File
[Figure: a buffer divided into partitions P1 through PN; N partitions of m elements]

19 Buffer Sizing With Limited Histogramming
[Figure: distribution of buffer size, on an axis from 0 to Full with the average (ave) marked, in three panels: Grow buffer, Proper size, Precise shrink]
- 8K cycle period
- Tolerances: 1.5% (1/64), 6.2% (1/16), 25.0% (1/4)

20 Resizing the Register File
- Issue: Do not know when registers expire
- Solution: To make the register file smaller, move values out of the partition (P) to be turned off
  – First, inhibit new assignments to P
  – Next, use a software interrupt routine to move values via the normal rename logic (mov r1 r1)
  – Register mappings are automatically updated
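The two shrink steps, inhibiting new assignments to P and then re-renaming live values with moves such as `mov r1 r1`, can be modeled with a toy rename map. Everything here is a hypothetical illustration: the class `RenameMap`, its free-list layout, and the immediate release of the old physical register (real hardware would release it later, for example at commit) are simplifying assumptions, not details from the slides.

```python
class RenameMap:
    """Toy rename map over a partitioned physical register file."""

    def __init__(self, n_partitions, regs_per_partition):
        self.m = regs_per_partition
        self.enabled = [True] * n_partitions
        self.free = list(range(n_partitions * regs_per_partition))
        self.map = {}  # architectural register -> physical register

    def partition_of(self, phys):
        return phys // self.m

    def allocate(self):
        # Only hand out registers from partitions that are still enabled.
        for phys in self.free:
            if self.enabled[self.partition_of(phys)]:
                self.free.remove(phys)
                return phys
        raise RuntimeError("no free register in an enabled partition")

    def write(self, arch):
        """Rename the destination of an instruction that writes `arch`."""
        new = self.allocate()
        old = self.map.get(arch)
        self.map[arch] = new
        if old is not None:
            self.free.append(old)  # simplification: released immediately
        return new

    def shrink(self, part):
        # Step 1: inhibit new assignments to the partition.
        self.enabled[part] = False
        # Step 2: issue `mov rX rX` for each live value in the partition;
        # normal renaming gives it a register elsewhere and updates the map.
        for arch, phys in list(self.map.items()):
            if self.partition_of(phys) == part:
                self.write(arch)


rm = RenameMap(n_partitions=2, regs_per_partition=4)
rm.write("r1")
rm.write("r2")
rm.shrink(0)
print(rm.map)
```

After `shrink(0)` no architectural register maps into partition 0, so in this model the partition could be turned off without losing a live value.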

21 Floating Point App Results

22 Summary Results

23 Conclusion
- Simultaneous adaptation of all major regular structures
  – Accounting cache
  – Limited histogramming for buffers
  – Adaptable register file
- Local control yet tolerable performance loss
- Future work
  – Augment local control with global control for bounded performance loss
