

1 Compiler-in-the-Loop ADL-driven Early Architectural Exploration
Aviral Shrivastava (1), Nikil Dutt (1), Alex Nicolau (1), Eugene Earlie (2)
(1) Center For Embedded Computer Systems, University of California, Irvine, CA, USA
(2) Strategic CAD Labs, Intel, Hudson, MA, USA

2 Bypassing Improves Performance
TechCon 2005 Copyright © 2005 UCI ACES Laboratory
Pipelining improves performance, but is limited by pipeline hazards.
Bypasses eliminate certain data hazards and further improve performance.
[Figure: pipeline diagrams with stages F, D, OR, X1, X2, WB and register file RF, for the sequence R1 ← R2 + R3; R4 ← R4 + R1.]
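As a rough illustration of this slide, the cycle count of an in-order, single-issue pipeline can be sketched as depth + (N - 1) + stalls. The function name and the stall counts used here (0 with a bypass, 3 without, assuming the dependent operation waits in OR until the cycle after the producer's WB) are illustrative assumptions, not figures from the talk.

```python
# Toy model (assumption): in-order, single-issue pipeline with the six
# stages drawn on the slide.
STAGES = ["F", "D", "OR", "X1", "X2", "WB"]


def total_cycles(num_instructions, stall_cycles=0, depth=len(STAGES)):
    """Cycles needed to run num_instructions through the pipeline."""
    if num_instructions <= 0:
        return 0
    # The first instruction needs `depth` cycles; each later one completes
    # one cycle after its predecessor, plus any stall cycles.
    return depth + (num_instructions - 1) + stall_cycles


# R1 <- R2 + R3 followed by R4 <- R4 + R1 (which depends on R1).
with_bypass = total_cycles(2, stall_cycles=0)
# Assumed: without a bypass the consumer waits in OR until the cycle
# after the producer's WB, i.e. 3 stall cycles.
without_bypass = total_cycles(2, stall_cycles=3)
print(with_bypass, without_bypass)  # prints "7 10"
```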

3 Impact of Bypassing
Area and power consumption: wide multiplexers, bypass control logic, bypass wires.
Cycle time: bypasses may be part of the timing-critical path.
Wiring congestion.
Overall chip complexity: deeply pipelined out-of-order processors.
P. Ahuja et al., The Performance Impact of Incomplete Bypassing in Processor Pipelines, MICRO 1995.
A. Abnous and N. Bagerzadeh, Pipelining and bypassing in a VLIW processor, IEEE Trans... 1995.
[Figure: pipeline with stages F, D, OR, X1, X2, WB, M1, M2 and register file RF.]

4 Problem, Solution and Problem
Problem: How do I customize bypasses? Important for embedded systems.
Solution: Keep only the most beneficial bypasses. Area, power and performance trade-off.
Problems: How to compile for a processor with partial bypassing? Requires Compiler-in-the-Loop exploration.
[Figure: pipeline with stages F, D, OR, X1, X2, WB and register file RF.]

5 Compiler-in-the-Loop Exploration
How to compile for Partial Bypassing
Compiler in the exploration loop
Power-Performance-Area Tradeoff

6 Bypass Sensitive Scheduling
Bypasses transfer data between dependent operations.
Missing bypasses cause pipeline hazards.
A bypass-sensitive compiler should be able to detect and avoid pipeline hazards.
[Figure: pipeline (F, D, OR, X1, X2, WB, RF) executing R1 ← R2 + R3 followed by R4 ← R4 + R1; panels labeled "No Hazard" and "Hazard".]
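A minimal sketch of how a compiler might detect such a hazard under partial bypassing. This is a toy model, not the scheduler from the talk. It assumes the producer is fetched in cycle 0 and never stalls, operands are read in OR, a bypass from a stage delivers the value in the same cycle the producer occupies that stage, and the register file becomes readable the cycle after WB. The function name `stall_cycles` is hypothetical.

```python
STAGES = ["F", "D", "OR", "X1", "X2", "WB"]
READ_STAGE = STAGES.index("OR")      # operands are read in OR
RF_READY = STAGES.index("WB") + 1    # assumption: RF readable the cycle after WB


def stall_cycles(distance, bypasses):
    """Stall cycles seen by a consumer issued `distance` slots after its producer.

    `bypasses` is the set of producer stages (subset of X1, X2, WB)
    that have a bypass to the operand-read stage.
    """
    if distance < 1:
        raise ValueError("consumer must be issued after the producer")
    arrival = distance + READ_STAGE  # cycle in which the consumer reaches OR
    cycle = arrival
    while cycle < RF_READY:
        # The producer occupies STAGES[cycle] in this cycle.
        if STAGES[cycle] in bypasses:
            break                    # value arrives over a bypass
        cycle += 1                   # otherwise wait one more cycle
    return cycle - arrival


print(stall_cycles(1, {"X1", "X2", "WB"}))  # full bypassing: 0
print(stall_cycles(1, set()))               # no bypasses: 3
print(stall_cycles(1, {"X1"}))              # only X1 bypass, back-to-back: 0
print(stall_cycles(2, {"X1"}))              # only X1 bypass, one slot apart: 2
```

In this toy model, with only the X1 bypass present, a back-to-back dependent pair incurs no stall, while separating the pair by one slot costs two stall cycles. A scheduler that ignores the bypass configuration would not anticipate this.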

7 Operation Table
An Operation Table (OT) is a binding between an operation and the processor resources and registers.
Can detect resource hazards: OTs model processor resources.
Can detect data hazards: OTs model processor registers.
Operation Table for ADD R1 R2 R3:
  1. F
  2. D
  3. OR    ReadOperands: R2 (C1, RF); R3 (C2, RF), (C5, BRF). DestOperands: R1 (RF)
  4. X1    WriteOperands: R1 (C4, BRF)
  5. X2
  6. XWB   WriteOperands: R1 (C3, RF)
[Figure: pipeline with stages F, D, OR, X1, X2, WB, storage RF and BRF, and connections C1 to C5.]
Details are in the paper.
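One possible way to hold the Operation Table above in code, with a simple resource-hazard check built on it. The `Step` class, its field names and the `resource_hazard` helper are assumptions made for this sketch, and the grouping of connections per operand follows one reading of the slide. The paper's actual representation may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One cycle of an Operation Table (hypothetical layout)."""
    unit: str                                    # pipeline unit occupied
    reads: dict = field(default_factory=dict)    # register -> [(connection, storage)]
    dests: dict = field(default_factory=dict)    # register -> storage
    writes: dict = field(default_factory=dict)   # register -> [(connection, storage)]


# Operation Table for ADD R1 R2 R3, transcribed from the slide.
ADD_R1_R2_R3 = [
    Step("F"),
    Step("D"),
    Step("OR",
         reads={"R2": [("C1", "RF")],
                "R3": [("C2", "RF"), ("C5", "BRF")]},
         dests={"R1": "RF"}),
    Step("X1", writes={"R1": [("C4", "BRF")]}),
    Step("X2"),
    Step("XWB", writes={"R1": [("C3", "RF")]}),
]


def units_by_cycle(table, issue_cycle):
    """Map each absolute cycle to the unit the operation occupies."""
    return {issue_cycle + i: step.unit for i, step in enumerate(table)}


def resource_hazard(table_a, issue_a, table_b, issue_b):
    """True if both operations need the same unit in the same cycle."""
    a = units_by_cycle(table_a, issue_a)
    b = units_by_cycle(table_b, issue_b)
    return any(a[c] == b[c] for c in a.keys() & b.keys())


print(resource_hazard(ADD_R1_R2_R3, 0, ADD_R1_R2_R3, 0))  # True
print(resource_hazard(ADD_R1_R2_R3, 0, ADD_R1_R2_R3, 1))  # False
```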

8 Up to 20% Performance Improvement on MiBench
Up to 20% performance improvement.

9 Compiler-in-the-Loop Exploration
How to compile for Partial Bypassing
Compiler in the exploration loop
Power-Performance-Area Tradeoff

10 Compiler-in-the-Loop Exploration
Traditional Exploration: Application -> gcc -O3 -> Executable -> Cycle Accurate Simulator (with Bypass Configuration) -> Traditional Cycles.
Bypass-sensitive Compiler-in-the-Loop Exploration: Application and Bypass Configuration -> OT-based Compiler -> Executable -> Cycle Accurate Simulator -> CIL Cycles.
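The two flows on this slide can be sketched as a single loop. This is only a skeleton under stated assumptions. `explore` and its three callables are hypothetical stand-ins for the real tools (a bypass-agnostic compiler such as gcc -O3, the OT-based compiler, and the cycle-accurate simulator), none of which is invoked here.

```python
def explore(configs, compile_agnostic, compile_for, simulate):
    """Return {config: (traditional_cycles, cil_cycles)}.

    All three callables are hypothetical stand-ins:
      compile_agnostic()     -> executable built without bypass knowledge
      compile_for(config)    -> executable scheduled for `config`
      simulate(exe, config)  -> cycle count on a processor with `config`
    """
    baseline = compile_agnostic()      # compiled once, reused for every config
    results = {}
    for config in configs:
        # Traditional: only the simulated processor changes.
        traditional = simulate(baseline, config)
        # Compiler-in-the-Loop: recompile for each bypass configuration.
        cil = simulate(compile_for(config), config)
        results[config] = (traditional, cil)
    return results
```

The design point is that the traditional flow measures each bypass configuration with code that was never scheduled for it, whereas the CIL flow measures what the configuration achieves once the compiler can exploit it.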

11 Bypass Exploration
7 pipeline stages can bypass a result.
We vary which pipeline stage bypasses a result: 2^7 = 128 bypass configurations.
Encode the bypass configuration, e.g. Configuration 28: bypass paths from MWB, M2 and XWB are present.
[Figure: pipeline with stages F1, F2, ID, RF, X1, X2, XWB; M1, M2, MWB; D1, D2, DWB.]
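A sketch of one bit encoding that reproduces the slide's example. The list of seven bypass sources and their bit order are assumptions, chosen so that configuration 28 decodes to MWB, M2 and XWB. The slide does not state the actual ordering, and other orderings would also fit that single example.

```python
# Assumed bypass sources, most significant bit first.
BYPASS_SOURCES = ["X1", "X2", "XWB", "M2", "MWB", "D2", "DWB"]
NUM_CONFIGS = 2 ** len(BYPASS_SOURCES)   # 2^7 = 128


def decode(config):
    """Return the set of stages whose bypass is present in `config`."""
    if not 0 <= config < NUM_CONFIGS:
        raise ValueError("configuration out of range")
    n = len(BYPASS_SOURCES)
    return {s for i, s in enumerate(BYPASS_SOURCES)
            if (config >> (n - 1 - i)) & 1}


def encode(sources):
    """Inverse of decode: set of stage names -> configuration number."""
    n = len(BYPASS_SOURCES)
    return sum(1 << (n - 1 - BYPASS_SOURCES.index(s)) for s in set(sources))


print(NUM_CONFIGS)                       # 128
print(sorted(decode(28)))                # ['M2', 'MWB', 'XWB']
print(encode({"MWB", "M2", "XWB"}))      # 28
```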

12 Bypass Explorations on XScale
CIL-compiler can effectively exploit the bypass configuration.
Significant performance difference.
[Figure: bitcount. Execution Cycles (850000 to 1250000) versus Bypass Source Configurations (0 to 128), for Traditional and CIL.]

13 X-bypass explorations in XScale
Difference in trends.
[Figure: bitcount. Execution Cycles (850000 to 1200000) versus X-bypass Configuration (combinations of X1, X2 and XWB), for Traditional and CIL. Pipeline inset: F1, F2, ID, RF, X1, X2, XWB; M1, M2, MWB; D1, D2, DWB.]

14 M-bypass explorations in XScale
Difference in trends.
[Figure: pipeline inset: F1, F2, ID, RF; X1, X2, XWB; M1, M2, MWB; D1, D2, DWB.]

15 D-bypass exploration in XScale
Difference in trends.
[Figure: bitcount. Execution Cycles (860000 to 980000) versus D Bypass Configurations (-, DWB, D2, DWB D2), for Traditional and CIL. Pipeline inset: F1, F2, ID, RF; X1, X2, XWB; M1, M2, MWB; D1, D2, DWB.]

16 Compiler-in-the-Loop Exploration
How to compile for Partial Bypassing
Compiler in the exploration loop
Power-Performance-Area Tradeoff

17 Performance-Energy-Area Trade-off
Design Point 1: no bypass from MWB and XWB to the first operand. 18% less area and 14% less energy consumption of bypass control logic; 2% performance loss.
Design Point 2: only D2 and X2 bypass to the first operand. 25% less area and 16% less energy consumption of bypass control logic; 6% performance loss.
[Figure: trade-off plot with Point 1 and Point 2 marked.]
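A small sketch of how Pareto-optimal design points could be picked out of exploration results. Point 1 and Point 2 use the percentages from this slide (area and energy savings refer to the bypass control logic). The full-bypassing baseline at zero savings and zero loss is an assumed reference, and the "hypothetical" entry is invented purely to show a dominated point being filtered out.

```python
from typing import NamedTuple


class DesignPoint(NamedTuple):
    name: str
    area_saving: float     # % less area of bypass control logic (higher is better)
    energy_saving: float   # % less energy of bypass control logic (higher is better)
    perf_loss: float       # % performance loss (lower is better)


def dominates(a, b):
    """True if `a` is no worse than `b` everywhere and better somewhere."""
    no_worse = (a.area_saving >= b.area_saving
                and a.energy_saving >= b.energy_saving
                and a.perf_loss <= b.perf_loss)
    better = (a.area_saving > b.area_saving
              or a.energy_saving > b.energy_saving
              or a.perf_loss < b.perf_loss)
    return no_worse and better


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


points = [
    DesignPoint("full bypassing", 0, 0, 0),   # assumed reference point
    DesignPoint("Point 1", 18, 14, 2),        # from the slide
    DesignPoint("Point 2", 25, 16, 6),        # from the slide
    DesignPoint("hypothetical", 15, 10, 5),   # invented, dominated by Point 1
]
print([p.name for p in pareto_front(points)])
# prints ['full bypassing', 'Point 1', 'Point 2']
```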

18 Summary
Bypassing improves performance but is costly in terms of area and power.
Partial bypassing presents valuable trade-offs, but poses challenges in compilation.
We propose a compilation approach for partial bypassing: up to 20% performance improvement by a bypass-sensitive compiler.
We propose Compiler-in-the-Loop Exploration of partial bypasses: more meaningful exploration of the design space.
CIL Exploration of bypasses is able to discover interesting Pareto-optimal design points.

