Presentation is loading. Please wait.

Presentation is loading. Please wait.

Application-Specific Customization of Soft Processor Microarchitecture Peter Yiannacouras J. Gregory Steffan Jonathan Rose University of Toronto Electrical.

Similar presentations


Presentation on theme: "Application-Specific Customization of Soft Processor Microarchitecture Peter Yiannacouras J. Gregory Steffan Jonathan Rose University of Toronto Electrical."— Presentation transcript:

1 Application-Specific Customization of Soft Processor Microarchitecture Peter Yiannacouras J. Gregory Steffan Jonathan Rose University of Toronto Electrical and Computer Engineering

2 2 Processors and FPGA Systems We seek improvement through customization Processors are the “heart” of FPGA systems Memory Interface UART Custom Logic Ethernet Performs coordination and even computation  Better processors => less hardware to design Soft Processor

3 3 Enablers for customizing soft processors 1. FPGA Reconfigurability  No hardware cost for altering a design 2. Applications differ in architectural requirements  Can specialize architecture for each application 3. A soft processor might be used to run either: a) A single application b) A single class of applications c) Many applications, but can be reconfigured We want to evaluate effectiveness of specialization

4 4 Research Goals 1. Investigate “Application-tuning”  Tune microarchitecture to favour an application  Preserve general purpose functionality 2. Investigate “Instruction-set Subsetting”  Sacrifice general purpose functionality  Eliminate hardware not required by application Investigate efficiency through real implementations

5 5 SPREE SPREE System (Soft Processor Rapid Exploration Environment) RTL ISADatapath ■ Input: Processor description 1. Verify ISA against datapath 2. Datapath Instantiation 3. Control Generation ■Multi-cycle/variable-cycle FUs ■Multiplexer select signals ■Interlocking ■Branch handling ■ SPREE System ■ Output: Synthesizable Verilog Processor Description

6 6 Back-end Infrastructure RTL 2. Resource Usage 3. Clock Frequency 4. Power 1.Cycle Count Quartus II 5.0 CAD Software Modelsim RTL Simulator Benchmarks (MiBench, Dhrystone 2.1, RATES, XiRisc) Stratix 1S40C5 We can measure area/performance/energy accurately

7 7 Exploration of Architectural Customizations 1. Architectural-tuning 2. Instruction-set subsetting

8 8 What exactly are we tuning? We focus on core microarchitecture Hardware vs software multiplication Shifter implementation Pipelining  Depth  Organization  Forwarding Not ISA (we use MIPS-I)

9 9 Comparison to Altera’s Nios II Has three variations:  Nios II/e – unpipelined, no HW multiplier  Nios II/s – 5-stage, with HW multiplier  Nios II/f – 6-stage, dynamic branch prediction Caveats – not completely fair comparison  Very similar but tweaked ISA  Nios II Supports exceptions, OS, and caches We do not and save on the hardware costs We believe the comparison is meaningful

10 10 SPREE vs Nios II Competitive while allowing more customization smaller faster -3-stage pipe -HW multiply -Multiply-based shifter

11 11 1. Architectural Tuning Experiment Hardware vs software multiplication Shifter implementation Pipelining  Depth  Organization  Forwarding What is best overall (general purpose) configuration What are best per application (application-tuned) configurations

12 12 Performance per Area of All Processors 14.1% improvement over general purpose, some 30%

13 13 2. Instruction-set Subsetting SPREE automatically removes  Unused connections  Unused components Reduce processor by reducing the ISA  Can create application-specific processor Eliminate unused parts of the ISA

14 14 Instruction-set Usage of Benchmark Set Applications do not use complete ISA Strong potential for hardware reduction

15 15 Fraction of Area Area Reduction from Instruction-set Subsetting Area reduced by 60% in some, 25% on average

16 16 Combining Application Tuning and Instruction-set Subsetting 33.2% Efficiency Gain: Subsetting 16%, Combined 24.5%

17 17 Summary of Presented Architectural Conclusions Application tuning: 14% average efficiency gain  Will only increase as we explore more architectures Instruction-set Subsetting  Up to 60% area & energy savings  16% average efficiency gain Combined Application tuning & Subsetting  24.5% average efficiency gain

18 18 General Purpose vs App-tuned vs Nios II Choose best Nios II overall and per application SPREE customizations allow 17% better efficiency than Nios II 17%

19 19 Future Work Consider other exciting architectural axes  Branch prediction, aggressive forwarding  ISA changes  Datapaths (eg. VLIW)  Caches and memory hierarchy Compiler assistance  Can improve tuning & subsetting

20 20 Metrics for Measurement Efficiency: Performance per area Performance: MIPS Area: Equivalent Stratix Logic Elements (LEs)  Relative silicon areas used for RAMs/Multipliers

21 21 Energy Impact of Subsetting Up to 60% energy savings and 25% on average

22 22 Microarchitecture What exactly are we tuning? Control Pipeline Datapath FUs Reg File ISA Extensions (Tensilica, Stretch) Memory Hierarchy Instruction Set HW Multiply FU Shifter type Pipelining  Depth  Organization  Forwarding Are we tuning enough?

23 23 Performance per Area of All Processors 14.1% improvement over general purpose, some 30%

24 24 Processors and FPGA Designs Soft Processor Our goal is to explore customization of soft processors FPGA P Custom Logic UART Ethernet Memory Interface


Download ppt "Application-Specific Customization of Soft Processor Microarchitecture Peter Yiannacouras J. Gregory Steffan Jonathan Rose University of Toronto Electrical."

Similar presentations


Ads by Google