Application-Specific Customization of Soft Processor Microarchitecture Peter Yiannacouras J. Gregory Steffan Jonathan Rose University of Toronto Electrical.

Presentation transcript:

Application-Specific Customization of Soft Processor Microarchitecture
Peter Yiannacouras, J. Gregory Steffan, Jonathan Rose
University of Toronto, Electrical and Computer Engineering

2 Processors and FPGA Systems
We seek improvement through customization
Processors are the “heart” of FPGA systems
[Diagram labels: Soft Processor, Memory Interface, UART, Custom Logic, Ethernet]
Performs coordination and even computation
- Better processors => less hardware to design

3 Enablers for customizing soft processors
1. FPGA Reconfigurability
   - No hardware cost for altering a design
2. Applications differ in architectural requirements
   - Can specialize architecture for each application
3. A soft processor might be used to run either:
   a) A single application
   b) A single class of applications
   c) Many applications, but can be reconfigured
We want to evaluate effectiveness of specialization

4 Research Goals
1. Investigate “Application-tuning”
   - Tune microarchitecture to favour an application
   - Preserve general purpose functionality
2. Investigate “Instruction-set Subsetting”
   - Sacrifice general purpose functionality
   - Eliminate hardware not required by application
Investigate efficiency through real implementations

5 SPREE System (Soft Processor Rapid Exploration Environment)
■ Input: Processor description (ISA, datapath)
■ SPREE System
  1. Verify ISA against datapath
  2. Datapath instantiation
  3. Control generation
     ■ Multi-cycle/variable-cycle FUs
     ■ Multiplexer select signals
     ■ Interlocking
     ■ Branch handling
■ Output: Synthesizable Verilog (RTL)
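
The first SPREE step, verifying the ISA against the datapath, can be pictured with a small sketch. The talk does not show SPREE's actual description format, so everything here is an assumption: the dictionaries `ISA` and `DATAPATH`, the operation names, and the function `unsupported_instructions` are hypothetical and only illustrate the idea of checking that every operation an instruction needs is provided by some datapath component.

```python
# Hypothetical ISA description: instruction name -> generic operations it needs.
ISA = {
    "addu": ["regread", "add", "regwrite"],
    "lw": ["regread", "add", "load", "regwrite"],
    "mult": ["regread", "mul", "hilowrite"],
}

# Hypothetical datapath description: component name -> operations it provides.
DATAPATH = {
    "regfile": {"regread", "regwrite"},
    "alu": {"add", "sub"},
    "data_mem": {"load", "store"},
}


def unsupported_instructions(isa, datapath):
    """Return {instruction: missing operations} for instructions the datapath cannot run."""
    provided = set().union(*datapath.values())
    missing = {}
    for name, ops in isa.items():
        gap = sorted(set(ops) - provided)
        if gap:
            missing[name] = gap
    return missing


print(unsupported_instructions(ISA, DATAPATH))
```

In this made-up example the datapath has no multiplier, so `mult` is the only instruction reported as unsupported.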

6 Back-end Infrastructure
RTL
- Modelsim RTL simulator, running benchmarks (MiBench, Dhrystone 2.1, RATES, XiRisc)
  1. Cycle count
- Quartus II 5.0 CAD software, targeting Stratix 1S40C5
  2. Resource usage
  3. Clock frequency
  4. Power
We can measure area/performance/energy accurately

7 Exploration of Architectural Customizations
1. Architectural-tuning
2. Instruction-set subsetting

8 What exactly are we tuning?
We focus on core microarchitecture
- Hardware vs software multiplication
- Shifter implementation
- Pipelining
  - Depth
  - Organization
  - Forwarding
Not ISA (we use MIPS-I)
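
To make the multiplication and shifter axes concrete, here is a minimal sketch of two generic techniques: a shift-and-add software multiply (what a program might fall back on without a hardware multiplier) and a left shift computed through a multiplier. The talk does not say how its software multiply routine or its multiply-based shifter are implemented, so the function names and the 32-bit masking below are assumptions for illustration only.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit registers


def software_multiply(a: int, b: int) -> int:
    """Shift-and-add multiply; returns the low 32 bits of a * b."""
    a &= MASK32
    b &= MASK32
    product = 0
    while b:
        if b & 1:                      # add the shifted multiplicand for each set bit
            product = (product + a) & MASK32
        a = (a << 1) & MASK32
        b >>= 1
    return product


def multiplier_based_shift_left(value: int, amount: int) -> int:
    """Left shift implemented as a multiply by 2**amount (amount taken modulo 32)."""
    return ((value & MASK32) * (1 << (amount & 31))) & MASK32


print(software_multiply(7, 6))
print(multiplier_based_shift_left(3, 4))
```

The loop runs once per significant bit of `b`, which is why a software multiply costs many cycles, while the shifter sketch shows how an existing multiplier could be reused instead of a dedicated shifter.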

9 Comparison to Altera’s Nios II
Has three variations:
- Nios II/e: unpipelined, no HW multiplier
- Nios II/s: 5-stage, with HW multiplier
- Nios II/f: 6-stage, dynamic branch prediction
Caveats: not a completely fair comparison
- Very similar but tweaked ISA
- Nios II supports exceptions, OS, and caches; we do not, and so save on the hardware costs
We believe the comparison is meaningful

10 SPREE vs Nios II
[Chart annotations: smaller, faster; 3-stage pipe, HW multiply, multiply-based shifter]
Competitive while allowing more customization

11 1. Architectural Tuning Experiment
- Hardware vs software multiplication
- Shifter implementation
- Pipelining
  - Depth
  - Organization
  - Forwarding
What is the best overall (general purpose) configuration?
What are the best per-application (application-tuned) configurations?
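
The two questions on this slide amount to two different selections over the same table of results. A minimal sketch, assuming a table of performance-per-area values indexed by configuration and application: the numbers, configuration names, and application names are invented, and using the arithmetic mean to pick the general purpose configuration is an assumption, since the talk does not state how the overall best is chosen.

```python
from statistics import mean

# Hypothetical performance-per-area results (higher is better); not data from the talk.
EFFICIENCY = {
    "3-stage, HW multiply": {"app_a": 10.0, "app_b": 8.0, "app_c": 6.0},
    "5-stage, HW multiply": {"app_a": 9.0, "app_b": 9.0, "app_c": 5.0},
    "3-stage, SW multiply": {"app_a": 6.0, "app_b": 7.0, "app_c": 9.0},
}


def best_general_purpose(table):
    """Configuration with the highest mean efficiency across all applications."""
    return max(table, key=lambda cfg: mean(list(table[cfg].values())))


def best_per_application(table):
    """For each application, the configuration with the highest efficiency."""
    apps = list(next(iter(table.values())))
    return {app: max(table, key=lambda cfg: table[cfg][app]) for app in apps}


def average_tuning_gain(table):
    """Mean relative gain of application-tuned over the general purpose choice."""
    general = best_general_purpose(table)
    tuned = best_per_application(table)
    gains = [table[tuned[app]][app] / table[general][app] - 1.0 for app in tuned]
    return mean(gains)


print(best_general_purpose(EFFICIENCY))
print(best_per_application(EFFICIENCY))
print(f"{average_tuning_gain(EFFICIENCY):.1%}")
```

The gain is measured per application against the single general purpose configuration, which matches the way the later slides quote an average improvement over general purpose.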

12 Performance per Area of All Processors
14.1% average improvement over general purpose; some applications gain 30%

13 2. Instruction-set Subsetting
SPREE automatically removes
- Unused connections
- Unused components
Reduce processor by reducing the ISA
- Can create application-specific processor
Eliminate unused parts of the ISA
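
The idea of subsetting can be sketched as a set computation: find which instructions an application actually uses, then keep only the hardware those instructions need. This is an illustration, not SPREE's mechanism; the `REQUIRES` mapping from MIPS-I mnemonics to component names and both function names are hypothetical.

```python
# Hypothetical mapping: MIPS-I mnemonic -> datapath components it needs.
REQUIRES = {
    "addu": {"alu"},
    "lw": {"alu", "data_mem"},
    "sw": {"alu", "data_mem"},
    "sll": {"shifter"},
    "mult": {"multiplier", "hi_lo_regs"},
    "mflo": {"hi_lo_regs"},
    "beq": {"alu", "branch_unit"},
}


def used_instructions(trace):
    """Collect the mnemonics that appear in a list of assembly lines."""
    return {line.split()[0] for line in trace if line.strip()}


def removable_components(used, requires=REQUIRES):
    """Components that no used instruction needs and that could be eliminated."""
    all_components = set().union(*requires.values())
    needed = set().union(*(requires[op] for op in used))
    return all_components - needed


trace = ["addu $t0, $t1, $t2", "lw $t3, 0($sp)", "beq $t0, $zero, done"]
print(sorted(removable_components(used_instructions(trace))))
```

For this made-up trace, which never shifts or multiplies, the shifter, multiplier, and HI/LO registers are the candidates for removal.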

14 Instruction-set Usage of Benchmark Set
Applications do not use complete ISA
Strong potential for hardware reduction

15 Area Reduction from Instruction-set Subsetting
[Chart axis: Fraction of Area]
Area reduced by 60% in some, 25% on average

16 Combining Application Tuning and Instruction-set Subsetting
[Chart label: 33.2%]
Efficiency gain: subsetting 16%, combined 24.5%

17 Summary of Presented Architectural Conclusions
Application tuning: 14% average efficiency gain
- Will only increase as we explore more architectures
Instruction-set subsetting
- Up to 60% area & energy savings
- 16% average efficiency gain
Combined application tuning & subsetting
- 24.5% average efficiency gain

18 General Purpose vs App-tuned vs Nios II
Choose best Nios II overall and per application
SPREE customizations allow 17% better efficiency than Nios II

19 Future Work
Consider other exciting architectural axes
- Branch prediction, aggressive forwarding
- ISA changes
- Datapaths (e.g. VLIW)
- Caches and memory hierarchy
Compiler assistance
- Can improve tuning & subsetting

20 Metrics for Measurement
Efficiency: Performance per area
Performance: MIPS
Area: Equivalent Stratix Logic Elements (LEs)
- RAMs/multipliers counted by their relative silicon areas
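
A minimal sketch of how these metrics combine, assuming MIPS is derived from the instruction count, the measured cycle count, and the clock frequency, and that area is already expressed in equivalent LEs. The function names and the sample numbers are illustrative and are not measurements from the talk.

```python
def mips(instructions: int, cycles: int, clock_mhz: float) -> float:
    """Millions of instructions per second for one benchmark run.

    Execution time in microseconds is cycles / clock_mhz, so
    MIPS = instructions / time_us = instructions * clock_mhz / cycles.
    """
    return instructions * clock_mhz / cycles


def efficiency(instructions: int, cycles: int, clock_mhz: float, area_les: float) -> float:
    """Performance per area: MIPS per equivalent logic element."""
    return mips(instructions, cycles, clock_mhz) / area_les


# Illustrative numbers only: 1M instructions in 1.5M cycles at 75 MHz on 1000 LEs.
print(mips(1_000_000, 1_500_000, 75.0))
print(efficiency(1_000_000, 1_500_000, 75.0, 1000.0))
```

Because efficiency is a ratio, a customization can improve it either by raising performance (fewer cycles or a faster clock) or by shrinking area, which is why both tuning and subsetting show up as efficiency gains.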

21 Energy Impact of Subsetting
Up to 60% energy savings and 25% on average

22 What exactly are we tuning?
[Diagram labels: Microarchitecture, Control, Pipeline, Datapath, FUs, Reg File, ISA Extensions (Tensilica, Stretch), Memory Hierarchy, Instruction Set]
- HW multiply FU
- Shifter type
- Pipelining
  - Depth
  - Organization
  - Forwarding
Are we tuning enough?

23 Performance per Area of All Processors
14.1% average improvement over general purpose; some applications gain 30%

24 Processors and FPGA Designs
[Diagram labels: FPGA, Soft Processor (P), Custom Logic, UART, Ethernet, Memory Interface]
Our goal is to explore customization of soft processors