Application-Specific Customization of Soft Processor Microarchitecture


Application-Specific Customization of Soft Processor Microarchitecture
Peter Yiannacouras, J. Gregory Steffan, Jonathan Rose
University of Toronto, Edward S. Rogers Sr. Department of Electrical and Computer Engineering

Processors and FPGA Systems
- Processors lie at the “heart” of FPGA systems
  [Diagram blocks: UART, Custom Logic, Soft Processor, Memory Interface, Ethernet]
- Performs coordination and even computation
- Better processors => less hardware to design
- We seek improvement through customization

Motivating Application-Specific Customizations of Soft Processors
- FPGA configurability: can consider unlimited processor variants
- A soft processor might be used to run either:
  - A single application
  - A single class of applications
  - Many applications, but can be reconfigured
- Applications differ in architectural requirements
- Can specialize architecture for each application
- We want to evaluate effectiveness of specialization

Research Goals
To investigate the potential for:
- “Application-tuning”
  - Tune processor microarchitecture to favour an application
  - Preserve general purpose functionality
- “Instruction-set Subsetting”
  - Sacrifice general purpose functionality
  - Eliminate hardware not required by application
- Combination of both methods
Measure efficiency gained through real implementations

SPREE System (Soft Processor Rapid Exploration Environment)
- Input: processor description (ISA and datapath)
  - Made of hand-coded components
- SPREE:
  - Verify ISA against datapath
  - Datapath instantiation
  - Control generation
    - Multi-cycle/variable-cycle FUs
    - Multiplexer select signals
    - Interlocking
    - Branch handling
- Output: synthesizable Verilog RTL

Back-End Infrastructure
- Inputs: RTL and benchmarks (MiBench, Dhrystone 2.1, RATES, XiRisc)
- Tools: Modelsim RTL simulator; Quartus II 4.2 CAD software targeting a Stratix 1S40C5
- Measurements:
  1. Cycle count
  2. Resource usage
  3. Clock frequency
  4. Power
- We can measure area/performance/energy accurately

Comparison to Altera’s Nios II
Nios II has three variations:
- Nios II/e: unpipelined, no HW multiplier
- Nios II/s: 5-stage, with HW multiplier
- Nios II/f: 6-stage, dynamic branch prediction

Architectural Parameters Used in SPREE
- Multiplication support: hardware FU or software routine
- Shifter implementation: flipflops, multiplier, or LUTs
- Pipelining
  - Depth (2-7 stages)
  - Organization
  - Forwarding
- We focus on core microarchitecture
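
As a rough illustration of how these axes span a design space, the sketch below enumerates every combination of multiplication support, shifter implementation, and pipeline depth. The option names are hypothetical labels, not SPREE identifiers. The sketch also assumes the axes are independent and leaves out pipeline organization and forwarding, so it should not be read as the set of processors SPREE can actually generate.

```python
from itertools import product

# Hypothetical labels for the options listed on the slide.
MULTIPLY = ["hardware_fu", "software_routine"]
SHIFTER = ["flipflops", "multiplier", "luts"]
PIPELINE_DEPTH = range(2, 8)  # 2 to 7 stages inclusive


def enumerate_configs():
    """Return every combination of the three axes as a list of dicts."""
    return [
        {"multiply": m, "shifter": s, "depth": d}
        for m, s, d in product(MULTIPLY, SHIFTER, PIPELINE_DEPTH)
    ]


configs = enumerate_configs()
print(len(configs))  # 2 * 3 * 6 = 36 combinations
```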

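SPREE vs Nios II
[Figure: SPREE processors plotted against the Nios II variations; axes marked “faster” and “smaller”. Highlighted SPREE design: 3-stage pipe, HW multiply, multiply-based shifter]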

Exploration of Soft Processor Architectural Customizations
1. Architectural-tuning
2. Instruction-set subsetting
3. Combination (Arch-tuning + Subsetting)

1. Architectural Tuning
- Experiment: vary the same parameters
  - Multiplication support
  - Shifter implementation
  - Pipelining
- Determine:
  - Best overall (general purpose) processor
  - Best per application (application-tuned)
- Metric: performance per area (MIPS/LE)
  - Basically the inverse of the area-delay product
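
A minimal sketch of the metric, assuming MIPS is derived from an instruction count, the simulated cycle count, and the clock frequency, and that area is given in logic elements (LEs). The function names and all numbers are hypothetical and only show the arithmetic. Taking delay as 1/MIPS, MIPS/LE equals 1 / (area x delay).

```python
def mips(instructions, cycles, clock_mhz):
    """Millions of instructions per second for one benchmark run."""
    seconds = cycles / (clock_mhz * 1e6)
    return instructions / seconds / 1e6


def mips_per_le(instructions, cycles, clock_mhz, area_le):
    """Performance per area; equals 1 / (area * delay) with delay = 1 / MIPS."""
    return mips(instructions, cycles, clock_mhz) / area_le


def best_config(results):
    """Pick the configuration name with the highest MIPS/LE.

    results maps a name to (instructions, cycles, clock_mhz, area_le).
    """
    return max(results, key=lambda name: mips_per_le(*results[name]))


# Made-up measurements for two candidate processors on one application.
results = {
    "3-stage": (1_000_000, 1_500_000, 80.0, 1000),
    "5-stage": (1_000_000, 1_300_000, 90.0, 1500),
}
print(best_config(results))  # 3-stage
```

Running `best_config` once per benchmark would give the application-tuned choice for each; picking one processor across all benchmarks needs an aggregate score, which this sketch leaves open.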

Performance per Area of All Processors
[Figure: performance per area of all processors; chart annotations: 32%, 14.1%]

2. Instruction-set Subsetting
- SPREE automatically removes:
  - Unused connections
  - Unused components
- Reduce processor by reducing the ISA
- Can create application-specific processor
  - Eliminate unused parts of the ISA

Instruction-set Usage of Benchmarks
- Applications do not use complete ISA
- Strong potential for hardware reduction
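
The idea behind subsetting can be sketched as a set operation: a component is a candidate for removal when none of the instructions that need it appear in the application. The component names, the mnemonics, and the mapping between them are hypothetical placeholders, and the sketch does not describe how SPREE itself performs the removal.

```python
def removable_components(used_instructions, component_users):
    """Return components that no instruction in the program requires.

    component_users maps a component name to the set of instruction
    mnemonics that need that component.
    """
    used = set(used_instructions)
    return sorted(
        name
        for name, instrs in component_users.items()
        if used.isdisjoint(instrs)
    )


# Hypothetical mapping from datapath components to the instructions using them.
COMPONENT_USERS = {
    "multiplier": {"mult", "multu"},
    "shifter": {"sll", "srl", "sra"},
    "branch_unit": {"beq", "bne"},
}

# Hypothetical instruction trace of one application.
program = ["addu", "lw", "sw", "sll", "beq", "addu"]
print(removable_components(program, COMPONENT_USERS))  # ['multiplier']
```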

Area Reduction from Subsetting
[Figure: fraction of area remaining after subsetting; chart annotation: 23%]
- Area reduced by 60% in some applications, 23% on average
- Similar reductions for energy, small impact on performance

3. Combining Application Tuning and Instruction-set Subsetting
- Subsetting is effective on its own
- Can apply subsetting on top of tuning
- Compare different customization methods:
  - Tuning
  - Subsetting
  - Tuning + Subsetting

Combining Application Tuning and Instruction-set Subsetting
[Figure: average efficiency gain by method: Tuning 14%, Subsetting 16%, Tuning + Subsetting 25%]
- Tuning reduces the waste that subsetting eliminates
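
A small worked check of why the combined gain falls short of what the two methods would give together. It assumes, purely for illustration, that independent gains would compound multiplicatively; the slides do not state that model. The three percentages are the reported averages.

```python
tuning_gain = 0.14        # average gain from application tuning
subsetting_gain = 0.16    # average gain from instruction-set subsetting
measured_combined = 0.25  # average gain with both applied

# Hypothetical upper bound if the two gains were fully independent.
independent = (1 + tuning_gain) * (1 + subsetting_gain) - 1
shortfall = independent - measured_combined

print(f"{independent:.1%} if independent, {shortfall:.1%} shortfall")
```

Under that assumption the independent figure is about 32%, so the measured 25% fits the slide's point that the two methods overlap: a tuned processor already carries less unused hardware for subsetting to strip.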

Summary of Presented Architectural Conclusions
- Application tuning
  - 14% average efficiency gain
  - Will increase with more architectural axes
- Instruction-set subsetting
  - Up to 60% area & energy savings
  - 16% average efficiency gain
- Combined tuning & subsetting
  - 25% average efficiency gain

Future Work
- Consider other promising architectural axes
  - Branch prediction, aggressive forwarding
  - ISA changes
  - Datapaths (e.g. VLIW)
  - Caches and memory hierarchy
- Compiler assistance
  - Can improve tuning & subsetting