Application-Specific Customization of FPGA Soft-core Processors. Journal Paper Presentation. Presented by: Ahmad Sghaier. Course Instructor: Dr. Shawki Areibi.

Presentation transcript:

Application-Specific Customization of FPGA Soft-core Processors. Journal Paper Presentation. Presented by: Ahmad Sghaier. Course Instructor: Dr. Shawki Areibi. Course: ENGG 6090*6 – Winter07. Date: Apr. 5th, 2007.

Outline: Introduction. Parameterized Soft-cores. Micro-architectural Trade-offs and ISA Subsetting. Fast Application-specific Customization. Conclusion.

Resources
P. Yiannacouras, J. Steffan and J. Rose, “Exploration and Customization of FPGA-Based Soft Processors,” in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, no. 2, Feb
D. Sheldon, R. Kumar, R. Lysecky, F. Vahid and D. Tullsen, “Application-Specific Customization of Parameterized FPGA Soft-Core Processors,” in IEEE/ACM Int. Conf. on Computer-Aided Design, Nov

Soft-core vs. Hard-core: A hard-core processor is laid out on the chip next to the FPGA’s configurable logic fabric. A soft-core processor is synthesized onto the FPGA’s fabric, just like any other circuit. Soft-core processor advantages: utilizing standard mass-produced FPGA parts; enabling a custom number of microprocessors. Soft-core processor disadvantages: reduced processor performance; higher power consumption; larger size.

Commercial Soft-cores: Xilinx MicroBlaze: a 32-bit soft-core processor; a single-issue, in-order execution processor; configurable with five optional components: multiplier, barrel shifter, divider, floating-point unit (FPU), and data cache. Altera Nios II: it has three mostly unparameterized variations: Nios II/e, a small unpipelined processor at 6 cycles per instruction (CPI) with serial shifter and software multiplication; Nios II/s, a five-stage pipeline with multiplier-based barrel shifter, hardware multiplication, and instruction cache; Nios II/f, a large six-stage pipeline with dynamic branch prediction, and instruction and data caches.

Parameterized Soft-cores: Configurability. Application specific. Size, performance and power constraints. Configurable parameters: instantiating functional units (0,1); unit-specific parameters (cache type/size); instruction set architecture; pipelining (depth).

Exploration and Customization of FPGA-Based Soft Processors: Exploration of the micro-architectural tradeoffs for soft processors. A set of customization techniques to improve the performance/area of a soft processor for a specific application: tuning the micro-architecture to the application; subsetting the ISA; a hybrid approach. A CAD tool.

Approach: Developing a customization tool that generates the most customized soft-core: SPREE (Soft Processor Rapid Exploration Environment). Targeting functional unit customization and ISA subsetting.

SPREE: Input: textual description (ISA and datapath). Steps: ISA and datapath verification; constructing the datapath; control generation. Output: synthesizable RTL (Verilog).

Framework: Altera Stratix I. Comparison with Nios II variations (e, s and f). MIPS instruction set. Performance metrics: area in logic elements (LEs); performance in MIPS (millions of instructions per second); efficiency in MIPS/LE; equal weight for performance and area. Benchmark: 20 varied applications (FIR, FFT, DES, CRC, QSORT, Bubble-sort).
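
The efficiency metric on this slide is a plain ratio of performance to area. The sketch below only illustrates how candidate designs could be ranked by MIPS/LE; the `Design` class, the helper names, and all numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    name: str
    mips: float      # performance, millions of instructions per second
    area_le: float   # area, logic elements (LEs)


def efficiency(d: Design) -> float:
    """Performance per area in MIPS/LE (performance and area weighted equally)."""
    if d.area_le <= 0:
        raise ValueError("area must be positive")
    return d.mips / d.area_le


def best_by_efficiency(designs):
    """Return the design with the highest MIPS/LE."""
    return max(designs, key=efficiency)


# Hypothetical numbers, for illustration only.
candidates = [
    Design("small", mips=20.0, area_le=500.0),
    Design("medium", mips=60.0, area_le=1000.0),
    Design("large", mips=90.0, area_le=2000.0),
]
print(best_by_efficiency(candidates).name)
```

With these made-up figures the largest design is the fastest but not the most efficient, which is the kind of trade-off the metric is meant to expose.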

SPREE vs. Nios

Micro-architecture Exploration (1): Functional units: shifter implementation (serial, shared multiplier); multiplication (SW, HW).

Micro-architecture Exploration (2): Pipelining: depth; organization.

Micro-architecture Customization: 6 micro-architectural axes. Exhaustive search over the generated solutions.
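
An exhaustive search over micro-architectural axes can be sketched as a Cartesian product of the axis values, with every point scored by performance per area. The example is a minimal sketch under stated assumptions: it uses three illustrative axes rather than the six on the slide, the pipeline depths are hypothetical, and `toy_evaluate` is a made-up cost model standing in for generating, synthesizing and benchmarking each processor.

```python
from itertools import product

# Illustrative axes only; the pipeline depths are hypothetical values.
AXES = {
    "shifter": ["serial", "multiplier_based"],
    "multiplication": ["software", "hardware"],
    "pipeline_depth": [3, 5],
}


def toy_evaluate(cfg):
    """Made-up model returning (MIPS, area in LEs) for a configuration."""
    mips, area = 30.0, 800.0
    if cfg["shifter"] == "multiplier_based":
        mips += 5.0
        area += 120.0
    if cfg["multiplication"] == "hardware":
        mips += 20.0
        area += 300.0
    mips += 4.0 * cfg["pipeline_depth"]
    area += 60.0 * cfg["pipeline_depth"]
    return mips, area


def exhaustive_search(axes, evaluate):
    """Score every point of the design space and keep the best MIPS/LE."""
    names = list(axes)
    best_cfg, best_score = None, float("-inf")
    for values in product(*(axes[n] for n in names)):
        cfg = dict(zip(names, values))
        mips, area = evaluate(cfg)
        score = mips / area
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


best_cfg, best_score = exhaustive_search(AXES, toy_evaluate)
print(best_cfg, round(best_score, 4))
```

The number of evaluated points is the product of the axis sizes, so this approach is only practical while the design space stays small.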

ISA Subsetting: Eliminate the unused instructions. Simplify control unit → reduce area. Less than 50% utilization of the ISA.
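
The idea behind ISA subsetting can be illustrated by comparing the instructions an application actually uses against the full instruction set. This is a sketch only: the 12-instruction `ISA` set and the `program` listing are hypothetical, and a real flow would extract the opcodes from the compiled binary.

```python
def used_subset(program, isa):
    """Return the set of ISA instructions that appear in the program."""
    unknown = set(program) - isa
    if unknown:
        raise ValueError(f"instructions outside the ISA: {sorted(unknown)}")
    return set(program)


# Hypothetical MIPS-like instruction set and application listing.
ISA = {"add", "sub", "and", "or", "sll", "srl",
       "mult", "div", "lw", "sw", "beq", "j"}
program = ["lw", "add", "add", "sw", "beq", "lw", "sll", "j"]

used = used_subset(program, ISA)
removable = ISA - used              # candidates for removal from the hardware
utilization = len(used) / len(ISA)  # fraction of the ISA the application needs

print(sorted(removable), utilization)
```

Every instruction in `removable` is one whose control and datapath support could be dropped for this application.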

Impact of ISA Subsetting: Impact on area. Impact on performance.

Results: Fine customization environment: an improvement in performance per area of 14.1% on average across all benchmarks. The combined approach improved the performance per area by 24.5% on average across all applications.

Application-Specific Customization of Parameterized FPGA Soft-Core Processors: A methodology for fast application-specific customization of a parameterized FPGA soft core. Targeting a 1-2 hour runtime with near-optimal results. Two approaches: traditional CAD with a 0-1 knapsack algorithm; synthesis-in-the-loop exploration.

Framework: Xilinx MicroBlaze (MB) on a Virtex-II Pro FPGA. Comparison with base and full MB. Performance metrics: area in equivalent LUTs; performance as application runtime (ms). Benchmark: 11 applications from EEMBC.

Justification

Approach-1: Traditional CAD approach. 0-1 knapsack problem: maximize performance, with a constraint on area. 6 synthesis/execution runs.
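
The 0-1 knapsack formulation can be sketched with the standard dynamic program: each optional component has an area cost and a performance gain, and the goal is the most gain within an area budget. The component costs and gains below are hypothetical, and the sketch assumes that gains measured one component at a time simply add up, which is a simplification and not a claim about the paper's exact model.

```python
def knapsack(items, area_budget):
    """0-1 knapsack by dynamic programming.

    items: list of (name, area, gain), with area a non-negative integer.
    Returns (total_gain, chosen_names) for the best selection within budget.
    """
    # best[a] = best (gain, names) achievable with total area at most a
    best = [(0.0, ())] * (area_budget + 1)
    for name, area, gain in items:
        # Walk the budget downward so each item is used at most once.
        for a in range(area_budget, area - 1, -1):
            cand = best[a - area][0] + gain
            if cand > best[a][0]:
                best[a] = (cand, best[a - area][1] + (name,))
    return best[area_budget]


# Hypothetical per-component area (LUTs) and gain (runtime saved, ms).
components = [
    ("multiplier", 300, 4.0),
    ("barrel_shifter", 200, 3.0),
    ("divider", 400, 1.0),
    ("fpu", 1000, 8.0),
    ("dcache", 500, 5.0),
]

gain, chosen = knapsack(components, area_budget=1000)
print(gain, chosen)
```

In this made-up instance the single largest component fills the whole budget by itself, yet three smaller components together give a larger total gain, which is exactly what the knapsack search is for.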

Approach-2: Synthesis-in-the-loop. Pre-determines the impact each parameter individually has on the design metrics, then searches the parameters in sequence, ordered from highest impact to lowest. Two orders (fixed-order and impact-ordered).
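
One way to read the synthesis-in-the-loop search is as a two-phase greedy walk: first measure each parameter alone, then visit the parameters from highest to lowest impact and keep each one only if the re-evaluated design improves and still fits. The sketch below is an interpretation under stated assumptions, not the paper's algorithm: the impact metric (runtime saved per added LUT), the acceptance rule, and the `toy_evaluate` model that stands in for a real synthesis and execution run are all hypothetical.

```python
# Hypothetical per-parameter effects: (runtime saved in ms, added LUTs).
EFFECTS = {
    "multiplier": (20.0, 300),
    "barrel_shifter": (15.0, 200),
    "divider": (2.0, 400),
    "fpu": (0.0, 1000),
    "dcache": (25.0, 500),
}
calls = 0  # counts stand-in "synthesis/execution" runs


def toy_evaluate(enabled):
    """Stand-in for synthesizing and running one configuration."""
    global calls
    calls += 1
    runtime = 100.0 - sum(EFFECTS[p][0] for p in enabled)
    area = 1000 + sum(EFFECTS[p][1] for p in enabled)
    return runtime, area


def impact_ordered_search(params, evaluate, area_budget):
    base_rt, base_area = evaluate(frozenset())
    # Phase 1: impact of each parameter on its own.
    impact = {}
    for p in params:
        rt, area = evaluate(frozenset([p]))
        impact[p] = (base_rt - rt) / max(area - base_area, 1)
    order = sorted(params, key=lambda p: impact[p], reverse=True)
    # Phase 2: walk the parameters from highest impact to lowest.
    chosen, best_rt = frozenset(), base_rt
    for p in order:
        rt, area = evaluate(chosen | {p})
        if area <= area_budget and rt < best_rt:
            chosen, best_rt = chosen | {p}, rt
    return sorted(chosen), best_rt, order


chosen, runtime, order = impact_ordered_search(list(EFFECTS), toy_evaluate, 2100)
print(chosen, runtime, order, calls)
```

Under these assumptions the search needs one base run, one run per parameter for the impact measurement, and one run per parameter during the walk, so the run count grows linearly with the number of parameters.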

Results: Exhaustive search took 11 hours. The fixed impact-ordered tree approach had the fastest runtime of 108 minutes. The knapsack algorithm gave similar results to the fixed impact-ordered tree approach. Similar results for a 50% constraint. Figure panels: no constraint; fixed 80% constraint; per-application 80% constraint.

Results: Reimplementation on a Spartan2 FPGA. 1.5 hours runtime for the fixed-order impact-ordered tree. 200 minutes for the application-specific impact-ordered tree.

Scalability: Increasing the number of parameters increases the runtime. The fixed-order impact-ordered tree and knapsack approaches scale well.
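
The scaling argument can be made concrete by counting synthesis/execution runs. The sketch assumes every parameter is a simple on/off choice and that the one-at-a-time measurement needs one base run plus one run per parameter; both function names are hypothetical.

```python
def exhaustive_runs(n_params: int) -> int:
    """Runs needed to try every combination of n on/off parameters."""
    return 2 ** n_params


def one_at_a_time_runs(n_params: int) -> int:
    """Runs needed for a base design plus each parameter enabled alone."""
    return n_params + 1


for n in (5, 10, 20):
    print(n, exhaustive_runs(n), one_at_a_time_runs(n))
```

For the five optional MicroBlaze components, the one-at-a-time count is 6, which is consistent with the 6 runs quoted for the knapsack approach, while the exhaustive count doubles with every added parameter.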

Conclusion: Impact of customization on performance and area. Emphasis on performance. Customizable parameters span the micro-architecture and the ISA. Use of near-optimal solutions to save on runtime. Finer customization is possible, but scalability has to be addressed. Finer customization might consider 0-1 parameters or multi-valued parameters.

THANK YOU Q&A