Specific Choice of Soft Processor Features Mark Grover Prof. Greg Steffan Dept. of Electrical and Computer Engineering.

Hard and Soft Processors
Hard processors:
- Made from transistors
- Cost millions to make
- Faster
- Consume less power
Soft processors:
- Built on the FPGA fabric
- Customizable
- Can cater to application-specific needs
[Figure labels: Verilog; Processor Architecture]

Research Problem
- Choose the best microarchitectural features
  - Goal: optimize the use of resources:
    - Power consumption (as low as possible)
    - Area (as small as possible)
    - Wall-clock time spent (the lower the better)

SPREE (Soft Processor Rapid Exploration Environment)
- Scanned the whole design space
- Is that approach viable?
  - What if a new application comes into the picture?
  - What if the performance criteria change? Say, the user no longer cares about area?
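To see why rescanning the whole design space for every new application or criterion gets expensive, here is a minimal Python sketch that enumerates every combination of feature options. The feature names and option values are hypothetical placeholders loosely inspired by the variants named in this talk (hard/soft multiplier, barrel/serial shifter, pipeline depth); they are not SPREE's actual parameter set.

```python
from itertools import product
from typing import Dict, Iterator, List

# Hypothetical feature options for illustration only (not SPREE's real parameter list).
FEATURES: Dict[str, List[object]] = {
    "multiplier": ["hard", "soft"],
    "shifter": ["barrel", "serial"],
    "pipeline_stages": [3, 4, 5, 7],
}


def design_space(features: Dict[str, List[object]]) -> Iterator[Dict[str, object]]:
    """Yield every combination of feature options as a {feature: option} dict."""
    names = list(features)
    for combo in product(*(features[name] for name in names)):
        yield dict(zip(names, combo))


if __name__ == "__main__":
    points = list(design_space(FEATURES))
    # 2 * 2 * 4 = 16 variants, each of which would have to be generated,
    # synthesized and simulated again for every new application.
    print(len(points))
```

Each added feature multiplies the number of points, which is the motivation for approximating the effect of a feature instead of re-measuring every variant.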

Research Objective: Enhanced Simulator (MINT)
[Figure: Software Application → Enhanced Simulator Part 1 (approximates) → Enhanced Simulator Part 2 (inputs: maximum power, area) → fastest microarchitectural combination]
- Addresses the open questions: What if a new application comes into the picture? What if the performance criteria change?
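The role of part 2 on this slide can be sketched as a constrained search: keep only the variants within the user's maximum power and area, then pick the one with the lowest wall-clock time (cycle count divided by clock frequency). This is a minimal Python sketch; the `Variant` fields, units and example numbers are hypothetical, and the actual MINT extension may work differently.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Variant:
    """One microarchitectural combination (all fields hypothetical)."""
    name: str
    cycles: int        # approximate cycle count for the application (part 1's output)
    fmax_mhz: float    # achievable clock frequency of this variant
    power_mw: float    # estimated power
    area: float        # estimated area, in whatever unit the user's budget uses

    def wall_clock_s(self) -> float:
        return self.cycles / (self.fmax_mhz * 1e6)


def fastest_within_budget(
    variants: Sequence[Variant], max_power_mw: float, max_area: float
) -> Optional[Variant]:
    """Return the feasible variant with the lowest wall-clock time, or None if none fit."""
    feasible = [v for v in variants if v.power_mw <= max_power_mw and v.area <= max_area]
    return min(feasible, key=lambda v: v.wall_clock_s(), default=None)
```

Changing the criteria (for example, dropping the area limit by passing `float("inf")`) only reruns this cheap filter, not the whole design-space exploration.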

Outline
- Motivation
- Implementation
  - Implementation scheme (in general)
  - Data deciphering
- Results
  - Multiplier option
- Discussion
- Conclusion
- Long-term goal

Implementation Scheme
1. Collect experimental data for a set of benchmarks
2. Look for trends and dependencies
3. Propose a suitable relationship
4. Compare against the trade-offs and provide the best solution

Data Deciphering
- Multiplier option (hard/soft multiplier)
  - What is the approximate change in cycle count when switching between them?
- Without a hard multiplier, a multiplication is converted into a set of shifts and adds
  - Simulated this algorithm to find the equivalent number of instructions
  - Plotted the number of equivalent instructions vs. the change in cycle count (experimental data)
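A minimal Python sketch of the shift-and-add idea, which also counts how many instructions the loop would execute. The per-step costs are an assumption (a simple MIPS-like loop: one loop-condition branch, one AND plus one branch to test the low bit, an add when that bit is set, and two shifts); the routine the compiler or SPREE actually uses for a soft multiply may differ.

```python
MASK32 = 0xFFFFFFFF


def soft_multiply(a: int, b: int) -> tuple[int, int]:
    """Multiply with shifts and adds; return (32-bit product, instructions executed).

    Instruction costs are an assumed MIPS-like sequence, for illustration only.
    """
    a &= MASK32  # treat both operands as unsigned 32-bit values
    b &= MASK32
    product = 0
    instructions = 0
    while True:
        instructions += 1          # loop-condition branch: is b zero?
        if b == 0:
            break
        instructions += 2          # AND to isolate the low bit of b, then branch on it
        if b & 1:
            product += a           # add the current partial product
            instructions += 1
        a = (a << 1) & MASK32      # shift multiplicand left
        b >>= 1                    # shift multiplier right
        instructions += 2
    return product & MASK32, instructions
```

With a hard multiplier the same operation is assumed to be a single instruction, so the "equivalent instructions" added by one multiply are `instructions - 1`.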

Hard and Soft Multiplier
Hard multiplier:
- Performs the multiply operation as a single instruction
- Occupies additional area
- Lengthens the clock period somewhat (lower clock frequency)
- Consumes additional power
Soft multiplier:
- No dedicated multiplier
- Each multiply instruction is converted into simpler instructions
- No change in area, frequency, or power
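Because the hard multiplier cuts the cycle count but also lowers the clock frequency, the comparison that matters for wall-clock time is cycles divided by frequency. A minimal Python sketch; the cycle counts and frequencies in the checks are made-up placeholders, not measurements from this work.

```python
def wall_clock_seconds(cycles: int, fmax_mhz: float) -> float:
    """Execution time = cycle count / clock frequency."""
    return cycles / (fmax_mhz * 1e6)


def hard_multiplier_is_faster(
    hard_cycles: int, hard_fmax_mhz: float, soft_cycles: int, soft_fmax_mhz: float
) -> bool:
    """True if the variant with a hard multiplier finishes sooner in wall-clock time."""
    return wall_clock_seconds(hard_cycles, hard_fmax_mhz) < wall_clock_seconds(
        soft_cycles, soft_fmax_mhz
    )
```

For instance, if the soft variant needs only 5% more cycles but clocks 10% faster, it wins; with 20% more cycles, the hard multiplier wins.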

Method of Analysis
1. Convert each multiplication A*B into its equivalent set of branch, shift, and add instructions
2. Repeat for all multiply instructions in the benchmark to get the total change in equivalent instructions
3. Plot this total against the experimentally measured change in cycle count, for all processor variants
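Summing over every multiply in a benchmark gives the x-axis value for one point of the plot. A minimal Python sketch, assuming a hypothetical trace that lists the operands `(a, b)` of each dynamically executed multiply, and assuming the same simple shift-and-add loop cost (5 instructions per iteration, 1 extra add per set bit of `b`, 1 exiting branch) versus 1 instruction with a hard multiplier.

```python
from typing import Iterable, Tuple

HARD_MULTIPLY_COST = 1  # assumption: with a hard multiplier, A*B is one instruction


def soft_multiply_cost(b: int) -> int:
    """Instructions for the assumed shift-and-add loop; depends only on the multiplier b."""
    b &= 0xFFFFFFFF                        # treat the operand as unsigned 32-bit
    iterations = b.bit_length()            # one loop iteration per bit up to the top set bit
    set_bits = bin(b).count("1")           # one extra add for each set bit
    return 5 * iterations + set_bits + 1   # +1 for the final, exiting loop-condition branch


def equivalent_instruction_change(multiplies: Iterable[Tuple[int, int]]) -> int:
    """Total extra instructions when every dynamic multiply (a, b) runs in software."""
    return sum(soft_multiply_cost(b) - HARD_MULTIPLY_COST for _, b in multiplies)
```

A real soft-multiply routine might loop over the smaller operand or exit early, which would change the cost model but not the method.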

Results
- Gnuplot was used to plot the graphs on a log scale
- A linear correlation was obtained between the plotted points

Example 1
[Figure: increase in cycle count (log scale) vs. change in equivalent instructions when moving from a hard multiplier to a soft multiplier, on the pipe5, barrelshift processor]

Example 2
[Figure: increase in cycle count (log scale) vs. change in equivalent instructions when moving from a hard multiplier to a soft multiplier, on the serial shift, high rise processor]

Discussion
- "fit.log" (generated by gnuplot) as a measure of how well the fit correlates
- Percentage uncertainty is expressed by the Asymptotic Standard Error (A.S.E.)
  - Example 1: A.S.E. is 4.132%
  - Example 2: A.S.E. is 3.166%
- A linear dependence is found on the log scale
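gnuplot's `fit` command writes each parameter's "Asymptotic Standard Error", also as a percentage of the parameter, to `fit.log`. For an unweighted straight-line fit these should correspond to the ordinary least-squares standard errors scaled by the residual variance, which this minimal Python sketch computes. It is a reconstruction, not the author's script; the slide does not say whether the quoted percentages refer to the slope or the intercept, so the sketch reports both. Pass log10 values if the fit was done on the log scale.

```python
import math
from typing import Dict, Sequence, Tuple


def asymptotic_standard_errors(
    xs: Sequence[float], ys: Sequence[float]
) -> Dict[str, Tuple[float, float, float]]:
    """Least-squares fit y = slope*x + intercept.

    Returns {parameter: (value, standard error, standard error as % of value)}.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom (reduced chi-square, unit weights).
    rss = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    s2 = rss / (n - 2)
    se_slope = math.sqrt(s2 / sxx)
    se_intercept = math.sqrt(s2 * (1.0 / n + mean_x ** 2 / sxx))

    def pct(value: float, se: float) -> float:
        return 100.0 * se / abs(value) if value != 0 else math.inf

    return {
        "slope": (slope, se_slope, pct(slope, se_slope)),
        "intercept": (intercept, se_intercept, pct(intercept, se_intercept)),
    }
```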

A.S.E. of All Processor Variants
[Figure: A.S.E. for each processor variant]

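Conclusion
- The linear fit makes it possible to predict the change in cycle count caused by changing a feature fairly accurately (A.S.E. of 4.132% and 3.166% in the two examples)
- This change, computed for all the features, serves as input to part 2 of the enhanced simulator
- The approach provides a template for future work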

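Example 2 (revisited)
[Figure: increase in cycle count (log scale) vs. change in equivalent instructions when moving from a hard multiplier to a soft multiplier, on the serial shift, high rise processor]
- For a new application, the change in equivalent instructions comes from running the application on part 1 of MINT
- Reading that value off the fit gives the approximate change in cycle count for the new application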
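Reading a value off the fit amounts to evaluating the fitted line. A minimal Python sketch, assuming the fit was done on log10 of both quantities (if it was done on raw values, the prediction is simply `slope * x + intercept`); the slope and intercept would come from the fit for one specific processor variant, and the numbers in the checks are placeholders.

```python
import math


def predict_cycle_increase(equiv_instr_change: float, slope: float, intercept: float) -> float:
    """Evaluate log10(cycle increase) = slope * log10(equiv. instructions) + intercept."""
    if equiv_instr_change <= 0:
        raise ValueError("a log-scale fit needs a positive instruction change")
    return 10 ** (intercept + slope * math.log10(equiv_instr_change))
```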

Future Work
- So far, only the multiplier option has been analyzed
- Perform a similar analysis on the other features
- Compare user demands with the approximate cycle counts

References
- Martin Labrecque and J. Gregory Steffan, "Improving Pipelined Soft Processors with Multithreading."
- Peter Yiannacouras, J. Gregory Steffan, and Jonathan Rose, "Application-Specific Customization of Soft Processor Microarchitecture."

Special Thanks
- Prof. Greg Steffan
- CARG (Compiler & Architecture Reading Group)
- PaCRaT (Parallelism and Customization Research At the University of Toronto)

What I Learnt
- Research is not a 9-to-5 job; it is a lifestyle of discovering something small but relevant from time to time.
- At times nothing seems to bear fruit; that is when it is time to get up from your seat.

Thanks! Any questions?