Using Cycle Efficiency as a System Designer Metric to Characterize an Embedded DSP and Compare Hard Core vs. Soft Core Advisor Dr. Vishwani D. Agrawal.

Slides:



Advertisements
Similar presentations
Maintaining Data Integrity in Programmable Logic in Atmospheric Environments through Error Detection Joel Seely Technical Marketing Manager Military &
Advertisements

Architecture Design Methodology. 2 The effects of architecture design on metrics:  Area (cost)  Performance  Power Target market:  A set of application.
Week 1- Fall 2009 Dr. Kimberly E. Newman University of Colorado.
Polynomial-Time Algorithms for Designing Dual-Voltage Energy Efficient Circuits Master’s Thesis Defense Mridula Allani Advisor : Dr. Vishwani D. Agrawal.
Spring 2008, Jan. 14 ELEC / Lecture 2 1 ELEC / Computer Architecture and Design Spring 2007 Introduction Vishwani D. Agrawal.
Energy Source Lifetime Optimization for a Digital System through Power Management Department of Electrical and Computer Engineering Auburn University,
May 14, ISVLSI 09 Algorithms for Estimating Number of Glitches and Dynamic Power in CMOS Circuits with Delay Variations Jins Davis Alexander Vishwani.
Spring 07, Feb 20 ELEC 7770: Advanced VLSI Design (Agrawal) 1 ELEC 7770 Advanced VLSI Design Spring 2007 Reducing Power through Multicore Parallelism Vishwani.
Spring 08, Jan 15 ELEC 7770: Advanced VLSI Design (Agrawal) 1 ELEC 7770 Advanced VLSI Design Spring 2007 Introduction Vishwani D. Agrawal James J. Danaher.
A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning Roman Lysecky, Frank Vahid* Department.
Spring 07, Jan 16 ELEC 7770: Advanced VLSI Design (Agrawal) 1 ELEC 7770 Advanced VLSI Design Spring 2007 Introduction Vishwani D. Agrawal James J. Danaher.
March 16, 2009SSST'091 Computing Bounds on Dynamic Power Using Fast Zero-Delay Logic Simulation Jins Davis Alexander Vishwani D. Agrawal Department of.
ASIC vs. FPGA – A Comparisson Hardware-Software Codesign Voin Legourski.
1 Chapter 12 Advanced Topics--Introduction. 2 Overview To achieve higher growth Additional features, software and IP offerings Application: consumer electronics,
Spring 07, Feb 22 ELEC 7770: Advanced VLSI Design (Agrawal) 1 ELEC 7770 Advanced VLSI Design Spring 2007 Power Aware Microprocessors Vishwani D. Agrawal.
ECE 232 L1 Intro.1 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers ECE 232 Hardware Organization and Design Lecture 1 Introduction.
Dynamic Power Consumption In Large FPGAs WILLIAM GARCIA, ANDREW MORTELLARO.
© 2010 Altera Corporation—Public DSP Innovations in 28-nm FPGAs Danny Biran Senior VP of Marketing.
FPGA Based Fuzzy Logic Controller for Semi- Active Suspensions Aws Abu-Khudhair.
ELEC516/10 course_des 1 ELEC516 VLSI System Design and Design Automation Spring 2010 Course Description Chi-ying Tsui Department of Electrical and Electronic.
© 2003 Xilinx, Inc. All Rights Reserved Power Estimation.
EKT303/4 PRINCIPLES OF PRINCIPLES OF COMPUTER ARCHITECTURE (PoCA)
156 / MAPLD 2005 Rollins 1 Reducing Energy in FPGA Multipliers Through Glitch Reduction Nathan Rollins and Michael J. Wirthlin Department of Electrical.
04/26/05 Anthony Singh, Carleton University, MCML - Fixed Point - Integer Divider Presentation #2 High-Speed Low Power VLSI – Prof. Shams By Anthony.
Finding Optimum Clock Frequencies for Aperiodic Test Master’s Thesis Defense Sindhu Gunasekar Dept. of ECE, Auburn University Advisory Committee: Dr. Vishwani.
Managing Performance and Efficiency of a Processor Advisor: Dr. Vishwani Agrawal Committee: Dr. Adit Singh and Dr. Victor Nelson Department of Electrical.
Determining the Optimal Process Technology for Performance- Constrained Circuits Michael Boyer & Sudeep Ghosh ECE 563: Introduction to VLSI December 5.
ENG3050 Embedded Reconfigurable Computing Systems General Information Handout Winter 2015, January 5 th.
SoC TAM Design to Minimize Test Application Time Advisor Dr. Vishwani D. Agrawal Committee Members Dr. Victor P. Nelson, Dr. Adit D. Singh Apr 9, 2015.
Jia Yao and Vishwani D. Agrawal Department of Electrical and Computer Engineering Auburn University Auburn, AL 36830, USA Dual-Threshold Design of Sub-Threshold.
1 EE 587 SoC Design & Test Partha Pande School of EECS Washington State University
Low Power Architecture and Implementation of Multicore Design Khushboo Sheth, Kyungseok Kim Fan Wang, Siddharth Dantu ELEC6270 Low Power Design of Electronic.
Towards the Design of Heterogeneous Real-Time Multicore System m Yumiko Kimezawa February 1, 20131MT2012.
Towards the Design of Heterogeneous Real-Time Multicore System Adaptive Systems Laboratory, Master of Computer Science and Engineering in the Graduate.
EKT303/4 PRINCIPLES OF PRINCIPLES OF COMPUTER ARCHITECTURE (PoCA)
FPL Sept. 2, 2003 Software Decelerators Eric Keller, Gordon Brebner and Phil James-Roxby Xilinx Research Labs.
1 Leakage Power Analysis of a 90nm FPGA Authors: Tim Tuan (Xilinx), Bocheng Lai (UCLA) Presenter: Sang-Kyo Han (ECE, University of Maryland) Published.
LOGIC OPTIMIZATION USING TECHNOLOGY INDEPENDENT MUX BASED ADDERS IN FPGA Project Guide: Smt. Latha Dept of E & C JSSATE, Bangalore. From: N GURURAJ M-Tech,
© 2010 Altera Corporation - Public Lutiac – Small Soft Processors for Small Programs David Galloway and David Lewis November 18, 2010.
Computer Science and Engineering Power-Performance Considerations of Parallel Computing on Chip Multiprocessors Jian Li and Jose F. Martinez ACM Transactions.
A Design Flow for Optimal Circuit Design Using Resource and Timing Estimation Farnaz Gharibian and Kenneth B. Kent {f.gharibian, unb.ca Faculty.
A New Class of High Performance FFTs Dr. J. Greg Nash Centar ( High Performance Embedded Computing (HPEC) Workshop.
Spring 2016, Jan 13 ELEC / Lecture 1 1 ELEC / Computer Architecture and Design Spring 2016 Introduction Vishwani D. Agrawal.
Advanced SW/HW Optimization Techniques for Application Specific MCSoC m Yumiko Kimezawa Supervised by Prof. Ben Abderazek Graduate School of Computer.
11/15/05ELEC / Lecture 191 ELEC / (Fall 2005) Special Topics in Electrical Engineering Low-Power Design of Electronic Circuits.
© 2009 Altera Corporation Floating Point Synthesis From Model-Based Design M. Langhammer, M. Jervis, G. Griffiths, M. Santoro.
Click to edit Master title style Literature Review Measuring the Gap Between FPGAs and ASICs Ian Kuon, Jonathan Rose University of Toronto IEEE TCAD/ICAS.
Application-Specific Customization of Soft Processor Microarchitecture Peter Yiannacouras J. Gregory Steffan Jonathan Rose University of Toronto Electrical.
Characterizing Processors for Energy and Performance Management Harshit Goyal and Vishwani D. Agrawal Department of Electrical and Computer Engineering,
1 Digital Logic Design (41-135) Introduction Younglok Kim Dept. of Electrical Engineering Sogang University Spring 2006.
M V Ganeswara Rao Associate Professor Dept. of ECE Shri Vishnu Engineering College for Women Bhimavaram Hardware Architecture of Low-Power ALU using Clock.
Department of Electrical and Computer Engineering Auburn University, AL USA.
Measuring Performance II and Logic Design
Presenter: Darshika G. Perera Assistant Professor
ELEC 7770 Advanced VLSI Design Spring 2016 Introduction
Introduction to Programmable Logic
Application-Specific Customization of Soft Processor Microarchitecture
Head-to-Head Xilinx Virtex-II Pro Altera Stratix 1.5v 130nm copper
Morgan Kaufmann Publishers
FPGAs in AWS and First Use Cases, Kees Vissers
Maintaining Data Integrity in Programmable Logic in Atmospheric Environments through Error Detection Joel Seely Technical Marketing Manager Military &
ELEC 7770 Advanced VLSI Design Spring 2014 Introduction
M.S. Thesis Defense Murali Dharan Advisor: Dr. Vishwani D. Agrawal
ELEC 7770 Advanced VLSI Design Spring 2012 Introduction
ELEC 7770 Advanced VLSI Design Spring 2010 Introduction
Circuit Design Techniques for Low Power DSPs
A High Performance SoC: PkunityTM
Measuring the Gap between FPGAs and ASICs
Application-Specific Customization of Soft Processor Microarchitecture
A Low-Power Analog Bus for On-Chip Digital Communication
Presentation transcript:

Using Cycle Efficiency as a System Designer Metric to Characterize an Embedded DSP and Compare Hard Core vs. Soft Core Advisor Dr. Vishwani D. Agrawal Committee Members Dr. Victor P. Nelson, Dr. Adit D. Singh October 1, 2013 Master’s Project Defense Rathan Raj

Outline  Motivation  Background  Problem Statement  Implementation  Results  Conclusion  Limitations and Future Work  References October 1, 20132

Motivation  Performance, Power and Area are three conflicting goals, and industry demands that all three aspects be co-optimized.  To obtain a complete performance modeling requires marrying everything from high-level modeling and synthesis to better characterization and verification. October 1, Performance

Background October 1,  What is Characterization?  Characterization over Process, Voltage, Temperature  Performance Metric  Energy Efficiency Metric

Background October 1, Performance Metrics :  Clock Frequency  MIPS  MFLOPS  SPEC ratio  Relative Efficiency  SWAP  Performance per Watt  Cycle Efficiency Source: D. A. Patterson and J. L. Hennessy, Computer Organization & Design: The Hardware/Software Interface, 4 th Edition, Morgan Kaufmann Publishers (Elsevier), 2009 A. Shinde and V. D. Agrawal, “Managing Performance and Efficiency of a Processor,” Proc. 45 th IEEE Southeastern Symp. System Theory, March 2013

Background October 1, 20136

Background October 1, Source: A. Shinde and V. D. Agrawal, “Managing Performance and Efficiency of a Processor,” Proc. 45 th IEEE Southeastern Symp. System Theory, March 2013 Dr. Agrawal, Lower Power Design of Electronic Circuits, lecture_8.ppt

Problem Statement  Can we characterize an embedded DSP in an FPGA and use cycle efficiency to analyze its performance? Also, use cycle efficiency to compare the performance of a Hard Core to a Soft Core. October 1, 20138

Implementation  Lattice ECP3 65nm FPGA  Design & Synthesis Tool –Lattice Diamond  Lattice ECP3 DSP unit has cascadable DSP slices that are ideal for power sensitive wireless applications and image signal processing.  Implementation of the function: Multiply Accumulate (MAC) An x Bn + Pn-1 = Pn October 1, Source : Lattice ECP3 SysDSP usage guide

Design Flow October 1, Design Flow October 1, Design Entry Synthesis Functional Simulation Fitting Timing Analysis and Simulation Characterization & Programming Design Correct? Timing requirements met? Yes No

Power Analysis October 1, Power Analysis: 65nm Hard DSP at V dd =1.2V, f=280 MHz, No. of execution cycles= 1.5 x10 6 cycles Typical Worst

Results October 1, Power Dissipation and Cycle efficiency Calculations Temperature( 0 C)P Static (mW)P Dyn (mW)P T (mW)E Total (µJ) EPC (nJ/cycle) Cycle Efficiency (η) 10 9 cycles/J Worst Process, V dd = 1.2 V, Fmax = 280 MHz, No. of execution cycles = 1.5 x 10 6 cycles.

Cycle Efficiency( η) vs. T October 1, V = 1.2V, Fmax = 280 MHz, No. of Execution cycles = 1.5 x 10 6 cycles.

Results October 1, Performance grade(Process Variation) at different Temperatures and Cycle efficiency Performance grade T=0 0 CFmax Etotal (µJ)EPC (nJ)η (10 9 cycles/J) 6 (worst) (typical) (best) Performance grade T=25 0 CFmax Etotal (µJ)EPC (nJ) η (10 9 cycles/J) 6 (worst) (typical) (best) Performance grade T=50 0 CFmax Etotal (µJ) EPC (nJ) η (10 9 cycles/J) 6 (worst) (typical) (best) Performance grade T= CFmax Etotal (µJ) EPC (nJ) η (10 9 cycles/J) 6 (worst) (typical) (best)

Performance Grade and η October 1, Effect of process variation at different Temperatures on Cycle Efficiency V dd = 1.2V, No. of execution cycles = 1.5 x 10 6

Comparison of Hard DSP vs. Soft Core (LUT-based)  Device: 90 nm Stratix II GX FPGA  CAD Tool for Design & Synthesis – Quartus 2  MAC operation on both implementations.  Implementation using only the Embedded DSP unit 4 DSP 9x9 multipliers  Implementation using only Logic Elements 337 LUT + 97 Registers October 1,

Results Comparison of Hard DSP vs. Soft DSP(LUT) October 1, Resource Utilization F max (MHz)P Static (mW)P Dyn (mW)P I/O (mW)P Total (mW)E Total (µJ) EPC (nJ/cycle) Cycle Efficiency (η) mega cycles/J 4 DSP 9x9 multipliers (Hard Core) LUT + 97 registers (Soft Core) V dd = 1.2 V, No. of Execution Cycles = 1.5x10 6, and T = 25 0 C

Summary  As Temperature increases, cycle efficiency decreases.  From 45 0 C C, there is a 40 % decrease in the cycle efficiency.  The Cycle efficiency calculations at different Performance grades were calculated over the operating temperature range.  Hard DSP vs. Soft DSP (LUT): The dynamic power consumed by the Hard Core was 55 % higher than the dynamic power consumed by the Soft Core. The cycle efficiency of the Hard Core implementation was 150% more than the Soft Core. October 1,

 For system designers who are required to design systems which work robustly under extreme temperature conditions, the cycle efficiency calculations provide valuable insight into the power and performance for the design.  Characterization and Performance analysis over Process, Temperature and Voltage allows the designer to effectively optimize the time and energy requirements of an electronic system. October 1, Conclusion

Limitations and Future Work  Characterization was accurate in terms of the design and implementation. However, the Lattice ECP3 device was assumed to be running at a fixed V dd  Tool limitations do not allow frequency and voltage calculations over varying temperature  A Characterization of voltage with varying temperatures and scaling of voltage into the sub-threshold regions will help in better voltage characterization. October 1,

Limitations and Future Work  Cycle efficiency can be used in the industry as a performance metric that not only can be applied in the characterization phase but also in the architectural phase for making better engineering judgments during choices of systems and components October 1,

References Agrawal, V. D., “Low Power Design of Electronic Circuits,” Power Aware Microprocessors, ELEC-6270, Spring 2013 Altera Corporation, “DSP Blocks in Stratix II and Stratix II GX Devices,” January Altera Corporation, “Stratix II Architecture,” May Lattice Semiconductor- Diamond Student Web edition. Lattice Semiconductor, “Lattice ECP3 SysDSP Usage Guide, Technical note TN8112,” February Lattice Semiconductor, “Lattice Power Consumption and Management for LatticeECP3 Devices Usage Guide, Technical note TN1181,” February Mirzaei, Shahnam, “Design Methodologies and Architectures for Digital Signal Processing on FPGAs,” in Doctor of Philosophy’s dissertation, University Of California Santa Barbara, June Patterson, D. A., Hennessy, J. L., Computer Organization & Design: The Hardware/Software Interface, 4 th Edition, Morgan Kaufmann Publishers (Elsevier), 2009 Shinde, A., Agrawal, V. D., “Managing Performance and Efficiency of a Processor,” Proc. 45 th IEEE Southeastern Symp. System Theory, March 2013 October 1,

October 1, Thank You

October 1, Questions?