Lecture 19 FPGA & CPU Performance Comparison FPGA Families

Slides:



Advertisements
Similar presentations
Spartan-3 FPGA HDL Coding Techniques
Advertisements

Commercial FPGAs: Altera Stratix Family Dr. Philip Brisk Department of Computer Science and Engineering University of California, Riverside CS 223.
10/14/2005Caltech1 Reliable State Machines Dr. Gary R Burke California Institute of Technology Jet Propulsion Laboratory.
Lecture 16 RC Architecture Types & FPGA Interns Lecturer: Simon Winberg.
EELE 367 – Logic Design Module 2 – Modern Digital Design Flow Agenda 1.History of Digital Design Approach 2.HDLs 3.Design Abstraction 4.Modern Design Steps.
February 4, 2002 John Wawrzynek
Lecture 3: Computer Performance
Lecture 16 RC Architecture Types & FPGA Interns Lecturer: Simon Winberg Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Lecture 6: Design of Parallel Programs Part I Lecturer: Simon Winberg Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
EKT303/4 PRINCIPLES OF PRINCIPLES OF COMPUTER ARCHITECTURE (PoCA)
Copyright Jim Martin Computers Inside and Out Dr Jim Martin
Lecturer: Simon Winberg Lecture 21 Reflections, key steps* and A short comprehensive assignment * Relates to Martinez, Bond and Vai Ch 4. Attribution-ShareAlike.
CSCE 430/830 Course Project Guidelines By Dongyuan Zhan Feb. 4, 2010.
Lecture 2: Field Programmable Gate Arrays September 13, 2004 ECE 697F Reconfigurable Computing Lecture 2 Field Programmable Gate Arrays.
Lecture #3 Page 1 ECE 4110– Sequential Logic Design Lecture #3 Agenda 1.FPGA's 2.Lab Setup Announcements 1.No Class Monday, Labor Day Holiday 2.HW#2 assigned.
Intro to Architecture – Page 1 of 22CSCI 4717 – Computer Architecture CSCI 4717/5717 Computer Architecture Topic: Introduction Reading: Chapter 1.
Lecture 13 YODA Project & Discussion of FPGAs Lecturer: Simon Winberg.
Sogang University Advanced Computing System Chap 1. Computer Architecture Hyuk-Jun Lee, PhD Dept. of Computer Science and Engineering Sogang University.
Advanced Computer Architecture 0 Lecture # 1 Introduction by Husnain Sherazi.
J. Christiansen, CERN - EP/MIC
FPGA (Field Programmable Gate Array): CLBs, Slices, and LUTs Each configurable logic block (CLB) in Spartan-6 FPGAs consists of two slices, arranged side-by-side.
Your Own Digital Accelerator Topics & Selection Lecturer: Simon Winberg Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Introduction to FPGA Created & Presented By Ali Masoudi For Advanced Digital Communication Lab (ADC-Lab) At Isfahan University Of technology (IUT) Department.
Lecture #3 Page 1 ECE 4110–5110 Digital System Design Lecture #3 Agenda 1.FPGA's 2.Lab Setup Announcements 1.HW#2 assigned Due.
EKT303/4 PRINCIPLES OF PRINCIPLES OF COMPUTER ARCHITECTURE (PoCA)
Architecture View Models A model is a complete, simplified description of a system from a particular perspective or viewpoint. There is no single view.
FPGA 상명대학교 소프트웨어학부 2007년 1학기.
Lecture 5: Lecturer: Simon Winberg Review of paper: Temporal Partitioning Algorithm for a Coarse-grained Reconfigurable Computing Architecture by Chongyong.
1 The user’s view  A user is a person employing the computer to do useful work  Examples of useful work include spreadsheets word processing developing.
1 Introduction to Engineering Fall 2006 Lecture 17: Digital Tools 1.
Introduction to the FPGA and Labs
EET 1131 Unit 4 Programmable Logic Devices
Lecture 18 FPGA Interns & Performance Comparison
EEE4084F Digital Systems Lecture 21
Topics Coarse-grained FPGAs. Reconfigurable systems.
Objectives Overview Differentiate among various styles of system units on desktop computers, notebook computers, and mobile devices Identify chips, adapter.
Complex Programmable Logic Device (CPLD) Architecture and Its Applications
EEE4084F Digital Systems NOT IN 2017 EXAM Lecture 25
ECE 4110–5110 Digital System Design
Instructor: Dr. Phillip Jones
Cache Memory Presentation I
Instructions at the Lowest Level
Lecture 17 Programmable Logics & FPGAs FPGA Interns
Field Programmable Gate Array
Field Programmable Gate Array
Field Programmable Gate Array
Lecture 41: Introduction to Reconfigurable Computing
EEE4084F Digital Systems NOT IN 2018 EXAM Lecture 24X
XC4000E Series Xilinx XC4000 Series Architecture 8/98
Lecture 14: Reducing Cache Misses
Lecture 18 X: HDL & VHDL Quick Recap
Computer Architecture
Control Unit Introduction Types Comparison Control Memory
CDA 3100 Spring 2010.
Outline Module 1 and 2 dealt with processes, scheduling and synchronization Next two modules will deal with memory and storage Processes require data to.
Basic Adders and Counters Implementation of Adders
Morgan Kaufmann Publishers Memory Hierarchy: Cache Basics
HIGH LEVEL SYNTHESIS.
Registers.
Computers Inside and Out
COMP60621 Fundamentals of Parallel and Distributed Systems
TA David “The Punner” Eitan Poll
Designing a Pipelined CPU
"Computer Design" by Sunggu Lee
Exploring Application Specific Programmable Logic Devices
Registers and Counters
A Level Computer Science Topic 5: Computer Architecture and Assembly
COMP60611 Fundamentals of Parallel and Distributed Systems
Xilinx Alliance Series
Programmable logic and FPGA
Presentation transcript:

Lecture 19 FPGA & CPU Performance Comparison FPGA Families EEE4084F Digital Systems Lecture 19 FPGA & CPU Performance Comparison FPGA Families Lecturer: Simon Winberg Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

Lecture Overview FPGA performance evaluation FPGA vs CPU performance FPGA families YODA issues

Evaluating Performance Evaluating synthesis (simplified) of an FPGA design

HDL to FPGA execution & LE cost In order to implement a HDL design, the design need to be decomposed and mapped to the physical LBs on the FPGA and the interconnects need to be appropriately configured. Example: x = AND(e,f,g) y = AND(b,NAND(NAND(b,c),d)) out = NAND((NAND(x,y),NAND(a,y)) Map ‘AND(e,f,g)’ to LB1 Map ‘NAND((NAND(x,y),NAND(a,y))’ to LB2 x out y Map ‘AND(b,NAND(NAND(b,c),d)) ’ to LB3 Costing: 3 LBs, 8 LEs (assuming LBs have LEs that are AND or NAND gates)

Timing calculations The previous slide didn’t show whether the connections were synchronized (i.e., a shared clock) or asynchronous –since they are all logic gates and no clocks show it’s probably asynchronous Determining the timing constrains for synchronous configurations are generally easier, because everything is related to the clock speed. Still, you need to keep in mind cascading calculations. For asynchronous use, the implementation could run faster, but can also become a more complicated design, and be more difficult to work out the timing…

Async Timing calculations Keep in mind that the propagation delays for the various gates / LUTs may be different – for example, in the previous example, let’s assume each AND may take 6ns to stabilise, and the NANDS 10ns. So time to compute out is = MAX OF (time to compute x, time to compute y) + 2x10ns = (2x10ns+6ns) + 20ns = 46ns = pretty fast!! Or is it?? Compared to a 1GHz CPU using just registers (and no mem access)? Try this calculation for yourself ... (assume each instruction takes on avg. 3 clocks due to pipeline, data dependencies, etc, as worst case performance on a RISC processor)

Comparing to CPU speed CPU running at 1GHz  each clock 1ns period Assume each instruction takes ~ 5 clocks each due to pipeline etc CODE: int doit ( unsigned a, b, c, d, e, f, g ) { unsigned x = AND(e,f,g); unsigned y = AND(b,NAND(NAND(b,c),d)) out = NAND((NAND(x,y),NAND(a,y)) return out; } But some of these Can’t be done as just 1 RISC instruction. unsigned t1 = AND(e,f);  1 instruction, i.e. AND t1,e,f unsigned x = AND(t1,g); unsigned t1 = NAND(b,c) unsigned t2 = NAND(t1,d) unsigned y = AND(b,t2) t1 = NAND(x,y) t2 = NAND(a,y) out = NAND(t1,t2) in all 8 instructions  8 x 3 clocks ea. = 24 ns (assuming all registers pre-loaded) A speed-up of 1.92 over the FPGA case

Digital Clock Manager (DCM) blocks An important element included in FPGA designs nowadays are DCM blocks, which are used to eliminate clock distribution delay and can also increase or decrease the frequency of the clock

FPGA Families EEE4084F

The Manufacturers The ‘Big 2’ (most commonly used) Xilinx – Capital $8.52B, 2984 employees Altera – Capital $12B, 2555 employees The others pretty big ones… Actel (Microsemi Corp) – $2B capitalizations, 2250 employees Lattice Semiconductor Corp – $700M capitalizations, 708 employees Sources: “100 Power Tips for FPGA Designers” – based on 2011 stats

About the FPGA Families Xilinx Focusing on high performance and high capacity e.g. Vertex family (such as Vertex 7) Provides lower-cost options with high capacity (e.g. Spartan 6 family) Range of variations, e.g. low power options, economy (lower capacity) models. Note that the top performance FPGA changes over time and is not necessarily consistently one or other of the manufacturers

About the FPGA Families Altera Stratix: higher performance and density models (e.g. Startix-10) Arria: mid-range, lower-power, but also lower performance and denisity compared to Stratix. Cyclone: lowest cost option, also aimed at low power, cost sensitive and mobile applications

About the FPGA Families Actel Focuses on providing the lowest power, and widest range of small packages IGLOO : low power, small footprint SmartFuson : Mixed FPGA and ARM processor RTAX/RTSX : radiation tolerant and very high reliability.

About the FPGA Families Lattice Range of options (low power; high performance; small package) Own specialized development tools (of these four, this one is the only firms not in California; they are currently in Oregon)

About the FPGA Families Others Achronix – focusing on building the fastest FPGAs (not necessarily highest capacity) Tabula – unique FPGA technology ‘SpaceTime’, focusing on highest capacity and memory capabilities

Memory jogger… Q: Name a high-capacity FPGA family. A: Xilinx Vertex (e.g. ver 7+) / Altera Stratix (ver 10+) Q: Which of the following is a FPGA manufacturer ? (a) Acrobatics (b) Geometrix (c) Achronix Note that the producers are constantly bringing out new versions so this slide may get stale quite quickly.

YODA Issues EEE4084F

Project Teams Start ASAP with forming a project team Teams should be 2 or 3 members each Ideally, you want a diverse team, a teammate to bring in different perspectives, alternate experiences and a variety of skills. Allocate team member roles as well (see spreadsheet) Good team dynamics doesn’t necessitate a mad party (To have a ‘special’ team of a different size, e.g. of 4, please check with lecturer)

A whirlwind tour YODA Projects

Forming a team Please use the wiki to capture your team composition and topic. (see Yoda Teams in the wiki)

24 students assigned, still a couple more teams need to be formed! YODA Proj ID & Blog Project/Team Name Team Members Student IDs P00 MP3 Player (example entry) Dodo Johns, John Doe, Jane Doe JOHDXX341, …, … P18 PADAWAN Keegan Crankshaw, Liam Clark, Andrew Olivier CRNKEE002, CLRLIA002, OLVAND008 P10 BCDC - Binary Coded Decimal Converter Mutafa Rashid, Tatenda Muvhu, Petrus Kambala RSHMUS001, MVHADM001, KMBPET001 P14 IMA - Image Masking Accelerator Xolisani Nkwentsha, Sange Maxaku, Thapelo Nthithe NKWXOL003, MXKSAN002, NTHTHA012 P17 VADER - VERSATILE ACCELERATED DIGITAL ENCRYPTION RECOVERY Munsanje Mweene, Claude Betz MWNMUN001, BTZCLA001 P05 DM - DELTA MODULATOR David Fransch, Thato Semoko FRNDAV011, SMKTHA004 P03 IF - Interpolation filter Lebohang Mbele, Tato Moaki, Mashau Zwivhuya MBLLEB006, MKXTAT001, MSHZWI001 P13 MMA Qayyoom Arieff, Ashentha Naidoo, Prej Naidu, Muhammed Razzak ARFABD001, NDXASH016, NDXPRE047, RZZMUH001 P16 DE - DATA ENCRYPTION ACCELERATOR Alexandra Barry, Edwin Samuels, Nicholas Antoniades BRRALE004, SMLEDW002,ANTN IC006 P20 Polyphase Filtering Sylvan Morris, Alex Knemeyer MRRSYL001, KNMALE002 24 students assigned, still a couple more teams need to be formed!

Topic listing YODA PROJECT TOPICS P01: SF - Smoothing Filter P02: CAM - Content Addressable Memory P03: IF - Interpolation filter P04: PRNG - Parallel Random Number Generator P05: DM - Delta modulator P06: ASG - Arithmetic series generator P07: SALG - Selection Address List Generator P08: FSG - Function Samples Generator P09: BSS - Bit Sequence Sniffer P10: BCDC - Binary Coded Decimal Convertor P11: NCSM - Nonlinear Check Sum Module P12: MD5 - Message Digest version 5 P13: MMA - Matrix Multiplier Accelerator P14: IMA - Image Masking Accelerator P15: PSA - Pattern Seek Accelerator P16: DE - Data Encryption Accelerator P17: VADER - Versatile Accelerated Digital Encryption Recovery P18: PADAWAN - Parallel Accelerator for Digitising Audio with Attenuation of Noise P19: DDS - Direct Digital Synthesis

Preparing the Blog View example Blogs in YODA Hall of Fame (see description) Make sure you cover… Preamble Topic name (big and clear) Names and main roles Prototype / Specification Define problem and suggested solution [10 marks] Identified specs/design questions to solve [10 marks] Identify criteria for an acceptable solution [10 marks] References & information sources [5 marks]

Back to some Verilog Or short intermission

References & Acknowledgements References sources used include: Todman, Timothy J., et al. "Reconfigurable computing: architectures and design methods." Computers and Digital Techniques, IEE Proceedings-. Vol. 152. No. 2. IET, 2005. Stavinov, Evgeni. 100 Power Tips for FPGA Designers. Evgeni Stavinov, 2011. Acknowledgement Thanks to John-Philip Taylor (TA) for time taken to review slides and spotting typos and inaccuracies.

Disclaimers and copyright/licensing details I have tried to follow the correct practices concerning copyright and licensing of material, particularly image sources that have been used in this presentation. I have put much effort into trying to make this material open access so that it can be of benefit to others in their teaching and learning practice. Any mistakes or omissions with regards to these issues I will correct when notified. To the best of my understanding the material in these slides can be shared according to the Creative Commons “Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)” license, and that is why I selected that license to apply to this presentation (it’s not because I particularly want my slides referenced but more to acknowledge the sources and generosity of others who have provided free material such as the images I have used). Image sources: man working on laptop – flickr scroll, video reel – Pixabay http://pixabay.com/ (public domain) References: Verilog code adapted from http://www.asic-world.com/examples/verilog