1 - CPRE 583 (Reconfigurable Computing): Reconfigurable Computing Architectures Iowa State University (Ames) Reconfigurable Architectures Forces that drive.

Presentation transcript:

1 - CPRE 583 (Reconfigurable Computing): Reconfigurable Computing Architectures Iowa State University (Ames) Reconfigurable Architectures. Forces that drive a Reconfigurable Architecture –Price: mass production (100K to millions) vs. experimental (1 to 10's) –Granularity of reconfiguration: fine grain vs. coarse grain –Degree of system integration/coupling: tight vs. loose. All are a function of the application that will run on the architecture.

2 - Example Points in (Price, Granularity, Coupling) Space. [Figure: design space with three axes: Price ($100's to $1M's), Granularity (coarse to fine), Coupling (loose to tight). Labels in the figure: Intel / AMD; Processor with Int, float, and RFU units; PC; ML507; Ethernet; Decode, Exec, Store.]

3 - What's the point of a Reconfigurable Architecture? Performance metrics –Computational: throughput, latency –Power: total power dissipation, thermal –Reliability: recovery from faults. Increase application performance!

4 - Typical Approach for Increasing Performance. Application/algorithm implemented in software –Often easier to write an application in software. Profile the application (e.g. gprof) –Determine where the application is spending its time. Identify kernels of interest –e.g. the application spends 90% of its time in function matrix_multiply(). Design custom hardware/instruction to accelerate the kernel(s) –Analyze the kernel to determine how to extract fine/coarse grain parallelism (does any parallelism even exist?). Amdahl's Law!
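
Amdahl's Law bounds how much the whole application gains when only the kernel is accelerated. The following is a minimal sketch, not part of the lecture: the 90% kernel fraction comes from the slide, while the kernel speedups (10x, 100x) and the function names are illustrative assumptions.

```python
def amdahl_speedup(kernel_fraction, kernel_speedup):
    """Overall speedup when only `kernel_fraction` of the runtime is accelerated."""
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be in [0, 1]")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    # Unaccelerated part keeps its time; accelerated part shrinks by kernel_speedup.
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


def max_speedup(kernel_fraction):
    """Upper bound on overall speedup as the kernel speedup goes to infinity."""
    if kernel_fraction == 1.0:
        return float("inf")
    return 1.0 / (1.0 - kernel_fraction)


if __name__ == "__main__":
    # 0.9 is the slide's example: 90% of the time spent in matrix_multiply().
    for s in (10, 100):  # hypothetical hardware speedups of the kernel
        print(f"kernel {s}x faster -> application {amdahl_speedup(0.9, s):.2f}x faster")
    print(f"upper bound for a 90% kernel: about {max_speedup(0.9):.0f}x")
```

Even an arbitrarily fast accelerator for a 90% kernel cannot push the application past roughly 10x, which is why profiling comes before hardware design.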

5 - Granularity

6-8 - Granularity: Coarse Grain. rDPA (reconfigurable Data Path Array): function units with programmable interconnects. ALU example. [Figure built up over three slides.]

9-11 - Granularity: Fine Grain. FPGA (Field Programmable Gate Array): sea of general-purpose logic gates. CLB: Configurable Logic Block. [Figure built up over three slides.]

12 - Granularity: Trade-offs. Trade-offs associated with LUT size. Example: 2-LUT (4 = 2x2 bits) vs. 10-LUT (1024 = 32x32 bits). A k-input LUT stores 2^k configuration bits, so one 10-LUT takes the same 1024 bits as 256 2-LUTs. [Figure: a 1024-bit area shown as 2-LUTs and as a 10-LUT; label: Microprocessor.]
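
The bit counts on this slide follow from the fact that a k-input, single-output LUT stores one bit per input combination. A small sketch of that arithmetic (the function names are made up for illustration, and routing and multiplexer overhead are ignored):

```python
def lut_config_bits(k):
    """Configuration bits in a k-input, 1-output LUT: one per input combination."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return 2 ** k


def small_luts_per_large_lut(small_k, large_k):
    """How many small LUTs fit in the bit budget of one large LUT."""
    return lut_config_bits(large_k) // lut_config_bits(small_k)


if __name__ == "__main__":
    print(lut_config_bits(2))                # 4 bits, drawn as 2x2 on the slide
    print(lut_config_bits(10))               # 1024 bits, drawn as 32x32
    print(small_luts_per_large_lut(2, 10))   # 256 two-input LUTs per 1024 bits
```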

13-16 - Granularity: Trade-offs. Same text as slide 12. [Figure built up over four slides: the Microprocessor example; labels: A, B, op, 3.]

17-20 - Granularity: Trade-offs. Trade-offs associated with LUT size. Example: 2-LUT (4 = 2x2 bits) vs. 10-LUT (1024 = 32x32 bits). Bit logic and constants: (A and "1100") or (B or "1000"). [Figure built up over four slides: inputs A and B pass through AND and OR blocks; labels: 1, 0, 4, 4; the area that was required using 2-LUTs is marked within the 1024-bit area.] With 10-LUTs it's much worse: each 10-LUT has only one output.
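
One way to read the bit-logic example is sketched below. It assumes A and B are 4-bit values (suggested by the 4-bit constants) and that one LUT is spent per output bit; both are assumptions about the figure, not statements from the lecture. Each output bit depends only on one bit of A and one bit of B, so a 2-LUT per bit is enough, while a 10-LUT per bit leaves almost all of its 1024 bits unused.

```python
from itertools import product

C_AND = 0b1100  # constant ANDed with A (from the slide)
C_OR = 0b1000   # constant ORed with B (from the slide)
WIDTH = 4       # assumed operand width


def reference(a, b):
    """Direct evaluation of (A and "1100") or (B or "1000")."""
    return (a & C_AND) | (b | C_OR)


def build_2lut(bit):
    """4-entry truth table for one output bit, indexed by (a_bit << 1) | b_bit."""
    ca = (C_AND >> bit) & 1
    co = (C_OR >> bit) & 1
    # product yields (0,0), (0,1), (1,0), (1,1), matching the index order above.
    return [(a & ca) | (b | co) for a, b in product((0, 1), repeat=2)]


LUTS = [build_2lut(i) for i in range(WIDTH)]


def eval_with_2luts(a, b):
    """Evaluate the expression using only table lookups, one 2-LUT per bit."""
    out = 0
    for i, table in enumerate(LUTS):
        a_bit = (a >> i) & 1
        b_bit = (b >> i) & 1
        out |= table[(a_bit << 1) | b_bit] << i
    return out


BITS_WITH_2LUTS = WIDTH * 2 ** 2    # four 2-LUTs
BITS_WITH_10LUTS = WIDTH * 2 ** 10  # four single-output 10-LUTs

if __name__ == "__main__":
    ok = all(eval_with_2luts(a, b) == reference(a, b)
             for a in range(16) for b in range(16))
    print("2-LUT mapping matches reference:", ok)
    print("config bits:", BITS_WITH_2LUTS, "vs", BITS_WITH_10LUTS)
```

Under these assumptions the 2-LUT mapping costs 16 configuration bits and the 10-LUT mapping costs 4096, since the large LUT cannot share its storage across several outputs.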

21 - Granularity: Example Architectures. Fine grain: GARP. Coarse grain: PipeRench.

22-24 - Granularity: GARP. [Figure built up over three slides: the Garp chip contains a CPU and an RFU with I-cache, D-cache, and config cache, and connects to memory. RFU detail labels: RFU control (1), Execution (16, 2-bit), N, PE (Processing Element).] Example computations in one cycle: A<<10 | (b&c); (A-2*b+c).

25 - Granularity: GARP. [Figure: Garp chip (CPU, RFU, I-cache, D-cache, config cache) and memory.] Impact of configuration size: 1 GHz bus frequency, 128-bit memory bus, 512 Kbits of configuration. On an RFU context switch, how long does it take to load a new full configuration? 512 Kbits / 128 bits per transfer is about 4K transfers, so about 4 microseconds at 1 GHz. An estimate of the time for the CPU to perform a context switch is ~5 microseconds, so this is a ~2x increase in context switch latency!
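
The 4-microsecond figure can be reproduced with a few lines of arithmetic. This sketch assumes one 128-bit transfer per bus cycle and no memory latency or bus contention, and it is ambiguous whether "512Kbits" means 512,000 or 524,288 bits, so both are shown. The function name is hypothetical.

```python
def config_load_time_us(config_bits, bus_width_bits, bus_freq_hz):
    """Time to stream a configuration over the bus, in microseconds.

    Assumes one full-width transfer per bus cycle (an idealization).
    """
    transfers = -(-config_bits // bus_width_bits)  # ceiling division
    return transfers / bus_freq_hz * 1e6


CPU_CONTEXT_SWITCH_US = 5.0  # the slide's estimate

if __name__ == "__main__":
    for bits in (512_000, 512 * 1024):
        load = config_load_time_us(bits, 128, 1e9)
        total = CPU_CONTEXT_SWITCH_US + load
        print(f"{bits} bits: load {load:.3f} us, "
              f"context switch {total:.3f} us "
              f"({total / CPU_CONTEXT_SWITCH_US:.2f}x the CPU-only switch)")
```

Either reading gives a load time near 4 microseconds and a total near 9 microseconds, which is the slide's "~2x".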

26 - Granularity: GARP. [Figure: Garp chip (CPU, RFU, I-cache, D-cache, config cache) and memory; RFU detail labels: RFU control (1), Execution (16, 2-bit), N, PE (Processing Element).] Reference: "The Garp Architecture and C Compiler".

27 - Granularity: PipeRench. Coarse granularity. Higher (higher) level programming. Reference papers: "PipeRench: A Coprocessor for Streaming Multimedia Acceleration" (ISCA 1999); "PipeRench Implementation of the Instruction Path Coprocessor" (Micro 2000).

28 - Granularity: PipeRench. [Figure: rows of PEs, each PE containing an 8-bit ALU and a register file; interconnect between rows; global bus.]

29-42 - Granularity: PipeRench. [Figure built up over 14 slides: chart of pipeline stage vs. cycle, labeled PE 0; slides 36-42 add a second pipeline stage vs. cycle chart.]

43 - Degree of Integration/Coupling. Independent Reconfigurable Coprocessor –Reconfigurable fabric does not have direct communication with the CPU. Processor + Reconfigurable Processing Fabric –Loosely coupled on the same chip –Tightly coupled on the same chip.

44-50 - Degree of Integration/Coupling. [Figure built up over seven slides: system block diagram with Main Memory; CPU pipeline (Fetch, Decode, Execute, Memory, Write Back) with ALU and FPU; L1 Cache; L2 Cache; Memory Controller; DMA Controller; I/O Controller; USB; PCI; PCI-Express; SATA; Hard Drive; NIC. Slides 45-46 add an RPF, slides 47-48 add a Config I/F, slide 49 adds an I/O block, and slide 50 shows an RFU instead of the RPF.]

52 - Next Class: Reconfiguration Management –Chapter 4

53 - Questions/Comments/Concerns. Write down –Main point of lecture –One thing that's still not quite clear OR –If everything is clear, then give an example of how to apply something from lecture

54 - Lecture notes

55 - Granularity: PipeRench. Scheduling virtual stages onto physical stages. Partial/dynamic reconfiguration (each cycle).
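
The idea of scheduling virtual stages onto fewer physical stages, reconfiguring one stage each cycle, can be illustrated with a toy round-robin model. This is a simplified sketch under stated assumptions, not the actual PipeRench hardware schedule: it assumes exactly one physical stage is reconfigured per cycle, that configuring takes one cycle, and that virtual stages are loaded in order. The function name and the "cfg"/"run" labels are invented for illustration.

```python
def schedule(num_virtual, num_physical, num_cycles):
    """Return a cycle-by-cycle table of what each physical stage is doing.

    Each row is one cycle; each entry is "idle", "cfg Vn" (being configured
    with virtual stage n), or "run Vn" (executing virtual stage n).
    """
    loaded = [None] * num_physical  # virtual stage currently held by each physical stage
    table = []
    for cycle in range(num_cycles):
        target = cycle % num_physical          # physical stage reconfigured this cycle
        loaded[target] = cycle % num_virtual   # next virtual stage, in round-robin order
        row = []
        for p in range(num_physical):
            v = loaded[p]
            if v is None:
                row.append("idle")
            elif p == target:
                row.append(f"cfg V{v}")
            else:
                row.append(f"run V{v}")
        table.append(row)
    return table


if __name__ == "__main__":
    # Hypothetical sizes: a 5-stage virtual pipeline on 3 physical stages.
    for cycle, row in enumerate(schedule(5, 3, 8)):
        print(cycle, row)
```

In this model every physical stage spends one cycle out of every three being reconfigured, which shows the throughput cost of virtualizing a pipeline that is deeper than the hardware.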

56 - Granularity: GARP. Impact of configuration size on performance –Context switching. Garp features –Dynamically reconfigurable –Stores multiple configurations in an on-chip cache (4) –One configuration at a time. Example app mapping to GARP (loop). Amdahl's Law. Reference: "The Garp Architecture and C Compiler".

57 - Overview. Dimensions –Price –Granularity –Coupling –To optimize app performance (compute (throughput, latency), power, reliability). RPF to efficiently implement VICs –Main picture the authors want to convey. What's the point of having a reconfigurable architecture? –Example (increase app performance): App -> SW/CPU; profile; ID kernels of intense compute; design custom hardware/instruction (Amdahl's Law) –Intel FPL paper, great example for reading by Friday.

58 - Reconfigurable Architectures: RPF -> VIC (short slide)