Presentation is loading. Please wait.

Presentation is loading. Please wait.

CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #15 – Midterm.

Similar presentations


Presentation on theme: "CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #15 – Midterm."— Presentation transcript:

1 CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #15 – Midterm Review

2 Lect-15.2CprE 583 – Reconfigurable ComputingOctober 10, 2006 Project Proposals Group 1 – FPGA Implementation of Frequency- Domain Audio Effects Processor Five-band equalizer Frequency shifter

3 Lect-15.3CprE 583 – Reconfigurable ComputingOctober 10, 2006 Project Proposals (cont.) Group 2 – Transparent FPGA-Based Network Analyzer Layer I pass-through Layer II passive analyzer

4 Lect-15.4CprE 583 – Reconfigurable ComputingOctober 10, 2006 Project Proposals (cont.) Group 3 – FPGA-Based Library Design for Linear Algebra Applications Floating-point sparse matrix-vector multiplication Floating-point banded matrix-vector multiplication Floating-point lower-upper matrix decomposition

5 Lect-15.5CprE 583 – Reconfigurable ComputingOctober 10, 2006 Project Proposals (cont.) Group 4 – An Improved Approach of Configuration Compression for FPGA-Based Embedded Systems Improved compression algorithms LUT-reordering techniques

6 Lect-15.6CprE 583 – Reconfigurable ComputingOctober 10, 2006 Project Proposals (cont.) Others Projects: Group 5 – FPGA Ternary Data Conversion Group 6 – Analysis of Sobel Edge Detection Implementations Group 7 – Design and Analysis of Artificial Neural Networks on FPGAs Reminders: 11/16 – Project Updates (10 minutes) 12/5-12/7 – Final Presentations (25 minutes) 12/15 – Final Reports

7 Lect-15.7CprE 583 – Reconfigurable ComputingOctober 10, 2006 $ Midterm Review Using the Silicon PE $ MPP PE More Cache PE $ MMX SSE FFTAES CISC Superscalar PE $ Vector PE Reconfigurable Fabric PE Reconfigurable Processor $

8 Lect-15.8CprE 583 – Reconfigurable ComputingOctober 10, 2006 Computational Density (Qualitative) FPGAs can complete more work per unit time than a processor or DSP: Less instruction overhead More active computation onto the same silicon area (allows for more parallelism) Can control operations at the bit level (as opposed to word level) Intel Pentium 4 Actel ProASIC

9 Lect-15.9CprE 583 – Reconfigurable ComputingOctober 10, 2006 Coupling in a Reconfigurable System Standalone Processing Unit I/O Interface Attached Processing Unit Workstation Memory Caches Coprocessor CPU FU Many places to put reconfigurable computing components Most implementations involve multiple discrete devices How should these devices be connected together?

10 Lect-15.10CprE 583 – Reconfigurable ComputingOctober 10, 2006 Generic FPGA Architecture CLB IOB Input/Output Buffers (IOBs) Configurable Logic Blocks (CLBs) Programmable interconnect mesh Island-style FPGA architecture FPGA = Field-Programmable Gate Array

11 Lect-15.11CprE 583 – Reconfigurable ComputingOctober 10, 2006 FPGA Technology Various FPGA programming technologies (Anti-fuse, (E)EPROM, Flash, SRAM): SRAM most popular

12 Lect-15.12CprE 583 – Reconfigurable ComputingOctober 10, 2006 LUTs and Digital Logic k inputs  2 k possible input values k-LUT corresponds to 2 k x 1 bit memory Truth table is stored 2 2 k possible functions – O(2 2 k / k!) unique F = A0A1A2 + Ā0A1Ā2 + Ā0 Ā1 Ā2 0 0 0 0 0 1 0 1 0 0 1 1 A 0 A 1 A 2 00000000 0 10001000 1 01000100 2 11001100 3 00100010 4 10101010 5 01100110 6 11101110 7 00010001 8 10011001 9 01010101 10 11011101 11 00110011 12 10111011 13 01110111 14 11111111 15 1 0 0 1 0 1 1 1 0 1 1 1 11111111 01110111 10111011 00110011 11011101 01010101 10011001 00010001 11101110 01100110 10101010 00100010 11001100 01000100 10001000 00000000 10101010 255 00010001...

13 Lect-15.13CprE 583 – Reconfigurable ComputingOctober 10, 2006 Architectural Issues [AhmRos04A] What values of N, I, and K minimize the following parameters? Area Delay Area-delay product Assumptions All routing wires length 4 Fully populated IMUX Wiring is half pass transistor, half tri-state

14 Lect-15.14CprE 583 – Reconfigurable ComputingOctober 10, 2006 FPGA Arithmetic Traditional microprocessors, DSPs, etc. don’t use LUTs Instead use a w-bit Arithmetic and Logic Unit (ALU) Carry connections are hard-wired No switches, no stubs, short wires (1) AND2 OR2 XOR2 (2) ADD SUB CMP 3-LUT (2) ALU AB Out Op (1, 2) 2-LUT AB Out (1) AB Cin Sum Cout Sum Cout / Cin AB

15 Lect-15.15CprE 583 – Reconfigurable ComputingOctober 10, 2006 FPGA Arithmetic (cont.) Hard-wired carry logic support Altera FLEX 8000Xilinx XCV4000

16 Lect-15.16CprE 583 – Reconfigurable ComputingOctober 10, 2006 Arithmetic (cont.) + X0X0 Y0Y0 X1X1 X2X2 X3X3 Z0Z0 Y1Y1 X0X0 + X1X1 + X2X2 + X3X3 Z1Z1 + Y2Y2 ++ + + Y3Y3 ++ + Z2Z2 Carry save multiplication

17 Lect-15.17CprE 583 – Reconfigurable ComputingOctober 10, 2006 LUT-Based Constant Multipliers Constants can be changed in the LUTs to program new multipliers 4-LUT N 0 –N 7 10101011 x NNNNNNNN AAAAAAAAAAAA (N * 1011 (LSN)) + BBBBBBBBBBBB (N * 1010 (MSN)) SSSSSSSSSSSSSSSS Product 4-LUT N 0 –N 7 4-LUT + S 0 –S 15 A 0 –A 11 B 4 –B 15

18 Lect-15.18CprE 583 – Reconfigurable ComputingOctober 10, 2006 Capacity Trends Year 1985 Xilinx Device Complexity XC2000 50 MHz 1K gates XC4000 100 MHz 250K gates Virtex 200 MHz 1M gates Virtex-II 450 MHz 8M gates Spartan 80 MHz 40K gates Spartan-II 200 MHz 200K gates Spartan-3 326 MHz 5M gates 19911987 XC3000 85 MHz 7.5K gates Virtex-E 240 MHz 4M gates XC5200 50 MHz 23K gates 199519981999200020022003 Virtex-II Pro 450 MHz 8M gates* 20042006 Virtex-4 500 MHz 16M gates* Virtex-5 550 MHz 24M gates*

19 Lect-15.19CprE 583 – Reconfigurable ComputingOctober 10, 2006 Splash 1 Architecture F3F3 M3M3 VME Bus Interface VSB Bus Control Interface FIFO IN FIFO OUT M4M4 F4F4 F 11 M 11 M 12 F 12 F0F0 M0M0 M7M7 F7F7 F8F8 M8M8 M 15 F 15 F1F1 M1M1 M6M6 F6F6 F9F9 M9M9 M 14 F 14 F2F2 M2M2 M5M5 F5F5 F 10 M 10 M 13 F 13 F 31 M 31 M 24 F 24 F 23 M 23 M 16 F 16 F 28 M 28 M 27 F 27 F 20 M 20 M 19 F 19 F 29 M 29 M 26 F 26 F 21 M 21 M 18 F 18 F 30 M 30 M 25 F 25 F 22 M 22 M 17 F 17

20 Lect-15.20CprE 583 – Reconfigurable ComputingOctober 10, 2006 FPGA-based Router FPX module contains two FPGAs NID – network interface device Performs data queuing RAD – reprogrammable application device Specialized control sequences

21 Lect-15.21CprE 583 – Reconfigurable ComputingOctober 10, 2006 Mesh Topology Chips are connected in a nearest-neighbor pattern Simplicity is key Linear array is essentially a 1- dimensional mesh ABCDEFGHI

22 Lect-15.22CprE 583 – Reconfigurable ComputingOctober 10, 2006 Other Topologies Crossbar topology: Devices A-D are routing only Gives predictable performance Potential waste of resources for near-neighbor connections ABCD WXYZ

23 Lect-15.23CprE 583 – Reconfigurable ComputingOctober 10, 2006 Logic Emulation Emulation takes a sizable amount of resources Compilation time can be large due to FPGA compiles

24 Lect-15.24CprE 583 – Reconfigurable ComputingOctober 10, 2006 Systolic Architectures Goal – general methodology for mapping computations into hardware (spatial computing) structures Composition: Simple compute cells (e.g. add, sub, max, min) Regular interconnect pattern Pipelined communication between cells I/O at boundaries x x +x min xc

25 Lect-15.25CprE 583 – Reconfigurable ComputingOctober 10, 2006 Finite Impulse Response Sequential Memory bandwidth per output – 2k+1 O(k) cycles per output O(1) hardware x + xixi w1w1 x + w2w2 x + w3w3 x + w4w4 yiyi Systolic Memory bandwidth per output – 2 O(1) cycles per output O(k) hardware

26 Lect-15.26CprE 583 – Reconfigurable ComputingOctober 10, 2006 Matrix-Vector Product t = 3 t = 2 t = 1 t = 4 a 31 a 21 a 11 a 41 a 22 a 12 – a 23 a 13 – – a 23 – – – a 14 x1x1 x2x2 x3x3 x4x4 y2y2 y3y3 y4y4 y1y1 t = n+1 t = n+2 t = n+3 t = n xnxn … – – – –

27 Lect-15.27CprE 583 – Reconfigurable ComputingOctober 10, 2006 Circuit Netlist and Mapping LUT2 LUT3 LUT4LUT5 FF1 FF2 LUT1 LUT0

28 Lect-15.28CprE 583 – Reconfigurable ComputingOctober 10, 2006 Placing and Routing Programmable Connections FPGA

29 Lect-15.29CprE 583 – Reconfigurable ComputingOctober 10, 2006 Next Steps VHDL / VHDL for Synthesis LIBRARY ieee ; USE ieee.std_logic_1164.all ; ENTITY implied IS PORT ( A, B : IN STD_LOGIC ; AeqB: OUT STD_LOGIC ) ; END implied ; ARCHITECTURE Behavior OF implied IS BEGIN PROCESS ( A, B ) BEGIN IF A = B THEN AeqB <= '1' ; END IF ; END PROCESS ; END Behavior ; A B AeqB

30 Lect-15.30CprE 583 – Reconfigurable ComputingOctober 10, 2006 HW/SW Co-Design ARM Core SOCKET #1 Cache ARM Local Memory Comm. Buffer Socket Handler Mem. Access Socket Handler SOCKET #2 ARMulator ARM core simulator ARMulator API Modelsim Modelsim FLI HDL simulator AHB Slave I/F AHB Master I/F AMBAAMBA AHB Slave I/F ASIC / FPGA Shared Memory Comm. Buffer

31 Lect-15.31CprE 583 – Reconfigurable ComputingOctober 10, 2006 Multi-Context FPGAs

32 Lect-15.32CprE 583 – Reconfigurable ComputingOctober 10, 2006 Function Unit Architectures RaPiD: Reconfigurable Pipelined Datapath Linear array of function units Function type determined by application Function units are connected together as needed using segmented buses Data enters the pipeline via input streams and exits via output streams

33 Lect-15.33CprE 583 – Reconfigurable ComputingOctober 10, 2006 High-Level Compilation C Program SUIF to GCC C to RTL VHDL/Verilog VHDL to FPGA Synthesis Binaries for FPGAs (Xilinx) Object code for Embedded (SA) GCC compiler for embedded C Libraries on various Targets Directives and Automation VHDL to ASIC Synthesis Chip layouts (0.18u TSMC) C to RTL VHDL/Verilog HW / SW Partitioner SUIF frontend

34 Lect-15.34CprE 583 – Reconfigurable ComputingOctober 10, 2006 Other Topics? Second course survey next week Provide general feedback, suggest additional topics

35 Lect-15.35CprE 583 – Reconfigurable ComputingOctober 10, 2006 Midterm Exam Three questions Review Analysis Extension Any paper mentioned in class is fair game Due in 48 hours (10/12 – 2:00pm) No class on Thursday! Some restrictions: Work alone Can ask if something is unclear (“what does this mean?” questions, not “how do I do this?” questions) No late submissions – strict WebCT deadline


Download ppt "CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #15 – Midterm."

Similar presentations


Ads by Google