CPRE 583 Reconfigurable Computing, Lecture 8: Fri 10/30/2009 (System Architectures). Instructor: Dr. Phillip Jones (phjones@iastate.edu), Reconfigurable Computing Laboratory, Iowa State University, Ames, Iowa, USA. http://class.ee.iastate.edu/cpre583/
Overview: Class Projects; Common System Architectures
Project Grading Breakdown: 60% Final Project Demo; 30% Final Project Report (30% of your project report grade will come from your 5 project updates, due Fridays at midnight); 10% Final Project Presentation
Project Update: The current state of your project write-up (even in the early stages of the project you should be able to write a rough draft of the Introduction and Motivation section). The current state of your Final Presentation. What things are working and not working. What roadblocks are you running into?
What you should learn: Introduction to common system architectures
Outline: System Architectures; Why are they useful?; Examples
References: [1] Scott Hauck, Andre DeHon, Reconfigurable Computing (2008), Chapter 5: Compute Models and System Architectures
System Architectures: Compute models help express the parallelism of an application. The system architecture describes how to organize the application implementation.
Efficient Application Implementation: The compute model and system architecture should work together. Both are a function of the nature of the application (required resources, required performance) and the nature of the target platform (resources available).
Efficient Application Implementation (Image Processing): Platform 1 (Vector Processor): Data Parallel compute model, Vector system architecture. Platform 2 (FPGA): Data Flow compute model, Streaming Data Flow system architecture.
Implementing Streaming Dataflow: Data presence (variable-length connections between operators; data rates vary between operator implementations; data rates vary between operators). Datapath sharing (not enough spatial resources to host the entire graph; balanced use of resources, e.g. operators; cyclic dependencies impacting efficiency). Interconnect sharing (interconnects are becoming difficult to route; links between operators are infrequently used; high variability in operator data rates). Streaming coprocessor (extreme resource constraints).
Data Presence: [Figure, built up over several slides: two multiply operators (X) feeding an add operator (+), with FIFOs on the operator outputs plus data_ready and stall signals.] Flow control (a term typically used in networking) increases the flexibility of how the application can be implemented.
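The FIFO, data_ready, and stall signals named on this slide can be sketched as a small cycle-by-cycle simulation. This is only an illustrative software model, not the lecture's implementation: the `Fifo` class, the `simulate` function, the operator periods, and the FIFO depth are all assumptions made up for the example.

```python
from collections import deque


class Fifo:
    """Bounded queue standing in for a hardware FIFO between two operators."""

    def __init__(self, depth):
        self.depth = depth
        self.items = deque()

    @property
    def data_ready(self):
        # Seen by the consumer: at least one value is waiting.
        return len(self.items) > 0

    @property
    def stall(self):
        # Seen by the producer: the FIFO is full, so it must wait.
        return len(self.items) >= self.depth

    def push(self, value):
        assert not self.stall, "producer must honor stall"
        self.items.append(value)

    def pop(self):
        assert self.data_ready, "consumer must wait for data_ready"
        return self.items.popleft()


def simulate(pairs_a, pairs_b, period_a=1, period_b=3, depth=2):
    """Compute x*y (multiplier A) + u*v (multiplier B) for each input pair.

    Multiplier A may fire every `period_a` cycles and multiplier B every
    `period_b` cycles, so the two operators have different data rates.
    Returns the sums and the number of cycles a multiplier was stalled.
    """
    fifo_a, fifo_b = Fifo(depth), Fifo(depth)
    todo_a, todo_b = deque(pairs_a), deque(pairs_b)
    results, stalled_cycles, cycle = [], 0, 0
    expected = min(len(pairs_a), len(pairs_b))
    while len(results) < expected:
        # The adder fires only when both of its inputs are present.
        if fifo_a.data_ready and fifo_b.data_ready:
            results.append(fifo_a.pop() + fifo_b.pop())
        # Each multiplier fires on its own schedule unless it is stalled.
        for fifo, todo, period in ((fifo_a, todo_a, period_a),
                                   (fifo_b, todo_b, period_b)):
            if todo and cycle % period == 0:
                if fifo.stall:
                    stalled_cycles += 1
                else:
                    x, y = todo.popleft()
                    fifo.push(x * y)
        cycle += 1
    return results, stalled_cycles
```

Because the adder waits on data_ready and the fast multiplier waits on stall, the result stream stays correct even though the two multipliers run at different rates; only throughput is affected.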
Datapath Sharing: The platform may only have one multiplier. [Figure, built up over several slides: the two multiply operators (X) are replaced by a single multiplier with registers (REG) and an FSM, feeding the add operator (+).] It is important to keep track of where the data is coming from!
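As a rough sketch of sharing one multiplier, the model below computes a*b + c*d with a single multiplication site that an FSM reuses in two states, storing each product in a register. The state names, register names, and the function `shared_multiplier_sum` are hypothetical and chosen only for illustration.

```python
def shared_multiplier_sum(a, b, c, d):
    """Compute a*b + c*d using one multiplier that is reused across states."""
    # For each multiply state: which operands the FSM steers into the
    # multiplier, and which register captures the product.
    operands = {"MUL0": (a, b, "p0"), "MUL1": (c, d, "p1")}
    next_state = {"MUL0": "MUL1", "MUL1": "ADD", "ADD": "DONE"}
    regs = {}
    state, trace, result = "MUL0", [], None
    while state != "DONE":
        trace.append(state)
        if state in operands:
            x, y, dest = operands[state]   # FSM selects the multiplier inputs
            regs[dest] = x * y             # the single shared multiplier
        else:                              # state == "ADD"
            result = regs["p0"] + regs["p1"]
        state = next_state[state]
    return result, trace
```

The `operands` table is where the FSM keeps track of where data is coming from and where it goes; the cost of sharing is that the computation now takes three states instead of finishing in one pass through a fully spatial graph.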
Interconnect Sharing: Need more efficient use of interconnect. [Figure, built up over several slides: two multiply operators (X) and an add operator (+), with an FSM added to control the shared interconnect.]
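One way to picture interconnect sharing is a single physical link that is time-multiplexed among several logical operator-to-operator channels. The sketch below assumes a round-robin arbiter, which is my choice for the example and is not stated on the slide; the function `share_link` and the channel names are hypothetical.

```python
from collections import deque


def share_link(channels):
    """Round-robin one physical link among several logical channels.

    `channels` maps a channel name to the list of values waiting to be sent.
    Returns the schedule as (cycle, channel, value) tuples, one per cycle.
    """
    queues = {name: deque(values) for name, values in channels.items()}
    order = list(queues)
    schedule, cycle, turn = [], 0, 0
    while any(queues.values()):
        # Start at the channel whose turn it is; skip channels that are idle.
        for offset in range(len(order)):
            name = order[(turn + offset) % len(order)]
            if queues[name]:
                schedule.append((cycle, name, queues[name].popleft()))
                turn = (turn + offset + 1) % len(order)
                break
        cycle += 1
    return schedule
```

For example, `share_link({"m0_to_add": [1, 2, 3], "m1_to_add": [10]})` sends one value per cycle over the shared link, alternating between channels while both have data. This fits the case on the outline where individual links are infrequently used, so a dedicated wire per connection would sit idle most of the time.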
Streaming coprocessor
Sequential Control: Typically thought of in the context of sequential programming on a processor (e.g. C, Java programming). Key to organizing, synchronizing, and controlling highly parallel operations. Time-multiplexing resources: when a task is too large for the computing fabric. Increasing datapath utilization.
Sequential Control: Example A*x^2 + B*x + C. [Figure, built up over several slides: the expression drawn once with a single multiplier (X) and adder (+) taking inputs A, B, C, and once with two multipliers (X, X) and an adder (+).]
Finite State Machine with Datapath (FSMD): [Figure: a datapath of multipliers (X) and an adder (+) computing A*x^2 + B*x + C, with an FSM added to control it.]
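A minimal software sketch of an FSMD for A*x^2 + B*x + C follows. It assumes the polynomial is evaluated in Horner's form, ((A*x) + B)*x + C, so that one multiplier and one adder can be reused in every state; that ordering is an assumption for illustration and may differ from the datapath drawn on the slide. The function name `fsmd_quadratic` is hypothetical.

```python
def fsmd_quadratic(A, B, C, x):
    """Evaluate A*x^2 + B*x + C with one multiplier and one adder.

    The FSM state doubles as an index selecting which coefficient is fed
    to the adder; `acc` is the datapath's accumulator register.
    """
    coeffs = (A, B, C)
    acc = 0
    state = 0
    cycles = 0
    while state < len(coeffs):
        acc = acc * x + coeffs[state]   # one multiply and one add per state
        state += 1
        cycles += 1
    return acc, cycles
```

For instance, `fsmd_quadratic(2, 3, 4, 5)` steps through three states and returns the value 69 together with the cycle count 3. The trade is the usual one for sequential control: fewer operators, more cycles.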
Sequential Control: Types: Finite State Machine with Datapath (FSMD); Very Long Instruction Word (VLIW) datapath control; Processor; Instruction augmentation; Phased reconfiguration manager; Worker farm
Very Long Instruction Word (VLIW) Datapath Control
Processor
Instruction Augmentation
Phased Configuration Manager
Worker Farm
Bulk Synchronous Parallelism
Data Parallel: Single Program Multiple Data (SPMD); Single Instruction Multiple Data (SIMD); Vector; Vector Coprocessor
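To make the SIMD idea concrete, the sketch below applies one operation to a vector in groups of `lanes` elements, treating each group as a single step. The lane count, the `brighten` pixel operation, and the function names are assumptions chosen to echo the image-processing example earlier in the lecture; Python still executes the lanes one after another, so this only models the step count.

```python
def simd_apply(op, vector, lanes=4):
    """Apply `op` to every element, `lanes` elements per modeled step.

    Returns the result vector and the number of steps taken.
    """
    out = []
    steps = 0
    for start in range(0, len(vector), lanes):
        chunk = vector[start:start + lanes]
        # Same instruction on every lane; conceptually simultaneous.
        out.extend(op(v) for v in chunk)
        steps += 1
    return out, steps


def brighten(pixel, delta=40):
    """Add `delta` to an 8-bit pixel value, saturating at 255."""
    return min(255, pixel + delta)
```

With five pixels and four lanes, `simd_apply(brighten, [0, 100, 230, 250, 10])` needs two steps rather than five, which is the point of the data-parallel organization: the same operation is issued once per group of elements.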
Data Parallel
Cellular Automata
Multi-threaded
Next Lecture Evolvable Hardware (Chapter 33)
Slides in Progress: Need to revise this lecture with figures and useful animations. Add some non-FPGA systems; maybe not, since GARP and PipeRench were discussed in the last lecture. Perhaps just mention them again. The main reason other architectures are not used is economies of scale: lots of FPGAs are manufactured, thus lowering cost and enabling the use of state-of-the-art fab technology (given high performance