The Microprocessor Is No Longer General Purpose
Design Gap
Problems with the Fine Grained Approach (FPGAs)
Area-inefficient: the percentage of chip area used for wiring is far too high
Too slow: unavoidable critical paths are too long
Routing and placement are very complex
Problems with Fine Grained FPGAs
Coarse Grained Reconfigurable Computing
Uses reconfigurable arrays with path widths greater than 1 bit
More area-efficient
Massive reduction in configuration memory and configuration time
Drastic reduction in complexity of placement and routing
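The link between path width and configuration memory can be illustrated with a toy model. This is a sketch under stated assumptions, not a measurement of any real device: it assumes one switch configuration is stored per routed bundle of wires, and the function name `routing_config_bits` and the parameter `bits_per_switch` are hypothetical.

```python
def routing_config_bits(word_width, path_width, bits_per_switch=1):
    """Toy model: configuration bits needed to route one data word.

    Assumes one switch setting is stored per bundle of `path_width` wires.
    """
    bundles = -(-word_width // path_width)  # ceiling division
    return bundles * bits_per_switch


# Routing a 32-bit word over 1-bit paths (fine grained) versus
# 32-bit paths (coarse grained) in this model.
fine_grained = routing_config_bits(32, 1)
coarse_grained = routing_config_bits(32, 32)
```

In this model the fine-grained case stores 32 switch settings for the word and the coarse-grained case stores one. Real savings depend on the actual switch and context encoding.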
Coarse Grained Architectures Classification
Mesh-based
Linear-array-based
Cross-bar-based
Mesh Based Architectures
Arranges PEs in a 2-D array
Encourages nearest-neighbor links between adjacent PEs
E.g. KressArray, Matrix, RAW, CHESS
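Nearest-neighbor connectivity in a 2-D array of PEs can be sketched as follows. The sketch assumes a plain 4-neighbor mesh without wrap-around. The architectures named above each add their own longer links, which are not modeled here, and the function name `mesh_neighbors` is hypothetical.

```python
def mesh_neighbors(row, col, rows, cols):
    """Return the PEs directly adjacent to (row, col) in a rows x cols mesh.

    Assumes north/south/west/east links only and no wrap-around at the edges.
    """
    candidates = [
        (row - 1, col),  # north
        (row + 1, col),  # south
        (row, col - 1),  # west
        (row, col + 1),  # east
    ]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


corner = mesh_neighbors(0, 0, 4, 4)    # a corner PE has two neighbors
interior = mesh_neighbors(1, 1, 4, 4)  # an interior PE has four
```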
Matrix – Mesh based Architecture
Architectures based on Linear Arrays
Aimed at mapping pipelines onto linear arrays
If the pipeline has forks, longer lines spanning the whole array or part of it are used
E.g. RaPiD, PipeRench
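The role of the longer lines can be shown with a small sketch. It assumes pipeline stage i is placed on PE i of the linear array, so a link between consecutive stages fits a neighbor connection and any other link needs a longer line. The function name `classify_links` and the example pipeline are hypothetical.

```python
def classify_links(edges):
    """Split pipeline edges into neighbor links and links needing longer lines.

    `edges` is a list of (src_stage, dst_stage) pairs. Stage i is assumed
    to be placed on PE i of the linear array.
    """
    neighbor, long_lines = [], []
    for src, dst in edges:
        if abs(dst - src) == 1:
            neighbor.append((src, dst))
        else:
            long_lines.append((src, dst))
    return neighbor, long_lines


# Stage 1 forks to stages 2 and 3; both branches join again at stage 4.
pipeline = [(0, 1), (1, 2), (1, 3), (2, 4), (3, 4)]
neighbor_links, long_links = classify_links(pipeline)
```

Here the fork produces two edges, (1, 3) and (2, 4), that cannot use neighbor links under this placement.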
PipeRench – Linear Array based architecture
Cross-bar based Architectures
Communication network is easy to route
Uses restricted cross-bars with hierarchical interconnect to save area
E.g. PADDI-1, PADDI-2, Pleiades
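A toy model makes the trade-off between a full and a restricted cross-bar concrete. It is a sketch only and does not reproduce the PADDI or Pleiades interconnect: the class `Crossbar`, its methods, and the two-cluster restriction in the example are all assumptions made for illustration.

```python
class Crossbar:
    """Toy cross-bar in which each output is driven by at most one input."""

    def __init__(self, n_inputs, n_outputs, allowed=None):
        self.n_inputs = n_inputs
        self.n_outputs = n_outputs
        # allowed=None models a full cross-bar; a set of (input, output)
        # pairs models a restricted one with fewer switch points.
        self.allowed = allowed
        self.source_of = {}  # output index -> input index

    def switch_points(self):
        """Number of programmable crossing points (a rough proxy for area)."""
        if self.allowed is None:
            return self.n_inputs * self.n_outputs
        return len(self.allowed)

    def connect(self, inp, out):
        """Route input `inp` to output `out`, or raise if no switch exists."""
        if not (0 <= inp < self.n_inputs and 0 <= out < self.n_outputs):
            raise IndexError("port out of range")
        if self.allowed is not None and (inp, out) not in self.allowed:
            raise ValueError("no switch point between these ports")
        self.source_of[out] = inp


full = Crossbar(4, 4)
# Hypothetical restriction: ports are split into two clusters of two,
# and only ports within the same cluster can be connected.
local_pairs = {(i, o) for i in range(4) for o in range(4) if i // 2 == o // 2}
restricted = Crossbar(4, 4, allowed=local_pairs)
full.connect(3, 0)
```

In the full cross-bar, routing is a single table update for any input/output pair. The restricted version halves the switch points in this example, and traffic between clusters would have to go over a higher interconnect level, which the sketch does not model.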
PADDI-2 – Cross-bar based architecture
Coarse Grained Architectures
EGRA
Architectural template to enable design space exploration
Executes expressions as opposed to operations
Supports heterogeneous cells and various memory interfaces
EGRA
Evolution of fine grained and coarse grained architectures
EGRA – at Cell Level
Architectural Exploration
EGRA vs CGRA vs FPGA
EGRA – at array level
Organized as a mesh of cells of three types
  – RACs
  – Memories
  – Multipliers
Cells are connected using both nearest-neighbor links and horizontal-vertical buses
Each cell has an I/O interface, a context memory and a core
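The heterogeneous mesh can be sketched as a small data structure. This is an illustration under stated assumptions: the names `CellKind`, `Cell` and `build_mesh`, the one-letter layout codes, and the 3x3 layout are hypothetical, and the context memory is modeled as a plain list because its word format is not given here.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class CellKind(Enum):
    RAC = "RAC"
    MEMORY = "memory"
    MULTIPLIER = "multiplier"


@dataclass
class Cell:
    kind: CellKind
    # Configuration words for this cell (word format is an assumption).
    context_memory: list = field(default_factory=list)


def build_mesh(layout):
    """Build a mesh from rows of one-letter codes: R, M, X (hypothetical)."""
    codes = {"R": CellKind.RAC, "M": CellKind.MEMORY, "X": CellKind.MULTIPLIER}
    return [[Cell(codes[ch]) for ch in row] for row in layout]


mesh = build_mesh(["RRM", "RXR", "MRR"])
counts = Counter(cell.kind for row in mesh for cell in row)
```

Changing the layout strings varies the mix of cell types in the model, which is the kind of parameter an architectural template exposes for exploration.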
Control Unit
EGRA Operation
DMA mode
  – Used to transfer data in bursts to the EGRA
  – Used to program cells and to read/write the scratchpad memories
Execution mode
  – Control unit orchestrates data flow between cells
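The two operating modes can be sketched as a small state model. It assumes that the modes are mutually exclusive and that DMA transfers are rejected during execution. That assumption, the class `EgraModel`, and every method name are hypothetical and are not taken from the EGRA design itself.

```python
from enum import Enum


class Mode(Enum):
    DMA = "dma"
    EXECUTION = "execution"


class EgraModel:
    """Toy model of the two operating modes (names are assumptions)."""

    def __init__(self):
        self.mode = Mode.DMA
        self.contexts = {}    # cell id -> list of context words
        self.scratchpad = {}  # address -> data word

    def _require(self, mode):
        if self.mode is not mode:
            raise RuntimeError(f"operation requires {mode.value} mode")

    def dma_program_cell(self, cell_id, words):
        """Load configuration words into one cell's context memory."""
        self._require(Mode.DMA)
        self.contexts[cell_id] = list(words)

    def dma_write_burst(self, base_addr, words):
        """Write a burst of words to consecutive scratchpad addresses."""
        self._require(Mode.DMA)
        for offset, word in enumerate(words):
            self.scratchpad[base_addr + offset] = word

    def dma_read_burst(self, base_addr, length):
        """Read a burst back; unwritten addresses read as 0 in this model."""
        self._require(Mode.DMA)
        return [self.scratchpad.get(base_addr + i, 0) for i in range(length)]

    def start_execution(self):
        """Hand control to the control unit (data flow itself is not modeled)."""
        self.mode = Mode.EXECUTION


egra = EgraModel()
egra.dma_program_cell((0, 0), [0x1, 0x2])
egra.dma_write_burst(16, [10, 20, 30])
readback = egra.dma_read_burst(16, 3)
egra.start_execution()
```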
EGRA – at array level
Experimental Results
EGRA Memory Interface
Data register at the output of computational cells
Memory cells can be scattered around the array
A scratchpad memory outside the reconfigurable mesh
Architectural exploration - Area
Architectural exploration - Delay
MORA
The reconfigurable Cell
Operating modes of RC
Interconnection Topology
Hierarchical
  – Level 1 is used within a 4x4 quadrant to provide nearest-neighbor connectivity
  – Interleaved horizontal and vertical connectivity of length two
  – Each RC can receive data from at most two other RCs and send data to at most four other RCs
  – Data and control communication across quadrants is guaranteed by the Level 2 interconnection
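The quadrant rule and the fan-in/fan-out limits can be checked with a short sketch. It is a simplification: RCs are identified by (row, col) coordinates, the level is chosen only by quadrant membership, and the nearest-neighbor and length-two restrictions inside a quadrant are not modeled. All names are hypothetical.

```python
from collections import Counter

QUADRANT_SIZE = 4  # Level 1 covers a 4x4 quadrant
MAX_FAN_IN = 2     # an RC receives data from at most two other RCs
MAX_FAN_OUT = 4    # an RC sends data to at most four other RCs


def quadrant_of(rc):
    """Return the quadrant index of an RC given as (row, col)."""
    row, col = rc
    return (row // QUADRANT_SIZE, col // QUADRANT_SIZE)


def interconnect_level(src, dst):
    """Level 1 inside a quadrant, Level 2 across quadrants (simplified)."""
    return 1 if quadrant_of(src) == quadrant_of(dst) else 2


def check_fan_limits(links):
    """Return the RCs that exceed the fan-in or fan-out limit.

    `links` is a list of (src, dst) pairs of RC coordinates.
    """
    fan_in = Counter(dst for _, dst in links)
    fan_out = Counter(src for src, _ in links)
    violations = [rc for rc, n in fan_in.items() if n > MAX_FAN_IN]
    violations += [rc for rc, n in fan_out.items() if n > MAX_FAN_OUT]
    return violations


# Three RCs feeding (1, 1) exceed the fan-in limit of two.
links = [((0, 1), (1, 1)), ((1, 0), (1, 1)), ((2, 1), (1, 1))]
over_limit = check_fan_limits(links)
```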
Interconnection Topology
Computational Strategies
Temporal computational load balancing
Spatial computational load balancing