CS294-6 Reconfigurable Computing
Day 25: Heterogeneous Systems and Interfacing
Previously
Homogeneous model of a computational array
–single word granularity, depth, interconnect
–all post-fabrication programmable
Understand the tradeoffs of each
Today
Heterogeneous architectures
–Why?
–How?
 catalog of techniques
 fit in framework
 optimization and mapping
Why?
Why would we be interested in a heterogeneous architecture?
Why?
Applications have a mix of characteristics
Already accepted:
–seldom can afford to build the most general (unstructured) array: bit-level, deep context, p=1
–=> we are already picking some structure to exploit
May be beneficial to have portions of the computation optimized for different structure conditions.
Examples
Processor + FPGA
Processor or FPGA with added:
–multiplier or MAC unit
–FPU
–motion-estimation coprocessor
Optimization Prospect
Less capacity (area × time) needed for the composite than for either pure architecture:
–(A1 + A2) · T12 < A1 · T1
–(A1 + A2) · T12 < A2 · T2
where A1, A2 are the areas of the two resource types, T1, T2 the runtimes on each pure architecture, and T12 the runtime on the composite.
Optimization Prospect Example
Floating point:
–Task: I integer ops + F FP adds
–Aproc = 125Mλ²
–AFPU = 40Mλ²
–FP add in software: 60 integer cycles
–processor alone: area · time = 125(I + 60F)
–processor + FPU: (125 + 40)(I + F) = 165(I + F)
–FPU wins when 125(I + 60F) > 165(I + F), i.e. when I/F < (7500 − 165)/40 ≈ 183 (worked numerically below)
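A quick numeric check of this break-even point, sketched in Python (the areas and the 60-cycle software FP add are the slide's numbers; function and variable names are just illustrative):

```python
# Area-time tradeoff for adding an FPU, using the slide's numbers.
A_PROC = 125       # processor area (M lambda^2)
A_FPU = 40         # FPU area (M lambda^2)
SW_FP_CYCLES = 60  # integer cycles to emulate one FP add

def area_time(i_ops, f_ops, with_fpu):
    """Area x cycles to run i_ops integer ops and f_ops FP adds."""
    if with_fpu:
        return (A_PROC + A_FPU) * (i_ops + f_ops)
    return A_PROC * (i_ops + SW_FP_CYCLES * f_ops)

# Break-even: 125(I + 60F) = 165(I + F)  =>  I/F = (7500 - 165)/40
break_even = (A_PROC * SW_FP_CYCLES - (A_PROC + A_FPU)) / A_FPU
print(break_even)  # 183.375

# At I/F = 100 the FPU-equipped composite has the smaller area-time product.
print(area_time(100, 1, True) < area_time(100, 1, False))  # True
```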
How?
Design issues:
–Interconnect: space and time
–Control
–Instructions: configuration path and control
–Mapping
Cost/benefits:
–Costs: area, power
–Performance: bandwidth, latency
Interconnect
–Bus (degenerate network)
–Memory (shared retiming resource)
–RF/Coproc (traditional processor interface)
–Network
Interconnect: Bus
Minimal physical network
–shared with memory and/or other peripherals
–10s–100s of cycles away from the processor (FPGA)
–low-to-moderate bandwidth
–can handle multiple, different functional units, but the serial bottleneck of the bus prevents simultaneous communication among devices
Interconnect: Bus Example
XC6200
Interconnect: Memory
Use a memory (retiming) block to buffer data between heterogeneous regions
–DMA (usually implies a shared bus)
–FIFO
–dual-port or shared RAM
Decoupled, moderate latency (10–100 cycles), moderate bandwidth (see the FIFO sketch below)
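As a sketch of this decoupling (the depth and data are made up, not from any of the example systems), a bounded FIFO lets the two regions synchronize only on full/empty rather than cycle by cycle:

```python
from collections import deque

class BoundedFIFO:
    """Memory block buffering data between two heterogeneous regions."""
    def __init__(self, depth):
        self.depth = depth
        self.buf = deque()

    def push(self, word):
        """Producer side: returns False (stall) when the buffer is full."""
        if len(self.buf) >= self.depth:
            return False
        self.buf.append(word)
        return True

    def pop(self):
        """Consumer side: returns None (stall) when the buffer is empty."""
        return self.buf.popleft() if self.buf else None

# Processor fills the FIFO; the array drains it at its own rate.
fifo = BoundedFIFO(depth=16)
for i in range(8):
    fifo.push(i)
while (word := fifo.pop()) is not None:
    print(word)
```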
Interconnect: Memory Examples
PAM, SPLASH
Interconnect: RF/Coproc
Coupled directly to the processor datapath
–low latency (1–2 cycles)
–moderately high bandwidth, limited by RF ports and control
Interconnect: RF/Coproc Examples
GARP, Chimaera
–(more on this case Thursday)
Interconnect: Network
Unified spatial network composing the various heterogeneous components
–high bandwidth
–latency varies with distance
–supports simultaneous operation and data transfer
–potentially the dominant cost: Ainterconnect > Afunction
–granularity question:
 coarse (large blocks of each type)
 fine (interleaved)
Interconnect: Network, Coarse Examples
Cheops, Pleiades
HSRA Heterogeneous Blocks
Interconnect: Network, Coarse vs. Fine
Multiplier/FPGA example
Interconnect: Network, Coarse vs. Fine
Fine:
–possibly share interconnect
–locality
–uniform tiling
–if not shared, may get concentrations of heavy/no use
–interconnect limits use as independent resources
–ratio less flexible?
–more difficult design
Coarse:
–flexible ratio
–easier to keep dense homogeneous blocks
–requires its own interconnect
–doesn't disrupt base layout(s)
–non-local route to/from => more/longer wires
–boundaries in net
Admin
For POWER, update on:
–rcore simulation
–HSRA energy
–Jsim size problems? Fix in the works
Control
As before:
–How many controllers?
–How many pinsts slaved off of each?
Common classes:
–single controller / lock-step
–decoupled, data stream
–autonomous (MIMD)
Control: Lockstep
Master controller (usually the processor)
–issues an instruction (instruction tag) every cycle: explicitly says when the device should operate
–single thread of control: everything known to be in sync
–idle while the processor is doing other tasks
–Ex.: VLIW (TriMedia), PRISC, GARP
Control: Data Stream
Configure, then run on data, decoupled from the control processor
–runs in parallel with the processor
 processor runs orthogonal tasks
 maybe several simultaneous tasks running on the spatial fabric
–unit not typically fed by the processor directly
–need to synchronize data transfer and operation: polling, interrupt, semaphore (see the sketch below)
–Ex.: Cheops, PADDI-2, Pleiades, SPLASH
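A minimal sketch of the configure-then-run pattern with polling, using a Python thread to stand in for the decoupled unit (the class and its computation are hypothetical, not any of the listed systems' APIs):

```python
import threading
import time

class StreamUnit:
    """Stand-in for a decoupled unit: configured once, then runs on a
    data stream in parallel while the processor does orthogonal work."""
    def __init__(self):
        self.done = threading.Event()
        self.result = None

    def configure_and_start(self, data):
        def run():
            self.result = sum(x * x for x in data)  # stand-in computation
            self.done.set()                         # completion flag
        threading.Thread(target=run).start()

unit = StreamUnit()
unit.configure_and_start(range(1000))

while not unit.done.is_set():  # polling-style synchronization
    time.sleep(0.001)          # processor would run orthogonal tasks here
print(unit.result)
```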
Control: Autonomous (MIMD)
Multiple (potential) control processors
–not necessarily slaved
–distributed control
–more care needed in synchronization
–Ex.: Floe (MagicEight)
HSRA Multi-Hetero
Coupling:
–unifying networks
Balance:
–sequential/spatial
–control units w/ management task
Configuration
Shared interface:
–config and data share the bus: XC6200, PAM, SPLASH
–config shares the memory path: GARP
Separate path / network:
–VLIW, Pleiades
Explicit:
–XC6200, PAM, SPLASH, ...
Implicit:
–GARP/PRISC
Mapping
–often a choice of where an operation runs
–must sort out what goes where
 faster in one resource?
 ...but limited number of each resource
Mapping: Limited Resource
What runs on the faster, limited resource?
–E.g. Tim's C extraction last time
–general: what is allocated to the resource when we reconfigure
 N candidate ops -> each a choice
–greedy (sketched below):
 break into temporal regions
 –local working set and points of reconfiguration
 while resource remains available
 –add the op offering the most benefit
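A minimal sketch of that greedy policy, assuming each candidate op carries a benefit (e.g. cycles saved) and a cost in units of the limited resource (the op names and numbers are made up):

```python
def greedy_allocate(regions, capacity):
    """Per temporal region, greedily fill the limited resource.

    regions: one list of (op, benefit, cost) candidates per temporal
             region (the working set between reconfiguration points).
    capacity: amount of the fast resource available per configuration.
    """
    schedule = []
    for candidates in regions:
        free, chosen = capacity, []
        # While resource remains, take the highest-benefit op that fits.
        for op, benefit, cost in sorted(candidates, key=lambda c: -c[1]):
            if cost <= free:
                chosen.append(op)
                free -= cost
        schedule.append(chosen)
    return schedule

# Two temporal regions, 8 units of the fast resource in each.
regions = [
    [("fir", 10, 5), ("mac", 7, 4), ("add", 2, 1)],
    [("fft", 12, 8), ("cmp", 3, 2)],
]
print(greedy_allocate(regions, capacity=8))  # [['fir', 'add'], ['fft']]
```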
Mapping: Spatial Choice
Different kinds of resources
–(e.g. LUTs, multipliers)
Multiple resources can solve the same problem
Limited number of each resource
=> match users with resources
Mapping: Bipartite
Partitioning => bipartite matching (sketched below)
–deals with unit resource consumption
–also with regional/interconnect constraints
–does not directly deal with performance...
 postpass(?) allocate faster resources to the critical path
–? N of R1 vs. M of R2
Example/details: Liu FPGA98
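To make the matching step concrete, here is a small augmenting-path bipartite matcher pairing ops with resource instances (a generic textbook algorithm under made-up names; see Liu FPGA98 for the formulation the slide actually cites):

```python
def bipartite_match(compatible):
    """Match each op to at most one resource instance.

    compatible: dict mapping op -> resource instances that can host it.
    Returns resource -> op for the maximum matching found.
    """
    owner = {}  # resource instance -> op currently seated there

    def try_assign(op, visited):
        for res in compatible[op]:
            if res in visited:
                continue
            visited.add(res)
            # Take a free resource, or evict and re-seat its current owner.
            if res not in owner or try_assign(owner[res], visited):
                owner[res] = op
                return True
        return False

    for op in compatible:
        try_assign(op, set())
    return owner

# Two LUT sites, one multiplier block; mul0 can map to either kind.
compatible = {
    "add0": ["lut0", "lut1"],
    "mul0": ["lut0", "mult0"],
    "mul1": ["mult0"],
}
print(bipartite_match(compatible))
# {'lut0': 'mul0', 'lut1': 'add0', 'mult0': 'mul1'} -- all three ops seated
```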
Mapping
More common:
–can solve with: 12 A's and 2 B's, or 4 A's and 4 B's
–common need: 4 A's and 2 B's
–so the real choice is 8 A's vs. 2 B's
Highlights
Fits into the existing framework
–not that much new here
–new issue: who shares resources, and how
Issues: interconnect, control
+ density when the balance is hit
- efficiency when the balance is mismatched
- harder mapping (resource sharing)