1
Peer-to-peer Hardware-Software Interfaces for Reconfigurable Fabrics
Mihai Budiu, Mahim Mishra, Ashwin Bharambe, Seth Copen Goldstein
Carnegie Mellon University
2
Peer-to-peer hw/sw interfaces
Resources Galore
[Figure labels: Reconfigurable Hardware, Cache, Logic; 2002, 2007]
3
Why RH: Computational Bandwidth
CPU: fixed
RH: "unbounded"
4
Using RH Today
[Diagram labels: Application; Partition; C Program, HDL; Compiler, CAD; OS support; communication]
5
Computer System Tomorrow
CPU: low-ILP computation + OS + VM
RH: high-ILP computation
Memory
Tight coupling
6
This Work
We suggest a high-level mechanism (not a policy).
[Diagram labels: HLL Program; Partitioning; cc, CAD; CPU, RH; Memory]
7
Outline
Motivation
Interfacing RH & CPU
Opportunities
Conclusions
8
Premises
RH is large
 – can implement large program fragments
RH can access memory
 – does not require CPU support to access data
 – coherent memory view with CPU
RH seen through clean abstraction
 – interface portability
9
Unit of Partitioning: Procedure
[Figure: program call-graph, annotated with: library, leaves, recursive, hot spot, high ILP]
10
Production-Quality Software
    int foo(...) {
        highly parallel computation;
        ...
        if (!r) {
            fprintf(stderr, "Unexpected input");
            return E_BADIN;
        }
        ...
    }
11
Peering
Program:
    a( ) { b( ); }
    b( ) { c( ); }
    c( ) { d( ); }
    d( ) { }
[Diagram: procedures a, b, c, d placed on CPU and RH]
12
Stubs
[Diagram: procedures a, b, c, d with stubs b', c', d' between CPU and RH]
Labels: software procedure call; "RPC"; marshalling, control transfer; hardware dependent
13
Stubs
Program:
    a( ) { r = b(b_args); }
    b(b_args) { }
CPU:
    a( ) { r = b'(b_args); }
    b'(b_args) {
        send_rh(b_args);
        invoke_rh(b);
        r = receive_rh( );
        return r;
    }
RH:
    b
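The stub on this slide can be sketched in plain C. This is a minimal sketch under stated assumptions, not the authors' implementation: the RH is simulated by an ordinary C function, and the mailbox variables, the type `rh_proc_t`, the body of `b_rh`, and the use of `int` for arguments and results are all hypothetical. Only the call sequence `send_rh`, `invoke_rh`, `receive_rh` comes from the slide.

```c
/* Hypothetical stand-in for the RH: a procedure is a C function that
   reads its argument from a mailbox and writes its result back. */
typedef void (*rh_proc_t)(void);

static int rh_arg_box;     /* assumed argument mailbox (CPU -> RH) */
static int rh_result_box;  /* assumed result mailbox (RH -> CPU) */

/* Simulated RH implementation of b: here it just doubles its argument. */
static void b_rh(void) { rh_result_box = 2 * rh_arg_box; }

/* Transport primitives named on the slide; the bodies are assumptions. */
static void send_rh(int arg)          { rh_arg_box = arg; }
static void invoke_rh(rh_proc_t proc) { proc(); }  /* returns when the RH is done */
static int  receive_rh(void)          { return rh_result_box; }

/* CPU-side stub b': same signature as b, so the caller is unchanged
   apart from the name it calls. */
static int b_stub(int b_args) {
    send_rh(b_args);      /* pass the arguments */
    invoke_rh(b_rh);      /* transfer control to the RH */
    return receive_rh();  /* fetch the result */
}

/* Caller a: to a, this is an ordinary software procedure call. */
int a(int x) { return b_stub(x) + 1; }
```

In this sketch `a(20)` sends 20 to the simulated RH, receives 40, and returns 41; the caller never sees the transport.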
14
Required Stubs
1 stub to call each RH procedure
1 stub for each procedure called by RH
[Diagram: CPU, RH]
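The two rules above determine the number of stubs once a call graph and a partition are fixed. The following is a minimal sketch of that count; `edge_t`, `count_stubs`, `MAX_PROCS`, and the integer numbering of procedures are hypothetical names chosen for illustration, and the sketch assumes every procedure index in an edge is smaller than `n_procs`.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PROCS 64  /* assumed upper bound for this sketch */

/* A call-graph edge: procedure `caller` calls procedure `callee`. */
typedef struct { int caller; int callee; } edge_t;

/* Counts the stubs a partition needs. on_rh[p] is true when procedure p
   is placed on the RH. A procedure needs exactly one stub if at least one
   call to it crosses the CPU/RH boundary, however many callers it has.
   Returns -1 if n_procs exceeds MAX_PROCS. */
int count_stubs(const edge_t *edges, size_t n_edges,
                const bool *on_rh, int n_procs) {
    bool needs_stub[MAX_PROCS] = { false };
    int count = 0;

    if (n_procs > MAX_PROCS) return -1;

    for (size_t i = 0; i < n_edges; i++) {
        int from = edges[i].caller;
        int to = edges[i].callee;
        /* CPU -> RH call: a stub to call the RH procedure.
           RH -> CPU call: a stub for the procedure called by the RH. */
        if (on_rh[from] != on_rh[to] && !needs_stub[to]) {
            needs_stub[to] = true;
            count++;
        }
    }
    return count;
}
```

For the chain a, b, c, d from the Peering slide with b and c placed on the RH (an arbitrary example placement), the crossing calls are a to b and c to d, so two stubs are needed.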
15
Compiling
[Flow diagram labels: Program; Partitioning; Procedures for CPU; Procedures for RH; Stubs; HLL to HDL; Synthesis; Configuration; Linker; Executable]
Annotations: policy; automatic
16
Outline
Motivation
Interfacing RH & CPU
Opportunities
Conclusions
17
Evaluation
How much can be mapped to RH?
SpecInt95 & Mediabench
Partition strictly on procedure boundaries
Limit RH to 10^6 bit-operations
18
Coverage
Program:
    a( ) { b( ); }
    b( ) { c( ); }
    c( ) { }
Running time: 40%, 35%, 25% (total 100%)
On RH, Method 1 and Method 2 (cells as extracted): N N, Y Y, Y N
Coverage: Method 1 40%, Method 2 75%
19
Coverage
Program:
    a( ) { b( ); }
    b( ) { c( ); }
    c( ) { }
Running time: 40%, 35%, 25% (total 100%)
On RH, Method 1 and Method 2 (cells as extracted): N N, Y Y, N Y
Coverage: Method 1 25%, Method 2 65%
20
Policies
leaves on RH [figure: RH, X, CPU]
arbitrary
21
RH Stack Models
Locals in registers:
    f(x) { return x+1; }
Locals statically allocated:
    f() { int local; g(&local); }
Dynamic stack:
    f(x) { f(x+1); }
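One way to read the three models is as a per-procedure classification. The sketch below is an inference from the slide's three examples, not a rule stated by the authors: it assumes a recursive procedure needs a dynamic stack, a non-recursive procedure whose local has its address taken needs a statically allocated frame, and everything else can keep its locals in registers. The names `proc_info_t`, `stack_model_t`, and `classify` are hypothetical.

```c
#include <stdbool.h>

typedef enum { NO_STACK, STATIC_FRAME, DYNAMIC_STACK } stack_model_t;

/* Hypothetical per-procedure summary that a compiler front end
   might provide. */
typedef struct {
    bool is_recursive;         /* procedure can reach itself in the call graph */
    bool local_address_taken;  /* the address of a local is used, e.g. g(&local) */
} proc_info_t;

/* Picks the cheapest stack model that still supports the procedure. */
stack_model_t classify(proc_info_t p) {
    if (p.is_recursive)        return DYNAMIC_STACK;  /* f(x) { f(x+1); } */
    if (p.local_address_taken) return STATIC_FRAME;   /* f() { int local; g(&local); } */
    return NO_STACK;                                  /* f(x) { return x+1; } */
}
```

Checking recursion first reflects the assumption that a recursive procedure needs one frame per active call, so a single static frame would not be enough even if no address is taken.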
22
Potential RH Coverage: SpecINT95
[Chart: % running time; policies: leaves, CPU->RH, CPU->RH->CPU; stack models: dynamic stack, static stack frames, no stack]
23
Potential RH Coverage: Mediabench
[Chart: policies: leaves, CPU->RH, CPU->RH->CPU; stack models: dynamic stack, static stack frames, no stack]
24
Conclusions
Stubs make RH/CPU interface transparent
Stubs are automatically generated
RH and CPU as peers
RH/CPU interface: (remote) procedure call
RPC used for control transfer (not data)
Peering gives partitioner freedom
25
The End
27
Dispatcher Stubs
Program:
    a( ) { r = b(b_args); }
    b(b_args) { if (x) c( ); return r; }
    c( ) { }
Stub for b:
    b'(b_args) {
        send_rh(b_args);
        invoke_rh(b);
        while (1) {
            com = get_rh_command( );
            if (! com) break;
            (*com)( );
        }
        r = receive_rh( );
        return r;
    }
Annotations: "Independent of b"; "c's stub"
28
C's Stub
Program:
    a( ) { r = b(b_args); }
    b(b_args) { if (x) c( ); return r; }
    c( ) { }
Stub for c:
    c'( ) {
        receive_rh(c_args);
        r = c(c_args);
        send_rh(r);
        invoke_rh(return_to_rh);
    }
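The dispatcher loop in b's stub and the stub for c can be exercised together in a single-threaded simulation. This is a sketch under stated assumptions, not the authors' code: the RH is modelled as a resumable C function, the two mailboxes, `pending_cmd`, `rh_resume_point`, and the bodies of `b_rh` and `c` are hypothetical, and arguments and results are plain `int` values. The primitive names `send_rh`, `invoke_rh`, `receive_rh`, `get_rh_command`, and `return_to_rh` are the ones used on the slides.

```c
#include <stddef.h>

typedef void (*cpu_cmd_t)(void);  /* CPU-side stub requested by the RH */
typedef void (*rh_proc_t)(void);  /* entry point of a simulated RH procedure */

static int to_rh_box;             /* data sent CPU -> RH (assumed mailbox) */
static int from_rh_box;           /* data sent RH -> CPU (assumed mailbox) */
static cpu_cmd_t pending_cmd;     /* set by the RH when it needs the CPU */
static int rh_resume_point;       /* where the simulated b continues */

static void send_rh(int v)            { to_rh_box = v; }
static int  receive_rh(void)          { return from_rh_box; }
static void invoke_rh(rh_proc_t proc) { proc(); }
static cpu_cmd_t get_rh_command(void) {
    cpu_cmd_t com = pending_cmd;  /* NULL means the RH has finished */
    pending_cmd = NULL;
    return com;
}

/* Software procedure c, kept on the CPU (body is an arbitrary example). */
static int c(int v) { return v * 10; }

static void c_stub(void);

/* Simulated RH version of b: b(x) { r = 1; if (x) r = c(x) + 1; return r; } */
static void b_rh(void) {
    if (rh_resume_point == 0) {
        int x = to_rh_box;
        if (x != 0) {
            from_rh_box = x;       /* argument for c */
            pending_cmd = c_stub;  /* ask the CPU to run c's stub */
            rh_resume_point = 1;
            return;                /* control goes back to the CPU */
        }
        from_rh_box = 1;           /* result of b when c is not called */
        return;
    }
    from_rh_box = to_rh_box + 1;   /* c's result has arrived from the CPU */
    rh_resume_point = 0;
}

static void return_to_rh(void) { b_rh(); }

/* c's stub: runs on the CPU on behalf of the RH. */
static void c_stub(void) {
    int c_args = receive_rh();
    int r = c(c_args);
    send_rh(r);
    invoke_rh(return_to_rh);
}

/* b's stub with the dispatcher loop; the loop never names c. */
int b_stub(int b_args) {
    send_rh(b_args);
    invoke_rh(b_rh);
    for (;;) {
        cpu_cmd_t com = get_rh_command();
        if (!com) break;
        (*com)();
    }
    return receive_rh();
}
```

With these example bodies, `b_stub(4)` makes one round trip through c's stub and returns 41, while `b_stub(0)` returns 1 without the loop dispatching anything. The loop only follows whatever function pointer the RH posts, which is one reading of the slide's "Independent of b" annotation.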
29
Attempt 1
Manual partitioning
Interface: ad hoc
Ex: OneChip, NAPA, PAM
Advantage: huge speed-ups
Problem: very hard work
[Diagram labels: Program, RH]
30
Attempt 2
Select small computations
Interface: RH = functional unit
Ex: PRISC, Chimaera
Advantage: easy to automate
Problem: low speed-up
[Diagram labels: Program; operators +, >>, *]
31
Attempt 3
Program:
    while (b) { b[ j+5]; }
Select loop body
Deeply pipelined implementation
No memory access
Interface: I/O or Functional Unit or Coprocessor
Ex: PipeRench
Advantage: very high speed-up
Problems:
 – cannot be automated
 – loop-carried dependences
 – few opportunities
32
Attempt 4
Program:
    while (b) {
        if (error) printf("err");
        a[x] = y;
    }
Select whole loop
Pipelined implementation
Autonomous memory access
Interface: coprocessor
Ex: GARP
Advantage: many opportunities
Problems:
 – complicated algorithm
 – requires exceptional loop exits