This research was supported by NIH grant RR020209-01, “FPGA-Based Computational Accelerators.” Accelerator design isn’t logic design. Accelerators require.


This research was supported by NIH grant RR020209-01, “FPGA-Based Computational Accelerators.”

LAMP: A Tool Suite for Families of FPGA-based Computation Accelerators

Accelerator design isn’t logic design. Accelerators require skilled logic design for high performance AND domain specialists for tailoring to details of specific applications. The semantic gap isn’t going away.

FPGAs are near a crossing point. Until now, “10×-100× of performance ... has been at the cost of 10×-100× increase in difficulty in application development.” *

* M. Gokhale, J. Stone, J. Arnold, and M. Kalinowski. Stream-oriented FPGA computing in the Streams-C high-level language. Proc. FCCM, 2000.

What changed? FPGA capacity is exploding. “An order of magnitude increase in any computing resource changes the way in which that resource is used.”

Cost vs. value of design effort:
- Effort in designing leaf components is about the same …
- Effort in designing an array is largely independent of array size …
- Larger FPGAs hold larger computation arrays.
- Repetition increases the value of the design effort.

[Figure: value of FPGA acceleration and cost of FPGA design, plotted as FPGA capacity increases]

Semantic gap: the gulf between high-level design and low-level implementation.
- Compiled programs: C++, Java, and other high-level programming languages vs. compiled machine code.
- FPGA accelerators: application-specific knowledge in a framework vs. gate-level implementation primitives.

Why not compile C into logic? Forty years of research haven’t solved the problem.

Logic design vs. accelerators:
- Logic design optimizes individual problems. Accelerators optimize families of problems.
- Logic design reuses leaf components. Accelerators reuse control components.
- Logic design is specific to a hardware platform. Accelerators should be portable between platforms.
- Logic design yields stable implementations. Accelerators serve flexible, user-defined applications.
- In logic design, parallelism is defined by the designer. In accelerators, the degree of parallelism is undefined: as much as this FPGA will fit for this app.

Implement applications as families.
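The point that an accelerator's degree of parallelism is "as much as this FPGA will fit for this app" can be illustrated with a small sizing calculation. This is only a sketch under stated assumptions: the function name, the abstract "resource units", and all numbers below are hypothetical and do not come from the poster or from any real device.

```python
def max_processing_elements(fabric_units: int, overhead_units: int, pe_units: int) -> int:
    """Return how many identical processing elements (PEs) fit on a device.

    fabric_units   -- total resource units on the FPGA (hypothetical measure)
    overhead_units -- units reserved for fixed control and interface logic
    pe_units       -- units consumed by one PE of the chosen application
    """
    if pe_units <= 0:
        raise ValueError("pe_units must be positive")
    usable = fabric_units - overhead_units
    # Floor division: only whole PEs can be placed; never report a negative count.
    return max(usable // pe_units, 0)


if __name__ == "__main__":
    # Hypothetical device with 1000 units, 100 of them reserved for control logic.
    for pe_size in (10, 30):
        count = max_processing_elements(1000, 100, pe_size)
        print(f"PE size {pe_size}: {count} PEs")
```

The same control design serves both cases; only the replication count changes with the PE size and the fabric size, which is the sense in which the array size is decided per device and per application rather than by the designer.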
Case study: Dynamic Programming for Approximate String Matching. Choose:
- Character-by-character alignment or goodness-of-match only
- Global alignment (with end-rule options) or local; gap parameters
- Character type: DNA [2 bits], IUPAC wildcards [4], amino acid [5], codons [6], ASCII text [8], Unicode 3.0 text [16]
- Mismatch scoring, which may be parameterized

[Figure: approximate matching application family]
- Result type: Score Only, Alignment
- Alignment type: Local (Smith-Waterman), Global (Needleman-Wunsch)
- Character type: Nucleotide, Amino acid, Codon, Wildcard, …
- Other labels in the figure: Exact Match, Gonnet, PAM-N, BLOSUM-N, …

Automated replication makes maximum use of FPGA fabric.
- Smaller PEs: higher parallelism
- Larger fabric: increased computing capacity
- Larger PEs: don’t constrain other implementations

Create a model with behavior left as a parameter to be provided.

Logic designer provides:
- Annotated VHDL: reusable control and interface components
- App Abstraction: interface definition of application classes and operations
- HW Abstraction: abstract definition of FPGA hardware resources
- HW Concretion: actual resources present in the FPGA platform

Application specialist provides:
- App Concretion: actual definitions specific to the application instance

Model Instance: generic accelerator bound to specific HW and application logic.

Subclassing creates application-specific data types and behaviors.

Application-specific implementation can give acceleration > 100×.

[Figure: application acceleration, Xilinx VP70 Virtex-II Pro relative to 3 GHz Intel Xeon]

Every different application gets individually tuned performance. Simple applications don’t have to run at ‘worst case’ speed.

Approximate matching application family:
- Each component varies individually.
- Combinatorics work in our favor.
- Each user creates new possibilities!
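The idea of a model "with behavior left as a parameter" can be sketched in software for the case study. The following is a plain software illustration, not the FPGA implementation: it computes a score-only result for either local (Smith-Waterman) or global (Needleman-Wunsch) alignment, with the character comparison passed in as a function. The linear gap penalty, the function names, and the +1/-1 scoring values are assumptions made for this example.

```python
from typing import Callable, Sequence, TypeVar

R = TypeVar("R")  # reference character type
Q = TypeVar("Q")  # query character type


def alignment_score(ref: Sequence[R], query: Sequence[Q],
                    compare: Callable[[R, Q], int],
                    gap: int = -2, local: bool = True) -> int:
    """Score-only dynamic programming with a linear gap penalty (assumed).

    local=True  -> Smith-Waterman: cells are clamped at 0, best cell is returned.
    local=False -> Needleman-Wunsch: the bottom-right cell is returned.
    """
    # Row 0: zeros for local alignment, accumulated gaps for global alignment.
    prev = [0 if local else j * gap for j in range(len(query) + 1)]
    best = 0
    for i, r in enumerate(ref, start=1):
        curr = [0 if local else i * gap]
        for j, q in enumerate(query, start=1):
            cell = max(prev[j - 1] + compare(r, q),  # match or mismatch
                       prev[j] + gap,                # gap in the query
                       curr[j - 1] + gap)            # gap in the reference
            if local:
                cell = max(cell, 0)
                best = max(best, cell)
            curr.append(cell)
        prev = curr
    return best if local else prev[-1]


def exact_match(r: str, q: str) -> int:
    """Hypothetical scoring rule: +1 for equal characters, -1 otherwise."""
    return 1 if r == q else -1


if __name__ == "__main__":
    print(alignment_score("GATTACA", "TTAC", exact_match, local=True))
    print(alignment_score("ACGT", "AGT", exact_match, local=False))
```

Swapping `exact_match` for another comparison function, or flipping `local`, gives a different family member without touching the recurrence, which mirrors how each component of the family varies individually.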
[Figure: accelerator model for application family. Labels: HwAbstraction, AppAbstraction, HwConcretion, AppConcretion, Model instance, Annotated VHDL components]

Tom VanCourt, Martin Herbordt
BOSTON UNIVERSITY

Abstract definition of character type:

    class CharType {
        abstract type Ref, Que, Score;
        abstract Score compare(Ref refCh, Que queryCh);
        abstract const Score zeroScore;
    }

Concrete definition (partial):

    class IUPAC extends CharType {
        type Ref {bool: a, c, g, t};
        type Que int 0..3;
        type Score int -1000..1000;
        const Score scoreZero = 0;
        match = +1, miss = -10;
        Score compare(Ref r, Que q) {
            bool isMatch = (r.a & q==0) | (r.c & q==1)
                         | (r.g & q==2) | (r.t & q==3);
            …

[Figure: semantic gap, plotted as semantic complexity increases. Labels: Gates, Machine code, C++, Java, Domain Knowledge, Compiled code, Synthesized logic]

Application acceleration:
- DNA alignment: 152× to 215×
- Protein alignment: 77× to 175×
- Rigid docking: ~100× to ~500×

Size of the application family:
- 2 result types
- 17 alignment types
- 15 character types
- 510 different accelerators created on demand
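The abstract and concrete character-type definitions above can be mirrored in an ordinary language to show how subclassing supplies the application-specific behavior. This is a hedged Python sketch, not LAMP syntax: the names `IupacRef` and `Iupac` are chosen for this example, the single name `zero_score` is an assumption (the transcript uses both `zeroScore` and `scoreZero`), and the return statement of `compare` is also an assumption because the original definition is cut off after `isMatch`.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class CharType(ABC):
    """Abstract character type: subclasses supply scoring behavior."""
    zero_score: int = 0

    @abstractmethod
    def compare(self, ref_ch, query_ch) -> int:
        """Score one reference character against one query character."""


@dataclass(frozen=True)
class IupacRef:
    """Wildcard reference character: one flag per nucleotide it accepts."""
    a: bool = False
    c: bool = False
    g: bool = False
    t: bool = False


class Iupac(CharType):
    zero_score = 0
    match = 1     # value taken from the concrete definition above
    miss = -10    # value taken from the concrete definition above

    def compare(self, ref_ch: IupacRef, query_ch: int) -> int:
        if not 0 <= query_ch <= 3:
            raise ValueError("query character must be in 0..3")
        is_match = ((ref_ch.a and query_ch == 0) or (ref_ch.c and query_ch == 1) or
                    (ref_ch.g and query_ch == 2) or (ref_ch.t and query_ch == 3))
        # Assumption: the truncated original returns match or miss from here.
        return self.match if is_match else self.miss


# Family size from the counts above: result types x alignment types x character types.
FAMILY_SIZE = 2 * 17 * 15

if __name__ == "__main__":
    purine = IupacRef(a=True, g=True)    # wildcard accepting A or G
    print(Iupac().compare(purine, 2))    # query 2 stands for G
    print(FAMILY_SIZE)
```

The product 2 × 17 × 15 = 510 matches the count of accelerators stated above, since each of the three components varies independently.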

