Softcore Vector Processor. Team ASP: Brandon Harris, Arpith Jacob.

1 Softcore Vector Processor. Team ASP: Brandon Harris, Arpith Jacob

2 Outline: Motivation (Smith-Waterman); Solution; System Architecture: Overview, Functional Unit, Instruction Controller, Processing Element, Memory Controller; ISA; Results; Future Research

3 Motivation Smith-Waterman sequence alignment

14 Motivation: Smith-Waterman sequence alignment. Similar problems: HMMer, BLAST, RNA secondary structure prediction.
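The slides name Smith-Waterman as the motivating kernel but do not spell out the recurrence. The following is a minimal Python sketch of the local-alignment score, assuming a linear gap penalty; the scoring values (match, mismatch, gap) and the function name are illustrative and do not come from the slides. Each pass through the inner loop is one "cell update", the unit that the MCUPS figures later in the deck count.

```python
def smith_waterman_score(query, database, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score (linear gap penalty).

    The scoring parameters are illustrative assumptions, not values
    taken from the presentation.
    """
    prev = [0] * (len(database) + 1)   # previous row of the score matrix
    best = 0
    for q in query:
        curr = [0]                     # column 0 is always zero
        for j, d in enumerate(database, start=1):
            diag = prev[j - 1] + (match if q == d else mismatch)
            up = prev[j] + gap         # gap in the database sequence
            left = curr[j - 1] + gap   # gap in the query sequence
            cell = max(0, diag, up, left)   # one cell update
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```

Cells on the same anti-diagonal of the matrix do not depend on each other, which is what makes the problem a natural fit for an array of processing elements.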

15 Our Solution: Softcore Vector Processor. Massively parallel; software programmable; configurable instantiation. Why softcore? Optimize for specific applications; adapt to changes in algorithms; FPGA technology improves with time.

16 Architectural Overview: Streaming architecture. Memory-mapped FIFOs carry read-once and write-once data and provide communication between components. [Diagram labels: Software, DMA, SVP Functional Unit, DMA, Software]
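The read-once, write-once FIFO discipline can be illustrated with a small Python sketch. This is a software analogy only: the function name `run_stream` and the use of `collections.deque` as a FIFO are assumptions for illustration, and the stage ordering is inferred from the diagram labels rather than stated on the slide.

```python
from collections import deque

def run_stream(words, functional_unit):
    """Push words through an input FIFO, a compute stage, and an output FIFO.

    `functional_unit` is a hypothetical stand-in for the SVP functional unit.
    """
    stream_in = deque(words)   # input FIFO: each word is read exactly once
    stream_out = deque()       # output FIFO: each result is written exactly once
    while stream_in:
        word = stream_in.popleft()
        stream_out.append(functional_unit(word))
    return list(stream_out)

doubled = run_stream([1, 2, 3], lambda w: w * 2)
```

Because no stage ever re-reads or overwrites a word, the producer and consumer only need to agree on FIFO order, not on addresses.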

18 Functional Unit [diagram labels: Instruction Controller, Instr. Mem, Processing Element (x3), Reg File (x3), Memory Controller, Shared Local Memory, Stream In, Stream Out]

19 Processing Element: SIMD instruction broadcast from the Instruction Controller to three Processing Elements. Operands shown on the slide: addi, R5, R1, 10. R0 is 0 in every Processing Element; R1 holds 1, 0 and 2 respectively; the values 10, 11 and 12 also appear on the slide.
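A minimal Python sketch of SIMD broadcast, assuming destination-first operand order (as in the sample code later in the deck) and a six-entry register file per processing element. The class and function names are hypothetical.

```python
class ProcessingElement:
    """One PE with a private register file (size is an assumption)."""

    def __init__(self, num_regs=6):
        self.regs = [0] * num_regs

    def execute(self, op, rd, rs, imm):
        if op == "addi":
            self.regs[rd] = self.regs[rs] + imm
        else:
            raise ValueError(f"unsupported op: {op}")

def broadcast(pes, op, rd, rs, imm):
    """The controller issues one instruction; every PE executes it."""
    for pe in pes:
        pe.execute(op, rd, rs, imm)

pes = [ProcessingElement() for _ in range(3)]
for pe, r1 in zip(pes, (1, 0, 2)):   # per-PE R1 values from the slide
    pe.regs[1] = r1

broadcast(pes, "addi", rd=5, rs=1, imm=10)
r5 = [pe.regs[5] for pe in pes]      # same instruction, different data
```

Under these assumptions the three PEs end with R5 equal to 11, 10 and 12: one instruction, three different results.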

20 Processing Element: SIMD instruction broadcast from the Instruction Controller. Operands shown on the slide: Ld, R2, 0, R3. R3 holds ptr1 in each of the three Processing Elements.

21 Processing Element: SIMD instruction broadcast plus Instruction Register broadcast; 40% register savings. Operands shown on the slide: Ldir, R2, R0, IR3. ptr1 appears once, in a fourth register file, rather than in each Processing Element.
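The difference between a per-PE pointer and a broadcast instruction register can be sketched in Python. The semantics here are an assumption inferred from the slides and from the `ldir PE_MEM_REG, PE_ZERO_REG(%ir15)` form in the sample code: `ld` adds an immediate offset to a PE register, while `ldir` adds a value held once in the instruction controller. Function names and memory contents are illustrative.

```python
def ld(pe_regs, memory, rd, rb, offset):
    """Each PE loads from memory[its own base register + immediate offset]."""
    for regs in pe_regs:
        regs[rd] = memory[regs[rb] + offset]

def ldir(pe_regs, memory, ic_regs, rd, rb, ir):
    """Each PE loads from memory[its base register + a broadcast IC register]."""
    pointer = ic_regs[ir]            # held once, in the instruction controller
    for regs in pe_regs:
        regs[rd] = memory[regs[rb] + pointer]

memory = [7, 8, 9, 10]
PTR1 = 2                             # hypothetical pointer value

# ld: every PE spends a register (R3) on the same pointer.
with_ld = [[0, 0, 0, PTR1, 0, 0] for _ in range(3)]
ld(with_ld, memory, rd=2, rb=3, offset=0)

# ldir: the pointer lives in IR3; the PEs' R3 stays free.
with_ldir = [[0] * 6 for _ in range(3)]
ic_regs = [0, 0, 0, PTR1]
ldir(with_ldir, memory, ic_regs, rd=2, rb=0, ir=3)
```

Both variants load the same word into R2 of every PE, but only the `ld` variant ties up a register in each PE for the pointer.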

23 Processing Element datapath [diagram labels: Register File with Ra Addr, Rb Addr, Ra Data Left/Right, Rb Data Left/Right, Wr Enable Left/Right; Immediate; Data Select; Pipeline Register; ALU; Pipeline Register; Compare; Write Enables; Memory Controller with Mem Wr Enable and Data]. Example instruction shown: bmseti R17 EQ 16.
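The slide does not define `bmseti`, but the Compare and Write Enables labels, together with the `bmseti ... bmend` pairing in the sample code, suggest a per-PE write mask. A hedged Python sketch of that reading follows; the function names, the register count, and the use of register 17 as a PE identifier are all assumptions.

```python
def bmseti(pe_regs, reg, value):
    """Assumed semantics: enable only PEs whose register equals the immediate."""
    return [regs[reg] == value for regs in pe_regs]

def masked_addi(pe_regs, mask, rd, rs, imm):
    """Broadcast addi, but suppress the register write where the mask is off."""
    for regs, enabled in zip(pe_regs, mask):
        if enabled:
            regs[rd] = regs[rs] + imm

NUM_PES = 4
PE_ID_REG = 17                        # hypothetical: register holding the PE index

pe_regs = [[0] * 18 for _ in range(NUM_PES)]
for pe_id, regs in enumerate(pe_regs):
    regs[PE_ID_REG] = pe_id

mask = bmseti(pe_regs, PE_ID_REG, NUM_PES - 1)   # select the last PE only
masked_addi(pe_regs, mask, rd=3, rs=0, imm=5)
r3_values = [regs[3] for regs in pe_regs]
```

Masking lets a single broadcast instruction stream handle boundary cases, such as the last PE in the array, without branching.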

24 Functional Unit [diagram labels: Instruction Controller, Instr. Mem, Processing Element (x3), Reg File (x3), Memory Controller, Shared Local Memory, Stream In, Stream Out]

26 Memory Controller: dual-ported block RAM [diagram labels: IC, PE 0-3]; single-cycle read.

27 Memory Controller: dual-ported block RAM [diagram labels: IC, PE 0-3]; multiple-cycle write.

28 Instruction Set Architecture: Custom ISA. Two sets of instruction types: Instruction Controller and Processing Element. Optimized for target applications: Max, Min, Loop. Expandable: core vs. application-specific.

29 Sample Code (two listings of the same loop; the second is reordered and contains no nop instructions)

    _query_loop:
        subir   %r8, %r3, %ir10
        nop
        max     %r4, %r4, %r8
        add     %r3, %r19, PE_ZERO_REG
        bmseti  PE_ID_REG EQ PE_NUM_ELEMENTS - 1
        icaddi  %ir15, %ir8, PE_NUM_ELEMENTS - 1
        nop
        ldir    PE_MEM_REG, PE_ZERO_REG(%ir15)
        nop
        addi    %r3, PE_MEM_REG, 0
        bmend
        ld      PE_MEM_REG, PE_ZERO_REG(DB_ADDRESS)
        icaddi  %ir7, %ir7, 1
        icaddi  %ir9, %ir9, 1
        icloop  %ir4, %ir5, _query_loop

    _query_loop:
        icaddi  %ir15, %ir8, PE_NUM_ELEMENTS - 1
        subir   %r8, %r3, %ir10
        add     %r3, %r19, PE_ZERO_REG
        ldir    PE_MEM_REG, PE_ZERO_REG(%ir15)
        max     %r4, %r4, %r8
        bmseti  PE_ID_REG EQ PE_NUM_ELEMENTS - 1
        icaddi  %ir7, %ir7, 1
        icaddi  %ir9, %ir9, 1
        addi    %r3, PE_MEM_REG, 0
        bmend
        ld      PE_MEM_REG, PE_ZERO_REG(DB_ADDRESS)
        icloop  %ir4, %ir5, _query_loop

30 Results: VHDL implementation, simulated and synthesized. Smith-Waterman: 16 PE version tested; measured in millions of cell updates per second (MCUPS).

31 Smith-Waterman Speedup
    System   Freq      MCUPS   Speedup
    P4       1.8 GHz      15      1
    SVP16    150 MHz      52      3.47
    SVP32    150 MHz     103      6.87
    SVP64    125 MHz     167     11.13
    SVP128   120 MHz     302     20.13
    SVP128   150 MHz     378     25.20
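The speedup column is each system's MCUPS divided by the P4 baseline of 15 MCUPS. A short Python check using the figures from the slide (the dictionary keys are just labels chosen here):

```python
# MCUPS figures as listed on the slide.
mcups = {
    "P4 1.8 GHz": 15,
    "SVP16 150 MHz": 52,
    "SVP32 150 MHz": 103,
    "SVP64 125 MHz": 167,
    "SVP128 120 MHz": 302,
    "SVP128 150 MHz": 378,
}

baseline = mcups["P4 1.8 GHz"]

# Speedup relative to the P4, rounded to two decimals as on the slide.
speedup = {name: round(value / baseline, 2) for name, value in mcups.items()}
```

The computed values agree with the slide's speedup column for every row.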

32 Comparative Performance
    System*        Freq      PEs/Chip   MCUPS/PE   Chips   MCUPS/Chip   Cost ($1000)   MCUPS/$1000
    SVP128         150 MHz   128        2.95       1       378          5              75
    SVP128         120 MHz   128        2.36       1       302          5              60
    SVP64          125 MHz   64         2.61       1       167          5              33
    SVP32          150 MHz   32         3.22       1       103          5              20
    Kestrel        20 MHz    64         0.78       8       50           25†            16
    GeneMatcher2   192 MHz   192        5.21       16      1000         69             14
    Fuzion 150     200 MHz   1536       1.63       1       2500         ?              ?
    * Reference [1]   † Estimated
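The derived columns for the SVP rows can be re-computed from the raw figures. This Python sketch assumes the column reading used here (PEs per chip, MCUPS per chip, and a cost of 5 in units of $1000 for each single-chip SVP system), which is reconstructed from the flattened slide text; the MCUPS/$1000 figures appear to be truncated rather than rounded.

```python
svp_rows = [
    # (system, PEs per chip, MCUPS per chip, cost in $1000)
    ("SVP128 @ 150 MHz", 128, 378, 5),
    ("SVP128 @ 120 MHz", 128, 302, 5),
    ("SVP64 @ 125 MHz", 64, 167, 5),
    ("SVP32 @ 150 MHz", 32, 103, 5),
]

def derive(pes, mcups, cost_k):
    """Return (MCUPS per PE, MCUPS per $1000) for a single-chip system."""
    per_pe = round(mcups / pes, 2)
    per_k = mcups // cost_k        # integer division: truncates like the slide
    return per_pe, per_k

derived = {name: derive(pes, m, cost) for name, pes, m, cost in svp_rows}
```

Note that MCUPS per PE falls slightly as the array grows from 32 to 64 PEs, which matches the lower clock frequency of the larger configuration.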

33 Performance
    PEs   Freq (MHz)   Area   BRAM
    16    150          13%    22
    32    150          22%    38
    64    125          41%    70
    128   120          80%    134
    Hardware: Xilinx Virtex-4 VLX200

34 Future Work. Software development: How can HMMer and other systolic algorithms be implemented? ISA expansion: What additional instructions are needed? What instructions can be added to optimize? Hardware development: How can we optimize the hardware to make it faster and smaller? What hardware can we add to enhance performance? How can we take advantage of advances in FPGAs, such as DSP48s?

35 Acknowledgments. Special thanks: Young Cho, Roger Chamberlain, Jeremy Buhler, Joseph Lancaster.
References:
    [1] Di Blas et al., "The Kestrel Parallel Processor," IEEE Transactions on Parallel and Distributed Systems, January 2005.
    [2] A. Jacob et al., "Whole Genome Comparison Using Commodity Workstations," Technical Report, 2003.

36 Questions? Team ASP: Brandon Harris, Arpith Jacob

