Published by Samson Manning, modified over 9 years ago
1
Softcore Vector Processor
Team ASP: Brandon Harris, Arpith Jacob
2
Outline
- Motivation: Smith-Waterman
- Solution
- System Architecture
  - Overview
  - Functional Unit
  - Instruction Controller
  - Processing Element
  - Memory Controller
- ISA
- Results
- Future Research
3
Motivation: Smith-Waterman sequence alignment
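For readers new to the algorithm, here is a minimal Python sketch of the Smith-Waterman scoring recurrence. It assumes a linear gap penalty and illustrative scores (match +2, mismatch -1, gap -1); the slides do not state the scoring scheme the SVP uses, and the function names are hypothetical. The second function visits cells one anti-diagonal at a time: each cell on anti-diagonal d = i + j depends only on diagonals d - 1 and d - 2, so all cells of one diagonal are independent, which is one common reason the problem maps well onto an array of processing elements.

```python
def sw_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local-alignment score of a and b, filling the matrix row by row."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]  # row 0 and column 0 stay 0
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                    # start a new local alignment
                          H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,    # gap in b
                          H[i][j - 1] + gap)    # gap in a
            best = max(best, H[i][j])
    return best


def sw_score_wavefront(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Same recurrence, evaluated one anti-diagonal (i + j = d) at a time."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for d in range(2, rows + cols - 1):
        # Cells on this diagonal do not depend on each other, so hardware could
        # compute them simultaneously (for example, one query character per PE).
        for i in range(max(1, d - (cols - 1)), min(rows - 1, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


print(sw_score("GATTACA", "TTAC"))            # 8: "TTAC" matches exactly
print(sw_score_wavefront("GATTACA", "TTAC"))  # 8
```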
14
Motivation: Similar Problems
- HMMer
- BLAST
- RNA secondary structure prediction
- Smith-Waterman sequence alignment
15
Our Solution: Softcore Vector Processor
- Massively parallel
- Software programmable
- Configurable instantiation
Why softcore?
- Optimize for specific applications
- Adapt to changes in algorithms
- FPGA technology improves with time
16
Architectural Overview
- Streaming architecture
- Memory-mapped FIFOs
  - Read-once data
  - Write-once data
  - Provide communication between components
[Figure: Software, DMA, and SVP Functional Unit blocks linked by streaming FIFOs]
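To make the read-once/write-once semantics concrete, here is a small Python model. It is a sketch, not the SVP's interface: StreamFifo and functional_unit are hypothetical names, a deque stands in for a hardware FIFO, and the DMA transfers are reduced to plain writes and reads. Each word is consumed by the read that returns it, so no component rereads or overwrites stream data.

```python
from collections import deque


class StreamFifo:
    """Hypothetical model of a memory-mapped FIFO: each word is written once and read once."""

    def __init__(self):
        self._q = deque()

    def write(self, word):
        self._q.append(word)       # producer appends; earlier words are never overwritten

    def read(self):
        return self._q.popleft()   # reading consumes the word (read-once)

    def __len__(self):
        return len(self._q)


def functional_unit(fifo_in, fifo_out, op):
    """Drain the input stream, apply op to each word, and stream results out."""
    while len(fifo_in) > 0:
        fifo_out.write(op(fifo_in.read()))


# Software/DMA side: push data in, let the unit process it, pull results out
to_fu, from_fu = StreamFifo(), StreamFifo()
for word in [1, 2, 3]:
    to_fu.write(word)
functional_unit(to_fu, from_fu, lambda x: x * 10)
print([from_fu.read() for _ in range(len(from_fu))])  # [10, 20, 30]
```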
18
Functional Unit
[Figure: Instruction Controller with instruction memory; an array of Processing Elements, each with its own register file; Memory Controller with shared local memory; Stream In and Stream Out ports]
19
Processing Elements: SIMD Instruction Broadcast
The Instruction Controller broadcasts a single instruction, addi R5, R1, 10, to every Processing Element, and each PE executes it on its own register file. R0 holds 0 in every PE, while R1 holds 0, 1, and 2 in the three PEs, so R5 becomes 10, 11, and 12 respectively.
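The broadcast behaviour can be modelled in a few lines of Python. This is an illustrative sketch: the six-register file mirrors only what the slide draws, R0 being hard-wired to zero is an assumption based on the slide showing R0 = 0 in every PE, and broadcast_addi is a hypothetical helper.

```python
NUM_PES = 3
NUM_REGS = 6  # R0..R5, as drawn on the slide; the real register file is larger

# One register file per PE; R0 is assumed to be a hard-wired zero register
pe_regs = [[0] * NUM_REGS for _ in range(NUM_PES)]
for pe_id, regs in enumerate(pe_regs):
    regs[1] = pe_id  # R1 differs per PE: 0, 1, 2


def broadcast_addi(pe_regs, rd, rs, imm):
    """The Instruction Controller issues one addi; every PE applies it to its own registers."""
    for regs in pe_regs:
        if rd != 0:  # writes to R0 are ignored (assumed zero register)
            regs[rd] = regs[rs] + imm


broadcast_addi(pe_regs, rd=5, rs=1, imm=10)  # addi R5, R1, 10
print([regs[5] for regs in pe_regs])         # [10, 11, 12]
```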
20
Processing Elements: SIMD Instruction Broadcast (load)
The Instruction Controller broadcasts a load (Ld) that reads into R2 from the address held in R3. Because the address comes from a PE register, every PE must hold the same pointer, ptr1, in its own R3.
21
Processing Elements: Instruction Register Broadcast
With instruction register broadcast, ptr1 is stored once, in the Instruction Controller's register IR3. The instruction ldir R2, R0(IR3) carries the value of IR3 to every PE along with the opcode, and each PE loads into R2 using its zero register R0 as the offset, so no PE needs its own copy of the pointer.
Reported result: 40% register savings.
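A Python sketch contrasting the two addressing styles. Assumptions: the operand form (destination, offset register, base register in parentheses) follows the sample code later in the deck, while the memory contents, the value of ptr1, and the Instruction Controller register-file size are made up for illustration. The sketch shows only that both forms load the same value while ldir leaves R3 free in every PE; it does not reproduce the 40% figure.

```python
NUM_PES = 3
PTR1 = 100
SHARED_MEM = {PTR1: 42}  # hypothetical shared-local-memory contents


def fresh_pes():
    return [[0] * 6 for _ in range(NUM_PES)]  # R0..R5 per PE, R0 = 0


# Plain load: the base pointer must sit in a PE register, so each PE holds a copy of ptr1
plain = fresh_pes()
for regs in plain:
    regs[3] = PTR1                             # R3 = ptr1 in every PE
for regs in plain:                             # load R2 from address R0 + R3
    regs[2] = SHARED_MEM[regs[0] + regs[3]]

# ldir: ptr1 is stored once in the Instruction Controller register IR3 and broadcast
ic_regs = [0] * 16                             # hypothetical IC register-file size
ic_regs[3] = PTR1
broadcast = fresh_pes()
for regs in broadcast:                         # ldir R2, R0(IR3)
    regs[2] = SHARED_MEM[regs[0] + ic_regs[3]]

print([r[2] for r in plain], [r[2] for r in broadcast])  # [42, 42, 42] [42, 42, 42]
print([r[3] for r in broadcast])                         # [0, 0, 0]: R3 stays free
```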
23
Processing Element Datapath
[Figure: register file read through the Ra and Rb address ports; a data-select stage choosing among the PE's own Ra/Rb data, the left and right neighbors' Ra/Rb data, and an immediate; pipeline register; ALU; pipeline register; compare stage driving the write enables (including left and right write enables); Memory Controller interface with a memory write enable. The slide steps through the example instruction bmseti R17 EQ 16.]
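The sample code later in the deck brackets instructions with bmseti ... bmend. A plausible reading, given that the datapath's compare stage feeds the write enables, is that bmseti evaluates a per-PE condition and only PEs where it holds write back results until bmend. The Python sketch below models that assumed behaviour; the register numbers and helper names are hypothetical.

```python
NUM_PES = 4
NUM_REGS = 20      # assumed register-file size
PE_ID_REG = 17     # hypothetical: register holding each PE's index
PE_MEM_REG = 18    # hypothetical: register receiving loaded data

pe_regs = [[0] * NUM_REGS for _ in range(NUM_PES)]
for pe_id, regs in enumerate(pe_regs):
    regs[PE_ID_REG] = pe_id
    regs[PE_MEM_REG] = 100 + pe_id  # arbitrary data so the effect is visible


def masked_block(pe_regs, cond_reg, imm, body):
    """Model of `bmseti cond_reg EQ imm ... bmend` under the assumed semantics."""
    enabled = [regs[cond_reg] == imm for regs in pe_regs]  # bmseti: per-PE enable bit
    for regs, on in zip(pe_regs, enabled):
        if on:  # disabled PEs see the instruction but do not write back
            body(regs)
    # bmend: every PE is enabled again after the block


def addi_r3_mem(regs):
    regs[3] = regs[PE_MEM_REG] + 0  # addi %r3, PE_MEM_REG, 0


# bmseti PE_ID_REG EQ PE_NUM_ELEMENTS - 1: only the last PE executes the body
masked_block(pe_regs, PE_ID_REG, NUM_PES - 1, addi_r3_mem)
print([regs[3] for regs in pe_regs])  # [0, 0, 0, 103]
```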
26
Memory Controller
- Dual-ported block RAM, accessed by the Instruction Controller (IC) and PEs 0-3
- Reads: single cycle
- Writes: multiple cycles
28
Instruction Set Architecture
- Custom ISA
- Two sets of instruction types: Instruction Controller and Processing Element
- Optimized for target applications: max, min, loop
- Expandable: core vs. application-specific instructions
29
Sample Code
Two versions of the same loop. The first places a nop between each dependent pair of instructions (subir/max on %r8, icaddi/ldir on %ir15, ldir/addi on PE_MEM_REG):

    _query_loop:
        subir   %r8, %r3, %ir10
        nop
        max     %r4, %r4, %r8
        add     %r3, %r19, PE_ZERO_REG
        bmseti  PE_ID_REG EQ PE_NUM_ELEMENTS - 1
        icaddi  %ir15, %ir8, PE_NUM_ELEMENTS - 1
        nop
        ldir    PE_MEM_REG, PE_ZERO_REG(%ir15)
        nop
        addi    %r3, PE_MEM_REG, 0
        bmend
        ld      PE_MEM_REG, PE_ZERO_REG(DB_ADDRESS)
        icaddi  %ir7, %ir7, 1
        icaddi  %ir9, %ir9, 1
        icloop  %ir4, %ir5, _query_loop

The second issues the same 12 instructions in a different order, so no nops are needed:

    _query_loop:
        icaddi  %ir15, %ir8, PE_NUM_ELEMENTS - 1
        subir   %r8, %r3, %ir10
        add     %r3, %r19, PE_ZERO_REG
        ldir    PE_MEM_REG, PE_ZERO_REG(%ir15)
        max     %r4, %r4, %r8
        bmseti  PE_ID_REG EQ PE_NUM_ELEMENTS - 1
        icaddi  %ir7, %ir7, 1
        icaddi  %ir9, %ir9, 1
        addi    %r3, PE_MEM_REG, 0
        bmend
        ld      PE_MEM_REG, PE_ZERO_REG(DB_ADDRESS)
        icloop  %ir4, %ir5, _query_loop
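Assuming one instruction issues per cycle (the slides do not give the issue rate), a quick Python check confirms that both listings contain the same 12 non-nop instructions and estimates the gain from reordering. The listings are copied from the slide; the helper name is hypothetical.

```python
from collections import Counter

UNSCHEDULED = """
_query_loop:
    subir %r8, %r3, %ir10
    nop
    max %r4, %r4, %r8
    add %r3, %r19, PE_ZERO_REG
    bmseti PE_ID_REG EQ PE_NUM_ELEMENTS - 1
    icaddi %ir15, %ir8, PE_NUM_ELEMENTS - 1
    nop
    ldir PE_MEM_REG, PE_ZERO_REG(%ir15)
    nop
    addi %r3, PE_MEM_REG, 0
    bmend
    ld PE_MEM_REG, PE_ZERO_REG(DB_ADDRESS)
    icaddi %ir7, %ir7, 1
    icaddi %ir9, %ir9, 1
    icloop %ir4, %ir5, _query_loop
"""

SCHEDULED = """
_query_loop:
    icaddi %ir15, %ir8, PE_NUM_ELEMENTS - 1
    subir %r8, %r3, %ir10
    add %r3, %r19, PE_ZERO_REG
    ldir PE_MEM_REG, PE_ZERO_REG(%ir15)
    max %r4, %r4, %r8
    bmseti PE_ID_REG EQ PE_NUM_ELEMENTS - 1
    icaddi %ir7, %ir7, 1
    icaddi %ir9, %ir9, 1
    addi %r3, PE_MEM_REG, 0
    bmend
    ld PE_MEM_REG, PE_ZERO_REG(DB_ADDRESS)
    icloop %ir4, %ir5, _query_loop
"""


def instructions(listing: str) -> list:
    """Instruction lines with whitespace normalised; the label line is dropped."""
    lines = (" ".join(line.split()) for line in listing.strip().splitlines())
    return [line for line in lines if line and not line.endswith(":")]


before = instructions(UNSCHEDULED)
after = instructions(SCHEDULED)

# Same work, different order: removing the nops leaves identical instruction multisets
assert Counter(i for i in before if i != "nop") == Counter(after)

print(len(before), len(after))                                                # 15 12
print(f"{1 - len(after) / len(before):.0%} fewer issue slots per iteration")  # 20% fewer issue slots per iteration
```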
30
Results
- VHDL implementation, simulated and synthesized
- Smith-Waterman: 16-PE version tested
- Metric: millions of cell updates per second (MCUPS)
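Smith-Waterman fills one matrix cell per pair of query and database characters, so throughput is usually reported as MCUPS = (m × n) / (t × 10^6) for an m-character query, an n-character database, and a runtime of t seconds. A minimal Python sketch of that conversion; the workload sizes are hypothetical.

```python
def mcups(query_len: int, db_len: int, seconds: float) -> float:
    """Millions of cell updates per second: (m * n) cells / (t seconds * 1e6)."""
    return query_len * db_len / (seconds * 1e6)


def seconds_needed(query_len: int, db_len: int, rate_mcups: float) -> float:
    """Inverse: time to fill an m x n matrix at a given MCUPS rate."""
    return query_len * db_len / (rate_mcups * 1e6)


# Hypothetical workload: a 1,000-character query against a 52-million-character database
print(mcups(1_000, 52_000_000, 1_000.0))      # 52.0
print(seconds_needed(1_000, 52_000_000, 52))  # 1000.0
```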
31
Smith-Waterman Speedup

    System   Freq     MCUPS  Speedup
    P4       1.8 GHz  15     1
    SVP16    150 MHz  52     3.47
    SVP32    150 MHz  103    6.87
    SVP64    125 MHz  167    11.13
    SVP128   120 MHz  302    20.13
    SVP128   150 MHz  378    25.20
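The Speedup column is each system's MCUPS divided by the P4's 15 MCUPS. A short Python check reproduces it from the table's own numbers; the dictionary and function names are illustrative.

```python
BASELINE_MCUPS = 15  # P4 at 1.8 GHz

SVP_MCUPS = {
    "SVP16 @ 150 MHz": 52,
    "SVP32 @ 150 MHz": 103,
    "SVP64 @ 125 MHz": 167,
    "SVP128 @ 120 MHz": 302,
    "SVP128 @ 150 MHz": 378,
}


def speedup(mcups, baseline=BASELINE_MCUPS):
    """Speedup over the baseline = ratio of cell-update rates, rounded like the table."""
    return round(mcups / baseline, 2)


for system, rate in SVP_MCUPS.items():
    print(f"{system}: {speedup(rate):.2f}x")  # 3.47x, 6.87x, 11.13x, 20.13x, 25.20x
```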
32
Comparative Performance

    System*       Freq     PEs/Chip  MCUPS/PE  Chips  MCUPS/Chip  Cost ($1000)  MCUPS/$1000
    SVP128        150 MHz  128       2.95      1      378         5             75
    SVP128        120 MHz  128       2.36      1      302         5             60
    SVP64         125 MHz  64        2.61      1      167         5             33
    SVP32         150 MHz  32        3.22      1      103         5             20
    Kestrel       20 MHz   64        0.78      8      50          25†           16
    GeneMatcher2  192 MHz  192       5.21      16     1000        69            14
    Fuzion 150    200 MHz  1536      1.63      1      2500        ??            ??

* Reference [1]
† Estimated
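The derived columns can be recomputed from the others: MCUPS/PE is MCUPS/Chip divided by PEs/Chip, and MCUPS/$1000 is the system total (Chips × MCUPS/Chip) divided by cost. The slide's MCUPS/$1000 values appear to be truncated rather than rounded (378 / 5 = 75.6 is listed as 75). The Python check below covers a subset of rows (GeneMatcher2 and Fuzion 150 are omitted); the helper name is illustrative.

```python
import math

# (system, PEs per chip, chips, MCUPS per chip, cost in $1000), copied from the table
ROWS = [
    ("SVP128 @ 150 MHz", 128, 1, 378, 5),
    ("SVP128 @ 120 MHz", 128, 1, 302, 5),
    ("SVP64 @ 125 MHz", 64, 1, 167, 5),
    ("SVP32 @ 150 MHz", 32, 1, 103, 5),
    ("Kestrel @ 20 MHz", 64, 8, 50, 25),
]


def derived(pes_per_chip, chips, mcups_per_chip, cost_k):
    """Return (MCUPS/PE rounded to 2 places, MCUPS/$1000 truncated to an integer)."""
    mcups_per_pe = round(mcups_per_chip / pes_per_chip, 2)
    system_mcups = chips * mcups_per_chip
    mcups_per_k = math.floor(system_mcups / cost_k)  # truncation matches the slide
    return mcups_per_pe, mcups_per_k


for name, *cols in ROWS:
    print(name, derived(*cols))
```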
33
Performance

    PEs  Freq (MHz)  Area (% of device)  BRAM
    16   150         13%                 22
    32   150         22%                 38
    64   125         41%                 70
    128  120         80%                 134

Hardware: Xilinx Virtex-4 LX200
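A quick check on the resource columns: in all four rows the BRAM count equals the number of PEs plus 6. One possible reading, which the slide does not state, is one block RAM per PE plus a fixed six for shared logic. The Python snippet below only verifies the arithmetic and prints area per PE.

```python
# (PEs, area as % of device, BRAM blocks), copied from the table
ROWS = [(16, 13, 22), (32, 22, 38), (64, 41, 70), (128, 80, 134)]

for pes, area, bram in ROWS:
    # BRAM - PEs is the same in every row, so BRAM grows one-for-one with PEs
    print(f"{pes:>3} PEs: BRAM - PEs = {bram - pes}, area per PE = {area / pes:.2f}%")
```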
34
Future Work
- Software development
  - How can HMMer and other systolic algorithms be implemented?
- ISA expansion
  - What additional instructions are needed?
  - What instructions can be added to optimize performance?
- Hardware development
  - How can we optimize the hardware to make it faster and smaller?
  - What hardware can we add to enhance performance?
  - How can we take advantage of advances in FPGAs, such as DSP48 slices?
35
Acknowledgments
Special thanks: Young Cho, Roger Chamberlain, Jeremy Buhler, Joseph Lancaster

References
[1] Di Blas et al., "The Kestrel Parallel Processor," IEEE Transactions on Parallel and Distributed Systems, January 2005.
[2] A. Jacob et al., "Whole Genome Comparison Using Commodity Workstations," Technical Report, 2003.
36
Questions?
Team ASP: Brandon Harris, Arpith Jacob