Published by Robyn Thornton. Modified over 9 years ago.
1
Massively Parallel Computing for Protein Alignment
Bertil Schmidt
School of Computer Engineering, Nanyang Technological University, Singapore
2
Contents
- Motivation
- Smith-Waterman Algorithm
- Parallelization on the Hybrid Architecture
- Parallelization on the Fuzion 150
- Performance Evaluation
- Conclusion and Future Work
3
Motivation
- Genetic sequence databases are growing exponentially
- This growth rate will continue for the foreseeable future: multiple concurrent genome projects have begun, with more to come
4
Motivation
- Discovered sequences are analyzed by comparison with databases
- Complexity of sequence comparison is proportional to the product of query size and database size
- Analysis is too slow on sequential computers
- Two possible approaches
  - Heuristics, e.g. BLAST, FastA: the more efficient the heuristic, the worse the quality of the results
  - Parallel processing: high-quality results in reasonable time
5
Full Genome Comparison
- The two organisms are related, but Tuberculosis causes a disease: find common and different parts
- 16 × 10^6 pairwise sequence comparisons
- Project with IMCB, Thomas Dick
- Genome sizes
  - 3918 protein sequences, 1,329,298 amino acids
  - 4289 protein sequences, 1,359,008 amino acids
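The figure of roughly 16 × 10^6 comparisons follows from pairing every protein of one genome with every protein of the other. A minimal sketch of that arithmetic is given here; the variable names are illustrative, and the DP cell count assumes every pair is aligned with a full dynamic-programming matrix.

```python
# Protein and amino-acid counts as given on the slide.
proteins_a, residues_a = 3918, 1_329_298
proteins_b, residues_b = 4289, 1_359_008

# Every protein of genome A is compared with every protein of genome B.
pairwise_comparisons = proteins_a * proteins_b

# Assumption: each comparison fills a full len(x) * len(y) DP matrix.
# Summed over all pairs this equals (total residues A) * (total residues B).
dp_cells = residues_a * residues_b

print(f"{pairwise_comparisons:,} pairwise comparisons")
print(f"{dp_cells:.2e} DP matrix cells in total")
```

The product of the two protein counts is 16,804,302, which rounds to the 16 × 10^6 quoted above.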
6
Protein Sequence Alignment
- BLAST, FastA, Smith-Waterman

  GGHSRLILSQLGEEG.RLLAIDRDPQAIAVAKT....IDDPRFSII
  |||::::| : |::| ||:::||||:|:|||::    ::| |::::
  GGHAERFL.E.GLPGLRLIGLDRDPTALDVARSRLVRFAD.RLTLV

[Figure: BLAST, FastA, and Smith-Waterman placed along two axes, search speed (slower to faster) and data quality (lower to higher)]
7
Smith-Waterman Algorithm
- Optimal local alignment of two sequences
- Performs an exhaustive search for the optimal local alignment
  - Complexity O(n·m) for sequence lengths n and m
- Based on the dynamic programming (DP) algorithm
  - Fill the DP matrix using a substitution (mutation) matrix
  - Find the maximal value (score) in the matrix
  - Trace back from the score until a 0 value is reached
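The three DP steps listed above (fill, find the maximum, trace back) can be sketched as follows. This is a minimal illustration, not the implementation used in the talk: it assumes a linear gap penalty and replaces the substitution matrix with a simple match/mismatch score. The function name and the default values `match=2`, `mismatch=-1`, `gap=1` are assumptions made for this example.

```python
def smith_waterman(s1, s2, match=2, mismatch=-1, gap=1):
    """Return (best score, aligned part of s1, aligned part of s2)."""
    rows, cols = len(s2) + 1, len(s1) + 1
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Step 1: fill the DP matrix (s1 across the columns, s2 down the rows).
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if s2[i - 1] == s1[j - 1] else mismatch
            h[i][j] = max(0,
                          h[i - 1][j - 1] + sub,  # extend both sequences
                          h[i - 1][j] - gap,      # gap in s1
                          h[i][j - 1] - gap)      # gap in s2
            # Step 2: remember the position of the maximal score.
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Step 3: trace back from the maximum until a 0 value is reached.
    i, j = best_pos
    out1, out2 = [], []
    while i > 0 and j > 0 and h[i][j] > 0:
        sub = match if s2[i - 1] == s1[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + sub:
            out1.append(s1[j - 1]); out2.append(s2[i - 1])
            i -= 1; j -= 1
        elif h[i][j] == h[i - 1][j] - gap:
            out1.append("-"); out2.append(s2[i - 1])
            i -= 1
        else:
            out1.append(s1[j - 1]); out2.append("-")
            j -= 1
    return best, "".join(reversed(out1)), "".join(reversed(out2))


print(smith_waterman("ATCTCGTATGATG", "GTCTATCAC"))
# → (10, 'TCGTATGA', 'TC-TATCA')
```

With these assumed scores, the two DNA sequences from the worked example in this deck yield a best local score of 10. A protein search would replace the match/mismatch test with a lookup in a substitution matrix.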
8
Smith-Waterman Algorithm
- Aligning S1 and S2 of lengths l1 and l2 using recurrences
- Calculate three possible ways to extend the alignment
  - by one amino acid (AA) in each sequence
  - by one AA in the first sequence, aligned with a gap in the second
  - by one AA in the second sequence, aligned with a gap in the first
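The recurrences themselves are not reproduced in this transcript. One standard affine-gap formulation that matches the three cases listed above is sketched here. The symbols H, E, F, the substitution function sbt, and the gap-open and gap-extend penalties α and β are assumptions of this sketch, not taken from the slide.

```latex
\begin{align*}
H(i,j) &= \max\bigl\{\,0,\; E(i,j),\; F(i,j),\; H(i-1,j-1) + \mathrm{sbt}(S1_i, S2_j)\,\bigr\} \\
E(i,j) &= \max\bigl\{\,H(i,j-1) - \alpha,\; E(i,j-1) - \beta\,\bigr\} \\
F(i,j) &= \max\bigl\{\,H(i-1,j) - \alpha,\; F(i-1,j) - \beta\,\bigr\}
\end{align*}
% for 1 <= i <= l1 and 1 <= j <= l2, with
% H(i,0) = E(i,0) = H(0,j) = F(0,j) = 0.
```

Here the diagonal term extends the alignment by one AA in each sequence, F extends it by one AA of the first sequence against a gap in the second, and E extends it by one AA of the second sequence against a gap in the first. With α = β the scheme reduces to a linear gap penalty.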
9
Smith-Waterman Algorithm
Align S1 = ATCTCGTATGATG and S2 = GTCTATCAC (gap parameters both equal to 1)

DP matrix, S1 across the columns and S2 down the rows. The values correspond to a score of +2 for a match and -1 for a mismatch; the maximal score is 10.

          A  T  C  T  C  G  T  A  T  G  A  T  G
       0  0  0  0  0  0  0  0  0  0  0  0  0  0
    G  0  0  0  0  0  0  2  1  0  0  2  1  0  2
    T  0  0  2  1  2  1  1  4  3  2  1  1  3  2
    C  0  0  1  4  3  4  3  3  3  2  1  0  2  2
    T  0  0  2  3  6  5  4  5  4  5  4  3  2  1
    A  0  2  1  2  5  5  4  4  7  6  5  6  5  4
    T  0  1  4  3  4  4  4  6  6  9  8  7  8  7
    C  0  0  3  6  5  6  5  5  5  8  8  7  7  7
    A  0  2  2  5  5  5  5  4  7  7  7 10  9  8
    C  0  1  1  4  4  7  6  5  6  6  6  9  9  8
10
Parallel Architectures for Bioinformatics
- Embedded Massively Parallel Accelerators
  - Fuzion 150: 1536 processors on a single chip
  - Other accelerators: Decypher, Biocellerator, GeneMatcher2, Kestrel, SAMBA
  - Systola 1024: PC add-on board with 1024 processors
11
Parallel Architectures for Bioinformatics
- Hybrid Computer
  - combines the SIMD and MIMD paradigms within one parallel architecture
[Figure: Hybrid Computer; labels: high-speed Myrinet switch, Systola 1024]
12
Previous Applications
- Volume Visualization
- Automatic Visual Quality Control (Opel)
- Cryptography
- Computer Tomography
- Video Compression
- Range of Transforms (Fourier, Wavelet, Hough, Radon)
- Computer Graphics
13
Architecture of Systola 1024
- Instruction Systolic Array:
  - 32 × 32 mesh of processing elements
  - wavefront instruction execution
14
Instruction Systolic Array
[Figure: instruction systolic array fed by a stream of instructions (+, -, *), row selectors, and column selectors]
- wavefront instruction execution: fast accumulation operations (e.g. row sum, broadcast, ringshift)
15
Parallelization of Smith-Waterman
- Matrix cells along a single diagonal are computed in parallel
- Comparison is performed in l1 + l2 - 1 steps on l1 PEs
[Figure: DP matrix for the example sequences ATCTCGTATGATG and GTCTATCAC with dimensions l1 and l2, mapped onto PEs P1 to P13]
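The diagonal-wise schedule can be simulated sequentially to check the step count. The sketch below is an illustration under stated assumptions, not the ISA program from the talk: it uses a linear gap penalty and a match/mismatch score (`match=2`, `mismatch=-1`, `gap=1` are assumed values), and the inner loop stands in for the work that the PEs would perform simultaneously, with PE j owning column j.

```python
def wavefront_score(s1, s2, match=2, mismatch=-1, gap=1):
    """Score-only Smith-Waterman, one anti-diagonal per parallel step.

    Returns (best score, number of parallel steps).
    """
    l1, l2 = len(s1), len(s2)
    h = [[0] * (l1 + 1) for _ in range(l2 + 1)]
    best, steps = 0, 0

    # Anti-diagonal d contains all cells (i, j) with i + j == d + 1.
    for d in range(1, l1 + l2):
        # Cells on one diagonal depend only on the two previous diagonals,
        # so every iteration of this loop could run on its own PE.
        for j in range(max(1, d + 1 - l2), min(l1, d) + 1):
            i = d + 1 - j
            sub = match if s1[j - 1] == s2[i - 1] else mismatch
            h[i][j] = max(0,
                          h[i - 1][j - 1] + sub,
                          h[i - 1][j] - gap,
                          h[i][j - 1] - gap)
            best = max(best, h[i][j])
        steps += 1
    return best, steps


print(wavefront_score("ATCTCGTATGATG", "GTCTATCAC"))
# → (10, 21)
```

For the example sequences (l1 = 13, l2 = 9) the simulation takes 13 + 9 - 1 = 21 steps, and at most 13 cells are active in any step.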
16
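Mapping onto Systola 1024
[Figure: the query sequence a = a0 ... a1023 is held in the 32 × 32 PE array, one character per PE; the subject sequence b = b0, b1, ..., bk is shifted through the array, followed by the next subject sequence c0, c1, ...]
- a: query sequence (length equal to 1024)
- b: subject sequence
- Subject sequences can be pipelined with only a one-step delay: k steps for a subject sequence of length k
- Efficient routing on the ISA: Row Ringshift and Broadcast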
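The benefit of pipelining subject sequences can be estimated with a simple step count. The model below is a rough sketch under assumptions of my own: a non-pipelined comparison costs l1 + k - 1 steps per subject sequence, while a pipelined scan costs k steps per subject sequence plus a one-time l1 - 1 steps to drain the array. The function name and the example lengths are hypothetical.

```python
def scan_steps(query_len, subject_lens, pipelined=True):
    """Estimated number of array steps to scan all subject sequences."""
    if not subject_lens:
        return 0
    if pipelined:
        # Each subject of length k adds k steps; the array drains once.
        return sum(subject_lens) + query_len - 1
    # Without pipelining every comparison pays the full wavefront latency.
    return sum(query_len + k - 1 for k in subject_lens)


subjects = [300, 500]  # hypothetical subject sequence lengths
print(scan_steps(1024, subjects, pipelined=True))   # 1823
print(scan_steps(1024, subjects, pipelined=False))  # 2846
```

Under this model the per-sequence overhead of the query length disappears once the pipeline is full, which is why long database scans approach k steps per subject sequence.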
17
Performance Evaluation
- Scan times in seconds for TrEMBL 14 (351,834 protein sequences) for various query sequence lengths

  Query sequence length               256    512   1024   2048   4096
  Systola 1024, time (s)              294    577   1137   2241   4611
  Systola 1024, speedup to PIII 850     5      6      6      6      6
  Cluster of 16 Systolas, time (s)     20     38     73    142    290
  Cluster, speedup to PIII 850         81     86     91     94     94

- Parallel implementation scales linearly with sequence length and number of PCs
- Computing time dominates data transfer time
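The claim of linear scaling with the number of PCs can be checked against the scan times above. The sketch below computes the parallel efficiency of the 16-board cluster relative to a single Systola 1024. The variable and function names are illustrative, and because the published times are rounded to whole seconds the results are approximate.

```python
query_lengths = [256, 512, 1024, 2048, 4096]
single_board_s = [294, 577, 1137, 2241, 4611]   # Systola 1024 scan times
cluster_s = [20, 38, 73, 142, 290]              # cluster of 16 Systolas
NUM_BOARDS = 16


def cluster_efficiency(t_single, t_cluster, boards=NUM_BOARDS):
    """Speedup over one board divided by the number of boards."""
    return t_single / (t_cluster * boards)


efficiencies = [cluster_efficiency(s, c)
                for s, c in zip(single_board_s, cluster_s)]

for length, eff in zip(query_lengths, efficiencies):
    print(f"query length {length:4d}: efficiency {eff:.2f}")
```

The efficiency rises from about 0.92 for the shortest query to about 0.99 for the longest, consistent with computing time dominating data transfer time.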
18
Fuzion 150 Architecture
- 0.25-µm, single-chip SIMD architecture
- 1536 PEs @ 200 MHz: 300 GOPS
- 600 GB/s on-chip, 6.4 GB/s off-chip bandwidth
- Multithreading (control units interact via semaphores)
- Developed by Clearspeed Technology (UK) for graphics and network processing
[Figure: block diagram; labels: linear SIMD array (1536 PEs, each with 2 Kbytes DRAM), FUZION bus, 32-bit EPU (ARC), video I/O, display, instruction fetch, SIMD controller, local memory, 1, 2 or 4 channels (6.4 GB/s), host, AGP, Rambus]
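The headline figures can be cross-checked with a little arithmetic. The sketch below assumes that each PE completes one operation per clock cycle, which is an assumption of this example rather than a statement from the slide. The constant names are illustrative.

```python
NUM_PES = 1536
CLOCK_HZ = 200_000_000          # 200 MHz
ON_CHIP_BYTES_PER_S = 600e9     # 600 GB/s
OFF_CHIP_BYTES_PER_S = 6.4e9    # 6.4 GB/s

# Assumption: one operation per PE per cycle.
peak_gops = NUM_PES * CLOCK_HZ / 1e9

# On-chip bandwidth available to each PE in every cycle.
bytes_per_pe_cycle = ON_CHIP_BYTES_PER_S / (NUM_PES * CLOCK_HZ)

# How much faster data moves on chip than off chip.
bandwidth_ratio = ON_CHIP_BYTES_PER_S / OFF_CHIP_BYTES_PER_S

print(f"peak throughput: {peak_gops:.1f} GOPS")
print(f"on-chip bandwidth: {bytes_per_pe_cycle:.2f} bytes per PE per cycle")
print(f"on-chip / off-chip bandwidth: {bandwidth_ratio:.1f}x")
```

Under that assumption the peak is 307.2 GOPS, in line with the quoted 300 GOPS, and on-chip bandwidth is almost two orders of magnitude above off-chip bandwidth.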
19
Fuzion 150 Architecture
[Figure: PE array organized as 6 blocks (Block 0 to Block 5) of 256 PEs each, PE (0,0) ... PE (5,255), attached to the Fuzion bus and local memory. Each PE has an 8-bit ALU, a 32-byte register file, and 2 KByte of DRAM PE memory, and connects to its left and right neighbor PEs, the instruction stream, and the block I/O channel.]
20
Fuzion 150 - Debugger
21
Mapping onto the Fuzion 150
[Figure: the query sequence a = a0 ... a1535 is distributed over Block 0 to Block 5; the subject sequence b = b0, b1, ..., bk is shifted through the PEs, followed by the next subject sequence c0, c1, ...]
- a: query sequence (length equal to 1536)
- b: subject sequence
- No fast global communication: 2-step local communication
- Subject sequences can be pipelined with only a one-step delay
22
Performance Evaluation
- Scan times in seconds for TrEMBL 14 (351,834 protein sequences) for various query sequence lengths

  Query sequence length               256    512   1024   2048   4096
  Fuzion 150, time (s)                 12     22     42     82    162
  Fuzion 150, speedup to PIII 850     136    151    157    163    165

- Parallel implementation scales linearly with sequence length
- Computing time dominates data transfer time
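The linear scaling with query length can be made concrete by fitting a straight line to the Fuzion 150 scan times above. The sketch uses an ordinary least-squares fit written out by hand; the function name is illustrative, and the fit only describes the five published (rounded) data points.

```python
query_lengths = [256, 512, 1024, 2048, 4096]
scan_times_s = [12, 22, 42, 82, 162]   # Fuzion 150 scan times


def linear_fit(xs, ys):
    """Least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = linear_fit(query_lengths, scan_times_s)
print(f"about {slope * 256:.1f} s per 256 query residues, "
      f"plus {intercept:.1f} s fixed cost")
```

The five points lie on the line t = 2 + 10 · (length / 256) seconds, so doubling the query length roughly doubles the scan time apart from a small fixed cost.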
23
Performance Evaluation
- Normalized time comparison for a 10 Mbase search on different parallel architectures with different query lengths
  - 4 times faster than the 16K-PE MasPar
  - 6 times faster than Kestrel
  - 5 times faster than SAMBA (special-purpose 3-board architecture)
24
Conclusions and Future Work
- Demonstrated how fine-grained parallel architectures can be applied efficiently for comparative genomics
- Significant runtime savings for full genome comparisons and database searching
  - More discovery is possible at a good price-performance ratio
- Accelerating other bioinformatics applications, e.g. Hidden Markov Models
- Build a next-generation architecture at the Center for High Performance Embedded Systems, NTU
- Integration of accelerators in a Grid environment