Biosequence Similarity Search on the Mercury System
Praveen Krishnamurthy, Jeremy Buhler, Roger Chamberlain, Mark Franklin, Kwame Gyang, and Joseph Lancaster
Department of Computer Science and Engineering, Washington University in Saint Louis, MO
Supported by an NIH STTR grant and NSF grants DBI, ITR, and CCR
Slide 2: Outline
Overview of BLAST
Overview of the Mercury system
Description of the BLASTN algorithm
Algorithmic changes to BLASTN
Improvement in performance
Related work
Conclusion
Slide 3: Basic Local Alignment Search Tool (BLAST)
Biosequence comparison software
– Compares a query sequence (e.g., a new genome) against a large database of known biosequences
– Looks for similar regions
Exponential growth of genomic databases
– Searches take longer and longer to complete
– Solutions: perform the comparison across multiple machines, or use specialized hardware (our approach)
Slide 4: The Mercury System
Slide 5: The Mercury System
Proximity to disk
– Simple operations are performed close to the disk, avoiding CPU use
– 400 Mbytes/s of throughput from the disk
Concurrent, independent operation
– Does not use processor cache cycles, memory, or I/O buses
Reconfigurable logic
– Logic can be tuned to the particular needs of the application
Slide 6: BLASTN
– Both the query and the database are long DNA strings
– Strings consist of {A, C, T, G} plus some unknown bases
Each successive stage of the pipeline processes less data
The stages become progressively more computationally expensive
Slide 7: BLASTN – Terminology
Query:    … ACTGTGTTTCACTGACGGGTGT …
Database: … CTGTGTCCCCAACACTGCTGACGTAGAATCGTGTAG …
A 'w-mer' is a sequence of 'w' consecutive bases
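As a small illustration of this terminology, the sketch below (Python; the extract_wmers helper is a name introduced here, not something from the slides) enumerates every w-mer of a sequence:

```python
def extract_wmers(seq, w=11):
    """Yield (offset, w-mer) pairs for every window of w consecutive bases."""
    for i in range(len(seq) - w + 1):
        yield i, seq[i:i + w]

# Example: the 11-mers of the query fragment shown above
query = "ACTGTGTTTCACTGACGGGTGT"
for offset, wmer in extract_wmers(query):
    print(offset, wmer)
```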
Slide 8: BLASTN Pipeline – Stage 1
Matches each '11-mer' in the query against the database
– Exact string matching
83% of the overall time is spent in this stage
Filters out 92% of the data entering this stage
– Only 8% of the data proceeds to the next stage
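A minimal software sketch of this word-matching stage (not the Mercury firmware): index the query's 11-mers, then stream the database and report every exact hit. The names build_query_index and find_word_matches are illustrative, not taken from the slides.

```python
def build_query_index(query, w=11):
    """Map each w-mer of the query to the list of query offsets where it occurs."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    return index

def find_word_matches(database, index, w=11):
    """Yield (db_offset, query_offset) for every exact w-mer match."""
    for j in range(len(database) - w + 1):
        for i in index.get(database[j:j + w], []):
            yield j, i
```

Each reported pair is a candidate match that stage 2 then tries to extend.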
Slide 9: BLASTN Pipeline – Stage 2
Extends the matches from stage 1
Query:    … ACTGTGTTTCACTGACGGGTGT …
Database: … GTGTCCCCAACATTTCACTGACGAGAATCGTGTAG …
Slide 10: BLASTN Pipeline – Stage 2
Extends the matches from stage 1
– Allows mismatches of individual bases
– Does not allow gaps in either the query or the database
– The match score must exceed a threshold for the match to proceed
16% of the pipeline time is spent in this stage
Only 2 in 100,000 of the matches entering this stage proceed to the next stage
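A rough software sketch of this kind of ungapped extension: extend the exact seed in both directions, allowing mismatches but no gaps, and stop a direction once the score falls too far below its best. The +1/-3 match/mismatch scores, the x_drop cutoff, and the helper name are illustrative assumptions, not parameters given in the slides.

```python
def ungapped_extend(query, db, q_off, d_off, w=11,
                    match=1, mismatch=-3, x_drop=10):
    """Extend an exact w-mer hit in both directions without inserting gaps."""

    def extend(qi, di, step):
        # Walk outward one base at a time; stop when the running score
        # drops more than x_drop below the best score seen in this direction.
        best = run = 0
        while 0 <= qi < len(query) and 0 <= di < len(db):
            run += match if query[qi] == db[di] else mismatch
            best = max(best, run)
            if best - run > x_drop:
                break
            qi += step
            di += step
        return best

    seed = w * match                        # the seed itself matches exactly
    right = extend(q_off + w, d_off + w, +1)
    left = extend(q_off - 1, d_off - 1, -1)
    return seed + right + left

# A hit proceeds to stage 3 only if this score exceeds the stage-2 threshold.
```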
Slide 11: BLASTN Pipeline – Stage 3
Extends the matches from stage 2
Query:    … ACCACTGTTTCACTGACG_GA_T_GT …
Database: … CTGTGTCCCCAC_GTTTCACTGACGAGAATCGTGTAG …
Slide 12: BLASTN Pipeline – Stage 3
Extends the matches from stage 2
– Scores matches with gaps inserted in both sequences
– Uses the Smith-Waterman dynamic programming algorithm
<1% of the pipeline time is spent in this stage
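For reference, a compact Smith-Waterman local alignment score computation with linear gap costs. The +1/-3/-2 scoring values are illustrative assumptions, and this full-matrix version is a sketch of the algorithm, not the tuned implementation behind the <1% figure.

```python
def smith_waterman_score(query, db, match=1, mismatch=-3, gap=-2):
    """Best local alignment score between query and db (O(|query|*|db|) DP).

    Only two rows of the matrix are kept, since we need the score rather
    than the alignment itself.
    """
    prev = [0] * (len(db) + 1)
    best = 0
    for qc in query:
        curr = [0]
        for j, dc in enumerate(db, start=1):
            sub = prev[j - 1] + (match if qc == dc else mismatch)
            cell = max(0, sub, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```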
Slide 13: NCBI BLASTN
Stage 1 (word matching) is implemented as a lookup table
– Efficient only for certain word lengths (w = 11)
Performance degrades dramatically for larger query sizes
Throughput (Mbases/s) measured on a 2.6 GHz Pentium-4 with 1 Gbyte RAM, for query sizes of 10 Kbases, 100 Kbases, and 1 Mbases
Slide 14: Firmware Implementation – Stage 1
Bloom filters → hash lookup → redundancy eliminator
– Bloom filters: match '11-mers' against the query, but generate false positives
– Hash lookup: eliminates the false positives from the Bloom filters and obtains the match's offset in the query
– Redundancy eliminator: discards matches that are close to one another
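A software analogue of this three-step flow, assuming the query index from the sketch after slide 8 and a might_contain predicate like the one sketched after slide 17; the min_distance rule (drop a match that starts within min_distance bases of the previously emitted one) is an illustrative interpretation of the redundancy eliminator, not its documented behavior.

```python
def stage1_pipeline(database, bloom, query_index, w=11, min_distance=16):
    """Bloom filter -> hash lookup -> redundancy elimination, in software."""
    last_emitted = -min_distance
    for j in range(len(database) - w + 1):
        wmer = database[j:j + w]
        if not bloom.might_contain(wmer):        # cheap first-pass filter
            continue
        for q_off in query_index.get(wmer, []):  # hash lookup removes false positives
            if j - last_emitted < min_distance:  # assumed redundancy rule
                break
            last_emitted = j
            yield j, q_off
```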
Slide 15: Bloom Filter Operation
Programming the query into the Bloom filter (query preprocessing):
each '11-mer' of the query is fed to k hash functions, which set the addressed bits of an 'm-bit' vector
Slide 16: Bloom Filter Operation
Finding matches in the database:
each '11-mer' of the database is hashed with the same k hash functions, and the addressed bits of the 'm-bit' vector are tested
– All bits 1: potential match
– Any bit 0: not a match
Slide 17: Bloom Filter Operation
Finding matches in the database (continued):
– All bits 1: potential match (false positives are possible and are eliminated using a hash table)
– Any bit 0: not a match
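A minimal, software-only sketch of programming and probing such a Bloom filter. The double-hashing scheme built on SHA-256 and the m and k values are illustrative assumptions; they are not the hash functions or parameters used in the Mercury firmware.

```python
import hashlib

class BloomFilter:
    """m-bit Bloom filter probed with k hash functions (double hashing)."""

    def __init__(self, m=1 << 20, k=4):
        self.m, self.k, self.bits = m, k, bytearray(m)

    def _positions(self, wmer):
        digest = hashlib.sha256(wmer.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, wmer):
        """Program one query w-mer into the filter by setting k bits."""
        for p in self._positions(wmer):
            self.bits[p] = 1

    def might_contain(self, wmer):
        """All k bits set: potential match (maybe a false positive). Any bit 0: no match."""
        return all(self.bits[p] for p in self._positions(wmer))
```

Programming inserts every 11-mer of the query with add(); the database scan then calls might_contain() on each database 11-mer, and any hit is verified against the hash table as on slide 14.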
Slide 18: Bloom Filter Performance
Slide 19: Performance Analysis – Firmware vs. Software Stage 1
Throughput (Mbases/s) of NCBI BLASTN vs. Mercury BLASTN, and the resulting speedup, for query sizes of 10 Kbases, 100 Kbases, and 1 Mbases
Slide 20: Overall System Throughput
Throughput (Mbases/s) of NCBI BLASTN vs. Mercury BLASTN, and the resulting speedup, for query sizes of 10 Kbases, 100 Kbases, and 1 Mbases
Tput_overall = min(Tput_1, Tput_2&3)
Slide 21: Stage 2 in Firmware – Throughput
Slide 22: Stage 2 in Firmware – Speedup
Slide 23: Related Work
Hardware-based commercial systems
– Paracel GeneMatcher™: uses an ASIC and is hence inflexible
– RDisk: FPGA-based system with a stage-1 throughput of 60 Mbases/s
High-end commercial systems
– Paracel BLASTMachine2™: 32-CPU Linux cluster; 2.93 Mbases/s for a 2.8 Mbase query, 2 times faster than 1-node Mercury BLASTN
– TimeLogic DeCypherBLAST™: FPGA-based; 213 Kbases/s for a 16 Mbase query, comparable to 1-node Mercury BLASTN
Slide 24: Conclusion
BLASTN on the Mercury system
– Bloom filters improve the performance of stage 1 (efficient hash functions in hardware)
– 7x improvement in speed with only stage 1 in firmware
– >50x speedup with stage 2 also implemented in firmware
Future work
– Algorithmic changes to stage 2 to make efficient use of the hardware's capabilities
– Other applications: BLASTP, BLASTX, etc.
Slide 25: Thank you