1
Parallel System for BLAST
Department of Computer Engineering, Chulalongkorn University
Prabhas Chongstitvatana (ประภาส จงสถิตย์วัฒนา), Warin Wattanapornprom (วรินทร์ วัฒนพรพรหม)
2
Introduction
What is BLAST?
BLAST: Basic Local Alignment Search Tool
Finds matches between sequences of genetic material or proteins
Helps scientists make inferences about the structures and functions of proteins
Screens new sequences for further investigation
3
Introduction
Problem found
Searching a large number of queries against a large database takes a very long time
Genome databases are enormous and double in size every 1.3 years
4
Background Knowledge DNA
5
Background Knowledge
Gene Finding
Search by content
Search by signal
Search by sequence similarity (BLAST)
6
Background Knowledge BLAST
7
Background Knowledge BLAST Algorithm
8
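Background Knowledge
BLAST Algorithm
blastn -> compares a DNA (nucleotide) query against DNA sequences
blastx -> translates a DNA query in all 6 reading frames and compares the resulting protein sequences against a protein database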
9
Background Knowledge BLAST Algorithm
[Figure: example DNA sequence TTCTCGGACCTGGAGATTCACAG shown with its complementary strand AAGAGCCTGGACCTCTAAGTGTC at several offsets]
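The six reading frames mentioned for blastx can be illustrated with a short Python sketch. It is not BLAST's implementation: the function names are hypothetical, the input is assumed to contain only the letters A, C, G, and T, and the standard genetic code is assumed.

```python
# Illustrative sketch only; not how BLAST itself is implemented.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code: codon -> one-letter amino acid ('*' marks a stop codon).
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; trailing 1 or 2 bases are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def six_frame_translations(dna):
    """Three offsets on the given strand, then three on the reverse complement."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(s[offset:]) for s in strands for offset in range(3)]


for protein in six_frame_translations("TTCTCGGACCTGGAGATTCACAG"):
    print(protein)
```

Each of the six protein strings is a candidate to compare against a protein database, which is why a blastx search does more work per query than a blastn search.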
10
Performance
In-memory search
Limited memory
Faster CPU
11
Cache memory
Small, fast-access memory embedded on the same chip as the CPU
Stores data and program instructions
Lets the CPU access data up to 10 times faster
12
Preliminary Study
13
Preliminary Study
No. of cuts | DB part no. | DB size (MB) | Time used (min) | Time used to concat (min) | Total time used (min)
No. of cuts = 1: 234 70
No. of cuts = 2: 109 17 38 125 20
No. of cuts = 3: 100 16 1.5 38.5 15 34
14
Preliminary Study
Speedup of the improved system
$\mathrm{SpeedUp} = \frac{\mathrm{Time(Original)}}{\mathrm{Time(Improved)}} = \frac{351}{180 + 5} \approx 1.9$ times
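The speedup arithmetic on this slide can be checked with a few lines of Python. The slide does not label the two terms in the denominator, so the sketch treats 180 and 5 only as two components of the improved running time, in minutes; the function name is hypothetical.

```python
def speedup(time_original, time_improved):
    """Ratio of the original running time to the improved running time."""
    if time_improved <= 0:
        raise ValueError("improved time must be positive")
    return time_original / time_improved


# Values taken from the slide: 351 minutes originally, 180 + 5 minutes improved.
print(round(speedup(351, 180 + 5), 1))  # 1.9
```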
15
Parallel BLAST
Separate databases
Separate queries
Hybrid
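One way to picture the three approaches is as different ways of cutting the same search into independent jobs. The sketch below is an illustration under assumptions: the function name, the strategy labels, and the part names are hypothetical and do not come from the presentation.

```python
from itertools import product


def work_units(db_parts, query_parts, strategy):
    """Return a list of (databases, queries) pairs, one per independent search job."""
    if strategy == "separate_databases":
        # Every job searches all queries against one database part.
        return [([db], list(query_parts)) for db in db_parts]
    if strategy == "separate_queries":
        # Every job searches one query part against the whole database.
        return [(list(db_parts), [q]) for q in query_parts]
    if strategy == "hybrid":
        # Every job searches one query part against one database part.
        return [([db], [q]) for db, q in product(db_parts, query_parts)]
    raise ValueError(f"unknown strategy: {strategy}")


for name in ("separate_databases", "separate_queries", "hybrid"):
    jobs = work_units(["DB1", "DB2", "DB3"], ["Q1", "Q2"], name)
    print(name, len(jobs))
```

The hybrid split produces the most jobs, and each job touches only one database part, which matters when a whole database does not fit in memory.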
16
Parallel System for BLAST
Parallel Query
Improving the performance of BLAST in a memory-limited environment
17
Parallel System for BLAST
18
Parallel System for BLAST
Considerations
Compatibility
Scalability
Fault tolerance
Flexibility
19
Parallel System for BLAST
Processes
Preprocessing: split the database, distribute the parts, and create an index for each part; split the queries
Distributing queries: distribute the split queries to each node
Postprocessing: merge the search outputs together
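The query-splitting and output-merging steps can be sketched as follows. This is a minimal illustration, assuming the queries arrive as FASTA text and that each partial output carries the index of the query it belongs to; the function names and the round-robin policy are assumptions, not details from the presentation.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) records."""
    records, header, lines = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(lines)))
            header, lines = line[1:], []
        elif header is not None:
            lines.append(line)
    if header is not None:
        records.append((header, "".join(lines)))
    return records


def split_round_robin(records, n_parts):
    """Preprocessing: deal the records into n_parts lists, one record at a time."""
    parts = [[] for _ in range(n_parts)]
    for i, record in enumerate(records):
        parts[i % n_parts].append(record)
    return parts


def merge_outputs(indexed_outputs):
    """Postprocessing: join (query_index, text) pairs in original query order."""
    return "\n".join(text for _, text in sorted(indexed_outputs))


records = read_fasta(">q1\nACGT\nAC\n>q2\nGG\n>q3\nTT\n")
print(split_round_robin(records, 2))
print(merge_outputs([(1, "hits for q2"), (0, "hits for q1")]))
```

Both steps run on a single machine before and after the distributed search, so they are sequential work regardless of how many nodes take part.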
20
Parallel System for BLAST
Data Structure
[Figure: two-dimensional table with dimensions Number of databases (DB1, DB2, ..., DBn) and Max Clients]
21
Parallel System for BLAST
Data Structure
[Figure: two-dimensional table with dimensions Number of databases (DB1, DB2, ..., DBn) and Number of sequences (Sequence1, Sequence2, ..., Sequencen)]
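The two Data Structure slides appear to describe two tables indexed by database: one with a slot per client and one with a cell per query sequence. A minimal sketch follows, assuming that -1 marks an empty cell (that value appears in the later walkthrough slide) and that cells otherwise hold client IDs; the function names are hypothetical.

```python
EMPTY = -1  # assumed marker for an unused cell


def make_tables(n_databases, max_clients, n_sequences):
    """Create the client table and the sequence table, all cells empty."""
    clients = [[EMPTY] * max_clients for _ in range(n_databases)]
    sequences = [[EMPTY] * n_sequences for _ in range(n_databases)]
    return clients, sequences


def register_client(clients, db_index, client_id):
    """Put client_id in the first empty slot of a database row; return the slot or None."""
    row = clients[db_index]
    for slot, value in enumerate(row):
        if value == EMPTY:
            row[slot] = client_id
            return slot
    return None  # the row already holds Max Clients entries


clients, sequences = make_tables(n_databases=3, max_clients=4, n_sequences=6)
register_client(clients, 0, "CID01")
print(clients[0])
```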
22
Parallel System for BLAST
BLASTServer
23
Parallel System for BLAST
BLASTClient
24
Parallel System for BLAST
[Figure: step-by-step walkthrough (steps 1 to 4) of the client and sequence tables. Cells hold client IDs (CID01 to CID06) or -1; rows are labelled Seq1 to Seq6.]
Captions:
BLASTClients CID01 to CID05 register in turn
BLASTClient CID06 registers and is assigned a task
BLASTServer assigns tasks to each client
Tasks for DB1 are finished; BLASTClients CID05 and CID06 are assigned a new database
Each BLASTClient finishes its task and requests a new task
Tasks for DB2 and DB3 are finished
BLASTClient CID01 disconnects from the system
Each BLASTClient finishes its task and requests a new task
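The walkthrough on this slide (clients register, request tasks, finish them, and may disconnect) can be approximated with a small task board. This is a sketch under assumptions, not the BLASTServer code: the class and method names are hypothetical, and returning a disconnected client's unfinished task to the front of the queue is one plausible fault-tolerance policy that the slides do not spell out.

```python
from collections import deque


class TaskBoard:
    """Hands out (database, sequence) tasks to clients on request."""

    def __init__(self, n_databases, n_sequences):
        self.pending = deque(
            (db, seq) for db in range(n_databases) for seq in range(n_sequences)
        )
        self.running = {}  # client id -> task currently assigned
        self.done = set()

    def request_task(self, client_id):
        """Mark the client's previous task finished and assign the next one, if any."""
        finished = self.running.pop(client_id, None)
        if finished is not None:
            self.done.add(finished)
        if not self.pending:
            return None
        task = self.pending.popleft()
        self.running[client_id] = task
        return task

    def disconnect(self, client_id):
        """Assumed policy: an unfinished task goes back to the front of the queue."""
        task = self.running.pop(client_id, None)
        if task is not None:
            self.pending.appendleft(task)


board = TaskBoard(n_databases=1, n_sequences=3)
print(board.request_task("CID01"))
print(board.request_task("CID02"))
board.disconnect("CID01")
print(board.request_task("CID02"))
```

Because clients pull work instead of receiving a fixed share up front, a slow or lost client delays only the task it was holding.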
25
Performance
No. of BLASTClients | Preprocessing time (min) | Search time (min) | Postprocessing time (min) | Total time (min) | Speedup (times)
1 | 5 | 421 | 10 | 436 | 1.84
2 | 5 | 191 | 10 | 206 | 3.9
3 | 5 | 122 | 10 | 137 | 5.8
4 | 5 | 88 | 10 | 103 | 7.8
5 | 5 | 78 | 10 | 93 | 8.6
6 | 5 | 65 | 10 | 80 | 10
7 | 5 | 58 | 10 | 73 | 11
8 | 5 | 51 | 10 | 66 | 12
26
Performance
27
Performance
28
Performance
29
Performance
30
Performance
31
Conclusion
The superlinear speedup comes from eliminating swapping time.
The performance of the system is limited because preprocessing and postprocessing are still sequential tasks.
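The limit imposed by the sequential steps can be estimated from the performance table (5 minutes of preprocessing, 10 minutes of postprocessing, 421 minutes of search with one client). The sketch below uses an idealized model in which only the search phase divides evenly among clients. That model is an assumption for illustration, and its speedups are relative to the one-client run, not to the baseline used for the speedup column in the slides.

```python
PRE_MIN = 5.0         # preprocessing time from the performance table
POST_MIN = 10.0       # postprocessing time from the performance table
SEARCH_1_MIN = 421.0  # search time with one client


def modeled_total(n_clients, search_1=SEARCH_1_MIN, pre=PRE_MIN, post=POST_MIN):
    """Total time if only the search phase is divided evenly among clients."""
    if n_clients < 1:
        raise ValueError("need at least one client")
    return pre + search_1 / n_clients + post


def relative_speedup(n_clients):
    """Speedup of the model relative to its own one-client run."""
    return modeled_total(1) / modeled_total(n_clients)


def speedup_limit():
    """Upper bound as the number of clients grows: only sequential work remains."""
    return modeled_total(1) / (PRE_MIN + POST_MIN)


for n in (1, 2, 4, 8, 64):
    print(n, round(relative_speedup(n), 2))
print("limit", round(speedup_limit(), 2))
```

Under this model, adding clients can never push the relative speedup past the ratio of the one-client total to the 15 sequential minutes, which is the bottleneck the conclusion points to.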
32
Suggestions for further study
Support more BLAST algorithms and parameters
Add a recovery system in case the BLASTServer or SharedStorage crashes
Estimate the search time
Improve performance by eliminating the sequential tasks
Distribute databases dynamically during the search
33
More Information Master thesis: Warin Wattanapornprom
A parallel processing system for a BLAST program Faculty of Engineering, Chulalongkorn University, 2002 ISBN
34
Thank you for your attention