TurboBLAST: A Parallel Implementation of BLAST Built on the TurboHub
Bin Gan
CMSC 838 Presentation


2. Motivation (CMSC 838T – Presentation)
- NCBI BLAST on a single processor has become too costly, inefficient, and time-consuming.
  - Sequence databases are exploding in size.
  - They are growing at an exponential rate,
  - which exceeds the rate of increase in hardware capabilities (Moore's Law).
  - Thrashing and buffer-management issues
- Goals
  - Faster results for life science laboratories
  - Do not change the BLAST algorithm
  - Avoid using costly multiprocessor machines
  - Use cheap alternatives: clusters of machines

3. Talk Overview
- Motivation
- Techniques
  - Database partitioning
  - Use of the sequential BLAST
  - Merging results
  - TurboHub infrastructure
- Evaluation
  - 3 test runs and analysis
- Related work
  - PowerBLAST
  - Paracel's BLAST Machine
  - mpiBLAST
  - Words of Bill Pearson (author of FASTA)
- Observations

4. Techniques
- Approach
- Main intuition
- Implementation
  - Clients, master, and workers
- TurboHub system
  - Load balance
  - Fault recovery
- Dynamic database partitioning
  - Binary tree analogy

5 CMSC 838T – Presentation Techniques u Approach  Split databases instead of query sequences in binary tree fashion  Algorithms to decide how to split with the goal of balance overhead & load  Each processor runs complete sequential BLAST using database subsets  Merge the result into XML format  Adjust BLAST statistics for database sizes  TurboHub provide backend support for scheduling, fault recovery, etc. u Main Intuitions  Divide and Conquer  BLAST compares target sequence with each sequence in the database individually  Very little communication is needed, and the communication is not order dependant  Easy to achieve parallelism by splitting the database and assembling the result

6 CMSC 838T – Presentation Techniques u Implementations  3 tier system  Client l End user submitting job to the system  Master l Java application accepts the job l Sets up for processing l Uses TurboHub u Manage task execution u Coordinate the workers u Support dynamic change in set of workers, fault tolerances, etc.  Workers

7. Techniques
- Implementation (cont.)
  - Workers
    - Each has a local copy of NCBI blastall
    - Partition the database so that each resulting portion fits into available physical memory
    - Initial task groups of 10-20 query sequences against all the databases, to amortize startup cost
    - Some worker processes merge the results
    - Parse the output (stored in XML format)
    - Adjust BLAST statistics for database size
    - Scheduling uses Piranha models
      - Not discussed in the paper, but very important
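The slides do not say exactly how TurboBLAST adjusts the statistics. One standard way, assuming the Karlin-Altschul expectation E = K·m·n·e^(−λS) (m = query length, n = total database length, S = score), is to note that E grows linearly with n: a hit found in a partition of length n_i can be rescaled to the whole database by multiplying by n / n_i. The sketch ignores the effective-length (edge) corrections that real BLAST applies, and the K and λ values in the test are illustrative only.

```python
import math


def evalue(score, m, n, K, lam):
    # Karlin-Altschul expected number of chance hits scoring >= score
    # for a query of length m against a database of total length n.
    return K * m * n * math.exp(-lam * score)


def rescale_evalue(e_partition, partition_len, total_len):
    # E is linear in the database length n, so scale by the size ratio.
    return e_partition * (total_len / partition_len)
```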

8. Techniques
- TurboHub system
  - Developed by Scientific Computing Associates (SCA)
  - Capabilities
    - Pipelining
    - Component replication
    - Parallel components, in combination with tools from SCA, MPI, PVM, and OpenMP
  - Application in this work
    - Each worker is a wrapped-up blastall component
    - Component scheduling
    - Fault recovery
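Fault recovery can be illustrated by reassigning a failed task to another worker. This is a hypothetical sketch of the idea only, not TurboHub's mechanism: `run(worker, task)` is an assumed callback that raises `RuntimeError` when a worker fails, and tasks are assigned sequentially for clarity.

```python
def run_with_recovery(tasks, workers, run, max_attempts=3):
    """Assign each task to a worker; on failure, reassign it to the next worker."""
    results = {}
    for i, task in enumerate(tasks):
        for attempt in range(max_attempts):
            w = workers[(i + attempt) % len(workers)]
            try:
                results[task] = (w, run(w, task))
                break
            except RuntimeError:
                continue  # this worker failed; try the next one
        else:
            raise RuntimeError(f"task {task!r} failed after {max_attempts} attempts")
    return results
```

Because the BLAST tasks are independent and need no ordering, rerunning a lost task elsewhere is safe: it produces the same hits it would have produced on the failed worker.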

9 CMSC 838T – Presentation Techniques u Task/Database Splitting  2 options  Large Task l Advantage u Maximize resource utilization u Minimize task startup overhead l Disadvantage u Load imbalance u Limit the performance gain  Small Task l Advantage and disadvantage are reverse of the above

10. Techniques
- Task/database splitting (cont.)
  - The paper's intermediate approach
  - Create large initial tasks, sized from experience
    - Communication and program startup costs are negligible for tasks of at least 10-20 input query sequences with 256 MB of memory
  - If a task is too large, split the databases
    - For multiple databases, put roughly half of the databases in each sub-database
    - For a single database, split the database in half
    - Uses virtual shared memory
    - The actual database files are never sent to a worker until it requires them
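The splitting rule above can be sketched as a recursion on the database side of a task, leaving the query group intact. The `(name, start, end)` residue-range representation and the `max_size` threshold are hypothetical; the paper sizes tasks from experience and memory limits rather than a single number.

```python
def split_task(queries, databases, max_size):
    """Recursively halve the database side of a task until each piece fits.

    Each database is a hypothetical (name, start, end) range of residues;
    its size is end - start. The query group is never split.
    """
    size = sum(end - start for _, start, end in databases)
    if size <= max_size:
        return [(queries, databases)]
    if len(databases) > 1:
        # Multiple databases: put roughly half of them in each sub-task.
        mid = len(databases) // 2
        halves = [databases[:mid], databases[mid:]]
    else:
        # Single database: split its range in half.
        name, start, end = databases[0]
        if end - start <= 1:
            return [(queries, databases)]  # cannot split any further
        mid = (start + end) // 2
        halves = [[(name, start, mid)], [(name, mid, end)]]
    tasks = []
    for half in halves:
        tasks.extend(split_task(queries, half, max_size))
    return tasks
```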

11. Techniques
- Database splitting
  - Split using the NCBI database formatting program formatdb
  - Binary tree analogy
    - All the leaves combined make up the database
    - The portion of the database a worker accesses depends on which node it has decided to be at
    - It uses all leaves under the chosen node
  - Advantages
    - Flexibility
    - Delivers exactly the amount of data needed
    - Single copy of the database
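The binary tree analogy can be made concrete with heap-style node numbering (an assumed scheme for illustration): the root is node 1, the children of node i are 2i and 2i+1, and each of the 2^depth leaves stands for one formatted database volume. Choosing a node then determines exactly which volumes a worker reads.

```python
def leaves_under(node, depth):
    """Leaf indices covered by a node in a complete binary tree.

    Nodes use heap numbering: root = 1, children of i are 2i and 2i + 1.
    The tree has 2**depth leaves, each standing for one database volume.
    """
    level = node.bit_length() - 1
    if level > depth:
        raise ValueError("node is below the leaf level")
    width = 2 ** (depth - level)  # number of leaves under one node at this level
    start = (node - 2 ** level) * width
    return range(start, start + width)
```

For a tree with 8 leaves, the root covers volumes 0 through 7, node 3 covers volumes 4 through 7, and node 5 covers volumes 2 and 3.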

12. Evaluation
- Experimental environment for test one
  - Input data set: 50 expressed sequence tags (ESTs)
  - Databases used: Drosophila (1,170 sequences, 123 million nucleotides), GSS Division of GenBank (1.27 million sequences, 651 million nucleotides), E. coli (400 sequences, 4.6 million nucleotides)
  - A group of 500 MHz Pentium III machines with 512 KB cache, 256 MB memory, and 100 Mb/s Ethernet
- Performance result for test one
  - Serial version: 2131.8 seconds (wall-clock time)
  - Parallel version with 11 workers: 130.0 seconds (speedup ≈ 16)

13. Evaluation
- Experimental environment for test two
  - Input data sets: chromosomes 1, 2, and 4 from the Arabidopsis genome
  - Database used: Swiss-Prot protein database (12.8 million peptides)
  - A group of 500 MHz Pentium III machines with 512 KB cache, 256 MB memory, and 100 Mb/s Ethernet
- Performance result for test two
  - Serial version: 5 days, 19 hours, and 13 minutes
  - Parallel version with 11 workers: 12 hours, 54 minutes (speedup = 10.8)

14. Evaluation
- Experimental environment for test three
  - Input data set: 500 mouse ESTs of 200-400 nucleotides each
  - Database used: the NT database from NCBI (1,681,522,266 nucleotides)
  - IBM Linux cluster of 8 dual-processor workstations
  - Each workstation contains two 996 MHz Pentium III processors, 2 GB memory, and 100 Mb/s Ethernet
- Performance result for test three
  - Serial version: 4945 seconds
  - Parallel version with 8 workstations (16 workers): 357.03 seconds (speedup = 13.85)
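The reported speedups follow from speedup = T_serial / T_parallel; dividing by the number of workers gives the parallel efficiency. The helper names below are illustrative. An efficiency above 1 (test one) is the superlinear case discussed in the analysis.

```python
def speedup(t_serial, t_parallel):
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, workers):
    # Fraction of ideal linear speedup achieved; > 1 means superlinear.
    return speedup(t_serial, t_parallel) / workers


def minutes(days=0, hours=0, mins=0):
    return days * 24 * 60 + hours * 60 + mins


runs = {
    "test one": (2131.8, 130.0, 11),                                      # seconds
    "test two": (minutes(5, 19, 13), minutes(hours=12, mins=54), 11),     # minutes
    "test three": (4945, 357.03, 16),                                     # seconds
}
for name, (ts, tp, p) in runs.items():
    print(f"{name}: speedup {speedup(ts, tp):.2f}, efficiency {efficiency(ts, tp, p):.2f}")
```

This gives speedups of about 16.4, 10.8, and 13.85, with efficiencies of about 1.49, 0.98, and 0.87.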

15. Evaluation Analysis
- Memory size vs. database size
- Thrashing avoidance as the source of superlinear speedups
  - Single query at a time
  - Single query at each node
- Overhead
  - Need to combine results
  - TurboHub overhead
  - Database transmission overhead
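Why avoiding thrashing can yield superlinear speedup is easiest to see in a toy model. Assume, purely for illustration, that scanning a database costs time proportional to its size and is slowed by a fixed `thrash_penalty` whenever the database does not fit in memory. The serial run pays the penalty; each worker's share fits in memory and does not, so the speedup exceeds the worker count. The model ignores the merge, TurboHub, and transmission overheads listed above.

```python
def run_time(db_size, memory, rate=1.0, thrash_penalty=3.0):
    # Toy model: scan cost is db_size / rate, multiplied by an assumed
    # penalty when the database does not fit in physical memory.
    t = db_size / rate
    return t * thrash_penalty if db_size > memory else t


def toy_speedup(db_size, memory, workers, **kw):
    serial = run_time(db_size, memory, **kw)
    parallel = run_time(db_size / workers, memory, **kw)  # each worker scans its share
    return serial / parallel
```

With a database of 1000 units, 256 units of memory, and 8 workers, the model gives a speedup of 24; if the database already fits in memory (200 units), the speedup falls back to the linear value of 8.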

16. Related Work
- Other parallel BLAST implementations
  - Blackstone's PowerBLAST (part of PowerCloud)
    - Automates the splitting of query databases into smaller chunks
    - Spreads them over the cluster nodes' local disks for querying
    - Automates the merging of BLAST results
    - Uses disk caching and scheduling techniques to speed up future queries of the same datasets
  - Paracel's BLAST Machine
    - Paracel actually got inside BLAST and parallelized the code
    - Posts impressive speedup numbers, while the statistics remain
    - the same as for an unaltered BLAST query
  - mpiBLAST
    - Splits the database across the nodes in the cluster, so that each portion can usually reside in the buffer cache

17. Related Work
- Words of Bill Pearson (author of FASTA), in response to why there are no MPI- or PVM-parallelized versions of BLAST
  - Note: this excludes Paracel's type of parallelization
  - BLAST is too fast, and there is not much demand for it
  - 95% of the time, BLAST is almost an in-memory grep
  - Sequence comparison is embarrassingly parallel and very easily threaded
  - Distributing the sequence databases and collecting results adds more overhead
  - FASTA is 5-10X slower than BLAST
  - Smith-Waterman is 5-20X slower than FASTA
  - For these slower methods the communication overhead is relatively low, so distributed systems work OK for FASTA and great for Smith-Waterman
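Pearson's argument is a ratio of computation to communication. In the sketch below, the fixed communication cost (equal to one BLAST run) is an assumption, and the slowdown factors use the low ends of his quoted ranges. Under those assumptions, communication takes half the time for BLAST, about 17% for FASTA, and under 4% for Smith-Waterman.

```python
def overhead_fraction(compute, comm):
    # Share of total time spent on communication, assuming no overlap.
    return comm / (compute + comm)


blast = 1.0
fasta = 5 * blast   # low end of "5-10X slower than BLAST"
sw = 5 * fasta      # low end of "5-20X slower than FASTA"
comm = 1.0          # assumed fixed cost to distribute data and collect results
for name, t in [("BLAST", blast), ("FASTA", fasta), ("Smith-Waterman", sw)]:
    print(f"{name}: {overhead_fraction(t, comm):.1%} of time in communication")
```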

18. Observations
- Observations
  - Efficient, due to the parallelism inherent in the BLAST algorithm
    - Different database splitting techniques
    - Feasible in practice (in computing power, user effort, etc.)
  - Results similar to previous work
- Possible improvements
  - Because the BLAST code must not be changed, superlinear speedup is only possible when existing thrashing is avoided
  - Larger memory and cache sizes
  - Better load-balancing techniques
  - Overhead reduction; flexibility vs. performance

