Published by Stephany York. Modified over 9 years ago.
1
BLAST benchmarks
George Coulouris
NCBI/NLM/NIH
coulouri@ncbi.nlm.nih.gov
June 2005
2
Motivation and goal
It’s hard to define what constitutes a “typical” search.
NCBI BLAST processes over 150,000 searches per day.
The large-scale characteristics of this workload are stable over time.
Goal: Design a test suite that approximates this workload.
3
Applications
Evaluate the relative performance of BLAST running on different hardware.
Evaluate the relative performance of different BLAST implementations.
4
Components
Databases
Queries
Tasks
Driver
5
Databases
Protein “nr” and nucleotide “nt” account for more than 80% of all searches, which makes them a good choice for representative databases.
Sequences are constantly added and removed; the databases are updated daily.
The volatility and large size of these databases make them unsuitable for benchmarking purposes.
6
Databases
Solution: Generate benchmark databases from subsets of “nr” and “nt”.
Non-redundant proteins are sampled from “nr”.
The size ratio of the nucleotide to protein databases is preserved to avoid skewing runtime results.
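The slides do not say how the subsets were drawn. As a minimal sketch of one way to keep the nucleotide-to-protein size ratio fixed, the following assumes database size is measured as total sequence length and that sequences are picked uniformly at random. The function names and the sizes used in the usage note are hypothetical.

```python
import random


def scaled_targets(full_protein_size, full_nucleotide_size, target_protein_size):
    """Return (protein, nucleotide) target sizes that keep the full databases' ratio.

    Sizes are total sequence lengths (residues or bases); this unit is an assumption.
    """
    ratio = full_nucleotide_size / full_protein_size
    return target_protein_size, round(target_protein_size * ratio)


def sample_to_target(lengths, target_total, seed=0):
    """Pick sequence indices at random until their summed length reaches target_total.

    Returns the chosen indices and the total length actually reached.
    """
    rng = random.Random(seed)  # fixed seed so the benchmark subset is reproducible
    order = list(range(len(lengths)))
    rng.shuffle(order)
    chosen, total = [], 0
    for i in order:
        if total >= target_total:
            break
        chosen.append(i)
        total += lengths[i]
    return chosen, total
```

For example, `scaled_targets(1000, 4000, 100)` gives a nucleotide target four times the protein target, matching the 4:1 ratio of the made-up full sizes.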
7
Queries
Over 90% of protein queries are shorter than 1000 residues.
Over 90% of nucleotide queries are shorter than 2000 base pairs.
The query set should cover the major model organisms.
Solution: Sample 200 queries from refseq_rna and refseq_protein.
The resulting set covers many organisms and has a typical length distribution.
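The query sampling described above could be sketched as follows. This is an illustration only: it assumes the source sequences are available as FASTA text and that queries are drawn uniformly at random, neither of which the slides state. The helper names are hypothetical.

```python
import random


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sample_queries(records, n, seed=0):
    """Draw up to n records uniformly at random, reproducibly for a given seed."""
    records = list(records)
    return random.Random(seed).sample(records, min(n, len(records)))


def fraction_shorter_than(records, limit):
    """Fraction of records whose sequence length is below limit."""
    records = list(records)
    if not records:
        return 0.0
    return sum(len(seq) < limit for _, seq in records) / len(records)
```

After sampling, `fraction_shorter_than(sample, 1000)` for proteins (or `2000` for nucleotides) gives a quick check of whether the sample's length distribution resembles the observed workload.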
8
Tasks
Program distribution:
blastn     50%
megablast  10%
blastp     20%
blastx     10%
tblastn     5%
tblastx     5%
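To make the distribution concrete, the sketch below converts the percentages into a number of searches per program for a suite of a given size. The dictionary and function names are hypothetical; only the percentages come from the slide.

```python
# Percentage of searches per program, as listed on the slide.
PROGRAM_SHARE = {
    "blastn": 50,
    "megablast": 10,
    "blastp": 20,
    "blastx": 10,
    "tblastn": 5,
    "tblastx": 5,
}


def searches_per_program(total, shares=PROGRAM_SHARE):
    """Split `total` searches across programs according to percentage shares.

    Uses integer floor division, so the counts add up to `total` exactly only
    when every share divides evenly (for these shares, any multiple of 20).
    """
    if sum(shares.values()) != 100:
        raise ValueError("shares must sum to 100")
    return {program: total * pct // 100 for program, pct in shares.items()}
```

For a 200-search suite this yields 100 blastn, 20 megablast, 40 blastp, 20 blastx, 10 tblastn, and 10 tblastx searches.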
9
Driver script
Executes 200 searches according to the program distribution above.
Runs in 35 minutes on current hardware.
Can be used to measure speed or throughput.
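The actual driver script is not shown in the slides. A minimal sketch of the timing logic, assuming searches run one after another and that the caller supplies a `run_search` function that launches a single search (both assumptions, and both names hypothetical), might look like this:

```python
import time


def run_benchmark(tasks, run_search, clock=time.perf_counter):
    """Run each (program, query) task in order and record timings.

    `run_search` is a caller-supplied callable that performs one search;
    how it invokes BLAST is deliberately left out of this sketch.
    """
    timings = []
    start = clock()
    for program, query in tasks:
        t0 = clock()
        run_search(program, query)
        timings.append((program, clock() - t0))
    elapsed = clock() - start
    throughput = len(tasks) / elapsed if elapsed > 0 else float("inf")
    return {
        "elapsed_s": elapsed,          # total time for the whole suite ("speed")
        "searches_per_s": throughput,  # completed searches per second
        "timings": timings,            # per-search durations, in task order
    }
```

Because this version is serial, its throughput figure is simply the inverse of the mean search time. Measuring throughput under load would require running several searches concurrently, which this sketch does not attempt.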
10
Sample results