1
A Study of GeneWise with the Drosophila Adh Region
Asta Gindulyte
CMSC 838 Presentation
Authors: Yi Mo, Moira Regelson, and Mike Sievers
Paracel Inc., Pasadena, CA
2
CMSC 838T – Presentation
Motivation
- Genome annotation
  - Extraction of biologically relevant knowledge from raw genomic sequence data
- Need faster genome annotation methods
  - DNA sequences are very long (millions of nucleotides)
  - Current methods are computationally too expensive
- Approach/Solution
  - GeneMatcher2 hardware acceleration of GeneWise
3
Outline
- Motivation
  - Genome annotation
- GeneMatcher2
  - Design
  - ASIC hardware
- Comparison
  - GeneWise algorithm
  - HalfWise algorithm
  - Performance (time, precision)
- Observations
  - Performance improvement
  - Cost effectiveness
4
Approach
- Problem: make GeneWise run faster
  - “Embarrassingly parallel” algorithm
  - Computationally too expensive when run in parallel on PCs
- Paracel’s solution: hardware acceleration
  - Don’t change the algorithm
  - Produce an implementation on the GeneMatcher2 supercomputer that works as much like the original software as possible
  - 6LITE algorithm, now also in Wise2
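"Embarrassingly parallel" here means that each protein model can be searched against the genome independently of every other model. The sketch below illustrates only that structure. `search_one` is a hypothetical stand-in that counts motif occurrences, not GeneWise, and the thread pool is used to show task independence rather than to claim any speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def search_one(model, genome):
    """Hypothetical stand-in for one model-vs-genome search.

    Counts non-overlapping occurrences of the model's motif string.
    A real search would run the GeneWise dynamic program here.
    """
    return model["name"], genome.count(model["motif"])

def search_all(models, genome, workers=4):
    """Run every search independently; no task needs another task's result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda m: search_one(m, genome), models)
        return dict(results)

# Toy inputs (assumptions made for illustration only).
models = [{"name": "famA", "motif": "ATG"}, {"name": "famB", "motif": "GGT"}]
hits = search_all(models, "ATGGGTATGCC")
```

Because the tasks share no state, the same loop could be spread over separate machines; the slide's point is that doing so on PCs is still too costly for this workload.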
5
GeneMatcher Architecture
6
ASIC Hardware
- ASIC: application-specific integrated circuit
  - Designed to speed up dynamic programming algorithms
    - (could be used for Smith-Waterman)
  - Each ASIC board has 3072 processors
  - System has up to 9 boards
  - Cost per board around $40K
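As a reminder of the kind of dynamic programming the slide refers to, here is a minimal software sketch of Smith-Waterman local alignment scoring. The scoring values (match 2, mismatch -1, linear gap -1) are assumptions chosen for illustration. How GeneMatcher2 actually maps such a recurrence onto its processors is not described in the slides.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Each cell depends only on its upper-left, upper, and left neighbours.
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

score = smith_waterman_score("ACACACTA", "AGCACACA")
```

Since every cell depends only on three neighbours, all cells on one anti-diagonal can be computed at the same time, which is the regularity that makes this family of algorithms a natural fit for arrays of simple processors.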
7
GeneWise Algorithm
- Perform a search of genomic DNA sequence data using a protein HMM
  - Build HMMs from protein families
  - Scan genome using HMM
    - Look for start codon
    - “GT” sequence signals possible 5’ splice site
    - “AG” sequence signals possible 3’ splice site
  - Dynamic programming used in the scanning process
    - Obtain probability of the most likely path in HMM generating the sequence
    - Obtain alignment by backtracking
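The last two bullets describe the Viterbi procedure: fill a table with the best score of any state path ending at each position, then backtrack to recover that path. The sketch below shows the idea on a toy two-state HMM. The states, probabilities, and sequence are all invented for illustration, and the GeneWise model itself (with codon, splice-site, and intron states) is far richer than this.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (log probability, state path) of the most likely path."""
    # score[t][s]: best log probability of any path ending in state s at position t
    score = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        score.append({})
        back.append({})
        for s in states:
            prev, best = max(
                ((p, score[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            score[t][s] = best + math.log(emit_p[s][obs[t]])
            back[t][s] = prev  # remember which predecessor gave the best score
    # Backtrack from the best final state to recover the path.
    last = max(states, key=lambda s: score[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return score[-1][last], path

# Toy model (all values are assumptions): a GC-rich and an AT-rich state.
states = ("gc_rich", "at_rich")
start_p = {"gc_rich": 0.5, "at_rich": 0.5}
trans_p = {"gc_rich": {"gc_rich": 0.9, "at_rich": 0.1},
           "at_rich": {"gc_rich": 0.1, "at_rich": 0.9}}
emit_p = {"gc_rich": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
          "at_rich": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}
logp, path = viterbi("GGCCAATT", states, start_p, trans_p, emit_p)
```

Working in log space avoids numerical underflow on long sequences, which matters when the observed sequence is millions of nucleotides long.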
8
GeneWise model on GeneMatcher2
9
HalfWise Algorithm
- Reduce cost by running BLAST to select HMMs with possible hits
- Use these HMMs with the GeneWise database search and sequence alignment algorithm
- May miss some genes due to BLAST misses
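The HalfWise idea is a two-stage search: a cheap filter picks candidate models, and only those go on to the expensive search. The sketch below shows that control flow and why a filter miss becomes a final miss. Both scoring functions are hypothetical toys (an exact 3-letter seed check and an ungapped window match count); they stand in for BLAST and GeneWise and do not reproduce either.

```python
def seed_prefilter(model, genome):
    """Hypothetical cheap filter: 1 if the model's first 3 letters occur exactly."""
    return 1 if model[:3] in genome else 0

def best_window_matches(model, genome):
    """Hypothetical expensive search: most matching letters in any ungapped window."""
    n = len(model)
    return max(
        sum(x == y for x, y in zip(model, genome[i:i + n]))
        for i in range(len(genome) - n + 1)
    )

def two_stage_search(models, genome, prefilter, full_search, threshold=1):
    """Run full_search only for models whose prefilter score reaches threshold."""
    candidates = [m for m in models if prefilter(m, genome) >= threshold]
    return {m: full_search(m, genome) for m in candidates}

# Toy inputs (assumptions made for illustration only).
genome = "TTACGTACCTT"
models = ["ACGTA", "GGGTA", "CCCCC"]
hits = two_stage_search(models, genome, seed_prefilter, best_window_matches)
missed = best_window_matches("GGGTA", genome)  # what the full search would have found
```

Here "GGGTA" has a partial match that the full search would score, but it never reaches that stage because the prefilter rejected it. That is the trade-off named in the last bullet.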
10
Evaluation
- Test data set
  - A genomic DNA sequence contig of about 2.9 Mb from the Drosophila Adh region
  - Focus on finding all Pfam (Protein families database of alignments and HMMs) protein profile-HMMs that occur in the Adh genomic sequence
11
Evaluation: Speed
12
Evaluation: Score
13
Evaluation: Sensitivity and Specificity
14
Observations
- Performance improvement
  - The speedup is several orders of magnitude.
    - Makes real target applications possible
  - Accuracy might be improved over the HalfWise algorithm
- Cost effectiveness
  - System used costs around $500K
  - $500K worth of Linux PCs (500 processors at $1K each) would run about 10 times slower
- Weaknesses
  - Cannot modify the algorithm
  - Not enough data to assess scalability
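The cost-effectiveness comparison can be written out as arithmetic. The dollar figures and the "about 10 times slower" factor come from the slide. Measuring throughput relative to the PC cluster, and assuming the cluster scales linearly with the number of PCs, are assumptions added here for illustration.

```python
system_cost = 500_000        # dollars, accelerator system (from the slide)
cluster_cost = 500 * 1_000   # dollars, 500 Linux PCs at $1K each (from the slide)
slowdown = 10                # the cluster runs "about 10 times slower" (from the slide)

# Assumption: throughput is expressed relative to the cluster (cluster = 1.0).
cluster_throughput = 1.0
system_throughput = slowdown * cluster_throughput

# Dollars spent per unit of throughput on each platform.
cost_per_throughput_system = system_cost / system_throughput
cost_per_throughput_cluster = cluster_cost / cluster_throughput
advantage = cost_per_throughput_cluster / cost_per_throughput_system

# Assumption: linear scaling, so matching the system takes 10x as many PCs.
pcs_to_match = 500 * slowdown
cost_to_match = pcs_to_match * 1_000
```

Under these assumptions the two options cost the same but differ tenfold in throughput, so matching the accelerator with PCs would take roughly ten times the budget.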