Published by Tamsin Williamson. Modified over 9 years ago.
1
BLAT – The BLAST-Like Alignment Tool
Kent, W.J. Genome Res. 2002 12: 656-664
Presenters: 巨彥霖, 田知本
2
BLAT overview
– Use an index to find regions in the genome homologous to the query.
– Do a detailed alignment between the query and the homologous regions.
– Use dynamic programming to stitch the detailed alignments of the regions together into a detailed alignment of the whole.
3
Index
– Database: non-overlapping K-mers
– Query: overlapping K-mers
4
Example
Database: cacaattatcacgaccgc
3-mers (non-overlapping): cac aat tat cac gac cgc
Index: aat 3; gac 12; cac 0,9; tat 6; cgc 15
Query: aattctcac
3-mers (overlapping): aat att ttc tct ctc tca cac
Positions:            0   1   2   3   4   5   6
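The example above can be reproduced with a few lines of Python. This is a minimal sketch, not BLAT's actual implementation: the function names `build_index` and `find_hits` are made up for illustration, and positions are 0-based as on the slide.

```python
from collections import defaultdict

def build_index(database, k):
    """Map each NON-overlapping k-mer of the database to its start positions."""
    index = defaultdict(list)
    for pos in range(0, len(database) - k + 1, k):  # step by k: no overlap
        index[database[pos:pos + k]].append(pos)
    return dict(index)

def find_hits(query, index, k):
    """Look up every OVERLAPPING k-mer of the query in the index.

    Returns a list of (query_position, database_position) pairs.
    """
    hits = []
    for qpos in range(len(query) - k + 1):  # step by 1: overlapping
        for dpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, dpos))
    return hits

index = build_index("cacaattatcacgaccgc", 3)
hits = find_hits("aattctcac", index, 3)
print(index)
print(hits)
```

Only two of the seven query 3-mers are found in the index: `aat` (query position 0, database position 3) and `cac` (query position 6, database positions 0 and 9).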
5
Search Criteria
– Single Perfect Matches
– Single Near Perfect Matches
– Multiple Perfect Matches
6
Notation
K: K-mer size
M: the match ratio between homologous areas
H: homologous region size
G: database (genome) size
Q: query sequence size
A: the alphabet size
7
Single Perfect Matches (1)
[Figure: a K-mer perfect match inside a homologous region]
8
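Single Perfect Matches (2)
The homologous region of size H contains $T = \lfloor H/K \rfloor$ non-overlapping K-mers, and a given K-mer matches perfectly with probability $M^K$.
The probability of at least one perfect K-mer match (sensitivity):
$P = 1 - (1 - M^K)^T$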
9
Single Perfect Matches (3)
The number of K-mers in the database = G / K
The number of K-mers in the query = Q – K + 1
The number of K-mers expected to match by chance (specificity):
$F = (Q - K + 1) \cdot (G / K) \cdot (1/A)^K$
10
Single Perfect Nucleotide K-mer Matches as Search Criterion
11
Case (perfect match)
Comparing mouse and human coding sequences at the nucleotide level:
– H = 100, M = 86%
– Required sensitivity = 0.99, so max K = 7
– Chance matches = 13078962 (query = 500, database = 3 billion)
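The numbers in this case can be checked with a short calculation. The sketch below assumes the single-perfect-match formulas from the cited Kent (2002) paper, sensitivity P = 1 - (1 - M^K)^T with T = floor(H/K), and chance matches F = (query K-mers) * (G/K) * (1/A)^K. The function names and the search range for K are illustrative choices, not part of the presentation.

```python
def sensitivity(K, M, H):
    """Probability of at least one perfect K-mer match in a homologous region."""
    T = H // K  # non-overlapping K-mers in the region
    return 1.0 - (1.0 - M ** K) ** T

def chance_matches(K, n_query_kmers, G, A=4):
    """Expected number of K-mer matches that occur by chance."""
    return n_query_kmers * (G / K) * (1.0 / A) ** K

def max_k(M, H, target=0.99, k_values=range(1, 33)):
    """Largest K in k_values whose sensitivity still reaches the target."""
    ok = [K for K in k_values if sensitivity(K, M, H) >= target]
    return max(ok) if ok else None

K = max_k(M=0.86, H=100)
print(K)
print(chance_matches(K, 500, 3e9))          # 500 query K-mers
print(chance_matches(K, 500 - K + 1, 3e9))  # Q - K + 1 query K-mers
```

With M = 0.86 and H = 100 the largest K that keeps sensitivity at or above 0.99 is 7. The slide's figure of 13078962 is what the formula gives when the query contributes 500 K-mers; using Q - K + 1 = 494 gives roughly 12.9 million instead.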
12
Single Near Perfect Matches (1)
Near perfect (almost perfect) match: one letter of the K-mer may mismatch.
[Figure: a K-mer near perfect match inside a homologous region]
13
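Single Near Perfect Matches (2)
Sensitivity: $P = 1 - (1 - p_1)^T$, where $p_1 = K \cdot M^{K-1} (1 - M) + M^K$ and $T = \lfloor H/K \rfloor$
Specificity: $F = (Q - K + 1) \cdot (G / K) \cdot \left( K \cdot (1/A)^{K-1} (1 - 1/A) + (1/A)^K \right)$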
14
Case (near perfect match)
Comparing mouse and human coding sequences at the nucleotide level:
– H = 100, M = 86%
– Required sensitivity = 0.99, so max K = 12
– Chance matches = 275671 (query = 500, database = 3 billion)
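The near-perfect case can be checked the same way. The sketch below assumes the near-perfect formulas from the cited Kent (2002) paper: a K-mer matches with at most one mismatch with probability p1 = K * M^(K-1) * (1 - M) + M^K, and the chance-match probability per K-mer pair is the same expression with M replaced by 1/A. Function names are hypothetical.

```python
def near_perfect_sensitivity(K, M, H):
    """Probability of at least one K-mer match with at most one mismatch."""
    p1 = K * M ** (K - 1) * (1 - M) + M ** K
    return 1.0 - (1.0 - p1) ** (H // K)

def near_perfect_chance_matches(K, n_query_kmers, G, A=4):
    """Expected number of near-perfect K-mer matches that occur by chance."""
    p = K * (1 / A) ** (K - 1) * (1 - 1 / A) + (1 / A) ** K
    return n_query_kmers * (G / K) * p

# Largest K (searched over 1..32, an arbitrary range) with sensitivity >= 0.99
best_k = max(K for K in range(1, 33)
             if near_perfect_sensitivity(K, 0.86, 100) >= 0.99)
print(best_k)
print(near_perfect_chance_matches(best_k, 500, 3e9))
```

Allowing one mismatch raises the usable K from 7 to 12, and with 500 query K-mers the expected number of chance matches comes out at about 275671, matching the slide.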
15
Single Near Perfect Nucleotide K-mer Matches as Search Criterion
16
Multiple Perfect Matches
A hit is triggered when:
– there are N perfect matches
– each is no further than W letters from each other in the database coordinate
– all have the same diagonal coordinate (the database coordinate minus the query coordinate)
17
Example
The hits a, b, c, and d are all K letters long. Hits b and d have the same diagonal coordinate and are within W letters of each other. Therefore, they satisfy the two-perfect-K-mer search criterion.
[Figure: hits a, b, c, d plotted by target coordinate against query coordinate, with a window of width W]
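The multiple-perfect-match criterion can be sketched as a grouping of hits by diagonal. This is an illustration only: the function name `triggers_hit` and the sample coordinates are invented, and "within W letters of each other" is interpreted here as consecutive hits on a diagonal being at most W apart in the target coordinate.

```python
from collections import defaultdict

def triggers_hit(hits, n, w):
    """Return True if n hits share a diagonal and lie within w of their neighbor.

    hits is a list of (query_pos, target_pos) pairs; the diagonal of a hit
    is target_pos - query_pos.
    """
    by_diagonal = defaultdict(list)
    for q, t in hits:
        by_diagonal[t - q].append(t)
    for targets in by_diagonal.values():
        targets.sort()
        run = 1  # length of the current chain of close-enough hits
        for prev, cur in zip(targets, targets[1:]):
            run = run + 1 if cur - prev <= w else 1
            if run >= n:
                return True
    # A single hit suffices only when n is 1.
    return n <= 1 and bool(hits)

# Hypothetical hits a, b, c, d as (query, target); b and d share diagonal 90.
example_hits = [(0, 50), (10, 100), (40, 60), (25, 115)]
print(triggers_hit(example_hits, n=2, w=20))
print(triggers_hit(example_hits, n=2, w=10))
```

With W = 20 the pair b, d triggers a hit; shrinking W to 10 puts them too far apart, so nothing is triggered.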
18
Multiple Perfect Nucleotide K-mer Matches as Search Criterion
19
Default
Nucleotide
– two perfect 11-mers
Protein
– single perfect 5-mer for the standalone version
– three perfect 4-mers for the client/server version
20
BLAST
1) Build the hash table for Sequence A.
2) Scan Sequence B for hits.
3) Extend hits.
21
BLAST Step 1: Build the hash table for Sequence A. (3-tuple example)
For DNA sequences:
Seq. A = AGATCGAT
         12345678
AAA
AAC
..
AGA 1
..
ATC 3
..
CGA 5
..
GAT 2 6
..
TCG 4
..
TTT
For protein sequences:
Seq. A = ELVIS
Add xyz to the hash table if Score(xyz, ELV) ≥ T;
Add xyz to the hash table if Score(xyz, LVI) ≥ T;
Add xyz to the hash table if Score(xyz, VIS) ≥ T;
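For the DNA case, the hash table on this slide can be built as follows. This is a simplified sketch rather than BLAST's real data structure: `build_word_table` and `scan` are hypothetical names, positions are 1-based to match the slide, the sequence B used below is made up, and the protein case (neighborhood words scored against a threshold T) is not covered.

```python
from collections import defaultdict

def build_word_table(seq_a, w=3):
    """Map every overlapping w-letter word of seq_a to its 1-based positions."""
    table = defaultdict(list)
    for i in range(len(seq_a) - w + 1):
        table[seq_a[i:i + w]].append(i + 1)
    return dict(table)

def scan(seq_b, table, w=3):
    """Step 2: scan seq_b and report (position in A, position in B) hits."""
    hits = []
    for j in range(len(seq_b) - w + 1):
        for pos_a in table.get(seq_b[j:j + w], []):
            hits.append((pos_a, j + 1))
    return hits

table = build_word_table("AGATCGAT")
print(table)
print(scan("TTGATC", table))  # "TTGATC" is an invented sequence B
```

The table holds AGA at 1, GAT at 2 and 6, ATC at 3, TCG at 4, and CGA at 5, as on the slide. Words that never occur in sequence A (AAA, AAC, TTT, ...) are simply absent from the dictionary.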
22
BLAST Step 2: Scan sequence B for hits.
23
BLAST Step 2: Scan sequence B for hits. Step 3: Extend hits.
Terminate if the score of the extension fades away (that is, when we reach a segment pair whose score falls a certain distance below the best score found for shorter extensions).
BLAST 2.0 saves time spent in extension and considers gapped alignments.
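The termination rule for extension can be illustrated with a small ungapped extension routine. This is a sketch under assumptions: the scores (+1 match, -1 mismatch), the drop-off threshold `x_drop`, and the function name are illustrative values, not BLAST's defaults, and only rightward extension is shown.

```python
def extend_right(a, b, i, j, match=1, mismatch=-1, x_drop=3):
    """Extend an ungapped alignment rightward from a[i] and b[j].

    Stops when the running score falls x_drop or more below the best score
    seen so far. Returns (best_score, length_of_best_extension).
    """
    score = best = 0
    best_len = 0
    n = 0
    while i + n < len(a) and j + n < len(b):
        score += match if a[i + n] == b[j + n] else mismatch
        n += 1
        if score > best:
            best, best_len = score, n
        elif best - score >= x_drop:
            break  # the extension has faded away
    return best, best_len

print(extend_right("GATTACAGGG", "GATTTCATTT", 0, 0))
```

For these two sequences the best extension covers the first 7 letters with a score of 5; the three trailing mismatches drop the score to 2, which is 3 below the best, so the extension stops and the result is `(5, 7)`.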
24
Algorithm
1. Search Stage
– Use an index to find regions in the genome homologous to the query.
2. Alignment Stage
– Do a detailed alignment between the query and the homologous regions.
3. Stitching and Filling In
– Use dynamic programming to stitch the detailed alignments of the regions together into a detailed alignment of the whole.
25
Search Stage
– Build an index that contains the positions of each K-mer in the database.
– Step through each overlapping K-mer in the query and look it up in the index.
– Get a list of 'hits': positions in the query and in the database that match for K bases.
– Cluster the hits to find homologous regions.
26
Search Stage Clump hits
27
Search Stage
– Clump the 'clumps'.
– Eliminate small clumps.
[Figure: the remaining clumps define a homologous region]
28
Alignment Stage (nucleotide)
– Start from scratch within the regions defined by the K-mer hits.
– Index on smaller K-mers, but extend each K-mer until it becomes specific.
– Extend in both directions without mismatches or gaps, and merge overlapping or contiguous alignments.
– Recurse on the gaps with a smaller K until the gaps or the hits are eliminated.
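The "extend in both directions without mismatches or gaps" step can be sketched as below. It only illustrates that one step: the function name and the example sequences are invented, and the recursion on gaps with a smaller K is not shown.

```python
def extend_exact(query, target, q, t, k):
    """Grow an exact k-base match at query[q], target[t] in both directions.

    Extension continues only while the bases are identical (no mismatches,
    no gaps). Returns (query_start, target_start, length) of the block.
    """
    start_q, start_t = q, t
    while (start_q > 0 and start_t > 0
           and query[start_q - 1] == target[start_t - 1]):
        start_q -= 1
        start_t -= 1
    end_q, end_t = q + k, t + k
    while (end_q < len(query) and end_t < len(target)
           and query[end_q] == target[end_t]):
        end_q += 1
        end_t += 1
    return start_q, start_t, end_q - start_q

# The 3-mer "gta" at offset 4 in both sequences grows to "acgtacg".
block = extend_exact("ttacgtacgga", "ccacgtacgtt", 4, 4, 3)
print(block)
```

Here the seed grows two bases to the left and two to the right, giving a 7-base exact block starting at offset 2 in both sequences.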
29
Alignment Stage (nucleotide) recursive
30
Alignment Stage (protein)
– Extend hits into maximal scoring ungapped alignments (HSPs) with a +2/-1 scoring scheme.
– Create a graph of all possible HSP merges.
– Use dynamic programming to traverse the graph.
31
Alignment Stage (protein)
32
[Figure: HSPs between the query and the homologous region]
33
Stitching and Filling In
The alignment of a gene is often scattered across multiple homologous regions found in the search stage.
[Figure: query against database]
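One common way to stitch scattered alignment blocks with dynamic programming is to pick the highest-scoring chain of blocks that stay in order in both the query and the database. The sketch below shows that generic chaining idea only; it is not presented here as BLAT's actual stitching code, it applies no gap penalty, and the block tuples and scores are invented.

```python
def best_chain(blocks):
    """Pick the highest-scoring chain of colinear, non-overlapping blocks.

    Each block is (q_start, q_end, t_start, t_end, score) with half-open,
    non-empty intervals. Returns (total_score, list_of_blocks_in_order).
    """
    blocks = sorted(blocks)  # by query start, so predecessors come first
    if not blocks:
        return 0, []
    n = len(blocks)
    best = [b[4] for b in blocks]  # best chain score ending at block i
    prev = [-1] * n                # back-pointer for traceback
    for i in range(n):
        for j in range(i):
            in_order = (blocks[j][1] <= blocks[i][0]
                        and blocks[j][3] <= blocks[i][2])
            if in_order and best[j] + blocks[i][4] > best[i]:
                best[i] = best[j] + blocks[i][4]
                prev[i] = j
    i = max(range(n), key=best.__getitem__)
    total, chain = best[i], []
    while i != -1:
        chain.append(blocks[i])
        i = prev[i]
    return total, chain[::-1]

# Hypothetical blocks; the large jump in the target mimics an intron.
blocks = [(0, 50, 1000, 1050, 50), (60, 100, 5000, 5040, 40),
          (55, 90, 1060, 1095, 35), (100, 150, 5100, 5150, 50)]
print(best_chain(blocks))
```

For this input the chain with the highest total (140) skips the block at query 55-90 because it overlaps the better-scoring block at query 60-100.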
34
Stitching and Filling In query database homologous region
35
Evaluation
Comparison with Other Tools:
– mRNA/Genome Alignments
– Remapped 713 mRNAs corresponding to annotated chromosome 22
– BLAT took 26 sec while Sim4 took 17,468 sec (almost 5 h)

                 Est_genome   Sim4     BLAT
Relative speed   1            333      223,000
Base accuracy    N/A          99.66%   99.99%
Gene accuracy    77.7%        93.4%    99.5%
36
Evaluation
Comparison with Other Tools:
– Translated Mouse/Human Alignments
– 13 million mouse genomic reads vs. human chromosome 22

                   WU-TBLASTX   BLAT
Relative speed     1x           73x
% RefSeq covered   84.5%        86.7%
% Genome covered   2.67%        2.89%
37
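BLAT vs. BLAST
Index
– Query vs. Database (BLAST indexes the query; BLAT indexes the database)
Hits
– Perfect vs. Near Perfect
Alignment
– Separate vs. Together (BLAST reports separate alignments; BLAT stitches them together)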
38
Magic Time !
39
Magic 4 4 3 3.5 Prediction !No mind !Great !
40
Reference
http://amber.cs.umd.edu/class/838-s04/nada.ppt
http://bioportal.weizmann.ac.il/course/ATIB/ATIB03_lecture3.print.pdf