BLAT – The BLAST-Like Alignment Tool. Kent, W.J. Genome Res. 2002 12: 656-664. Presenter: 巨彥霖 田知本.

BLAT overview:
1. Use an index to find regions in the genome homologous to the query.
2. Do a detailed alignment between the query and the homologous regions.
3. Use dynamic programming to stitch the detailed alignments of the regions together into a detailed alignment of the whole.

Index: the database is indexed by non-overlapping K-mers, while the query is scanned with overlapping K-mers.

Example (K = 3):
Database: cacaattatcacgaccgc
Non-overlapping 3-mers: cac aat tat cac gac cgc
Index: aat: 3; gac: 12; cac: 0, 9; tat: 6; cgc: 15
Query: aattctcac
Overlapping 3-mers: aat att ttc tct ctc tca cac
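The example above can be reproduced with a minimal Python sketch. The helper names `build_index` and `find_hits` are hypothetical (not taken from BLAT's source); positions are 0-based as on the slide, the database index uses non-overlapping K-mers, and the query is scanned with overlapping K-mers.

```python
from collections import defaultdict

def build_index(database, k):
    """Index the database by non-overlapping K-mers: K-mer -> list of 0-based positions."""
    index = defaultdict(list)
    for pos in range(0, len(database) - k + 1, k):   # step by K: non-overlapping
        index[database[pos:pos + k]].append(pos)
    return index

def find_hits(query, index, k):
    """Look up every overlapping query K-mer; return (query_pos, database_pos) hits."""
    hits = []
    for qpos in range(len(query) - k + 1):           # step by 1: overlapping
        for dpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, dpos))
    return hits

index = build_index("cacaattatcacgaccgc", 3)
print(dict(index))  # {'cac': [0, 9], 'aat': [3], 'tat': [6], 'gac': [12], 'cgc': [15]}
hits = find_hits("aattctcac", index, 3)
print(hits)         # [(0, 3), (6, 0), (6, 9)]
```

The query 3-mers att, ttc, tct, ctc, and tca are absent from the index, so only aat and cac produce hits. Hits (0, 3) and (6, 9) lie on the same diagonal: database position minus query position is 3 for both.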

Search criteria:
– Single perfect matches
– Single near-perfect matches
– Multiple perfect matches

Notation:
– K: K-mer size
– M: match ratio within homologous regions
– H: size of a homologous region
– G: database (genome) size
– Q: query sequence size
– A: alphabet size

Single Perfect Matches (1): [Figure: a perfect K-mer match inside a homologous region.]

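Single Perfect Matches (2): A homologous region of size H contains $T = \lfloor H/K \rfloor$ non-overlapping K-mers, and each matches the query perfectly with probability $p_1 = M^K$. The probability of at least one perfect K-mer match (sensitivity) is therefore

$$P = 1 - (1 - p_1)^{\lfloor H/K \rfloor} = 1 - (1 - M^K)^{\lfloor H/K \rfloor}$$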

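Single Perfect Matches (3): The database contains G/K non-overlapping K-mers, and the query contains Q − K + 1 overlapping K-mers. Each pair matches by chance with probability $(1/A)^K$, so the expected number of K-mers matched by chance (specificity) is

$$F = (Q - K + 1)\,\frac{G}{K}\left(\frac{1}{A}\right)^{K}$$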

[Table: Single perfect nucleotide K-mer matches as search criterion]

Case (perfect match): comparing mouse and human coding sequences at the nucleotide level, with H = 100 and M = 86%. For a target sensitivity of 0.99, the maximum K is 7. Chance matches are counted for a 500-base query against a 3-billion-base database.
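A quick numerical check of this case, as a sketch: it evaluates the sensitivity 1 − (1 − M^K)^⌊H/K⌋ and the expected chance matches (Q − K + 1)(G/K)(1/A)^K of the single-perfect-match model for every K from 1 to H, assuming a 4-letter nucleotide alphabet. The function names are illustrative.

```python
def perfect_sensitivity(k, m, h):
    """Probability of at least one perfect K-mer match in a homologous region of size h."""
    t = h // k                                   # non-overlapping K-mers in the region
    return 1 - (1 - m ** k) ** t

def perfect_chance_matches(k, q, g, a=4):
    """Expected number of perfect K-mer matches that occur purely by chance."""
    return (q - k + 1) * (g / k) * (1 / a) ** k

# Largest K that still reaches the target sensitivity of 0.99.
max_k = max(k for k in range(1, 101) if perfect_sensitivity(k, m=0.86, h=100) >= 0.99)
print(max_k)  # 7
print(f"{perfect_chance_matches(max_k, q=500, g=3_000_000_000):.3g}")
```

K = 7 gives a sensitivity of about 0.997, while K = 8 drops to about 0.986, which is why 7 is the largest admissible K.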

Single Near-Perfect Matches (1): [Figure: a near-perfect K-mer match inside a homologous region.] Near perfect means that at most one letter may mismatch.

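Single Near-Perfect Matches (2): Allowing at most one mismatch, a K-mer in a homologous region matches with probability $p_1 = K M^{K-1}(1-M) + M^K$, so the sensitivity is

$$P = 1 - (1 - p_1)^{\lfloor H/K \rfloor}$$

Each query K-mer can now match $1 + K(A-1)$ distinct database K-mers, so the expected number of chance matches (specificity) is

$$F = (Q - K + 1)\,\frac{G}{K}\,\bigl(K(A-1) + 1\bigr)\left(\frac{1}{A}\right)^{K}$$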

Case (near-perfect match): comparing mouse and human coding sequences at the nucleotide level, with H = 100 and M = 86%. For a target sensitivity of 0.99, the maximum K is 12. Chance matches are counted for a 500-base query against a 3-billion-base database.
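The same check for near-perfect matches, again a sketch with illustrative names: a K-mer now matches with probability K·M^(K−1)·(1 − M) + M^K (at most one mismatch), and each query K-mer can match 1 + K(A − 1) distinct database K-mers by chance. A 4-letter alphabet is assumed.

```python
def near_perfect_sensitivity(k, m, h):
    """Probability of at least one K-mer match with at most one mismatch."""
    p1 = k * m ** (k - 1) * (1 - m) + m ** k     # exactly one mismatch, or none
    return 1 - (1 - p1) ** (h // k)

def near_perfect_chance_matches(k, q, g, a=4):
    """Expected chance matches: each query K-mer matches 1 + K(A-1) database K-mers."""
    return (q - k + 1) * (g / k) * (k * (a - 1) + 1) * (1 / a) ** k

max_k = max(k for k in range(1, 101) if near_perfect_sensitivity(k, m=0.86, h=100) >= 0.99)
print(max_k)  # 12
print(f"{near_perfect_chance_matches(max_k, q=500, g=3_000_000_000):.3g}")
```

Tolerating one mismatch raises the usable K from 7 to 12, and the longer K-mers sharply reduce the number of chance matches.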

[Table: Single near-perfect nucleotide K-mer matches as search criterion]

Multiple Perfect Matches: a hit is triggered when
– there are N perfect matches,
– each no further than W letters from the others in the database coordinate, and
– all have the same diagonal coordinate (database coordinate minus query coordinate).

Example: [Figure: hits a, b, c, and d plotted by query coordinate against target coordinate, with a window of W letters.] The hits a, b, c, and d are all K letters long. Hits b and d have the same diagonal coordinate and lie within W letters of each other, so they satisfy the two-perfect-K-mer search criterion.
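The multiple-perfect-match trigger can be sketched as follows. The function name is hypothetical, and reading "no further than W letters" as "the first and last of N consecutive hits on a diagonal are at most W apart in target coordinates" is an assumption about the criterion.

```python
from collections import defaultdict

def multiple_hit_triggers(hits, n, w):
    """Return (diagonal, hits) groups satisfying the N-hits-within-W criterion.

    hits are (query_pos, target_pos) pairs of perfect K-mer matches.
    """
    by_diagonal = defaultdict(list)
    for q, t in hits:
        by_diagonal[t - q].append((q, t))        # diagonal = target - query
    triggers = []
    for diagonal in sorted(by_diagonal):
        group = sorted(by_diagonal[diagonal], key=lambda hit: hit[1])
        for i in range(len(group) - n + 1):      # slide a window of N hits
            window = group[i:i + n]
            if window[-1][1] - window[0][1] <= w:
                triggers.append((diagonal, window))
    return triggers

hits = [(0, 3), (6, 0), (6, 9)]
print(multiple_hit_triggers(hits, n=2, w=10))  # [(3, [(0, 3), (6, 9)])]
print(multiple_hit_triggers(hits, n=2, w=5))   # []
```

The lone hit on diagonal −6 can never trigger with N = 2, while the pair on diagonal 3 triggers only when W allows their 6-letter separation.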

[Table: Multiple perfect nucleotide K-mer matches as search criterion]

Defaults:
– Nucleotide: two perfect 11-mers
– Protein: a single perfect 5-mer for the standalone version; three perfect 4-mers for the client/server version

BLAST:
1) Build the hash table for Sequence A.
2) Scan Sequence B for hits.
3) Extend hits.

BLAST Step 1: Build the hash table for Sequence A (3-tuple example).
For DNA sequences, Seq. A = AGATCGAT (positions numbered from 1):
AAA
AAC
…
AGA: 1
…
ATC: 3
…
CGA: 5
…
GAT: 2, 6
…
TCG: 4
…
TTT
For protein sequences, Seq. A = ELVIS:
Add xyz to the hash table if Score(xyz, ELV) ≥ T;
Add xyz to the hash table if Score(xyz, LVI) ≥ T;
Add xyz to the hash table if Score(xyz, VIS) ≥ T.
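Step 1 can be sketched in Python. The DNA table reproduces the slide's 1-based positions. For proteins, `toy_score` is a hypothetical stand-in scoring function (+5 for identical residues, −1 otherwise); BLAST itself would use a substitution matrix such as BLOSUM62 with its own threshold T, so the threshold of 9 below is only illustrative.

```python
from collections import defaultdict
from itertools import product

def dna_word_table(seq, w=3):
    """Map each w-tuple of seq to its 1-based start positions."""
    table = defaultdict(list)
    for i in range(len(seq) - w + 1):
        table[seq[i:i + w]].append(i + 1)
    return table

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_score(x, y):
    """HYPOTHETICAL word score: +5 per identical residue, -1 otherwise."""
    return sum(5 if a == b else -1 for a, b in zip(x, y))

def protein_word_table(seq, threshold, w=3, score=toy_score):
    """For each word of seq, add every word xyz with score(xyz, word) >= threshold."""
    table = defaultdict(list)
    for i in range(len(seq) - w + 1):
        word = seq[i:i + w]
        for letters in product(AMINO_ACIDS, repeat=w):
            xyz = "".join(letters)
            if score(xyz, word) >= threshold:
                table[xyz].append(i + 1)
    return table

dna = dna_word_table("AGATCGAT")
print(dna["GAT"])  # [2, 6]
prot = protein_word_table("ELVIS", threshold=9)
print(len(prot))   # 174
```

With this toy scoring, each word of ELVIS contributes itself plus its 3 × 19 single-substitution neighbours, and no neighbour is shared between words, so the protein table holds 3 × 58 = 174 words.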

BLAST Step 2: Scan Sequence B for hits.

BLAST Step 3: Extend hits. Terminate the extension once its score fades away, that is, when we reach a segment pair whose score falls a certain distance below the best score found for shorter extensions. BLAST 2.0 reduces the time spent in extension and considers gapped alignments.
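The "fades away" rule (commonly called X-drop) can be sketched for ungapped extension. The scoring (+1 match, −1 mismatch), the drop-off `xdrop`, and the function name are illustrative assumptions; this is not NCBI BLAST's implementation.

```python
def extend_hit(a, b, i, j, k, match=1, mismatch=-1, xdrop=3):
    """Ungapped extension of the seed a[i:i+k] ~ b[j:j+k] in both directions.

    Each direction stops once the running score falls more than `xdrop`
    below the best score seen so far; the best-scoring extension is kept.
    Returns (start_in_a, start_in_b, length, score).
    """
    def s(x, y):
        return match if x == y else mismatch

    seed = sum(s(a[i + d], b[j + d]) for d in range(k))

    # Extend to the right of the seed.
    best_r = run = len_r = n = 0
    while i + k + n < len(a) and j + k + n < len(b):
        run += s(a[i + k + n], b[j + k + n])
        n += 1
        if run > best_r:
            best_r, len_r = run, n
        elif best_r - run > xdrop:
            break

    # Extend to the left of the seed.
    best_l = run = len_l = n = 0
    while i - 1 - n >= 0 and j - 1 - n >= 0:
        run += s(a[i - 1 - n], b[j - 1 - n])
        n += 1
        if run > best_l:
            best_l, len_l = run, n
        elif best_l - run > xdrop:
            break

    return i - len_l, j - len_l, len_l + k + len_r, best_l + seed + best_r

a, b = "TTGACGTACGG", "AAGACGTACCC"
print(extend_hit(a, b, i=3, j=3, k=3))  # (2, 2, 7, 7)
```

The seed ACG grows to GACGTAC; the trailing mismatches lower the running score, so they are never added to the reported segment pair.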

Algorithm:
1. Search stage: use an index to find regions in the genome homologous to the query.
2. Alignment stage: do a detailed alignment between the query and the homologous regions.
3. Stitching and filling in: use dynamic programming to stitch the detailed alignments of the regions together into a detailed alignment of the whole.

Search Stage:
– Build an index that contains the positions of each K-mer in the database.
– Step through each overlapping K-mer in the query and look it up in the index.
– Get a list of "hits": positions in the query and in the database that match for K bases.
– Cluster hits to find homologous regions.

Search Stage: clump hits.

Search Stage (continued): clump the "clumps", eliminate small clumps, and keep the remaining clumps as homologous regions.
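A rough sketch of the clumping steps, assuming hits are (query, target) pairs as produced by the index lookup. How BLAT actually groups hits differs in detail; the diagonal band width `band`, `max_gap`, `merge_gap`, and `min_hits` are hypothetical parameters chosen for illustration.

```python
from collections import defaultdict

def clump_hits(hits, k, band=4, max_gap=50, merge_gap=200, min_hits=2):
    """Group (query_pos, target_pos) K-mer hits into candidate homologous regions.

    Returns (target_start, target_end, hit_count) tuples. All thresholds are
    illustrative defaults, not BLAT's.
    """
    # Step 1: bucket hits by diagonal band, then chain hits close on the target.
    buckets = defaultdict(list)
    for q, t in hits:
        buckets[(t - q) // band].append(t)
    clumps = []
    for targets in buckets.values():
        targets.sort()
        start = prev = targets[0]
        count = 1
        for t in targets[1:]:
            if t - prev <= max_gap:
                count += 1
            else:
                clumps.append([start, prev + k, count])
                start, count = t, 1
            prev = t
        clumps.append([start, prev + k, count])
    # Step 2: clump the clumps, merging clumps that are close on the target.
    clumps.sort()
    merged = []
    for clump in clumps:
        if merged and clump[0] - merged[-1][1] <= merge_gap:
            merged[-1][1] = max(merged[-1][1], clump[1])
            merged[-1][2] += clump[2]
        else:
            merged.append(clump)
    # Step 3: eliminate small clumps; the rest are candidate homologous regions.
    return [tuple(c) for c in merged if c[2] >= min_hits]

hits = [(0, 1000), (10, 1011), (20, 1020), (5, 5000), (30, 1300)]
print(clump_hits(hits, k=11))                 # [(1000, 1031, 3)]
print(clump_hits(hits, k=11, merge_gap=300))  # [(1000, 1311, 4)]
```

With a larger `merge_gap`, the isolated hit at 1300 joins the neighbouring clump; the hit at 5000 remains alone and is eliminated either way.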

Alignment Stage (nucleotide):
– Start from scratch within the regions defined by the K-mer search.
– Index on smaller K-mers, but extend each K-mer until it becomes specific.
– Extend in both directions without mismatches or gaps, and merge overlapping or contiguous alignments.
– Recurse on the gaps with smaller K until the gaps are closed or no hits remain.

Alignment Stage (nucleotide), continued: [Figure: recursive alignment of the gaps.]
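The recursive idea can be illustrated with a simplified sketch: find the longest exact block seeded by a K-mer hit, then recurse into the gaps on either side, dropping to smaller K when no hit is found. The function names, the greedy choice of the longest block, and `min_k` are assumptions; BLAT's actual procedure (including extending K-mers until they are specific) is more involved.

```python
from collections import defaultdict

def _longest_block(q, t, qs, qe, ts, te, k):
    """Longest exact-match block seeded by a K-mer hit inside q[qs:qe] x t[ts:te]."""
    index = defaultdict(list)
    for p in range(ts, te - k + 1):
        index[t[p:p + k]].append(p)
    best = None
    for i in range(qs, qe - k + 1):
        for j in index.get(q[i:i + k], []):
            a, b = i, j
            while a > qs and b > ts and q[a - 1] == t[b - 1]:  # extend left, no mismatches
                a, b = a - 1, b - 1
            e, f = i + k, j + k
            while e < qe and f < te and q[e] == t[f]:          # extend right, no mismatches
                e, f = e + 1, f + 1
            if best is None or e - a > best[2]:
                best = (a, b, e - a)
    return best

def recursive_align(q, t, k=4, min_k=2, qs=0, qe=None, ts=0, te=None):
    """Return colinear exact blocks (query_start, target_start, length), filling gaps recursively."""
    qe = len(q) if qe is None else qe
    te = len(t) if te is None else te
    block, kk = None, k
    while block is None and kk >= min_k:
        block = _longest_block(q, t, qs, qe, ts, te, kk)
        kk -= 1                                                # no hit: retry with a smaller K
    if block is None:
        return []
    a, b, n = block
    left = recursive_align(q, t, k, min_k, qs, a, ts, b)           # gap before the block
    right = recursive_align(q, t, k, min_k, a + n, qe, b + n, te)  # gap after the block
    return left + [block] + right

print(recursive_align("GATTACAGGCCTTAA", "GATTACATGGCCTTAA"))  # [(0, 0, 7), (7, 8, 8)]
```

The single extra T in the target falls in the gap between the two blocks, which is where a gapped alignment would later place an insertion.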

Alignment Stage (protein):
– Extend hits into maximal-scoring ungapped alignments (HSPs) with a +2/−1 scoring scheme.
– Create a graph of all possible HSP merges.
– Use dynamic programming to traverse the graph.

Alignment Stage (protein), continued: [Figure: HSPs between the query and a homologous region.]
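Traversing the HSP merge graph with dynamic programming can be sketched as a best-chain search over colinear HSPs. HSPs are assumed to be already extended (for example with the +2/−1 scheme); the tuple layout and the per-link `gap_penalty` are hypothetical simplifications of BLAT's gap scoring.

```python
def chain_hsps(hsps, gap_penalty=0):
    """Best-scoring chain of colinear HSPs by dynamic programming over a DAG.

    Each HSP is (query_start, target_start, length, score). HSP i may precede
    HSP j when i ends before j starts in both query and target; each link
    costs gap_penalty. Returns (total_score, chain).
    """
    order = sorted(hsps)                  # predecessors always sort earlier
    best = [h[3] for h in order]          # best chain score ending at each HSP
    prev = [None] * len(order)
    for j, (qj, tj, _, sj) in enumerate(order):
        for i, (qi, ti, li, _) in enumerate(order[:j]):
            candidate = best[i] - gap_penalty + sj
            if qi + li <= qj and ti + li <= tj and candidate > best[j]:
                best[j], prev[j] = candidate, i
    end = max(range(len(order)), key=best.__getitem__)
    chain, node = [], end
    while node is not None:               # walk back through the graph
        chain.append(order[node])
        node = prev[node]
    return best[end], chain[::-1]

hsps = [(0, 0, 10, 20), (12, 15, 8, 16), (5, 30, 5, 10), (25, 40, 6, 12)]
print(chain_hsps(hsps))  # (48, [(0, 0, 10, 20), (12, 15, 8, 16), (25, 40, 6, 12)])
```

The HSP at (5, 30) overlaps the first HSP in the query, so it cannot join the best chain; raising `gap_penalty` makes linking less attractive and can leave a single HSP as the best result.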

Stitching and Filling In: The alignment of a gene is often scattered across multiple homologous regions found in the search stage. [Figure: query vs. database.]

Stitching and Filling In (continued): [Figure: query, database, and homologous regions.]

Evaluation: Comparison with Other Tools
– mRNA/genome alignments
– Remapped 713 mRNAs corresponding to annotated chromosome 22
– BLAT took 26 s, while Sim4 took 17,468 s (almost 5 h)

                  Est_genome   Sim4     BLAT
Relative speed    …            …        …,000
Base accuracy     N/A          99.66%   99.99%
Gene accuracy     77.7%        93.4%    99.5%

Evaluation: Comparison with Other Tools
– Translated mouse/human alignments
– 13 million mouse genomic reads vs. human chromosome 22

                    WU-TBLASTX   BLAT
Relative speed      1x           73x
% RefSeq covered    84.5%        86.7%
% genome covered    2.67%        2.89%

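BLAT vs. BLAST:
– Index: BLAST indexes the query; BLAT indexes the database.
– Hits: perfect vs. near-perfect matches as triggers.
– Alignment: BLAST reports each homologous area as a separate alignment; BLAT stitches them together into one alignment.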

Magic Time!

Magic Prediction! No mind! Great!
