Sequence alignment, Part 2

Presentation transcript:

Sequence alignment, Part 2. BI420 – Introduction to Bioinformatics, Spring 2012. Department of Biology, Boston College

Similar algorithms can be used for multiple alignment. Figure: the multiple alignment of 24 hexokinase protein sequences from various species. However, real multiple alignment programs (e.g. ClustalW) are usually heuristic rather than exact.

Applications of Alignment

Alignment is used for mapping sequence reads to the genome

Alignment is used in similarity search. Alignment: determining how sequences have descended from a common ancestor. Similarity search: determining which sequences are related to one another; this requires scoring each alignment. Figure: a query compared against a database.

Alignment Exercises

Visualizing pair-wise alignments. Visit a web server running a dot-plotter: http://bioweb.pasteur.fr/seqanal/interfaces/dotmatcher.html Upload hba_human and hbb_human, and create a dot-plot.
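To make the idea of a dot-plot concrete, here is a minimal sketch in Python. It is not how dotmatcher works internally (that tool scores windows with a substitution matrix); this version simply marks positions where two windows are identical. The function name `dot_plot` and the toy sequences are hypothetical, not the real hba_human and hbb_human.

```python
def dot_plot(seq1, seq2, window=1):
    """Return one text row per seq1 position; '*' marks an exact window match."""
    rows = []
    for i in range(len(seq1) - window + 1):
        row = ""
        for j in range(len(seq2) - window + 1):
            same = seq1[i:i + window] == seq2[j:j + window]
            row += "*" if same else "."
        rows.append(row)
    return rows

# Toy sequences (assumption for illustration only).
for line in dot_plot("MATWL", "MATPL"):
    print(line)
```

Runs of '*' along a diagonal correspond to regions of similarity; a mismatch leaves a hole in the diagonal.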

MATLAB example MATLAB bioinformatics toolbox sequence analysis demo: Aligning pairs of sequences

BLAST Basic Local Alignment Search Tool

Purpose of BLAST. Exact alignment algorithms, such as Needleman-Wunsch for global and Smith-Waterman for local alignment, are slow: O(length1 * length2) for each pair of sequences. Alignment speed can be increased by using statistical properties of sequences to estimate alignment quality. This requires pre-processing of the sequence database.
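The quadratic cost is easy to see in a small sketch of the Smith-Waterman recurrence. This is an illustration only: it assumes a simple match/mismatch score and a linear gap penalty (the values below are arbitrary), and it returns just the best local score rather than the alignment itself.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score; the two nested loops give O(len(a) * len(b)) time."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

print(smith_waterman_score("MATWL", "MATPL"))
```

Every cell of the table is filled for every database sequence, which is the work that BLAST avoids by looking only near word hits.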

The BLAST algorithms
Program | Database | Query | Typical uses
BLASTN | Nucleotide | Nucleotide | Mapping oligonucleotides, amplimers, ESTs, and repeats to a genome. Identifying related transcripts.
BLASTP | Protein | Protein | Identifying common regions between proteins. Collecting related proteins for phylogenetic analysis.
BLASTX | Protein | Nucleotide (translated) | Finding protein-coding genes in genomic DNA.
TBLASTN | Nucleotide (translated) | Protein | Identifying transcripts similar to a known protein (finding proteins not yet in GenBank). Mapping a protein to genomic DNA.
TBLASTX | Nucleotide (translated) | Nucleotide (translated) | Cross-species gene prediction. Searching for genes missed by traditional methods.

BLAST report. BLAST example with hba_Human: http://www.ncbi.nlm.nih.gov/BLAST/

BLAST report

The BLAST algorithm. Sequence alignment takes place in a 2D space where diagonal lines represent regions of similarity. Gaps break up the diagonals. The search space can be considered as seq1 vs seq2, or as seq1 vs a database of sequences. Global vs. local alignment: BLAST is local. Maximum scoring pair (MSP) vs. high-scoring pair (HSP): BLAST finds HSPs (usually the MSP too). Gapped vs. ungapped: BLAST can do both.

The BLAST algorithm. Alignments require word (segment pair) hits. The database is preprocessed for the word content of each sequence, which speeds up later calculations.
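A minimal sketch of what preprocessing for word content can look like, assuming exact-match words of length 3 and a tiny in-memory database. The function names and the two database sequences are hypothetical; real BLAST lookup tables are considerably more elaborate.

```python
from collections import defaultdict

def build_word_index(database, w=3):
    """Map each length-w word to the (sequence id, offset) places where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in database.items():
        for pos in range(len(seq) - w + 1):
            index[seq[pos:pos + w]].append((seq_id, pos))
    return index

def word_hits(query, index, w=3):
    """Yield (query offset, sequence id, database offset) for each exact word match."""
    for q in range(len(query) - w + 1):
        for seq_id, d in index.get(query[q:q + w], []):
            yield q, seq_id, d

# Toy database (assumption for illustration only).
index = build_word_index({"seqA": "MATPL", "seqB": "GGMATW"})
print(list(word_hits("MATWL", index)))
```

Once the index exists, finding all hits for a query word is a dictionary lookup instead of a scan over the whole database.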

The BLAST algorithm: BLOSUM62 neighborhood of RGD. For a given word, assign a score to neighborhood words based on the scoring matrix. W (word length) and T (threshold for a word match) modulate speed and sensitivity.
Score 17: RGD
Score 14: KGD
Score 13: QGD, RGE
Score 12: EGD, HGD, NGD, RGN
(threshold T=12: words scoring at least 12 are in the neighborhood)
Score 11: AGD, MGD, RAD, RGQ, RGS, RND, RSD, SGD, TGD
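The neighborhood of RGD can be reproduced with a short sketch. To stay self-contained it hard-codes only the three BLOSUM62 rows needed for R, G and D (transcribed by hand, so treat them as an assumption and check them against a reference matrix before reuse); the function name `neighborhood` is hypothetical, and the code only works for query words made of those three residues.

```python
from itertools import product

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# BLOSUM62 rows for R, G and D only, in the column order of AMINO_ACIDS.
BLOSUM62_ROWS = {
    "R": [-1, 5, 0, -2, -3, 1, 0, -2, 0, -3, -2, 2, -1, -3, -2, -1, -1, -3, -2, -3],
    "G": [0, -2, 0, -1, -3, -2, -2, 6, -2, -4, -4, -2, -3, -3, -2, 0, -2, -2, -3, -3],
    "D": [-2, -2, 1, 6, -3, 0, 2, -1, -1, -3, -4, -1, -3, -3, -1, 0, -1, -4, -3, -3],
}

def neighborhood(word, threshold):
    """Return {neighbor: score} for every word scoring >= threshold against `word`."""
    rows = [dict(zip(AMINO_ACIDS, BLOSUM62_ROWS[res])) for res in word]
    result = {}
    for candidate in product(AMINO_ACIDS, repeat=len(word)):
        score = sum(row[c] for row, c in zip(rows, candidate))
        if score >= threshold:
            result["".join(candidate)] = score
    return result

hits = neighborhood("RGD", threshold=12)
for word, score in sorted(hits.items(), key=lambda kv: (-kv[1], kv[0])):
    print(word, score)
```

Lowering the threshold to 11 enlarges the neighborhood from 8 to 17 words, which is the speed versus sensitivity trade-off that T controls.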

Word length As the threshold score for a word match is increased, there are fewer matches. This makes the search more specific, but less sensitive.

2-hit seeding Alignments often have multiple word hits in clusters. Isolated word hits are frequently false leads. Most alignments have large ungapped regions. Requiring 2 word hits on the same diagonal greatly increases speed at a slight cost in sensitivity. Similar to paired-end read mapping concept.
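A rough sketch of the two-hit rule for a single database sequence: word hits are grouped by diagonal (database offset minus query offset), and a seed is reported only when two non-overlapping hits fall on the same diagonal within a fixed distance. The function name, the window of 40 and the toy hits are assumptions for illustration, not values taken from the slides.

```python
def two_hit_seeds(hits, w=3, window=40):
    """hits: (query offset, database offset) pairs for one database sequence.
    Returns (diagonal, first query offset, second query offset) triples."""
    last_on_diagonal = {}
    seeds = []
    for q, d in sorted(hits):
        diagonal = d - q
        if diagonal in last_on_diagonal:
            distance = q - last_on_diagonal[diagonal]
            if distance < w:
                continue  # overlaps the previous hit on this diagonal: ignore it
            if distance <= window:
                seeds.append((diagonal, last_on_diagonal[diagonal], q))
        last_on_diagonal[diagonal] = q
    return seeds

# Two hits on diagonal 10 trigger a seed; the isolated hit on diagonal -17 does not.
print(two_hit_seeds([(0, 10), (5, 15), (20, 3)]))
```

Only the seeds returned here would go on to the more expensive extension step.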

Extension of the seed alignments. Alignments are extended from seeds in each direction. Extension is terminated when the score drops more than X below the maximum score seen so far. Example:
The quick brown fox jumps over the lazy dog.
The quiet brown cat purrs when she sees him.
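The drop-off rule can be tried on the two example sentences. This is a sketch under simple assumptions: ungapped extension to the right only, +1 for a matching character and -1 for a mismatch, with the function name `extend_right` and the X values chosen for illustration.

```python
def extend_right(a, b, start_a, start_b, x_drop, match=1, mismatch=-1):
    """Ungapped extension to the right of a seed.
    Stops once the running score falls more than x_drop below the best score seen.
    Returns (best score, length of the extension that achieved it)."""
    score = best = 0
    best_len = 0
    i = 0
    while start_a + i < len(a) and start_b + i < len(b):
        score += match if a[start_a + i] == b[start_b + i] else mismatch
        i += 1
        if score > best:
            best, best_len = score, i
        elif best - score > x_drop:
            break
    return best, best_len

s1 = "The quick brown fox jumps over the lazy dog."
s2 = "The quiet brown cat purrs when she sees him."
for x in (1, 5):
    best, length = extend_right(s1, s2, 0, 0, x_drop=x)
    print(x, best, repr(s1[:length]))
```

With a small X the extension gives up at the first pair of mismatches ("The qui"); a larger X lets it cross them and reach "The quick brown ", after which the score never recovers.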

BLAST statistics. How significant is this similarity?
>gi|23098447|ref|NP_691913.1| (NC_004193) 3-oxoacyl-(acyl carrier protein) reductase [Oceanobacillus iheyensis]
Length = 253
Score = 38.9 bits (89), Expect = 3e-05
Identities = 17/40 (42%), Positives = 26/40 (64%)
Frame = -1
Query: 4146 VTGAGHGLGRAISLELAKKGCHIAVVDINVSGAEDTVKQI 4027
            VTGA  G+G+AI+   A +G  + V D+N  GA+  V++I
Sbjct: 10   VTGAASGMGKAIATLYASEGAKVIVADLNEEGAQSVVEEI 49

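Scoring the alignment. The score S is the sum of the substitution-matrix scores of the aligned residue pairs; the slide annotates individual columns with their scores (4, -1, 4).
Query: 4146 VTGAGHGLGRAISLELAKKGCHIAVVDINVSGAEDTVKQI 4027
            VTGA  G+G+AI+   A +G  + V D+N  GA+  V++I
Sbjct: 10   VTGAASGMGKAIATLYASEGAKVIVADLNEEGAQSVVEEI 49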

Evaluating an alignment. How many alignments of a given score would be expected by chance, i.e. without common evolutionary history? We expect more chance hits when the search database is larger or when the query sequence is longer. For a higher score threshold, we expect fewer chance hits.

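The Karlin-Altschul equation. The "Expect" or "E-value":
$E = K m n \, e^{-\lambda S}$
where $E$ is the expected number of alignments, $K$ is a minor constant, $m$ is the length of the query, $n$ is the length of the database, $mn$ is the search space, $\lambda$ is the scaling factor, $S$ is the raw score, and $\lambda S$ is the normalized score.
The "P-value":
$P = 1 - e^{-E}$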
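A small numerical sketch of the Karlin-Altschul relations, E = K·m·n·exp(-λS) and P = 1 - exp(-E), together with the usual conversion from raw score to bits, S' = (λS - ln K) / ln 2. The values λ = 0.267 and K = 0.041 are assumptions (figures commonly quoted for gapped BLOSUM62 searches), and the lengths m and n are made up; none of them come from the slides.

```python
import math

def bit_score(raw_score, lam, k):
    """Convert a raw score S to bits: S' = (lam * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def e_value(raw_score, m, n, lam, k):
    """Expected number of chance alignments with score >= S: E = K * m * n * exp(-lam * S)."""
    return k * m * n * math.exp(-lam * raw_score)

def p_value(e):
    """Probability of at least one such chance alignment: P = 1 - exp(-E)."""
    return -math.expm1(-e)

LAM, K = 0.267, 0.041          # assumed statistical parameters
M, N = 4_000, 1_000_000        # hypothetical query and database lengths

print(round(bit_score(89, LAM, K), 1))
print(e_value(89, M, N, LAM, K))
print(p_value(3e-05))
```

Under these assumed parameters a raw score of 89 converts to about 38.9 bits, in line with the report shown earlier. For small E the P-value is nearly equal to E, which is why BLAST reports only E.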

The sum-statistics Multiple aligning regions from a single sequence should increase our belief that the sequence is evolutionarily related to our query. Sum statistics merge the significance (decrease the E-value) for groups of consistent alignments.

The sum-statistics The sum score is not reported by BLAST!