Sequence alignment
Gabor T. Marth, Department of Biology, Boston College
BI420 – Introduction to Bioinformatics


Biologically significant alignment hba_human hbb_human

Biologically plausible alignment

Spurious alignment (BRCA1 variant) Examples from: Biological sequence analysis. Durbin, Eddy, Krogh, Mitchison

Alignment types
Examples from: BLAST. Korf, Yandell, Bedell
How do we align the words CRANE and FRAME?
    CRANE
     || |
    FRAME
3 matches, 2 mismatches
How do we align words that differ in length?
    COELACANTH      COELACANTH
      || |||          || |||
    P-ELICAN--      -PELICAN--
5 matches, 2 mismatches, 3 gaps
In this case, if we assign +1 point for each match and -1 for each mismatch or gap, we get 5 x (+1) + 2 x (-1) + 3 x (-1) = 0. This is the alignment score.
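
The scoring rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `alignment_score` and the convention that `-` marks a gap are assumptions made for the example.

```python
def alignment_score(top: str, bottom: str,
                    match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Score an already-aligned pair of strings, column by column.

    '-' is assumed to mark a gap; both strings must have equal length.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    score = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score += gap        # a gap in either sequence
        elif a == b:
            score += match      # identical characters
        else:
            score += mismatch   # different characters
    return score


# The two examples from the slide.
print(alignment_score("CRANE", "FRAME"))            # 3 matches, 2 mismatches
print(alignment_score("COELACANTH", "P-ELICAN--"))  # 5 matches, 2 mismatches, 3 gaps
```

Changing the `match`, `mismatch`, and `gap` arguments changes which of several candidate alignments scores highest.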

Finding the “best” alignment
(scored with match +1, mismatch -1, gap -1)
    COELACANTH      COELACANTH      COELACANTH      COELACANTH
      || |||           | |||          ||
    P-ELICAN--      PE-LICAN--      P-EL-ICAN-      PELICAN---
    S = 0           S = -2          S = -6          S = -10

Global alignment – Needleman-Wunsch
Example from: Higgs and Attwood
Aligning words: SHAKE and SPEARE
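
A compact sketch of Needleman-Wunsch global alignment may help make the dynamic-programming idea concrete. It assumes the simple +1 / -1 / -1 match, mismatch, and linear gap scores used earlier in these slides rather than a substitution matrix, and the function name `needleman_wunsch` is chosen for illustration only.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    f = [[0] * cols for _ in range(rows)]
    # Leading gaps are penalised: the first row and column grow by `gap`.
    for i in range(1, rows):
        f[i][0] = i * gap
    for j in range(1, cols):
        f[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = f[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            f[i][j] = max(diag, f[i - 1][j] + gap, f[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and f[i][j] == f[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and f[i][j] == f[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return f[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = needleman_wunsch("SHAKE", "SPEARE")
print(score)
print(top)
print(bottom)
```

Several alignments can share the optimal score; the traceback above returns one of them, preferring a diagonal step when there is a tie.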

Local alignment – Smith-Waterman
Example from: Higgs and Attwood
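
For comparison with global alignment, here is a minimal Smith-Waterman sketch. It differs in two ways: cell scores are never allowed to fall below zero, and the traceback starts at the highest-scoring cell and stops at the first zero. The +1 / -1 / -1 scores and the function name `smith_waterman` are assumptions made for this illustration.

```python
def smith_waterman(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for one best local alignment."""
    h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets a local alignment restart anywhere.
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the best cell until the score reaches zero.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and h[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + s:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif h[i][j] == h[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("COELACANTH", "PELICAN"))
```

With these scores the best local alignment of COELACANTH and PELICAN is ELACAN against ELICAN (5 matches, 1 mismatch), which drops the poorly matching ends that a global alignment is forced to include.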

Visualizing pair-wise alignments

Sequence similarity and scoring
Match-mismatch-gap penalties, e.g.:
    Match = 1
    Mismatch = -5
    Gap = -10
Scoring matrices

Multiple alignments clustalW

Anchored multiple alignment

Similarity searching vs. alignment
Figure panels: Alignment; Similarity search (query against database)

The BLAST algorithms
Program   Database     Query        Typical uses
BLASTN    Nucleotide   Nucleotide   Mapping oligonucleotides, amplimers, ESTs, and repeats to a genome. Identifying related transcripts.
BLASTP    Protein      Protein      Identifying common regions between proteins. Collecting related proteins for phylogenetic analysis.
BLASTX    Protein      Nucleotide   Finding protein-coding genes in genomic DNA.
TBLASTN   Nucleotide   Protein      Identifying transcripts similar to a known protein (finding proteins not yet in GenBank). Mapping a protein to genomic DNA.
TBLASTX   Nucleotide   Nucleotide   Cross-species gene prediction. Searching for genes missed by traditional methods.

BLAST report

The BLAST algorithm
Sequence alignment takes place in a 2-dimensional space where diagonal lines represent regions of similarity.
Gaps in an alignment appear as broken diagonals.
The search space is sometimes considered as 2 sequences and sometimes as query x database.
Global alignment vs. local alignment
  – BLAST is local
Maximum scoring pair (MSP) vs. high-scoring pair (HSP)
  – BLAST finds HSPs (usually the MSP too)
Gapped vs. ungapped
  – BLAST can do both

The BLAST algorithm
Speed gained by minimizing search space
Alignments require word hits
Neighborhood words
W and T modulate speed and sensitivity
BLOSUM62 neighborhood of RGD, T = 12:
    At or above T: RGD 17, KGD 14, QGD 13, RGE 13, EGD 12, HGD 12, NGD 12, RGN 12
    Below T: AGD 11, MGD 11, RAD 11, RGQ 11, RGS 11, RND 11, RSD 11, SGD 11, TGD 11
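
The neighborhood of a query word can be enumerated by brute force: score every possible word of length W against the query word and keep those scoring at least T. The sketch below is illustrative only. To stay self-contained it hard-codes just the three BLOSUM62 rows needed for the word RGD (typed in by hand for this example), so it works only for words made of R, G, and D; a real implementation would load the full matrix. The names `BLOSUM62_ROWS` and `neighborhood` are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Only the R, G and D rows of BLOSUM62, in the column order of AMINO_ACIDS.
# Assumption: enough for the RGD example; not a full substitution matrix.
BLOSUM62_ROWS = {
    "R": [-1, 5, 0, -2, -3, 1, 0, -2, 0, -3, -2, 2, -1, -3, -2, -1, -1, -3, -2, -3],
    "G": [0, -2, 0, -1, -3, -2, -2, 6, -2, -4, -4, -2, -3, -3, -2, 0, -2, -2, -3, -3],
    "D": [-2, -2, 1, 6, -3, 0, 2, -1, -1, -3, -4, -1, -3, -3, -1, 0, -1, -4, -3, -3],
}


def neighborhood(word: str, threshold: int):
    """Return (neighbor, score) pairs scoring >= threshold, best first."""
    hits = []
    for candidate in product(AMINO_ACIDS, repeat=len(word)):
        score = sum(BLOSUM62_ROWS[q][AMINO_ACIDS.index(c)]
                    for q, c in zip(word, candidate))
        if score >= threshold:
            hits.append(("".join(candidate), score))
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


for neighbor, score in neighborhood("RGD", 12):
    print(neighbor, score)
```

Lowering T from 12 to 11 enlarges the neighborhood from 8 words to the 17 listed on the slide, which is the speed versus sensitivity trade-off that W and T control.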

Word length

2-hit seeding
Alignments tend to have multiple word hits.
Isolated word hits are frequently false leads.
Most alignments have large ungapped regions.
Requiring 2 word hits on the same diagonal (within a window of 40 aa, for example) greatly increases speed at a slight cost in sensitivity.
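
The two-hit rule can be sketched as a filter over word hits. Each hit is a (query position, subject position) pair, and hits on the same diagonal share the same value of subject position minus query position. This is a simplified illustration: the function name `two_hit_seeds`, the tuple layout, and the choice to compare only consecutive hits on a diagonal are assumptions, not a description of how BLAST itself is implemented.

```python
from collections import defaultdict


def two_hit_seeds(hits, word_len: int = 3, window: int = 40):
    """Return (diagonal, first_query_pos, second_query_pos) for hit pairs
    that lie on the same diagonal, do not overlap, and are within `window`."""
    by_diagonal = defaultdict(list)
    for query_pos, subject_pos in hits:
        by_diagonal[subject_pos - query_pos].append(query_pos)

    seeds = []
    for diagonal, positions in by_diagonal.items():
        positions.sort()
        # Compare each hit with the next hit on the same diagonal.
        for first, second in zip(positions, positions[1:]):
            distance = second - first
            if word_len <= distance <= window:
                seeds.append((diagonal, first, second))
    return sorted(seeds)


hits = [(5, 105), (20, 120), (50, 10), (200, 300)]
print(two_hit_seeds(hits))
```

In this example only the first two hits qualify: they share diagonal 100 and are 15 residues apart. The hit at (50, 10) is alone on its diagonal, and (200, 300) is too far from its neighbor, so both are discarded as isolated hits.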

Extension of the seed alignments
Alignments are extended from seeds in each direction.
Extension is terminated when the score drops more than X below the maximum score seen so far.
Text example (match +1, mismatch -1, no gaps):
    The quick brown fox jumps over the lazy dog.
    The quiet brown cat purrs when she sees him.
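
The X-drop rule can be sketched for a single direction of ungapped extension, using the slide's text example and its +1 / -1 scores. The function name `extend_right` and its return convention (best score and the length at which it was reached) are assumptions for illustration.

```python
def extend_right(a: str, b: str, start_a: int, start_b: int, x_drop: int,
                 match: int = 1, mismatch: int = -1):
    """Extend an ungapped alignment rightwards from (start_a, start_b).

    Stops once the running score falls more than `x_drop` below the best
    score seen so far. Returns (best_score, length_of_best_extension).
    """
    score = best = 0
    best_len = 0
    limit = min(len(a) - start_a, len(b) - start_b)
    for k in range(limit):
        score += match if a[start_a + k] == b[start_b + k] else mismatch
        if score > best:
            best, best_len = score, k + 1   # new maximum: remember where
        elif best - score > x_drop:
            break                           # dropped too far: give up
    return best, best_len


fox = "The quick brown fox jumps over the lazy dog."
cat = "The quiet brown cat purrs when she sees him."
best, length = extend_right(fox, cat, 0, 0, x_drop=2)
print(best, repr(fox[:length]))
```

A larger X lets the extension survive short runs of mismatches, such as the "ck" versus "et" difference, at the price of more time spent on extensions that never recover.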

BLAST statistics
>gi| |ref|NP_ | (NC_004193) 3-oxoacyl-(acyl carrier protein) reductase [Oceanobacillus iheyensis]
Length = 253
Score = 38.9 bits (89), Expect = 3e-05
Identities = 17/40 (42%), Positives = 26/40 (64%)
Frame = -1
Query: 4146 VTGAGHGLGRAISLELAKKGCHIAVVDINVSGAEDTVKQI 4027
            VTGA  G+G+AI+   A +G  + V D+N  GA+  V++I
Sbjct: 10   VTGAASGMGKAIATLYASEGAKVIVADLNEEGAQSVVEEI 49
How significant is this similarity?

Scoring the alignment
Query: 4146 VTGAGHGLGRAISLELAKKGCHIAVVDINVSGAEDTVKQI 4027
            VTGA  G+G+AI+   A +G  + V D+N  GA+  V++I
Sbjct: 10   VTGAASGMGKAIATLYASEGAKVIVADLNEEGAQSVVEEI
S (score)

The Karlin-Altschul equation
E = K m n e^(-λS)
    E: expected number of alignments (the “Expect” or “E-value”)
    K: a minor constant
    m: length of query
    n: length of database
    mn: search space
    S: raw score
    λ: scaling factor
    λS: normalized score
The “P-value”: P = 1 - e^(-E)
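
The Karlin-Altschul statistics can be checked numerically with a short sketch. It uses the standard relations E = K·m·n·e^(-λS), bit score S' = (λS - ln K) / ln 2, E = m·n·2^(-S'), and P = 1 - e^(-E). The values λ = 0.267 and K = 0.041 are assumptions here: they are the parameters commonly reported for gapped BLOSUM62 searches and are not stated in these slides. The function names are illustrative.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Convert a raw score S to a bit score S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_from_raw(query_len: int, db_len: int, raw_score: float,
                    lam: float, k: float) -> float:
    """Karlin-Altschul E-value: E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def expect_from_bits(query_len: int, db_len: int, bits: float) -> float:
    """Equivalent form using the bit score: E = m * n * 2^(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


def p_value(e_value: float) -> float:
    """Probability of at least one such alignment by chance: 1 - exp(-E)."""
    return -math.expm1(-e_value)


# Assumed parameters (gapped BLOSUM62); raw score 89 is taken from the
# BLAST report shown earlier in the slides.
LAMBDA, K = 0.267, 0.041
print(round(bit_score(89, LAMBDA, K), 1))
print(p_value(3e-05))
```

Under these assumed parameters a raw score of 89 converts to about 38.9 bits, in line with the "38.9 bits (89)" line of the report. For small E-values, P and E are nearly identical, which is why BLAST reports only E.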

The sum-statistics
Sum statistics increase the significance (decrease the E-value) for groups of consistent alignments.

The sum-statistics The sum score is not reported by BLAST!