Intro to Alignment Algorithms: Global and Local. Algorithmic Functions of Computational Biology, Professor Istrail.

Presentation transcript:

Intro to Alignment Algorithms: Global and Local
Algorithmic Functions of Computational Biology, Professor Istrail

Sequence Comparison
Biomolecular sequences:
- DNA sequences (strings over the 4-letter alphabet {A, C, G, T})
- RNA sequences (strings over the 4-letter alphabet {A, C, G, U})
- Protein sequences (strings over the 20-letter alphabet of amino acids)
Sequence similarity helps in the discovery of genes and in the prediction of the structure and function of proteins.

The Basic Similarity Analysis Algorithm
- Global Similarity
- Scoring Schemes
- Edit Graphs
- Alignment = Path in the Edit Graph
- The Principle of Optimality
- The Dynamic Programming Algorithm
- The Traceback

Jupiter’s code: Alignment
Massa    Mas-sa   Mass-a   Massa-
Master   Master   Master   Master

Sequence Alignment
Input: two sequences over the same alphabet
Output: an alignment of the two sequences
Example:
- GCGCATTTGAGCGA
- TGCGTTAGGGTGACCA
A possible alignment:
-GCGCATTTGAGCGA--
TGCG--TTAGGGTGACC
Each column is a match, a mismatch, or an indel.
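The three column types can be checked mechanically. The sketch below is an illustration only: the function name `classify_columns` is hypothetical, and it assumes the example alignment is read as two rows of 17 columns each, with "-" as the gap symbol.

```python
def classify_columns(row_x: str, row_y: str, gap: str = "-") -> list[str]:
    """Label every column of a gapped alignment as match, mismatch, or indel."""
    if len(row_x) != len(row_y):
        raise ValueError("both alignment rows must have the same length")
    labels = []
    for a, b in zip(row_x, row_y):
        if a == gap and b == gap:
            raise ValueError("a column cannot pair a gap with a gap")
        if a == gap or b == gap:
            labels.append("indel")      # one letter faces the gap symbol
        elif a == b:
            labels.append("match")      # identical letters
        else:
            labels.append("mismatch")   # two different letters
    return labels


# Assumed reading of the example alignment as two 17-column rows.
labels = classify_columns("-GCGCATTTGAGCGA--", "TGCG--TTAGGGTGACC")
counts = {kind: labels.count(kind) for kind in ("match", "mismatch", "indel")}
print(counts)
```

Removing the gap symbols from a row gives back the letters of the sequence that the row displays, which is a quick sanity check on any hand-written alignment.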

Consider two sequences $X = x_1 x_2 \dots x_n$ and $Y = y_1 y_2 \dots y_m$ over the alphabet $\Sigma$: every $x_i$ and $y_j$ belongs to $\Sigma$.

Scoring Schemes
Unit-score matrix:
    A  C  G  T
A   1  0  0  0
C   0  1  0  0
G   0  0  1  0
T   0  0  0  1

Alignment
ACG
|||
AGG
A is aligned with A, C is aligned with G, G is aligned with G.
Unit-cost score = $\sigma(A,A) + \sigma(C,G) + \sigma(G,G) = 1 + 0 + 1 = 2$

Gaps
“-” is the gap symbol.
Optimal alignments:
Without gaps (score 7):
ACATGGAAT
ACAGGAAAT
With gaps (score 8):
ACATGG-AAT
ACA-GGAAAT
Second example:
AAAGGG
GGGAAA

$\sigma(x,y)$ = the score for aligning x with y
$\sigma(-,y)$ = the score for aligning - with y
$\sigma(x,-)$ = the score for aligning x with -

A-CG-G
ATCGTG
Alignment score = $\sigma(A,A) + \sigma(-,T) + \sigma(C,C) + \sigma(G,G) + \sigma(-,T) + \sigma(G,G)$
The alignment score is the sum of the scores of the pairwise aligned symbols.
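The "sum over aligned columns" rule is short enough to write out. This is a minimal sketch, not the lecture's own code: `unit_score` and `alignment_score` are hypothetical names, and the scoring scheme (1 for identical letters, 0 for a mismatch or a gap) is an assumption chosen to agree with the unit-score examples in these slides.

```python
GAP = "-"


def unit_score(a: str, b: str) -> int:
    """Assumed scheme: 1 for identical letters, 0 for a mismatch or a gap."""
    if a == GAP or b == GAP:
        return 0
    return 1 if a == b else 0


def alignment_score(row_x: str, row_y: str, sigma=unit_score) -> int:
    """Sum sigma over the columns of an alignment given as two equal-length rows."""
    if len(row_x) != len(row_y):
        raise ValueError("both alignment rows must have the same length")
    return sum(sigma(a, b) for a, b in zip(row_x, row_y))


print(alignment_score("ACG", "AGG"))                # the three-letter example
print(alignment_score("ACATGGAAT", "ACAGGAAAT"))    # ungapped alignment
print(alignment_score("ACATGG-AAT", "ACA-GGAAAT"))  # gapped alignment
print(alignment_score("A-CG-G", "ATCGTG"))          # alignment with two gap columns
```

Passing a different `sigma` (for example one that penalizes gaps) changes the scores without touching the summation, which is the point of keeping the scoring scheme separate from the alignment.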

Scoring Scheme
Dayhoff score
Partial alignment for Monkey and Trout somatotropin proteins:
... PTIPLSRLFDNAMLRAHRLHQ
    SAIENQRLFNIAVSRVQHLHL
[Figure: excerpt of the Dayhoff scoring matrix, indexed by the gap symbol and the amino acids A R N D C Q E G H I L K M F P S T W Y V]

Scoring Functions
Scoring function = a sum of terms, one for each pair of aligned residues and one for each gap.
The meaning = log of the relative likelihood that the sequences are related, compared to being unrelated.
- Identities and conservative substitutions are positive terms.
- Non-conservative substitutions are negative terms.

The Edit Graph
Suppose that we want to align AGT with AT. We are going to construct a graph where alignments between the two sequences correspond to paths between the Begin and End nodes of the graph. This is the Edit Graph.

AGT has length 3 and AT has length 2, so the Edit Graph has (3+1)*(2+1) = 12 nodes.
[Figure: grid of nodes with the sequence AGT along one axis and the sequence AT along the other]
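The node count can be confirmed by building the graph explicitly. The sketch below is one possible encoding, not the slides' own: `edit_graph` is a hypothetical name, node (i, j) is assumed to mean "i letters of the first sequence and j letters of the second have been consumed", and every edge carries the pair of letters it aligns.

```python
def edit_graph(x: str, y: str):
    """Return (nodes, edges); each edge is (source, target, (letter_x, letter_y))."""
    nodes = [(i, j) for i in range(len(x) + 1) for j in range(len(y) + 1)]
    edges = []
    for i, j in nodes:
        if i < len(x):                   # consume x[i] against a gap
            edges.append(((i, j), (i + 1, j), (x[i], "-")))
        if j < len(y):                   # consume y[j] against a gap
            edges.append(((i, j), (i, j + 1), ("-", y[j])))
        if i < len(x) and j < len(y):    # align x[i] with y[j]
            edges.append(((i, j), (i + 1, j + 1), (x[i], y[j])))
    return nodes, edges


nodes, edges = edit_graph("AGT", "AT")
print(len(nodes), "nodes,", len(edges), "edges")
```

Under this encoding Begin is node (0, 0) and End is node (3, 2) for AGT and AT.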

[Figure: the edit-graph grid with columns labeled A, G, T and rows labeled A, T]
AGT indexes the columns, and AT indexes the rows of this “table”.

[Figure: the same grid with arrows on its edges]
The graph is directed. The nodes (i,j) will hold values.


[Figure: edit graph for AGT and AT with every edge labeled by a pair of aligned letters, such as (A,A), (G,-), or (-,T)]
Directed edges get as labels pairs of aligned letters.

Alignment = Path in the Edit Graph
[Figure: edit graph for AGT and AT with one Begin-to-End path highlighted; the path spells the alignment below]
AGT
A-T
Every path from Begin to End corresponds to an alignment.
Every alignment corresponds to a path between Begin and End.
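The path/alignment correspondence can be made concrete with a small sketch. The names `path_to_alignment` and `count_paths`, and the move letters "D", "X", "Y", are hypothetical conventions introduced only for this illustration: "D" follows a diagonal edge, "X" consumes a letter of the first sequence against a gap, and "Y" consumes a letter of the second sequence against a gap.

```python
from functools import lru_cache


def path_to_alignment(x: str, y: str, moves: str) -> tuple[str, str]:
    """Spell out the alignment that a Begin-to-End path of moves describes."""
    i = j = 0
    row_x, row_y = [], []
    for move in moves:
        if move == "D":      # diagonal edge: x[i] aligned with y[j]
            row_x.append(x[i]); row_y.append(y[j]); i += 1; j += 1
        elif move == "X":    # x[i] aligned with a gap
            row_x.append(x[i]); row_y.append("-"); i += 1
        elif move == "Y":    # a gap aligned with y[j]
            row_x.append("-"); row_y.append(y[j]); j += 1
        else:
            raise ValueError(f"unknown move {move!r}")
    if (i, j) != (len(x), len(y)):
        raise ValueError("the path does not end at the End node")
    return "".join(row_x), "".join(row_y)


def count_paths(n: int, m: int) -> int:
    """Count Begin-to-End paths in the edit graph of sequences of lengths n and m."""
    @lru_cache(maxsize=None)
    def paths(i: int, j: int) -> int:
        if i == 0 or j == 0:
            return 1         # only one way along a border of the grid
        return paths(i - 1, j) + paths(i, j - 1) + paths(i - 1, j - 1)
    return paths(n, m)


print(path_to_alignment("AGT", "AT", "DXD"))
print(count_paths(3, 2))
```

Even this tiny graph has 25 distinct paths, and the count grows quickly with sequence length, which is why enumerating alignments is not a practical way to find the best one.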

The Principle of Optimality
The optimal answer to a problem is expressed in terms of the optimal answers to its sub-problems.

Dynamic Programming
Given: two sequences X and Y
Find: an optimal alignment of X with Y
We are looking for the optimal alignment = the maximal-score path in the Edit Graph from the Begin vertex to the End vertex.
Part 1: Compute the optimal alignment score first.
Part 2: Construct the optimal alignment.

The DP Matrix S(i,j)
[Figure: the grid for AGT and AT with the example entries S(2,1) and S(1,0) marked]

The DP Matrix
Matrix S = [S(i,j)]
S(i,j) = the score of the maximal-score path from the Begin vertex to the vertex (i,j).
The optimal path to (i,j) must pass through one of the vertices (i-1,j-1), (i-1,j), (i,j-1).

Optimal path to (i,j) through (i-1,j)
Optimal path to (i-1,j) + $\sigma(x_i, -)$
Score: $S(i-1,j) + \sigma(x_i, -)$

Optimal path to (i,j) through (i-1,j-1)
Optimal path to (i-1,j-1) + $\sigma(x_i, y_j)$
Score: $S(i-1,j-1) + \sigma(x_i, y_j)$

Optimal path to (i,j) through (i,j-1)
Optimal path to (i,j-1) + $\sigma(-, y_j)$
Score: $S(i,j-1) + \sigma(-, y_j)$

The Basic Algorithm
$$S(i,j) = \max \begin{cases} S(i-1,j-1) + \sigma(x_i, y_j) \\ S(i-1,j) + \sigma(x_i, -) \\ S(i,j-1) + \sigma(-, y_j) \end{cases}$$
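The recurrence translates almost line for line into a table-filling loop. What follows is a sketch under stated assumptions rather than a definitive implementation: `global_score_matrix` and `example_sigma` are hypothetical names, the example scores (+1 match, -1 mismatch, -1 per gap symbol) are invented for illustration, and the border cells are filled with all-gap prefixes, a standard choice that the slide does not spell out.

```python
def example_sigma(a: str, b: str) -> int:
    """Hypothetical scores: +1 match, -1 mismatch, -1 for any column with a gap."""
    if a == "-" or b == "-":
        return -1
    return 1 if a == b else -1


def global_score_matrix(x: str, y: str, sigma=example_sigma) -> list[list[int]]:
    """Fill S so that S[i][j] is the best score for aligning x[:i] with y[:j]."""
    n, m = len(x), len(y)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    # Assumed border conditions: a prefix can only be aligned against gaps.
    for i in range(1, n + 1):
        S[i][0] = S[i - 1][0] + sigma(x[i - 1], "-")
    for j in range(1, m + 1):
        S[0][j] = S[0][j - 1] + sigma("-", y[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + sigma(x[i - 1], y[j - 1]),  # x_i aligned with y_j
                S[i - 1][j] + sigma(x[i - 1], "-"),           # x_i aligned with a gap
                S[i][j - 1] + sigma("-", y[j - 1]),           # a gap aligned with y_j
            )
    return S


S = global_score_matrix("AGT", "AT")
for row in S:
    print(row)
print("optimal global score:", S[3][2])
```

The table has (n+1)(m+1) cells and each cell looks at three neighbors, so the fill takes time proportional to the product of the two sequence lengths.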

Optimal Alignment and Traceback
[Figure: edit graph for AGT and AT with the optimal path highlighted]
Optimal alignment:
AGT
A-T
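Once the score table is filled, the traceback walks from End back to Begin, choosing at each node a predecessor that explains the stored value. The sketch below is self-contained and rests on the same assumptions as any small illustration: `global_align` and `example_sigma` are hypothetical names, the scores (+1 match, -1 mismatch, -1 gap) are invented, scores are integers so equality tests are exact, and ties are broken in favor of the diagonal.

```python
def example_sigma(a: str, b: str) -> int:
    """Hypothetical scores: +1 match, -1 mismatch, -1 for any column with a gap."""
    if a == "-" or b == "-":
        return -1
    return 1 if a == b else -1


def global_align(x: str, y: str, sigma=example_sigma) -> tuple[int, str, str]:
    """Return (optimal score, aligned row for x, aligned row for y)."""
    n, m = len(x), len(y)
    # Part 1: fill the score table (assumed all-gap border conditions).
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = S[i - 1][0] + sigma(x[i - 1], "-")
    for j in range(1, m + 1):
        S[0][j] = S[0][j - 1] + sigma("-", y[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + sigma(x[i - 1], y[j - 1]),
                S[i - 1][j] + sigma(x[i - 1], "-"),
                S[i][j - 1] + sigma("-", y[j - 1]),
            )
    # Part 2: trace back from End (n, m) to Begin (0, 0).
    i, j = n, m
    row_x, row_y = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sigma(x[i - 1], y[j - 1]):
            row_x.append(x[i - 1]); row_y.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + sigma(x[i - 1], "-"):
            row_x.append(x[i - 1]); row_y.append("-"); i -= 1
        else:
            row_x.append("-"); row_y.append(y[j - 1]); j -= 1
    # The columns were collected End-to-Begin, so reverse them.
    return S[n][m], "".join(reversed(row_x)), "".join(reversed(row_y))


score, aligned_x, aligned_y = global_align("AGT", "AT")
print(score)
print(aligned_x)
print(aligned_y)
```

When several predecessors explain the same value, each choice leads to a different optimal alignment with the same score; this sketch reports only one of them.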

The Basic Algorithm: Local Similarity
$$S(i,j) = \max \begin{cases} 0 \\ S(i-1,j-1) + \sigma(x_i, y_j) \\ S(i-1,j) + \sigma(x_i, -) \\ S(i,j-1) + \sigma(-, y_j) \end{cases}$$
The 0 term is the only addition to the global recurrence.
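A minimal sketch of how the extra 0 changes the computation is given below. Only the 0 term comes from the slide; the rest follows the usual Smith-Waterman conventions and should be read as assumptions: the border cells are 0, the answer is taken from the highest-scoring cell anywhere in the table, and the traceback stops at the first cell holding 0. The names `local_align` and `example_sigma` and the scores (+1 match, -1 mismatch, -1 gap) are hypothetical.

```python
def example_sigma(a: str, b: str) -> int:
    """Hypothetical scores: +1 match, -1 mismatch, -1 for any column with a gap."""
    if a == "-" or b == "-":
        return -1
    return 1 if a == b else -1


def local_align(x: str, y: str, sigma=example_sigma) -> tuple[int, str, str]:
    """Return (best local score, aligned piece of x, aligned piece of y)."""
    n, m = len(x), len(y)
    S = [[0] * (m + 1) for _ in range(n + 1)]   # borders stay 0 (assumed)
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                0,                                            # restart the alignment here
                S[i - 1][j - 1] + sigma(x[i - 1], y[j - 1]),
                S[i - 1][j] + sigma(x[i - 1], "-"),
                S[i][j - 1] + sigma("-", y[j - 1]),
            )
            if S[i][j] > best:
                best, best_pos = S[i][j], (i, j)
    # Trace back from the best cell until a cell with score 0 is reached.
    i, j = best_pos
    row_x, row_y = [], []
    while i > 0 and j > 0 and S[i][j] > 0:
        if S[i][j] == S[i - 1][j - 1] + sigma(x[i - 1], y[j - 1]):
            row_x.append(x[i - 1]); row_y.append(y[j - 1]); i -= 1; j -= 1
        elif S[i][j] == S[i - 1][j] + sigma(x[i - 1], "-"):
            row_x.append(x[i - 1]); row_y.append("-"); i -= 1
        else:
            row_x.append("-"); row_y.append(y[j - 1]); j -= 1
    return best, "".join(reversed(row_x)), "".join(reversed(row_y))


score, piece_x, piece_y = local_align("TTACGTAA", "GGACGTCC")
print(score, piece_x, piece_y)
```

A local alignment is only meaningful when mismatches and gaps score below zero, as they do here; with the unit score every cell would keep growing and the 0 term would never take effect.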

General Scoring Schemes
Assumptions:
1. Independence of mutations at different sites (additive scoring scheme)
2. Gaps of any length are considered one mutation
All of the efficient alignment algorithms that employ the dynamic programming method are based fundamentally on the fact that the scoring function is additive.

Substitution Matrices
Consider an ungapped alignment of two equal-length sequences over the same alphabet.
1. Compute the probability that the two sequences are related.
2. Compute the probability that the two sequences are not related.
3. Compute the ratio of the two probabilities.

Random Model R
Every letter z occurs independently with probability $q_z$.
Algorithmic Functions of Computational Biology - Course 3, Professor Istrail

Match Model M
Aligned pairs of residues a, b occur with joint probability $p_{ab}$.

Log-odds ratio
$$\log \frac{P(X, Y \mid M)}{P(X, Y \mid R)} = \sum_i s(x_i, y_i), \quad \text{where } s(a,b) = \log \frac{p_{ab}}{q_a q_b}$$
s(a,b) = the substitution matrix
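The log-odds construction can be tried on a toy example. Everything numeric below is invented for illustration: a two-letter alphabet {A, C}, background probabilities of 0.5 each, and made-up joint probabilities for aligned pairs. The names `log_odds_matrix` and `log_odds_score` are hypothetical. The sketch builds the substitution scores as the log of each joint probability divided by the product of the two background probabilities, then checks that summing them over an ungapped alignment equals the log of the likelihood ratio of the match model to the random model.

```python
import math


def log_odds_matrix(p: dict, q: dict) -> dict:
    """Substitution scores: log of p[(a, b)] divided by q[a] * q[b]."""
    return {(a, b): math.log(p_ab / (q[a] * q[b])) for (a, b), p_ab in p.items()}


def log_odds_score(x: str, y: str, s: dict) -> float:
    """Sum the substitution scores over an ungapped alignment of x and y."""
    if len(x) != len(y):
        raise ValueError("an ungapped alignment needs equal-length sequences")
    return sum(s[(a, b)] for a, b in zip(x, y))


# Toy numbers, invented for this example only.
q = {"A": 0.5, "C": 0.5}                        # random model R
p = {("A", "A"): 0.4, ("C", "C"): 0.4,          # match model M
     ("A", "C"): 0.1, ("C", "A"): 0.1}

s = log_odds_matrix(p, q)
x, y = "AAC", "ACC"

# Direct likelihoods under each model, for comparison with the summed scores.
prob_match = math.prod(p[(a, b)] for a, b in zip(x, y))
prob_random = math.prod(q[a] * q[b] for a, b in zip(x, y))

print(log_odds_score(x, y, s))
print(math.log(prob_match / prob_random))
```

In this toy matrix identical pairs score above zero and mismatched pairs score below zero, matching the earlier description of identities as positive terms and non-conservative substitutions as negative terms.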