Multiple Sequence Alignment (I)


Multiple Sequence Alignment (I) (Lecture for CS498-CXZ Algorithms in Bioinformatics) Oct. 4, 2005 ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign

Outline: Motivation; Scoring of multiple sequence alignments; Algorithms: dynamic programming, progressive alignment (next class).

Why Multiple Alignments? Characterize protein families by identifying shared regions of homology in a multiple sequence alignment; determine the consensus sequence of several aligned sequences; help predict the secondary and tertiary structures of new sequences; help predict the function of new sequences; serve as a preliminary step in molecular evolution analysis using phylogenetic trees.

Example of Multiple Alignment. [Figure: multiple sequence alignment of 7 neuroglobins using ClustalX.] The selected region is highly conserved with a generic globin. (Slide from Craig A. Struble)

4 Basic Questions in Multiple Alignment. Model: given the sequences X1 = x11,…,x1m1; X2 = x21,…,x2m2; …; XN = xN1,…,xNmN, let A = {a1,…,ak} be the set of possible alignments of all Xi's and let s: A → ℝ be a scoring function. The goal is to find the best alignment(s) a*, for example one with s(a*) = 21.
Q1: How should we define s?
Q2: How should we define A?
Q3: How can we find a* quickly?
Q4: Is the alignment biologically meaningful?

Defining Multi-Sequence Alignment. We may generalize our definition of pairwise sequence alignment. An alignment of 2 sequences is represented as a 2-row matrix. In a similar way, we represent an alignment of 3 sequences as a 3-row matrix:
A T _ G C G _
A _ C G T _ A
A T C A C _ A
A column must have at least one nucleotide. Question: How many possible global alignments are there for 3 sequences each of length 2?
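The counting question on this slide can be explored with a short sketch. It assumes that two alignments are different whenever their sequences of columns differ, and that every column consumes a symbol from at least one sequence; under those assumptions, counting alignments is the same as counting lattice paths from the origin to (m1, ..., mN). The function name `count_alignments` is hypothetical.

```python
from functools import lru_cache
from itertools import product


def count_alignments(lengths):
    """Count global alignments of sequences with the given lengths.

    Each column is a 0/1 vector saying which sequences contribute a symbol;
    the all-zero vector (a column of only gaps) is not allowed.
    """
    moves = [d for d in product((0, 1), repeat=len(lengths)) if any(d)]

    @lru_cache(maxsize=None)
    def paths(pos):
        # Exactly one way to be at the origin: the empty alignment.
        if all(p == 0 for p in pos):
            return 1
        total = 0
        for d in moves:
            prev = tuple(p - step for p, step in zip(pos, d))
            if min(prev) >= 0:
                total += paths(prev)
        return total

    return paths(tuple(lengths))


if __name__ == "__main__":
    print(count_alignments((2, 2)))     # two sequences of length 2
    print(count_alignments((2, 2, 2)))  # three sequences of length 2
```

Under these assumptions, the count for three sequences of length 2 is 409, compared with 13 for two sequences of length 2.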

How do we score a multiple alignment?

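Scoring a Multiple Alignment. Ideally, the score should be based on evolutionary models. In practice, we often assume columns are independent, so the score of an alignment $m$ decomposes as $S(m) = G + \sum_i S(m_i)$, where $m_i$ is column $i$ and $G$ is the gap score. Column scores commonly use "Sum of Pairs" (SP scores): $S(m_i) = \sum_{k<l} s(m_i^k, m_i^l)$, where $m_i^k$ is the symbol of sequence $k$ in column $i$ and $s$ is a pairwise substitution score.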
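A minimal sketch of sum-of-pairs scoring under the column-independence assumption. The match, mismatch, and gap values are illustrative assumptions, not values from the lecture, and gap costs are folded into the pairwise score here instead of being kept as a separate gap term G. The function names are hypothetical.

```python
from itertools import combinations

GAP = "_"


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one pair of symbols; a gap paired with a gap costs nothing."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch


def sp_column_score(column):
    """Sum the pairwise scores over all unordered pairs in one column."""
    return sum(pair_score(a, b) for a, b in combinations(column, 2))


def sp_alignment_score(rows):
    """Sum the column scores; rows are equal-length aligned strings."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    return sum(sp_column_score(col) for col in zip(*rows))


if __name__ == "__main__":
    print(sp_alignment_score(["AT_", "A_C", "ATC"]))
```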

Minimum Entropy Scoring. Intuition: a perfectly aligned column has a single symbol (least uncertainty); a poorly aligned column has many distinct symbols (high uncertainty). With $c_{ia}$ the count of symbol $a$ in column $i$ and $p_{ia} = c_{ia} / \sum_{a'} c_{ia'}$ its relative frequency, the entropy of column $i$ is $-\sum_a p_{ia} \log p_{ia}$. This is related to the HMM formulation of the alignment problem, which we will cover later.

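Entropy: Example. Best case: every sequence has the same symbol in the column, so the column entropy is 0. Worst case: the four nucleotides are equally frequent in the column, so the column entropy is 2 (using base-2 logarithms).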

Entropy of an Alignment: Example (logarithms are base 2, and 0*log0 is taken as 0)
Column entropy: -(pA*log pA + pC*log pC + pG*log pG + pT*log pT), where pA, pC, pG, pT are the frequencies of A, C, G, T in the column
Column 1 = -[1*log(1) + 0*log0 + 0*log0 + 0*log0] = 0
Column 2 = -[(1/4)*log(1/4) + (3/4)*log(3/4) + 0*log0 + 0*log0] = -[(1/4)*(-2) + (3/4)*(-0.415)] = +0.811
Column 3 = -[(1/4)*log(1/4) + (1/4)*log(1/4) + (1/4)*log(1/4) + (1/4)*log(1/4)] = 4 * -[(1/4)*(-2)] = +2
Alignment Entropy = 0 + 0.811 + 2 = +2.811
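The arithmetic in this example can be checked with a short sketch. The four rows below are a hypothetical alignment chosen only so that its columns have the frequencies used on the slide (all A; one A and three C; one each of A, C, G, T). Gap handling is not addressed here.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (base 2) of the symbol frequencies in one column."""
    counts = Counter(column)
    total = len(column)
    # Symbols that do not occur contribute nothing (0 * log 0 is taken as 0).
    return sum(-(c / total) * log2(c / total) for c in counts.values())


def alignment_entropy(rows):
    """Sum of column entropies, assuming columns are independent."""
    return sum(column_entropy(col) for col in zip(*rows))


if __name__ == "__main__":
    rows = ["AAA", "ACC", "ACG", "ACT"]  # hypothetical rows matching the slide's columns
    for i, col in enumerate(zip(*rows), start=1):
        print(f"Column {i}: {column_entropy(col):.3f}")
    print(f"Alignment entropy: {alignment_entropy(rows):.3f}")
```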

How can we find a multiple alignment quickly? Can we generalize the dynamic programming algorithm used for pairwise alignment?

Alignments = Paths in… Align 3 sequences: ATGC, AATC, ATGC.

Alignment Paths. Each row of the alignment is annotated with a coordinate that counts how many symbols of that sequence have been used so far:
x coordinate: 0 1 1 2 3 4
                A -- T G C
y coordinate: 0 1 2 3 3 4
                A A T -- C
z coordinate: 0 0 1 2 3 4
                -- A T G C
Resulting path in (x,y,z) space: (0,0,0) → (1,1,0) → (1,2,1) → (2,3,2) → (3,3,3) → (4,4,4)
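The correspondence between an alignment and its path can be sketched in a few lines. The rows below encode the alignment of ATGC, AATC, ATGC with "-" standing for a gap, taking the second row as A A T - C, which is what the listed path implies. The function name `alignment_to_path` is hypothetical.

```python
def alignment_to_path(rows, gap="-"):
    """Convert an alignment (equal-length rows) into its lattice path.

    The path starts at the origin; each column advances the coordinate of
    every sequence that contributes a symbol (not a gap) to that column.
    """
    pos = [0] * len(rows)
    path = [tuple(pos)]
    for column in zip(*rows):
        for r, symbol in enumerate(column):
            if symbol != gap:
                pos[r] += 1
        path.append(tuple(pos))
    return path


if __name__ == "__main__":
    rows = ["A-TGC", "AAT-C", "-ATGC"]
    print(alignment_to_path(rows))
```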

2-D vs 3-D Alignment Grid. [Figure: the 2-D edit graph for two sequences V and W, compared with its 3-D counterpart for three sequences.]

Architecture of 3-D Alignment Grid. In 2-D, there are 3 edges in each unit square; in 3-D, there are 7 edges in each unit cube.

A Cell of 3-D Alignment Grid. The cell ending at (i,j,k) has the vertices (i-1,j-1,k-1), (i-1,j-1,k), (i-1,j,k-1), (i,j-1,k-1), (i-1,j,k), (i,j-1,k), (i,j,k-1), and (i,j,k).

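Multiple Alignment: Dynamic Programming
$$
s_{i,j,k} = \max
\begin{cases}
s_{i-1,j-1,k-1} + \delta(v_i, w_j, u_k) & \text{cube diagonal: no indels} \\
s_{i-1,j-1,k} + \delta(v_i, w_j, \_) & \text{face diagonal: one indel} \\
s_{i-1,j,k-1} + \delta(v_i, \_, u_k) & \text{face diagonal: one indel} \\
s_{i,j-1,k-1} + \delta(\_, w_j, u_k) & \text{face diagonal: one indel} \\
s_{i-1,j,k} + \delta(v_i, \_, \_) & \text{edge diagonal: two indels} \\
s_{i,j-1,k} + \delta(\_, w_j, \_) & \text{edge diagonal: two indels} \\
s_{i,j,k-1} + \delta(\_, \_, u_k) & \text{edge diagonal: two indels}
\end{cases}
$$
$\delta(x, y, z)$ is an entry in the 3-D scoring matrix and can be computed using sum of pairs or entropy.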
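The seven-way recurrence can be sketched directly for three sequences. This is an illustration under stated assumptions, not the lecture's implementation: the column score is a sum-of-pairs score with illustrative match, mismatch, and gap values, and only the optimal score is returned (no traceback). The names `delta` and `align3_score` are hypothetical.

```python
from itertools import product

GAP = "_"


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Illustrative pairwise score; a gap paired with a gap costs nothing."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch


def delta(x, y, z):
    """Sum-of-pairs score of one column of three symbols."""
    return pair_score(x, y) + pair_score(x, z) + pair_score(y, z)


def align3_score(v, w, u):
    """Optimal global alignment score of three sequences via 3-D DP."""
    n, m, p = len(v), len(w), len(u)
    neg_inf = float("-inf")
    s = [[[neg_inf] * (p + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    s[0][0][0] = 0
    # The 7 incoming edges of a cell: every nonzero 0/1 step vector.
    moves = [d for d in product((0, 1), repeat=3) if any(d)]
    for i in range(n + 1):
        for j in range(m + 1):
            for k in range(p + 1):
                for di, dj, dk in moves:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if pi < 0 or pj < 0 or pk < 0:
                        continue  # predecessor lies outside the grid
                    x = v[i - 1] if di else GAP
                    y = w[j - 1] if dj else GAP
                    z = u[k - 1] if dk else GAP
                    cand = s[pi][pj][pk] + delta(x, y, z)
                    if cand > s[i][j][k]:
                        s[i][j][k] = cand
    return s[n][m][p]


if __name__ == "__main__":
    print(align3_score("ATGC", "AATC", "ATGC"))
```

Enumerating the step vectors with `product` keeps the code close to the recurrence: the all-ones vector is the cube diagonal, vectors with two ones are the face diagonals, and vectors with a single one are the remaining two-indel moves.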

Multiple Alignment: Running Time. For 3 sequences of length n, the run time is 7n^3, i.e. O(n^3). For k sequences, building a k-dimensional edit graph has run time (2^k - 1)(n^k), i.e. O(2^k n^k). Conclusion: the dynamic programming approach for aligning two sequences extends easily to k sequences, but it is impractical because its running time is exponential in k.
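To get a feel for how quickly this grows, the following sketch evaluates the (2^k - 1) * n^k count for a few values of k. The sequence length n = 100 is an arbitrary illustrative choice, and the function name `dp_edge_count` is hypothetical.

```python
def dp_edge_count(n, k):
    """Number of edge evaluations for k sequences of length n: (2^k - 1) * n^k."""
    return (2**k - 1) * n**k


if __name__ == "__main__":
    n = 100  # illustrative sequence length
    for k in (2, 3, 5, 10):
        print(f"k={k:2d}: {dp_edge_count(n, k):.3e} edge evaluations")
```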

In the next class, we will cover more efficient algorithms: progressive alignment.

What You Should Know: how to score a multi-sequence alignment; how the dynamic programming algorithm works; the computational complexity of dynamic programming algorithms.