Multiple Sequence Alignments. Lecture 12, Tuesday May 13, 2003 Reading Durbin’s book: Chapter 6.1-6.4 Gusfield’s book: Chapter 14.1, 14.2, 14.5, 14.6.1.

Slides:



Advertisements
Similar presentations
Parallel BioInformatics Sathish Vadhiyar. Parallel Bioinformatics  Many large scale applications in bioinformatics – sequence search, alignment, construction.
Advertisements

Algorithms for Alignment of Genomic Sequences Michael Brudno Department of Computer Science Stanford University PGA Workshop 07/16/2004.
Multiple Sequence Alignment
Computing a tree Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
CS262 Lecture 9, Win07, Batzoglou History of WGA 1982: -virus, 48,502 bp 1995: h-influenzae, 1 Mbp 2000: fly, 100 Mbp 2001 – present  human (3Gbp), mouse.
BNFO 602 Multiple sequence alignment Usman Roshan.
Sequence Similarity. The Viterbi algorithm for alignment Compute the following matrices (DP)  M(i, j):most likely alignment of x 1 …x i with y 1 …y j.
Genomic Sequence Alignment. Overview Dynamic programming & the Needleman-Wunsch algorithm Local alignment—BLAST Fast global alignment Multiple sequence.
Welcome to CS262!. Goals of this course Introduction to Computational Biology  Basic biology for computer scientists  Breadth: mention many topics &
1 “INTRODUCTION TO BIOINFORMATICS” “SPRING 2005” “Dr. N AYDIN” Lecture 4 Multiple Sequence Alignment Doç. Dr. Nizamettin AYDIN
Computational Genomics Lecture 1, Tuesday April 1, 2003.
Methods to CHAIN Local Alignments Sparse Dynamic Programming O(N log N)
. Class 5: Multiple Sequence Alignment. Multiple sequence alignment VTISCTGSSSNIGAG-NHVKWYQQLPG VTISCTGTSSNIGS--ITVNWYQQLPG LRLSCSSSGFIFSS--YAMYWVRQAPG.
CS262 Lecture 9, Win07, Batzoglou Fragment Assembly Given N reads… Where N ~ 30 million… We need to use a linear-time algorithm.
Evolution at the DNA level …ACGGTGCAGTTACCA… …AC----CAGTCCACCA… Mutation SEQUENCE EDITS REARRANGEMENTS Deletion Inversion Translocation Duplication.
Lecture 8: Multiple Sequence Alignment
CS273a Lecture 8, Win07, Batzoglou Evolution at the DNA level …ACGGTGCAGTTACCA… …AC----CAGTCCACCA… Mutation SEQUENCE EDITS REARRANGEMENTS Deletion Inversion.
Some new sequencing technologies. Molecular Inversion Probes.
CS262 Lecture 14, Win07, Batzoglou Multiple Sequence Alignments.
Multiple Sequence Alignment Algorithms in Computational Biology Spring 2006 Most of the slides were created by Dan Geiger and Ydo Wexler and edited by.
Sequence Alignment. CS262 Lecture 2, Win06, Batzoglou Complete DNA Sequences More than 300 complete genomes have been sequenced.
Rapid Global Alignments How to align genomic sequences in (more or less) linear time.
Bioinformatics and Phylogenetic Analysis
Large-Scale Global Alignments Multiple Alignments Lecture 10, Thursday May 1, 2003.
CS262 Lecture 9, Win07, Batzoglou Multiple Sequence Alignments.
Large-Scale Global Alignments Multiple Alignments Lecture 10, Thursday May 1, 2003.
CS262 Lecture 9, Win07, Batzoglou Phylogeny Tree Reconstruction
CS273a Lecture 10, Aut 08, Batzoglou Multiple Sequence Alignment.
Alignments and Comparative Genomics. Welcome to CS374! Today: Serafim: Alignments and Comparative Genomics Omkar: Administrivia.
Aligning Alignments Soni Mukherjee 11/11/04. Pairwise Alignment Given two sequences, find their optimal alignment Score = (#matches) * m - (#mismatches)
Rapid Global Alignments How to align genomic sequences in (more or less) linear time.
CS262 Lecture 14, Win06, Batzoglou Multiple Sequence Alignments.
CS273a Lecture 9/10, Aut 10, Batzoglou Multiple Sequence Alignment.
Multiple Sequence Alignments Algorithms. MLAGAN: progressive alignment of DNA Given N sequences, phylogenetic tree Align pairwise, in order of the tree.
[Bejerano Fall10/11] 1 HW1 Due This Fri 10/15 at noon. TA Q&A: What to ask, How to ask.
Sequence Alignment Slides courtesy of Serafim Batzoglou, Stanford Univ.
Multiple sequence alignment
CS262 Lecture 12, Win07, Batzoglou Some new sequencing technologies.
BNFO 602 Multiple sequence alignment Usman Roshan.
Putting Together Alignments & Comparing Assemblies Michael Brudno Department of Computer Science University of Toronto 6.095/ Computational Biology:
Computational Genomics Lecture 1, Tuesday April 1, 2003.
Short Primer on Comparative Genomics Today: Special guest lecture 12pm, Alway M108 Comparative genomics of animals and plants Adam Siepel Assistant Professor.
Multiple sequence alignment methods 1 Corné Hoogendoorn Denis Miretskiy.
CECS Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka Lecture 3: Multiple Sequence Alignment Eric C. Rouchka,
Phylogenetic Tree Construction and Related Problems Bioinformatics.
Chapter 5 Multiple Sequence Alignment.
Multiple Sequence Alignment CSC391/691 Bioinformatics Spring 2004 Fetrow/Burg/Miller (Slides by J. Burg)
Multiple Sequence Alignments
Sequence Alignment and Phylogenetic Prediction using Map Reduce Programming Model in Hadoop DFS Presented by C. Geetha Jini (07MW03) D. Komagal Meenakshi.
Multiple Sequence Alignments and Phylogeny.  Within a protein sequence, some regions will be more conserved than others. As more conserved,
Multiple Sequence Alignment. Definition Given N sequences x 1, x 2,…, x N :  Insert gaps (-) in each sequence x i, such that All sequences have the.
1 Generalized Tree Alignment: The Deferred Path Heuristic Stinus Lindgreen
Lecture 6. Pairwise Local Alignment and Database Search Csc 487/687 Computing for bioinformatics.
Using Traveling Salesman Problem Algorithms to Determine Multiple Sequence Alignment Orders Weiwei Zhong.
Bioinformatics Multiple Alignment. Overview Introduction Multiple Alignments Global multiple alignment –Introduction –Scoring –Algorithms.
Multiple sequence alignment
Bioinformatic Tools for Comparative Genomics of Vectors Comparative Genomics.
COT 6930 HPC and Bioinformatics Multiple Sequence Alignment Xingquan Zhu Dept. of Computer Science and Engineering.
Biocomputation: Comparative Genomics Tanya Talkar Lolly Kruse Colleen O’Rourke.
Multiple Sequence Alignment
1 MAVID: Constrained Ancestral Alignment of Multiple Sequence Author: Nicholas Bray and Lior Pachter.
Multiple Sequence Alignment
Multiple Sequence Alignment (cont.) (Lecture for CS397-CXZ Algorithms in Bioinformatics) Feb. 13, 2004 ChengXiang Zhai Department of Computer Science University.
CS 5263 Bioinformatics Lecture 7: Heuristic Sequence Alignment Tools (BLAST) Multiple Sequence Alignment.
Multiple Sequence Alignments. The Global Alignment problem AGTGCCCTGGAACCCTGACGGTGGGTCACAAAACTTCTGGA AGTGACCTGGGAAGACCCTGACCCTGGGTCACAAAACTC x y z.
Finding Motifs Vasileios Hatzivassiloglou University of Texas at Dallas.
Multiple sequence alignment (msa)
Alignment of Long Sequences
Multiple Sequence Alignment
Computational Genomics Lecture #3a
Presentation transcript:

Multiple Sequence Alignments

Lecture 12, Tuesday May 13, 2003 Reading Durbin’s book: Chapter Gusfield’s book: Chapter 14.1, 14.2, 14.5, Papers: avid, lagan Optional: All of Gusfield chapter 14, Papers: tcoffee, slagan, scl

Lecture 12, Tuesday May 13, 2003 Definition Given N sequences x 1, x 2,…, x N : –Insert gaps (-) in each sequence x i, such that All sequences have the same length L Score of the global map is maximum The sum-of-pairs score of an alignment is the sum of the scores of all induced pairwise alignments S(m) =  k<l s(m k, m l ) s(m k, m l ):score of induced alignment (k,l)

Lecture 12, Tuesday May 13, 2003 Consensus -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC--- TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC CAG-CTATCAC--GACCGC----TCGATTTGCTCGAC CAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC Find optimal consensus string m * to maximize S(m) =  i s(m *, m i ) s(m k, m l ):score of pairwise alignment (k,l)

Multiple Sequence Alignments Algorithms

Lecture 12, Tuesday May 13, 2003 Example: in 3D (three sequences): 7 neighbors/cell F(i,j,k) = max{ F(i-1,j-1,k-1)+S(x i, x j, x k ), F(i-1,j-1,k )+S(x i, x j, - ), F(i-1,j,k-1)+S(x i, -, x k ), F(i-1,j,k )+S(x i, -, - ), F(i,j-1,k-1)+S( -, x j, x k ), F(i,j-1,k )+S( -, x j, x k ), F(i,j,k-1)+S( -, -, x k ) } Multidimensional Dynamic Programming

Lecture 12, Tuesday May 13, 2003 Progressive Alignment Multiple Alignment is NP-complete Most used heuristic: Progressive Alignment Algorithm: 1.Align two of the sequences x i, x j 2.Fix that alignment 3.Align a third sequence x k to the alignment x i,x j 4.Repeat until all sequences are aligned Running Time: O( N L 2 )

Lecture 12, Tuesday May 13, 2003 Progressive Alignment When evolutionary tree is known: –Align closest first, in the order of the tree Example: Order of alignments: 1. (x,y) 2. (z,w) 3. (xy, zw) x w y z

Lecture 12, Tuesday May 13, 2003 Progressive Alignment: CLUSTALW CLUSTALW: most popular multiple protein alignment Algorithm: 1.Find all d ij : alignment dist (x i, x j ) 2.Construct a tree (Neighbor-joining hierarchical clustering) 3.Align nodes in order of decreasing similarity + a large number of heuristics

Lecture 12, Tuesday May 13, 2003 CLUSTALW & the CINEMA viewer

Lecture 12, Tuesday May 13, 2003 Iterative Refinement One problem of progressive alignment: Initial alignments are “frozen” even when new evidence comes Example: x:GAAGTT y:GAC-TT z:GAACTG w:GTACTG Frozen! Now clear correct y = GA-CTT

Lecture 12, Tuesday May 13, 2003 Iterative Refinement Algorithm (Barton-Stenberg): 1.Align most similar x i, x j 2.Align x k most similar to (x i x j ) 3.Repeat 2 until (x 1 …x N ) are aligned 4.For j = 1 to N, Remove x j, and realign to x 1 …x j-1 x j+1 …x N 5.Repeat 4 until convergence Note: Guaranteed to converge

Lecture 12, Tuesday May 13, Iterative Refinement (cont’d) For each sequence y 1.Remove y 2.Realign y (while rest fixed) x y z x,z fixed projection allow y to vary

Lecture 12, Tuesday May 13, 2003 Iterative Refinement Example: align (x,y), (z,w), (xy, zw): x:GAAGTTA y:GAC-TTA z:GAACTGA w:GTACTGA After realigning y: x:GAAGTTA y:G-ACTTA + 3 matches z:GAACTGA w:GTACTGA

Lecture 12, Tuesday May 13, 2003 Iterative Refinement Example not handled well: x:GAAGTTA y 1 :GAC-TTA y 2 :GAC-TTA y 3 :GAC-TTA z:GAACTGA w:GTACTGA Realigning any single y i changes nothing

Lecture 12, Tuesday May 13, 2003 Restricted MDP Here is a final way to improve a multiple alignment: 1.Construct progressive multiple alignment m 2.Run MDP, restricted to radius R from m Running Time: O(2 N R N-1 L)

Lecture 12, Tuesday May 13, Restricted MDP Run MDP, restricted to radius R from m x y z Running Time: O(2 N R N-1 L)

Lecture 12, Tuesday May 13, 2003 Restricted MDP (2) x:GAAGTTA y 1 :GAC-TTA y 2 :GAC-TTA y 3 :GAC-TTA z:GAACTGA w:GTACTGA Within radius 1 of the optimal  Restricted MDP will fix it.

MLAGAN: Multiple Alignment 1.Multi-Anchoring 2.Progressive Alignment 3.Iterative Refinement

Lecture 12, Tuesday May 13, Multi-anchoring X Z Y Z X/Y Z To anchor the (X/Y), and (Z) alignments:

Lecture 12, Tuesday May 13, Progressive Alignment Given N sequences, phylogenetic tree Align pairwise, in order of the tree (LAGAN) Human Baboon Mouse Rat

Lecture 12, Tuesday May 13, Iterative Refinement For each sequence y 1.Remove y 2.Anchor “good” spots 3.Realign y using LAGAN x y z x,z fixed projection

Lecture 12, Tuesday May 13, 2003 Cystic Fibrosis (CFTR), 12 species The “zoo” project Human sequence length: 1.8 Mb Total genomic sequence: 13 Mb Human Baboon Cat Dog Cow Pig Mouse Rat Chimp Chicken Fugu Zebrafish

Lecture 12, Tuesday May 13, 2003 Performance in the CFTR region Exons Perfect Exons > 90% TIME (sec) MAX MEMORY (Mb) MUMmer Mammals 25%40%4540 Chicken & Fishes 0% 740 AVID Mammals 95%98% Chicken & Fishes 23%27% LAGAN Mammals 98%99.7%55090 Chicken & Fishes 80%84%86290 MLAGAN Mammals 98%99.8% Chicken & Fishes 82%91%

Alignment & Rearrangements

Lecture 12, Tuesday May 13, 2003 Evolution at the DNA level …ACGGTGCAGTTACCA… …AC----CAGTCCACCA… Mutation SEQUENCE EDITS REARRANGEMENTS Deletion Inversion Translocation Duplication

Lecture 12, Tuesday May 13, 2003 Local & Global Alignment AGTGCCCTGGAACCCTGACGGTGGGTCACAAAACTTCTGGA AGTGACCTGGGAAGACCCTGAACCCTGGGTCACAAAACTC AGTGCCCTGGAACCCTGACGGTGGGTCACAAAACTTCTGGA AGTGACCTGGGAAGACCCTGAACCCTGGGTCACAAAACTC Local Global

Lecture 12, Tuesday May 13, 2003 Glocal Alignment Problem Find least cost transformation of one sequence into another using new operations Sequence edits Inversions Translocations Duplications Combinations of above AGTGCCCTGGAACCCTGACGGTGGGTCACAAAACTTCTGGA AGTGACCTGGGAAGACCCTGAACCCTGGGTCACAAAACTC

Lecture 12, Tuesday May 13, 2003 Shuffle-LAGAN A glocal aligner for long DNA sequences

Lecture 12, Tuesday May 13, 2003 S-LAGAN: Find Local Alignments 1.Find Local Alignments 2.Build Rough Homology Map 3.Globally Align Consistent Parts

Lecture 12, Tuesday May 13, 2003 S-LAGAN: Build Homology Map 1.Find Local Alignments 2.Build Rough Homology Map 3.Globally Align Consistent Parts

Lecture 12, Tuesday May 13, 2003 Building the Homology Map d a b c Chain (using Eppstein Galil); each alignment gets a score which is MAX over 4 possible chains. Penalties are affine (event and distance components) Penalties: a)regular b)translocation c)inversion d)inverted translocation

Lecture 12, Tuesday May 13, 2003 S-LAGAN: Build Homology Map 1.Find Local Alignments 2.Build Rough Homology Map 3.Globally Align Consistent Parts

Lecture 12, Tuesday May 13, 2003 S-LAGAN: Global Alignment 1.Find Local Alignments 2.Build Rough Homology Map 3.Globally Align Consistent Fragments

Lecture 12, Tuesday May 13, 2003 S-LAGAN alignments LocalLocal GlocalGlocal

Lecture 12, Tuesday May 13, 2003 S-LAGAN alignments Hum/MusHum/Mus Hum/RatHum/Rat

Lecture 12, Tuesday May 13, 2003 S-LAGAN alignments (Chr 20) Human Chr 20 v. homologous Mouse Chr Segments of conserved synteny 70 Inversions

Lecture 12, Tuesday May 13, 2003 Some more examples Hum/Mus Hum/Rat

Lecture 12, Tuesday May 13, 2003 Some more examples Hum/Mus Hum/Rat

Lecture 12, Tuesday May 13, 2003 Some more examples Hum/Mus Hum/Rat

Lecture 12, Tuesday May 13, 2003 Some more examples Hum/Mus Hum/Rat

Lecture 12, Tuesday May 13, 2003 Some more examples Hum/MusHum/Rat

Lecture 12, Tuesday May 13, 2003 Some more examples Hum/Mus Hum/Rat