Sequence Alignment Algorithms Morten Nielsen BioSys, DTU

Slides:



Advertisements
Similar presentations
Global Sequence Alignment by Dynamic Programming.
Advertisements

The fast way Morten Nielsen BioSys, DTU. The fast algorithm (O2) Database (m) Query (n) Open a gapExtending a gap P Q Affine gap penalties.
Alignment methods Introduction to global and local sequence alignment methods Global : Needleman-Wunch Local : Smith-Waterman Database Search BLAST FASTA.
Definitions Optimal alignment - one that exhibits the most correspondences. It is the alignment with the highest score. May or may not be biologically.
C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E Alignments 1 Sequence Analysis.
 If Score(i, j) denotes best score to aligning A[1 : i] and B[1 : j] Score(i-1, j) + galign A[i] with GAP Score(i, j-1) + galign B[j] with GAP Score(i,
S. Maarschalkerweerd & A. Tjhang1 Probability Theory and Basic Alignment of String Sequences Chapter
©CMBI 2005 Sequence Alignment In phylogeny one wants to line up residues that came from a common ancestor. For information transfer one wants to line up.
Heuristic alignment algorithms and cost matrices
1-month Practical Course Genome Analysis (Integrative Bioinformatics & Genomics) Lecture 3: Pair-wise alignment Centre for Integrative Bioinformatics VU.
C T C G T A GTCTGTCT Find the Best Alignment For These Two Sequences Score: Match = 1 Mismatch = 0 Gap = -1.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez June 23, 2004.
Alignment methods June 26, 2007 Learning objectives- Understand how Global alignment program works. Understand how Local alignment program works.
Pairwise Alignment Global & local alignment Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez May 20, 2003.
Sequence Alignment III CIS 667 February 10, 2004.
BNFO 602 Multiple sequence alignment Usman Roshan.
Developing Sequence Alignment Algorithms in C++ Dr. Nancy Warter-Perez May 21, 2002.
Sequences are related Darwin: all organisms are related through descent with modification Related molecules have similar functions in different organisms.
Dynamic Programming. Pairwise Alignment Needleman - Wunsch Global Alignment Smith - Waterman Local Alignment.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez May 10, 2005.
Blast heuristics Morten Nielsen Department of Systems Biology, DTU.
Pairwise alignment Computational Genomics and Proteomics.
Alignment methods II April 24, 2007 Learning objectives- 1) Understand how Global alignment program works using the longest common subsequence method.
Sequence comparison: Score matrices Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas
Sequence comparison: Local alignment
TM Biological Sequence Comparison / Database Homology Searching Aoife McLysaght Summer Intern, Compaq Computer Corporation Ballybrit Business Park, Galway,
Sequence Alignment.
Bioiformatics I Fall Dynamic programming algorithm: pairwise comparisons.
Traceback and local alignment Prof. William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University of Washington.
BIOMETRICS Module Code: CA641 Week 11- Pairwise Sequence Alignment.
Sequence Alignment Algorithms Morten Nielsen Department of systems biology, DTU.
Pairwise alignments Introduction Introduction Why do alignments? Why do alignments? Definitions Definitions Scoring alignments Scoring alignments Alignment.
Protein Sequence Alignment and Database Searching.
Pairwise Sequence Alignment. The most important class of bioinformatics tools – pairwise alignment of DNA and protein seqs. alignment 1alignment 2 Seq.
Pairwise Sequence Alignment BMI/CS 776 Mark Craven January 2002.
Construction of Substitution Matrices
Multiple alignment: Feng- Doolittle algorithm. Why multiple alignments? Alignment of more than two sequences Usually gives better information about conserved.
Function preserves sequences Christophe Roos - MediCel ltd Similarity is a tool in understanding the information in a sequence.
Arun Goja MITCON BIOPHARMA
We want to calculate the score for the yellow box. The final score that we fill in the yellow box will be the SUM of two other scores, we’ll call them.
A Table-Driven, Full-Sensitivity Similarity Search Algorithm Gene Myers and Richard Durbin Presented by Wang, Jia-Nan and Huang, Yu- Feng.
Pairwise Sequence Alignment Part 2. Outline Summary Local and Global alignments FASTA and BLAST algorithms Evaluating significance of alignments Alignment.
Space Efficient Alignment Algorithms and Affine Gap Penalties Dr. Nancy Warter-Perez.
Pairwise sequence alignment Lecture 02. Overview  Sequence comparison lies at the heart of bioinformatics analysis.  It is the first step towards structural.
Construction of Substitution matrices
Dynamic programming with more complex models When gaps do occur, they are often longer than one residue.(biology) We can still use all the dynamic programming.
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
Heuristic Alignment Algorithms Hongchao Li Jan
9/6/07BCB 444/544 F07 ISU Dobbs - Lab 3 - BLAST1 BCB 444/544 Lab 3 BLAST Scoring Matrices & Alignment Statistics Sept6.
Outline Basic Local Alignment Search Tool
Learning to Align: a Statistical Approach
INTRODUCTION TO BIOINFORMATICS
Center for Biological Sequence Analysis
Pairwise Alignment and Database Searching
Sequence comparison: Dynamic programming
Sequence comparison: Local alignment
BNFO 602 Lecture 2 Usman Roshan.
Global, local, repeated and overlaping
BNFO 136 Sequence alignment
Pairwise sequence Alignment.
Pairwise Sequence Alignment
Lecture 14 Algorithm Analysis
Find the Best Alignment For These Two Sequences
Pairwise Alignment Global & local alignment
Dynamic Programming Finds the Best Score and the Corresponding Alignment O Alignment: Start in lower right corner and work backwards:
Sequence alignment with Needleman-Wunsch
A T C.
Basic Local Alignment Search Tool (BLAST)
Presentation transcript:

Sequence Alignment Algorithms Morten Nielsen BioSys, DTU

Why learn about alignment algorithms? All publicly available alignment programs do the same thing Sequence alignment using amino acids substitution matrices and affine gap penalties This is fast but not optimal Protein alignment is done much more accurate using sequence profiles and position specific gap penalties (price for gaps depends on the structure) Must implement your own alignment algorithm to do this

What you have been told is not true -) Outline What you have been told is not true -) Alignment algorithms are more complex The true sequence alignment algorithm story The slow algorithm (O3) The fast algorithm (O2)

Sequence alignment The old Story

Pairwise alignment: the solution ”Dynamic programming” (the Needleman-Wunsch algorithm) Match score = 2 Gap penalty = -2

Alignment depicted as path in matrix T C G C A T C A Match score = 2 Gap penalty = -2 TCGCA TC-CA Score=2 T C G C A T C A TCGCA T-CCA Score=0

Alignment depicted as path in matrix T C G C A T C A Meaning of point in matrix: all residues up to this point have been aligned (but there are many different possible paths). x Position labeled “x”: TC aligned with TC --TC -TC TC TC-- T-C TC

Dynamic programming: computation of scores T C G C A T C A Any given point in matrix can only be reached from three possible positions (you cannot “align backwards”). => Best scoring alignment ending in any given point in the matrix can be found by choosing the highest scoring of the three possibilities. x

Dynamic programming: computation of scores T C G C A T C A Any given point in matrix can only be reached from three possible positions (you cannot “align backwards”). => Best scoring alignment ending in any given point in the matrix can be found by choosing the highest scoring of the three possibilities. x score(x,y-1) - gap-penalty score(x,y) = max

Dynamic programming: computation of scores T C G C A T C A Any given point in matrix can only be reached from three possible positions (you cannot “align backwards”). => Best scoring alignment ending in any given point in the matrix can be found by choosing the highest scoring of the three possibilities. x score(x,y-1) - gap-penalty score(x-1,y-1) + substitution-score(x,y) score(x,y) = max

Dynamic programming: computation of scores T C G C A T C A Any given point in matrix can only be reached from three possible positions (you cannot “align backwards”). => Best scoring alignment ending in any given point in the matrix can be found by choosing the highest scoring of the three possibilities. x score(x,y-1) - gap-penalty score(x-1,y-1) + substitution-score(x,y) score(x-1,y) - gap-penalty score(x,y) = max

Dynamic programming: computation of scores T C G C A T C A Any given point in matrix can only be reached from three possible positions (you cannot “align backwards”). => Best scoring alignment ending in any given point in the matrix can be found by choosing the highest scoring of the three possibilities. x Each new score is found by choosing the maximum of three possibilities. For each square in matrix: keep track of where best score came from. Fill in scores one row at a time, starting in upper left corner of matrix, ending in lower right corner. score(x,y-1) - gap-penalty score(x-1,y-1) + substitution-score(x,y) score(x-1,y) - gap-penalty score(x,y) = max

Dynamic programming: example A C G T A 1 -1 -1 -1 C -1 1 -1 -1 G -1 -1 1 -1 T -1 -1 -1 1 Gaps: -2

Dynamic programming: example A C G T A 1 -1 -1 -1 C -1 1 -1 -1 G -1 -1 1 -1 T -1 -1 -1 1 Gaps: -2

Dynamic programming: example A C G T A 1 -1 -1 -1 C -1 1 -1 -1 G -1 -1 1 -1 T -1 -1 -1 1 Gaps: -2

Dynamic programming: example A C G T A 1 -1 -1 -1 C -1 1 -1 -1 G -1 -1 1 -1 T -1 -1 -1 1 Gaps: -2

Dynamic programming: example T C G C A : : : : T C - C A 1+1-2+1+1 = 2

A now the truth

Dynamic programming: computation of scores T C G C A T C A Any given point in matrix can only be reached from three possible positions (you cannot “align backwards”). => Best scoring alignment ending in any given point in the matrix can be found by choosing the highest scoring of the three possibilities. x

Dynamic programming: example What about j-2 and a gap extension?

And now the true algorithm V L I P n Start from here m

How does it work (score matrix d)? d matrix Blosum 50 matrix V L I L P 5 1 4 1 -3 V 1 5 2 5 -4 L 1 5 2 5 -4 L -3 -4 -3 -4 10 P

How does it work? (The slow way O3) D matrix d matrix V L I P 9 3 17 10 4 15 5 Gap-open W1 = -5 Gap-extension U = -1 Start from here!

How does it work? (The slow way O3) D matrix d matrix V L I P 17 10 4 15 5 Gap-open W1 = -5 Gap-extension U = -1 Check all positions in (green) row and column to check score for gap extension. CPU intensive (O3)

How does it work? Fill out the D matrix V L I L P 20 14 3 V 11 17 10 4 L 9 10 15 5 L 2 4 5 10 P Gap-open W1 = -5 Gap-extension U = -1 Check all positions in (green) row and column to check score for gap extension. CPU intensive (O3)

And now the fast algorithm (O2) Database (m) Query (n) P Q Affine gap penalties Open a gap Extending a gap

And now the fast algorithm (O2) Database (m) Query (n) P Q Open a gap Extending a gap

And now the true algorithm (cont.) Database (m) Query (n) P Q

How does it work (D,Q, and P-matrices) V L I P 5 2 3 4 10 n V P L I L P V W1 = -5 U = -1 L -5 L 5 -5 P

How does it work (D,Q,P, and E-matrices) V L I P 5 2 3 4 10 n V L I P -5 5 P W1 = -5 U = -1

How does it work (D,Q, and P-matrices) V L I P 5 -5 Q V L I P 5 2 3 4 10 n W1 = -5 U = -1

How does it work (D,Q,P, and E-matrices) V L I P 5 -5 Q V L I P 5 2 3 4 10 n W1 = -5 U = -1

How does it work (D, Q, and P-matrices) V L I P 5 -5 Q V L I P 5 2 3 4 10 n V L I P -5 5 P

How does it work (D, Q, and P-matrices) V L I P 5 -5 Q V L I P 15 5 2 3 4 10 n V L I P -5 5 P

How does it work. Eij-matrix. (Keeping track on the path) V L I P 2 4 1 eij = 1 match eij = 2 gap-opening database eij = 3 gap-extension database eij = 4 gap-opening query eij = 5 gap-extension query

How does it work (D,Q,P, and E-matrices) V L I P 5 -5 Q V L I P 15 5 2 3 4 10 n V L I P -5 5 P V L I P 1 2 4 E

How does it work (D,Q,P, and E-matrices) 11 V L I P 10 12 9 3 8 4 5 2 -2 -1 -5 Q 20 V L I P 18 14 9 3 11 15 17 10 4 8 5 2 n 18 V L I P 14 9 3 11 12 5 -1 -5 8 10 2 4 P 1 V L I P 3 5 4 2 E

And the alignment m D E n VLILP VL-LP 20 V L I P 18 14 9 3 11 15 17 10 1 V L I P 3 5 4 2 E n VLILP VL-LP

And the alignment. Gap extensions D-matrix E-matrix V L I A L P V L I A L P E 19 13 17 10 V 9 3 1 1 1 1 3 3 V 9 14 12 13 10 4 L 1 1 1 1 1 3 L n 7 8 9 10 15 5 L 5 1 5 4 1 2 L 1 2 3 4 5 10 P 5 5 5 5 4 1 P

And the alignment m D E n 15 -5 -1 = 9? V L I A L P V L I A L P V 19 13 17 10 9 3 1 1 1 1 3 3 V 9 14 12 13 10 4 L 1 1 1 1 1 3 L n 7 8 9 10 15 5 L 5 1 5 4 1 2 L 1 2 3 4 5 10 P 5 5 5 5 4 1 P 15 -5 -1 = 9?

And the alignment m D E n 5 -5 -1 = 9? V L I A L P V L I A L P V 19 13 17 10 9 3 1 1 1 1 3 3 V 9 14 12 13 10 4 L 1 1 1 1 1 3 L n 7 8 9 10 15 5 L 5 1 5 4 1 2 L 1 2 3 4 5 10 P 5 5 5 5 4 1 P 5 -5 -1 = 9?

And the alignment m D E n 15 -5 -1 = 9? *** VLIALP VL--LP 5 -5 -1 = 9? 19 13 17 10 9 3 1 1 1 1 3 3 V 9 14 12 13 10 4 L 1 1 1 1 1 3 L n 7 8 9 10 15 5 L 5 1 5 4 1 2 L 1 2 3 4 5 10 P 5 5 5 5 4 1 P 15 -5 -1 = 9? *** VLIALP VL--LP 5 -5 -1 = 9?

And now you!

Alignment is more complicated than what you have been told. Summary Alignment is more complicated than what you have been told. Simple algorithmic tricks allow for alignment in O2 time More heuristics to improve speed Limit gap length Look for high scoring regions