Sequence Alignment Kun-Mao Chao (趙坤茂)


Sequence Alignment Kun-Mao Chao (趙坤茂) Department of Computer Science and Information Engineering National Taiwan University, Taiwan WWW: http://www.csie.ntu.edu.tw/~kmchao

Useful Websites
MIT Biology Hypertextbook: http://www.mit.edu:8001/afs/athena/course/other/esgbio/www/7001main.html
The International Society for Computational Biology: http://www.iscb.org/
National Center for Biotechnology Information (NCBI, NIH): http://www.ncbi.nlm.nih.gov/
European Bioinformatics Institute (EBI): http://www.ebi.ac.uk/
DNA Data Bank of Japan (DDBJ): http://www.ddbj.nig.ac.jp/

orz’s sequence evolution orz (kid) OTZ (adult) Orz (big head) Crz (motorcycle driver) on_ (soldier) or2 (bottom up) oΩ (back high) STO (the other way around) Oroz (me) the origin? their evolutionary relationships? their putative functional relationships?

What? The truth is more important than the facts. THETR UTHIS MOREI

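Dot Matrix
Sequence A: CTTAACT
Sequence B: CGGATCAT
[Figure: dot matrix with the letters of B (C G G A T C A T) along one axis and the letters of A along the other; a dot marks each pair of identical letters.]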
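A dot matrix of this kind can be sketched in a few lines of Python. The function name `dot_matrix` and the `*` / `.` symbols are illustrative choices, not taken from the slides.

```python
def dot_matrix(a, b):
    """Return one string per letter of a; '*' marks columns where a[i] == b[j]."""
    return ["".join("*" if x == y else "." for y in b) for x in a]


rows = dot_matrix("CTTAACT", "CGGATCAT")
print("  CGGATCAT")            # letters of B label the columns
for letter, row in zip("CTTAACT", rows):
    print(letter, row)         # letters of A label the rows
```

Runs of `*` along a diagonal correspond to stretches that the two sequences share.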

Pairwise Alignment
Sequence A: CTTAACT
Sequence B: CGGATCAT
An alignment of A and B:
C---TTAACT   (Sequence A)
CGGATCA--T   (Sequence B)
Each column is a match (C over C), a mismatch (T over C), an insertion gap (- over G), or a deletion gap (A over -).

Alignment Graph
Sequence A: CTTAACT
Sequence B: CGGATCAT
[Figure: grid graph with B across the top and A down the side; the alignment C---TTAACT / CGGATCA--T is drawn as a path through the grid.]

A simple scoring scheme
Match: +8 (w(x, y) = 8, if x = y)
Mismatch: -5 (w(x, y) = -5, if x ≠ y)
Each gap symbol: -3 (w(-, x) = w(x, -) = -3)

 C  -  -  -  T  T  A  A  C  T
 C  G  G  A  T  C  A  -  -  T
+8 -3 -3 -3 +8 -5 +8 -3 -3 +8 = +12 (alignment score)
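The column-by-column scoring above can be checked with a short Python sketch. The function name `alignment_score` and its keyword arguments are hypothetical; the default values are the slide's +8 / -5 / -3 scheme.

```python
def alignment_score(top, bottom, match=8, mismatch=-5, gap=-3):
    """Sum the column scores of an alignment given as two equal-length rows."""
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += gap          # a gap symbol in either row
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


print(alignment_score("C---TTAACT", "CGGATCA--T"))
```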

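An optimal alignment: the alignment of maximum score
Let A = a_1 a_2 … a_m and B = b_1 b_2 … b_n.
S_{i,j}: the score of an optimal alignment between a_1 a_2 … a_i and b_1 b_2 … b_j.
With the initializations S_{0,0} = 0, S_{i,0} = S_{i-1,0} + w(a_i, -) and S_{0,j} = S_{0,j-1} + w(-, b_j), S_{i,j} can be computed as follows:
$$S_{i,j} = \max \{\, S_{i-1,j} + w(a_i, -),\; S_{i,j-1} + w(-, b_j),\; S_{i-1,j-1} + w(a_i, b_j) \,\}$$

Computing S_{i,j}
[Figure: cell (i, j) of the table receives three incoming edges, weighted w(a_i, b_j), w(a_i, -) and w(-, b_j); the optimal alignment score is S_{m,n}.]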
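A minimal Python sketch of this table computation, assuming the match/mismatch/gap scheme from the earlier slide (+8 / -5 / -3). The function name `global_alignment_table` is illustrative.

```python
def global_alignment_table(a, b, match=8, mismatch=-5, gap=-3):
    """Fill the (m+1) x (n+1) table S, where S[i][j] is the best score
    for aligning the first i letters of a with the first j letters of b."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):          # first column: a[:i] against gaps only
        S[i][0] = S[i - 1][0] + gap
    for j in range(1, n + 1):          # first row: gaps only against b[:j]
        S[0][j] = S[0][j - 1] + gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + w,   # aligned pair
                          S[i - 1][j] + gap,     # a[i-1] against a gap
                          S[i][j - 1] + gap)     # gap against b[j-1]
    return S


S = global_alignment_table("CTTAACT", "CGGATCAT")
print(S[3][5], S[7][8])
```

The table takes O(mn) time and space; the last cell holds the optimal global score.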

Initializations
        C   G   G   A   T   C   A   T
    0  -3  -6  -9 -12 -15 -18 -21 -24
C  -3
T  -6
T  -9
A -12
A -15
C -18
T -21

S_{3,5} = ?
        C   G   G   A   T   C   A   T
    0  -3  -6  -9 -12 -15 -18 -21 -24
C  -3   8   5   2  -1  -4  -7 -10 -13
T  -6   5   3   0  -3   7   4   1  -2
T  -9   2   0  -2  -5   ?
A -12
A -15
C -18
T -21

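S_{3,5} = 5
S_{3,5} = max{S_{2,4} + 8, S_{2,5} - 3, S_{3,4} - 3} = max{5, 4, -8} = 5
        C   G   G   A   T   C   A   T
    0  -3  -6  -9 -12 -15 -18 -21 -24
C  -3   8   5   2  -1  -4  -7 -10 -13
T  -6   5   3   0  -3   7   4   1  -2
T  -9   2   0  -2  -5   5   2  -1   9
A -12  -1  -3  -5   6   3   0  10   7
A -15  -4  -6  -8   3   1  -2   8   5
C -18  -7  -9 -11   0  -2   9   6   3
T -21 -10 -12 -14  -3   8   6   4  14
Optimal score: S_{7,8} = 14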

Tracing back from S_{7,8} = 14 gives an optimal alignment:
 C  T  T  A  A  C  -  T
 C  G  G  A  T  C  A  T
+8 -5 -5 +8 -5 +8 -3 +8 = 14
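The traceback step can be sketched as follows. This is one possible implementation, not the slides' own code: the name `global_align` and the tie-breaking order (diagonal first, then a gap in B, then a gap in A) are assumptions, and other tie-breaking rules may return a different alignment with the same score.

```python
def global_align(a, b, match=8, mismatch=-5, gap=-3):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        S[i][0] = i * gap
    for j in range(1, n + 1):
        S[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + w, S[i - 1][j] + gap, S[i][j - 1] + gap)

    # Walk back from S[m][n], re-deriving which of the three cases gave each cell.
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        w = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + w:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return S[m][n], "".join(reversed(top)), "".join(reversed(bottom))


score, row_a, row_b = global_align("CTTAACT", "CGGATCAT")
print(score)
print(row_a)
print(row_b)
```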

Now try this example in class
Sequence A: CAATTGA
Sequence B: GAATCTGC
Their optimal alignment?

Initializations
        G   A   A   T   C   T   G   C
    0  -3  -6  -9 -12 -15 -18 -21 -24
C  -3
A  -6
A  -9
T -12
T -15
G -18
A -21

S_{4,2} = ?
        G   A   A   T   C   T   G   C
    0  -3  -6  -9 -12 -15 -18 -21 -24
C  -3  -5  -8 -11 -14  -4  -7 -10 -13
A  -6  -8   3   0  -3  -6  -9 -12 -15
A  -9 -11   0  11   8   5   2  -1  -4
T -12 -14   ?
T -15
G -18
A -21

S_{5,5} = ?
        G   A   A   T   C   T   G   C
    0  -3  -6  -9 -12 -15 -18 -21 -24
C  -3  -5  -8 -11 -14  -4  -7 -10 -13
A  -6  -8   3   0  -3  -6  -9 -12 -15
A  -9 -11   0  11   8   5   2  -1  -4
T -12 -14  -3   8  19  16  13  10   7
T -15 -17  -6   5  16   ?
G -18
A -21

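S_{5,5} = 14
S_{5,5} = max{S_{4,4} - 5, S_{4,5} - 3, S_{5,4} - 3} = max{14, 13, 13} = 14
        G   A   A   T   C   T   G   C
    0  -3  -6  -9 -12 -15 -18 -21 -24
C  -3  -5  -8 -11 -14  -4  -7 -10 -13
A  -6  -8   3   0  -3  -6  -9 -12 -15
A  -9 -11   0  11   8   5   2  -1  -4
T -12 -14  -3   8  19  16  13  10   7
T -15 -17  -6   5  16  14  24  21  18
G -18  -7  -9   2  13  11  21  32  29
A -21 -10   1  -1  10   8  18  29  27
Optimal score: S_{7,8} = 27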

Tracing back from S_{7,8} = 27 gives an optimal alignment:
 C  A  A  T  -  T  G  A
 G  A  A  T  C  T  G  C
-5 +8 +8 +8 -3 +8 +8 -5 = 27

Global Alignment vs. Local Alignment

Maximum-sum interval
Given a sequence of real numbers a_1 a_2 … a_n, find a consecutive subsequence with the maximum sum.
9 -3 1 7 -15 2 3 -4 2 -7 6 -2 8 4 -9
For each position, we can compute the maximum-sum interval starting at that position in O(n) time. Therefore, a naive algorithm runs in O(n^2) time.

Maximum-sum interval (The recurrence relation)
Define S(i) to be the maximum sum of the intervals ending at position i.
$$S(i) = a_i + \max \{\, S(i-1),\; 0 \,\}$$
If S(i-1) < 0, concatenating a_i with its previous interval gives a smaller sum than a_i alone, so the best interval ending at i starts at i.
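The recurrence can be evaluated in a single left-to-right pass. A minimal Python sketch; the name `ending_sums` is an illustrative choice, and the initial value 0 stands in for "no previous interval".

```python
def ending_sums(nums):
    """Return the list S where S[i] is the best sum of an interval ending at i."""
    sums = []
    prev = 0                       # nothing to extend before the first element
    for value in nums:
        prev = value + max(prev, 0)    # extend the previous interval only if it helps
        sums.append(prev)
    return sums


numbers = [9, -3, 1, 7, -15, 2, 3, -4, 2, -7, 6, -2, 8, 4, -9]
print(ending_sums(numbers))
print(max(ending_sums(numbers)))
```

This replaces the naive O(n^2) scan with O(n) work, since each S(i) depends only on S(i-1).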

Maximum-sum interval (Tabular computation)
a_i:    9  -3   1   7 -15   2   3  -4   2  -7   6  -2   8   4  -9
S(i):   9   6   7  14  -1   2   5   1   3  -4   6   4  12  16   7
The maximum sum: 16

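Maximum-sum interval (Traceback)
a_i:    9  -3   1   7 -15   2   3  -4   2  -7   6  -2   8   4  -9
S(i):   9   6   7  14  -1   2   5   1   3  -4   6   4  12  16   7
Start at the maximum S(14) = 16 and move left until the preceding value S(10) = -4 is negative.
The maximum-sum interval: 6 -2 8 4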
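Instead of a separate traceback pass, the start of the current interval can be remembered while scanning. The sketch below is one way to do that; `max_sum_interval` and its 0-based inclusive indices are assumptions made for illustration.

```python
def max_sum_interval(nums):
    """Return (best_sum, start, end) with 0-based inclusive indices.
    Assumes nums is non-empty."""
    best_sum, best_start, best_end = nums[0], 0, 0
    cur_sum, cur_start = nums[0], 0
    for i in range(1, len(nums)):
        if cur_sum < 0:
            cur_sum, cur_start = nums[i], i    # restart: the old interval only hurts
        else:
            cur_sum += nums[i]                 # extend the interval ending at i-1
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, i
    return best_sum, best_start, best_end


numbers = [9, -3, 1, 7, -15, 2, 3, -4, 2, -7, 6, -2, 8, 4, -9]
total, start, end = max_sum_interval(numbers)
print(total, numbers[start:end + 1])
```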

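An optimal local alignment
S_{i,j}: the score of an optimal local alignment ending at a_i and b_j.
With the initializations S_{i,0} = S_{0,j} = 0, S_{i,j} can be computed as follows:
$$S_{i,j} = \max \{\, 0,\; S_{i-1,j} + w(a_i, -),\; S_{i,j-1} + w(-, b_j),\; S_{i-1,j-1} + w(a_i, b_j) \,\}$$
The best local alignment score is the largest entry in the table.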
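A minimal Python sketch of the local-alignment table, assuming the same +8 / -5 / -3 scheme used on the neighbouring slides. The name `local_alignment_score` is illustrative.

```python
def local_alignment_score(a, b, match=8, mismatch=-5, gap=-3):
    """Return the best local alignment score between a and b."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]   # row 0 and column 0 stay 0
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(0,                       # start a fresh alignment here
                          S[i - 1][j - 1] + w,
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)
            best = max(best, S[i][j])
    return best


print(local_alignment_score("CTTAACT", "CGGATCAT"))
```

The extra 0 in the maximum plays the same role as restarting the interval in the maximum-sum problem.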

Local alignment
Match: 8  Mismatch: -5  Gap symbol: -3
[Figure: partially filled local alignment table with B = CGGATCAT across the top and A = CTTAACT down the side; the cell marked ? is the next one to compute.]

Local alignment
Match: 8  Mismatch: -5  Gap symbol: -3
        C   G   G   A   T   C   A   T
    0   0   0   0   0   0   0   0   0
C   0   8   5   2   0   0   8   5   2
T   0   5   3   0   0   8   5   3  13
T   0   2   0   0   0   8   5   2  11
A   0   0   0   0   8   5   3  13  10
A   0   0   0   0   8   5   2  11   8
C   0   8   5   2   5   3  13  10   7
T   0   5   3   0   2  13  10   8  18
The best score: 18

Tracing back from the best score 18 until a 0 entry is reached gives an optimal local alignment:
 A  -  C  -  T
 A  T  C  A  T
+8 -3 +8 -3 +8 = 18
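The local traceback differs from the global one in two places: it starts at the largest table entry and stops at the first 0. A sketch under assumed tie-breaking (diagonal first, and the first maximum found in row-major order); the name `local_align` is hypothetical.

```python
def local_align(a, b, match=8, mismatch=-5, gap=-3):
    """Return (score, aligned_a, aligned_b) for one optimal local alignment."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(0, S[i - 1][j - 1] + w, S[i - 1][j] + gap, S[i][j - 1] + gap)
            if S[i][j] > best:
                best, bi, bj = S[i][j], i, j

    top, bottom = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and S[i][j] > 0:      # stop at the first 0 entry
        w = match if a[i - 1] == b[j - 1] else mismatch
        if S[i][j] == S[i - 1][j - 1] + w:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif S[i][j] == S[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


score, row_a, row_b = local_align("CTTAACT", "CGGATCAT")
print(score)
print(row_a)
print(row_b)
```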

Now try this example in class
Sequence A: CAATTGA
Sequence B: GAATCTGC
Their optimal local alignment?

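Did you get it right?
        G   A   A   T   C   T   G   C
    0   0   0   0   0   0   0   0   0
C   0   0   0   0   0   8   5   2   8
A   0   0   8   8   5   5   3   0   5
A   0   0   8  16  13  10   7   4   2
T   0   0   5  13  24  21  18  15  12
T   0   0   2  10  21  19  29  26  23
G   0   8   5   7  18  16  26  37  34
A   0   5  16  13  15  13  23  34  32
The best score: 37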

Tracing back from the best score 37 gives an optimal local alignment:
 A  A  T  -  T  G
 A  A  T  C  T  G
+8 +8 +8 -3 +8 +8 = 37

Affine gap penalties
Match: +8 (w(a, b) = 8, if a = b)
Mismatch: -5 (w(a, b) = -5, if a ≠ b)
Each gap symbol: -3 (w(-, b) = w(a, -) = -3)
Each gap is charged an extra gap-open penalty: -4.

 C  -  -  -  T  T  A  A  C  T
 C  G  G  A  T  C  A  -  -  T
+8 -3 -3 -3 +8 -5 +8 -3 -3 +8 = +12
The alignment contains two gaps, each charged -4.
Alignment score: 12 - 4 - 4 = 4
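To check the arithmetic, the sketch below scores a given alignment and charges the gap-open penalty once per maximal run of gap symbols in one row. The name `affine_alignment_score` and the convention of passing penalties as negative scores are assumptions for illustration.

```python
def affine_alignment_score(top, bottom, match=8, mismatch=-5,
                           gap_symbol=-3, gap_open=-4):
    """Score an alignment; every maximal run of '-' in one row is one gap."""
    total = 0
    prev = None                    # row that held a gap symbol in the previous column
    for x, y in zip(top, bottom):
        if x == "-":
            state = "top"
        elif y == "-":
            state = "bottom"
        else:
            state = None
        if state is None:
            total += match if x == y else mismatch
        else:
            total += gap_symbol
            if state != prev:      # a new gap starts in this column
                total += gap_open
        prev = state
    return total


print(affine_alignment_score("C---TTAACT", "CGGATCA--T"))
```

Passing `gap_symbol=0` turns the same function into a constant-gap-penalty scorer, where only the number of gaps matters.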

Affine gap penalties
A gap of length k is penalized x + k·y, where x is the gap-open penalty and y is the gap-symbol penalty.
Three cases for alignment endings:
1. ...x over ...x: an aligned pair
2. ...x over ...-: a deletion
3. ...- over ...x: an insertion

Affine gap penalties Let D(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with a deletion. Let I(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with an insertion. Let S(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj.

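Affine gap penalties (A gap of length k is penalized x + k·y.)
$$D(i,j) = \max \{\, D(i-1,j) - y,\; S(i-1,j) - x - y \,\}$$
$$I(i,j) = \max \{\, I(i,j-1) - y,\; S(i,j-1) - x - y \,\}$$
$$S(i,j) = \max \{\, S(i-1,j-1) + w(a_i, b_j),\; D(i,j),\; I(i,j) \,\}$$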

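Affine gap penalties
[Figure: alignment graph with three nodes S, I and D per cell; edges are labelled w(a_i, b_j) for an aligned pair, -x-y for opening a gap, and -y for extending a gap.]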
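The three-table computation can be sketched in Python as below. This assumes x = 4 and y = 3 as positive penalty magnitudes, boundary values of -x - k·y for a leading gap of length k, and minus infinity for states that cannot occur on the boundary; the function name `affine_global_score` is illustrative.

```python
NEG_INF = float("-inf")


def affine_global_score(a, b, match=8, mismatch=-5, gap_open=4, gap_symbol=3):
    """Best global alignment score when a gap of length k costs x + k*y."""
    x, y = gap_open, gap_symbol
    m, n = len(a), len(b)
    S = [[NEG_INF] * (n + 1) for _ in range(m + 1)]   # any ending
    D = [[NEG_INF] * (n + 1) for _ in range(m + 1)]   # ends with a deletion
    I = [[NEG_INF] * (n + 1) for _ in range(m + 1)]   # ends with an insertion
    S[0][0] = 0
    for i in range(1, m + 1):          # a[:i] against one gap of length i
        D[i][0] = -x - i * y
        S[i][0] = D[i][0]
    for j in range(1, n + 1):          # one gap of length j against b[:j]
        I[0][j] = -x - j * y
        S[0][j] = I[0][j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = max(D[i - 1][j] - y,        # extend a deletion gap
                          S[i - 1][j] - x - y)    # open a deletion gap
            I[i][j] = max(I[i][j - 1] - y,        # extend an insertion gap
                          S[i][j - 1] - x - y)    # open an insertion gap
            w = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + w, D[i][j], I[i][j])
    return S[m][n]


print(affine_global_score("CTTAACT", "CGGATCAT"))
```

Keeping D and I separate from S is what lets the recurrence tell "extend the current gap" (cost y) apart from "open a new gap" (cost x + y) in O(mn) time.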

Constant gap penalties
Match: +8 (w(a, b) = 8, if a = b)
Mismatch: -5 (w(a, b) = -5, if a ≠ b)
Each gap symbol: 0 (w(-, b) = w(a, -) = 0)
Each gap is charged a constant penalty: -4.

 C  -  -  -  T  T  A  A  C  T
 C  G  G  A  T  C  A  -  -  T
+8  0  0  0 +8 -5 +8  0  0 +8 = +27
The alignment contains two gaps, each charged -4.
Alignment score: 27 - 4 - 4 = 19

Constant gap penalties Let D(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with a deletion. Let I(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with an insertion. Let S(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj.

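Constant gap penalties
With x the constant penalty per gap, the affine recurrences with y = 0 give:
$$D(i,j) = \max \{\, D(i-1,j),\; S(i-1,j) - x \,\}$$
$$I(i,j) = \max \{\, I(i,j-1),\; S(i,j-1) - x \,\}$$
$$S(i,j) = \max \{\, S(i-1,j-1) + w(a_i, b_j),\; D(i,j),\; I(i,j) \,\}$$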

Restricted affine gap penalties
A gap of length k is penalized x + f(k)·y, where f(k) = k for k ≤ c and f(k) = c for k > c.
Five cases for alignment endings:
1. ...x over ...x: an aligned pair
2. ...x over ...-: a deletion
3. ...- over ...x: an insertion
4 and 5. a deletion and an insertion for long gaps
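The capped penalty is easy to state in code. A small Python sketch; the function name `restricted_affine_penalty` and the sample values x = 4, y = 3, c = 3 are illustrative assumptions.

```python
def restricted_affine_penalty(k, x, y, c):
    """Penalty of a gap of length k: x + f(k)*y with f(k) = min(k, c)."""
    return x + min(k, c) * y


# Up to length c the penalty grows linearly; beyond c it stays flat.
for length in range(1, 7):
    print(length, restricted_affine_penalty(length, x=4, y=3, c=3))
```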

Restricted affine gap penalties

D(i, j) vs. D'(i, j)
Case 1: the best alignment ending at (i, j) with a deletion at the end has a last deletion gap of length ≤ c. Then D(i, j) ≥ D'(i, j).
Case 2: the best alignment ending at (i, j) with a deletion at the end has a last deletion gap of length ≥ c. Then D(i, j) ≤ D'(i, j).

Max{S(i,j)-x-ky, S(i,j)-x-cy}