Using Dynamic Programming To Align Sequences

Presentation on theme: "Using Dynamic Programming To Align Sequences"— Presentation transcript:

1 Using Dynamic Programming To Align Sequences
Cédric Notredame

2 Our Scope
-Understanding the DP concept
-Coding a Global and a Local Algorithm
-Aligning with Affine gap penalties
-Saving memory
-Sophisticated variants…

3 Outline
-Coding Dynamic Programming with Non-affine Penalties
-Turning a global algorithm into a local algorithm
-Adding affine penalties
-Using a Divide and Conquer strategy
-Tailoring DP to your needs:
  -The Repeated Matches algorithm
  -Double Dynamic Programming

4 Global Alignments Without Affine Gap penalties
Dynamic Programming

5 How To align Two Sequences With a Gap Penalty, A Substitution matrix and Not too Much Time
Dynamic Programming

6 A bit of History…
-DP invented in the 50s by Bellman
-Programming → Tabulation
-Re-invented in 1970 by Needleman and Wunsch
-It took 10 years to find out…

7 The Foolish Assumption
The score of each column of the alignment is independent of the rest of the alignment. It is therefore possible to model the relationship between two sequences with:
-A substitution matrix
-A simple gap penalty
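
A tiny scorer illustrating the assumption above: the alignment score is just the sum of independent column scores. This is a sketch; the function name, the '-' gap convention, and the `mat`/`gep` parameter names are illustrative, not taken from the slides.

```python
def score_alignment(a, b, mat, gep):
    """a and b are gapped strings of equal length, with '-' marking gaps."""
    total = 0
    for ca, cb in zip(a, b):
        # each column is scored on its own: a gap penalty or a substitution score
        total += gep if '-' in (ca, cb) else mat(ca, cb)
    return total

# Example: score_alignment("FAST", "FA-T", lambda x, y: 2 if x == y else -1, -1) == 5
```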

8 The Principle of DP
If you optimally extend an optimal alignment of two sub-sequences, the result remains an optimal alignment. The extension can be a deletion, an aligned pair, or an insertion.

9 Finding the score of i,j
-Sequence 1: [1-i]
-Sequence 2: [1-j]
-The optimal alignment of [1-i] vs [1-j] can finish in three different ways: with an aligned pair, with a gap in sequence 1, or with a gap in sequence 2.

10 Finding the score of i,j
Three ways to build the alignment of 1…i vs 1…j:
-align 1…i-1 vs 1…j-1, then align i with j
-align 1…i-1 vs 1…j, then align i with a gap
-align 1…i vs 1…j-1, then align j with a gap

11 Finding the score of i,j
In order to compute the score of 1…i vs 1…j, all we need are the scores of:
-1…i-1 vs 1…j
-1…i-1 vs 1…j-1
-1…i vs 1…j-1

12 Formalizing the algorithm
F(i,j) = best of:
-F(i-1,j-1) + Mat[i,j]  (align 1…i-1 vs 1…j-1, then i with j)
-F(i-1,j) + Gep         (align 1…i-1 vs 1…j, then i with a gap)
-F(i,j-1) + Gep         (align 1…i vs 1…j-1, then j with a gap)

13 Arranging Everything in a Table
[Figure: the DP table, with one sequence (F A S T) along the top and the other down the side; each cell 1…i vs 1…j is computed from its three neighbours 1…i-1 vs 1…j-1, 1…i vs 1…j-1 and 1…i-1 vs 1…j.]

14 Taking Care of the Limits
The DP strategy relies on the idea that ALL the cells in your table have the same environment… This is NOT true of ALL the cells!!!! In a Dynamic Programming strategy, the most delicate part is to take care of the limits:
-what happens when you start
-what happens when you finish

15 Taking Care of the Limits
[Figure: initializing the limits of the FAST vs FAT table with Match=2, MisMatch=-1, Gap=-1. The first row and first column are filled with cumulative gap penalties (-1, -2, -3, -4): aligning a prefix (F, FA, FAS, …) against nothing costs one gap per residue.]
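
A minimal sketch in code of the recurrence of slide 12 together with the limits above (global alignment, non-affine gaps). Mat and Gep follow the slides' notation; the function name and data layout are illustrative. It reproduces the FAST vs FAT matrix filled on the next slides.

```python
def nw_fill(x, y, mat, gep):
    """Return the full DP table F; F[len(x)][len(y)] is the optimal score."""
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    # The limits: aligning a prefix against nothing costs one gap per residue.
    for i in range(1, m + 1):
        F[i][0] = F[i - 1][0] + gep
    for j in range(1, n + 1):
        F[0][j] = F[0][j - 1] + gep
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(F[i - 1][j - 1] + mat(x[i - 1], y[j - 1]),  # align i with j
                          F[i - 1][j] + gep,                          # i against a gap
                          F[i][j - 1] + gep)                          # j against a gap
    return F

# The slide's toy scoring: Match=2, MisMatch=-1, Gap=-1
mat = lambda a, b: 2 if a == b else -1
F = nw_fill("FAST", "FAT", mat, -1)
print(F[-1][-1])   # 5, the optimal global score for FAST vs FAT
```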

16 Filling Up The Matrix

17 [Figure: the filled FAST vs FAT matrix. Each cell holds the best of its three neighbours plus the corresponding match/mismatch or gap score; the bottom-right cell holds the optimal global score.]

18 Delivering the alignment: Trace-back
[Figure: the trace-back path through the FAST vs FAT table. Each cell on the path holds the score of the corresponding sub-alignment (e.g. 1…3 vs 1…4); the bottom-right cell holds the optimal alignment score.]

19 Trace-back: possible implementation
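
The slide's code figure is not reproduced in the transcript; below is one possible implementation in Python over the F table produced by the nw_fill sketch above, rebuilding the alignment from the bottom-right corner back to the origin.

```python
def traceback(x, y, F, mat, gep):
    ax, ay = [], []
    i, j = len(x), len(y)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + mat(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1   # aligned pair
        elif i > 0 and F[i][j] == F[i - 1][j] + gep:
            ax.append(x[i - 1]); ay.append('-'); i -= 1                # gap in y
        else:
            ax.append('-'); ay.append(y[j - 1]); j -= 1                # gap in x
    return ''.join(reversed(ax)), ''.join(reversed(ay))

# With the FAST vs FAT table from the earlier sketch this returns
# ('FAST', 'FA-T'), an optimal global alignment.
```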

20 Local Alignments Without Affine Gap penalties
Smith and Waterman

21 Getting rid of the pieces of Junk between the interesting bits
Smith and Waterman

22

23 The Smith and Waterman Algorithm
F(i,j) = best of:
-F(i-1,j-1) + Mat[i,j]
-F(i-1,j) + Gep
-F(i,j-1) + Gep

24 The Smith and Waterman Algorithm
F(i,j) = best of:
-F(i-1,j-1) + Mat[i,j]
-F(i-1,j) + Gep
-F(i,j-1) + Gep
-0 (the extra option that makes the alignment local)

25 The Smith and Waterman Algorithm
Taking 0 terminates a local alignment and lets you ignore the rest of the matrix.

26 Filling Up a SW Matrix

27 Filling up a SW matrix: borders
[Figure: the SW table for CATANDOG vs ANICECAT with the first row and first column set to 0.] Easy: local alignments NEVER start/end with a gap…
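
A minimal sketch of the local fill, assuming the same Mat/Gep scoring as in the global sketch; the function name is illustrative. Borders are 0, and every cell is floored at 0, which is what terminates a local alignment.

```python
def sw_fill(x, y, mat, gep):
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]   # borders stay at 0: no leading gaps
    best, best_pos = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(0,                                           # terminate here
                          F[i - 1][j - 1] + mat(x[i - 1], y[j - 1]),
                          F[i - 1][j] + gep,
                          F[i][j - 1] + gep)
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)
    return F, best, best_pos   # the trace-back starts from best_pos

# With the slide's sequences and the earlier toy scoring:
# sw_fill("CATANDOG", "ANICECAT", lambda a, b: 2 if a == b else -1, -1)
# gives best == 6, the shared CAT.
```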

28 Beginning of the trace-back
[Figure: the filled SW matrix for CATANDOG vs ANICECAT; the trace-back begins at the cell holding the best local score.]

29 Turning NW into SW: prepare the trace-back

30 A few things to remember
SW only works if the substitution matrix has been normalized to give a negative score to a random alignment. Chance should not pay when it comes to local alignments!
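
One common way to check the condition above, as a sketch: under background residue frequencies, the expected score of a random aligned pair must be negative, otherwise chance pays. The function name and parameters are illustrative.

```python
def expected_pair_score(p, mat):
    """p: dict residue -> background frequency; mat: substitution-score function."""
    return sum(p[a] * p[b] * mat(a, b) for a in p for b in p)

# e.g. a match/mismatch scoring over 4 equiprobable letters:
p = {c: 0.25 for c in "ACGT"}
mat = lambda a, b: 2 if a == b else -1
print(expected_pair_score(p, mat))   # -0.25 < 0: random alignments lose on average
```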

31 More than One match…
-SW delivers only the best-scoring match
-If you need more than one match: SIM (Huang and Miller) or Waterman and Eggert (Durbin, p.91)

32 Waterman and Eggert
Iterative algorithm (see the skeleton below):
1-identify the best match
2-redo SW with the already used pairs forbidden
3-finish when the last interesting local alignment has been extracted
Delivers a collection of non-overlapping local alignments and avoids trivial variations of the optimal one.
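
A high-level skeleton of the iteration described above, not the exact Waterman and Eggert bookkeeping of Durbin p.91. Here `sw_align` is a hypothetical helper: a Smith and Waterman run that is not allowed to align any residue pair already in `forbidden`, returning a score and the list of aligned pairs.

```python
def iterated_local_alignments(x, y, mat, gep, min_score):
    forbidden = set()       # (i, j) residue pairs used by previously reported alignments
    results = []
    while True:
        score, pairs = sw_align(x, y, mat, gep, forbidden)   # hypothetical helper
        if score < min_score:
            break           # the last interesting local alignment has been extracted
        results.append((score, pairs))
        forbidden.update(pairs)          # redo SW with the used pairs forbidden
    return results
```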

33 Adding Affine Gap Penalties
The Gotoh Algorithm

34 Forcing a bit of Biology into your alignment
The Gotoh Formulation

35 Why Affine Gap Penalties Are Biologically Better
An affine gap penalty has a gap-opening cost (GOP) and a gap-extension cost (GEP). For a gap of length L:
Cost = gop + L*gep, or Cost = gop + (L-1)*gep
Parsimony: evolution takes the simplest path (so we think…)
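
The two cost conventions above, written out as tiny helpers (a sketch; gop and gep follow the slide's GOP and GEP, the function names are mine). Which one applies depends on whether the opening cost already covers the first gapped position.

```python
def gap_cost_v1(L, gop, gep):
    return gop + L * gep          # Cost = gop + L*gep

def gap_cost_v2(L, gop, gep):
    return gop + (L - 1) * gep    # Cost = gop + (L-1)*gep
```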

36 But Harder To Compute…
There are now more than 3 ways to extend an alignment: besides aligning two residues, a deletion or an insertion can either open a new gap or extend an existing one.

37 More Questions Need to be asked
For instance, what is the cost of an insertion? It depends on whether the insertion opens a new gap (GOP) or extends an existing one (GEP).

38 Solution: Maintain 3 Tables
-Ix: contains the score of every optimal alignment of 1…i vs 1…j that finishes with an insertion in sequence X.
-Iy: contains the score of every optimal alignment of 1…i vs 1…j that finishes with an insertion in sequence Y.
-M: contains the score of every optimal alignment of 1…i vs 1…j that finishes with an alignment between sequences X and Y.

39 The Algorithm
M(i,j) = best of:
-M(i-1,j-1) + Mat(i,j)
-Ix(i-1,j-1) + Mat(i,j)
-Iy(i-1,j-1) + Mat(i,j)

Ix(i,j) = best of:
-M(i-1,j) + gop
-Ix(i-1,j) + gep

Iy(i,j) = best of:
-M(i,j-1) + gop
-Iy(i,j-1) + gep
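
A minimal sketch of the three-table recurrence above (affine gaps, Gotoh formulation) for a global alignment. The convention here is that gop is paid for the first gapped position and gep for each further one, matching Cost = gop + (L-1)*gep from the earlier slide; apart from M, Ix, Iy, gop and gep, the names are illustrative.

```python
NEG = float("-inf")

def gotoh_score(x, y, mat, gop, gep):
    m, n = len(x), len(y)
    M  = [[NEG] * (n + 1) for _ in range(m + 1)]
    Ix = [[NEG] * (n + 1) for _ in range(m + 1)]
    Iy = [[NEG] * (n + 1) for _ in range(m + 1)]
    M[0][0] = 0
    for i in range(1, m + 1):                 # leading gap in y: open once, then extend
        Ix[i][0] = gop + (i - 1) * gep
    for j in range(1, n + 1):                 # leading gap in x
        Iy[0][j] = gop + (j - 1) * gep
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = mat(x[i - 1], y[j - 1])
            M[i][j]  = max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1]) + s
            Ix[i][j] = max(M[i - 1][j] + gop, Ix[i - 1][j] + gep)
            Iy[i][j] = max(M[i][j - 1] + gop, Iy[i][j - 1] + gep)
    # The trace-back starts from the best of the three tables at (m, n).
    return max(M[m][n], Ix[m][n], Iy[m][n])

# e.g. gotoh_score("FAST", "FAT", lambda a, b: 2 if a == b else -1, gop=-2, gep=-1)
```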

40 Trace-back?
Start from the BEST of M(i,j), Ix(i,j) and Iy(i,j).

41 Trace-back? Ix M Iy Navigate from one table to the next, knowing that a gap always finishes with an aligned column…

42 Going Further?
With affine gap penalties, we have increased the number of possibilities when building our alignment. Computer scientists talk of states and represent this as a Finite State Automaton (FSAs are cousins of HMMs).

43 Going Further ?

44 Going Further ? In Theory, there is no Limit on the number of states one may consider when doing such a computation.

45

46 Going Further ? Imagine a pairwise alignment algorithm where the gap penalty depends on the length of the gap. Can you simplify it realistically so that it can be efficiently implemented?

47 Lx Ly

48 A divide and Conquer Strategy
The Myers and Miller Strategy

49 Remember Not To Run Out of Memory
The Myers and Miller Strategy

50 A Score in Linear Space
You never need more than the previous row to compute the optimal score.

51 A Score in Linear Space
for i:
    for j:
        R2[j] = best of:
            R1[j-1] + mat
            R1[j] + gep
            R2[j-1] + gep
    for j: R1[j] = R2[j]
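 
A runnable version of the pseudo-code above: only the previous row R1 and the current row R2 are kept. The function returns the whole last row (its final cell is the optimal score); the divide and conquer sketch further down reuses it. Apart from R1, R2, mat and gep, the names are illustrative.

```python
def nw_last_row(x, y, mat, gep):
    n = len(y)
    R1 = [j * gep for j in range(n + 1)]          # row 0: leading gaps in x
    for i in range(1, len(x) + 1):
        R2 = [R1[0] + gep] + [0] * n              # column 0: one more leading gap in y
        for j in range(1, n + 1):
            R2[j] = max(R1[j - 1] + mat(x[i - 1], y[j - 1]),
                        R1[j] + gep,
                        R2[j - 1] + gep)
        R1 = R2                                   # "for j: R1[j] = R2[j]"
    return R1

# nw_last_row("FAST", "FAT", lambda a, b: 2 if a == b else -1, -1)[-1] == 5
```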

52 A Score in Linear Space

53 A Score in Linear Space
You never need more than the previous row to compute the optimal score; you only need the full matrix for the trace-back. Or do you????

54 An Alignment in Linear Space
Forward algorithm: F(i,j) = optimal score of 0…i vs 0…j
Backward algorithm: B(i,j) = optimal score of i…M vs j…N
F(i,j) + B(i,j) = optimal score of the alignment that passes through the pair i,j

55 An Alignment in Linear Space
[Figure: the forward and backward matrices; adding them cell by cell gives the optimal score B(i,j)+F(i,j) of an alignment forced to pass through (i,j).]

56

57 An Alignment in Linear Space
Forward algorithm + backward algorithm, applied recursively in a divide and conquer strategy: Myers and Miller (Durbin, p.35).
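
A sketch of one divide step of the strategy above, assuming the nw_last_row helper from the linear-space sketch a few slides back: forward scores of the first half of x against every prefix of y, plus backward scores of the second half against every suffix of y (obtained by reversing both sequences), tell us where the optimal alignment crosses the middle row. The full Myers and Miller algorithm then recurses on x[:i_mid] vs y[:j_mid] and x[i_mid:] vs y[j_mid:].

```python
def middle_column(x, y, mat, gep):
    i_mid = len(x) // 2
    F = nw_last_row(x[:i_mid], y, mat, gep)                      # forward, prefixes
    B = nw_last_row(x[i_mid:][::-1], y[::-1], mat, gep)[::-1]    # backward, suffixes
    # F[j] + B[j] = best score of an alignment in which x[:i_mid] is aligned to y[:j]
    j_mid = max(range(len(y) + 1), key=lambda j: F[j] + B[j])
    return i_mid, j_mid

# e.g. for FAST vs FAT with the toy scoring, i_mid = 2 and j_mid = 2:
# the optimal alignment splits as FA|ST vs FA|T.
```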

58 An Alignment in Linear Space

59 A Forward-only Strategy (Durbin, p.35)
Forward algorithm:
-Keep row M in memory
-Keep track of which cell in row M led to the optimal score
-Divide on this cell

60 M M

61 An interesting application: finding sub-optimal alignments
Sum the forward and backward matrices to identify, for every cell i,j, the score of the best alignment going through that cell.

62 Application: Non-local models
Double Dynamic Programming

63 Outline
The main limitation of DP: a context-independent measure

64 Double Dynamic Programming
High-level Smith and Waterman dynamic programming:
Score = max of:
-S(i-1,j-1) + RMSd score
-S(i-1,j) + gp
-S(i,j-1) + gp
RMSd score: rigid-body superposition in which residues i and j are forced together.

65 Double Dynamic Programming

66 Application: Repeats
The Durbin Algorithm

67

68 In The End: Wrapping it Up

69 Dynamic Programming
-Needleman and Wunsch: delivers the best-scoring global alignment
-Smith and Waterman: NW with an extra state 0
-Affine gap penalties: making DP more realistic

70 Dynamic Programming
-Linear space: using divide and conquer strategies so as not to run out of memory
-Double Dynamic Programming, repeat extraction: DP can easily be adapted to a special need

