Presentation is loading. Please wait.

Presentation is loading. Please wait.

Randomized Algorithms for Motif Finding [1] Ch 12.2.

Similar presentations


Presentation on theme: "Randomized Algorithms for Motif Finding [1] Ch 12.2."— Presentation transcript:

1 Randomized Algorithms for Motif Finding [1] Ch 12.2

2 cctgatagacgctatctggctatccaGgtacTtaggtcctctgtgcgaatctatgcgtttccaaccat agtactggtgtacatttgatCcAtacgtacaccggcaacctgaaacaaacgctcagaaccagaagtgc aaacgtTAgtgcaccctctttcttcgtggctctggccaacgagggctgatgtataagacgaaaatttt agcctccgatgtaagtcatagctgtaactattacctgccacccctattacatcttacgtCcAtataca ctgttatacaacgcgtcatggcggggtatgcgttttggtcgtcgtacgctcgatcgttaCcgtacgGc l = 8 t=5 s 1 = 26 s 2 = 21 s 3 = 3 s 4 = 56 s 5 = 60 s DNA n = 69

3 Profile P a G g t a c T t C c A t a c g t a c g t T A g t a c g t C c A t C c g t a c g G _________________ A 3 0 1 0 3 1 1 0 C 2 4 0 0 1 4 0 0 G 0 1 4 0 0 0 3 1 T 0 0 0 5 1 0 1 4 _________________ Consensus a c g t a c g t Score = 3+4+4+5+3+4+3+4=30 l t

4 When P is Given P by count: A473010 C104530 T110027 G201341 A4/87/83/801/80 C 04/85/83/80 T1/8 002/87/8 G2/801/83/82/81/8 P by probability: Score: 4+7+4+5+3+7=30

5 Scoring a String with P A4/87/83/801/80 C 04/85/83/80 T1/8 002/87/8 G2/801/83/82/81/8 The probability of a possible consensus string: Prob(aaacct|P) = 1/2 x 7/8 x 3/8 x 5/8 x 3/8 x 7/8 =.033646 Prob(atacag|P) = 1/2 x 1/8 x 3/8 x 5/8 x 1/8 x 1/8 =.001602

6 P-Most Probable l-mer A4/87/83/801/80 C 04/85/83/80 T1/8 002/87/8 G2/801/83/82/81/8 P = Given P and a sequence ctataaaccttacatc, find the P-most probable l-mer

7 Third try: c t a t a a a c c t t a c a t c Second try: c t a t a a a c c t t a c a t c First try: c t a t a a a c c t t a c a t c P-Most Probable l-mer A4/87/83/801/80 C 04/85/83/80 T1/8 004/87/8 G2/801/83/84/81/8 Find the Prob(a|P) of every possible 6-mer: -Continue this process to evaluate every possible 6-mer

8 P-Most Probable l-mer String, Highlighted in RedCalculationsProb(a|P) ctataaaccttacat1/8 x 1/8 x 3/8 x 0 x 1/8 x 00 ctataaaccttacat1/2 x 7/8 x 0 x 0 x 1/8 x 00 ctataaaccttacat1/2 x 1/8 x 3/8 x 0 x 1/8 x 00 ctataaaccttacat1/8 x 7/8 x 3/8 x 0 x 3/8 x 00 ctataaaccttacat1/2 x 7/8 x 3/8 x 5/8 x 3/8 x 7/8.0336 ctataaaccttacat1/2 x 7/8 x 1/2 x 5/8 x 1/4 x 7/8.0299 ctataaaccttacat1/2 x 0 x 1/2 x 0 1/4 x 00 ctataaaccttacat1/8 x 0 x 0 x 0 x 0 x 1/8 x 00 ctataaaccttacat1/8 x 1/8 x 0 x 0 x 3/8 x 00 ctataaaccttacat1/8 x 1/8 x 3/8 x 5/8 x 1/8 x 7/8.0004

9 P-Most Probable l-mers in All Sequences Find the P-most probable l- mer in each of the sequences. ctataaacgttacatc atagcgattcgactg cagcccagaaccct cggtataccttacatc tgcattcaatagctta tatcctttccactcac ctccaaatcctttaca ggtcatcctttatcct A4/87/83/801/80 C 04/85/83/80 T1/8 002/87/8 G2/801/83/82/81/8 P=P=

10 New Profile ctataaacgttacat atagcgattcgactg cagcccagaaccctg cggtgaaccttacat tgcattcaatagctt tgtcctttccactca ctccaaatcctttac ggtctacctttatcc P-Most Probable l-mers form a new profile 1aaacgt 2atagcg 3aaccct 4gaacct 5atagct 6gacctT 7atcctT 8tacctT A5/8 4/8000 C00 6/84/80 T1/83/800 7/8 G2/800 1/8

11 Compare New and Old Profiles 1aaacgt 2atagcg 3aaccct 4gaacct 5atagct 6gacctT 7atcctT 8tacctT A5/8 4/8000 C00 6/84/80 T1/83/800 7/8 G4/8002/81/8 A4/87/83/801/80 C 04/85/83/80 T1/8 002/87/8 G2/801/83/82/81/8 New Score: 5+5+4+6+4+7=31Old Score: 4+7+4+5+3+7=30

12 Greedy Profile Motif Search 1.Select random starting positions s 2.Create profile P 3.Compute Score 4.Find P-most probable l-mers 5.Compute new profile based on new starting positions s 6.Compute Score 7.Repeat step 4-5 as long as Score increases Cost: Polynomial for each round, and number of round can be controlled

13 Performance Since we choose starting positions randomly, there is little chance that our guess will be close to an optimal motif, meaning it will take a very long time to find the optimal motif. It is unlikely that the random starting positions will lead us to the correct solution at all. In practice, this algorithm is run many times with the hope that random starting positions will be close to the optimum solution simply by chance.

14 Gibbs Sampling Gibbs Sampling improves on one l-mer at a time Thus proceeds more slowly and cautiously

15 Gibbs Sampling: an Example Input: t = 5 sequences, motif length l = 8 1. GTAAACAATATTTATAGC 2. AAAATTTACCTCGCAAGG 3. CCGTACTGTCAAGCGTGG 4. TGAGTAAACGACGTCCCA 5. TACTTAACACCCTGTCAA

16 Gibbs Sampling: an Example 1) Randomly choose starting positions, s=(s 1,s 2,s 3,s 4,s 5 ) in the 5 sequences: s 1 =7 GTAAACAATATTTATAGC s 2 =11 AAAATTTACCTTAGAAGG s 3 =9 CCGTACTGTCAAGCGTGG s 4 =4 TGAGTAAACGACGTCCCA s 5 =1 TACTTAACACCCTGTCAA

17 Gibbs Sampling: an Example 2) Choose one of the sequences at random: Sequence 2: AAAATTTACCTTAGAAGG s 1 =7 GTAAACAATATTTATAGC s 2 =11 AAAATTTACCTTAGAAGG s 3 =9 CCGTACTGTCAAGCGTGG s 4 =4 TGAGTAAACGACGTCCCA s 5 =1 TACTTAACACCCTGTCAA

18 Gibbs Sampling: an Example 2) Choose one of the sequences at random: Sequence 2: AAAATTTACCTTAGAAGG s 1 =7 GTAAACAATATTTATAGC s 3 =9 CCGTACTGTCAAGCGTGG s 4 =4 TGAGTAAACGACGTCCCA s 5 =1 TACTTAACACCCTGTCAA

19 Gibbs Sampling: an Example 3) Create profile P from l-mers in remaining 4 sequences: 1AATATTTA 3TCAAGCGT 4GTAAACGA 5TACTTAAC A1/42/4 3/41/4 2/4 C01/4 002/401/4 T2/41/4 2/41/4 G 000 03/40 Consensus String TAAATCGA

20 Gibbs Sampling: an Example 4) Calculate the prob(a|P) for every possible 8-mer in the removed sequence AAAATTTACCTTAGAAGG.000732 AAAATTTACCTTAGAAGG.000122 AAAATTTACCTTAGAAGG0 0 0 0 0.000183 AAAATTTACCTTAGAAGG0 0 0

21 Gibbs Sampling: an Example 5) Create a distribution of probabilities of l-mers prob(a|P), and randomly select a new starting position based on this distribution. Starting Position 1: prob( AAAATTTA | P ) =.000732 /.000122 = 6 Starting Position 2: prob( AAATTTAC | P ) =.000122 /.000122 = 1 Starting Position 8: prob( ACCTTAGA | P ) =.000183 /.000122 = 1.5 a) To create this distribution, divide each probability prob(a|P) by the lowest probability: Ratio = 6 : 1 : 1.5

22 Turning Ratios into Probabilities Probability (Selecting Starting Position 1): 6/(6+1+1.5)= 0.706 Probability (Selecting Starting Position 2): 1/(6+1+1.5)= 0.118 Probability (Selecting Starting Position 8): 1.5/(6+1+1.5)=0.176 b) Define probabilities of starting positions according to computed ratios

23 Gibbs Sampling: an Example c) Select the start position according to computed ratios: P(selecting starting position 1):.706 P(selecting starting position 2):.118 P(selecting starting position 8):.176

24 Gibbs Sampling: an Example Assume we select the substring with the highest probability – then we are left with the following new substrings and starting positions. s 1 =7 GTAAACAATATTTATAGC s 2 =1 AAAATTTACCTCGCAAGG s 3 =9 CCGTACTGTCAAGCGTGG s 4 =5 TGAGTAATCGACGTCCCA s 5 =1 TACTTCACACCCTGTCAA Earlier s 1 =7 GTAAACAATATTTATAGC s 2 =11 AAAATTTACCTTAGAAGG s 3 =9 CCGTACTGTCAAGCGTGG s 4 =4 TGAGTAAACGACGTCCCA s 5 =1 TACTTAACACCCTGTCAA

25 Gibbs Sampling: an Example 6) iterate the procedure again with the above starting positions until we can not improve the score any more.

26 How Gibbs Sampling Works 1.Randomly choose starting positions s = (s 1,...,s t ) 2.Randomly choose one of the t sequences 3.Create a profile P from the other t -1 sequences 4.For each position in the removed sequence, calculate the probability that the l-mer starting at that position was generated by P. 5.Choose a new starting position for the removed sequence at random based on the probabilities calculated in step 4. 6.Repeat steps 2-5 until there is no improvement


Download ppt "Randomized Algorithms for Motif Finding [1] Ch 12.2."

Similar presentations


Ads by Google