
Algorithms for variable length Markov chain modeling
Author: Gill Bejerano
Presented by Xiangbin Qiu




2 Review of the Markov Chain Model
Markov chains are often used in bioinformatics to capture relatively simple sequence patterns, such as genomic CpG islands.
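As a concrete reminder of what a plain Markov chain model involves, here is a minimal sketch of a first-order chain over DNA. Transition probabilities are estimated from pair counts with a pseudocount (an illustrative assumption, not taken from the slides), and a sequence is scored by its log-likelihood. The function names and the tiny training set are hypothetical.

```python
import math

ALPHABET = "ACGT"


def train_first_order(seqs, pseudocount=1.0):
    """Estimate P(b | a) for a first-order Markov chain from training sequences.

    pseudocount is an illustrative Laplace-style smoothing constant (an assumption).
    """
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for s in seqs:
        for a, b in zip(s, s[1:]):  # consecutive pairs are the observed transitions
            counts[a][b] += 1
    return {
        a: {b: c / sum(row.values()) for b, c in row.items()}
        for a, row in counts.items()
    }


def log_likelihood(seq, probs):
    """Sum of log P(x_i | x_{i-1}); the first symbol is not scored in this sketch."""
    return sum(math.log(probs[a][b]) for a, b in zip(seq, seq[1:]))


model = train_first_order(["CGCGCGTACG", "ACGCGCGGCG"])
print(model["C"]["G"])  # 0.75: 8 observed C->G transitions plus pseudocounts, out of 12
```

Used as a classifier, such a score is typically compared between two chains, for example one trained on CpG-island DNA and one on background DNA.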

3 Problem
Low-order Markov chains are poor classifiers, while higher-order chains are often impractical to implement or train: the memory and training-set size requirements of an order-k Markov chain grow exponentially with k.
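The exponential growth can be checked with a short count. An order-k chain over an alphabet Σ has |Σ|^k contexts, and each context needs a next-symbol distribution with |Σ| - 1 free parameters. The sketch below assumes the usual biological alphabet sizes, 4 for DNA and 20 for amino acids; the function name is hypothetical.

```python
def order_k_parameters(alphabet_size: int, k: int) -> int:
    """Free parameters of an order-k Markov chain.

    There are alphabet_size**k contexts; each has a next-symbol distribution
    with alphabet_size - 1 free parameters (the probabilities must sum to 1).
    """
    return alphabet_size ** k * (alphabet_size - 1)


# Parameter counts for orders 1..5 over DNA (4 symbols) and proteins (20 symbols).
table = {k: (order_k_parameters(4, k), order_k_parameters(20, k)) for k in range(1, 6)}
for k, (dna, protein) in table.items():
    print(k, dna, protein)
```

For example, order_k_parameters(4, 5) is 3,072, while order_k_parameters(20, 5) is already 60,800,000, and every one of those parameters has to be estimated from training data.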

4 Variable Length Markov Model (VMM)
The model is not restricted to a predefined uniform depth (e.g., a fixed order k). Instead, it captures higher-order Markov dependencies where such contexts exist and uses lower-order dependencies elsewhere. The length of each context is determined by examining the training data.
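The core idea of a VMM can be sketched with a dictionary of contexts of different lengths. To predict the next symbol, the model uses the longest suffix of the history that it actually keeps as a context, falling back to shorter contexts (down to the empty, order-0 context) elsewhere. The contexts, probabilities and function names below are invented for illustration.

```python
from typing import Dict

# A VMM as a map from context to next-symbol distribution. The contexts have
# different lengths; "" is the empty (order-0) context used as a last resort.
Model = Dict[str, Dict[str, float]]

toy_vmm: Model = {
    "": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "C": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "TC": {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},  # longer context kept only where it helps
}


def longest_context(history: str, model: Model) -> str:
    """Return the longest suffix of history that the model keeps as a context."""
    for start in range(len(history) + 1):  # start=0 is the whole history, the last is ""
        if history[start:] in model:
            return history[start:]
    raise KeyError("model must contain the empty context ''")


def next_symbol_prob(history: str, symbol: str, model: Model) -> float:
    """Probability of symbol given history, using the longest matching context."""
    return model[longest_context(history, model)][symbol]


print(next_symbol_prob("GATC", "G", toy_vmm))  # 0.7, via context "TC"
print(next_symbol_prob("GAC", "G", toy_vmm))   # 0.3, via context "C"
```

Memory is spent only on the contexts the model keeps, instead of on all |Σ|^k contexts of a fixed order.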

5 Description of the Author’s Work
Four main modules are implemented: Train, Predict, Emit and 2pfa.

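6 Probabilistic Suffix Tree (PST)
A special tree data structure: each node is labeled by a context string and holds a probability distribution over the next symbol.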
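Such a tree can be sketched as follows, assuming the usual PST layout in which each child extends its parent's context by one symbol to the left (the symbol further back in the history). Walking from the root while reading the history backwards then reaches the longest stored context. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PSTNode:
    label: str  # the context string this node represents ("" for the root)
    # next-symbol distribution; empty for intermediate nodes created only as a path
    next_probs: Dict[str, float] = field(default_factory=dict)
    # children keyed by the symbol that precedes this node's context
    children: Dict[str, "PSTNode"] = field(default_factory=dict)


def insert_context(root: PSTNode, context: str, next_probs: Dict[str, float]) -> PSTNode:
    """Add a context, reading it right to left and creating missing nodes on the way."""
    node = root
    for i in range(len(context) - 1, -1, -1):
        sym = context[i]
        if sym not in node.children:
            node.children[sym] = PSTNode(label=context[i:])
        node = node.children[sym]
    node.next_probs = next_probs
    return node


def deepest_node(root: PSTNode, history: str) -> PSTNode:
    """Follow the history backwards from the root; return the deepest node with a distribution."""
    best, node = root, root
    for sym in reversed(history):
        node = node.children.get(sym)
        if node is None:
            break
        if node.next_probs:
            best = node
    return best


root = PSTNode("", {"A": 0.5, "B": 0.5})
insert_context(root, "A", {"A": 0.1, "B": 0.9})
insert_context(root, "BA", {"A": 0.7, "B": 0.3})
print(deepest_node(root, "ABA").label)  # BA
```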

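7 PST Definitions
Σ is the alphabet and $r^1, r^2, \ldots, r^m$ is the training set of strings, with $l_i = |r^i|$. Let $\chi_j^i(s) = 1$ if the string $s$ occurs in $r^i$ starting at position $j$, and $0$ otherwise.
Empirical probability: $\tilde{P}(s) = \frac{\sum_{i=1}^{m} \sum_{j} \chi_j^i(s)}{\sum_{i=1}^{m} (l_i - |s| + 1)}$
Conditional empirical probability: $\tilde{P}(\sigma \mid s) = \frac{\sum_{i=1}^{m} \sum_{j} \chi_j^i(s\sigma)}{\sum_{i=1}^{m} \sum_{j} \chi_j^i(s*)}$, where $s*$ denotes $s$ followed by any symbol of Σ.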
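The empirical probabilities come down to counting substring occurrences. This sketch assumes the standard PST definitions: the empirical probability of s is the number of occurrences of s divided by the number of positions where s could start, and the conditional probability of σ given s is the number of occurrences of sσ divided by the number of occurrences of s followed by any symbol. The training set is a plain list of strings and the function names are hypothetical.

```python
from typing import List


def occurrences(s: str, strings: List[str]) -> int:
    """Number of (possibly overlapping) start positions of s across all strings."""
    return sum(
        1
        for r in strings
        for j in range(len(r) - len(s) + 1)
        if r[j:j + len(s)] == s
    )


def empirical_prob(s: str, strings: List[str]) -> float:
    """Occurrences of s divided by the number of positions where s could start."""
    positions = sum(max(0, len(r) - len(s) + 1) for r in strings)
    return occurrences(s, strings) / positions if positions else 0.0


def conditional_empirical_prob(sigma: str, s: str, strings: List[str]) -> float:
    """Occurrences of s followed by sigma, over occurrences of s followed by any symbol."""
    followed = sum(
        1
        for r in strings
        for j in range(len(r) - len(s))  # s must be followed by at least one symbol
        if r[j:j + len(s)] == s
    )
    return occurrences(s + sigma, strings) / followed if followed else 0.0


train = ["ABAB", "BAB"]
print(empirical_prob("AB", train))                   # 3 occurrences / 5 positions
print(conditional_empirical_prob("B", "A", train))   # every A here is followed by B
```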

8 Parameters
The model is controlled by a minimum probability, smoothing factors, the memory length L and the difference measure parameter r.

9 Building the PST
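A minimal sketch of a generic PST-learning loop follows, as it is commonly described in the PST literature; the exact significance test and smoothing in the author's implementation may differ. A candidate context is kept (together with all its suffixes) when some next symbol is both frequent enough and predicted noticeably differently than by its parent context, and candidates are extended one symbol into the past up to the memory length L. The parameter names p_min, alpha, gamma_min, L and r, their defaults, and all function names are hypothetical.

```python
def count(s, strings, need_next=False):
    """Occurrences of s in the training strings; if need_next, only those followed by a symbol."""
    tail = 1 if need_next else 0
    return sum(
        1
        for r in strings
        for j in range(len(r) - len(s) + 1 - tail)
        if r[j:j + len(s)] == s
    )


def p_emp(s, strings):
    """Empirical probability of s: occurrences over possible start positions."""
    positions = sum(max(0, len(r) - len(s) + 1) for r in strings)
    return count(s, strings) / positions if positions else 0.0


def p_cond(sigma, s, strings):
    """Conditional empirical probability of sigma following s."""
    denom = count(s, strings, need_next=True)
    return count(s + sigma, strings) / denom if denom else 0.0


def build_pst(strings, alphabet, L=3, p_min=0.01, alpha=0.0, gamma_min=0.001, r=1.05):
    """Return {context: smoothed next-symbol distribution}; "" is the root context.

    All parameter names and default values are hypothetical.
    """
    tree = {""}
    candidates = [s for s in alphabet if p_emp(s, strings) >= p_min]
    while candidates:
        s = candidates.pop()
        parent = s[1:]  # drop the oldest (leftmost) symbol
        for sigma in alphabet:
            pc = p_cond(sigma, s, strings)
            pp = p_cond(sigma, parent, strings)
            # keep s if it predicts some symbol noticeably differently from its parent
            if pc >= (1 + alpha) * gamma_min and pp > 0 and (pc / pp >= r or pc / pp <= 1 / r):
                tree.update(s[i:] for i in range(len(s)))  # s and all its suffixes
                break
        if len(s) < L:
            # extend the candidate one symbol further into the past
            candidates.extend(c + s for c in alphabet if p_emp(c + s, strings) >= p_min)
    # smoothing: every symbol receives at least gamma_min probability
    return {
        s: {
            sigma: (1 - len(alphabet) * gamma_min) * p_cond(sigma, s, strings) + gamma_min
            for sigma in alphabet
        }
        for s in tree
    }


pst = build_pst(["ABABABAB", "ABABAB"], alphabet="AB")
print(sorted(pst))  # ['', 'A', 'B']
```

In this toy run, the longer candidates such as "BA" are discarded because they predict exactly what their one-symbol suffixes already predict, which is how the tree stays shallow where deeper memory adds nothing.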

10 Biologically Extended PST: a Variant of the PST Model

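11 Incremental Model Refinement
The model is refined incrementally by increasing the memory length L (↑ L) and letting the difference measure parameter r → 1.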

12 Prediction using a PST
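Prediction with a PST can be sketched as scoring a sequence symbol by symbol, each time using the distribution of the deepest context that matches the preceding symbols. The sketch assumes a suffix-closed PST stored as a dict from context strings to next-symbol distributions, with the empty context as the root. It also normalizes the log-likelihood by sequence length; that normalization is an assumption made so that sequences of different lengths can be compared, not necessarily what the Predict module outputs. All names are hypothetical.

```python
import math
from typing import Dict

PST = Dict[str, Dict[str, float]]  # context -> next-symbol distribution


def deepest_context(history: str, pst: PST) -> str:
    """Longest suffix of history stored in the PST (the deepest matching node)."""
    for start in range(len(history) + 1):
        if history[start:] in pst:
            return history[start:]
    raise KeyError("the PST must contain the root context ''")


def log_likelihood(seq: str, pst: PST, max_depth: int) -> float:
    """Sum over positions of log P(x_i | deepest context matching the preceding symbols)."""
    total = 0.0
    for i, sym in enumerate(seq):
        history = seq[max(0, i - max_depth):i]  # contexts never exceed max_depth (the memory length)
        total += math.log(pst[deepest_context(history, pst)][sym])
    return total


def score(seq: str, pst: PST, max_depth: int) -> float:
    """Length-normalized log-likelihood (an assumed normalization)."""
    return log_likelihood(seq, pst, max_depth) / len(seq)


toy_pst: PST = {
    "": {"A": 0.5, "B": 0.5},
    "A": {"A": 0.1, "B": 0.9},
}
print(score("ABAB", toy_pst, max_depth=1))
```

A sequence would then be assigned to a family when its score under that family's PST exceeds a chosen threshold.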

13 Results and Discussion
Averaged over all 170 families, the PST detected 90.7% of the true positives. This is much better than a typical BLAST search and comparable to an HMM trained from a multiple alignment of the input sequences in global search mode.

14 Results and Discussion (Cont.)


16 Limitations

17 Why Is It Significant?
Its performance is comparable to that of HMMs, yet the model is built in a fully automated manner, without a multiple alignment and without scoring matrices. It is also less demanding than HMMs in terms of data abundance and quality.

18 Future Work
A further improvement is expected if a larger sample set is used to train the PST. Currently, the PST is built from the training set alone; training it on all strings of a family should improve its predictions as well.

19 Confused?

