1
Algorithms for Variable Length Markov Chain Modeling. Author: Gill Bejerano. Presented by Xiangbin Qiu.
2
Review of the Markov Chain Model: Markov chains are often used in bioinformatics to capture relatively simple sequence patterns, such as genomic CpG islands.
3
Problem: Low-order Markov chains are poor classifiers. Higher-order chains are often impractical to implement or train. The memory and training set size requirements of an order-k Markov chain grow exponentially with k.
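To make the exponential growth concrete, here is a minimal sketch that counts the free parameters of a full order-k chain. The function name `order_k_parameter_count` and the alphabet size of 20 (amino acids) are illustrative assumptions, not taken from the slides.

```python
def order_k_parameter_count(alphabet_size: int, k: int) -> int:
    """Free parameters in a full order-k Markov chain.

    There are alphabet_size**k possible contexts, and each context needs
    (alphabet_size - 1) free next-symbol probabilities (the last one is
    fixed because the probabilities sum to 1).
    """
    return alphabet_size ** k * (alphabet_size - 1)


if __name__ == "__main__":
    # Assumed alphabet: the 20 amino acids.
    for k in range(6):
        print(k, order_k_parameter_count(20, k))
```

Each extra unit of order multiplies the parameter count by the alphabet size, which is why the training set needed to estimate the parameters reliably grows just as quickly.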
4
Variable Length Markov Model (VMM): The models are not restricted to a predefined uniform depth (e.g. order-k). Instead, the model is constructed to fit higher-order Markov dependencies where such contexts exist, while using lower-order Markov dependencies elsewhere. The order is determined by examining the training data.
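The idea of mixing context lengths can be sketched as a lookup that always uses the longest stored suffix of the history. This is a toy illustration only: the dictionary `toy_model`, its probabilities, and the function names are hypothetical and are not the author's implementation.

```python
from typing import Dict

# Hypothetical toy model: contexts of different lengths map to
# next-symbol distributions. The empty context "" is the order-0 fallback.
Model = Dict[str, Dict[str, float]]

toy_model: Model = {
    "": {"a": 0.5, "b": 0.5},
    "a": {"a": 0.2, "b": 0.8},
    "ba": {"a": 0.9, "b": 0.1},
}


def longest_context(model: Model, history: str, max_len: int) -> str:
    """Return the longest suffix of history (up to max_len) stored in model."""
    for length in range(min(max_len, len(history)), -1, -1):
        suffix = history[len(history) - length:]
        if suffix in model:
            return suffix
    raise KeyError("model must contain the empty context ''")


def next_symbol_probability(model: Model, history: str, symbol: str,
                            max_len: int = 2) -> float:
    """Probability of symbol given the longest matching context of history."""
    return model[longest_context(model, history, max_len)][symbol]
```

For the history "bba" the lookup uses the order-2 context "ba", while for the history "b" it falls back to the empty context, so the effective order varies from position to position.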
5
Description of Author’s Work: Four main modules are implemented: Train, Predict, Emit, and 2pfa.
6
Probabilistic Suffix Tree (PST) A special tree data structure
7
PST-Definitions Σ the alphabet, string set: i= 1, 2..m Empirical probability: Conditional empirical probability:
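The formulas on this slide did not survive extraction. As a rough guide, the sketch below assumes one common count-based definition: the empirical probability of a substring s is its number of occurrences divided by the number of positions where it could occur, and the conditional empirical probability of a symbol after s is the count of s followed by that symbol divided by the count of s followed by any symbol. The function names are hypothetical, and the exact normalization in the original work may differ.

```python
from typing import Sequence


def count_occurrences(strings: Sequence[str], s: str) -> int:
    """Count (possibly overlapping) occurrences of s across all strings."""
    total = 0
    for text in strings:
        for start in range(len(text) - len(s) + 1):
            if text[start:start + len(s)] == s:
                total += 1
    return total


def empirical_probability(strings: Sequence[str], s: str) -> float:
    """Occurrences of s divided by the number of positions where s could start."""
    positions = sum(max(len(text) - len(s) + 1, 0) for text in strings)
    return count_occurrences(strings, s) / positions if positions else 0.0


def conditional_empirical_probability(strings: Sequence[str], alphabet: str,
                                      s: str, symbol: str) -> float:
    """Count of s+symbol divided by the count of s followed by any symbol."""
    followers = sum(count_occurrences(strings, s + a) for a in alphabet)
    if followers == 0:
        return 0.0
    return count_occurrences(strings, s + symbol) / followers
```

For example, with the strings "aab" and "ab" over the alphabet {a, b}, the substring "ab" occurs twice out of three possible positions, and "a" is followed by "b" in two of its three continuations.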
8
Parameters Minimum probability: Smoothing factors: Memory length: L Difference measure parameter: r
9
Building the PST
10
Biologically Extended PST- a Variant of PST Model
11
Incremental Model Refinement ↑ L ↑ r → 1
12
Prediction using a PST
13
Results and Discussion: When averaged over all 170 families, the PST detected 90.7% of the true positives. This is much better than a typical BLAST search, and comparable to an HMM trained from a multiple alignment of the input sequences in a global search mode.
14
Results and Discussion (Cont.)
16
Limitations
17
Why Significant? Performance is comparable to HMMs, while the PST is: built in a fully automated manner; built without multiple alignment; built without scoring matrices; less demanding than HMMs in terms of data abundance and quality.
18
Future Work An additional improvement is expected if a larger sample set is used to train the PST. Currently the PST is built from the training set alone. Obviously, training the PST on all strings of a family should improve its prediction as well.
19
Confused?