Published by Amberly Griffin. Modified over 9 years ago.
1 Effective hidden Markov models for detecting splicing junction sites in DNA sequences
Authors: Michael M. Yin and Jason T. L. Wang
Source: Information Sciences, 139(1-2), pp. 139-163, 2001.
Advisor: Min-Shiang Hwang
Speaker: Chun-Ta Li
2 Introduction (1/1)
Terms: codon; introns; exons (coding sequences); donor
3 Using HMMs to model splicing junction sites (1/3) The Donor Model
4 Using HMMs to model splicing junction sites (2/3) The Acceptor Model
5 Using HMMs to model splicing junction sites (3/3)
Two modules for each model
–The Donor Model: a true site module (true sites in the training data set) and a false site module (false sites in the training data set)
–The Acceptor Model: a true site module (true sites in the training data set) and a false site module (false sites in the training data set)
6 Algorithm (1/3)
Training algorithm (Donor Model)
–Two training data sets: the positive training data set $E_t$ and the negative training data set $E_f$
–Probability of a transition from base $b_i$ to base $b_{i+1}$
–True Donor Module: $P(\text{True} \mid S, M^{(t)})$, where S is a sequence in set M
–False Donor Module: $P(\text{False} \mid S, M^{(f)})$, where S is a sequence in set M
Testing data set M: 200 true donor sites, 14000 false donor sites
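The slide names only the transition probability from $b_i$ to $b_{i+1}$. Below is a minimal sketch, assuming each module is a position-specific first-order Markov chain over aligned, equal-length sites, estimated by maximum likelihood with no smoothing. This reading is inferred from the worked example, which multiplies one factor per adjacent base pair (8 factors for a 9-base sequence) and gives the same G→G transition different values at different positions. The function names `train_transitions` and `sequence_likelihood` are hypothetical; presumably the true module would be trained on $E_t$ and the false module on $E_f$.

```python
from collections import Counter

# Hypothetical helpers: a position-specific first-order Markov chain
# (an assumption inferred from the worked example, not the paper's code).


def train_transitions(sequences):
    """Estimate P_i(b_{i+1} | b_i) for every position i by maximum likelihood.

    Assumes all training sequences are aligned and have the same length.
    Returns one dict per position, mapping (b_i, b_next) -> probability.
    """
    length = len(sequences[0])
    tables = []
    for i in range(length - 1):
        pair_counts = Counter((s[i], s[i + 1]) for s in sequences)
        from_counts = Counter(s[i] for s in sequences)
        tables.append({
            pair: count / from_counts[pair[0]]
            for pair, count in pair_counts.items()
        })
    return tables


def sequence_likelihood(seq, tables):
    """P(S | module): product of the position-specific transition probabilities.

    A transition never seen in training gets probability 0 (no smoothing).
    """
    prob = 1.0
    for i, table in enumerate(tables):
        prob *= table.get((seq[i], seq[i + 1]), 0.0)
    return prob


# Toy usage with made-up training sites (not data from the paper).
toy_tables = train_transitions(["AGGT", "AGGT", "CGGT", "AGAT"])
print(sequence_likelihood("AGGT", toy_tables))  # 0.75
```

Without smoothing, any unseen transition drives the whole product to zero; a real implementation would likely add pseudocounts, but the slides do not say.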
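7 Algorithm (2/3)
Bayes' rule
–Probability of S being a donor sequence: $P(\text{True} \mid S, M^{(t)}) = \dfrac{P(S \mid \text{True}, M^{(t)})\,P(\text{True})}{P(S)}$
–Probability of S being a nondonor sequence: $P(\text{False} \mid S, M^{(f)}) = \dfrac{P(S \mid \text{False}, M^{(f)})\,P(\text{False})}{P(S)}$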
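The two posteriors can be sketched as follows, assuming the priors come from class counts as in the worked example (200 true versus 14000 false donor sites). `class_priors`, `posterior`, and `pratio` are hypothetical helper names; `pratio` is the ratio of the donor posterior to the non-donor posterior, which the later slides use for ranking.

```python
def class_priors(n_true, n_false):
    """Priors P(True), P(False) estimated from class counts (an assumption)."""
    total = n_true + n_false
    return n_true / total, n_false / total


def posterior(likelihood, prior, evidence):
    """Bayes' rule: P(class | S, M) = P(S | class, M) * P(class) / P(S)."""
    if evidence <= 0:
        raise ValueError("P(S) must be positive")
    return likelihood * prior / evidence


def pratio(p_true_given_s, p_false_given_s):
    """Donor posterior divided by non-donor posterior."""
    return p_true_given_s / p_false_given_s


p_true, p_false = class_priors(200, 14000)
print(round(p_true, 3), round(p_false, 3))  # 0.014 0.986
```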
8 Algorithm (3/3)
–Calculate pratio $= P(\text{True} \mid S, M^{(t)}) / P(\text{False} \mid S, M^{(f)})$ for each sequence S in set M.
–Sort the pratio values in descending order.
–Calculate the positive lower bound, denoted $L_p$, from the pratio values of the positive sequences in set M.
–A sequence S whose pratio is greater than $L_p$ is assigned to set P.
$T(TP)$: positive sequences that belong to set P
$T(P+N)$: positive sequences in set M
$T(PP)$: sequences in set P
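One way to sketch these steps, assuming labeled pratio values are already available and that $L_p$ is chosen by rank among the positive sequences (the example picks the 180th of 200 sorted values). The names `positive_lower_bound` and `evaluate` are hypothetical. The strict `>` follows the slide's "pratio greater than $L_p$"; the code stops at the three counts because the slides do not state how they are combined.

```python
def positive_lower_bound(positive_pratios, rank):
    """L_p: the rank-th largest pratio among the positive sequences (1-based)."""
    if not 1 <= rank <= len(positive_pratios):
        raise ValueError("rank out of range")
    ordered = sorted(positive_pratios, reverse=True)  # descending order
    return ordered[rank - 1]


def evaluate(pratios, labels, lower_bound):
    """Assign S to set P when pratio > L_p; return T(TP), T(P+N), T(PP).

    labels[k] is True when sequence k is a positive (true donor) sequence.
    """
    in_p = [r > lower_bound for r in pratios]
    t_tp = sum(1 for hit, pos in zip(in_p, labels) if hit and pos)
    t_p_plus_n = sum(1 for pos in labels if pos)
    t_pp = sum(in_p)
    return t_tp, t_p_plus_n, t_pp


# Toy usage with made-up pratio values (not data from the paper).
scores = [3.0, 2.0, 0.5, 1.5, 0.1]
truth = [True, True, True, False, False]
lp = positive_lower_bound([s for s, t in zip(scores, truth) if t], rank=3)
print(lp, evaluate(scores, truth, lp))  # 0.5 (2, 3, 3)
```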
10 Algorithm for classifying splicing junction donor sequences
11 Example (1/4)
Training data set M: 200 true donor sites, 14000 false donor sites
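12 Example (2/4)
A sequence S = AGGGTCAGT
$P(\text{True}) = 200/14200 \approx 0.014$
$P(S \mid \text{True}, M^{(t)}) = 0.05 \times 0.11 \times 0.81 \times 1 \times 0.03 \times 0.02 \times 0.63 \times 0.46 = 0.0000007746354$
$P(S) = 0.32 \times 0.13 \times 0.81 \times 1 \times 1 \times 0.03 \times 0.72 \times 0.83 \times 0.51 \approx 0.0003081$
$P(\text{True} \mid S, M^{(t)}) = (0.0000007746354 \times 0.014) / 0.0003081 \approx 0.0000352$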
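The arithmetic on this slide can be checked with a few lines of Python. The factors are copied from the slide, and, as on the slide, the posterior divides by the rounded values 0.014 and 0.0003081. `math.prod` requires Python 3.8 or later.

```python
from math import prod

# Factors copied from the slide for S = AGGGTCAGT under the true donor module.
true_transitions = [0.05, 0.11, 0.81, 1, 0.03, 0.02, 0.63, 0.46]
true_evidence_factors = [0.32, 0.13, 0.81, 1, 1, 0.03, 0.72, 0.83, 0.51]

p_true = 200 / 14200                          # P(True), about 0.014
likelihood_true = prod(true_transitions)      # P(S | True, M(t))
evidence_true = prod(true_evidence_factors)   # P(S) as given on this slide
# Same rounding as the slide: P(True) = 0.014, P(S) = 0.0003081.
posterior_true = likelihood_true * 0.014 / 0.0003081

print(f"P(S|True, M(t)) = {likelihood_true:.13f}")
print(f"P(S)            = {evidence_true:.7f}")
print(f"P(True|S, M(t)) = {posterior_true:.7f}")
```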
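13 Example (3/4)
A sequence S = AGGGTCAGT
$P(\text{False}) = 14000/14200 \approx 0.986$
$P(S \mid \text{False}, M^{(f)}) = 0.07 \times 0.08 \times 0.27 \times 1 \times 0.22 \times 0.06 \times 0.07 \times 0.07 = 0.00000009779616$
$P(S) = 0.25 \times 0.25 \times 0.27 \times 1 \times 1 \times 0.22 \times 0.24 \times 0.25 \times 0.3 \approx 0.00006683$
$P(\text{False} \mid S, M^{(f)}) = (0.00000009779616 \times 0.986) / 0.00006683 \approx 0.001443$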
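The same check for the false donor module, followed by the pratio that the example computes next from the two posteriors. As before, the factors are copied from the slide and the rounded values 0.986, 0.00006683, and 0.0000352 are used where the slides use them.

```python
from math import prod

# Factors copied from the slide for S = AGGGTCAGT under the false donor module.
false_transitions = [0.07, 0.08, 0.27, 1, 0.22, 0.06, 0.07, 0.07]
false_evidence_factors = [0.25, 0.25, 0.27, 1, 1, 0.22, 0.24, 0.25, 0.3]

p_false = 14000 / 14200                         # P(False), about 0.986
likelihood_false = prod(false_transitions)      # P(S | False, M(f))
evidence_false = prod(false_evidence_factors)   # P(S) as given on this slide
# Same rounding as the slide: P(False) = 0.986, P(S) = 0.00006683.
posterior_false = likelihood_false * 0.986 / 0.00006683

# pratio, using the rounded donor posterior 0.0000352 from the previous slide.
pratio_value = 0.0000352 / posterior_false

print(f"P(False|S, M(f)) = {posterior_false:.6f}")
print(f"pratio           = {pratio_value:.4f}")
```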
14 Example (4/4)
pratio = 0.0000352 / 0.001443 ≈ 0.0244
–Compute the 200 pratio values and sort them in descending order; $L_p$ is the 180th value.
Testing data: $S_{cand}$
Tables 1 & 2: sratio, $KIND_i$ (True donor), (False donor), (True donor or False donor)