1
Ab Initio Profile HMM Generation
Sam Gross
2
Profile HMMs STOLEN FROM BATZOGLOU LECTURE
[Figure: profile HMM for protein profile H, with BEGIN and END states, insert states I0 … Im, and delete states D1 … Dm]
- Each M state has a position-specific pre-computed substitution table
- Each I and D state has position-specific gap penalties
- Profile is a generative model: the sequence X that is aligned to H is thought of as “generated by” H
- Therefore, H parametrizes a conditional distribution P(X | H)
3
Ab Initio Profile Generation
Given N related protein sequences x1…xN
Construct a profile HMM H such that $\prod_{i=1}^{N} P(x_i \mid H)$ is maximized
4
Easier Said Than Done
- Profile HMM length is unknown
  - Use average sequence length
- Alignment is unknown
- HMM parameters are unknown
5
Not A New Problem
- Instance of the general problem of HMM parameter estimation using unlabelled outputs
- Instance of the even more general problem of MLE with partially missing data
- We want: $\arg\max_{\theta} P(D_{obs} \mid \theta)$
- We know: $P(D_{obs}, D_{hid} \mid \theta)$
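The two quantities on this slide are linked by marginalization: summing the complete-data likelihood over every possible value of the hidden data gives the observed-data likelihood. A sketch of that relationship follows. For the profile HMM case, the hidden data would presumably be the state paths (alignments) of the sequences; that reading is an interpretation of the slide, not something it states explicitly.

```latex
\[
\hat{\theta}
  = \arg\max_{\theta} P(D_{obs} \mid \theta)
  = \arg\max_{\theta} \sum_{D_{hid}} P(D_{obs}, D_{hid} \mid \theta)
\]
```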
6
The Expectation Maximization (EM) Algorithm
- Start with initial guess for parameters
- Iterate until convergence:
  - E-step: Calculate expectations for missing data
  - M-step: Treating expectations as observations, calculate MLE for parameters
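The E-step and M-step can be sketched on a toy problem with two biased coins. This is a minimal illustration, not the method used in the talk. It makes simplifying assumptions that do not come from the slides: flips arrive in batches, each batch was produced by a single hidden coin, and both coins are equally likely to be picked. The function name `em_two_coins` and the simulated data are hypothetical.

```python
import random
from math import exp, log

def em_two_coins(batches, p_a, p_b, iters=100):
    """Estimate P(heads) for two coins from (heads, flips) batches.

    Assumption (not from the slides): each batch comes from one coin,
    and each coin is chosen for a batch with probability 0.5.
    """
    for _ in range(iters):
        heads_a = flips_a = heads_b = flips_b = 0.0
        for heads, flips in batches:
            tails = flips - heads
            # E-step: posterior probability that coin A produced this batch.
            log_a = heads * log(p_a) + tails * log(1.0 - p_a)
            log_b = heads * log(p_b) + tails * log(1.0 - p_b)
            w_a = 1.0 / (1.0 + exp(log_b - log_a))
            w_b = 1.0 - w_a
            # Expected counts stand in for the missing coin labels.
            heads_a += w_a * heads
            flips_a += w_a * flips
            heads_b += w_b * heads
            flips_b += w_b * flips
        # M-step: maximum-likelihood estimates from the expected counts,
        # clamped away from 0 and 1 so the logarithms stay finite.
        p_a = min(max(heads_a / flips_a, 1e-9), 1.0 - 1e-9)
        p_b = min(max(heads_b / flips_b, 1e-9), 1.0 - 1e-9)
    return p_a, p_b

# Simulated data: 500 batches of 20 flips (10,000 flips in total).
rng = random.Random(0)
true_p = {"A": 0.4, "B": 0.8}
batches = []
for _ in range(500):
    p = true_p[rng.choice("AB")]
    heads = sum(rng.random() < p for _ in range(20))
    batches.append((heads, 20))

est_a, est_b = em_two_coins(batches, 0.51, 0.49)
print(est_a, est_b)
```

The labels A and B carry no meaning for the algorithm, so the two estimates may come out in either order; only the pair of values is identifiable.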
7
Baum-Welch: EM For HMMs
- Start with initial guess of HMM parameters
- Iterate until convergence:
  - E-step: Forward-backward algorithm
  - M-step: MLE using forward-backward posterior probabilities
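The loop above can be sketched for a small discrete-output HMM. This is an illustrative sketch only: it handles a plain fully connected HMM trained on a single sequence, not the profile HMM topology with match, insert and delete states. The function names, the sticky two-coin data generator and the initial parameters are all assumptions made for the example.

```python
import random
from math import log

def forward_backward(obs, pi, A, B):
    """Scaled forward-backward pass; returns (alpha, beta, scale, log-likelihood)."""
    T, S = len(obs), len(pi)
    alpha = [[0.0] * S for _ in range(T)]
    beta = [[1.0] * S for _ in range(T)]
    scale = [0.0] * T
    for t in range(T):
        for j in range(S):
            if t == 0:
                reach = pi[j]
            else:
                reach = sum(alpha[t - 1][i] * A[i][j] for i in range(S))
            alpha[t][j] = reach * B[j][obs[t]]
        scale[t] = sum(alpha[t])  # per-step normalizer, prevents underflow
        alpha[t] = [a / scale[t] for a in alpha[t]]
    for t in range(T - 2, -1, -1):
        for i in range(S):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(S)) / scale[t + 1]
    return alpha, beta, scale, sum(log(c) for c in scale)

def baum_welch_step(obs, pi, A, B):
    """One EM iteration; also returns the log-likelihood of the OLD parameters."""
    T, S, K = len(obs), len(pi), len(B[0])
    alpha, beta, scale, loglik = forward_backward(obs, pi, A, B)
    # E-step: posterior state occupancies and expected transition/emission counts.
    gamma = [[alpha[t][i] * beta[t][i] for i in range(S)] for t in range(T)]
    trans = [[0.0] * S for _ in range(S)]
    for t in range(T - 1):
        for i in range(S):
            for j in range(S):
                trans[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                * beta[t + 1][j] / scale[t + 1])
    emit = [[0.0] * K for _ in range(S)]
    for t in range(T):
        for i in range(S):
            emit[i][obs[t]] += gamma[t][i]
    # M-step: normalize expected counts into probabilities.
    new_A = [[x / sum(row) for x in row] for row in trans]
    new_B = [[x / sum(row) for x in row] for row in emit]
    return gamma[0][:], new_A, new_B, loglik

# Hypothetical data: 0 = tails, 1 = heads, from two coins that rarely swap.
rng = random.Random(1)
true_heads, coin, obs = [0.4, 0.8], 0, []
for _ in range(2000):
    if rng.random() < 0.05:
        coin = 1 - coin
    obs.append(1 if rng.random() < true_heads[coin] else 0)

pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.1, 0.9]]
B = [[0.49, 0.51], [0.51, 0.49]]  # B[state][symbol]
logliks = []
for _ in range(30):
    pi, A, B, ll = baum_welch_step(obs, pi, A, B)
    logliks.append(ll)
print([round(row[1], 3) for row in B], round(logliks[-1], 2))
```

Each iteration should leave the log-likelihood no lower than before, which is a convenient sanity check on any EM implementation. A fixed iteration count is used here for brevity; stopping when the log-likelihood gain falls below a threshold would be the more usual convergence test.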
8
Incorporating Prior Knowledge
- We know in advance certain types of residues tend to align together
- Use a Dirichlet mixture prior over outputs for match states
- Each distribution in the mixture corresponds to a different “alignment environment”
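One common way to apply such a prior is to replace raw emission frequencies with their posterior mean: each mixture component is weighted by how well it explains the observed counts, and each component contributes its own pseudocounts. The sketch below assumes that is the estimator intended, which the slide does not confirm. The four-letter alphabet, the two components and the function names are made up for illustration; real Dirichlet mixtures for proteins cover all 20 amino acids.

```python
from math import exp, lgamma, log

def log_beta(alpha):
    """Log of the multivariate Beta function B(alpha)."""
    return sum(lgamma(a) for a in alpha) - lgamma(sum(alpha))

def dirichlet_mixture_estimate(counts, weights, alphas):
    """Posterior-mean emission probabilities under a Dirichlet mixture prior.

    counts  : observed residue counts in one match column
    weights : mixture coefficients q_k (sum to 1)
    alphas  : one Dirichlet parameter vector per component
    Returns (probs, post), where post[k] = P(component k | counts).
    """
    # Component posteriors, computed in log space for stability.
    log_post = []
    for q, alpha in zip(weights, alphas):
        updated = [a + n for a, n in zip(alpha, counts)]
        log_post.append(log(q) + log_beta(updated) - log_beta(alpha))
    top = max(log_post)
    post = [exp(lp - top) for lp in log_post]
    norm = sum(post)
    post = [p / norm for p in post]

    # Mix the per-component pseudocount estimates.
    total = sum(counts)
    probs = []
    for r in range(len(counts)):
        probs.append(sum(
            post[k] * (counts[r] + alphas[k][r]) / (total + sum(alphas[k]))
            for k in range(len(alphas))
        ))
    return probs, post

# Toy setup (hypothetical): component 0 favors the first two residues,
# component 1 favors the last two.
alphabet = "ADIL"
weights = [0.5, 0.5]
alphas = [[2.0, 2.0, 0.2, 0.2], [0.2, 0.2, 2.0, 2.0]]
counts = [3, 1, 0, 0]

probs, post = dirichlet_mixture_estimate(counts, weights, alphas)
prior_only, _ = dirichlet_mixture_estimate([0, 0, 0, 0], weights, alphas)
print(dict(zip(alphabet, probs)), post)
```

With these counts the first component dominates the posterior, so the unseen residues still receive a small nonzero probability drawn mostly from that component's pseudocounts.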
9
Coin Flips Example
- Two trick coins are used to generate a sequence of heads and tails
- You see only the sequence, and must determine the probability of heads for each coin
[Figure: Coin A, Coin B]
10
10,000 Coin Flips
Model          PA(heads)  PB(heads)
Real coins     0.4        0.8
Initial guess  0.51       0.49
Learned model  0.801      0.413
11
Toy Profile Example
Create a profile for the following sequences:
ADACGIH
ADAGIH
ADACGH
AACQH
ADAYGIH
Use the profile to align the sequences
12
Results
Alignment:
ADACGIH
ADA-GIH
ADACG-H
A-ACQ-H
ADAYGIH
Match state emissions:
Match1  A 100%
Match2  D 100%
Match3  A 100%
Match4  C 75%, Y 25%
Match5  G 80%, Q 20%
Match6  I 62%, H 38%
Match7  H 100%
13
Clustering With A Mixture Of Profiles
Given N protein sequences x1…xN
Construct M profile HMMs H1…HM and a mapping F: x → H such that $\prod_{i=1}^{N} P(x_i \mid F(x_i))$ is maximized
F is a natural clustering of the protein sequences into M groups
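For fixed profiles, the product factorizes over sequences, so the best mapping can be chosen one sequence at a time. The identity below assumes F is unconstrained, meaning any sequence may be assigned to any of the M profiles; the slide does not say whether further constraints apply.

```latex
\[
\max_{F} \prod_{i=1}^{N} P\bigl(x_i \mid F(x_i)\bigr)
  = \prod_{i=1}^{N} \max_{1 \le m \le M} P(x_i \mid H_m),
\qquad
F(x_i) = H_{m_i}, \quad m_i = \arg\max_{m} P(x_i \mid H_m)
\]
```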