Published by Bartholomew Walton. Modified over 9 years ago.
1
Profile HMMs for sequence families and Viterbi equations
Linda Muselaars and Miranda Stobbe
2
Example alignment
HBA_HUMAN   -HGSAQVKGHGKKVADALTNAVAHV-
HBB_HUMAN   VMGNPKVKAHGKKVLGAFSDGLAHL-
MYG_PHYCA   MKASEDLKKHGVTVLTALGAILKK--
GLB3_CHITP  IKGTAPFETHANRIVGFFSKIIGEL-
GLB5_PETMA  LKKSADVRWHAERIINAVNDAVASM-
LGB2_LUPLU  PQNNPELQAHAGKVFKLVYEAAIQLQ
GLB1_GLYDI  ---DPGVAALGAKVLAQIGVAVSHL-
3
Overview chapter 5
Ungapped score matrices.
Adding insert and delete states to obtain profile HMMs.
Deriving profile HMMs from multiple alignments.
Searching with profile HMMs.
Profile HMM variants for non-global alignments.
More on estimation of probabilities.
Optimal model construction.
Weighting training sequences.
5
Key issues
Identifying the relationship of an individual sequence to a sequence family.
How to build a profile HMM.
Using profile HMMs to detect potential membership in a family.
Using profile HMMs to give an alignment of a sequence to the family.
6
Key issues (2)
Lollipops for a valuable (up to the speakers to decide) contribution to this lecture.
7
Needed theory
Emission probabilities.
Silent states.
Pair HMMs.
The Viterbi algorithm.
The Forward algorithm.
8
Contents
Ungapped score matrices.
Adding insert and delete states to obtain profile HMMs.
Deriving profile HMMs from multiple alignments.
  – Non-probabilistic profiles
  – Basic profile HMM parameterisation
Searching with profile HMMs.
Profile HMM variants for non-global alignments.
9
Example alignment
HBA_HUMAN   -HGSAQVKGHGKKVADALTNAVAHV-
HBB_HUMAN   VMGNPKVKAHGKKVLGAFSDGLAHL-
MYG_PHYCA   MKASEDLKKHGVTVLTALGAILKK--
GLB3_CHITP  IKGTAPFETHANRIVGFFSKIIGEL-
GLB5_PETMA  LKKSADVRWHAERIINAVNDAVASM-
LGB2_LUPLU  PQNNPELQAHAGKVFKLVYEAAIQLQ
GLB1_GLYDI  ---DPGVAALGAKVLAQIGVAVSHL-
               *********************
10
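Ungapped regions
Gaps tend to line up, so we can consider models for the ungapped regions.
Specify independent probabilities $e_i(a)$ of observing residue $a$ at position $i$.
For scoring, use the log-odds ratio against the background distribution $q$: $S = \sum_i \log \frac{e_i(x_i)}{q_{x_i}}$.
The resulting table of scores is a position specific score matrix (PSSM).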
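The log-odds scoring of an ungapped block can be sketched as follows. This is only an illustration, not the speakers' code: the function names `pssm_from_block` and `score_window`, the uniform background over the 20 amino acids, and the add-one pseudocount are assumptions made for the example. The block is the ungapped region (columns 4 to 24) of the example alignment.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def pssm_from_block(block, background=None, pseudocount=1.0):
    """Return one dict per column mapping residue a -> log(e_i(a) / q_a).

    block: equal-length strings without gap characters.
    background: dict a -> q_a; uniform if omitted (an assumption).
    """
    if background is None:
        background = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
    length = len(block[0])
    assert all(len(row) == length and "-" not in row for row in block)
    total = len(block) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for i in range(length):
        counts = Counter(row[i] for row in block)
        # e_i(a) is estimated from column counts plus a pseudocount.
        pssm.append({a: math.log((counts[a] + pseudocount) / total / background[a])
                     for a in AMINO_ACIDS})
    return pssm

def score_window(pssm, window):
    """Sum the per-position log-odds scores of an ungapped window."""
    return sum(column[x] for column, x in zip(pssm, window))

rows = [
    "-HGSAQVKGHGKKVADALTNAVAHV-",  # HBA_HUMAN
    "VMGNPKVKAHGKKVLGAFSDGLAHL-",  # HBB_HUMAN
    "MKASEDLKKHGVTVLTALGAILKK--",  # MYG_PHYCA
    "IKGTAPFETHANRIVGFFSKIIGEL-",  # GLB3_CHITP
    "LKKSADVRWHAERIINAVNDAVASM-",  # GLB5_PETMA
    "PQNNPELQAHAGKVFKLVYEAAIQLQ",  # LGB2_LUPLU
    "---DPGVAALGAKVLAQIGVAVSHL-",  # GLB1_GLYDI
]
block = [row[3:24] for row in rows]  # columns 4..24 contain no gaps
pssm = pssm_from_block(block)
print(score_window(pssm, block[0]))
```

A family member such as the HBA_HUMAN window scores higher than an unrelated window of the same length, which is the behaviour the log-odds ratio is meant to give.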
11
Drawbacks
Multiple alignments do have gaps, and these need to be accounted for.
One approach, used for example by the BLOCKS database: combine the scores of several ungapped regions.
Instead, we will develop a single probabilistic model for the whole extent of the alignment.
13
Short review
Emission probabilities: the probability that a certain symbol is seen when in a given state $k$.
Silent states: states in an HMM that do not emit symbols.
14
Building the model (1)
We need position-sensitive gap scores.
Start with an HMM that has a repetitive structure of (match) states.
Transitions between consecutive match states have probability 1.
Emission probabilities: $e_{M_i}(a)$.
[Diagram: Begin, ..., $M_j$, ..., End]
15
Building the model (2)
Deal with insertions: a set of new states $I_i$.
Each $I_i$ has an emission distribution $e_{I_i}(a)$, usually set to the background distribution $q_a$.
[Diagram: Begin, $M_j$, $I_j$, End]
16
Building the model (3)
Deal with deletions.
One possibility: forward "jump" transitions between match states.
For arbitrarily long gaps: silent states $D_j$.
[Diagram: Begin, $M_j$, $D_j$, End]
17
Costs for additional states
Insertions: the sum of the costs of the transitions and emissions (one M→I transition, a number of I→I transitions, and one I→M transition).
Deletions: the sum of the costs of one M→D transition, a number of D→D transitions, and one D→M transition.
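Written out as log-probabilities, these costs take the following form. The position index $j$ and gap length $k$ are introduced here for illustration, and the insert emissions are assumed to equal the background $q_a$, so that they contribute nothing to a log-odds score.

```latex
% Insertion of k residues after match state M_j
\log a_{M_j I_j} + (k-1)\,\log a_{I_j I_j} + \log a_{I_j M_{j+1}}

% Deletion of the k match positions j+1, ..., j+k
\log a_{M_j D_{j+1}}
  + \sum_{m=1}^{k-1} \log a_{D_{j+m} D_{j+m+1}}
  + \log a_{D_{j+k} M_{j+k+1}}
```

Note that every insert transition within one gap has the same cost, whereas the D→D costs may differ from position to position.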
18
Full model
[Diagram: Begin, match states $M_j$, insert states $I_j$, delete states $D_j$, End]
19
Comparison with pair HMM
[Diagram: pair HMM with Begin and End, a match state M emitting $p_{x_i y_j}$, a state X emitting $q_{x_i}$ and a state Y emitting $q_{y_j}$]
21
Non-probabilistic profiles
Profiles without an underlying probabilistic model.
Scores are set to averages of standard substitution scores.
Anomalies:
  – Conservation of columns is not taken into account.
  – Scores for gaps do not behave properly.
22
Example
HBA_HUMAN   ...VGA--HAGEY...
HBB_HUMAN   ...V----NVDEV...
MYG_PHYCA   ...VEA--DVAGH...
GLB3_CHITP  ...VKG------D...
GLB5_PETMA  ...VYS--TYETS...
LGB2_LUPLU  ...FNA--NIPKH...
GLB1_GLYDI  ...IAGADNGAGV...
               ***  *****
The score for residue $a$ in column 1 (five V, one F, one I) would be set to $\frac{5}{7} s(V,a) + \frac{1}{7} s(F,a) + \frac{1}{7} s(I,a)$, where $s(a,b)$ is a standard substitution score.
23
Basic profile HMM parameterisation
Objective: make the probability distribution peak around members of the family.
Available parameters:
  – Length of the model.
  – Transition and emission probabilities.
24
Length of the model
Which multiple alignment columns do we assign to match states, and which to insert states?
Heuristic rule: columns in which more than 50% of the characters are gaps should be modelled by insert states.
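The heuristic can be sketched in a few lines. The function name `assign_columns`, the label letters and the five-row DNA toy alignment are assumptions chosen for illustration.

```python
def assign_columns(alignment, gap_char="-", max_gap_fraction=0.5):
    """Label each alignment column 'M' (match) or 'I' (insert).

    A column becomes an insert column when more than max_gap_fraction
    of its characters are gaps.
    """
    n_rows = len(alignment)
    labels = []
    for c in range(len(alignment[0])):
        gaps = sum(1 for row in alignment if row[c] == gap_char)
        labels.append("I" if gaps / n_rows > max_gap_fraction else "M")
    return "".join(labels)

toy = ["AG---C", "A-AG-C", "AG-AA-", "--AAAC", "AG---C"]
print(assign_columns(toy))
```

With this toy input the gap fractions per column are 1/5, 2/5, 3/5, 2/5, 3/5 and 1/5, so four columns become match states and two become insert columns.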
25
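Probability parameters
Transition probability: $a_{kl} = \frac{A_{kl}}{\sum_{l'} A_{kl'}}$, where $A_{kl}$ is the number of transitions from state $k$ to state $l$ and the denominator is the number of transitions from state $k$ to any state.
Emission probability: $e_k(a) = \frac{E_k(a)}{\sum_{a'} E_k(a')}$, where $E_k(a)$ is the number of times state $k$ emits symbol $a$.
In the limit of many training sequences this is an accurate and consistent estimate.
With few sequences, use the pseudocount method. The simplest is Laplace's rule: add one to each count.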
26
Example
Bat   AG---C
Rat   A-AG-C
Cat   AG-AA-
Gnat  --AAAC
Goat  AG---C
      ** * *
27
Example continued
[Diagram: profile HMM with Begin, $M_1$ to $M_4$, $I_0$ to $I_4$, $D_1$ to $D_4$ and End]
Match-state emission probabilities:
        A    C    G    T
M_1    5/8  1/8  1/8  1/8
M_2    1/7  1/7  4/7  1/7
M_3    3/7  1/7  2/7  1/7
M_4    1/8  5/8  1/8  1/8
Transition probabilities out of $M_1$: $a_{M_1 M_2} = 4/7$, $a_{M_1 D_2} = 2/7$, $a_{M_1 I_1} = 1/7$.
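The numbers in this example can be reproduced with a short script that applies Laplace's rule to the counts. It is a sketch under assumptions: the helper names are hypothetical, the match columns are taken to be columns 1, 2, 4 and 6 of the Bat/Rat/Cat/Gnat/Goat alignment, and `laplace_transitions_from` only handles states before the last model position.

```python
from collections import Counter
from fractions import Fraction

ALPHABET = "ACGT"
alignment = ["AG---C", "A-AG-C", "AG-AA-", "--AAAC", "AG---C"]
match_cols = [0, 1, 3, 5]  # zero-based indices of the match columns

def state_path(row, match_cols):
    """Translate one aligned row into its path of (state kind, position)."""
    path = [("B", 0)]
    j = 0  # number of match columns passed so far
    for c, ch in enumerate(row):
        if c in match_cols:
            j += 1
            path.append(("M", j) if ch != "-" else ("D", j))
        elif ch != "-":
            path.append(("I", j))  # residue in an insert column
    path.append(("E", j + 1))
    return path

def laplace_emissions(alignment, match_cols):
    """e_{M_j}(a) = (count + 1) / (total + alphabet size) per match column."""
    probs = []
    for c in match_cols:
        counts = Counter(row[c] for row in alignment if row[c] != "-")
        total = sum(counts.values()) + len(ALPHABET)
        probs.append({a: Fraction(counts[a] + 1, total) for a in ALPHABET})
    return probs

def laplace_transitions_from(alignment, match_cols, state):
    """Transition probabilities out of `state` (valid for positions j < L)."""
    _, j = state
    targets = [("M", j + 1), ("I", j), ("D", j + 1)]
    counts = Counter()
    for row in alignment:
        path = state_path(row, match_cols)
        for s, t in zip(path, path[1:]):
            if s == state:
                counts[t] += 1
    total = sum(counts.values()) + len(targets)
    return {t: Fraction(counts[t] + 1, total) for t in targets}

emissions = laplace_emissions(alignment, match_cols)
from_m1 = laplace_transitions_from(alignment, match_cols, ("M", 1))
print(emissions[0]["A"], from_m1[("M", 2)])
```

For example, four sequences pass through $M_1$: three continue to $M_2$ and one to $D_2$, so with one pseudocount per target the estimates are 4/7, 2/7 and 1/7.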
29
Searching with profile HMMs
Obtaining significant matches of a sequence to the profile HMM:
  – Viterbi algorithm: $P(x, \pi^* \mid M)$.
  – Forward algorithm: $P(x \mid M)$.
Giving an alignment of a sequence to the family:
  – Highest scoring, or Viterbi, alignment.
30
Viterbi equations
$V^M_j(i)$: log-odds score of the best path matching subsequence $x_{1 \ldots i}$ to the submodel up to state $j$, ending with $x_i$ being emitted by state $M_j$.
$V^I_j(i)$: log-odds score of the best path ending in $x_i$ being emitted by $I_j$.
$V^D_j(i)$: log-odds score of the best path ending in state $D_j$.
Pair HMM:
31
Viterbi equations
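For reference, the standard profile HMM Viterbi recurrences in log-odds form are given below, in the notation of Durbin et al., "Biological Sequence Analysis", chapter 5. Here $V^M_j(i)$, $V^I_j(i)$ and $V^D_j(i)$ are the best-path scores ending in $M_j$, $I_j$ and $D_j$ after emitting $x_1 \ldots x_i$. It is an assumption that the slide showed exactly this form.

```latex
\begin{align*}
V^M_j(i) &= \log\frac{e_{M_j}(x_i)}{q_{x_i}} + \max
  \begin{cases}
    V^M_{j-1}(i-1) + \log a_{M_{j-1} M_j} \\
    V^I_{j-1}(i-1) + \log a_{I_{j-1} M_j} \\
    V^D_{j-1}(i-1) + \log a_{D_{j-1} M_j}
  \end{cases} \\
V^I_j(i) &= \log\frac{e_{I_j}(x_i)}{q_{x_i}} + \max
  \begin{cases}
    V^M_j(i-1) + \log a_{M_j I_j} \\
    V^I_j(i-1) + \log a_{I_j I_j} \\
    V^D_j(i-1) + \log a_{D_j I_j}
  \end{cases} \\
V^D_j(i) &= \max
  \begin{cases}
    V^M_{j-1}(i) + \log a_{M_{j-1} D_j} \\
    V^I_{j-1}(i) + \log a_{I_{j-1} D_j} \\
    V^D_{j-1}(i) + \log a_{D_{j-1} D_j}
  \end{cases}
\end{align*}
```

The delete recurrence has no emission term because $D_j$ is silent. When the insert states emit with the background distribution, the emission term of $V^I_j(i)$ is zero.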
32
Forward algorithm
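For reference, the standard forward recurrences for a profile HMM in log-odds form are given below, in the notation of Durbin et al., "Biological Sequence Analysis", chapter 5. $F^M_j(i)$, $F^I_j(i)$ and $F^D_j(i)$ are the log-odds scores summed over all paths ending in $M_j$, $I_j$ and $D_j$ after emitting $x_1 \ldots x_i$. It is an assumption that the slide showed exactly this form.

```latex
\begin{align*}
F^M_j(i) &= \log\frac{e_{M_j}(x_i)}{q_{x_i}}
  + \log\Big[ a_{M_{j-1} M_j} \exp\big(F^M_{j-1}(i-1)\big)
            + a_{I_{j-1} M_j} \exp\big(F^I_{j-1}(i-1)\big)
            + a_{D_{j-1} M_j} \exp\big(F^D_{j-1}(i-1)\big) \Big] \\
F^I_j(i) &= \log\frac{e_{I_j}(x_i)}{q_{x_i}}
  + \log\Big[ a_{M_j I_j} \exp\big(F^M_j(i-1)\big)
            + a_{I_j I_j} \exp\big(F^I_j(i-1)\big)
            + a_{D_j I_j} \exp\big(F^D_j(i-1)\big) \Big] \\
F^D_j(i) &= \log\Big[ a_{M_{j-1} D_j} \exp\big(F^M_{j-1}(i)\big)
            + a_{I_{j-1} D_j} \exp\big(F^I_{j-1}(i)\big)
            + a_{D_{j-1} D_j} \exp\big(F^D_{j-1}(i)\big) \Big]
\end{align*}
```

Compared with the Viterbi form, each maximum over predecessor states is replaced by a sum of probabilities, taken inside the logarithm.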
33
Initialisation and termination
Viterbi algorithm:
  – Initialisation:
  – Termination:
Forward algorithm:
  – Initialisation:
  – Termination:
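One common convention, assumed in the sketch below, is to treat Begin as $M_0$ with score 0 and End as $M_{L+1}$ without an emission term. The code is a minimal global Viterbi scorer and not the speakers' implementation: the name `viterbi_log_odds`, the position-independent transition table `trans`, and the two-state DNA toy model are illustrative assumptions, and insert states are taken to emit with the background distribution.

```python
import math

NEG_INF = float("-inf")

def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF

def viterbi_log_odds(x, match_emissions, background, trans):
    """Log-odds score of the best global path of x through a profile HMM.

    match_emissions[j-1][a] is e_{M_j}(a); background[a] is q_a.
    trans maps 'MM', 'MI', 'MD', 'IM', 'II', 'ID', 'DM', 'DI', 'DD' to
    probabilities shared by all positions (a simplification).
    """
    L, n = len(match_emissions), len(x)
    VM = [[NEG_INF] * (n + 1) for _ in range(L + 1)]
    VI = [[NEG_INF] * (n + 1) for _ in range(L + 1)]
    VD = [[NEG_INF] * (n + 1) for _ in range(L + 1)]
    VM[0][0] = 0.0  # initialisation: Begin plays the role of M_0
    for j in range(L + 1):
        for i in range(n + 1):
            if j >= 1 and i >= 1:
                a = x[i - 1]
                emit = log(match_emissions[j - 1][a] / background[a])
                VM[j][i] = emit + max(VM[j - 1][i - 1] + log(trans["MM"]),
                                      VI[j - 1][i - 1] + log(trans["IM"]),
                                      VD[j - 1][i - 1] + log(trans["DM"]))
            if i >= 1:
                # Insert emissions equal the background, so their log-odds is 0.
                VI[j][i] = max(VM[j][i - 1] + log(trans["MI"]),
                               VI[j][i - 1] + log(trans["II"]),
                               VD[j][i - 1] + log(trans["DI"]))
            if j >= 1:
                VD[j][i] = max(VM[j - 1][i] + log(trans["MD"]),
                               VI[j - 1][i] + log(trans["ID"]),
                               VD[j - 1][i] + log(trans["DD"]))
    # Termination: End plays the role of M_{L+1} and emits nothing.
    return max(VM[L][n] + log(trans["MM"]),
               VI[L][n] + log(trans["IM"]),
               VD[L][n] + log(trans["DM"]))

background = {a: 0.25 for a in "ACGT"}
match_emissions = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},  # M_1 prefers A
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},  # M_2 prefers C
]
trans = {"MM": 0.8, "MI": 0.1, "MD": 0.1,
         "IM": 0.5, "II": 0.4, "ID": 0.1,
         "DM": 0.5, "DI": 0.1, "DD": 0.4}
print(viterbi_log_odds("AC", match_emissions, background, trans))
```

For the sequence "AC" the best path is Begin, $M_1$, $M_2$, End, so the score is two match log-odds terms plus three M→M transition terms. The forward version would replace each `max` by a log-sum-exp over the same three predecessors.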
34
Alternative to log-odds scoring
Log likelihood score (LL score): $\log P(x \mid M)$.
The LL score is strongly length dependent.
Solutions:
  – Divide by the sequence length.
  – Z-score.
Which method is preferred?
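Both corrections can be sketched in a few lines. This is a simplification resting on assumptions: the function names are hypothetical, and the Z-score here compares an LL score with LL scores of unrelated sequences of similar length, rather than fitting a smooth length-dependent mean and standard deviation.

```python
import math
import statistics

def length_normalised(ll_score, length):
    """LL score divided by the sequence length."""
    return ll_score / length

def z_score(ll_score, background_ll_scores):
    """Number of standard deviations ll_score lies above the background mean.

    background_ll_scores: LL scores of non-matching sequences whose length
    is close to that of the query (an assumption of this sketch).
    """
    mu = statistics.mean(background_ll_scores)
    sigma = statistics.pstdev(background_ll_scores)
    return (ll_score - mu) / sigma

print(z_score(-10.0, [-20.0, -22.0, -18.0]))
```

A large positive Z-score indicates that the sequence scores far better than unrelated sequences of comparable length.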
36
Demo
37
Part of the profile HMM
38
Scoring
39
Part of the multiple alignment
40
Relative frequencies
42
Flanking model states
Used to model the sequence flanking the actual profile match.
Extra probabilities needed:
  – Emission probability: $q_a$.
  – "Looping" transition probability: $(1 - \eta)$.
  – Transition probability from the left flanking state: depends on the application.
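To see why the looping probability should be close to 1, consider the contribution of a flanking state $Q$ that emits $k$ residues and loops after each one. The symbol $k$ is introduced here for illustration, and the log-odds line assumes a plain i.i.d. background model with the same $q_a$.

```latex
% Probability of k flanking residues x_1, ..., x_k emitted by Q
P(x_1 \ldots x_k \mid Q) = (1-\eta)^k \prod_{i=1}^{k} q_{x_i}

% Contribution to a log-odds score against the background q:
% the emission terms cancel and only the looping cost remains
k \,\log(1-\eta)
```

With a small $\eta$ this cost stays near zero even for long flanking stretches, so the flanks barely affect the score of the profile match itself.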
43
Model for local alignment (Smith-Waterman style)
[Diagram: profile HMM ($M_j$, $I_j$, $D_j$) between its own Begin and End states, with two flanking states Q and an outer pair of Begin and End states]
44
Model for overlap matches
[Diagram: profile HMM ($M_j$, $I_j$, $D_j$) between Begin and End, with two flanking states Q]
45
Model for repeat matches
[Diagram: profile HMM ($M_j$, $I_j$, $D_j$) between its own Begin and End states, with a single flanking state Q and an outer pair of Begin and End states]
46
Summary
Construction of a profile HMM for different kinds of alignments.
Using profile HMMs to detect potential membership in a family.
Using profile HMMs to give an alignment of a sequence to the family.
47
BLAST versus profile HMM
Discussion subject