Similar Techniques for Molecular Sequencing and Network Security
Doug Madory, 27 APR 05
Outline: Big Picture; Protein Structure; Sequencing using Profile HMM

Big Picture
PQS for Network Security (Us):
  Design HMM for network event
  Find event within linear stream of observed network events
Sequencing using Profile HMM (Bioinformatics):
  Train HMM using known information about subsequence
  Find subsequence within linear protein / genome sequence
Q: Did an event happen?
Q: If it exists, where is the sequence?

Profile HMM - Simple Case
Steps: Train HMM; Viterbi Scoring; Backtrace Viterbi
Query: A-, AA, TA
DB: ATA

HMM Training
Build HMM with 2 M states because there are 2 columns in the query.
States: Begin, M1, M2, End, I0, I1, I2, D1, D2
M1 and M2 each emit over the alphabet A, C, G, T.

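The layout on this slide can be written down as a small data structure. The sketch below is a minimal illustration, assuming the usual profile HMM wiring (each position feeds the next match state, its own insert state, and the next delete state), which agrees with the transitions named on the later scoring slides (B-I0, B-D1, B-M1, I0-I0, I0-D1, M1-I1, I1-I1, D1-I1). The function names are hypothetical and not taken from the talk.

```python
def profile_hmm_states(n_match):
    """State names for a profile HMM with n_match match columns."""
    states = ["Begin"]
    states += [f"I{k}" for k in range(n_match + 1)]   # I0 .. In
    for k in range(1, n_match + 1):
        states += [f"M{k}", f"D{k}"]                  # M1, D1, M2, D2, ...
    states.append("End")
    return states


def profile_hmm_transitions(n_match):
    """Allowed transitions, assuming the standard profile HMM layout."""
    trans = {}
    for k in range(n_match + 1):
        # Position 0 holds Begin and I0; positions 1..n hold M, I and D.
        sources = ["Begin", "I0"] if k == 0 else [f"M{k}", f"I{k}", f"D{k}"]
        if k < n_match:
            targets = [f"M{k + 1}", f"I{k}", f"D{k + 1}"]
        else:
            targets = [f"I{k}", "End"]                # last position exits to End
        for s in sources:
            trans[s] = targets
    return trans


states = profile_hmm_states(2)
transitions = profile_hmm_transitions(2)
print(states)
print(transitions["Begin"])
```

With two match columns this yields the nine states listed on the slide.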
HMM Training
States: Begin, M1, M2, End, I0, I1, I2, D1, D2
Emission counts after each step (order A C G T):

Step                                                  M1         M2
1 - add pseudocount to each transition and emission   1 1 1 1    1 1 1 1
2 - train with A-                                     2 1 1 1    1 1 1 1
3 - train with AA                                     3 1 1 1    2 1 1 1
4 - train with TA                                     3 1 1 2    3 1 1 1
Fully trained HMM                                     3 1 1 2    3 1 1 1

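The emission counts from these training steps can be reproduced with a few lines of Python. This is a minimal sketch, assuming a pseudocount of 1 and that a gap ("-") adds nothing to the match-state emission counts; the helper names are illustrative, not from the talk.

```python
from collections import Counter

ALPHABET = "ACGT"
alignment = ["A-", "AA", "TA"]   # the query rows from the slides


def train_emissions(alignment, alphabet=ALPHABET, pseudocount=1):
    """Count symbols per alignment column, starting from the pseudocount."""
    n_cols = len(alignment[0])
    counts = [Counter({s: pseudocount for s in alphabet}) for _ in range(n_cols)]
    for row in alignment:
        for col, sym in enumerate(row):
            if sym != "-":           # gaps do not emit a symbol
                counts[col][sym] += 1
    return counts


def to_probabilities(counts):
    """Normalise one column's counts into emission probabilities."""
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


m1, m2 = train_emissions(alignment)
print(dict(m1))   # {'A': 3, 'C': 1, 'G': 1, 'T': 2}
print(dict(m2))   # {'A': 3, 'C': 1, 'G': 1, 'T': 1}
```

Normalising the M1 counts gives e(A) = 3/7, the value used in the Viterbi scoring slides.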
Viterbi Scoring
Observations (columns): X, A, T, A (indices 0 to 3)
States (rows): B/I0, M1, I1, D1, M2, I2, D2, E
Legend: insert, match, and delete moves; illegal moves
Initialization:
  V_B = 0
  V_I0(1) = log a_{B-I0}
  V_M1(0) = 0
  V_I1(0) = 0
  V_D1(0) = log a_{B-D1}
Insert row I0:
  V_I0(2) = V_I0(1) + log a_{I0-I0}
  V_I0(3) = V_I0(2) + log a_{I0-I0}

Viterbi Scoring
Table so far: V_B = 0; V_I0(2) = -1.25; V_I0(3) = -1.72; V_M1(0) = 0; V_I1(0) = 0
Match state M1, observation 1 (A):
  V_M1(1) = log( e_M1(A) / q ) + V_B + log a_{B-M1}
          = log( (3/7) / (1/4) ) + 0 + log a_{B-M1}
          = 0.23 - 0.17 = 0.06

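The arithmetic for V_M1(1) can be checked directly. The sketch below assumes base-10 logarithms, which is the base that reproduces the 0.23 on the slide (log10(12/7) is about 0.234), and takes log a_{B-M1} = -0.17 as given on the slide. The function name is illustrative.

```python
import math


def match_score(emission_prob, background_prob, prev_score, log_transition):
    """Log-odds Viterbi update for a match state with a single predecessor."""
    return math.log10(emission_prob / background_prob) + prev_score + log_transition


# e_M1(A) = 3/7, q = 1/4, V_B = 0, log a_{B-M1} = -0.17 (values from the slides)
v_m1_1 = match_score(3 / 7, 1 / 4, 0.0, -0.17)
print(round(v_m1_1, 2))   # 0.06
```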
Viterbi Scoring
Table so far: V_B = 0; V_I0(2) = -1.25; V_I0(3) = -1.72; V_M1(0) = 0; V_M1(1) = 0.06; V_I1(0) = 0
Delete state D1, observation 1:
  V_D1(1) = V_I0(1) + log a_{I0-D1}
          = V_I0(1) - 0.47
Insert state I1, observation 1:
  V_I1(1) = 0 + max { V_M1(0) + log a_{M1-I1},
                      V_I1(0) + log a_{I1-I1},
                      V_D1(0) + log a_{D1-I1} }

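The insert-state update is a maximum over three predecessors. The sketch below uses the predecessor scores shown in the slide tables (V_M1(0) = 0, V_I1(0) = 0, V_D1(0) = -0.78). The three transition log-probabilities are not listed on the slides; the values used here are an assumption, taken as log10 of 1/6, 1/3 and 1/3, which is what add-one counting over the three training rows would give.

```python
def insert_score(prev_scores, log_trans, log_odds_emission=0.0):
    """Viterbi update for an insert state: emission term plus best predecessor."""
    return log_odds_emission + max(v + a for v, a in zip(prev_scores, log_trans))


# Predecessor scores from the slide tables: V_M1(0), V_I1(0), V_D1(0).
prev = [0.0, 0.0, -0.78]
# ASSUMED log10 transition probabilities for M1->I1, I1->I1, D1->I1.
log_a = [-0.78, -0.47, -0.47]

v_i1_1 = insert_score(prev, log_a)
print(round(v_i1_1, 2))   # -0.47
```

Under these assumptions the result agrees with the V_I1(1) = -0.47 entry in the completed table.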
Viterbi Scoring
State   X (0)    A (1)    T (2)    A (3)
B/I0    0                 -1.25    -1.72
M1      0        0.06
I1      0
D1      -0.78
M2
I2
D2
E

Viterbi Scoring
State   X (0)    A (1)    T (2)    A (3)
B/I0    0                 -1.25    -1.72
M1      0        0.06     -1.19
I1      0        -0.47    -0.72
D1      -0.78    -1.25    -1.72
M2      0        -0.47    -0.41
I2      0        -1.85    -1.07
D2      -1.25    -1.25    -0.58
E       V_E =

Profile HMM - Simple Case
Demo in Python

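The demo from the talk is not reproduced in the slides. What follows is only a minimal sketch of what such a demo might look like, under stated assumptions: add-one training of both emissions and transitions, base-10 log-odds against a uniform background of 1/4, insert states that emit at background (log-odds 0), and unreachable cells treated as minus infinity. Because the slide table shows 0 in cells such as V_M1(0), some values computed here may differ from the table. All function names and the tuple encoding of states are hypothetical.

```python
import math

ALPHABET = "ACGT"
BACKGROUND = 0.25                 # uniform background q, the (1/4) on the slides
NEG_INF = float("-inf")


def train(alignment, pseudocount=1):
    """Add-one training of emissions and transitions from a gapped alignment."""
    n = len(alignment[0])         # one match state per alignment column
    emit = {k: {s: pseudocount for s in ALPHABET} for k in range(1, n + 1)}
    trans = {}
    for k in range(n + 1):        # ("M", 0) stands for Begin, ("M", n + 1) for End
        targets = [("I", k)]
        targets += [("M", k + 1), ("D", k + 1)] if k < n else [("M", n + 1)]
        for kind in ("MID" if k else "MI"):
            for t in targets:
                trans[((kind, k), t)] = pseudocount
    for row in alignment:         # follow each row's path and count it
        state = ("M", 0)
        for k, sym in enumerate(row, start=1):
            nxt = ("D", k) if sym == "-" else ("M", k)
            trans[(state, nxt)] += 1
            if sym != "-":
                emit[k][sym] += 1
            state = nxt
        trans[(state, ("M", n + 1))] += 1
    log_odds = {k: {s: math.log10(c / sum(col.values()) / BACKGROUND)
                    for s, c in col.items()} for k, col in emit.items()}
    totals = {}
    for (src, _), c in trans.items():
        totals[src] = totals.get(src, 0) + c
    log_a = {key: math.log10(c / totals[key[0]]) for key, c in trans.items()}
    return n, log_odds, log_a


def viterbi(x, n, log_odds, log_a):
    """Fill the Viterbi table for sequence x and backtrace the best path."""
    V, back = {("M", 0, 0): 0.0}, {}

    def relax(state, i, prevs, emit_score):
        best, best_prev = NEG_INF, None
        for prev in prevs:
            s = V.get(prev, NEG_INF) + log_a.get((prev[:2], state), NEG_INF)
            if s > best:
                best, best_prev = s, prev
        if best_prev is not None:             # skip unreachable cells
            V[state + (i,)] = best + emit_score
            back[state + (i,)] = best_prev

    L = len(x)
    for i in range(L + 1):
        for k in range(n + 1):
            if k >= 1 and i >= 1:             # match: consumes x[i-1], advances k
                relax(("M", k), i, [(p, k - 1, i - 1) for p in "MID"],
                      log_odds[k][x[i - 1]])
            if i >= 1:                        # insert: consumes x[i-1], same k
                relax(("I", k), i, [(p, k, i - 1) for p in "MID"], 0.0)
            if k >= 1:                        # delete: advances k, consumes nothing
                relax(("D", k), i, [(p, k - 1, i) for p in "MID"], 0.0)
    end = ("M", n + 1, L)
    relax(("M", n + 1), L, [(p, n, L) for p in "MID"], 0.0)
    path, key = [], end
    while key in back:                        # walk back-pointers to Begin
        path.append(key)
        key = back[key]
    path.append(key)
    path.reverse()
    return V[end], path, V


def name(key, n):
    """Readable state name for a (kind, position, index) table key."""
    kind, k, _ = key
    if kind == "M" and k == 0:
        return "Begin"
    if kind == "M" and k == n + 1:
        return "End"
    return f"{kind}{k}"


n, log_odds, log_a = train(["A-", "AA", "TA"])
score, path, V = viterbi("ATA", n, log_odds, log_a)
print(round(V[("M", 1, 1)], 2))               # 0.06, matching V_M1(1) on the slides
print([name(key, n) for key in path], round(score, 2))
```

The backtrace follows stored back-pointers from End to Begin, which answers the "where is the sequence?" question for the DB string ATA under the assumptions above.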
Big Picture Revisited
PQS for Network Security (Us):
  Design HMM for network event
  Find event within linear stream of observed network events
Sequencing using Profile HMM (Bioinformatics):
  Train HMM using known information about subsequence
  Find subsequence within linear protein / genome sequence
Q: Did an event happen?
Q: If it exists, where is the sequence?