Comp. Genomics Recitation 11 SCFG

Exercise: Convert the following HMM to an SCFG: two states W1 and W2 with different emission probabilities (e.g. different DNA compositions), where W1 stays in W1 with probability p and moves to W2 with probability 1-p, and W2 stays in W2 with probability q and moves to W1 with probability 1-q.

Solution:
W1 → aW1 | cW1 | … | tW1 | aW2 | cW2 | … | tW2
W2 → aW2 | cW2 | … | tW2 | aW1 | cW1 | … | tW1
p(W1 → aW1) = e_W1(a)·p
p(W1 → aW2) = e_W1(a)·(1-p)

Solution (continued): The other rule probabilities are assigned analogously. Note that the resulting grammar is in fact a regular grammar, a special case of a context-free grammar.
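As a concrete illustration, here is a minimal Python sketch that enumerates the SCFG rule probabilities obtained from such a two-state HMM. The numeric values of p, q and the emission distributions are made-up placeholders, not values from the exercise:

```python
# Minimal sketch: SCFG rule probabilities derived from a 2-state HMM.
# p = P(stay in W1), q = P(stay in W2); the emission tables stand in for e_W1, e_W2.
p, q = 0.9, 0.8                                        # assumed parameters
emissions = {
    "W1": {"a": 0.4, "c": 0.1, "g": 0.1, "t": 0.4},    # assumed e_W1
    "W2": {"a": 0.1, "c": 0.4, "g": 0.4, "t": 0.1},    # assumed e_W2
}
stay = {"W1": p, "W2": q}
other = {"W1": "W2", "W2": "W1"}

rules = {}
for state in ("W1", "W2"):
    for symbol, e in emissions[state].items():
        # Rule "W1 -> a W1": emit the symbol and stay in the same state.
        rules[(state, symbol, state)] = e * stay[state]
        # Rule "W1 -> a W2": emit the symbol and switch to the other state.
        rules[(state, symbol, other[state])] = e * (1 - stay[state])

# Sanity check: the rule probabilities of each non-terminal sum to 1.
for state in ("W1", "W2"):
    total = sum(v for (lhs, _, _), v in rules.items() if lhs == state)
    print(state, round(total, 6))      # -> 1.0 for both states
```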

Exercise: Convert the production rule W → aWbW to Chomsky normal form. If the probability of the original production is p, show the probabilities for the productions in your normal-form version.

Solution:
Old rule: W → aWbW
Chomsky normal form requires that all production rules are of the form Wv → WyWz or Wz → a.
We define four new non-terminals: W1, W2, Wa, Wb.
The new rules are:
W → W1W2
W1 → WaW
W2 → WbW
Wa → a
Wb → b

Solution (continued):
For every non-terminal, the sum of the probabilities of all of its production rules must be 1.
Since each of the new non-terminals has only one rule, its rule is assigned probability 1.
The rule W → W1W2 therefore gets probability p, the same as the rule that we eliminated.
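A tiny Python sketch of the same bookkeeping; the value of p is an arbitrary placeholder, and the remaining W-rules (which carry the other 1-p of W's probability mass) are omitted:

```python
from collections import defaultdict

p = 0.3   # assumed probability of the original rule W -> aWbW

# The Chomsky-normal-form replacement of W -> aWbW.
cnf_rules = [
    ("W",  ("W1", "W2"), p),     # takes over the probability of W -> aWbW
    ("W1", ("Wa", "W"),  1.0),   # each new non-terminal has a single rule
    ("W2", ("Wb", "W"),  1.0),
    ("Wa", ("a",),       1.0),
    ("Wb", ("b",),       1.0),
]

# Per-non-terminal probability mass (W shows only p here; its other rules add 1-p).
totals = defaultdict(float)
for lhs, _, prob in cnf_rules:
    totals[lhs] += prob
print(dict(totals))
```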

Question 4 from the Moed B exam, 5772 (2012): Let x = x1x2…xn be an RNA string over the alphabet ACGU. A two-dimensional folding of the string is a nested collection of disjoint pairs of indices between 1 and n; that is, if positions a,b are paired and positions c,d are paired, with a<b, c<d and a<c, then c<b<d is impossible. A base may be paired with one other base in the sequence; if it is not paired it is called free. Pairing is allowed between the bases A and U and between C and G. For a pair (i,j), we define the pair (i+1,j-1) as the pair adjacent to it.

Question 4 continued: We define a simple energy model of folding as follows: if a base is free, it makes no energy contribution. If a base is paired, it makes a (negative) contribution only if its adjacent pair is also paired. Describe a dynamic programming algorithm, as efficient as possible, that finds a folding maximizing the number of pairs whose adjacent pair is also paired (i.e. a minimum-energy folding).

Solution to Question 4, Moed B, 5772: Define
A(i,j) = the score of a minimum-energy folding of xi…xj.
W(i,j) = the score of a minimum-energy folding of xi…xj in which i and j are not paired with each other.
V(i,j) = the score of a minimum-energy folding of xi…xj in which i and j are paired with each other.

Solution continued:
A(i,j) = max( W(i,j), V(i,j) ) if xi and xj can be paired; otherwise A(i,j) = W(i,j)
W(i,j) = max over i ≤ k < j of ( A(i,k) + A(k+1,j) )
V(i,j) = max( 1 + V(i+1,j-1), W(i+1,j-1) ) if x(i+1) and x(j-1) can be paired; otherwise V(i,j) = W(i+1,j-1)
Base cases: A(i,i) = 0, A(i,i+1) = 0
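A minimal Python sketch of these recurrences, using memoized recursion instead of an explicit table; the A-U / C-G pairing rule and the 0-based indexing are the only assumptions beyond the recurrences above:

```python
from functools import lru_cache

PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}

def max_stacked_pairs(x):
    """Score of a minimum-energy folding: the maximum number of paired
    positions whose adjacent pair (i+1, j-1) is also paired.
    Indices are 0-based; A(i,i) = A(i,i+1) = 0 are the base cases."""
    n = len(x)

    def can_pair(i, j):
        return (x[i], x[j]) in PAIRS

    @lru_cache(maxsize=None)
    def A(i, j):                 # best score for x[i..j]
        if j <= i + 1:
            return 0
        best = W(i, j)
        if can_pair(i, j):
            best = max(best, V(i, j))
        return best

    @lru_cache(maxsize=None)
    def W(i, j):                 # i and j are not paired with each other
        if j <= i:
            return 0
        return max(A(i, k) + A(k + 1, j) for k in range(i, j))

    @lru_cache(maxsize=None)
    def V(i, j):                 # i and j are paired with each other
        ii, jj = i + 1, j - 1
        if jj <= ii:             # no room for an adjacent pair
            return 0
        if can_pair(ii, jj):
            # pair (i, j) scores 1 only if its adjacent pair is also paired
            return max(1 + V(ii, jj), W(ii, jj))
        return W(ii, jj)

    return A(0, n - 1)

print(max_stacked_pairs("GGGAAAUCCC"))   # a stacked hairpin: prints 3
```

The running time is O(n^3): there are O(n^2) subproblems and each W(i,j) tries O(n) split points.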

EM algorithm for SCFG:
1. Start from an initial estimate of the rule probabilities.
2. Calculate the expectations E(X→YZ) and E(X).
3. Update rule: Pt+1(X→YZ) = E(X→YZ) / E(X).
4. Repeat until convergence.

Probability calculation (given x, Θ):
The probability that state v is used as the root of the derivation of xi,…,xj:
P(v roots xi…xj | x, Θ) = α(i,j,v)·β(i,j,v) / P(x|Θ),
where α denotes the inside probabilities, β the outside probabilities, and P(x|Θ) = α(1,L,S) for the start symbol S.
The probability that the rule v→yz is used in deriving xi…xj, with v as the root:
P(v→yz used at (i,j) | x, Θ) = β(i,j,v) · Σ{i≤k<j} α(i,k,y)·α(k+1,j,z)·p(v→yz) / P(x|Θ)

Expectation calculation:
The expected number of times state v is used in a derivation (summing the inside·outside products over all subsequences):
E(v) = (1/P(x|Θ)) · Σi Σ{j≥i} α(i,j,v)·β(i,j,v)
The expected number of times the rule v→yz is used:
E(v→yz) = (1/P(x|Θ)) · Σi Σ{j>i} Σ{i≤k<j} β(i,j,v)·α(i,k,y)·α(k+1,j,z)·p(v→yz)

EM for SCFG: How do we compute the new probability for v→yz? We normalize the expected counts: p(v→yz) = E(v→yz)/E(v). What about emission rules v→a? Analogously, p(v→a) = E(v→a)/E(v), where E(v→a) = (1/P(x|Θ)) · Σ{i: xi=a} α(i,i,v)·β(i,i,v).
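A minimal sketch of the resulting M-step, assuming the expected rule counts have already been computed with the inside-outside formulas above (the dictionary format and the example numbers are illustrative assumptions):

```python
from collections import defaultdict

def em_update(expected_rule_counts):
    """expected_rule_counts maps (lhs, rhs) -> E(lhs -> rhs), where rhs is either
    a pair of non-terminals (y, z) or a terminal symbol a."""
    lhs_totals = defaultdict(float)
    for (lhs, _), count in expected_rule_counts.items():
        lhs_totals[lhs] += count           # E(v) = sum of E(v -> anything)
    return {rule: count / lhs_totals[rule[0]]
            for rule, count in expected_rule_counts.items()}

# Illustrative expected counts for the rules of V (made-up numbers):
counts = {("V", ("V", "N")): 1.5, ("V", ("V", "N-P")): 0.5, ("V", "hangs"): 2.0}
print(em_update(counts))   # each count divided by E(V) = 4.0
```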

Example: Suppose that our data contains the following sentence: "He hangs pictures without frames". Its parse tree, T1, is shown on the slide.

Example: The sentence was generated using the following production rules:
S → NV with probability p(S→NV)
V → VN with probability p(V→VN)
N → NP with probability p(N→NP)
P → PN with probability p(P→PN)
N → He, V → hangs, N → pictures, P → without, N → frames (terminal rules)

Example: The likelihood of this sentence is the product of the probabilities of all the production rules used to derive it. We believe in our sentence! We start with some initial probabilities and want to increase the likelihood of the sentence using the EM algorithm.

Example: To make it more interesting, let's add another production rule: V → VNP. The sentence now has a second possible parse tree, T2 (shown on the slide), which uses this rule.

Example: But now the grammar is no longer in Chomsky normal form. We turn it into Chomsky normal form as follows:
V → V N-P with p(V → V N-P) = p(V → VNP)
N-P → N P with p(N-P → N P) = 1.0

Example Compute inside probabilities

Example: [Inside chart] The diagonal cells are initialized from the terminal rules: the cell for "He" gets N, "hangs" gets V, "pictures" gets N, "without" gets P, and "frames" gets N.

Example: [Inside chart] Cells for length-2 substrings are filled next: "He hangs" gets S (via S→NV), "hangs pictures" gets V (via V→VN), and "without frames" gets P (via P→PN).

Example: [Inside chart] Box(1,3) accounts for substring 1-3 ("He hangs pictures") and Box(3,5) accounts for substring 3-5 ("pictures without frames"); Box(3,5) contains both N (via N→NP) and N-P (via N-P→N P).
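Putting the chart-filling steps above into code, here is a minimal sketch of the inside algorithm for a grammar in Chomsky normal form. The dictionary-based grammar representation and the numeric rule probabilities are illustrative assumptions, chosen only so that each non-terminal's rules sum to 1:

```python
from itertools import product

def inside_scfg(x, nonterminals, transition, emission, start):
    """Inside algorithm for an SCFG in Chomsky normal form.
    transition[(v, y, z)] = p(v -> y z), emission[(v, a)] = p(v -> a).
    Returns (alpha, P(x|theta)), where alpha[(i, j, v)] is the probability
    that v derives x[i..j] (0-based, inclusive)."""
    L = len(x)
    alpha = {}
    # Initialization: alpha(i, i, v) = p(v -> x_i)
    for i, v in product(range(L), nonterminals):
        alpha[(i, i, v)] = emission.get((v, x[i]), 0.0)
    # Fill cells in order of increasing substring length
    for length in range(2, L + 1):
        for i in range(L - length + 1):
            j = i + length - 1
            for v in nonterminals:
                alpha[(i, j, v)] = sum(
                    alpha[(i, k, y)] * alpha[(k + 1, j, z)] * prob
                    for (lhs, y, z), prob in transition.items() if lhs == v
                    for k in range(i, j))
    return alpha, alpha[(0, L - 1, start)]

# The example grammar, with assumed probabilities (each left-hand side sums to 1):
nonterminals = ["S", "N", "V", "P", "N-P"]
transition = {("S", "N", "V"): 1.0,
              ("V", "V", "N"): 0.3, ("V", "V", "N-P"): 0.2,
              ("N", "N", "P"): 0.3,
              ("P", "P", "N"): 0.4,
              ("N-P", "N", "P"): 1.0}
emission = {("N", "He"): 0.3, ("N", "pictures"): 0.2, ("N", "frames"): 0.2,
            ("V", "hangs"): 0.5, ("P", "without"): 0.6}

alpha, likelihood = inside_scfg("He hangs pictures without frames".split(),
                                nonterminals, transition, emission, "S")
print(likelihood)   # P(sentence | theta), summed over both parse trees T1 and T2
```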

Example Compute outside probabilities

Example: [Figure] The outside recursion has two cases: the subsequence xi…xj derived from Y can sit as the left child of a rule X → YZ, with Z deriving x(j+1)…xk, or as the right child of a rule X → ZY, with Z deriving xk…x(i-1).
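For reference, the standard outside recurrence combines exactly these two cases (stated here in the inside/outside notation used above, with α the inside and β the outside probabilities, following the textbook form in Durbin et al.):
β(i,j,v) = Σ{y,z} Σ{k>j} α(j+1,k,z)·β(i,k,y)·p(y→vz) + Σ{y,z} Σ{k<i} α(k,i-1,z)·β(k,j,y)·p(y→zv)
with the initialization β(1,L,S) = 1 for the start symbol S and β(1,L,v) = 0 for every other state v.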

Example: [Figure: the parse tree of "He hangs pictures without frames", highlighting the part of the tree that lies outside one of the chart cells, which is what the outside probability accounts for.]

Example: Let's improve p(V→VN). The expected number of times it is used (the general expectation formula above with v=V, y=V, z=N):
E(V→VN) = (1/P(x|Θ)) · Σi Σ{j>i} Σ{i≤k<j} β(i,j,V)·α(i,k,V)·α(k+1,j,N)·p(V→VN)

Example: The expected number of times that V is visited:
E(V) = (1/P(x|Θ)) · Σi Σ{j≥i} α(i,j,V)·β(i,j,V)
This is actually the same as the sum of the expected counts of all the rules whose left-hand side is V:
E(V) = E(V→VN) + E(V→V N-P) + E(V→hangs)

Example: In order to get the new p(V→VN), we divide and get p(V→VN) = E(V→VN) / E(V). Similarly, for p(V→hangs), we get p(V→hangs) = E(V→hangs) / E(V).

The CYK algorithm:
Initialization: for i = 1…L, v = 1…M: γ(i,i,v) = log p(v→xi)
Iteration: for i = 1…L-1, j = i+1…L, v = 1…M: γ(i,j,v) = max over states y,z and split points i ≤ k < j of [ γ(i,k,y) + γ(k+1,j,z) + log p(v→yz) ]
Termination: log P(x, π* | Θ) = γ(1,L,S), the score of the optimal parse tree π* for sentence x, where S is the start symbol.

The CYK algorithm Looks similar to the inside algorithm, but we take the maximum instead of summing (consider the forward algorithm vs. Viterbi)
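A compact Python sketch of CYK for an SCFG in Chomsky normal form, in log space as in the slide; it has exactly the structure of the inside sketch above with the sum replaced by a max (the grammar representation is the same illustrative assumption):

```python
import math
from itertools import product

def cyk_scfg(x, nonterminals, transition, emission, start):
    """Log-space CYK for an SCFG in Chomsky normal form.
    transition[(v, y, z)] = p(v -> y z), emission[(v, a)] = p(v -> a).
    Returns gamma(1, L, start): the log probability of the best parse tree."""
    L, NEG_INF = len(x), float("-inf")
    gamma = {}
    # Initialization: gamma(i, i, v) = log p(v -> x_i)
    for i, v in product(range(L), nonterminals):
        e = emission.get((v, x[i]), 0.0)
        gamma[(i, i, v)] = math.log(e) if e > 0 else NEG_INF
    # Iteration: best rule and best split point, instead of a sum
    for length in range(2, L + 1):
        for i in range(L - length + 1):
            j = i + length - 1
            for v in nonterminals:
                gamma[(i, j, v)] = max(
                    (gamma[(i, k, y)] + gamma[(k + 1, j, z)] + math.log(prob)
                     for (lhs, y, z), prob in transition.items()
                     if lhs == v and prob > 0
                     for k in range(i, j)),
                    default=NEG_INF)
    # Termination: score of the optimal parse tree pi* for x
    return gamma[(0, L - 1, start)]
```

A traceback of the arg-max choices recovers the optimal parse tree itself; replacing the max with a sum (and dropping the logs) gives back the inside algorithm, just as Viterbi relates to the forward algorithm.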

Summary. M: number of SCFG symbols, Q: number of HMM states, L: data length.
Time complexity:
Optimal alignment: Viterbi (HMM), O(|Q|^2·L); CYK (SCFG), O(|M|^3·L^3)
P(x|Θ): forward (HMM), O(|Q|^2·L); inside (SCFG), O(|M|^3·L^3)
EM parameter estimation: forward-backward (HMM), O(|Q|^2·L); inside-outside (SCFG), O(|M|^3·L^3)

Summary (continued). Space complexity:
Optimal alignment: Viterbi (HMM), O(|Q|·L); CYK (SCFG), O(|M|·L^2)
P(x|Θ): forward (HMM), O(|Q|·L); inside (SCFG), O(|M|·L^2)
EM parameter estimation: forward-backward (HMM), O(|Q|·L); inside-outside (SCFG), O(|M|·L^2)