CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 26 – Recap HMM; Probabilistic Parsing contd.) Pushpak Bhattacharyya, CSE Dept., IIT Bombay, 15th March, 2011

Formal Definition of PCFG
A PCFG consists of:
A set of terminals {w_k}, k = 1, …, V, e.g., {w_k} = {child, teddy, bear, played, …}
A set of non-terminals {N^i}, i = 1, …, n, e.g., {N^i} = {NP, VP, DT, …}
A designated start symbol N^1
A set of rules {N^i → ζ^j}, where ζ^j is a sequence of terminals and non-terminals, e.g., NP → DT NN
A corresponding set of rule probabilities P(N^i → ζ^j)

Rule Probabilities
Rule probabilities are such that Σ_j P(N^i → ζ^j) = 1 for every non-terminal N^i, i.e., the probabilities of all expansions of a given non-terminal sum to 1.
E.g., P(NP → DT NN) = 0.2, P(NP → NN) = 0.5, P(NP → NP PP) = 0.3
P(NP → DT NN) = 0.2 means that 20% of the NP expansions seen in the training parses use the rule NP → DT NN.

Probabilistic Context Free Grammars (example grammar)
S → NP VP 1.0
NP → DT NN 0.5
NP → NNS 0.3
NP → NP PP 0.2
PP → P NP 1.0
VP → VP PP 0.6
VP → VBD NP 0.4
DT → the 1.0
NN → gunman 0.5
NN → building 0.5
VBD → sprayed 1.0
NNS → bullets 1.0
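
As a quick illustration (not part of the original slides), the toy grammar above can be written down as a table of rule probabilities and checked for consistency: for every left-hand-side non-terminal, the probabilities of its expansions should sum to 1. This is a minimal Python sketch; the representation (tuples for right-hand sides) is simply one convenient choice.

```python
from collections import defaultdict

# Toy PCFG from the slides: (LHS, RHS) -> probability.
# RHS is a tuple of symbols; lexical rules have a single terminal on the RHS.
RULES = {
    ("S",   ("NP", "VP")):  1.0,
    ("NP",  ("DT", "NN")):  0.5,
    ("NP",  ("NNS",)):      0.3,
    ("NP",  ("NP", "PP")):  0.2,
    ("PP",  ("P", "NP")):   1.0,
    ("VP",  ("VP", "PP")):  0.6,
    ("VP",  ("VBD", "NP")): 0.4,
    ("DT",  ("the",)):      1.0,
    ("NN",  ("gunman",)):   0.5,
    ("NN",  ("building",)): 0.5,
    ("VBD", ("sprayed",)):  1.0,
    ("P",   ("with",)):     1.0,   # implicit in the example parses (P 1.0)
    ("NNS", ("bullets",)):  1.0,
}

# Consistency check: sum_j P(N^i -> zeta^j) = 1 for every non-terminal N^i.
totals = defaultdict(float)
for (lhs, rhs), prob in RULES.items():
    totals[lhs] += prob
for lhs, total in totals.items():
    assert abs(total - 1.0) < 1e-9, f"{lhs} expansions sum to {total}"
print("All rule probabilities sum to 1 per non-terminal.")
```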

Example Parse t1
The gunman sprayed the building with bullets.
t1: [S_1.0 [NP_0.5 [DT_1.0 The] [NN_0.5 gunman]] [VP_0.6 [VP_0.4 [VBD_1.0 sprayed] [NP_0.5 [DT_1.0 the] [NN_0.5 building]]] [PP_1.0 [P_1.0 with] [NP_0.3 [NNS_1.0 bullets]]]]]
P(t1) = 1.0 * 0.5 * 1.0 * 0.5 * 0.6 * 0.4 * 1.0 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0045

Another Parse t2
The gunman sprayed the building with bullets.
t2: [S_1.0 [NP_0.5 [DT_1.0 The] [NN_0.5 gunman]] [VP_0.4 [VBD_1.0 sprayed] [NP_0.2 [NP_0.5 [DT_1.0 the] [NN_0.5 building]] [PP_1.0 [P_1.0 with] [NP_0.3 [NNS_1.0 bullets]]]]]]
P(t2) = 1.0 * 0.5 * 1.0 * 0.5 * 0.4 * 1.0 * 0.2 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0015
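
The two parse probabilities are just products of the rule probabilities used in each tree, and the sentence probability is their sum (these are the only two parses the toy grammar allows). Here is a small sketch of that arithmetic, assuming the RULES table from the earlier snippet; the nested-list tree encoding and case-folding of "The" are my own choices, not from the slides.

```python
# Each tree node is [label, child, child, ...]; leaves are plain strings (case-folded).
T1 = ["S", ["NP", ["DT", "the"], ["NN", "gunman"]],
           ["VP", ["VP", ["VBD", "sprayed"],
                         ["NP", ["DT", "the"], ["NN", "building"]]],
                  ["PP", ["P", "with"], ["NP", ["NNS", "bullets"]]]]]
T2 = ["S", ["NP", ["DT", "the"], ["NN", "gunman"]],
           ["VP", ["VBD", "sprayed"],
                  ["NP", ["NP", ["DT", "the"], ["NN", "building"]],
                         ["PP", ["P", "with"], ["NP", ["NNS", "bullets"]]]]]]

def tree_prob(node, rules):
    """P(t) = product of the probabilities of all rules used in the tree."""
    if isinstance(node, str):          # terminal leaf
        return 1.0
    lhs, children = node[0], node[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rules[(lhs, rhs)]
    for c in children:
        p *= tree_prob(c, rules)
    return p

p1, p2 = tree_prob(T1, RULES), tree_prob(T2, RULES)
print(p1, p2, p1 + p2)   # 0.0045  0.0015  0.006 = P(sentence)
```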

Probability of a sentence
Notation: w_ab is the subsequence w_a … w_b; N^j dominates w_a … w_b, or yield(N^j) = w_a … w_b (e.g., an NP dominating "the sweet teddy bear").
Probability of a sentence: P(w_1m) = Σ_t P(w_1m, t) = Σ_t P(w_1m | t) P(t), where t ranges over parse trees. If t is a parse tree for the sentence w_1m, then P(w_1m | t) = 1, so P(w_1m) = Σ_{t : yield(t) = w_1m} P(t).

Assumptions of the PCFG model
Place invariance: P(NP → DT NN) is the same in locations 1 and 2 (e.g., the two NPs in [S [NP The child] [VP … [NP the toy]]]).
Context-free: P(NP → DT NN | anything outside "The child") = P(NP → DT NN).
Ancestor-free: at location 2, P(NP → DT NN | its ancestor is VP) = P(NP → DT NN).

Probability of a parse tree
Domination: we say N^j dominates from k to l, symbolized as N^j_{k,l}, if w_{k..l} is derived from N^j.
P(tree | sentence) = P(tree | S_{1,l}), where S_{1,l} means that the start symbol S dominates the word sequence w_{1..l}.
P(t | s) approximately equals the joint probability of the constituent non-terminals dominating the sentence fragments (next slide).

Probability of a parse tree (cont.)
For the tree with S_{1,l} → NP_{1,2} VP_{3,l}, NP_{1,2} → DT_{1,1} N_{2,2}, VP_{3,l} → V_{3,3} PP_{4,l}, PP_{4,l} → P_{4,4} NP_{5,l}:
P(t | s) = P(t | S_{1,l})
= P(NP_{1,2}, DT_{1,1}, w_1, N_{2,2}, w_2, VP_{3,l}, V_{3,3}, w_3, PP_{4,l}, P_{4,4}, w_4, NP_{5,l}, w_{5..l} | S_{1,l})
= P(NP_{1,2}, VP_{3,l} | S_{1,l}) * P(DT_{1,1}, N_{2,2} | NP_{1,2}) * P(w_1 | DT_{1,1}) * P(w_2 | N_{2,2}) * P(V_{3,3}, PP_{4,l} | VP_{3,l}) * P(w_3 | V_{3,3}) * P(P_{4,4}, NP_{5,l} | PP_{4,l}) * P(w_4 | P_{4,4}) * P(w_{5..l} | NP_{5,l})
(using the chain rule, context-freeness and ancestor-freeness)

HMM ↔ PCFG
O observed sequence ↔ w_1m sentence
X state sequence ↔ t parse tree
μ model ↔ G grammar
Three fundamental questions

HMM ↔ PCFG
How likely is a certain observation given the model? ↔ How likely is a sentence given the grammar? That is, P(O | μ) ↔ P(w_1m | G).
How to choose a state sequence which best explains the observations? ↔ How to choose a parse which best supports the sentence? That is, argmax_X P(X | O, μ) ↔ argmax_t P(t | w_1m, G).

HMM ↔ PCFG
How to choose the model parameters that best explain the observed data? ↔ How to choose rule probabilities which maximize the probabilities of the observed sentences? That is, argmax_μ P(O | μ) ↔ argmax_G P(w_1m | G).

Recap of HMM

HMM Definition
Set of states S, where |S| = N
Start state S_0 (P(S_0) = 1)
Output alphabet O, where |O| = M
Transition probabilities A = {a_ij} (state i to state j)
Emission probabilities B = {b_j(o_k)} (probability of emitting or absorbing o_k from state j)
Initial state probabilities Π = {p_1, p_2, …, p_N}, where each p_i = P(o_0 = ε, S_i | S_0)

Markov Processes Properties
Limited horizon: given the previous k states, a state i is independent of the earlier states: P(X_t = i | X_{t-1}, X_{t-2}, …, X_0) = P(X_t = i | X_{t-1}, X_{t-2}, …, X_{t-k}) (an order-k Markov process).
Time invariance (shown for k = 1): P(X_t = i | X_{t-1} = j) = P(X_1 = i | X_0 = j) = … = P(X_n = i | X_{n-1} = j).

Three basic problems (contd.)
Problem 1: Likelihood of a sequence – Forward Procedure, Backward Procedure
Problem 2: Best state sequence – Viterbi Algorithm
Problem 3: Re-estimation – Baum-Welch (Forward-Backward Algorithm)

Probabilistic Inference
O: observation sequence; S: state sequence.
Given O, find S* = argmax_S P(S | O); this is called probabilistic inference.
Infer the "hidden" from the "observed".
How is this inference different from logical inference based on propositional or predicate calculus?

Essentials of Hidden Markov Model
1. Markov + Naive Bayes assumptions
2. Uses both transition and observation probabilities
3. Effectively makes the Hidden Markov Model a Finite State Machine (FSM) with probabilities

Probability of Observation Sequence
P(O) = Σ_S P(O, S) = Σ_S P(S) P(O | S), summing over all state sequences S.
Without any restriction, the search space size is |S|^|O|.
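
To make the |S|^|O| blow-up concrete, here is a brute-force sketch that enumerates every state sequence and sums its joint probability with the observations. The convention used (start with pi, emit each o_k from the state occupied at step k) is one common textbook choice and an assumption on my part; the urn walkthrough later in these slides attaches probabilities to arcs in a slightly different way.

```python
from itertools import product

def prob_observation_bruteforce(obs, states, pi, A, B):
    """P(O) = sum over all state sequences S of P(S) * P(O | S).
    pi[s]: initial probability, A[s][t]: transition, B[s][o]: emission from state s.
    Enumerates |states| ** len(obs) sequences -- exponential, for illustration only."""
    total = 0.0
    for seq in product(states, repeat=len(obs)):
        p = pi[seq[0]] * B[seq[0]][obs[0]]
        for k in range(1, len(obs)):
            p *= A[seq[k - 1]][seq[k]] * B[seq[k]][obs[k]]
        total += p
    return total
```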

Continuing with the Urn example (colored ball choosing)
Urn 1: # of Red = 30, # of Green = 50, # of Blue = 20
Urn 2: # of Red = 10, # of Green = 40, # of Blue = 50
Urn 3: # of Red = 60, # of Green = 10, # of Blue = 30

Example (contd.)
Transition probability:
        U1    U2    U3
U1     0.1   0.4   0.5
U2     0.6   0.2   0.2
U3     0.3   0.4   0.3
Observation/output probability:
        R     G     B
U1     0.3   0.5   0.2
U2     0.1   0.4   0.5
U3     0.6   0.1   0.3
Given the observation RRGGBRGR, what is the corresponding state sequence?
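
For later reference, the urn HMM's parameters can be written down directly in code. This is a small sketch; the dictionary layout and variable names are mine, and the numbers are the transition and emission tables above (the emissions follow from the ball counts in each urn).

```python
STATES = ["U1", "U2", "U3"]

# Transition probabilities A[i][j] = P(next urn = j | current urn = i)
A = {
    "U1": {"U1": 0.1, "U2": 0.4, "U3": 0.5},
    "U2": {"U1": 0.6, "U2": 0.2, "U3": 0.2},
    "U3": {"U1": 0.3, "U2": 0.4, "U3": 0.3},
}

# Emission probabilities B[i][c] = P(drawing color c | urn i),
# i.e. the ball counts (30/50/20 etc.) normalised to probabilities.
B = {
    "U1": {"R": 0.3, "G": 0.5, "B": 0.2},
    "U2": {"R": 0.1, "G": 0.4, "B": 0.5},
    "U3": {"R": 0.6, "G": 0.1, "B": 0.3},
}

# Sanity check: every row of A and B must sum to 1.
for table in (A, B):
    for state, row in table.items():
        assert abs(sum(row.values()) - 1.0) < 1e-9, (state, row)
```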

Diagrammatic representation (1/2): a state-transition diagram over U1, U2 and U3 in which each arc carries its transition probability (0.1, 0.4, 0.5, …) and each state is annotated with its emission probabilities (U1: R 0.3, G 0.5, B 0.2; U2: R 0.1, G 0.4, B 0.5; U3: R 0.6, G 0.1, B 0.3).

Diagrammatic representation (2/2): the same diagram with each arc labeled by the combined probability of emitting a color and making the transition, e.g., the U1 → U3 arc carries R 0.15, G 0.25, B 0.10 (= 0.5 × U1's emission probabilities).

Probabilistic FSM: a two-state machine (S1, S2) whose arcs are labeled with output–probability pairs such as (a1 : 0.3), (a2 : 0.4), (a1 : 0.2), (a2 : 0.3). The question here is: "what is the most likely state sequence given the output sequence seen?"

Developing the tree: starting from the Start state, the trellis is expanded one output symbol at a time (a1, then a2, …); the probability of reaching S1 or S2 at each level is the probability of the parent node multiplied by the arc probability (e.g., 1.0 * 0.1, 1.0 * 0.2, …), and we choose the winning sequence per state per iteration.

Tree structure contd.: the trellis is extended in the same way for the full output sequence. The problem being addressed by this tree is to find S* = argmax_S P(S | a1-a2-a1-a2, μ), where a1-a2-a1-a2 is the output sequence and μ the model or the machine.

Viterbi Algorithm for the Urn problem (first two symbols): the trellis starts at S0, the ε step branches to U1, U2 and U3 (0.5, 0.3, 0.2 in the figure), and after the first R each of these again branches to U1, U2 and U3 (the figure shows path probabilities such as 0.03, 0.08, 0.15 and 0.036); * marks the winner sequences retained at each state.
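
A compact Viterbi sketch for this setting, written against the A and B tables defined earlier. The initial distribution pi and the convention that state q emits o_k on arrival are assumptions on my part (the slides attach probabilities to the ε step and to arcs slightly differently), so treat this as an illustration of the algorithm rather than a transcription of the slide's trellis.

```python
def viterbi(obs, states, pi, A, B):
    """Most likely state sequence: argmax_S P(S, O)."""
    # delta[s] = best probability of any path ending in s after the prefix seen so far
    delta = {s: pi[s] * B[s][obs[0]] for s in states}
    backptr = [{}]
    for o in obs[1:]:
        new_delta, ptrs = {}, {}
        for q in states:
            # choose the best predecessor p for state q ("winner sequence per state")
            p_best = max(states, key=lambda p: delta[p] * A[p][q])
            new_delta[q] = delta[p_best] * A[p_best][q] * B[q][o]
            ptrs[q] = p_best
        delta, backptr = new_delta, backptr + [ptrs]
    # follow back-pointers from the best final state
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptrs in reversed(backptr[1:]):
        path.append(ptrs[path[-1]])
    return list(reversed(path)), delta[last]

pi = {"U1": 1/3, "U2": 1/3, "U3": 1/3}   # assumed uniform start; not given in the tables
print(viterbi("RRGGBRGR", STATES, pi, A, B))
```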

Markov process of order > 1 (say 2)
The same theory works:
P(S) · P(O|S) = P(O_0|S_0) · P(S_1|S_0) · [P(O_1|S_1) · P(S_2|S_1 S_0)] · [P(O_2|S_2) · P(S_3|S_2 S_1)] · [P(O_3|S_3) · P(S_4|S_3 S_2)] · [P(O_4|S_4) · P(S_5|S_4 S_3)] · [P(O_5|S_5) · P(S_6|S_5 S_4)] · [P(O_6|S_6) · P(S_7|S_6 S_5)] · [P(O_7|S_7) · P(S_8|S_7 S_6)] · [P(O_8|S_8) · P(S_9|S_8 S_7)]
We introduce the states S_0 and S_9 as initial and final states respectively. After S_8 the next state is S_9 with probability 1, i.e., P(S_9|S_8 S_7) = 1. O_0 is the ε-transition.
Obs:   ε  R  R  G  G  B  R  G  R   (O_0 … O_8)
State: S_0 S_1 S_2 S_3 S_4 S_5 S_6 S_7 S_8 S_9
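
A tiny sketch of how that factorization is scored for a second-order model: the transition table is indexed by the previous two states, exactly as in the product above (and as the next slide's "tuples on rows" adjustment suggests). The parameter names and table layout here are hypothetical placeholders, not values from the lecture.

```python
def joint_prob_order2(obs, states_seq, emit, trans2, start_trans):
    """P(S) * P(O | S) for an order-2 Markov chain.
    states_seq has one more element than obs (S_0 ... S_{m+1}).
    emit[s][o]                    = P(o | s)
    start_trans[s0][s1]           = P(s1 | s0)              (first step, one predecessor)
    trans2[(s_prev2, s_prev1)][s] = P(s | s_prev1, s_prev2) (all later steps)"""
    p = emit[states_seq[0]][obs[0]] * start_trans[states_seq[0]][states_seq[1]]
    for k in range(1, len(obs)):
        p *= emit[states_seq[k]][obs[k]]
        p *= trans2[(states_seq[k - 1], states_seq[k])][states_seq[k + 1]]
    return p
```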

Adjustments
The transition probability table will have tuples of states on its rows and states on its columns.
The output probability table will remain the same.
In the Viterbi tree, the (second-order) Markov process will take effect from the 3rd input symbol (ε R R).
There will be 27 leaves, out of which only 9 will remain: sequences ending in the same tuple are compared, and instead of U1, U2 and U3 we track the tuples U1U1, U1U2, U1U3, U2U1, U2U2, U2U3, U3U1, U3U2, U3U3.

Forward and Backward Probability Calculation

Forward probability F(k,i)
Define F(k,i) = probability of being in state S_i having seen o_0 o_1 o_2 … o_k, i.e., F(k,i) = P(o_0 o_1 o_2 … o_k, S_i).
With m as the length of the observed sequence,
P(observed sequence) = P(o_0 o_1 o_2 … o_m) = Σ_{p=0..N} P(o_0 o_1 o_2 … o_m, S_p) = Σ_{p=0..N} F(m, p)

Forward probability (contd.)
F(k, q) = P(o_0 o_1 … o_k, S_q)
= P(o_0 o_1 … o_{k-1}, o_k, S_q)
= Σ_{p=0..N} P(o_0 o_1 … o_{k-1}, S_p, o_k, S_q)
= Σ_{p=0..N} P(o_0 o_1 … o_{k-1}, S_p) · P(o_k, S_q | o_0 o_1 … o_{k-1}, S_p)
= Σ_{p=0..N} F(k-1, p) · P(o_k, S_q | S_p)
= Σ_{p=0..N} F(k-1, p) · P(S_p →_{o_k} S_q)
(Picture: observations O_0 O_1 … O_k O_{k+1} … O_m aligned with states S_0 S_1 … S_p S_q … S_final.)
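
A short sketch of this recurrence in code. It is written in terms of the combined arc probability P(S_p →_{o_k} S_q), supplied as a function argument, so any factorization into transition and emission can be plugged in (e.g., arc = lambda p, q, o: A[p][q] * B[q][o] with the urn tables defined earlier; that particular split is my assumption, not the slide's).

```python
def forward(obs, states, init, arc):
    """F[k][q] = P(o_0 ... o_k, S_q), computed left to right.
    init[q]      : probability of starting in state q (the epsilon step, o_0)
    arc(p, q, o) : combined probability of moving from p to q while observing o."""
    F = [{q: init[q] for q in states}]          # F(0, q); o_0 is the epsilon step
    for o in obs:                               # obs = real observations o_1 ... o_m
        prev = F[-1]
        F.append({q: sum(prev[p] * arc(p, q, o) for p in states) for q in states})
    # P(observed sequence) = sum_q F(m, q)
    return F, sum(F[-1].values())
```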

Backward probability B(k,i)
Define B(k,i) = probability of seeing o_k o_{k+1} o_{k+2} … o_m given that the state was S_i, i.e., B(k,i) = P(o_k o_{k+1} o_{k+2} … o_m | S_i).
With m as the length of the observed sequence,
P(observed sequence) = P(o_0 o_1 o_2 … o_m) = P(o_0 o_1 o_2 … o_m | S_0) = B(0,0)

Backward probability (contd.)
B(k, p) = P(o_k o_{k+1} … o_m | S_p)
= P(o_{k+1} o_{k+2} … o_m, o_k | S_p)
= Σ_{q=0..N} P(o_{k+1} o_{k+2} … o_m, o_k, S_q | S_p)
= Σ_{q=0..N} P(o_k, S_q | S_p) · P(o_{k+1} o_{k+2} … o_m | o_k, S_q, S_p)
= Σ_{q=0..N} P(o_{k+1} o_{k+2} … o_m | S_q) · P(o_k, S_q | S_p)
= Σ_{q=0..N} B(k+1, q) · P(S_p →_{o_k} S_q)
(Picture: observations O_0 O_1 … O_k O_{k+1} … O_m aligned with states S_0 S_1 … S_p S_q … S_final.)
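
The mirror-image sketch of the recurrence above, parameterized by the same combined arc probability. The base case (1.0 for every state once nothing is left to generate) is the usual convention and is my addition; the slides only state the recurrence.

```python
def backward(obs, states, arc):
    """bwd[k][p] = P(o_k ... o_m | S_p) over the supplied observations, right to left.
    arc(p, q, o): combined probability of moving from p to q while observing o."""
    bwd = [{p: 1.0 for p in states}]            # base case: nothing left to generate
    for o in reversed(obs):
        nxt = bwd[0]
        bwd.insert(0, {p: sum(arc(p, q, o) * nxt[q] for q in states) for p in states})
    # combine bwd[0] with the initial/epsilon step, as in the slides, to get P(observed sequence)
    return bwd
```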

Back to PCFG

Interesting Probabilities
For "The gunman sprayed the building with bullets", consider an NP node over "the building" in a tree rooted at N^1:
What is the probability of having an NP at this position such that it will derive "the building"? – Inside probability.
What is the probability of starting from N^1 and deriving "The gunman sprayed", an NP, and "with bullets"? – Outside probability.

Interesting Probabilities
Random variables to be considered:
The non-terminal being expanded, e.g., NP.
The word-span covered by the non-terminal, e.g., (4,5) refers to the words "the building".
While calculating probabilities, consider:
The rule to be used for expansion, e.g., NP → DT NN.
The probabilities associated with the RHS non-terminals, e.g., the DT subtree's and the NN subtree's inside/outside probabilities.

Outside Probability
α_j(p,q): the probability of beginning with N^1 and generating the non-terminal N^j_{pq} and all words outside w_p … w_q:
α_j(p,q) = P(w_{1(p-1)}, N^j_{pq}, w_{(q+1)m} | G)
(Picture: w_1 … w_{p-1} [N^j over w_p … w_q] w_{q+1} … w_m under the root N^1.)

Inside Probabilities
β_j(p,q): the probability of generating the words w_p … w_q starting with the non-terminal N^j_{pq}:
β_j(p,q) = P(w_{pq} | N^j_{pq}, G)
(Picture: the subtree rooted at N^j spanning w_p … w_q, with everything outside it covered by α.)

Outside & Inside Probabilities: example
For "The gunman sprayed the building with bullets" with an NP over words 4–5 ("the building"):
β_NP(4,5) = P(the building | NP_{4,5}, G)
α_NP(4,5) = P(The gunman sprayed, NP_{4,5}, with bullets | G)

Calculating Inside probabilities β_j(p,q)
Base case: β_j(k,k) = P(w_k | N^j_{kk}, G) = P(N^j → w_k). The base case is used for rules which derive the words or terminals directly. E.g., suppose N^j = NN is being considered and NN → building is one of the rules with probability 0.5; then β_NN(5,5) = P(building | NN_{55}, G) = P(NN → building) = 0.5.

Induction Step (assuming the grammar is in Chomsky Normal Form)
β_j(p,q) = Σ_{r,s} Σ_{d=p..q-1} P(N^j → N^r N^s) · β_r(p,d) · β_s(d+1,q)
Consider different splits of the words, indicated by d (e.g., for "the huge building", split at d = 2 or d = 3).
Consider different non-terminals to be used in the rule: NP → DT NN, NP → DT NNS are available options.
Consider the summation over all these.
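
A sketch of this bottom-up computation in code (a CKY-style fill of the parse triangle). It is written for the rule format used in the earlier RULES snippet; handling the single unary rule NP → NNS with an extra pass over the diagonal is my own simplification so the toy grammar can be used as-is, not something the slides spell out.

```python
def inside_probs(words, rules):
    """beta[(j, p, q)] = P(w_p ... w_q | N^j_pq, G), with 1-based word positions."""
    n = len(words)
    beta = {}
    lexical = [(lhs, rhs[0], p) for (lhs, rhs), p in rules.items()
               if len(rhs) == 1 and rhs[0].islower()]          # e.g. NN -> building
    unary   = [(lhs, rhs[0], p) for (lhs, rhs), p in rules.items()
               if len(rhs) == 1 and not rhs[0].islower()]       # e.g. NP -> NNS
    binary  = [(lhs, rhs, p) for (lhs, rhs), p in rules.items() if len(rhs) == 2]

    # Base case: beta_j(k, k) = P(N^j -> w_k)
    for k, w in enumerate(words, start=1):
        for lhs, term, p in lexical:
            if term == w:
                beta[(lhs, k, k)] = beta.get((lhs, k, k), 0.0) + p
        for lhs, child, p in unary:                             # one extra unary pass
            if (child, k, k) in beta:
                beta[(lhs, k, k)] = beta.get((lhs, k, k), 0.0) + p * beta[(child, k, k)]

    # Induction: beta_j(p,q) = sum_{r,s} sum_d P(N^j -> N^r N^s) beta_r(p,d) beta_s(d+1,q)
    for span in range(2, n + 1):
        for p in range(1, n - span + 2):
            q = p + span - 1
            for lhs, (r, s), prob in binary:
                total = sum(beta.get((r, p, d), 0.0) * beta.get((s, d + 1, q), 0.0)
                            for d in range(p, q))
                if total:
                    beta[(lhs, p, q)] = beta.get((lhs, p, q), 0.0) + prob * total
    return beta
```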

The Bottom-Up Approach
The idea of induction: consider "the gunman".
Base cases: apply the lexical rules DT → the (Prob = 1.0) and NN → gunman (Prob = 0.5).
Induction: Prob that an NP covers these 2 words = P(NP → DT NN) * P(DT deriving the word "the") * P(NN deriving the word "gunman") = 0.5 * 1.0 * 0.5 = 0.25.

Parse Triangle
A parse triangle is constructed for calculating β_j(p,q).
Probability of a sentence using β_j(p,q): P(w_1m) = P(w_1m | N^1_{1m}, G) = β_1(1,m).

Parse Triangle
Words: The (1), gunman (2), sprayed (3), the (4), building (5), with (6), bullets (7).
Fill the diagonals with β_j(k,k) = P(N^j → w_k): β_DT(1,1) = 1.0, β_NN(2,2) = 0.5, β_VBD(3,3) = 1.0, β_DT(4,4) = 1.0, β_NN(5,5) = 0.5, β_P(6,6) = 1.0, β_NNS(7,7) = 1.0.

Parse Triangle
Words: The (1), gunman (2), sprayed (3), the (4), building (5), with (6), bullets (7).
Calculate the remaining cells using the induction formula, e.g., β_NP(4,5) = P(the building | NP_{45}, G) = P(NP → DT NN) * β_DT(4,4) * β_NN(5,5) = 0.5 * 1.0 * 0.5 = 0.25.
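
Running the inside computation over the example sentence fills exactly this triangle. This usage snippet assumes the RULES table and the inside_probs sketch defined earlier; the words are case-folded to match the lexical rules.

```python
sentence = "the gunman sprayed the building with bullets".split()
beta = inside_probs(sentence, RULES)

print(beta[("NP", 1, 2)])   # 0.25  -> "The gunman"
print(beta[("NP", 4, 5)])   # 0.25  -> "the building"
print(beta[("VP", 3, 7)])   # 0.024 = 0.018 (VP -> VP PP) + 0.006 (VP -> VBD NP)
print(beta[("S", 1, 7)])    # 0.006 = P(t1) + P(t2) = 0.0045 + 0.0015
```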

Example Parse t1
The gunman sprayed the building with bullets.
Same tree as before: [S_1.0 [NP_0.5 [DT_1.0 The] [NN_0.5 gunman]] [VP_0.6 [VP_0.4 [VBD_1.0 sprayed] [NP_0.5 [DT_1.0 the] [NN_0.5 building]]] [PP_1.0 [P_1.0 with] [NP_0.3 [NNS_1.0 bullets]]]]]. The rule used here (at the top of the VP) is VP → VP PP.

Another Parse t2
The gunman sprayed the building with bullets.
Same tree as before: [S_1.0 [NP_0.5 [DT_1.0 The] [NN_0.5 gunman]] [VP_0.4 [VBD_1.0 sprayed] [NP_0.2 [NP_0.5 [DT_1.0 the] [NN_0.5 building]] [PP_1.0 [P_1.0 with] [NP_0.3 [NNS_1.0 bullets]]]]]]. The rule used here (at the top of the VP) is VP → VBD NP.

Parse Triangle
The (1), gunman (2), sprayed (3), the (4), building (5), with (6), bullets (7): the remaining cells of the triangle are filled in the same way, giving the inside probabilities of all constituents over these spans.

Different Parses
Consider different splitting points, e.g., the 5th and the 3rd position.
Use different rules for VP expansion, e.g., VP → VP PP, VP → VBD NP.
Different parses for the VP "sprayed the building with bullets" can be constructed this way.

Outside Probabilities α_j(p,q)
Base case: α_1(1,m) = 1 for the start symbol spanning the whole sentence, and α_j(1,m) = 0 for j ≠ 1.
Inductive step (summation over f, g and e): when N^j_{pq} is the left child of some N^f_{pe} with right sibling N^g_{(q+1)e}, it contributes α_f(p,e) · P(N^f → N^j N^g) · β_g(q+1,e); summing over all choices of f, g and e (together with the symmetric case in which N^j is the right child) gives α_j(p,q).
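
A sketch of the outside computation, phrased as a top-down pass that pushes each parent's outside mass to its two children (this covers both the left-child and right-child cases of the summation). It reuses the inside table and rule format from the earlier snippets; unary rules are ignored here for brevity, which is my simplification.

```python
def outside_probs(n, start, beta, rules):
    """alpha[(j, p, q)] = P(w_1..p-1, N^j_pq, w_q+1..m | G) for binary rules only."""
    binary = [(lhs, rhs, p) for (lhs, rhs), p in rules.items() if len(rhs) == 2]
    alpha = {(start, 1, n): 1.0}                       # base case: alpha_1(1, m) = 1
    for span in range(n, 1, -1):                       # largest spans first
        for p in range(1, n - span + 2):
            q = p + span - 1
            for lhs, (r, s), prob in binary:
                a = alpha.get((lhs, p, q), 0.0)
                if a == 0.0:
                    continue
                for d in range(p, q):                  # split the parent span at d
                    br = beta.get((r, p, d), 0.0)
                    bs = beta.get((s, d + 1, q), 0.0)
                    if bs:   # left child gets parent's outside * rule prob * right sibling's inside
                        alpha[(r, p, d)] = alpha.get((r, p, d), 0.0) + a * prob * bs
                    if br:   # right child gets parent's outside * rule prob * left sibling's inside
                        alpha[(s, d + 1, q)] = alpha.get((s, d + 1, q), 0.0) + a * prob * br
    return alpha

# Joint probability of the sentence and an N^j constituent over (p, q):
#   P(w_1m, N_pq) = sum_j alpha_j(p, q) * beta_j(p, q)
alpha = outside_probs(7, "S", beta, RULES)
print(alpha[("NP", 4, 5)] * beta[("NP", 4, 5)])   # 0.006: every parse contains this NP
```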

Probability of a Sentence
The joint probability of a sentence w_1m and of there being a constituent spanning words w_p to w_q is given as
P(w_1m, N_pq) = Σ_j α_j(p,q) · β_j(p,q)
e.g., for "The gunman sprayed the building with bullets" with an NP over "the building" (words 4–5).