1 CSA3202: Human Language Technology – Probabilistic Phrase Structure Grammars (PCFGs) (December 2011)

2 Handling Ambiguities
The Earley algorithm is equipped to represent ambiguities efficiently, but not to resolve them. Methods available for resolving ambiguities include:
– Semantics: choose the parse that makes sense.
– Statistics: choose the parse that is most likely.
Probabilistic context-free grammars (PCFGs) offer a statistical solution.

3 PCFG
A PCFG is a 5-tuple (NT, T, P, S, D) where D is a function that assigns a probability to each rule p ∈ P. A PCFG augments each rule with a conditional probability:
A → β [p]
Formally, p is the probability of a given expansion given the LHS non-terminal, i.e. P(A → β | A).
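
A minimal sketch of how such a grammar might be represented, assuming a simple Python encoding (the dict-of-lists format and the toy rule probabilities are illustrative, not from the slides):

```python
# A PCFG stored as a map from each LHS non-terminal to its weighted
# expansions: lhs -> list of (rhs_tuple, probability). Toy values only.
pcfg = {
    "S":  [(("NP", "VP"), 0.80), (("Aux", "NP", "VP"), 0.15), (("VP",), 0.05)],
    "NP": [(("Det", "Noun"), 0.60), (("Pronoun",), 0.40)],
}

def check_normalised(grammar, tol=1e-9):
    """D must give each non-terminal's expansions probabilities summing to 1."""
    for lhs, expansions in grammar.items():
        total = sum(p for _, p in expansions)
        if abs(total - 1.0) > tol:
            raise ValueError(f"Expansions of {lhs} sum to {total}, not 1")

check_normalised(pcfg)  # raises if D is not a proper conditional distribution
```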

4 Example PCFG

5 A → β [p]
p expresses the probability that the non-terminal A is expanded as β, e.g. for alternative expansions
A → β1
A → β2
A → β3

6 Example PCFG Fragment
S → NP VP [.80]
S → Aux NP VP [.15]
S → VP [.05]
The conditional probabilities for a given A ∈ NT sum to 1. A PCFG can be used to estimate the probability of each parse tree for a sentence S.

7 Probability of a Parse Tree
For a sentence S, the probability assigned by a PCFG to a parse tree T is
P(T, S) = ∏ P(r(n)) over all nodes n in T
i.e. the product of the probabilities of all the rules r(n) used to expand each node n in T.
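
The product can be computed by a simple recursion over the tree. A sketch, assuming trees are nested tuples (label, child, ...) with words as bare strings, and rule_prob maps (LHS, RHS-tuple) pairs to probabilities (both conventions are assumptions for illustration):

```python
def tree_prob(tree, rule_prob):
    """Probability of a parse tree as the product of its rule probabilities."""
    label, *children = tree
    if isinstance(children[0], str):              # lexical rule A -> w
        return rule_prob[(label, (children[0],))]
    rhs = tuple(child[0] for child in children)   # rule A -> B C ...
    p = rule_prob[(label, rhs)]
    for child in children:                        # multiply in each subtree
        p *= tree_prob(child, rule_prob)
    return p
```

In practice these products underflow quickly for long sentences, so real implementations sum log probabilities instead.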

8 Two Parses for "book the dinner flight"

9 "Book the dinner flight"
P(T1) = 2.2 × 10^-6
P(T2) = 6.1 × 10^-7

10 Ambiguous Sentence
For an ambiguous sentence S, P(S) is the sum of the probabilities of each reading, i.e.
P(S_a) = 1.5 × 10^-6
P(S_b) = 1.7 × 10^-6
P(S) = 3.2 × 10^-6

11 The Parsing Problem for PCFGs
The parsing problem for PCFGs is to produce the most likely parse for a given sentence, i.e. to compute the tree T spanning the sentence whose probability is maximal. The CKY algorithm assumes that the grammar is in Chomsky Normal Form (CNF):
– No ε-productions
– Rules of the form A → B C or A → a
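
A small check of these CNF conditions, assuming the grammar encoding from the earlier sketch:

```python
def is_cnf(grammar):
    """True iff every rule is A -> B C (two non-terminals) or A -> a
    (a single terminal), with no empty right-hand sides."""
    nonterminals = set(grammar)
    for rules in grammar.values():
        for rhs, _prob in rules:
            binary = len(rhs) == 2 and all(s in nonterminals for s in rhs)
            lexical = len(rhs) == 1 and rhs[0] not in nonterminals
            if not (binary or lexical):
                return False
    return True
```

Note that the example fragment sketched earlier is not in CNF: S → VP is a unit production and S → Aux NP VP is ternary, so both would have to be transformed away first.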

12 CKY Algorithm – Base Case
Base case: covering input strings of length 1 (i.e. individual words). In CNF, the probability p must come from that of the corresponding lexical rule A → w [p].

13 CKY Algorithm – Recursive Case
Recursive case: input strings of length > 1. A ⇒* w_ij if and only if there is a rule A → B C and some k with i < k < j such that B derives w_ik and C derives w_kj. In this case P(w_ij) is obtained by multiplying together P(w_ik), P(w_kj), and P(A → BC). These probabilities are found in other parts of the table; take the maximum value over the alternatives.

14 Probabilistic CKY Algorithm for a Sentence of Length n
for k := 1 to n do
    π[k-1, k, A] := P(A → w_k)
    for i := k-2 downto 0 do
        for j := i+1 to k-1 do
            π[i, k, A] := max over each A → BC ∈ G of [ π[i, j, B] × π[j, k, C] × P(A → BC) ]
return π[0, n, S]
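
A runnable sketch of the same algorithm, assuming the rule tables are given as a word → [(A, p)] map for lexical rules and a list of (A, B, C, p) tuples for binary rules (both formats are assumptions of this sketch):

```python
from collections import defaultdict

def pcky(words, lex_rules, bin_rules, start="S"):
    """Viterbi probability of the best parse under a CNF PCFG.
    Returns 0.0 if no parse rooted in `start` spans the sentence."""
    n = len(words)
    pi = defaultdict(dict)                 # pi[(i, k)][A] = best prob for span
    for k in range(1, n + 1):
        for A, p in lex_rules.get(words[k - 1], []):
            pi[(k - 1, k)][A] = p          # base case: A -> w_k
        for i in range(k - 2, -1, -1):     # widen the spans ending at k
            for j in range(i + 1, k):      # split point between i and k
                for A, B, C, p in bin_rules:
                    cand = (pi[(i, j)].get(B, 0.0)
                            * pi[(j, k)].get(C, 0.0) * p)
                    if cand > pi[(i, k)].get(A, 0.0):
                        pi[(i, k)][A] = cand
    return pi[(0, n)].get(start, 0.0)      # best probability for S over 0..n
```

A full parser would also store backpointers (the j, B, C achieving each maximum) so that the best tree itself, not just its probability, can be recovered.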

15 (Diagram: the span w_{i+1} … w_k is split at position j into a left part derived by B and a right part derived by C.)

16 Probabilistic CKY Table

17 Problems with PCFGs
Fundamental independence assumptions: a PCFG assumes that the expansion of any one non-terminal is independent of the expansion of any other non-terminal. This is why rule probabilities can simply be multiplied together.

18 Fundamental Independence Assumptions
P(N_{k,l} → γ | anything above N_{k,l} in the tree) = P(N_{k,l} → γ)
P(N_{k,l} → γ | anything outside words k through l) = P(N_{k,l} → γ)
(Diagram: node N_{k,l} dominating the words w_k … w_l.)

19 Evidence Against the Independence Assumptions
By examination of text corpora, it has been shown (Kuno 1972) that there is a strong tendency (c. 2:1) in English and other languages for the subject of a sentence to be a pronoun:
– "She's able to take her baby to work" versus
– "Joanna worked until she had a family"
whilst the object tends to be a non-pronoun:
– "All the people signed the confessions"
– "Some laws prohibit it"

20 Dependencies Cannot Be Stated
These dependencies could be captured if it were possible to say that the probabilities associated with, e.g.,
NP → Pronoun or NP → Det Noun
depend on whether the NP is a subject or an object. However, this cannot normally be expressed in a standard PCFG.

21 Lack of Sensitivity to the Properties of Individual Words
Lexical information can play an important role in selecting between alternative interpretations. Consider the sentence "Moscow sent soldiers into Afghanistan." The rules
NP → NP PP
VP → VP PP
give rise to 2 parse trees.

22 PP Attachment Ambiguity
(Diagram: two parse trees for "Moscow sent soldiers into Afghanistan", one attaching the PP "into Afghanistan" to the NP "soldiers", the other attaching it to the VP.)
Corpus statistics: 67% of PPs attach to NPs; 33% of PPs attach to VPs.

23 Lexical Properties
In this case the raw attachment statistics are misleading and yield the wrong conclusion. The correct parse should be decided on the basis of the lexical properties of the verb "send into" alone, since we know that the basic pattern for this verb is
(NP) send (NP) (PP_into)
where the PP_into attaches to the VP.

24 Lexicalised PCFGs
Basic idea: each syntactic constituent is associated with a head, which is a single word. Each non-terminal in a parse tree is annotated with that word.

25 Lexicalised Tree

26 Generating Lexicalised Parse Trees
To generate such a tree, each rule must identify exactly one right-hand-side constituent to be the head daughter. The headword of a node is then inherited from the headword of its head daughter. In the case of a lexical item, the head is clearly the word itself (though the word might undergo minor inflectional modification). A sketch of this bottom-up propagation follows.
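
A sketch of the propagation, assuming the tuple tree encoding used earlier and a caller-supplied head_child function that picks the head daughter (the toy head rules below are assumptions, not a real head-rule table):

```python
def lexicalise(tree, head_child):
    """Annotate every node of `tree` with its headword, bottom-up.
    head_child(parent_label, daughter_labels) returns the index of the
    head daughter. Returns (lexicalised_tree, headword)."""
    label, *children = tree
    if isinstance(children[0], str):          # preterminal: head = the word
        word = children[0]
        return (f"{label}({word})", word), word
    results = [lexicalise(c, head_child) for c in children]
    idx = head_child(label, [c[0] for c in children])
    head = results[idx][1]                    # inherit head daughter's word
    return (f"{label}({head})", *(t for t, _ in results)), head

# Toy head rules (assumed): N heads NP, V heads VP, VP heads S.
heads = {"NP": "N", "VP": "V", "S": "VP"}
def head_child(parent, kids):
    target = heads.get(parent)
    return kids.index(target) if target in kids else 0

tree = ("S", ("NP", ("N", "Moscow")),
             ("VP", ("V", "sent"), ("NP", ("N", "soldiers"))))
lex_tree, head = lexicalise(tree, head_child)   # head == "sent"; root is "S(sent)"
```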

27 Finding the Head Constituent
In some cases this is very easy, e.g.
NP[N] → Det N
VP[V] → V NP
In other cases it isn't, e.g. "water meter adjustment lever". Modern linguistic theories include a component that defines what heads are.

28 Collins (1999)
To find the head of an NP:
– If it is tagged "possessive", then return the last word (e.g. "John's mother's car").
– Else search left to right for the first child that is a lexical noun.
– Else search left to right for the first NP.
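
A direct transcription of these three steps, assuming Penn Treebank style daughter labels (POS for the possessive marker, NN/NNS/NNP/NNPS for lexical nouns; the tag names and fallback are assumptions, since the slide gives only the rules):

```python
def np_head_index(daughters):
    """Index of the head daughter of an NP, per the slide's simplified
    Collins rules. `daughters` is the list of daughter labels, left to right."""
    if daughters[-1] == "POS":                  # possessive: head is last word
        return len(daughters) - 1
    nouns = {"NN", "NNS", "NNP", "NNPS"}
    for i, label in enumerate(daughters):       # first lexical noun
        if label in nouns:
            return i
    for i, label in enumerate(daughters):       # else first NP
        if label == "NP":
            return i
    return len(daughters) - 1                   # fallback (assumption)

np_head_index(["DT", "NN"])    # -> 1: "the dog" is headed by the noun
np_head_index(["NP", "POS"])   # -> 1: possessive marker is the last word
```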

29 Discovery of Probabilities – Normal Rules
Use a corpus of already-parsed sentences, e.g. the Penn Treebank (Marcus et al. 1993), which provides parse trees for the 1M-word Brown Corpus.
P(α → β | α) = C(α → β) / Σ_γ C(α → γ) = C(α → β) / C(α)
Alternatively, parse a corpus and take the statistics; this has to account for ambiguity.
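
A sketch of this relative-frequency estimation over a treebank, assuming the tuple tree format from the earlier sketches and an already-disambiguated corpus:

```python
from collections import Counter

def estimate_rule_probs(treebank):
    """P(α → β | α) = C(α → β) / C(α), counted over a corpus of parse
    trees in the (label, child, ...) format used above."""
    rule_counts = Counter()
    lhs_counts = Counter()
    def visit(tree):
        label, *children = tree
        if isinstance(children[0], str):
            rhs = (children[0],)               # lexical rule α -> w
        else:
            rhs = tuple(c[0] for c in children)
            for c in children:
                visit(c)
        rule_counts[(label, rhs)] += 1
        lhs_counts[label] += 1
    for tree in treebank:
        visit(tree)
    return {rule: count / lhs_counts[rule[0]]
            for rule, count in rule_counts.items()}
```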

30 Discovery of Probabilities – Lexicalised Rules
We need to establish the probabilities of rules such as
VP(dumped) → V(dumped) NP(sacks) PP(into)
VP(dumped) → V(dumped) NP(cats) PP(into)
VP(dumped) → V(dumped) NP(hats) PP(into)
VP(dumped) → V(dumped) NP(sacks) PP(above)
Problem: no corpus is big enough to train with this number of rules.

