General Information on Context-free and Probabilistic Context-free Grammars İbrahim Hoça CENG784, Fall 2013
Outline
- Basics of Context-free Grammars (CFGs)
- Tree Structure
- Convenience of Tree Structures
- Natural Language Examples
- Probabilistic Context-free Grammars (PCFGs)
- PCFG Rules
- Computing the Probabilities in PCFGs
- Some Aspects of PCFGs
Basics of Context-free Grammars
A CFG is a quadruple (V, Σ, P, S) where
- V is a finite set of variables,
- Σ (the alphabet) is a finite set of terminal symbols,
- P is a finite set of rules, and
- S is a distinguished element of V called the start symbol.
Basics of Context-free Grammars
- A rule is an element of the set V × (V ∪ Σ)*.
- The rule [A, w] is written as A → w.
- Lambda (null) rules are also possible: A → λ.
- The rules are written using the shorthand A → u | v to abbreviate A → u and A → v, the vertical bar meaning ‘or’.
Sample Derivations
G = (V, Σ, P, S)
V = {S, A}
Σ = {a, b}
P: S → AA
   A → AAA | bA | Ab | a
S => AA => aA => aAAA => abAAA => abaAA => ababAA => ababaA => ababaa (a)
S => AA => AAAA => aAAA => abAAA => abaAA => ababAA => ababaA => ababaa (b)
S => AA => Aa => AAAa => AAbAa => AAbaa => AbAbaa => Ababaa => ababaa (c)
S => AA => aA => aAAA => aAAa => abAAa => abAbAa => ababAa => ababaa (d)
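A minimal sketch of this grammar in code, assuming the NLTK toolkit (the slides do not name any software); terminals are quoted as NLTK requires, and the parser is used only to confirm that ‘ababaa’ is derivable:

import nltk

# The grammar G from the slide: S -> AA, A -> AAA | bA | Ab | a
grammar = nltk.CFG.fromstring("""
S -> A A
A -> A A A | 'b' A | A 'b' | 'a'
""")

parser = nltk.ChartParser(grammar)
trees = list(parser.parse(list("ababaa")))  # the tokens are the single letters

print(len(trees) > 0)  # True: 'ababaa' is in the language generated by G
print(trees[0])        # one parse tree, corresponding to one of the derivations above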
Tree Structure Trees corresponding to the derivations in the previous slide.
Implementing CFG on Natural Language
Let’s consider the following sentence: ‘nice dogs like cats’
Rules:
S → NP VP
NP → Adj N
NP → N
VP → V NP
N → dogs | cats
V → like
Adj → nice
Tree: the parse tree of ‘nice dogs like cats’ under these rules (see the sketch below).
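The same rules can be written down and parsed mechanically; a minimal sketch, again assuming NLTK:

import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Adj N | N
VP -> V NP
N -> 'dogs' | 'cats'
V -> 'like'
Adj -> 'nice'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("nice dogs like cats".split()):
    tree.pretty_print()  # ASCII drawing of the parse tree for the sentence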
Convenience of Tree Structures
- Natural languages have a recursive structure.
- Tree structures, and hence CFGs, allow us to extend the context according to the properties of the relevant head nodes rather than limiting it to an arbitrary number of adjacent words.
Convenience of Tree Structures
Consider the verb agreement in the following construction: ‘Velocity of the seismic waves rises to …’
bigram: P(rises|waves)
trigram: P(rises|seismic waves)
quadrigram: P(rises|the seismic waves)
Convenience of Tree Structures
- The verb ‘rises’ agrees with a singular noun, which is ‘velocity’ in this case, rather than with the adjacent plural ‘waves’.
- A CFG allows us to capture this relationship between non-adjacent words.
Probabilistic Context-free Grammars (PCFG)
- A PCFG is simply a CFG with probabilities added to the rules, indicating how likely different rewritings are.
PCFG
A PCFG G consists of:
- A set of terminals: {w_k}, k = 1, …, V
- A set of non-terminals: {N_i}, i = 1, …, n
- A designated start symbol: N_1
- A set of rules: {N_i → ζ_j}, where ζ_j is a sequence of terminals and non-terminals
- A corresponding set of probabilities on the rules such that, for every i: Σ_j P(N_i → ζ_j) = 1
PCFG Rules
S → NP VP          1.0
NP → NP PP         0.4
PP → P NP          1.0
VP → V NP          0.7
VP → VP PP         0.3
NP → astronomers   0.1
NP → ears          0.18
NP → saw           0.04
NP → stars         0.18
NP → telescopes    0.1
V → saw            1.0
P → with           1.0
Note that the rules are chosen to comply with the Chomsky Normal Form, which allows only rules of the form
A → B C
A → w
where A, B, and C are non-terminals and w is a terminal symbol (a λ rule is permitted only for the start symbol in some formulations).
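These rules can be encoded directly; the sketch below again assumes NLTK, whose PCFG format puts each rule’s probability in brackets and checks that the probabilities for every non-terminal sum to 1. The example sentence implied by the lexical rules and by the later slides appears to be ‘astronomers saw stars with ears’.

import nltk

pcfg = nltk.PCFG.fromstring("""
S  -> NP VP          [1.0]
PP -> P NP           [1.0]
VP -> V NP           [0.7]
VP -> VP PP          [0.3]
NP -> NP PP          [0.4]
NP -> 'astronomers'  [0.1]
NP -> 'ears'         [0.18]
NP -> 'saw'          [0.04]
NP -> 'stars'        [0.18]
NP -> 'telescopes'   [0.1]
V  -> 'saw'          [1.0]
P  -> 'with'         [1.0]
""")

parser = nltk.ViterbiParser(pcfg)
# sentence implied by the lexical rules above
for tree in parser.parse("astronomers saw stars with ears".split()):
    print(tree)  # the most probable parse, annotated with its probability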
Computing the Probabilities in PCFG
P(w_1m) = Σ_t P(w_1m, t) = Σ_{t: yield(t) = w_1m} P(t)
where t is a parse tree and w_1m is the sentence from w_1 to w_m.
- This formula gives us the total probability of a sentence.
- The probability of each tree is found by multiplying the probabilities of the rules that created the tree.
Computing the Probabilities in PCFG P(t_1) = 1.0 × 0.1 × 0.7 × 1.0 × 0.4 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0009072
Computing the Probabilities in PCFG P(t_2) = 1.0 × 0.1 × 0.3 × 0.7 × 1.0 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0006804
Computing the Probabilities in PCFG P(w_15) = P(t_1) + P(t_2) = 0.0009072 + 0.0006804 = 0.0015876
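Here t_1 is the parse that attaches the PP to the object NP (using NP → NP PP, 0.4) and t_2 the parse that attaches the PP to the VP (using VP → VP PP, 0.3). The arithmetic can be checked directly:

# Verifying the two tree probabilities and the sentence probability
# (exact floating-point output may show trailing digits, e.g. 0.000907200000...)
p_t1 = 1.0 * 0.1 * 0.7 * 1.0 * 0.4 * 0.18 * 1.0 * 1.0 * 0.18
p_t2 = 1.0 * 0.1 * 0.3 * 0.7 * 1.0 * 0.18 * 1.0 * 1.0 * 0.18
print(round(p_t1, 7))         # 0.0009072
print(round(p_t2, 7))         # 0.0006804
print(round(p_t1 + p_t2, 7))  # 0.0015876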
Some Aspects of PCFG
+ As grammars expand to cover a large and diverse corpus of text, they become increasingly ambiguous. A PCFG gives some idea of the plausibility of different parses.
- Nevertheless, a PCFG on its own does not give a very good idea of plausibility, since its probability estimates are based purely on structural factors and do not include lexical co-occurrence.
Some Aspects of PCFG
+ Real text tends to contain grammatical mistakes, disfluencies, and errors. A PCFG can deal with this to some extent by ruling nothing out entirely and instead assigning implausible sentences a low probability.
- In a PCFG, the probability of a smaller tree is greater than that of a larger tree, so a PCFG gives too much of the probability mass to very short sentences. For instance, the most frequent length for Wall Street Journal sentences is around 23 words.