Statistical NLP, Winter 2009
Lecture 7: Grammar formalisms: the tools of mathematical linguistics
(Weighted) finite-state automata and (weighted) context-free grammars
Roger Levy
Language structure so far
So far in class, we haven’t dealt much with structured representations of language:
- A document consists of a sequence of sentences
- A sentence consists of a sequence of words
We haven’t looked at anything in between, or farther down. But there’s lots more structure in language!
- Words are comprised of morphemes
- Words are grouped into syntactic categories (parts of speech)
- Words combine into phrases
Today we’ll talk about formal means for describing and computing these structures.
Regular expressions
- You’ve almost certainly worked with grep before
- grep takes a regular expression
- Regular expressions can be quite rich
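As a concrete illustration, here is a minimal sketch using Python’s standard re module (the pattern and the test words are mine, chosen only for illustration):

    import re

    # A regular expression matching "color" or "colour"
    pattern = re.compile(r"colou?r")

    for word in ["color", "colour", "colors", "colr"]:
        # fullmatch requires the entire string to match the pattern
        print(word, bool(pattern.fullmatch(word)))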
Finite state automata (FSAs)
A Finite State Automaton (FSA) is defined as:
- A finite set Q of states q0…qN, with q0 the start state
- A finite input alphabet Σ of symbols
- A set of final states F ⊆ Q
- A transition function δ(q,i) mapping from Q×Σ to Q
An FSA accepts a string s if recursive application of δ leads to a final state. FSAs are most accessibly represented in a graphical format.
Example: Q={q0,q1}, Σ={a,b}, F={q1}, δ={(q0,a)→q1, (q1,b)→q1}
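A minimal Python sketch of this definition, using the example automaton above (the dictionary encoding of δ is an implementation choice, not from the lecture):

    # The example FSA: accepts "a" followed by any number of "b"s
    delta = {("q0", "a"): "q1", ("q1", "b"): "q1"}
    start, final = "q0", {"q1"}

    def accepts(s):
        """Apply delta symbol by symbol; accept iff we end in a final state."""
        state = start
        for symbol in s:
            if (state, symbol) not in delta:
                return False           # no transition defined: reject
            state = delta[(state, symbol)]
        return state in final

    print(accepts("abb"))   # True
    print(accepts("ba"))    # False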
Regular expressions and FSAs
For every regular expression R, there is an FSA that accepts exactly the strings R matches, and vice versa.
Example: ([sp]en(d|ds|ding|t))|(ship(s|ped|ping)?)
However, in general there are many FSAs and regexes accepting the same set of strings.
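A quick check of the example pattern with Python’s re module (the test words are mine):

    import re

    # The slide's example: matches send/sends/sending/sent,
    # pend/pends/pending/pent, and ship/ships/shipped/shipping
    pattern = re.compile(r"([sp]en(d|ds|ding|t))|(ship(s|ped|ping)?)")

    for w in ["send", "sent", "pending", "ship", "shipped", "shop"]:
        # fullmatch: the whole word must match, not just a substring
        print(w, bool(pattern.fullmatch(w)))   # only "shop" fails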
Intersection
FSAs are closed under intersection.
[Figure: two FSAs and the FSA accepting the intersection of their languages]
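The standard construction behind this closure property is the product automaton: states are pairs of states, and a transition exists only when both machines allow it. A sketch under the dictionary encoding used above (function and variable names are mine):

    def intersect(delta1, start1, final1, delta2, start2, final2):
        """Product construction: run both FSAs in lockstep on the same input."""
        delta = {}
        for (q1, sym), r1 in delta1.items():
            for (q2, sym2), r2 in delta2.items():
                if sym == sym2:
                    # Move in both machines on the same symbol
                    delta[((q1, q2), sym)] = (r1, r2)
        start = (start1, start2)
        final = {(f1, f2) for f1 in final1 for f2 in final2}
        return delta, start, final

The resulting automaton accepts a string exactly when both component FSAs accept it.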
The Chomsky Hierarchy
Finite languages are uninteresting; regular languages are those recognized by FSAs. But there are richer classes! The hierarchy, from least to most expressive:
- Finite languages
- Regular languages
- Context-free languages
- Context-sensitive languages
- Type 0 (recursively enumerable) languages
Adding weights to FSAs
FSAs can also have weights associated with their transition function. A Weighted Finite State Automaton (WFSA) is defined as:
- A finite set Q of states q0…qN, with q0 the start state
- A finite input alphabet Σ of symbols
- A set of final states F ⊆ Q
- A semiring R of weights
- A transition function δ(q,i) mapping from Q×Σ to Q×R
These weights can have many interpretations; a common one is “cost” (a negative log-probability).
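A minimal sketch extending the earlier encoding with costs (the tropical convention, where a path’s cost is the sum of negative log-probabilities along it, is my assumption, as are the particular probabilities):

    import math

    # delta maps (state, symbol) -> (next_state, cost);
    # costs are negative log-probabilities, so they add along a path
    delta = {
        ("q0", "a"): ("q1", -math.log(0.7)),
        ("q0", "b"): ("q1", -math.log(0.3)),
        ("q1", "e"): ("q2", -math.log(0.9)),
        ("q1", "f"): ("q2", -math.log(0.1)),
    }
    start, final = "q0", {"q2"}

    def path_cost(s):
        """Total cost of the unique path reading s, or None if rejected."""
        state, total = start, 0.0
        for symbol in s:
            if (state, symbol) not in delta:
                return None
            state, cost = delta[(state, symbol)]
            total += cost
        return total if state in final else None

    print(path_cost("ae"))   # -log(0.7) - log(0.9)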
Probabilistic Linguistic Knowledge
A generative probabilistic grammar determines beliefs about which strings are likely to be seen:
- Probabilistic Context-Free Grammars (PCFGs; Booth, 1969)
- Probabilistic Minimalist Grammars (Hale, 2006)
- Probabilistic Finite-State Grammars (Mohri, 1997; Crocker & Brants, 2000)
Example: in position 1, {a,b,c,d} are equally likely; but in position 2:
- {a,b} are usually followed by e, occasionally by f
- {c,d} are usually followed by f, occasionally by e
[Figure: the corresponding weighted FSA, with each arc labeled by input symbol and cost (log-probability)]
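One way to realize this example numerically, a sketch with made-up probabilities (“usually” rendered as 0.9 is my assumption):

    import math

    p_first = {s: 0.25 for s in "abcd"}       # position 1: uniform
    p_second = {                               # position 2: depends on position 1
        "a": {"e": 0.9, "f": 0.1}, "b": {"e": 0.9, "f": 0.1},
        "c": {"e": 0.1, "f": 0.9}, "d": {"e": 0.1, "f": 0.9},
    }

    def cost(s):
        """Negative log-probability of a two-symbol string."""
        p = p_first[s[0]] * p_second[s[0]][s[1]]
        return -math.log(p)

    print(cost("ae"), cost("af"))   # "ae" is much cheaper than "af"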
Probabilistic intersection
Bayes’ rule says that the posterior is proportional to the evidence times the prior. In log space, multiplication becomes addition.
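Spelled out in symbols (this rendering is mine, with T a structure and w the observed input):

    \[
      P(T \mid w) \propto P(w \mid T)\,P(T)
      \qquad\Longrightarrow\qquad
      \log P(T \mid w) = \log P(w \mid T) + \log P(T) + \text{const}
    \]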
Intersecting weighted FSAs
Bayes’ rule says that the evidence and the prior should be combined (multiplied). For probabilistic grammars, this combination is the formal operation of intersection (see also Hale, 2006).
[Figure: the grammar intersected with the input observed so far yields a belief state; the grammar affects beliefs about the future]
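A sketch of weighted intersection under the encodings above (costs add, since intersection multiplies the underlying probabilities; all names are mine):

    def intersect_weighted(d1, start1, final1, d2, start2, final2):
        """Weighted product construction: pair states, add costs."""
        delta = {}
        for (q1, sym), (r1, c1) in d1.items():
            for (q2, sym2), (r2, c2) in d2.items():
                if sym == sym2:
                    # Adding costs = multiplying probabilities
                    delta[((q1, q2), sym)] = ((r1, r2), c1 + c2)
        start = (start1, start2)
        final = {(f1, f2) for f1 in final1 for f2 in final2}
        return delta, start, final

Intersecting the grammar WFSA with a WFSA encoding the input observed so far yields a belief state over possible continuations.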
[Figure: worked example of weighted intersection, with {b,c} observed in position 1 and beliefs over {e,f} in position 2 updated accordingly]