
1 Statistical NLP Winter 2009
Lecture 7: Grammar formalisms, the tools of mathematical linguistics: (weighted) finite-state automata and (weighted) context-free grammars
Roger Levy

2 Language structure so far
So far in class, we haven't dealt much with structured representations of language:
- A document consists of a sequence of sentences
- A sentence consists of a sequence of words
We haven't looked at anything in between, or farther down. But there's lots more structure in language!
- Words are composed of morphemes
- Words are grouped into syntactic categories (parts of speech)
- Words combine into phrases
Today we'll talk about formal means for describing and computing these structures.

3 Regular expressions
You've almost certainly worked with grep before
grep takes a regular expression
Regular expressions can be quite rich
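As a quick illustration (not from the original slides; the pattern and test words are invented), the same kind of pattern grep uses can be tested with Python's re module:

```python
import re

# Hypothetical pattern matching "send", "sends", "sending", "sent"
pattern = re.compile(r"s(end(s|ing)?|ent)")

for word in ["send", "sending", "sent", "spent", "ship"]:
    # fullmatch requires the whole string to match the pattern
    print(word, bool(pattern.fullmatch(word)))
```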

4 Finite state automata (FSAs)
A Finite State Automaton (FSA) is defined as:
- A finite set Q of states q0…qN, with q0 the start state
- A finite input alphabet Σ of symbols
- A set of final states F ⊆ Q
- A transition function δ(q,i) mapping from Q×Σ to Q
An FSA accepts a string s if recursive application of δ leads to a final state. FSAs are most accessibly represented in a graphical format.
Example: Q={q0,q1}, Σ={a,b}, F={q1}, δ={(q0,a)=q1, (q1,b)=q1}
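A minimal sketch (not part of the lecture) of how the example automaton above can be simulated; the dictionary encoding of δ is just one implementation choice:

```python
# Deterministic FSA from the example:
# Q = {q0, q1}, Sigma = {a, b}, F = {q1},
# delta = {(q0,a) -> q1, (q1,b) -> q1}
delta = {("q0", "a"): "q1", ("q1", "b"): "q1"}
final_states = {"q1"}

def accepts(string, start="q0"):
    state = start
    for symbol in string:
        if (state, symbol) not in delta:
            return False          # no transition defined: reject
        state = delta[(state, symbol)]
    return state in final_states  # accept iff we end in a final state

print(accepts("abbb"))  # True:  a takes q0 -> q1, each b stays in q1
print(accepts("ba"))    # False: no b-transition out of q0
```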

5 Regular expressions and FSAs
For every regular expression R, there is an FSA that accepts exactly the strings matched by R, and vice versa.
Example: ([sp]end(d|ds|ding|t))|(ship(s|ped|ping)?)
However, in general there are many FSAs and regexes that accept the same set of strings.

6 Intersection
FSAs are closed under intersection (sketched below).
(Figure: two example FSAs combined to form the FSA accepting the intersection of their languages)
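Below is a brief sketch (an illustration, not code from the lecture) of the standard product construction: the intersection machine runs both automata in lockstep and accepts only when both do. The two example machines in the usage lines are invented.

```python
def intersect(delta1, finals1, start1, delta2, finals2, start2):
    """Simulate the product automaton of two deterministic FSAs.

    Each delta maps (state, symbol) -> state. Returns a predicate
    that accepts a string iff both automata accept it.
    """
    def accepts(string):
        s1, s2 = start1, start2
        for symbol in string:
            if (s1, symbol) not in delta1 or (s2, symbol) not in delta2:
                return False  # either machine rejects: product rejects
            s1 = delta1[(s1, symbol)]
            s2 = delta2[(s2, symbol)]
        return s1 in finals1 and s2 in finals2
    return accepts

# Example: strings over {a,b} that start with a AND end with b
starts_with_a = ({("p0", "a"): "p1", ("p1", "a"): "p1", ("p1", "b"): "p1"}, {"p1"}, "p0")
ends_with_b = ({("r0", "a"): "r0", ("r0", "b"): "r1",
                ("r1", "a"): "r0", ("r1", "b"): "r1"}, {"r1"}, "r0")
both = intersect(*starts_with_a, *ends_with_b)
print(both("aab"), both("ba"))  # True False
```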

7 The Chomsky Hierarchy
Finite languages are uninteresting. Regular languages (those accepted by FSAs) are richer, but there are richer classes still:
- Finite languages
- Regular languages
- Context-free languages
- Context-sensitive languages
- Type 0 languages

8 Adding weights to FSAs
FSAs can also have weights associated with their transition function. A Weighted Finite State Automaton (WFSA) is defined as:
- A finite set Q of states q0…qN, with q0 the start state
- A finite input alphabet Σ of symbols
- A set of final states F ⊆ Q
- A semiring R
- A transition function δ(q,i) mapping from Q×Σ to Q×R
These weights can have many interpretations; a common one is "cost" (e.g., a negative log-probability).
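As a sketch under assumed weights (the states, symbols, and probabilities here are invented), a WFSA over the tropical semiring assigns a string the sum of arc costs, where each cost is a negative log-probability:

```python
import math

# Hypothetical weighted transitions: (state, symbol) -> (next_state, cost),
# where cost = -log(probability) of taking that arc (tropical semiring)
wdelta = {
    ("q0", "x"): ("q1", -math.log(0.7)),
    ("q0", "y"): ("q1", -math.log(0.3)),
    ("q1", "x"): ("q2", -math.log(1.0)),
}
finals = {"q2"}

def string_cost(string, start="q0"):
    """Total cost of the (unique) path for `string`, or None if rejected."""
    state, total = start, 0.0
    for symbol in string:
        if (state, symbol) not in wdelta:
            return None
        state, cost = wdelta[(state, symbol)]
        total += cost
    return total if state in finals else None

print(string_cost("xx"))  # ~0.36 = -log(0.7) + -log(1.0)
print(string_cost("yx"))  # ~1.20 = -log(0.3) + -log(1.0)
```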

9 Probabilistic Linguistic Knowledge
A generative probabilistic grammar determines beliefs about which strings are likely to be seen:
- Probabilistic Context-Free Grammars (PCFGs; Booth, 1969)
- Probabilistic Minimalist Grammars (Hale, 2006)
- Probabilistic Finite-State Grammars (Mohri, 1997; Crocker & Brants, 2000)
Example (see the sketch below): in position 1, {a,b,c,d} are equally likely; but in position 2:
- {a,b} are usually followed by e, occasionally by f
- {c,d} are usually followed by f, occasionally by e
(Figure: a weighted FSA with arcs labeled by input symbol and cost (log-probability))
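A minimal sketch of this toy grammar, with assumed numbers: 0.25 for each first symbol, 0.9 for "usually" and 0.1 for "occasionally". These values are illustrative, not taken from the lecture.

```python
# Position 1: a, b, c, d equally likely (assumed uniform)
first = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
# Position 2: distribution over {e, f} given the first symbol (assumed 0.9/0.1)
second = {
    "a": {"e": 0.9, "f": 0.1},
    "b": {"e": 0.9, "f": 0.1},
    "c": {"e": 0.1, "f": 0.9},
    "d": {"e": 0.1, "f": 0.9},
}

def prob(two_symbol_string):
    """Probability the grammar assigns to a two-symbol string, 0 if impossible."""
    s1, s2 = two_symbol_string
    return first.get(s1, 0.0) * second.get(s1, {}).get(s2, 0.0)

print(prob("ae"))  # 0.225 = 0.25 * 0.9
print(prob("cf"))  # 0.225
print(prob("af"))  # 0.025 = 0.25 * 0.1
```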

10 Probabilistic intersection
Bayes' rule says that the posterior is proportional to the evidence (likelihood) times the prior: P(h | d) ∝ P(d | h) × P(h)
In log space, × becomes +
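A tiny numerical check (with made-up probabilities) of the identity behind this: multiplying probabilities corresponds to adding their logs.

```python
import math

prior = 0.3        # hypothetical prior probability of a hypothesis
likelihood = 0.8   # hypothetical probability of the evidence given it

joint = prior * likelihood
log_joint = math.log(prior) + math.log(likelihood)  # addition in log space

print(joint)                # 0.24 (up to float precision)
print(math.exp(log_joint))  # 0.24 again: * in probability space == + in log space
```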

11 Intersecting weighted FSAs
Bayes' Rule says that the evidence and the prior should be combined (multiplied)
For probabilistic grammars, this combination is the formal operation of intersection (see also Hale, 2006)
(Figure: grammar + input = belief; the grammar affects beliefs about the future)
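As a rough sketch of the idea, reusing the toy grammar above with the same assumed probabilities: combining the grammar (prior) with noisy evidence about the first symbol yields updated beliefs about the next symbol. The evidence distributions here are hypothetical.

```python
# Toy grammar from the previous sketch (assumed probabilities)
first = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
second = {
    "a": {"e": 0.9, "f": 0.1},
    "b": {"e": 0.9, "f": 0.1},
    "c": {"e": 0.1, "f": 0.9},
    "d": {"e": 0.1, "f": 0.9},
}

def predict_next(evidence):
    """Combine noisy evidence about the first symbol (a likelihood over
    symbols) with the grammar prior, then predict the second symbol."""
    # Posterior over the first symbol: prior * evidence, renormalized
    post = {s: first[s] * evidence.get(s, 0.0) for s in first}
    z = sum(post.values())
    post = {s: p / z for s, p in post.items()}
    # Beliefs about the future: marginalize over the first symbol
    return {t: sum(post[s] * second[s][t] for s in first) for t in ("e", "f")}

# Hypothetical input ambiguous between b and c
print(predict_next({"b": 0.5, "c": 0.5}))  # ~{'e': 0.5, 'f': 0.5}
print(predict_next({"b": 1.0}))            # ~{'e': 0.9, 'f': 0.1}
```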

12 (figure only: {b,c} {f,e} {b,c} {?})

13

