
1 Stochastic POS tagging Stochastic taggers choose tags that result in the highest probability: P(word | tag) * P(tag | previous n tags) Stochastic taggers generally maximize the probability of the tag sequence for a whole sentence.

2 Bigram stochastic tagger This kind of tagger “…chooses tag t_i for word w_i that is most probable given the previous tag t_{i-1} and the current word w_i: t_i = argmax_j P(t_j | t_{i-1}, w_i) (8.2)” [page 303] Bayes’ law says: P(T|W) = P(T) P(W|T) / P(W), so P(t_j | t_{i-1}, w_i) = P(t_j) P(t_{i-1}, w_i | t_j) / P(t_{i-1}, w_i) Since we take the argmax of this over the t_j, and the denominator does not depend on t_j, the result is the same if we maximize: P(t_j) P(t_{i-1}, w_i | t_j) Rewriting (with the standard independence assumptions): t_i = argmax_j P(t_j | t_{i-1}) P(w_i | t_j)
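A minimal sketch of this decision rule in Python (the function name and probability tables are hypothetical; in practice both tables would be estimated from a tagged corpus):

    # Hypothetical sketch of the bigram tagging decision rule:
    # t_i = argmax_j P(t_j | t_{i-1}) * P(w_i | t_j)
    def choose_tag(prev_tag, word, tagset, p_tag_given_prev, p_word_given_tag):
        # p_tag_given_prev[(prev, tag)] ~ P(tag | prev)
        # p_word_given_tag[(word, tag)] ~ P(word | tag)
        def score(tag):
            return p_tag_given_prev.get((prev_tag, tag), 0.0) * \
                   p_word_given_tag.get((word, tag), 0.0)
        return max(tagset, key=score)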

3 Example (page 304) What tag do we assign to race? – to/TO race/?? – the/DT race/?? If we are choosing between NN and VB as tags for race, the quantities to compare are: – P(VB|TO)P(race|VB) – P(NN|TO)P(race|NN) The tagger will choose the tag for ‘race’ that maximizes this probability

4 Example For first part – look at tag sequence probability: – P(NN|TO) = 0.021 – P(VB|TO) = 0.34 For second part – look at lexical likelihood: – P(race|NN) = 0.00041 – P(race|VB) = 0.00003 Combining these: – P(VB|TO)P(race|VB) = 0.00001 – P(NN|TO)P(race|NN) = 0.000007
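A quick arithmetic check of the comparison, plugging in the numbers quoted on the slide (the slide rounds both products):

    # P(tag | TO) * P(race | tag) for each candidate tag:
    score_vb = 0.34 * 0.00003    # P(VB|TO) * P(race|VB) ≈ 1.0e-05
    score_nn = 0.021 * 0.00041   # P(NN|TO) * P(race|NN) ≈ 8.6e-06
    print("VB" if score_vb > score_nn else "NN")   # -> VB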

5 English syntax What are some properties of English syntax we might want our formalism to capture? This depends on our goal: – processing written or spoken language? – modeling human behavior or not? Context-free grammar formalism

6 Things a grammar should capture As we have mentioned repeatedly, human language is an amazingly complex system of communication. Some properties of language which a (computational) grammar should reflect include: – Constituency – Agreement – Subcategorization / selectional restrictions

7 Constituency Phrases are syntactic equivalence classes: – they can appear in the same contexts – they are not semantic equivalence classes: they can clearly mean different things Ex (noun phrases) – Clifford the big red dog – the man from the city – a lovable little kitten

8 Constituency tests Can appear before a verb: – a lovable little kitten eats food – the man from the city arrived yesterday Other arbitrary word groupings cannot: – * from the arrived yesterday

9 More tests of constituency They also function as a unit with respect to syntactic processes: – On September seventeenth, I’d like to fly from Atlanta to Denver. – I’d like to fly on September seventeenth from Atlanta to Denver. – I’d like to fly from Atlanta to Denver on September seventeenth. Other groupings of words don’t behave the same: – * On September, I’d like to fly seventeenth from Atlanta to Denver. – * On I’d like to fly September seventeenth from Atlanta to Denver. – * I’d like to fly on September from Atlanta to Denver seventeenth. – * I’d like to fly on from Atlanta to Denver September seventeenth.

10 Agreement English has subject-verb agreement: – The cats chase that dog all day long. – * The cats chases that dog all day long. – The dog is chased by the cats all day long. – * The dog are chased by the cats all day long. Many languages exhibit much more agreement than English.

11 Subcategorization Verbs (predicates) require arguments of different types: – (none) The mirage disappears daily. – NP: I prefer ice cream. – NP PP: I leave Boston in the morning. – NP NP: I gave Mary a ticket. – PP: I leave on Thursday.

12 Alternations want can take either an NP or an infinitival VP: – I want a flight … – I want to fly … find cannot take an infinitival VP: – I found a flight … – * I found to fly …

13 How can we encode rules of language? There are many grammar formalisms. Most are variations on context-free grammars. Context-free grammars are of interest because they – have well-known properties (e.g. can be parsed in polynomial time) – can capture many aspects of language

14 Basic context-free grammar formalism A CFG is a 4-tuple (N, Σ, P, S) where – N is a set of non-terminal symbols – Σ is a set of terminal symbols – P is a set of productions, P ⊆ N × (Σ ∪ N)* – S is a start symbol and Σ ∩ N = ∅ Each production is of the form A → α, where A is a non-terminal and α is drawn from (Σ ∪ N)*
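As a concrete illustration, the 4-tuple can be written down directly as data. The fragment below is a hypothetical toy grammar using the S → Aux NP VP rule discussed on the next slide:

    # A CFG as plain Python data: (N, Sigma, P, S). Toy fragment only.
    nonterminals = {"S", "Aux", "NP", "VP"}            # N
    terminals = {"does", "the", "flight", "leave"}     # Sigma
    productions = {                                    # P, a subset of N x (Sigma ∪ N)*
        "S":   [("Aux", "NP", "VP")],
        "Aux": [("does",)],
        "NP":  [("the", "flight")],
        "VP":  [("leave",)],
    }
    start_symbol = "S"                                 # S
    assert terminals.isdisjoint(nonterminals)          # Sigma ∩ N = ∅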

15 Problems with basic formalism Consider a grammar rule like S → Aux NP VP To handle agreement between subject and verb, we could replace that rule with two new ones: S → 3SgAux 3SgNP VP S → Non3SgAux Non3SgNP VP Need rules like the following too: 3SgAux → does | has | can | … Non3SgAux → do | have | can | …
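Written out as data in the same style as the toy fragment above, the agreement split simply multiplies the rules (again a hypothetical sketch, not a full grammar):

    # Splitting S and Aux by number to enforce subject-verb agreement:
    productions_with_agreement = {
        "S": [("3SgAux", "3SgNP", "VP"),
              ("Non3SgAux", "Non3SgNP", "VP")],
        "3SgAux":    [("does",), ("has",), ("can",)],
        "Non3SgAux": [("do",), ("have",), ("can",)],
        # ...and 3SgNP / Non3SgNP would each need their own duplicated rules too.
    }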

16 Extensions to formalism Feature structures and unification – feature structures are of the form [f_1 = v_1, f_2 = v_2, …, f_n = v_n] – feature structures can be partially specified: (a) [Number = Sg, Person = 3, Category = NP] (b) [Number = Sg, Category = NP] (c) [Person = 3, Category = NP] – (b) unified with (c) is (a) Feature structures can be used to express feature-value constraints across constituents without rule multiplication.
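A minimal sketch of unification for flat feature structures, represented here as Python dicts (real unification also handles nested structures and reentrancy, which this ignores):

    def unify(fs1, fs2):
        # Merge two flat feature structures; fail (return None) on a value clash.
        result = dict(fs1)
        for feature, value in fs2.items():
            if feature in result and result[feature] != value:
                return None
            result[feature] = value
        return result

    b = {"Number": "Sg", "Category": "NP"}
    c = {"Person": 3, "Category": "NP"}
    print(unify(b, c))   # (a): {'Number': 'Sg', 'Category': 'NP', 'Person': 3}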

17 Other formalisms More powerful: tree adjoining grammars – trees, not rules, are fundamental – trees are either initial or auxiliary – two operations: substitution and adjunction Less powerful: finite-state grammars – cannot handle general recursion – can be sufficient to handle real-world data – recursion spelled out explicitly to some level (large grammar)

18 Homework (not for credit) Anonymous mid-semester course evaluation – Print out (no handwriting) a single page with three things you like about the course (and why), and three things you dislike about the course (and why). – Constructive feedback is appreciated. – Put it in my departmental mailbox today or tomorrow.

19 Next week I am out of town for a conference. Mike will teach: – Lisp – Parsing with context-free grammars

20 Reminder Make sure you are thinking about your semester project! – form teams – discuss ideas

