
1 Stochastic POS tagging Stochastic taggers choose tags that result in the highest probability: P(word | tag) * P(tag | previous n tags) Stochastic taggers generally maximize the probability of the tag sequence for a whole sentence.

2 Bigram stochastic tagger This kind of tagger “…chooses tag t_i for word w_i that is most probable given the previous tag t_{i-1} and the current word w_i: t_i = argmax_j P(t_j | t_{i-1}, w_i) (8.2)” [page 303] Bayes’ law says: P(T|W) = P(T) P(W|T) / P(W), so P(t_j | t_{i-1}, w_i) = P(t_j) P(t_{i-1}, w_i | t_j) / P(t_{i-1}, w_i) Since we take the argmax of this over the t_j, and the denominator does not depend on t_j, the result is the same if we maximize: P(t_j) P(t_{i-1}, w_i | t_j) Rewriting (with the standard independence assumptions): t_i = argmax_j P(t_j | t_{i-1}) P(w_i | t_j)
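A minimal sketch of this decision rule in Python (the function name and probability tables are hypothetical; in practice both tables would be estimated from a tagged corpus):

    # Hypothetical sketch of the bigram tagging decision rule:
    # t_i = argmax_j P(t_j | t_{i-1}) * P(w_i | t_j)
    def choose_tag(prev_tag, word, tagset, p_tag_given_prev, p_word_given_tag):
        # p_tag_given_prev[(prev, tag)] ~ P(tag | prev)
        # p_word_given_tag[(word, tag)] ~ P(word | tag)
        def score(tag):
            return p_tag_given_prev.get((prev_tag, tag), 0.0) * \
                   p_word_given_tag.get((word, tag), 0.0)
        return max(tagset, key=score)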

3 Example (page 304) What tag do we assign to race? – to/TO race/?? – the/DT race/?? If we are choosing between NN and VB as tags for race, the quantities to compare are: – P(VB|TO)P(race|VB) – P(NN|TO)P(race|NN) The tagger will choose the tag for ‘race’ that maximizes this probability

4 Example For first part – look at tag sequence probability: – P(NN|TO) = 0.021 – P(VB|TO) = 0.34 For second part – look at lexical likelihood: – P(race|NN) = 0.00041 – P(race|VB) = 0.00003 Combining these: – P(VB|TO)P(race|VB) = 0.00001 – P(NN|TO)P(race|NN) = 0.000007
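A quick arithmetic check of the comparison, plugging in the numbers quoted on the slide (the slide rounds both products):

    # P(tag | TO) * P(race | tag) for each candidate tag:
    score_vb = 0.34 * 0.00003    # P(VB|TO) * P(race|VB) ≈ 1.0e-05
    score_nn = 0.021 * 0.00041   # P(NN|TO) * P(race|NN) ≈ 8.6e-06
    print("VB" if score_vb > score_nn else "NN")   # -> VB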

5 English syntax What are some properties of English syntax we might want our formalism to capture? This depends on our goal: – processing written or spoken language? – modeling human behavior or not? Context-free grammar formalism

6 Things a grammar should capture As we have mentioned repeatedly, human language is an amazingly complex system of communication. Some properties of language which a (computational) grammar should reflect include: – Constituency – Agreement – Subcategorization / selectional restrictions

7 Constituency Phrases are syntactic equivalence classes: – they can appear in the same contexts – they are not semantic equivalence classes: they can clearly mean different things Ex (noun phrases) – Clifford the big red dog – the man from the city – a lovable little kitten

8 Constituency tests Can appear before a verb: – a lovable little kitten eats food – the man from the city arrived yesterday Other arbitrary word groupings cannot: – * from the arrived yesterday

9 More tests of constituency They also function as a unit with respect to syntactic processes: – On September seventeenth, I’d like to fly from Atlanta to Denver. – I’d like to fly on September seventeenth from Atlanta to Denver. – I’d like to fly from Atlanta to Denver on September seventeenth. Other groupings of words don’t behave the same: – * On September, I’d like to fly seventeenth from Atlanta to Denver. – * On I’d like to fly September seventeenth from Atlanta to Denver. – * I’d like to fly on September from Atlanta to Denver seventeenth. – * I’d like to fly on from Atlanta to Denver September seventeenth.

10 Agreement English has subject-verb agreement: – The cats chase that dog all day long. – * The cats chases that dog all day long. – The dog is chased by the cats all day long. – * The dog are chased by the cats all day long. Many languages exhibit much more agreement than English.

11 Subcategorization Verbs (predicates) require arguments of different types: – (none) The mirage disappears daily. – NP: I prefer ice cream. – NP PP: I leave Boston in the morning. – NP NP: I gave Mary a ticket. – PP: I leave on Thursday.

12 Alternations want can take either an NP or an infinitival VP: – I want a flight … – I want to fly … find cannot take an infinitival VP: – I found a flight … – * I found to fly …

13 How can we encode rules of language? There are many grammar formalisms. Most are variations on context-free grammars. Context-free grammars are of interest because they – have well-known properties (e.g. can be parsed in polynomial time) – can capture many aspects of language

14 Basic context-free grammar formalism A CFG is a 4-tuple (N, Σ, P, S) where – N is a set of non-terminal symbols – Σ is a set of terminal symbols – P is a set of productions, P ⊆ N × (Σ ∪ N)* – S is a start symbol and Σ ∩ N = ∅ Each production is of the form A → α, where A is a non-terminal and α is drawn from (Σ ∪ N)*
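As a concrete illustration, the 4-tuple can be written down directly as data. The fragment below is a hypothetical toy grammar using the S → Aux NP VP rule discussed on the next slide:

    # A CFG as plain Python data: (N, Sigma, P, S). Toy fragment only.
    nonterminals = {"S", "Aux", "NP", "VP"}            # N
    terminals = {"does", "the", "flight", "leave"}     # Sigma
    productions = {                                    # P, a subset of N x (Sigma ∪ N)*
        "S":   [("Aux", "NP", "VP")],
        "Aux": [("does",)],
        "NP":  [("the", "flight")],
        "VP":  [("leave",)],
    }
    start_symbol = "S"                                 # S
    assert terminals.isdisjoint(nonterminals)          # Sigma ∩ N = ∅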

15 Problems with basic formalism Consider a grammar rule like S → Aux NP VP To handle agreement between subject and verb, we could replace that rule with two new ones: S → 3SgAux 3SgNP VP S → Non3SgAux Non3SgNP VP Need rules like the following too: 3SgAux → does | has | can | … Non3SgAux → do | have | can | …
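Written out as data in the same style as the toy fragment above, the agreement split simply multiplies the rules (again a hypothetical sketch, not a full grammar):

    # Splitting S and Aux by number to enforce subject-verb agreement:
    productions_with_agreement = {
        "S": [("3SgAux", "3SgNP", "VP"),
              ("Non3SgAux", "Non3SgNP", "VP")],
        "3SgAux":    [("does",), ("has",), ("can",)],
        "Non3SgAux": [("do",), ("have",), ("can",)],
        # ...and 3SgNP / Non3SgNP would each need their own duplicated rules too.
    }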

16 Extensions to formalism Feature structures and unification – feature structures are of the form [f_1 = v_1, f_2 = v_2, …, f_n = v_n] – feature structures can be partially specified: (a) [Number = Sg, Person = 3, Category = NP] (b) [Number = Sg, Category = NP] (c) [Person = 3, Category = NP] – (b) unified with (c) is (a) Feature structures can be used to express feature-value constraints across constituents without rule multiplication.
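A minimal sketch of unification for flat feature structures, represented here as Python dicts (real unification also handles nested structures and reentrancy, which this ignores):

    def unify(fs1, fs2):
        # Merge two flat feature structures; fail (return None) on a value clash.
        result = dict(fs1)
        for feature, value in fs2.items():
            if feature in result and result[feature] != value:
                return None
            result[feature] = value
        return result

    b = {"Number": "Sg", "Category": "NP"}
    c = {"Person": 3, "Category": "NP"}
    print(unify(b, c))   # (a): {'Number': 'Sg', 'Category': 'NP', 'Person': 3}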

17 Other formalisms More powerful: tree adjoining grammars – trees, not rules, are fundamental – trees are either initial or auxiliary – two operations: substitution and adjunction Less powerful: finite-state grammars – cannot handle general recursion – can be sufficient to handle real-world data – recursion spelled out explicitly to some level (large grammar)

18 Homework (not for credit) Anonymous mid-semester course evaluation – Print out (no handwriting) a single page with three things you like about the course (and why), and three things you dislike about the course (and why). – Constructive feedback is appreciated. – Put it in my departmental mailbox today or tomorrow.

19 Next week I am out of town for a conference. Mike will teach: – Lisp – Parsing with context-free grammars

20 Reminder Make sure you are thinking about your semester project! – form teams – discuss ideas

