CS621: Artificial Intelligence Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 36,37–Part of Speech Tagging and HMM 21 st and 25 th Oct, 2010 (forward,


1 CS621: Artificial Intelligence. Pushpak Bhattacharyya, CSE Dept., IIT Bombay. Lecture 36, 37: Part of Speech Tagging and HMM, 21st and 25th Oct, 2010 (forward and backward computation and the Baum-Welch algorithm will be done later)

2 Part of Speech Tagging: POS tagging is the process of attaching to each word in a sentence a suitable grammatical tag (noun, verb, etc.) from a given set of tags. The set of tags is called the tag-set. Standard tag-set: Penn Treebank (for English).

3 POS: a kind of sequence labeling task. Other such tasks: marking tags on genomic sequences; training for predicting protein structure, where the labels are primary (P), secondary (S) and tertiary (T); named entity labels, e.g. Washington_PLACE voted Washington_PERSON to power, and पूजा_PERS ने पूजा के लिए फूल ख़रीदा (Puja bought flowers for worshipping); shallow parsing (noun phrase marking), e.g. The_B little_I boy_I sprained his_B ring_I finger_I.
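The shallow-parsing example marks each noun phrase with B (begin) and I (inside) labels. A minimal sketch of reading such labels back into phrases, assuming (this is not on the slide) that a token with no label, like "sprained", lies outside any phrase:

```python
def bio_chunks(tagged: str) -> list:
    """Group word_LABEL tokens into phrases using B (begin) / I (inside) labels.

    Assumption: tokens without an underscore label are outside any phrase ("O");
    an I with no open phrase is ignored.
    """
    chunks, current = [], []
    for token in tagged.rstrip(".").split():
        word, _, label = token.rpartition("_")
        if not word:                  # no underscore: untagged, outside any phrase
            word, label = token, "O"
        if label == "B":              # a new phrase starts; close any open one
            if current:
                chunks.append(current)
            current = [word]
        elif label == "I" and current:
            current.append(word)      # continue the open phrase
        else:                         # outside: close any open phrase
            if current:
                chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks


print(bio_chunks("The_B little_I boy_I sprained his_B ring_I finger_I."))
# [['The', 'little', 'boy'], ['his', 'ring', 'finger']]
```

The same tag-per-token representation carries over to POS tags, named-entity labels and protein-structure labels; only the label set changes.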

4 POS Tags: NN: Noun, e.g. Dog_NN; VM: Main Verb, e.g. Run_VM; VAUX: Auxiliary Verb, e.g. Is_VAUX; JJ: Adjective, e.g. Red_JJ; PRP: Pronoun, e.g. You_PRP; NNP: Proper Noun, e.g. John_NNP; etc.
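The word_TAG notation used throughout these slides is easy to process mechanically. A small illustrative sketch (the `parse_tagged` helper and the `TAG_NAMES` dictionary are hypothetical, built only from the tags listed above) that splits on the last underscore, so a word that itself contains an underscore still parses:

```python
# Tag descriptions taken from the slide; the dictionary is only an illustration.
TAG_NAMES = {
    "NN": "Noun",
    "VM": "Main Verb",
    "VAUX": "Auxiliary Verb",
    "JJ": "Adjective",
    "PRP": "Pronoun",
    "NNP": "Proper Noun",
}


def parse_tagged(text: str) -> list:
    """Split 'word_TAG' tokens on the last underscore into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        word, sep, tag = token.rpartition("_")
        if not sep:
            raise ValueError(f"untagged token: {token!r}")
        pairs.append((word, tag))
    return pairs


for word, tag in parse_tagged("Dog_NN Run_VM Is_VAUX Red_JJ You_PRP John_NNP"):
    print(word, tag, TAG_NAMES.get(tag, "unknown tag"))
```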

5 POS Tag Ambiguity: In English: I bank1 with the bank2 on the river bank3. Bank1 is a verb; the other two banks are nouns. {Aside: ambiguity is a generator of humour (incongruity theory).} A man returns to his parked car and finds the sticker "Parking fine". He goes and thanks the policeman for appreciating his parking skill. fine_adverb vs. fine_noun

6 For Hindi: Rama achhaa gaata hai (hai is VAUX: auxiliary verb); Ram sings well. Rama achha ladakaa hai (hai is VCOP: copula verb); Ram is a good boy.

7 Process: list all possible tags for each word in the sentence; choose the most suitable tag sequence.

8 Example: "People jump high". People: Noun/Verb; jump: Noun/Verb; high: Noun/Verb/Adjective. We can start with probabilities.
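Step 1 of the process (list every possible tag for each word) can be sketched with `itertools.product` over the candidate tags from the example. The number of sequences is the product of the per-word candidate counts, so it grows exponentially with sentence length; this is why step 2 needs probabilities and an efficient search rather than a brute-force listing:

```python
from itertools import product

# Candidate tags per word, as listed in the example.
CANDIDATES = {
    "People": ["Noun", "Verb"],
    "jump": ["Noun", "Verb"],
    "high": ["Noun", "Verb", "Adjective"],
}


def candidate_sequences(words):
    """Every combination of per-word candidate tags (step 1 of the process)."""
    return list(product(*(CANDIDATES[w] for w in words)))


seqs = candidate_sequences(["People", "jump", "high"])
print(len(seqs))  # 2 * 2 * 3 = 12
print(seqs[0])    # ('Noun', 'Noun', 'Noun')
```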

9

10 Challenge of POS tagging: an example from an Indian language

11 Tagging of jo, vaha, kaun and their inflected forms in Hindi and their equivalents in multiple languages

12 DEM and PRON labels: Jo_DEM ladakaa kal aayaa thaa, vaha cricket acchhaa khel letaa hai (The boy who came yesterday plays cricket well). Jo_PRON kal aayaa thaa, vaha cricket acchhaa khel letaa hai (The one who came yesterday plays cricket well).

13 Disambiguation rule-1: If jo is followed by a noun, then DEM; else …
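A minimal sketch of rule-1, assuming a hypothetical toy noun lexicon (`NOUNS`); a real tagger would need a full lexicon or a first tagging pass to know which words are nouns. The slide leaves the Else branch open, so the sketch returns `None` there rather than guessing. Running it on the sentences of slides 14 and 15 reproduces both failures discussed next: the long-distance noun is missed (false negative), and *duniyadarii* triggers DEM wrongly (false positive).

```python
from typing import Optional

# Hypothetical toy lexicon, only for this illustration.
NOUNS = {"ladakaa", "ladakii", "cricket", "duniyadarii", "manushya"}


def rule1(words: list, i: int) -> Optional[str]:
    """Rule-1 for jo at position i: DEM if the next word is a noun.

    The Else branch is unspecified on the slide, so None is returned there.
    """
    if words[i].lower() != "jo":
        raise ValueError("rule-1 applies only to jo")
    if i + 1 < len(words) and words[i + 1].lower() in NOUNS:
        return "DEM"
    return None  # Else ... (left open on the slide)


print(rule1("Jo ladakaa kal aayaa thaa".split(), 0))   # DEM (correct)
print(rule1("Jo kal aayaa thaa".split(), 0))           # None: falls to Else
far = "Jo bhaagtaa huaa, hanstaa huaa, rotaa huaa, ladakaa kal aayaa thaa"
print(rule1(far.split(), 0))                           # None: false negative
print(rule1("Jo duniyadarii samajhkar chaltaa hai".split(), 0))  # DEM: false positive
```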

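14 False Negative: when there is an arbitrary amount of text between jo and the noun, e.g. Jo_??? bhaagtaa huaa, hanstaa huaa, rotaa huaa, chennai academy se koching lenevaalaa ladakaa kal aayaa thaa, vaha cricket acchhaa khel letaa hai (The boy who came yesterday, running, laughing, crying, taking coaching from the Chennai academy, plays cricket well). Here jo qualifies ladakaa, but rule-1 does not fire because the noun is not the next word.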

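15 False Positive: Jo_DEM (wrong!) duniyadarii samajhkar chaltaa hai, … (one who goes through life understanding worldly ways, …). Since duniyadarii is a noun, rule-1 fires, yet jo here is a pronoun. Jo_DEM/PRON? manushya manushyoM ke biich ristoM naatoM ko samajhkar chaltaa hai, … (ambiguous: jo may qualify manushya, "the man who understands the relationships among people", or stand alone as a pronoun)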

16 False Positive for Bengali: Je_DEM (wrong!) bhaalobaasaa paay, sei bhaalobaasaa dite paare (one who gets love can give love). Je_DEM (right!) bhaalobaasa tumi kalpanaa korchho, taa e jagat e sambhab nay (the love that you are imagining is impossible in this world).

17 Will fail in similar situations for jis, jin, vaha, us, un. All these forms add to the corpus count.

18 Disambiguation rule-2: If jo is oblique (i.e., has ne, ko, se, etc. attached), then it is PRON; else …
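A sketch of rule-2 under stated assumptions: the oblique stems of jo are taken to be *jis* (singular) and *jin* (plural), as named on slide 17, and only the three markers from the slide are checked (the slide's "etc." is not filled in). The marker may be fused (*jisne*) or a separate next token (*jin ko*). As on rule-1, the open Else branch returns `None`.

```python
from typing import Optional

CASE_MARKERS = ("ne", "ko", "se")   # from the slide; deliberately not exhaustive
OBLIQUE_STEMS = ("jis", "jin")      # oblique forms of jo (assumption: these two only)


def rule2(words: list, i: int) -> Optional[str]:
    """Rule-2: PRON if the jo-form at position i carries a case marker."""
    w = words[i].lower()
    nxt = words[i + 1].lower() if i + 1 < len(words) else ""
    fused = any(w == stem + m for stem in OBLIQUE_STEMS for m in CASE_MARKERS)
    separate = w in OBLIQUE_STEMS and nxt in CASE_MARKERS
    if fused or separate:
        return "PRON"
    return None  # Else ... (left open on the slide)


print(rule2("jisne kal cricket khelaa".split(), 0))  # PRON (fused marker)
print(rule2("jin ko".split(), 0))                    # PRON (separate marker)
print(rule2("jis ladake kaa".split(), 0))            # None: marker is on the noun
```

The last call shows the limitation raised on slide 20: when the case marker attaches to the noun phrase rather than to the jo-form, a rule that only looks at the jo-form and its neighbour cannot see it.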

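19 Will fail (false positive) for languages that demand agreement between the jo-form and the noun it qualifies, e.g. Sanskrit: Yasya_PRON (wrong!) baalakasya aananam drshtyaa… (Hindi: jis ladake kaa muha dekhkar, "on seeing the face of the boy who…"). Yasya_PRON (wrong!) kamaniyasya baalakasya aananam drshtyaa… (on seeing the face of the handsome boy who…). Yasya is oblique (genitive), so rule-2 says PRON, but it agrees with and qualifies baalakasya, so it is DEM, even when an adjective (kamaniyasya) intervenes.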

20 Will also fail for rules that depend on whether the noun following jo/vaha/kaun, or its form, is oblique or not, because the case marker can be far from the noun: ladakii jise piliya kii bimaarii ho gayii thii ko … (to the girl who had got jaundice …). Needs discussion across languages.

21 Remark on DEM and PRON: DEM vs. PRON cannot be disambiguated IN GENERAL at the level of the POS tagger, i.e., the tagger cannot assume parsing and cannot assume semantics.

22 Mathematics of POS tagging

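23 Derivation of POS tagging formula: the best tag sequence is

$$T^* = \arg\max_T P(T \mid W) = \arg\max_T \frac{P(T)\,P(W \mid T)}{P(W)} = \arg\max_T P(T)\,P(W \mid T)$$

by Bayes' theorem; P(W) is dropped because it does not depend on T. Writing the tag sequence as t_0 t_1 … t_{n+1}, where t_0 = ^ and t_{n+1} = . mark the sentence boundaries, the chain rule gives

$$P(T) = P(t_0 t_1 t_2 \ldots t_{n+1}) = P(t_0)\,P(t_1 \mid t_0)\,P(t_2 \mid t_1 t_0)\,P(t_3 \mid t_2 t_1 t_0) \cdots P(t_n \mid t_{n-1} t_{n-2} \ldots t_0)\,P(t_{n+1} \mid t_n t_{n-1} \ldots t_0)$$

and the bigram assumption (each tag depends only on the previous tag) reduces this to

$$P(T) = P(t_0)\,P(t_1 \mid t_0)\,P(t_2 \mid t_1) \cdots P(t_n \mid t_{n-1})\,P(t_{n+1} \mid t_n) = \prod_{i=0}^{n+1} P(t_i \mid t_{i-1})$$

with the convention P(t_0 | t_{-1}) = P(t_0).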
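Under the bigram assumption, P(T) is just a product of transition probabilities along the tag sequence. A minimal sketch with made-up numbers (all probabilities in `TRANSITION` are hypothetical), taking P(t_0) = 1 because every sequence starts with ^, and giving unseen bigrams probability 0:

```python
# Hypothetical transition probabilities P(t_i | t_{i-1}); "^" and "." are the
# boundary tags t_0 and t_{n+1}. The numbers are invented for illustration.
TRANSITION = {
    ("^", "N"): 0.6, ("^", "V"): 0.2,
    ("N", "V"): 0.5, ("N", "N"): 0.2,
    ("V", "R"): 0.3, ("V", "N"): 0.3,
    ("R", "."): 0.4, ("N", "."): 0.3,
}


def prob_tags(tags):
    """P(T) = product over i of P(t_i | t_{i-1}); tags must start with "^".

    P(t_0) is taken as 1 since every sentence starts with "^".
    Bigrams missing from the table get probability 0.
    """
    p = 1.0
    for prev, cur in zip(tags, tags[1:]):
        p *= TRANSITION.get((prev, cur), 0.0)
    return p


print(prob_tags(["^", "N", "V", "R", "."]))  # 0.6 * 0.5 * 0.3 * 0.4
```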

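24 Lexical Probability Assumption: by the chain rule,

$$P(W \mid T) = P(w_0 \mid t_0 \ldots t_{n+1})\,P(w_1 \mid w_0, t_0 \ldots t_{n+1})\,P(w_2 \mid w_1 w_0, t_0 \ldots t_{n+1}) \cdots P(w_n \mid w_0 \ldots w_{n-1}, t_0 \ldots t_{n+1})\,P(w_{n+1} \mid w_0 \ldots w_n, t_0 \ldots t_{n+1})$$

Assumption: a word is determined completely by its tag (inspired by speech recognition). Then

$$P(W \mid T) = P(w_0 \mid t_0)\,P(w_1 \mid t_1) \cdots P(w_{n+1} \mid t_{n+1}) = \prod_{i=0}^{n+1} P(w_i \mid t_i) = \prod_{i=1}^{n+1} P(w_i \mid t_i)$$

(Lexical Probability Assumption). The last step holds because the boundary tag t_0 = ^ always emits w_0 = ^, so P(w_0 | t_0) = 1.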
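With the lexical assumption, P(W|T) becomes a product of one emission probability per word. A sketch using the tagged sentence from the generative-model slide (^_^ People_N Jump_V High_R ._.); the probabilities in `LEXICAL` are hypothetical, and the boundary entries are set to 1 to match the last step of the derivation:

```python
# Hypothetical lexical probabilities P(w | t); numbers are invented.
# Boundary tags emit their own symbol with probability 1, so P(w_0 | t_0) = 1.
LEXICAL = {
    ("N", "People"): 0.01, ("V", "People"): 0.001,
    ("V", "Jump"): 0.02, ("N", "Jump"): 0.005,
    ("R", "High"): 0.05, ("A", "High"): 0.03,
    ("^", "^"): 1.0, (".", "."): 1.0,
}


def prob_words_given_tags(words, tags):
    """P(W | T) = product over i of P(w_i | t_i); unseen pairs get 0."""
    if len(words) != len(tags):
        raise ValueError("need exactly one tag per word")
    p = 1.0
    for w, t in zip(words, tags):
        p *= LEXICAL.get((t, w), 0.0)
    return p


words = ["^", "People", "Jump", "High", "."]
print(prob_words_given_tags(words, ["^", "N", "V", "R", "."]))  # 0.01 * 0.02 * 0.05
```

Multiplying this by P(T) from the bigram model gives the quantity P(T)P(W|T) that the tagger maximizes.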

25 Generative Model: ^_^ People_N Jump_V High_R ._.
[Figure: tag lattice for the sentence, annotated with Lexical Probabilities and Bigram Probabilities]
This model is called a generative model: words are observed as emitted from tags, which act as states. This is similar to an HMM.
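To make "generative" concrete, the sketch below runs the model forwards: it walks from tag to tag using bigram probabilities and emits one word from each tag using lexical probabilities. The toy tables are hypothetical, and `max_len` is an added safeguard, not part of the model:

```python
import random

# Hypothetical toy model; all probabilities are invented for illustration.
TRANSITION = {
    "^": {"N": 0.7, "V": 0.3},
    "N": {"V": 0.6, ".": 0.4},
    "V": {"R": 0.5, "N": 0.3, ".": 0.2},
    "R": {".": 1.0},
}
EMISSION = {
    "N": {"People": 0.5, "boy": 0.5},
    "V": {"Jump": 0.5, "sing": 0.5},
    "R": {"High": 0.5, "loudly": 0.5},
}


def pick(dist, rng):
    """Draw one outcome from a {outcome: probability} dict."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes])[0]


def generate(rng, max_len=20):
    """Tags are hidden states; each tag emits one observed word."""
    tags, words = ["^"], ["^"]
    while tags[-1] != "." and len(tags) < max_len:
        tag = pick(TRANSITION[tags[-1]], rng)
        tags.append(tag)
        words.append("." if tag == "." else pick(EMISSION[tag], rng))
    return tags, words


tags, words = generate(random.Random(0))
print(" ".join(f"{w}_{t}" for w, t in zip(words, tags)))
```

Tagging inverts this process: given only the words, recover the most probable hidden tag sequence.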

26 Parts of Speech Tags (simplified situation): Noun (N): boy; Verb (V): sing; Adjective (A): red; Adverb (R): loudly; Preposition (P): to; Article (T): a, an; Conjunction (C): and; Wh-word (W): who; Pronoun (U): he

27 Hidden Markov Model and POS tagging: parts-of-speech tags are states; words are observations. S = {N, V, A, R, P, C, T, W, U}; O = {words of the language}

28 Example Test sentence “^ People laugh aloud $”
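The slides do not show how the argmax over tag sequences is computed. One standard method (named here plainly, as it is not on these slides) is the Viterbi algorithm: dynamic programming that keeps, for each tag, only the best-scoring path ending in that tag. This avoids listing every sequence. A sketch for the test sentence, using a subset of the simplified tags plus the boundary tags ^ and $; every probability below is hypothetical:

```python
# Hypothetical bigram (TRANS) and lexical (LEX) probabilities; rows of TRANS sum to 1.
TRANS = {
    "^": {"N": 0.6, "V": 0.3, "A": 0.1},
    "N": {"N": 0.1, "V": 0.6, "A": 0.1, "R": 0.1, "$": 0.1},
    "V": {"N": 0.3, "V": 0.1, "A": 0.2, "R": 0.3, "$": 0.1},
    "A": {"N": 0.5, "V": 0.1, "A": 0.1, "R": 0.1, "$": 0.2},
    "R": {"N": 0.1, "V": 0.2, "A": 0.1, "R": 0.2, "$": 0.4},
}
LEX = {
    "N": {"People": 0.02, "laugh": 0.005},
    "V": {"People": 0.001, "laugh": 0.01},
    "A": {"aloud": 0.02},
    "R": {"aloud": 0.005},
}


def viterbi(words):
    """Return (probability, tag path) maximizing P(T) P(W|T).

    Raises ValueError if some word has no lexical entry for any tag.
    """
    best = {"^": (1.0, ["^"])}  # best[t] = (score, path) of best path ending in t
    for w in words:
        new = {}
        for t, emit in LEX.items():
            e = emit.get(w, 0.0)
            if e == 0.0:
                continue
            cands = [(p * TRANS[prev].get(t, 0.0) * e, path + [t])
                     for prev, (p, path) in best.items()]
            new[t] = max(cands, key=lambda c: c[0])
        best = new
    # Close the sentence with the end tag "$".
    return max(((p * TRANS[prev].get("$", 0.0), path + ["$"])
                for prev, (p, path) in best.items()), key=lambda c: c[0])


print(viterbi(["People", "laugh", "aloud"]))  # path ['^', 'N', 'V', 'A', '$']
```

The work per word is proportional to the square of the number of tags, instead of growing exponentially with sentence length as exhaustive enumeration does.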

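29 Transition Table: one row and one column per tag (N, V, A, R, …, U). The cell in row t_{i-1}, column t_i holds #(t_{i-1} followed by t_i) / #t_{i-1}; e.g. the N row has entries of the form #(N followed by tag) / #N, and the V row entries of the form #(V followed by tag) / #V.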

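30 Lexical or Word Probabilities: one row per tag (N, V, A, R, …, U) and one column per word (Buy, Apple, People, Going, …). The cell in row t, column w holds #(w tagged t) / #t; e.g. the N row has entries of the form #(word tagged N) / #N, and the V row entries of the form #(word tagged V) / #V.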

31 Corpus: a collection of coherent text, e.g. ^_^ People_N laugh_V aloud_A $_$. Corpora may be spoken (e.g. Switchboard Corpus) or written (e.g. Brown, BNC).
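The transition and lexical tables are filled by counting in a tagged corpus, i.e. relative-frequency (maximum likelihood) estimates. A sketch that uses the corpus sentence above plus one hypothetical second sentence built from the example words of the simplified tag list; every token must have the word_TAG form:

```python
from collections import Counter


def estimate(corpus_lines):
    """Relative-frequency estimates from word_TAG lines.

    transition[(a, b)] = #(a followed by b) / #a
    lexical[(t, w)]    = #(w tagged t) / #t
    """
    tag_count, bigram_count, word_tag_count = Counter(), Counter(), Counter()
    for line in corpus_lines:
        pairs = [tok.rsplit("_", 1) for tok in line.split()]
        tags = [t for _, t in pairs]
        tag_count.update(tags)
        bigram_count.update(zip(tags, tags[1:]))
        word_tag_count.update((t, w) for w, t in pairs)
    transition = {(a, b): n / tag_count[a] for (a, b), n in bigram_count.items()}
    lexical = {(t, w): n / tag_count[t] for (t, w), n in word_tag_count.items()}
    return transition, lexical


corpus = [
    "^_^ People_N laugh_V aloud_A $_$",   # from the slide
    "^_^ boy_N sing_V loudly_R $_$",      # hypothetical second sentence
]
transition, lexical = estimate(corpus)
print(transition[("V", "A")], lexical[("N", "People")])  # 0.5 0.5
```

With a corpus this small most cells stay empty; real taggers typically apply some smoothing so that unseen pairs do not get probability 0.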

