
1 88-680 Natural Language Processing - Lecture 6: Viterbi Tagging; Syntax. Ido Dagan, Department of Computer Science, Bar-Ilan University

2 Stochastic POS Tagging
POS tagging: for a given sentence W = w_1…w_n, find the matching POS tags T = t_1…t_n.
In a statistical framework: T' = argmax_T P(T|W)

3 Bayes' Rule
The derivation of the tagging model uses:
–Bayes' rule (the denominator P(W) does not depend on the tags)
–the chain rule
–the Markovian assumptions (bigram): words are independent of each other, and a word's identity depends only on its own tag
Notation: P(t_1) = P(t_1|t_0)
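The slide's equations themselves did not survive the transcript; a standard reconstruction of the derivation that these labels describe, written in LaTeX, is (a sketch, not copied from the slide):

T' = \arg\max_T P(T \mid W)
   = \arg\max_T \frac{P(W \mid T)\, P(T)}{P(W)}    % Bayes' rule
   = \arg\max_T P(W \mid T)\, P(T)                 % denominator does not depend on T
   = \arg\max_T \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})   % chain rule + Markovian assumptions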

4 The Markovian Assumptions
–Limited horizon: P(X_{i+1} = t_k | X_1,…,X_i) = P(X_{i+1} = t_k | X_i)
–Time invariant: P(X_{i+1} = t_k | X_i) = P(X_{j+1} = t_k | X_j)

5 Maximum Likelihood Estimation
To estimate P(w_i|t_i) and P(t_i|t_{i-1}) we can use the maximum likelihood estimates:
–P(w_i|t_i) = c(w_i,t_i) / c(t_i)
–P(t_i|t_{i-1}) = c(t_{i-1},t_i) / c(t_{i-1})
Notice the estimation for i=1, which uses the start tag t_0.
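A minimal sketch of these counts in Python (the corpus format, the variable names, and the "<s>" start tag are illustrative assumptions, not from the slides):

from collections import Counter

def mle_estimates(tagged_sentences):
    """tagged_sentences: a list of sentences, each a list of (word, tag) pairs."""
    emission_counts = Counter()    # c(w_i, t_i)
    transition_counts = Counter()  # c(t_{i-1}, t_i)
    tag_counts = Counter()         # c(t_i), including the start tag t_0

    for sentence in tagged_sentences:
        prev_tag = "<s>"           # assumed start tag, so the i=1 estimate P(t_1|t_0) is defined
        tag_counts[prev_tag] += 1
        for word, tag in sentence:
            emission_counts[(word, tag)] += 1
            transition_counts[(prev_tag, tag)] += 1
            tag_counts[tag] += 1
            prev_tag = tag

    # P(w|t) = c(w,t)/c(t)  and  P(t|t_prev) = c(t_prev,t)/c(t_prev)
    p_word_given_tag = {(w, t): c / tag_counts[t] for (w, t), c in emission_counts.items()}
    p_tag_given_prev = {(tp, t): c / tag_counts[tp] for (tp, t), c in transition_counts.items()}
    return p_word_given_tag, p_tag_given_prev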

6 Unknown Words
Many words will not appear in the training corpus. Unknown words are a major problem for taggers.
Solutions:
–Incorporate morphological analysis
–Treat words appearing only once in the training data as UNKNOWN
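One way to realize the second solution, as a sketch (the corpus format and the "<UNK>" token name are illustrative assumptions):

from collections import Counter

def replace_hapax_with_unknown(tagged_sentences, unk="<UNK>"):
    """Map words that occur only once in training to an UNKNOWN token, so the tagger
    learns emission probabilities P(<UNK>|t) that it can reuse for unseen words."""
    word_counts = Counter(w for sent in tagged_sentences for w, _ in sent)
    return [[(w if word_counts[w] > 1 else unk, t) for w, t in sent]
            for sent in tagged_sentences]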

7 "Add-1" / "Add-Constant" Smoothing
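The slide's formula is not in the transcript; the standard add-one and add-constant (add-λ) estimates it presumably showed are, for the emission probabilities:

P_{add\text{-}1}(w_i \mid t_i) = \frac{c(w_i, t_i) + 1}{c(t_i) + |V|}
\qquad
P_{add\text{-}\lambda}(w_i \mid t_i) = \frac{c(w_i, t_i) + \lambda}{c(t_i) + \lambda |V|}

where |V| is the vocabulary size and 0 < λ ≤ 1.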

8 Smoothing for Tagging
For P(t_i|t_{i-1}):
–include one smoothed estimate for every tag t whose count c(t_{i-1}, t) = 0
Optionally, do the same for P(w_i|t_i).

9 Viterbi
Finding the most probable tag sequence can be done with the Viterbi algorithm.
There is no need to calculate every single possible tag sequence.

10 HMMs
Assume a state machine with:
–nodes that correspond to tags
–a start state and an end state
–arcs corresponding to transition probabilities P(t_i|t_{i-1})
–a set of observation likelihoods for each state, P(w_i|t_i)

11 [Example HMM state diagram: states NN, VBZ, NNS, AT, VB, RB. Example emission probabilities from the slide: one state with P(like)=0.2, P(fly)=0.3, …, P(eat)=0.36; another with P(likes)=0.3, P(flies)=0.1, …, P(eats)=0.5; another with P(the)=0.4, P(a)=0.3, P(an)=0.2. Transition probabilities such as 0.6 and 0.4 label the arcs.]

12 HMMs
An HMM is similar to an automaton augmented with probabilities.
Note that the states in an HMM do not correspond to the input symbols: the input (output) symbols don't uniquely determine the next state.

13 HMM Definition
HMM = (S, K, A, B)
–set of states S = {s_1,…,s_n}
–output alphabet K = {k_1,…,k_m}
–state transition probabilities A = {a_ij}, i,j ∈ S
–symbol emission probabilities B = b(i,k), i ∈ S, k ∈ K
–start and end states (non-emitting); alternatively, initial state probabilities
Note: for a given state i, Σ_j a_ij = 1 and Σ_k b(i,k) = 1.
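A minimal way to hold these parameters in Python (the class and field names are illustrative, not from the slides):

from dataclasses import dataclass, field

@dataclass
class HMM:
    states: list                               # S = {s_1, ..., s_n}, i.e. the tags
    alphabet: list                             # K = {k_1, ..., k_m}, i.e. the words
    trans: dict = field(default_factory=dict)  # A: trans[(i, j)] = P(s_j | s_i)
    emit: dict = field(default_factory=dict)   # B: emit[(i, k)] = P(k | s_i)

    def check(self, tol=1e-9):
        """For each state i, outgoing transition and emission probabilities each sum to 1."""
        for i in self.states:
            assert abs(sum(self.trans.get((i, j), 0.0) for j in self.states) - 1.0) < tol
            assert abs(sum(self.emit.get((i, k), 0.0) for k in self.alphabet) - 1.0) < tol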

14 Why Hidden?
Because we only observe the output; the underlying states are hidden.
Decoding: the problem of part-of-speech tagging can be viewed as a decoding problem: given an observation sequence W = w_1,…,w_n, find a state sequence T = t_1,…,t_n that best explains the observations.

15 [Worked trellis example: two tables of negative log probabilities, -log m(t_j|t_i) for transitions between tags t_1, t_2, t_3 (and from the start tag t_0) and -log m(w|t) for emitting words w_1, w_2, w_3, together with the trellis of accumulated Viterbi scores over the three word positions.]

16 Viterbi Algorithm
1. D(0, START) = 0
2. For each tag t ≠ START: D(0, t) = -∞
3. For i = 1 to N:
   For each tag t_j:
     D(i, t_j) = max_k [ D(i-1, t_k) + lm(t_j|t_k) ] + lm(w_i|t_j)
     Record best(i, j) = k, the k that yielded the max
4. log P(W,T) = max_j D(N, t_j)
5. Reconstruct the path backwards from the maximizing j
where lm(.) = log m(.), and D(i, t_j) is the maximal joint log probability of the state and word sequences up to position i, ending at t_j.
Complexity: O(N_t^2 · N), where N_t is the number of tags.
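A sketch of this algorithm in Python, working entirely in log space (the dictionary-based parameter format and the "<s>" start tag are illustrative assumptions):

def viterbi(words, tags, log_trans, log_emit, start="<s>"):
    """log_trans[(t_prev, t)] = log P(t|t_prev); log_emit[(t, w)] = log P(w|t).
    Missing entries count as log 0 = -inf. Returns (log P(W,T), best tag sequence)."""
    NEG_INF = float("-inf")
    n = len(words)
    # D[i][t]: best joint log probability of w_1..w_i and a tag sequence ending in t
    D = [dict() for _ in range(n + 1)]
    back = [dict() for _ in range(n + 1)]
    D[0] = {t: NEG_INF for t in tags}
    D[0][start] = 0.0                          # steps 1 and 2 of the pseudocode

    for i in range(1, n + 1):                  # step 3
        for t in tags:
            best_k, best_score = None, NEG_INF
            for k, prev_score in D[i - 1].items():
                score = prev_score + log_trans.get((k, t), NEG_INF)
                if score >= best_score:
                    best_k, best_score = k, score
            D[i][t] = best_score + log_emit.get((t, words[i - 1]), NEG_INF)
            back[i][t] = best_k                # record best(i, j) = k

    last = max(tags, key=lambda t: D[n][t])    # step 4
    path = [last]                              # step 5: follow the back-pointers
    for i in range(n, 1, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return D[n][last], path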

17 A*, N-best Decoding
Sometimes one wants not just the best state sequence for a given input but the top n best sequences, e.g. as input to a different model.
A* / stack decoding is an alternative to Viterbi.

18 Up from Bigrams
The POS tagging model we described uses a history of just the previous tag: P(t_i|t_1,…,t_{i-1}) ≈ P(t_i|t_{i-1}), i.e. a first-order Markovian assumption.
In this case each state in the HMM corresponds to a POS tag.
One can build an HMM for POS trigrams: P(t_i|t_1,…,t_{i-1}) ≈ P(t_i|t_{i-2},t_{i-1})

19 POS Trigram HMM Model
More accurate (also empirically) than a bigram model:
–"He clearly marked"
–"is clearly marked"
Sparseness problem: smoothing, back-off.
In such a model the HMM states do NOT correspond to single POS tags.
Why not 4-grams? Too many states, not enough data!

20 Supervised/Unsupervised
Is HMM-based tagging a supervised algorithm?
–Yes, because we need a tagged corpus to estimate the transition and emission probabilities.
What do we do if we don't have an annotated corpus but
–have a dictionary, or
–have an annotated corpus from a different domain and an un-annotated corpus in the desired domain?

21 Baum-Welch Algorithm
Also known as the Forward-Backward algorithm.
An EM algorithm for HMMs: maximization by iterative hill climbing.
The algorithm iteratively improves the model parameters based on un-annotated training data.

22 Baum-Welch Algorithm (cont.)
Start off with parameters based on the dictionary:
–P(w|t) = 1 if t is a possible tag for w
–P(w|t) = 0 otherwise
–uniform distribution on state transitions
This is enough to bootstrap from. It could also be used to tune a system to a new domain.
But the best results, and common practice, come from supervised estimation.
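A sketch of this dictionary-based starting point (the dictionary format is an illustrative assumption; the emission rows are also normalized to sum to 1, which the slide leaves implicit):

def init_from_dictionary(dictionary, tags):
    """dictionary: word -> set of possible tags.
    Returns emission and transition tables to bootstrap Baum-Welch (EM) from."""
    # P(w|t) is non-zero only if t is a possible tag for w, zero otherwise
    words_for_tag = {t: [w for w, ts in dictionary.items() if t in ts] for t in tags}
    emit = {(t, w): 1.0 / len(words_for_tag[t])
            for t in tags for w in words_for_tag[t]}
    # uniform distribution on state transitions
    trans = {(t_prev, t): 1.0 / len(tags) for t_prev in tags for t in tags}
    return emit, trans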

23 Completely Unsupervised?
What if there is no dictionary and no annotated corpus?
–Clustering: but the induced clusters don't correspond to linguistic POS categories.

24 Natural Language Processing - Lecture 7: Partial Parsing. Oren Glickman, Department of Computer Science, Bar-Ilan University

25 Syntax
"The study of grammatical relations between words and other units within the sentence." (The Concise Oxford Dictionary of Linguistics)
"The way in which linguistic elements (as words) are put together to form constituents (as phrases or clauses)." (Merriam-Webster Dictionary)

26 Brackets
"I prefer a morning flight"
[S [NP [Pro I]] [VP [V prefer] [NP [Det a] [Nom [N morning] [N flight]]]]]
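The bracketed structure can be inspected programmatically; a small sketch using NLTK (the nltk package and its round-bracket notation are assumptions, not part of the slides):

from nltk.tree import Tree

bracketing = "(S (NP (Pro I)) (VP (V prefer) (NP (Det a) (Nom (N morning) (N flight)))))"
tree = Tree.fromstring(bracketing)
tree.pretty_print()    # draws the parse tree of the next slide as ASCII art
print(tree.leaves())   # ['I', 'prefer', 'a', 'morning', 'flight']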

27 Parse Tree
[Tree diagram for "I prefer a morning flight": S dominates NP and VP; the NP is the Pronoun "I"; the VP is the Verb "prefer" followed by an NP made of the Det "a" and a Nom containing the Nouns "morning" and "flight".]

28 Parsing
The problem of mapping from a string of words to its parse tree is called parsing.

29 Generative Grammar
A set of rules which indicates precisely what can and cannot be a sentence in a language.
A grammar which precisely specifies the membership of the set of all the grammatical sentences in the language in question and therefore excludes all the ungrammatical sentences.

30 Formal Languages
The set of all grammatical sentences in a given natural language.
Are natural languages regular?

31 English is not a regular language!
a^n b^n is not regular.
Look at the following English sentences:
–John and Mary like to eat and sleep, respectively.
–John, Mary, and Sue like to eat, sleep, and dance, respectively.
–John, Mary, Sue, and Bob like to eat, sleep, dance, and cook, respectively.
Also: the "anti-missile missile" construction.

32 Constituents
Certain groupings of words behave as constituents.
Constituents are able to occur in various sentence positions (Hebrew examples):
–ראיתי את הילד הרזה ("I saw the thin boy")
–ראיתי אותו מדבר עם הילד הרזה ("I saw him talking with the thin boy")
–הילד הרזה גר ממול ("The thin boy lives across the street")

33 The Noun Phrase (NP)
Examples:
–He
–Ariel Sharon
–The prime minister
–The minister of defense during the war in Lebanon
They can all appear in a similar context: "___ was born in Kfar-Malal"

34 Prepositional Phrases
Examples:
–the man in the white suit
–Come and look at my paintings
–Are you fond of animals?
–Put that thing on the floor

35 Verb Phrases
Examples:
–He went
–He was trying to keep his temper.
–She quickly showed me the way to hide.

