1 Persian Part-of-Speech Tagging
Mostafa Keikha
Database Research Group (DBRG)
ECE Department, University of Tehran
2 Decision Trees
Decision Tree (DT): a tree in which the root and each internal node are labeled with a question. The arcs represent the possible answers to the associated question. Each leaf node represents a prediction of a solution to the problem.
A popular technique for classification; the leaf node indicates the class to which the corresponding tuple belongs.
3 Decision Tree Example
4 Decision Trees
A decision tree model is a computational model consisting of three parts:
- a decision tree
- an algorithm to create the tree
- an algorithm that applies the tree to data
Creation of the tree is the most difficult part. Processing is basically a search similar to that in a binary search tree (although a DT may not be binary); see the sketch below.
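A minimal sketch of the "apply the tree to data" step; the node layout and names are illustrative assumptions, not from the slides:

```python
class Node:
    """A decision-tree node: internal nodes ask a question, leaves predict a class."""
    def __init__(self, question=None, branches=None, label=None):
        self.question = question  # callable: tuple -> answer (None at leaves)
        self.branches = branches  # dict: answer -> child Node
        self.label = label        # predicted class (leaves only)

def classify(node, x):
    """Walk from the root to a leaf, following the arc matching each answer."""
    while node.label is None:
        node = node.branches[node.question(x)]
    return node.label
```

As the slide notes, this lookup is just a root-to-leaf search, so its cost is bounded by the depth of the tree.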
5 Decision Tree Algorithm
6 Using DT in POS Tagging
Compute ambiguity classes:
- Each term may have several possible tags.
- The ambiguity class of a term is the set of all its possible tags.
- Count the number of occurrences of each tag within each ambiguity class, e.g.:

Ambiguity class | Occurrences per tag
{a, b, c, d}    | a: 10, b: 20, c: 25, d: 40
{b, c, d}       | b: 40, c: 39, d: 50
{b, d}          | b: 60, d: 55
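A sketch of how the ambiguity classes and their tag counts could be collected; the corpus format (an iterable of word/tag pairs) is an assumption, not something the slides specify:

```python
from collections import Counter, defaultdict

def ambiguity_classes(tagged_corpus):
    """Map each ambiguity class (frozenset of tags) to its tag-occurrence counts."""
    word_tags = defaultdict(Counter)
    for word, tag in tagged_corpus:          # tagged_corpus yields (word, tag)
        word_tags[word][tag] += 1

    class_counts = defaultdict(Counter)
    for word, counts in word_tags.items():
        amb_class = frozenset(counts)           # the set of tags seen for this word
        class_counts[amb_class].update(counts)  # pool counts over the class
    return class_counts
```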
7 Using DT in POS Tagging
Create a decision tree over the ambiguity classes: at each level, delete the tag with the minimum number of occurrences (a code sketch follows below).

{a, b, c, d} (10, 20, 25, 40)
→ delete a: {b, c, d} (40, 39, 50)
→ delete c: {b, d} (60, 55)
→ delete d: {b}
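A minimal sketch of that construction, reusing class_counts from the previous sketch; the greedy-deletion loop is taken directly from the slide, while the data layout is an assumption:

```python
from collections import Counter

def build_tag_chain(class_counts, start_class):
    """From an ambiguity class, repeatedly drop the least frequent tag
    until a single tag remains (the slide's per-level deletion rule).
    Assumes every intermediate class also appears in class_counts."""
    chain, current = [], start_class
    while len(current) > 1:
        counts = class_counts[current]
        chain.append((set(current), dict(counts)))
        weakest = min(current, key=lambda t: counts[t])
        current = frozenset(current - {weakest})
    chain.append((set(current), None))  # the surviving tag is the prediction
    return chain

# The slide's example ends at {'b'}:
counts = {
    frozenset("abcd"): Counter({"a": 10, "b": 20, "c": 25, "d": 40}),
    frozenset("bcd"):  Counter({"b": 40, "c": 39, "d": 50}),
    frozenset("bd"):   Counter({"b": 60, "d": 55}),
}
print(build_tag_chain(counts, frozenset("abcd"))[-1])  # ({'b'}, None)
```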
8 Using DT in POS Tagging
Advantages:
- easy to understand
- easy to implement
Disadvantage:
- context independent
9 Using DT in POS Tagging
Known Tokens Results

Run     | % of tokens | Tokens   | Correct  | Accuracy
1       | 97.97       | 393923   | 363764   | 92.34%
2       | 98.06       | 355630   | 328965   | 92.50%
3       | 97.96       | 397528   | 367789   | 92.51%
4       | 97.92       | 410561   | 381578   | 92.94%
5       | 97.97       | 403079   | 372305   | 92.36%
Average | 97.976      | 392144.2 | 362880.2 | 92.474%
10 Using DT in POS Tagging
Unknown Tokens Results

Run     | % of tokens | Tokens | Correct | Accuracy
1       | 2.03        | 8127   | 4274    | 52.59%
2       | 1.94        | 7028   | 3981    | 56.64%
3       | 2.04        | 8276   | 4221    | 51.00%
4       | 2.08        | 8692   | 4820    | 55.45%
5       | 2.03        | 8326   | 4520    | 54.28%
Average | 2.014       | 8089.8 | 4363.2  | 53.992%
11 POS tagging using HMMs
Let W be a sequence of words: W = w_1, w_2, …, w_n
Let T be the corresponding tag sequence: T = t_1, t_2, …, t_n
Task: find the T that maximizes P(T | W), i.e.
T' = argmax_T P(T | W)
12 POS tagging using HMMs
By Bayes' rule, P(T | W) = P(W | T) * P(T) / P(W), so
T' = argmax_T P(W | T) * P(T)
Transition probability:
P(T) = P(t_1) * P(t_2 | t_1) * P(t_3 | t_1 t_2) * … * P(t_n | t_1 … t_{n-1})
Applying the trigram approximation:
P(T) ≈ P(t_1) * P(t_2 | t_1) * P(t_3 | t_1 t_2) * … * P(t_n | t_{n-2} t_{n-1})
Introducing a dummy tag, $, to represent the beginning of a sentence:
P(T) = P(t_1 | $) * P(t_2 | $ t_1) * P(t_3 | t_1 t_2) * … * P(t_n | t_{n-2} t_{n-1})
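A sketch of the trigram product, assuming a caller-supplied conditional probability trigram_p (the slide conditions t_1 on a single $; padding with two $ symbols is an equivalent convention adopted here):

```python
def transition_prob(tags, trigram_p, start="$"):
    """P(T) under slide 12's trigram approximation.
    trigram_p(t2, t1, t) should return P(t | t2 t1); its estimation
    (counts plus slide 13's smoothing) is left to the caller."""
    padded = [start, start] + list(tags)  # $, $, t_1, ..., t_n
    p = 1.0
    for i in range(2, len(padded)):
        p *= trigram_p(padded[i - 2], padded[i - 1], padded[i])
    return p
```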
13 POS tagging using HMMs
Smoothing the transition probabilities:
- sparse data problem
- linear interpolation method:
P'(t_i | t_{i-2}, t_{i-1}) = λ_1 P(t_i) + λ_2 P(t_i | t_{i-1}) + λ_3 P(t_i | t_{i-2}, t_{i-1})
such that the λs sum to 1
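The interpolation itself is direct to write down; p_uni, p_bi, and p_tri stand for the three maximum-likelihood estimates, and these names are assumptions:

```python
def smoothed_transition(t2, t1, t, p_uni, p_bi, p_tri, lambdas):
    """P'(t | t2, t1) from slide 13: a convex combination of the
    unigram, bigram, and trigram estimates (lambdas must sum to 1)."""
    l1, l2, l3 = lambdas
    return l1 * p_uni(t) + l2 * p_bi(t1, t) + l3 * p_tri(t2, t1, t)
```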
14 POS tagging using HMMs
Calculation of the λs
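The slide's procedure for choosing the λs did not survive extraction. A standard choice for this interpolation (used, for example, by Brants' TnT tagger) is deleted interpolation, sketched here under that assumption:

```python
from collections import Counter

def deleted_interpolation(trigrams):
    """Estimate (lambda_1, lambda_2, lambda_3) by deleted interpolation
    (Brants, 2000). trigrams is an iterable of (t_{i-2}, t_{i-1}, t_i)
    tuples; boundary effects in the bigram/unigram counts are ignored."""
    tri = Counter(trigrams)
    bi = Counter((u, v) for _, u, v in tri.elements())
    uni = Counter(v for _, _, v in tri.elements())
    n = sum(uni.values())
    lambdas = [0.0, 0.0, 0.0]
    for (t, u, v), c in tri.items():
        # Each estimate with the current trigram "deleted" once:
        cases = [
            (uni[v] - 1) / (n - 1) if n > 1 else 0.0,
            (bi[(u, v)] - 1) / (uni[u] - 1) if uni[u] > 1 else 0.0,
            (c - 1) / (bi[(t, u)] - 1) if bi[(t, u)] > 1 else 0.0,
        ]
        lambdas[cases.index(max(cases))] += c  # credit the winning order
    total = sum(lambdas)
    return [l / total for l in lambdas]
```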
15 POS tagging using HMMs
Emission probability:
P(W | T) ≈ P(w_1 | t_1) * P(w_2 | t_2) * … * P(w_n | t_n)
Context dependency: to make the model more dependent on the context, the emission probability is instead calculated as
P(W | T) ≈ P(w_1 | $ t_1) * P(w_2 | t_1 t_2) * … * P(w_n | t_{n-1} t_n)
16 POS tagging using HMMs
A smoothing technique is applied here as well:
P'(w_i | t_{i-1} t_i) = θ_1 P(w_i | t_i) + θ_2 P(w_i | t_{i-1} t_i)
The sum of all θs equals 1, and the θs differ from word to word.
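As with the transitions, the interpolation is a one-liner once the two component models exist; the function names are assumptions, and the slides do not show how the word-specific θs are estimated:

```python
def smoothed_emission(word, prev_tag, tag, p_tag, p_tagpair, theta1, theta2):
    """P'(w_i | t_{i-1} t_i) from slide 16: interpolate the tag-only
    emission P(w|t) with the tag-pair emission P(w | t_{i-1} t_i)."""
    assert abs(theta1 + theta2 - 1.0) < 1e-9  # the thetas must sum to 1
    return theta1 * p_tag(word, tag) + theta2 * p_tagpair(word, prev_tag, tag)
```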
17 POS tagging using HMMs
[Slide shows six numbered equations.]
18 POS tagging using HMMs
19 POS tagging using HMMs
20 POS tagging using HMMs
Lexicon generation probability
21 POS tagging using HMMs
22 POS tagging using HMMs
P(N V ART N | flies like a flower) = 4.37 × 10^-6
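Slides 17-22 evidently score candidate tag sequences for this example sentence; the standard way to find the best sequence is Viterbi dynamic programming. A bigram-model sketch follows (the slides' actual model is trigram with context-dependent emissions, which extends this by making the state a tag pair; log_trans and log_emit are assumed log-probability callbacks):

```python
import math

def viterbi(words, tagset, log_trans, log_emit, start="$"):
    """Return (best tag sequence, its probability) for argmax_T P(W|T)P(T)."""
    # best[t] = (log score of the best path ending in tag t, that path)
    best = {t: (log_trans(start, t) + log_emit(words[0], t), [t]) for t in tagset}
    for w in words[1:]:
        new_best = {}
        for t in tagset:
            score, path = max(
                (s + log_trans(prev, t), path) for prev, (s, path) in best.items()
            )
            new_best[t] = (score + log_emit(w, t), path + [t])
        best = new_best
    score, path = max(best.values())
    return path, math.exp(score)
```

With the course's (unshown) probability tables, viterbi("flies like a flower".split(), ...) should recover N V ART N with probability 4.37 × 10^-6.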
23 POS tagging using HMMs
Known Tokens Results

Run     | % of tokens | Tokens   | Correct | Accuracy
1       | 98.07       | 394290   | 382211  | 96.94%
2       | 98.16       | 355969   | 345913  | 97.18%
3       | 98.04       | 397849   | 343894  | 96.96%
4       | 98.02       | 410970   | 398487  | 96.96%
5       | 98.07       | 403460   | 391475  | 97.03%
Average | 98.072      | 390496.4 | 372396  | 97.01%
24 Unknown Tokens Results

Run     | % of tokens | Tokens | Correct | Accuracy
1       | 1.93        | 7760   | 5829    | 75.12%
2       | 1.84        | 6689   | 5357    | 80.09%
3       | 1.96        | 7956   | 6153    | 77.34%
4       | 1.98        | 8283   | 6435    | 77.69%
5       | 1.93        | 7945   | 6246    | 78.62%
Average | 1.928       | 7726.6 | 6004    | 77.77%
25 Overall Results

Run     | Tokens   | Correct  | Accuracy
1       | 402050   | 388040   | 96.52%
2       | 362658   | 351270   | 96.86%
3       | 405805   | 391890   | 96.57%
4       | 419253   | 404922   | 96.58%
5       | 411405   | 397721   | 96.67%
Average | 400234.2 | 386768.6 | 96.64%