Published by Ashley Perry. Modified over 9 years ago.
1
임성신 (sslim@pusan.ac.kr). Speech and Language Processing, Ch. 8: Word Classes and Part-of-Speech Tagging
2
Artificial Intelligence Laboratory. Agenda: What are they? Distribution. Tagsets. Tagging: Rules, Probabilities, Transformation-Based (Brill).
3
Parts of Speech. Start with eight basic categories: noun, verb, pronoun, preposition, adjective, adverb, article, conjunction. These categories are based on morphological and distributional properties (not semantics). Some cases are easy; others are murky.
4
Parts of Speech. Two kinds of category. Closed class: prepositions, articles, conjunctions, pronouns. Open class: nouns, verbs, adjectives, adverbs.
5
Figure 8.1 Prepositions (and particles) of English from the CELEX on-line dictionary. Frequency counts are from the COBUILD 16 million word corpus.
6
Figure 8.2 English single-word particles from Quirk et al. (1985).
7
Figure 8.3 Coordinating and subordinating conjunctions of English from the CELEX on-line dictionary. Frequency counts are from the COBUILD 16 million word corpus.
8
Figure 8.4 Pronouns of English from the CELEX on-line dictionary. Frequency counts are from the COBUILD 16 million word corpus.
9
Figure 8.5 English modal verbs from the CELEX on-line dictionary. Frequency counts are from the COBUILD 16 million word corpus.
10
Sets of Parts of Speech: Tagsets. There are various standard tagsets to choose from; some have many more tags than others. The choice of tagset depends on the application. Accurate tagging is possible even with large tagsets.
11
Figure 8.6 Penn Treebank part-of-speech tags (including punctuation).
12
Tagging. Part-of-speech tagging is the process of assigning a part of speech to each word in a sentence … Assume we have: a tagset; a dictionary that gives the possible set of tags for each entry; a text to be tagged. A reason? Example: The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.
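The word/TAG notation in the example can be read back into (word, tag) pairs with a few lines of Python. This is a minimal sketch; the function name `parse_tagged` is an illustrative assumption, not part of any tagging library.

```python
def parse_tagged(text):
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits on the LAST slash, so a token such as './.'
        # yields the word '.' and the tag '.'.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


sentence = (
    "The/DT grand/JJ jury/NN commented/VBD on/IN a/DT "
    "number/NN of/IN other/JJ topics/NNS ./."
)
tagged = parse_tagged(sentence)
for word, tag in tagged:
    print(f"{word}\t{tag}")
```

Splitting on the last slash rather than the first is a design choice that keeps tokens containing a slash in the word itself intact.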
13
Figure 8.7 The number of word types in the Brown corpus by degree of ambiguity (after DeRose (1988)).
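A count like the one in Figure 8.7 could be computed from any tagged corpus roughly as follows. This is a minimal Python sketch; the function name `ambiguity_histogram` and the toy token list are illustrative assumptions, not Brown corpus data.

```python
from collections import Counter, defaultdict


def ambiguity_histogram(tagged_tokens):
    """Map 'number of distinct tags' -> 'number of word types with that many tags'."""
    tags_per_type = defaultdict(set)
    for word, tag in tagged_tokens:
        # Lowercasing merges 'The' and 'the' into one word type (an assumption).
        tags_per_type[word.lower()].add(tag)
    return Counter(len(tags) for tags in tags_per_type.values())


# Toy data for illustration only.
toy_corpus = [
    ("the", "DT"), ("race", "NN"), ("to", "TO"), ("race", "VB"),
    ("back", "RB"), ("back", "NN"), ("back", "VB"), ("jury", "NN"),
]
hist = ambiguity_histogram(toy_corpus)
for degree in sorted(hist):
    print(f"{degree} tag(s): {hist[degree]} word type(s)")
```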
14
Tagging - Rules. Hand-crafted rules for ambiguous words test the context to make appropriate choices. Early attempts were fairly error-prone. Extremely labor-intensive.
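To make "rules that test the context" concrete, here is a small Python sketch of one hand-written constraint. The rule itself (drop verb readings right after an unambiguous determiner) and the candidate tag sets are hypothetical illustrations, not rules taken from any actual rule-based tagger.

```python
def apply_constraints(candidates):
    """Narrow each word's candidate tag set using the previous word's tags.

    candidates: list of (word, set_of_possible_tags) pairs.
    """
    result = []
    prev_tags = set()
    for word, tags in candidates:
        tags = set(tags)
        # Hypothetical hand-crafted rule: directly after an unambiguous
        # determiner, discard verb readings, unless nothing would remain.
        if prev_tags == {"DT"}:
            non_verbs = {t for t in tags if not t.startswith("VB")}
            if non_verbs:
                tags = non_verbs
        result.append((word, tags))
        prev_tags = tags
    return result


sentence = [("the", {"DT"}), ("race", {"NN", "VB"}), ("ended", {"VBD", "VBN"})]
narrowed = apply_constraints(sentence)
print(narrowed)
```

The sketch removes readings rather than picking a winner, so words the rule does not cover stay ambiguous; a real system would need many such rules, which is where the labor cost comes from.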
15
Figure 8.8 Sample lexical entries from the ENGTWOL lexicon described in Voutilainen (1995) and Heikkila (1995).
16
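Tagging - Probabilities. Advantages: given a sufficiently large tagged corpus, the statistics needed for tagging are easy to extract, so the approach scales well, applies broadly, and achieves relatively high overall accuracy. Disadvantages: it depends on the corpus. Extracting meaningful statistics requires a tagged corpus above a certain size. Building such a corpus takes considerable time and effort. When the corpus is biased or too small, reliability drops because of data sparseness.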
17
Tagging - Probabilities. We want the best set of tags for a sequence of words (a sentence). W is a sequence of words; T is a sequence of tags. The probability of the word sequence, P(W), will be the same for each tag sequence.
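The role of P(W) can be made explicit with Bayes' rule. A sketch of the standard derivation, writing \hat{T} (notation assumed here) for the best tag sequence:

```latex
\hat{T} = \arg\max_{T} P(T \mid W)
        = \arg\max_{T} \frac{P(W \mid T)\, P(T)}{P(W)}
        = \arg\max_{T} P(W \mid T)\, P(T)
```

Since P(W) does not depend on T, dropping it leaves the argmax unchanged.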
18
Tagging - Transformation-Based (Brill tagging). Combine rules and statistics … TBL (Transformation-Based Learning) is based on rules. Rules are automatically induced from the data (ML).
19
Brill tagging - Examples. Race: "race" as NN: .98; "race" as VB: .02. So if you always tag "race" as NN, you'll be wrong 2% of the time, which really isn't bad. Patch the cases where you know it has to be a verb: change NN to VB when the previous tag is TO.
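The two steps in this example (tag every word with its most likely tag, then patch with a contextual rule) can be sketched in Python. The lexicon `MOST_LIKELY_TAG` and both function names are illustrative assumptions; the rule is the one stated on the slide.

```python
# Hypothetical most-likely-tag lexicon for a handful of words.
MOST_LIKELY_TAG = {"to": "TO", "race": "NN", "the": "DT"}


def baseline_tag(words, most_likely, default="NN"):
    """Step 1: give every word its single most likely tag."""
    return [most_likely.get(w.lower(), default) for w in words]


def apply_rule(tags, from_tag, to_tag, prev_tag):
    """Step 2: change from_tag to to_tag when the previous tag is prev_tag.

    Simplification: the context is read from the input tags, not from tags
    already rewritten earlier in the same pass.
    """
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


for words in (["to", "race"], ["the", "race"]):
    initial = baseline_tag(words, MOST_LIKELY_TAG)
    patched = apply_rule(initial, "NN", "VB", "TO")
    print(words, initial, "->", patched)
```

Only the occurrence of "race" after TO is changed; after a determiner it keeps the baseline NN.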
20
Brill tagging - Rules. Where did that transformational rule come from? Define a hypothesis space of rules that might help decrease the error rate. Search that space (exhaustively?) to find the rules that most reduce the error rate. Continue to add rules until some stopping criterion is reached. Figure 8.9 Brill's (1995) templates. Each begins with "Change tag a to tag b when: …". The variables a, b, z and w range over parts of speech.
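The search described above can be sketched as a greedy loop in Python. This is a simplified illustration under stated assumptions: it uses a single template ("change tag a to tag b when the previous tag is z"), treats the corpus as one flat tag sequence, and stops when no rule reduces the error count. The function names and the toy data are hypothetical, not Brill's implementation.

```python
def apply_rule(tags, rule):
    """Apply (a, b, z): change a to b wherever the previous tag is z."""
    a, b, z = rule
    return [
        b if i > 0 and t == a and tags[i - 1] == z else t
        for i, t in enumerate(tags)
    ]


def count_errors(tags, gold):
    return sum(t != g for t, g in zip(tags, gold))


def learn_rules(initial, gold, max_rules=10):
    """Greedily add the rule that most reduces errors against the gold tags."""
    current = list(initial)
    learned = []
    for _ in range(max_rules):
        # Hypothesis space: instantiate the template at every current error.
        candidates = {
            (current[i], gold[i], current[i - 1])
            for i in range(1, len(current))
            if current[i] != gold[i]
        }
        base = count_errors(current, gold)
        best_rule, best_gain = None, 0
        for rule in sorted(candidates):
            gain = base - count_errors(apply_rule(current, rule), gold)
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None:  # stopping criterion: no rule helps
            break
        current = apply_rule(current, best_rule)
        learned.append(best_rule)
    return learned, current


# Toy data: baseline tags versus gold tags for "to race the race to run".
initial = ["TO", "NN", "DT", "NN", "TO", "NN"]
gold = ["TO", "VB", "DT", "NN", "TO", "VB"]
learned, final = learn_rules(initial, gold)
print(learned, final)
```

Scoring by net error reduction means a rule that fixes some tags but breaks others is penalized for the ones it breaks.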