Published by Marcus Sparks. Modified over 9 years ago.
1
LING 388 Language and Computers
Lecture 22
11/25/03
Sandiway FONG
2
Administrivia
No more homeworks until the final
  Final will also cover the material after Homework 4
  Take-home final
    Handed out on Tuesday December 9th
    Discussed in class that day
    One week strict deadline
No class on Thursday
  Happy Turkey Day!
3
Relative Clauses
From Lecture 14, we have examples like:
  The cat that John saw (object)
    The cat_i that John saw e_i
  The cat that saw John (subject)
    The cat_i that e_i saw John
From Homework 4 (review), we saw that we can have multiply embedded relative clauses
4
Relative Clauses
Classwork Question (do it now)
  Rank the following sentences in order of the difficulty of comprehension:
    1. I hate the man that the cat that Mary saw hissed at
    2. I hate the man that saw the cat that hissed at John
    3. I hate the man that the cat that hissed at John saw
    4. I hate the man that hissed at the cat that John saw
  Note: 1 = most difficult
  If two (or more) are about the same level, give them the same rank
5
Today’s Lecture
In Lecture 21, we looked at Stemming
  … the (morphological) process of going from a fully inflected word form to a root
In today’s lecture, we’ll discuss part-of-speech (POS) tagging
  … the process of identifying the part of speech of a fully inflected word form
6
Part-of-Speech (POS) Tagging
Example of a lightweight NLP task
  Useful when complete syntactic analysis is not needed, or…
  When used as a first stage towards a more complete analysis
  POS taggers are practical and do well
    95%+ accuracy claimed in the literature
7
Parts of Speech: Problem
Example:
  walk: noun, verb
    The walk (noun) I took …
    I walk (verb) 2 miles every day
Correct tag determined by syntax
POS taggers try to assign correct tag without actually parsing the sentence
8
Components of a Tagger
Dictionary of words
  Exhaustive list of closed class items
    Examples:
      the, a, an: determiner
      from, to, of, by: preposition
      and, or: coordinating conjunction
  Large set of open class (e.g. nouns, verbs, adjectives) items with frequency information
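The dictionary component described above could be sketched roughly as follows. This is only an illustration: the names `CLOSED_CLASS`, `OPEN_CLASS`, and `lookup` are hypothetical, and the frequency counts are invented, not taken from any corpus.

```python
from collections import Counter

# Closed-class items: few enough to list exhaustively (examples from the slide).
CLOSED_CLASS = {
    "the": "determiner", "a": "determiner", "an": "determiner",
    "from": "preposition", "to": "preposition",
    "of": "preposition", "by": "preposition",
    "and": "coordinating conjunction", "or": "coordinating conjunction",
}

# Open-class items: word -> (tag -> count). Counts are invented for illustration.
OPEN_CLASS = {
    "walk": Counter({"verb": 30, "noun": 12}),
    "still": Counter({"adverb": 70, "adjective": 8, "noun": 2, "verb": 1}),
}

def lookup(word):
    """Return candidate tags for a word, most frequent first; [] if unknown."""
    w = word.lower()
    if w in CLOSED_CLASS:
        return [CLOSED_CLASS[w]]
    if w in OPEN_CLASS:
        return [tag for tag, _ in OPEN_CLASS[w].most_common()]
    return []

print(lookup("The"))   # ['determiner']
print(lookup("walk"))  # ['verb', 'noun']
```

An empty result marks a word as extra-dictionary, which is where an unknown-word mechanism would take over.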
9
Components of a Tagger
Mechanism to assign tags
  Context-free: by frequency
  Context: bigram, trigram, hand-coded rules
    Example: Det Noun/*Verb
             the walk…
Mechanism to handle unknown words (extra-dictionary)
  Capitalization
  Morphology: -ed, -tion
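A minimal sketch of how these two mechanisms might fit together is given below. It assumes a tiny hypothetical lexicon (`LEXICON`, with an invented frequency ordering), a single hand-coded bigram rule for Det Noun/*Verb, and simple capitalization and suffix heuristics for unknown words. The default tag for unknown words is an arbitrary choice, not something stated on the slide.

```python
# Hypothetical mini-lexicon: candidate tags, most frequent first (ordering invented).
LEXICON = {
    "the": ["Det"],
    "i": ["Pron"],
    "walk": ["Verb", "Noun"],
    "took": ["Verb"],
}

def guess_unknown(word, sentence_initial):
    """Fallback for extra-dictionary words, using the two cues on the slide."""
    # Capitalization only counts as evidence away from the start of a sentence.
    if word[0].isupper() and not sentence_initial:
        return "ProperNoun"
    # Morphology: -ed suggests a verb, -tion suggests a noun.
    if word.endswith("ed"):
        return "Verb"
    if word.endswith("tion"):
        return "Noun"
    return "Noun"  # arbitrary default (an assumption)

def tag(words):
    """Tag a list of non-empty word strings, left to right."""
    tags = []
    for i, word in enumerate(words):
        candidates = LEXICON.get(word.lower())
        if candidates is None:
            tags.append(guess_unknown(word, sentence_initial=(i == 0)))
            continue
        choice = candidates[0]  # context-free choice: most frequent tag
        # Hand-coded bigram rule: after Det, prefer Noun over Verb.
        if i > 0 and tags[-1] == "Det" and choice == "Verb" and "Noun" in candidates:
            choice = "Noun"
        tags.append(choice)
    return tags

print(tag(["the", "walk"]))  # ['Det', 'Noun']
print(tag(["I", "walk"]))    # ['Pron', 'Verb']
```

The bigram rule only looks at the previous tag, so no parse of the sentence is needed.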
10
How Hard is Tagging?
Brown Corpus (Francis & Kucera, 1982):
  1 million words
  39K distinct words
  35K words with only 1 tag, 4K with multiple tags (DeRose, 1988)
Easy task to do well on:
  90% accuracy for naïve algorithm (Charniak et al., 1993)
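To make the numbers above concrete, the sketch below first computes the share of distinct words with a single tag from the slide's figures, then implements a most-frequent-tag baseline. The baseline is an assumption about what "naïve algorithm" means here. The toy training and test data are invented, so the toy accuracy says nothing about real corpora.

```python
from collections import Counter, defaultdict

# Share of distinct Brown Corpus words with only one tag (figures from the slide).
unambiguous_share = 35_000 / 39_000
print(round(unambiguous_share, 3))  # 0.897

def train_baseline(tagged_words):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def accuracy(model, tagged_words, default="NN"):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = sum(1 for w, t in tagged_words if model.get(w.lower(), default) == t)
    return correct / len(tagged_words)

# Invented toy data using Penn Treebank style tags.
TRAIN = [("the", "DT"), ("walk", "NN"), ("I", "PRP"), ("walk", "VBP"),
         ("we", "PRP"), ("walk", "VBP"), ("a", "DT"), ("walk", "NN"),
         ("they", "PRP"), ("walk", "VBP")]
TEST = [("the", "DT"), ("walk", "NN"), ("I", "PRP"), ("walk", "VBP")]

model = train_baseline(TRAIN)
print(model["walk"])          # VBP
print(accuracy(model, TEST))  # 0.75
```

The baseline gets the noun use of walk wrong because it ignores context entirely.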
11
How Hard is Tagging?
Multiple POS
  Example: still: noun, adjective, adverb, verb
    the still of the night, a glass still
    still waters
    stand still
    still struggling
    Still, I didn’t give way
    still your fear of the dark (transitive)
    the bubbling waters stilled (intransitive)
12
Penn TreeBank Tagset
48-tag simplification of Brown Corpus tagset
Examples:
  1. CC Coordinating conjunction
  3. DT Determiner
  7. JJ Adjective
  11. MD Modal
  12. NN Noun (singular, mass)
  13. NNS Noun (plural)
  27. VB Verb (base form)
  28. VBD Verb (past)
13
Penn TreeBank Tagset
www.ldc.upenn.edu/doc/treebank2/cl93.html
15
Penn TreeBank Tagset
How many tags?
  Tag criterion
    Distinctness with respect to grammatical behavior?
    Make tagging easier?
Punctuation tags
  Penn Treebank numbers 37-48
  Trivial computational task
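One way to see why punctuation tagging is trivial: it reduces to a table lookup. The sketch below covers only a handful of punctuation tokens and is not the full set of tags 37-48. The names `PUNCT_TAGS` and `tag_punctuation` are hypothetical.

```python
# Partial lookup table: punctuation token -> Penn Treebank style tag.
# Sentence-final marks share the "." tag; colon and semicolon share ":".
PUNCT_TAGS = {
    ".": ".", "?": ".", "!": ".",
    ",": ",",
    ":": ":", ";": ":",
    "$": "$",
    "#": "#",
}

def tag_punctuation(token):
    """Return the punctuation tag for a token, or None if it is not in the table."""
    return PUNCT_TAGS.get(token)

print(tag_punctuation("?"))     # .
print(tag_punctuation(";"))     # :
print(tag_punctuation("walk"))  # None
```

No context or frequency information is needed, unlike for open-class words.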
16
Penn TreeBank Tagset
Simplifications:
  TO: infinitival marker, preposition
    I want to win
    I went to the store
  IN (preposition): that, when, although
    I know that I should have stopped, although…
    I stopped when I saw Bill
17
Penn TreeBank Tagset
Simplifications:
  DT (determiner): any, some, these, those
    any man
    these *man/men
  VBP (verb, present): am, are, walk
    Am I here?
    *Walked I here?/Did I walk here?
18
Hard to Tag Items
Syntactic function
  Example: I saw the man tired from running
Examples from Brown Corpus Manual
  Hyphenation: long-range, high-energy, shirt-sleeved, signal-to-noise
  Foreign words: mens sana in corpore sano