1 Natural Language Processing Vasile Rus

2 Outline Announcements Word Categories (Parts of Speech) Part of Speech Tagging

3 Announcements Paper presentations Projects

4 Language Language = words grouped according to some rules called a grammar Language = words + rules Rules are too flexible for system developers Rules are not flexible enough for poets

5 Words and their Internal Affairs: Morphology Words are grouped into classes/grammatical categories/syntactic categories/parts-of-speech (POS) based –on their syntactic and morphological behavior Noun: words that occur with determiners, take possessives, occur (most but not all) in plural form –and less on their typical semantic type Luckily the classes are semantically coherent to some extent A word belongs to a class if it passes the substitution test –The sad/intelligent/green/fat bug sucks cow’s blood. They all belong to the same class: ADJ

6 Words and their Internal Affairs: Morphology Word categories are of two types: –Open categories: accept new members Nouns Verbs Adjectives Adverbs –Closed or functional categories Almost fixed membership Few members Determiners, prepositions, pronouns, conjunctions, auxiliary verbs?, particles, numerals, etc. Play an important role in grammar Any known human language has nouns and verbs (Nootka is a possible exception)

7 Nouns Noun is the name given to the category containing: people, places, or things A word is a noun if it: –Occurs with determiners (a student) –Takes possessives (a student’s grade) –Occurs in plural form (focus - foci) English Nouns –Count nouns: allow enumeration (rabbits) –Mass nouns: homogeneous things (snow, salt)

8 Verbs Words that describe actions, processes or states Subclasses of Verbs: –Main verbs –Auxiliaries (copula be, do, have) –Modal verbs: mark the mood of the main verb Can: possibility May: permission Must: necessity –Phrasal verbs: verb + particle Particle: word that combines with verb –It is often confused with prepositions or adverbs –Can appear in places in which prepositions and adverbs cannot »For example before a preposition: I went on for a walk

9 Adjectives & Adverbs Adjectives: words that describe qualities or properties Adverbs: a very diverse class –Subclasses Directional or locative adverbs (northwards) Degree adverbs (very) Manner adverbs (fast) Temporal adverbs (yesterday, Monday) –Monday: Isn’t it a noun?

10 Prepositions Occur before noun phrases They are relational words indicating temporal or spatial relations or other relations –by the river –by tomorrow –by Shakespeare

11 Conjunctions Used to join two phrases, clauses, or sentences Subclasses –Coordinating conjunctions (and, or, but) –Subordinating conjunctions or complementizers (that) link a verb to its argument

12 Pronouns A shorthand for noun phrases or entities or events Subclasses: –Personal pronouns: refer to persons or entities –Possessive pronouns –Wh-pronouns: in questions and as complementizers

13 Other categories Interjections: oh, hey Negatives: no, not Politeness markers: please Greetings: hello Existentials: there

14 Tagsets Tagset – set of categories/POS The number of categories differs among tagsets Trade-off between granularity (finer categories) and simplicity Available Tagsets: –Dionysius Thrax of Alexandria: 8 tags [circa 100 B.C.] –Brown corpus: 87 tags –Penn Treebank: 45 tags –Lancaster UCREL project’s C5 (used to tag the BNC): 61 tags (see Appendix C) –C7: 145 tags (see Appendix C)

15 The Brown Corpus The first digital corpus (1961) –Francis and Kucera, Brown University Contents: 500 texts, each 2000 words long –From American books, newspapers, magazines –various genres: Science fiction, romance fiction, press reportage, scientific writing, popular lore

16 Penn Treebank First syntactically annotated corpus 1 million words from Wall Street Journal Part of speech tags and syntax trees

17 Important Penn Treebank Tags

18 Verb Inflection Tags

19 Penn Treebank Tagset

20 Terminology Tagging –The process of labeling words in a text with part of speech or other lexical class marker Tags –The labels Tag Set –The collection of tags used for a particular task

21 Example Input: raw text Output: text as word/tag Mexico/NNP City/NNP has/VBZ a/DT very/RB bad/JJ pollution/NN problem/NN because/IN the/DT mountains/NNS around/IN the/DT city/NN act/NN as/IN walls/NNS and/CC block/NN in/IN dust/NN and/CC smog/NN ./. Poor/JJ air/NN circulation/NN out/IN of/IN the/DT mountain-walled/NNP Mexico/NNP City/NNP aggravates/VBZ pollution/NN ./. Satomi/NNP Mitarai/NNP died/VBD of/IN blood/NN loss/NN ./. Satomi/NNP Mitarai/NNP bled/VBD to/TO death/NN ./.
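
The word/tag output format above is easy to read back into a program. The following is a minimal sketch, assuming tokens are separated by whitespace and the tag follows the last slash; the function name parse_tagged is made up for illustration.

```python
def parse_tagged(text):
    """Split 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # Split on the LAST slash so a word that itself contains '/' stays intact.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


sample = "Satomi/NNP Mitarai/NNP died/VBD of/IN blood/NN loss/NN ./."
pairs = parse_tagged(sample)
print(pairs)
```

Splitting on the last slash rather than the first is a design choice for tokens such as "1/2/CD", where only the final field is the tag.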

22 Significance of Parts of Speech A word’s POS tells us a lot about the word and its neighbors: –Can help with pronunciation: object (NOUN) vs object (VERB) –Limits the range of following words for Speech Recognition a personal pronoun is most likely followed by a verb –Can help with stemming A certain category takes certain affixes –Can help select nouns from a document for IR –Parsers can build trees directly on the POS tags instead of maintaining a lexicon –Can help with partial parsing in Information Extraction

23 Choosing a tagset The choice of tagset greatly affects the difficulty of the problem Need to strike a balance between –Getting better information about context (introduce more distinctions) –Make it possible for classifiers to do their job (need to minimize distinctions)

24 Issues in Tagging Ambiguous Tags –hit can be a verb or a noun –Use some context to better choose the correct tag Unseen words –Assign a FOREIGN label to unknowns –Use some morphological information: guess NNP for a word with an initial capital Closed-class words in English HELP tagging –Prepositions, auxiliaries, etc. –New ones do not tend to appear
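
One way to picture the morphological guess for unseen words is a small cascade of shape and suffix checks. This is only a sketch: the capital-letter rule comes from the slide, while the suffix rules and the function name guess_tag are assumptions added for illustration, not part of any particular tagger.

```python
def guess_tag(word):
    """Guess a Penn Treebank tag for an unseen word from its surface form."""
    if word[:1].isupper():
        return "NNP"  # initial capital: proper noun (the slide's heuristic)
    # The suffix rules below are illustrative assumptions.
    if word.endswith("ing"):
        return "VBG"  # gerund / present participle
    if word.endswith("ed"):
        return "VBD"  # past tense
    if word.endswith("ly"):
        return "RB"   # adverb
    if word.endswith("s"):
        return "NNS"  # plural noun
    return "NN"       # fall back to singular noun


for w in ["Mitarai", "blorking", "quickly", "wugs", "wug"]:
    print(w, guess_tag(w))
```

A capitalized word at the start of a sentence would be tagged NNP by this sketch, which is one reason real taggers combine such cues with context.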

25 How hard is POS tagging? Number of tags Number of word types In the Brown corpus, % of word types ambiguous - 40% of word TOKENS

26 Tagging methods Rule-based POS tagging Statistical taggers –more on this in a few weeks Brill’s (transformation-based) tagger

27 Rule-based Tagging Two stage architecture –Dictionary: an entry = word + list of possible tags –Hand-coded disambiguation rules ENGTWOL tagger –56,000 entries in lexicon –1,100 constraints to rule out incorrect POS-es
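
The two-stage architecture can be sketched in a few lines: a dictionary lookup proposes every possible tag, then hand-written constraints remove readings that do not fit the context. The lexicon, the two constraints, and all names below are toy assumptions for illustration; they are not ENGTWOL's actual entries or rules.

```python
# Stage 1: toy lexicon, word -> set of possible tags (illustrative only).
LEXICON = {
    "the": {"DT"},
    "to": {"TO"},
    "race": {"NN", "VB"},
    "expected": {"VBN"},
}


def no_verb_after_determiner(prev_tags, candidates):
    """Rule out VB when the previous word is unambiguously a determiner."""
    return candidates - {"VB"} if prev_tags == {"DT"} else candidates


def no_noun_after_to(prev_tags, candidates):
    """Rule out NN when the previous word is unambiguously TO (toy rule)."""
    return candidates - {"NN"} if prev_tags == {"TO"} else candidates


# Stage 2: hand-coded disambiguation constraints, applied in order.
CONSTRAINTS = [no_verb_after_determiner, no_noun_after_to]


def rule_based_tag(words):
    result, prev = [], set()
    for w in words:
        # Assumption: unknown words default to NN.
        candidates = set(LEXICON.get(w.lower(), {"NN"}))
        for constraint in CONSTRAINTS:
            narrowed = constraint(prev, candidates)
            if narrowed:  # never eliminate every reading
                candidates = narrowed
        result.append((w, sorted(candidates)))
        prev = candidates
    return result


print(rule_based_tag(["the", "race"]))
print(rule_based_tag(["to", "race"]))
```

Constraints only remove readings, so a word can stay ambiguous when no constraint applies.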

28 Evaluating a Tagger Tagged tokens – the original data Untag the data Tag the data with your own tagger Compare the original and new tags –Iterate over the two lists checking for identity and counting –Accuracy = fraction correct

29 Evaluating the Tagger This gets 2 wrong out of 16, or 12.5% error. Can also say an accuracy of 87.5%.
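
The comparison step amounts to walking the two tag lists in parallel and counting matches. Here is a minimal sketch; the tag sequences are made up so that 2 of 16 tags differ, matching the arithmetic on the slide rather than any real tagger output.

```python
def accuracy(gold_tags, predicted_tags):
    """Fraction of positions where the predicted tag equals the gold tag."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("tag sequences must have the same length")
    if not gold_tags:
        raise ValueError("cannot score an empty sequence")
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return correct / len(gold_tags)


# Hypothetical data: 16 tokens, 2 of them tagged wrongly.
gold = ["NN"] * 16
predicted = ["NN"] * 14 + ["VB"] * 2

acc = accuracy(gold, predicted)
print(f"accuracy = {acc:.1%}, error = {1 - acc:.1%}")
# prints: accuracy = 87.5%, error = 12.5%
```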

30 Training vs. Testing A fundamental idea in computational linguistics Start with a collection labeled with the right answers –Supervised learning –Usually the labels are assigned by hand “Train” or “teach” the algorithm on a subset of the labeled text Test the algorithm on a different set of data –Why? Need to generalize so the algorithm works on examples that you haven’t seen yet Thus testing only makes sense on examples you didn’t train on
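
A held-out split can be sketched as follows. The 10% test fraction, the fixed seed, and the function name are assumptions chosen for illustration; the only point carried over from the slide is that the test items never appear in the training set.

```python
import random


def train_test_split(sentences, test_fraction=0.1, seed=0):
    """Shuffle a copy of the labeled sentences and hold out a test portion."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in for 100 hand-labeled sentences.
corpus = list(range(100))
train_set, test_set = train_test_split(corpus)
print(len(train_set), len(test_set))
```

Splitting by sentence rather than by token keeps each sentence's context intact on one side of the split.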

31 Statistical Baseline Tagger Find the most frequent tag in a corpus Assign to each word the most frequent tag

32 Lexicalized Baseline Tagger For each word detect its possible tags and their frequency Assign the most common tag to each word –90-92% accuracy –Compare to state of the art taggers: 96-97% accuracy –Humans agree on 96-97% of the Penn Treebank’s Brown corpus
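
The lexicalized baseline can be sketched with two counters: one per word, and one over the whole corpus for words never seen in training. The tiny training set and the function names are invented for illustration; falling back to the corpus-wide most frequent tag for unknown words is an assumption, one of several reasonable choices.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_sentences):
    """Return (word -> most frequent tag, most frequent tag overall)."""
    word_tag_counts = defaultdict(Counter)
    tag_counts = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            word_tag_counts[word][tag] += 1
            tag_counts[tag] += 1
    most_likely = {w: c.most_common(1)[0][0] for w, c in word_tag_counts.items()}
    default_tag = tag_counts.most_common(1)[0][0]
    return most_likely, default_tag


def tag_baseline(words, most_likely, default_tag):
    """Tag each word with its most frequent training tag, else the default."""
    return [(w, most_likely.get(w, default_tag)) for w in words]


# Hypothetical training data.
training = [
    [("the", "DT"), ("race", "NN"), ("ended", "VBD")],
    [("a", "DT"), ("race", "NN"), ("began", "VBD")],
    [("they", "PRP"), ("race", "VBP"), ("cars", "NNS")],
    [("the", "DT"), ("dog", "NN"), ("saw", "VBD"), ("snow", "NN")],
]

most_likely, default_tag = train_baseline(training)
print(tag_baseline(["the", "race", "wug"], most_likely, default_tag))
```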

33 Tagging with Most Likely Tag Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NN People/NNS continue/VBP to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN Problem: assign most likely tag to race Solution: we choose the tag that has the greater probability –P(VB|race) –P(NN|race) Estimates from the Brown corpus: –P(NN|race) = .98 –P(VB|race) = .02
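
Such probabilities are usually estimated as relative frequencies: the number of times the word carries a tag, divided by the number of times the word occurs. A short sketch follows; the counts 98 and 2 are hypothetical, chosen only to reproduce the .98 and .02 quoted above, and are not the actual Brown corpus counts.

```python
def tag_probabilities(tag_counts):
    """Relative-frequency estimate of P(tag | word) from per-tag counts."""
    total = sum(tag_counts.values())
    return {tag: n / total for tag, n in tag_counts.items()}


# Hypothetical counts for the word "race".
race_counts = {"NN": 98, "VB": 2}

probs = tag_probabilities(race_counts)
best = max(probs, key=probs.get)
print(probs, "->", best)
```

Because the estimate ignores context, it picks NN for both example sentences, including "to race tomorrow", where VB is correct.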

34 Statistical Tagger The Linguistic Complaint –Where is the linguistic knowledge of a tagger? –Just a massive table of numbers –Aren’t there any linguistic insights that could emerge from the data? –Could thus use handcrafted sets of rules to tag input sentences, for example, if a word follows a determiner tag it as a noun

35 The Brill tagger An example of TRANSFORMATION-BASED LEARNING Very popular (freely available, works fairly well) A SUPERVISED method: requires a tagged corpus Basic idea: do a quick job first (using the lexicalized baseline tagger), then revise it using contextual rules

36 Brill Tagging: In more detail Training: supervised method –Detect most frequent tag for each word –Detect set of transformations that could improve the lexicalized baseline tagger Testing/Tagging new words in sentences –For each new word apply the lexicalized baseline step –Apply the set of learned transformations in order –Use morphological info for unknown words

37 An example Examples: –It is expected to race tomorrow. –The race for outer space. Tagging algorithm: 1. Tag all uses of “race” as NN (most likely tag in the Brown corpus) It is expected to race/NN tomorrow the race/NN for outer space 2. Use a transformation rule to replace the tag NN with VB for all uses of “race” preceded by the tag TO: It is expected to race/VB tomorrow the race/NN for outer space
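
Step 2 of the example can be sketched as a function that rewrites one tag whenever the previous token carries a given tag. The sketch is tag-based (it changes any NN after TO, not only the word "race"), the function name is made up, and the tags for the surrounding words are filled in by hand for illustration.

```python
def apply_rule(tagged, from_tag, to_tag, prev_tag):
    """Change from_tag to to_tag wherever the preceding token has prev_tag.

    The context is read from the input sequence, so changes made earlier
    in the same pass do not trigger further changes.
    """
    out = list(tagged)
    for i in range(1, len(tagged)):
        word, tag = tagged[i]
        if tag == from_tag and tagged[i - 1][1] == prev_tag:
            out[i] = (word, to_tag)
    return out


# Step 1 output: "race" tagged NN everywhere (other tags supplied by hand).
s1 = [("It", "PRP"), ("is", "VBZ"), ("expected", "VBN"),
      ("to", "TO"), ("race", "NN"), ("tomorrow", "NN")]
s2 = [("the", "DT"), ("race", "NN"), ("for", "IN"),
      ("outer", "JJ"), ("space", "NN")]

# Step 2: NN -> VB when the previous tag is TO.
print(apply_rule(s1, "NN", "VB", "TO"))
print(apply_rule(s2, "NN", "VB", "TO"))
```

Only the first sentence changes: "race" follows TO there, while in the second it follows DT.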

38 Transformation-based learning in the Brill tagger 1. Tag the corpus with the most likely tag for each word 2. Choose a TRANSFORMATION that deterministically replaces an existing tag with a new one such that the resulting tagged corpus has the lowest error rate 3. Apply that transformation to the training corpus 4. Repeat 5. Return a tagger that a. first tags using most frequent tag for each word b. then applies the learned transformations in order
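
The learning loop above can be sketched as a greedy search. This is a heavily simplified illustration, not Brill's implementation: it uses a single rule template ("change tag A to B when the previous tag is Z"), treats the corpus as one flat tag sequence that ignores sentence boundaries, tries every candidate rule by brute force, and stops when no rule lowers the error count. All names and the toy data are assumptions.

```python
from itertools import product


def apply_rule(tags, rule):
    """Apply (from_tag, to_tag, prev_tag) to a tag sequence."""
    from_tag, to_tag, prev_tag = rule
    new = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            new[i] = to_tag
    return new


def count_errors(tags, gold):
    return sum(t != g for t, g in zip(tags, gold))


def learn_rules(initial_tags, gold_tags, max_rules=10):
    """Greedily pick the rule that most reduces errors, apply it, repeat."""
    tagset = sorted(set(initial_tags) | set(gold_tags))
    current = list(initial_tags)
    learned = []
    for _ in range(max_rules):
        best_rule, best_err = None, count_errors(current, gold_tags)
        for rule in product(tagset, repeat=3):
            if rule[0] == rule[1]:
                continue  # a rule must actually change the tag
            err = count_errors(apply_rule(current, rule), gold_tags)
            if err < best_err:
                best_rule, best_err = rule, err
        if best_rule is None:
            break  # no transformation improves the corpus any further
        current = apply_rule(current, best_rule)
        learned.append(best_rule)
    return learned, current


# Toy corpus: "It is expected to race tomorrow the race for outer space".
gold = ["PRP", "VBZ", "VBN", "TO", "VB", "NN", "DT", "NN", "IN", "JJ", "NN"]
# Step 1 output: most-likely-tag baseline, which tags every "race" as NN.
initial = ["PRP", "VBZ", "VBN", "TO", "NN", "NN", "DT", "NN", "IN", "JJ", "NN"]

rules, final = learn_rules(initial, gold)
print(rules)
```

The returned list is ordered, which matters at tagging time: later rules were chosen on a corpus already modified by earlier ones.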

39 Examples of learned transformations

40 Templates

41 First 20 Transformation Rules From: Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part of Speech Tagging Eric Brill. Computational Linguistics. December, 1995.

42 Transformation Rules for Tagging Unknown Words From: Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part of Speech Tagging Eric Brill. Computational Linguistics. December, 1995.

43 Summary Parts of Speech Part of Speech Tagging

44 Next Time Language Modeling