School of Computing, Faculty of Engineering: PoS-Tagging theory and terminology. COMP3310 Natural Language Processing.


PoS-Tagging theory and terminology. COMP3310 Natural Language Processing. Eric Atwell, Language Research Group (with thanks to Katja Markert, Marti Hearst, and other contributors)

Reminder: PoS-tagging programs. Models behind some example PoS-tagging methods in NLTK: hand-coded taggers, statistical taggers, the Brill (transformation-based) tagger. NB you don't have to use NLTK, but it is useful for illustration.

Training and Testing of Machine Learning Algorithms. Algorithms that learn from data see a set of examples and try to generalize from them. Training set: the examples trained on. Test set: also called held-out data or unseen data; use this for evaluating your algorithm, and it must be separate from the training set, otherwise you cheated! Gold standard evaluation set: a test set that a community has agreed on and uses as a common benchmark. DO NOT USE IN TRAINING OR TESTING.
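
The split-and-evaluate discipline above can be sketched in a few lines of plain Python (a minimal illustration; the function names and corpus layout are invented for this example, not NLTK's API):

```python
def split_corpus(sentences, train_fraction=0.9):
    """Split a list of tagged sentences into disjoint train and test sets."""
    cut = int(len(sentences) * train_fraction)
    return sentences[:cut], sentences[cut:]

def accuracy(predicted_tags, gold_tags):
    """Token-level tagging accuracy against the gold-standard tags."""
    return sum(p == g for p, g in zip(predicted_tags, gold_tags)) / len(gold_tags)
```

Train only on the first part; report accuracy only on the held-out part.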

PoS word classes in English. Word classes are also called syntactic categories, grammatical categories, or Parts of Speech. Closed-class type: classes with fixed and few members, function words, e.g. prepositions. Open-class type: large classes with many members and many new additions, content words, e.g. nouns. 8 major word classes: nouns, verbs, adjectives, adverbs, prepositions, determiners, conjunctions, pronouns. These hold in English, and also in most (?all) natural languages.

What properties define a noun? Semantic properties: refers to people, places and things. Distributional properties: ability to occur next to determiners, possessives, adjectives (in specific positions). Morphological properties: most occur in singular and plural. These are properties of a word TYPE, e.g. man is a noun (usually). Sometimes a given TOKEN may not meet all these criteria: … The men are happy … the man is happy … They man the lifeboat (?)

Subcategories of Noun: Proper Noun v Common Noun; singular v plural; Count v Mass (often not covered in PoS-tagsets). Some tagsets may have other subcategories, e.g. NNP = common noun with word-initial capital (e.g. Englishman). A PoS-tagset often encodes morphological categories like person, number, gender, tense, case...

Verb: action or process. VB present/infinitive: teach, eat. VBZ 3rd-person-singular present (s-form): teaches, eats. VBG progressive (ing-form): teaching, eating. VBD/VBN past: taught, ate/eaten. Intransitive: he died; transitive: she killed him, … (transitivity is usually not marked in PoS-tags). Auxiliaries: modal verbs e.g. can, must, may. Have, be, do can be auxiliary or main verbs, e.g. I have a present v I have given you a present.

Adjective: quality or property (of a thing: noun phrase). English is simple: JJ big, JJR comparative bigger, JJT superlative biggest. More features in other languages, e.g. agreement (number, gender) with the noun; before a noun v after be.

Adverb: quality or property of a verb or adjective (or other functions…). A hodge-podge (!). General adverbs often end in -ly: slowly, happily (but NOT early). Place adverbs: home, downhill. Time adverbs: now, tomorrow. Degree adverbs: very, extremely, somewhat.

Function words. Preposition e.g. in, of, on, for, over, with, (to). Determiner e.g. this, that; articles the, a. Conjunction e.g. and, or, but, because, that. Pronoun e.g. personal pronouns I, we (1st person), you (2nd person), he, she, it, they (3rd person). Possessive pronouns: my, your, our, their. WH-pronouns: what, who, whoever. Others: negatives (not), interjections (oh), existential there, …

Parts of multi-word expressions. Particle: like a preposition, but part of a phrasal verb: I looked up her address v I looked up her skirt; I looked her address up v *I looked her skirt up. A big problem for PoS-tagging: common, and ambiguous. Other multi-word idioms: ditto tags.

Bigram Markov Model tagger: Naive Method. 1. Get all possible tag sequences for the sentence. 2. Compute the probability of each tag sequence given the sentence, using word-tag and tag-bigram probabilities. 3. Take the maximum probability. Problem: this method has exponential complexity! Solution: the Viterbi algorithm (not discussed in this module).
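
The naive method can be written down directly; the exponential blow-up is visible in the loop over all |tagset|^n sequences. This is a toy sketch with invented probability tables; a real tagger would use the Viterbi algorithm instead:

```python
from itertools import product

def naive_best_sequence(words, tagset, p_word_given_tag, p_tag_given_prev,
                        start="PERIOD"):
    """Score every possible tag sequence (|tagset| ** len(words) of them!)
    and return the most probable one together with its probability."""
    best_seq, best_p = None, 0.0
    for seq in product(tagset, repeat=len(words)):
        p, prev = 1.0, start
        for word, tag in zip(words, seq):
            # bigram tag probability times word-emission probability
            p *= p_tag_given_prev.get((tag, prev), 0.0) * \
                 p_word_given_tag.get((word, tag), 0.0)
            prev = tag
        if p > best_p:
            best_seq, best_p = seq, p
    return best_seq, best_p
```

With 30 tags and a 20-word sentence this already means 30^20 sequences, which is why the enumeration is hopeless in practice.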

N-gram tagger Uses the preceding N-1 predicted tags Also uses the unigram estimate for the current word
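
An N-gram tagger as described above (here N=2) can be sketched as a lookup on (previous predicted tag, word), falling back to the word's unigram most-frequent-tag estimate. The function names and the "<s>" start marker are invented for this illustration, not NLTK's NgramTagger API:

```python
from collections import Counter, defaultdict

def train_bigram_tagger(tagged_sentences):
    """Collect (previous tag, word) -> tag counts, plus per-word unigram counts."""
    context = defaultdict(Counter)   # (prev_tag, word) -> Counter of tags
    unigram = defaultdict(Counter)   # word -> Counter of tags
    for sentence in tagged_sentences:
        prev = "<s>"                 # sentence-start marker
        for word, tag in sentence:
            context[(prev, word)][tag] += 1
            unigram[word][tag] += 1
            prev = tag
    return context, unigram

def tag_sentence(words, context, unigram):
    """Tag left to right: bigram context first, unigram estimate as fallback."""
    out, prev = [], "<s>"
    for word in words:
        counts = context.get((prev, word)) or unigram.get(word)
        tag = counts.most_common(1)[0][0] if counts else None
        out.append((word, tag))
        prev = tag
    return out
```

Note that a word never seen in training gets the tag None, which is exactly the situation the backoff combination at the end of this lecture addresses.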

Example:
p(AT NN BEZ IN AT NN | The bear is on the move) = p(the|AT) p(AT|PERIOD) × p(bear|NN) p(NN|AT) × ... × p(move|NN) p(NN|AT)
p(AT NN BEZ IN AT VB | The bear is on the move) = p(the|AT) p(AT|PERIOD) × p(bear|NN) p(NN|AT) × ... × p(move|VB) p(VB|AT)

Bigram tagger: problems. Unknown words in new input. Parameter estimation: needs a tagged training text; what if this is a different genre/dialect/language-type from the new input? Tokenization of training text and new input: contractions (isn't), multi-word tokens (New York). Crude assumptions: only very short-distance dependencies, and tags are not conditioned on previous words. Unintuitive.

Transformation-based tagging. Markov model tagging captures only a small range of regularities. TB tagging was first used by Brill (1995). It encodes more complex interdependencies between words and tags by learning intuitive rules from a training corpus; it exploits linguistic knowledge, and rules can be tuned manually.

Transformation Templates. Templates specify general, admissible transformations. Change Tag1 to Tag2 if: the preceding (following) word is tagged Tag3; the word two before (after) is tagged Tag3; one of the two preceding (following) words is tagged Tag3; one of the three preceding (following) words is tagged Tag3; the preceding word is tagged Tag3 and the following word is tagged Tag4; the preceding (following) word is tagged Tag3 and the word two before (after) is tagged Tag4.

Machine Learning Algorithm. Learns rules from a tagged training corpus by specialising the templates. 1. Assume you do not know the precise tagging sequence in your training corpus. 2. Tag each word in the training corpus with its most frequent tag, e.g. move => VB. 3. Consider all possible transformations and apply the one that improves the tagging most (greedy search), e.g. change VB to NN if the preceding word is tagged AT. 4. Retag the whole corpus applying that rule. 5. Go back to 3 and repeat until no significant improvement is reached. 6. Output all the rules you learnt, in order!
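
Steps 2-6 can be sketched for a single template, "change FROM to TO if the previous tag is PREV". The function names, and the restriction to this one template, are simplifications of Brill's full learner:

```python
def apply_rule(tags, rule):
    """Apply one rule (frm, to, prev): retag frm as to where the previous tag is prev."""
    frm, to, prev = rule
    out = list(tags)
    for i in range(1, len(out)):
        if out[i] == frm and out[i - 1] == prev:
            out[i] = to
    return out

def learn_rules(initial_tags, gold_tags, tagset, max_rules=10):
    """Greedy Brill-style learning over one template only."""
    def errors(tags):
        return sum(a != b for a, b in zip(tags, gold_tags))

    current, rules = list(initial_tags), []
    while len(rules) < max_rules:
        best_rule, best_err = None, errors(current)
        for frm in tagset:                     # try every instantiation
            for to in tagset:
                if to == frm:
                    continue
                for prev in tagset:
                    e = errors(apply_rule(current, (frm, to, prev)))
                    if e < best_err:
                        best_rule, best_err = (frm, to, prev), e
        if best_rule is None:                  # no rule improves the tagging: stop
            break
        rules.append(best_rule)                # output rules in learnt order
        current = apply_rule(current, best_rule)
    return rules, current
```

Run on the "The bear is on the move to race there" example below, this learner recovers exactly the two rules the slides derive by hand.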

Example: 1st cycle. First approximation: initialise with the most frequent tag (lexical information): The/AT bear/VB is/BEZ on/IN the/AT move/VB to/TO race/NN there/RN

Change VB to NN if the previous tag is AT. Try all possible transformations, choose the most useful one, and apply it: The/AT bear/NN is/BEZ on/IN the/AT move/NN to/TO race/NN there/RN

Change NN to VB if the previous tag is TO. Try all possible transformations, choose the most useful one, and apply it: The/AT bear/NN is/BEZ on/IN the/AT move/NN to/TO race/VB there/RN

Final set of learnt rules. Brill rules corresponding to syntagmatic patterns: 1. Change VB to NN if the previous tag is AT. 2. Change NN to VB if the previous tag is TO. These can now be applied to an untagged corpus! The method uses pre-encoded linguistic knowledge; explicitly uses wider context, including following context; can be expanded to word-driven templates; can be expanded to morphology-driven templates (for unknown words); and the learnt rules are intuitive and easy to understand.

Combining taggers. Taggers can be combined via backoff: if the first tagger finds no tag (None), try another tagger. This really only makes sense with N-gram taggers: if the trigram tagger finds no tag, back off to the bigram tagger; if the bigram tagger fails, back off to the unigram tagger. Better: combine tagger results by a voting system, e.g. Combinatory Hybrid Elementary Analysis of Text (combines the results of morphological analysers / taggers).
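
The backoff idea can be sketched as a chain of tagger functions, each returning a tag or None (a generic illustration with invented lookup tables; NLTK expresses the same idea via the backoff= argument to its tagger constructors):

```python
def backoff_tag(word, prev_tag, taggers):
    """Try each tagger in order; return the first non-None tag."""
    for tagger in taggers:
        tag = tagger(word, prev_tag)
        if tag is not None:
            return tag
    return None

# Hypothetical chain: a tiny bigram lookup, then a unigram lookup,
# then a default tagger that tags every remaining word NN.
bigram = {("TO", "race"): "VB"}
unigram = {"the": "AT", "race": "NN"}
taggers = [
    lambda w, p: bigram.get((p, w)),   # most specific model first
    lambda w, p: unigram.get(w),       # back off to unigram estimate
    lambda w, p: "NN",                 # default tagger: never returns None
]
```

The most specific model is consulted first; the default tagger at the end guarantees every word gets some tag.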