CPSC 503 Computational Linguistics
Lecture 7
Giuseppe Carenini

Knowledge-Formalisms Map
[Diagram relating formalisms to levels of linguistic analysis: morphology, syntax, semantics, pragmatics, discourse and dialogue]
- State machines (and prob. versions): Finite State Automata, Finite State Transducers, Markov Models
- Rule systems (and prob. versions), e.g., (Prob.) Context-Free Grammars
- Logical formalisms (First-Order Logics)
- AI planners
Markov Models: Markov Chains -> n-grams; Hidden Markov Models (HMM); Maximum Entropy Markov Models (MEMM)

Today 30/9
Hidden Markov Models:
- definition
- the three key problems (only one in detail)
Part-of-speech tagging:
- What it is
- Why we need it
- How to do it

HMMs (and MEMM) intro
They are probabilistic sequence classifiers / sequence labelers: they assign a class/label to each unit in a sequence. Used extensively in NLP:
- Part-of-speech tagging, e.g., Brainpower_NNP ,_, not_RB physical_JJ plant_NN ,_, is_VBZ now_RB a_DT firm_NN 's_POS chief_JJ asset_NN ._.
- Partial parsing: [NP The HD box] that [NP you] [VP ordered] [PP from] [NP Shaw] [VP never arrived].
- Named entity recognition: [John Smith PERSON] left [IBM Corp. ORG] last summer.

Hidden Markov Model (State Emission)
[Figure: a four-state HMM with states s1-s4, output symbols a, b, i, start probabilities .6 and .4, and transition/emission probabilities on the arcs and states]

Hidden Markov Model: Formal Specification as a five-tuple
- Set of states
- Output alphabet
- Initial state probabilities
- State transition probabilities
- Symbol emission probabilities
[Figure: the four-state example HMM again]
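To make the five-tuple concrete, here is a small model in Python/NumPy shaped like the figure (states s1-s4, output symbols a, b, i, start probabilities .6 and .4). The individual transition and emission numbers are illustrative assumptions, since the figure's values are not recoverable from the transcript.

```python
import numpy as np

states = ["s1", "s2", "s3", "s4"]   # set of states
alphabet = ["a", "b", "i"]          # output alphabet

# Initial state probabilities; .6/.4 follow the figure, their assignment to s1/s2 is assumed
pi = np.array([0.6, 0.4, 0.0, 0.0])

# State transition probabilities: A[i, j] = P(next state j | current state i); rows sum to 1
A = np.array([
    [0.0, 0.1, 0.4, 0.5],
    [0.0, 0.0, 0.3, 0.7],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 1.0],
])

# Symbol emission probabilities: B[i, k] = P(emit alphabet[k] | state i); rows sum to 1
B = np.array([
    [0.5, 0.3, 0.2],
    [0.7, 0.1, 0.2],
    [0.2, 0.6, 0.2],
    [0.1, 0.4, 0.5],
])
```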

Three fundamental questions for HMMs (Manning/Schütze, 2000: 325)
- Decoding: finding the probability of an observation sequence (brute force, or the Forward/Backward algorithms)
- Finding the most likely state sequence (Viterbi algorithm)
- Training: find the model parameters which best explain the observations

Computing the probability of an observation sequence O = o_1 … o_T
Sum over X = all sequences of T states:
P(O | μ) = Σ_X P(O, X | μ) = Σ_X P(O | X, μ) · P(X | μ)
e.g., P(b, i | sample HMM)
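A brute-force sketch of this sum, reusing the toy model from the sketch above: it enumerates every sequence of T states, which is exactly why the cost blows up exponentially with the length of the observation.

```python
from itertools import product

def brute_force_likelihood(obs, alphabet, pi, A, B):
    """P(O | model): sum over all state sequences of length T (O(N^T) terms)."""
    obs_idx = [alphabet.index(o) for o in obs]
    total = 0.0
    for seq in product(range(len(pi)), repeat=len(obs)):
        p = pi[seq[0]] * B[seq[0], obs_idx[0]]
        for t in range(1, len(obs)):
            p *= A[seq[t - 1], seq[t]] * B[seq[t], obs_idx[t]]
        total += p
    return total

# e.g., P(b, i | sample HMM), using the toy pi, A, B defined earlier
print(brute_force_likelihood(["b", "i"], alphabet, pi, A, B))
```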

Decoding Example (Manning/Schütze, 2000: 327)
[Worked example on the sample HMM: partial products for candidate state sequences generating the observation, e.g. (s1, s1) = 0, (s1, s4) = 1 * .5 * .6 * .7, (s2, s4) = 0, (s1, s2) = 1 * .1 * .6 * .3, …]
Complexity: exponential in the length of the observation sequence (on the order of N^T state sequences).

The forward procedure
1. Initialization: α_1(i) = π_i · b_i(o_1)
2. Induction: α_(t+1)(j) = [ Σ_i α_t(i) · a_ij ] · b_j(o_(t+1))
3. Total: P(O | μ) = Σ_i α_T(i)
Complexity: O(N^2 · T)
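The same probability computed with the forward procedure, again on the toy model defined earlier; the commented lines correspond to the three numbered steps above, and the cost drops from exponential to O(N^2 · T).

```python
import numpy as np

def forward_likelihood(obs, alphabet, pi, A, B):
    """P(O | model) via the forward procedure."""
    obs_idx = [alphabet.index(o) for o in obs]
    alpha = pi * B[:, obs_idx[0]]                  # 1. Initialization
    for t in range(1, len(obs)):
        alpha = (alpha @ A) * B[:, obs_idx[t]]     # 2. Induction
    return alpha.sum()                             # 3. Total

# matches the brute-force value, at a fraction of the cost
print(forward_likelihood(["b", "i"], alphabet, pi, A, B))
```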

Three fundamental questions for HMMs
- Decoding: finding the probability of an observation sequence (brute force, or the Forward or Backward algorithm)
- Finding the most likely state sequence (Viterbi algorithm)
- Training: find the model parameters which best explain the observations
If interested in the details of the Backward algorithm and the next two questions, read Sections 6.4-6.5.

Today 30/9
Hidden Markov Models:
- definition
- the three key problems (only one in detail)
Part-of-speech tagging:
- What it is, why we need it…
- Word classes (tags): distribution, tagsets
- How to do it: rule-based, stochastic

Parts of Speech Tagging: What
Input: Brainpower, not physical plant, is now a firm's chief asset.
Output: Brainpower_NNP ,_, not_RB physical_JJ plant_NN ,_, is_VBZ now_RB a_DT firm_NN 's_POS chief_JJ asset_NN ._.
Tag meanings: NNP (proper noun, sing.), RB (adverb), JJ (adjective), NN (noun, sing. or mass), VBZ (verb, 3sg pres), DT (determiner), POS (possessive ending), . (sentence-final punctuation)

Parts of Speech Tagging: Why?
Part-of-speech (word class, morphological class, syntactic category) gives a significant amount of information about the word and its neighbors.
Useful in the following NLP tasks:
- As a basis for (partial) parsing
- Information retrieval
- Word-sense disambiguation
- Speech synthesis

Parts of Speech
Eight basic categories:
- noun, verb, pronoun, preposition, adjective, adverb, article, conjunction
These categories are based on:
- morphological properties (the affixes they take)
- distributional properties (what other words can occur nearby), e.g., green: "It is so…", "both…", "The… is"
- not semantics!

Parts of Speech
Two kinds of category:
- Closed class (generally function words): prepositions, articles, conjunctions, pronouns, determiners, auxiliaries, numerals. Very short, frequent and important.
- Open class: nouns (proper/common; mass/count), verbs, adjectives, adverbs. Objects, actions, events, properties.
If you run across an unknown word…?? (It is most likely a member of an open class.)

PoS Distribution
Parts of speech follow a typical distribution in language: most word types have only 1 PoS (~35k in the figure), a much smaller set (~4k) have 2 PoS (unfortunately these ambiguous words are very frequent), and a few have more than 2 PoS.
…but luckily the different tags associated with a word are not equally likely.

Sets of Parts of Speech: Tagsets
Most commonly used:
- 45-tag Penn Treebank
- 61-tag C5
- 146-tag C7
The choice of tagset is based on the application (do you care about distinguishing between "to" as a preposition and "to" as an infinitive marker?).
Accurate tagging can be done even with large tagsets.

PoS Tagging
[Diagram: input text -> Tagger -> output; the tagger uses a dictionary mapping each word_i to its set of tags from the tagset]
Input text: Brainpower, not physical plant, is now a firm's chief asset. …
Output: Brainpower_NNP ,_, not_RB physical_JJ plant_NN ,_, is_VBZ now_RB a_DT firm_NN 's_POS chief_JJ asset_NN ._. …

Tagger Types
- Rule-based ('95)
- Stochastic:
  - HMM tagger (~ >= '92)
  - Transformation-based tagger (Brill) (~ >= '95)
  - MEMM (Maximum Entropy Markov Models) (~ >= '97) (if interested, sec. )

Rule-Based (ENGTWOL '95)
1. A lexicon transducer returns for each word all possible morphological parses
2. A set of ~3,000 constraints is applied to rule out inappropriate PoS

Step 1, sample I/O for "Pavlov had shown that salivation…":
Pavlov   N SG PROPER
had      HAVE V PAST SVO / HAVE PCP2 SVO
shown    SHOW PCP2 SVOO …
that     ADV / PRON DEM SG / CS / …
…

Sample constraint (adverbial "that" rule):
Given input: "that"
If (+1 A/ADV/QUANT) (+2 SENT-LIM) (NOT -1 SVOC/A)
Then eliminate non-ADV tags
Else eliminate ADV
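A rough sketch of how a constraint like the adverbial "that" rule could be applied in code. This is not the actual ENGTWOL implementation: the per-token tag sets, the SENT-LIM test and the feature matching below are simplified stand-ins invented for illustration.

```python
def adverbial_that_rule(tokens, tags, i):
    """Apply the adverbial-'that' constraint to token i.

    tags[i] is the set of candidate tags for token i; the rule returns the
    (possibly reduced) tag set for that token.
    """
    def has(j, *features):
        # does the token at position j carry any of the given features?
        return 0 <= j < len(tags) and any(f in tags[j] for f in features)

    def sentence_limit(j):
        # crude stand-in for SENT-LIM: past the end, or a boundary token
        return j >= len(tokens) or tokens[j] in {".", "!", "?", ";"}

    if tokens[i].lower() != "that":
        return tags[i]
    if has(i + 1, "A", "ADV", "QUANT") and sentence_limit(i + 2) and not has(i - 1, "SVOC/A"):
        return {t for t in tags[i] if t == "ADV"}      # eliminate non-ADV tags
    return {t for t in tags[i] if t != "ADV"}          # else eliminate ADV
```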

HMM Stochastic Tagging
- Tags correspond to HMM states
- Words correspond to the HMM alphabet (output symbols)
- Tagging: given a sequence of words (observations), find the most likely sequence of tags (states). But this is…!
We need state transition and symbol emission probabilities:
1) From a hand-tagged corpus
2) No tagged corpus: parameter estimation (Forward/Backward, aka Baum-Welch)
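A minimal sketch of option 1 followed by Viterbi decoding: transition and emission probabilities are estimated by counting over a hand-tagged corpus, then the most likely tag (state) sequence is recovered for new text. The toy corpus, the probability floor for unseen events and the absence of smoothing are simplifications; a real tagger needs proper smoothing and unknown-word handling.

```python
from collections import Counter

def train_hmm(tagged_sents):
    """MLE transition, emission and start probabilities from a hand-tagged
    corpus (smoothing and unknown-word handling are omitted for brevity)."""
    trans, emit, start, tag_count = Counter(), Counter(), Counter(), Counter()
    for sent in tagged_sents:
        prev = None
        for word, tag in sent:
            tag_count[tag] += 1
            emit[(tag, word.lower())] += 1
            if prev is None:
                start[tag] += 1
            else:
                trans[(prev, tag)] += 1
            prev = tag
    trans_p = {k: v / tag_count[k[0]] for k, v in trans.items()}    # P(tag_t | tag_t-1)
    emit_p = {k: v / tag_count[k[0]] for k, v in emit.items()}      # P(word | tag)
    start_p = {t: c / len(tagged_sents) for t, c in start.items()}  # P(tag | sentence start)
    return set(tag_count), trans_p, emit_p, start_p

def viterbi_tag(words, tags, trans_p, emit_p, start_p, floor=1e-8):
    """Most likely tag (state) sequence for the observed words."""
    # V[t][tag] = (probability of the best path ending in `tag`, previous tag)
    V = [{t: (start_p.get(t, floor) * emit_p.get((t, words[0].lower()), floor), None)
          for t in tags}]
    for w in words[1:]:
        V.append({t: max((V[-1][p][0] * trans_p.get((p, t), floor) *
                          emit_p.get((t, w.lower()), floor), p) for p in tags)
                  for t in tags})
    best = max(tags, key=lambda t: V[-1][t][0])
    path = [best]
    for i in range(len(words) - 1, 0, -1):   # follow the backpointers
        path.append(V[i][path[-1]][1])
    return list(reversed(path))

# Tiny made-up corpus, purely for illustration
corpus = [[("the", "DT"), ("plant", "NN"), ("is", "VBZ"), ("here", "RB")],
          [("they", "PRP"), ("plant", "VB"), ("trees", "NNS")]]
tags, trans_p, emit_p, start_p = train_hmm(corpus)
print(viterbi_tag(["the", "plant"], tags, trans_p, emit_p, start_p))  # -> ['DT', 'NN']
```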

Evaluating Taggers
- Accuracy: percent correct (most current taggers reach 96-97%). Test on unseen data!
- Human ceiling: agreement rate of humans on the classification (96-97%)
- Unigram baseline: assign each token the class it occurred in most frequently in the training set (e.g., race -> NN) (91%)
What is causing the errors? Build a confusion matrix…
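A sketch of the unigram baseline and its accuracy on held-out data; the back-off tag for words never seen in training (NN here) is an assumption for illustration, not something the slide specifies.

```python
from collections import Counter, defaultdict

def unigram_baseline_accuracy(train_tagged, test_tagged, default_tag="NN"):
    """Tag each token with its most frequent training-set tag; report accuracy."""
    counts = defaultdict(Counter)
    for sent in train_tagged:
        for word, tag in sent:
            counts[word.lower()][tag] += 1
    most_frequent = {w: c.most_common(1)[0][0] for w, c in counts.items()}

    correct = total = 0
    for sent in test_tagged:
        for word, gold in sent:
            correct += (most_frequent.get(word.lower(), default_tag) == gold)
            total += 1
    return correct / total
```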

Confusion matrix
Look at a confusion matrix: which gold tags get confused with which predicted tags?
Precision? Recall?
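A small sketch of how per-tag precision and recall fall out of a confusion matrix; the tag set and counts below are made up for illustration.

```python
import numpy as np

def precision_recall(confusion, tags):
    """confusion[i, j] = tokens with gold tag i that the tagger labeled j."""
    results = {}
    for k, tag in enumerate(tags):
        tp = confusion[k, k]
        predicted = confusion[:, k].sum()   # everything labeled `tag` by the tagger
        gold = confusion[k, :].sum()        # everything that really is `tag`
        results[tag] = (tp / predicted if predicted else 0.0,   # precision
                        tp / gold if gold else 0.0)             # recall
    return results

conf_matrix = np.array([[50, 4, 1],    # gold NN
                        [6, 30, 2],    # gold NNP
                        [0, 3, 20]])   # gold JJ
print(precision_recall(conf_matrix, ["NN", "NNP", "JJ"]))
```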

Error Analysis (textbook)
Look at a confusion matrix and see what errors are causing problems:
- Noun (NN) vs. proper noun (NNP) vs. adjective (JJ)
- Preterite (VBD) vs. participle (VBN) vs. adjective (JJ)

Knowledge-Formalisms Map (next three lectures)
[Diagram relating formalisms to levels of linguistic analysis: morphology, syntax, semantics, pragmatics, discourse and dialogue]
- State machines (and prob. versions): Finite State Automata, Finite State Transducers, Markov Models
- Rule systems (and prob. versions), e.g., (Prob.) Context-Free Grammars
- Logical formalisms (First-Order Logics)
- AI planners

Next Time
Read Chapter 12 (Syntax and Context-Free Grammars)