Presentation transcript:

CPSC 503 Computational Linguistics, Lecture 6, Giuseppe Carenini (Winter 2010)

Today 28/9
- Language model evaluation
- Markov models
- POS tagging

Model Evaluation: Goal
On a given corpus, you may want to compare:
- 2-grams with 3-grams
- two different smoothing techniques (given the same n-grams)

Model Evaluation: Key Ideas
A: split the corpus into a training set and a testing set
B: train the models Q1 and Q2 on the training set (counting frequencies, smoothing)
C: apply the models to the testing set and compare the results

Entropy
Def1. A measure of uncertainty.
Def2. A measure of the information that we need to resolve an uncertain situation.
Let $p(x) = P(X = x)$, where $x \in X$. Then
$H(p) = H(X) = -\sum_{x \in X} p(x) \log_2 p(x)$
It is normally measured in bits.
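A minimal sketch of this definition in Python; the coin distributions are just illustrative:

```python
import math

def entropy(p):
    # H(p) = -sum_x p(x) log2 p(x), in bits; p is a list of probabilities
    return -sum(px * math.log2(px) for px in p if px > 0)

print(entropy([0.5, 0.5]))  # a fair coin: 1 bit of uncertainty
print(entropy([0.9, 0.1]))  # a biased coin: ~0.47 bits
```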

Model Evaluation
How different is our approximation $q$ from the actual distribution $p$? Relative entropy (KL divergence):
$D(p \| q) = \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)}$

Entropy Rate and the Entropy of a Language
The entropy rate is the per-symbol entropy of a sequence; the entropy of a language is its entropy rate in the limit of infinitely long sequences.
Assumptions: the language is ergodic and stationary. Then, by the Shannon-McMillan-Breiman theorem, the entropy can be computed by taking the average log probability of a looooong sample:
$H(L) = \lim_{n \to \infty} -\frac{1}{n} \log_2 p(w_1 \dots w_n)$
Is natural language really stationary and ergodic? (Strictly no, since probabilities can depend on arbitrarily distant context, but the assumption is a useful approximation.)

Cross-Entropy
Between a probability distribution $P$ and another distribution $Q$ (a model for $P$):
$H(P, Q) = -\sum_{x \in X} p(x) \log_2 q(x)$
Between two models $Q_1$ and $Q_2$, the more accurate is the one that assigns higher probability to the data => lower cross-entropy => lower perplexity.
Applied to language: the cross-entropy of a model on a long enough sample is approximated by $-\frac{1}{n} \log_2 q(w_1 \dots w_n)$.
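The two formulas above are a few lines of Python; the toy distributions p and q below are made up for illustration:

```python
import math

def entropy(p):
    return -sum(px * math.log2(px) for px in p.values() if px > 0)

def cross_entropy(p, q):
    # H(P, Q) = -sum_x p(x) log2 q(x); p, q map events to probabilities
    return -sum(px * math.log2(q[x]) for x, px in p.items() if px > 0)

def kl_divergence(p, q):
    # D(p || q) = H(P, Q) - H(P); always >= 0, and 0 iff p == q
    return cross_entropy(p, q) - entropy(p)

p = {"a": 0.5, "b": 0.5}    # "actual" distribution
q = {"a": 0.9, "b": 0.1}    # our model of p
print(cross_entropy(p, q))  # ~1.74 bits, worse than entropy(p) = 1 bit
print(kl_divergence(p, q))  # ~0.74 bits: the cost of using q instead of p
```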

Model Evaluation: In Practice
A: split the corpus into a training set and a testing set
B: train the models Q1 and Q2 on the training set (counting frequencies, smoothing)
C: apply the models to the testing set and compare their cross-perplexities
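As a concrete toy illustration of step C, here is one way to compute the perplexity of an add-one-smoothed bigram model on held-out text; the corpus strings and the smoothing choice are assumptions, not the course's setup:

```python
import math
from collections import Counter

def train_bigram(tokens, vocab_size):
    # add-one smoothed bigram model: P(w2 | w1) = (C(w1 w2) + 1) / (C(w1) + |V|)
    bigrams = Counter(zip(tokens, tokens[1:]))
    unigrams = Counter(tokens)
    return lambda w1, w2: (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab_size)

def perplexity(prob, test_tokens):
    # 2 ** cross-entropy: average negative log2 probability per bigram
    n = len(test_tokens) - 1
    log_sum = sum(math.log2(prob(w1, w2))
                  for w1, w2 in zip(test_tokens, test_tokens[1:]))
    return 2 ** (-log_sum / n)

train = "the cat sat on the mat the cat ate the mat".split()
test = "the cat sat on the mat".split()
model = train_bigram(train, vocab_size=len(set(train)))
print(perplexity(model, test))  # lower is better
```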

k-Fold Cross Validation and the t-Test
- Randomly divide the corpus into k subsets of equal size
- Use each subset in turn for testing (and all the others for training); in practice you repeat what we saw on the previous slide k times
- Now for each model you have k perplexities
- Compare the models' average perplexities with a t-test
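A sketch of the final comparison, assuming SciPy is available; the per-fold perplexities are invented numbers:

```python
from scipy import stats

# hypothetical perplexities of models Q1 and Q2 on the same k = 5 folds
ppl_q1 = [141.2, 138.7, 145.0, 139.9, 142.3]
ppl_q2 = [133.5, 131.0, 137.8, 132.4, 134.9]

# paired t-test, since both models are evaluated on matching folds
t_stat, p_value = stats.ttest_rel(ppl_q1, ppl_q2)
print(t_stat, p_value)  # a small p-value suggests a real perplexity difference
```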

Today 28/9
- Language model evaluation
- Markov models
- POS tagging

Example of a Markov Chain
[State diagram: six states t, e, h, a, p, i connected by probabilistic transitions; Start enters the chain with probability .6 and .4.]

Markov Chain: Formal Description
- Probabilities of the initial states: a vector $\pi$, here $\pi(t) = .6$, $\pi(i) = .4$
- Stochastic transition matrix $A$ over the states $\{t, i, p, a, h, e\}$, with $a_{ij} = P(X_{t+1} = s_j \mid X_t = s_i)$ and each row summing to 1
[Diagram: the t/e/h/a/p/i chain with its transition matrix A.]

Markov Assumptions
Let $X = (X_1, \dots, X_T)$ be a sequence of random variables taking values in some finite set $S = \{s_1, \dots, s_n\}$, the state space. The Markov properties are:
(a) Limited Horizon: for all $t$, $P(X_{t+1} \mid X_1, \dots, X_t) = P(X_{t+1} \mid X_t)$
(b) Time Invariance: for all $t$, $P(X_{t+1} \mid X_t) = P(X_2 \mid X_1)$, i.e., the dependency does not change over time.

Markov Chain: Probability of a Sequence of States
$P(X_1, \dots, X_T) = \pi(X_1) \prod_{t=1}^{T-1} P(X_{t+1} \mid X_t)$
Example (on the t/e/h/a/p/i chain): multiply the start probability of the first state by the transition probabilities along the path.
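A small sketch of this product: the start probabilities below are the slide's, but the transition values in A are invented, since the diagram's numbers did not survive the transcript:

```python
pi = {"t": 0.6, "i": 0.4}                  # initial probabilities from the slide
A = {("t", "a"): 0.7, ("t", "i"): 0.3,     # hypothetical transition probabilities
     ("i", "p"): 1.0, ("p", "a"): 1.0,
     ("a", "h"): 0.5, ("a", "t"): 0.5,
     ("h", "e"): 1.0, ("e", "t"): 1.0}

def sequence_prob(seq):
    # P(X_1..X_T) = pi(X_1) * prod_t P(X_{t+1} | X_t)
    p = pi.get(seq[0], 0.0)
    for s, s_next in zip(seq, seq[1:]):
        p *= A.get((s, s_next), 0.0)
    return p

print(sequence_prob(["t", "i", "p"]))  # 0.6 * 0.3 * 1.0 = 0.18
```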

Knowledge-Formalisms Map
[Diagram mapping linguistic levels to formalisms: state machines and probabilistic versions (finite state automata, finite state transducers, Markov models) for morphology and syntax; rule systems and probabilistic versions (e.g., (probabilistic) context-free grammars) for syntax; logical formalisms (first-order logics) for semantics; AI planners for pragmatics, discourse and dialogue.]
Markov models:
- Markov chains -> n-grams
- Hidden Markov Models (HMM)
- MaxEntropy Markov Models (MEMM)

HMMs (and MEMMs): Intro
They are probabilistic sequence classifiers / sequence labelers: they assign a class/label to each unit in a sequence. Used extensively in NLP:
- Part-of-speech tagging, e.g., Brainpower_NN ,_, not_RB physical_JJ plant_NN ,_, is_VBZ now_RB a_DT firm_NN 's_POS chief_JJ asset_NN ._.
- Partial parsing: [NP The HD box] that [NP you] [VP ordered] [PP from] [NP Shaw] [VP never arrived].
- Named entity recognition: [John Smith PERSON] left [IBM Corp. ORG] last summer.

Hidden Markov Model (State Emission)
[State diagram: four hidden states s1-s4, Start probabilities .6 and .4, each state emitting symbols from {a, b, i}.]

Hidden Markov Model: Formal Specification
A five-tuple:
- Set of states $S = \{s_1, \dots, s_N\}$
- Output alphabet $K = \{k_1, \dots, k_M\}$
- Initial state probabilities $\Pi = \{\pi_i\}$
- State transition probabilities $A = \{a_{ij}\}$
- Symbol emission probabilities $B = \{b_j(k)\}$
[Same four-state a/b/i diagram as the previous slide.]
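The five-tuple is easy to write down concretely; the numbers below are hypothetical stand-ins for the diagram's lost values:

```python
import numpy as np

states = ["s1", "s2", "s3", "s4"]        # set of states S
alphabet = ["a", "b", "i"]               # output alphabet K
Pi = np.array([0.6, 0.4, 0.0, 0.0])      # initial state probabilities
A = np.array([[0.0, 0.5, 0.5, 0.0],      # state transition probabilities a_ij
              [0.0, 0.0, 0.3, 0.7],
              [0.0, 0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0, 0.0]])
B = np.array([[0.8, 0.1, 0.1],           # emission probabilities b_j(k)
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4],
              [0.2, 0.2, 0.6]])

# each row of A and B, and Pi itself, must be a probability distribution
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(Pi.sum(), 1.0)
```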

Three Fundamental Questions for HMMs
- Likelihood: finding the probability of an observation sequence (brute force, or the Forward/Backward algorithms; Manning/Schütze, 2000: 325)
- Decoding: finding the most likely state sequence (Viterbi algorithm)
- Training: finding the model parameters which best explain the observations

Computing the Probability of an Observation Sequence
$O = o_1 \dots o_T$; let $X$ range over all sequences of $T$ states. Then
$P(O \mid \mu) = \sum_X P(O, X \mid \mu) = \sum_X P(O \mid X, \mu)\, P(X \mid \mu)$
e.g., $P(b, i \mid \text{sample HMM})$
[Same four-state a/b/i diagram.]

Decoding Example (Manning/Schütze, 2000: 327)
Brute force: enumerate every state sequence and sum the probability of generating the observations along each, e.g.:
- $s_1, s_1$: 0 ?
- $s_1, s_4$: 1 * .5 * .6 * .7
- $s_2, s_4$: 0 ?
- $s_1, s_2$: 1 * .1 * .6 * .3
- ...
Complexity: exponential, since there are $N^T$ state sequences, each costing $O(T)$ multiplications.
[Same four-state a/b/i diagram.]

The Forward Procedure
1. Initialization: $\alpha_1(j) = \pi_j\, b_j(o_1)$
2. Induction: $\alpha_{t+1}(j) = \Big[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\Big] b_j(o_{t+1})$
3. Total: $P(O \mid \mu) = \sum_{i=1}^{N} \alpha_T(i)$
Complexity: $O(N^2 T)$
[Same four-state a/b/i diagram.]
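The three steps translate directly into code; a minimal NumPy sketch with a made-up two-state model:

```python
import numpy as np

def forward(Pi, A, B, obs):
    # alpha[t, j] = P(o_1..o_t, X_t = s_j); returns P(O | mu). O(N^2 T) time.
    T, N = len(obs), len(Pi)
    alpha = np.zeros((T, N))
    alpha[0] = Pi * B[:, obs[0]]                      # 1. initialization
    for t in range(1, T):                             # 2. induction
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha[-1].sum()                            # 3. total

# two states, alphabet {a = 0, b = 1}; all numbers are illustrative
Pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(forward(Pi, A, B, [1, 0]))  # P(observing "b a")
```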

Three Fundamental Questions for HMMs
- Likelihood: finding the probability of an observation sequence (brute force, or the Forward or Backward algorithm)
- Decoding: finding the most likely state sequence (Viterbi algorithm)
- Training: finding the model parameters which best explain the observations
If interested in the details of the Backward algorithm and the last two questions, read Sections 6.4-6.5.

Maybe Today 28/9 …
Hidden Markov Models:
- definition
- the three key problems (only one in detail)
Part-of-speech tagging:
- What it is, why we need it…
- Word classes (tags): distribution, tagsets
- How to do it: rule-based, stochastic

Parts of Speech Tagging: What
Input: Brainpower, not physical plant, is now a firm's chief asset.
Output: Brainpower_NN ,_, not_RB physical_JJ plant_NN ,_, is_VBZ now_RB a_DT firm_NN 's_POS chief_JJ asset_NN ._.
Tag meanings: NNP (proper noun, singular), RB (adverb), JJ (adjective), NN (noun, singular or mass), VBZ (verb, 3sg present), DT (determiner), POS (possessive ending), . (sentence-final punctuation)

Parts of Speech Tagging: Why?
Part of speech (word class, morphological class, syntactic category) gives a significant amount of information about the word and its neighbors. Useful in the following NLP tasks:
- As a basis for (partial) parsing
- Information retrieval
- Word-sense disambiguation
- Speech synthesis
- … and many others, as features for machine learning

Parts of Speech
Eight basic categories: noun, verb, pronoun, preposition, adjective, adverb, article, conjunction.
These categories are based on:
- morphological properties (the affixes they take)
- distributional properties (what other words can occur nearby); e.g., green fits the frames "It is so …", "both …", "The … is"
Not semantics!

Parts of Speech
Two kinds of category:
- Closed class (generally function words): prepositions, articles, conjunctions, pronouns, determiners, auxiliaries, numerals. Very short, frequent and important.
- Open class: nouns (proper/common; mass/count), verbs, adjectives, adverbs. Denote objects, actions, events, properties.
If you run across an unknown word… ?? (It is almost certainly open class.)

PoS Distribution
Parts of speech follow a typical skewed distribution in language:
- words with 1 PoS: ~35k
- words with 2 PoS: ~4k (unfortunately these ambiguous words are very frequent)
- words with >2 PoS: fewer still
… but luckily the different tags associated with a word are not equally likely.

Sets of Parts of Speech: Tagsets
Most commonly used:
- 45-tag Penn Treebank
- 61-tag C5
- 146-tag C7
The choice of tagset depends on the application (do you care about distinguishing between "to" as a preposition and "to" as an infinitive marker?). Accurate tagging can be done even with large tagsets.

PoS Tagging
Dictionary: word_i -> set of tags from the tagset
Input text: Brainpower, not physical plant, is now a firm's chief asset. …
=> Tagger =>
Output: Brainpower_NN ,_, not_RB physical_JJ plant_NN ,_, is_VBZ now_RB a_DT firm_NN 's_POS chief_JJ asset_NN ._. …

Tagger Types
- Rule-based (~'95)
- Stochastic:
  - HMM tagger (~ >= '92)
  - Transformation-based tagger (Brill) (~ >= '95)
  - MEMM (Maximum Entropy Markov Models) (~ >= '97); if interested, see the corresponding textbook section

Rule-Based (ENGTWOL '95)
1. A lexicon transducer returns, for each word, all possible morphological parses.
2. A set of ~3,000 constraints is applied to rule out inappropriate PoS.
Step 1 sample I/O for "Pavlov had shown that salivation…":
- Pavlov: N SG PROPER
- had: HAVE V PAST SVO, HAVE PCP2 SVO
- shown: SHOW PCP2 SVOO
- that: ADV, PRON DEM SG, CS
- …
Sample constraint (the adverbial "that" rule), applied as shown in the sketch below:
Given input "that":
if (+1 A/ADV/QUANT) (+2 SENT-LIM) (NOT -1 SVOC/A)
then eliminate non-ADV tags
else eliminate ADV
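A toy rendering of that constraint in Python. Real ENGTWOL constraints operate over rich morphological parses; here each token just carries a set of coarse tags, and the context tests are crude stand-ins:

```python
def adverbial_that(candidates, i):
    # candidates[i] is the set of possible tags for token i; token i is "that"
    nxt = candidates[i + 1] if i + 1 < len(candidates) else set()
    nxt2 = candidates[i + 2] if i + 2 < len(candidates) else {"SENT-LIM"}
    prev = candidates[i - 1] if i > 0 else set()
    if (nxt & {"A", "ADV", "QUANT"}        # +1: adjective/adverb/quantifier
            and "SENT-LIM" in nxt2         # +2: sentence boundary
            and not prev & {"SVOC/A"}):    # NOT -1: no verb like "consider"
        return {"ADV"}                     # eliminate non-ADV tags
    return candidates[i] - {"ADV"}         # else eliminate ADV

# "it isn't that odd ." -> "that" keeps only its adverbial reading
sent = [{"PRON"}, {"V"}, {"ADV", "PRON", "DET", "CS"}, {"A"}, {"SENT-LIM"}]
print(adverbial_that(sent, 2))  # {'ADV'}
```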

HMM Stochastic Tagging
- Tags correspond to the HMM states
- Words correspond to the HMM alphabet symbols
Tagging: given a sequence of words (observations), find the most likely sequence of tags (states). But this is exactly the decoding problem, solved by Viterbi!
We need the state transition and symbol emission probabilities:
1) estimated from a hand-tagged corpus, or
2) with no tagged corpus: parameter estimation (forward/backward, aka Baum-Welch)
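Since tagging is the decoding problem, the Viterbi algorithm applies directly; a sketch with a made-up two-tag model (DT, NN) and a two-word vocabulary:

```python
import numpy as np

def viterbi(Pi, A, B, obs):
    # most likely state (tag) sequence for the observations; O(N^2 T) time
    T, N = len(obs), len(Pi)
    delta = np.zeros((T, N))              # best path score ending in state j
    back = np.zeros((T, N), dtype=int)    # backpointers
    delta[0] = Pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A      # scores[i, j]: from i to j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]            # best final state
    for t in range(T - 1, 0, -1):               # follow backpointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# tags {DT = 0, NN = 1}, words {"the" = 0, "dog" = 1}; probabilities invented
Pi = np.array([0.9, 0.1])
A = np.array([[0.1, 0.9],
              [0.5, 0.5]])
B = np.array([[0.9, 0.1],
              [0.1, 0.9]])
print(viterbi(Pi, A, B, [0, 1]))  # [0, 1], i.e. the_DT dog_NN
```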

Evaluating Taggers
- Accuracy: percent correct (most current taggers 96-97%). Test on unseen data!
- Human ceiling: the agreement rate of humans on the classification (96-97%)
- Unigram baseline: assign each token to the class it occurred in most frequently in the training set (e.g., race -> NN): ~91%
What is causing the errors? Build a confusion matrix…

Confusion Matrix
[Confusion matrix figure: rows are gold tags, columns are predicted tags.]
Precision? Recall? (For each tag: precision = correct predictions of the tag / all predictions of the tag; recall = correct predictions of the tag / all gold occurrences of the tag.)
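A quick way to build one, assuming scikit-learn is available; the gold and predicted tags below are a made-up toy sample:

```python
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

gold = ["NN", "JJ", "NN", "VBD", "JJ", "NN", "VBN"]   # hypothetical gold tags
pred = ["NN", "NN", "NN", "VBN", "JJ", "JJ", "VBN"]   # hypothetical tagger output

labels = ["NN", "JJ", "VBD", "VBN"]
print(confusion_matrix(gold, pred, labels=labels))    # rows: gold, cols: predicted
p, r, f, _ = precision_recall_fscore_support(gold, pred, labels=labels,
                                             zero_division=0)
for tag, prec, rec in zip(labels, p, r):
    print(f"{tag}: precision={prec:.2f} recall={rec:.2f}")
```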

Error Analysis (textbook)
Look at the confusion matrix and see which errors are causing problems:
- Noun (NN) vs. proper noun (NNP) vs. adjective (JJ)
- Preterite (VBD) vs. participle (VBN) vs. adjective (JJ)

Knowledge-Formalisms Map (next three lectures)
[Same diagram as before: state machines and probabilistic versions (finite state automata, finite state transducers, Markov models) for morphology and syntax; rule systems and probabilistic versions (e.g., (probabilistic) context-free grammars) for syntax; logical formalisms (first-order logics) for semantics; AI planners for pragmatics, discourse and dialogue.]

Next Time
Read Chapter 12 (Syntax and Context-Free Grammars)