N-Gram Based Approaches


N-Gram Based Approaches
n-gram: a sequential list of n words, often used in information retrieval and language modeling to encode the likelihood that the phrase will appear in the future. N-gram based approaches create probabilistic models of n-grams from a given corpus of text and tag new utterances using these models.
"I don't know what to say"
1-gram (unigram): I, don't, know, what, to, say
2-gram (bigram): I don't, don't know, know what, what to, to say
3-gram (trigram): I don't know, don't know what, know what to, …
… n-gram
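A minimal sketch of the n-gram extraction above, assuming simple whitespace tokenization (the function name and tokenizer are illustrative, not part of the original system):

```python
def ngrams(tokens, n):
    """Return all n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

utterance = "I don't know what to say".split()

print(ngrams(utterance, 1))  # unigrams: ('I',), ("don't",), ('know',), ...
print(ngrams(utterance, 2))  # bigrams:  ('I', "don't"), ("don't", 'know'), ...
print(ngrams(utterance, 3))  # trigrams: ('I', "don't", 'know'), ...
```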

N-Gram Motivation
Advantages:
- Encode not just keywords, but also word ordering, automatically
- Models are not biased by hand-coded lists of words, but are completely dependent on real data
- Learning features of each affect type is relatively fast and easy
- Human intuition is often incorrect and misses subtleties in language
Disadvantages:
- Long-range dependencies are not captured
- Dependent on having a corpus of data to train from
- Sparse data for low-frequency affect tags adversely affects the quality of the n-gram model

N-Gram Approaches
Naïve Approach: standard n-grams only
Weighted Approach: weight the longer n-grams higher in the stochastic model
Lengths Approach: include a length-of-utterance factor, capturing the differences in utterance length between affect tags
Weights with Lengths Approach: combine Weighted with Lengths
Analytical Approach: include word repetition as a factor in the models, isolating acknowledgement utterances from other types

Naïve Approach
P(tag_i | utt) = max_{j,k} P(tag_i | ngram_{j,k})
Find the highest-probability n-gram in a given utterance utt for each possible tag tag_i and choose the tag with the highest probability.
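A minimal sketch of the Naïve scoring rule, assuming a pre-trained model stored as a dictionary that maps each n-gram to its per-tag probabilities (the data structure and helper names are assumptions, not the original implementation):

```python
def all_ngrams(tokens, max_n=5):
    """All n-grams of length 1..max_n in the utterance."""
    return [tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

def naive_tag(tokens, model, tags):
    """Choose the tag whose single best-scoring n-gram in the utterance is highest.

    model: dict mapping n-gram tuple -> {tag: P(tag | ngram)}  (assumed shape)
    """
    best_tag, best_score = None, 0.0
    for tag in tags:
        score = max((model.get(g, {}).get(tag, 0.0) for g in all_ngrams(tokens)),
                    default=0.0)
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag, best_score

# Toy model reproducing two entries from the example slide
model = {("don't",): {"GEN": 0.665},
         ("don't", "want", "to", "be"): {"DTL": 0.833}}
print(naive_tag("I don't want to be chained to a wall .".split(),
                model, ["GEN", "DTL"]))  # -> ('DTL', 0.833)
```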

Naïve Approach Example
"I don't want to be chained to a wall."

N-gram | Tag | Top N-gram | Probability
1 | GEN | don't | 0.665
2 | GEN | to a | 0.692
3 | GEN | <s> I don't | 0.524
4 | DTL | don't want to be | 0.833
5 | DTL | I don't want to be | 1.00

Weighted Approach
P(tag_i | utt) = Σ_{k=1..m} (max_j P(tag_i | ngram_{j,k})) * weight_k
weight_k = hand-coded weight for each n-gram length, k = {1, 2, 3, 4, 5}
weight_k = {0.4, 0.4, 0.5, 0.8, 0.8}
For each n-gram length k, take the highest-probability n-gram in utt for tag tag_i, multiply it by a weight based on the size of the n-gram (5-grams contain more information than 1-grams), and sum over the lengths.
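A sketch of the Weighted Approach under the same assumed model structure as above; the weights are the hand-coded values from the slide:

```python
WEIGHTS = {1: 0.4, 2: 0.4, 3: 0.5, 4: 0.8, 5: 0.8}  # weight_k per n-gram length k

def weighted_score(tokens, model, tag, weights=WEIGHTS):
    """Sum over n-gram lengths k of (best P(tag | ngram of length k)) * weight_k."""
    total = 0.0
    for k, w in weights.items():
        grams = [tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]
        best = max((model.get(g, {}).get(tag, 0.0) for g in grams), default=0.0)
        total += best * w
    return total

def weighted_tag(tokens, model, tags):
    """Choose the tag with the highest weighted sum."""
    return max(tags, key=lambda t: weighted_score(tokens, model, t))
```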

Weighted Approach Example
"I don't want to be chained to a wall."

N-gram | GEN Top N-gram | GEN Prob. | DTL Top N-gram | DTL Prob.
1 | don't | 0.665 | want | 0.452
2 | to a | 0.692 | want to | 0.443
3 | <s> I don't | 0.524 | I don't want | 0.592
4 | I don't want to | 0.27 | don't want to be | 0.833
5 | <s> I don't want to | 0.25 | I don't want to be | 1.00

GEN sum (w/weights): 1.255
DTL sum (w/weights): 2.086

Lengths Approach
P(tag_i | utt) = (max_{j,k} P(tag_i | ngram_{j,k})) * lenWeight_{i,m}
lenWeight_{i,m} = probability that a sentence m is tagged with tag_i based on m's length, computed using the average lengths and standard deviations in the training data.
Find the highest-probability n-gram in utt for tag tag_i and multiply it by the probability that a sentence of utt's length is tagged tag_i.
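A sketch of the Lengths Approach. The slide says only that lenWeight comes from the average length and standard deviation per tag in the training data; modelling it as a normal density is an assumption:

```python
import math

def length_weight(length, mean_len, std_len):
    """Gaussian density of the utterance length under a tag's length statistics
    (one plausible reading of the per-tag mean and standard deviation)."""
    var = std_len ** 2
    return math.exp(-((length - mean_len) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def lengths_score(tokens, model, tag, length_stats):
    """Best n-gram probability for the tag, scaled by the tag's length weight."""
    grams = [tuple(tokens[i:i + n])
             for n in range(1, 6)
             for i in range(len(tokens) - n + 1)]
    best = max((model.get(g, {}).get(tag, 0.0) for g in grams), default=0.0)
    mean_len, std_len = length_stats[tag]  # per-tag (mean, std) from training data
    return best * length_weight(len(tokens), mean_len, std_len)
```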

Lengths Approach Example
"I don't want to be chained to a wall."

N-gram | Tag | Top N-gram | Probability
1 | GEN | don't | 0.665 * 0.0396 = 0.026
2 | GEN | to a | 0.692 * 0.0396 = 0.027
3 | GEN | <s> I don't | 0.524 * 0.0396 = 0.021
4 | DTL | don't want to be | 0.833 * 0.0228 = 0.019
5 | DTL | I don't want to be | 1.000 * 0.0228 = 0.023

Weights with Lengths Approach
"I don't want to be chained to a wall."

Weighted Approach:
GEN sum (w/weights): 1.255
DTL sum (w/weights): 2.086

With Lengths:
GEN sum (w/weights): 1.255 * 0.0396 = 0.0497
DTL sum (w/weights): 2.086 * 0.0228 = 0.0476

Adding the lengths weight changes the tag choice from DTL to GEN.
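Composing the two previous sketches gives the Weights with Lengths score; this reuses the hypothetical weighted_score and length_weight helpers from above and is illustrative only:

```python
def weights_with_lengths_score(tokens, model, tag, length_stats):
    """Weighted n-gram sum scaled by the tag's length weight."""
    mean_len, std_len = length_stats[tag]
    return (weighted_score(tokens, model, tag)
            * length_weight(len(tokens), mean_len, std_len))
```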

Analytical Approach
Many acknowledgement (ACK) utterances were being mistagged as GEN by the previous approaches. Most of the errors came from grounding that involved word repetition:
A: so then you check that your tire is not flat.
B: check the tire
We created a model that takes into account word repetition in adjacent utterances of a dialogue. We also include a length probability to capture the Lengths Approach. Only unigrams are used to avoid sparseness in the training data.

Analytical Approach
P(w_1 | T) * P(w_2 | T) * … * P(w_n | T) * P(R_{w_1} | O_{w_1}, L, L_p, T) * … * P(R_{w_n} | O_{w_n}, L, L_p, T) * P(L | T) * P(T)
P(w_i | T): unigram probabilities
P(R_{w_i} | O_{w_i}, L, L_p, T): probability that each word is repeated, given that it occurred in the previous utterance and given the lengths of both utterances
P(L | T) * P(T): length probability times the tag's overall probability
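A sketch of the Analytical Approach score, computed in log space to avoid underflow on long utterances; the lookup structures for each factor are assumptions, not the original code:

```python
import math

def analytical_score(tokens, prev_tokens, tag, unigram_p, repeat_p, length_p, tag_p):
    """Log of the product above, with assumed lookup structures:

    unigram_p[tag][w]                 ~ P(w | T)
    repeat_p(tag, w, in_prev, L, Lp)  ~ P(R_w | O_w, L, Lp, T)
    length_p[tag][L]                  ~ P(L | T)
    tag_p[tag]                        ~ P(T)
    """
    L, Lp = len(tokens), len(prev_tokens)
    prev = set(prev_tokens)
    score = math.log(tag_p[tag]) + math.log(length_p[tag].get(L, 1e-9))
    for w in tokens:
        score += math.log(unigram_p[tag].get(w, 1e-9))                    # unigram factor
        score += math.log(max(repeat_p(tag, w, w in prev, L, Lp), 1e-9))  # repetition factor
    return score
```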

N-Gram Approaches Results
6-Fold Cross Validation on the UR Marriage Corpus
Naive | Weighted | Lengths | Weights with Lengths | Analytical
66.80% | 67.43% | 64.35% | 66.02% | 66.60%

6-Fold Cross Validation on the Switchboard Corpus
Naive | Weighted | Lengths | Weights with Lengths | Analytical
68.41% | 68.77% | 69.01% | 70.08% | 61.40%

CATS
CATS: An Automated Tagging System for affect and other similar information retrieval tasks.
- Written in Java for cross-platform interoperability.
- Implements the Naïve approach with unigrams and bigrams only.
- Builds the stochastic models automatically from a tagged corpus, input by the user through the GUI display.
- Automatically tags new data using the user's models.
- Each tag also receives a confidence score, allowing the user to hand-check the dialogue quickly and with greater confidence.
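The slides do not say how the confidence score is computed; one plausible reading, sketched below, is to normalize the per-tag scores from the Naïve model so they sum to one:

```python
def tag_with_confidence(scores):
    """Pick the best tag and report a normalized confidence (illustrative only)."""
    total = sum(scores.values()) or 1.0
    best = max(scores, key=scores.get)
    return best, scores[best] / total

print(tag_with_confidence({"GEN": 0.665, "DTL": 0.833}))  # -> ('DTL', 0.556...)
```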

The CATS GUI provides a clear workspace for text and tags. Tagging new data and training on old data are done with a mouse click.

Customizable models are available. Create your own list of tags, provide a training corpus, and build a new model.

Tags are marked with confidence scores based on the probabilistic models.