NLP

Introduction to NLP

Evaluating language models
–Extrinsic: use the model in an application and measure task performance
–Intrinsic: cheaper; measure the model on its own (e.g., perplexity)
Correlate the two for validation purposes

Does the model fit the data?
–A good model will give a high probability to a real sentence
Perplexity
–PP(W) = P(w1 w2 ... wN)^(-1/N), where N = number of words
–Average branching factor in predicting the next word
–Lower is better (lower perplexity -> higher probability of the test data)

Example:
–A sentence consisting of N equiprobable words, each with p(wi) = 1/k
–PP = ((1/k)^N)^(-1/N) = k
Perplexity is like a branching factor
Logarithmic version: PP(W) = 2^(-(1/N) * sum_i log2 p(wi))
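A minimal Python sketch of this computation (not from the slides): perplexity from per-word probabilities, using the logarithmic form, checked against the equiprobable case where the result should equal k.

```python
import math

def perplexity(word_probs):
    """Perplexity of a word sequence given its per-word probabilities.

    Uses the logarithmic form 2^(-(1/N) * sum(log2 p)) to avoid
    underflow on long sequences."""
    n = len(word_probs)
    log_prob = sum(math.log2(p) for p in word_probs)
    return 2 ** (-log_prob / n)

# A sentence of N words, each with probability 1/k, has perplexity k.
k, n = 26, 100
print(perplexity([1.0 / k] * n))  # ~26.0
```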

Consider the Shannon game: predict the next word in
–New York governor Andrew Cuomo said...
What is the perplexity of guessing a digit if all digits are equally likely?
–10
How about a letter?
–26
How about guessing A (“operator”) with a probability of 1/4, B (“sales”) with a probability of 1/4, and 10,000 other cases with a total probability of 1/2 (example modified from Joshua Goodman)?
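A quick check of that last case (a sketch, not from the slides), assuming the 10,000 other cases are equally likely:

```python
import math

# Two outcomes with probability 1/4 each; 10,000 others share the remaining 1/2.
probs = [0.25, 0.25] + [0.5 / 10_000] * 10_000

entropy = -sum(p * math.log2(p) for p in probs)  # expected bits per guess
print(f"entropy    = {entropy:.2f} bits")        # about 8.14 bits
print(f"perplexity = {2 ** entropy:.0f}")        # about 283
```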

What if the actual distribution is very different from the expected one?
Example:
–All of the 10,000 other cases are equally likely but P(A) = P(B) = 0
Cross-entropy = log2(perplexity), measured in bits
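A sketch of this mismatch (not from the slides): the cross-entropy of the previous slide's model, measured against an actual distribution in which A and B never occur, comes out larger than the actual distribution's own entropy.

```python
import math

# Model from the previous slide: A and B get 1/4 each, 10,000 others share 1/2.
model = [0.25, 0.25] + [0.5 / 10_000] * 10_000
# Actual distribution: P(A) = P(B) = 0, the 10,000 other cases equally likely.
actual = [0.0, 0.0] + [1.0 / 10_000] * 10_000

# Cross-entropy H(actual, model) = -sum_x actual(x) * log2 model(x)
cross_entropy = -sum(p * math.log2(q) for p, q in zip(actual, model) if p > 0)
true_entropy = -sum(p * math.log2(p) for p in actual if p > 0)

print(f"entropy of actual distribution = {true_entropy:.2f} bits")   # ~13.29
print(f"cross-entropy of the model     = {cross_entropy:.2f} bits")  # ~14.29
print(f"model perplexity on this data  = {2 ** cross_entropy:.0f}")  # ~20000
```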

Wall Street Journal (WSJ) corpus
–38 M words (tokens)
–20 K types
Perplexity
–Evaluated on a separate 1.5 M word sample of WSJ documents
–Unigram: 962
–Bigram: 170
–Trigram: 109

Word error rate (WER) – another evaluation metric
–Number of insertions, deletions, and substitutions
–Normalized by sentence length
–Same as Levenshtein edit distance, at the word level
Example:
–governor Andrew Cuomo met with the mayor
–the governor met the senator
–3 deletions + 1 insertion + 1 substitution = WER of 5
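A minimal sketch of word-level Levenshtein distance via dynamic programming (not from the slides; function and variable names are illustrative):

```python
def word_edit_distance(reference, hypothesis):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn the hypothesis word sequence into the reference
    (word-level Levenshtein distance)."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[-1][-1]

edits = word_edit_distance("governor Andrew Cuomo met with the mayor",
                           "the governor met the senator")
print(edits)  # 5 edits, as in the example above
```

Dividing the edit count by the reference sentence length gives the normalized WER.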

Out of vocabulary words (OOV)
–Split the training set into two parts
–Label all words in part 2 that were not in part 1 as <UNK>
Clustering
–e.g., dates, monetary amounts, organizations, years
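A minimal sketch of the <UNK> relabeling step (not from the slides; names are illustrative):

```python
def mark_unknowns(part1_tokens, part2_tokens):
    """Replace every token in part 2 that never occurs in part 1 with <UNK>,
    so the model learns a probability for unseen words."""
    vocab = set(part1_tokens)
    return [tok if tok in vocab else "<UNK>" for tok in part2_tokens]

part1 = "the governor met the senator".split()
part2 = "the governor met the mayor yesterday".split()
print(mark_unknowns(part1, part2))
# ['the', 'governor', 'met', 'the', '<UNK>', '<UNK>']
```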

This is where n-gram language models fail by definition: the dependency spans more words than the n-gram window.
Missing syntactic information
–The students who participated in the game are tired
–The student who participated in the game is tired
Missing semantic information
–The pizza that I had last night was tasty
–The class that I had last night was interesting
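A small illustration of the syntactic case (a sketch, not from the slides): the two words preceding the verb are identical in both sentences, so a trigram model's conditioning context carries no information about the agreement.

```python
sent1 = "the students who participated in the game are tired".split()
sent2 = "the student who participated in the game is tired".split()

# A trigram model predicts the verb from only the two preceding words.
i1, i2 = sent1.index("are"), sent2.index("is")
print(sent1[i1 - 2:i1])  # ['the', 'game']
print(sent2[i2 - 2:i2])  # ['the', 'game'] -- identical context, so the model
                         # has no basis for preferring "are" over "is"
```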

Syntactic models
–Condition words on other words that appear in a specific syntactic relation with them
Caching models
–Take advantage of the fact that words appear in bursts (a sketch follows below)
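A minimal sketch of the caching idea (not from the slides; class and parameter names are illustrative): interpolate a base model's probability with the word's relative frequency in a window of recent words, so recently seen words become more likely.

```python
from collections import deque

class CachedLM:
    """Interpolates a base language model with a cache of recent words."""

    def __init__(self, base_prob, cache_size=200, lam=0.9):
        self.base_prob = base_prob          # function: word -> base model probability
        self.cache = deque(maxlen=cache_size)
        self.lam = lam                      # weight on the base model

    def prob(self, word):
        cache_prob = self.cache.count(word) / len(self.cache) if self.cache else 0.0
        return self.lam * self.base_prob(word) + (1 - self.lam) * cache_prob

    def observe(self, word):
        self.cache.append(word)

# Toy usage with a uniform base model over a 10,000-word vocabulary.
lm = CachedLM(base_prob=lambda w: 1.0 / 10_000)
for w in "the house near the house".split():
    lm.observe(w)
print(lm.prob("house"))  # boosted well above the 1/10000 base probability
```

The interpolation weight and cache size would normally be tuned on held-out data.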

SRI-LM –
CMU-LM –
Google n-gram corpus – http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html
Google book n-grams –

Sample bigrams beginning with "house" (e.g., from the Google n-gram corpus):
house a, house after, house all, house and, house are, house arrest, house as, house at, house before, house built, house but, house by, house can, house cleaning, house design, house down, house fire, house for, house former, house from, house had, house has, house he, house hotel, house in, house is, house music, house near, house now, house of, house on, house or, house party, house plan, house plans, house price, house prices, house rental, house rules, house share, house so, house that, house the, house to, house training, house value 135820

NLP