Learning Within-Sentence Semantic Coherence
Elena Eneva, Rose Hoberman, Lucian Lita
Carnegie Mellon University


Semantic (in)Coherence

Trigram models leave content words unrelated. Effect on speech recognition:
–Actual utterance: "THE BIRD FLU HAS AFFECTED CHICKENS FOR YEARS BUT ONLY RECENTLY BEGAN MAKING HUMANS SICK"
–Top hypothesis: "THE BIRD FLU HAS AFFECTED SECONDS FOR YEARS BUT ONLY RECENTLY BEGAN MAKING HUMAN SAID"
Our goal: model semantic coherence

A Whole Sentence Exponential Model [Rosenfeld 1997]

P(s) ≝ (1/Z) · P0(s) · exp(Σi λi fi(s))

–P0(s) is an arbitrary initial model (typically an N-gram)
–fi(s) are arbitrary computable properties of s (a.k.a. features)
–Z is a universal normalizing constant
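As a minimal sketch, the unnormalized model score can be computed directly from the definition above (the function and parameter names are illustrative, not from the original):

```python
import math

def whole_sentence_score(p0, lambdas, features, sentence):
    """Unnormalized whole-sentence exponential model score:
    P(s) proportional to P0(s) * exp(sum_i lambda_i * f_i(s)).
    p0: baseline model probability function (e.g. an N-gram LM)
    lambdas: feature weights; features: callables f_i(s)."""
    exponent = sum(lam * f(sentence) for lam, f in zip(lambdas, features))
    return p0(sentence) * math.exp(exponent)

# With all weights at zero the score reduces to the baseline P0(s).
```

Dividing by Z would yield a proper distribution, but Z sums over all sentences and is intractable in practice, so scores like this are typically used for reranking or sampling.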

A Methodology for Feature Induction

Given a corpus T of training sentences:
1. Train the best-possible baseline model, P0(s)
2. Use P0(s) to generate a corpus T0 of "pseudo-sentences"
3. Pose a challenge: find (computable) differences that allow discrimination between T and T0
4. Encode the differences as features fi(s)
5. Train a new model incorporating the fi(s)
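Step 2 of the recipe can be sketched as a toy trigram sampler; the data layout (a dict mapping a two-word history to next-word counts, with "<s>"/"</s>" boundary markers) is an assumption for illustration, not the authors' code:

```python
import random

def sample_pseudo_sentence(trigram_counts, max_len=30):
    """Sample one 'pseudo-sentence' from a trigram model.
    trigram_counts maps a (w1, w2) history to a dict of next-word
    counts; assumes every reachable history is present."""
    w1, w2 = "<s>", "<s>"
    words = []
    while len(words) < max_len:
        nexts = trigram_counts[(w1, w2)]
        total = sum(nexts.values())
        r = random.uniform(0, total)
        for w, c in nexts.items():  # draw proportionally to counts
            r -= c
            if r <= 0:
                break
        if w == "</s>":
            break
        words.append(w)
        w1, w2 = w2, w
    return words
```

A corpus T0 of such samples is locally fluent (every trigram is attested) but has no global semantic thread, which is exactly the discrimination target.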

Discrimination Task

feel - - sacrifice - - sense meant trust truth
kind - free trade agreements living - - ziplock bag university japan's daiwa bank stocks step

Are these content words generated from a trigram model, or drawn from a natural sentence?

Building on Prior Work

–Define "content words" (all but the top 50)
–Goal: model the distribution of content words in a sentence
–Simplify: model pairwise co-occurrences ("content word pairs")
–Collect contingency tables; calculate a measure of association for them

Q Correlation Measure

Derived from the co-occurrence contingency table:

            W1 yes   W1 no
  W2 yes     c11      c21
  W2 no      c12      c22

Q = (c11·c22 − c12·c21) / (c11·c22 + c12·c21); Q values range from −1 to +1
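This is Yule's Q association measure; a minimal implementation over the four cell counts (cell naming follows the table, and the zero-denominator fallback is an illustrative choice):

```python
def yule_q(c11, c12, c21, c22):
    """Yule's Q for a 2x2 co-occurrence contingency table:
    Q = (c11*c22 - c12*c21) / (c11*c22 + c12*c21), in [-1, +1]."""
    num = c11 * c22 - c12 * c21
    den = c11 * c22 + c12 * c21
    return num / den if den else 0.0

# Word pairs that co-occur more often than chance give Q near +1;
# pairs that avoid each other give Q near -1.
```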

Density Estimates

We hypothesized:
–Trigram sentences: word-pair correlation completely determined by distance
–Natural sentences: word-pair correlation independent of distance
Kernel density estimation: estimate the distribution of Q values in each corpus, at varying distances
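A one-dimensional Gaussian kernel density estimator is enough to sketch how the Q-value distributions could be estimated per corpus and distance (the bandwidth value here is an illustrative assumption, not the paper's setting):

```python
import math

def kde(samples, bandwidth=0.1):
    """One-dimensional Gaussian kernel density estimator.
    Returns a function giving the estimated density at a point:
    the average of Gaussian bumps centered on the samples."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                          for s in samples)
    return density
```

One such estimate would be fit per (corpus, distance) pair, over the Q values observed at that distance.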

Q Distributions

[Figure: Q-value density curves for trigram-generated vs. broadcast news sentences, shown at distance 1 and distance 3]

Likelihood Ratio Feature

she is a country singer searching for fame and fortune in nashville

Q(country, nashville) = 0.76, distance = 8
Pr(Q = 0.76 | d = 8, BNews) = 0.32
Pr(Q = 0.76 | d = 8, Trigram) = 0.11
Likelihood ratio = 0.32 / 0.11 ≈ 2.9
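The feature can be read off two families of density estimates, one per corpus, indexed by word-pair distance. The dict-of-densities layout below is an assumption for illustration; the slide's worked example is reproduced with its quoted density values as stand-ins:

```python
def likelihood_ratio_feature(q, distance, bnews_density, trigram_density):
    """Ratio of the Q-value density under the natural (broadcast news)
    distribution to the density under the trigram distribution, both
    conditioned on word-pair distance."""
    return bnews_density[distance](q) / trigram_density[distance](q)

# Stand-ins for the estimated densities at distance 8:
bnews = {8: lambda q: 0.32}
trigram = {8: lambda q: 0.11}
ratio = likelihood_ratio_feature(0.76, 8, bnews, trigram)  # ~2.9
```

Values well above 1 say the observed correlation at that distance looks like natural text; values below 1 look trigram-generated.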

Simpler Features

Q-value based:
–Mean, median, min, max of Q values for content word pairs in the sentence (Cai et al. 2000)
–Percentage of Q values above a threshold
–High/low correlations across large/small distances
Other:
–Word and phrase repetition
–Percentage of stop words
–Longest sequence of consecutive stop/content words
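The Q-based summary features reduce to a few statistics over a sentence's Q values; a sketch (the threshold value and feature names are illustrative assumptions):

```python
import statistics

def q_summary_features(q_values, threshold=0.5):
    """Summary statistics over the Q values of a sentence's
    content-word pairs. q_values must be non-empty."""
    return {
        "q_mean": statistics.mean(q_values),
        "q_median": statistics.median(q_values),
        "q_min": min(q_values),
        "q_max": max(q_values),
        "frac_above": sum(q > threshold for q in q_values) / len(q_values),
    }
```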

Datasets

LM and contingency tables (Q values) derived from 103 million words of Broadcast News (BN)
From the remainder of the BN corpus and sentences sampled from the trigram LM:
–Q value distributions estimated from ~100,000 sentences
–Decision tree trained and tested on ~60,000 sentences
Disregarded sentences with < 7 words, e.g.:
–"Mike Stevens says it's not real"
–"We've been hearing about it"

Experiments

Learners:
–C5.0 decision tree
–Boosting decision stumps with AdaBoost.MH
Methodology:
–5-fold cross-validation on ~60,000 sentences
–Boosting for 300 rounds

Results

Feature Set                                   Classification Accuracy
Q mean, median, min, max (previous work)          ± 0.36
Likelihood Ratio                              77.76 ± 0.49
All but Likelihood Ratio                      80.37 ± 0.42
All Features                                  80.37 ± 0.46
Likelihood Ratio + non-Q

Shannon-Style Experiment

50 sentences:
–½ real and ½ trigram-generated
–Stop words replaced by dashes
30 participants:
–Average accuracy of 73.77% ± 6
–Best individual accuracy: 84%
Our classifier:
–Accuracy of 78.9% ± 0.42

Summary

–Introduced a set of statistical features which capture aspects of semantic coherence
–Trained a decision tree to classify with accuracy of 80%
–Next step: incorporate features into an exponential LM

Future Work

Combat data sparsity:
–Confidence intervals
–Different correlation statistic
–Stemming or clustering the vocabulary
Evaluate derived features:
–Incorporate into an exponential language model
–Evaluate the model on a practical application

Agreement among Participants

Expected Perplexity Reduction

Semantic coherence feature fires on:
–78% of broadcast news sentences
–18% of trigram-generated sentences
Kullback-Leibler divergence: .814
Average perplexity reduction per word = .0419 (2^.814/21)
Per sentence? Features modify the probability of the entire sentence; the effect of any one feature on per-word probability is small

Distribution of Likelihood Ratio

[Figure: likelihood-ratio value density for trigram-generated vs. broadcast news sentences]

Discrimination Task

Natural sentence:
–but it doesn't feel like a sacrifice in a sense that you're really saying this is you know i'm meant to do things the right way and you trust it and tell the truth
Trigram-generated:
–they just kind of free trade agreements which have been living in a ziplock bag that you say that i see university japan's daiwa bank stocks step though

Q Values at Distance 1

[Figure: Q-value density at distance 1 for trigram-generated vs. broadcast news sentences]

Q Values at Distance 3

[Figure: Q-value density at distance 3 for trigram-generated vs. broadcast news sentences]

Outline

–The problem of semantic (in)coherence
–Incorporating this into the whole-sentence exponential LM
–Finding better features for this model using machine learning
–Semantic coherence features
–Experiments and results