Predicting Relative Prominence in Noun-Noun Compounds

Taniya Mishra and Srinivas Bangalore
AT&T Labs-Research, 180 Park Avenue, Florham Park, NJ

Abstract
There are several theories regarding what influences prominence assignment in English noun-noun compounds. We have developed corpus-driven models for automatically predicting prominence assignment in noun-noun compounds using feature sets based on two such theories: the informativeness theory and the semantic composition theory. The evaluation of the prediction models indicates that although both of these theories are relevant, they account for different types of variability in prominence assignment.

Prosodic Prominence in Text-to-Speech Synthesis
Prosody plays a vital role in the intelligibility and naturalness of text-to-speech synthesis. Prosody prediction involves determining which words of the text are to be perceptually prominent. The prominence of a word is acoustically realized by endowing the synthesized word with greater pitch, greater energy, and/or longer duration than its neighboring words. Relative prominence prediction in noun-noun compounds remains a challenging problem.

Theories about Prominence in Noun-Noun Compounds
1. Structural theory (Bloomfield, 1933; Marchand, 1969; Heinz, 2004): NN compounds are regularly left-prominent; right-prominent NN combinations are syntactic phrases.
2. Analogical theory (Schmerling, 1971; Olsen, 2000): Prominence is assigned by analogy to similar compounds in the lexicon.
3. Semantic theory (Fudge, 1984; Liberman and Sproat, 1992): Relative prominence is decided by the semantic relationship between the two nouns.
4. Informativeness theory (Bolinger, 1972; Ladd, 1984): The relatively more informative and unexpected noun is given greater prominence.
Our paper compares the informativeness theory and the semantic composition theory using corpus-driven statistical methods in discourse-neutral contexts.

Informativeness (INF) Measures
We used the following five metrics to compare the individual and relative informativeness of the nouns in each noun-noun compound:
1. Unigram Predictability (UP): the logarithm of the probability of a word estimated from a text corpus, UP(w) = log P(w).
2. Bigram Predictability (BP): the logarithm of the conditional probability of the second noun given the first noun, BP = log P(n2 | n1).
3. Pointwise Mutual Information (PMI): the logarithm of the ratio of the joint probability of the two nouns to the product of their marginal probabilities, PMI = log [ P(n1, n2) / (P(n1) P(n2)) ].
4. Dice Coefficient (DC): a collocation measure, DC = 2 C(n1, n2) / (C(n1) + C(n2)), where C(.) denotes corpus counts.
5. Pointwise Kullback-Leibler Divergence (PKL): the relative entropy of the second noun given the first noun, PKL = P(n2 | n1) log [ P(n2 | n1) / P(n2) ].

Semantic Relationship Modeling
Each of the two nouns in a noun-noun compound is assigned a semantic category vector (SCV): a bit vector of 26 dimensions whose elements represent the categories (such as food, event, act, location, artifact) assigned to nouns in WordNet. An element of a noun's SCV is 1 if the lemmatized noun is assigned the associated category by WordNet, and 0 otherwise. The semantic relationship features (SRF) of two nouns are defined as the cross-product of their semantic category vectors.
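To make the feature definitions above concrete, here is a minimal Python sketch (not the authors' implementation) of the INF measures and the SCV/SRF features. It assumes `unigram` and `bigram` are `collections.Counter` tables built from a large corpus such as Gigaword, and it derives the 26 noun categories from NLTK's WordNet lexicographer names (noun.food, noun.event, and so on), which is one plausible reading of the categories named above; all function names are illustrative.

```python
# Minimal sketch (not the authors' code) of the INF and SRF features.
# `unigram` and `bigram` are collections.Counter objects built from a large
# corpus, e.g. unigram["pie"] = 1200 and bigram[("cherry", "pie")] = 45,
# so that missing entries return a count of 0.
import math

from nltk.corpus import wordnet as wn  # WordNet access via NLTK (assumed)


def inf_features(n1, n2, unigram, bigram):
    """Five informativeness (INF) measures for the compound 'n1 n2'."""
    N = sum(unigram.values())                 # total unigram tokens
    B = sum(bigram.values())                  # total bigram tokens
    p1 = unigram[n1] / N if N else 0.0        # P(n1)
    p2 = unigram[n2] / N if N else 0.0        # P(n2)
    p12 = bigram[(n1, n2)] / B if B else 0.0  # P(n1, n2)
    p2_1 = bigram[(n1, n2)] / unigram[n1] if unigram[n1] else 0.0  # P(n2 | n1)

    up1 = math.log(p1) if p1 else float("-inf")    # unigram predictability of n1
    up2 = math.log(p2) if p2 else float("-inf")    # unigram predictability of n2
    bp = math.log(p2_1) if p2_1 else float("-inf")                 # bigram predictability
    pmi = math.log(p12 / (p1 * p2)) if p12 and p1 and p2 else float("-inf")
    denom = unigram[n1] + unigram[n2]
    dice = 2.0 * bigram[(n1, n2)] / denom if denom else 0.0        # Dice coefficient
    pkl = p2_1 * math.log(p2_1 / p2) if p2_1 and p2 else 0.0       # pointwise KL divergence

    return {"UP1": up1, "UP2": up2, "BP": bp, "PMI": pmi, "DC": dice, "PKL": pkl}


# The 26 WordNet noun categories, read off the noun lexicographer files
# (noun.Tops, noun.act, ..., noun.time).  Deriving them this way is an
# assumption about how the paper's category set was obtained.
NOUN_CATEGORIES = sorted({s.lexname() for s in wn.all_synsets("n")})


def scv(noun):
    """Semantic category vector: 26-dimensional bit vector for one noun."""
    lemma = wn.morphy(noun, wn.NOUN) or noun
    cats = {s.lexname() for s in wn.synsets(lemma, pos=wn.NOUN)}
    return [1 if c in cats else 0 for c in NOUN_CATEGORIES]


def srf(n1, n2):
    """Semantic relationship features: cross-product of the two SCVs (26 x 26 = 676 bits)."""
    v1, v2 = scv(n1), scv(n2)
    return [a * b for a in v1 for b in v2]
```

For example, `inf_features("cherry", "pie", unigram, bigram)` returns the six INF values for "cherry pie", and `srf("cherry", "pie")` returns the 676-dimensional SRF bit vector.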
Semantic Informativeness Features (SIF)
We also maintain, for each noun:
1. The number of possible synsets associated with the noun
2. Left positional family size
3. Right positional family size
Positional family size is the number of unique noun-noun compounds that include the particular noun, either on the left or on the right (Bell and Plag, 2010). The intuition is that the smaller the synset count, the more specific the meaning of a noun and hence the greater its information content, while a larger positional family size indicates that the noun occurs in many possible compounds and is less likely to receive greater prominence.

Prominence in Noun-Noun Compounds
Example noun-noun compounds and their discourse-neutral prominence structure: White House, cherry pie, parking lot, Madison Avenue, Wall Street, nail polish, french fries, computer programmer, dog catcher, silk tie, and self reliance. In a discourse-neutral context, noun-noun compounds have a leftmost prominence structure: the left noun is more prominent than the right noun. However, 25% of noun-noun compounds have a right-prominent structure (Liberman and Sproat, 1992). Different theories about relative prominence assignment in noun-noun compounds exist.

Experiments
Data: a corpus of 7,767 noun-noun compounds randomly selected from the Associated Press newswire and hand-labeled for left or right prominence (Sproat, 1994). The informativeness features for each word were computed from the LDC English Gigaword corpus, and the semantic category vectors for each noun were constructed using WordNet. Using each of the three feature sets, we built a BoosTexter-based discriminative binary classifier (Freund and Schapire, 1996) to predict relative prominence. Training data: 6,835 samples; test data: 932 samples. Evaluation: average prominence prediction error using 5-fold cross validation. Baseline: assign the majority class (left-noun prominence) to all test samples.

Results: Prominence prediction without lexical information
Each type of feature reduces the error rate over the baseline. SRF and INF features appear to be more predictive than SIF features. The overall reduction can be as large as 32% over the baseline error when all features are combined.
[Table: average baseline error (%), average model error (%), and % error reduction over baseline for the feature sets INF, SRF, SIF, INF+SRF, INF+SIF, SRF+SIF, and All; the numeric values were not preserved in this transcript.]

Results: Prominence prediction with lexical information
Incorporating lexical information provides a substantial improvement: more than 52% error reduction over the baseline error. For comparison (Sproat, 1994), the relative error reduction over the baseline using SIF is 46.6%.
[Table: average baseline error (%), average model error (%), and % error reduction over baseline for the same feature sets; the numeric values were not preserved in this transcript.]

Summary
We presented a comparison of two theories of prominence in noun-noun compounds using data-driven methods. Each theory accounts for different types of variability in prominence assignment. Lexical information improves prominence prediction substantially over baseline models; non-lexical models have broader coverage and still provide significant error reduction.
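As an end-to-end illustration of the experimental setup described above, the following sketch computes the SIF features and trains a boosted classifier evaluated with 5-fold cross validation. It reuses `inf_features` and `srf` from the earlier sketch; scikit-learn's `AdaBoostClassifier` stands in for BoosTexter (a substitution, not the authors' toolchain), and `compounds`, a hypothetical list of (noun1, noun2, label) triples labeled "left" or "right", represents the annotated data.

```python
# Sketch (not the authors' code) of the SIF features plus a boosted classifier.
# Assumes inf_features() and srf() from the previous sketch are in scope.
import numpy as np
from nltk.corpus import wordnet as wn
from sklearn.ensemble import AdaBoostClassifier       # stand-in for BoosTexter
from sklearn.model_selection import cross_val_score


def sif_features(noun, family):
    """Semantic informativeness features (SIF) for one noun.

    `family` is the set of (left, right) compound types over which the
    positional family sizes are counted (here, the annotated compounds
    themselves, as a simplification)."""
    lemma = wn.morphy(noun, wn.NOUN) or noun
    n_synsets = len(wn.synsets(lemma, pos=wn.NOUN))       # specificity proxy
    left_size = sum(1 for l, _ in family if l == noun)    # left positional family size
    right_size = sum(1 for _, r in family if r == noun)   # right positional family size
    return [n_synsets, left_size, right_size]


def featurize(n1, n2, unigram, bigram, family):
    """Concatenate INF, SRF, and SIF features for one compound."""
    vec = (list(inf_features(n1, n2, unigram, bigram).values())
           + srf(n1, n2)
           + sif_features(n1, family)
           + sif_features(n2, family))
    # Clamp log(0) = -inf so the classifier only sees finite values.
    return [v if np.isfinite(v) else -20.0 for v in vec]


def average_error(compounds, unigram, bigram):
    """Average prominence-prediction error under 5-fold cross validation."""
    family = {(n1, n2) for n1, n2, _ in compounds}        # unique compound types
    X = np.array([featurize(n1, n2, unigram, bigram, family)
                  for n1, n2, _ in compounds])
    y = np.array([1 if label == "right" else 0 for _, _, label in compounds])
    clf = AdaBoostClassifier(n_estimators=100)            # boosted decision stumps
    accuracy = cross_val_score(clf, X, y, cv=5).mean()
    return 1.0 - accuracy
```

Under this setup the majority-class baseline error is simply the proportion of right-prominent compounds in the data, which is what the error reductions in the results tables are measured against.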