Integrating Topics and Syntax Paper by Thomas Griffiths, Mark Steyvers, David Blei, Josh Tenenbaum Presentation by Eric Wang 9/12/2008

Outline
- Introduction/Motivation
  – LDA and semantic words
  – HMM and syntactic words
- Combining syntax and semantics: HMM-LDA
- Inference
- Results
- Conclusion

Introduction/Motivation 1: In human speech, some words provide meaning while others provide structure. Syntactic words span at most a sentence and serve to provide structure; they are also called function words. Semantic words span entire documents, and sometimes entire collections of documents; they lend meaning to a document and are also called content words. How do we learn both topics and structure without any prior knowledge of either?

Introduction/Motivation 2: In traditional topic modeling, such as LDA, we remove most syntactic words since we are only interested in meaning. In doing so, we discard much of the structure, and all of the ordering, that the original author intended. In topic modeling, we are concerned with long-range topic dependencies rather than document structure. We refer to many syntactic words as stopwords.

Introduction/Motivation 3: HMMs are useful for segmenting documents into different types of words, regardless of meaning. For example, all nouns will be grouped together because they play the same role across different passages and documents. Syntactic dependencies last at most for a sentence. Because grammar is standardized, it stays fairly constant across different contexts.

Combining syntax and semantics 1: All words (both syntactic and semantic) exhibit short-range dependencies. Only content words exhibit long-range semantic dependencies. This leads to the HMM-LDA, a composite model in which an HMM decides the parts of speech, and a topic model (LDA) extracts topics from only those words which are deemed semantic.
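Written out, the emission side of the composite model is (using the notation defined on the following slides, with the semantic class labelled c_i = 1):

    p(w_i \mid z_i, c_i) =
      \begin{cases}
        \phi^{(z_i)}_{w_i} & \text{if } c_i = 1 \ (\text{semantic class}) \\
        \phi^{(c_i)}_{w_i} & \text{otherwise}
      \end{cases}

so the HMM's class sequence decides, word by word, whether that word is emitted from one of the document's topics or from a purely syntactic word class.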

Generative Process 1: Definitions
- c_i: class assignment for word i, each taking one of C word classes
- z_i: topic assignment for word i, each taking one of T topics
- w_i: word i from document d, where each word is one of W word types
- θ^(d): multinomial distribution over topics for document d
- φ^(z): multinomial distribution over semantic words for the topic indicated by z
- φ^(c): multinomial distribution over non-semantic words for the class indicated by c
- π: class transition matrix, where π^(c_{i-1}) gives the transition probabilities from class c_{i-1} to class c_i

Generative Process 2: For document d,
1. Draw the topic distribution θ^(d) ~ Dirichlet(α).
2. For each word i in the document:
   - Draw a topic z_i ~ θ^(d).
   - Draw a class c_i ~ π^(c_{i-1}), where π^(c) is the row of the transition matrix indicated by c.
   - If c_i = 1 (the semantic class), draw a semantic word w_i ~ φ^(z_i); otherwise, draw a syntactic word w_i ~ φ^(c_i).
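To make the process concrete, here is a minimal Python sketch of sampling one document under this generative story. It is an illustration, not the authors' code: the function name, the choice of class index 1 as the semantic class, and the symmetric hyperparameters are assumptions of the sketch, and the per-topic and per-class distributions are drawn fresh rather than shared across a corpus.

    import numpy as np

    def generate_document(n_words, T, C, W, alpha, beta, gamma, delta, seed=0):
        """Sample one toy document from HMM-LDA (class index 1 plays the semantic role)."""
        rng = np.random.default_rng(seed)
        phi_topic = rng.dirichlet(beta * np.ones(W), size=T)    # word distribution per topic
        phi_class = rng.dirichlet(delta * np.ones(W), size=C)   # word distribution per syntactic class
        pi = rng.dirichlet(gamma * np.ones(C), size=C)          # class transition matrix
        theta = rng.dirichlet(alpha * np.ones(T))               # topic proportions theta^(d)

        words, classes, topics = [], [], []
        c_prev = 0                                              # arbitrary start class
        for _ in range(n_words):
            z = rng.choice(T, p=theta)                          # draw a topic for word i
            c = rng.choice(C, p=pi[c_prev])                     # draw a class from row pi^(c_{i-1})
            if c == 1:                                          # semantic class: emit from topic z
                w = rng.choice(W, p=phi_topic[z])
            else:                                               # syntactic class: emit from class c
                w = rng.choice(W, p=phi_class[c])
            words.append(w); classes.append(c); topics.append(z)
            c_prev = c
        return words, classes, topics

    # Example: a 100-word document with 10 topics, 5 classes, and 1,000 word types.
    doc, cls, tps = generate_document(100, T=10, C=5, W=1000,
                                      alpha=0.1, beta=0.01, gamma=0.1, delta=0.01)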

Graphical Model 1: [Figure: graphical model of the composite model, showing the HMM component and the LDA component.]

Simple Example 1: Essentially, HMM-LDA plays a stochastic game of Mad Libs, choosing words to fill each function in a passage. The HMM allocates words which vary across contexts to the semantic class, since grammar is fairly standardized but content is not. [Figure: example passage with words assigned to the semantic class, a verb class, and a preposition class.]

Model Inference 1: Inference is by MCMC (collapsed Gibbs sampling). Topic indicators are drawn from

    P(z_i = t | z_-i, c, w) ∝ (n_t^(d_i) + α) × (n_t^(w_i) + β) / (n_t^(·) + Wβ)   if c_i = 1
    P(z_i = t | z_-i, c, w) ∝ (n_t^(d_i) + α)                                       otherwise

where n_t^(d) is the number of words in document d assigned to topic t, and n_t^(w) is the number of words in topic t that are the same as w. All counts include only words for which c_i = 1, and all counts exclude word i itself.
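A minimal Python sketch of this update, assuming count arrays that are maintained incrementally elsewhere; the names n_dt, n_tw, n_t are illustrative rather than from the paper, rng is a NumPy random generator, and class index 1 again stands in for the semantic class.

    import numpy as np

    def sample_topic(d, w, c, n_dt, n_tw, n_t, alpha, beta, W, rng):
        """Collapsed Gibbs update for the topic indicator z_i.

        d, w, c    -- document index, word type, and class of word i
        n_dt[d, t] -- words in document d assigned to topic t
        n_tw[t, w] -- semantic words (c == 1) of type w assigned to topic t
        n_t[t]     -- total semantic words assigned to topic t
        All counts must already exclude word i (decrement before calling).
        """
        p = n_dt[d] + alpha                    # document-topic term, one entry per topic
        if c == 1:                             # only semantic words use the topic-word counts
            p = p * (n_tw[:, w] + beta) / (n_t + W * beta)
        p = p / p.sum()
        return rng.choice(len(p), p=p)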

Model Inference 2: Class indicators are drawn from

    P(c_i = c | c_-i, z, w) ∝ (n(c_{i-1} → c) + γ) × (n(c → c_{i+1}) + I(c_{i-1} = c) I(c = c_{i+1}) + γ) / (n(c → ·) + I(c_{i-1} = c) + Cγ) × E_c

with emission term

    E_c = (n_(z_i)^(w_i) + β) / (n_(z_i)^(·) + Wβ)   if c = 1 (semantic class)
    E_c = (n_c^(w_i) + δ) / (n_c^(·) + Wδ)           otherwise

where n_c^(w) is the number of words in class c that are the same as w, n_c^(·) is the total number of words in class c, n(c' → c) is the number of transitions from class c' to class c, n(c → ·) is the total number of transitions out of class c, and I(·) is an indicator variable which equals 1 if its argument is true. The topic counts are as defined on the previous slide. All counts exclude word i, and all transition counts exclude the transitions to and from c_i.
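A corresponding sketch for the class update. To stay readable it drops the indicator-function corrections that apply when c_{i-1}, c_i and c_{i+1} coincide, so it approximates the full conditional above rather than transcribing it exactly; the count-array names are again illustrative.

    import numpy as np

    def sample_class(w, z, c_prev, c_next, n_cw, n_c, n_tw, n_t, n_trans,
                     beta, gamma, delta, W, C, rng):
        """Approximate collapsed Gibbs update for the class indicator c_i.

        n_cw[c, w]     -- syntactic words of type w assigned to class c
        n_c[c]         -- total words assigned to class c
        n_tw, n_t      -- topic-word counts, as in sample_topic
        n_trans[c, c'] -- transitions from class c to class c'
        Counts and the two transitions involving word i must already be excluded.
        """
        p = np.empty(C)
        for cand in range(C):
            trans_in = n_trans[c_prev, cand] + gamma
            trans_out = (n_trans[cand, c_next] + gamma) / (n_trans[cand].sum() + C * gamma)
            if cand == 1:                      # semantic class emits from topic z_i
                emit = (n_tw[z, w] + beta) / (n_t[z] + W * beta)
            else:                              # syntactic class emits from its own distribution
                emit = (n_cw[cand, w] + delta) / (n_c[cand] + W * delta)
            p[cand] = trans_in * trans_out * emit
        p /= p.sum()
        return rng.choice(C, p=p)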

Extreme Cases 1: If we set the number of semantic topics to T = 1, the model reduces to an HMM part-of-speech tagger. If we set the number of HMM classes to C = 2, where one class is reserved for punctuation, the model reduces to LDA.

Results 1: Brown + TASA corpus: 38,151 documents; vocabulary size = 37,202; number of word tokens = 13,328,397. [Tables of example topics: LDA only, HMM-LDA semantic topics, and HMM-LDA syntactic classes.]

Results 2: NIPS papers: 1,713 documents; vocabulary size = 17,268; number of word tokens = 4,321,614. [Tables of semantic words (topics) and syntactic words (classes) learned from the NIPS corpus.]

Results 2 (cont'd): NIPS papers: 1,713 documents; vocabulary size = 17,268; number of word tokens = 4,321,614. [Figure: sample passages with words shaded by assignment.] Black words are semantic; words in gray levels are syntactic. Boxed words are semantic in one passage and syntactic in another. Asterisked words have low frequency and were not considered.

Results 3: [Plot of the log marginal probabilities of the data under each model.]

Results 4: Part-of-speech tagging. [Bar chart comparing the composite model with an HMM.] Black bars indicate performance on a fine tagset (297 word types); white bars indicate performance on a coarse tagset (10 word types).

Conclusion: HMM-LDA is a composite topic model which captures both long-range semantic dependencies and short-range syntactic dependencies. The model is quite competitive with a traditional HMM part-of-speech tagger, and outperforms LDA when stopwords and punctuation are not removed.