Slide 1: ML A, 11.1.2010. Figures and References to Topic Models, with Applications to Document Classification. Wolfgang Maass, Institut für Grundlagen der Informationsverarbeitung (Institute for Theoretical Computer Science), Technische Universität Graz, Austria.

Slide 2: Examples of topics that have emerged from unsupervised learning on a collection of documents. M. Steyvers and T. Griffiths. Probabilistic Topic Models. In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2007.

Slide 3: Example of a document in which a topic has been assigned to each relevant word; in other words, the latent z-variable is indicated for each word. T. L. Griffiths and M. Steyvers. Finding scientific topics. PNAS 101 (suppl. 1): 5228-5235, 2004.

Slide 4: The same word can occur in several topics, but in general it receives a different probability in each topic. M. Steyvers and T. Griffiths. Probabilistic Topic Models. In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2007.
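As a concrete illustration of this slide: a word such as "bank" can receive substantial probability under both a finance topic and a rivers topic. The numbers below are made up purely for illustration; they are not taken from the chapter:

\[ P(\text{bank} \mid \text{topic}_{\text{finance}}) = 0.06, \qquad P(\text{bank} \mid \text{topic}_{\text{rivers}}) = 0.03. \]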

Slide 5: The latent z-variables here select the correct topic for the word "play" in each of three documents. M. Steyvers and T. Griffiths. Probabilistic Topic Models. In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2007.

Slide 6: Graphical model for the joint distribution of a topic model. M. Steyvers and T. Griffiths. Probabilistic Topic Models. In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2007.
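As a reading aid for this figure, the joint distribution that the graphical model encodes can be written out as follows (a sketch in the usual notation: \(\theta^{(d)}\) is the topic mixture of document \(d\), \(\phi^{(j)}\) the word distribution of topic \(j\), \(z_{d,i}\) the topic of the \(i\)-th word in document \(d\), and \(\alpha, \beta\) the Dirichlet hyperparameters):

\[
p(\mathbf{w}, \mathbf{z}, \theta, \phi \mid \alpha, \beta)
= \prod_{j=1}^{T} p\big(\phi^{(j)} \mid \beta\big)
  \prod_{d=1}^{D} \Big[ p\big(\theta^{(d)} \mid \alpha\big)
  \prod_{i=1}^{N_d} P\big(z_{d,i} \mid \theta^{(d)}\big)\, P\big(w_{d,i} \mid \phi^{(z_{d,i})}\big) \Big].
\]

Marginalizing over the latent variables, each word follows the mixture \(P(w_i) = \sum_{j=1}^{T} P(w_i \mid z_i = j)\, P(z_i = j)\).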

Slide 7: A toy example. M. Steyvers and T. Griffiths. Probabilistic Topic Models. In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2007.

Slide 8: Performance of Gibbs sampling on this toy example. Documents were generated by mixing two topics in different ways, where topic 1 assigned probability 1/3 each to Bank, Money, and Loan, and topic 2 assigned probability 1/3 each to River, Stream, and Bank. Topic assignments to words are indicated by color (black/white). Initially, topics are assigned to words at random; after Gibbs sampling, the two original topics are recovered from the documents. M. Steyvers and T. Griffiths. Probabilistic Topic Models. In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2007.
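To make the procedure behind this figure concrete, here is a minimal collapsed Gibbs sampler in Python for a toy corpus of this kind. The documents, hyperparameter values, and variable names are illustrative choices of mine, not the exact setup from the chapter:

import random

random.seed(0)

VOCAB = ["bank", "money", "loan", "river", "stream"]
W = len(VOCAB)           # vocabulary size
T = 2                    # number of topics
ALPHA, BETA = 1.0, 0.1   # Dirichlet hyperparameters (illustrative values)

# Toy corpus as bags of word indices; made up for illustration.
docs = [
    [0, 1, 2, 0, 1, 2, 0, 1],   # mostly finance words
    [3, 4, 0, 3, 4, 0, 3, 4],   # mostly river words
    [0, 1, 3, 4, 2, 0, 1, 3],   # a mixed document
]

# Count tables and random initial topic assignments z (as on the slide).
n_dt = [[0] * T for _ in docs]       # document-topic counts
n_tw = [[0] * W for _ in range(T)]   # topic-word counts
n_t = [0] * T                        # total words per topic
z = []
for d, doc in enumerate(docs):
    zd = []
    for w in doc:
        t = random.randrange(T)
        zd.append(t)
        n_dt[d][t] += 1
        n_tw[t][w] += 1
        n_t[t] += 1
    z.append(zd)

def gibbs_sweep():
    """Resample the topic of every word token once (one Gibbs sweep)."""
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t_old = z[d][i]
            # Remove the current assignment from all counts.
            n_dt[d][t_old] -= 1
            n_tw[t_old][w] -= 1
            n_t[t_old] -= 1
            # Collapsed Gibbs update:
            # p(z = t | rest) is proportional to
            # (n_tw + BETA)/(n_t + W*BETA) * (n_dt + ALPHA);
            # the document-side denominator is constant in t and dropped.
            weights = [
                (n_tw[t][w] + BETA) / (n_t[t] + W * BETA) * (n_dt[d][t] + ALPHA)
                for t in range(T)
            ]
            r = random.uniform(0, sum(weights))
            t_new, acc = 0, weights[0]
            while r > acc:
                t_new += 1
                acc += weights[t_new]
            z[d][i] = t_new
            n_dt[d][t_new] += 1
            n_tw[t_new][w] += 1
            n_t[t_new] += 1

for _ in range(500):
    gibbs_sweep()

# Estimate the topic-word distributions phi; the two planted topics
# (finance vs. river) should re-emerge, up to relabeling of the topics.
for t in range(T):
    phi = {VOCAB[w]: round((n_tw[t][w] + BETA) / (n_t[t] + W * BETA), 2)
           for w in range(W)}
    print("topic", t, phi)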

Slide 9: Application to real-world data: abstracts from PNAS. Topics chosen by humans are on the y-axis, topics found by the algorithm on the x-axis. The darkness of a pixel indicates the mean probability of the algorithm topic over all abstracts belonging to the human-chosen category. Below the matrix, the five highest-probability words are shown for each algorithm-generated topic. T. L. Griffiths and M. Steyvers. Finding scientific topics. PNAS 101 (suppl. 1): 5228-5235, 2004.
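A hypothetical sketch of how the darkness values in such a matrix could be computed from the model output; the sizes, variable names, and random stand-in data below are assumptions for illustration, not the paper's actual data:

import numpy as np

D, T, C = 1000, 300, 33  # abstracts, algorithm topics, human categories (illustrative sizes)
theta = np.random.dirichlet(np.ones(T), size=D)  # stand-in for inferred document-topic distributions
labels = np.random.randint(0, C, size=D)         # stand-in for human-chosen category labels

# mean_prob[c, t] = mean probability of algorithm topic t over all abstracts
# in human category c; this sets the darkness of the pixel at row c, column t.
mean_prob = np.zeros((C, T))
for c in range(C):
    mean_prob[c] = theta[labels == c].mean(axis=0)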
