Slide 1: ML A, 11.1.2010. Figures and References to Topic Models, with Applications to Document Classification. Wolfgang Maass, Institut für Grundlagen der Informationsverarbeitung (Institute for Theoretical Computer Science), Technische Universität Graz, Austria. http://www.igi.tugraz.at/maass/
Slide 2: Examples of topics (that emerged from unsupervised learning on a collection of 37,000 documents). M. Steyvers and T. Griffiths. Probabilistic Topic Models. In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2007.
Slide 3: Example of a document in which a topic has been assigned to each (relevant) word; in other words, the latent z-variables are indicated for each word. T. L. Griffiths and M. Steyvers. Finding scientific topics. PNAS, vol. 101, 5228–5235, 2004.
Slide 4: The same word can occur in several topics (but in general receives a different probability in each topic). M. Steyvers and T. Griffiths. Probabilistic Topic Models. In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2007.
Slide 5: The latent z-variables here select the right topic for the word "play" in each of the three documents. M. Steyvers and T. Griffiths. Probabilistic Topic Models. In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2007.
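How the model resolves this ambiguity can be stated in one line. A token's topic assignment balances the topic's affinity for the word against the document's affinity for the topic; the notation below is the usual LDA convention (φ for topic–word distributions, θ for document–topic distributions) and is an assumption here rather than something shown on the slide:

```latex
% Posterior over the topic assignment of token w_i in document d:
% the word's likelihood under topic k times the document's weight for topic k.
P(z_i = k \mid w_i, d) \;\propto\; P(w_i \mid z_i = k)\, P(z_i = k \mid d)
                       \;=\; \phi^{(k)}_{w_i}\, \theta^{(d)}_{k}
```

Because θ differs across the three documents, the same word "play" receives a different most-probable topic in each of them.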
Slide 6: Graphical model for the joint distribution of a topic model. M. Steyvers and T. Griffiths. Probabilistic Topic Models. In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2007.
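Written out as a formula, the factorization that such a graphical model encodes is the standard LDA joint distribution with Dirichlet priors; the symbols (α, β for the priors, θ^(d) for document–topic and φ^(k) for topic–word distributions, N_d for document length) follow the usual convention and are assumptions here, not read off the slide:

```latex
P(\mathbf{w}, \mathbf{z}, \theta, \phi \mid \alpha, \beta)
  \;=\; \prod_{k=1}^{K} p\big(\phi^{(k)} \mid \beta\big)
        \prod_{d=1}^{D} \Big[\, p\big(\theta^{(d)} \mid \alpha\big)
        \prod_{i=1}^{N_d} P\big(z_{d,i} \mid \theta^{(d)}\big)\,
        P\big(w_{d,i} \mid \phi^{(z_{d,i})}\big) \Big]
```

Each document draws its topic mixture θ^(d) once; each token then draws a topic z from θ^(d) and a word w from the chosen topic's word distribution φ^(z).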
Slide 7: A toy example. M. Steyvers and T. Griffiths. Probabilistic Topic Models. In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2007.
Slide 8: Performance of Gibbs sampling on this toy example. Documents were generated by mixing two topics in different proportions, where topic 1 assigns probability 1/3 each to "bank", "money", and "loan", and topic 2 assigns probability 1/3 each to "river", "stream", and "bank". Topic assignments to words are indicated by color (b/w). Initially, topics are assigned to words at random; after Gibbs sampling, the two original topics are recovered from the documents. M. Steyvers and T. Griffiths. Probabilistic Topic Models. In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2007.
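A minimal sketch of this experiment in Python, assuming the standard collapsed Gibbs sampler for LDA (the document count, document length, and hyperparameter values alpha and beta are illustrative assumptions, not taken from the slide or the authors' code):

```python
# Collapsed Gibbs sampling for a 2-topic LDA on the bank/money/river toy data.
import numpy as np

rng = np.random.default_rng(0)

vocab = ["bank", "money", "loan", "river", "stream"]
V, K = len(vocab), 2
true_phi = np.array([
    [1/3, 1/3, 1/3, 0.0, 0.0],   # topic 1: bank, money, loan
    [1/3, 0.0, 0.0, 1/3, 1/3],   # topic 2: bank, river, stream
])

# Generate documents by mixing the two topics in varying proportions.
docs = []
for mix in np.linspace(0.0, 1.0, 16):
    theta = np.array([mix, 1.0 - mix])
    z = rng.choice(K, size=50, p=theta)
    docs.append([rng.choice(V, p=true_phi[t]) for t in z])

alpha, beta = 1.0, 0.1           # assumed symmetric Dirichlet hyperparameters
D = len(docs)
ndk = np.zeros((D, K))           # document-topic counts
nkw = np.zeros((K, V))           # topic-word counts
nk = np.zeros(K)                 # total tokens per topic
assign = []                      # current topic assignment of every token
for d, doc in enumerate(docs):
    zs = rng.integers(K, size=len(doc))   # random initial assignments
    assign.append(zs)
    for w, z in zip(doc, zs):
        ndk[d, z] += 1; nkw[z, w] += 1; nk[z] += 1

for sweep in range(200):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            z = assign[d][i]     # remove this token's current assignment
            ndk[d, z] -= 1; nkw[z, w] -= 1; nk[z] -= 1
            # Collapsed conditional:
            # P(z=k | rest) ~ (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta)
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
            z = rng.choice(K, p=p / p.sum())
            assign[d][i] = z     # record the new assignment, restore counts
            ndk[d, z] += 1; nkw[z, w] += 1; nk[z] += 1

# Posterior-mean estimate of the topic-word distributions.
phi = (nkw + beta) / (nkw.sum(axis=1, keepdims=True) + V * beta)
for k in range(K):
    print(f"topic {k}:", {vocab[w]: round(float(phi[k, w]), 2) for w in range(V)})
```

With enough sweeps, the printed word distributions typically separate into a money-related topic and a river-related topic, with "bank" receiving substantial probability in both, mirroring the recovery shown on the slide.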
Slide 9: Application to real-world data: 28,000 abstracts from PNAS. Topics chosen by humans are on the y-axis; topics chosen by the algorithm are on the x-axis. The darkness of a pixel indicates the mean probability of the algorithm's topic over all abstracts belonging to the human-chosen category. Below the matrix are the five words with the highest probability for each algorithm-generated topic. T. L. Griffiths and M. Steyvers. Finding scientific topics. PNAS, vol. 101, 5228–5235, 2004.