ML A
Figures and References to Topic Models, with Applications to Document Classification
Wolfgang Maass
Institut für Grundlagen der Informationsverarbeitung (Institute for Theoretical Computer Science)
Technische Universität Graz, Austria
Examples of topics that emerged from unsupervised learning on a collection of documents. Source: M. Steyvers and T. Griffiths. Probabilistic Topic Models. In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2007.
Example of a document in which a topic has been assigned to each (relevant) word; in other words, the value of the latent z-variable is indicated for each word. Source: T. L. Griffiths and M. Steyvers. Finding scientific topics. PNAS, vol. 101, 2004.
The same word can occur in several topics, but in general it receives a different probability in each topic. Source: M. Steyvers and T. Griffiths. Probabilistic Topic Models. In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2007.
Here the latent z-variables select the appropriate topic for the word "play" in each of the three documents. Source: M. Steyvers and T. Griffiths. Probabilistic Topic Models. In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2007.
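How a z-variable can pick a topic for an ambiguous word such as "play" may be easier to see with numbers. The following is a minimal sketch, assuming the standard topic-model rule that the probability of topic j for a word in a document is proportional to P(word | topic j) times P(topic j | document). The function name `topic_posterior`, the three topic labels, and all probabilities are made up for illustration and are not read off the figure.

```python
def topic_posterior(word, phi, theta_d):
    """Return P(z = j | word, document) for every topic j.

    phi[j] maps words to P(word | topic j); theta_d[j] is P(topic j | document).
    """
    # Unnormalized score for topic j: P(word | topic j) * P(topic j | document)
    scores = [phi_j.get(word, 0.0) * t_j for phi_j, t_j in zip(phi, theta_d)]
    total = sum(scores)
    if total == 0.0:
        raise ValueError("word has zero probability under every topic")
    return [s / total for s in scores]


# Hypothetical word distributions (only a few entries shown per topic).
phi = [
    {"play": 0.05, "music": 0.10},  # topic 0: a music-like topic
    {"play": 0.04, "stage": 0.09},  # topic 1: a theater-like topic
    {"play": 0.06, "game": 0.12},   # topic 2: a games-like topic
]

# Hypothetical topic mixtures of two documents.
theta_music_doc = [0.8, 0.1, 0.1]
theta_games_doc = [0.1, 0.1, 0.8]

post_music = topic_posterior("play", phi, theta_music_doc)
post_games = topic_posterior("play", phi, theta_games_doc)
print(post_music)
print(post_games)
```

Although "play" has similar probability in all three topics, the document's own topic mixture shifts the posterior, so the same word is assigned to different topics in different documents.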
Graphical model for the joint distribution of a topic model. Source: M. Steyvers and T. Griffiths. Probabilistic Topic Models. In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2007.
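The factorization that such a graphical model encodes can be written out explicitly. The following assumes the usual LDA-style notation, which is not spelled out on the slide: T topics with word distributions \phi^{(j)}, D documents with topic mixtures \theta^{(d)}, N_d words in document d, and Dirichlet hyperparameters \alpha and \beta.

```latex
p(\mathbf{w}, \mathbf{z}, \theta, \phi \mid \alpha, \beta)
  = \prod_{j=1}^{T} p\bigl(\phi^{(j)} \mid \beta\bigr)
    \prod_{d=1}^{D} \Bigl[ p\bigl(\theta^{(d)} \mid \alpha\bigr)
    \prod_{i=1}^{N_d} P\bigl(z_{d,i} \mid \theta^{(d)}\bigr)\,
                      P\bigl(w_{d,i} \mid \phi^{(z_{d,i})}\bigr) \Bigr]
```

Each factor corresponds to one node of the graph given its parents: the words w_{d,i} are observed, while z, \theta, and \phi are latent.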
A toy example. Source: M. Steyvers and T. Griffiths. Probabilistic Topic Models. In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2007.
Performance of Gibbs sampling on this toy example. Documents were generated by mixing two topics in different proportions: topic 1 assigned probability 1/3 each to Bank, Money, and Loan, and topic 2 assigned probability 1/3 each to River, Stream, and Bank. Topic assignments to words are indicated by color (black/white). Initially, topics are assigned to words at random. After Gibbs sampling, the two original topics are recovered from the documents. Source: M. Steyvers and T. Griffiths. Probabilistic Topic Models. In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2007.
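The toy experiment can be sketched in a few lines of Python. This is a minimal illustration using a collapsed Gibbs sampler, not the authors' code: the two topics follow the description above, while the corpus size, the way the mixing proportion varies across documents, the hyperparameters `alpha` and `beta`, the iteration count, and all function names are assumptions chosen for illustration.

```python
import random

VOCAB = ["bank", "money", "loan", "river", "stream"]
TRUE_TOPICS = [
    [0, 1, 2],  # topic 1: bank, money, loan, each with probability 1/3
    [3, 4, 0],  # topic 2: river, stream, bank, each with probability 1/3
]


def generate_docs(rng, n_docs=16, doc_len=16):
    """Generate documents whose share of topic 2 grows from 0 to 1 (assumed scheme)."""
    docs = []
    for d in range(n_docs):
        mix = d / (n_docs - 1)  # probability of drawing a word from topic 2
        doc = []
        for _ in range(doc_len):
            topic = 1 if rng.random() < mix else 0
            doc.append(rng.choice(TRUE_TOPICS[topic]))
        docs.append(doc)
    return docs


def gibbs_lda(docs, n_topics, n_words, rng, alpha=1.0, beta=0.01, n_iter=200):
    """Collapsed Gibbs sampling; returns topic assignments z and estimates phi."""
    # Start from a random topic assignment for every word token.
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
    wt = [[0] * n_topics for _ in range(n_words)]  # word-topic counts
    dt = [[0] * n_topics for _ in docs]            # document-topic counts
    tt = [0] * n_topics                            # tokens per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            wt[w][t] += 1
            dt[d][t] += 1
            tt[t] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from all counts.
                t = z[d][i]
                wt[w][t] -= 1
                dt[d][t] -= 1
                tt[t] -= 1
                # P(z = j | all other assignments), up to a constant factor.
                weights = [
                    (wt[w][j] + beta) / (tt[j] + n_words * beta) * (dt[d][j] + alpha)
                    for j in range(n_topics)
                ]
                t = rng.choices(range(n_topics), weights=weights)[0]
                # Record the newly sampled topic.
                z[d][i] = t
                wt[w][t] += 1
                dt[d][t] += 1
                tt[t] += 1
    # Smoothed estimate of P(word | topic) from the final counts.
    phi = [
        [(wt[w][j] + beta) / (tt[j] + n_words * beta) for w in range(n_words)]
        for j in range(n_topics)
    ]
    return z, phi


rng = random.Random(0)
docs = generate_docs(rng)
z, phi = gibbs_lda(docs, n_topics=2, n_words=len(VOCAB), rng=rng)
for j, row in enumerate(phi):
    print(j, {VOCAB[w]: round(p, 2) for w, p in enumerate(row)})
```

If the sampler has mixed well, one estimated topic should concentrate on money and loan and the other on river and stream, with bank shared between them. The order of the two topics is arbitrary.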
Application to real-world data: abstracts from PNAS. Topics chosen by humans are on the y-axis, topics chosen by the algorithm on the x-axis. The darkness of a pixel indicates the mean probability of the algorithm-generated topic over all abstracts belonging to the human-chosen category. Below the plot are the five words with the highest probability for each of the algorithm-generated topics. Source: T. L. Griffiths and M. Steyvers. Finding scientific topics. PNAS, vol. 101, 2004.
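The quantity shown by each pixel can be computed as a per-category average of document-topic probabilities. The sketch below assumes that the fitted model provides, for every abstract d, a list `theta[d]` of topic probabilities; the function name, the category labels, and the numbers are hypothetical and serve only to show the computation.

```python
def mean_topic_by_category(theta, categories):
    """Average P(topic j | abstract) over all abstracts of each human-chosen category.

    theta[d][j] is the probability of topic j in abstract d;
    categories[d] is the human-assigned category of abstract d.
    """
    sums = {}
    counts = {}
    for row, cat in zip(theta, categories):
        if cat not in sums:
            sums[cat] = [0.0] * len(row)
            counts[cat] = 0
        for j, p in enumerate(row):
            sums[cat][j] += p
        counts[cat] += 1
    # One row per category (y-axis), one column per algorithm topic (x-axis).
    return {cat: [s / counts[cat] for s in sums[cat]] for cat in sums}


# Made-up topic mixtures for three abstracts and three algorithm topics.
theta = [
    [0.7, 0.2, 0.1],
    [0.5, 0.4, 0.1],
    [0.1, 0.1, 0.8],
]
categories = ["Ecology", "Ecology", "Genetics"]  # hypothetical labels

heatmap = mean_topic_by_category(theta, categories)
for cat, row in heatmap.items():
    print(cat, [round(p, 2) for p in row])
```

A dark pixel then corresponds to a large entry in this table, meaning that the algorithm-generated topic is prominent in the abstracts of that human-chosen category.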