
Interpretable Latent Feature Models for Text-Augmented Social Networks
James R. Foulds, Padhraic Smyth
University of California, Irvine

Summary
We propose a framework for jointly modeling networks and the text associated with them, such as email networks or user review websites. The proposed class of models can be used to recover human-interpretable latent feature representations of the entities in a network. We demonstrate the model on the Enron corpus.

Latent Variable Network Models
- Find low-dimensional representations of the actors.
- Conditional independence assumptions improve tractability.
- Unifying view: probabilistic matrix factorization. The N×N network Y is assumed to be generated via Y ∼ f(Λ), where Λ = Z W Z^T; Z is the N×K matrix of latent variables (actors × features) and W is an optional K×K matrix of variable-interaction terms.
- E.g. MMSB (Airoldi et al., 2008), LFRM (Miller et al., 2009), RTM (Chang and Blei, 2009), the latent factor model (Hoff et al., 2002), ...
- For two-mode networks and other rectangular (N×M) matrix data: Λ = Z^(1) W Z^(2)T, where Z^(1) is N×K^(1), W is K^(1)×K^(2), and Z^(2)T is K^(2)×M.

The Nonparametric Latent Feature Relational Model (Miller et al., 2009)
- Actor i is represented by a binary vector of features Z_i.
- The number of features K is learned automatically, due to the nonparametric Indian Buffet Process prior on Z.
- The probability of an edge between actors i and j is σ(Z_i W Z_j^T), where W contains the feature-interaction weights and σ is the logistic function (or another link function).
- [Figure: an example binary feature matrix Z for actors A, B, and C, over the features Cycling, Fishing, Running, Tango, Salsa, and Waltz.]
- Binary matrix factorization (BMF), due to Meeds et al. (2007), is the rectangular-matrix version of this model.

Markov Chain Monte Carlo Inference
- Gibbs updates on the latent features.
- Metropolis-Hastings updates for the Ws, using a Gaussian proposal.
- Collapsed Gibbs sampler for the topic assignments.
- Optimize the hyper-parameters: gradient ascent for λ and γ; the iterative procedure of Minka (2000) for α+.
- Align the features and topics, maximizing the Polya log-likelihood via the Hungarian algorithm.
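The matrix-factorization view Y ∼ f(Λ) with Λ = Z W Z^T and a logistic link can be written as a short generative simulation. This is an illustrative sketch, not the authors' code; the dimensions N and K and the randomly drawn Z and W below are arbitrary choices (in the full model, Z would be drawn from an Indian Buffet Process prior):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the poster): N actors, K features.
N, K = 10, 3

# Z: binary actor-by-feature matrix (here simply random for illustration).
Z = rng.integers(0, 2, size=(N, K))

# W: real-valued feature-interaction weights.
W = rng.normal(0.0, 1.0, size=(K, K))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Edge probabilities: P(y_ij = 1) = sigma(Z_i W Z_j^T), i.e. Lambda = Z W Z^T.
Lam = Z @ W @ Z.T
P = sigmoid(Lam)

# Sample a directed network Y from the Bernoulli likelihood.
Y = (rng.random((N, N)) < P).astype(int)
```

For a two-mode (rectangular) network the same sketch applies with two feature matrices, Λ = Z^(1) W Z^(2)T.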
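The Metropolis-Hastings step for the weights W can be sketched as a random-walk update with a Gaussian proposal. This is a minimal illustration assuming a Bernoulli-logistic network likelihood and a standard Gaussian prior on W; the step size and dimensions are arbitrary assumptions, and this is not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_joint(Y, Z, W, sigma_w=1.0):
    """Bernoulli log-likelihood of the network plus a Gaussian prior on W."""
    P = sigmoid(Z @ W @ Z.T)
    eps = 1e-12  # guard against log(0)
    ll = np.sum(Y * np.log(P + eps) + (1 - Y) * np.log(1 - P + eps))
    lp = -0.5 * np.sum(W ** 2) / sigma_w ** 2
    return ll + lp

def mh_step_W(Y, Z, W, step=0.1):
    """One random-walk Metropolis-Hastings update of the weight matrix W."""
    W_prop = W + step * rng.normal(size=W.shape)
    log_alpha = log_joint(Y, Z, W_prop) - log_joint(Y, Z, W)
    if np.log(rng.random()) < log_alpha:
        return W_prop, True   # accept the proposal
    return W, False           # reject, keep current W

# Illustrative usage on a small random network (dimensions are arbitrary).
N, K = 8, 2
Z = rng.integers(0, 2, size=(N, K))
Y = rng.integers(0, 2, size=(N, N))
W = np.zeros((K, K))
for _ in range(200):
    W, accepted = mh_step_W(Y, Z, W)
```

Because the Gaussian proposal is symmetric, the acceptance ratio reduces to the difference of log joint densities.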
Latent Dirichlet Allocation (Blei, Ng & Jordan, 2003)
- A probabilistic model for text corpora.
- Topics are discrete distributions over words.
- Each document has a distribution over topics.
- We can also view LDA as a factorization of the matrix of word probabilities for each document.

BMF_LDA: A Joint Model for Networks and Text
The generative process is assumed to be as follows:
1. Generate the network via BMF (or LFRM).
2. Associate a topic with each latent feature.
3. Generate the documents via LDA, where the prior for each document's topics depends on the latent features from BMF. For rectangular networks, this is equivalent to the corresponding rectangular factorization [equation given in poster figure].

Future Work / Work in Progress
- Evaluate the recovered features.
- Quantitative experiments.
- Results on the Yelp dataset.

References
- D.M. Blei, A.Y. Ng, and M.I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
- K.T. Miller, T.L. Griffiths, and M.I. Jordan. Nonparametric latent feature models for link prediction. In Advances in Neural Information Processing Systems, 2009.
- E. Meeds, Z. Ghahramani, R. Neal, and S. Roweis. Modeling dyadic data with binary latent factors. In Advances in Neural Information Processing Systems, 2007.
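One hypothetical concrete form of the feature-dependent document prior in the generative process above is a Dirichlet whose pseudo-counts are boosted for the topics aligned with the sending actor's active features. The names, the alpha_plus/gamma parameterization, and the one-topic-per-feature alignment below are illustrative assumptions, not the poster's exact specification:

```python
import numpy as np

rng = np.random.default_rng(2)

V, K = 50, 3  # vocabulary size and number of topics (illustrative)

# Topics: K discrete distributions over the V words, drawn from a Dirichlet.
phi = rng.dirichlet(np.ones(V) * 0.1, size=K)

def generate_document(z_actor, n_words, alpha_plus=1.0, gamma=0.1):
    """Generate one document LDA-style. The Dirichlet prior over topics gets
    extra pseudo-count alpha_plus for each topic whose aligned latent feature
    is active for the sending actor (hypothetical form of the joint model)."""
    alpha = gamma + alpha_plus * z_actor   # one topic aligned with each feature
    theta = rng.dirichlet(alpha)           # this document's topic distribution
    words = []
    for _ in range(n_words):
        k = rng.choice(K, p=theta)             # topic assignment
        words.append(rng.choice(V, p=phi[k]))  # word drawn from that topic
    return words

# An actor with features 1 and 3 active tends to use topics 1 and 3.
doc = generate_document(np.array([1, 0, 1]), n_words=20)
```

In the full model these topic assignments would be inferred with the collapsed Gibbs sampler described in the inference section.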