Hierarchical Relational Models for Document Networks

Presentation transcript:

Hierarchical Relational Models for Document Networks. Jonathan Chang and David Blei (Facebook and Princeton University). The Annals of Applied Statistics, 2010. Presented by Haojun Chen. Images and some text are from the original paper.

Introduction: Network data has attracted considerable research interest in machine learning and applied statistics. Previous work focused only on the network structure and ignored the attributes of the nodes. In a citation network of articles, for example, the text and abstracts of the documents should also be used to uncover the latent structure in the data. This paper develops the Relational Topic Model (RTM) for network data, which accounts for both links and node attributes.

Data Example for RTM

Graphical Model for RTM

Generative Process for RTM
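
The slide body for the generative process did not survive in this transcript, so the following is only a minimal sketch of an RTM-style process as it is usually described: LDA-style word generation per document, followed by a binary link draw for each document pair from a link function of the documents' empirical topic proportions. The function names, the sigmoid link choice, and all parameter values are assumptions made for illustration, not taken from the slides.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw one index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(n_docs, doc_len, alpha, beta, eta, nu, seed=0):
    """Hypothetical RTM-style generator (illustrative assumptions only).

    alpha: Dirichlet hyperparameter, one entry per topic
    beta:  per-topic word distributions, beta[k][w]
    eta, nu: coefficients and intercept of the (assumed sigmoid) link function
    """
    rng = random.Random(seed)
    n_topics = len(alpha)
    docs, zbars = [], []
    for _ in range(n_docs):
        theta = sample_dirichlet(alpha, rng)      # topic proportions
        counts = [0] * n_topics
        words = []
        for _ in range(doc_len):
            z = sample_categorical(theta, rng)    # topic assignment
            words.append(sample_categorical(beta[z], rng))  # word
            counts[z] += 1
        docs.append(words)
        zbars.append([c / doc_len for c in counts])  # empirical topic mix
    links = {}
    for d in range(n_docs):
        for d2 in range(d + 1, n_docs):
            # Link score depends on the element-wise product of topic mixes.
            s = sum(e * a * b for e, a, b in zip(eta, zbars[d], zbars[d2])) + nu
            p = 1.0 / (1.0 + math.exp(-s))
            links[(d, d2)] = 1 if rng.random() < p else 0
    return docs, zbars, links
```

Because links are drawn from the same topic assignments that generate the words, documents with similar topic usage are more likely to be linked when the coefficients are positive.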

Link Probability Function: four link probability functions are considered. Notation: Φ is the CDF of the normal distribution; ∘ is the Hadamard (element-wise) product.
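
The formulas themselves were lost from this slide. As a hedged sketch, the four link probability functions can be written as below, following the forms recalled from the published paper (sigmoid, exponential, normal-CDF, and a squared-difference variant). The function names and the example values are assumptions for illustration; zd and zd2 stand for the two documents' mean topic assignments, eta for the coefficients, and nu for the intercept.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hadamard(a, b):
    """Element-wise (Hadamard) product of two vectors."""
    return [x * y for x, y in zip(a, b)]


def normal_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def psi_sigmoid(zd, zd2, eta, nu):
    s = dot(eta, hadamard(zd, zd2)) + nu
    return 1.0 / (1.0 + math.exp(-s))


def psi_exp(zd, zd2, eta, nu):
    # Only a valid probability when the exponent is non-positive.
    return math.exp(dot(eta, hadamard(zd, zd2)) + nu)


def psi_probit(zd, zd2, eta, nu):
    return normal_cdf(dot(eta, hadamard(zd, zd2)) + nu)


def psi_n(zd, zd2, eta, nu):
    # Penalizes the squared difference between the two topic mixes.
    diff = [x - y for x, y in zip(zd, zd2)]
    return math.exp(-dot(eta, hadamard(diff, diff)) - nu)
```

With positive coefficients, all four functions assign a higher link probability to document pairs whose topic mixes are similar, for example `psi_sigmoid([1, 0], [1, 0], [2, 2], -1)` is larger than `psi_sigmoid([1, 0], [0, 1], [2, 2], -1)`.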

Model Inference, Estimation and Prediction: variational inference for the latent variables; maximum likelihood estimation for the model parameters. Prediction: link prediction from words; word prediction from links.

Empirical Results: data summary; three experiments: evaluating the predictive distribution, automatic link suggestion, and modeling spatial data.

Evaluating Predictive Distribution (1/2): lower is better.

Evaluating Predictive Distribution (2/2)

Automatic Link Suggestion (1/3): citation suggestion, i.e. suggesting citations given an abstract. On the Cora dataset, with the number of topics set to 10, RTM improves precision over LDA+Regression by 80% among the first 20 documents retrieved from the model.
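
To make the "precision among the first 20 documents" comparison concrete, here is a minimal sketch of precision at k and of a relative improvement between two methods. The helper names and the numbers in the usage note are hypothetical and are not results from the paper.

```python
def precision_at_k(ranked_doc_ids, true_links, k=20):
    """Fraction of the top-k suggested documents that are actual citations.

    The denominator is k, so a ranking shorter than k is penalized.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_doc_ids[:k] if doc_id in true_links)
    return hits / k


def relative_improvement(new_value, baseline_value):
    """Relative gain of new_value over baseline_value (0.8 means 80%)."""
    return (new_value - baseline_value) / baseline_value
```

For instance, with made-up precisions of 0.05 for a baseline and 0.09 for another method, `relative_improvement(0.09, 0.05)` is about 0.8, which is what an "80% improvement" in precision means here.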

Automatic Link Suggestion (2/3)

Automatic Link Suggestion (3/3)

Modeling Spatial Data (1/4): local news data with 51 documents, one document per state. The number of topics is set to 5. Words are ranked by the following score:

Modeling Spatial Data (2/4): Each color depicts a single topic. Each state’s color intensity indicates the magnitude of that topic’s component. The words associated with each topic are given in the table. Panels: RTM, LDA.

Modeling Spatial Data (3/4): panels for RTM and LDA.

Modeling Spatial Data (4/4): panels for RTM and LDA.

Discussion: The Relational Topic Model (RTM) is a hierarchical model of networks and per-node attribute data. The paper demonstrates, both qualitatively and quantitatively, that RTM is an effective and useful mechanism for analyzing and using network data.