Location-Based Topic Evolution
Haiqin Yang, Shouyuan Chen, Michael R. Lyu, Irwin King
The Chinese University of Hong Kong



Outline
- Motivation
- Location-Based Topic Evolution Model
- Experiments
- Conclusion

Location Information Is Attainable
- IP, GPS, 3G, Wi-Fi, NFC
- New mobile technologies

Geo-information
- Twitter
  - Typhoon trajectory estimation
  - Earthquake location [Sakaki et al., WWW’10]
- Flickr
  - Geo-tagged photos [Crandall et al., WWW’09]
  - Geofolk [Sizov, WSDM’10]

New Applications: Timeliness
- Identify users’ interests in a region

New Applications: Commercial Value
- Determine an appropriate marketing strategy

Solution: Topic Learning
- Topics: distributions over words
- Location-associated documents
  - Geo-information with messages, posts, tags
  - Helps to learn the topics more accurately

Current Problems
- Existing methods do not consider the appearance and disappearance of topics
- They do not model topic evolution
- The number of topics has to be determined in advance
- Methods with these limitations:
  - Location-aware Topic Model [Wang et al., GIR’07]
  - Geofolk [Sizov, WSDM’10]
  - Geographical topic discovery [Yin et al., WWW’11]

Our Contributions
- Propose a location-based topic evolution (LBTE) model
  - Models topic changes of users’ interests in a region
  - Allows for the appearance and disappearance of topics
  - Automatically determines the number of topics
- Efficient inference

Problem Setup
- Vocabulary:
- Data:
- Objective: model the topics of the data when both the number of topics and their parameters are unknown.

Assumptions
- Documents are generated from unknown topics
- Each topic comes from a hidden function and is determined by the function value
- Functions are drawn from a probability measure

Evolution with Regions
- Domains of functions include regions
- Values of functions represent topics

Evolution with Regions and Time
- The beginning (end) of a function's domain corresponds to the appearance (disappearance) of a topic

Generative Process

Inference: Gibbs Sampler
1. Sample auxiliary variables: determine whether the domain of each function contains the region (Bernoulli)
2. Sample assignment: calculate the probability of assigning a document to an existing function and that of assigning it to a new function
3. Draw topic parameters
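
Step 2 can be illustrated with a generic sketch. The slide formulas did not survive in this transcript, so the code below is not the LBTE sampler: it swaps in the standard Chinese-restaurant-process weights of a Dirichlet process mixture with a Dirichlet-multinomial word likelihood, and it skips the auxiliary domain variables of step 1. The names `alpha`, `beta`, `assignment_probabilities`, and the toy topics are all hypothetical.

```python
import math
import random
from collections import Counter


def log_predictive(doc, topic_counts, topic_total, vocab_size, beta):
    """Log probability of a document's words under a Dirichlet-multinomial topic."""
    log_p = 0.0
    seen = Counter()
    for i, word in enumerate(doc):
        numer = topic_counts.get(word, 0) + seen[word] + beta
        denom = topic_total + i + vocab_size * beta
        log_p += math.log(numer / denom)
        seen[word] += 1
    return log_p


def assignment_probabilities(doc, topics, vocab_size, alpha=1.0, beta=0.5):
    """Return assignment probabilities; the last entry is the 'new topic' option.

    topics is a list of (num_docs, word_counts) pairs (an assumed layout).
    """
    log_w = []
    for num_docs, counts in topics:
        total = sum(counts.values())
        log_w.append(math.log(num_docs)
                     + log_predictive(doc, counts, total, vocab_size, beta))
    # A new topic has no counts yet, so only the prior smoothing applies.
    log_w.append(math.log(alpha) + log_predictive(doc, {}, 0, vocab_size, beta))
    top = max(log_w)  # subtract the max before exponentiating for stability
    weights = [math.exp(v - top) for v in log_w]
    norm = sum(weights)
    return [w / norm for w in weights]


def sample_assignment(probs, rng):
    """Draw one index according to the assignment probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical toy state: two existing topics and one incoming document.
topics = [(3, Counter({"sunset": 4, "canyon": 3})),
          (2, Counter({"geyser": 5, "bison": 2}))]
doc = ["canyon", "sunset", "scenic"]
probs = assignment_probabilities(doc, topics, vocab_size=6)
choice = sample_assignment(probs, random.Random(0))
print(probs, choice)
```

In a full sampler this step would be repeated for every document, with the document's own counts removed from its current topic before the weights are computed.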

Experiments
- Datasets
  - Synthetic data
  - Flickr data
- Comparison methods
  - Dirichlet Process Mixture (DPM)
  - Location-Based Topic Evolution (LBTE)

Synthetic Data
- Topic generation
  - Topic initialization: two topics
    - Center:
    - Parameter:
  - Topic evolution
    - Die-off rate of 40%
    - The number of new topics follows a Poisson distribution with parameter 0.8
- Location-associated document generation
  - 10 documents for each topic
  - The location of each document follows a uniform distribution centered at the topic center with radius 5
  - Values of topics follow
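
The evolution and location rules above can be sketched as a small simulation. This is only an illustration under stated assumptions: the initial centers, the map extent used to place new topics, and the reading of "uniform with radius 5" as uniform over a disk are hypothetical, and word generation is left out because the topic parameters are missing from the transcript.

```python
import math
import random


def sample_poisson(rate, rng):
    """Knuth's method for a Poisson draw; adequate for small rates such as 0.8."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_in_disk(center, radius, rng):
    """Uniform point in a disk; the square root keeps the area density flat."""
    r = radius * math.sqrt(rng.random())
    angle = 2.0 * math.pi * rng.random()
    return (center[0] + r * math.cos(angle), center[1] + r * math.sin(angle))


def evolve_topics(topics, next_id, rng, die_off=0.4, birth_rate=0.8, extent=50.0):
    """One time step: each topic dies with probability die_off, then new ones are born."""
    survivors = {tid: c for tid, c in topics.items() if rng.random() >= die_off}
    for _ in range(sample_poisson(birth_rate, rng)):
        # Placement of new topic centers is an assumption, not from the slides.
        survivors[next_id] = (rng.uniform(0, extent), rng.uniform(0, extent))
        next_id += 1
    return survivors, next_id


def generate_documents(topics, rng, docs_per_topic=10, radius=5.0):
    """Return (topic_id, location) pairs, docs_per_topic for every live topic."""
    return [(tid, sample_in_disk(center, radius, rng))
            for tid, center in topics.items()
            for _ in range(docs_per_topic)]


rng = random.Random(0)
topics = {0: (10.0, 10.0), 1: (30.0, 30.0)}  # hypothetical initial centers
next_id = 2
for step in range(5):
    docs = generate_documents(topics, rng)
    print(step, sorted(topics), len(docs))
    topics, next_id = evolve_topics(topics, next_id, rng)
```

Each printed row shows the time step, the topic ids alive at that step, and the number of documents generated for them.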

Results of Synthetic Data
- LBTE outperforms the DPM at all time stamps
- LBTE recovers the true topics and achieves zero variation of information
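
Variation of information compares two clusterings of the same items and equals zero exactly when they agree up to relabeling. A minimal sketch of the standard definition, VI = H(A) + H(B) - 2 I(A; B), using natural logarithms follows. The function name and toy labels are hypothetical, and the slides do not say which logarithm base was used.

```python
import math
from collections import Counter


def variation_of_information(labels_a, labels_b):
    """Variation of information between two clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    n = len(labels_a)
    size_a = Counter(labels_a)
    size_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    vi = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        # Equivalent to H(A) + H(B) - 2 I(A; B), summed over non-empty cells.
        vi -= p_ab * (math.log(p_ab / (size_a[a] / n))
                      + math.log(p_ab / (size_b[b] / n)))
    return vi


truth = [0, 0, 1, 1]
relabeled = [1, 1, 0, 0]   # same partition with swapped names
merged = [0, 0, 0, 0]      # everything in a single cluster
print(variation_of_information(truth, relabeled))
print(variation_of_information(truth, merged))
```

The relabeled clustering scores zero because only the partition matters, while merging the two clusters costs the entropy lost, here log 2.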

Flickr Data
- Geo-tagged photos crawled from 2009/01/01 to 2010/01/01
- Only within USA territory
- An example:
    {
      "date": " :34:04",
      "lat": " ",
      "lon": " ",
      "id": " ",
      "tags": [ "grandcanyon", "nationalpark", "sunset", "limestone", "scenic" ]
    }

Results of National Park
- Topics learned from DPM are scattered

Results of National Park
- LBTE utilizes location information and discovers topics based on the regions
- Regions labeled in the figure: Yellowstone, Grand Canyon, Big Bend, Joshua Tree

Results of National Park

Conclusion
- Advantages of the Location-Based Topic Evolution model
  - Automatically models the total number of topics
  - Automatically models the appearance and disappearance of topics
  - Succinct inference via Gibbs sampling

Thank you!

Sample Auxiliary Variables

Sample Assignment