1
DM-Group Meeting Liangzhe Chen, Nov
2
Papers to be presented:
1. Learning Social Network Embeddings for Predicting Information Diffusion (WSDM’14). S. Bourigault, C. Lagnier, S. Lamprier, L. Denoyer, P. Gallinari.
2. Leveraging Social Context for Modeling Topic Evolution (KDD’15). J. Kalyanam, A. Mantrach, D. Saez-Trumper, H. Vahabi, G. Lanckriet.
3. A Decision Tree Framework for Spatiotemporal Sequence Prediction (KDD’15). T. Kim, Y. Yue, S. Taylor, I. Matthews.
3
1st Paper Learning Social Network Embeddings for Predicting Information Diffusion WSDM’14 S. Bourigault, C. Lagnier, S. Lamprier, L. Denoyer, P. Gallinari.
4
Goal Learn a mapping of the observed temporal dynamics (the diffusion process) onto a continuous (Euclidean) space. Nodes participating in diffusion cascades are projected into a latent representation space in such a way that information diffusion can be modeled efficiently using a heat diffusion process.
5
Diffusion Kernel A diffusion kernel K(t, y, x) computes the heat at location x and time t, knowing that the heat source is y. Given the mapping Z from the nodes to their locations in the latent space, the kernel can be written as:
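The formula itself is an image in the original slide; as a reconstruction, the standard heat kernel on R^n, which the model builds on, has the form

$$K(t, y, x) = \frac{1}{(4\pi t)^{n/2}} \exp\!\left(-\frac{\lVert z_x - z_y \rVert^2}{4t}\right),$$

so heat spreads out from the source's embedding z_y and reaches nearby nodes earlier. (The paper's exact normalization may differ.)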
6
Learning the Best Kernel
We want to find the mapping Z in a way that lets the diffusion kernel explain the cascades observed in the training data. The objective is to minimize the following loss function:
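The formula did not survive the transcript; schematically (my notation, not the paper's), the objective is an empirical risk over the training cascades D,

$$\mathcal{L}(Z) = \sum_{d \in D} \Delta\bigl(K_Z, d\bigr),$$

where Δ measures how much the kernel, seeded at cascade d's source, disagrees with d's observed infections. The next slides make Δ concrete as a ranking loss.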
7
Learning Diffusion as a Ranking Problem
There is no full supervision: the true heat values are never observed, only the order in which nodes become infected. This partial supervision constrains the kernel to contaminate nodes in their actual temporal order of infection.
8
Hinge Loss Function These constraints can be handled by a hinge loss function:
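A plausible reconstruction of the constraint and loss (hypothetical notation): for a cascade with source s, if node u is infected at time t_u and node v is infected later (or never), the kernel should satisfy K(t_u, z_s, z_u) > K(t_u, z_s, z_v), giving the pairwise hinge term

$$\ell(u, v) = \max\bigl(0,\; 1 - K(t_u, z_s, z_u) + K(t_u, z_s, z_v)\bigr).$$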
9
Learning Algorithm Stochastic gradient descent
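To make this concrete, here is a minimal sketch of one SGD step under the reconstructed hinge loss above; the variable names and the simplified, unnormalized kernel are my assumptions, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim = 100, 2            # illustrative sizes
lr, margin = 0.05, 1.0
Z = rng.normal(scale=0.1, size=(n_nodes, dim))   # node embeddings to learn

def kernel(zs, zx, t):
    """Simplified heat kernel: heat at zx at time t, source at zs."""
    return np.exp(-np.sum((zx - zs) ** 2) / (4.0 * t))

def sgd_step(source, u, t_u, v):
    """One update: node u (infected at t_u) should be hotter than node v.
    Assumes source, u, v are distinct nodes."""
    zs, zu, zv = Z[source], Z[u], Z[v]
    ku, kv = kernel(zs, zu, t_u), kernel(zs, zv, t_u)
    if margin - ku + kv <= 0:        # ranking constraint already satisfied
        return
    # dK/dz_x = K * (z_s - z_x) / (2t) for the Gaussian kernel above
    Z[u] += lr * ku * (zs - zu) / (2.0 * t_u)    # pull u's heat up
    Z[v] -= lr * kv * (zs - zv) / (2.0 * t_u)    # push v's heat down
    # (the gradient w.r.t. the source embedding is omitted for brevity)
```

Each step samples a cascade, an infected node u, and a later/uninfected node v, and nudges the embeddings so that u ends up hotter than v.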
10
Content-Based Diffusion Kernel
An extension of the previous kernel that takes into account the content of each cascade (different content propagates differently in the network). The content-based kernel is:
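The formula is again an image; purely as an assumption about its shape, one way to condition on content is to shift the effective source position by a learned linear map W applied to the cascade's content vector c:

$$K_c(t, y, x, c) = K\bigl(t,\; z_y + W c,\; z_x\bigr).$$

Check the paper for the exact definition.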
11
Experiment: Datasets
ICWSM: 44 million blog posts collected over 1 year. Each blog is a user, and cascades are composed of sets of posts that link to each other.
Memetracker: traces the flow of short phrases and memes through the web.
Digg: a collaborative news portal where users can ‘digg’ (like) stories. The ‘diggs’ from users form the cascades of a story.
12
Evaluation Measures Calculate a score for each user indicating how likely that user is to be infected by the cascade. Performance is measured by Mean Average Precision (MAP) and Precision-Recall curves.
13
Results: MAP on the synthetic dataset and on the real datasets (result tables on the slide).
14
Results: Precision-Recall Curves
15
Results: the Latent Space
16
2nd Paper Leveraging Social Context for Modeling Topic Evolution
KDD’15 J. Kalyanam, A. Mantrach, D. Saez-Trumper, H. Vahabi, G. Lanckriet.
17
Goal Today's corpora have a social context embedded in them: the community of users interested in a particular post, their profiles, etc. The goal is to harness this social context, which comes along with the textual content, for topic discovery and evolution.
18
Data Input At each timestamp, we have:
A textual content matrix X of size #docs × #textual features (the features of each document).
A social content matrix U of size #docs × #users (which users have mentioned each document).
We want to learn the topics jointly from the sequences {X} and {U}; a toy construction of these matrices is sketched below.
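For concreteness, a minimal sketch of what these inputs look like (hypothetical toy data, not the paper's pipeline):

```python
import numpy as np

docs = ["obama wins election", "new galaxy phone release"]
vocab = ["obama", "election", "galaxy", "phone", "wins", "new", "release"]
users = ["alice", "bob", "carol"]
mentions = {0: ["alice", "bob"], 1: ["carol"]}   # who tweeted each article

# X: #docs x #textual-features (bag-of-words counts here)
X = np.array([[d.split().count(w) for w in vocab] for d in docs], float)

# U: #docs x #users (binary: did the user mention the document?)
U = np.array([[u in mentions[d] for u in users] for d in range(len(docs))], float)

print(X.shape, U.shape)   # (2, 7) (2, 3)
```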
19
Non-negative Matrix Factorization
Decompose both X and U in terms of the underlying latent topics, with a common document-topic factor W:
X (d × f) ≈ W (d × k) · H (k × f)
U (d × u) ≈ W (d × k) · G (k × u)
where d = #docs, f = #textual features, u = #users, and k = #topics.
20
Modeling Topic and Community Evolution
The current topics/communities are modeled as linear combinations of the previous topics/communities.
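Written out (a reconstruction using the evolution matrices M_T and M_C that the talk refers to later; the exact notation is an assumption):

$$H_t \approx M_T\, H_{t-1}, \qquad G_t \approx M_C\, G_{t-1}$$

Here H_t is the topic-term factor and G_t the topic-community factor at timestamp t; each row of an evolution matrix expresses a current topic/community as a mix of the previous ones.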
21
Loss Function The loss combines three parts: a community-level loss, a topic-level loss, and a regularization term.
Multiplicative updates are used to minimize the loss; see the paper for details.
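As a sketch, an objective consistent with the pieces named above (the symbols and weighting are my assumptions; μ reappears later in the talk as the content/community trade-off):

$$\min_{W,H,G \ge 0}\; \mu\,\lVert X - WH \rVert_F^2 + (1-\mu)\,\lVert U - WG \rVert_F^2 + \lambda\bigl(\lVert H - M_T H_{t-1} \rVert_F^2 + \lVert G - M_C G_{t-1} \rVert_F^2\bigr) + \text{reg.}$$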
22
Experiment: Dataset A public dataset released in 2013, consisting of all articles published by 80 international news sources over a period of 14 days. Each article has its textual content, as well as the list of tweets that link to it within 12 hours of its publication. User names and hashtags are extracted from each tweet; the hashtags are used as ground-truth topics. After filtering, the final dataset contains 33,387 articles and 384 hashtags.
23
Hashtag/Topic Stability
Naturally, there are three types of hashtags:
Content-stable hashtags: the content of these hashtags does not change much over time.
Community-stable hashtags: the community interested in these hashtags does not change much over time.
Mixed-stable hashtags: both the content and the community of these hashtags are stable.
24
Measure the Quality of the Topics
Take the top 10 words of each topic and compare them to the ground truth using Normalized Discounted Cumulative Gain (NDCG) and Mean Average Precision (MAP).
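For reference, a minimal NDCG@k implementation over a ranked word list (a generic sketch, not the paper's evaluation code):

```python
import numpy as np

def ndcg_at_k(relevances, k=10):
    """NDCG@k for a ranked list of relevance grades (1 = relevant here)."""
    rel = np.asarray(relevances, dtype=float)[:k]
    if rel.sum() == 0:
        return 0.0
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))   # 1/log2(rank+1)
    dcg = float(np.sum(rel * discounts))
    idcg = float(np.sum(np.sort(rel)[::-1] * discounts))    # ideal ordering
    return dcg / idcg

# e.g. top-10 topic words, scored 1 if they appear in the hashtag's ground truth
print(ndcg_at_k([1, 0, 1, 1, 0, 0, 0, 1, 0, 0]))
```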
25
Learning the Stability of Topics
Using μ = 0.5, the evolution matrices M_C and M_T are used to measure the topics' stability with respect to communities and content. If an evolution matrix is close to I (or a permutation of it), there is little evolution over time.
26
Learning the Stability of Topics (results figures on the slide)
27
3rd Paper A Decision Tree Framework for Spatiotemporal Sequence Prediction KDD’15 T. Kim, Y. Yue, S. Taylor, I. Matthews.
28
Problem Let x = {x_1, x_2, …, x_{|x|}} denote an input sequence and y = {y_1, y_2, …, y_{|y|}} a spatiotemporal output sequence. The task is to learn a function h(x) = y that minimizes the loss on a training dataset.
29
Method Decompose the input sequence into overlapping windows.
For each window, predict a fixed-length output sequence using a decision-tree-based model h. Construct the final output by blending together the outputs from the windows, as sketched below.
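A minimal sketch of the window-and-blend pipeline (the window sizes and the averaging blend are my assumptions; the paper's decision-tree model h is stubbed out here):

```python
import numpy as np

def predict_sequence(x, h, win=5, step=1, out_len=5):
    """Slide overlapping windows over x, predict with h, blend by averaging."""
    T = len(x)
    y_sum = np.zeros(T)
    y_cnt = np.zeros(T)
    for start in range(0, T - win + 1, step):
        window = x[start:start + win]
        y_hat = h(window)                    # fixed-length output (out_len)
        end = min(start + out_len, T)
        y_sum[start:end] += y_hat[:end - start]
        y_cnt[start:end] += 1
    y_cnt[y_cnt == 0] = 1                    # avoid division by zero at the tail
    return y_sum / y_cnt                     # overlap-averaged final output

# toy stand-in for the tree-based regressor: echo the window's mean
x = np.sin(np.linspace(0, 3, 50))
y = predict_sequence(x, lambda w: np.full(5, w.mean()))
```

Averaging the overlapping predictions smooths seams between windows, which is the point of the blending step.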