Published by Molly Haye. Modified over 9 years ago.
1
Finding Your Friends and Following Them to Where You Are
by Adam Sadilek, Henry Kautz, Jeffrey P. Bigham
Presented by Guang Ling
2
Why should you care about this work?
– It won the best paper award at WSDM 2012
– It was covered by well-known news media
  – New Scientist
  – Gizmodo
– It claims to infer your fine-grained location even when you keep your data private (dangerous!)
3
What is it about?
– Leverages geo-tagged Twitter data
– Two predictive tasks
  – Social ties (link prediction)
  – Location prediction
4
The system: FLAP
Three main components of FLAP (Friendship + Location Analysis and Prediction)
– Data crawler
– Data visualizer
– Machine learning module
  – Social ties prediction
  – Location prediction
5
Data crawler
Uses the Twitter Search API to collect data
– To avoid Twitter's query rate limitations:
  – Distribute the work across a number of machines with different IP addresses
  – Issue asynchronous queries
  – Merge the results to form the dataset
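The "asynchronous queries, then merge" step can be pictured with a minimal sketch. It does not call the Twitter Search API: `fetch_region`, the region names, and the canned tweets are hypothetical stand-ins, and deduplicating by tweet id is an assumption about how overlapping results might be merged.

```python
import asyncio

# Canned results standing in for real search responses (hypothetical data).
CANNED = {
    "nyc": [{"id": 1, "text": "coffee"}, {"id": 2, "text": "subway"}],
    "la": [{"id": 2, "text": "subway"}, {"id": 3, "text": "beach"}],
}

async def fetch_region(region):
    """Hypothetical stand-in for one search query; no network access."""
    await asyncio.sleep(0)  # yield control, as a real request would
    return CANNED.get(region, [])

async def crawl(regions):
    """Run all queries concurrently and merge results, one copy per tweet id."""
    batches = await asyncio.gather(*(fetch_region(r) for r in regions))
    merged = {}
    for batch in batches:
        for tweet in batch:
            merged.setdefault(tweet["id"], tweet)
    return list(merged.values())

tweets = asyncio.run(crawl(["nyc", "la"]))
print(sorted(t["id"] for t in tweets))
```

In a real crawler each worker would run on its own machine; here the concurrency is only within one process.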
6
The Data
– New York City & Los Angeles
– 26M tweets crawled in a month
  – 1.2M unique users
  – 7.6M geo-tagged tweets
7
The Data
– 11K geo-active users
– 4M tweets by geo-active users
  – 123K "follows" relationships
  – 52K "friend" relationships
– Geo-active user? A user who posted more than 100 GPS-tagged tweets during one month
– Friend relationship? Two users who follow each other (a reciprocal follow relationship) are friends
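The two definitions on this slide translate directly into set operations. The sketch below is illustrative only: the function names and the toy data are assumptions, while the threshold (more than 100 GPS-tagged tweets) and the reciprocity rule come from the slide.

```python
from collections import Counter

GEO_ACTIVE_MIN_TWEETS = 100  # slide: "more than 100 GPS-tagged tweets" in one month

def geo_active_users(geo_tweet_authors):
    """Return the users who authored more than 100 GPS-tagged tweets."""
    counts = Counter(geo_tweet_authors)
    return {user for user, n in counts.items() if n > GEO_ACTIVE_MIN_TWEETS}

def friend_pairs(follows):
    """Return unordered pairs (a, b) where a follows b and b follows a."""
    follow_set = set(follows)
    return {
        tuple(sorted((a, b)))
        for a, b in follow_set
        if a != b and (b, a) in follow_set
    }

# Toy data: one author name per geo-tagged tweet, and directed follow edges.
authors = ["alice"] * 101 + ["bob"] * 100
follows = [("alice", "bob"), ("bob", "alice"), ("alice", "carol")]

print(geo_active_users(authors))
print(friend_pairs(follows))
```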
8
Data Visualizer
Take a look yourself!
9
Machine learning module
The meat of the paper
– Feature-based link prediction
  – Features
  – Tools
– Location prediction
  – Treat users with known GPS positions as noisy sensors of the location of their friends
10
Feature-based link prediction
– No single property (feature) of a pair of individuals is a good indicator on its own
– Combine multiple disparate features
  – What features?
11
Feature-based link prediction
Features
– Text similarity
– Co-location score
– Graph structure (meet/min coefficient)
Diagram: text similarity and co-location score are combined by a regression decision tree into one feature
12
Feature-based link prediction
Features
– Text similarity
– Co-location score
– Graph structure
What about other features?
– Tried and discarded
  – Jaccard coefficient
  – Preferential attachment
  – Hypergeometric coefficient
– Discarded to keep the model efficient and scalable
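For readers unfamiliar with the graph-structure scores named on these slides, here is a small sketch using their usual textbook definitions over neighbor sets. The function names and the toy graph are assumptions, and the paper's exact variants may differ in detail.

```python
def meet_min(neighbors, u, v):
    """Shared neighbors divided by the smaller of the two neighborhood sizes."""
    nu, nv = neighbors.get(u, set()), neighbors.get(v, set())
    smaller = min(len(nu), len(nv))
    return len(nu & nv) / smaller if smaller else 0.0

def jaccard(neighbors, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    nu, nv = neighbors.get(u, set()), neighbors.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0

def preferential_attachment(neighbors, u, v):
    """Product of the two neighborhood sizes."""
    return len(neighbors.get(u, set())) * len(neighbors.get(v, set()))

# Toy undirected graph given as neighbor sets (hypothetical users).
neighbors = {"a": {"c", "d", "e"}, "b": {"c", "d"}}

print(meet_min(neighbors, "a", "b"))
print(jaccard(neighbors, "a", "b"))
print(preferential_attachment(neighbors, "a", "b"))
```

All three need only the two neighbor sets, which is why they are cheap to compute per candidate pair.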
13
Feature-based link prediction
Features
– Text similarity
– Co-location score
– Graph structure
Diagram: text similarity and co-location score are combined by a regression decision tree into one feature
14
Feature-based link prediction
15
Location prediction
– Idea: treat u's friends as noisy sensors of u's location
– Task: infer the most likely location of person u at any time
– Input
  – Sequence of locations visited by u's friends
  – Locations of u over the training period (supervised learning only)
16
Location prediction
Procedure to extract important locations
– For each user
  – Extract the set of distinct locations from which he/she tweets
  – Merge (cluster) all locations within 100 meters of each other, to account for GPS sensor noise
  – Remove locations with fewer than 5 visits
– Merge all the extracted locations
  – 89,077 unique locations
  – 25,830 significant locations
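The per-user steps above can be sketched as follows. This is a simplified greedy pass, not the paper's clustering method: each point joins the first cluster whose initial point lies within 100 meters, and clusters with fewer than 5 visits are dropped. The function names, the haversine distance, and the sample coordinates are assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0

def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def significant_locations(points, radius_m=100.0, min_visits=5):
    """Greedily merge nearby points, then keep clusters with enough visits."""
    clusters = []  # each entry: [first point of the cluster, visit count]
    for p in points:
        for cluster in clusters:
            if haversine_m(cluster[0], p) <= radius_m:
                cluster[1] += 1
                break
        else:
            clusters.append([p, 1])
    return [(center, n) for center, n in clusters if n >= min_visits]

# Five tweets within roughly 45 m of each other, plus two from a distant spot.
visits = [(40.7580 + i * 0.0001, -73.9855) for i in range(5)] + [(34.05, -118.24)] * 2
result = significant_locations(visits)
print(result)
```

Because the cluster center is fixed at its first point, results can depend on input order; a production version would likely use a proper clustering method.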
17
Location prediction
Time?
– Locations are modeled in 20-minute increments
– The domain of the time-of-day random variable is 0, 1, …, 71 (24 hours × 3 increments per hour = 72 values)
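As a quick check of the arithmetic, the mapping from a timestamp to its 20-minute slot can be written as below. The function name and the sample dates are illustrative assumptions.

```python
from datetime import datetime

SLICE_MINUTES = 20
SLICES_PER_DAY = 24 * 60 // SLICE_MINUTES  # 1440 minutes / 20 = 72 slots

def time_slice(ts):
    """Map a timestamp to its 20-minute slot of the day, in 0..71."""
    return (ts.hour * 60 + ts.minute) // SLICE_MINUTES

print(SLICES_PER_DAY)
print(time_slice(datetime(2012, 2, 8, 0, 0)))
print(time_slice(datetime(2012, 2, 8, 13, 45)))
print(time_slice(datetime(2012, 2, 8, 23, 59)))
```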
18
Location prediction
Two settings
– Supervised learning: the location of u over the training period is given
– Unsupervised learning: only the locations of u's friends during the training period are given
19
Location prediction
Solution: dynamic Bayesian network
20
Location prediction
Learning in the supervised setting
– Optimization objective function (equation from the slide not preserved; its terms were labeled "observed values" and "hidden values")
21
Location prediction
Learning in the unsupervised setting
– Optimization objective function
– EM algorithm
– Intractable for sizable domains
– Optimize a lower bound
22
Location prediction
Inference
– Given a learned model, what is the most likely sequence of locations visited by a user?
– Viterbi decoding algorithm
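To make the decoding step concrete, here is a sketch of Viterbi on a plain hidden Markov model. It is a simplification: the paper's model is a dynamic Bayesian network with several observed variables, whereas this toy has two hidden states and one observation per step, and every probability below is made up for illustration.

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely hidden-state sequence under a simple HMM."""
    # best[t][s] is the log-probability of the best path ending in state s at step t.
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []  # back[t][s] is the predecessor of s on that best path
    for obs in observations[1:]:
        prev = best[-1]
        scores, pointers = {}, {}
        for s in states:
            r = max(states, key=lambda p: prev[p] + log_trans[p][s])
            scores[s] = prev[r] + log_trans[r][s] + log_emit[s][obs]
            pointers[s] = r
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

def logs(table):
    """Convert a nested table of probabilities to log-probabilities."""
    return {k: logs(v) if isinstance(v, dict) else math.log(v) for k, v in table.items()}

states = ["home", "work"]
start = logs({"home": 0.6, "work": 0.4})
trans = logs({"home": {"home": 0.7, "work": 0.3}, "work": {"home": 0.4, "work": 0.6}})
# Observation: where a friend was seen during the same time slot.
emit = logs({
    "home": {"friend_home": 0.8, "friend_work": 0.2},
    "work": {"friend_home": 0.3, "friend_work": 0.7},
})

path = viterbi(["friend_home", "friend_work", "friend_work"], states, start, trans, emit)
print(path)  # ['home', 'work', 'work']
```

Working in log space avoids numerical underflow on long sequences such as 72 slots per day over several weeks.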
23
Experiments and evaluation
Friendship prediction experiments
– AUC used as the evaluation metric
– Fraction of observed edges varied from 0% to 50%
– Two-fold cross-validation
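AUC can be read as the probability that a randomly chosen true friend pair receives a higher score than a randomly chosen non-friend pair. The sketch below computes it by direct pairwise comparison, which is quadratic and meant only to show the definition; the scores and labels are made up.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Predicted friendship scores for four candidate pairs; label 1 marks a true friendship.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(auc(scores, labels))  # 0.75
```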
24
Experiments and evaluation
Friendship prediction experiments
25
Experiments and evaluation
Location prediction experiments
– Cross-validation over all users
– Train on the first 3 weeks of data, test on the 4th week
26
Experiments and evaluation
27
Conclusion
A lot of information can be inferred
– User friendships
  – Even when no ties are given
– A user's fine-grained location
  – Even for a user who has never revealed his or her location
– Ethical questions are implied
  – Would you trade your privacy with an automated system?
28
A thought on the paper
This paper…
– Solves problems using existing tools
  – Regression decision tree
  – Belief propagation
  – Dynamic Bayesian networks
Why is this a best paper?
– Demos a working system: FLAP
– Creative use of existing tools
– Very impressive experimental results
– Well-written paper
29
Q&A
Any questions?