Published by Victoria McDowell. Modified over 8 years ago.
1
TribeFlow: Mining & Predicting User Trajectories
Flavio Figueiredo, Bruno Ribeiro, Jussara M. Almeida, Christos Faloutsos
2
Example: Online Check-Ins
3
What Happens x What We See
[Figure: timeline of one user's check-ins at locations L1, L2, L1, L3, observed at times T1, T2, T3]
4
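Different Behaviors, Different Trajectories
[Figure: example trajectories over locations (1 2 3; 1 4 1; 5 6 1), each annotated with the duration spent in a location (e.g., 0.5h, 1h, 3h, 5h, 6h)]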
5
General Motivation
Predict where an agent will go next
6
How Do We Navigate?
7
Navigational Constraints
Geographical Constraints
–Check-in at the Eiffel Tower followed by the Statue of Liberty: probably not
–Starbucks at JFK followed by McDonald's at Beijing Airport: possible
Application Constraints (e.g., links in the application)
8
Simple Navigation Process (e.g., PageRank)
[Figure: directed graph over nodes 1–6]
–Visible links
–Random walks with random jumps to random nodes
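The navigation process on this slide can be sketched as a short simulation. This is a minimal illustration, not code from the talk: the link graph `LINKS`, the jump probability of 0.15, and the function name `random_walk` are all assumptions made for the example.

```python
import random


def random_walk(links, start, steps, jump_prob=0.15, seed=0):
    """Follow visible links, jumping to a uniformly random node now and then."""
    rng = random.Random(seed)
    nodes = sorted(links)
    path = [start]
    current = start
    for _ in range(steps):
        outgoing = links[current]
        # Jump when there is no outgoing link, or with probability jump_prob.
        if not outgoing or rng.random() < jump_prob:
            current = rng.choice(nodes)
        else:
            current = rng.choice(outgoing)
        path.append(current)
    return path


# Hypothetical link structure over six nodes (not the graph from the slide).
LINKS = {1: [2, 3], 2: [1, 4], 3: [5], 4: [5, 6], 5: [2], 6: [1]}
walk = random_walk(LINKS, start=1, steps=20)
print(walk)
```

When the links are visible, a walk like this is fully specified by the graph and the jump probability.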
9
Random Walk Transition Matrix
[Figure: directed graph over nodes 1–6 and its 6×6 transition matrix, P[Next | Previous]]
What if we don't see the links? Trajectories are all we see.
And no self-loops: modeling self-loops is a separate problem. See Figueiredo et al., PKDD 2014; Benson et al., WWW 2016; …
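When only trajectories are visible, the simplest baseline is to count consecutive pairs and normalize each row, skipping self-loops. The sketch below shows that single-matrix baseline only; it is not TribeFlow's latent model, and the toy trajectories and the name `estimate_transitions` are assumptions for illustration.

```python
from collections import Counter, defaultdict


def estimate_transitions(trajectories):
    """Estimate P[Next | Previous] from observed trajectories, ignoring self-loops."""
    counts = defaultdict(Counter)
    for trajectory in trajectories:
        for prev, nxt in zip(trajectory, trajectory[1:]):
            if prev != nxt:  # self-loops are treated as a separate problem
                counts[prev][nxt] += 1
    probs = {}
    for prev, row in counts.items():
        total = sum(row.values())
        probs[prev] = {nxt: count / total for nxt, count in row.items()}
    return probs


# Toy trajectories (hypothetical data); the last one contains a self-loop 1 -> 1.
TRAJECTORIES = [[1, 2, 3], [1, 4, 1], [5, 6, 1], [1, 1, 2]]
P = estimate_transitions(TRAJECTORIES)
print(P[1])
```

With millions of items most (Previous, Next) pairs are never observed, so a single count-based matrix like this is very sparse.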
10
Latent Random Walk Transition Matrices
Latent Transition Matrices
–Navigation constraints
–Navigation preferences based on user types
We must learn:
–Transition matrices from trajectory data from multiple users
–User preferences over matrices
[Figure: three example 6×6 latent transition matrices over nodes 1–6]
11
TribeFlow's Environments
[Figure: four example 6×6 transition matrices over nodes 1–6, one per environment, and a user preference vector over environments]
–Transition & time: P[Node = 3 | Node = 5, Env = Green] × P[inter-event time > T | Env = Green] (time between transitions)
–User preference: P[Env = Green | User = Jane]
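One way to read the quantities on this slide is as factors of a mixture over environments: weight each environment's transition-and-time term by the user's preference for that environment and sum. The sketch below assumes that reading. The exponential form of the inter-event time survival function, the rates, and all probabilities are hypothetical values chosen for the example, since the slide does not specify them.

```python
import math


def step_likelihood(user_pref, transitions, rates, prev, nxt, gap):
    """Sum over environments of P[env | user] * P[nxt | prev, env] * P[time > gap | env]."""
    total = 0.0
    for env, pref in user_pref.items():
        p_move = transitions[env].get(prev, {}).get(nxt, 0.0)
        # Assumption: exponential inter-event times, so P[time > gap] = exp(-rate * gap).
        p_wait = math.exp(-rates[env] * gap)
        total += pref * p_move * p_wait
    return total


# Hypothetical numbers for a user "Jane" and three environments.
jane = {"green": 0.45, "red": 0.35, "blue": 0.20}
transitions = {
    "green": {5: {3: 0.45, 4: 0.55}},
    "red": {5: {3: 0.025, 6: 0.975}},
    "blue": {5: {1: 1.0}},
}
rates = {"green": 0.5, "red": 0.01, "blue": 0.1}  # events per hour (assumed)

score = step_likelihood(jane, transitions, rates, prev=5, nxt=3, gap=1.0)
print(score)
```

An environment contributes little to the sum when either the transition or the waiting time is unlikely under it, or when the user rarely uses it.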
12
Problem Definition
Input:
–Large dataset of user trajectories from thousands of users
Output:
–Set of latent transition matrices
–Set of user preferences over these matrices
That is:
–Interpretable and accurate
[Figure: example trajectories (1 2 3; 1 4 1; 5 6 1) and an example prediction: Alice will listen to '4' next!]
13
Defining the Model
14
How Many Transition Matrices?
We use a Bayesian nonparametric model
–Learned from the data
–Dirichlet process prior over user preference vectors (rows)
[Figure: matrix of user preference vectors, one row per user]
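A Dirichlet process prior lets the number of components grow with the data instead of being fixed in advance. The Chinese restaurant process is one standard way to sample from such a prior; the sketch below illustrates only that prior, not TribeFlow's inference procedure, and the concentration value `alpha` is an arbitrary choice for the example.

```python
import random


def chinese_restaurant_process(n_items, alpha, seed=0):
    """Assign items to clusters: an existing cluster is chosen in proportion to
    its size, a new cluster in proportion to alpha."""
    rng = random.Random(seed)
    counts = []       # counts[k] = number of items in cluster k
    assignments = []  # assignments[i] = cluster of item i
    for i in range(n_items):
        r = rng.random() * (i + alpha)
        acc = 0.0
        for k, count in enumerate(counts):
            acc += count
            if r < acc:
                counts[k] += 1
                assignments.append(k)
                break
        else:
            # No existing cluster was selected: open a new one.
            counts.append(1)
            assignments.append(len(counts) - 1)
    return assignments, counts


assignments, counts = chinese_restaurant_process(n_items=1000, alpha=2.0)
print(len(counts), "clusters for 1000 items")
```

Larger values of `alpha` tend to produce more clusters for the same number of items.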
15
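Decomposing User Trajectories
[Figure: a user trajectory split into segments, each assigned to an environment, with inter-event times of 1 to 5 min inside a segment. Annotations: a transition that is unlikely in the Green environment (jump), 2 min; a transition that is unlikely in the Red environment (jump), 24 hours]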
16
Dealing With Sparseness
Sparseness Issue
–Millions of items (e.g., web pages / artists)
–~10^12 possible (Source, Destination) bigrams for Last.FM artists
–Only ~10^8 (Source, Destination) pairs in the largest dataset [LastFM-Groups]
Thus, random walk transition probabilities are defined over vertices rather than edges (without self-loops)
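One parameterization consistent with "vertices rather than edges" keeps a single weight per item in each environment and renormalizes over every item except the source, which rules out self-loops. The slide gives no formula, so the form below is an assumption made for illustration, as are the item names and weights.

```python
def transition_prob(weights, src, dst):
    """P[dst | src] from per-vertex weights that sum to 1, excluding self-loops."""
    if src == dst:
        return 0.0
    # Renormalize over all vertices except the source.
    return weights[dst] / (1.0 - weights[src])


# Hypothetical per-vertex weights for one environment (they sum to 1).
weights = {"a": 0.5, "b": 0.3, "c": 0.2}
row = {dst: transition_prob(weights, "a", dst) for dst in weights}
print(row)
```

With N items this needs N parameters per environment instead of N × N, which is what makes the roughly 10^12 possible pairs tractable.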
17
Learning the Model
18
TribeFlow's Generative Model
Generative model variables:
–Environments: transition matrix and inter-event time distribution
–Latent user preferences over environments
Simplify inference by breaking up each sequence into renewal times of length B
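Breaking a sequence into pieces of length B can be sketched as follows. This is one possible reading of the slide, not the authors' stated procedure: each segment holds at most B transitions, and consecutive segments share a boundary item so that every transition appears exactly once. The function name is hypothetical.

```python
def split_into_segments(trajectory, b):
    """Split a trajectory into consecutive segments of at most b transitions each."""
    if b < 1:
        raise ValueError("b must be at least 1")
    segments = []
    # Each segment starts where the previous one ended.
    for start in range(0, len(trajectory) - 1, b):
        segments.append(trajectory[start:start + b + 1])
    return segments


segments = split_into_segments([1, 2, 3, 4, 5, 6], b=2)
print(segments)  # [[1, 2, 3], [3, 4, 5], [5, 6]]
```

Each segment could then be assigned to a single environment and handled independently of the others.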
19
TribeFlow Inference
Gibbs sampling
–Item transitions
–User preferences
–Inter-event times
Merge and split moves
–To infer the Dirichlet process
Fully distributed
–Based on AsyncLDA: Asuncion, 2009, NIPS
20
Results
21
TribeFlow at Work
[Figure: years of data split along the time axis into 70% train + validation and 30% test]
22
Prior Work for Comparison
Latent Markov Embedding (LME), Chen et al., KDD 2012
Multi-core Latent Markov Embedding, Chen et al., KDD 2013
Personalized Ranking LME (PRLME), Feng et al., IJCAI 2015
Factorizing Personalized Markov Chains (FPMC), Rendle et al., WWW 2010
Progression Stages (Stages), Yang et al., WWW 2014
Gravity Model (Gravity)
–Commonly used by various authors to measure flow between locations: Silva et al., 2006; García-Gavilanes et al., CSCW 2014; Smith et al., CSCW 2013
Common issues (except Stages & Gravity):
–Don't scale
–Not personalized
–No probabilistic interpretation
23
Predicting Where Users Go Next
[Figure: TribeFlow compared with the baselines: more accurate and faster. Experiments run on one node with 20 cores]
24
Using Data Subsamples
[Figure: results on data subsamples: TribeFlow is always faster and more accurate. Experiments run on one node with 20 cores]
25
Further Comparisons
TribeFlow vs. MultiLME: scalability (Chen et al., 2013)
–TribeFlow is 413x faster
–At least 12% higher predictive likelihood
TribeFlow vs. Stages (Yang et al., 2014)
–TribeFlow has at least 40% higher MRR (Mean Reciprocal Rank)
–On the datasets where we were able to run Stages
TribeFlow vs. Gravity Model (Silva et al., 2006)
–Task: predict the number of users that go from A to B
–Gravity models are fast (Poisson regression)
–However, TribeFlow was 800% more accurate in MAE
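For reference, the two metrics named on this slide can be computed as below. These are the standard definitions of mean reciprocal rank and mean absolute error; the example rankings and flow counts are made-up values, not results from the talk.

```python
def mean_reciprocal_rank(ranked_predictions, truths):
    """Average of 1 / rank of the true item; 0 when the item is not in the ranking."""
    total = 0.0
    for ranking, truth in zip(ranked_predictions, truths):
        if truth in ranking:
            total += 1.0 / (ranking.index(truth) + 1)
    return total / len(truths)


def mean_absolute_error(predicted, observed):
    """Average absolute difference between predicted and observed values."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


# Made-up example: three next-item rankings and their true next items.
rankings = [[4, 2, 7], [1, 3, 5], [9, 8, 6]]
truths = [4, 3, 0]
mrr = mean_reciprocal_rank(rankings, truths)

# Made-up example: predicted vs. observed numbers of users going from A to B.
mae = mean_absolute_error([10, 20, 30], [12, 18, 33])
print(mrr, mae)
```

Higher MRR is better, while lower MAE is better.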
26
Sense Making: Music Streaming Data
Data crawled using Pop artists as seeds.
–Natural bias towards Pop music
27
Sense Making: FourSquare Data
Why these environments? These are airports.
TribeFlow is learning the constraints
–No use of GPS information
28
Users Are Not Synchronized
"Everybody" eventually goes through a "Beatles" phase
29
TribeFlow x Temporal Tensors
If users are not synchronized:
[Figure: decomposition of a users × products × time tensor X (I × J × K) into a core tensor G (r × s × t) and factor matrices A, B, C]
30
Conclusions
TribeFlow
–Predicts & mines user trajectories
–Fast & scalable
–Accurate & interpretable
Thank You!
http://flaviovdf.github.io/tribeflow