TribeFlow: Mining & Predicting User Trajectories
Flavio Figueiredo, Bruno Ribeiro, Jussara M. Almeida, Christos Faloutsos
Example: Online Check-Ins
What Happens vs. What We See
[Figure: a user's observed check-in sequence over locations L1, L2, L1, L3 at times T1, T2, T3]
Different Behaviors, Different Trajectories
[Figure: two trajectories over locations 1, 2, 3 with different durations in each location, from 0.5h to 6h]
General Motivation
Predict where an agent will go next.
How Do We Navigate?
Navigational Constraints
Geographical constraints:
–Check-in at the Eiffel Tower followed by the Statue of Liberty? Probably not.
–Starbucks at JFK followed by McDonald's at Beijing Airport? Possible.
Application constraints (e.g., links on the application).
Simple Navigation Process (e.g., PageRank)
–Visible links
–Random walks with random jumps to random nodes
Random Walk Transition Matrix
P[Next | Previous]
What if we don't see the links? Trajectories are all we see.
And no self-loops: modeling self-loops is a separate problem (see Figueiredo et al., PKDD 2014; Benson et al., WWW 2016; …).
Latent Random Walk Transition Matrices
Latent transition matrices capture:
–Navigation constraints
–Navigation preferences based on user types
We must learn:
–Transition matrices from trajectory data (from multiple users)
–User preferences over these matrices
TribeFlow's Environments
Each environment defines a transition matrix and an inter-event time distribution:
P[Node = 3 | Node = 5, Env = Green] × P[inter-event time > T | Env = Green] × P[Env = Green | User = Jane]
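The factorization above can be sketched in a few lines. This is a toy illustration, not TribeFlow's actual implementation: the preference vector, the number of environments, and the random transition matrices are all made-up values, and the inter-event time factor is omitted for brevity.

```python
import numpy as np

# Hypothetical toy model: 2 environments ("green", "red"), 4 items.
# P[Env | User]: Jane mostly navigates in the green environment.
pref = {"Jane": np.array([0.8, 0.2])}

# One transition matrix per environment: trans[z][s, d] = P[d | s, Env=z]
rng = np.random.default_rng(0)
trans = rng.dirichlet(np.ones(4), size=(2, 4))  # shape (env, source, dest)

def next_item_probs(user, prev_item):
    """P[next | user, prev] = sum_z P[Env=z | user] * P[next | prev, Env=z]."""
    return pref[user] @ trans[:, prev_item, :]

p = next_item_probs("Jane", prev_item=2)
```

Because each factor is a proper distribution, the mixture `p` is one as well.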
Problem Definition
Input:
–Large dataset of user trajectories from thousands of users
Output:
–Set of latent transition matrices
–Set of user preferences over these matrices
That is: interpretable and accurate ("Alice will listen to '4' next!")
Defining the Model
How Many Transition Matrices?
We use a Bayesian nonparametric model:
–The number of environments is learned from the data
–Dirichlet process prior over each user's preference vector (rows)
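The key property of a Dirichlet process prior is that the number of components grows with the data instead of being fixed in advance. A minimal sketch of this via the Chinese Restaurant Process representation (the `alpha` value and sampling scheme here are illustrative, not TribeFlow's inference procedure):

```python
import random

def crp_assignments(n, alpha, seed=0):
    """Chinese Restaurant Process: assign n observations to 'tables'
    (environments); a new table opens with probability alpha / (i + alpha),
    so the number of environments is learned rather than fixed."""
    random.seed(seed)
    counts = []   # observations per table
    labels = []
    for i in range(n):
        r = random.uniform(0, i + alpha)
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if r < acc:          # join existing table k with prob c / (i + alpha)
                counts[k] += 1
                labels.append(k)
                break
        else:
            counts.append(1)     # open a new table
            labels.append(len(counts) - 1)
    return labels

labels = crp_assignments(100, alpha=2.0)
```

With more data (larger `n`) the expected number of distinct tables grows roughly as `alpha * log(n)`.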
Decomposing User Trajectories
[Figure: a trajectory over items with inter-event times ranging from 1 min to 24 hours]
Inter-event times that are unlikely under the current environment signal a jump: a 2 min gap is unlikely in the Green environment, and a 24 hour gap is unlikely in the Red environment.
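The idea of cutting a trajectory where the inter-event time becomes implausible can be sketched with a simple threshold rule. This is a simplification: TribeFlow reasons about inter-event time likelihoods per environment, not a single global cutoff, and the `max_gap` value below is an arbitrary choice for illustration.

```python
def split_bursts(events, max_gap):
    """Split a (item, timestamp) trajectory wherever the inter-event time
    exceeds max_gap; a long gap likely marks a jump between environments."""
    bursts, current = [], [events[0]]
    for prev, cur in zip(events, events[1:]):
        if cur[1] - prev[1] > max_gap:
            bursts.append(current)
            current = []
        current.append(cur)
    bursts.append(current)
    return bursts

# (item, minutes): a 24h gap (1440 min) separates two sessions
traj = [(1, 0), (2, 2), (3, 3), (6, 1443), (5, 1448)]
bursts = split_bursts(traj, max_gap=60)
```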
Dealing With Sparseness
Sparseness issue:
–Millions of items (e.g., web pages / artists)
–~10^12 possible (source, destination) bigrams for Last.FM artists
–Only ~10^8 observed (source, destination) pairs in the largest dataset (LastFM-Groups)
Thus, random walk transition probabilities are defined over vertices rather than edges (without self-loops).
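Parameterizing transitions over vertices means storing O(n) weights per environment instead of an O(n^2) edge matrix. A minimal sketch of one such parameterization, assuming each environment carries a single weight vector `phi` and self-loops are removed by renormalizing the remaining mass (the exact functional form used by TribeFlow may differ):

```python
import numpy as np

def transition_prob(phi, s, d):
    """P[d | s] from per-vertex weights phi (one vector per environment)
    rather than a full n x n edge matrix. Self-loops are excluded by
    renormalizing over the remaining destinations."""
    if d == s:
        return 0.0
    return phi[d] / (1.0 - phi[s])

phi = np.array([0.1, 0.2, 0.3, 0.4])  # vertex weights, sum to 1
row = [transition_prob(phi, 1, d) for d in range(4)]
```

Each source's outgoing probabilities still form a proper distribution, at a cost of n parameters per environment.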
Learning the Model
TribeFlow's Generative Model
Generative model variables:
–Environments: transition matrix and inter-event time distribution
–Latent user preferences over environments
Inference is simplified by breaking each sequence into renewal windows of length B.
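The generative story can be sketched as: for each renewal window, draw an environment from the user's preferences, then draw B items and B-1 inter-event times from that environment. All numbers below are toy values, and the exponential inter-event time distribution is an assumption for illustration, not necessarily the distribution used in the paper.

```python
import numpy as np

rng = np.random.default_rng(42)
n_env, n_items, B = 2, 5, 3   # B: renewal window length

env_item = rng.dirichlet(np.ones(n_items), size=n_env)  # item weights per env
env_mean_gap = np.array([1.0, 10.0])                    # mean inter-event time per env
user_pref = rng.dirichlet(np.ones(n_env))               # P[Env | user]

def generate_window():
    """Sample one renewal window: an environment from the user's preferences,
    then B items and B-1 inter-event times from that environment."""
    z = rng.choice(n_env, p=user_pref)
    items = rng.choice(n_items, size=B, p=env_item[z])
    gaps = rng.exponential(env_mean_gap[z], size=B - 1)
    return z, items.tolist(), gaps.tolist()

z, items, gaps = generate_window()
```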
TribeFlow Inference
Gibbs sampling over:
–Item transitions
–User preferences
–Inter-event times
Merge and split moves to infer the Dirichlet process.
Fully distributed, based on Async-LDA (Asuncion et al., NIPS 2009).
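The core Gibbs move resamples a window's environment in proportion to (user preference) times (item likelihood under each environment). The sketch below uses LDA-style smoothed count ratios as a stand-in; the smoothing hyperparameters `alpha` and `beta` and the exact conditional are illustrative, not TribeFlow's actual sampler (which also conditions on inter-event times and transitions).

```python
import random

def resample_env(window_items, user_env_counts, env_item_counts, alpha, beta, n_items):
    """One Gibbs step (sketch): pick an environment z with probability
    proportional to (count of user's windows in z + alpha) times the
    smoothed likelihood of the window's items under z. Counts are held
    fixed within the window, a simplification of a collapsed sampler."""
    weights = []
    for z in range(len(user_env_counts)):
        w = user_env_counts[z] + alpha
        total = sum(env_item_counts[z]) + beta * n_items
        for item in window_items:
            w *= (env_item_counts[z][item] + beta) / total
        weights.append(w)
    r = random.uniform(0, sum(weights))
    acc = 0.0
    for z, w in enumerate(weights):
        acc += w
        if r < acc:
            return z
    return len(weights) - 1

random.seed(0)
# Env 0 has seen item 0 often, env 1 never: windows of item 0 should land in env 0.
draws = [resample_env([0, 0], [5, 1], [[10, 0], [0, 10]], 1.0, 0.1, 2)
         for _ in range(100)]
```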
Results
TribeFlow at Work
Years of data, split by time: 70% train + validation, 30% test.
Prior Work for Comparison
–Latent Markov Embedding (LME), Chen et al., KDD 2012
–Multi-core Latent Markov Embedding (MultiLME), Chen et al., KDD 2013
–Personalized Ranking LME (PRLME), Feng et al., IJCAI 2015
–Factorized Personalized Markov Chains (FPMC), Rendle et al., WWW 2010
–Progression Stages (Stages), Yang et al., WWW 2014
–Gravity Model (Gravity): commonly used to measure flow between locations (Silva et al., 2006; García-Gavilanes et al., CSCW 2014; Smith et al., CSCW)
Common issues (except Stages & Gravity): don't scale, not personalized, no probabilistic interpretation.
Predicting Where Users Go Next
[Figure: TribeFlow is both more accurate and faster than the baselines; 20 cores on one node]
Using Data Subsamples
TribeFlow is always faster and more accurate (20 cores on one node).
Further Comparisons
TribeFlow vs. MultiLME (Chen et al., 2013), scalability:
–TribeFlow is 413x faster
–At least 12% higher predictive likelihood
TribeFlow vs. Stages (Yang et al., 2014):
–TribeFlow has at least 40% higher MRR (Mean Reciprocal Rank)
–On the datasets where we were able to execute Stages
TribeFlow vs. Gravity Model (Silva et al., 2006) on the number of users that go from A to B:
–Gravity models are fast (Poisson regression)
–However, TribeFlow was 800% more accurate in MAE
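For reference, the MRR metric used above rewards placing the true next item near the top of the predicted ranking. A minimal implementation (the toy prediction lists are made up for illustration):

```python
def mean_reciprocal_rank(ranked_lists, true_items):
    """MRR: mean of 1/rank of the true next item in each ranked prediction
    list (contributing 0 when the item is absent from the list)."""
    total = 0.0
    for preds, truth in zip(ranked_lists, true_items):
        if truth in preds:
            total += 1.0 / (preds.index(truth) + 1)
    return total / len(true_items)

preds = [["a", "b", "c"], ["b", "a", "c"], ["c", "a", "b"]]
truth = ["a", "c", "b"]
mrr = mean_reciprocal_rank(preds, truth)
```

Here the true items appear at ranks 1, 3, and 3, so the MRR is (1 + 1/3 + 1/3) / 3.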
Sense Making: Music Streaming Data
Data crawled using pop artists as seeds:
–Natural bias towards pop music
Sense Making: FourSquare Data
Why these environments? They are airports.
TribeFlow is learning the geographical constraints:
–No use of GPS information
Users Are Not Synchronized
"Everybody" eventually goes through a "Beatles" phase, but at different times.
TribeFlow vs. Temporal Tensors
[Figure: a users × products × time tensor and its low-rank factorization]
If users are not synchronized, the time mode of the tensor does not align across users.
Conclusions
TribeFlow:
–Predicts & mines user trajectories
–Fast & scalable
–Accurate & interpretable
Thank You!