1 How Long will She Call Me? Distribution, Social Theory and Duration Prediction
Yuxiao Dong *, Jie Tang $, Tiancheng Lou #, Bin Wu &, Nitesh V. Chawla *
* University of Notre Dame; $ Tsinghua University; # Google Inc.; & Beijing U. of Posts & Telecoms
Yuxiao Dong, Jie Tang, Tiancheng Lou, Bin Wu, Nitesh V. Chawla. How Long will She Call Me? Distribution, Social Theory and Duration Prediction. In ECML/PKDD’13.
2 Outline
Motivation
Dynamic Distribution on Duration
Social Theory on Duration
Duration Prediction
Conclusion
3 Motivation
Mobile calls between people are ubiquitous, at any time of day.
91% of American adults had a mobile phone as of May 2013 [1].
Mobile users can’t leave their phones alone for 6 minutes and check them up to 150 times a day [2].
People make, receive or avoid 22 phone calls every day [2].
[1] Pew Internet: Mobile Reports. June 6,
[2] Tomi Ahonen. Communities Dominate Brands.
4 Duration Macro-Distribution
Double Pareto lognormal distribution (DPLN) [1].
Truncated log-logistic distribution (TLAC) [2].
[1] M. Seshadri, A. Srid., J. Bolot, C. Faloutsos and J. Leskovec. Mobile Call Graphs: Beyond Power-Law and Lognormal Distributions. In KDD’08.
[2] P. Melo, L. Akoglu, C. Faloutsos and A. Loureiro. Surprising Patterns for the Call Duration Distribution of Mobile Phone Users. In PKDD’10.
5 Mobile Data
Call Detail Records (CDRs): 3.9 million CDRs; 2 months (Dec & Jan. 2008); from outside America.
Mobile network: 272,345 users and 521,925 call edges.
Pareto principle: 20% of user pairs produce 80% of the calls.
One-week data is available at
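The Pareto check above can be sketched as follows. This is a minimal illustration, not the authors' code: the `(caller, callee, duration)` record layout, the function name `top_pair_share` and the toy records are all assumptions made for the example.

```python
from collections import Counter

def top_pair_share(cdrs, top_fraction=0.2):
    """Share of all calls produced by the most active `top_fraction` of user pairs."""
    # Treat a pair as undirected: (a, b) and (b, a) count as the same pair.
    calls_per_pair = Counter(frozenset((caller, callee)) for caller, callee, _ in cdrs)
    counts = sorted(calls_per_pair.values(), reverse=True)
    k = max(1, round(top_fraction * len(counts)))
    return sum(counts[:k]) / sum(counts)

# Toy records (made up): one very active pair and four pairs with a single call.
toy_cdrs = [("a", "b", 30)] * 16 + [
    ("a", "c", 40), ("b", "c", 50), ("c", "d", 60), ("d", "e", 70),
]
share = top_pair_share(toy_cdrs)
```

On the toy records, one pair out of five (20%) accounts for 16 of the 20 calls.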
8 Roadmap [1]
Existing macro-distribution: DPLN distribution; TLAC distribution.
Dynamic distribution on duration: temporal distribution; demographics distribution.
Social theory on duration: strong/weak tie; homophily; opinion leader; social balance.
Duration prediction: dynamic factors; social factors.
[1] V. Palchykov, K. Kaski, J. Kertesz, A.-L. Barabási and R.I.M. Dunbar. Sex differences in intimate relationships. Scientific Reports 2:
9 Dynamic Distribution on Duration
10 Periodicity
Periodic patterns in mobile call duration:
Working hours (8:00AM-7:00PM): 75 seconds on average;
Evening (7:00PM-12:00AM): increasing to 150 seconds at midnight;
Early morning (12:00AM-8:00AM): decreasing to 50 seconds.
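A minimal sketch of how such a periodic pattern could be measured, assuming each call is reduced to a `(start_hour, duration_seconds)` pair. The record layout, the function names and the toy values are assumptions for illustration only; the period boundaries follow the slide.

```python
from collections import defaultdict

def period_of_day(hour):
    """Map an hour (0-23) to one of the three periods named on the slide."""
    if 8 <= hour < 19:
        return "working"        # 8:00AM-7:00PM
    if hour >= 19:
        return "evening"        # 7:00PM-12:00AM
    return "early_morning"      # 12:00AM-8:00AM

def mean_duration_by_period(calls):
    """calls: iterable of (start_hour, duration_seconds) pairs (hypothetical layout)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for hour, duration in calls:
        period = period_of_day(hour)
        sums[period] += duration
        counts[period] += 1
    return {period: sums[period] / counts[period] for period in sums}

# Toy calls (made up, not taken from the dataset).
toy_calls = [(9, 70), (14, 80), (21, 140), (23, 160), (3, 50)]
means = mean_duration_by_period(toy_calls)
```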
11 Demographics
Call duration vs. demographics:
Females make longer calls than males;
Calls between two females are longer than calls between two males;
Calls from males to females are longer than calls from females to males;
Younger users make longer calls.
12 Social Theory on Duration
13 Social Theory
Strong/weak tie: How long do people with a strong or weak tie call?
Link homophily: Do similar users tend to call each other with long or short duration?
Opinion leader: How different are the calling behaviours between opinion leaders and ordinary users?
Social balance: How does the duration-based network satisfy social balance theory?
15 Strong/Weak Tie
Tie strength between two users is measured by the number of calls between them.
Call duration vs. social tie:
The stronger the tie, the shorter the calls.
If two users call each other 1,000 times in two months, the probability that a call is < 60s is 80%.
This differs from online instant messaging networks [1].
Figure: probability that the call is < 60s.
[1] Jure Leskovec and Eric Horvitz. Planetary-Scale Views on a Large Instant-Messaging Network. In WWW’08.
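A minimal sketch of the measurement behind this slide: tie strength is the number of calls between a pair, and for each strength value we estimate the probability that a call is shorter than 60 seconds. The record layout `(caller, callee, duration)`, the function name and the toy data are assumptions, not the authors' code.

```python
from collections import defaultdict

def short_call_prob_by_tie(cdrs, threshold=60):
    """Return {tie strength (#calls): fraction of calls shorter than `threshold`}."""
    durations = defaultdict(list)
    for caller, callee, duration in cdrs:
        # Undirected pair: calls in both directions contribute to the same tie.
        durations[frozenset((caller, callee))].append(duration)

    short_and_total = defaultdict(lambda: [0, 0])
    for pair_durations in durations.values():
        strength = len(pair_durations)
        short_and_total[strength][0] += sum(d < threshold for d in pair_durations)
        short_and_total[strength][1] += len(pair_durations)
    return {s: short / total for s, (short, total) in short_and_total.items()}

# Toy records (made up): a-b is a strong tie (4 calls), c-d a weak one (1 call).
toy_cdrs = [("a", "b", 30), ("b", "a", 45), ("a", "b", 90), ("a", "b", 20), ("c", "d", 120)]
probs = short_call_prob_by_tie(toy_cdrs)
```

On real data the strength values would normally be binned (for example logarithmically) before plotting; the exact-count grouping here is only for brevity.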
17 Link Homophily
Homophily between two users is measured by their number of common neighbors [1].
Call duration vs. link homophily:
The more common neighbors, the shorter the calls.
If two users have > 30 common neighbors, the probability that a call is < 60s is 80%.
Call duration vs. social tie + link homophily:
The more homophily and the stronger the tie, the shorter the calls.
Figure: probability that the call is < 60s.
[1] Lilian Weng, Filippo Menczer, Yong-Yeol Ahn. Virality Prediction and Community Structure in Social Networks. Scientific Reports. Aug
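The common-neighbor measure can be sketched as below, assuming the call network is given as a list of undirected `(u, v)` edges. The helper names and the toy graph are illustrative assumptions.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) call edges."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency

def common_neighbors(adjacency, u, v):
    """Number of users who have called (or been called by) both u and v."""
    return len(adjacency.get(u, set()) & adjacency.get(v, set()))

# Toy call graph (made up).
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d"), ("b", "d"), ("c", "e")]
adj = build_adjacency(toy_edges)
```

Here `common_neighbors(adj, "a", "b")` counts `c` and `d`, the two users connected to both `a` and `b`.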
19 Opinion Leader
PageRank is used to mine the top 1% of users in the mobile call network as opinion leaders (OL) [1]; all other users are ordinary users (OU).
Call duration vs. opinion leader:
OLs make shorter calls in general: the probability that an OL’s call is < 60s is about 80%;
Calls between two OLs are shorter.
Figure: probability that the call is < 60s.
[1] Katz, E. The two-step flow of communication: an up-to-date report of an hypothesis. In: Enis, Cox (eds.) Marketing Classics, 1973.
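A minimal sketch of the opinion-leader selection, using a plain power-iteration PageRank on a directed call graph. The slide does not state the damping factor, the iteration count or how ties are broken, so `damping=0.85`, `iterations=50` and the helper names are assumptions for illustration.

```python
import math

def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank. out_links maps each caller to a list of callees."""
    nodes = set(out_links) | {v for targets in out_links.values() for v in targets}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing calls is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not out_links.get(u))
        new_rank = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u, targets in out_links.items():
            for v in targets:
                new_rank[v] += damping * rank[u] / len(targets)
        rank = new_rank
    return rank

def opinion_leaders(rank, top_fraction=0.01):
    """Users in the top `top_fraction` by PageRank (at least one user)."""
    k = max(1, math.ceil(top_fraction * len(rank)))
    return set(sorted(rank, key=rank.get, reverse=True)[:k])

# Toy directed call graph (made up): everyone calls "hub".
toy_graph = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
ranks = pagerank(toy_graph)
leaders = opinion_leaders(ranks)
```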
21 Social Balance
Structural balance: a triad is balanced if all three users are friends or only one pair of them are friends. Two users are assumed to be friends if they call each other at least once.
Relationship balance: the balance rate is the percentage of triangles with an even number of negative ties. A tie is assumed to be negative based on the #calls or the average duration between the two nodes.
Call duration vs. social balance:
The network is unbalanced in terms of structural balance;
The network is balanced in terms of relationship balance.
Figure annotation: < 20%, not balanced.
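The relationship-balance rate can be sketched as follows. The slide does not give the rule that turns #calls or average duration into a negative tie, so this example takes the signs as given input; the function name and toy network are assumptions. Triangles are enumerated by brute force, which is only practical for small graphs.

```python
from itertools import combinations

def balance_rate(signed_edges):
    """Fraction of triangles with an even number of negative ties.

    signed_edges maps frozenset({u, v}) to +1 (positive tie) or -1 (negative tie).
    """
    nodes = sorted({u for edge in signed_edges for u in edge})
    balanced = total = 0
    for a, b, c in combinations(nodes, 3):
        sides = [frozenset(pair) for pair in ((a, b), (b, c), (a, c))]
        if all(side in signed_edges for side in sides):
            total += 1
            negatives = sum(signed_edges[side] < 0 for side in sides)
            balanced += negatives % 2 == 0
    return balanced / total if total else 0.0

# Toy signed network (made up): a complete graph on four users.
toy_signs = {
    frozenset(("a", "b")): +1, frozenset(("b", "c")): +1, frozenset(("a", "c")): +1,
    frozenset(("b", "d")): -1, frozenset(("a", "d")): -1, frozenset(("c", "d")): +1,
}
rate = balance_rate(toy_signs)
```

In the toy network, triangles `abc` (no negative ties) and `abd` (two negative ties) are balanced, while `acd` and `bcd` (one negative tie each) are not.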
22 Duration Prediction
25 Prediction Scenario
Figure: call network among users v1-v5 at Time 1, Time 2 and Time 3, with each edge labeled by its call duration in seconds.
Attribute factors: v1: female, 29y; v2: male, 31y; v3: male, 60y; v4: female, 63y; v5: female, 27y.
Social factors: opinion leader: v5; strong tie: v4, v5; weak tie: v1, v3; homophily: v3, v5; social balance: v3, v4, v5.
Temporal factors: v5 calls v3 on Mon. 10:00PM.
Can we predict how long this call will last?
27 Social Time-dependent Factor Graph (STFG)
PFG: partially labeled factor graph [1].
TRFG: social triad based factor graph [2].
STFG: partially labeled + social triad + time dependent.
[1] W. Tang, H. Zhuang and J. Tang. Learning to Infer Social Ties in Large Networks. In ECML/PKDD’11.
[2] J. Hopcroft, T. Lou and J. Tang. Who Will Follow You Back? Reciprocal Relationship Prediction. In CIKM’11.
30 Social Time-dependent FG
Joint distribution (with attribute, social and temporal terms):
Attribute factor:
Social factor:
Temporal factor:
Exponential-linear functions are used to initialize the factors.
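The equations on this slide did not survive extraction. As a reading aid only, here is a generic factor-graph form that matches the labels on the slide (attribute, social and temporal factors, each exponential-linear). Every symbol below is an assumption introduced for illustration and should not be read as the authors' exact formulation.

```latex
% Assumed notation: y_i is the label of call relationship i, x_i its attributes,
% Y_c the labels in a social structure c (e.g. a triad), and y_i^{t'}, y_i^{t}
% the labels of the same relationship at two time stamps.
\begin{align*}
P(Y \mid G) &= \frac{1}{Z} \prod_{i} f(\mathbf{x}_i, y_i)\; \prod_{c} g(Y_c)\; \prod_{i} h\big(y_i^{t'}, y_i^{t}\big) \\
f(\mathbf{x}_i, y_i) &= \exp\big\{ \alpha^{\top} \Phi(\mathbf{x}_i, y_i) \big\} && \text{(attribute factor)} \\
g(Y_c) &= \exp\big\{ \beta^{\top} \Psi(Y_c) \big\} && \text{(social factor)} \\
h\big(y_i^{t'}, y_i^{t}\big) &= \exp\big\{ \gamma^{\top} \Omega\big(y_i^{t'}, y_i^{t}\big) \big\} && \text{(temporal factor)}
\end{align*}
```

Here $Z$ is the normalization constant, and $\Phi$, $\Psi$, $\Omega$ are hypothetical feature functions with weight vectors $\alpha$, $\beta$, $\gamma$.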
31 Social Time-dependent FG
STFG objective function:
Learning:
Parameters:
33 Learning Algorithm
Gradient descent method.
Loopy belief propagation is used to compute the expectations.
[1] J. Hopcroft, T. Lou and J. Tang. Who Will Follow You Back? Reciprocal Relationship Prediction. In CIKM’11.
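To show why expectations are needed during learning, here is the standard gradient for a partially labeled log-linear model. This is a textbook sketch under assumed notation ($\theta$ for all parameters, $S(Y)$ for the summed feature vector, $Y^{L}$ for the known labels, $\eta$ for the learning rate), not a transcription of the slide's lost formulas.

```latex
\begin{align*}
P_{\theta}(Y \mid G) &= \frac{\exp\{\theta^{\top} S(Y)\}}{\sum_{Y'} \exp\{\theta^{\top} S(Y')\}} \\
\mathcal{O}(\theta) &= \log \sum_{Y \mid Y^{L}} \exp\{\theta^{\top} S(Y)\} \;-\; \log \sum_{Y} \exp\{\theta^{\top} S(Y)\} \\
\frac{\partial \mathcal{O}(\theta)}{\partial \theta} &= \mathbb{E}_{P_{\theta}(Y \mid Y^{L}, G)}\big[S(Y)\big] \;-\; \mathbb{E}_{P_{\theta}(Y \mid G)}\big[S(Y)\big] \\
\theta_{\text{new}} &= \theta_{\text{old}} + \eta \, \frac{\partial \mathcal{O}(\theta)}{\partial \theta}
\end{align*}
```

Both expectations sum over exponentially many label configurations, so they are approximated, here with loopy belief propagation. Maximizing $\mathcal{O}(\theta)$ by this update is equivalent to gradient descent on $-\mathcal{O}(\theta)$.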
36 Experimental Setup
Prediction
Case 1: predict the duration of the next call;
Case 2: predict the average duration of calls in a future period.
Data
The first 7 weeks of CDR data serve as historical data;
Case 1: the duration of the 1st call in the 8th week is the next-call prediction target;
Case 2: the average duration in the 8th week is the average-duration prediction target.
Binary prediction
60% of calls are shorter than 60 seconds and the remaining 40% are longer;
The telephone bill jumps when a call reaches 1 minute;
The threshold is set to 60 seconds to classify calls as long or short in this work.
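A minimal sketch of how the two binary targets could be derived from the records. The record layout `(pair, week, timestamp, duration)` and the function name are assumptions, and treating a call of exactly 60 seconds as long is also an assumption, since the slide only distinguishes < 60s from > 60s.

```python
def build_targets(cdrs, test_week=8, threshold=60):
    """Split records into history and week-8 targets (1 = long call, 0 = short call).

    cdrs: iterable of (pair, week, timestamp, duration) tuples (hypothetical layout).
    """
    records = list(cdrs)
    history = [r for r in records if r[1] < test_week]
    test = sorted((r for r in records if r[1] == test_week), key=lambda r: r[2])

    first_call, durations = {}, {}
    for pair, _, _, duration in test:
        first_call.setdefault(pair, duration)            # Case 1: first call of the week
        durations.setdefault(pair, []).append(duration)  # Case 2: all calls of the week

    case1 = {p: int(d >= threshold) for p, d in first_call.items()}
    case2 = {p: int(sum(ds) / len(ds) >= threshold) for p, ds in durations.items()}
    return history, case1, case2

# Toy records (made up).
toy_records = [
    ("ab", 1, 10, 30),
    ("ab", 8, 100, 90), ("ab", 8, 200, 10),
    ("cd", 8, 150, 20), ("cd", 8, 160, 200),
]
history, case1, case2 = build_targets(toy_records)
```

The toy data shows that the two cases can disagree for the same pair: `ab` starts the week with a long call but has a short average, and `cd` is the reverse.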
37 Experimental Setup (Cont.)
Baseline predictors
SVM: support vector machine, using SVM-light;
LRC: logistic regression, using Weka;
Bnet: Bayes network;
CRF: conditional random field.
Evaluation
Precision / Recall / F1-measure.
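For reference, the three evaluation measures can be computed as in the sketch below. Which class counts as positive is not stated on the slide; here label 1 is assumed to be the positive class, and the function name and toy labels are illustrative.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy labels (made up): 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```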
38 Results
Case 1: next call duration prediction.
Case 2: average call duration prediction.
39 Factor Contribution
G: gender; A: age; B: social balance; T: social tie; H: homophily; O: opinion leader; W: week; D: day.
40 STFG Convergence Our learning algorithm is able to reach convergence quickly.
41 Conclusion & Future Work
Conclusions:
Social theory and dynamic distribution patterns are clearly present in the call duration network;
Our proposed model can significantly improve the prediction accuracy.
Interesting observations:
Young females tend to make long calls, in particular in the evening;
Familiar people (more calls and more common neighbors) make shorter calls.
Future work:
Inferring call duration with a regression model;
Building duration prediction into a mobile application.
42 Thanks
Data&Code: