Online Bayesian Models for Personal Analytics in Social Media Svitlana Volkova and Benjamin Van Durme


1 Online Bayesian Models for Personal Analytics in Social Media Svitlana Volkova and Benjamin Van Durme svitlana@jhu.edu http://www.cs.jhu.edu/~svitlana/ Center for Language and Speech Processing, Johns Hopkins University, Human Language Technology Center of Excellence

2 Social Media Predictive Analytics Personalized, diverse and timely data. Can reveal user interests, preferences and opinions. Social Network Prediction App – https://apps.facebook.com/snpredictionapp/ DemographicsPro – http://www.demographicspro.com/ WolframAlpha Analytics – http://www.wolframalpha.com/facebook/

3 User Attribute Prediction Task Political Preference: Rao et al., 2010; Conover et al., 2011; Pennacchiotti and Popescu, 2011; Zamal et al., 2012; Cohen and Ruths, 2013; Volkova et al., 2014 … Communications. Gender: Garera and Yarowsky, 2009; Rao et al., 2010; Burger et al., 2011; Van Durme, 2012; Zamal et al., 2012; Bergsma and Van Durme, 2013. Age: Rao et al., 2010; Zamal et al., 2012; Cohen and Ruths, 2013; Nguyen et al., 2011, 2013; Sap et al., 2014. AAAI 2015 Demo (joint work with Microsoft Research): Income, Education Level, Ethnicity, Life Satisfaction, Optimism, Personality, Showing Off, Self-Promoting

4 Outline I. Our Approach II. Dynamic (Streaming) Models III. Experimental Results IV. Practical Recommendations

5 Existing Approaches Tweets as a document: ~1K Tweets* How long does it take for an average Twitter user to produce thousands of tweets? What if we want to make reliable predictions immediately after 10 tweets? *Rao et al., 2010; Conover et al., 2011; Pennacchiotti and Popescu, 2011a; Burger et al., 2011; Zamal et al., 2012; Nguyen et al., 2013

6 Attributed Social Networks *Conover et al., 2011; Pennacchiotti and Popescu, 2011a; Zamal et al., 2012; Volkova et al., 2014.

7 Our Approach
Static (Batch) Predictions: offline training; offline predictions; no or limited network information
Streaming (Online) Inference: offline training; online predictions in time (ACL’14); exploring 6 types of neighborhoods
Dynamic (Iterative) Learning and Prediction: online predictions; relying on neighbors; + iterative re-training; + active learning; + interactive rationale annotation
① Streaming nature of SM: dynamic training and prediction
② Network structure: joint user-neighbor streams
③ Trade-off between prediction time and model quality

8 Online Predictions: Iterative Bayesian Updates [Figure: belief about an unknown user attribute (?) updated over time]
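The iterative update named on this slide can be sketched for a binary attribute as follows. This is a minimal illustration, not the authors' implementation: the function names, the class labels A and B, and the per-tweet likelihood values are all assumptions made up for the example.

```python
def update_posterior(prior, likelihood_a, likelihood_b):
    """One Bayesian update for a binary attribute with classes A and B.

    prior:        current P(A | tweets seen so far)
    likelihood_a: P(new tweet | A)  (assumed to come from some tweet-level model)
    likelihood_b: P(new tweet | B)
    """
    numerator = prior * likelihood_a
    evidence = numerator + (1.0 - prior) * likelihood_b
    return numerator / evidence


def stream_posteriors(prior, tweet_likelihoods):
    """Apply the update once per incoming tweet and record every posterior."""
    posteriors = []
    p = prior
    for likelihood_a, likelihood_b in tweet_likelihoods:
        p = update_posterior(p, likelihood_a, likelihood_b)
        posteriors.append(p)
    return posteriors


# Hypothetical likelihoods for three tweets arriving in order.
likelihoods = [(0.6, 0.4), (0.7, 0.3), (0.4, 0.6)]
posteriors = stream_posteriors(0.5, likelihoods)
```

Each posterior becomes the prior for the next tweet, so a prediction is available after every tweet rather than only after a large batch has been collected.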

9 Iterative Batch Learning [Figure: timeline over time steps t1, t2, …, tm showing labeled users (R, D) and unlabeled users (?)] Iterative Batch Retraining (IB); Iterative Batch with Rationale Filtering (IBR)
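One possible reading of iterative batch retraining is sketched below. It assumes that at every time step the model is retrained from scratch on all labeled data seen so far and then applied to the unlabeled users; the slide does not specify the model, so a toy one-dimensional centroid classifier stands in for it, and all names and data are hypothetical.

```python
import math


def train_centroids(examples):
    """Toy stand-in model: mean feature value per class."""
    sums, counts = {}, {}
    for x, y in examples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(model, x):
    """Return (label, confidence) from a softmax over negative distances."""
    scores = {y: math.exp(-abs(x - c)) for y, c in model.items()}
    label = max(scores, key=scores.get)
    return label, scores[label] / sum(scores.values())


def iterative_batch(labeled_stream, unlabeled_stream, threshold):
    """labeled_stream[t]: (feature, label) pairs arriving at step t.
    unlabeled_stream[t]: dict user -> feature observed at step t."""
    seen = []
    decisions = {}
    for new_labeled, unlabeled in zip(labeled_stream, unlabeled_stream):
        seen.extend(new_labeled)
        model = train_centroids(seen)  # retrain on everything seen so far
        for user, x in unlabeled.items():
            if user in decisions:
                continue  # already classified at an earlier step
            label, conf = predict(model, x)
            if conf >= threshold:
                decisions[user] = label
    return decisions


labeled_stream = [[(0.0, "D"), (10.0, "R")], [(2.0, "D"), (8.0, "R")]]
unlabeled_stream = [{"u1": 1.0, "u2": 5.2}, {"u2": 7.0}]
decisions = iterative_batch(labeled_stream, unlabeled_stream, threshold=0.9)
```

In this toy run user u1 is classified at the first step, while u2 only clears the threshold at the second step, once more evidence has arrived.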

10 Rationales Rationales are explicitly highlighted n-grams in tweets that best justify the annotators' labeling decisions
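Rationale filtering can be pictured as restricting the feature space to the highlighted n-grams. The sketch below assumes unigram and bigram count features and a hypothetical rationale set; neither detail is stated on the slide.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def featurize(tweet, rationales=None, max_n=2):
    """Count n-gram features; keep only rationale n-grams if a set is given."""
    tokens = tweet.lower().split()
    grams = [g for n in range(1, max_n + 1) for g in ngrams(tokens, n)]
    if rationales is not None:
        grams = [g for g in grams if g in rationales]
    return Counter(grams)


tweet = "Proud to vote early this morning"
rationales = {"vote", "vote early"}  # hypothetical annotator highlights
full = featurize(tweet)
filtered = featurize(tweet, rationales)
```

Here the unfiltered tweet yields 11 distinct features and the filtered one only 2, which illustrates why rationale-filtered models work with lower-dimensional feature vectors.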

11 Active Learning [Figure: timeline over 1-Jan-2011, 1-Feb-2011, …, 1-Nov-2011, 1-Dec-2011 showing labeled and unlabeled users] Active Without Oracle (AWOO); Active With Rationale Filtering (AWR); Active With Oracle (AWO)
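A rough sketch of the active setups follows, under the assumption that users predicted with confidence above a threshold are moved into the labeled pool, with their predicted label when there is no oracle and with their true label when there is one. The toy centroid model, the function names and the data are hypothetical stand-ins.

```python
import math


def train_centroids(examples):
    """Toy stand-in model: mean feature value per class."""
    sums, counts = {}, {}
    for x, y in examples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(model, x):
    """Return (label, confidence) from a softmax over negative distances."""
    scores = {y: math.exp(-abs(x - c)) for y, c in model.items()}
    label = max(scores, key=scores.get)
    return label, scores[label] / sum(scores.values())


def active_learning(labeled, stream, threshold, oracle=None):
    """stream[t]: dict user -> feature at step t.
    oracle: optional dict user -> true label (the with-oracle setup)."""
    labeled = list(labeled)
    decisions = {}
    for unlabeled in stream:
        model = train_centroids(labeled)
        for user, x in unlabeled.items():
            if user in decisions:
                continue
            label, conf = predict(model, x)
            if conf >= threshold:
                decisions[user] = label
                # Without an oracle the model's own prediction becomes the label.
                gold = oracle[user] if oracle is not None else label
                labeled.append((x, gold))
    return decisions, labeled


seed = [(0.0, "D"), (10.0, "R")]
stream = [{"u1": 1.0, "u2": 5.2}, {"u2": 7.0}]
decisions, pool = active_learning(seed, stream, threshold=0.9)
_, oracle_pool = active_learning(seed, stream, 0.9, oracle={"u1": "D", "u2": "D"})
```

The two runs make the same predictions, but the pools they train on afterwards differ: without an oracle a wrong confident prediction is fed back as training data, whereas the oracle corrects it.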

12 Performance Metrics Accuracy over time: Find optimal models: – Data stream type (user, friend, user + friend) – Time (more correctly classified users faster) – Prediction quality (better accuracy over time)

13 Experimental Results: Data Stream and Prediction Time [Figure panels: IB, IBR, AWOO, AWR]

14 Results: Iterative Batch Learning IB: higher recall; IBR: higher precision. Time: the number of correctly classified users increases over time; IB is faster, IBR slower. Data stream selection: user + friend stream > user stream

15 Results: Active Learning AWOO: higher recall; AWR: higher precision. Time: unlike IB/IBR models, AWOO/AWR models classify more users correctly faster (in Mar) but then plateau

16 Results: Model Quality batch < active; user + friend > user

17 Active with Oracle Annotations Oracle is 100% correct Thousands of tweets in training

18 Summary Active learning > iterative batch N, UN > U: “neighbors give you away” Higher confidence => higher precision, lower confidence => higher recall (as expected) Rationales significantly improve results

19 Practical Recommendations
If you want to deliver ads fast but are willing to be less confident in user attribute predictions:
– use models with higher recall (AWOO, IB)
– apply a lower decision threshold, e.g., 0.55
If you want to deliver ads to a true target crowd but later in time:
– use models with higher precision (AWR, IBR)
– apply a higher decision threshold, e.g., 0.95
– models with rationale filtering (IBR, AWR) require less computation (lower-dimensional feature vectors) and are more accurate, but annotations cost money (Mechanical Turk)
For highly assortative attributes, e.g., political preference, use a joint user-neighbor stream
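The threshold trade-off in these recommendations can be illustrated with a small sketch. The predictions below are invented for the example, and "coverage" (the fraction of users who receive a prediction at all) is used here as an assumed proxy for how quickly and widely users get classified.

```python
def evaluate_at_threshold(predictions, threshold):
    """predictions: list of (confidence, predicted_label, true_label).

    Returns (precision among accepted predictions, fraction of users accepted).
    """
    accepted = [(p, g) for c, p, g in predictions if c >= threshold]
    correct = sum(1 for p, g in accepted if p == g)
    precision = correct / len(accepted) if accepted else 0.0
    coverage = len(accepted) / len(predictions)
    return precision, coverage


# Hypothetical model outputs for six users.
predictions = [
    (0.99, "D", "D"),
    (0.96, "R", "R"),
    (0.80, "D", "R"),
    (0.70, "R", "R"),
    (0.60, "D", "D"),
    (0.52, "R", "D"),
]

low = evaluate_at_threshold(predictions, 0.55)   # more users, some errors
high = evaluate_at_threshold(predictions, 0.95)  # fewer users, fewer errors
```

On this toy data the 0.55 threshold accepts five of six users with one error, while the 0.95 threshold accepts only two users, both correct.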

20 Applications Online targeted advertising Personalized marketing Large-scale real-time healthcare analytics Personalized recommendation systems and search Large-scale passive and real-time live polling Recruitment and human resource management Dating services

21 Applications (1) Online targeted advertising Targeting ads based on predicted user features Personalized marketing Detecting opinions users express about products or services within targeted populations Large-scale real-time healthcare analytics Identifying patterns of depression or mental illnesses within users of certain demographics Personalized recommendation systems and search

22 Applications (2) Recruitment and human resource management Estimating emotional stability and personality of the potential employees Measuring the overall well-being of the employees e.g., life satisfaction, happiness. Dating services Matching user profiles by comparing their personalities, interests and emotional tone in social media Large-scale passive and real-time live polling Mining political opinions, voting predictions for the groups of users with certain demographics

23 Thank you! Labeled Twitter network data for gender, age, political preference prediction: http://www.cs.jhu.edu/~svitlana/ Interested in using our models for your research or collaboration? Code and pre-trained models for inferring demographic attributes, personality and Ekman’s 6 emotions are available on request: svitlana@jhu.edu AAAI Technical Demo: Inferring Latent User Properties from Texts Published in Social Media, Wednesday, January 28, 6:30 – 8:00, Zilker Ballroom. I am on the job market. Hire me! Email: svitlana@jhu.edu

24 Cand-Centric Graph: Belief Updates [Figure: belief about unknown user attributes (?) updated over time]

25 Cand-Centric: Prediction Time (1) Prediction confidence: 0.95 vs. 0.75. Democrats are easier to predict than Republicans. [Figure panels: user stream and user-neighbor stream; users classified correctly (Dem vs. Rep) at confidence 0.75 and 0.95]

26 Batch vs. Online Performance

27 Summary: Streaming (Online) Prediction Neighborhood content is useful* (homophily). Streaming models >> batch models. Tweeting frequency matters a lot! Generalization of the classifiers depends on data sampling or annotation biases. *Pennacchiotti and Popescu, 2011a, 2011b; Conover et al., 2011a, 2011b; Golbeck et al., 2011; Zamal et al., 2012; Volkova et al., 2014

28 Iterative Batch vs. Active w/o Oracle Predictions AWOO IB

29 Active with Oracle Learning

