Khalid El-Arini Carnegie Mellon University Joint work with: Ulrich Paquet, Ralf Herbrich, Jurgen Van Gael, Blaise Agüera y Arcas Transparent User Models.

Presentation transcript:

Transparent User Models for Personalization
Khalid El-Arini, Carnegie Mellon University
Joint work with: Ulrich Paquet, Ralf Herbrich, Jurgen Van Gael, Blaise Agüera y Arcas

Personalization is ubiquitous.

Personalization is invaluable. Keyword search is not enough.
- YouTube: 72+ hours/minute of new video
- Facebook: 950 million+ users
- Twitter: 400+ million tweets/day
- Shopping: in 1994, 500K unique consumer goods were sold in the U.S.; in 2010, Amazon alone offered 24 million.

Personalization is often wrong.

“Basil…is not a neo-Nazi. Lukas…is not a shadowy stalker. David…is not Korean. … intent on giving them such labels.” - J. Zaslow, November 26, 2002

“there's just one way to change its mind: outfox it.” - J. Zaslow, November 26, 2002

What recourse do we have? Can we do better?

We propose an alternative.

Why am I getting this? You behave like a vegan hipster.
Vegan? Really? Why? You: tweeted with #meatlessmonday …

Why am I getting this? You behave like a Brooklyn hipster.

Goal: Achieve transparency via interpretable user features (badges), learned from user activity.

Approach | Model | Experiments | Summary

1. Define a vocabulary of badges: Apple fanboy, …, vegan, runner, photographer. Badges are rich, interpretable and explainable.
2. Identify exemplars. How do I find vegans? Take advantage of how users describe themselves (the observed label). Most vegans don’t label themselves as “vegan” on Twitter… we want to infer the attributes of these users.
3. Model characteristic behavior: hashtags (e.g., #meatlessmonday) and retweets.
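Step 2 (identifying exemplars from self-descriptions) can be sketched as below. This is only an illustration: the badge-to-label mapping, the sample bios, and the plain substring match are all assumptions, not the authors' actual labeling rules. The third bio shows how a naive match also picks up non-vegans, which is why the observed labels are noisy.

```python
# Hypothetical badge vocabulary: badge name -> label phrases to look for in a profile bio.
BADGE_LABELS = {
    "vegan": ["vegan"],
    "runner": ["runner"],
    "photographer": ["photographer"],
    "apple fanboy": ["apple fanboy"],
}

def find_exemplars(bios, badge_labels=BADGE_LABELS):
    """Return {badge: set of user ids whose bio contains one of the badge's labels}.

    Uses a case-insensitive substring match (an assumption made for brevity).
    """
    exemplars = {badge: set() for badge in badge_labels}
    for user_id, bio in bios.items():
        text = bio.lower()
        for badge, labels in badge_labels.items():
            if any(label in text for label in labels):
                exemplars[badge].add(user_id)
    return exemplars

# Made-up profile bios for illustration.
bios = {
    "u1": "Vegan. Runner. Coffee addict.",
    "u2": "Photographer based in Brooklyn",
    "u3": "I drink too much and hate vegans.",  # matches "vegan" but is not one
}
exemplars = find_exemplars(bios)
print(exemplars["vegan"])
```

Users without a matching label are simply unlabeled; they are not negative examples.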

Approach | Model | Experiments | Summary

Model sketch
- We have no negative training examples. Use a generative model.
- Actions can be explained by multiple badges, even for the same user. Use a noisy-or to combine badges.
- How do we deal with user corrections? By observing a latent variable.

Model variables (plate diagram, built up one slide at a time):
- Indices: B badges (i = 1…B), N users (u = 1…N), F actions (j = 1…F).
- b_i^(u): Does user u have badge i?
- λ_i^(u): Does user u have the label for badge i in his profile?
- a_j^(u): Has user u performed action j?
- s_ij: Does badge i explain action j?
- φ_ij (hyperparameters α_φ, β_φ): What’s the probability that a user with badge i performs action j?
- φ_bg: What is the background probability for each action?
- Noisy-or: Can at least one of my badges (or the background) explain it?
- Beta priors to control sparsity.
- γ_i^T, γ_i^F (hyperparameters α_T, β_T, α_F, β_F): a Beta prior to encode low recall (e.g., 10%) and a Beta prior to encode high precision (e.g., 99.9%).
- η_i and ω_i (hyperparameters α_η, β_η, α_ω, β_ω) also appear in the full diagram.
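The noisy-or question ("can at least one of my badges, or the background, explain it?") can be made concrete with a small sketch. The slides do not spell out the formula, so the parameterization below is an assumption: an action fails to occur only if the background and every active, explaining badge all fail independently, i.e. P(a_j = 1) = 1 - (1 - φ_bg) ∏_i (1 - φ_ij)^(s_ij b_i). The function and argument names are hypothetical.

```python
def action_probability(badges, explains, phi, phi_bg):
    """P(a_j = 1 | badges) for a single action j under an assumed noisy-or.

    badges[i]   -- b_i in {0, 1}: does the user have badge i?
    explains[i] -- s_ij in {0, 1}: does badge i explain action j?
    phi[i]      -- phi_ij: probability that badge i triggers action j
    phi_bg      -- background probability of action j
    """
    p_none = 1.0 - phi_bg  # the background fails to trigger the action
    for b_i, s_ij, phi_ij in zip(badges, explains, phi):
        if b_i and s_ij:
            p_none *= 1.0 - phi_ij  # this badge also fails to trigger it
    return 1.0 - p_none

# A user with badges 0 and 1; only badges 0 and 2 explain this action.
p_with_badge = action_probability([1, 1, 0], [1, 0, 1], [0.5, 0.9, 0.8], 0.1)
# A user with no badges falls back to the background rate.
p_background = action_probability([0, 0, 0], [1, 0, 1], [0.5, 0.9, 0.8], 0.1)
```

In this form, a badge that does not explain the action (s_ij = 0) has no effect on its probability, which is how sparse s_ij keeps each badge tied to a small set of characteristic actions.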

Inference: collapsed Gibbs sampler (with MH steps). Variables shown: s_ij, φ_ij, φ_bg, b_i^(u).
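To give a feel for what one sampling step involves, here is a deliberately simplified sketch of the conditional for a single badge indicator. It is not the collapsed sampler from the talk: it assumes fixed parameters, a hypothetical fixed prior probability for the badge, the noisy-or form P(a_j = 1) = 1 - (1 - φ_bg) ∏_i (1 - φ_ij)^(s_ij b_i), no MH steps, and it leaves out the profile-label term. All names are illustrative.

```python
import math

def noisy_or(badges, s, phi, phi_bg_j, j):
    """P(a_j = 1 | badges) under the assumed noisy-or."""
    p_none = 1.0 - phi_bg_j
    for i, b_i in enumerate(badges):
        if b_i and s[i][j]:
            p_none *= 1.0 - phi[i][j]
    return 1.0 - p_none

def log_likelihood(badges, actions, s, phi, phi_bg):
    """Log-probability of one user's observed 0/1 actions given their badges.

    Assumes every phi and phi_bg lies strictly between 0 and 1.
    """
    total = 0.0
    for j, a_j in enumerate(actions):
        p = noisy_or(badges, s, phi, phi_bg[j], j)
        total += math.log(p if a_j else 1.0 - p)
    return total

def badge_conditional(i, badges, actions, s, phi, phi_bg, prior):
    """P(b_i = 1 | other badges, actions) for one user, via Bayes' rule."""
    log_w = []
    for value in (0, 1):
        proposal = list(badges)
        proposal[i] = value
        log_prior = math.log(prior if value else 1.0 - prior)
        log_w.append(log_prior + log_likelihood(proposal, actions, s, phi, phi_bg))
    m = max(log_w)  # subtract the max before exponentiating for stability
    w0, w1 = (math.exp(x - m) for x in log_w)
    return w1 / (w0 + w1)

# One badge, two actions; the badge explains only action 0, which the user performed.
p = badge_conditional(0, [0], [1, 0], [[1, 0]], [[0.8, 0.5]], [0.1, 0.1], prior=0.5)
```

A Gibbs sweep would draw b_i from this probability for each user and badge in turn, then resample the remaining variables.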

[Full plate diagram] You behave like a vegan hipster.

Approach | Model | Experiments | Summary

Data description
- Start with 7 million Twitter users
- Manually define 31 sample badges by specifying labels
- Gather 2 million tweets from August 2011
- Recall: actions are hashtags and retweets
- Remove infrequent actions and inactive users, leaving us with 75,880 users and 32,030 actions
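The last preprocessing step, removing infrequent actions and inactive users, might look like the following sketch. The thresholds and the single-pass order (actions first, then users) are assumptions; the slides do not give the actual cutoffs.

```python
from collections import Counter

def filter_data(user_actions, min_action_users=2, min_user_actions=2):
    """Drop rare actions, then drop users left with too few actions.

    user_actions     -- {user id: set of actions (hashtags or retweets)}
    min_action_users -- hypothetical cutoff: users needed for an action to be kept
    min_user_actions -- hypothetical cutoff: kept actions needed for a user to be kept
    This is a single pass; it does not re-check action counts after users are removed.
    """
    action_counts = Counter(a for acts in user_actions.values() for a in acts)
    frequent = {a for a, count in action_counts.items() if count >= min_action_users}
    filtered = {}
    for user, acts in user_actions.items():
        kept = acts & frequent
        if len(kept) >= min_user_actions:
            filtered[user] = kept
    return filtered

# Made-up toy data for illustration.
raw = {
    "u1": {"#a", "#b", "#c"},
    "u2": {"#a", "#b"},
    "u3": {"#a", "#d"},
    "u4": {"#e"},
}
filtered = filter_data(raw)
```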

Badge statistics (figure): artist, photographer, country music fan, book worm

Can we learn badges?

Example badges (figures): Vegetarian badge, Runner badge, Hacker badge, Manchester United badge.

Do all badges look this good? No, but most do.
- wine lover: over-generalized
- Ruby on Rails: overwhelmed

Can we just use the labels directly?

Figure: inferred Apple fanboy badge vs. self-described Apple fanboys

Comparative Analysis

Compare to labeled LDA [Ramage+ 2009]:
- LDA extension where each document is labeled with multiple tags
- One-to-one mapping between topics and tags
- Document explained only by topics associated with its tags

Hold out a random 10% of labels, treat them as ground truth, and try to predict them.
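The hold-out protocol can be sketched as follows. This is a minimal illustration under assumptions: labels are held out uniformly at random over (user, label) pairs, and a model is scored by the rank it gives each held-out label among all badges. The function names and toy data are hypothetical.

```python
import random

def hold_out_labels(user_labels, fraction=0.1, seed=0):
    """Split {user: set of labels} into training labels and held-out (user, label) pairs."""
    pairs = sorted((u, l) for u, labels in user_labels.items() for l in labels)
    rng = random.Random(seed)
    n_held = max(1, round(fraction * len(pairs)))
    held = set(rng.sample(pairs, n_held))
    train = {
        u: {l for l in labels if (u, l) not in held}
        for u, labels in user_labels.items()
    }
    return train, held

def rank_of(label, scores):
    """1-based rank of `label` when badges are sorted by descending score (lower is better)."""
    ordering = sorted(scores, key=scores.get, reverse=True)
    return ordering.index(label) + 1

# Toy data: 10 (user, label) pairs, so 10% holds out exactly one.
user_labels = {
    "u1": {"vegan", "runner"},
    "u2": {"hacker", "runner"},
    "u3": {"vegan", "photographer", "artist"},
    "u4": {"artist", "hacker", "runner"},
}
train, held = hold_out_labels(user_labels)

# Hypothetical model scores for one user; "vegan" is ranked second here.
example_rank = rank_of("vegan", {"vegan": 0.7, "runner": 0.9, "hacker": 0.1})
```

Averaging the rank over all held-out pairs gives a single number with which two models can be compared.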

Results (figures):
- Rank of held-out labels: better predictive performance.
- Better predictions for active users.
- Sparse badges: Apple fanboy (badges) vs. Apple fanboy (L-LDA).

Approach | Model | Experiments | Summary

- Leveraged how users describe themselves to build interpretable user features (“You behave like a vegan hipster”)
- Empirically showed we can infer a user’s attributes from his behavior

Thank you

What recourse do we have? Collaborative filtering. Content-based filtering. Can we do better?

Most vegans don’t label themselves as “vegan” on Twitter… …but what about non-vegans? “I drink too much and hate vegans.”