Learning Profiles from User Interactions


Learning Profiles from User Interactions
Pelin Atahan and Sumit Sarkar
School of Management, The University of Texas at Dallas
pxa041000@utdallas.edu, sumit@utdallas.edu

Introduction
- Personalization systems tailor content and services to individuals
- Consider a vendor selling products through its website: it can personalize recommendations
- Learn profiles based on the links visited by a user
  - a user visits a link (l) to which 70% of visitors are male
  - predict the user is male with probability 0.7, and revise this probability as the user navigates through the website, i.e., clicks on other links

Research Framework
- Learn profiles for targeting purposes
  - personal profiles: demographic, psychographic, and geographic attributes
  - a predetermined set of attributes, e.g., gender, income, risk taker
- Profile representation: attribute values with their associated probabilities
  - for the attribute "gender" (G), the profile may be represented as P(G=m|l)=0.7 and P(G=f|l)=0.3
  - for the attribute "risk taker" (R), with values risk taker (r) and conservative (c): P(R=r|l)=0.6, P(R=c|l)=0.4
- A probabilistic representation
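As a concrete illustration (not from the talk), such a probabilistic profile can be held in a simple mapping from attributes to value distributions; the attribute and value names are those used on the slide:

    # A sketch of a probabilistic profile: each attribute maps its possible
    # values to the current belief; probabilities sum to 1 per attribute.
    profile = {
        "gender":     {"m": 0.7, "f": 0.3},   # P(G=m|l), P(G=f|l)
        "risk_taker": {"r": 0.6, "c": 0.4},   # P(R=r|l), P(R=c|l)
    }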

Data Requirements
- Only link-level statistics are required (for all links), for example:
  - P(G=m | "finance" link)=0.7, P(G=f | "finance" link)=0.3
  - P(R=r | "finance" link)=0.6, P(R=c | "finance" link)=0.4
- Data can be acquired from one of the following sources:
  - registered users, if available
  - sampling: explicitly asking a subset of users
  - professional market research agencies such as comScore, Claritas, and Nielsen//NetRatings

Research Problems
- Learn the personal profile of a user based on the links traversed during a session
- Two types of learning are considered:
  - learning profiles passively, by observing the links traversed
  - learning profiles quickly, by dynamically determining the links available on a page

Literature Review
- Prior work primarily studies profiling in an information retrieval context
  - user interests: identifying interesting pages based on the pages visited
  - profiles represented as feature (term) vectors
- Montgomery (2001) addresses learning demographic profiles from the websites visited by a user
  - the approach is flawed (the conditioning is incorrect)
- Baglioni et al. (2003) address identifying the gender of a user based on the links visited
  - they consider a subset of pages and apply several classification models

Passive Learning
- Suppose Yahoo! wants to learn the gender of a user who is traversing its website, and the user clicks on the following links:
  - the "finance" link (l1)
  - the "investing ideas" link (l2)
  - the "insurance" link (l3)
  - the "sports" link (l4)
- Problem: determine the probability that the visitor is male (or female) given this clickstream {l1, l2, l3, l4}:
  P(G=m | l1, l2, l3, l4)
- In general, for an attribute A and clickstream {l1, l2, ..., ln}:
  P(A=ai | l1, l2, ..., ln)

Passive Learning Cont'd
- Use Bayes' formula:
  P(ai | l1, ..., ln) = P(l1, ..., ln | ai) P(ai) / P(l1, ..., ln)
- Assume conditional independence, i.e., the probability of clicking a link is independent of the probability of clicking another link when the user profile is known:
  P(l1, ..., ln | ai) = P(l1 | ai) P(l2 | ai) ... P(ln | ai)

Passive Learning Cont'd
- Directly estimating P(lj | ai) is difficult
- From Bayes' formula,
  P(lj | ai) = P(ai | lj) P(lj) / P(ai)
- we obtain an expression for the posterior in terms of the link-level statistics P(ai | lj), the link probabilities P(lj), and the site priors P(ai)

Passive Learning Cont'd
- After algebraic manipulations (the P(lj) terms cancel upon normalization), we get:
  P(ai | l1, ..., ln) = P(ai)^(1-n) ∏j P(ai | lj) / Σk [ P(ak)^(1-n) ∏j P(ak | lj) ]
- We can learn a customer profile from simple link statistics
- The process is not computationally intensive
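A minimal Python sketch of this update (function and variable names are mine, not the authors'); it evaluates the formula above directly from the site priors and the link-level statistics:

    def passive_posterior(site_priors, link_stats, clicks):
        """Posterior over an attribute's values after a clickstream.

        site_priors: dict value -> P(ai), e.g. {"m": 0.45, "f": 0.55}
        link_stats:  dict link -> dict value -> P(ai | lj)
        clicks:      list of links traversed so far
        Assumes clicks are conditionally independent given the attribute value.
        """
        n = len(clicks)
        scores = {}
        for value, prior in site_priors.items():
            # P(ai)^(1-n) * prod_j P(ai | lj); the P(lj) terms cancel on normalization
            score = prior ** (1 - n)
            for link in clicks:
                score *= link_stats[link][value]
            scores[value] = score
        total = sum(scores.values())
        return {value: s / total for value, s in scores.items()}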

Illustrative Example
- Consider the following site priors and link probabilities:
  - site priors: P(m)=0.45, P(f)=0.55
  - finance link (l1): P(m|l1)=0.6, P(f|l1)=0.4
  - investing ideas link (l2): P(m|l2)=0.7, P(f|l2)=0.3
  - insurance link (l3): P(m|l3)=0.4, P(f|l3)=0.6
  - sports link (l4): P(m|l4)=0.7, P(f|l4)=0.3
- Applying the formula: P(m | l1, l2, l3, l4) = 0.91 and P(f | l1, l2, l3, l4) = 0.09
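Running the sketch above on the slide's numbers reproduces the stated posterior:

    site_priors = {"m": 0.45, "f": 0.55}
    link_stats = {
        "finance":         {"m": 0.6, "f": 0.4},
        "investing_ideas": {"m": 0.7, "f": 0.3},
        "insurance":       {"m": 0.4, "f": 0.6},
        "sports":          {"m": 0.7, "f": 0.3},
    }
    clicks = ["finance", "investing_ideas", "insurance", "sports"]
    print(passive_posterior(site_priors, link_stats, clicks))
    # -> approximately {"m": 0.91, "f": 0.09}, matching the slide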

Learning Profiles in Real Time
- What happens when the user clicks on a new link, e.g., the NBA scoreboard link (l5)?
- Incremental belief revision, where LH denotes the link history (the links clicked prior to the last click):
  P(ai | LH, l5) = P(ai | LH) P(ai | l5) / P(ai), normalized over the attribute values

Incremental Revision Example
- P(m | LH, l5) = ?
- P(m | LH) = P(m | l1, l2, l3, l4) = 0.91 and P(f | LH) = 0.09
- P(m | l5) = 0.65 and P(f | l5) = 0.35
- Revising the beliefs: P(m | LH, l5) = 0.96 and P(f | LH, l5) = 0.04
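The same revision as a one-step function (again a sketch with illustrative names); with the site priors P(m)=0.45 and P(f)=0.55 from the earlier example it reproduces the 0.96 on the slide:

    def incremental_posterior(site_priors, beliefs, link_probs):
        """One-step belief revision after a new click.

        beliefs:    dict value -> P(ai | LH), the current profile
        link_probs: dict value -> P(ai | l_new) for the newly clicked link
        """
        scores = {v: beliefs[v] * link_probs[v] / site_priors[v]
                  for v in site_priors}
        total = sum(scores.values())
        return {v: s / total for v, s in scores.items()}

    beliefs = incremental_posterior({"m": 0.45, "f": 0.55},
                                    {"m": 0.91, "f": 0.09},  # after l1..l4
                                    {"m": 0.65, "f": 0.35})  # NBA scoreboard (l5)
    # -> approximately {"m": 0.96, "f": 0.04}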

Active Learning of User Profiles
- By learning profiles quickly, websites start getting the benefits sooner
- Learning is the reduction in uncertainty about the profile attributes
- Our objective: learn profiles quickly by carefully selecting the links to offer on each page (the offer set)
  - the information value of an offer set is measured as the expected information gain
  - the number of links to offer (n) is predetermined
  - assume the user will click one of the links available
  - stop learning when the expected additional information is not statistically significant

Click Probabilities Conditional on an Offer Set
- Offer set O = {o1, o2, ..., on}
- We estimate P'(lj | ai) for each attribute value and each link in the offer set; from Bayes' rule:
  P'(lj | ai) = P(lj | ai) / Σ_{lk ∈ O} P(lk | ai), where P(lj | ai) = P(ai | lj) P(lj) / P(ai)
- We need some measure of the likelihood of a link being clicked, P(lj)
  - it does not need to be absolute; a relative measure is sufficient
  - e.g., the number of clicks a link gets per month

Belief Revision Conditional on an Offer Set
- Manipulating the above expression, we get:
  P'(ai | LH, lj) = P'(lj | ai) P(ai | LH) / Σk P'(lj | ak) P(ak | LH)
- P(ai | LH), the current belief, serves as the prior on the attribute value at each iteration
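A sketch of these two steps in Python (names are illustrative; the renormalization over the offer set encodes the assumption that the user clicks exactly one offered link):

    def offer_click_probs(site_priors, link_stats, link_popularity, offer_set):
        """P'(lj | ai) for each link in the offer set and attribute value."""
        cond = {}
        for value, prior in site_priors.items():
            # P(lj | ai) = P(ai | lj) P(lj) / P(ai), then renormalize over O
            raw = {l: link_stats[l][value] * link_popularity[l] / prior
                   for l in offer_set}
            total = sum(raw.values())
            cond[value] = {l: p / total for l, p in raw.items()}
        return cond

    def revise_given_offer(beliefs, cond, clicked):
        """P'(ai | LH, lj): revise beliefs after a click within the offer set."""
        scores = {v: cond[v][clicked] * beliefs[v] for v in beliefs}
        total = sum(scores.values())
        return {v: s / total for v, s in scores.items()}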

Information Gain Given a Link is Clicked
- Information gain: defined as the reduction in the entropy of the attribute's distribution given that a link is clicked
- Entropy prior to a click:
  H(A | LH) = -Σi P(ai | LH) log P(ai | LH)
- Entropy given that link lj is clicked:
  H(A | LH, lj) = -Σi P'(ai | LH, lj) log P'(ai | LH, lj)

Expected Information Gain Given an Offer Set
- When n links are offered:
  EI(A | O) = Σj P'(lj | LH) [ H(A | LH) - H(A | LH, lj) ]
- P'(lj | LH) is the probability of link lj being clicked given the offer set:
  P'(lj | LH) = Σi P'(lj | ai) P(ai | LH)

Optimal Offer Set: One-Step Look-Ahead
- The prior entropy H(A | LH) is constant given the link history
- We can therefore determine the optimal offer set by minimizing the expected posterior entropy:
  O* = argmin_O Σj P'(lj | LH) H(A | LH, lj)
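Putting the pieces together, a brute-force one-step look-ahead can enumerate the candidate offer sets and keep the one with the smallest expected posterior entropy. This sketch reuses offer_click_probs and revise_given_offer from above and, as a later slide notes, is feasible only for small sites:

    import math
    from itertools import combinations

    def entropy(dist):
        return -sum(p * math.log2(p) for p in dist.values() if p > 0)

    def expected_entropy(site_priors, link_stats, link_popularity, beliefs, offer_set):
        cond = offer_click_probs(site_priors, link_stats, link_popularity, offer_set)
        value = 0.0
        for link in offer_set:
            # P'(lj | LH) = sum_i P'(lj | ai) P(ai | LH)
            p_click = sum(cond[v][link] * beliefs[v] for v in beliefs)
            value += p_click * entropy(revise_given_offer(beliefs, cond, link))
        return value

    def optimal_offer_set(site_priors, link_stats, link_popularity, beliefs, links, n):
        return min(combinations(links, n),
                   key=lambda O: expected_entropy(site_priors, link_stats,
                                                  link_popularity, beliefs, O))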

Illustrative Example
- The user has visited the "finance" link: LH = {"finance" link}, with P(m|LH)=0.6 and P(f|LH)=0.4
- There are three possible links to consider, with offer set size n=2:
  - "investing ideas" link (o1): P(m|l1)=0.7, P(f|l1)=0.3, P(l1)=0.2
  - "insurance" link (o2): P(m|l2)=0.4, P(f|l2)=0.6, P(l2)=0.3
  - "family and home" link (o3): P(m|l3)=0.2, P(f|l3)=0.8, P(l3)=0.3
- Three possible offer sets: O1={o1, o2}, O2={o1, o3}, O3={o2, o3}
  - EI(G | lj, O1)=0.06, EI(G | lj, O2)=0.18, EI(G | lj, O3)=0.04
- Offering O2 is optimal

Determining the Optimal Offer Set
- The number of potential offer sets to evaluate could be very large
- For a site with M links and an offer set of size n, the number of possible combinations is:
  C(M, n) = M! / (n! (M-n)!)
- E.g., for M = 100 and n = 10, there are more than 17 trillion combinations
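A quick check of that count (math.comb is available in Python 3.8+):

    from math import comb
    print(comb(100, 10))  # 17310309456440, i.e., over 17 trillion offer sets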

Heuristic Approach to Determine the Optimal Offer Set
- Consider the expected entropy expression for learning the gender (n = 2):
  Σ_{j=1,2} Σ_{i∈{m,f}} P'(lj, ai | LH) log [ 1 / P'(ai | LH, lj) ]
- The joint term P'(lj, ai) is proportional to P(lj, ai) = P(ai | lj) P(lj), the joint distribution of the aggregate link probabilities

Heuristic Approach to Determine the Optimal Offer Set
To select the n links to offer (see the sketch below):
- for each attribute value, select the link that maximizes P(ai, lj)
- if more links are needed, evaluate the links with the next highest joint probability
- continue until all n links have been determined
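One way to read that greedy selection in code (a sketch; the joint probability is taken as P(ai, lj) = P(ai | lj) P(lj), and n is assumed not to exceed the number of candidate links):

    def heuristic_offer_set(link_stats, link_popularity, values, n):
        """Greedy selection: rank links by the joint P(ai, lj) for each
        attribute value, then pick round-robin until n links are chosen."""
        ranked = {v: sorted(link_stats,
                            key=lambda l: link_stats[l][v] * link_popularity[l],
                            reverse=True)
                  for v in values}
        chosen, idx = [], {v: 0 for v in values}
        while len(chosen) < n:
            for v in values:                     # one pick per attribute value
                while ranked[v][idx[v]] in chosen:
                    idx[v] += 1                  # skip links already selected
                chosen.append(ranked[v][idx[v]])
                if len(chosen) == n:
                    break
        return chosen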

Discussions
- Assumption: the probability of clicking a link is conditionally independent of the probability of clicking other links
- If this assumption does not hold for some links, we can:
  - group the correlated links into disjoint sets and use the joint probabilities associated with these groups of links for belief revision, or
  - use aggregate group-level probability parameters to revise beliefs

Discussions
- Assumption: the user will follow one of the links being offered
- Other possibilities:
  - the user may leave the site
  - the user may click the back button and select a different link
  - if a search engine is available on the site, the user may submit a query and navigate to the results page

Conclusion
- Presented a framework for modeling user profiles for targeting purposes
- Showed how a profile can be learnt implicitly from the links traversed
- Showed how the learning process can be expedited by dynamically determining the offer set at each iteration
- Data requirements are reasonable
- The approach is not computationally intensive

On-going Work
- Solution approaches to the optimal offer set selection problem; refining the heuristic
- Validating the models
- Extending the model to learn multiple attributes simultaneously

Thank you!