1
Learning Profiles from User Interactions
Pelin Atahan and Sumit Sarkar
School of Management, The University of Texas at Dallas
2
Introduction
- Personalization systems tailor content and services to individuals
- Consider a vendor selling products through its website
  - Personalize recommendations
  - Learn profiles based on links visited by a user
- Example: a user visits a link (l) to which 70% of visitors are male
  - predict the user is male with probability 0.7
  - revise this probability as the user navigates through the website, i.e., clicks on other links
3
Research Framework
- Learn profiles for targeting purposes
- Personal profiles: demographic, psychographic, geographic attributes
  - predetermined set of attributes, e.g., gender, income, risk taker
- Profile representation: attribute values with relevant probabilities (probabilistic representation)
  - for attribute “gender” (G), the profile may be represented as P(G=m│l)=0.7 and P(G=f│l)=0.3
  - for attribute “risk taker” (R) with values risk taker (r) and conservative (c), P(R=r│l)=0.6 and P(R=c│l)=0.4
4
Data Requirements
- Data requirements: link-level statistics only (for all links)
  - examples: P(G=m│“finance” link)=0.7, P(G=f│“finance” link)=0.3, P(R=r│“finance” link)=0.6, and P(R=c│“finance” link)=0.4
- Data can be acquired from one of the following sources
  - registered users, if available
  - sampling: explicitly asking a subset of users
  - professional market research agencies such as comScore, Claritas, and Nielsen/Net Ratings
5
Research Problems
- Learn the personal profile of a user based on links traversed during a session
- Two types of learning considered
  - Learning profiles passively by observing links traversed
  - Learning profiles quickly by dynamically determining links available on a page
6
Literature Review
- Prior work primarily studies profiling in an information retrieval context
  - user interests: identifying interesting pages based on pages visited
  - profiles represented as feature (term) vectors
- Montgomery (2001) addresses learning demographic profiles from websites visited by a user
  - the approach is faulty (conditioning is incorrect)
- Baglioni et al. (2003) address identifying the gender of a user based on links visited
  - consider a subset of pages
  - apply several classification models
7
Passive Learning
- Suppose Yahoo wants to learn the gender of a user who is traversing its website
- The user clicks on the following links
  - the “finance” link (l1)
  - the “investing ideas” link (l2)
  - the “insurance” link (l3)
  - the “sports” link (l4)
- Problem: determine the probability that the visitor is male (or female) given this clickstream {l1, l2, l3, l4}, i.e., P(G=m│l1, l2, l3, l4)
- In general, for attribute A and clickstream {l1, l2, …, ln}, determine P(A=ai│l1, l2, …, ln)
8
Passive Learning Cont’d
- Use Bayes' formula to express P(A=ai│l1, l2, …, ln) in terms of P(l1, l2, …, ln│ai) and the prior P(ai)
- Assume conditional independence: when the user profile is known, the probability of clicking a link is independent of the probability of clicking another link
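The equations on this slide are not reproduced in the text. A reconstruction from the two stated steps (Bayes' formula over the attribute values, then conditional independence) would read as follows; the index k over attribute values is notation assumed here.

```latex
\[
P(a_i \mid l_1, \dots, l_n)
  = \frac{P(l_1, \dots, l_n \mid a_i)\, P(a_i)}
         {\sum_{k} P(l_1, \dots, l_n \mid a_k)\, P(a_k)},
\qquad
P(l_1, \dots, l_n \mid a_i) = \prod_{j=1}^{n} P(l_j \mid a_i)
\]
```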
9
Passive Learning Cont’d
- Directly estimating P(lj│ai) is difficult
- From Bayes' formula, P(lj│ai) can be rewritten in terms of the link-level statistic P(ai│lj), the link probability P(lj), and the prior P(ai)
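The two equations referred to here are missing from the text. Applying Bayes' formula to a single link and solving for the click likelihood gives, as a likely reconstruction:

```latex
\[
P(a_i \mid l_j) = \frac{P(l_j \mid a_i)\, P(a_i)}{P(l_j)}
\quad\Longrightarrow\quad
P(l_j \mid a_i) = \frac{P(a_i \mid l_j)\, P(l_j)}{P(a_i)}
\]
```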
10
Passive Learning Cont’d
- After algebraic manipulation, the posterior P(ai│l1, l2, …, ln) depends only on the link-level statistics P(ai│lj) and the priors P(ai)
- We can learn the customer profile from simple link statistics
- The process is not computationally intensive
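The resulting expression is not shown in the text. Under the conditional independence assumption, substituting P(lj│ai) = P(ai│lj) P(lj) / P(ai) into Bayes' formula makes the P(lj) factors cancel between numerator and denominator. This reconstruction (with k indexing attribute values, an assumed notation) reproduces the 0.91 of the illustrative example:

```latex
\[
P(a_i \mid l_1, \dots, l_n)
  = \frac{\prod_{j=1}^{n} P(a_i \mid l_j) \,\big/\, P(a_i)^{\,n-1}}
         {\sum_{k} \Bigl[ \prod_{j=1}^{n} P(a_k \mid l_j) \,\big/\, P(a_k)^{\,n-1} \Bigr]}
\]
```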
11
Illustrative Example
- Consider the following site priors and link probabilities
  - site priors: P(m)=0.45, P(f)=0.55
  - finance link (l1): P(m│l1)=0.6, P(f│l1)=0.4
  - investing ideas link (l2): P(m│l2)=0.7, P(f│l2)=0.3
  - insurance link (l3): P(m│l3)=0.4, P(f│l3)=0.6
  - sports link (l4): P(m│l4)=0.7, P(f│l4)=0.3
- Result: P(m│l1, l2, l3, l4)=0.91 and P(f│l1, l2, l3, l4)=0.09
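The numbers in this example can be checked with a short script. This is a minimal sketch, assuming the posterior is proportional to the product of the link-level probabilities P(ai│lj) divided by the prior raised to the power n-1 (the form implied by Bayes' formula plus conditional independence). The function name `posterior` and the dictionary layout are illustrative choices, not part of the talk.

```python
from math import prod

def posterior(priors, link_probs):
    """Posterior over attribute values given a clickstream.

    priors:     {value: P(a)} site-level priors
    link_probs: one {value: P(a | l_j)} dict per clicked link
    """
    n = len(link_probs)
    # Unnormalized score: prod_j P(a | l_j) / P(a)^(n-1)
    scores = {
        a: prod(lp[a] for lp in link_probs) / priors[a] ** (n - 1)
        for a in priors
    }
    total = sum(scores.values())
    return {a: s / total for a, s in scores.items()}

priors = {"m": 0.45, "f": 0.55}
clicks = [
    {"m": 0.6, "f": 0.4},  # finance (l1)
    {"m": 0.7, "f": 0.3},  # investing ideas (l2)
    {"m": 0.4, "f": 0.6},  # insurance (l3)
    {"m": 0.7, "f": 0.3},  # sports (l4)
]
profile = posterior(priors, clicks)
print({a: round(p, 2) for a, p in profile.items()})  # {'m': 0.91, 'f': 0.09}
```

Only the priors and one probability per link and attribute value are needed, which matches the claim that link-level statistics suffice.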
12
Learning Profiles in Real Time
- What happens when the user clicks on a new link, e.g., the NBA scoreboard link (l5)?
- Incremental belief revision
- LH denotes the link history (links clicked prior to the last click)
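The revision formula itself is missing from the text. Under the same conditional independence assumption, the current belief P(ai│LH) plays the role of the prior for the next click. A reconstruction, where l_new (the newly clicked link) and the index k over attribute values are assumed notation, would be:

```latex
\[
P(a_i \mid \mathit{LH}, l_{\text{new}})
  = \frac{P(a_i \mid \mathit{LH})\, P(a_i \mid l_{\text{new}}) \,/\, P(a_i)}
         {\sum_{k} P(a_k \mid \mathit{LH})\, P(a_k \mid l_{\text{new}}) \,/\, P(a_k)}
\]
```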
13
Incremental Revision Example
- P(m│LH, l5)=?
- P(m│LH)=P(m│l1, l2, l3, l4)=0.91 and P(f│LH)=0.09
- P(m│l5)=0.65 and P(f│l5)=0.35
- P(m│LH, l5)=0.96 and P(f│LH, l5)=0.04
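The revised values can be reproduced with a one-step update. This is a sketch, assuming each new click multiplies the current belief by P(ai│l5)/P(ai) and renormalizes, with the site priors P(m)=0.45 and P(f)=0.55 taken from the earlier illustrative example. The function name `revise` is hypothetical.

```python
def revise(belief, link_prob, priors):
    """One incremental belief revision step.

    belief:    {value: P(a | LH)} current belief given the link history
    link_prob: {value: P(a | l_new)} statistics of the newly clicked link
    priors:    {value: P(a)} site-level priors
    """
    scores = {a: belief[a] * link_prob[a] / priors[a] for a in belief}
    total = sum(scores.values())
    return {a: s / total for a, s in scores.items()}

priors = {"m": 0.45, "f": 0.55}
belief = {"m": 0.91, "f": 0.09}            # P(. | LH) after four clicks
nba_scoreboard = {"m": 0.65, "f": 0.35}    # P(. | l5)

updated = revise(belief, nba_scoreboard, priors)
print({a: round(p, 2) for a, p in updated.items()})  # {'m': 0.96, 'f': 0.04}
```

Because only the current belief is carried forward, the full clickstream does not need to be stored or reprocessed.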
14
Active Learning of User Profiles
- By learning profiles quickly, websites start getting the benefits sooner
- Learning is the reduction in uncertainty of profile attributes
- Our objective: learn profiles quickly by carefully selecting the links to offer on each page (the offer set)
- The information value of an offer set is measured as the expected information gain
  - The number of links to offer (n) is predetermined
  - Assume the user will click one of the links available
- Stop learning when the expected additional information is not statistically significant
15
Click Probabilities Conditional on an Offer Set
- Offer set O={o1, o2, …, on}
- We estimate P’(lj│ai), the probability of clicking link lj given the offer set, for each attribute value and each link in the offer set
- P’(lj│ai) is obtained from Bayes' rule
- We need some measure of the likelihood of a link being clicked, P(lj)
  - it does not need to be absolute; a relative measure is sufficient
  - e.g., the number of clicks a link gets per month
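The Bayes' rule expression is not reproduced in the text. One reconstruction that is consistent with the numbers in the later offer-set example restricts the click likelihoods to the offer set and renormalizes. The index r over offered links is assumed notation.

```latex
\[
P'(l_j \mid a_i)
  = \frac{P(l_j \mid a_i)}{\sum_{o_r \in O} P(o_r \mid a_i)}
  = \frac{P(a_i \mid l_j)\, P(l_j)}{\sum_{o_r \in O} P(a_i \mid o_r)\, P(o_r)},
\qquad l_j \in O
\]
```

Since P(lj) appears in both numerator and denominator, any common scale factor cancels, which is why a relative measure such as monthly click counts is sufficient.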
16
Belief Revision Conditional on an Offer Set
- Manipulating the above expression gives the revised belief P’(ai│LH, lj)
- P’(ai│LH) corresponds to the prior on the attribute value at each iteration
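The revised-belief expression is missing from the text. A plausible reconstruction applies Bayes' rule with the offer-set click probabilities P’(lj│ai) as likelihoods and P’(ai│LH) as the prior; the index k over attribute values is assumed notation.

```latex
\[
P'(a_i \mid \mathit{LH}, l_j)
  = \frac{P'(l_j \mid a_i)\, P'(a_i \mid \mathit{LH})}
         {\sum_{k} P'(l_j \mid a_k)\, P'(a_k \mid \mathit{LH})}
\]
```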
17
Information Gain Given a Link is Clicked
- Information gain: defined as the reduction in entropy of the attribute's distribution given a link is clicked
- Entropy prior to a click
- Entropy given a link is clicked
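The entropy expressions are not shown in the text. Using the standard Shannon entropy, they would read as follows. The symbols H and I are assumed notation, and base-2 logarithms are assumed because they reproduce the gains reported in the later offer-set example.

```latex
\[
\begin{aligned}
H(A \mid \mathit{LH}) &= -\sum_{i} P'(a_i \mid \mathit{LH}) \log_2 P'(a_i \mid \mathit{LH}) \\
H(A \mid \mathit{LH}, l_j) &= -\sum_{i} P'(a_i \mid \mathit{LH}, l_j) \log_2 P'(a_i \mid \mathit{LH}, l_j) \\
I(A \mid l_j) &= H(A \mid \mathit{LH}) - H(A \mid \mathit{LH}, l_j)
\end{aligned}
\]
```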
18
Expected Information Gain Given an Offer Set
- When n links are offered, the expected information gain weights the gain from each link by its click probability
- P’(lj│LH) is the probability of a link being clicked given the offer set
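The expected-gain expression is missing from the text. A reconstruction that weights the entropy reduction for each offered link by its click probability is sketched here; H denotes Shannon entropy and is assumed notation, while EI follows the notation used in the offer-set example.

```latex
\[
\begin{aligned}
EI(A \mid l_j, O) &= \sum_{l_j \in O} P'(l_j \mid \mathit{LH})
   \bigl[ H(A \mid \mathit{LH}) - H(A \mid \mathit{LH}, l_j) \bigr], \\
P'(l_j \mid \mathit{LH}) &= \sum_{i} P'(l_j \mid a_i)\, P'(a_i \mid \mathit{LH})
\end{aligned}
\]
```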
19
Optimal Offer Set-One Step Look Ahead
- Prior entropy is constant given the link history
- Maximizing the expected information gain is therefore equivalent to choosing the offer set that minimizes the expected entropy
20
Illustrative Example
- The user has visited the “finance” link (LH): P(m│LH)=0.6, P(f│LH)=0.4
- There are three possible links to consider
  - “Investing ideas” link (l1): P(m│l1)=0.7, P(f│l1)=0.3, P(l1)=0.2
  - “Insurance” link (l2): P(m│l2)=0.4, P(f│l2)=0.6, P(l2)=0.3
  - “Family and home” link (l3): P(m│l3)=0.2, P(f│l3)=0.8, P(l3)=0.3
- Offer set size n=2, giving three possible offer sets: O1={o1, o2}, O2={o1, o3}, O3={o2, o3}
- Expected information gain: EI(G│lj, O1)=0.06, EI(G│lj, O2)=0.18, EI(G│lj, O3)=0.04
- Offering O2 is optimal
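The three expected gains can be reproduced numerically. This sketch makes two assumptions: click probabilities within an offer set are the joint P(ai│lj)·P(lj) renormalized over the offered links, and entropy uses base-2 logarithms. Function and variable names are illustrative.

```python
from itertools import combinations
from math import log2

def entropy(dist):
    """Shannon entropy (bits) of a {value: probability} distribution."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def expected_gain(belief, offer, link_attr, link_pop):
    """Expected information gain of an offer set (one-step look-ahead)."""
    # P'(l | a): click probability per attribute value, normalized over the offer set
    click = {}
    for a in belief:
        joint = {l: link_attr[l][a] * link_pop[l] for l in offer}
        norm = sum(joint.values())
        click[a] = {l: j / norm for l, j in joint.items()}
    expected_entropy = 0.0
    for l in offer:
        p_click = sum(click[a][l] * belief[a] for a in belief)   # P'(l | LH)
        post = {a: click[a][l] * belief[a] / p_click for a in belief}
        expected_entropy += p_click * entropy(post)
    return entropy(belief) - expected_entropy

belief = {"m": 0.6, "f": 0.4}                       # P(. | LH), LH = finance link
link_attr = {"l1": {"m": 0.7, "f": 0.3},            # investing ideas
             "l2": {"m": 0.4, "f": 0.6},            # insurance
             "l3": {"m": 0.2, "f": 0.8}}            # family and home
link_pop = {"l1": 0.2, "l2": 0.3, "l3": 0.3}        # P(l)

gains = {o: expected_gain(belief, o, link_attr, link_pop)
         for o in combinations(link_attr, 2)}
best = max(gains, key=gains.get)
for offer, g in gains.items():
    print(offer, round(g, 2))
print("best:", best)
```

Under these assumptions the script gives 0.06, 0.18, and 0.04 for O1, O2, and O3, so the pair (l1, l3) is selected, in agreement with the slide.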
21
Determining the Optimal Offer Set
- The number of potential offer sets to evaluate could be very large
- For a site with M links and offer set size n, the number of possible combinations is M! / (n! (M−n)!)
- E.g., for M = 100 and n = 10, there are more than 17 trillion combinations
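The count for M = 100 and n = 10 can be confirmed with the standard library's binomial coefficient, which makes the cost of exhaustive evaluation concrete:

```python
from math import comb

M, n = 100, 10
offer_sets = comb(M, n)      # number of distinct n-link offer sets from M links
print(f"{offer_sets:,}")     # 17,310,309,456,440
```

Evaluating the expected entropy for each of these sets at every page view is impractical, which motivates a heuristic.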
22
Heuristic Approach to Determine the Optimal Offer Set
- Consider the expected entropy expression for learning the gender (n = 2)
- P’(lj, ai) is proportional to P(lj, ai), the joint distribution of the aggregate link probabilities
23
Heuristic Approach to Determine the Optimal Offer Set
- To select n links to offer
  - For each attribute value, select the link that maximizes P(ai, lj)
  - If more links are needed, evaluate links with the next highest joint probability
  - Continue until all n links have been determined
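One way to read these steps is as a round-robin over attribute values, each round picking the not-yet-chosen link with the highest joint probability P(ai, lj) = P(ai│lj)·P(lj). The sketch below follows that interpretation; the round-robin order, the function name, and the handling of already-chosen links are assumptions, not details given in the talk. The data are taken from the offer-set example.

```python
def heuristic_offer_set(link_attr, link_pop, n):
    """Greedy selection of n links by joint probability P(a, l) = P(a | l) * P(l).

    link_attr: {link: {value: P(a | link)}}
    link_pop:  {link: P(link)} (a relative measure is sufficient)
    """
    values = list(next(iter(link_attr.values())))   # attribute values, e.g. m, f
    n = min(n, len(link_attr))                      # cannot offer more links than exist
    chosen = []
    while len(chosen) < n:
        for a in values:
            if len(chosen) == n:
                break
            remaining = [l for l in link_attr if l not in chosen]
            # Link with the highest joint probability for this attribute value
            chosen.append(max(remaining, key=lambda l: link_attr[l][a] * link_pop[l]))
    return chosen

link_attr = {"l1": {"m": 0.7, "f": 0.3},
             "l2": {"m": 0.4, "f": 0.6},
             "l3": {"m": 0.2, "f": 0.8}}
link_pop = {"l1": 0.2, "l2": 0.3, "l3": 0.3}

print(heuristic_offer_set(link_attr, link_pop, 2))  # ['l1', 'l3']
```

On this small example the heuristic picks l1 for "male" and l3 for "female", the same pair that exhaustive evaluation identifies as optimal; that agreement is not guaranteed in general.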
24
Discussions
- Assumption: the probability of clicking a link is conditionally independent of the probability of clicking other links
- If this assumption does not hold for some links, we can
  - group the correlated links into disjoint sets,
  - use joint probabilities associated with these groups of links for belief revision, or
  - use aggregate group-level probability parameters to revise beliefs
25
Discussions
- Assumption: the user will follow one of the links being offered
- Other possibilities
  - the user may leave the site
  - the user may click the back button and select a different link
  - if there is a search engine available on the site, the user may submit a query and navigate to the results page
26
Conclusion
- Presented a framework for modeling user profiles for targeting purposes
- Showed how the profile can be learnt implicitly from the links traversed
- Showed how the learning process can be expedited by dynamically determining the offer set at each iteration
- Data requirements are reasonable
- Computationally not intensive
27
On-going Work
- Solution approaches to the optimal offer set selection problem: refine the heuristic
- Validate the models
- Extend the model to learn multiple attributes simultaneously
28
Thank you!