WEB PERSONALIZATION
NLP Course Seminar, Group 14
Vishaal Jatav (04d05013), Varun Garg (04d05015)
Roadmap
- Motivation
- Introduction
- The Personalization Process
- Personalization Approaches
- Personalization Techniques
- Issues
- Conclusion
Motivation – Some Facts
- Overwhelming amount of information on the web
- Not all documents are relevant to the user
- Users cannot fully convey their information needs
- Users never find any document 100% relevant
- Users expect more personal behavior
  - I don't want results for Delhi when I am in Bombay.
  - I was looking for a crane (the bird), not a crane (the machine).
Google Customization
Google (without personalization)
Google (with personalization)
Google Search History
Introduction
Personalization
- React differently to different users
- The system reacts the way the user wants it to
- Ultimately brings the user back to the system
Web Personalization
- Apply machine learning and data mining
- Build models of user behavior (called profiles)
- Predict the user's needs and expectations
- Adaptively estimate better models
The Personalization Process
Consider the following pieces of information:
- Geographical location
- Age, gender, ethnicity, religion, etc.
- Interests
- Previous reviews on products
- ...
How could these pieces of information help?
How can this information be collected?
The Personalization Process (contd.)
- Collect lots of information on user behavior
  - The information must be attributable to a single user
- Decide on a user model
  - Featuring user needs, lifestyle, situations, etc.
- Create a user profile for each user of the system
  - The profile captures the individuality of the user: habits, browsing behavior, lifestyle, etc.
- With every interaction, modify the user profile
The Personalization Process – More Formally
- The web is a collection of n items I = {i_1, i_2, ..., i_n}
- Users come from a set U = {u_1, u_2, ..., u_m}
- Each user u_k rates items via r_{u_k}: I → [0,1] ∪ {⊥}, where r_{u_k}(i_j) = ⊥ means i_j has not been rated by that user
- I_k^u is the set of items not yet rated by user u_k
- I_k^r is the set of items rated by user u_k
- GOAL: recommend to the active user u_a items i_j in I_a^u that might be of interest to them
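A minimal sketch of this setup as data structures (the dictionary-of-dictionaries rating store, item names, and user names below are illustrative assumptions, not from the slides):

    ratings = {              # r_{u_k}: missing keys play the role of the "not rated" symbol
        "u1": {"i1": 0.9, "i3": 0.4},
        "u2": {"i1": 0.8, "i2": 0.6},
    }
    all_items = {"i1", "i2", "i3", "i4"}

    def rated(user):         # I_k^r: items the user has rated
        return set(ratings.get(user, {}))

    def unrated(user):       # I_k^u: items not yet rated, i.e. the recommendation candidates
        return all_items - rated(user)

    print(unrated("u1"))     # candidate items for u1 (set order may vary)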
Classification of Personalization Approaches
- Individual vs. Collaborative
- Reactive vs. Proactive
- User vs. Item Information
Classification of Personalization Approaches – Individual vs. Collaborative
Individual approach (e.g. Google Personalized Search)
- Uses only the individual user's data
- Generates the user profile by analyzing
  - the user's browsing behavior
  - the user's active feedback on the system
- Advantage: can be implemented on the client side – no privacy violation
- Disadvantage: based only on past interactions – lack of serendipity
Classification of Personalization Approaches – Individual vs. Collaborative (contd.)
Collaborative approach (e.g. Amazon recommendations)
- Find the neighborhood of the active user
- React according to an assumption: if A is like B, then A likes the same things B likes
- Disadvantages: new-item rating problem, new-user problem
- Advantage: better than the individual approach once these two problems are solved
Classification of Personalization Approaches – Reactive vs. Proactive
Reactive approach
- Explicitly ask the user for preferences, either as a query or as feedback
Proactive approach
- Learn user preferences from user behavior
- No explicit preferences are demanded from the user
- Behavior is extracted from click-through rates and navigational patterns
Classification of Personalization Approaches – User vs. Item Information
User information
- Geographic location (from the IP address)
- Age, gender, marital status, etc. (explicit query)
- Lifestyle, etc. (inferred from past behavior)
Item information
- Content or topics – movie genre, etc.
- Product/domain ontology
Personalization Techniques
- Content-Based Filtering
- Collaborative Filtering
- Model-Based Personalization
  - Rule-based
  - Graph-theoretic
  - Language model
Content-Based Filtering
Syskill and Webert uses explicit feedback
- Individual, reactive, item information
- Uses naïve Bayes to distinguish likes from dislikes
- Initial probabilities are updated with new interactions
- Uses the 128 most informative words from each item
Letizia uses implicit feedback
- Individual, proactive, item information
- Finds likes/dislikes based on tf-idf similarity
Other systems use nearest-neighbor methods for similarity
(A tf-idf scoring sketch follows.)
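As an illustration of the tf-idf similarity idea (a sketch only, not the Syskill and Webert or Letizia implementation; the example pages and the scikit-learn dependency are assumptions), unseen pages can be scored by their cosine similarity to pages the user liked:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    liked_pages = ["birds cranes wetlands migration", "heron stork crane wildlife"]
    unseen_pages = ["crane rental construction machinery", "crane birds nesting season"]

    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(liked_pages + unseen_pages)
    liked_vecs, unseen_vecs = matrix[:len(liked_pages)], matrix[len(liked_pages):]

    # Score each unseen page by its best similarity to any liked page.
    scores = cosine_similarity(unseen_vecs, liked_vecs).max(axis=1)
    for page, score in sorted(zip(unseen_pages, scores), key=lambda x: -x[1]):
        print(f"{score:.2f}  {page}")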
Collaborative Filtering
- Found to be successful in recommendation systems
General technique
- For every user, a user neighborhood is computed
  - The neighborhood contains users who have rated many of the same items similarly
- Get candidate items for recommendation
  - Items seen by the neighborhood but not by the active user u_a
- Data is stored in the form of a rating matrix, with items as rows and users as columns
Collaborative Filtering (contd.)
The system must provide the following algorithms:
- Measuring similarity between users
  - For creation of the neighborhood
  - Pearson and Spearman correlation, cosine similarity, etc.
- Predicting the rating of an item not rated by the user
  - To decide the order in which these items will be presented
  - Weighted sum of the neighbors' ratings – most common
- Selecting a neighborhood subset for prediction
  - To reduce the large amount of computation
  - Threshold on the similarity value – most common
(The three steps are combined in the sketch below.)
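A minimal user-based collaborative filtering sketch using these choices (Pearson similarity, a similarity threshold for the neighborhood, a similarity-weighted sum); the rating data and threshold value are illustrative assumptions:

    from math import sqrt

    ratings = {
        "alice": {"i1": 5.0, "i2": 3.0, "i3": 4.0},
        "bob":   {"i1": 4.0, "i2": 3.0, "i3": 5.0, "i4": 4.0},
        "carol": {"i1": 1.0, "i2": 5.0, "i4": 2.0},
    }

    def pearson(u, v):
        # Pearson correlation over the items both users have rated.
        common = ratings[u].keys() & ratings[v].keys()
        if len(common) < 2:
            return 0.0
        ru = [ratings[u][i] for i in common]
        rv = [ratings[v][i] for i in common]
        mu, mv = sum(ru) / len(ru), sum(rv) / len(rv)
        num = sum((a - mu) * (b - mv) for a, b in zip(ru, rv))
        den = sqrt(sum((a - mu) ** 2 for a in ru)) * sqrt(sum((b - mv) ** 2 for b in rv))
        return num / den if den else 0.0

    def predict(user, item, threshold=0.0):
        # Neighborhood = users above the similarity threshold who have rated the item.
        neighbours = [(pearson(user, v), v) for v in ratings
                      if v != user and item in ratings[v]]
        neighbours = [(s, v) for s, v in neighbours if s > threshold]
        if not neighbours:
            return None
        # Similarity-weighted sum of the neighbours' ratings.
        return sum(s * ratings[v][item] for s, v in neighbours) / sum(s for s, _ in neighbours)

    print(predict("alice", "i4"))  # predicted rating for an item alice has not seen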
Model-Based Personalization Approaches
Executed in two stages
- Offline process – to create the actual model
- Online process – applying the model during user interaction
Common data used for model generation
- Web usage data (web history, click-through rates, etc.)
- Items' structure and content data
Examples
- Rule-based models
- Graph-theoretic models
- Language models
Model-Based Personalization – Rule-Based Models
Association rule-based
- Item i_a is in an unordered association with i_b
- If the user considers i_b, then i_a is a good recommendation
Sequence rule-based
- Item i_a is in a sequential association with i_b
- If the user considers i_a, then i_b is a good recommendation
Associations between items can be stored as a dependency graph
(A toy rule-lookup sketch follows.)
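A toy sketch of applying association rules online (the rules, confidences, and item names below are assumptions; in practice the rules come from an offline mining step):

    # rules[antecedent_item] = [(consequent_item, confidence), ...]
    rules = {
        "printer": [("toner", 0.8), ("paper", 0.6)],
        "camera":  [("memory_card", 0.7)],
    }

    def recommend(session_items, k=3):
        candidates = {}
        for item in session_items:
            for consequent, confidence in rules.get(item, []):
                if consequent not in session_items:
                    candidates[consequent] = max(candidates.get(consequent, 0.0), confidence)
        # Present the candidates in order of rule confidence.
        return sorted(candidates, key=candidates.get, reverse=True)[:k]

    print(recommend({"printer", "camera"}))  # ['toner', 'memory_card', 'paper']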
Model-Based Personalization – Graph-Theoretic Model
- Rating data is transformed into a directed graph
- Nodes are users
- An edge between u_i and u_j means that u_i predicts u_j
- Weights on the edges represent the predictability
To predict whether an item i_k will be of interest to u_i:
- Calculate the shortest path from u_i to any user u_r who has rated i_k
- The predicted rating is calculated as a function of the path between u_i and u_r
(A small prediction sketch follows.)
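A small sketch of this prediction step. The slide leaves the "function of the path" open, so the hop-count damping below, and the example graph and ratings, are assumptions only:

    from collections import deque

    # edges[u] = users that u is connected to by a "predicts" edge
    edges = {"u1": ["u2", "u3"], "u2": ["u4"], "u3": [], "u4": []}
    ratings = {"u4": {"i7": 4.0}, "u3": {"i2": 2.0}}

    def predict(user, item):
        # Breadth-first search for the nearest user who has rated the item.
        seen, queue = {user}, deque([(user, 0)])
        while queue:
            node, dist = queue.popleft()
            if item in ratings.get(node, {}):
                # Assumed "function of the path": damp the rating by path length.
                return ratings[node][item] / (1 + dist)
            for nxt in edges.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
        return None

    print(predict("u1", "i7"))  # u1 -> u2 -> u4, path length 2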
Model-Based Personalization – Language Modeling Approaches
Without using the user's relevance feedback
- Simple language modeling
Using the user's relevance feedback
- N-gram based methods
- Noisy channel model based method
Language Model Approach – Simple Language Modeling
Without using the user's feedback
- The history consists of all the words in the user's past queries
- Learn the user profile as {(w_1, P(w_1)), ..., (w_n, P(w_n))}, where P(w_i) is estimated from the history, e.g. as the relative frequency count(w_i) / Σ_j count(w_j)
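A small sketch of building such a unigram profile from past queries (assuming maximum-likelihood, i.e. relative-frequency, estimates and made-up queries):

    from collections import Counter

    past_queries = ["crane bird photos", "migratory bird watching", "crane habitat"]
    counts = Counter(w for q in past_queries for w in q.lower().split())
    total = sum(counts.values())
    user_profile = {w: c / total for w, c in counts.items()}
    print(user_profile["crane"])  # 2 occurrences out of 8 words = 0.25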
Language Model Approach – Simple Language Modeling: sample user profile (table not reproduced)
Language Model Approach – Simple Language Modeling: Re-ranking of Unpersonalized Results
- Re-ranking is done according to P(Q | D, u)
- α is a weighting parameter between 0 and 1
- UP is the user profile
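The slide does not reproduce the exact formula, so the sketch below assumes the common linear interpolation P(q_i | D, u) = α · P(q_i | D) + (1 − α) · P(q_i | UP), multiplied over the query terms; the documents, profile, and α value are illustrative:

    from math import log

    def rerank(query, docs, user_profile, alpha=0.7, floor=1e-6):
        def doc_lm(doc):
            # Maximum-likelihood document language model P(w | D).
            words = doc.lower().split()
            return {w: words.count(w) / len(words) for w in set(words)}
        scored = []
        for doc in docs:
            lm = doc_lm(doc)
            score = sum(log(alpha * lm.get(q, 0.0) +
                            (1 - alpha) * user_profile.get(q, 0.0) + floor)
                        for q in query.lower().split())
            scored.append((score, doc))
        return [d for _, d in sorted(scored, reverse=True)]

    # user_profile could be the unigram profile built from past queries above.
    print(rerank("crane bird", ["crane rental services", "crane bird sanctuary"],
                 {"crane": 0.25, "bird": 0.25}))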
Language Model Approach – N-gram Based Approach
Using the user's relevance feedback
Learn the user profile:
- Let H_u represent the search history of user u: H_u = {(q_1, rf_1), (q_2, rf_2), ..., (q_n, rf_n)}, where rf_i is the relevance feedback for query q_i
- Unigram: the user profile now consists of {(w_1, P(w_1)), (w_2, P(w_2)), ..., (w_n, P(w_n))}
Language Model Approach – N-gram Based Approach: sample unigram user profile (table not reproduced)
Language Model Approach – N-gram Based Approach: Bigram
The user profile consists of {(w_1 w_2, P(w_2 | w_1)), (w_2 w_3, P(w_3 | w_2)), ..., (w_{n-1} w_n, P(w_n | w_{n-1}))}
(A short estimation sketch follows.)
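A small sketch of estimating such a bigram profile (assuming conditional relative-frequency counts over text the user gave relevance feedback on; the feedback text is made up):

    from collections import Counter

    feedback_text = "crane birds nest near water and crane birds migrate"
    words = feedback_text.lower().split()
    bigrams = Counter(zip(words, words[1:]))
    unigrams = Counter(words[:-1])
    # P(w2 | w1) = count(w1 w2) / count(w1)
    bigram_profile = {(w1, w2): c / unigrams[w1] for (w1, w2), c in bigrams.items()}
    print(bigram_profile[("crane", "birds")])  # 1.0: "crane" is always followed by "birds"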
Language Model Approach – N-gram Based Approach: sample bigram user profile (table not reproduced)
Language Model Approach – N-gram Based Approach: Re-ranking Unpersonalized Results
Based on unigrams (α = weighting parameter):
- Q = q_1 q_2 q_3 ... q_n
- P(q_1 q_2 q_3 ... q_n) = P(q_1) · P(q_2) · P(q_3) · ... · P(q_n)
Language Model Approach – N-gram Based Approach (contd.)
Based on bigrams:
- Q = q_1 q_2 q_3 ... q_n
- P(q_1 q_2 q_3 ... q_n) = P(q_1) · P(q_2 | q_1) · P(q_3 | q_2) · ... · P(q_n | q_{n-1})
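A sketch of evaluating this chain-rule probability against the user's profiles; interpolating toward the unigram profile when a bigram is unseen is an added assumption, as are the example profiles:

    def query_likelihood(query, unigram_profile, bigram_profile, alpha=0.6, floor=1e-6):
        q = query.lower().split()
        score = unigram_profile.get(q[0], floor)        # P(q_1)
        for prev, cur in zip(q, q[1:]):
            p_bi = bigram_profile.get((prev, cur), 0.0)  # P(q_i | q_{i-1})
            p_uni = unigram_profile.get(cur, 0.0)
            score *= alpha * p_bi + (1 - alpha) * p_uni + floor
        return score

    uni = {"crane": 0.25, "birds": 0.25, "migrate": 0.125}
    bi = {("crane", "birds"): 1.0}
    print(query_likelihood("crane birds", uni, bi))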
Language Model Approach – Noisy Channel Based Approach
Using the user's feedback (implicit)
- The user history is represented as H_u = (Q_1, D_1), (Q_2, D_2), ..., (Q_N, D_N)
  - D_i is the document visited for Q_i
  - D consists of words w_1, w_2, ..., w_m
Basic idea – statistical machine translation
- Given parallel text of languages S and T
- We get P(t_i | s_i) ∀ s_i ∈ S and t_i ∈ T
- Using EM we get the optimized model P(T | S)
Language Model Approach – Noisy Channel Based Approach (contd.)
- Similarly, T = past queries Q_1, Q_2, ..., Q_K and S = text of the relevant documents for the queries in T
- We learn the model P(Q | D), or more precisely P(q_i | w_j)
Assumption
- Translate the ideal (information-containing) document into a query
- Document – a verbose language; query – a compact language
- The user profile is stored as tuples of word pairs with their translation probabilities
(An EM sketch follows.)
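A compact EM sketch in the spirit of this translation-model idea. The slides cite statistical machine translation but do not fix the exact model, so treating it as IBM Model 1 without a NULL word, on made-up (document, query) pairs, is an assumption:

    from collections import defaultdict

    pairs = [  # (document words, query words) built from the user's history
        (["crane", "bird", "habitat"], ["crane", "bird"]),
        (["bird", "migration", "routes"], ["bird", "migration"]),
    ]

    # Uniform initialisation of the translation probabilities P(query word | document word).
    q_vocab = {q for _, qs in pairs for q in qs}
    t = defaultdict(lambda: 1.0 / len(q_vocab))

    for _ in range(10):  # EM iterations
        count = defaultdict(float)  # expected co-occurrence counts
        total = defaultdict(float)
        for doc_words, query_words in pairs:
            for q in query_words:
                norm = sum(t[(q, w)] for w in doc_words)
                for w in doc_words:
                    frac = t[(q, w)] / norm  # expected alignment probability
                    count[(q, w)] += frac
                    total[w] += frac
        t = defaultdict(lambda: 1e-9,
                        {(q, w): count[(q, w)] / total[w] for (q, w) in count})

    print(round(t[("bird", "bird")], 3))  # learned P(bird | bird) in the user's profile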
Language Model Approach – Noisy Channel Based Approach: sample noisy channel user profile (table not reproduced)
Language Model Approach – Noisy Channel Based Approach: Re-ranking
- Re-rank the documents using P(Q | D, u)
- α = weighting parameter
- P(q_i | GE) is the lexical probability of q_i under a general-English background model
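The smoothing formula itself is not reproduced here; one common form, given only as an assumption, interpolates the learned translation model with the background model:

    P(q_i | D, u) = α · Σ_{w in D} P(q_i | w) · P(w | D) + (1 − α) · P(q_i | GE)
    P(Q | D, u) = Π_i P(q_i | D, u)

where P(q_i | w) comes from the user's translation-model profile and P(w | D) from the document's own language model.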
Issues in Personalization
- Cold-start problem (new-user problem)
- Latency problem (new-item problem)
- Data sparseness
- Scalability
- Privacy
- Recommendation list diversity
- Robustness
Conclusion
- Web personalization is the need of the hour for e-businesses
- It is a relatively new research topic; several issues are yet to be solved effectively
- Data should be collected without invading user privacy
- Creating user models effectively, and scaling them to large numbers of users and items, is at the core of personalization
Bibliography
- Rohini U, Vamshi Ambati and Vasudeva Varma. Statistical Machine Translation Models for Personalized Search. In Proceedings of the 3rd International Joint Conference on Natural Language Processing (IJCNLP 2008), January 7-12, 2008, Hyderabad, India.
- Sarabjot S. Anand and Bamshad Mobasher. Intelligent Techniques for Web Personalization. In Intelligent Techniques for Web Personalization, pages 1-36. Springer, 2005.
- Vasudeva Varma. Personalization in Information Retrieval, Extraction and Access. In Workshop on Ontology, NLP, Personalization and IE/IR, IIT Bombay, Mumbai, 15-17 July 2008.
- http://en.wikipedia.org/wiki/Personalisation
- Snapshots from Google Inc.
Questions