WEB PERSONALIZATION
NLP Course Seminar, Group 14
Vishaal Jatav (04d05013), Varun Garg (04d05015)
Roadmap
- Motivation
- Introduction
- The Personalization Process
- Personalization Approaches
- Personalization Techniques
- Issues
- Conclusion
Motivation – Some Facts
- Overwhelming amount of information on the web
- Not all documents are relevant to the user
- Users cannot fully convey their information needs
- Users never find any document 100% relevant
- Users expect more personal behavior
  - I don't want results for Delhi when I am in Bombay.
  - I was looking for a crane (the bird), not a crane (the machine).
Google Customization
Google (without personalization)
Google (with personalization)
Google Search History
Introduction
Personalization
- React differently to different users
- The system reacts the way the user wants it to
- Ultimately brings the user back to the system
Web Personalization
- Apply machine learning and data mining
- Build models of user behavior (called profiles)
- Predict the user's needs and expectations
- Adaptively estimate better models
The Personalization Process
Consider the following pieces of information:
- Geographical location
- Age, gender, ethnicity, religion, etc.
- Interests
- Previous reviews on products
- ...
How could these pieces of information help?
How can this information be collected?
The Personalization Process (contd.)
- Collect lots of information on user behavior
  - The information must be attributable to a single user
- Decide on a user model
  - Featuring user needs, lifestyle, situations, etc.
- Create a user profile for each user of the system
  - The profile captures the individuality of the user: habits, browsing behavior, lifestyle, etc.
- With every interaction, modify the user profile
The Personalization Process – More Formally
- The web is a collection of n items I = {i_1, i_2, ..., i_n}
- Users come from a set U = {u_1, u_2, ..., u_m}
- Each user u_k rates items via r_{u_k}: I → [0,1] ∪ {⊥}, where r_{u_k}(i_j) = ⊥ means i_j has not been rated by that user
- I_k^u is the set of items not yet rated by user u_k
- I_k^r is the set of items rated by user u_k
- GOAL: recommend to the active user u_a items i_j in I_a^u that might be of interest to them
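A minimal sketch of this setup as data structures (the dictionary-of-dictionaries rating store, item names, and user names below are illustrative assumptions, not from the slides):

    ratings = {              # r_{u_k}: missing keys play the role of the "not rated" symbol
        "u1": {"i1": 0.9, "i3": 0.4},
        "u2": {"i1": 0.8, "i2": 0.6},
    }
    all_items = {"i1", "i2", "i3", "i4"}

    def rated(user):         # I_k^r: items the user has rated
        return set(ratings.get(user, {}))

    def unrated(user):       # I_k^u: items not yet rated, i.e. the recommendation candidates
        return all_items - rated(user)

    print(unrated("u1"))     # candidate items for u1 (set order may vary)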
Classification of Personalization Approaches
- Individual vs. Collaborative
- Reactive vs. Proactive
- User vs. Item Information
Classification of Personalization Approaches – Individual vs. Collaborative
Individual approach (e.g. Google Personalized Search)
- Uses only the individual user's data
- Generates the user profile by analyzing
  - the user's browsing behavior
  - the user's active feedback on the system
- Advantage: can be implemented on the client side – no privacy violation
- Disadvantage: based only on past interactions – lack of serendipity
Classification of Personalization Approaches – Individual vs. Collaborative (contd.)
Collaborative approach (e.g. Amazon recommendations)
- Find the neighborhood of the active user
- React according to an assumption: if A is like B, then A likes the same things B likes
- Disadvantages: new-item rating problem, new-user problem
- Advantage: better than the individual approach once these two problems are solved
Classification of Personalization Approaches – Reactive vs. Proactive
Reactive approach
- Explicitly ask the user for preferences, either as a query or as feedback
Proactive approach
- Learn user preferences from user behavior
- No explicit preferences are demanded from the user
- Behavior is extracted from click-through rates and navigational patterns
Classification of Personalization Approaches – User vs. Item Information
User information
- Geographic location (from the IP address)
- Age, gender, marital status, etc. (explicit query)
- Lifestyle, etc. (inferred from past behavior)
Item information
- Content or topics – movie genre, etc.
- Product/domain ontology
Personalization Techniques
- Content-Based Filtering
- Collaborative Filtering
- Model-Based Personalization
  - Rule-based
  - Graph-theoretic
  - Language model
Content-Based Filtering
Syskill and Webert uses explicit feedback
- Individual, reactive, item information
- Uses naïve Bayes to distinguish likes from dislikes
- Initial probabilities are updated with new interactions
- Uses the 128 most informative words from each item
Letizia uses implicit feedback
- Individual, proactive, item information
- Finds likes/dislikes based on tf-idf similarity
Other systems use nearest-neighbor methods for similarity
(A tf-idf scoring sketch follows.)
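As an illustration of the tf-idf similarity idea (a sketch only, not the Syskill and Webert or Letizia implementation; the example pages and the scikit-learn dependency are assumptions), unseen pages can be scored by their cosine similarity to pages the user liked:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    liked_pages = ["birds cranes wetlands migration", "heron stork crane wildlife"]
    unseen_pages = ["crane rental construction machinery", "crane birds nesting season"]

    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(liked_pages + unseen_pages)
    liked_vecs, unseen_vecs = matrix[:len(liked_pages)], matrix[len(liked_pages):]

    # Score each unseen page by its best similarity to any liked page.
    scores = cosine_similarity(unseen_vecs, liked_vecs).max(axis=1)
    for page, score in sorted(zip(unseen_pages, scores), key=lambda x: -x[1]):
        print(f"{score:.2f}  {page}")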
Collaborative Filtering
- Found to be successful in recommendation systems
General technique
- For every user, a user neighborhood is computed
  - The neighborhood contains users who have rated many of the same items similarly
- Get candidate items for recommendation
  - Items seen by the neighborhood but not by the active user u_a
- Data is stored in the form of a rating matrix, with items as rows and users as columns
Collaborative Filtering (contd.)
The system must provide the following algorithms:
- Measuring similarity between users
  - For creation of the neighborhood
  - Pearson and Spearman correlation, cosine similarity, etc.
- Predicting the rating of an item not rated by the user
  - To decide the order in which these items will be presented
  - Weighted sum of the neighbors' ratings – most common
- Selecting a neighborhood subset for prediction
  - To reduce the large amount of computation
  - Threshold on the similarity value – most common
(The three steps are combined in the sketch below.)
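A minimal user-based collaborative filtering sketch using these choices (Pearson similarity, a similarity threshold for the neighborhood, a similarity-weighted sum); the rating data and threshold value are illustrative assumptions:

    from math import sqrt

    ratings = {
        "alice": {"i1": 5.0, "i2": 3.0, "i3": 4.0},
        "bob":   {"i1": 4.0, "i2": 3.0, "i3": 5.0, "i4": 4.0},
        "carol": {"i1": 1.0, "i2": 5.0, "i4": 2.0},
    }

    def pearson(u, v):
        # Pearson correlation over the items both users have rated.
        common = ratings[u].keys() & ratings[v].keys()
        if len(common) < 2:
            return 0.0
        ru = [ratings[u][i] for i in common]
        rv = [ratings[v][i] for i in common]
        mu, mv = sum(ru) / len(ru), sum(rv) / len(rv)
        num = sum((a - mu) * (b - mv) for a, b in zip(ru, rv))
        den = sqrt(sum((a - mu) ** 2 for a in ru)) * sqrt(sum((b - mv) ** 2 for b in rv))
        return num / den if den else 0.0

    def predict(user, item, threshold=0.0):
        # Neighborhood = users above the similarity threshold who have rated the item.
        neighbours = [(pearson(user, v), v) for v in ratings
                      if v != user and item in ratings[v]]
        neighbours = [(s, v) for s, v in neighbours if s > threshold]
        if not neighbours:
            return None
        # Similarity-weighted sum of the neighbours' ratings.
        return sum(s * ratings[v][item] for s, v in neighbours) / sum(s for s, _ in neighbours)

    print(predict("alice", "i4"))  # predicted rating for an item alice has not seen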
Model-Based Personalization Approaches
Executed in two stages
- Offline process – to create the actual model
- Online process – applying the model during user interaction
Common data used for model generation
- Web usage data (web history, click-through rates, etc.)
- Items' structure and content data
Examples
- Rule-based models
- Graph-theoretic models
- Language models
Model-Based Personalization – Rule-Based Models
Association rule-based
- Item i_a is in an unordered association with i_b
- If the user considers i_b, then i_a is a good recommendation
Sequence rule-based
- Item i_a is in a sequential association with i_b
- If the user considers i_a, then i_b is a good recommendation
Associations between items can be stored as a dependency graph
(A toy rule-lookup sketch follows.)
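A toy sketch of applying association rules online (the rules, confidences, and item names below are assumptions; in practice the rules come from an offline mining step):

    # rules[antecedent_item] = [(consequent_item, confidence), ...]
    rules = {
        "printer": [("toner", 0.8), ("paper", 0.6)],
        "camera":  [("memory_card", 0.7)],
    }

    def recommend(session_items, k=3):
        candidates = {}
        for item in session_items:
            for consequent, confidence in rules.get(item, []):
                if consequent not in session_items:
                    candidates[consequent] = max(candidates.get(consequent, 0.0), confidence)
        # Present the candidates in order of rule confidence.
        return sorted(candidates, key=candidates.get, reverse=True)[:k]

    print(recommend({"printer", "camera"}))  # ['toner', 'memory_card', 'paper']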
Model-Based Personalization – Graph-Theoretic Model
- Rating data is transformed into a directed graph
- Nodes are users
- An edge between u_i and u_j means that u_i predicts u_j
- Weights on the edges represent the predictability
To predict whether an item i_k will be of interest to u_i:
- Calculate the shortest path from u_i to any user u_r who has rated i_k
- The predicted rating is calculated as a function of the path between u_i and u_r
(A small prediction sketch follows.)
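A small sketch of this prediction step. The slide leaves the "function of the path" open, so the hop-count damping below, and the example graph and ratings, are assumptions only:

    from collections import deque

    # edges[u] = users that u is connected to by a "predicts" edge
    edges = {"u1": ["u2", "u3"], "u2": ["u4"], "u3": [], "u4": []}
    ratings = {"u4": {"i7": 4.0}, "u3": {"i2": 2.0}}

    def predict(user, item):
        # Breadth-first search for the nearest user who has rated the item.
        seen, queue = {user}, deque([(user, 0)])
        while queue:
            node, dist = queue.popleft()
            if item in ratings.get(node, {}):
                # Assumed "function of the path": damp the rating by path length.
                return ratings[node][item] / (1 + dist)
            for nxt in edges.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
        return None

    print(predict("u1", "i7"))  # u1 -> u2 -> u4, path length 2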
Model-Based Personalization – Language Modeling Approaches
Without using the user's relevance feedback
- Simple language modeling
Using the user's relevance feedback
- N-gram based methods
- Noisy channel model based method
Language Model Approach – Simple Language Modeling
Without using the user's feedback
- The history consists of all the words in the user's past queries
- Learn the user profile as {(w_1, P(w_1)), ..., (w_n, P(w_n))}, where P(w_i) is estimated from the history, e.g. as the relative frequency count(w_i) / Σ_j count(w_j)
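A small sketch of building such a unigram profile from past queries (assuming maximum-likelihood, i.e. relative-frequency, estimates and made-up queries):

    from collections import Counter

    past_queries = ["crane bird photos", "migratory bird watching", "crane habitat"]
    counts = Counter(w for q in past_queries for w in q.lower().split())
    total = sum(counts.values())
    user_profile = {w: c / total for w, c in counts.items()}
    print(user_profile["crane"])  # 2 occurrences out of 8 words = 0.25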
Language Model Approach – Simple Language Modeling: sample user profile (table not reproduced)
Language Model Approach – Simple Language Modeling: Re-ranking of Unpersonalized Results
- Re-ranking is done according to P(Q | D, u)
- α is a weighting parameter between 0 and 1
- UP is the user profile
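The slide does not reproduce the exact formula, so the sketch below assumes the common linear interpolation P(q_i | D, u) = α · P(q_i | D) + (1 − α) · P(q_i | UP), multiplied over the query terms; the documents, profile, and α value are illustrative:

    from math import log

    def rerank(query, docs, user_profile, alpha=0.7, floor=1e-6):
        def doc_lm(doc):
            # Maximum-likelihood document language model P(w | D).
            words = doc.lower().split()
            return {w: words.count(w) / len(words) for w in set(words)}
        scored = []
        for doc in docs:
            lm = doc_lm(doc)
            score = sum(log(alpha * lm.get(q, 0.0) +
                            (1 - alpha) * user_profile.get(q, 0.0) + floor)
                        for q in query.lower().split())
            scored.append((score, doc))
        return [d for _, d in sorted(scored, reverse=True)]

    # user_profile could be the unigram profile built from past queries above.
    print(rerank("crane bird", ["crane rental services", "crane bird sanctuary"],
                 {"crane": 0.25, "bird": 0.25}))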
Language Model Approach – N-gram Based Approach
Using the user's relevance feedback
Learn the user profile:
- Let H_u represent the search history of user u: H_u = {(q_1, rf_1), (q_2, rf_2), ..., (q_n, rf_n)}, where rf_i is the relevance feedback for query q_i
- Unigram: the user profile now consists of {(w_1, P(w_1)), (w_2, P(w_2)), ..., (w_n, P(w_n))}
Language Model Approach – N-gram Based Approach: sample unigram user profile (table not reproduced)
Language Model Approach – N-gram Based Approach: Bigram
The user profile consists of {(w_1 w_2, P(w_2 | w_1)), (w_2 w_3, P(w_3 | w_2)), ..., (w_{n-1} w_n, P(w_n | w_{n-1}))}
(A short estimation sketch follows.)
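A small sketch of estimating such a bigram profile (assuming conditional relative-frequency counts over text the user gave relevance feedback on; the feedback text is made up):

    from collections import Counter

    feedback_text = "crane birds nest near water and crane birds migrate"
    words = feedback_text.lower().split()
    bigrams = Counter(zip(words, words[1:]))
    unigrams = Counter(words[:-1])
    # P(w2 | w1) = count(w1 w2) / count(w1)
    bigram_profile = {(w1, w2): c / unigrams[w1] for (w1, w2), c in bigrams.items()}
    print(bigram_profile[("crane", "birds")])  # 1.0: "crane" is always followed by "birds"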
Language Model Approach – N-gram Based Approach: sample bigram user profile (table not reproduced)
Language Model Approach – N-gram Based Approach: Re-ranking Unpersonalized Results
Based on unigrams (α = weighting parameter):
- Q = q_1 q_2 q_3 ... q_n
- P(q_1 q_2 q_3 ... q_n) = P(q_1) · P(q_2) · P(q_3) · ... · P(q_n)
Language Model Approach – N-gram Based Approach (contd.)
Based on bigrams:
- Q = q_1 q_2 q_3 ... q_n
- P(q_1 q_2 q_3 ... q_n) = P(q_1) · P(q_2 | q_1) · P(q_3 | q_2) · ... · P(q_n | q_{n-1})
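A sketch of evaluating this chain-rule probability against the user's profiles; interpolating toward the unigram profile when a bigram is unseen is an added assumption, as are the example profiles:

    def query_likelihood(query, unigram_profile, bigram_profile, alpha=0.6, floor=1e-6):
        q = query.lower().split()
        score = unigram_profile.get(q[0], floor)        # P(q_1)
        for prev, cur in zip(q, q[1:]):
            p_bi = bigram_profile.get((prev, cur), 0.0)  # P(q_i | q_{i-1})
            p_uni = unigram_profile.get(cur, 0.0)
            score *= alpha * p_bi + (1 - alpha) * p_uni + floor
        return score

    uni = {"crane": 0.25, "birds": 0.25, "migrate": 0.125}
    bi = {("crane", "birds"): 1.0}
    print(query_likelihood("crane birds", uni, bi))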
Language Model Approach – Noisy Channel Based Approach
Using the user's feedback (implicit)
- The user history is represented as H_u = (Q_1, D_1), (Q_2, D_2), ..., (Q_N, D_N)
  - D_i is the document visited for Q_i
  - D consists of words w_1, w_2, ..., w_m
Basic idea – statistical machine translation
- Given parallel text of languages S and T
- We get P(t_i | s_i) ∀ s_i ∈ S and t_i ∈ T
- Using EM we get the optimized model P(T | S)
Language Model Approach – Noisy Channel Based Approach (contd.)
- Similarly, T = past queries Q_1, Q_2, ..., Q_K and S = text of the relevant documents for the queries in T
- We learn the model P(Q | D), or more precisely P(q_i | w_j)
Assumption
- Translate the ideal (information-containing) document into a query
- Document – a verbose language; query – a compact language
- The user profile is stored as tuples of word pairs with their translation probabilities
(An EM sketch follows.)
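A compact EM sketch in the spirit of this translation-model idea. The slides cite statistical machine translation but do not fix the exact model, so treating it as IBM Model 1 without a NULL word, on made-up (document, query) pairs, is an assumption:

    from collections import defaultdict

    pairs = [  # (document words, query words) built from the user's history
        (["crane", "bird", "habitat"], ["crane", "bird"]),
        (["bird", "migration", "routes"], ["bird", "migration"]),
    ]

    # Uniform initialisation of the translation probabilities P(query word | document word).
    q_vocab = {q for _, qs in pairs for q in qs}
    t = defaultdict(lambda: 1.0 / len(q_vocab))

    for _ in range(10):  # EM iterations
        count = defaultdict(float)  # expected co-occurrence counts
        total = defaultdict(float)
        for doc_words, query_words in pairs:
            for q in query_words:
                norm = sum(t[(q, w)] for w in doc_words)
                for w in doc_words:
                    frac = t[(q, w)] / norm  # expected alignment probability
                    count[(q, w)] += frac
                    total[w] += frac
        t = defaultdict(lambda: 1e-9,
                        {(q, w): count[(q, w)] / total[w] for (q, w) in count})

    print(round(t[("bird", "bird")], 3))  # learned P(bird | bird) in the user's profile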
Language Model Approach – Noisy Channel Based Approach: sample noisy channel user profile (table not reproduced)
Language Model Approach – Noisy Channel Based Approach: Re-ranking
- Re-rank the documents using P(Q | D, u)
- α = weighting parameter
- P(q_i | GE) is the lexical probability of q_i under a general-English background model
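The smoothing formula itself is not reproduced here; one common form, given only as an assumption, interpolates the learned translation model with the background model:

    P(q_i | D, u) = α · Σ_{w in D} P(q_i | w) · P(w | D) + (1 − α) · P(q_i | GE)
    P(Q | D, u) = Π_i P(q_i | D, u)

where P(q_i | w) comes from the user's translation-model profile and P(w | D) from the document's own language model.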
Issues in Personalization
- Cold-start problem (new-user problem)
- Latency problem (new-item problem)
- Data sparseness
- Scalability
- Privacy
- Recommendation list diversity
- Robustness
Conclusion
- Web personalization is the need of the hour for e-businesses
- It is a relatively new research topic; several issues are yet to be solved effectively
- Data should be collected without invading user privacy
- Creating user models effectively, and scaling them to large numbers of users and items, is at the core of personalization
Bibliography
- Rohini U, Vamshi Ambati and Vasudeva Varma. Statistical Machine Translation Models for Personalized Search. In Proceedings of the 3rd International Joint Conference on Natural Language Processing (IJCNLP 2008), January 7-12, 2008, Hyderabad, India.
- Sarabjot S. Anand and Bamshad Mobasher. Intelligent Techniques for Web Personalization. In Intelligent Techniques for Web Personalization, pages 1-36. Springer, 2005.
- Vasudeva Varma. Personalization in Information Retrieval, Extraction and Access. In Workshop on Ontology, NLP, Personalization and IE/IR, IIT Bombay, Mumbai, 15-17 July 2008.
- http://en.wikipedia.org/wiki/Personalisation
- Snapshots from Google Inc.
Questions