Recommender Systems
Session C
Robin Burke
DePaul University, Chicago, IL
Roadmap
Session A: Basic Techniques I
–Introduction
–Knowledge Sources
–Recommendation Types
–Collaborative Recommendation
Session B: Basic Techniques II
–Content-based Recommendation
–Knowledge-based Recommendation
Session C: Domains and Implementation I
–Recommendation domains
–Example Implementation
–Lab I
Session D: Evaluation I
–Evaluation
Session E: Applications
–User Interaction
–Web Personalization
Session F: Implementation II
–Lab II
Session G: Hybrid Recommendation
Session H: Robustness
Session I: Advanced Topics
–Dynamics
–Beyond accuracy
New schedule
Tuesday
–15:00-18:00 Session C and part of Session E
–18:00-20:00 Independent lab (programming)
Wednesday
–8:00-11:00 Session D (Evaluation)
–11:15-13:00 Rest of Session E
–14:30-16:00 Session H (Seminar room IST)
–17:00-19:00 Session G
Programming assignment
Thursday
–8:00-9:45 Session I
–10:00-11:00 Exam
Activity
With your partner, come up with a domain for recommendation
–cannot be: music, movies, books, restaurants
–can't already be the topic of your research
10 minutes
Domains?
Characteristics
Heterogeneity
–the diversity of the item space
Risk
–the cost associated with system error
Churn
–the frequency of changes in the item space
Interaction style
–how users interact with the recommender
Preference stability
–the lifetime of user preferences
Scrutability
–the requirement for transparency
Portfolio
–whether recommendation needs to take history into account
Novelty
–the need for novel / atypical items
Heterogeneity
How broad is the range of recommended items?
Example
–Netflix: movies and TV shows; diverse subject matter, but still essentially serving the same goal (entertainment); relatively homogeneous
–Amazon.com: everything from books to electronics to gardening tools; many different goals to be satisfied; relatively heterogeneous
Considerations
Homogeneous items can have standardized descriptions
–movies have actors, directors, plot summaries, etc.
–possible to develop a solid body of content data
Heterogeneous items will be harder to represent
Impact
Content knowledge is a problem in heterogeneous domains
–hard to develop a good schema that represents everything
–hard to cover all items with useful domain knowledge
Social knowledge is one's best bet
Risk
Some products are inherently low risk
–a 99-cent music track
Some are not
–a house
We mean
–the cost of a false positive accepted by the user
Sometimes false negatives are also costly
–scientific research
–legal precedents
Considerations
In a low-risk domain
–it doesn't matter so much how we choose
–the user is less likely to have strong constraints
In a high-risk domain
–it is important to gather more information about exactly what the requirements are
Impact
Pure social recommendation will not work so well in high-risk domains
–inability to take constraints into account
–possibility of bias
Knowledge-based recommendation has great potential in high-risk domains
–knowledge engineering costs are worthwhile
–the user's constraints can be employed
Churn
High churn means that items come and go quickly
–news
Low-churn items will be around for a while
–books
In the middle
–restaurants
–package vacations
Considerations
New item problem
–constant in high-churn domains
–consequence: difficult to build up a history of opinions
Freshness may matter
–a good match from yesterday might be worse than a weaker match from today
Impact
Difficult to employ social knowledge alone
–items won't have big enough profiles
Need a flexible representation for content data
–since catalog characteristics aren't known in advance
Interaction style
Some recommendation scenarios are passive
–the recommender produces content as part of a web site
Others are active
–the user makes a direct request
Sometimes a quick hit is important
–mobile applications
Sometimes more extensive exploration is called for
–rental apartments
Considerations
Passive style means user requirements are harder to tap into
–not necessarily impossible
Brief interaction
–means that only small amounts of information can be communicated
Long interaction
–like web-site browsing
–may make up for deficiencies in passive data gathering
Impact
Passive interactions
–favor learning-based methods
–don't need user requirements
Active interactions
–favor knowledge-based techniques
–other techniques don't adapt quickly
Extended, passive interaction
–allows large amounts of opinion data to be gathered
Preference stability
Are users' preferences stable over time?
Some taste domains may be consistent
–movies
–music (purchasing)
But others are not
–restaurants
–music (playlists)
Not the same as churn
–churn has to do with items coming and going
Considerations
Preference instability makes opinion data less useful
Approaches
–temporal decay
–contextual selection
Preference stability
–large profiles can be built
Impact
Preference instability
–opinion data will be sparse
–knowledge-based recommendation may be better
Preference stability
–best case for learning-based techniques
Scrutability
"The property of being testable; open to inspection."
–Wiktionary
Usually refers to the explanatory capabilities of a recommender system
Some domains need explanations of recommendations
–usually high-risk domains
–also domains where users are non-experts (complex products like digital cameras)
Considerations
Learning-based recommendations are hard to explain
–the underlying models are statistical
–some research in this area, but no conclusive "best way" to explain
Impact
Knowledge-based techniques are usually more scrutable
Portfolio
The "portfolio effect" occurs when an item is purchased or viewed
–and then is no longer interesting
Not always the case
–I can recommend your favorite song again in a couple of days
Sometimes recommendations have to take the entire history into account
–investments, for example
Considerations
A domain with the portfolio effect requires knowledge of the user's history
–the standard formulation of collaborative recommendation only recommends items that are unrated
A music recommender might need to know
–when a track was played
–what a reasonable time-span between repeats is
–how to avoid over-rotation
News recommendation
–tricky, because new stories on the same topic might be interesting
–as long as there is new material
Impact
A problem for content-based recommendation
–another copy of an item will match best
–must have another way to identify overlap
–or apply a "not too similar" threshold
Domain-specific requirements for rotation and portfolio composition
–a domain knowledge requirement
Novelty
"Milk and bananas"
–the two most-purchased items in US grocery stores
Could recommend them to everybody
–correct very frequently
But...
–not interesting
–people know they want these things
–profit margin is low
–recommender is very predictable
Consideration
Think about items
–where the target user's predicted rating is significantly higher than the average
–where there is high variance (difference of opinion)
These recommendations might be more valuable
–more "personal"
Impact
Collaborative methods are vulnerable to the "tyranny of the crowd"
–the "Coldplay" effect
May be necessary to
–smooth popularity spikes
–use thresholds
Categorize Domains
15 minutes
Discussion: 10 minutes
Break 10 minutes
Interaction
Input
–implicit
–explicit
Duration
–single response
–multi-step
Modeling
–short-term
–long-term
Recommendation Knowledge Sources Taxonomy
[Taxonomy diagram: Recommendation Knowledge divided into Collaborative (opinion profiles, demographic profiles), User (opinions, demographics, requirements: query, constraints, preferences, context), and Content (item features, domain knowledge: means-ends, domain constraints, contextual knowledge, feature ontology)]
Also, Output
How to present results to users?
Input
Explicit
–ask the user what you want to know: queries, ratings, preferences
Implicit
–gather information from behavior: ratings, preferences, queries
Explicit Queries
Query elicitation
–Problem: how to get the user's preferred features / constraints?
–Issues: user expertise / terminology
Example
Ambiguity
–"madonna and child"
Imprecision
–"a fast processor"
Terminological mismatch
–"an iTunes player"
Lack of awareness
–(I hate lugging a heavy laptop)
Feature Lists
Assume user familiarity
Recommendation Dialog
Fewer questions
–future questions can depend on current answers
Mixed-initiative
–the recommender can propose solutions
Critiquing
–examining solutions can help users define requirements
–(more about critiquing later)
Implicit Evidence
Watch the user's behavior
–infer preferences
Benefits
–no extra user effort
–no terminological gap
Typical sources
–web server logs (more about this later)
–purchase / shopping cart history
–CRM interactions
Problems
Noise
–gift shopping
–distractions on the web
Interpretation
–visit = interest?
–long stay = interest?
–purchase? but what about purchase and then return?
Tradeoffs
Explicit
–Plus: direct from user; expressive; less data needed
–Minus: requires user effort; requires user interface design; may require user expertise
Implicit
–Plus: easy to gather; no user effort
–Minus: possibly noisy; challenges in interpretation
Modeling
Short-term
–usually we mean "single-session"
Long-term
–multi-session
Long-term Modeling
Preferences with a long duration
–tend to be general: 50s jazz vs. Sonny Rollins's albums on Prestige
–tend to be personally meaningful: a preference for non-smoking hotel rooms
–may have a non-conscious component: preferring the "look" of certain kinds of houses
Short-term Modeling
What does the user want right now?
–usually needs some kind of query
Preferences with short duration
–may be very task-specific: a preference for a train that connects with my arriving flight
Application design
Have to consider the role of recommendation in the overall application
–how would the user want to interact?
–how can the recommendation be delivered?
Simple Coding Exercise
Recommender systems evaluation framework
–a bit different from what you would use for a production system
–goal: to evaluate different alternatives
Three exercises
Implement a simple baseline
–average prediction
Implement a new similarity metric
–Jaccard coefficient (sketched below)
Evaluate results on a data set
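The Jaccard coefficient compares two users' rated-item sets: the size of the intersection divided by the size of the union. A minimal sketch of the computation follows; it assumes each user's profile is available as a Map from item id to rating, which may differ from the Profile class in the workspace.

    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    public class JaccardSimilarity {
        // Jaccard coefficient: |items rated by both| / |items rated by either|.
        // Rating values are ignored; only which items were rated matters.
        public static double similarity(Map<Integer, Double> u, Map<Integer, Double> v) {
            Set<Integer> union = new HashSet<>(u.keySet());
            union.addAll(v.keySet());
            if (union.isEmpty()) {
                return 0.0; // neither user has rated anything
            }
            Set<Integer> intersection = new HashSet<>(u.keySet());
            intersection.retainAll(v.keySet());
            return (double) intersection.size() / union.size();
        }
    }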
Download
Eclipse workspace file
–student-ws.zip
Structure
[Class diagram: Evaluator, Predictor, Profile, Rating, Movie, and DatasetReader, plus the standard Set and Map collections]
Predictor
[Class hierarchy diagram: Predictor with subclasses ThreePredictor, PearsonPredictor, AvePredictor, and JaccardPredictor; methods initialize( ) and predict( user, item )]
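The diagram suggests a base class roughly like the following. This is a hedged sketch, not the actual workspace code, and it assumes profiles are stored as nested maps from user id to (item id -> rating); the real Profile class may wrap this differently.

    import java.util.Map;

    public abstract class Predictor {
        // user id -> (item id -> rating); assumed representation for these sketches
        protected final Map<Integer, Map<Integer, Double>> profiles;

        public Predictor(Map<Integer, Map<Integer, Double>> profiles) {
            this.profiles = profiles;
        }

        // one-time setup, e.g. caching averages or similarities
        public abstract void initialize();

        // predicted rating of item for user
        public abstract double predict(int user, int item);
    }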
Evaluator
[Class hierarchy diagram: Evaluator with subclasses MaeEvaluator and RmseEvaluator; method evaluate( )]
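As an illustration, mean absolute error (MAE) averages |predicted - actual| over a set of test ratings. The sketch below reuses the hypothetical Predictor shape above; the actual Evaluator and MaeEvaluator classes in the workspace may be structured differently.

    import java.util.Map;

    public class MaeEvaluator {
        private final Predictor predictor;

        public MaeEvaluator(Predictor predictor) {
            this.predictor = predictor;
        }

        // mean absolute error over held-out (user, item, rating) triples
        public double evaluate(Map<Integer, Map<Integer, Double>> testRatings) {
            double errorSum = 0.0;
            int count = 0;
            for (Map.Entry<Integer, Map<Integer, Double>> userEntry : testRatings.entrySet()) {
                int user = userEntry.getKey();
                for (Map.Entry<Integer, Double> rating : userEntry.getValue().entrySet()) {
                    double predicted = predictor.predict(user, rating.getKey());
                    errorSum += Math.abs(predicted - rating.getValue());
                    count++;
                }
            }
            return count == 0 ? 0.0 : errorSum / count;
        }
    }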
Basic flow
Create a dataset reader for the dataset
Read the profiles
Create a predictor using the profiles
Create an evaluator for the predictor
Call the evaluate method
Output the evaluation statistic
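Written out as code, the flow might look like this. The class names follow the diagrams above, but the DatasetReader constructor, the readProfiles method, and the file path are all illustrative assumptions; substitute the actual workspace classes.

    import java.util.Map;

    public class RunEvaluation {
        public static void main(String[] args) throws Exception {
            // 1. create a dataset reader and 2. read the profiles
            DatasetReader reader = new DatasetReader("data/u-filtered.txt"); // hypothetical path
            Map<Integer, Map<Integer, Double>> profiles = reader.readProfiles();

            // 3. create a predictor using the profiles
            Predictor predictor = new AvePredictor(profiles);
            predictor.initialize();

            // 4. create an evaluator for the predictor
            MaeEvaluator evaluator = new MaeEvaluator(predictor);

            // 5. call the evaluate method and 6. output the statistic
            double mae = evaluator.evaluate(profiles); // ideally a held-out test set
            System.out.println("MAE: " + mae);
        }
    }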
PearsonPredictor
Similarity caching
–we need to calculate each user's similarity to the others for each prediction anyway
–might as well do it only once
–the standard time vs. space tradeoff
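A sketch of the caching pattern: compute all pairwise Pearson correlations once in initialize() and only look them up in predict(). Field and method names are illustrative, and the real PearsonPredictor may normalize predictions differently (for example, by user means).

    import java.util.HashMap;
    import java.util.Map;

    public class PearsonPredictor extends Predictor {
        // similarities.get(u).get(v) caches sim(u, v), filled once in initialize()
        private final Map<Integer, Map<Integer, Double>> similarities = new HashMap<>();

        public PearsonPredictor(Map<Integer, Map<Integer, Double>> profiles) {
            super(profiles);
        }

        @Override
        public void initialize() {
            for (int u : profiles.keySet()) {
                Map<Integer, Double> row = new HashMap<>();
                for (int v : profiles.keySet()) {
                    if (u != v) {
                        row.put(v, pearson(profiles.get(u), profiles.get(v)));
                    }
                }
                similarities.put(u, row);
            }
        }

        // Pearson correlation over the items both users have rated
        private double pearson(Map<Integer, Double> a, Map<Integer, Double> b) {
            double sumA = 0, sumB = 0, sumA2 = 0, sumB2 = 0, sumAB = 0;
            int n = 0;
            for (Map.Entry<Integer, Double> e : a.entrySet()) {
                Double rb = b.get(e.getKey());
                if (rb == null) continue; // not co-rated
                double ra = e.getValue();
                sumA += ra;
                sumB += rb;
                sumA2 += ra * ra;
                sumB2 += rb * rb;
                sumAB += ra * rb;
                n++;
            }
            if (n == 0) return 0.0;
            double num = sumAB - sumA * sumB / n;
            double den = Math.sqrt((sumA2 - sumA * sumA / n) * (sumB2 - sumB * sumB / n));
            return den == 0 ? 0.0 : num / den;
        }

        @Override
        public double predict(int user, int item) {
            // simple similarity-weighted average of other users' ratings for the item
            double num = 0, den = 0;
            for (Map.Entry<Integer, Double> e : similarities.get(user).entrySet()) {
                Double r = profiles.get(e.getKey()).get(item);
                if (r == null) continue;
                num += e.getValue() * r;
                den += Math.abs(e.getValue());
            }
            return den == 0 ? 0.0 : num / den;
        }
    }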
Data sets
Four data sets
–Tiny (3 users): synthetic, for unit testing
–Test (5 users): also synthetic, for quick tests
–U-filtered: subset of the MovieLens dataset, standard for recommendation research
–u: the full MovieLens 100K dataset
Demo
Task
Implement a better baseline
–ThreePredictor is weak
–better to use the item average
AvePredictor
Non-personalized prediction
–what Amazon.com shows
Idea
–cache the average score for each item
–when predict(user, item) is called, ignore the target user
Better idea (sketched below)
–normalize for the user average
–calculate the item's average deviation above each rater's average
–average these deviations
–add to the target user's average
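A sketch of the "better idea" variant under the same assumed representation (the AvePredictor stub in the workspace may expect something different): cache each user's average and each item's average deviation from its raters' averages, then predict the target user's average plus the item deviation.

    import java.util.HashMap;
    import java.util.Map;

    public class AvePredictor extends Predictor {
        private final Map<Integer, Double> userAverages = new HashMap<>();   // user -> mean rating
        private final Map<Integer, Double> itemDeviations = new HashMap<>(); // item -> mean deviation from raters' averages

        public AvePredictor(Map<Integer, Map<Integer, Double>> profiles) {
            super(profiles);
        }

        @Override
        public void initialize() {
            // each user's average rating
            for (Map.Entry<Integer, Map<Integer, Double>> e : profiles.entrySet()) {
                double sum = 0;
                for (double r : e.getValue().values()) sum += r;
                userAverages.put(e.getKey(), sum / e.getValue().size());
            }
            // each item's average deviation above its raters' own averages
            Map<Integer, Double> devSums = new HashMap<>();
            Map<Integer, Integer> devCounts = new HashMap<>();
            for (Map.Entry<Integer, Map<Integer, Double>> e : profiles.entrySet()) {
                double userAvg = userAverages.get(e.getKey());
                for (Map.Entry<Integer, Double> r : e.getValue().entrySet()) {
                    devSums.merge(r.getKey(), r.getValue() - userAvg, Double::sum);
                    devCounts.merge(r.getKey(), 1, Integer::sum);
                }
            }
            for (int item : devSums.keySet()) {
                itemDeviations.put(item, devSums.get(item) / devCounts.get(item));
            }
        }

        @Override
        public double predict(int user, int item) {
            // target user's average plus the item's average deviation
            double userAvg = userAverages.getOrDefault(user, 0.0);
            return userAvg + itemDeviations.getOrDefault(item, 0.0);
        }
    }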
Existing unit test
Compare
–with the Pearson predictor
Process
Class time is scheduled for 18:00-20:00
–use this time to complete the assignment
Due before class tomorrow
Work in pairs if you prefer
Submit by email
–subject line: GRAZ H1
–body: names of students
–attach: AvePredictor.java