Collaborative Filtering: Searching and Retrieving Web Information Together Huimin Lu December 2, 2004 INF 385D Fall 2004 Instructor: Don Turnbull
Outline Introduction Collaborative Search Family Collaborative Filtering Systems Process Algorithm Problems & Solutions Privacy
Collaborative Search into IR World Inverted Index Yellow-pages-like information gateway & Internet search engine (Sun, 1999) Needs for collaborative retrieval Information-resources-focused systems - By CSCW: structuring mechanisms & recommendation techniques User-preferences-focused systems
Collaborative Search Types Collaborative browsing Mediated searching Collaborative information filtering Collaborative agents - meta-search engines Collaborative re-use of results (Setten, 2000)
Collaborative Filtering User-based filtering Collects taste information from users who collaborate in the search process and automatically predicts or filters relevant information for them (Wikipedia, 2004). Store profiles & preferences Build a users' database Produce a recommended list with the collaborative filter
Collaborative Filtering Systems Commercial - Amazon - Barnes and Noble - Netflix Non-commercial - Moonranker - MovieLens - AmphetaRate - Audioscrobbler - Findory - Gnomoradio - iRATE radio
System Example I: Amazon.com Recommendation page
System Example II: Moonranker.com ranking page
System Example III: Movielens.com rating page
Collaborative Filtering Process
Collaborative Filtering Algorithm Goal - Suggest new items or predict an item's utility for a user based on that user's previous likings (Sarwar, 2001) Memory-based - uses the entire user-item database - Pearson-correlation-based approach, vector-similarity-based approach, the extended generalized vector space model Model-based - develops a model of user ratings - Bayesian network approach, the aspect model
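To make the memory-based approach concrete, here is a minimal Python sketch of Pearson-correlation-based prediction. The toy `ratings` table, the user and item names, and the helper functions are hypothetical and chosen only for illustration. The prediction rule (the user's mean plus a correlation-weighted average of neighbors' mean-centered ratings) is one common formulation, not necessarily the exact variant used by the systems listed earlier.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}
ratings = {
    "ann": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 2, "C": 3, "D": 5},
    "cat": {"A": 1, "B": 5, "C": 2, "D": 1},
}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def pearson(u, v):
    """Pearson correlation of users u and v over the items both have rated."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    if len(common) < 2:
        return 0.0  # too little overlap to estimate a correlation
    mu_u = mean(ratings[u][i] for i in common)
    mu_v = mean(ratings[v][i] for i in common)
    du = [ratings[u][i] - mu_u for i in common]
    dv = [ratings[v][i] - mu_v for i in common]
    denom = sqrt(sum(x * x for x in du)) * sqrt(sum(y * y for y in dv))
    if denom == 0:
        return 0.0  # one user rated every common item identically
    return sum(x * y for x, y in zip(du, dv)) / denom


def predict(user, item):
    """Predict user's rating of item from correlated neighbors' ratings."""
    base = mean(ratings[user].values())
    num = den = 0.0
    for other in ratings:
        if other == user or item not in ratings[other]:
            continue
        w = pearson(user, other)
        # Weight each neighbor's deviation from their own mean rating.
        num += w * (ratings[other][item] - mean(ratings[other].values()))
        den += abs(w)
    return base if den == 0 else base + num / den


if __name__ == "__main__":
    print(pearson("ann", "bob"), pearson("ann", "cat"))
    print(predict("ann", "D"))
```

Because every prediction scans all other users, the cost grows with the size of the user-item database. The sketch does not clamp predictions to the rating scale; a real system would likely do so.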
Problems and Solutions Memory-based algorithm problems - Sparsity: insufficient user rating information - Scalability: the nearest-neighbor computation grows with the number of users and the number of items - Solution: automatic weighting scheme by MSU & CMU Model-based algorithm problem - Inherently static structure: the model is hard to update, and the exact number of clusters and the user classes must be learned or specified System problems - Scarcity: few ratings for some items - Early-rater: no recommendations for new items - Solution: collaborative information filtering (communicating agents, correlating profiles, and filterbots - automated rating robots)
Privacy Unsafe server-based system Monopolies Peer-to-peer architecture - Multi-party computation
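As one illustration of how multi-party computation could protect ratings in a peer-to-peer setting, the sketch below uses additive secret sharing so that peers learn the sum of their ratings for an item without revealing any individual rating. The technique choice, the `MODULUS` value, and the function names are assumptions for illustration; the slide does not specify a particular protocol. The code simulates all peers in a single process.

```python
import secrets

# Assumed modulus; it only needs to exceed any sum the peers could produce.
MODULUS = 2**61 - 1


def split_into_shares(value, n_parties):
    """Split value into n_parties random shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_values):
    """Simulate peers computing the sum of their private values.

    Row i of `distributed` holds the shares peer i sends out, one per peer.
    Each peer j adds up the shares it received and publishes only that
    partial sum, so no single message reveals an individual rating.
    """
    n = len(private_values)
    distributed = [split_into_shares(v, n) for v in private_values]
    partial = [
        sum(distributed[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    return sum(partial) % MODULUS


if __name__ == "__main__":
    item_ratings = [5, 3, 4]  # hypothetical private ratings, one per peer
    total = secure_sum(item_ratings)
    print(total / len(item_ratings))  # average rating for the item
```

This simple scheme assumes peers follow the protocol honestly and do not collude; stronger guarantees need more elaborate protocols.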
Conclusion The computing environment is becoming more ubiquitous and pervasive. To meet IR users' needs, future collaborative filtering systems should be easy to maintain, with well-designed algorithms and strong protection of user privacy.
References
Questions or Comments?