
Yoda: An Accurate and Scalable Web-based Recommendation System
Cyrus Shahabi, Farnoush Banaei-Kashani, Yi-Shin Chen, and Dennis McLeod
Integrated Media Systems Center and Computer Science Department, University of Southern California
E-mail: {shahabi, banaeika, yishinc, mcleod}@usc.edu

Outline
- Motivation
- Related Work
  - Content-based Filtering
  - Collaborative Filtering
- Offline Process: Clustering, Voting, Aggregation
- Online Process: Classification & Aggregation
- Performance Evaluation
- Conclusion & Future Work

Motivation
- The amount of data on the Web is enormous, and users suffer from information overload.
- Recommendation systems can personalize and customize the Web environment in real time.
  - Similar to Amazon.com “real-time” recommendations (people who bought this book also purchased …)
  - Different approach (vs. association-rule mining)
- Challenges:
  - Scalability: the system must stay efficient as the number of items and users grows.
  - Sparsity: not enough information is available about each user.

Related Work: Content-Based Filtering
- From the Information Retrieval community [Maes 1994] [Shardanand and Maes 1995] [Balabanović and Shoham 1997]
- Based on a comparison between the feature vectors of items (e.g., artist, style) in the database and the user’s interest list
- Major weaknesses [Balabanović and Shoham 1997]
  - Content limitation: applies to only a few kinds of content, and captures only certain aspects of that content
  - Over-specialization: users only obtain information that matches the content of their profiles

Related Work: Collaborative Filtering (CF)
- Employs a user’s item evaluations (not the actual content) to find other similar users: nearest-neighbor algorithm [Resnick et al. 1994]
- Three major weaknesses, each listed with the approaches proposed to address it:
  - Scalability: time complexity is O(U*I) (I: # of items, U: # of users)
    - Clustering [Breese et al. 2000]
    - Bayesian network [Kitts et al. 2000]
  - Sparsity: the profile matrix (i.e., the items each user has evaluated) is sparse
    - SVD [Sarwar et al. 2000]
  - Synonymy: latent associations between items are not considered
    - Content analysis [Balabanović and Shoham 1997]
    - Categorization [Kohrs and Merialdo 2000]
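To make the O(U*I) cost concrete, here is a minimal sketch of nearest-neighbor collaborative filtering. It assumes Pearson correlation as the similarity measure and a similarity-weighted average of rating deviations as the prediction rule; the slide names neither, and the function names `pearson` and `predict` are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation over the items both users rated; 0.0 if undefined."""
    common = [i for i in a if i in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * \
          sqrt(sum((b[i] - mean_b) ** 2 for i in common))
    return num / den if den else 0.0

def predict(ratings, user, item):
    """Predict `user`'s rating of `item` from all other users' ratings.

    ratings: dict user -> dict item -> numeric rating (assumed layout).
    Every other user is compared (U comparisons), and each comparison
    scans up to I items, which is where the O(U*I) cost comes from.
    """
    target = ratings[user]
    base = sum(target.values()) / len(target)
    num = den = 0.0
    for other, profile in ratings.items():
        if other == user or item not in profile:
            continue
        w = pearson(target, profile)
        other_mean = sum(profile.values()) / len(profile)
        num += w * (profile[item] - other_mean)
        den += abs(w)
    return base + num / den if den else base

ratings = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 5, "b": 3, "c": 4, "d": 5},
    "u3": {"a": 1, "b": 5, "c": 2, "d": 1},
}
print(predict(ratings, "u1", "d"))
```

In this toy data, u2 agrees with u1 and liked item d, while u3 disagrees with u1 and disliked it, so both neighbors push the prediction for u1 above u1's own average.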

Offline Process (diagram)
User navigation behaviors (User 1 ... User U) -> PPED similarity measure and clustering -> clusters -> voting -> favorite PVs per cluster (e.g., Rock = High, Classical = Low, Pop = Low, Rap = High) -> fuzzy aggregation against the item database -> cluster wish-list (e.g., item scores 0.87, 0.83, 0.72, 0.47, 0.61)

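Voting Mechanism
Property values (example): Rock = High, Classical = Low, Pop = Mid, Rap = High, Blues = Low
Vote counts C_{p,f}(k) for property p and fuzzy term f in cluster k (H = High, M = Mid, L = Low):
  Property     H    M    L
  Rock        51   22   10
  Classical    7   15   61
  Blues       21   25   37
M_{p,f} = Max{ C_{p,f}(k) }, taken over f in F
Resulting favorite PVs: Rock = High, Classical = Low, Pop = Low, Rap = High, Blues = Low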
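The voting step can be sketched as picking, for each property, the fuzzy term with the most votes. This is an illustrative sketch only: the dictionary layout and the name `pick_favorites` are assumptions, and the counts are the ones shown on the slide for Rock, Classical, and Blues.

```python
FUZZY_TERMS = ("High", "Mid", "Low")

def pick_favorites(vote_counts):
    """Return property -> (winning fuzzy term, its vote count).

    vote_counts: dict property -> dict fuzzy term -> number of votes,
    i.e. C_{p,f}(k) for one cluster k (assumed layout).
    Ties are broken by the order of FUZZY_TERMS, which is an assumption.
    """
    favorites = {}
    for prop, counts in vote_counts.items():
        best = max(FUZZY_TERMS, key=lambda term: counts.get(term, 0))
        favorites[prop] = (best, counts.get(best, 0))
    return favorites

# Vote counts taken from the slide's example table.
cluster_votes = {
    "Rock":      {"High": 51, "Mid": 22, "Low": 10},
    "Classical": {"High": 7,  "Mid": 15, "Low": 61},
    "Blues":     {"High": 21, "Mid": 25, "Low": 37},
}

for prop, (term, votes) in pick_favorites(cluster_votes).items():
    print(f"{prop}: {term} ({votes} votes)")
```

On these counts the winners are Rock = High, Classical = Low, and Blues = Low, which matches the favorite PVs listed on the slide.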

Ranking Items (diagram)
The cluster's favorite PVs F_p(k) (e.g., Rock = High, Classical = Low, Pop = Low, Rap = High, Blues = Low) are combined with each item's property values from the item database (e.g., Rock = High, Classical = Low, Pop = Mid, Rap = Mid, Blues = Low) by fuzzy aggregation: V_k(i) = fmax{ … } over per-property pairs such as (High*High), (Mid*Low), (Low*Low). The result is the ranked cluster wish-list (e.g., 0.87, 0.83, 0.82, 0.79, 0.72, 0.70, 0.68, 0.65, 0.63, 0.61, 0.54, 0.47, 0.42). Also shown: Locality Sensitive Hashing algorithm.

Optimized Equation (diagram)
Example: favorite PVs (Rock = High, Classical = Low, Pop = Low, Rap = High, Blues = Low) and item property values (Rock = High, Classical = Low, Pop = Mid, Rap = Mid, Blues = Low), combined by fuzzy aggregation fmax{ }.
- Why optimize: the unoptimized aggregation has time complexity O(#P*I) (#P: # of properties, I: # of items)
- Intuition: the v_k(i) value comes from the maximum value among the per-fuzzy-term values such as M_high(k), over pairs such as (High*High), (Low*Mid)
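A minimal sketch of the unoptimized ranking may help show where the O(#P*I) cost comes from: every item is compared with the cluster's favorite PVs on every property. The numeric encoding of the fuzzy terms, the use of a plain product for each pair, and the names `item_score` and `cluster_wish_list` are all assumptions made for illustration; the slides do not give these details.

```python
# Assumed numeric encoding of the fuzzy terms (not given in the slides).
TERM_VALUE = {"High": 1.0, "Mid": 0.5, "Low": 0.1}

def item_score(item_pvs, favorite_pvs):
    """Fuzzy max over the per-property products (item PV * favorite PV)."""
    return max(
        TERM_VALUE[item_pvs[p]] * TERM_VALUE[favorite_pvs[p]]
        for p in favorite_pvs            # #P properties per item
    )

def cluster_wish_list(items, favorite_pvs, top_n):
    """Score all I items, then keep the top_n highest-scoring ones."""
    scored = [(item_score(pvs, favorite_pvs), name) for name, pvs in items.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(name, score) for score, name in scored[:top_n]]

favorites = {"Rock": "High", "Classical": "Low", "Pop": "Low",
             "Rap": "High", "Blues": "Low"}
items = {
    "a": {"Rock": "High", "Classical": "Low", "Pop": "Mid", "Rap": "Mid", "Blues": "Low"},
    "b": {"Rock": "Low", "Classical": "High", "Pop": "High", "Rap": "Low", "Blues": "High"},
    "c": {"Rock": "Mid", "Classical": "Low", "Pop": "Low", "Rap": "Mid", "Blues": "Mid"},
}
print(cluster_wish_list(items, favorites, top_n=2))
```

Item "a" uses the same property values as the item on the slide. Under the assumed encoding it ranks first because its Rock value matches the cluster's strongest preference.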

Optimized Equation
- Time complexity: O(f*I) (I: # of items, f: # of fuzzy terms)
- Satisfies the triangular norm form
- Time complexity can be further reduced to O(N) (N: a constant number) by Fagin’s A0 algorithm [Fagin 1996]
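For readers unfamiliar with Fagin's A0 algorithm, the following sketch shows its three phases on generic graded lists: parallel sorted access until N objects have been seen in every list, random access for the missing grades, and a final ranking with a monotone aggregation function. It is a simplified in-memory illustration, with min as the aggregation function chosen as an assumption. The function name `fagin_a0` and the input layout are hypothetical, and the sketch does not show how Yoda maps its own lists onto the algorithm.

```python
def fagin_a0(grades, n, combine=min):
    """Top-n objects under a monotone aggregation of several graded lists.

    grades: list of dicts, each mapping every object to a grade in [0, 1].
    Returns (top-n list of (object, score), sorted-access depth reached).
    """
    sorted_lists = [sorted(g.items(), key=lambda kv: -kv[1]) for g in grades]
    seen_in = {}      # object -> indexes of lists where sorted access met it
    matched = 0       # objects already seen in every list
    depth = 0
    total = len(sorted_lists[0])

    # Phase 1: sorted access in parallel until n objects appear in all lists.
    while matched < n and depth < total:
        for idx, lst in enumerate(sorted_lists):
            obj, _ = lst[depth]
            where = seen_in.setdefault(obj, set())
            where.add(idx)
            if len(where) == len(sorted_lists):
                matched += 1
        depth += 1

    # Phase 2: random access fetches every grade of each object seen so far.
    scored = [(combine(g[obj] for g in grades), obj) for obj in seen_in]

    # Phase 3: rank the candidates by aggregated grade.
    scored.sort(key=lambda s: (-s[0], str(s[1])))
    return [(obj, score) for score, obj in scored[:n]], depth

lists = [
    {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1},
    {"a": 0.7, "b": 0.95, "c": 0.2, "d": 0.6},
]
print(fagin_a0(lists, n=1))
```

In this example the sorted access stops at depth 2, so objects "c" and "d" are never scored. Skipping most of the item list in this way is the kind of saving the slide refers to.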

Online Process (diagram)
Current user’s navigation behavior -> PPED similarity measure against the clusters -> a list of similarity values (e.g., 0.65, 0.79, 0.32) -> fuzzy aggregation of the cluster wish-lists (e.g., 0.87, 0.83, 0.72, 0.47, 0.61 for each cluster) -> user wish-list (e.g., 0.87, 0.83, 0.82, 0.79, 0.72, 0.70, 0.68, 0.65, 0.63, 0.61, 0.54, 0.47, 0.42)

Optimized Method
- Original time complexity: O(K*I) (K: # of clusters, I: # of items)
- Time complexity of the optimized method: O(f*I) (f: # of fuzzy terms)
- Time complexity can be further reduced to O(N) (N: a constant number) by Fagin’s A0 algorithm [Fagin 1996]
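The original O(K*I) online step can be pictured as a loop over every cluster wish-list, weighting each item grade by the current user's similarity to that cluster. The sketch below assumes a product for the weighting and a max for combining clusters; the slides only call this step fuzzy aggregation, so both choices, and the name `user_wish_list`, are illustrative assumptions.

```python
def user_wish_list(similarities, cluster_wish_lists, top_n):
    """Merge cluster wish-lists into one ranked list for the current user.

    similarities: dict cluster -> similarity of the user to that cluster.
    cluster_wish_lists: dict cluster -> dict item -> grade.
    """
    scores = {}
    for cluster, sim in similarities.items():                     # K clusters
        for item, grade in cluster_wish_lists[cluster].items():   # up to I items
            scores[item] = max(scores.get(item, 0.0), sim * grade)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]

# Similarity values as on the slide; the item names and their assignment
# to clusters are made up for the example.
similarities = {"c1": 0.65, "c2": 0.79, "c3": 0.32}
wish_lists = {
    "c1": {"x": 0.87, "y": 0.83},
    "c2": {"y": 0.72, "z": 0.47},
    "c3": {"x": 0.61},
}
for item, score in user_wish_list(similarities, wish_lists, top_n=3):
    print(item, round(score, 4))
```

Item "y" ranks first here because the user is closest to cluster c2, even though "x" has the highest raw grade in cluster c1.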

Experimental Methodology (diagram): generating the item database
Components: clusters (cluster favorite PVs, ranking of items in clusters), cluster-user similarity matrix, user set, item database, user navigation behaviors.
Step shown: assign property values to items: Item-PV = f(Cluster-PV, noise), where noise ~ item-rank

Experimental Methodology (diagram): generating user navigation behaviors
Components: clusters (cluster favorite PVs, ranking of items in clusters), cluster-user similarity matrix, user set, item database, user navigation behaviors.
Step shown: assign evaluation values to items: Item-Rating = f(Cluster-Ranking, weight), where weight ~ user-cluster similarities

Experimental Methodology (diagram): training and testing
Components: item database, user set, user navigation behaviors.
Step shown: the user navigation behaviors are split into training and testing parts; in testing, a current session is the input and a recommendation is the output.

Accuracy Comparison (figure): harmonic mean (left axis, 0 to 0.45) and improvement (right axis, 0 to 1.1) versus number of items (1000 and 5000). Series: Nearest Neighbor Method, Yoda, Improvement.
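The figure reports a harmonic mean without saying of what. A common choice for recommendation accuracy is the harmonic mean of precision and recall, and the sketch below assumes that reading; it is not confirmed by the slides. The names `harmonic_mean` and `evaluate` are hypothetical.

```python
def harmonic_mean(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def evaluate(recommended, relevant):
    """Return (precision, recall, harmonic mean) for one recommendation list."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall, harmonic_mean(precision, recall)

# Made-up example: 4 recommended items, 8 relevant items, 2 in common.
recommended = ["a", "b", "c", "d"]
relevant = ["a", "b", "e", "f", "g", "h", "i", "j"]
print(evaluate(recommended, relevant))
```

The harmonic mean stays low unless precision and recall are both high, which is why it is often preferred to a plain average when comparing recommenders.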

Processing Time Comparison (figure): CPU time (milliseconds/user, 0 to 2500) versus number of users (0 to 4500). Series: Yoda, BNN (Basic Nearest Neighbor Method).
- Processing time = CPU + I/O
- In the BNN process: # of items = 5000; # of users = 1000
- In the Yoda process: # of items in each cluster wish-list = 250; # of clusters = 18

Conclusion
- Yoda scales as the number of users and items grows
- Higher accuracy than the nearest-neighbor method
Future Work
- Compare with other techniques
- Run more experiments with real data
- Incorporate the content-based filtering mechanism into the user clustering & classification phases
- Incorporate the user profiles

References
[Shardanand and Maes 1995] U. Shardanand and P. Maes, Social Information Filtering: Algorithms for Automating "Word of Mouth", Proceedings on Human Factors in Computing Systems, Denver, CO, USA, p. 210-217, May 1995.
[Maes 1994] P. Maes, Agents that Reduce Work and Information Overload, Communications of the ACM, 37(7), p. 30-40, 1994.
[Balabanović and Shoham 1997] M. Balabanović and Y. Shoham, Fab: Content-based, Collaborative Recommendation, Communications of the ACM, 40(3), p. 66-72, 1997.
[Resnick et al. 1994] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, GroupLens: An Open Architecture for Collaborative Filtering of Netnews, Proceedings of ACM Conference on Computer-Supported Cooperative Work, Chapel Hill, NC, p. 175-186, 1994.
[Sarwar et al. 2000] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, Application of Dimensionality Reduction in Recommender System -- A Case Study, ACM WebKDD 2000 Web Mining for E-Commerce Workshop, 2000.
[Kohrs and Merialdo 2000] A. Kohrs and B. Merialdo, Using Category-based Collaborative Filtering in the Active WebMuseum, Proceedings of IEEE International Conference on Multimedia and Expo, 1, p. 351-354, 2000.

References (continued)
[Kitts et al. 2000] B. Kitts, D. Freed, and M. Vrieze, Cross-sell: A Fast Promotion-tunable Customer-item Recommendation Method Based on Conditionally Independent Probabilities, Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, p. 437-446, August 2000.
[Breese et al. 2000] J. Breese, D. Heckerman, and C. Kadie, Empirical Analysis of Predictive Algorithms for Collaborative Filtering, Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, Madison, WI, USA, p. 43-52, July 1998.
C. Shahabi, A.M. Zarkesh, J. Adibi, and V. Shah, Knowledge Discovery from Users Web Page Navigation, Proceedings of the IEEE RIDE97 Workshop, April 1997.
C. Shahabi, F. Banaei-Kashani, J. Faruque, and A. Faisal, Feature Matrices: A Model for Efficient and Anonymous Web Usage Mining, EC-Web 2001, Germany, September 2001.
[Fagin 1996] R. Fagin, Combining Fuzzy Information from Multiple Systems, Proceedings of the Fifteenth ACM Symposium on Principles of Database Systems, Montreal, p. 216-226, 1996.
C. Shahabi and Y. Chen, A Unified Framework to Incorporate Soft Query into Image Retrieval Systems, International Conference on Enterprise Information Systems, Setubal, Portugal, July 2001.
