1
Sparsity, Scalability and Distribution in Recommender Systems
Doctoral Thesis Proposal
Badrul M. Sarwar
Computer Science & Engineering Dept., University of Minnesota
Advisor: Professor John Riedl
2
Talk Outline
- Introduction to Recommender Systems
- Research Challenges
- Previous Work
- Future Work and Completion Plan
- Contributions and Conclusions
3
Information Overload
- News items, TV programs, books, journals, research papers
- Music CDs, movie titles
- Consumer products, e-commerce items
- Web pages, Usenet articles
4
Computerized Solution Techniques
- Information Retrieval
  - Immediate information needs
- Information Filtering
  - Content-based filtering
  - Information filtering agents
- Collaborative Filtering (CF)
  - Recommender systems (RS) as the interface
- We'll use the terms CF and RS interchangeably
5
Collaborative Filtering
Why another filtering technique?
- Problems with content-based filtering
  - Limitations due to computer processing
  - Lack of aesthetic sense
  - Different techniques needed for different media
- CF adds the missing piece to the picture: human judgements
6
Collaborative Filtering Process
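The process diagram on this slide did not survive extraction. As a hedged illustration only, the sketch below shows one common form of user-based CF: a ratings table, a neighborhood of similar users found with Pearson correlation, and a Resnick-style weighted-deviation prediction. The formula choice, the users, items, and ratings, and the neighborhood size `k` are all assumptions, not the proposal's own algorithm.

```python
import math

# Hypothetical toy ratings: user -> {item: rating on a 1-5 scale}.
RATINGS = {
    "alice": {"i1": 5, "i2": 3, "i3": 4},
    "bob":   {"i1": 4, "i2": 2, "i3": 5, "i4": 4},
    "carol": {"i1": 1, "i2": 5, "i4": 2},
}

def mean(user):
    """Average of all ratings the user has given."""
    vals = RATINGS[user].values()
    return sum(vals) / len(vals)

def pearson(u, v):
    """Pearson correlation computed over the items both users have rated."""
    common = RATINGS[u].keys() & RATINGS[v].keys()
    if len(common) < 2:
        return 0.0
    mu = sum(RATINGS[u][i] for i in common) / len(common)
    mv = sum(RATINGS[v][i] for i in common) / len(common)
    num = sum((RATINGS[u][i] - mu) * (RATINGS[v][i] - mv) for i in common)
    den = math.sqrt(sum((RATINGS[u][i] - mu) ** 2 for i in common)) * math.sqrt(
        sum((RATINGS[v][i] - mv) ** 2 for i in common))
    return num / den if den else 0.0

def predict(user, item, k=2):
    """User's mean plus the similarity-weighted average of the k most
    similar neighbors' mean-centered ratings for the item."""
    sims = {v: pearson(user, v) for v in RATINGS if v != user and item in RATINGS[v]}
    neighbors = sorted(sims, key=sims.get, reverse=True)[:k]
    num = sum(sims[v] * (RATINGS[v][item] - mean(v)) for v in neighbors)
    den = sum(abs(sims[v]) for v in neighbors)
    return mean(user) + num / den if den else mean(user)

print(round(predict("alice", "i4"), 2))  # 4.5
```

Here Alice has not rated `i4`; Bob (positively correlated with her) rated it above his own mean and Carol (perfectly anti-correlated) rated it below hers, so both pull the prediction above Alice's mean of 4.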
7
CF used successfully in e-commerce
8
Talk Outline
- Introduction to Recommender Systems
- Research Challenges
- Previous Work
- Future Work and Completion Plan
- Contributions and Conclusions
9
Research Challenges
- RC1: How can we improve RS quality and performance by using dimensionality reduction techniques?
- RC2: How can we design a better interface for RS?
- RC3: How can we design distributed RS to make them widely available?
- RC4: How can we utilize clustering algorithms to improve scalability in RS?
10
RC1: Motivation and Importance
RS performance challenge: meet two important goals
- Quality
  - The best CF algorithm is 77% accurate
- Scalability
  - Response time
  - Storage space
11
RC1: Motivation and Importance (contd.)
Stumbling blocks
- High-dimensional data
  - Computational complexity
  - Noise and data over-fitting
- Sparsity
  - Reduced number of predictions
  - Inferior quality
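The slide does not define how sparsity is measured. One common measure, assumed here, is the sparsity level: 1 - (number of nonzero entries / total number of entries) in the user-item matrix. The toy matrix below, with 0 meaning "not rated", is hypothetical.

```python
import numpy as np

# Hypothetical 4-user x 5-item ratings matrix; 0 marks "not rated".
R = np.array([
    [5, 0, 0, 1, 0],
    [0, 3, 0, 0, 0],
    [4, 0, 0, 0, 2],
    [0, 0, 5, 0, 0],
])

def sparsity_level(ratings):
    """Fraction of user-item cells that hold no rating."""
    return 1.0 - np.count_nonzero(ratings) / ratings.size

print(round(sparsity_level(R), 2))  # 0.7
```

Only 6 of the 20 cells are rated. Users 1 and 3 share no rated items with anyone, so a correlation-based neighborhood method cannot produce predictions for them, which is the "reduced number of predictions" effect named above.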
12
RC1: Specific Aims
- Select a dimensionality reduction technique
- Apply the technique
- Evaluate quality
- Study performance implications
13
Research Challenges
- RC1: How can we improve RS quality and performance by using dimensionality reduction techniques?
- RC2: How can we design a better interface for RS?
- RC3: How can we design distributed RS to make them widely available?
- RC4: How can we utilize clustering algorithms to improve scalability in RS?
14
RC2: Motivation and Importance
- Need for an explanation interface
  - End-user point of view
- Explanation of recommendations
  - Algorithmic explanation
  - Visual explanation: visualization amplifies cognition
- Benefits
  - Increases usability and confidence
15
RC2: Specific Aims
- Identify techniques
  - Use of dimension reduction results
- Implementation
- Evaluation
  - Usability study
  - Comparison with a text-based system
16
Research Challenge 3
- How can we improve RS quality and performance by using dimensionality reduction techniques?
- How can we design a better interface for RS?
- How can we design distributed RS to make them widely available?
- How can we utilize clustering algorithms to improve scalability in RS?
17
RC3: Motivation and Importance
- Increasing need for RS services
- Availability challenge
  - Travelling users
- Problems with centralized RS
  - Scale and robustness
  - Privacy concerns
18
RC3: Specific Aims
- Taxonomy of the RS application space
- Design framework
  - Key design issues
  - Implementation models
  - Evaluation criteria
- Analysis of different models
19
Research Challenge 4
- How can we improve RS quality and performance by using dimensionality reduction techniques?
- How can we design a better interface for RS?
- How can we design distributed RS to make them widely available?
- How can we utilize clustering algorithms to improve scalability in RS?
20
RC4: Motivation and Importance
- Scalability
- Sparsity
- Benefits of clustering
  - Usenet (newsgroups)
  - Recent studies
  - Performance implications
21
RC4: Specific Aims
- Identify clustering algorithms
  - Soft clustering
  - Hard clustering
- Partition the data set
- Apply the Galaxy algorithm
- Evaluate results
22
Talk Outline
- Introduction to Recommender Systems
- Research Challenges
- Previous Work
- Future Work and Completion Plan
- Contributions and Conclusions
23
Research Approach
- Identify the problem
- Develop hypotheses
- Discover algorithms and solution techniques
- Validate solution techniques
  - Create a dataset
  - Separate training and test data
  - Create an experiment framework
  - Apply solution techniques to the experimental data
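The "separate training and test data" step can be sketched as a random split of (user, item, rating) triples. This is a minimal sketch under assumptions: the 80/20 ratio, the fixed seed, the helper name `train_test_split`, and the toy triples are illustrative choices, not values taken from the proposal.

```python
import random

def train_test_split(ratings, train_ratio=0.8, seed=42):
    """Randomly split (user, item, rating) triples into training and test sets."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)  # seeded so experiments are repeatable
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical triples: 10 users x 10 items with ratings 1-5.
triples = [(u, i, (u + i) % 5 + 1) for u in range(10) for i in range(10)]
train, test = train_test_split(triples)
print(len(train), len(test))  # 80 20
```

Predictions are then generated from `train` only and compared against the held-out ratings in `test`.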
24
Dimension Reduction Experiments
- Singular Value Decomposition (SVD)
  - Matrix factorization
  - Dimension reduction
  - Prediction generation by reconstructing the matrix
- Result highlights
  - Prediction quality improved
  - We expect to see improved performance
25
Applying dimension reduction in RS
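We applied an LSI/SVD-based technique. SVD decomposes an $m \times n$ matrix $R$ into three factors:
$$R_{m \times n} = U_{m \times r}\, S_{r \times r}\, V'_{r \times n}$$
Keeping only the $k$ largest singular values gives $U_k$ ($m \times k$), $S_k$ ($k \times k$), and $V'_k$ ($k \times n$). The reconstructed matrix $R_k = U_k S_k V'_k$ is the closest rank-$k$ matrix to the original matrix $R$ (in the Frobenius norm).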
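The truncation can be sketched with NumPy's `numpy.linalg.svd`. This minimal sketch assumes a fully filled matrix; how missing ratings are handled is not shown on this slide, and the helper name `rank_k_approx` and the random toy matrix are hypothetical.

```python
import numpy as np

def rank_k_approx(R, k):
    """Return R_k = U_k S_k V_k', the best rank-k approximation of R."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)  # s holds singular values, largest first
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
R = rng.integers(1, 6, size=(6, 4)).astype(float)  # hypothetical dense 6 x 4 ratings
R2 = rank_k_approx(R, 2)
```

By the Eckart-Young theorem, the Frobenius-norm error of `R2` equals the square root of the sum of the squared singular values that were discarded, and keeping all of them reproduces `R` exactly.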
26
SVD as prediction generator
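The prediction for user $i$ on item $j$ is the dot product of the $i$-th row of $U_k\sqrt{S_k}'$ and the $j$-th column of $\sqrt{S_k}\,V'_k$, where $U_k$ is $m \times k$, $\sqrt{S_k}$ is $k \times k$, and $V'_k$ is $k \times n$:
$$P_{ij} = \left(U_k \sqrt{S_k}'\right)_{i} \cdot \left(\sqrt{S_k}\, V'_k\right)_{j}$$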
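A minimal NumPy sketch of SVD as a prediction generator, under stated assumptions. The slide shows only the dot product, so the preprocessing here is assumed: each missing rating (0) is filled with its item (column) mean, each user's (row) mean is subtracted before the SVD, and that mean is added back to the prediction. The function name `svd_predictor` and the toy matrix are hypothetical.

```python
import numpy as np

def svd_predictor(R, k):
    """Build a predict(i, j) function from ratings R (0 = missing) using rank k.

    Assumed preprocessing: fill gaps with item means, then subtract user means.
    """
    R = R.astype(float)
    mask = R > 0
    item_means = R.sum(axis=0) / np.maximum(mask.sum(axis=0), 1)  # mean of observed ratings
    filled = np.where(mask, R, item_means)                         # fill gaps with item means
    user_means = filled.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(filled - user_means, full_matrices=False)
    root_s = np.diag(np.sqrt(s[:k]))
    users = U[:, :k] @ root_s    # row i: i-th row of U_k sqrt(S_k)
    items = root_s @ Vt[:k, :]   # column j: j-th column of sqrt(S_k) V_k'

    def predict(i, j):
        return user_means[i, 0] + users[i] @ items[:, j]

    return predict

R = np.array([[5, 3, 0],
              [4, 0, 4],
              [1, 1, 5],
              [0, 2, 4]])
predict = svd_predictor(R, k=2)
estimate = predict(0, 2)  # user 0's missing rating for item 2
```

Splitting $\sqrt{S_k}$ between the user and item factors means both sides live in the same $k$-dimensional space, so each prediction costs only a length-$k$ dot product once the offline SVD is done.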
27
Results: SVD as prediction generator
[Figure: ROC sensitivity and MAE versus dimension k (k = 2, 5, 10, 15, 18, 19, 20, 50, 100, alongside DBLens), plotted separately for Data set 1 and Data set 2]
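MAE, one of the two metrics in the plots, is the mean absolute difference between predicted and actual ratings over the test set, $\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} |p_i - r_i|$. A minimal sketch with made-up values follows; ROC sensitivity is not reproduced here, and the helper name is hypothetical.

```python
def mean_absolute_error(predicted, actual):
    """MAE = (1/N) * sum(|p_i - r_i|) over N test ratings; lower is better."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

print(mean_absolute_error([4.5, 3.0, 2.0, 5.0], [5, 3, 1, 4]))  # 0.625
```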
28
Visual Interface: Initial Prototype
- Used SVD results
- Plotted users and items in a 2-D feature space
- Prototype tested in Spotfire
- Problem: distance is non-Euclidean
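One way to obtain 2-D coordinates for such a plot is to keep $k = 2$ and scale both factors by $\sqrt{S_2}$: users are the rows of $U_2\sqrt{S_2}$ and items the rows of $V_2\sqrt{S_2}$. Whether the prototype used exactly this projection is an assumption; the helper name `feature_space_2d` and the toy matrix are hypothetical.

```python
import numpy as np

def feature_space_2d(R):
    """Return 2-D coordinates for users (rows of U_2 sqrt(S_2))
    and items (rows of V_2 sqrt(S_2))."""
    U, s, Vt = np.linalg.svd(R.astype(float), full_matrices=False)
    root_s = np.sqrt(s[:2])
    return U[:, :2] * root_s, Vt[:2, :].T * root_s

# Hypothetical 5 users x 4 items.
R = np.array([[5, 4, 1, 1],
              [4, 5, 2, 1],
              [1, 1, 5, 4],
              [2, 1, 4, 5],
              [3, 3, 3, 3]])
users, items = feature_space_2d(R)
affinity = users[0] @ items[1]  # rank-2 estimate of user 0's rating for item 1
```

In this layout a user's affinity for an item is the dot product of their two points, not the Euclidean distance between them, which illustrates the "distance is non-Euclidean" problem noted above: two points can be far apart on the plot yet have a large dot product.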
29
Design of Visual Interface
Use of LSI/SVD for user-item visualization
30
Distributed RS: Work done
- Taxonomy of the application space
  - Based on <Neighborhood and prediction>
- Identification of key design issues
- Three implementation models proposed
  - Local profile model
  - Central profile model
  - Geographically distributed profile model
31
Talk Outline
- Introduction to Recommender Systems
- Research Challenges
- Previous Work
- Future Work and Completion Plan
- Contributions and Conclusions
32
Future Work: Dimension Reduction
- Study the performance implications of SVD-based prediction
  - Offline (model building)
  - Online
- The offline part is time-consuming
  - Incremental SVD
  - Fold-in
- The online part is very promising
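Fold-in, as used in LSI, places a new user in an existing $k$-dimensional space by projecting the user's rating vector $r$ onto the right singular vectors, $u = r\,V_k S_k^{-1}$, so the expensive offline SVD need not be recomputed. A minimal sketch; the helper name `fold_in_user` and the toy data are hypothetical, and the new vector is assumed to be preprocessed the same way as the original matrix.

```python
import numpy as np

def fold_in_user(new_ratings, s, Vt, k):
    """Project a rating vector into the existing k-dimensional space:
    u = r V_k S_k^{-1}. The SVD factors themselves are left unchanged."""
    return new_ratings @ Vt[:k, :].T / s[:k]

rng = np.random.default_rng(1)
R = rng.integers(1, 6, size=(6, 4)).astype(float)  # hypothetical existing ratings
U, s, Vt = np.linalg.svd(R, full_matrices=False)   # offline step, done once

new_user = np.array([4.0, 2.0, 5.0, 3.0])          # hypothetical new user's ratings
u_new = fold_in_user(new_user, s, Vt, k=2)         # cheap online step
pred = (u_new * s[:2]) @ Vt[:2, :]                 # rank-2 reconstruction for the new user
```

Folding in an existing row of `R` returns that user's row of $U_k$, which is a quick sanity check. Because $V_k$ is not updated, the model does not learn from folded-in users, so accuracy can drift as many are added; incremental SVD, also listed on the slide, is the alternative.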
33
Future Work: Distributed RS
- Evaluation: possible approaches
  - Identify suitable evaluation criteria
  - Select applications from the taxonomy
  - Analyze them using each model (hypothetical)
  - Analyze each implementation in terms of the evaluation criteria
34
Future Work: Visual Interface
- Implement the visual interface
- Perform usability studies
  - Set up a live user experiment
  - Identify usability questionnaires
  - Conduct the usability survey
  - Analyze results
- Revise/redesign the interface
35
Future Work: Clustering in RS
- Identify effective clustering algorithms for hard and soft clustering (K-means and E-M, respectively)
- Partition the dataset
- Apply the Galaxy algorithm
- Test for quality
  - Accuracy and coverage
- Test for performance
  - Response time
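K-means, the hard-clustering option named above, can partition users by their rating vectors so that later neighborhood searches run within one cluster rather than over all users. This is a minimal NumPy sketch under assumptions: clustering users (rather than items), farthest-first initialization, the iteration count, and the toy data are illustrative choices. E-M and the Galaxy algorithm are not sketched.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Hard K-means clustering of the rows of X; returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Farthest-first initialization: a random row, then repeatedly the row
    # farthest from all centroids chosen so far.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(iters):
        # Assign each row to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = X[labels == c].mean(axis=0)
    return labels, centroids

# Hypothetical users: three who like items 0-1, three who like items 2-3.
X = np.array([[5, 5, 1, 1], [5, 4, 1, 1], [4, 5, 1, 2],
              [1, 1, 5, 5], [1, 2, 5, 4], [2, 1, 4, 5]])
labels, _ = kmeans(X, k=2)
```

Each user belongs to exactly one cluster, which is what makes this "hard"; a soft method such as E-M would instead give each user a membership probability per cluster.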
36
Future Work: Completion Plan
37
Contributions
- Use of a dimension reduction technique (SVD) as a high-quality prediction generator
  - Submitted to ICDE 2000
- Framework design for distributed RS
  - Submitted to CIKM'99
- Visual interfaces
- Clustering to improve scalability
38
That's all folks!
39
Distributed RS: Local Profile Model
[Diagram: the user carries his profile (user profile data) from the local RS to a remote RS]
40
Distributed RS: Central Profile Model
[Diagram: the RS and remote RSs share user profile storage held centrally (CPS)]
41
Geographically Distributed RS
[Diagram: users connect to RSs and remote RSs; user profile databases are spread across geographically distributed sites GDPS 1, GDPS 2, and GDPS 3]
42
Problems of high dimensional data
- A is highly correlated with B
- B is highly correlated with C
- We cannot conclude that C is also highly correlated with A
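The point that correlation is not transitive can be made concrete with a small numeric example. The vectors are hypothetical mean-centered ratings: A and C are unrelated, and B is their sum, so B correlates strongly with each of them.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length rating vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

A = [1, -1, 1, -1]
C = [1, 1, -1, -1]
B = [a + c for a, c in zip(A, C)]  # [2, 0, 0, -2]

print(round(pearson(A, B), 3), round(pearson(B, C), 3), pearson(A, C))  # 0.707 0.707 0.0
```

Both links through B have a correlation of about 0.71, yet A and C are completely uncorrelated.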