1
Privacy-Preserving Eigentaste-based Collaborative Filtering
Ibrahim Yakut and Huseyin Polat
{iyakut,polath}@anadolu.edu.tr
Department of Computer Engineering, Anadolu University, Turkey
2
Collaborative Filtering (CF)
14.06.2015 IWSEC'07
Problem: information overload
Solution: collaborative filtering
3
A recent technique for filtering and recommendation
Applications:
◦ E-commerce
◦ Search engines
◦ Direct recommendations
4
Collaborative Filtering Process
[Figure: user-item matrix with users u1, u2, ..., ua, ..., un as rows and items i1, i2, ..., iq, ..., im as columns; ua is the active user and iq is the item for which a prediction is sought.]
P_aq = prediction on item q for active user a
5
Eigentaste
Proposed by Goldberg et al. in 2001
Main feature: online computation in constant time
Second feature: flexible use of several clustering algorithms
Based on Principal Component Analysis (PCA)
Application in Jester, an online joke recommender: http://eigentaste.berkeley.edu/
6
Eigentaste Algorithm
D: n × m user-item matrix (n users, m items)
A: n × k user-gauge matrix (k gauge items)
Step 1. Find the correlation matrix C of the normalized A: C = AᵀA / (n − 1)
Step 2. Find the eigenvectors (E) and eigenvalues of C.
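Steps 1 and 2 can be sketched in plain Python as below. This is a minimal illustration, not the authors' code: it assumes A is already normalized and stored as a list of rows, and it finds the leading eigenvectors by power iteration with deflation, a swapped-in numerical method (a linear-algebra library would normally be used). The function names are hypothetical.

```python
import math
import random


def correlation_matrix(a):
    """C = A^T A / (n - 1) for an n x k list-of-rows matrix A (assumed normalized)."""
    n, k = len(a), len(a[0])
    return [[sum(a[u][f] * a[u][g] for u in range(n)) / (n - 1) for g in range(k)]
            for f in range(k)]


def top_eigenpairs(c, count=2, iters=500, seed=0):
    """Leading (eigenvalue, eigenvector) pairs of a symmetric PSD matrix via power iteration."""
    rng = random.Random(seed)
    k = len(c)
    m = [row[:] for row in c]  # working copy, deflated after each eigenpair
    pairs = []
    for _ in range(count):
        v = [rng.uniform(-1.0, 1.0) for _ in range(k)]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(k)) for i in range(k)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # nothing left in this direction; keep the current v
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(k)) for i in range(k))
        pairs.append((lam, v))
        # Deflation: subtract lam * v v^T so the next run finds the next eigenpair.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(k)] for i in range(k)]
    return pairs


# Usage: three users, two gauge items.
a = [[1.0, 2.0], [-1.0, -2.0], [0.0, 0.0]]
c = correlation_matrix(a)  # [[1.0, 2.0], [2.0, 4.0]]
pairs = top_eigenpairs(c)  # leading eigenvalue is approximately 5.0
```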
7
Eigentaste Algorithm (cont'd)
Step 3. Take the first two eigenvectors (E₂) and project A onto them: x = A E₂ᵀ
Step 4. Cluster the projected data using Recursive Rectangular Clustering (RRC).
Step 5. Construct a lookup table holding, for each cluster, the mean rating of every non-gauge item.
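Steps 3 and 5 can be sketched as follows. This is a hedged illustration under stated assumptions: RRC (step 4) is not reproduced, so the cluster labels are taken as given by whatever clustering is used; eigenvectors are passed as a list of rows; and None marks an unrated cell. The function names are hypothetical.

```python
def project(a, eigvecs):
    """x = A E2^T: coordinates of each row of A on the given eigenvectors."""
    return [[sum(v * e for v, e in zip(row, vec)) for vec in eigvecs] for row in a]


def build_lookup_table(nongauge, labels, num_clusters):
    """Per-cluster mean rating of each non-gauge item (None = unrated or no data)."""
    num_items = len(nongauge[0])
    table = []
    for c in range(num_clusters):
        row = []
        for q in range(num_items):
            # Ratings of item q from users assigned to cluster c, skipping gaps.
            vals = [r[q] for r, lab in zip(nongauge, labels)
                    if lab == c and r[q] is not None]
            row.append(sum(vals) / len(vals) if vals else None)
        table.append(row)
    return table


# Usage: labels would come from clustering the projected points.
labels = [0, 0, 1]
table = build_lookup_table([[1.0, None], [3.0, 2.0], [5.0, 4.0]], labels, 2)
# table == [[2.0, 2.0], [5.0, 4.0]]
```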
8
Eigentaste: Online Phase
When an active user a arrives:
◦ a rates the items in the gauge set.
◦ a's gauge ratings are projected onto the principal components (PCs).
◦ The representative cluster is found.
◦ Items are recommended from the precomputed lookup table.
[Figure: rating scale from "Disapprove" to "Approve"]
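The online phase can be sketched as below. In the original Eigentaste the representative cluster is the RRC rectangle that contains the projected point; this sketch instead assigns the point to the nearest cluster centroid, an assumption that matches k-means clusters. The function names and the `top_n` parameter are hypothetical.

```python
import math


def nearest_cluster(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def recommend(gauge_ratings, eigvecs, centroids, lookup_table, top_n=3):
    """Project the active user, pick a cluster, and rank items by their table value."""
    point = [sum(r * e for r, e in zip(gauge_ratings, vec)) for vec in eigvecs]
    cluster = nearest_cluster(point, centroids)
    row = lookup_table[cluster]
    # Highest cluster mean first; items with no data in this cluster are skipped.
    ranked = sorted((q for q in range(len(row)) if row[q] is not None),
                    key=lambda q: row[q], reverse=True)
    return cluster, ranked[:top_n]
```

Because the lookup table is built offline, the online cost depends only on the number of gauge items, eigenvectors, and clusters, not on the number of users.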
9
Motivation
The algorithm above is successful.
However, because of privacy risks, collecting truthful and trustworthy data is a challenge.
How can users contribute data for CF without jeopardizing their privacy?
Is it possible to use perturbed data in Eigentaste-based algorithms?
10
Modifications to the Original Algorithm
Normalization:
◦ The user's mean and standard deviation are used instead of the item's mean and standard deviation.
Clustering:
◦ k-means clustering is used instead of RRC.
Prediction:
◦ Instead of taking the lookup-table value directly, the value is denormalized to obtain the prediction.
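The first two modifications can be sketched as follows: user-based z-scores and a plain Lloyd's k-means over the projected points. This is a minimal sketch, not the authors' implementation; it assumes the sample standard deviation (n − 1 denominator), at least two distinct ratings per user, and a random choice of initial centroids. The function names are hypothetical.

```python
import math
import random


def user_zscores(ratings):
    """Normalize one user's ratings by that user's own mean and std (None = unrated).

    Assumes at least two distinct ratings, so the std is non-zero.
    """
    rated = [r for r in ratings if r is not None]
    mean = sum(rated) / len(rated)
    std = math.sqrt(sum((r - mean) ** 2 for r in rated) / (len(rated) - 1))
    return [None if r is None else (r - mean) / std for r in ratings], mean, std


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

The returned mean and std are what the third modification needs later, when a cluster's z-score mean is mapped back to a user's rating scale.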
11
Masking Data
Randomized Perturbation Technique (RPT) (Agrawal & Srikant, 2000)
[Figure: users 1, 2, ..., n each add random noise R1, R2, ..., Rn to their data before sending it to the central database used by the CF process.]
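The idea behind RPT can be illustrated with a small simulation: each disguised value differs from the true one by zero-mean noise, yet an aggregate over many users, here the mean, is approximately preserved. The data are hypothetical and the noise is Gaussian with a fixed standard deviation, a simplification of the per-user masking process that follows.

```python
import random
import statistics


def perturb(values, sigma, rng):
    """Randomized perturbation: each value plus independent N(0, sigma^2) noise."""
    return [v + rng.gauss(0.0, sigma) for v in values]


rng = random.Random(42)
true_ratings = [rng.uniform(-10.0, 10.0) for _ in range(20000)]  # one rating per user
disguised = perturb(true_ratings, 2.0, rng)
# Individual values change, but the mean over all users survives the noise.
error = abs(statistics.mean(disguised) - statistics.mean(true_ratings))
```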
12
Masking Process
1. Users and the server agree on γ, θ, and δ.
2. Each user u computes the z-scores of his or her ratings.
3. u selects σ_u uniformly at random from [0, γ] and uses it as the standard deviation of the masking data.
4. u randomly selects r_u from [0, 1]; if r_u ≤ θ, uniform noise is used, otherwise Gaussian noise.
5. u randomly selects x_er from [0, δ]; x_er% of the unfilled cells are filled with noise.
13
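Masking Process (cont'd)
u creates m_u random numbers, where
◦ m_u = number of rated cells + number of empty cells selected for filling (x_er% of the empty cells)
◦ the numbers have mean μ = 0 and standard deviation σ_u, and are drawn either from a Gaussian distribution or from a uniform distribution over [−√3·σ_u, √3·σ_u], depending on r_u
u masks his or her private data by adding this noise. The empty cells to be filled are selected randomly.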
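The masking steps can be sketched for a single user as below. This is a hedged sketch, not the authors' code: it assumes step 2 has already produced the z-scores (None marks an empty cell), treats δ as a percentage between 0 and 100, and draws x_er uniformly from [0, δ], which the slides do not state explicitly. The function name `mask_user` and the `rng` parameter are hypothetical.

```python
import math
import random


def mask_user(z_ratings, gamma, theta, delta, rng):
    """Disguise one user's z-scored ratings (None = empty cell).

    Rated cells get noise added; x_er% of the empty cells, picked at random,
    are filled with pure noise. delta is treated as a percentage (0-100).
    """
    sigma = rng.uniform(0.0, gamma)                # step 3: this user's noise std
    use_uniform = rng.uniform(0.0, 1.0) <= theta   # step 4: uniform vs Gaussian noise
    x_er = rng.uniform(0.0, delta)                 # step 5 (uniform draw assumed)

    def noise():
        if use_uniform:
            half_width = math.sqrt(3.0) * sigma    # U[-sqrt(3)s, sqrt(3)s] has std s
            return rng.uniform(-half_width, half_width)
        return rng.gauss(0.0, sigma)

    empty = [i for i, r in enumerate(z_ratings) if r is None]
    to_fill = set(rng.sample(empty, round(len(empty) * x_er / 100.0)))
    masked = []
    for i, r in enumerate(z_ratings):
        if r is not None:
            masked.append(r + noise())
        elif i in to_fill:
            masked.append(noise())   # fake entry: pure noise in an empty cell
        else:
            masked.append(None)
    return masked
```

Filling some empty cells hides which items the user actually rated, while the shared parameters γ, θ, δ let the server reason about the noise statistically without knowing any individual σ_u.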
14
Eigentaste-based CF with Privacy
The server now holds the disguised user-item matrix D' and the disguised user-gauge matrix A'.
In some steps, the effects of perturbation must be taken into account and handled:
◦ Correlation matrix construction
◦ Projection
◦ The active user's entry of gauge ratings
15
Correlation Matrix Construction
For f ≠ g (off-diagonal entries of C'):
the three noise terms have expected value 0, since μ = 0.
Hence the off-diagonal entries of C' approximate those of C.
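A reconstruction of the off-diagonal argument, assuming each disguised entry is a'_{uf} = a_{uf} + r_{uf}, that C' = A'ᵀA'/(n − 1), and that the noise is zero-mean and independent of the ratings and across cells. The three noise sums are the three terms with expected value 0:

```latex
C'_{fg} = \frac{1}{n-1}\sum_{u=1}^{n} (a_{uf} + r_{uf})(a_{ug} + r_{ug})
        = C_{fg}
        + \underbrace{\frac{1}{n-1}\sum_{u=1}^{n} a_{uf}\, r_{ug}}_{\mathbb{E}=0}
        + \underbrace{\frac{1}{n-1}\sum_{u=1}^{n} r_{uf}\, a_{ug}}_{\mathbb{E}=0}
        + \underbrace{\frac{1}{n-1}\sum_{u=1}^{n} r_{uf}\, r_{ug}}_{\mathbb{E}=0}
\quad\Rightarrow\quad C'_{fg} \approx C_{fg} \qquad (f \neq g)
```

The last term vanishes in expectation because, for f ≠ g, r_{uf} and r_{ug} are independent zero-mean draws even though they share the same σ_u.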
16
Correlation Matrix Construction (cont'd)
For f = g (diagonal entries of C'):
the cross term has expected value 0, since μ = 0.
The diagonal entries are then estimated assuming n ≈ n − 1.
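For f = g, expanding gives C'_ff = C_ff + (2/(n − 1)) Σ a_uf r_uf + (1/(n − 1)) Σ r_uf². The middle sum has expected value 0, but the last one does not: with n ≈ n − 1 it is roughly the average noise variance (1/n) Σ σ_u². The slides do not show how this term is estimated; the simulation below assumes the server subtracts E[σ_u²] = γ²/3, which follows from σ_u being uniform on [0, γ]. Gaussian and √3-scaled uniform noise both have variance σ_u², so the same correction would apply to either. The data and function names are hypothetical.

```python
import random


def corr(a):
    """C = A^T A / (n - 1) for a list-of-rows matrix A."""
    n, k = len(a), len(a[0])
    return [[sum(row[f] * row[g] for row in a) / (n - 1) for g in range(k)]
            for f in range(k)]


def simulate(n=20000, gamma=1.0, seed=0):
    rng = random.Random(seed)
    # Hypothetical data: two gauge columns with unit variance and correlation 0.8.
    a = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        a.append([x, 0.8 * x + 0.6 * rng.gauss(0.0, 1.0)])
    # Each user adds Gaussian noise whose std sigma_u is drawn from U[0, gamma].
    a_masked = []
    for row in a:
        sigma_u = rng.uniform(0.0, gamma)
        a_masked.append([v + rng.gauss(0.0, sigma_u) for v in row])
    c, c_masked = corr(a), corr(a_masked)
    # Assumed diagonal correction: subtract E[sigma_u^2] = gamma^2 / 3.
    k = len(c)
    c_est = [[c_masked[f][g] - (gamma ** 2 / 3.0 if f == g else 0.0) for g in range(k)]
             for f in range(k)]
    return c, c_masked, c_est
```

In such a run the off-diagonal entry of the masked matrix stays close to the true one, while each raw diagonal entry is inflated by roughly γ²/3 until the correction is applied.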
17
Projection
Similarly, the noise terms have expected value 0, so an approximation of the projected matrix is obtained.
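A sketch of the projection argument, assuming A' = A + R with R the zero-mean noise matrix, and treating E₂ (computed from the estimated correlation matrix) as fixed. 𝔼 denotes expectation, to avoid a clash with the eigenvector matrix E:

```latex
x' = A' E_2^{T} = (A + R)\,E_2^{T} = A E_2^{T} + R E_2^{T},
\qquad
\mathbb{E}\!\left[R E_2^{T}\right] = \mathbb{E}[R]\,E_2^{T} = 0
\;\Rightarrow\;
\mathbb{E}[x'] = A E_2^{T} = x
```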
18
Remaining Parts
After the clusters are determined from the estimated data:
◦ The z-score means of the non-gauge items are stored in the lookup table.
◦ When the active user enters disguised gauge ratings, the effect of randomization is removed in the same way.
◦ The representative cluster is found, the corresponding value from the table is denormalized, and the prediction is obtained.
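The final denormalization step can be sketched as p_aq = mean_a + std_a · z, where z is the cluster's z-score mean for item q. In this sketch the active user's mean and std are computed from his or her gauge ratings, and the result is clipped to the Jester range [−10, +10]; both choices are assumptions, and the noise handling for disguised gauge ratings is not reproduced. The function name `predict` is hypothetical.

```python
import statistics


def predict(lookup_table, cluster, item, gauge_ratings, r_min=-10.0, r_max=10.0):
    """Map a cluster's z-score mean for `item` back to the active user's rating scale."""
    z_mean = lookup_table[cluster][item]
    mean = statistics.mean(gauge_ratings)   # active user's mean (gauge items only)
    std = statistics.stdev(gauge_ratings)   # active user's sample std
    # Clip so the prediction stays inside the rating range.
    return min(r_max, max(r_min, mean + std * z_mean))
```

For example, a user whose gauge ratings are 2, 4 and 6 (mean 4, std 2) gets a prediction of 5.0 for an item whose cluster z-score mean is 0.5.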
19
Experiments
Data set
◦ Jester: web-based joke rating data
  17,988 users, 100 jokes
  Continuous ratings from −10 to +10
  50% of all ratings are present
Evaluation metrics
◦ MAE = (1/d) Σ |p_i − r_i|, summed over the d test ratings
◦ NMAE = MAE / (r_max − r_min)
where p is the predicted value, r the original value, d the size of the test set, r_max the maximum rating, and r_min the minimum rating.
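Both metrics are straightforward to compute; the sketch below assumes the Jester range of −10 to +10 as the default r_min and r_max. The function names are hypothetical.

```python
def mae(predicted, actual):
    """Mean absolute error over the d test ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)


def nmae(predicted, actual, r_min=-10.0, r_max=10.0):
    """MAE normalized by the width of the rating range."""
    return mae(predicted, actual) / (r_max - r_min)
```

With the Jester range, r_max − r_min = 20, so NMAE is simply MAE/20; for instance, an MAE of 3.740 corresponds to an NMAE of 0.187, consistent with the reported results.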
20
Eigentaste vs. Modified Eigentaste
9,000 training users, 5,000 test users (10 test items)
                      MAE     NMAE
Eigentaste            3.740   0.187
Modified Eigentaste   3.334   0.167
21
Protecting Active Users' Privacy
        M1       M2       M3
MAE     3.3508   3.4710   3.4807
NMAE    0.1676   0.1735   0.1741
M1: no disguise, but requires additional cost
M2: considering only the gauge-item mean and std
M3: considering the overall mean and std
22
Accuracy vs. Varying Numbers of Users
n       500     1000    2000    4000    8000
MAE     4.678   4.242   3.832   3.624   3.483
NMAE    0.234   0.212   0.192   0.181   0.174
5,000 users are fixed and 10 test items are chosen randomly.
Accuracy improves as the number of users increases, because the effect of the zero-mean random noise converges to zero.
For n ≥ 2000, the results are satisfactory.
23
Accuracy with Varying δ Values
δ       0        35       70       100
MAE     3.4460   3.4567   3.4615   3.4710
NMAE    0.1723   0.1728   0.1730   0.1735
Accuracy improves slightly as δ decreases.
24
Conclusion
We showed how to achieve privacy-preserving CF using Eigentaste-based algorithms.
Future work:
◦ Whether other clustering algorithms can be employed
◦ How to improve recommendation quality by using correlation-based CF algorithms
25
Thank you for your interest! Questions?