Privacy-Preserving Eigentaste-based Collaborative Filtering. Ibrahim Yakut and Huseyin Polat, Department of Computer Engineering.


1 Privacy-Preserving Eigentaste-based Collaborative Filtering
Ibrahim Yakut and Huseyin Polat, {iyakut,polath}@anadolu.edu.tr
Department of Computer Engineering, Anadolu University, Turkey

2 Collaborative Filtering (CF)
Problem: Information overload
Solution: Collaborative filtering
14.06.2015 IWSEC'07

3 Recent technique for filtering and recommendation
Applications
◦ E-commerce
◦ Search engines
◦ Direct recommendations

4 Collaborative Filtering Process
[Figure: user-item matrix with users u_1, u_2, ..., u_a, ..., u_n as rows and items i_1, i_2, ..., i_q, ..., i_m as columns; u_a is the active user and i_q is the item for which a prediction is sought.]
P_aq = prediction on item q for the active user a

5 Eigentaste
◦ Proposed by Goldberg et al. in 2001.
◦ Main feature: online computation in constant time.
◦ Second feature: flexible use of several clustering algorithms.
◦ Based on Principal Component Analysis (PCA).
◦ Application in Jester: online joke recommendation. http://eigentaste.berkeley.edu/

6 Eigentaste Algorithm
D: n×m user-item matrix (n users, m items)
A: n×k user-gauge matrix (k gauge items)
C: correlation matrix of A
Step 1. Find the correlation matrix C of A.
Step 2. Find the eigenvectors (E) and eigenvalues of C.

7 Eigentaste Algorithm (cont'd)
Step 3. Take the first m = 2 eigenvectors and project A: $x = A E_m^T = A E_2^T$.
Step 4. Cluster the projected data using Recursive Rectangular Clustering (RRC).
Step 5. Construct a lookup table with the mean of the non-gauge item ratings for each cluster.
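Steps 1 to 3 can be sketched in a few lines of NumPy. This is only an illustrative sketch under assumptions: the function name `eigentaste_offline`, the synthetic ratings, and the choice C = AᵀA/(n-1) on item-normalized gauge ratings are not taken from the slides, and the clustering and lookup-table steps (4 and 5) are left out.

```python
import numpy as np

def eigentaste_offline(A, n_components=2):
    """Return (E, X): the leading eigenvectors (as rows) and the projected users.

    A is assumed to be an n x k matrix of gauge ratings that has already
    been normalized column-wise (item mean 0, item std 1).
    """
    n = A.shape[0]
    # Step 1: correlation matrix of the normalized gauge ratings.
    C = A.T @ A / (n - 1)
    # Step 2: eigen-decomposition; eigh is meant for symmetric matrices
    # and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1][:n_components]
    E = eigvecs[:, order].T              # shape (n_components, k)
    # Step 3: project every user onto the leading eigenvectors.
    X = A @ E.T                          # shape (n, n_components)
    return E, X

# Synthetic stand-in data (hypothetical): 200 users, 10 gauge items.
rng = np.random.default_rng(0)
raw = rng.uniform(-10, 10, size=(200, 10))
A = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)
E, X = eigentaste_offline(A)
print(E.shape, X.shape)
```

Each row of `X` is a user's position in the two-dimensional eigenplane, which is the input to the clustering step.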

8 Eigentaste: online
When the active user (a) enters:
◦ a rates the items in the gauge set.
◦ a is projected using the principal components of the gauge data.
◦ The representative cluster of a is found.
◦ Items are recommended based on the preconstructed lookup table.
[Figure: rating scale from Disapprove to Approve]
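The online phase can be illustrated with a small sketch. The names `recommend`, `centroids`, and `lookup` are hypothetical, the toy numbers are made up, and a nearest-centroid rule stands in for locating the user's cluster; the original algorithm locates the rectangular RRC cell instead.

```python
import numpy as np

def recommend(gauge_ratings, E, centroids, lookup, top_n=2):
    """Project the active user, pick a cluster, and rank non-gauge items.

    E         : (2, k) leading eigenvectors from the offline phase
    centroids : (c, 2) cluster centers in the eigenplane (assumption)
    lookup    : (c, q) mean non-gauge item rating per cluster
    """
    x = gauge_ratings @ E.T                                  # projection
    cluster = int(np.argmin(np.linalg.norm(centroids - x, axis=1)))
    ranked = np.argsort(lookup[cluster])[::-1][:top_n]       # best items first
    return cluster, ranked

# Toy values, for illustration only.
E = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
centroids = np.array([[-1.0, -1.0],
                      [1.0, 1.0]])
lookup = np.array([[0.1, 0.9, 0.5],
                   [0.8, 0.2, 0.3]])
cluster, ranked = recommend(np.array([0.9, 1.2, -0.3]), E, centroids, lookup)
print(cluster, ranked)
```

The work per active user depends only on the number of gauge items and clusters, not on the number of users, which is why the online step runs in constant time with respect to the database size.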

9 Motivation
◦ The algorithm described above is successful.
◦ However, because of privacy risks, collecting truthful and trustworthy data is a challenge.
◦ How, then, can users provide data for CF purposes without jeopardizing their privacy?
◦ Is it possible to use perturbed data in Eigentaste-based algorithms?

10 Modifications on Original
Normalization:
◦ Instead of the item mean and std, the user mean and std are used.
Clustering:
◦ Instead of RRC, k-means clustering is used.
Prediction:
◦ Instead of using the lookup table value directly, it is denormalized and then used as the prediction.
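The user-based normalization and the denormalization step can be sketched as follows. The helper names `user_zscores` and `denormalize`, the use of `np.nan` for unrated items, and the sample standard deviation (`ddof=1`) are assumptions made for illustration.

```python
import numpy as np

def user_zscores(ratings):
    """Normalize one user's ratings with that user's own mean and std.

    Unrated items are marked with np.nan and stay np.nan.
    """
    mean = np.nanmean(ratings)
    std = np.nanstd(ratings, ddof=1)
    return (ratings - mean) / std, mean, std

def denormalize(z, mean, std):
    """Map a z-score (e.g. a lookup-table value) back to the user's scale."""
    return mean + std * z

ratings = np.array([2.0, np.nan, 4.0, 6.0])
z, mean, std = user_zscores(ratings)
print(z)                            # z-scores of the rated items
print(denormalize(0.5, mean, std))  # prediction on the user's rating scale
```

Because the lookup table then holds z-scores, the same table entry yields different predictions for users with different rating means and spreads.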

11 Masking data
[Figure: User 1, User 2, ..., User n-1, User n each add random numbers (+R_1, +R_2, ..., +R_{n-1}, +R_n) to their data before sending it to the central database used in the CF process.]
Randomized Perturbation Technique (RPT), Agrawal & Srikant, 2000

12 Masking Process
1. Users and the server agree on γ, θ, δ.
2. Each user u computes the z-scores of their ratings.
3. u selects σ_u uniformly at random over [0, γ] and uses it as the std of the masking data.
4. u selects r_u over [0, 1]; if r_u ≤ θ, the uniform distribution is used, otherwise the Gaussian.
5. u selects x_er over [0, δ]; x_er% of the unfilled cells are to be filled with noise.

13 Masking Process
u creates m_u random numbers, where
◦ m_u = number of rated cells + x_er
◦ std = σ_u, μ = 0, Gaussian or uniform (√3·σ_u) with respect to r_u
u masks the private data by adding this noise. The empty cells to be filled are selected randomly.
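One way to read the masking steps is sketched below. Several details are assumptions rather than statements from the slides: `mask_user` is a hypothetical name, unrated cells are `np.nan`, the uniform noise is drawn from [-√3·σ_u, √3·σ_u] (which gives standard deviation σ_u), and x_er is treated as a percentage of the empty cells that is rounded to a cell count.

```python
import numpy as np

def mask_user(z, gamma, theta, delta, rng):
    """Disguise one user's z-score vector (np.nan = unrated cell)."""
    sigma_u = rng.uniform(0.0, gamma)           # noise std for this user
    use_uniform = rng.uniform(0.0, 1.0) <= theta
    x_er = rng.uniform(0.0, delta)              # percent of empty cells to fill

    rated = ~np.isnan(z)
    n_rated = int(rated.sum())
    empty_idx = np.flatnonzero(~rated)
    n_fill = int(round(len(empty_idx) * x_er / 100.0))
    fill_idx = rng.permutation(empty_idx)[:n_fill]   # random empty cells

    m_u = n_rated + n_fill                      # how many random numbers are needed
    if use_uniform:
        a = np.sqrt(3.0) * sigma_u              # uniform on [-a, a] has std sigma_u
        noise = rng.uniform(-a, a, size=m_u)
    else:
        noise = rng.normal(0.0, sigma_u, size=m_u)

    masked = z.copy()
    masked[rated] = z[rated] + noise[:n_rated]  # perturb the real ratings
    masked[fill_idx] = noise[n_rated:]          # fake ratings in empty cells
    return masked

rng = np.random.default_rng(0)
z = np.array([0.5, np.nan, -1.2, np.nan, 0.3, np.nan, 1.1, -0.7, np.nan, 0.0])
masked = mask_user(z, gamma=1.0, theta=0.5, delta=50.0, rng=rng)
print(masked)
```

Filling some empty cells hides which items the user actually rated, in addition to hiding the rating values themselves.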

14 Eigentaste-based CF with Privacy
The server now holds the disguised user-item matrix D' and the disguised user-gauge matrix A'.
In some steps, the effects of perturbation must be considered and handled:
◦ Correlation matrix construction
◦ Projection
◦ Active user's entry of the gauge set

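15 Correlation Matrix Construction
If f ≠ g, i.e., for the nondiagonal entries of C': the three terms that involve the random numbers have expected value 0 since μ = 0. The nondiagonal entries of C' can then be used as estimates of the corresponding entries of C.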

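16 Correlation Matrix Construction
If f = g, i.e., for the diagonal entries of C': the expected value of the cross term between ratings and random numbers is 0 since μ = 0. Then, assuming n ≈ n-1, the remaining term (the sum of the squared random numbers) has to be accounted for when estimating the diagonal entries of C.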
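The effect of zero-mean noise on the correlation matrix can be checked numerically. The sketch below is not the authors' estimator, whose exact equations are not preserved in this transcript. It assumes Gaussian noise only, synthetic ratings, and a diagonal correction of γ²/3, which is the expected value of σ_u² when σ_u is uniform over [0, γ].

```python
import numpy as np

rng = np.random.default_rng(42)
n, k, gamma = 50000, 5, 1.0

# Synthetic, correlated stand-in for the normalized gauge matrix A.
base = rng.normal(size=(n, 1))
A = 0.6 * base + 0.8 * rng.normal(size=(n, k))
A = (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)

# Each user draws an individual noise std sigma_u from U[0, gamma].
sigma = rng.uniform(0.0, gamma, size=(n, 1))
R = rng.normal(size=(n, k)) * sigma
A_masked = A + R

C = A.T @ A / (n - 1)
C_masked = A_masked.T @ A_masked / (n - 1)

# Off-diagonal entries: the noise terms average out, so C' is already close to C.
# Diagonal entries: inflated by the mean noise variance E[sigma_u^2] = gamma^2 / 3.
C_est = C_masked - (gamma ** 2 / 3.0) * np.eye(k)

off_diag = ~np.eye(k, dtype=bool)
print(np.abs(C_masked - C)[off_diag].max())   # small
print(np.abs(np.diag(C_masked - C)).mean())   # about gamma^2 / 3
print(np.abs(C_est - C).max())                # small after the correction
```

The correction needs only the agreed parameter γ, not any individual σ_u, so the server can apply it without learning a user's noise level.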

17 Projection
Similarly, the expected values of the terms involving the random numbers are 0, so an approximation of the projected matrix is obtained.

18 Remaining Parts
After the clusters are determined from the estimated data:
◦ The z-score means of the non-gauge items are stored in the lookup table.
◦ When the active user enters disguised gauge ratings, the effect of randomization is removed in the same way.
◦ The representative cluster is determined, the corresponding value from the table is denormalized, and the prediction is obtained.

19 Experiments
Data Set
◦ Jester is a web-based joke data set
  ◦ 17,988 users, 100 jokes
  ◦ Continuous ratings over the range (-10, +10)
  ◦ 50% of all ratings are present
Evaluation Metrics (MAE and NMAE)
p: predicted value, r: original value, d: size of test set, r_max: max rating, r_min: min rating
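The metric formulas themselves are not preserved in this transcript. The sketch below uses the standard definitions of mean absolute error (MAE) and normalized MAE (NMAE) in terms of the symbols listed above; the function names and the toy values are illustrative. These definitions agree with the reported results, for example 3.740 / 20 = 0.187 on the (-10, +10) scale.

```python
import numpy as np

def mae(p, r):
    """Mean absolute error over the d test ratings."""
    p = np.asarray(p, dtype=float)
    r = np.asarray(r, dtype=float)
    return np.abs(p - r).mean()

def nmae(p, r, r_min=-10.0, r_max=10.0):
    """MAE divided by the width of the rating range."""
    return mae(p, r) / (r_max - r_min)

predicted = [1.0, -2.0, 5.0]
original = [3.0, -2.0, 1.0]
print(mae(predicted, original))    # 2.0
print(nmae(predicted, original))   # 0.1
```

NMAE makes results comparable across data sets with different rating scales.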

20 Eigentaste vs. Modified
9000 training users, 5000 test users (10 test items)
                      MAE     NMAE
Eigentaste            3.740   0.187
Modified Eigentaste   3.334   0.167

21 Protecting active users' privacy
        M1       M2       M3
MAE     3.3508   3.4710   3.4807
NMAE    0.1676   0.1735   0.1741
M1: No disguise, but requires additional cost
M2: Considering only the gauge mean and std
M3: Considering the whole mean and std

22 Accuracy vs. Varying Numbers of Users
n       500     1000    2000    4000    8000
MAE     4.678   4.242   3.832   3.624   3.483
NMAE    0.234   0.212   0.192   0.181   0.174
◦ Fixed 5000 users and 10 random test items.
◦ Accuracy improves as the number of users increases, since the contribution of the random numbers converges to zero.
◦ For n ≥ 2000, the results are satisfactory.

23 Accuracy with Varying δ Values
δ       0        35       70       100
MAE     3.4460   3.4567   3.4615   3.4710
NMAE    0.1723   0.1728   0.1730   0.1735
Accuracy improves slightly as δ decreases.

24 Conclusion
We showed how to achieve privacy-preserving CF tasks using Eigentaste-based algorithms.
We will study
◦ whether we can employ other clustering algorithms
◦ how to improve recommendation quality by using correlation-based CF algorithms

25 Thanks for your interest! Questions?

