
1 Privacy risks of collaborative filtering Yuval Madar, June 2012 Based on a paper by J.A. Calandrino, A. Kilzer, A. Narayanan, E. W. Felten & V. Shmatikov

2  Recommender systems suggest items (and other users) a user may like, inferred from the user's previous purchases.  Numerous examples ◦ Amazon, iTunes, CNN, Last.fm, Pandora, Netflix, YouTube, Hunch, Hulu, LibraryThing, IMDb and many others.

3  User to item – “You might also like…”  Item to item – Similar items list  User to user – Another customer with common interests

4  Content-based filtering ◦ Based on a priori similarity between items; recommendations are derived from a user's own history. ◦ Does not pose a privacy threat.  Collaborative filtering ◦ Based on correlations between other users' purchases. ◦ Our attacks target this type of system.  Hybrid ◦ A system employing both filtering techniques.

5  The data the recommendation system uses is modeled as a matrix whose rows correspond to users and whose columns correspond to items.  Some auxiliary information on the target user is available (a subset of the target user's transaction history).  An attack is successful if it allows the attacker to learn transactions that are not part of this auxiliary information.

6  Users' public ratings and comments on products  Shared transactions (via Facebook or other media)  Discussions on 3rd-party sites  Favorite books in a Facebook profile  Offline interactions (with friends, neighbors, coworkers, etc.)  Other sources…

7  Input: ◦ a set of target items T and a set of auxiliary items A  Observe the related-items lists of the items in A until an item in T appears or moves up.  If a target item appears in enough related-items lists at the same time, the attacker may infer that the target user bought it.  Note 1 – Scoring may be far more complex, since different items in A may be correlated. (Books belonging to a single series, bundle discounts, etc.)  Note 2 – It is preferable that A consist of obscure and uncommon items, to amplify the effect of the target user's choices on their related-items lists.
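The observation step above can be sketched as follows. This is a minimal illustration, not the paper's scoring algorithm: the snapshot format (item → ranked related-items list) and the voting threshold are assumptions made for the example.

```python
def score_inferences(before, after, targets, threshold=3):
    """Compare two snapshots of the related-items lists of the auxiliary
    items.  For each target item, count the lists in which it newly
    appeared or moved up; infer a purchase only when enough lists
    changed at the same time (the simultaneity requirement from the
    attack description)."""
    votes = {t: 0 for t in targets}
    for aux_item, new_list in after.items():
        old_list = before.get(aux_item, [])
        for t in targets:
            new_pos = new_list.index(t) if t in new_list else None
            old_pos = old_list.index(t) if t in old_list else None
            # Vote if the target item appeared for the first time,
            # or moved up to a better (smaller) position.
            if new_pos is not None and (old_pos is None or new_pos < old_pos):
                votes[t] += 1
    return {t for t, v in votes.items() if v >= threshold}
```

In a real attack, the scoring would have to account for correlations among the auxiliary items themselves (Note 1 above); here each list simply casts an independent vote.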

8  Some sites expose to their users the covariance matrix describing the correlations between items. (Hunch is one such website.)  Similarly to the previous attack, the attacker watches for increases in the correlation between the auxiliary items and the target items.  Note 1 – Different matrix cells may be updated asynchronously.  Note 2 – Inference probability improves if the set of auxiliary items is user-unique (no other user bought all of them). This is more likely if some of them are unpopular, or if there are enough of them.
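A sketch of the covariance-based variant, under the assumption that the site exposes the matrix as a lookup from (item, item) pairs to covariance values; the threshold on simultaneous increases is again an illustrative choice, not a parameter from the paper.

```python
def infer_from_covariance(cov_before, cov_after, aux_items, targets,
                          threshold=2):
    """Infer a target purchase when the covariance between enough
    auxiliary items and that target increased between two snapshots.
    Requiring several simultaneous increases compensates for the
    asynchronous cell updates noted above."""
    inferred = set()
    for t in targets:
        increased = sum(1 for a in aux_items
                        if cov_after[(a, t)] > cov_before[(a, t)])
        if increased >= threshold:
            inferred.add(t)
    return inferred
```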

9  System model ◦ For each user, the system finds the k users most similar to it, and ranks the items they purchased by total number of sales.  Active attack  Create k dummy users, each buying all known auxiliary items.  With high probability, the k dummy users and the target user will be clustered together. (This requires an auxiliary item list of size logarithmic in the total number of users; in practice, 8 items were found to be enough for most sites.)  In that case, the recommendations to the dummy users will include transactions of the target user previously unknown to the attacker.  Note – The attack is more feasible in systems where user interaction with items does not involve spending money.
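The active (sybil) attack reduces to a few lines once we assume two hypothetical hooks into the site: `create_user(history)`, which registers an account with the given purchase history, and `recommend(user)`, which returns that account's recommendation list. Neither is a real API; they stand in for scripted web interactions.

```python
def active_knn_attack(create_user, recommend, aux_items, k):
    """Register k sybil accounts whose entire history is the auxiliary
    item set.  If the system clusters each sybil with the target user,
    any recommended item that is NOT an auxiliary item is (with high
    probability) an unknown transaction of the target user."""
    sybils = [create_user(list(aux_items)) for _ in range(k)]
    inferred = set()
    for s in sybils:
        inferred |= set(recommend(s)) - set(aux_items)
    return inferred
```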

10  The main parameters for evaluating an inference attack are: ◦ Yield – how many inferences are produced. ◦ Accuracy – how likely each inference is to be correct.  Yield-accuracy tradeoff – stricter algorithms reject less probable inferences, raising accuracy at the cost of yield.
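The two metrics are straightforward to compute when a ground-truth transaction history is available for evaluation (as in the paper's experiments):

```python
def evaluate(inferences, ground_truth):
    """Yield = number of inferences produced; accuracy = fraction of
    those inferences that really are in the target's history."""
    yield_count = len(inferences)
    correct = len(inferences & ground_truth)
    accuracy = correct / yield_count if yield_count else 0.0
    return yield_count, accuracy
```

Raising a confidence threshold shrinks `inferences`, which typically lowers yield and raises accuracy: the tradeoff named above.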

11  The paper further discusses specific attacks performed against: ◦ Hunch ◦ LibraryThing ◦ Last.fm ◦ Amazon  and measures the accuracy and yield of these attacks, in some instances achieving impressive tradeoffs (such as 70% accuracy at 100% yield against Hunch).

12  Provable privacy guarantees are not discussed in the paper.  They have been achieved in other papers for static recommendation databases.  The problem remains open for dynamic systems (which all real-world examples are).

13  Limited-length related-items lists – the first elements of such lists have low sensitivity to single purchases.  Factoring item popularity into update frequency – less popular items are more sensitive to single purchases; batching their purchases together decreases the information leak.

14  Limit the data access rate – prevents large-scale privacy attacks, though it lowers utility and may be circumvented using a botnet.  User opt-out – a privacy-conscious user may decide to opt out of recommender systems entirely (at a clear cost in utility).

15  Passive and active attacks on recommender systems, using auxiliary information about a target user's purchases, allow the attacker to infer undisclosed private transactions.  Increased user awareness is required.  Several methods were suggested to decrease the information leaked by these systems.

16 Questions?
