Bloom Cookies: Web Search Personalization without User Tracking Authors: Nitesh Mor, Oriana Riva, Suman Nath, and John Kubiatowicz Presented by Ben Summers.

Outline
Introduction; Personalization and Privacy in Searching; Bloom Cookie; Evaluation; Results; Limitations; Advantages; Questions

Personalization’s Edge
Online services have become increasingly personalized. Personalization comes from building user profiles on the server side by tracking multiple online activities. Privacy-concerned users and stricter privacy regulations have made it difficult to build a complete profile without infringing on privacy. Yet a search engine that provides personalized results while maintaining privacy has a competitive edge.

Personalization and Privacy
Personalization and privacy are not mutually exclusive. Prior work shows that user profiles can be maintained at the client, but that approach requires running the personalization algorithms on the client, and the communication overhead between client and server is very large. The authors instead suggest obfuscating the profile, which preserves user privacy while still allowing personalization.

A Fitted Suit For an Invisible Man
The authors propose Bloom Cookies, which encode a user’s profile in an efficient and privacy-preserving manner. They use generalization and noise addition to preserve user privacy while still allowing results to be personalized. Generalization shares items in a user’s profile only at a coarse granularity. Noise addition adds fake items to the profile to hide the real ones.
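
The two obfuscation ideas on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: here "coarse granularity" is taken to mean keeping just the host name of each URL, and the pool of fake items (`fake_pool`) and the function names are hypothetical.

```python
import random
from urllib.parse import urlparse


def generalize(profile_urls):
    """Generalization (assumed form): keep only the host of each URL."""
    return sorted({urlparse(url).netloc for url in profile_urls})


def add_noise(profile_items, fake_pool, num_fake, rng):
    """Noise addition: mix up to `num_fake` fake items into the real ones."""
    candidates = [item for item in fake_pool if item not in profile_items]
    fakes = rng.sample(candidates, min(num_fake, len(candidates)))
    noisy = list(profile_items) + fakes
    rng.shuffle(noisy)  # hide which entries were appended as noise
    return noisy


coarse = generalize(["https://a.example/x", "https://a.example/y", "https://b.example/"])
noisy = add_noise(["a.example"], ["f1.example", "f2.example", "a.example"], 2, random.Random(0))
```

Generalization loses detail about every real item, whereas noise addition keeps the real items intact and hides them among fakes.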

What is a Bloom Cookie?
A Bloom Cookie is generated and maintained by the client and sent to online services every time the user makes a service request. The user controls what profile information is included in the Bloom Cookie and when the cookie is sent. The authors inject noisy bits into the Bloom filter and also exploit its naturally occurring false positives as noise.

Noise vs. Generalization
The authors found that noise addition provides a better privacy/personalization tradeoff than generalization. However, noisy profiles incur a large communication overhead and need a large noise dictionary provided by a third party. Bloom Cookies use a smaller noise profile and do not require a noise dictionary.

Contributions
Bloom Cookies provide a convenient tradeoff among privacy, personalization, and network efficiency. The authors also show that generalization provides anonymity but does not naturally translate into unlinkability. If a server can tell whether two requests come from the same client or from different clients, it can collect enough information to identify the user over time. Unlike previous work, they leverage Bloom filters for privacy.

Web Search Goals
Assumptions: each client is associated with a profile made up of the URLs the user visits, and the server does not collude with other servers. Personalization: tailor search queries or re-rank search results based on the web sites the user visits. Privacy: the client obfuscates the profile before sending it to the server. Unlinkability is the key goal: noise is added so that the server cannot tell whether two queries come from the same client or from different clients.

Unlinkability Goals
The aim is unlinkability across IP-sessions, where an IP-session is the sequence of all queries sent from the same source IP address. The authors determined that in an average home network scenario an IP-session lasts about 2 weeks. They assume that the browser keeps track of the user’s online activity and maintains a profile, which is encoded as a Bloom Cookie and sent with each query.

Obfuscation Design
A noise profile provides a similar level of unlinkability to a generalized profile, but with better personalization. The negative effect of noise on personalization is offset by the finer granularity of the profile items, resulting in a net improvement. However, a noise profile can be tens of kB in size, larger than the actual requests or responses, which is impractical. Bloom Cookies’ smaller communication overhead and lack of a noise dictionary make them practical.

Evaluation
To build URL-based profiles, the authors extract satisfied clicks (a click followed by a period of inactivity). They take the clicked URLs and assemble the user profile as a list of domain names ordered by how often each recurs in the log. They also categorize the log using Open Directory Project (ODP) categories. The profiles are then used for ranking search results.
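
As a rough sketch of the profile-building step described above: the 30-second inactivity threshold, the log layout (time-sorted `(timestamp, url)` pairs for one user), and the decision to count the final click as satisfied are all assumptions made for this example; the slides do not give those details.

```python
from collections import Counter
from urllib.parse import urlparse

SATISFIED_SECONDS = 30  # assumed inactivity threshold, not taken from the slides


def build_profile(clicks):
    """Return domains ordered by how many satisfied clicks they received.

    `clicks` is a list of (timestamp_seconds, url) pairs sorted by time.
    """
    counts = Counter()
    for i, (timestamp, url) in enumerate(clicks):
        is_last = i == len(clicks) - 1
        # Time until the next click; the last click is treated as satisfied.
        idle = float("inf") if is_last else clicks[i + 1][0] - timestamp
        if idle >= SATISFIED_SECONDS:
            counts[urlparse(url).netloc] += 1
    return [domain for domain, _ in counts.most_common()]


log = [
    (0, "http://a.example/1"),    # abandoned after 5 s, not counted
    (5, "http://b.example/"),
    (100, "http://a.example/2"),
    (200, "http://a.example/3"),
]
profile = build_profile(log)
```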

Ranking Queries
A score is assigned to each of the top 50 results returned for a query. If the domain of a search result is present in the user’s profile, the result receives a personalization score that determines its rank. Quality is measured with an average rank: for query i, R is the set of results the user clicked, rank is the rank of each such result as assigned by the personalization algorithm, and the average rank is the mean of rank over the results in R.
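
The re-ranking and the average-rank metric can be sketched as follows. The scoring rule used here (1 if the result's domain is in the profile, 0 otherwise) is a simplifying assumption for illustration; the slides do not state the exact score the authors use, and the function names are hypothetical.

```python
from urllib.parse import urlparse


def rerank(results, profile_domains):
    """Move results whose domain is in the profile ahead of the rest.

    The engine's original order is preserved within each group.
    """
    def score(url):
        return 1 if urlparse(url).netloc in profile_domains else 0

    return sorted(results, key=score, reverse=True)  # sorted() is stable


def average_rank(reranked, clicked):
    """Mean 1-based rank, after personalization, of the clicked results."""
    ranks = [reranked.index(url) + 1 for url in clicked]
    return sum(ranks) / len(ranks)


results = ["http://x.example/", "http://y.example/", "http://a.example/p"]
reranked = rerank(results, {"a.example"})
avg = average_rank(reranked, ["http://a.example/p", "http://y.example/"])
```

A lower average rank means the clicked results sit closer to the top of the personalized list.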

Noise Addition Techniques
The authors explored two techniques for noise addition: RAND and HYBRID. Both assume a dictionary D containing URLs and the top-3 ODP categories associated with each URL. RAND draws fake URLs randomly from D. HYBRID draws fake URLs only from the ODP categories matching the user’s interests.
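
A minimal sketch of the difference between the two techniques, assuming the dictionary D is a Python dict mapping each URL to its list of categories. The function names, the sample dictionary, and the category labels are hypothetical.

```python
import random


def rand_noise(dictionary, num_fake, rng):
    """RAND: draw fake URLs uniformly from the whole dictionary."""
    urls = sorted(dictionary)
    return rng.sample(urls, min(num_fake, len(urls)))


def hybrid_noise(dictionary, user_categories, num_fake, rng):
    """HYBRID: draw fake URLs only from categories the user is interested in."""
    matching = sorted(
        url for url, categories in dictionary.items()
        if user_categories & set(categories)
    )
    return rng.sample(matching, min(num_fake, len(matching)))


D = {
    "news.example": ["News"],
    "sport.example": ["Sports"],
    "score.example": ["Sports", "News"],
    "cook.example": ["Home"],
}
rng = random.Random(1)
fake_rand = rand_noise(D, 2, rng)
fake_hybrid = hybrid_noise(D, {"Sports"}, 5, rng)
```

Because HYBRID noise resembles the user's real interests, it is harder to tell apart from the genuine profile entries than RAND noise.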

Impact of Noise
Both noise techniques provide higher unlinkability: with exact profiles 98.7% of users were correctly linked, while noise lowers this to 20% (RAND) or 13.6% (HYBRID). But, as the authors note, relying on a dictionary for noise addition inflates communication overhead.

RAND and HYBRID

Bloom Cookies
A Bloom filter stores elements from a set E and is implemented as a bit string with k hash functions. When querying whether an element exists in the Bloom filter, false positives are possible but false negatives are not. The client generates the noisy profile and inserts its URLs into a Bloom filter, which it sends with its queries. The server queries the Bloom filter for all URLs contained in the search results for the submitted query and re-ranks them accordingly.
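
The client and server roles described above can be sketched with a small Bloom filter class. Several things here are assumptions made for illustration rather than details from the paper: the class and method names, the use of salted SHA-256 to derive the hash positions, and the noise model in `add_noise`, which sets each bit independently with a fixed probability.

```python
import hashlib
import random


class BloomCookie:
    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive one position per hash function by salting SHA-256 with its index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        """Client side: insert a profile item."""
        for pos in self._positions(item):
            self.bits[pos] = True

    def add_noise(self, noise_level, rng):
        """Client side: set each bit with probability `noise_level` (assumed model)."""
        for pos in range(self.num_bits):
            if rng.random() < noise_level:
                self.bits[pos] = True

    def might_contain(self, item):
        """Server side: True for every inserted item, and sometimes for others."""
        return all(self.bits[pos] for pos in self._positions(item))


cookie = BloomCookie(num_bits=256, num_hashes=3)
for domain in ["a.example", "b.example"]:
    cookie.add(domain)
bits_before = sum(cookie.bits)
cookie.add_noise(0.2, random.Random(0))
```

The server would call `might_contain` on the domain of each search result and boost the ones that match. It cannot distinguish a true match from a false positive, which is what gives the client deniability.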

Framework for Web Search

Controlling Obfuscation
Obfuscation can be controlled by changing the number of hash functions and the level of noise in the Bloom filter. Increasing k (the number of hash functions) reduces the false-positive probability and the number of noisy bits controlled by l. Increasing m (noise) increases unlinkability. The authors provide an algorithm for configuring Bloom Cookies that takes both personalization and privacy into account.
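
To see why the number of hash functions affects how much noise the server observes, the standard Bloom filter approximation can be extended with a simple noise term. This is a back-of-the-envelope model and not the authors' configuration algorithm: it assumes hash positions are independent and that noise sets each bit independently with probability `noise_level`. The parameter names are chosen for this example.

```python
import math


def false_positive_rate(num_bits, num_hashes, num_items, noise_level):
    """Probability that a URL not in the profile still matches the filter."""
    # Chance a given bit was left unset by all inserted items.
    p_unset_by_items = math.exp(-num_hashes * num_items / num_bits)
    # A bit ends up set if either an item or the noise set it.
    p_bit_set = 1.0 - p_unset_by_items * (1.0 - noise_level)
    # A non-member matches only if all of its hash positions are set.
    return p_bit_set ** num_hashes


rates = [false_positive_rate(2000, k, 50, 0.1) for k in (1, 2, 3, 4)]
```

In a sparsely filled filter such as this one, each extra hash function makes an accidental match less likely, so `rates` decreases as k grows. In a heavily loaded filter the trend eventually reverses, because each insertion sets more bits.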

Personalization/Privacy/Efficiency Tradeoffs

Limitations
A larger user population than the dataset the authors used is likely to improve the privacy of Bloom Cookies: more users in the system means a user’s noisy profile is more likely to collide with another user’s. Some users’ activity may be so distinctive that they remain linkable even with noise added; feedback from those users is needed. The authors’ assumptions about current web search techniques could become wrong: different services might share data, and personalization algorithms might become more sophisticated. Tracking across services is outside the scope of the work; the authors chose web search for its mature personalization.

Advantages of Bloom Cookies
Efficient. Noisy by design. Noise is non-deterministic. Dictionary-free. Dictionary attacks are much more expensive. Better unlinkability than HYBRID for the same level of personalization!

What are two advantages of Bloom Cookies over generalization?
Why can’t the authors solely use the RAND or HYBRID noise addition techniques they outlined in the paper to add noise to their profiles?
What two parameters do the authors use to control the efficiency/personalization/privacy tradeoffs of Bloom Cookies?