563.10: Bloom Cookies: Web Search Personalization without User Tracking

Presentation transcript:

563.10: Bloom Cookies: Web Search Personalization without User Tracking. Presented by Ben Ujcich. CS563/ECE524 Advanced Computer Security, University of Illinois.

Background. There is a trade-off between privacy and personalization that depends on what we give search engines when we perform searches. If I search for UIUC-related websites often, would I want Google to show UIUC pages when I simply type “university”? What do I lose when I make myself more private in my searches (e.g., browsing through Tor)?

A Compromise. Profile obfuscation masks the exact profile of a user’s previous searches and visited URLs. It provides some degree of privacy while still allowing personalization (how can this be quantified?). It can be implemented client-side or through a personalization proxy. Downsides: costly in bandwidth; need for a trusted third party.

Profile Obfuscation Techniques. Generalization: make specifics coarser. Example: the user’s visited URLs (ece.illinois.edu, nytimes.com, facebook.com, youtube.com) become the user’s visited URL categories (Higher education, News, Social media, Videos and media). Noise addition: add fake information. Example: the user’s visited URLs (ece.illinois.edu, nytimes.com, facebook.com, youtube.com) become a longer list that also contains fake entries (ece.illinois.edu, nytimes.com, facebook.com, youtube.com, wsj.com, umich.edu, instagram.com, reddit.com).
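The two techniques on this slide can be sketched in a few lines of Python. This is only an illustration: the URL-to-category map, the noise dictionary, and the function names `generalize` and `add_noise` are assumptions built from the slide's example values, not anything specified in the paper.

```python
import random

# Hypothetical URL-to-category map; the entries mirror the slide's example.
CATEGORY_OF = {
    "ece.illinois.edu": "Higher education",
    "nytimes.com": "News",
    "facebook.com": "Social media",
    "youtube.com": "Videos and media",
}

# Hypothetical noise dictionary of fake URLs (again taken from the slide).
NOISE_DICTIONARY = ["wsj.com", "umich.edu", "instagram.com", "reddit.com"]


def generalize(profile):
    """Generalization: replace each URL with its coarser category."""
    return [CATEGORY_OF[url] for url in profile]


def add_noise(profile, fake_count, rng=None):
    """Noise addition: append fake_count URLs sampled from the dictionary."""
    rng = rng or random.Random()
    candidates = [u for u in NOISE_DICTIONARY if u not in profile]
    return list(profile) + rng.sample(candidates, fake_count)


profile = ["ece.illinois.edu", "nytimes.com", "facebook.com", "youtube.com"]
generalized = generalize(profile)
noisy = add_noise(profile, fake_count=4, rng=random.Random(0))
```

The noisy profile is twice as long as the real one here, which hints at the bandwidth cost the slides attribute to noise addition.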

Research Challenges “What obfuscation technique is more suitable for privacy-preserving personalization of web search?” “How big a dictionary and how much noise are required to achieve reasonable unlinkability?” “Is it possible to receive the advantages of noisy profiles without incurring the aforementioned costs (i.e., noise dictionary and large communication overhead)?”

Citation. Bloom Cookies: Web Search Personalization without User Tracking. Nitesh Mor (UC Berkeley), Oriana Riva (Microsoft), Suman Nath (Microsoft), John Kubiatowicz (UC Berkeley). NDSS ’15.

Overview Providing personalization while preserving privacy in web searches can be done through profile obfuscation, but it is often costly or impractical. The authors quantify and evaluate whether generalization or noise addition is better for the privacy-personalization trade-off. The authors propose the Bloom cookie, based on the properties of a Bloom filter’s false positives, as a cost-efficient mechanism for adding noise and preserving configurable amounts of privacy.

Threat Model. The server is not trusted by the client (user). Techniques for hiding IP addresses are not assumed (“unlinkability” across IP addresses). IP addresses change frequently. Browsers prevent online services from tracking (though browsers themselves keep track of previous activity). Large population size. No collusion with other services.

Evaluation Techniques. Personalization (measured by average rank): URL-based, using the URLs users visit most often; interest-based, using preferred interests derived from prior searches. Privacy (measured by unlinkability): RAND adds random noise from a dictionary containing URLs and their associated interests; HYBRID adds random noise only from dictionary entries that correspond to interests the user has already looked at in the past.
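The difference between RAND and HYBRID comes down to which part of the dictionary the fake entries are drawn from. A minimal sketch follows, assuming a toy dictionary that maps URLs to interest categories; the dictionary contents and the function names `rand_noise` and `hybrid_noise` are hypothetical and only illustrate the selection rule described on the slide.

```python
import random

# Hypothetical noise dictionary: URL -> interest category (illustrative only).
DICTIONARY = {
    "wsj.com": "News",
    "umich.edu": "Higher education",
    "instagram.com": "Social media",
    "espn.com": "Sports",
    "allrecipes.com": "Food",
}


def rand_noise(dictionary, count, rng):
    """RAND: sample fake URLs uniformly from the whole dictionary."""
    return rng.sample(sorted(dictionary), count)


def hybrid_noise(dictionary, user_interests, count, rng):
    """HYBRID: sample fake URLs only from categories the user already shows."""
    pool = sorted(u for u, cat in dictionary.items() if cat in user_interests)
    return rng.sample(pool, min(count, len(pool)))


rng = random.Random(1)
user_interests = {"News", "Higher education"}
rand_fakes = rand_noise(DICTIONARY, 3, rng)
hybrid_fakes = hybrid_noise(DICTIONARY, user_interests, 3, rng)
```

In this toy setup HYBRID can only ever return `wsj.com` and `umich.edu`, because those are the only dictionary entries matching the user's existing interests.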

Results: Generalization. Higher unlinkability (44.1% linkable users) than using exact URLs (98.7% linkable users). Is this reasonable?

Results: Noise Addition. Better unlinkability (20% linkable) than generalization (44%), but at a large cost to send the noise. HYBRID makes personalization worse than RAND with equivalent parameters.

Review: Bloom Filters. A space- and time-efficient probabilistic membership data structure. May have false positives; no false negatives. Stored as a bit array, where m = size of the array, k = number of hashes used for inserting/querying elements, and n = number of inserted elements. Figure: an uninitialized Bloom filter with m = 12.
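The slides do not give a formula for the false positive rate, but the standard textbook approximation in terms of m, k, and n is (1 - e^(-kn/m))^k, assuming independent and uniformly distributed hash functions. The helper below (the name `false_positive_rate` is an assumption for illustration) evaluates it for the running example's parameters.

```python
import math


def false_positive_rate(m, k, n):
    """Standard approximation (1 - e^(-k*n/m))^k for a Bloom filter
    with m bits, k hash functions, and n inserted elements."""
    return (1.0 - math.exp(-k * n / m)) ** k


# Parameters from the slides' running example (m = 12, k = 3), two insertions.
rate = false_positive_rate(m=12, k=3, n=2)
```

For such a tiny filter the approximation is rough, but it shows the trend: more inserted elements or fewer bits push the rate up.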

Review: Bloom Filters. Adding an element (m = 12, k = 3, n = 1). Inserting element: “Hello”. It hashes to hash1(“Hello”) = 1, hash2(“Hello”) = 5, hash3(“Hello”) = 10. Set the corresponding bit locations (1, 5, and 10) to 1.

Review: Bloom Filters. Adding an element (m = 12, k = 3, n = 2). Inserting element: “World”. It hashes to hash1(“World”) = 3, hash2(“World”) = 5, hash3(“World”) = 9. Set the corresponding bit locations (3, 5, and 9) to 1; bit 5 was already set by “Hello”.

Review: Bloom Filters. Querying for an element (m = 12, k = 3, n = 2). Membership query: Is “Hello” in the list? It hashes to hash1(“Hello”) = 1, hash2(“Hello”) = 5, hash3(“Hello”) = 10. Check that all corresponding bit locations are 1: bit 1 ✓, bit 5 ✓, bit 10 ✓. Answer: Possibly (with some probability).

Review: Bloom Filters. Querying for an element (m = 12, k = 3, n = 2). Membership query: Is “Goodbye” in the list? It hashes to hash1(“Goodbye”) = 1, hash2(“Goodbye”) = 5, hash3(“Goodbye”) = 7. Check that all corresponding bit locations are 1: bit 1 ✓, bit 5 ✓, bit 7 ✗. Answer: No (guaranteed).
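The insert and query steps from the review slides can be sketched as a small Python class. This is a minimal illustration, not the paper's implementation: the class name, the method names, and the choice of salted SHA-256 to derive the k positions are all assumptions, so the bit positions it produces will not match the slide's example values (1, 5, 10 and so on).

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: m bits, k hash functions."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = [0] * m

    def _positions(self, item):
        # Assumption: derive k positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        """Set the k bit locations for item to 1."""
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=12, k=3)
bf.add("Hello")
bf.add("World")
```

A query for an inserted element always returns True (no false negatives). A query for an element that was never inserted usually returns False, but can return True if all of its positions happen to be set by other elements.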

Bloom Cookies. Add the exact profile of the user’s previously visited URLs as elements into a Bloom filter, e.g. [“nytimes.com”, “wsj.com”, “google.com”]. Then, add noise by setting random fake bits to 1 until at least a proportion l of the bits are 1. In effect, the false positive rate increases.
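The two-step construction described here (insert the real profile, then set random bits until a fraction l of the bits are 1) can be sketched as follows. The function names, the salted SHA-256 hashing, and the values k = 3 and l = 0.25 are assumptions chosen for illustration; only the 2000-bit size comes from the slides.

```python
import hashlib
import math
import random


def bit_positions(item, m, k):
    """Assumed hashing scheme: k salted SHA-256 digests reduced modulo m."""
    positions = []
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
        positions.append(int.from_bytes(digest[:8], "big") % m)
    return positions


def make_bloom_cookie(urls, m, k, l, rng=None):
    """Insert the real URLs, then set random 0 bits until at least l*m bits are 1."""
    rng = rng or random.Random()
    bits = [0] * m
    for url in urls:
        for pos in bit_positions(url, m, k):
            bits[pos] = 1
    zero_positions = [i for i, b in enumerate(bits) if b == 0]
    rng.shuffle(zero_positions)
    target = math.ceil(l * m)
    while sum(bits) < target and zero_positions:
        bits[zero_positions.pop()] = 1  # fake bit: pure noise
    return bits


urls = ["nytimes.com", "wsj.com", "google.com"]
cookie = make_bloom_cookie(urls, m=2000, k=3, l=0.25, rng=random.Random(42))
```

Every real URL still tests as present, while a URL that was never inserted now passes the membership test with probability of roughly l^k if the set bits are spread uniformly, which is how the noise shows up as extra false positives.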

Bloom Cookie Properties. Efficiency: more compact since the filter size is fixed. Noise by design: false positives are an advantage. Non-deterministic noise: the noise changes as the filter changes. Dictionary-free: no noise dictionary is required. Expensive dictionary attacks: an adversary would need to query the Bloom filter for membership rather than already having the membership list.

Bloom Cookie System Design

Results: Bloom Cookies. The cost to send is constant (2000 bits). Linkability decreases with a higher l value. No dependency on a noise dictionary.

Pros and Cons. Pros: use of real search logs; Bloom cookie design described well; using a “negative” of Bloom filters as a positive; no need for a third party; limitations section; clear and well written; useful diagrams. Cons: assumption that the user’s browser does not send tracking info to services; no-collusion assumption; no justification for using 1,000 users to smooth outliers; single data set; design is described late in the paper; study period too short.