Non-tracking Web Analytics Istemi Ekin Akkus 1, Ruichuan Chen 1, Michaela Hardt 2, Paul Francis 1, Johannes Gehrke 3 1 Max Planck Institute for Software.

Presentation transcript:

Non-tracking Web Analytics
Istemi Ekin Akkus (1), Ruichuan Chen (1), Michaela Hardt (2), Paul Francis (1), Johannes Gehrke (3)
(1) Max Planck Institute for Software Systems, (2) Twitter Inc., (3) Cornell University

Web Analytics
Statistics about users visiting a publisher website
Akkus et al., Non-tracking Web Analytics (slide 2)

Analytics by Data Aggregators
Collect analytics for many publishers from many clients
Infer extended analytics
– Age, gender, education level, other sites visited, …
Provide aggregate information to publishers & advertisers
[Figure labels: Data Aggregator, Publisher, Aggregate Extended Analytics]

Analytics Today
[Figure labels: Publisher, Client, Data Aggregator]

Tracking
Data aggregators criticized
– Collection of individual information
Criticisms led to reactions
– Do-not-Track proposal, EU cookie law
– Voluntary opt-out mechanisms by aggregators
– Client-side tools to blacklist aggregators
Fewer tracked users → less data for inference → worse extended analytics for publishers

Goal
Replicate the functionality of today’s systems without tracking

Specific Goals
Privacy
– No individual information collected by publishers & aggregators
Functionality
– Aggregate information for publishers & aggregators
– No new organizational components
– Practical and efficient

Outline
Motivation & Goals
Components & Assumptions
Non-tracking Analytics
Implementation & Evaluation
Conclusion

Components
Client: locally stores information about the user
Publisher: serves webpages to clients
Aggregator: provides aggregation service

Assumptions
Potentially malicious client
– May try to distort results
Potentially malicious publisher
– May try to violate individual user privacy
Honest-but-curious data aggregator
– Follows the protocol
– Doesn’t collude with publishers

Outline
Motivation & Goals
Components & Assumptions
Non-tracking Analytics
– Publisher as Proxy
– Noise
– Yes-No Queries
– Auditing
Implementation & Evaluation
Conclusion

Today
Not anonymous; need a proxy…
…but don’t want a new component
Publisher already interacts with clients!

Publisher as Anonymizing Proxy
1. Publisher distributes queries to be executed
2. Publisher collects encrypted answers
3. Publisher forwards answers to the aggregator
4. Aggregator counts anonymous answers and returns results
Clients never exposed to the data aggregator
[Figure labels: 1. Queries, 2. Encrypted Answers, 3. Encrypted Answers, 4. Results]
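
The four steps on this slide can be sketched as a small end-to-end run. This is only an illustration: the function names are hypothetical, and `encrypt_for_aggregator` is a reversible placeholder standing in for public-key encryption under the aggregator's key, not real cryptography.

```python
from collections import Counter


def encrypt_for_aggregator(answer: str) -> bytes:
    # Placeholder only, NOT real encryption: it stands in for encryption
    # under the aggregator's public key so the flow can be executed.
    return answer.encode("utf-8")[::-1]


def decrypt_at_aggregator(blob: bytes) -> str:
    # Inverse of the placeholder above; only the aggregator would hold the key.
    return blob[::-1].decode("utf-8")


def client_answer(query: str, profile: dict) -> bytes:
    # The client evaluates the query on locally stored data and encrypts it.
    return encrypt_for_aggregator(str(profile[query]))


def publisher_round(query: str, client_profiles: list) -> list:
    # Steps 1-3: distribute the query, collect the encrypted answers, and
    # forward them with no client identifiers attached.
    return [client_answer(query, profile) for profile in client_profiles]


def aggregator_count(encrypted_answers: list) -> Counter:
    # Step 4: decrypt and count the anonymous answers.
    return Counter(decrypt_at_aggregator(blob) for blob in encrypted_answers)


profiles = [{"job": "Student"}, {"job": "Gardener"}, {"job": "Student"}]
results = aggregator_count(publisher_round("job", profiles))
```

In this sketch the aggregator only ever sees a list of ciphertexts handed over by the publisher, and the publisher only ever sees ciphertexts it cannot open, which is the separation the slide relies on.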

Identifiers in Responses
Rare attributes
– Job: CEO of ACME
[Figure: Enc(CEO of ACME) is forwarded via example.com; callouts: “CEO of ACME visits my site!” and “CEO of ACME visits example.com”]

Noise
2. Encrypted Answers
3. Add Noise_Publisher
4. Noisy Encrypted Answers
5. Add Noise_Aggregator
6. Double-noisy Result
7. Remove Noise_Publisher
Both entities obtain noisy results
[Figure labels: Result with Noise_Aggregator, Result with Noise_Publisher]
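
The bookkeeping behind steps 3 to 7 can be checked with a few lines of arithmetic. This is a minimal sketch under stated assumptions: answers are plain 'Yes'/'No' strings rather than ciphertexts, the noise is drawn from a uniform integer range as a placeholder for a properly calibrated distribution, and all function names and the `max_noise` parameter are hypothetical.

```python
import random


def publisher_add_noise(real_answers, rng, max_noise=5):
    # Step 3: the publisher mixes in its own extra 'Yes' answers and
    # remembers how many it added (Noise_Publisher).
    noise_publisher = rng.randint(0, max_noise)
    return real_answers + ["Yes"] * noise_publisher, noise_publisher


def aggregator_count(noisy_answers, rng, max_noise=5):
    # The aggregator's own view still contains Noise_Publisher.
    count_with_publisher_noise = sum(1 for a in noisy_answers if a == "Yes")
    # Step 5: add Noise_Aggregator before returning the result (step 6).
    noise_aggregator = rng.randint(-max_noise, max_noise)
    double_noisy = count_with_publisher_noise + noise_aggregator
    return count_with_publisher_noise, double_noisy, noise_aggregator


def publisher_remove_noise(double_noisy, noise_publisher):
    # Step 7: the publisher subtracts the noise it added itself.
    return double_noisy - noise_publisher


rng = random.Random(0)
real_answers = ["Yes", "No", "Yes", "Yes"]  # true count of 'Yes' is 3
forwarded, noise_publisher = publisher_add_noise(real_answers, rng)
aggregator_view, double_noisy, noise_aggregator = aggregator_count(forwarded, rng)
publisher_view = publisher_remove_noise(double_noisy, noise_publisher)
```

Under these assumptions the aggregator ends up with the true count plus Noise_Publisher, and the publisher ends up with the true count plus Noise_Aggregator, so neither side holds the exact count.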

Differentially-private Noise
Hides the existence of an individual answer
– CEO: real or noise?
Requires numerical values
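
The slide does not say which noise distribution is used. As an illustration only, the sketch below assumes the textbook Laplace mechanism for a counting query, where the noise scale is sensitivity divided by epsilon. The function names and parameters are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential samples with mean
    # `scale` follows a Laplace distribution centred at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    # One user changes a count by at most 1, hence sensitivity 1 by default.
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(42)
samples = [laplace_noise(2.0, rng) for _ in range(20000)]
example = noisy_count(10, epsilon=0.5, rng=rng)
```

This also shows why the slide notes that numerical values are required: the noise is added to a count, not to a free-text answer such as a job title.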

Yes-No Questions
Convert queries to binary & count answers
– “What is your job?” → “Is your job ‘CEO’?”
→ Noise as additional answers
– Enc(‘Yes’), Enc(‘No’)
Bonus: limits a malicious client
– Either +1 or 0
Many possible values → Many questions
– Job: ‘CEO’, ‘Student’, ‘Gardener’, ...
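
A short sketch of the conversion and of the "+1 or 0" property follows. The helper names are hypothetical and answers are shown unencrypted for readability.

```python
def to_yes_no_questions(attribute, values):
    # One binary question per possible value of the attribute.
    return [f"Is your {attribute} '{value}'?" for value in values]


def answer_yes_no(profile, attribute, value):
    # The client answers each binary question from its local profile.
    return "Yes" if profile.get(attribute) == value else "No"


def count_yes(answers):
    # Anything other than 'Yes' contributes 0, so a single answer can
    # move the count by at most 1, whatever a malicious client sends.
    return sum(1 for answer in answers if answer == "Yes")


questions = to_yes_no_questions("job", ["CEO", "Student"])
honest = answer_yes_no({"job": "Student"}, "job", "Student")
total = count_yes(["Yes", "No", "1000000", "Yes"])
```

The cost is visible in `to_yes_no_questions`: an attribute with many possible values produces one question per value.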

Buckets
Multiple yes-no questions with one query
1. Enumerate possible answer values
– Job: {‘CEO’, ‘Student’, ‘Gardener’, ‘Teacher’, ...}
2. A fixed number of ‘Yes’ answers
– Job: 1
3. Clients choose ‘Yes’ for the matching bucket
– Enc(‘CEO = Yes’)
4. Publisher generates additional answers
– Enc(‘CEO = Yes’), Enc(‘Student = Yes’), ...
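
The bucket scheme might look roughly like this in code. It is a sketch under assumptions: `enc` is a string placeholder for encryption, the `max_per_bucket` cap on publisher noise is hypothetical, and the slide does not say what a client does when no bucket matches, so this version simply returns `None` in that case.

```python
import random

JOB_BUCKETS = ["CEO", "Student", "Gardener", "Teacher"]


def enc(text):
    # Placeholder for encryption under the aggregator's public key.
    return "Enc(" + text + ")"


def client_bucket_answer(profile, attribute, buckets):
    # Step 3: a single 'Yes' for the bucket matching the local value.
    value = profile.get(attribute)
    if value not in buckets:
        return None  # unmatched values are not covered by the slide
    return enc(f"{value} = Yes")


def publisher_noise_answers(buckets, max_per_bucket, rng):
    # Step 4: the publisher generates additional 'Yes' answers per bucket.
    answers = []
    for bucket in buckets:
        for _ in range(rng.randint(0, max_per_bucket)):
            answers.append(enc(f"{bucket} = Yes"))
    rng.shuffle(answers)
    return answers


ceo_answer = client_bucket_answer({"job": "CEO"}, "job", JOB_BUCKETS)
noise_answers = publisher_noise_answers(JOB_BUCKETS, 3, random.Random(1))
```

Because real and additional answers have the same form, an observer of the decrypted list cannot tell which 'CEO = Yes' came from a client.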

Impracticalities of Differential Privacy
Requires a privacy budget
– Stop answering when budget expires
– No answers from clients → low-utility results
Assumes a static database; our setting is dynamic
– User population of a publisher changes
– Certain user data may change
→ Clients keep answering queries

Malicious Publishers
Isolation attacks
– Isolate a user’s response
– Repeat the same query
– Cancel out noise
1. Specific query conditions or buckets
– Monitoring and approval by the data aggregator
2. Selectively dropping client responses

Isolation via Dropping Responses
[Figure: answers Enc(CEO), Enc(Student), Enc(Gardener) reach example.com; the flow also shows Enc(Driver), Enc(Mechanic), Enc(Driver); reported counts: Mechanic: 1 + noise, Driver: 2 + noise, CEO: 1 + noise; callout: “User in the middle is a CEO!”]

Auditing
[Figure: answers Enc(CEO), Enc(Student), Enc(Driver), Enc(Mechanic) and an Enc(nonce) pass through example.com; a separate message Enc(example.com, nonce) is also shown; callout: “nonce?”]
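
One plausible reading of this slide is that a client occasionally sends a nonce through the publisher in place of an answer, reports the pair (publisher, nonce) separately, and the aggregator checks whether the nonce actually arrived. The sketch below illustrates only that check. The data structures, the function names, and the assumption that reports reach the aggregator by some other channel are hypothetical and not taken from the slides.

```python
import secrets


def make_audit_nonce():
    # Random value a client can send through the publisher as a probe.
    return secrets.token_hex(16)


def find_dropped_audits(forwarded_nonces, audit_reports):
    # forwarded_nonces: publisher name -> set of nonces that arrived via it.
    # audit_reports: iterable of (publisher name, nonce) pairs from clients.
    suspicious = {}
    for publisher, nonce in audit_reports:
        if nonce not in forwarded_nonces.get(publisher, set()):
            # The probe was reported but never forwarded by this publisher.
            suspicious.setdefault(publisher, []).append(nonce)
    return suspicious


kept_nonce = make_audit_nonce()
dropped_nonce = make_audit_nonce()
forwarded = {"example.com": {kept_nonce}}
reports = [("example.com", kept_nonce), ("example.com", dropped_nonce)]
flagged = find_dropped_audits(forwarded, reports)
```

Because the publisher cannot distinguish an encrypted nonce from an encrypted answer, selectively dropping responses risks dropping a probe and being flagged.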

Outline
Motivation & Goals
Components & Assumptions
Non-tracking Analytics
– Publisher as Proxy
– Noise
– Yes-No Answer
– Auditing
Implementation & Evaluation
Conclusion

Implementation
2000 lines of code in total
– Client: Firefox extension
– Publisher software: Piwik plugin
– Aggregator software: simple server
Deployed and tested with over 200 users
RSA public key cryptosystem

Evaluation – Decryption Overhead
Aggregator: 2.4 GHz CPU, 2048-bit key
Publisher: 50K users, 2 sets of queries/week
1. Information currently provided
– Demographics, other sites
– 3.6 CPU hours/week
2. Information available through our system
– # pages browsed, search engines, visit frequency to other sites
– 3 CPU hours/week

Evaluation – Client Overhead
Bandwidth overhead
– <100KB/week to download 11 queries
– 8KB/week for all query responses
CPU overhead for encryption
– Google Chrome: 380 enc/sec
– Firefox: 20 enc/sec
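
To put the encryption rates in perspective, the time a client spends encrypting is the number of encryptions divided by the rate. The two rates below are from the slide. The weekly number of encrypted answers is a hypothetical figure chosen only for illustration.

```python
CHROME_ENC_PER_SEC = 380   # from the slide
FIREFOX_ENC_PER_SEC = 20   # from the slide
ANSWERS_PER_WEEK = 100     # hypothetical, not reported in the slides


def encryption_seconds(num_encryptions, rate_per_sec):
    # Total CPU time needed for the given number of encryptions.
    return num_encryptions / rate_per_sec


chrome_seconds = encryption_seconds(ANSWERS_PER_WEEK, CHROME_ENC_PER_SEC)
firefox_seconds = encryption_seconds(ANSWERS_PER_WEEK, FIREFOX_ENC_PER_SEC)
```

With that assumed workload, Firefox would need 5 seconds of CPU time per week and Chrome well under one second.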

Summary
Extended analytics without tracking
– Differential privacy guarantees for users
– Aggregate information for publishers & aggregators
No new organizational component
Practical & feasible to deploy