Published by Osvaldo Chalk. Modified over 9 years ago.
1
Non-tracking Web Analytics
Istemi Ekin Akkus (1), Ruichuan Chen (1), Michaela Hardt (2), Paul Francis (1), Johannes Gehrke (3)
(1) Max Planck Institute for Software Systems
(2) Twitter Inc.
(3) Cornell University
2
Web Analytics
Statistics about users visiting a publisher website
Akkus et al., Non-tracking Web Analytics
3
Analytics by Data Aggregators
Collect analytics for many publishers from many clients
Infer extended analytics
– Age, gender, education level, other sites visited, …
Provide aggregate information to publishers & advertisers
[Figure: the data aggregator provides aggregate extended analytics to the publisher]
4
Analytics Today
[Figure: publisher, client, and data aggregator]
5
Tracking
Data aggregators criticized
– Collection of individual information
Criticisms led to reactions
– Do-not-Track proposal, EU cookie law
– Voluntary opt-out mechanisms by aggregators
– Client-side tools to blacklist aggregators
Fewer tracked users → less data for inference → worse extended analytics for publishers
6
Goal
Replicate the functionality of today’s systems without tracking
7
Specific Goals
Privacy
– No individual information collected by publishers & aggregators
Functionality
– Aggregate information for publishers & aggregators
– No new organizational components
– Practical and efficient
8
Outline
Motivation & Goals
Components & Assumptions
Non-tracking Analytics
Implementation & Evaluation
Conclusion
9
Components
Client: locally stores information about the user
Publisher: serves webpages to clients
Aggregator: provides aggregation service
10
Assumptions
Potentially malicious client
– May try to distort results
Potentially malicious publisher
– May try to violate individual user privacy
Honest-but-curious data aggregator
– Follows the protocol
– Doesn’t collude with publishers
11
Outline
Motivation & Goals
Components & Assumptions
Non-tracking Analytics
– Publisher as Proxy
– Noise
– Yes-No Queries
– Auditing
Implementation & Evaluation
Conclusion
12
Today
Not anonymous; need a proxy…
…but don’t want a new component
Publisher already interacts with clients!
13
Publisher as Anonymizing Proxy
1. Publisher distributes queries to be executed
2. Publisher collects encrypted answers
3. Publisher forwards answers to the aggregator
4. Aggregator counts anonymous answers and returns results
Clients never exposed to the data aggregator
[Figure: 1. Queries, 2. Encrypted Answers, 3. Encrypted Answers, 4. Results]
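The four steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `Sealed` is a hypothetical stand-in for a ciphertext under the aggregator's public key, and every class and method name here is an assumption made for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Sealed:
    """Hypothetical stand-in for Enc(answer) under the aggregator's public key."""
    payload: str  # a real client would hold a ciphertext here, not plaintext


class Aggregator:
    def count(self, sealed_answers):
        # Step 4: "decrypt" and count. No client identity reaches this point.
        return Counter(s.payload for s in sealed_answers)


class Publisher:
    def __init__(self, query):
        self.query = query  # Step 1: the query is distributed with the page
        self.collected = []

    def collect(self, client_id, sealed):
        # Step 2: the publisher sees who answered but not what was answered.
        # The client identity is dropped here and never forwarded.
        self.collected.append(sealed)

    def forward(self, aggregator):
        # Step 3: forward the opaque answers; step 4: receive the counts.
        return aggregator.count(self.collected)


publisher = Publisher("Is your job 'Student'?")
for client_id, answer in [("c1", "Yes"), ("c2", "No"), ("c3", "Yes")]:
    publisher.collect(client_id, Sealed(answer))
results = publisher.forward(Aggregator())
print(results["Yes"], results["No"])
```

The point of the split is that the publisher holds identities without contents, while the aggregator holds contents without identities.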
14
Identifiers in Responses
Rare attributes
– Job: CEO of ACME
[Figure: Enc(CEO of ACME) passes through example.com to the aggregator. Publisher: “CEO of ACME visits my site!” Aggregator: “CEO of ACME visits example.com”]
15
Noise
2. Encrypted Answers
3. Add Noise_Publisher
4. Noisy Encrypted Answers
5. Add Noise_Aggregator
6. Double-noisy Result
7. Remove Noise_Publisher
Both entities obtain noisy results
– Publisher: result with Noise_Aggregator
– Aggregator: result with Noise_Publisher
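The bookkeeping behind the double-noisy result can be written out as simple arithmetic. The sketch below treats both noise terms as plain integers and uses a hypothetical function name; how the noise values are drawn is not shown here.

```python
def double_noise_round(true_count, noise_publisher, noise_aggregator):
    # Steps 2-4: the aggregator receives the real answers plus publisher noise.
    aggregator_view = true_count + noise_publisher
    # Steps 5-6: the aggregator adds its own noise before returning the result.
    double_noisy = aggregator_view + noise_aggregator
    # Step 7: the publisher knows how much noise it added and removes it.
    publisher_view = double_noisy - noise_publisher
    return aggregator_view, publisher_view


# Arbitrary illustrative values: 100 real 'Yes' answers, noise of +7 and -3.
aggregator_view, publisher_view = double_noise_round(100, 7, -3)
print(aggregator_view, publisher_view)
```

Neither party ends up with the exact count: the aggregator's view still contains the publisher's noise, and the publisher's view still contains the aggregator's noise.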
16
Differentially-private Noise
Hides the existence of an individual answer
– CEO: real or noise?
Requires numerical values
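The slide does not name the noise distribution. A common choice for differentially private counts is the Laplace mechanism, so the sketch below assumes it; the function names, the `epsilon` value, and the sensitivity of 1 are illustrative assumptions, not taken from the deck.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace distribution centred at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    # One client changes a yes/no count by at most 1, hence sensitivity 1.
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)  # fixed seed so the example is repeatable
samples = [noisy_count(1, epsilon=0.5, rng=rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
mean_abs_dev = sum(abs(s - 1) for s in samples) / len(samples)
print(round(mean, 2), round(mean_abs_dev, 2))
```

A single noisy count of 1 is hard to tell apart from a noisy count of 0, which is what hides one individual's answer. The noise is added to a number, which is why the mechanism requires numerical values.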
17
Yes-No Questions
Convert queries to binary & count answers
– “What is your job?” → “Is your job ‘CEO’?”
Noise as additional answers
– Enc(‘Yes’), Enc(‘No’)
Bonus: limits a malicious client
– Either +1 or 0
Many possible values → many questions
– Job: ‘CEO’, ‘Student’, ‘Gardener’, ...
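As a rough sketch of the conversion, the snippet below turns one multi-valued query into a list of yes-no questions and counts the answers. Function names are hypothetical and encryption is left out for brevity.

```python
def to_yes_no_questions(attribute, values):
    return [f"Is your {attribute} '{value}'?" for value in values]


def client_answers(own_value, values):
    return ["Yes" if value == own_value else "No" for value in values]


def count_yes(answers_per_client, num_questions):
    counts = [0] * num_questions
    for answers in answers_per_client:
        for i, answer in enumerate(answers[:num_questions]):
            # Anything other than 'Yes' counts as 0, so a single answer
            # can move a count by at most 1.
            counts[i] += 1 if answer == "Yes" else 0
    return counts


jobs = ["CEO", "Student", "Gardener"]
questions = to_yes_no_questions("job", jobs)
clients = [
    client_answers("Student", jobs),
    client_answers("CEO", jobs),
    client_answers("Student", jobs),
    ["1000000", "No", "No"],  # a malformed answer is simply counted as 0
]
counts = count_yes(clients, len(jobs))
print(dict(zip(jobs, counts)))
```

The cost is visible in `questions`: one question per possible value.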
18
Buckets
Multiple yes-no questions with one query
1. Enumerate possible answer values
– Job: {‘CEO’, ‘Student’, ‘Gardener’, ‘Teacher’, ...}
2. A fixed number of ‘Yes’ answers
– Job: 1
3. Clients choose ‘Yes’ for the matching bucket
– Enc(‘CEO = Yes’)
4. Publisher generates additional answers
– Enc(‘CEO = Yes’), Enc(‘Student = Yes’), ...
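The bucket scheme can be sketched as follows. The bucket names come from the slide; everything else is an assumption for illustration: the function names, drawing the publisher's additional answers uniformly at random, and rejecting any answer that does not carry exactly the fixed number of ‘Yes’ buckets. Encryption is omitted.

```python
import random
from collections import Counter

BUCKETS = ("CEO", "Student", "Gardener", "Teacher")
MAX_YES = 1  # the fixed number of 'Yes' answers for the Job query


def client_answer(job):
    # Step 3: mark 'Yes' for the matching bucket (assumes job is in BUCKETS).
    return [job]


def publisher_noise(num_answers, rng):
    # Step 4: additional answers; uniform choice is an assumption of this sketch.
    return [[rng.choice(BUCKETS)] for _ in range(num_answers)]


def tally(answers):
    counts = Counter()
    for yes_buckets in answers:
        valid = len(yes_buckets) == MAX_YES and all(b in BUCKETS for b in yes_buckets)
        if valid:  # answers with too many or unknown buckets are ignored
            counts.update(yes_buckets)
    return counts


real = [client_answer(job) for job in ["CEO", "Student", "Student"]]
noise = publisher_noise(5, random.Random(1))
cheat = [["CEO", "Student"]]  # tries to say 'Yes' twice
total = tally(real + noise + cheat)
print(sum(total.values()))
```

Fixing the number of ‘Yes’ answers keeps the bound from the yes-no case: one answer adds at most `MAX_YES` to the counts of a query.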
19
Impracticalities of Differential Privacy
Requires a privacy budget
– Stop answering when budget expires
– No answers from clients → low-utility results
Assumes a static database; our setting is dynamic
– User population of a publisher changes
– Certain user data may change
Clients keep answering queries
20
Malicious Publishers
Isolation attacks
– Isolate a user’s response
– Repeat the same query
– Cancel out noise
1. Specific query conditions or buckets
– Monitoring and approval by the data aggregator
2. Selectively dropping client responses
21
Isolation via Dropping Responses
[Figure: clients send Enc(CEO), Enc(Student), Enc(Gardener) to example.com. The publisher forwards Enc(CEO) together with Enc(Driver), Enc(Mechanic), Enc(Driver). Result: Mechanic: 1 + noise, Driver: 2 + noise, CEO: 1 + noise. Publisher: “User in the middle is a CEO!”]
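To see why dropping responses isolates a user, the sketch below replays the figure as read from its labels: the publisher keeps one client's answer, substitutes answers it generated itself for the rest, and subtracts its own contribution from the result. The aggregator's noise is left out for clarity (an assumption of this sketch; per the previous slide, the publisher would repeat the query to cancel it). The function name is hypothetical.

```python
from collections import Counter


def isolate(target_answer, fake_answers):
    # The publisher drops every other client's response and forwards only
    # the target's answer plus answers it generated itself.
    forwarded = [target_answer] + fake_answers
    result = Counter(forwarded)  # what the aggregator would count (noise omitted)
    # Subtracting the known fakes leaves exactly the target's answer.
    leftover = result - Counter(fake_answers)
    return list(leftover.elements())


revealed = isolate("CEO", ["Driver", "Mechanic", "Driver"])
print(revealed)
```

The publisher never decrypts anything; knowing which answers it injected is enough.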
22
Auditing
[Figure: clients send Enc(CEO), Enc(Student), and Enc(nonce) to example.com, which forwards Enc(Driver), Enc(Mechanic), Enc(Driver) instead. A client reports Enc(example.com, nonce) to the aggregator, which checks: “nonce?”]
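One way to read the auditing figure is that a client occasionally sends a nonce in place of an answer, then separately reports which publisher should have forwarded it. The sketch below follows that reading. It is an interpretation of the slide, not the authors' protocol: encryption is omitted, how the separate report reaches the aggregator is not modelled, and the `nonce:` prefix and all names are hypothetical.

```python
import secrets


class Aggregator:
    def __init__(self):
        self.nonces_by_publisher = {}

    def receive(self, publisher, answers):
        # Record every nonce that arrived through this publisher.
        seen = self.nonces_by_publisher.setdefault(publisher, set())
        seen.update(a[len("nonce:"):] for a in answers if a.startswith("nonce:"))

    def audit(self, publisher, nonce):
        # A reported nonce that never arrived means responses were dropped.
        return nonce in self.nonces_by_publisher.get(publisher, set())


def run(drop_responses):
    nonce = secrets.token_hex(16)
    client_answers = ["CEO", "Student", "nonce:" + nonce]
    # A misbehaving publisher replaces client responses with its own.
    forwarded = ["Driver", "Mechanic", "Driver"] if drop_responses else client_answers
    aggregator = Aggregator()
    aggregator.receive("example.com", forwarded)
    # The client's separate report: (example.com, nonce).
    return aggregator.audit("example.com", nonce)


print(run(drop_responses=False), run(drop_responses=True))
```

Because the publisher cannot tell an encrypted nonce from an encrypted answer, it cannot drop responses selectively without risking a failed audit.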
23
Outline
Motivation & Goals
Components & Assumptions
Non-tracking Analytics
– Publisher as Proxy
– Noise
– Yes-No Answer
– Auditing
Implementation & Evaluation
Conclusion
24
Implementation
2000 lines of code in total
– Client: Firefox extension
– Publisher software: Piwik plugin
– Aggregator software: simple server
Deployed and tested with over 200 users
RSA public key cryptosystem
25
Evaluation – Decryption Overhead
Aggregator: 2.4 GHz CPU, 2048-bit key
Publisher: 50K users, 2 sets of queries/week
1. Information currently provided
– Demographics, other sites
– 3.6 CPU hours/week
2. Information available through our system
– # pages browsed, search engines, visit frequency to other sites
– 3 CPU hours/week
26
Evaluation – Client Overhead
Bandwidth overhead
– <100KB/week to download 11 queries
– 8KB/week for all query responses
CPU overhead for encryption
– Google Chrome: 380 enc/sec
– Firefox: 20 enc/sec
27
Summary
Extended analytics without tracking
– Differential privacy guarantees for users
– Aggregate information for publishers & aggregators
No new organizational component
Practical & feasible to deploy