1
Preserving Privacy in Clickstreams
Isabelle Stanton
2
Outline
Introduction
Prior Work
My Solution
Future Work
3
Introduction
Useful data: search algorithms, search optimization, ad targeting, etc.
AOL scandal: 651,000 users over 3 months = 21 million queries
Current work
– Input perturbation: k-anonymity
– Output perturbation: Laplacian noise and sensitivity
– User published: Pseudorandom Sketches
4
k-anonymity
“Hiding in a crowd of size k”
Given a set of attributes, find a quasi-identifier
Make sure each quasi-identifier value appears at least k times
Prevents attacks that link other available datasets
Problem: each query + click + approximate time is a quasi-identifier
5
Example: 2-anonymized

Original:
Gender  Zip    B’day
M       22903  1964
M       22904  1964
F       22903  1983
M       22903  1983

2-anonymized:
Gender  Zip    B’day
M       2290*  1964
M       2290*  1964
*       22903  1983
*       22903  1983
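The example above can be checked mechanically. The following is a minimal sketch, assuming all three columns (gender, zip, birth year) together form the quasi-identifier; the function name `is_k_anonymous` is hypothetical and not part of the talk.

```python
from collections import Counter

def is_k_anonymous(rows, k):
    """Return True if every distinct row (treated as a quasi-identifier) occurs at least k times."""
    counts = Counter(rows)
    return all(count >= k for count in counts.values())

# Rows are (gender, zip, birth year), copied from the slide.
original = [
    ("M", "22903", "1964"),
    ("M", "22904", "1964"),
    ("F", "22903", "1983"),
    ("M", "22903", "1983"),
]

# The same rows after generalizing the zip code or suppressing the gender.
anonymized = [
    ("M", "2290*", "1964"),
    ("M", "2290*", "1964"),
    ("*", "22903", "1983"),
    ("*", "22903", "1983"),
]

print(is_k_anonymous(original, 2))
print(is_k_anonymous(anonymized, 2))
```

In the original table every row is unique, so the check fails for k = 2; after generalization each combination appears twice.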
6
Laplacian Noise
How do you add noise to a text answer in a sensible way? E.g. ‘What is the most popular query term?’
Sensitivity: the query “How many users searched for x, y and z?” has sensitivity 1 but can uniquely identify a user.
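For a numeric answer, output perturbation is straightforward. Below is a minimal sketch of the standard Laplace mechanism, which adds noise with scale sensitivity / privacy parameter to a count. The names `laplace_noise`, `noisy_count`, and `dp_epsilon`, and the example count of 120, are assumptions made for illustration; `dp_epsilon` is the differential-privacy parameter and is unrelated to the ε used for sketches in this talk.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(true_count, sensitivity, dp_epsilon, rng):
    """Perturb a numeric answer with Laplace noise of scale sensitivity / dp_epsilon."""
    return true_count + laplace_noise(sensitivity / dp_epsilon, rng)

rng = random.Random(0)  # fixed seed so the illustration is repeatable

# A counting query ("how many users searched for x?") has sensitivity 1:
# adding or removing one user changes the answer by at most 1.
samples = [noisy_count(120, sensitivity=1.0, dp_epsilon=0.5, rng=rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
print(f"one noisy answer: {samples[0]:.2f}, mean over many answers: {mean:.2f}")
```

The noise averages out over many answers, but there is no comparable way to perturb a text answer such as a query term, which is the difficulty the slide points to.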
7
Pseudorandom Sketches
Want to publish an attribute with value v
Create a vector where each entry represents one possible value of the attribute: 1 means yes, 0 means no
Flip each bit in the vector with prob. p = ½ - ε
This vector is generated by a p-biased pseudorandom function
Publish the input to this function that generates your vector
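The bit-flipping step can be illustrated as follows. This is a simplified sketch, not the construction from the talk: it builds the indicator vector explicitly and flips bits with an ordinary seeded generator, and it omits the step of publishing the input to a p-biased pseudorandom function. The name `noisy_indicator_vector` is hypothetical.

```python
import random

def noisy_indicator_vector(num_values, true_index, epsilon, rng):
    """One-hot vector for the true value, with each bit flipped independently with prob. 1/2 - epsilon."""
    p = 0.5 - epsilon
    vector = [1 if i == true_index else 0 for i in range(num_values)]
    # XOR with 1 flips a bit; each position gets its own random draw.
    return [bit ^ 1 if rng.random() < p else bit for bit in vector]

rng = random.Random(42)  # fixed seed so the illustration is repeatable

# With a small epsilon the flip probability is close to 1/2, so the
# published vector looks almost like uniform random bits.
print(noisy_indicator_vector(16, true_index=3, epsilon=0.05, rng=rng))

# With epsilon = 1/2 nothing is flipped and the value is fully revealed.
print(noisy_indicator_vector(16, true_index=3, epsilon=0.5, rng=rng))
```

Smaller ε means more flipping and more privacy, at the cost of needing more users to recover aggregate statistics.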
8
Pseudorandom Sketches
Privacy guarantee:
Problem: there is a one-to-one correspondence between #sketches published and #attributes published
Also, possible number of query/clickstreams:
– 100,000,000 webpages
– 1000 words in a query (Google API)
– ~1,000,000 English words
– = 2^100,000,000,000,000,000
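The arithmetic behind that last figure can be checked directly, reading it as 2 raised to the product of the three counts (an interpretation; the exponent on the slide matches that product exactly). The variable names are illustrative.

```python
import math

webpages = 100_000_000
max_query_words = 1_000
english_words = 1_000_000

# One vector entry per (webpage, query position, word) combination.
vector_entries = webpages * max_query_words * english_words
print(vector_entries == 10**17)

# 2**vector_entries is far too large to compute, but its number of
# decimal digits is approximately vector_entries * log10(2).
decimal_digits = math.floor(vector_entries * math.log10(2)) + 1
print(f"2^(10^17) has roughly {decimal_digits:.2e} decimal digits")
```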
9
Proposal
Publish multiple values per sketch
Benefits: doesn’t reveal length of search history, makes vector MUCH smaller
– 1,000,000,000,000,000 entries
– Can make this smaller
Cons: need to check the privacy guarantees, slight problem with ordering
10
Multiple Values in Sketches
Assume a user publishes between 0 and h queries in one sketch
Privacy with one sketch:
Privacy with l sketches:
Minimum length of sketch, where M = # users and τ = probability of failure:
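The encoding idea, though not the privacy bounds, can be illustrated in code. This is a rough sketch under a stated assumption: that a multi-value vector simply marks each of the user's (at most h) values with a 1 before every bit is flipped with probability ½ - ε. The talk does not confirm this construction, and the name `multi_value_vector` is hypothetical.

```python
import random

def multi_value_vector(num_values, true_indices, h, epsilon, rng):
    """Indicator vector for up to h values, with each bit flipped with prob. 1/2 - epsilon."""
    if len(true_indices) > h:
        raise ValueError("a user may publish at most h values in one sketch")
    p = 0.5 - epsilon
    chosen = set(true_indices)
    vector = [1 if i in chosen else 0 for i in range(num_values)]
    # XOR with 1 flips a bit; each position gets its own random draw.
    return [bit ^ 1 if rng.random() < p else bit for bit in vector]

rng = random.Random(7)  # fixed seed so the illustration is repeatable

# A user with no queries and a user with three queries both publish a
# vector of the same length with a similar share of 1s.
empty_history = multi_value_vector(1000, [], h=5, epsilon=0.05, rng=rng)
three_queries = multi_value_vector(1000, [10, 20, 30], h=5, epsilon=0.05, rng=rng)
print(sum(empty_history), sum(three_queries))
```

Because both vectors look alike, the output length and density do not obviously reveal how many queries a user issued, which is the benefit claimed for publishing multiple values per sketch.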
11
Search Personalization
Sketches can’t be used for this (unless you did something wrong)
Can construct clusters of people from sketches
12
Future Work
Create a system that makes this easy for users
Improve search personalization
Find an appropriate balance of M, h and ε
13
Questions?