Preserving Privacy in Clickstreams Isabelle Stanton.



Outline
- Introduction
- Prior Work
- My Solution
- Future Work

Introduction
- Clickstream data is useful: search algorithms, search optimization, ad targeting, etc.
- AOL scandal: 651,000 users over 3 months = 21 million queries
- Current work:
  – Input perturbation: k-anonymity
  – Output perturbation: Laplacian noise and sensitivity
  – User-published: pseudorandom sketches

k-anonymity
- "Hiding in a crowd of size k"
- Given a set of attributes, find a quasi-identifier
- Make sure each quasi-identifier value appears at least k times
- Prevents attacks that link the data to other available datasets
- Problem: each query + click + approximate time is itself a quasi-identifier
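The k-anonymity condition can be checked with a simple group-and-count over the quasi-identifier columns. A minimal sketch (the records and attribute names are hypothetical, in the spirit of the Gender/Zip/B'day example):

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifier, k):
    """True iff every combination of quasi-identifier values
    occurs at least k times in the dataset."""
    groups = Counter(tuple(r[a] for a in quasi_identifier) for r in records)
    return all(count >= k for count in groups.values())

# Hypothetical records mirroring the slide's generalized table.
data = [
    {"gender": "M", "zip": "2290*", "bday": "1964"},
    {"gender": "M", "zip": "2290*", "bday": "1964"},
    {"gender": "*", "zip": "*", "bday": "*"},
    {"gender": "*", "zip": "*", "bday": "*"},
]
print(is_k_anonymous(data, ["gender", "zip", "bday"], 2))  # True
```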

Example: 2-anonymized
Original (Gender, Zip, B'day): four records with genders M, M, F, M. After 2-anonymization:

    Gender  Zip    B'day
    M       2290*  1964
    M       2290*  1964
    *       *      *
    *       *      *

Laplacian Noise
- How do you add noise to a text answer in a sensible way? E.g., "What is the most popular query term?"
- Sensitivity: the query "How many users searched for x, y and z?" has sensitivity 1 but can uniquely identify a user.
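For numeric answers, output perturbation adds noise drawn from Lap(sensitivity/ε), the standard calibration for this mechanism. A minimal sketch (the function name and interface are illustrative):

```python
import math
import random

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=random):
    """Add Laplace(0, sensitivity/epsilon) noise to a numeric query answer."""
    scale = sensitivity / epsilon
    u = rng.random() - 0.5                       # uniform on [-0.5, 0.5)
    # Inverse-transform sample from Laplace(0, scale).
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_answer + noise

# "How many users searched for x, y and z?" has sensitivity 1.
noisy_count = laplace_mechanism(true_answer=42, sensitivity=1, epsilon=0.5)
```

Note the tension the slide points at: low sensitivity makes the noise small, yet a count of 1 can still correspond to a uniquely identifying query.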

Pseudorandom Sketches
- Want to publish an attribute with value v
- Create a vector where each entry represents one possible value of the attribute: 1 means yes, 0 means no
- Flip each bit in the vector with probability p = 1/2 − ε
- This vector is generated by a p-biased pseudorandom function; publish the input to this function that generates your vector
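The bit-flipping step is per-bit randomized response over a one-hot vector. A minimal sketch, assuming direct random flips for illustration (in the actual scheme the flips come from a p-biased pseudorandom function and only its seed is published; the function name here is an assumption):

```python
import random

def make_sketch(value_index, domain_size, epsilon, rng=None):
    """One-hot encode the attribute value over its domain, then flip
    each bit independently with probability p = 1/2 - epsilon."""
    rng = rng or random.Random()
    p = 0.5 - epsilon
    one_hot = [1 if i == value_index else 0 for i in range(domain_size)]
    return [bit ^ (rng.random() < p) for bit in one_hot]

# With epsilon = 0.5 the flip probability is 0, so the sketch is exact:
print(make_sketch(2, 5, epsilon=0.5))  # [0, 0, 1, 0, 0]
```

Smaller ε means more flips and stronger deniability, at the cost of a noisier published vector.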

Pseudorandom Sketches
- Privacy guarantee:
- Problem: there is a one-to-one correspondence between the number of sketches published and the number of attributes published
- Also, the number of possible query/clickstreams is enormous:
  – 100,000,000 webpages
  – 1,000 words in a query (Google API)
  – ~1,000,000 English words
  – = 2^100,000,000,000,000,000

Proposal
- Publish multiple values per sketch
- Benefits: doesn't reveal the length of the search history; makes the vector MUCH smaller
  – 1,000,000,000,000,000 entries
  – Can make this smaller
- Cons: need to check the privacy guarantees; slight problem with ordering

Multiple Values in Sketches
- Assume a user publishes between 0 and h queries in one sketch
- Privacy with one sketch:
- Privacy with ℓ sketches:
- Minimum length of sketch, with M = # users, τ = probability of failure:

Search Personalization
- Sketches can't be used for this directly (unless you did something wrong)
- Can construct clusters of similar users from the sketches

Future Work
- Create a system that makes this easy for users
- Improve search personalization
- Find an appropriate balance of M, h and ε

Questions?