Published by Rey Head. Modified over 9 years ago.
1
Detecting Data Leakage
Panagiotis Papadimitriou (papadimitriou@stanford.edu)
Hector Garcia-Molina (hector@cs.stanford.edu)
2
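Leakage Problem
[Figure: Facebook applications U_1 and U_2 hold profiles of users such as Jeremy, Sarah, Mark and Kathryn; leaked profile records (Name: Mark, Sex: Male, …; Name: Sarah, Sex: Female, …) may come from one of the applications or from other sources, e.g., Sarah’s network]
Stanford Infolab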
3
Outline
Problem Description
Guilt Models
– Pr{U_1 leaked data} = 0.7
– Pr{U_2 leaked data} = 0.2
Distribution Strategies
4
Problem Description
Guilt Models
Distribution Strategies
5
Problem Entities (entity: dataset)
– Distributor (Facebook): T, the set of all Facebook profiles
– Agents (Facebook apps U_1, …, U_n): R_1, …, R_n, where R_i is the set of profiles of the people who have added application U_i
– Leaker: S, the set of leaked profiles
6
Agents’ Data Requests
Sample
– 100 profiles of Stanford people
Explicit
– All people who added the application (the example used so far)
– All Stanford profiles
7
Problem Description
Guilt Models
Distribution Strategies
8
Guilt Models (1/3)
[Figure: a leaked profile may come from other sources, e.g., Sarah’s network, with probability p]
– p: posterior probability that a leaked profile comes from other sources
– Guilty agent: an agent who leaks at least one profile
– Pr{G_i|S}: probability that agent U_i is guilty, given the leaked set of profiles S
9
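Guilt Models (2/3)
Two alternative assumptions:
– Agents leak each of their data items independently, or
– Agents leak all their data items OR nothing
[Figure: for two leaked profiles, each coming from other sources with probability p, the four possible cases have probabilities (1-p)^2, (1-p)p, p(1-p) and p^2]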
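To make the independent-leak model concrete, here is a minimal Python sketch. It rests on an assumption of ours that the slides do not state: a leaked profile t that did not come from other sources (probability 1 - p) was leaked by one of the agents holding it, each equally likely. Agent U_i is then innocent only if it leaked none of the profiles in S ∩ R_i, which gives Pr{G_i|S} = 1 - ∏_{t ∈ S ∩ R_i} (1 - (1-p)/|V_t|), where V_t is the set of agents that received t. The function name `guilt_probability` and the dict-of-sets layout are illustrative.

```python
def guilt_probability(agent, allocation, leaked, p):
    """Estimate Pr{G_i | S} for one agent under the independent-leak model.

    Assumption (not stated on the slides): a leaked profile t that did not come
    from other sources (probability 1 - p) was leaked by one of the agents that
    hold it, each equally likely, and profiles are leaked independently.

    agent:      name of the agent U_i
    allocation: dict mapping each agent name to the set of profiles it received (R_i)
    leaked:     set of leaked profiles (S)
    p:          probability that a leaked profile came from other sources
    """
    prob_innocent = 1.0
    for t in leaked & allocation[agent]:
        # |V_t|: number of agents that received profile t.
        holders = sum(1 for profiles in allocation.values() if t in profiles)
        # Probability that this agent did NOT leak t.
        prob_innocent *= 1 - (1 - p) / holders
    # The agent is guilty if it leaked at least one profile.
    return 1 - prob_innocent


allocation = {"U1": {"Mark", "Sarah"}, "U2": {"Sarah"}}
leaked = {"Mark", "Sarah"}
print(round(guilt_probability("U1", allocation, leaked, p=0.2), 2))  # 0.88
print(round(guilt_probability("U2", allocation, leaked, p=0.2), 2))  # 0.4
```

U_1 is the only holder of Mark’s profile, so under these assumptions it comes out far more likely to be guilty than U_2, which only shares Sarah’s profile.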
10
Guilt Models (3/3)
[Figure: two plots, “Independently” and “NOT Independently”, showing the guilt probabilities Pr{G_1} and Pr{G_2} under the two leakage models]
11
Problem Description
Guilt Models
Distribution Strategies
12
The Distributor’s Objective (1/2)
[Figure: agents U_1, U_2, U_3 and U_4 each submit a request and receive data sets R_1, R_2, R_3 and R_4; a leaked set S containing data from R_1 is found]
Goal: Pr{G_1|S} >> Pr{G_2|S} and Pr{G_1|S} >> Pr{G_4|S}
13
The Distributor’s Objective (2/2)
– To achieve this objective, the distributor has to distribute sets R_1, …, R_n that minimize […]
– Intuition: minimizing data sharing among agents makes leaked data reveal the guilty agents
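The intuition can be checked with a small worked example. It assumes the independent-leak model plus our own assumption that a leaked profile not obtained from other sources was leaked by one of the agents holding it, chosen uniformly; this gives Pr{G_i|S} = 1 - ∏_{t ∈ S ∩ R_i} (1 - (1-p)/|V_t|), with V_t the set of agents that received t. Take two agents, R_1 = {Kathryn, Jeremy}, a leak S = R_1, and p = 0.2, and compare a second agent with the same data against one with disjoint data:

```latex
\begin{align*}
\text{Full overlap } (R_2 = R_1):&\quad
\Pr\{G_1 \mid S\} = \Pr\{G_2 \mid S\} = 1 - \left(1 - \tfrac{1-p}{2}\right)^2 = 1 - 0.6^2 = 0.64\\
\text{Disjoint } (R_2 = \{\text{Sarah}, \text{Mark}\}):&\quad
\Pr\{G_1 \mid S\} = 1 - \left(1 - (1-p)\right)^2 = 1 - p^2 = 0.96,
\qquad \Pr\{G_2 \mid S\} = 0
\end{align*}
```

With full overlap the leak cannot separate U_1 from U_2; with disjoint sets it points to U_1 alone.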
14
Distribution Strategies – Sample (1/4)
– Set T has four profiles: Kathryn, Jeremy, Sarah and Mark
– There are 4 agents: U_1, U_2, U_3 and U_4
– Each agent requests a sample of any 2 profiles of T for a market survey
15
Distribution Strategies – Sample (2/4)
[Figure: two allocations of the profiles to agents U_1, U_2, U_3 and U_4, labeled “Poor” and “Minimize”]
16
Distribution Strategies – Sample (3/4)
Optimal distribution: avoid full overlaps and minimize […]
[Figure: allocation of the profiles to agents U_1, U_2, U_3 and U_4]
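The toy example is small enough to search exhaustively. The Python sketch below is illustrative only: since the slide’s exact formula is not available here, it uses an overlap measure of our own choosing, the sum over agents i of (1/|R_i|) Σ_{j≠i} |R_i ∩ R_j|, and then keeps the minimizers in which no two agents receive the same sample. The helpers `overlap` and `has_full_overlap` are hypothetical names.

```python
from itertools import combinations, product

T = ["Kathryn", "Jeremy", "Sarah", "Mark"]
NUM_AGENTS = 4
SAMPLE_SIZE = 2


def overlap(allocation):
    """Assumed overlap measure: sum over agents i of (1/|R_i|) * sum_{j != i} |R_i & R_j|."""
    total = 0.0
    for i, r_i in enumerate(allocation):
        shared = sum(len(r_i & r_j) for j, r_j in enumerate(allocation) if j != i)
        total += shared / len(r_i)
    return total


def has_full_overlap(allocation):
    """True if two agents received exactly the same sample."""
    return len(set(allocation)) < len(allocation)


# Every possible 2-profile sample (6 of them) and every way to hand one to each agent.
samples = [frozenset(c) for c in combinations(T, SAMPLE_SIZE)]
allocations = list(product(samples, repeat=NUM_AGENTS))  # 6**4 = 1296 allocations

best = min(overlap(a) for a in allocations)
optimal = [a for a in allocations if overlap(a) == best and not has_full_overlap(a)]
print(best, len(optimal))  # 4.0 72
```

Under this measure, an allocation that gives {Kathryn, Jeremy} to two agents and {Sarah, Mark} to the other two also scores the minimum 4.0, which illustrates why avoiding full overlaps has to be required separately. The 72 survivors are the 3 sets of four distinct samples in which every profile goes to exactly two agents, times the 4! ways of assigning them to agents.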
17
Distribution Strategies – Sample (4/4)
18
Distribution Strategies
Sample Data Requests
– The distributor has the freedom to select which data items to provide to the agents
– General idea: provide agents with sets of data that are as disjoint as possible
– Problem: there are cases where the distributed data must overlap, e.g., when |R_1| + … + |R_n| > |T|
Explicit Data Requests
– The distributor must provide agents with the data they request
– General idea: add fake data to the distributed data to minimize the overlap of distributed data
– Problem: agents can collude and identify fake data
– NOT COVERED in this talk
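One way to act on the “as disjoint as possible” idea for sample requests is a greedy pass over the agents. The Python sketch below is a hypothetical heuristic of ours (`allocate_samples` is an illustrative name, not the authors’ algorithm): each agent gets the candidate sample whose items have been handed out least so far, with ties broken by the smallest overlap with any single sample already given out. It enumerates all candidate samples, so it only suits small examples like the four-profile one.

```python
from itertools import combinations


def allocate_samples(items, sample_sizes):
    """Hypothetical greedy heuristic for sample requests (not the authors' algorithm).

    Agents are served one at a time. Each agent receives the sample that
    (1) uses the items handed out least so far and, on ties,
    (2) has the smallest overlap with any single sample already given out.
    Enumerating every candidate sample is only practical for small examples.
    """
    usage = {t: 0 for t in items}  # how many agents already hold each item
    allocation = []
    for m in sample_sizes:
        def cost(sample):
            total_usage = sum(usage[t] for t in sample)
            worst_overlap = max((len(set(sample) & r) for r in allocation), default=0)
            return (total_usage, worst_overlap)

        # min() returns the first candidate with the lowest cost.
        best = min(combinations(items, m), key=cost)
        for t in best:
            usage[t] += 1
        allocation.append(set(best))
    return allocation


result = allocate_samples(["Kathryn", "Jeremy", "Sarah", "Mark"], [2, 2, 2, 2])
print([sorted(r) for r in result])
# [['Jeremy', 'Kathryn'], ['Mark', 'Sarah'], ['Kathryn', 'Sarah'], ['Jeremy', 'Mark']]
```

Here |R_1| + … + |R_4| = 8 > |T| = 4, so overlap is unavoidable; the heuristic spreads it so that every profile goes to exactly two agents and no two agents receive the same sample.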
19
Conclusions
– Data leakage modeled as a maximum-likelihood problem
– Data distribution strategies that help identify the guilty agents
20
Thank You!