
1 Differentially Private Analysis of Graphs and Social Networks
Sofya Raskhodnikova Pennsylvania State University

2 Graphs and networks
Before we go into examples, let me review some basic terminology from graph theory that I will rely on in this talk. The terms “graph” and “network” are used interchangeably. A graph, or a network, is a collection of connected objects. The objects are called nodes or vertices and are usually depicted as points; the connections between them are called edges or links and are usually drawn as lines between points. Many types of data can be represented as graphs. In this talk, we focus on graphs that contain sensitive, private information: nodes typically represent individuals, and edges capture relationships between them.
Image source: Nykamp DQ, “An introduction to networks.” From Math Insight.

3 Potentially sensitive information in graphs
Many graphs routinely assembled and published for research, data mining, and surveillance contain potentially sensitive information. Links in such graphs can represent, for example:
Social, romantic, and sexual relationships
“Friendships” in an online social network
Financial transactions
Phone calls and communication
Doctor-patient relationships
For instance, sociologists, epidemiologists, and health-care professionals collect data about geographic, friendship, family, and sexual networks to study disease propagation and risk; one example is the social network of people assessed repeatedly from 1971 to 2003 as part of the Framingham Heart Study.
Source: Christakis, Fowler. The Spread of Obesity in a Large Social Network over 32 Years. N Engl J Med 2007; 357:
Source: B. Aven. The effects of corruption on organizational networks and individual behavior. MIT workshop: Information and Decision in Social Networks, 2011.

4 Two conflicting goals
Public benefits from making aggregate information available.
Privacy: protecting information of individuals.
Utility: drawing accurate conclusions about aggregate information.

5 “Anonymized” graphs still pose a privacy risk
This is the graph of romantic and sexual relationships in one American high school, collected over a period of 18 months for a famous sociological study. The researchers published all the links in the graph after removing all information associated with each node except gender.
This reflects a false dichotomy between personally identifying and non-personally identifying information: links, and any other information about an individual, can be used for de-anonymization. In a typical real-life network, many nodes have unique neighborhoods.
Bearman, Moody, Stovel. Chains of affection: The structure of adolescent romantic and sexual networks. American J. Sociology, 2004.

6 Some published de-anonymization attacks
Movie ratings [Narayanan, Shmatikov 08]: de-identified Netflix users were re-identified using information from IMDb, a public movie database.
Social networks [Backstrom, Dwork, Kleinberg 07; Narayanan, Shmatikov 09; Narayanan, Shi, Rubinstein 12]: users of an anonymized online social network (Twitter) were re-identified based on information from a public online social network (Flickr).
Computer networks [Coull, Wright, Monrose, Collins, Reiter 07; Ribeiro, Chen, Miklau, Townsley 08, …]: individuals can be re-identified based on external sources.

7 Who’d want to de-anonymize a social network graph?
A government agency, for surveillance.
A phisher or spammer, to write a personalized message.
A health insurance provider, to check for preexisting conditions.
Marketers, to focus advertising on influential nodes.
Stalkers, nosy neighbors, colleagues, or employers.
Image sources: Andrew Joyner,

8 What information can be released without violating privacy?

9 Differential privacy (for graph data)
Here is an answer to this question, motivated by insights from modern cryptography. An algorithm takes a graph G, processes the data, and releases an output; the definition constrains how much that output may depend on any one part of the input. What does it mean for two graphs to be neighbors?
Differential privacy [Dwork McSherry Nissim Smith 06]: Given $\epsilon > 0$, an algorithm $A$ is $\epsilon$-differentially private if for all pairs of neighbors $G, G'$ and all possible outputs $o$:
$$\Pr[A(G) = o] \le (1 + \epsilon) \cdot \Pr[A(G') = o].$$

10 Two variants of differential privacy for graphs
Edge differential privacy: two graphs are neighbors if they differ in one edge.
Node differential privacy: two graphs are neighbors if one can be obtained from the other by deleting a node and its adjacent edges.
Node differential privacy is more in the spirit of protecting the privacy of each individual. However, this definition is significantly harder to satisfy, because the algorithm has to mask much larger changes in the graph. (A toy edge-private mechanism is sketched below.)
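For intuition, here is a minimal sketch of an edge-private mechanism (an illustration, not from the talk): randomized response applied independently to every potential edge. Since edge-neighboring graphs differ in exactly one adjacency bit, the released matrix satisfies the standard $e^\epsilon$ variant of the definition.

```python
import math
import random

def randomized_response_graph(adj, eps):
    """Flip each potential edge of a symmetric 0/1 adjacency matrix with
    probability 1/(e^eps + 1). Edge-neighboring graphs differ in one bit,
    and each bit is released with probability ratio at most e^eps."""
    keep_prob = math.exp(eps) / (math.exp(eps) + 1.0)
    n = len(adj)
    noisy = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            bit = adj[i][j] if random.random() < keep_prob else 1 - adj[i][j]
            noisy[i][j] = noisy[j][i] = bit
    return noisy
```

Note that the same mechanism is much weaker under node privacy: a node's entire row of adjacency bits changes between node neighbors, which previews why node privacy is harder to achieve.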

11 Differential privacy (for graph data)
This is the standard definition of differential privacy; the only innovation is that the usual picture with discs has been replaced by a graph.
Differential privacy [Dwork McSherry Nissim Smith 06]: An algorithm $A$ is $\epsilon$-differentially private if for all pairs of neighbors $G, G'$ and all possible outputs $o$:
$$\Pr[A(G) = o] \le (1 + \epsilon) \cdot \Pr[A(G') = o].$$

12 Some useful properties of differential privacy
Robustness to side information.
Post-processing: an algorithm that runs an $\epsilon$-differentially private subroutine, and does not otherwise access the data, is $\epsilon$-differentially private.
Composition: an algorithm that runs $k$ subroutines, each $\epsilon$-differentially private, is $k\epsilon$-differentially private. (A sketch of budget splitting via composition follows.)
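To make the composition property concrete, here is a hedged sketch (the names and the even budget split are illustrative assumptions, not from the talk) of answering $k$ queries by giving each one $\epsilon/k$ of the budget via the Laplace mechanism, which is recalled on slide 19:

```python
import numpy as np

def laplace_mechanism(value, sensitivity, eps, rng):
    """Release value + Laplace noise of scale sensitivity/eps."""
    return value + rng.laplace(scale=sensitivity / eps)

def answer_queries(graph, queries, sensitivities, total_eps, seed=0):
    """Each call is (total_eps/k)-differentially private, so by the
    composition property the whole interaction is total_eps-private."""
    rng = np.random.default_rng(seed)
    eps_each = total_eps / len(queries)
    return [laplace_mechanism(q(graph), s, eps_each, rng)
            for q, s in zip(queries, sensitivities)]
```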

13 Is differential privacy too strong?
No weaker notion has been proposed that satisfies all three useful properties. We can actually attain it for many useful statistics!

14 What graph statistics can be computed accurately with differential privacy?

15 Graph statistics
Number of edges
Counts of small subgraphs (e.g., triangles, $k$-triangles, $k$-stars)
Degree distribution (the degree of a node is the number of connections it has; the distribution gives the fraction of nodes of degree $d$, for each $d$)
Number of connected components
How different our graph is from a graph with a certain property
Correlation between node attributes (e.g., body fat index) and network attributes (e.g., number of obese neighbors)
Sizes of graph cuts
Aside: scale-free = a network whose degree distribution follows a power law. (Basic Network Metrics, notes by Daniel Whitney.)
Non-private versions of several of these statistics are sketched below.
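For concreteness, here is a sketch (assuming the networkx library; not part of the talk) of the exact, non-private versions of some statistics on this list. The question for the rest of the talk is how accurately they can be released under differential privacy.

```python
from collections import Counter
import networkx as nx

def graph_statistics(G):
    """Exact (non-private) versions of some statistics listed above."""
    triangles = sum(nx.triangles(G).values()) // 3  # each triangle is counted at its 3 nodes
    degree_distribution = Counter(d for _, d in G.degree())
    return {
        "edges": G.number_of_edges(),
        "triangles": triangles,
        "degree_distribution": degree_distribution,
        "connected_components": nx.number_connected_components(G),
    }
```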

16 Tools used in differentially private graph algorithms
Smooth sensitivity (a more nuanced notion of sensitivity than the one mentioned in the previous talk)
Sample and aggregate
Maximum flow
Linear and convex programming
Random projections
Iterative updates
Postprocessing

17 Differentially private graph analysis
A taste of techniques

18 Basic question: how to compute a statistic f
Given a graph $G$, the algorithm processes the data and releases an approximation to $f(G)$. How accurately can an $\epsilon$-differentially private algorithm compute $f(G)$?

19 Challenge for node privacy: high sensitivity
The sensitivity of a function $f$ is
$$\partial f = \max_{\text{node neighbors } G, G'} |f(G) - f(G')|.$$
Previous talk: if $f$ has low sensitivity, it can be computed accurately with differential privacy (by the Laplace mechanism); the first upper bound on the error was given in the paper that defined differential privacy [Dwork McSherry Nissim Smith 06].
Challenge: even simple graph statistics have high sensitivity.

20 Challenge for node privacy: high sensitivity
Examples, for graphs on $n$ nodes:
$f_-(G)$, the number of edges in $G$: sensitivity $= n$.
$f_\triangle(G)$, the number of triangles in $G$: sensitivity $= \binom{n}{2}$.
A sketch of the resulting Laplace-mechanism noise appears below.
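A minimal sketch (illustrative; parameter names are assumptions) of what these sensitivities mean for the Laplace mechanism applied to the edge count:

```python
import numpy as np

def noisy_edge_count(num_edges, n, eps, privacy="edge", seed=0):
    """Laplace mechanism for the edge count f_-.
    Edge privacy: changing one edge moves the count by 1 (sensitivity 1).
    Node privacy: one node touches up to n-1 edges (sensitivity ~ n),
    so the noise scale n/eps swamps the true count on sparse graphs."""
    rng = np.random.default_rng(seed)
    sensitivity = 1 if privacy == "edge" else n
    return num_edges + rng.laplace(scale=sensitivity / eps)
```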

21 Idea: project onto graphs with low sensitivity
[Kasiviswanathan Nissim Raskhodnikova Smith 13]; see also [Blocki Blum Datta Sheffet 13; Chen Zhou 13]

22 “Projections” on graphs of small degree
Consider graphs of degree $\le d$, where $d$ is much smaller than $n$. Restricted to this class:
$f_-(G)$, the number of edges in $G$: sensitivity $= d$.
$f_\triangle(G)$, the number of triangles in $G$: sensitivity $= \binom{d}{2}$.
A naive way to exploit this is sketched below.
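The naive version of this idea (a sketch assuming networkx, and not the talk's final technique) truncates the graph to degree at most $d$ and adds noise scaled to the lower sensitivity. The truncation step itself reacts to node changes, and handling that cleanly is exactly what the next slide's Lipschitz extensions are for.

```python
import networkx as nx
import numpy as np

def truncated_noisy_edge_count(G, d, eps, seed=0):
    """Delete all nodes of degree > d, then release the edge count with
    Laplace noise of scale d/eps (on degree-<=d graphs, one node changes
    the edge count by at most d). Caution: the deletion step itself is
    sensitive to node changes, so this sketch alone is not node-private."""
    rng = np.random.default_rng(seed)
    H = G.copy()
    H.remove_nodes_from([v for v, deg in G.degree() if deg > d])
    return H.number_of_edges() + rng.laplace(scale=d / eps)
```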

23 Lipschitz extensions
A function $f'$ is a Lipschitz extension of $f$ if:
$f'$ agrees with $f$ on graphs of degree $\le d$, and
the sensitivity of $f'$ is low.
Release $f'$ using the Laplace mechanism. Lipschitz extensions exist for all real-valued functions. We design Lipschitz extensions that can be computed efficiently, using linear and convex programming, for subgraph counts and the degree distribution. (A sketch of one such extension follows.)
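Here is a sketch (assuming networkx; a simplification of the flow-based construction in [Kasiviswanathan Nissim Raskhodnikova Smith 13]) of a Lipschitz extension of the edge count. The capacities $d$ at the source and sink cap each node's contribution, so the maximum flow equals $2|E|$ on graphs of degree $\le d$, and adding or deleting one node changes it by at most $2d$.

```python
import networkx as nx

def flow_edge_count(G, d):
    """Max-flow value / 2 agrees with the edge count on degree-<=d graphs
    and has node sensitivity at most d."""
    F = nx.DiGraph()
    for v in G.nodes():
        F.add_edge("s", ("L", v), capacity=d)  # cap v's outgoing contribution
        F.add_edge(("R", v), "t", capacity=d)  # cap v's incoming contribution
    for u, v in G.edges():
        F.add_edge(("L", u), ("R", v), capacity=1)
        F.add_edge(("L", v), ("R", u), capacity=1)
    flow_value, _ = nx.maximum_flow(F, "s", "t")
    return flow_value / 2
```

Releasing flow_edge_count(G, d) with Laplace noise of scale $d/\epsilon$ is then node-private, and on graphs of degree $\le d$ it reports the true edge count up to the added noise.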

24 Summary
Accurate subgraph counts for realistic graphs can be computed by node-private algorithms, using Lipschitz extensions and linear programming. This is one example of the many graph statistics on which node-private algorithms do well.

25 What can’t be computed differentially privately?
Differential privacy explicitly excludes the possibility of computing anything that depends on one person’s data. For example: Is there a node in the graph with atypical connections or “suspicious communication patterns”?

26 What we are working on
Node differentially private algorithms for releasing a large number of graph statistics at once, and for generating synthetic graphs.
This is an exciting area of research:
Edge-private algorithms [Nissim, Raskhodnikova, Smith 07; Hay, Rastogi, Miklau, Suciu 09; Hay, Li, Miklau, Jensen 09; Hardt, Rothblum 10; Karwa, Raskhodnikova, Smith, Yaroslavtsev 11; Karwa, Slavkovic 12; Blocki, Blum, Datta, Sheffet 12; Gupta, Roth, Ullman 12; Mir, Wright 12; Kifer, Lin 13, …]
Node-private algorithms [Gehrke, Lui, Pass 12; Blocki, Blum, Datta, Sheffet 13; Kasiviswanathan, Nissim, Raskhodnikova, Smith 13; Chen, Zhou 13; Raskhodnikova, Smith, …]

27 Conclusions
We are close to having edge-private and node-private algorithms that work well in practice for many basic graph statistics. Accurate node-private algorithms were thought to be impossible only a few years ago.
Differential privacy is influencing other scientific disciplines.
Next talk: reducing the false discovery rate.

