Graph Analysis with Node Differential Privacy

Slides:

Advertisements

Similar presentations

Lower Bounds for Additive Spanners, Emulators, and More David P. Woodruff MIT and Tsinghua University To appear in FOCS, 2006.

Advertisements

The Primal-Dual Method: Steiner Forest TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AA A A AA A A A AA A A.

Approximation Algorithms Chapter 14: Rounding Applied to Set Cover.

Fast Algorithms For Hierarchical Range Histogram Constructions

Λ14 Διαδικτυακά Κοινωνικά Δίκτυα και Μέσα Positive and Negative Relationships Chapter 5, from D. Easley and J. Kleinberg book.

Approximation, Chance and Networks Lecture Notes BISS 2005, Bertinoro March Alessandro Panconesi University La Sapienza of Rome.

Mauro Sozio and Aristides Gionis Presented By:

1 Maximal Independent Set. 2 Independent Set (IS): In a graph G=(V,E), |V|=n, |E|=m, any set of nodes that are not adjacent.

Private Approximation of Search Problems Amos Beimel Paz Carmi Kobbi Nissim Enav Weinreb Ben Gurion University Research partially Supported by the Frankel.

Private Analysis of Graph Structure With Vishesh Karwa, Sofya Raskhodnikova and Adam Smith Pennsylvania State University Grigory Yaroslavtsev

Raef Bassily Adam Smith Abhradeep Thakurta Penn State Yahoo! Labs Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds Penn.

1 Differentially Private Analysis of Graphs and Social Networks Sofya Raskhodnikova Pennsylvania State University.

1 Maximal Independent Set. 2 Independent Set (IS): In a graph, any set of nodes that are not adjacent.

An brief tour of Differential Privacy Avrim Blum Computer Science Dept Your guide:

Yi Wu (CMU) Joint work with Vitaly Feldman (IBM) Venkat Guruswami (CMU) Prasad Ragvenhdra (MSR)

1 On the Benefits of Adaptivity in Property Testing of Dense Graphs Joint work with Mira Gonen Dana Ron Tel-Aviv University.

The community-search problem and how to plan a successful cocktail party Mauro SozioAris Gionis Max Planck Institute, Germany Yahoo! Research, Barcelona.

Active Learning for Probabilistic Models Lee Wee Sun Department of Computer Science National University of Singapore LARC-IMS Workshop.

Making Pattern Queries Bounded in Big Graphs 11 Yang Cao 1,2 Wenfei Fan 1,2 Jinpeng Huai 2 Ruizhe Huang 1 1 University of Edinburgh 2 Beihang University.

Multiplicative Weights Algorithms CompSci Instructor: Ashwin Machanavajjhala 1Lecture 13 : Fall 12.

Private Analysis of Graphs

Approximating the MST Weight in Sublinear Time Bernard Chazelle (Princeton) Ronitt Rubinfeld (NEC) Luca Trevisan (U.C. Berkeley)

1 Finding Sparser Directed Spanners Piotr Berman, Sofya Raskhodnikova, Ge Ruan Pennsylvania State University.

Greedy Approximation Algorithms for finding Dense Components in a Graph Paper by Moses Charikar Presentation by Paul Horn.

Expanders via Random Spanning Trees R 許榮財 R 黃佳婷 R 黃怡嘉.

1 Sublinear Algorithms Lecture 1 Sofya Raskhodnikova Penn State University TexPoint fonts used in EMF. Read the TexPoint manual before you delete this.

Personalized Social Recommendations – Accurate or Private? A. Machanavajjhala (Yahoo!), with A. Korolova (Stanford), A. Das Sarma (Google) 1.

CSE 589 Part VI. Reading Skiena, Sections 5.5 and 6.8 CLR, chapter 37.

Boosting and Differential Privacy Cynthia Dwork, Microsoft Research TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A.

Foundations of Privacy Lecture 5 Lecturer: Moni Naor.

Approximate Inference: Decomposition Methods with Applications to Computer Vision Kyomin Jung ( KAIST ) Joint work with Pushmeet Kohli (Microsoft Research)

Artur Czumaj DIMAP DIMAP (Centre for Discrete Maths and it Applications) Computer Science & Department of Computer Science University of Warwick Testing.

Graph Data Management Lab, School of Computer Science Personalized Privacy Protection in Social Networks (VLDB2011)

Complexity and Efficient Algorithms Group / Department of Computer Science Testing the Cluster Structure of Graphs Christian Sohler joint work with Artur.

Private Release of Graph Statistics using Ladder Functions J.ZHANG, G.CORMODE, M.PROCOPIUC, D.SRIVASTAVA, X.XIAO.

NOTE: To change the image on this slide, select the picture and delete it. Then click the Pictures icon in the placeholder to insert your own image. Fast.

Cohesive Subgraph Computation over Large Graphs

Property Testing (a.k.a. Sublinear Algorithms )

Private Data Management with Verification

New Characterizations in Turnstile Streams with Applications

Techniques for Achieving Vertex Level Differential Privacy

Approximating the MST Weight in Sublinear Time

Learning to Generate Networks

Finding Cycles and Trees in Sublinear Time

Understanding Generalization in Adaptive Data Analysis

Privacy-preserving Release of Statistics: Differential Privacy

Algorithm Design and Analysis

Lecture 18: Uniformity Testing Monotonicity Testing

MST in Log-Star Rounds of Congested Clique

Differential Privacy in Practice

Vitaly (the West Coast) Feldman

Current Developments in Differential Privacy

Maximal Independent Set

CIS 700: “algorithms for Big Data”

Differential Privacy and Statistical Inference: A TCS Perspective

Bart M. P. Jansen June 3rd 2016, Algorithms for Optimization Problems

Models of Network Formation

Why Social Graphs Are Different Communities Finding Triangles

On the effect of randomness on planted 3-coloring models

Introduction Wireless Ad-Hoc Network

Binghui Wang, Le Zhang, Neil Zhenqiang Gong

June 12, 2003 (joint work with Umut Acar, and Guy Blelloch)

Lecture 6: Counting triangles Dynamic graphs & sampling

Generalization bounds for uniformly stable algorithms

Differentially Private Analysis of Graphs and Social Networks

Network Models Michael Goodrich Some slides adapted from:

Locality In Distributed Graph Algorithms

Planting trees in random graphs (and finding them back)

Parameterized Complexity of Conflict-free Graph Coloring

Differential Privacy (1)

Presentation transcript:

Graph Analysis with Node Differential Privacy Sofya Raskhodnikova Penn State University Joint work with Shiva Kasiviswanathan (GE Research), Kobbi Nissim (Ben-Gurion U. and Harvard U.), Adam Smith (Penn State)

Publishing information about graphs Nearly all the previous talks were addressing the situation where each sensitive data item belongs to one person. In this session, we will focus on datasets where the sensitive items are connections between pairs of individuals, in other words – graphs. This work addresses the problem of publishing information about graphs while protecting privacy of individuals whose data they contain. This is the graph of romantic relationships in one American high school from a famous sociological study. Taking a closer look at this blue node and its connections, one might wonder how the researchers managed to get this boy to sign the consent form for releasing his data. Even though they released no “identifying information”, only “graph data”, Many datasets can be represented as graphs “Friendships” in online social network Financial transactions Email communication Romantic relationships image source http://community.expressor-software.com/blogs/mtarallo/36-extracting-data-facebook-social-graph-expressor-tutorial.html Privacy is a big issue! American J. Sociology, Bearman, Moody, Stovel

Differential privacy for graph data This is the standard definition of differential privacy. The only innovation is that the usual picture with discs got updated to a graph. An algorithm … the usual condition holds. What does it mean for two graphs to be neighbors? Graph G Users Trusted curator Government, researchers, businesses (or) malicious adversary ( ) queries A answers Differential privacy [Dwork McSherry Nissim Smith 06] An algorithm A is 𝝐-differentially private if for all pairs of neighbors 𝑮, 𝑮′ and all sets of answers S: 𝑷𝒓 𝑨 𝑮 ∈𝑺 ≤ 𝒆 𝝐 𝑷𝒓 𝑨 𝑮 ′ ∈𝑺 image source http://www.queticointernetmarketing.com/new-amazing-facebook-photo-mapper/

Two variants of differential privacy for graphs Node differential privacy is more in the spirit of protecting privacy of each individual. However, this definition is significantly harder to satisfy because you have to cover much larger changes in the graph. Edge differential privacy Two graphs are neighbors if they differ in one edge. Node differential privacy Two graphs are neighbors if one can be obtained from the other by deleting a node and its adjacent edges. G: G′: G: G′:

Node differentially private analysis of graphs Graph G Users Trusted curator Government, researchers, businesses (or) malicious adversary ( ) queries A answers Two conflicting goals: utility and privacy Impossible to get both in the worst case Previously: no node differentially private algorithms that are accurate on realistic graphs image source http://www.queticointernetmarketing.com/new-amazing-facebook-photo-mapper/

Our contributions First node differentially private algorithms that are accurate for sparse graphs node differentially private for all graphs accurate for a subclass of graphs, which includes graphs with sublinear (not necessarily constant) degree bound graphs where the tail of the degree distribution is not too heavy dense graphs Techniques for node differentially private algorithms Methodology for analyzing the accuracy of such algorithms on realistic networks Concurrent work on node privacy [Blocki Blum Datta Sheffet 13]

Our contributions: algorithms Node differentially private algorithms for releasing number of edges counts of small subgraphs (e.g., triangles, 𝒌-triangles, 𝒌-stars) degree distribution Accuracy analysis of our algorithms for graphs with not-too-heavy-tailed degree distribution: with 𝜶-decay for constant 𝛼>1 Notation: 𝒅 = average degree 𝑷 𝒅 = fraction of nodes in G of degree ≥𝑑 Every graph satisfies 1-decay Natural graphs (e.g., “scale-free” graphs, Erdos-Renyi) satisfy 𝛼>1 … … Frequency A graph G satisfies 𝜶-decay if for all 𝑡>1: 𝑃 𝑡⋅ 𝑑 ≤ 𝑡 −𝛼 ≤𝒕 −𝜶 … … … 𝒅 𝑡⋅ 𝒅 Degrees

Our contributions: accuracy analysis Node differentially private algorithms for releasing number of edges counts of small subgraphs (e.g., triangles, 𝒌-triangles, 𝒌-stars) degree distribution Accuracy analysis of our algorithms for graphs with not-too-heavy-tailed degree distribution: with 𝜶-decay for constant 𝛼>1 … … A graph G satisfies 𝜶-decay if for all 𝑡>1: 𝑃 𝑡⋅ 𝑑 ≤ 𝑡 −𝛼 (1+o(1))-approximation } 𝐀 𝛜,𝛂 𝐆 −𝐃𝐞𝐠𝐃𝐢𝐬𝐭𝐫𝐢𝐛(𝐆) 𝟏 =𝐨 𝟏

Previous work on differentially private computations on graphs Edge differentially private algorithms number of triangles, MST cost [Nissim Raskhodnikova Smith 07] degree distribution [Hay Rastogi Miklau Suciu 09, Hay Li Miklau Jensen 09, Karwa Slavkovic 12] small subgraph counts [Karwa Raskhodnikova Smith Yaroslavtsev 11] cuts [Blocki Blum Datta Sheffet 12] Edge private against Bayesian adversary (weaker privacy) small subgraph counts [Rastogi Hay Miklau Suciu 09] Node zero-knowledge private (stronger privacy) average degree, distances to nearest connected, Eulerian, cycle-free graphs for dense graphs [Gehrke Lui Pass 12]

Challenge for node privacy: high local sensitivity Local sensitivity [NRS’07]: Global sensitivity [DMNS’06]: For many functions 𝑓 of the data, node 𝐿 𝑆 𝑓 𝐺 is high. Consider adding a node connected to all other nodes. Examples: 𝑓 − (G)= |E(G)|. 𝑓 △ (G)= # of △s in G. 𝑳 𝑺 𝒇 𝑮 = max 𝐺 ′ : 𝐧𝐞𝐢𝐠𝐡𝐛𝐨𝐫 of 𝐺 𝑓 𝐺 −𝑓 𝐺 ′ 𝑮 𝑺 𝒇 = max 𝐺 𝐿 𝑆 𝑓 (𝐺) Edge 𝑮 𝑺 𝒇 − is 1; node 𝑳 𝑺 𝒇 − 𝑮 is 𝑛 for all G. Edge 𝑮 𝑺 𝒇 △ is 𝑛−2; node 𝑳 𝑺 𝒇 △ 𝑮 is |E(G)|.

“Projections” on graphs of small degree The starting point of our algorithms is a simple observation that global sensitivity is much smaller if we restric our attention to bounded-degree graphs. Let 𝓖 = family of all graphs, 𝓖 𝑑 = family of graphs of degree ≤𝑑. Notation. 𝝏𝒇 = node 𝑮 𝑺 𝒇 over 𝓖. 𝝏 𝒅 𝒇 = node 𝑮 𝑺 𝒇 over 𝓖 𝑑 . Observation. 𝝏 𝒅 𝒇 is low for many useful 𝑓. Examples: 𝝏 𝒅 𝒇 − = 𝒅 (compare to 𝝏 𝒇 − = 𝒏) 𝝏 𝒅 𝒇 △ = 𝒅 𝟐 (compare to 𝝏 𝒇 △ = 𝒏 𝟐 ) Idea: ``Project’’ on graphs in 𝓖 𝑑 for a carefully chosen d << n. 𝓖 𝓖 𝑑 Goal: privacy for all graphs

Method 1: Lipschitz extensions Release 𝑓′ via GS framework [DMNS’06] Requires designing Lipschitz extension for each function 𝑓 we base ours on maximum flow and linear and convex programs A function 𝑓′ is a Lipschitz extension of 𝑓 from 𝓖 𝑑 to 𝓖 if 𝑓′ agrees with 𝑓 on 𝓖 𝑑 and 𝝏𝒇′ = 𝝏 𝒅 𝒇 𝓖 𝓖 𝑑 high 𝝏𝒇 𝝏𝒇′ = 𝝏 𝒅 𝒇 low 𝝏 𝒅 𝒇 𝒇 ′ =𝒇

Lipschitz extension of 𝒇 − : flow graph deg(v)/d, 1/1, deg(v’)/d For a graph G=(V, E), define flow graph of G: Add edge (𝑢,𝑣′) iff 𝑢,𝑣 ∈ 𝐸. 𝒗 𝐟𝐥𝐨𝐰 (G) is the value of the maximum flow in this graph. Lemma. 𝒗 𝐟𝐥𝐨𝐰 (G)/2 is a Lipschitz extension of 𝒇 − . 1 1 1' 𝑑 𝑑 2 2' s 3 3' t 4 4' 5 5'

Lipschitz extension of 𝒇 − : flow graph For a graph G=(V, E), define flow graph of G: Add edge (𝑢,𝑣′) iff 𝑢,𝑣 ∈ 𝐸. 𝒗 𝐟𝐥𝐨𝐰 (G) is the value of the maximum flow in this graph. Lemma. 𝒗 𝐟𝐥𝐨𝐰 (G)/2 is a Lipschitz extension of 𝒇 − . Proof: (1) 𝒗 𝐟𝐥𝐨𝐰 (G) = 𝟐𝒇 − (G) for all G∈ 𝓖 𝑑 (2) 𝝏 𝒗 𝐟𝐥𝐨𝐰 = 2⋅ 𝝏 𝒅 𝒇 − 1 1/ 1 1' deg 𝑣 / 𝑑 deg 𝑣 / 𝑑 2 2' s 3 3' t 4 4' 5 5'

Lipschitz extension of 𝒇 − : flow graph For a graph G=(V, E), define flow graph of G: 𝒗 𝐟𝐥𝐨𝐰 (G) is the value of the maximum flow in this graph. Lemma. 𝒗 𝐟𝐥𝐨𝐰 (G)/2 is a Lipschitz extension of 𝒇 − . Proof: (1) 𝒗 𝐟𝐥𝐨𝐰 (G) = 𝟐𝒇 − (G) for all G∈ 𝓖 𝑑 (2) 𝝏 𝒗 𝐟𝐥𝐨𝐰 = 2⋅ 𝝏 𝒅 𝒇 − = 2𝒅 1 1 1' 𝑑 𝑑 2 2' s 3 3' t 4 4' 𝑑 𝑑 5 5' 6 6'

Lipschitz extensions via linear/convex programs For a graph G=([n], E), define LP with variables 𝑥 𝑇 for all triangles 𝑇: 𝒗 𝐋𝐏 (G) is the value of LP. Lemma. 𝒗 𝐋𝐏 (G) is a Lipschitz extension of 𝒇 △ . Can be generalized to other counting queries Other queries use convex programs Maximize 0≤ 𝑥 𝑇 ≤1 for all triangles 𝑇 for all nodes 𝑣 𝑇=△ of 𝐺 𝑥 𝑇 𝑇:𝑣∈𝑉(𝑇) 𝑥 𝑇 ≤ 𝒅 𝟐 = 𝝏 𝒅 𝒇 △

Method 2: Generic reduction to privacy over 𝓖 𝑑 Input: Algorithm B that is node-DP over 𝓖 𝑑 Output: Algorithm A that is node-DP over 𝓖, has accuracy similar to B on “nice” graphs Time(A) = Time(B) + O(m+n) Reduction works for all functions 𝑓 How it works: Truncation T(G) outputs G with nodes of degree >𝑑 removed. Answer queries on T(G) instead of G 𝓖 𝓖 𝑑 high 𝝏𝒇 𝑻 low 𝝏 𝒅 𝒇 via Smooth Sensitivity framework [NRS’07] via finding a DP upper bound ℓ on 𝐿 𝑆 𝑇 (𝐺) [Dwork Lei 09, KRSY’11] and running any algorithm that is 𝝐 ℓ -node-DP over 𝓖 𝑑 G A T T(G) query f 𝒇(𝑻 𝑮 )+ noise( 𝑺 𝑻 𝑮 ⋅ 𝝏 𝒅 𝒇) S 𝑺 𝑻 (G)

Generic Reduction via Truncation Truncation T(G) removes nodes of degree >𝑑. On query 𝑓, answer A G =𝑓 𝑇 𝐺 +𝑛𝑜𝑖𝑠𝑒 How much noise? Local sensitivity of 𝑇 as a map 𝑔𝑟𝑎𝑝ℎ𝑠 →{𝑔𝑟𝑎𝑝ℎ𝑠} 𝑑𝑖𝑠𝑡 𝐺, 𝐺 ′ =# 𝑛𝑜𝑑𝑒 𝑐ℎ𝑎𝑛𝑔𝑒𝑠 𝑡𝑜 𝑔𝑜 𝑓𝑟𝑜𝑚 𝐺 𝑡𝑜 𝐺’ Lemma. 𝐿 𝑆 𝑇 𝐺 ≤1+max ( 𝑛 𝑑 , 𝑛 𝑑+1 ), where 𝑛 𝑖 = #{nodes of degree 𝑖}. Global sensitivity is too large. Frequency Nodes that determine 𝐿 𝑆 𝑇 (𝐺) … … d Degrees 𝐿 𝑆 𝑇 𝐺 = max 𝐺 ′ : 𝐧𝐞𝐢𝐠𝐡𝐛𝐨𝐫 of 𝐺 𝑑𝑖𝑠𝑡 𝑇 𝐺 ,𝑇 𝐺 ′

Smooth Sensitivity of Truncation Smooth Sensitivity Framework [NRS ‘07] 𝑺 𝒇 𝑮 is a smooth bound on local sensitivity of 𝑓 if 𝑺 𝒇 𝑮 ≥𝑳 𝑺 𝒇 (𝑮) 𝑺 𝒇 𝑮 ≤ 𝒆 𝝐 𝑺 𝒇 (𝑮′) for all neighbors 𝑮 and 𝑮′ Lemma. 𝑆 𝑇 𝐺 = max 𝑘≥0 𝑒 −𝜖𝑘 1+ 𝑖=𝑑− 𝑘+1 𝑑− 𝑘+1 𝑛 𝑖 is a smooth bound for 𝑻, computable in time 𝑂(𝑚+𝑛) “Chain rule”: 𝑺 𝑻 𝑮 ⋅ 𝝏 𝒅 𝒇 is a smooth bound for 𝒇∘𝑻 G A T T(G) query f 𝒇(𝑻 𝑮 )+ noise( 𝑺 𝑻 𝑮 ⋅ 𝝏 𝒅 𝒇) S 𝑺 𝑻 (G)

Utility of the Truncation Mechanism Lemma. ∀𝐺,𝑑 If we truncate to a random degree in 2𝑑,3𝑑 , 𝑬 𝑆 𝑇 𝐺 ≤( 𝑖=𝑑 𝑛−1 𝑛 𝑖 ) 3 log 𝑛 𝜖𝑑 + 1 𝜖 +1. Application to releasing the degree distribution: an 𝜖-node differentially private algorithm 𝐴 𝜖,𝛼 such that 𝐴 𝜖,𝛼 𝐺 −𝐷𝑒𝑔𝐷𝑖𝑠𝑡𝑟𝑖𝑏(𝐺) 1 =𝑜 1 with probability at least 2 3 if 𝐺 satisfies 𝛼-decay for 𝛼>2. Utility: If G is d-bounded, expected noise magnitude is 𝑂 𝜕 3𝑑 𝑓 𝜖 2 . G A T T(G) query f 𝒇(𝑻 𝑮 )+ noise( 𝑺 𝑻 𝑮 ⋅ 𝝏 𝒅 𝒇) S 𝑺 𝑻 (G)

Techniques used to obtain our results Node differentially private algorithms for releasing number of edges counts of small subgraphs (e.g., triangles, 𝒌-triangles, 𝒌-stars) degree distribution via Lipschitz extensions } via generic reduction

Conclusions It is possible to design node differentially private algorithms with good utility on sparse graphs One can first test whether the graph is sparse privately Directions for future work Node-private algorithm for releasing cuts Node-private synthetic graphs What are the right notions of privacy for graph data?