Differentially Private Analysis of Graphs and Social Networks
Sofya Raskhodnikova, Pennsylvania State University



Graphs and networks

Many types of data can be represented as graphs, where nodes represent individuals and edges capture relationships between them.

Image source: Nykamp DQ, "An introduction to networks," Math Insight, http://mathinsight.org/network_introduction

Potentially sensitive information in graphs
- Social, romantic and sexual relationships
- "Friendships" in an online social network
- Financial transactions
- Phone calls and communication
- Doctor-patient relationships

Sources: Christakis, Fowler. The Spread of Obesity in a Large Social Network over 32 Years. N Engl J Med 2007; 357. B. Aven. The effects of corruption on organizational networks and individual behavior. MIT workshop: Information and Decision in Social Networks, 2011.

Two conflicting goals
- Privacy: protecting information of individuals.
- Utility: drawing accurate conclusions about aggregate information.

``Anonymized'' graphs still pose a privacy risk

It is a false dichotomy to separate personally identifying from non-personally identifying information: links and any other information about an individual can be used for de-anonymization. In a typical real-life network, many nodes have unique neighborhoods.

Source: Bearman, Moody, Stovel. Chains of affection: The structure of adolescent romantic and sexual networks. American J. Sociology, 2008.

Some published de-anonymization attacks
- Movie ratings [Narayanan, Shmatikov 08]: de-identified Netflix users were re-identified using information from a public movie database (IMDb).
- Social networks [Backstrom, Dwork, Kleinberg 07; Narayanan, Shmatikov 09; Narayanan, Shi, Rubinstein 12]: users in an anonymized online social network (Twitter) were re-identified based on information from a public online social network (Flickr).
- Computer networks [Coull, Wright, Monrose, Collins, Reiter 07; Ribeiro, Chen, Miklau, Townsley 08, …]: individuals can be re-identified based on external sources.

Who would want to de-anonymize a social network graph?
- A government agency, for surveillance.
- A phisher or spammer, to write a personalized message.
- A health insurance provider, to check preexisting conditions.
- Marketers, to focus advertising on influential nodes.
- Stalkers, nosy neighbors, colleagues, or employers.

What information can be released without violating privacy?

Differential privacy (for graph data)
(Diagram: an algorithm takes the graph G as input and publicly releases its output.)
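The guarantee itself is not spelled out in the surviving slide text; as a reference point, the standard definition from the differential-privacy literature, stated for either notion of neighboring graphs, is:

```latex
\text{A randomized algorithm } A \text{ is } \varepsilon\text{-differentially private if,}
\text{for all neighboring graphs } G, G' \text{ and all sets } S \text{ of outputs,}
\qquad \Pr[A(G) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[A(G') \in S].
```

Intuitively, the output distribution barely changes when one person's data is added or removed, so the release reveals almost nothing about any single individual.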

Two variants of differential privacy for graphs
- Edge differential privacy: two graphs are neighbors if they differ in one edge.
- Node differential privacy: two graphs are neighbors if one can be obtained from the other by deleting a node and its adjacent edges.
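A toy sketch of why the two notions differ (the function names and the star-graph example are my own illustration, not from the talk): for the edge count, edge neighbors change the answer by exactly 1, while a node neighbor can change it by the degree of the removed node.

```python
def edge_count(adj):
    # adj: dict mapping node -> set of neighbors (simple undirected graph)
    return sum(len(nbrs) for nbrs in adj.values()) // 2

def remove_edge(adj, u, v):
    # Edge neighbor: the same graph with one edge removed.
    new = {x: set(s) for x, s in adj.items()}
    new[u].discard(v)
    new[v].discard(u)
    return new

def remove_node(adj, u):
    # Node neighbor: delete u together with all its adjacent edges.
    return {x: s - {u} for x, s in adj.items() if x != u}

# Star on 5 nodes: center 0 joined to 1..4.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}

print(edge_count(star) - edge_count(remove_edge(star, 0, 1)))  # 1
print(edge_count(star) - edge_count(remove_node(star, 0)))     # 4
```

Deleting the hub removes four edges at once, which is exactly the gap between edge and node privacy that the rest of the talk wrestles with.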


Some useful properties of differential privacy
- Resilience to post-processing: any function of a private output remains private.
- Composition: privacy degrades gracefully over multiple private releases.
- Group privacy: the guarantee extends to small groups of individuals.

Is differential privacy too strong?
- No weaker notion has been proposed that satisfies all three useful properties.
- We can actually attain it for many useful statistics!

What graph statistics can be computed accurately with differential privacy?

Graph statistics
(Diagram: a histogram plotting, for each degree d, the fraction of nodes of degree d.) The degree of a node is the number of connections it has.
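A minimal edge-private sketch for this statistic (my own illustration using the Laplace mechanism, not code from the talk; the sensitivity bound of 4 is justified in the comments):

```python
import random

def laplace(scale):
    # Difference of two Exp(1) draws is Laplace(0, scale); stdlib only.
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))

def private_degree_hist(adj, eps):
    """Edge-differentially-private degree histogram.

    Adding or removing one edge changes the degrees of two nodes by 1;
    each node leaves one bucket and enters another, so at most 4 cells
    change by 1 each. Hence the L1 sensitivity is at most 4, and
    Laplace noise of scale 4/eps per cell suffices.
    """
    hist = [0] * len(adj)  # bucket d counts nodes of degree d
    for nbrs in adj.values():
        hist[len(nbrs)] += 1
    return [h + laplace(4.0 / eps) for h in hist]

# Triangle graph: all three nodes have degree 2.
tri = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
noisy = private_degree_hist(tri, eps=1.0)
```

The same recipe fails badly under node privacy: one node can shift its entire neighborhood between buckets, which is why the talk needs heavier machinery.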

Tools used in differentially private graph algorithms
- Smooth sensitivity: a more nuanced notion of sensitivity than the one mentioned in the previous talk
- Sample and aggregate
- Maximum flow
- Linear and convex programming
- Random projections
- Iterative updates
- Postprocessing

Differentially private graph analysis: a taste of techniques

Basic question: how to compute a statistic f
(Diagram: an algorithm takes the graph G as input and releases an approximation to f(G).)
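The standard recipe from the differential-privacy literature (not specific to this talk) is to perturb f with noise calibrated to its global sensitivity:

```latex
GS_f \;=\; \max_{\text{neighbors } G, G'} \bigl|f(G) - f(G')\bigr|,
\qquad
A(G) \;=\; f(G) + \mathrm{Lap}\!\left(\frac{GS_f}{\varepsilon}\right).
```

Under edge privacy, the edge count has $GS_f = 1$, so a little noise suffices; under node privacy, the same statistic has $GS_f = n-1$, which motivates the projection and extension techniques on the following slides.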

Challenge for node privacy: high sensitivity
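To make the sensitivity problem concrete, here is a small illustration of my own (not from the slides) for the triangle count: deleting a single node of the complete graph K_n destroys C(n-1, 2) triangles, so node sensitivity grows quadratically with the graph size.

```python
from itertools import combinations

def triangles(adj):
    # Brute-force triangle count over all node triples (toy graphs only).
    nodes = sorted(adj)
    return sum(1 for a, b, c in combinations(nodes, 3)
               if b in adj[a] and c in adj[a] and c in adj[b])

def complete(n):
    # Complete graph K_n as a dict of adjacency sets.
    return {i: set(range(n)) - {i} for i in range(n)}

n = 8
kn = complete(n)
kn_minus = {v: s - {n - 1} for v, s in kn.items() if v != n - 1}

# C(8, 3) = 56 triangles vs C(7, 3) = 35: one node accounts for 21.
print(triangles(kn), triangles(kn_minus))  # 56 35
```

Calibrating Laplace noise to a worst case of order n^2 would drown the signal, which is exactly why a naive node-private algorithm is useless here.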

Idea: project onto graphs with low sensitivity [Kasiviswanathan, Nissim, Raskhodnikova, Smith 13]. See also [Blocki, Blum, Datta, Sheffet 13; Chen, Zhou 13].

“Projections” on graphs of small degree
(Diagram: the space of all graphs is mapped onto the subset of graphs with degree at most D.)
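The crudest such projection, a baseline I am sketching for intuition (the talk's actual method is LP-based, not this truncation), simply discards high-degree nodes:

```python
def truncate(adj, D):
    """Keep only nodes of degree <= D, dropping edges to removed nodes.

    On graphs of degree at most D, deleting one node changes the edge
    count by at most D, so Laplace noise of scale D/eps suffices there.
    The hard part, addressed by Lipschitz extensions, is that the
    truncation map itself can be very sensitive: one node's presence
    can push other nodes over the threshold.
    """
    keep = {v for v, nbrs in adj.items() if len(nbrs) <= D}
    return {v: adj[v] & keep for v in keep}

# Projecting a star to degree <= 2 removes its degree-4 hub.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
low = truncate(star, 2)
```

After truncation the leaves survive but are left with no edges, illustrating how much a projection can distort a statistic; the smarter projections in the talk are designed to keep that distortion small.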

Lipschitz extensions
(Diagram: a function defined on the graphs of small degree is extended to the space of all graphs.)
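One standard formalization, paraphrased from the literature rather than taken from the slide: a function f' is a c-Lipschitz extension of f from the set of graphs of degree at most D to all graphs if

```latex
f'(G) = f(G) \;\text{ for every } G \text{ of degree} \le D,
\qquad\text{and}\qquad
|f'(G) - f'(G')| \le c \;\text{ for all neighboring } G, G'.
```

The extension agrees with the true statistic on well-behaved graphs but has low node sensitivity everywhere, so adding noise proportional to c/ε yields a node-private and accurate release.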

Summary
- Accurate subgraph counts for realistic graphs can be computed by node-private algorithms, using Lipschitz extensions and linear programming.
- Subgraph counts are one example of many graph statistics that node-private algorithms do well on.

What can't be computed differentially privately? Differential privacy explicitly excludes the possibility of computing anything that depends on one person's data:
- Is there a node in the graph that has atypical connections?
- Are there ``suspicious communication patterns''?

What we are working on
Node differentially private algorithms for releasing:
- a large number of graph statistics at once
- synthetic graphs

Exciting area of research:
- Edge-private algorithms [Nissim, Raskhodnikova, Smith 07; Hay, Rastogi, Miklau, Suciu 09; Hay, Li, Miklau, Jensen 09; Hardt, Rothblum 10; Karwa, Raskhodnikova, Smith, Yaroslavtsev 11; Karwa, Slavkovic 12; Blocki, Blum, Datta, Sheffet 12; Gupta, Roth, Ullman 12; Mir, Wright 12; Kifer, Lin 13; …]
- Node-private algorithms [Gehrke, Lui, Pass 12; Blocki, Blum, Datta, Sheffet 13; Kasiviswanathan, Nissim, Raskhodnikova, Smith 13; Chen, Zhou 13; Raskhodnikova, Smith; …]

Conclusions
- We are close to having edge-private and node-private algorithms that work well in practice for many basic graph statistics.
- Accurate node-private algorithms were thought to be impossible only a few years ago.
- Differential privacy is influencing other scientific disciplines (next talk: reducing the false discovery rate).

Experiments for the flow and LP method [Lu]

Graph               | # nodes   | # edges   | Max degree | Time, secs
--------------------|-----------|-----------|------------|-----------
CA-GrQc             | 5,242     | 28,…      | …          | …
CA-HepTh            | 9,877     | 51,…      | …          | …
CA-AstroPh          | 18,772    | 396,…     | …          | …,222
com-dblp-ungraph    | 317,080   | 2,099,…   | …          | …
com-youtube-ungraph | 1,134,890 | 5,975,248 | 28,754     | 994