Collaborative, Privacy-Preserving Data Aggregation at Scale Michael J. Freedman Princeton University Joint work with: Benny Applebaum, Haakon Ringberg,

Slides:



Advertisements
Similar presentations
1 Cryptography: on the Hope for Privacy in a Digital World Omer Reingold VVeizmann and Harvard CRCS.
Advertisements

Quid-Pro-Quo-tocols Strengthening Semi-Honest Protocols with Dual Execution Yan Huang 1, Jonathan Katz 2, David Evans 1 1. University of Virginia 2. University.
Gate Evaluation Secret Sharing and Secure Two-Party Computation Vladimir Kolesnikov University of Toronto
Secure Evaluation of Multivariate Polynomials
Oblivious Branching Program Evaluation
P2P data retrieval DHT (Distributed Hash Tables) Partially based on Hellerstein’s presentation at VLDB2004.
Efficient Information Retrieval for Ranked Queries in Cost-Effective Cloud Environments Presenter: Qin Liu a,b Joint work with Chiu C. Tan b, Jie Wu b,
Implementing Oblivious Transfer Using a Collection of Dense Trapdoor Permutations Iftach Haitner WEIZMANN INSTITUTE.
SPORC: Group Collaboration using Untrusted Cloud Resources Ariel J. Feldman, William P. Zeller, Michael J. Freedman, Edward W. Felten Published in OSDI’2010.
Spring 2000CS 4611 Security Outline Encryption Algorithms Authentication Protocols Message Integrity Protocols Key Distribution Firewalls.
ITIS 6200/ Secure multiparty computation – Alice has x, Bob has y, we want to calculate f(x, y) without disclosing the values – We can only do.
Rational Oblivious Transfer KARTIK NAYAK, XIONG FAN.
CS555Topic 241 Cryptography CS 555 Topic 24: Secure Function Evaluation.
1 Introduction CSE 5351: Introduction to cryptography Reading assignment: Chapter 1 of Katz & Lindell.
Introduction to Modern Cryptography, Lecture 12 Secure Multi-Party Computation.
Lect. 18: Cryptographic Protocols. 2 1.Cryptographic Protocols 2.Special Signatures 3.Secret Sharing and Threshold Cryptography 4.Zero-knowledge Proofs.
Efficient Private Techniques for Verifying Social Proximity Michael J. Freedman and Antonio Nicolosi Discussion by: A. Ziad Hatahet.
CMSC 414 Computer and Network Security Lecture 12 Jonathan Katz.
Reusable Anonymous Return Channels
LAAC: A Location-Aware Access Control Protocol YounSun Cho, Lichun Bao and Michael T. Goodrich IWUAC 2006.
CSCE 715 Ankur Jain 11/16/2010. Introduction Design Goals Framework SDT Protocol Achievements of Goals Overhead of SDT Conclusion.
Building a Peer-to-Peer Anonymizing Network Layer Michael J. Freedman NYU Dept of Computer Science Public Design Workshop September 13,
CMSC 414 Computer and Network Security Lecture 21 Jonathan Katz.
The Case for Network-Layer, Peer-to-Peer Anonymization Michael J. Freedman Emil Sit, Josh Cates, Robert Morris MIT Lab for Computer Science IPTPS’02March.
Privacy-Preserving Cross-Domain Network Reachability Quantification
CMSC 414 Computer and Network Security Lecture 19 Jonathan Katz.
1 Introduction to Secure Computation Benny Pinkas HP Labs, Princeton.
 Structured peer to peer overlay networks are resilient – but not secure.  Even a small fraction of malicious nodes may result in failure of correct.
Mitigating DoS Attacks against Broadcast Authentication in Wireless Sensor Networks Peng Ning, An Liu North Carolina State University and Wenliang Du Syracuse.
Overview of Privacy Preserving Techniques.  This is a high-level summary of the state-of-the-art privacy preserving techniques and research areas  Focus.
PRIVACY-PRESERVING COLLABORATIVE NETWORK ANOMALY DETECTION Haakon Ringberg.
A Privacy-Preserving Interdomain Audit Framework Adam J. Lee Parisa Tabriz Nikita Borisov University of Illinois, Urbana-Champaign WPES 2006.
Privacy-Aware Personalization for Mobile Advertising
m-Privacy for Collaborative Data Publishing
Threshold PKC Shafi Goldwasser and Ran Canetti. Public Key Encryption [DH] A PKC consists of 3 PPT algorithms (G,E,D) - G(1 k ) outputs public key e,
A Firewall for Routers: Protecting Against Routing Misbehavior1 June 26, A Firewall for Routers: Protecting Against Routing Misbehavior Jia Wang.
Rushing Attacks and Defense in Wireless Ad Hoc Network Routing Protocols ► Acts as denial of service by disrupting the flow of data between a source and.
Presented by: Sanketh Beerabbi University of Central Florida.
On the Communication Complexity of SFE with Long Output Daniel Wichs (Northeastern) joint work with Pavel Hubáček.
1 Secure Multi-party Computation Minimizing Online Rounds Seung Geol Choi Columbia University Joint work with Ariel Elbaz(Columbia University) Tal Malkin(Columbia.
May 20, 2013 Anon-Pass: Practical Anonymous Subscriptions Michael Z. Lee †, Alan M. Dunn †, Jonathan Katz *, Brent Waters †, Emmett Witchel † † University.
Merkle trees Introduced by Ralph Merkle, 1979 An authentication scheme
CMSC 414 Computer and Network Security Lecture 20 Jonathan Katz.
Secure Computation (Lecture 2) Arpita Patra. Vishwaroop of MPC.
Secure Computation Lecture Arpita Patra. Recap >> Improving the complexity of GMW > Step I: Offline: O(n 2 c AND ) OTs; Online: i.t., no crypto.
New Client Puzzle Outsourcing Techniques for DoS Resistance Brent Waters, Stanford University Ari Juels, RSA Laboratories Alex Halderman, Princeton University.
Privacy Preserving Payments in Credit Networks By: Moreno-Sanchez et al from Saarland University Presented By: Cody Watson Some Slides Borrowed From NDSS’15.
1 Protecting Network Quality of Service against Denial of Service Attacks Douglas S. Reeves S. Felix Wu Chandru Sargor N. C. State University / MCNC October.
m-Privacy for Collaborative Data Publishing
Selective Packet Inspection to Detect DoS Flooding Using Software Defined Networking Author : Tommy Chin Jr., Xenia Mountrouidou, Xiangyang Li and Kaiqi.
Presented By Amarjit Datta
Efficient Private Matching and Set Intersection Mike Freedman, NYU Kobbi Nissim, MSR Benny Pinkas, HP Labs EUROCRYPT 2004.
Keyword search on encrypted data. Keyword search problem  Linux utility: grep  Information retrieval Basic operation Advanced operations – relevance.
Verifiable Threshold Secret Sharing and Full Fair Secure Two-party Computation YE Jian-wei March 7, 2009.
Verifiable Distributed Oblivious Transfer and Mobile-agent Security Speaker: Sheng Zhong (joint work with Yang Richard Yang) Yale University.
Cryptographic methods. Outline  Preliminary Assumptions Public-key encryption  Oblivious Transfer (OT)  Random share based methods  Homomorphic Encryption.
Multi-Party Computation r n parties: P 1,…,P n  P i has input s i  Parties want to compute f(s 1,…,s n ) together  P i doesn’t want any information.
Problem: Internet diagnostics and forensics
When small data is better data
Collaborative, Privacy-Preserving Data Aggregation at Scale
Anonymous Communication
Privacy Preserving Record Linkage
Secure Computation of Constant-Depth Circuits with Applications to Database Search Problems Omer Barkol Yuval Ishai Technion.
Anonymous Communication
Malicious-Secure Private Set Intersection via Dual Execution
Helen: Maliciously Secure Coopetitive Learning for Linear Models
Anonymous Communication
SPINE: Surveillance protection in the network Elements
Presentation transcript:

Collaborative, Privacy-Preserving Data Aggregation at Scale Michael J. Freedman Princeton University Joint work with: Benny Applebaum, Haakon Ringberg, Matthew Caesar, and Jennifer Rexford

Problem: Network Anomaly Detection

Collaborative anomaly detection Some attacks look like normal traffic – e.g., SQL-injection, application-level DoS [Srivatsa TWEB ‘08] Is it a DDoS attack or a flash crowd? [Jung WWW ‘02] Yahoo! Google Bing I’m not sure about Beasty! I’m not sure about Beasty! I’m not sure about Beasty! I’m not sure about Beasty! I’m not sure about Beasty! I’m not sure about Beasty!

Collaborative anomaly detection Targets (victims) could correlate attacks/attackers [Katti IMC ’05], [Allman Hotnets ‘06], [Kannan SRUTI ‘06], [Moore INFOC ‘03] Yahoo! Google Bing “Fool us once, shame on you. Fool us N times, shame on us.”

Problem: Network Anomaly Detection Solution: Aggregate suspect IPs from many ISPs Flag those IPs that appear > threshold τ

Problem: Distributed Ranking Solution: Collect domain statistics from many users Aggregate data by domain

Problem: … Solution: Aggregate (id, data) from many sources Analyze data grouped by id

But what about privacy? What inputs are submitted? Who submitted what?

Data Aggregation Problem Many participants, each with (key, value) observation Goal: Aggregate observations by key KeyValues k1k1 ( v a, v b ) k2k2 ( v i, v j, v k ) … knkn ( v x ) A A A

Data Aggregation Problem Many participants, each with (key, value) observation Goal: Aggregate observations by key KeyValues k1k1 ( v a, v b ) k2k2 ( v i, v j, v k ) … knkn ( v x ) A A A F ( ) ) ) PDA: Only release the value column CR-PDA:Plus keys whose values satisfy some func

Data Aggregation Problem Many participants, each with (key, value) observation Goal: Aggregate observations by key KeyValues k1k1 ( 1, 1 ) k2k2 ( 1, 1, 1 ) … knkn ( 1 ) Σ Σ Σ PDA: Only release the value column CR-PDA:Plus keys whose values satisfy some func ≥ τ ? ? ?

Goals Keyword privacy: No party learns anything about keys Participant privacy: No party learns who submitted what Efficiency: Scale to many participants, each with many inputs Flexibility: Support variety of computations over values Lack of coordination: – No synchrony required, individuals cannot prevent progress – All participants need not be online at same time

Potential solutions Approach Keyword Privacy Participant PrivacyEfficiencyFlexibility Lack of Coord Garbled Circuit Evaluation Multiparty Set Intersection YesYesVery PoorYesNo YesYesPoorNoNo Decentralized

SecurityEfficiency Weaken security assumptions? – Assume honest but curious participants? – Assume no collusion among malicious participants? In large/open setting, easy to operate multiple nodes (so-called “Sybil attack”)

Towards Centralization? DB Participants

Potential solutions Approach Keyword Privacy Participant PrivacyEfficiencyFlexibility Lack of Coord Garbled Circuit Evaluation Multiparty Set Intersection Hashing Inputs Network Anonymization YesYesVery PoorYesNo YesYesPoorNoNo NoNoVery GoodYesYes NoYesVery GoodYesYes Decentralized Centralized

Towards semi-centralization Participants ProxyDB Assumption: Proxy and DB do not collude

Potential solutions Approach Keyword Privacy Participant PrivacyEfficiencyFlexibility Lack of Coord Garbled Circuit Evaluation Multiparty Set Intersection Hashing Inputs Network Anonymization This Work YesYesVery PoorYesNo YesYesPoorNoNo NoNoVery GoodYesYes NoYesVery GoodYesYes YesYesGoodYesYes Decentralized Centralized

Privacy Guarantees Privacy of PDA against malicious entities and participants – Malicious participant may collude with either malicious proxy or DB, but not both – May violate correctness in almost arbitrary ways Privacy of CR-PDA against honest-but-curious entities and malicious participants

PDA Strawman #0 ParticipantProxyDB 1. Client sends input k k

PDA Strawman #1 ParticipantProxyDB 1. Client sends encrypted input k 2. Proxy batches and retransmits 3. DB decrypts input ds k# Violates keyword privacy E DB (k)

ds PDA Strawman #2 ParticipantProxyDB 1. Client sends hashes of k 2. Proxy batches and retransmits 3. DB decrypts input H (k)# H( ) 1 H( ) 9 Still violates keyword privacy: IPs drawn from small domains Still violates keyword privacy: IPs drawn from small domains E DB ( H (k) )

PDA Strawman #3 ParticipantProxyDB 5. Proxy recovers k from E PRX (k) 1. Client sends keyed hashes of k – Keyed hash function (PRF) – Key s known only by proxy F s (k)# F s ( ) 1 F s ( ) 9 E DB ( F s (k) ) But how do clients learn F s (IP)) ? But how do clients learn F s (IP)) ? Secret s

Our Basic PDA Protocol ParticipantProxyDB 1. Client sends keyed hashes of k – F s (x) learned by client through Oblivious PRF protocol 2.Proxy batches and retransmits keyed hash 3.DB decrypts input F s (k)# F s ( ) 1 F s ( ) 9 E DB ( F s (k) ) OPRF E DB ( F s (k) ) F s (k) Secret s

F s (k)# F s ( ) 1 F s ( ) 9 retransmits Basic CR-PDA Protocol ParticipantProxyDB 1.Client sends keyed hashes of k, and encrypted k for recovery 2.Proxy retransmits keyed hash 3.DB decrypts input 4.Identify rows to release and transmit E PRX (k) to proxy 5.Proxy decrypts k and releases E DB ( F s (k) ) F s (k) E DB (E PRX (k)) E PRX (k) F s (k)#Enc’d k F s ( ) 1E PRX ( ) F s ( ) 9E PRX ( ) Secret s

retransmits Privacy Properties ParticipantProxyDB Any coalition of HBC participants HBC coalition of proxy and participants HBC database E DB ( F s (k) ) F s (k) E DB (E PRX (k)) E PRX (k) Keyword privacy: Nothing learned about unreleased keys Participant privacy: Key  Participant not learned Secret s

retransmits Privacy Properties ParticipantProxyDB Any coalition of HBC participants HBC coalition of proxy and participants HBC database E DB ( F s (k) ) F s (k) E DB (E PRX (k)) E PRX (k) Keyword privacy: Nothing learned about unreleased keys Participant privacy: Key  Participant not learned Secret s malicious participants HBC coalition of DB and participants

retransmits More Robust PDA Protocol ParticipantProxyDB Any coalition of HBC participants HBC coalition of proxy and participants HBC database E DB ( F s (k) ) F s (k) E DB (E PRX (k)) E PRX (k) Secret s malicious participants HBC coalition of DB and participants ORPF  Encrypted OPRF Protocol Ciphertext re-randomization by proxy Proof by participant that submitted k’s match

Encrypted-OPRF protocol Problem: in basic OPRF protocol, participant learns F s (k) Encrypted-OPRF protocol: – Client learns blinded F s (k) – Client encrypts to DB – Proxy can unblind F s (k) “under the encryption” ( ) r Enc ( )( ) r F s (k) (π s i ) k i =1 El Gamalg mod p

Encrypted-OPRF protocol Problem: in basic OPRF protocol, participant learns F s (k) Encrypted-OPRF protocol – Client learns blinded F s (k) – Client encrypts to DB – Proxy can unblind F s (k) “under the encryption” OPRF runs OT protocol for each bit of input k OT protocols expensive, so use batch OT protocol [Ishai et al] ( ) r Enc ( )( ) r F s (k)

Scalable Protocol Architecture Participants Client-Facing Proxies Share secret s Proxy Decryption Oracles Share PRX key Front-End DB Tier Share DB key Back-End DB Storage Partition F s keyspace

Evaluation Scalable architecture implemented – Basic CR-PDA / PDA protocol + and encrypted-OPRF protocol w/ Batch OT – ~5000 lines of threaded C++, GnuPG for crypto Testbed of 2 GHz Linux machines AlgorithmParameterValue RSA / ElGamalkey size1024 bits Oblivious Transferk80 AESkey size256 bits

Throughput vs. participant batch size Single CPU core for DB and proxy each

Maximum throughput per server Four CPU cores for DB and proxy (each)

Throughput scalability Number CPU cores per DB and proxy (each)

Summary Privacy-Preserving Data Aggregation protects: – Participants: Do not reveal who submitted what – Keywords: Only reveal values / released keys Novel composition of crypto primitives – Based on assumption that 2+ known parties don’t collude Efficient implementation of architecture – Scales linearly with computing resources – Ex: Millions of suspected IPs in hours Of independent interest… – Introduced encrypted OPRF protocol – First implementation/validation of Batch OT protocol