Privacy-Preserving Data Aggregation without Secure Channel: Multivariate Polynomial Evaluation
Taeho Jung (1), XuFei Mao (2), Xiang-Yang Li (1), Shao-Jie Tang (1), Wei Gong (2), Lan Zhang (2)
(1) Illinois Institute of Technology, Chicago; (2) Tsinghua University, Beijing
Motivation: Calculating the average salary of a company? Getting a global behavioral feature of a group? Analyzing statistics on sensitive individual data? – Personalized ads – Medical statistics. Privacy-preserving data mining is needed!
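Problem description: Participants holding private inputs x_1, x_2, ..., x_n jointly evaluate a function of those inputs (here, a multivariate polynomial) without disclosing x_i to each other.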
Adversaries: Semi-honest (or passive) adversaries correctly follow the protocol specification and do not collude with each other, yet attempt to learn additional information by eavesdropping on and analyzing the messages.
Approaches: Cryptographic approaches – secure multi-party computation (SMC); Changing the data precision – coarse-grained values; Changing the data accuracy by perturbation – value distortion; Data separation
Related Work: SMC. Goldreich, Micali, and Wigderson (GMW), 1987. High complexity and frequent interactions.
Garbled Circuit: Andrew C. Yao, 1986
Oblivious Transfer: Shimon Even, Oded Goldreich, and Abraham Lempel, 1985
Randomized Approach: add noise. The original values x_1, x_2, ..., x_n are drawn from an unknown probability distribution X. To hide these values, we use y_1, y_2, ..., y_n drawn from a known probability distribution Y. Given – the perturbed values x_1 + y_1, x_2 + y_2, ..., x_n + y_n – the probability distribution of Y, estimate the probability distribution of X.
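The noise-addition idea can be illustrated with a minimal Python sketch. It recovers only the mean and variance of X rather than its full distribution, and it assumes that the noise is independent of the data, so that E[X] = E[X+Y] - E[Y] and Var[X] = Var[X+Y] - Var[Y]. The function names, the Gaussian distributions, and all parameter values are hypothetical choices for illustration, not taken from the slides.

```python
import random
import statistics


def perturb(values, noise_sampler):
    """Release x_i + y_i instead of x_i, where y_i comes from the known distribution Y."""
    return [x + noise_sampler() for x in values]


def estimate_moments(perturbed, noise_mean, noise_variance):
    """Estimate the mean and variance of X from the perturbed values.

    Assumes X and Y are independent (an assumption of this sketch).
    """
    mean_x = statistics.fmean(perturbed) - noise_mean
    var_x = statistics.pvariance(perturbed) - noise_variance
    return mean_x, max(var_x, 0.0)  # clamp: sampling error can push the estimate below zero


# Hypothetical data: 20000 private values with mean 50 and standard deviation 5.
rng = random.Random(0)
originals = [rng.gauss(50.0, 5.0) for _ in range(20000)]

# Known noise distribution Y: zero-mean Gaussian with standard deviation 20.
noise_sd = 20.0
released = perturb(originals, lambda: rng.gauss(0.0, noise_sd))

est_mean, est_var = estimate_moments(released, 0.0, noise_sd ** 2)
print(f"estimated mean of X: {est_mean:.2f}, estimated variance of X: {est_var:.2f}")
```

The estimates are approximate: the larger the noise relative to the data, the more samples are needed for a stable result, which is the accuracy cost of this family of approaches.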
Efficient Alternatives: Data separation. Some existing works do not use SMC (Clifton et al., etc. …), but all of them are implemented over secure channels. [Figure: a 4×4 grid of values X_11 through X_44, with labels x1, x2, x3, x4 and y1]
Our Contributions: Unsecured channel: our communication channels are open to anyone, and we still achieve privacy and security. Low computation overhead: run time (computation only) is […] times less than SMC.
Our solution in a nutshell: Inspired by the observation: Polynomial = Multiplications (*) & Additions (+). Design two novel protocols: multi-party Product and Sum calculation protocols. Advantages: fast and light; secure even over an insecure channel; the aggregator can be untrusted.
Product Protocol: Integers, modulo P
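The slide states only that the product protocol works on integers modulo P, so the following Python sketch is one possible construction under stated assumptions, not necessarily the authors' protocol. It assumes the participants are arranged in a ring, each publishes g^{r_i} mod p for a private random exponent r_i, and each masks its input with (g^{r_{i+1}} / g^{r_{i-1}})^{r_i}. The masks multiply to 1, so the product of the published values equals the product of the inputs. The prime `P`, the base `G`, and the function name are hypothetical, and the single function simulates all parties in one place.

```python
import secrets

P = 2**127 - 1  # a Mersenne prime, chosen here only for illustration
G = 3           # hypothetical public base


def product_protocol(inputs, p=P, g=G):
    """Simulate a masked multi-party product modulo p (needs Python 3.8+)."""
    n = len(inputs)
    # Each participant i keeps r[i] private and publishes y[i] = g^r[i] mod p.
    r = [secrets.randbelow(p - 2) + 1 for _ in range(n)]
    y = [pow(g, ri, p) for ri in r]

    masked = []
    for i in range(n):
        nxt = y[(i + 1) % n]
        prv_inv = pow(y[(i - 1) % n], -1, p)   # modular inverse of the left neighbour's value
        mask = pow(nxt * prv_inv % p, r[i], p)  # g^((r[i+1] - r[i-1]) * r[i])
        masked.append(inputs[i] * mask % p)     # the only value participant i sends

    # Anyone who sees every masked value can multiply them: the masks cancel.
    result = 1
    for c in masked:
        result = result * c % p
    return result, masked


result, published = product_protocol([3, 5, 7, 11])
print(result)  # equals 3 * 5 * 7 * 11 mod P
```

The masks cancel because the exponents sum to zero around the ring: the sum over i of r_i·r_{i+1} equals the sum over i of r_{i-1}·r_i. With only two participants the masks degenerate to 1, so this sketch hides inputs only when at least three parties take part.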
Sum Protocol: Uses the product protocol
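One standard way to reduce a sum to a product is the identity (1+p)^x ≡ 1 + x·p (mod p²), which turns a product of encoded inputs into 1 + p·(x_1 + ... + x_n). Whether the authors use exactly this encoding is not stated on the slide, so the Python sketch below is an assumption-laden illustration. The helper `masked_product` repeats a hypothetical ring-masking product step so that the block is self-contained, and the prime `P`, the base 3, and all function names are illustrative choices.

```python
import secrets

P = 2**61 - 1  # a Mersenne prime, chosen here only for illustration


def masked_product(values, modulus, g=3):
    """Hypothetical product step: multiply values under ring-shaped masks that cancel."""
    n = len(values)
    r = [secrets.randbelow(modulus - 2) + 1 for _ in range(n)]  # private exponents
    y = [pow(g, ri, modulus) for ri in r]                       # published values
    result = 1
    for i in range(n):
        ratio = y[(i + 1) % n] * pow(y[(i - 1) % n], -1, modulus) % modulus
        masked_value = values[i] * pow(ratio, r[i], modulus) % modulus
        result = result * masked_value % modulus
    return result


def private_sum(inputs, p=P):
    """Sum via product: encode x as (1+p)^x mod p^2, multiply, then decode.

    Correct as long as the true sum is non-negative and smaller than p.
    """
    modulus = p * p
    encoded = [pow(1 + p, x, modulus) for x in inputs]
    product = masked_product(encoded, modulus)  # equals 1 + p * sum(inputs) mod p^2
    return (product - 1) // p


total = private_sum([10, 20, 30, 40])
print(total)  # 100
```

The decoding step works because every cross term in the expansion of (1+p)^x beyond the linear one is a multiple of p², so it vanishes modulo p².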
Putting It All Together: Combine the product and sum protocols to evaluate a general multivariate polynomial. Provably privacy-preserving – entropy, hardness
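The combination can be sketched in Python by writing a polynomial as a list of monomials, computing each monomial with a product step, and adding the monomials with a sum step. This sketch uses plain arithmetic as a stand-in for both steps, so it shows only the decomposition and provides no privacy. The `(coefficient, exponents)` representation and the function name are hypothetical.

```python
from math import prod


def evaluate_polynomial(terms, x):
    """Evaluate sum_k c_k * prod_i x_i^d_ik.

    terms: list of (coefficient, exponents) pairs, one exponent per input.
    x: list of inputs, one per participant.
    """
    monomials = []
    for coeff, exponents in terms:
        # Product step: in a private setting, a multi-party product protocol
        # would run here over the participants' powers x_i^d.
        factors = [xi ** d for xi, d in zip(x, exponents)]
        monomials.append(coeff * prod(factors))
    # Sum step: in a private setting, a multi-party sum protocol would run here.
    return sum(monomials)


# f(x1, x2, x3) = 2 * x1^2 * x2 + 3 * x3
terms = [(2, (2, 1, 0)), (3, (0, 0, 1))]
value = evaluate_polynomial(terms, [3, 4, 5])
print(value)  # 2*9*4 + 3*5 = 87
```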
Run time comparison: [Figure: run time (ms) versus number of gates for FairplayMP by Ben-David et al. (an SMC implementation)] Additions in our schemes are equivalent to a 1066-gate circuit. Our run time: 72.2 microseconds.
Conclusion & Future Work: Privacy-Preserving Data Aggregation – Product Protocol – Sum Protocol – Can be used for privacy-preserving computation & data mining – Efficient & non-approximate. Future Work – Minimizing information leakage – Defending against collusion attacks