I have a DREAM! (DiffeRentially privatE smArt Metering) Gergely Acs and Claude Castelluccia {gergely.acs, INRIA 2011
Smart Metering Electricity suppliers are deploying smart meters – that report energy consumption periodically (every minutes). Should improve energy management (for suppliers and customers) … Part of the Smart Grid (Critical Infrastructure)
Privacy?
Hoover Microwave Kettle Fridge Lighting
Motivation: Privacy/Security Potential threats – Profiling Increase in the granular collection, use and disclosure of personal energy information; Data linkage of personally identifiable information with energy use; Creation of an entirely new "library" of personal information – Security Is someone at home? We want to prevent – Suppliers from profiling customers – Attackers from getting private information
Contributions First provably private scheme for smart metering – No need for trusted aggregator – No assumptions about the adversary’s power (knowledge) – Remains useful for the supplier – Robust against node failures!! – Secure against colluding malicious users Validated by simulations – a new simulator to generate synthetic consumption data
Overview Model – Adversary model – Network model – Privacy model Our scheme: Distributed aggregation with encryption Performance and privacy analysis Conclusions
Model Dishonest-but-non-intrusive adversary – does not follow the protocol correctly – colludes with malicious users – BUT: cannot access the distribution network (e.g., to install wiretapping devices) Network model – No communication between meters! – Each meter has a public/private key pair Privacy model – Differential privacy model
Why Differential Privacy? There are different possible models (k-anonymity, l-diversity, …) We are using the Differential Privacy model – Only model that does not make any assumptions about the attacker model – Proposes a simple off-the-shelf sanitization technique – Strong (too strong?) and provides provable privacy!
The Differential Privacy Model Informally, a sanitization algorithm A is differentially private if its output is insensitive to changes in any individual value. Definition: A is ε-differentially private if, given two datasets (sets of traces) I and I’ differing in only one user, and any output x:
$$P(A(I) = x) \le e^{\varepsilon} \cdot P(A(I') = x)$$
First model that provides provable privacy, and it makes no assumptions about the adversary! Very strong (too strong?)
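Sanitization It was shown that a simple solution is to add noise to each sample in each slot such that:
$$\hat{X}_t = X_t + L(\lambda)$$
where $X_t$ is the sample in slot t and $\hat{X}_t$ is its noised version. It can be shown that if: 1. the noise $L(\lambda)$ follows a Laplacian distribution, and 2. $\lambda = \Delta / \varepsilon$, where λ is the scale parameter of the Laplace distribution and Δ is the sensitivity (i.e. the maximum value a sample can take), then $\hat{X}_t$ is ε-differentially private in each slot.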
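The sanitization step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the 300 Wh reading and the fixed seed are assumptions, and the Laplace draw is built as the difference of two exponential draws (a standard identity).

```python
import random

def laplace_noise(scale, rng):
    """Draw from Lap(scale): difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def sanitize(sample_wh, sensitivity_wh, epsilon, rng):
    """Add Lap(sensitivity / epsilon) noise to one sample (all values in Wh)."""
    return sample_wh + laplace_noise(sensitivity_wh / epsilon, rng)

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
# Hypothetical 300 Wh reading; sensitivity 1350 Wh and epsilon 0.1 give Lap(1350/0.1).
noised = [sanitize(300.0, 1350.0, 0.1, rng) for _ in range(4)]
print(noised)
```

With a scale of 13500 Wh the noise dwarfs a single reading, which is why per-sample noising alone leaves little utility.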
Sanitization: Example Table columns: Time slot, Original (Wh), Max user (Wh), Noised data (Wh). Noise added in each of the 4 slots: Lap(1350/0.1). Final row: sum over 4 slots.
Aggregating Data (diagram: Electricity Supplier, Aggregator) The supplier gets the (noisy) aggregated value but can’t recover individual samples!
Error/utility The larger the cluster, the better the utility…but the smaller the granularity
Noised Aggregated Data: sum of N samples + Laplacian noise (panels: N=200, N=600)
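A rough way to see why larger clusters help: the noise scale is fixed by Δ/ε, while the aggregate grows with N. The sketch below uses the expected absolute Laplace noise divided by the expected aggregate as a proxy for the error. This proxy, the function name and the 300 Wh mean sample are assumptions for illustration, not the error metric used in the slides.

```python
def expected_relative_error(n_users, mean_sample_wh, sensitivity_wh, epsilon):
    """E|noise| / expected aggregate, for one Lap(sensitivity/epsilon) draw on the sum."""
    scale = sensitivity_wh / epsilon          # E|Lap(scale)| equals scale
    return scale / (n_users * mean_sample_wh)

# Cluster sizes from the plot; mean sample and sensitivity are hypothetical.
for n in (200, 600):
    print(n, expected_relative_error(n, 300.0, 1350.0, 1.0))
```

Under this proxy, tripling the cluster size cuts the relative error to a third, at the cost of coarser spatial granularity.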
Aggregating Data Pros/Cons Pros: – Great solution to reduce noise/error – … and still generate useful (aggregated) data to the supplier – …with strict privacy guarantees. Cons: – Aggregators have to be trusted! – Who can be the aggregator? Supplier? Network? Can we get rid of the aggregator and still perform aggregation??
Distributed Aggregation Electricity Supplier
Our Approach: Distributed Aggregation Step 1: Distributed noise generation – We use the fact that a Laplacian noise can be generated as a sum of gamma noises – Each node adds gamma noise to its sample and sends the result to the supplier – When the noised samples are aggregated by the supplier, the gamma noises add up to a Laplacian noise… – No more aggregator needed!
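The distributed noise generation can be checked numerically. The sketch assumes each of the N nodes adds G1 − G2, where G1 and G2 are independent Gamma draws with shape 1/N and scale λ; this is the standard decomposition of a Laplace variable, and the function names and parameters here are illustrative.

```python
import random

def node_noise(n_nodes, scale, rng):
    """One node's share: G1 - G2, each Gamma(shape=1/n_nodes, scale=scale)."""
    shape = 1.0 / n_nodes
    return rng.gammavariate(shape, scale) - rng.gammavariate(shape, scale)

def aggregate_noise(n_nodes, scale, rng):
    """Total noise the supplier sees after summing all nodes' shares."""
    return sum(node_noise(n_nodes, scale, rng) for _ in range(n_nodes))

rng = random.Random(0)
totals = [aggregate_noise(50, 1.0, rng) for _ in range(5000)]
mean_abs = sum(abs(t) for t in totals) / len(totals)
variance = sum(t * t for t in totals) / len(totals)
print(mean_abs, variance)  # Lap(1) has E|X| = 1 and Var = 2
```

The empirical moments should land close to those of Lap(1), even though each individual share is tiny.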
Problem (plot: original data vs. gamma-noised data): The gamma noise added by a single node is too small to guarantee the privacy of individual measurements! The supplier could retrieve the sample values from the noised samples!
Step 2: Encrypting noised samples Electricity Supplier
Performance and privacy analysis A new trace generator Error depending on the number of users Privacy over multiple slots – Privacy of appliance usages and different activities (cooking, watching TV, …) – Privacy of being home
Trace generation
Error and the number of users ε over a single slot!
Privacy of appliances Noise is added to guarantee ε=1 per slot; the resulting error is 0.17 with 100 users
Privacy of the simultaneous usage of active appliances (Are you at home?) ε 0.17 error for 100 users (ε=1 per slot)
Privacy of the simultaneous usage of all appliances ε 0.17 error for 100 users (ε=1 per slot)
Conclusion First practical scheme that provides formal privacy and utility guarantees… Our scheme uses aggregation + noise Validation based on realistic datasets (generated by simulator) We can guarantee meaningful privacy for some activities (or appliances) but cannot hide everything! Privacy can be increased by adding more noise but we have to add more users to ensure low error!
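Encryption Modulo-addition based: each node i adds a key k_i to its noised sample modulo a public modulus, where k_i is not known to the supplier and where the keys cancel out when the supplier adds all the ciphertexts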
Key generation Each node pair shares a symmetric key. Each node randomly picks x other nodes such that if v selects w then w also selects v. Example for two nodes:
1. v selects w (and w selects v) if:
2. v and w generate the encryption key:
3. v → supplier:
   w → supplier:
4. Supplier decrypts by adding the ciphertexts:
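The cancellation idea behind modulo-addition encryption can be sketched as follows. Several details are assumptions made for illustration only: the modulus M, the rule that the lower-numbered node of a pair adds the shared key while the other subtracts it, and the simplification that every pair of nodes shares a key (the slides have each node pick only x peers). Key agreement itself is not modeled.

```python
import random

M = 2**32  # hypothetical public modulus, larger than any possible aggregate

def pairwise_keys(n_nodes, rng):
    """Give every node pair (v, w), v < w, a shared random key (assumed pre-agreed)."""
    return {(v, w): rng.randrange(M)
            for v in range(n_nodes) for w in range(v + 1, n_nodes)}

def node_key(i, keys):
    """k_i: add the shared key for higher-numbered peers, subtract it for lower ones."""
    total = 0
    for (v, w), k in keys.items():
        if v == i:
            total += k
        elif w == i:
            total -= k
    return total % M

def encrypt(x, k):
    """Ciphertext sent to the supplier: sample plus key, modulo M."""
    return (x + k) % M

def aggregate(ciphertexts):
    """Supplier side: adding all ciphertexts cancels the keys."""
    return sum(ciphertexts) % M

samples = [120, 340, 560, 80]  # hypothetical noised samples, as integers
rng = random.Random(0)
keys = pairwise_keys(len(samples), rng)
ciphertexts = [encrypt(x, node_key(i, keys)) for i, x in enumerate(samples)]
print(aggregate(ciphertexts))  # 1100, the plain sum of the samples
```

Each ciphertext on its own looks random to the supplier, yet the sum is exact because every shared key is added once and subtracted once.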
Security analysis Misbehaving users: – the supplier can deploy fake meters (an α fraction of the N nodes), or some users collude with the supplier and omit adding noise – each user adds extra noise to tolerate this attack… Supplier lies about the cluster size … see report for proofs/details
Error and the number of misbehaving users (ε=1 per slot)
Why is aggregation not enough? Why does noise have to be added? Because we don’t make any assumption about the adversary model… – E.g., if the adversary knows (N-1) values, it can compute the Nth value… even with aggregation and encryption – But it can’t get any info about the Nth value if noise is added ;-) – Very strong guarantee!
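The (N-1)-values argument is simple arithmetic, illustrated below with made-up numbers. The sample values, the Lap(1350/1) scale and the helper name are assumptions for the sketch.

```python
import random

def recover_last_value(aggregate, known_values):
    """What an adversary who knows N-1 samples infers about the N-th one."""
    return aggregate - sum(known_values)

values = [100.0, 250.0, 400.0]  # hypothetical samples (Wh); the last is the target
exact = recover_last_value(sum(values), values[:-1])

rng = random.Random(0)
scale = 1350.0 / 1.0            # Lap(sensitivity / epsilon), values assumed
noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
blurred = recover_last_value(sum(values) + noise, values[:-1])
print(exact, blurred)
```

Without noise the subtraction returns the target exactly; with noise the adversary's estimate is off by the full Laplace draw, whose scale is as large as the biggest value a sample can take.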
Laplace Distribution
Privacy over multiple slots Composition property of differential privacy: if we have ε_1 and ε_2 privacy in two different slots, then we have (ε_1 + ε_2) privacy over the two slots. Note: ε = 1 is an upper bound (for all users) in each slot! The exact bound when adding Lap(λ) noise, if we have consumption c(t), is $\varepsilon(t) = c(t)/\lambda$. Over multiple slots: $\sum_t \varepsilon(t)$
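Composition is just a sum of the per-slot bounds. The sketch below also assumes that a user consuming c(t) in a slot gets a per-slot bound of c(t)/λ, which follows from the Laplace density ratio under a shift of c(t). The trace values and function names are hypothetical.

```python
def slot_epsilon(consumption_wh, scale):
    """Per-slot bound for a user consuming `consumption_wh` under Lap(scale) noise (assumed)."""
    return consumption_wh / scale

def total_epsilon(per_slot_epsilons):
    """Composition: the bounds of individual slots add up."""
    return sum(per_slot_epsilons)

scale = 1350.0 / 1.0                   # noise calibrated for epsilon = 1 at the maximum sample
trace = [1350.0, 675.0, 0.0, 270.0]    # hypothetical consumption (Wh) over 4 slots
per_slot = [slot_epsilon(c, scale) for c in trace]
print(per_slot, total_epsilon(per_slot))
```

A user who stays well below the maximum consumption ends up with a total bound noticeably smaller than the worst case of ε = 1 per slot.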
Example
Differential Privacy Model: interpretation Given an output, was the input I or I’? Similar idea to indistinguishability in crypto… If ε = 1: If ε = 0.5: If ε = 0.1:
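One way to read these values of ε is to compute what the definition allows. The figures below are derived from the definition, not taken from the slide: the output probabilities under I and I’ differ by at most a factor e^ε, so an adversary who starts with a 50/50 prior ends up with a posterior of at most e^ε / (1 + e^ε). The function names and the choice of prior are assumptions.

```python
import math

def ratio_bound(epsilon):
    """Maximum ratio between output probabilities under I and I'."""
    return math.exp(epsilon)

def posterior_bound(epsilon, prior=0.5):
    """Upper bound on P(input was I | output) for an adversary with the given prior."""
    odds = math.exp(epsilon) * prior / (1.0 - prior)
    return odds / (1.0 + odds)

for eps in (1.0, 0.5, 0.1):
    print(eps, round(ratio_bound(eps), 3), round(posterior_bound(eps), 3))
```

Smaller ε pushes the posterior bound toward the prior, meaning the output tells the adversary almost nothing about which input was used.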