Foundations of Privacy, Lecture 7. Lecturer: Moni Naor.


1 Foundations of Privacy Lecture 7 Lecturer: Moni Naor

2 Recap of last week's lecture
Counting queries:
– Hardness results
– Tracing traitors, and hardness results for general (non-synthetic) databases

3 Can we output a synthetic DB efficiently?
[Table over |C| and |U| (each subpolynomial vs. polynomial). Legible entries: "?" in several cells; "Signatures: hard on avg."; "Using PRFs"; "Not in general"; "General algorithm".]

4 General output sanitizers
Theorem: traitor-tracing schemes exist if and only if sanitizing is hard.
– Tight connection between the sizes |U|, |C| that are hard to sanitize and the key and ciphertext sizes in traitor tracing.
– The separation between efficient and non-efficient sanitizers uses the [BoSaWa] scheme.

5 Traitor Tracing: The Problem
– A center transmits a message to a large group.
– Some users leak their keys to pirates.
– The pirates construct a clone: an unauthorized decryption device.
– Given a pirate box, we want to find who leaked the keys.
[Figure: the center broadcasts E(Content); a pirate box built from leaked keys k_1, k_3, k_8 recovers Content.] The traitors' "privacy" is violated!

6 Traitor Tracing → Hard Sanitizing
A (private-key) traitor-tracing scheme consists of algorithms Setup, Encrypt, Decrypt and Trace:
– Setup: generates a key bk for the broadcaster and N subscriber keys k_1, …, k_N.
– Encrypt: given a bit b, generates a ciphertext using the broadcaster's key bk. (Need semantic security!)
– Decrypt: takes a ciphertext and, using any one of the subscriber keys, retrieves the original bit.
– Trace: gets bk and oracle access to a pirate decryption box; outputs an index i ∈ {1, …, N} of a key k_i used to create the pirate box.

7 Simple Example of Tracing Traitors
Let E_K(m) be a good shared-key encryption scheme.
– Key generation: generate N independent keys for E: bk = k_1, …, k_N.
– Encrypt: for bit b, generate independent ciphertexts E_{k_1}(b), E_{k_2}(b), …, E_{k_N}(b).
– Decrypt using k_i: decrypt the i-th ciphertext.
– Tracing algorithm: a hybrid argument (see the sketch below).
Properties: ciphertext length N, key length 1.
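
A toy rendition of this scheme and its hybrid tracer. The XOR-based enc/dec is a stand-in for a real semantically secure E_K, and all names (setup, enc, hybrid, trace) are illustrative; the point is the hybrid argument: p_i = Pr[box outputs 1 on H_i] must move from ~0 at H_0 to ~1 at H_N, so some index has a jump of at least 1/N.

```python
import random

def setup(N):
    """Slide-7 scheme: N independent keys; the broadcaster key bk is all of them."""
    return [random.getrandbits(128) for _ in range(N)]

def enc(k, b):
    # Toy stand-in cipher: fresh randomness r masks the key. NOT semantically
    # secure; just enough structure to demonstrate the tracing logic.
    r = random.getrandbits(128)
    return (r, r ^ k ^ b)

def dec(k, c):
    r, x = c
    return x ^ r ^ k                 # recovers the bit b

def hybrid(keys, i):
    """Hybrid ciphertext H_i: the first i slots encrypt 1, the rest encrypt 0."""
    return [enc(k, 1 if j < i else 0) for j, k in enumerate(keys)]

def trace(keys, pirate, trials=200):
    """Estimate p_i = Pr[pirate outputs 1 on H_i] for i = 0..N and accuse the
    index where p_i jumps the most (some jump must be >= 1/N)."""
    p = [sum(pirate(hybrid(keys, i)) for _ in range(trials)) / trials
         for i in range(len(keys) + 1)]
    jumps = [p[i] - p[i - 1] for i in range(1, len(keys) + 1)]
    return jumps.index(max(jumps))   # 0-based index of the accused key

keys = setup(8)
pirate = lambda cts: dec(keys[3], cts[3])   # a pirate box built from leaked key 3
print(trace(keys, pirate))                  # expect 3
```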

8 Equivalence of TT and Hardness of Sanitizing
The correspondence between the two worlds:
– Ciphertext ↔ Query
– Key ↔ Database entry
– TT pirate ↔ Sanitizer for a (collection of) distribution(s) of DBs

9 Traitor Tracing → Hard Sanitizing
Theorem: if there exists a TT scheme with
– ciphertext length c(n),
– key length k(n),
then one can construct:
1. a query set C of size ≈ 2^{c(n)}
2. a data universe U of size ≈ 2^{k(n)}
3. a distribution D on n-user databases with entries from U
D is "hard to sanitize": there exists a tracer that can extract an entry in D from any sanitizer's output, violating its privacy!
The separation between efficient and non-efficient sanitizers uses the [BoSaWa06] scheme.

10 Interactive Model
Multiple queries, chosen adaptively. [Figure: an analyst issues query 1, query 2, … against the data; the sanitizer mediates each answer.]

11 Counting Queries: answering queries interactively
– Counting queries: C is a set of predicates c: U → {0,1}. Query: how many participants of the database D (of size n) satisfy c?
– Relaxed accuracy: answer each query within α additive error w.h.p. Not so bad: such error is anyway inherent in statistical analysis.
– Interactive setting: queries are given one by one and should be answered on the fly.

12 Can we answer queries when they are not given in advance?
– Can always answer with independent noise, but then we are limited to a number of queries smaller than the database size (see the sketch below).
– We do not know the future, but we do know the past! Can answer based on past answers.
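
A minimal sketch of the independent-noise baseline, assuming Laplace noise and basic composition (the function name and toy data are illustrative). With k queries sharing a total budget ε, each answer needs noise of scale k/ε, so the error grows linearly in k: usable only while k is well below n.

```python
import numpy as np

rng = np.random.default_rng(0)

def independent_noise_answers(db, queries, eps_total):
    """Answer each counting query with fresh Laplace noise; under basic
    composition each of the k queries gets budget eps_total/k."""
    scale = len(queries) / eps_total          # each counting query has sensitivity 1
    return [sum(q(x) for x in db) + rng.laplace(0, scale) for q in queries]

db = [23, 35, 62, 41, 58, 29]                 # toy database of ages
queries = [lambda x: x >= 40, lambda x: x < 30]
print(independent_noise_answers(db, queries, eps_total=1.0))
```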

13 Idea: Maintain a List of Possible Databases
– Start with D_0 = the list of all databases of size m; it gradually shrinks.
– Each round j: if the list D_{j-1} is representative, answer according to the average database in the list.
– Otherwise: prune the list to maintain consistency, obtaining D_j ⊆ D_{j-1}.

14 The algorithm (low sensitivity!)
Initialize D_0 = {all databases of size m over U}. Input: x*.
Each round, D_{j-1} = {x_1, x_2, …}, where each x_i is of size m. For each query c_1, c_2, …, c_k in turn:
– Let A_j ← Average_{x_i ∈ D_{j-1}} min{d_{c_j}(x*, x_i), T}, where T ≈ n^{1−δ} and the threshold is noisy (we need two threshold values, around T/2).
– If A_j is small: answer according to the median db in D_{j-1}, and set D_j ← D_{j-1}.
– If A_j is large: give the true answer according to x*, plus noise, and remove all dbs that are far away to get D_j.
(A sketch follows.)
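
A brute-force sketch of this round structure, assuming a universe tiny enough that D_0 can be enumerated explicitly; all names and parameter choices are illustrative, a single noisy threshold stands in for the slide's two, and the real algorithm never materializes D_0.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

def dist(c, x, xstar):
    """Difference, in counts of x*, between the query answer on x and on x*."""
    n, m = len(xstar), len(x)
    return abs((n / m) * sum(c(u) for u in x) - sum(c(u) for u in xstar))

def interactive_mechanism(xstar, U, m, queries, T, eps):
    """Small rounds answer from the candidate list; large rounds answer from
    x* (plus noise) and prune the candidates that are far away."""
    n = len(xstar)
    D = [list(s) for s in itertools.combinations_with_replacement(U, m)]  # D_0
    answers = []
    for c in queries:
        noisy_half_T = T / 2 + rng.laplace(0, 1 / eps)      # noisy threshold
        A = np.mean([min(dist(c, x, xstar), T) for x in D])
        if A < noisy_half_T:                                # small round
            answers.append(np.median([(n / m) * sum(c(u) for u in x) for x in D]))
        else:                                               # large round
            answers.append(sum(c(u) for u in xstar) + rng.laplace(0, 1 / eps))
            D = [x for x in D if dist(c, x, xstar) < T]     # prune far databases
    return answers

xstar = [0, 0, 1, 2, 2, 3, 3, 3]
print(interactive_mechanism(xstar, U=[0, 1, 2, 3], m=4,
                            queries=[lambda u: u >= 2, lambda u: u == 0],
                            T=4, eps=1.0))
```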

15 Need to show
Accuracy and functionality:
– The result is accurate.
– If A_j is large, many of the x_i ∈ D_{j-1} are removed.
– D_j is never empty.
Privacy:
– There are not many large A_j.
– Can release the identity of the large rounds.
– Can release noisy answers in the large rounds. How much noise should we add? Magnitude ≈ (# of large rounds)/ε.

16 The number of large rounds is bounded
– If A_j is large, then there must be many sets whose difference from the real value is close to T.
– Assuming the actual answer (with added noise) is close to the true answer: many sets are far from the actual answer.
– Therefore a constant fraction of the sets is pruned.
[Figure: number line from 0 to T marking the actual answer, the threshold for small vs. large, and the threshold for "far".]
Size of D_0 = C(|U|, m), so the total number of large rounds is at most m log|U| (see the bound below).
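
The counting argument, assuming each large round prunes at least half of the surviving candidates (the constant fraction on the slide):

```latex
% A large round prunes a constant fraction (say at least half, by the choice
% of the two thresholds) of the surviving candidate databases, so:
\[
|D_0| = \binom{|U|}{m} \le |U|^m
\qquad\Longrightarrow\qquad
\#\{\text{large rounds}\} \;\le\; \log_2 |D_0| \;\le\; m \log_2 |U| .
\]
```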

17 Why is there a good x_i?
– Counting queries: C is a set of predicates c: U → {0,1}; a query asks how many participants of x* (of size n) satisfy c.
– Claim: a sample x of size m approximates x* on all the given queries c_1, c_2, …, c_k.
– The sample can be assumed to come from x*, since we start with all possible sets.
– The proof is existential: we need not know c_1, c_2, …, c_k in advance.

18 Size m is Õ(n^{2/3} log k)
For any c_1, c_2, …, c_k: there exists a set x of size m = Õ((n/α)² · log k) such that max_j dist_{c_j}(x, x*) ≤ α, where
dist_{c_j}(x, x*) = |(1/m)⟨c_j, x⟩ − (1/n)⟨c_j, x*⟩|.
Taking α ≈ n^{2/3} gives m = Õ(n^{2/3} log k). (A Chernoff-bound sketch follows.)
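
A standard sampling sketch of the claim, assuming x is drawn uniformly with replacement from the rows of x* and measuring the deviation as a fraction α/n:

```latex
% Draw the m rows of x independently and uniformly from x*. For a fixed c_j,
% each draw is a Bernoulli with mean (1/n)<c_j, x*>, so Hoeffding gives:
\[
\Pr\!\left[\left|\tfrac{1}{m}\langle c_j, x\rangle
  - \tfrac{1}{n}\langle c_j, x^*\rangle\right| > \tfrac{\alpha}{n}\right]
\;\le\; 2\,e^{-2m(\alpha/n)^2}.
\]
% A union bound over the k queries keeps the total failure probability below 1
% once m = O((n/\alpha)^2 \log k); some sample x therefore approximates x* on
% all of c_1, ..., c_k simultaneously, and \alpha \approx n^{2/3} yields
% m = \tilde{O}(n^{2/3} \log k).
```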

19 Why can we release when large rounds occur?
– We do not expect more than O(m log|U|) large rounds.
– Make the threshold noisy: for every pair of neighboring databases x* and x'*, consider the vector of threshold noises (of length k).
– If a point is far away from the threshold: it behaves the same in both.
– If it is close to the threshold: we can correct it at a small cost, and this cannot occur too frequently.

20 Privacy of the Identity of Large Rounds
Protect: the times of the large rounds.
For every pair of neighboring databases x* and x'*, we can pair up noise-threshold vectors
η_1 η_2 … η_{k-1} η_k η_{k+1} and η_1 η_2 … η_{k-1} η'_k η_{k+1},
identical except that η'_k = η_k + 1.
For only a few, O(m log|U|), points is x* above the threshold while x'* is below; there we correct the threshold value, at probability ratio ≈ e^ε (see below).
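
The cost of one such correction, assuming the threshold noises are Laplace with scale 1/ε:

```latex
% Shifting one Laplace-distributed threshold-noise coordinate by 1 changes its
% density by at most a factor e^{eps}:
\[
f(z) = \tfrac{\varepsilon}{2}\, e^{-\varepsilon |z|}
\qquad\Longrightarrow\qquad
\frac{f(\eta_k + 1)}{f(\eta_k)} \;\le\; e^{\varepsilon},
\]
% and only O(m \log |U|) coordinates ever need correcting, so releasing the
% identities of the large rounds costs O(\varepsilon\, m \log |U|) in total.
```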

21 Summary of Algorithms
Three algorithms: BLR, DNRRV, and RR (with help from HR).

22 What if the data is dynamic?
We want to handle situations where the data keeps changing: not all data is available at the time of sanitization. [Figure: a curator/sanitizer sitting between the evolving data and the analyst.]

23 Google Flu Trends
"We've found that certain search terms are good indicators of flu activity. Google Flu Trends uses aggregated Google search data to estimate current flu activity around the world in near real-time."

24 Example of Utility: Google Flu Trends

25 What if the data is dynamic?
We want to handle situations where the data keeps changing: not all data is available at the time of sanitization. Issues:
– When does the algorithm produce an output?
– What does the adversary get to examine?
– How do we define an individual to protect? (D + Me)
– What are the efficiency measures of the sanitizer?

26 Data Streams
– The data is a stream of items.
– The sanitizer sees each item and updates its internal state.
– It produces output either on-the-fly or at the end. [Figure: data stream → sanitizer (state) → output.]

27 Three new issues/concepts
– Continual observation: the adversary gets to examine the output of the sanitizer all the time.
– Pan privacy: the adversary gets to examine the internal state of the sanitizer. Once? Several times? All the time?
– "User"- vs. "event"-level protection: are the items "singletons", or are they related?

28 Randomized Response
Randomized Response technique [Warner 1965]:
– A method for polling on stigmatizing questions.
– Idea: lie with a known probability.
– Specific answers are deniable, yet aggregate results are still valid.
– The data is never stored "in the plain" ("trust no one"); popular in the DB literature [Mishra and Sandler]. A small sketch follows.
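
A minimal simulation of the technique, assuming the truth is told with probability p = 3/4; function names and the 30% population fraction are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def randomized_response(bit, p=0.75):
    """Warner's technique: report the truth w.p. p, the opposite w.p. 1-p.
    Each individual answer is deniable; p sets the privacy/utility trade-off."""
    return bit if rng.random() < p else 1 - bit

def estimate_fraction(reports, p=0.75):
    """Debias the aggregate: E[report] = p*f + (1-p)*(1-f), solved for f."""
    y = np.mean(reports)
    return (y - (1 - p)) / (2 * p - 1)

true_bits = rng.random(10_000) < 0.3   # 30% hold the stigmatizing attribute
reports = [randomized_response(int(b)) for b in true_bits]
print(estimate_fraction(reports))      # close to 0.3
```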

29 The Dynamic Privacy Zoo
[Diagram relating the notions: differentially private continual observation, pan privacy, user-level privacy, user-level continual-observation pan privacy, and randomized response (the petting zoo).]

30 Continual Output Observation
The data is a stream of items. The sanitizer sees each item and updates its internal state, producing an output observable to the adversary. [Figure: stream → sanitizer (state) → observable output.]

31 Continual Observation
Alg: an algorithm working on a stream of data, mapping prefixes of data streams to outputs; at step i it outputs σ_i.
Alg is ε-differentially private against continual observation if, for all adjacent data streams S and S' and for every prefix of outputs σ_1 σ_2 … σ_t, the probability of that output prefix changes by at most a factor of e^ε ≈ 1 + ε (displayed below).
Adjacent data streams: one can be obtained from the other by changing one element, e.g. S = acgtbxcde and S' = acgtbycde.
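
The slide's condition, displayed as a formula:

```latex
% eps-DP under continual observation: for all adjacent streams S, S' and all
% output prefixes sigma_1 ... sigma_t,
\[
e^{-\varepsilon}
\;\le\;
\frac{\Pr[\mathrm{Alg}(S) = \sigma_1 \sigma_2 \cdots \sigma_t]}
     {\Pr[\mathrm{Alg}(S') = \sigma_1 \sigma_2 \cdots \sigma_t]}
\;\le\;
e^{\varepsilon} \;\approx\; 1 + \varepsilon .
\]
```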

32 The Counter Problem
– 0/1 input stream: 011001000100000011000000100101 …
– Goal: a publicly observable counter approximating the total number of 1's so far.
– Continual output: at each time period, output the total number of 1's.
– Want to hide individual increments while providing reasonable accuracy.

33 Counters with Continual Output Observation
The data is a stream of 0/1 values. The sanitizer sees each x_i and updates its internal state, producing a value observable to the adversary. [Figure: input 1 0 0 1 0 0 1 1 0 0 0 1; running outputs 1 1 1 2 ….]

34 Counters with Continual Output Observation
Continual output: at each time period, output the total number of 1's. Initial idea: at each time period, on input x_i ∈ {0, 1}:
– Update the counter by the input x_i.
– Add independent Laplace noise of magnitude 1/ε.
Privacy: each increment is protected by Laplace noise, so the output is differentially private whether x_i is 0 or 1.
Accuracy: the noise partially cancels; the error is Õ(√T), where T is the total number of time periods. For sparse streams this error is too high. (A sketch follows.)
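
A sketch of this baseline counter, releasing each increment with fresh Laplace noise (names illustrative); the observer sees T accumulated independent noises, hence the √T error.

```python
import numpy as np

rng = np.random.default_rng(3)

def naive_counter(stream, eps):
    """Add fresh Laplace(1/eps) noise to every increment. Each x_i is
    protected, but the released running sums accumulate T independent
    noises, so the error grows like sqrt(T)/eps."""
    total, outputs = 0.0, []
    for x in stream:
        total += x + rng.laplace(0, 1 / eps)
        outputs.append(total)
    return outputs

stream = [0, 1, 1, 0, 0, 1, 0, 0, 0, 1]
print(naive_counter(stream, eps=0.5))
```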

35 Why So Inaccurate?
– We operate essentially as in randomized response, with no utilization of the state.
– Problem: we do the same operations when the stream is sparse as when it is dense; we want to act differently when the stream is dense.
– The times at which the counter is updated are a potential leakage.

36 Delayed Updates
Main idea: update the output value only when there is a large gap between the actual count and the output. We have a good way of outputting the value of the counter once: the actual counter plus noise. Maintain:
– the actual count A_t (+ noise),
– the current output out_t (+ noise),
– an update threshold D.

37 Delayed Output Counter
– out_t: the current output. A_t: the count since the last update. D_t: a noisy threshold.
– If A_t − D_t > fresh noise, then: out_{t+1} ← out_t + A_t + fresh noise; A_{t+1} ← 0; D_{t+1} ← D + fresh noise.
– Noise: independent Laplace noise of magnitude 1/ε.
– Accuracy: for threshold D, w.h.p. there are about N/D updates, so the total error is (N/D)^{1/2}·noise + D + noise + noise. Setting D = N^{1/3} gives accuracy ~ N^{1/3}. (A sketch follows.)
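
A sketch of the delayed-output counter exactly as described on this slide; the parameter choices in the example are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

def delayed_counter(stream, eps, D):
    """Only update the public output when the count since the last update
    crosses a noisy threshold. With threshold D there are ~N/D updates,
    giving error ~ sqrt(N/D) + D; D = N^(1/3) balances this to ~N^(1/3)."""
    out = 0.0                                  # current public output
    A = 0                                      # exact count since last update
    Dt = D + rng.laplace(0, 1 / eps)           # noisy threshold
    outputs = []
    for x in stream:
        A += x
        if A - Dt > rng.laplace(0, 1 / eps):   # noisy comparison
            out = out + A + rng.laplace(0, 1 / eps)
            A = 0
            Dt = D + rng.laplace(0, 1 / eps)   # fresh threshold for next update
        outputs.append(out)
    return outputs

stream = rng.integers(0, 2, size=30).tolist()
print(delayed_counter(stream, eps=1.0, D=3))
```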

38 Privacy of Delayed Output
Protect: the update times and the update values. For any two adjacent sequences, e.g.
101101110001 and 101101010001,
we can pair up noise vectors η_1 η_2 … η_{k-1} η_k η_{k+1} and η_1 η_2 … η_{k-1} η'_k η_{k+1}, identical in all locations except one, with η'_k = η_k + 1 at the first update after the difference occurred (recall: out_{t+1} ← out_t + A_t + fresh noise whenever A_t − D_t > fresh noise, and D_{t+1} ← D + fresh noise). The probability ratio between the paired executions (D_t vs. D'_t) is ≈ e^ε.

39 Dynamic from Static
Idea: apply a conversion of static algorithms into dynamic ones [Bentley-Saxe 1980]. Run many accumulators in parallel:
– Each accumulator counts the number of 1's in a fixed segment of time, plus noise.
– The value of the output counter at any point in time is the sum of the accumulators of a few segments; only finished segments are used.
Accuracy: depends on the number of segments in the summation and on the accuracy of the accumulators.
Privacy: depends on the number of accumulators that a single point x_t influences (an accumulator is affected only while the stream is in its time frame).

40 The Segment Construction
Based on the bit representation: each point t lies in ⌈log t⌉ segments, and Σ_{i=1}^{t} x_i is the sum of at most log t accumulators (whose noises partially cancel). By setting ε' ≈ ε / log T we get the desired privacy. Accuracy: with all but negligible (in T) probability, the error at every step t is at most O((log^{1.5} T)/ε). (A sketch follows.)
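
An offline sketch of the segment construction, assuming the whole stream is available so each accumulator can be filled when first needed; an online version would instead finalize each accumulator the moment its segment closes. Names and the budget split are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

def segment_counter(stream, eps):
    """Every prefix [1, t] decomposes into at most log t aligned dyadic
    segments; each segment gets one noisy accumulator, and each item
    influences at most log T of them, so each uses eps' = eps / log T."""
    T = len(stream)
    L = max(1, int(np.ceil(np.log2(T + 1))))
    eps_prime = eps / L                        # split the budget across levels
    noisy = {}                                 # (start, length) -> noisy segment sum

    def segment_sum(start, length):
        # Each accumulator is computed (and its noise drawn) exactly once.
        if (start, length) not in noisy:
            true = sum(stream[start:start + length])
            noisy[start, length] = true + rng.laplace(0, 1 / eps_prime)
        return noisy[start, length]

    outputs = []
    for t in range(1, T + 1):                  # count for the prefix of length t
        total, start, rem = 0.0, 0, t
        for level in range(L, -1, -1):         # peel off dyadic segments, big to small
            seg = 1 << level
            if seg <= rem:
                total += segment_sum(start, seg)
                start += seg
                rem -= seg
        outputs.append(total)
    return outputs

stream = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1]
print(segment_counter(stream, eps=1.0))
```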

41 Synthetic Counter
We can make the counter synthetic:
– Monotone: each round the counter goes up by at most 1.
– The construction applies to any monotone function.

42 The Dynamic Privacy Zoo
[Diagram: differentially private outputs, privacy under continual observation, pan privacy, user-level privacy, continual pan privacy, and sketch vs. stream (the petting zoo).]

