Slide 1: Provenance Views for Module Privacy
Susan B. Davidson (U. Penn), Sanjeev Khanna (U. Penn), Tova Milo (Tel-Aviv U.), Debmalya Panigrahi (MIT), Sudeepa Roy (U. Penn)
Slide 2: Data-oriented Workflows Must Be Secure
Ref. Tova Milo's keynote, PODS 2011.
Slide 3: Workflows
Vertices = modules/programs; edges = dataflow.
In an execution of the workflow, data values appear on the edges.
[Figure: an example workflow with modules Split Entries, Align Sequences, Curate Annotations (using Functional Data), Format-1, Format-2, Format-3, and Construct Trees; DNA sequence data d1, ..., d7 (e.g., TGCC GTGT GGCT ...) label the edges.]
Slide 4: Need for Provenance
From the biologist's workspace: Which sequences have been used to produce this tree? How has this tree been generated?
Provenance enables sharing and reuse, and ensures repeatability and debugging.
[Figure: the workflow of Slide 3 from source s to result tree t, with question marks over the intermediate data.]
Slide 5: Need for Provenance vs. Need for Privacy
Workflow USER: "How has this result been produced? Show me all data values."
Workflow OWNER: "My data is sensitive! My module is proprietary! The flow/structure should not be revealed!"
Slide 6: Module Privacy
A module f takes an input x and produces an output y = f(x).
The user should not be able to guess (x, f(x)) pairs with high probability, over any number of executions.
The output value f(x) is private, not the algorithm for f.
[Figure: module f with inputs x1, x2, x3, x4 and outputs (y1, y2, y3) = f(x1, x2, x3, x4).]
Slide 7: Module Privacy: Motivation
Example: f = Process Record (composed of Check for AIDS, Check for Cancer, and Create Report) takes the medical record x of a patient P and produces a report f(x) answering "Does P have AIDS?" and "Does P have cancer?".
Patient P's concern: whether P has AIDS should not be inferable from his medical record.
Module owner's concern: no one should be able to simulate the module and use it elsewhere.
Slide 8: Module Privacy in a Workflow
n modules are connected as a DAG, with data sharing.
Private modules (the user has no a priori knowledge): e.g., a module for AIDS detection.
Public modules (the user has full knowledge): e.g., sorting and reformatting modules.
For a private module f with input x, f(x) should not be revealed.
[Figure: a workflow with modules m1, m2, m3 sharing data attributes a1, ..., a7.]
Slide 9: Module Privacy with Secure Views
Privacy definition: L-diversity [MGKV '06]. By hiding some input/output attributes, each input x has L different, equally possible values for f(x). The resulting output view is called a "secure view".
Why not differential privacy [Dwork '06, DMNS '06, ...]? The usual random noise cannot be added: scientific experiments must be repeatable, so any f must always map the same x to the same f(x).
Slide 10: Standalone Module Privacy
A view is the projection of R on the visible attributes. Fix a privacy parameter Γ (e.g., Γ = 2).
A view is Γ-standalone-private if every input x can be mapped to Γ different outputs by the "possible worlds".
A possible world is a relation that agrees with R on the visible attributes and respects the functional dependency.
Example: a module f with inputs x1, x2 and output y; relation R for f (functional dependency x1, x2 → y):

    x1  x2 | y
    0   0  | 0
    0   1  | 1
    1   0  | 1
    1   1  | 0

Here y = x1 XOR x2. With the y column hidden, the possible worlds include both y = (x1 ≠ x2) and y = (x1 ∨ x2); they agree on the first three rows but differ on input (1, 1), and in fact every input can be mapped to Γ = 2 different outputs. (A brute-force sketch of this check follows.)
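To make the possible-worlds check concrete, here is a minimal brute-force sketch (not from the paper; all names such as possible_outputs are illustrative). It assumes boolean attributes and, as a simplification, varies only the hidden output values while keeping the input tuples fixed, which matches this slide's example where the y column is hidden.

```python
from itertools import product

def view(row, visible):
    # Project a full row (input bits followed by output bits) onto the
    # visible attribute positions.
    return tuple(row[i] for i in visible)

def possible_outputs(R, visible):
    # R maps each input tuple to its output tuple, e.g. (0, 1) -> (1,).
    # A candidate world g assigns some output to every input (the functional
    # dependency holds by construction, since g is a function); g is a
    # possible world if its rows agree with R's rows on the visible positions.
    inputs = list(R)
    out_len = len(next(iter(R.values())))
    out_domain = list(product([0, 1], repeat=out_len))
    outs = {x: set() for x in inputs}
    for choice in product(out_domain, repeat=len(inputs)):
        g = dict(zip(inputs, choice))
        if all(view(x + g[x], visible) == view(x + R[x], visible)
               for x in inputs):
            for x in inputs:
                outs[x].add(g[x])
    return outs

# The XOR example from this slide, attributes ordered (x1, x2, y):
R = {(0, 0): (0,), (0, 1): (1,), (1, 0): (1,), (1, 1): (0,)}
outs = possible_outputs(R, visible=[0, 1])        # hide the y column
print(all(len(s) >= 2 for s in outs.values()))    # Gamma = 2 holds -> True
```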
Slide 11: Workflow Module Privacy
A view is defined as before: the projection of R on the visible attributes.
A view is Γ-workflow-private if it gives each private module the same guarantee as before.
A possible world is now a relation that agrees with R on the visible attributes and respects ALL functional dependencies.
Example workflow W: modules m1, m2, m3 over attributes a1, ..., a7; the relation R has one functional dependency per module (n in general), here:
1. a1, a2 → a3, a4, a5
2. a3, a4 → a6
3. a4, a5 → a7
(A brute-force sketch of the workflow possible worlds follows.)
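A hedged sketch of the workflow-level definition for the three-module example above (boolean attributes, illustrative names, not the paper's algorithm): a possible world now chooses one function per module, so all three functional dependencies hold by construction, and it must reproduce R on the visible positions for every initial input.

```python
from itertools import product

IN2 = list(product([0, 1], repeat=2))  # all boolean pairs

def worlds():
    # g1: (a1, a2) -> (a3, a4, a5); g2: (a3, a4) -> a6; g3: (a4, a5) -> a7.
    # About 8^4 * 16 * 16 ~ 10^6 worlds: exponential, for illustration only.
    G1 = [dict(zip(IN2, c))
          for c in product(product([0, 1], repeat=3), repeat=4)]
    G2 = [dict(zip(IN2, c)) for c in product([0, 1], repeat=4)]
    for g1, g2, g3 in product(G1, G2, G2):
        yield g1, g2, g3

def run(g1, g2, g3, a1, a2):
    # End-to-end execution of one world on one initial input.
    a3, a4, a5 = g1[(a1, a2)]
    return (a1, a2, a3, a4, a5, g2[(a3, a4)], g3[(a4, a5)])

def workflow_possible_outputs(R, visible):
    # R: dict (a1, a2) -> the full 7-tuple produced by the real modules.
    # Collect what each possible world assigns to a3..a7 per initial input.
    # Gamma-workflow-privacy for, say, m1 would then project these tuples
    # onto the (a3, a4, a5) positions and require >= Gamma values per input.
    outs = {x: set() for x in R}
    for g1, g2, g3 in worlds():
        rows = {x: run(g1, g2, g3, *x) for x in R}
        if all(rows[x][i] == R[x][i] for x in R for i in visible):
            for x in R:
                outs[x].add(rows[x][2:])
    return outs
```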
Slide 12: The Secure-View Optimization Problem
The user wants provenance; the owner wants privacy. These interests conflict, and hiding each data attribute has a cost.
Secure-view problem: minimize the total cost of the hidden attributes while guaranteeing Γ-workflow-privacy for all private modules.
Slide 13: Let's Start with a Single Module
How hard is the secure-view problem for a standalone module?
Problem 1: given a set V of visible attributes, decide whether V is safe.
- Communication complexity: Ω(N), where N = #rows of R, when R is given explicitly.
- Computational complexity: co-NP-hard in k = #attributes of R, when R is given succinctly.
Problem 2: given an oracle answering "is V safe?", find a safe subset V* with minimum cost.
- Communication complexity: 2^Ω(k) oracle calls are needed.
Slide 14: Any Upper Bound?
The trivial brute-force algorithm solves the problem in time O(2^k N^2), where k = #attributes and N = #rows of R. It can return ALL safe subsets, which is useful for the next step. (A sketch of the enumeration follows.)
This is not so bad in practice:
- k is not too large for a single module;
- a module is reused in many workflows;
- expert knowledge from the module designers can be used to speed up the process.
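The enumeration itself is easy to sketch; this direct-from-definition version reuses the possible_outputs check from the Slide 10 sketch, so it is slower than the O(2^k N^2) bound above, which relies on a smarter per-subset safety test. Names are illustrative.

```python
from itertools import combinations

def all_safe_subsets(R, k, gamma):
    # Try every subset of the k attribute positions as the visible set and
    # keep those under which each input still has >= gamma possible outputs.
    # possible_outputs() is the brute-force checker sketched after Slide 10.
    safe = []
    for r in range(k + 1):
        for visible in combinations(range(k), r):
            outs = possible_outputs(R, list(visible))
            if all(len(s) >= gamma for s in outs.values()):
                safe.append(set(visible))
    return safe

# For the XOR module above (positions 0, 1 = inputs, 2 = output), gamma = 2
# returns exactly the subsets of {0, 1}, i.e. the views that keep y hidden.
print(all_safe_subsets(R, k=3, gamma=2))
```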
Slide 15: Moving on to General Workflows
Workflows have arbitrary data sharing, arbitrary (DAG) connections, and interactions between private and public modules. Trivial algorithms are not good: they lead to running time exponential in n.
Instead, we reuse the lists of standalone safe subsets computed for the private modules.
First consider workflows in which all modules are private. Two steps:
1. Composability: show that any combination of safe subsets for standalone privacy is also safe for workflow privacy.
2. Optimization: find the minimum-cost safe subset for the workflow.
Slide 16: Composability
Key idea: when a module m is placed in a workflow and the same attribute subset V is hidden, the set of possible worlds shrinks, but the number of possible outputs for each input does not.
The proof involves showing the existence of a suitable possible world.
The "all-private workflow" assumption is necessary.
Slide 17: Optimally Combining Standalone Solutions
Any combination of standalone-safe subsets works; we want one with minimum cost. So we solve the optimization problem for the workflow given the list of safe options for each individual module. (A brute-force sketch of this combination step follows.)
Even the simplest version (no data sharing) is NP-hard.
In the paper: approximation algorithms and matching hardness results for different versions; bounded data sharing admits a better approximation ratio.
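A minimal sketch of the combination step, assuming each private module comes with a list of hidden-attribute options (the complements of its standalone-safe visible subsets): pick one option per module and cost the union, since with data sharing an attribute hidden for one module is hidden everywhere. Brute force for illustration; the paper gives approximation algorithms with guarantees. All names are illustrative.

```python
from itertools import product

def cheapest_combination(hidden_options, cost):
    # hidden_options: one list per private module; each entry is a set of
    # attributes whose hiding makes that module standalone-safe.
    # cost: attribute -> cost of hiding it.
    best, best_cost = None, float("inf")
    for combo in product(*hidden_options):
        hidden = set().union(*combo)       # shared attributes counted once
        c = sum(cost[a] for a in hidden)
        if c < best_cost:
            best, best_cost = hidden, c
    return best, best_cost

# Toy instance: two modules, with attribute a4 shared between them.
options = [[{'a3', 'a4'}, {'a5'}],   # safe hiding choices for one module
           [{'a4'}, {'a7'}]]         # safe hiding choices for the other
print(cheapest_combination(options, {'a3': 2, 'a4': 1, 'a5': 3, 'a7': 2}))
# -> ({'a3', 'a4'}, 3): sharing a4 makes this combination the cheapest.
```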
Slide 18: Workflows with Public Modules
Public modules are difficult to handle: composability no longer works. For example, if a private module computes f1(x) = y and a public module then computes f2(y) = y, the user, who knows f2 fully, can recover y even when f1's output is hidden.
Solution: "privatize" some public modules, without revealing which modules have been privatized. Composability then works again.
Privatization has an additional cost and leads to worse approximation results.
Slide 19: Related Work
Workflow privacy (mainly access control): Chebotko et al. '08; Gil et al. '07, '10.
Secure provenance: Braun et al. '08; Hasan et al. '07; Lyle-Martin '10.
Privacy-preserving data mining: surveys by Aggarwal-Yu '08 and Verykios et al. '04.
Privacy in statistical databases: survey by Dwork '08.
Slide 20: Conclusion and Future Work
This is a first step toward handling module privacy in a network of modules.
Future directions:
1. Explore alternative notions of privacy and partial background knowledge.
2. Explore alternative "privatization" techniques for public modules.
3. Handle infinite or very large attribute domains.
Slide 21: Thank You. Questions?