PROVENANCE FOR THE CLOUD (USENIX CONFERENCE ON FILE AND STORAGE TECHNOLOGIES (FAST '10)) Kiran-Kumar Muniswamy-Reddy, Peter Macko, and Margo Seltzer Harvard.


1 PROVENANCE FOR THE CLOUD (USENIX Conference on File and Storage Technologies (FAST '10))
Kiran-Kumar Muniswamy-Reddy, Peter Macko, and Margo Seltzer
Harvard School of Engineering and Applied Sciences

2 Outline
- Introduction
- Background
- Provenance System Property
- Architecture & Protocol
- Evaluation
- Conclusion & Comment

3 Introduction
- Problem to solve
  - Implement a provenance-aware storage system on top of current cloud stores (using Amazon's services)

4 Background (1/3)
- Provenance
  - Data has two critical components
    - What it is (contents)
    - Where it came from (ancestry)
  - Provenance is the description of how an object was derived
  - It is the metadata that describes the history of an object
- Why use provenance?
  - Use case: Sloan Digital Sky Survey (SDSS)
    - Debug experimental results
    - Detect and avoid faulty data propagation
    - Improve text search results
  - Security


6 Background (2/3)
- Provenance can be defined abstractly as a directed acyclic graph (DAG)
- Nodes
  - Objects: files, processes, tuples, data sets, etc.
  - Have attributes
    - Command-line arguments
    - Name and version number
- Edges
  - Indicate a dependency between objects
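The DAG view above can be made concrete with a small sketch. This is only an illustration of the data structure, not code from the paper; the class names, the edge direction (child points to the thing it depends on), and the sample file and process names are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A provenance node: a file, process, tuple, data set, etc."""
    name: str
    attributes: dict = field(default_factory=dict)


class ProvenanceGraph:
    """Directed acyclic graph; an edge child -> parent means 'child depends on parent'."""

    def __init__(self):
        self.nodes = {}
        self.parents = {}  # node name -> set of names it depends on

    def add_node(self, name, **attributes):
        self.nodes[name] = Node(name, dict(attributes))
        self.parents.setdefault(name, set())

    def add_dependency(self, child, parent):
        # Reject an edge that would close a cycle: parent must not already depend on child.
        if child == parent or child in self.ancestors(parent):
            raise ValueError(f"edge {child} -> {parent} would create a cycle")
        self.parents[child].add(parent)

    def ancestors(self, name):
        """All nodes reachable by following dependency edges from name."""
        seen, stack = set(), list(self.parents[name])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents[current])
        return seen


# Hypothetical example: a process reads one file and writes another.
graph = ProvenanceGraph()
graph.add_node("input.dat", version=1)
graph.add_node("sort", argv=["sort", "input.dat"])
graph.add_node("output.dat", version=1)
graph.add_dependency("sort", "input.dat")    # the process read the input
graph.add_dependency("output.dat", "sort")   # the output was written by the process
print(sorted(graph.ancestors("output.dat")))  # ['input.dat', 'sort']
```

Checking for cycles at insertion time keeps the graph acyclic by construction, which is what lets later queries walk ancestors without a visited-set for correctness (the one here only avoids repeated work).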

7 [Figure: example provenance graph]
Node labels: Data Collection Request I1, Blood Test Request I2, Patient Brain Death Notification I3, Donor Data Request I4, Donor Data I5, Blood Test Request I6, Blood Test Result I7, Decision Request I8, Donation Decision I9, Justification Report
Edge labels: is justified by, is response to, is caused by, is based on

8 Background (3/3)
- Eventual consistency
  - A weaker form of data consistency
  - If no updates are sent for a sufficiently long period of time, all replicas in the system can be expected to become consistent

9 Provenance System Property (1/2)
- Provenance data coupling
  - An object and its provenance must match
  - The provenance must accurately and completely describe the data
- Multi-object causal ordering
  - Concerns the causal relationships among objects
  - A system must ensure that an object's ancestors and their provenance are persistent before making the object itself persistent
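One way to picture multi-object causal ordering is as a topological order over the dependency graph: an object is written only after everything it depends on. The sketch below uses Python's standard `graphlib` for that ordering; the function name, the dict used as a stand-in for the persistent store, and the sample object names are assumptions, and the paper's protocols are not claimed to work this way internally.

```python
from graphlib import TopologicalSorter


def persist_in_causal_order(parents, store):
    """Persist every object after all of its ancestors.

    parents maps an object name to the names it directly depends on.
    store is a plain dict standing in for the persistent store (an assumption).
    Returns the order in which objects were written.
    """
    order = []
    # static_order() yields a node only after all of its predecessors.
    for name in TopologicalSorter(parents).static_order():
        deps = parents.get(name, ())
        missing = [d for d in deps if d not in store]
        if missing:
            raise RuntimeError(f"{name} written before ancestors {missing}")
        store[name] = {"provenance": sorted(deps)}
        order.append(name)
    return order


# Hypothetical chain: output.dat was produced by sort, which read input.dat.
parents = {
    "output.dat": {"sort"},
    "sort": {"input.dat"},
    "input.dat": set(),
}
store = {}
order = persist_in_causal_order(parents, store)
print(order)  # ['input.dat', 'sort', 'output.dat']
```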

10 [Figure: the example provenance graph from slide 7, shown again]
Node labels: Data Collection Request I1, Blood Test Request I2, Patient Brain Death Notification I3, Donor Data Request I4, Donor Data I5, Blood Test Request I6, Blood Test Result I7, Decision Request I8, Donation Decision I9, Justification Report
Edge labels: is justified by, is response to, is caused by, is based on

11 Provenance System Property (2/2)
- Data-independent persistence
  - The system must retain an object's provenance even if the object is removed
- Efficient query
  - Provenance must be accessible to users who want to access or verify provenance properties of their data

12 Architecture (1) [figure only]

13 Architecture (2) - S3
- Simple Storage Service (S3)
  - Amazon's storage service
  - An object store where object sizes can range from 1 byte to 5 GB
  - With each object, clients can store up to 2 KB of metadata
  - Accessed through a SOAP or REST API
    - PUT, GET, HEAD, COPY, DELETE

14 Architecture (3) - SimpleDB
- SimpleDB
  - An Amazon service that provides indexing and querying of data
  - The data model consists of items that are described by attribute-value pairs
  - Each item can have 256 attribute-value pairs
  - Each attribute name and value can be as large as 1 KB

15 Architecture (4) - SQS
- Simple Queue Service (SQS)
  - A distributed messaging system that lets users exchange messages between the distributed components of their systems
  - Messages are limited to 8 KB
  - In this paper, SQS is used as a write-ahead log (WAL)

16 Architecture (5) - PASS
- Provenance-Aware Storage System (PASS)
  - A storage system that automatically collects, stores, manages, and provides search for provenance
  - Monitors system calls
  - Generates provenance and sends both provenance and data to PA-S3fs

17 Architecture (6) - PA-S3fs
- Provenance-Aware S3 File System
  - Caches data and provenance on the client to reduce traffic to S3
  - Sends data and provenance to the cloud

18 Protocol (1) [figure only]

19 Protocol (2)
- Protocol 1 (P1)
  - Standalone cloud store
  - Map each file to an S3 object and store its provenance as a separate S3 object
  - Provenance object
    - Named with a UUID
    - Contains the name of the primary object
  - Primary object metadata
    - Version number and UUID

20 Protocol (3)
- P1 does not support data coupling
  - But it can detect decoupling
- Queries are inefficient
  - All provenance must be retrieved
- [Figure: message exchange between Client and S3. Messages shown: PUT:Provenance, PUT:Data, each acknowledged with OK]
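A minimal sketch of a P1-style write follows, to show why a crash between the two PUTs breaks coupling and how that could be noticed afterwards. It uses an in-memory class in place of S3, so none of the calls are real AWS APIs. It also assumes that detection works by comparing a version number kept in the provenance object with the version in the primary object's metadata; the slides say decoupling is detectable but do not spell out the mechanism. All function and key names are hypothetical.

```python
import uuid


class FakeStore:
    """In-memory stand-in for an object store; not the real S3 API."""

    def __init__(self):
        self.objects = {}

    def put(self, key, body, metadata=None):
        self.objects[key] = {"body": body, "metadata": dict(metadata or {})}

    def get(self, key):
        return self.objects[key]


def p1_write(store, name, data, provenance, version, prov_key=None, crash_before_data=False):
    """P1 sketch: PUT the provenance object first, then the data object."""
    prov_key = prov_key or str(uuid.uuid4())
    store.put(prov_key, {"primary": name, "version": version, "records": provenance})
    if crash_before_data:
        return prov_key  # simulate a client crash between the two PUTs
    store.put(name, data, metadata={"version": version, "uuid": prov_key})
    return prov_key


def is_coupled(store, name):
    """Assumed check: versions on the data and on its provenance must agree."""
    meta = store.get(name)["metadata"]
    prov = store.get(meta["uuid"])["body"]
    return prov["primary"] == name and prov["version"] == meta["version"]


store = FakeStore()
key = p1_write(store, "report.txt", b"v1", ["created by editor"], version=1)
print(is_coupled(store, "report.txt"))  # True: both PUTs completed

# Second write crashes after the provenance PUT, so the data stays at version 1.
p1_write(store, "report.txt", b"v2", ["edited"], version=2,
         prov_key=key, crash_before_data=True)
print(is_coupled(store, "report.txt"))  # False: provenance is ahead of the data
```

The two PUTs are independent requests, so nothing makes them atomic; the check can only report the mismatch after the fact, which matches the slide's point that P1 detects decoupling rather than preventing it.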

21 Protocol (4) [figure only]

22 Protocol (5)
- Protocol 2 (P2)
  - Cloud store with a cloud database
  - Store provenance as one SimpleDB item
  - If a provenance value is larger than the 1 KB SimpleDB limit
    - Store it as an S3 object
    - Save a pointer to it in the attribute-value pair

23 Protocol (6)
- Provides efficient provenance queries
- Does not support data coupling
- [Figure: message exchange between Client, S3, and SimpleDB. Messages shown: PUT: Prov > 1KB, BatchPutAttributes: Prov, PUT:Data, each acknowledged with OK]
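The P2 idea of keeping provenance in a database item and spilling oversized values to the object store can be sketched as follows. Plain dicts stand in for S3 and SimpleDB, so no real AWS calls are made. The function names, the `prov/` key prefix, the pointer representation, and the choice to key the item by object name are all assumptions made for illustration.

```python
import uuid

VALUE_LIMIT = 1024  # 1 KB per attribute value, as stated on the SimpleDB slide


def p2_write(objects, database, name, data, provenance):
    """P2 sketch: provenance goes to the database, data to the object store.

    objects and database are plain dicts standing in for S3 and SimpleDB.
    provenance maps attribute names to string values.
    """
    item = {}
    for attr, value in provenance.items():
        if len(value.encode("utf-8")) > VALUE_LIMIT:
            # Oversized value: park it in the object store, keep only a pointer.
            pointer = f"prov/{uuid.uuid4()}"
            objects[pointer] = value
            item[attr] = {"pointer": pointer}
        else:
            item[attr] = {"value": value}
    database[name] = item  # one item per object
    objects[name] = data   # the data itself


def p2_read_attribute(objects, database, name, attr):
    """Resolve an attribute, following the pointer when the value was spilled."""
    entry = database[name][attr]
    return objects[entry["pointer"]] if "pointer" in entry else entry["value"]


objects, database = {}, {}
long_env = "X" * 2000  # hypothetical value that exceeds the limit
p2_write(objects, database, "out.dat", b"payload", {"input": "in.dat", "env": long_env})
print(p2_read_attribute(objects, database, "out.dat", "input"))     # in.dat
print(len(p2_read_attribute(objects, database, "out.dat", "env")))  # 2000
```

Because the database write and the data write are still separate operations, a crash between them leaves provenance without matching data, which is why the slide notes that P2 does not provide data coupling.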

24 Protocol (7)
- Protocol 3 (P3)
  - Cloud store with a cloud database and a messaging service
  - Use SQS as a write-ahead log (WAL)
    - 8 KB message limit
    - Store large objects as temporary S3 objects and record the pointer in the WAL
  - Commit daemon
    - Reads the log records
    - Assembles all the records belonging to a transaction
    - Ignores the records of a transaction if the client crashed before logging all of it

25 [Figure: message exchange between Client, S3, SimpleDB, SQS, and the commit daemon (Commitd). Messages shown: PUT: Temp data copy, SendMessage: Prov, RecvMessage, PUT: Prov > 1KB, BatchPutAttributes, Copy:Data, Delete:temp, Delete:Msg, each acknowledged with OK]
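The write-ahead-log flow of P3 can be sketched with in-memory stand-ins: a list plays the role of the SQS queue and dicts play the roles of S3 and SimpleDB. This is a simplified illustration under stated assumptions, not the paper's implementation. The record layout (`txn`, `seq`, `total`), the `tmp/` key prefix, the function names, and the order of the commit steps are hypothetical, and the 8 KB message limit is not modeled.

```python
import uuid
from collections import defaultdict


def p3_client_write(objects, wal, name, data, provenance_records, crash_after=None):
    """Client side of a P3-style write (sketch).

    objects is a dict standing in for S3, wal a list standing in for SQS.
    crash_after=N simulates a crash after N log records have been sent.
    """
    txn = str(uuid.uuid4())
    temp_key = f"tmp/{txn}"
    objects[temp_key] = data  # temporary copy of the data
    total = len(provenance_records)
    for seq, record in enumerate(provenance_records):
        if crash_after is not None and seq >= crash_after:
            return txn  # client died before logging the whole transaction
        wal.append({"txn": txn, "seq": seq, "total": total,
                    "name": name, "temp": temp_key, "record": record})
    return txn


def commit_daemon_pass(objects, database, wal):
    """One pass of the commit daemon: commit complete transactions only."""
    by_txn = defaultdict(list)
    for message in wal:
        by_txn[message["txn"]].append(message)
    committed = []
    for txn, messages in by_txn.items():
        if len(messages) < messages[0]["total"]:
            continue  # incomplete transaction: leave it uncommitted
        messages.sort(key=lambda m: m["seq"])
        name, temp_key = messages[0]["name"], messages[0]["temp"]
        database[name] = [m["record"] for m in messages]  # store provenance
        objects[name] = objects[temp_key]                 # copy temp -> final
        del objects[temp_key]                             # delete temp copy
        wal[:] = [m for m in wal if m["txn"] != txn]      # delete log messages
        committed.append(txn)
    return committed


objects, database, wal = {}, {}, []
ok = p3_client_write(objects, wal, "a.dat", b"A", ["input=in.dat", "proc=sort"])
bad = p3_client_write(objects, wal, "b.dat", b"B", ["input=a.dat", "proc=uniq"],
                      crash_after=1)
committed = commit_daemon_pass(objects, database, wal)
print(committed == [ok], "a.dat" in objects, "b.dat" in objects)  # True True False
```

In this sketch the data only becomes visible under its final name once every log record of its transaction is present, so a half-logged transaction never exposes data without provenance. Cleaning up the leftover temporary object and log record of the crashed transaction is left out.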

26 Protocol (9) [figure only]

27 Evaluation (1)
- Workloads
  - CVSROOT nightly backup
    - I/O intensive
    - 240 operations
  - Blast
    - Mix of compute and I/O operations
    - Provenance tree has a depth of 5
    - 10773 operations
  - Challenge
    - Mix of compute and I/O operations
    - Provenance tree has a depth of 11
    - 6179 operations

28 Evaluation (2) [Figure; panel titles: EC2 instance, Local machine]

29 Evaluation (3)
- Query performance
  - Q1: Retrieve all the provenance ever recorded
  - Q2: Retrieve the provenance of all versions of one object
  - Q3: Find all files that were directly output by Blast
  - Q4: Find all the descendants of files derived from Blast
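Queries of the Q3 and Q4 kind are graph lookups: Q3 reads one level of edges, while Q4 needs a transitive traversal. The sketch below runs both over a small in-memory adjacency map. The map, the node names, and the function names are invented for illustration and say nothing about how the evaluated protocols execute these queries against S3 or SimpleDB.

```python
def direct_outputs(children, process):
    """Q3-style query: objects that depend directly on the given process."""
    return set(children.get(process, ()))


def descendants(children, roots):
    """Q4-style query: everything transitively derived from the given roots."""
    seen, stack = set(), list(roots)
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


# Hypothetical provenance: maps a node to the nodes derived directly from it.
children = {
    "blast": ["hits.out"],
    "hits.out": ["filter"],
    "filter": ["top.out"],
    "top.out": ["report"],
    "report": ["summary.txt"],
}
outputs = direct_outputs(children, "blast")
print(sorted(outputs))                         # ['hits.out']
print(sorted(descendants(children, outputs)))  # ['filter', 'report', 'summary.txt', 'top.out']
```

The traversal in `descendants` touches every reachable node, which hints at why a store that can only fetch whole provenance objects (as in P1) answers such queries less efficiently than one with an index.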

30 Evaluation (4) [figure only]

31 Conclusion
- Definition of the properties that provenance systems must exhibit
- Design and implementation of three protocols for storing provenance and data in the cloud
- All three protocols have reasonable time overhead and minimal financial overhead

32 Comment
- Economy
  - Provenance cannot increase profit directly
  - Customer loyalty
- Security
  - Provenance can help ensure the correctness of files
  - But it may contain sensitive information

33 THE END

