Published by Daniella Bolas. Modified over 9 years ago.

Provenance for the Cloud
USENIX Conference on File and Storage Technologies (FAST '10)
Kiran-Kumar Muniswamy-Reddy, Peter Macko, and Margo Seltzer
Harvard School of Engineering and Applied Sciences

Outline
- Introduction
- Background
- Provenance System Property
- Architecture & Protocol
- Evaluation
- Conclusion & Comment

Introduction
- Problem to solve
  - Implement a provenance-aware storage system on top of existing cloud stores (using Amazon's services)

Background (1/3)
- Provenance
  - Data has two critical components
    - What it is (contents)
    - Where it came from (ancestry)
  - Provenance is the description of how an object was derived: the metadata that describes the object's history
- Why use provenance?
  - Use case: Sloan Digital Sky Survey (SDSS)
  - Debug experimental results
  - Detect and avoid faulty data propagation
  - Improve text search results
  - Security

Background (2/3)
- Provenance can be defined abstractly as a directed acyclic graph (DAG)
- Nodes
  - Objects: files, processes, tuples, data sets, etc.
  - Have attributes
    - Command-line arguments
    - Name and version number
- Edges
  - Indicate a dependency between two objects

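The DAG view above can be made concrete with a small in-memory sketch. This is an illustration only, not the PASS data model: the class names `ProvNode` and `ProvGraph`, the `inputs` field, and the example file names are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ProvNode:
    """One object in the provenance graph (file, process, tuple, ...)."""
    name: str
    attributes: dict = field(default_factory=dict)
    inputs: list = field(default_factory=list)  # names this node depends on


class ProvGraph:
    """Hypothetical in-memory provenance DAG."""

    def __init__(self):
        self.nodes = {}

    def add(self, name, inputs=(), **attributes):
        # A node may only depend on nodes that already exist and may not be
        # redefined, so no edge can close a cycle.
        if name in self.nodes:
            raise ValueError(f"node already exists: {name}")
        for dep in inputs:
            if dep not in self.nodes:
                raise KeyError(f"unknown ancestor: {dep}")
        self.nodes[name] = ProvNode(name, dict(attributes), list(inputs))

    def ancestors(self, name):
        """Return the names of all direct and indirect ancestors of `name`."""
        seen = set()
        stack = list(self.nodes[name].inputs)
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.nodes[current].inputs)
        return seen


graph = ProvGraph()
graph.add("input.dat")
graph.add("sort", inputs=["input.dat"], argv="sort -n input.dat", version=1)
graph.add("sorted.dat", inputs=["sort"])
```

Here `graph.ancestors("sorted.dat")` walks the dependency edges back through the `sort` process to `input.dat`, which is the "where it came from" half of the data.
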
[Figure: example provenance graph]
- Nodes: Justification Report, Data Collection Request (I1), Blood Test Request (I2), Patient Brain Death Notification (I3), Donor Data Request (I4), Donor Data (I5), Blood Test Request (I6), Blood Test Result (I7), Decision Request (I8), Donation Decision (I9)
- Edge labels: is justified by, is response to, is caused by, is based on

Background (3/3)
- Eventual consistency
  - A weaker form of data consistency
  - If no new updates are sent for a sufficiently long period of time, all replicas in the system can be expected to become consistent

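A toy model can show what "eventually" means here. The sketch below is not how S3 replicates data; the `Replica` class, the last-writer-wins rule, and the ring-shaped propagation are assumptions chosen only to make the behaviour visible.

```python
class Replica:
    """Hypothetical replica holding one versioned value."""

    def __init__(self):
        self.version = 0
        self.value = None

    def write(self, version, value):
        # Last-writer-wins: keep only the newest version seen so far.
        if version > self.version:
            self.version, self.value = version, value


def anti_entropy_round(replicas):
    """Each replica pushes its state to its right-hand neighbour once."""
    snapshot = [(r.version, r.value) for r in replicas]
    for i, (version, value) in enumerate(snapshot):
        replicas[(i + 1) % len(replicas)].write(version, value)


replicas = [Replica() for _ in range(4)]
replicas[0].write(1, "v1")        # the update reaches only one replica
stale_read = replicas[2].value    # a read from another replica is still stale

# No further updates are sent; propagate until every replica agrees.
rounds = 0
while len({(r.version, r.value) for r in replicas}) > 1:
    anti_entropy_round(replicas)
    rounds += 1
```

Between the write and convergence, a reader can observe the old state (`stale_read`), which is exactly the window a provenance protocol on top of such a store has to tolerate.
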
Provenance System Property (1/2)
- Provenance data coupling
  - An object and its provenance must match
  - The provenance must accurately and completely describe the data
- Multi-object causal ordering
  - Concerns the causal relationship among objects
  - A system must ensure that an object's ancestors and their provenance are persistent before making the object itself persistent

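Multi-object causal ordering amounts to persisting objects in dependency order. The following sketch assumes a hypothetical `depends_on` map from each object to its direct ancestors and records the order of writes in a list; writing an object's own provenance just before its data is an additional assumption made for this example.

```python
def persist_in_causal_order(name, depends_on, persisted, log):
    """Persist `name` only after all of its ancestors are persistent."""
    if name in persisted:
        return
    for ancestor in depends_on.get(name, []):
        persist_in_causal_order(ancestor, depends_on, persisted, log)
    # Assumption for this sketch: provenance is written before the data.
    log.append(("provenance", name))
    log.append(("data", name))
    persisted.add(name)


# Hypothetical dependency chain: report.pdf <- plot.png <- results.csv
depends_on = {
    "report.pdf": ["plot.png"],
    "plot.png": ["results.csv"],
    "results.csv": [],
}
persisted, log = set(), []
persist_in_causal_order("report.pdf", depends_on, persisted, log)
write_order = [name for kind, name in log if kind == "data"]
```

Asking to persist only `report.pdf` still forces `results.csv` and `plot.png` to be written first, so a reader never finds an object whose ancestors are missing.
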
Provenance System Property (2/2)
- Data-independent persistence
  - The system must retain an object's provenance even if the object itself is removed
- Efficient query
  - Provenance must be accessible to users who want to query or verify provenance properties of their data

Architecture (1)

Architecture (2) - S3
- Simple Storage Service (S3)
  - Amazon's storage service
  - An object store where the size of an object can range from 1 byte to 5GB
  - With each object, clients can store up to 2KB of metadata
  - Accessed through a SOAP or REST API
    - PUT, GET, HEAD, COPY, DELETE

Architecture (3) - SimpleDB
- SimpleDB
  - An Amazon service that provides indexing and querying of data
  - The data model consists of items that are described by attribute-value pairs
  - Each item can have 256 attribute-value pairs
  - Each attribute name and value can be as large as 1KB

Architecture (4) - SQS
- Simple Queue Service (SQS)
  - A distributed messaging system that lets users exchange messages between the distributed components of their systems
  - Message size is limited to 8KB
  - In this paper, SQS is used as a write-ahead log (WAL)

Architecture (5) - PASS
- Provenance-Aware Storage System (PASS)
  - A storage system that automatically collects, stores, manages, and provides search for provenance
  - Monitors system calls
  - Generates provenance and sends both provenance and data to PA-S3fs

Architecture (6) - PA-S3fs
- Provenance-Aware S3 File System
  - Caches data and provenance on the client to reduce traffic to S3
  - Sends data and provenance to the cloud

Protocol (1)

Protocol (2)
- Protocol 1 (P1): standalone cloud store
  - Map each file to an S3 object and store its provenance as a separate S3 object
  - Provenance object
    - Named with a uuid
    - Contains the name of the primary object
  - Primary object metadata
    - Version number and uuid

Protocol (3)
- P1 does not support data coupling
  - But it can detect decoupling
- Queries are inefficient
  - All provenance must be retrieved
- Message flow (diagram)
  - Client -> S3: PUT provenance; S3 replies OK
  - Client -> S3: PUT data; S3 replies OK

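The P1 layout and its decoupling check can be sketched against an in-memory stand-in for the object store. `FakeS3`, `p1_store`, `p1_is_coupled`, and the field names `prov_uuid`, `version`, and `records` are hypothetical and are not the real S3 API; the sketch also assumes that an object's provenance object keeps the same uuid across versions.

```python
import uuid


class FakeS3:
    """In-memory stand-in for an object store; not the real S3 API."""

    def __init__(self):
        self.objects = {}

    def put(self, key, body, metadata=None):
        self.objects[key] = (body, dict(metadata or {}))

    def get(self, key):
        return self.objects[key]


def p1_store(s3, name, data, provenance, version, prov_uuid=None):
    prov_uuid = prov_uuid or str(uuid.uuid4())
    # Step 1: provenance object, named by uuid, records the primary object's name.
    s3.put(prov_uuid, {"object": name, "version": version, "records": provenance})
    # Step 2: primary object, whose metadata carries the version number and uuid.
    s3.put(name, data, metadata={"version": version, "prov_uuid": prov_uuid})
    return prov_uuid


def p1_is_coupled(s3, name):
    """Detect decoupling by comparing the versions stored on both sides."""
    _, meta = s3.get(name)
    prov, _ = s3.get(meta["prov_uuid"])
    return prov["object"] == name and prov["version"] == meta["version"]


s3 = FakeS3()
prov_id = p1_store(s3, "results.csv", b"a,b\n", ["generated by analyze.py"], version=1)
coupled_before = p1_is_coupled(s3, "results.csv")

# Simulate a client crash between the two PUTs of version 2:
# the provenance object is updated, the data object is not.
s3.put(prov_id, {"object": "results.csv", "version": 2, "records": ["rerun"]})
coupled_after = p1_is_coupled(s3, "results.csv")
```

The two PUTs are independent requests, so nothing makes them atomic; the version check only lets a reader notice the mismatch after the fact.
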
Protocol (4)

Protocol (5)
- Protocol 2 (P2): cloud store with a cloud database
  - Store the provenance of an object as one SimpleDB item
  - If a value is larger than the 1KB SimpleDB limit
    - Store it as an S3 object
    - Save a pointer to that object in the attribute-value pair

Protocol (6)
- Provides efficient provenance queries
- Does not support data coupling
- Message flow (diagram)
  - Client -> S3: PUT provenance values larger than 1KB; OK
  - Client -> SimpleDB: BatchPutAttributes with the provenance; OK
  - Client -> S3: PUT data; OK

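The 1KB overflow rule in P2 can be sketched with plain dictionaries standing in for S3 and SimpleDB. The function `p2_store`, the key layout `prov/<name>/<version>/<attribute>`, the `s3://` pointer format, and the item name `<name>_<version>` are all assumptions made for this example, not details taken from the paper.

```python
VALUE_LIMIT = 1024  # SimpleDB per-value limit quoted above (1KB)


def p2_store(s3, simpledb, name, version, data, provenance):
    """Store provenance as one item; spill oversized values to the object store."""
    item = {}
    for attr, value in provenance.items():
        if len(value.encode("utf-8")) > VALUE_LIMIT:
            key = f"prov/{name}/{version}/{attr}"  # hypothetical key layout
            s3[key] = value
            item[attr] = f"s3://{key}"             # hypothetical pointer format
        else:
            item[attr] = value
    simpledb[f"{name}_{version}"] = item  # stands in for BatchPutAttributes
    s3[name] = data                       # data is written last
    return item


s3, simpledb = {}, {}
prov = {"input": "blast", "argv": "blast -d db.fasta", "env": "X" * 2000}
item = p2_store(s3, simpledb, "hits.out", 1, b"...", prov)

# An indexed lookup replaces P1's "retrieve all provenance" scan.
made_by_blast = [k for k, v in simpledb.items() if v.get("input") == "blast"]
```

Small values stay queryable in the database, while the oversized `env` value is reachable through its pointer. As in P1, the writes are separate requests, so a crash between them can still decouple data and provenance.
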
Protocol (7)
- Protocol 3 (P3): cloud store with a cloud database and a messaging service
  - Use SQS as a write-ahead log (WAL)
    - Messages are limited to 8KB
    - Store large objects as temporary S3 objects and record the pointer in the WAL
  - Commit daemon
    - Reads the log records
    - Assembles all the records belonging to a transaction
    - Ignores a transaction's records if the client crashed before logging all of them

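[Diagram: P3 message flow]
- Client -> S3: PUT temporary copy of the data; OK
- Client -> SQS: SendMessage with the provenance; OK
- Commit daemon (commitd) -> SQS: ReceiveMessage
- commitd -> S3: PUT provenance larger than 1KB
- commitd -> SimpleDB: BatchPutAttributes; OK
- commitd -> S3: COPY the temporary object to the data object; OK
- commitd -> S3: DELETE the temporary object
- commitd -> SQS: DeleteMessage; OK
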
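The write-ahead-log idea behind P3 can be sketched with a `deque` standing in for SQS and dictionaries standing in for S3 and SimpleDB. The record layout, the explicit `commit` marker, and the function names are assumptions for this example; real SQS delivery is also unordered, which this sketch ignores.

```python
from collections import deque


def client_write(s3, log, txn, name, data, provenance, crash_before_commit=False):
    """Log a write: temporary data copy first, then the WAL records."""
    temp_key = f"tmp/{txn}/{name}"  # hypothetical naming scheme
    s3[temp_key] = data
    log.append({"txn": txn, "type": "prov", "name": name,
                "prov": provenance, "temp": temp_key})
    if not crash_before_commit:
        log.append({"txn": txn, "type": "commit"})  # hypothetical end marker


def commit_daemon(s3, simpledb, log):
    """Apply complete transactions; return the ids of incomplete ones."""
    pending, committed = {}, []
    while log:
        record = log.popleft()
        if record["type"] == "commit":
            committed.append(record["txn"])
        else:
            pending.setdefault(record["txn"], []).append(record)
    for txn in committed:
        for record in pending.pop(txn, []):
            simpledb[record["name"]] = record["prov"]     # provenance first
            s3[record["name"]] = s3.pop(record["temp"])   # copy, then delete temp
    return sorted(pending)


s3, simpledb, log = {}, {}, deque()
client_write(s3, log, 1, "a.txt", b"alpha", {"input": "gen"})
client_write(s3, log, 2, "b.txt", b"beta", {"input": "gen"}, crash_before_commit=True)
ignored = commit_daemon(s3, simpledb, log)
```

Because the data object only appears once the daemon has the whole transaction, a client crash leaves a stray temporary object rather than data without provenance.
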
Protocol (9)

Evaluation (1)
- Workloads
  - CVSROOT nightly backup
    - I/O intensive
    - 240 operations
  - Blast
    - Mix of compute and I/O operations
    - Provenance tree has a depth of 5
    - 10773 operations
  - Challenge
    - Mix of compute and I/O operations
    - Provenance tree has a depth of 11
    - 6179 operations

Evaluation (2)
[Figure panels: EC2 instance; Local machine]

Evaluation (3)
- Query performance
  - Q1: Retrieve all the provenance ever recorded
  - Q2: Retrieve the provenance of all versions of one object
  - Q3: Find all files that were directly output by Blast
  - Q4: Find all the descendants of files derived from Blast

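Q3 and Q4 differ in shape: Q3 is a single filter, while Q4 is a transitive closure. The sketch below runs both over a small hypothetical set of provenance records; the record layout and the file names are invented for illustration and are not the evaluation data.

```python
# Hypothetical provenance records: each object lists its direct inputs.
records = {
    "query.fasta": {"type": "file", "inputs": []},
    "db.fasta": {"type": "file", "inputs": []},
    "blast": {"type": "process", "inputs": ["query.fasta", "db.fasta"]},
    "hits.out": {"type": "file", "inputs": ["blast"]},
    "filter": {"type": "process", "inputs": ["hits.out"]},
    "summary.txt": {"type": "file", "inputs": ["filter"]},
}


def direct_outputs(records, process):
    """Q3-style query: files that list `process` as a direct input."""
    return sorted(name for name, rec in records.items()
                  if rec["type"] == "file" and process in rec["inputs"])


def descendants(records, roots):
    """Q4-style query: everything derived, directly or not, from `roots`."""
    found = set()
    frontier = set(roots)
    while frontier:
        # One pass over the records per level of the provenance tree.
        reached = {name for name, rec in records.items()
                   if frontier & set(rec["inputs"])} - found
        found |= reached
        frontier = reached
    return sorted(found)


q3 = direct_outputs(records, "blast")
q4 = descendants(records, q3)
```

Each iteration of the loop in `descendants` corresponds to one more round of lookups, which is why the cost of a Q4-style query grows with the depth of the provenance tree.
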
Evaluation (4)

Conclusion
- Definition of the properties that provenance systems must exhibit
- Design and implementation of three protocols for storing provenance and data on the cloud
- All three protocols have reasonable overhead in time and minimal financial overhead

Comment
- Economy
  - Provenance cannot increase profit directly
  - Customer loyalty
- Security
  - Provenance can ensure the correctness of files
  - But it may contain sensitive information

The End