1
Making Cloud Storage Provenance-Aware
Kiran-Kumar Muniswamy-Reddy, Peter Macko, and Margo Seltzer
Harvard School of Engineering and Applied Sciences
2
2/23/2009 Making a cloud Provenance-Aware - TaPP'09
The Cloud
- Next generation computing environment
  - Cheap: pay as you go
  - Provision resources (storage, CPU) as needed
  - Provides illusion of infinite resources
  - Companies with large batch-oriented tasks can get results quickly
- Cloud providers
  - Amazon Web Services (AWS)
  - Google AppEngine
  - Microsoft Azure
3
Provenance for the Cloud
- As apps move to the cloud, so will the data
  - Amazon hosts scientific data for free
- However, most cloud services are not designed to store provenance
- Why provenance?
  - Debug application results
  - Validate data sets
  - Improve search results
  - Regulatory compliance
4
Provenance Properties
- We identified the following properties
  - Read Correctness
  - Causal Ancestry Ordering
  - Queryable
5
Read Correctness
- Data must be what is described by provenance
  - Provenance accurately describes the data object
- Mechanisms
  - Atomicity: at storage time, either both provenance and data are stored or neither is
  - Consistency: at retrieval time, the data returned should be consistent with the provenance
6
Causal Ancestry Ordering
- The provenance and data of an ancestor object must be recorded in the provenance system
- No dangling references
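One way to read the "no dangling references" requirement is as a check over a dependency map: every ancestor that some object names must itself have an entry. The sketch below illustrates that reading only; the dict layout and the function name are assumptions made for the example and are not part of PASS.

```python
def dangling_references(provenance):
    """Return (object, ancestor) pairs whose ancestor has no recorded entry.

    `provenance` maps each recorded object to the list of objects it
    depends on. This layout is a hypothetical stand-in for a real store.
    """
    missing = []
    for obj, ancestors in provenance.items():
        for ancestor in ancestors:
            if ancestor not in provenance:
                missing.append((obj, ancestor))
    return sorted(missing)


# B depends on P, P depends on A, and A is recorded: nothing dangles.
complete = {"A": [], "P": ["A"], "B": ["P"]}
# Here P was never recorded, so B's reference to it dangles.
incomplete = {"A": [], "B": ["P"]}
```

Calling `dangling_references(incomplete)` reports the pair `("B", "P")`, which is the situation the property rules out.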
7
Efficient Query
- Provenance must be accessible to users who want to verify properties of their data or simply be aware of its lineage
- If provenance is not readily accessible, it is of questionable value
8
Goal
- How do we design protocols around current cloud services such that these properties are satisfied?
- Setting
  - Provenance-Aware Storage System (PASS) tracks and collects provenance
  - Primarily considered AWS
  - Used 3 services: S3, SimpleDB, SQS
9
Outline
- Introduction
- PASS Background
- Protocol 1: Standalone S3
- Protocol 2: S3 + SimpleDB
- Protocol 3: S3 + SimpleDB + SQS
- Analysis
- Conclusion and Status
10
Provenance-Aware Storage System
- Observes system calls that applications make and captures relationships between objects
  - P: read A
    - Generates record: P depends on A
    - Cache the record
  - P: write B
    - Generates record: B depends on P
    - Store both ‘B depends on P’ and ‘P depends on A’
- Mirrors data locally and caches provenance until we need to send it to AWS
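The read/write rule on this slide can be sketched as a small tracker: a read only caches a dependency record, while a write stores the new record together with the writer's cached ones. This is a toy model under stated assumptions, not the PASS implementation; the class, its method names, and the (dependent, dependency) tuple layout are hypothetical, and it only follows the direct dependencies of the writing process.

```python
class ProvenanceTracker:
    """Toy model of the read/write rule on this slide (names are hypothetical)."""

    def __init__(self):
        self.cached = []   # records generated but not yet stored
        self.stored = []   # records that have been persisted

    def on_read(self, process, obj):
        # A read makes the process depend on the object it read.
        self.cached.append((process, obj))

    def on_write(self, process, obj):
        # A write makes the object depend on the writing process...
        self.cached.append((obj, process))
        # ...and persists it together with the process's own cached records.
        flush = [r for r in self.cached if r[0] in (obj, process)]
        self.cached = [r for r in self.cached if r not in flush]
        self.stored.extend(flush)


tracker = ProvenanceTracker()
tracker.on_read("P", "A")    # cached: P depends on A
tracker.on_write("P", "B")   # stored: P depends on A, B depends on P
```

After the two calls, `tracker.stored` holds both records and the cache is empty, mirroring the "store both" step above.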
12
Simple Storage Service (S3)
- Object store: sizes from 1 byte to 5GB
- Objects identified by URI
- SOAP or REST interface
- Operations: PUT, GET, HEAD, COPY, DELETE
  - PUT: stores an object and its metadata (2KB limit)
  - HEAD: retrieves metadata of an object
- Cost: data storage + bandwidth + number of operations
- Eventual consistency
13
Architecture 1: Standalone S3
[Figure: architecture diagram. Components: Application, PASS, S3. Data flow label: Prov+Data. Regions: User, System.]
14
Protocol 1: Standalone S3
[Figure: sequence diagram between PASS and S3. Messages: PUT:(Prov >1KB), OK.]
15
Protocol 1: Standalone S3
[Figure: sequence diagram between PASS and S3. Messages: PUT:Data, OK.]
16
Properties
[Table. Columns: Arch | Read Correctness (Atomicity, Consistency) | Causal Ordering | Efficient Query. Row: S3. Cell values did not survive extraction.]
18
SimpleDB
- Service providing database functionality
- Data model: items described by attribute-value pairs
  - 256 attributes maximum; name/value < 1KB
- Operations: PutAttributes, Query, QueryWithAttributes, and SELECT
  - Query returns items
  - QueryWithAttributes returns both items and attributes
- Cost: bandwidth + storage + number of operations + machine hours
19
Architecture 2: S3 + SimpleDB
[Figure: architecture diagram. Components: Application, PASS, S3, SimpleDB. Data flow labels: Data, Prov. Regions: User, System.]
20
Protocol 2: S3 + SimpleDB
[Figure: sequence diagram between PASS, S3, and SimpleDB. Messages: PUT:(rec > 1KB), OK, PutAttrs+, OK.]
21
Protocol 2: S3 + SimpleDB
[Figure: sequence diagram between PASS, S3, and SimpleDB. Messages: PUT:Data, OK.]
22
Properties
[Table. Columns: Arch | Read Correctness (Atomicity, Consistency) | Causal Ordering | Efficient Query. Row labels: S3, SimpleDB. Cell values did not survive extraction.]
24
Simple Queue Service (SQS)
- Distributed messaging system
- Queues are identified by URL
- Operations: SendMessage, ReceiveMessage, DeleteMessage
- VisibilityTimeout: a message will not be available for x seconds after a ReceiveMessage
- Limits: 8KB message size; at most 10 messages can be received per call
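The visibility timeout is easiest to see in a toy model: a received message is hidden for a while, and it comes back unless it has been deleted in the meantime. The class below is an in-memory stand-in written for this illustration and is not the SQS API; the class and method names are hypothetical, and the clock is passed in explicitly as `now` so the behavior is deterministic.

```python
class ToyQueue:
    """In-memory stand-in for a queue with a visibility timeout (not the SQS API)."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self.messages = {}      # message id -> body
        self.hidden_until = {}  # message id -> time at which it is visible again
        self.next_id = 0

    def send(self, body):
        self.next_id += 1
        self.messages[self.next_id] = body
        self.hidden_until[self.next_id] = 0
        return self.next_id

    def receive(self, now, max_messages=10):
        # Return visible messages and hide them for `visibility_timeout` seconds.
        batch = []
        for msg_id, body in self.messages.items():
            if len(batch) == max_messages:
                break
            if self.hidden_until[msg_id] <= now:
                self.hidden_until[msg_id] = now + self.visibility_timeout
                batch.append((msg_id, body))
        return batch

    def delete(self, msg_id):
        self.messages.pop(msg_id, None)
        self.hidden_until.pop(msg_id, None)


queue = ToyQueue(visibility_timeout=30)
msg_id = queue.send("P depends on A")
first = queue.receive(now=0)     # delivered, then hidden until t=30
second = queue.receive(now=10)   # still hidden: nothing returned
third = queue.receive(now=30)    # not deleted, so it is delivered again
queue.delete(msg_id)
fourth = queue.receive(now=100)  # deleted: gone for good
```

The redelivery at `now=30` is what lets a consumer that crashed after receiving a message pick it up again later.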
25
Architecture 3: S3 + SimpleDB + SQS
[Figure: architecture diagram. Components: Application, PASS, S3, SimpleDB, Queue1. Data flow labels: Data, Prov. Regions: User, System.]
26
Protocol 3: S3 + SimpleDB + SQS
[Figure: sequence diagram between PASS, SimpleDB, SQS, Commitd, and S3. Messages, in extraction order: PUT: Temp copy, OK, SndMsg+, OK, RecvMsg+, COPY, OK, OK, PutAttrs+.]
27
Protocol 3: S3 + SimpleDB + SQS
[Figure: sequence diagram between PASS, SimpleDB, SQS, Commitd, and S3. Messages, in extraction order: DelMsg+, DEL:CPY, OK.]
28
Idempotency
- SimpleDB, S3, and SQS are idempotent
- If a commit daemon crashes, comes back up, and processes a transaction again, there will not be errors
29
Properties
[Table. Columns: Arch | Read Correctness (Atomicity, Consistency) | Causal Ordering | Efficient Query. Row labels: S3, SimpleDB, SQS. Cell values did not survive extraction.]
31
Analysis
- Extracted provenance by running three workloads
  - Linux compile
  - Blast
  - Provenance Challenge
- Compute cost to store and query provenance
  - Number of ops
  - Bandwidth
32
Storage Cost
        Raw       P1               P2                 P3
Data    1.27GB    121.8MB (9.3%)   167.8MB (13.6%)    421.4MB (32.2%)
Ops     31,180    24,952 (80.0%)   168,514 (540.5%)   231,287 (741.7%)
P1 = S3; P2 = S3 + SimpleDB; P3 = S3 + SimpleDB + SQS
33
Query Cost
1. Dump the provenance of a given object
   - Ran it on all objects for statistical significance
2. Find all the files that were outputs of blast.
3. Find all the descendants of files derived from blast.
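Queries 2 and 3 amount to a filter followed by a graph traversal. The sketch below shows the idea over an in-memory dependency map; it is an illustration under assumptions, not the query code used in the talk. The dict layout, the function name, and the file names in the example graph are all made up.

```python
from collections import deque


def descendants(depends_on, roots):
    """Return every object that transitively depends on any object in `roots`.

    `depends_on` maps an object to the objects it depends on (hypothetical layout).
    """
    # Invert the edges so we can walk from ancestors to descendants.
    children = {}
    for obj, ancestors in depends_on.items():
        for ancestor in ancestors:
            children.setdefault(ancestor, set()).add(obj)

    seen = set()
    frontier = deque(roots)
    while frontier:
        current = frontier.popleft()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                frontier.append(child)
    return seen


# Made-up example graph: blast produced hits.out, which fed two later files.
graph = {
    "hits.out": ["blast"],
    "summary.txt": ["hits.out"],
    "plot.png": ["summary.txt"],
    "notes.txt": [],
}
blast_outputs = {obj for obj, deps in graph.items() if "blast" in deps}  # query 2
derived = descendants(graph, blast_outputs)                               # query 3
```

Against a remote store each traversal step becomes one or more lookups, so the number of operations grows with the depth and fan-out of the graph.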
34
Query results
        S3                   SimpleDB
Query   Data      Ops        Data      Ops
1       121.8MB   56,132     51.24MB   71,825
2       121.8MB   56,132     2.8KB     6
3       121.8MB   56,132     13.8KB    31
36
Conclusions
- Identified the properties that need to be satisfied for storing provenance in the cloud
- Presented various protocols for storing provenance and data on the cloud
- The cost of storing provenance is reasonable
37
Status
- System almost ready
  - Plan to submit it to Symposium on Operating Systems Principles (SOSP)
- Really hard to drive up the cost
  - Jan bill = $1.95
  - Feb bill = $9.38
38
Extra
39
Protocol 1: Standalone S3
On file close:
- Convert the provenance into attribute-value pairs as required by S3
  - If (sizeof(record) > 1KB)
    - Store the record in a separate S3 object
    - Replace attribute-value pair with pointer to this object
- Upload the file using PUT
  - Arguments: object, attributes
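The overflow rule for large records can be sketched as follows. This is a minimal illustration, assuming a plain dict in place of S3; the function name, the overflow key scheme, and the "ptr:" prefix are invented for the example and are not taken from the talk.

```python
LIMIT = 1024  # the 1KB threshold from the slide


def spill_large_records(name, records, object_store):
    """Return attributes for `name`, moving oversized values into `object_store`.

    `object_store` is a plain dict standing in for S3; the key scheme and
    the "ptr:" prefix are made up for this sketch.
    """
    attributes = {}
    for key, value in records.items():
        if len(value.encode("utf-8")) > LIMIT:
            overflow_key = f"{name}.prov.{key}"
            object_store[overflow_key] = value       # separate object
            attributes[key] = "ptr:" + overflow_key  # pointer replaces the value
        else:
            attributes[key] = value
    return attributes


store = {}
records = {"input": "A", "env": "x" * 2000}
attrs = spill_large_records("B", records, store)
store["B"] = ("file contents", attrs)  # stands in for the PUT of object + attributes
```

Here the short `input` record stays inline while the 2000-byte `env` record is written as its own object and replaced by a pointer.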
40
Protocol 2: S3 + SimpleDB
On file close:
1. Convert the provenance into attribute-value pairs as required by SimpleDB
   - Additional record: md5sum of (file contents + version)
2. If (sizeof(record) > 1KB)
   - Store the record in a separate S3 object
   - Replace attribute-value pair with pointer to this object
3. Issue PutAttributes: store the provenance
   - One item (= one PutAttributes) per version of the object
4. Upload the file to S3 using PUT
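One plausible use of the extra md5sum record is a consistency check at read time: recompute the digest from the data that comes back and compare it with the one stored alongside the provenance. The sketch below assumes plain dicts in place of SimpleDB and S3, assumes the version is kept next to the data, and assumes the digest is taken over the contents followed by the version number; the slide only says "file contents + version", so all function names and layouts here are hypothetical.

```python
import hashlib


def content_digest(contents: bytes, version: int) -> str:
    # md5 over the file contents followed by the version number.
    # How the two are combined is an assumption; the slide only says "+".
    return hashlib.md5(contents + str(version).encode("ascii")).hexdigest()


def store_version(name, version, contents, provenance, simpledb, s3):
    # Step 3: one item per version holds the provenance and the digest.
    item = dict(provenance)
    item["md5"] = content_digest(contents, version)
    simpledb[(name, version)] = item
    # Step 4: upload the data itself.
    s3[name] = (version, contents)


def read_is_consistent(name, simpledb, s3):
    # The data matches its provenance only if the stored digest agrees.
    version, contents = s3[name]
    item = simpledb.get((name, version))
    return item is not None and item["md5"] == content_digest(contents, version)


simpledb, s3 = {}, {}
store_version("B", 1, b"hello", {"input": "P"}, simpledb, s3)
ok = read_is_consistent("B", simpledb, s3)
s3["B"] = (1, b"changed")  # data no longer matches the recorded provenance
bad = read_is_consistent("B", simpledb, s3)
```

The first check succeeds and the second fails, which is how a reader could detect data and provenance that have drifted apart.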
41
Protocol 3: S3 + SimpleDB + SQS (I)
Log phase: log data on a queue
1. Store a copy of the file in a temporary location on S3
2. Allocate a transaction ID (UUID)
3. Split provenance into chunks of 8KB and enqueue them on an SQS queue
   - Tag each message with the transaction ID
   - One additional record that has a pointer to the temp S3 object
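The log phase can be sketched with a dict standing in for S3 and a list standing in for the queue. This is an illustration under assumptions rather than the authors' code: the message fields (`txn`, `seq`, `total`, `body`, `pointer`, `target`) and the `tmp/` key prefix are hypothetical, and the sketch ignores that tags would also count toward a real message size limit.

```python
import uuid

CHUNK = 8 * 1024  # the 8KB chunk size from the slide


def log_phase(name, contents, provenance: bytes, s3, queue):
    """Steps 1-3 with a dict for S3 and a list for the queue (both stand-ins)."""
    temp_key = f"tmp/{name}"          # hypothetical temporary location
    s3[temp_key] = contents           # 1. temporary copy
    txn = str(uuid.uuid4())           # 2. transaction ID
    chunks = [provenance[i:i + CHUNK] for i in range(0, len(provenance), CHUNK)]
    for seq, chunk in enumerate(chunks):
        # 3. every message carries the transaction ID
        queue.append({"txn": txn, "seq": seq, "total": len(chunks), "body": chunk})
    # One additional record pointing at the temporary object.
    queue.append({"txn": txn, "pointer": temp_key, "target": name})
    return txn


s3, queue = {}, []
txn = log_phase("B", b"data", b"p" * 20000, s3, queue)
```

With 20000 bytes of provenance this enqueues three chunk messages plus the pointer record, all tagged with the same transaction ID.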
42
Protocol 3: S3 + SimpleDB + SQS (II)
Commit phase: move data from SQS to S3 and SimpleDB
1. ReceiveMessage: get messages from the queue and assemble the packets
2. Store the provenance in SimpleDB using the PutAttributes call
   - Take care of overflows
3. Execute an S3 COPY to copy the object from its temporary location to its permanent one
4. Delete the messages from SQS
5. Delete the temporary file copy
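The commit steps can be sketched so that running them twice leaves the same state, which is the property a restarted commit daemon relies on. The code below is a toy under assumptions: dicts stand in for S3 and SimpleDB, a list stands in for the queue, and the message fields (`txn`, `seq`, `body`, `pointer`, `target`) are a hypothetical format chosen for the example. Overflow handling is left out.

```python
def commit(txn, queue, s3, simpledb):
    """Apply one logged transaction; safe to run again after a crash."""
    mine = [m for m in queue if m["txn"] == txn]
    if not mine:
        return  # nothing left to do: already committed
    pointer = next(m for m in mine if "pointer" in m)
    chunks = sorted((m for m in mine if "body" in m), key=lambda m: m["seq"])
    # 1-2. assemble the chunks and store the provenance (overwritten on replay)
    simpledb[pointer["target"]] = b"".join(m["body"] for m in chunks)
    # 3. copy from the temporary location to the permanent one
    if pointer["pointer"] in s3:
        s3[pointer["target"]] = s3[pointer["pointer"]]
    # 4. delete the messages, 5. delete the temporary copy
    queue[:] = [m for m in queue if m["txn"] != txn]
    s3.pop(pointer["pointer"], None)


s3 = {"tmp/B": b"data"}
simpledb = {}
queue = [
    {"txn": "t1", "seq": 1, "body": b"on P"},
    {"txn": "t1", "seq": 0, "body": b"B depends "},
    {"txn": "t1", "pointer": "tmp/B", "target": "B"},
]
commit("t1", queue, s3, simpledb)
state_after_first = (dict(s3), dict(simpledb), list(queue))
commit("t1", queue, s3, simpledb)  # replay, as after a daemon restart
```

Every write is an overwrite with the same value and every delete tolerates a missing target, so a replay before step 4 rewrites identical state and a replay after it does nothing.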