
1 SPANStore: Cost-Effective Geo-Replicated Storage Spanning Multiple Cloud Services
Zhe Wu, Michael Butkiewicz, Dorian Perkins, Ethan Katz-Bassett, Harsha V. Madhyastha
UC Riverside and USC

2 Geo-distributed Services for Low Latency

3 Cloud Services Simplify Geo-distribution

4 Need for Geo-Replication
- Data uploaded by a user may be viewed/edited by users in other locations
  - Social networking (Facebook, Twitter)
  - File sharing (Dropbox, Google Docs)
  => Geo-replication of data is necessary
- Isolated storage service in each cloud data center => application needs to handle replication itself

5 Geo-replication on Cloud Services
- Lots of recent work on enabling geo-replication: Walter (SOSP'11), COPS (SOSP'11), Spanner (OSDI'12), Gemini (OSDI'12), Eiger (NSDI'13), ...
  - Faster performance or stronger consistency
- Added consideration on cloud services: minimizing cost

6 Outline
- Problem and motivation
- SPANStore overview
- Techniques for reducing cost
- Evaluation

7 SPANStore
- Key-value store (GET/PUT interface) spanning cloud storage services
- Main objective: minimize cost while satisfying application requirements:
  - Latency SLOs
  - Consistency (eventual vs. sequential)
  - Fault tolerance
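For concreteness, here is a minimal sketch of the GET/PUT interface described on this slide. The class, method, and field names are illustrative assumptions, not SPANStore's actual API.

```python
from dataclasses import dataclass

# Hypothetical application-facing interface: names and fields are
# illustrative assumptions, not SPANStore's actual API.
@dataclass
class Requirements:
    get_slo_ms: int        # latency SLO for GETs
    put_slo_ms: int        # latency SLO for PUTs
    consistency: str       # "eventual" or "sequential"
    fault_tolerance: int   # number of data-center failures to tolerate

class KeyValueStore:
    """GET/PUT interface backed by storage services in multiple clouds."""
    def __init__(self, requirements: Requirements):
        self.requirements = requirements
        self._data = {}    # stand-in for remote cloud storage services

    def put(self, key: str, value: bytes) -> None:
        # A real implementation would write to the replica set chosen by
        # the placement manager; here we simply store locally.
        self._data[key] = value

    def get(self, key: str) -> bytes:
        # A real implementation would read from the cheapest replica that
        # satisfies the GET SLO; here we simply read locally.
        return self._data[key]
```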

8 SPANStore Overview
[Figure: an application in each data center issues requests through the SPANStore library, which performs metadata lookups and reads/writes data across data centers A-D according to the optimal replication policy, returning data/ACKs to the application.]

9 SPANStore Overview
[Figure: a central Placement Manager takes the application's workload and its latency, consistency, and fault-tolerance requirements (application input) together with inter-DC latencies and pricing policies (SPANStore characterization), and pushes the resulting replication policy to the SPANStore instances in data centers A-D.]

10 Outline
- Problem and motivation
- SPANStore overview
- Techniques for reducing cost
- Evaluation

11 Questions to be addressed for every object:
- Where to store replicas
- How to execute PUTs and GETs

12 Cloud Storage Service Cost
Storage service cost = storage cost (the amount of data stored) + request cost (the number of PUT and GET requests issued) + data transfer cost (the amount of data transferred out of the data center)
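A minimal sketch of this cost model as a function; the example prices correspond to the table on slide 15 below, but the function itself is only an illustration of how the three components add up, not SPANStore's code.

```python
def storage_service_cost(gb_stored: float,
                         num_puts: int,
                         num_gets: int,
                         gb_transferred_out: float,
                         storage_price_per_gb: float,
                         put_price_per_1k: float,
                         get_price_per_10k: float,
                         transfer_price_per_gb: float) -> float:
    """Sum of the three cost components on this slide."""
    storage_cost = gb_stored * storage_price_per_gb
    request_cost = (num_puts / 1_000) * put_price_per_1k + \
                   (num_gets / 10_000) * get_price_per_10k
    transfer_cost = gb_transferred_out * transfer_price_per_gb
    return storage_cost + request_cost + transfer_cost

# Example with the S3 US West prices from slide 15:
# 10 GB stored, 100k PUTs, 1M GETs, 50 GB transferred out.
print(storage_service_cost(10, 100_000, 1_000_000, 50,
                           0.095, 0.005, 0.004, 0.12))
```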

13 Low Latency SLO Requires High Replication in a Single-Cloud Deployment
[Figure: with a 100 ms latency bound, a deployment restricted to AWS regions needs replicas (R) in many of those regions.]

14 Technique 1: Harness Multiple Clouds
[Figure: with the same 100 ms latency bound, data centers from multiple clouds augment the AWS regions, giving a denser set of candidate replica locations (R).]

15 Price Discrepancies across Clouds

Cloud region  | Storage price ($/GB) | Data transfer price ($/GB) | GET requests ($/10,000) | PUT requests ($/1,000)
S3 US West    | 0.095                | 0.12                       | 0.004                   | 0.005
Azure Zone 2  | 0.095                | 0.19                       | 0.001                   | 0.0001
GCS           | 0.085                | 0.12                       | 0.01                    | ...
...           | ...                  | ...                        | ...                     | ...

Leveraging discrepancies judiciously can reduce cost
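As a rough worked illustration using only the sample prices above (the workload numbers are made up): serving 10 million GETs of 1 KB objects (roughly 10 GB of egress) from Azure Zone 2 costs about 1,000 x $0.001 = $1.00 in requests plus 10 x $0.19 = $1.90 in transfer, around $2.90 in total, while serving the same workload from GCS costs about 1,000 x $0.01 = $10.00 in requests plus 10 x $0.12 = $1.20 in transfer, around $11.20. For large objects with few requests the comparison flips, since GCS's transfer price is lower.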

16 Range of Candidate Replication Policies
Strategy 1: a single replica in the cheapest storage cloud. Drawback: high latencies.

17 Range of Candidate Replication Policies
Strategy 2: a few replicas to reduce latencies. Drawback: high data transfer cost.

18 Range of Candidate Replication Policies
Strategy 3: replicate everywhere. Drawbacks: high latency and cost of PUTs; high storage cost.
The optimal replication policy depends on: 1. application requirements, 2. workload properties.

19 High Variability of Individual Objects
- Analyzed the predictability of a Twitter workload: estimate each object's workload based on the same hour in the previous week
- 60% of hours have error higher than 50%
- 20% of hours have error higher than 100%
- Error can be as high as 1000%

20 Technique 2: Aggregate Workload Prediction per Access Set
- Observation: stability in the aggregate workload (diurnal and weekly patterns)
- Classify objects by access set: the set of data centers from which an object is accessed
- Leverage application knowledge of the sharing pattern
  - Dropbox/Google Docs know the users that share a file
  - Facebook controls every user's news feed

21 Technique 2: Aggregate Workload Prediction per Access Set
[Figure: the aggregate workload per access set, estimated from the same hour in the previous week, is more stable and predictable than individual-object workloads.]
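A minimal sketch of this classification-and-prediction step, assuming a hypothetical per-object access log format; all names and structures are illustrative, not SPANStore's.

```python
from collections import defaultdict

def predict_aggregate_workload(access_log, hour, week_hours=7 * 24):
    """Group accesses by access set and predict this hour's aggregate
    workload from the same hour in the previous week.

    access_log: dict mapping hour index -> list of (object_id, data_center)
                access records (an illustrative format, not SPANStore's).
    Returns: dict mapping access set (frozenset of DCs) -> predicted #accesses.
    """
    # Access set of an object = set of data centers from which it is accessed.
    access_set = defaultdict(set)
    for h, records in access_log.items():
        for obj, dc in records:
            access_set[obj].add(dc)

    # Aggregate last week's same-hour accesses per access set.
    prediction = defaultdict(int)
    for obj, dc in access_log.get(hour - week_hours, []):
        prediction[frozenset(access_set[obj])] += 1
    return dict(prediction)
```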

22 Optimizing Cost for GETs and PUTs
[Figure: a GET is served from a replica (R) in a cheap (request + data transfer) data center.]
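One way to realize this replica selection, as a hedged sketch: among the replicas that satisfy the GET SLO, pick the one with the lowest combined request and data transfer cost. The data structures and function name are assumptions for illustration only.

```python
def choose_get_replica(replicas, object_size_gb, latency_ms, prices, get_slo_ms):
    """Pick the replica that serves a GET most cheaply while meeting the SLO.

    replicas:   list of data-center names holding the object
    latency_ms: dict dc -> round-trip latency from the requesting data center
    prices:     dict dc -> (get_price_per_request, transfer_price_per_gb)
    (All structures are illustrative assumptions, not SPANStore's internals.)
    """
    candidates = [dc for dc in replicas if latency_ms[dc] <= get_slo_ms]
    if not candidates:
        raise ValueError("no replica satisfies the GET SLO")

    def cost(dc):
        get_price, transfer_price = prices[dc]
        return get_price + object_size_gb * transfer_price

    return min(candidates, key=cost)
```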

23 Technique 3: Relay Propagation
[Figure: a PUT must be propagated to all replicas (R); edges are labeled with data transfer prices ($0.12-$0.25/GB). With asynchronous propagation (no latency constraint), updates can be relayed through replicas over cheaper paths instead of being sent directly.]

24 Technique 3: Relay Propagation
[Figure: same setting as the previous slide. Asynchronous propagation (no latency constraint) can always use the cheapest relay path, but synchronous propagation (bounded by the PUT latency SLO) can only use a relay if it does not violate the SLO.]
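A minimal sketch of the relay idea, considering only direct versus one-hop relay propagation and treating each destination independently (which understates the savings, since a transfer to a relay can be shared across destinations); all data structures are illustrative assumptions.

```python
def propagation_plan(origin, replicas, price, latency, object_size_gb,
                     put_slo_ms=None):
    """For each destination replica, pick direct or one-hop relay propagation,
    whichever is cheaper, subject to the PUT SLO if one is given.

    price[a][b]   : data transfer price ($/GB) from data center a to b
    latency[a][b] : one-way latency (ms) from a to b
    (Illustrative structures; SPANStore plans propagation via its placement
    manager, not per request, and shares relay transfers across destinations.)
    """
    plan = {}
    for dst in replicas:
        if dst == origin:
            continue
        # Direct propagation.
        best = ([origin, dst], object_size_gb * price[origin][dst])
        # One-hop relays through other replicas.
        for relay in replicas:
            if relay in (origin, dst):
                continue
            total_latency = latency[origin][relay] + latency[relay][dst]
            if put_slo_ms is not None and total_latency > put_slo_ms:
                continue  # synchronous propagation must respect the SLO
            cost = object_size_gb * (price[origin][relay] + price[relay][dst])
            if cost < best[1]:
                best = ([origin, relay, dst], cost)
        plan[dst] = best
    return plan
```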

25 Summary
- Insights to reduce cost:
  - Multi-cloud deployment
  - Use aggregate workload per access set
  - Relay propagation
- The placement manager uses an ILP to combine these insights (a toy sketch follows)
- Other techniques: metadata management, two-phase locking protocol, asymmetric quorum sets
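To suggest what such an ILP can look like, here is a toy replica-placement formulation using the PuLP library. The objective (storage plus GET transfer cost) and constraints (a GET latency bound per access data center) are heavily simplified, and all data is made up, so this only gestures at the structure of the real optimization.

```python
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, PULP_CBC_CMD

# Toy data (made up): candidate data centers, prices, latencies, and the
# access set of one object class.
dcs = ["A", "B", "C"]
storage_price = {"A": 0.095, "B": 0.095, "C": 0.085}   # $/GB stored
transfer_price = {"A": 0.12, "B": 0.19, "C": 0.12}     # $/GB transferred out
latency = {("A","A"): 5, ("A","B"): 80, ("A","C"): 150,
           ("B","A"): 80, ("B","B"): 5, ("B","C"): 90,
           ("C","A"): 150, ("C","B"): 90, ("C","C"): 5}
access_set = ["A", "B"]          # DCs from which the object is read
gets_gb = {"A": 50, "B": 20}     # GB fetched per period from each access DC
size_gb, slo_ms = 1.0, 100

prob = LpProblem("toy_replica_placement", LpMinimize)
replica = {d: LpVariable(f"replica_{d}", cat=LpBinary) for d in dcs}
# serve[a, d] = 1 if access DC a reads from replica DC d
serve = {(a, d): LpVariable(f"serve_{a}_{d}", cat=LpBinary)
         for a in access_set for d in dcs}

# Objective: storage cost of replicas + data transfer cost of GETs.
prob += lpSum(replica[d] * size_gb * storage_price[d] for d in dcs) + \
        lpSum(serve[a, d] * gets_gb[a] * transfer_price[d]
              for a in access_set for d in dcs)

for a in access_set:
    prob += lpSum(serve[a, d] for d in dcs) == 1   # each access DC reads from one replica
    for d in dcs:
        prob += serve[a, d] <= replica[d]          # only from an actual replica
        if latency[(a, d)] > slo_ms:
            prob += serve[a, d] == 0               # respect the GET SLO

prob.solve(PULP_CBC_CMD(msg=False))
print({d: int(replica[d].value()) for d in dcs})
```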

26 Outline
- Problem and motivation
- SPANStore overview
- Techniques for reducing cost
- Evaluation

27 Evaluation Scenario
- Application is deployed on EC2; SPANStore is deployed across S3, Azure, and GCS
- Simulations to evaluate cost savings
- Deployment to verify application requirements: Retwis, ShareJS

28 Simulation Settings
- Compare SPANStore against: replicate everywhere, single replica, single-cloud deployment
- Application requirements:
  - Sequential consistency
  - PUT SLO: the minimum SLO that replicate-everywhere satisfies
  - GET SLO: the minimum SLO that a single replica satisfies

29 SPANStore Enables Cost Savings across Disparate Workloads
[Figure: cost savings for four workload classes]
- #1: big objects, more GETs (lots of data transfers from replicas): savings by reducing data transfer
- #2: big objects, more PUTs (lots of data transfers to replicas): savings by relay propagation
- #3: small objects, more GETs (lots of GET requests): savings by the price discrepancy of GET requests
- #4: small objects, more PUTs (lots of PUT requests): savings by the price discrepancy of PUT requests

30 Deployment Settings
- Retwis: scaled-down Twitter workload
  - GET: read timeline
  - PUT: make post
  - Insert: read a follower's timeline and append the post to it
- Requirements: eventual consistency, 90th-percentile PUT/GET SLO = 100 ms
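A minimal sketch of how such an insert could be composed from the GET/PUT interface sketched after slide 7; the timeline key scheme and JSON encoding are assumptions, not Retwis's actual schema.

```python
import json

def insert_post(store, follower_id: str, post: str) -> None:
    """Read a follower's timeline and append the new post to it.

    Composed from the GET/PUT interface sketched earlier; the
    'timeline:<user>' key format is an assumption for illustration.
    """
    key = f"timeline:{follower_id}"
    try:
        timeline = json.loads(store.get(key))
    except KeyError:
        timeline = []          # no timeline stored yet
    timeline.append(post)
    store.put(key, json.dumps(timeline).encode())
```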

31 SPANStore Meets SLOs
[Figure: 90th-percentile latencies compared against the PUT/GET SLO and the insert SLO; SPANStore meets the SLOs.]

32 Conclusions
SPANStore:
- Minimizes cost while satisfying latency, consistency, and fault-tolerance requirements
- Uses multiple cloud providers for greater data center density and pricing discrepancies
- Judiciously determines the replication policy based on workload properties and application needs


