Cloud Storage Challenges, Dr. Dinkar Sitaram (©2011 Hewlett-Packard Company and Vertica Confidential)



2 Overview
–Types of cloud storage
–Building cloud-scale storage
–Challenges: theoretical considerations
–Dealing with the challenges
Based on Moving to the Cloud by Dinkar Sitaram & Geetha Manjunath, to be published by Elsevier

3 Types of cloud storage

4 File-based cloud storage
–Allows storage of files in the cloud
–Amazon S3, Windows Azure, …
–Built on top of HTTP
–Amazon S3 overview
  Create buckets and objects
  GET roject/file.c
  No directories: only file names (a "/" is simply part of the name)
  Need AWS Access Key and AWS Secret Key
–Region: geographical
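The "no directories" point can be illustrated with a toy in-memory bucket. This is only a sketch: `ToyBucket`, `put`, `get`, and `list_keys` are hypothetical names chosen for illustration, not the Amazon S3 API, and no HTTP requests or access keys are involved.

```python
class ToyBucket:
    """In-memory stand-in for a bucket: a flat map from object name to bytes."""

    def __init__(self, name):
        self.name = name
        self._objects = {}

    def put(self, key, data):
        # The whole string, slashes included, is the object's name.
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

    def list_keys(self, prefix=""):
        # "Listing a directory" is just filtering names by prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


bucket = ToyBucket("example-bucket")
bucket.put("project/file.c", b"int main(void) { return 0; }")
bucket.put("project/docs/readme.txt", b"hello")
bucket.put("notes.txt", b"misc")
print(bucket.list_keys("project/"))
```

The bucket never stores a `project` directory; the apparent hierarchy comes entirely from the prefix filter.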

5 Database-oriented cloud storage
–Offers a database service
–Examples: Amazon RDS (MySQL), Windows Azure SQL
–RDS examples
  Can administer (e.g., create, replicate) a database using Amazon RDS APIs
  Db.createDBInstanceAsync(parms) creates a database
  Use JDBC APIs to build applications
  ResultSet rs = stmt.executeQuery("SELECT * FROM Employee");
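The application side of this slide is ordinary SQL access through a standard database API. A minimal sketch of the same query follows, with two loud assumptions: Python's built-in `sqlite3` module and an in-memory database stand in for JDBC and a hosted MySQL instance, and the `name` and `title` columns are invented for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for a hosted MySQL instance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (name TEXT, title TEXT)")
conn.execute(
    "INSERT INTO Employee (name, title) VALUES (?, ?)", ("Dinkar", "Architect")
)
conn.commit()

# Counterpart of stmt.executeQuery("SELECT * FROM Employee") on the slide.
rows = conn.execute("SELECT * FROM Employee").fetchall()
for name, title in rows:
    print(name, title)
conn.close()
```

With a hosted service only the connection step would differ; the SQL itself is unchanged, which is the appeal of database-oriented cloud storage.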

6 Key-value stores
–Database consists of <key, value> pairs
  No schema as in relational databases
  Typically data need not be normalized
  More flexible than RDBMS, scales due to fewer restrictions
  More work in the application (e.g., checking valid values) to guarantee traditional RDBMS qualities
–Examples: Amazon SimpleDB, Google BigTable, Hadoop HBase
–Programming example (SDB)
  Google SimpleJDBC
  String insert = "INSERT INTO employees (name, title) VALUES ('Dinkar', 'Architect')";
  int val = st.executeUpdate(insert);
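The point about extra work in the application can be sketched as follows. Everything here is hypothetical: `ToyKeyValueStore`, `put_employee`, and the `VALID_TITLES` rule are illustrative names, not part of SimpleDB, BigTable, or HBase.

```python
class ToyKeyValueStore:
    """Schema-less store: each key maps to an arbitrary dict of attributes."""

    def __init__(self):
        self._items = {}

    def put(self, key, attributes):
        self._items[key] = dict(attributes)

    def get(self, key):
        return self._items.get(key)


VALID_TITLES = {"Architect", "Engineer", "Manager"}  # hypothetical business rule


def put_employee(store, name, attributes):
    # The store accepts anything, so the application enforces valid values.
    title = attributes.get("title")
    if title not in VALID_TITLES:
        raise ValueError(f"invalid title: {title!r}")
    store.put(name, attributes)


store = ToyKeyValueStore()
put_employee(store, "Dinkar", {"title": "Architect"})
# Items need not share attributes: the store itself enforces no schema.
store.put("employee-2", {"title": "Engineer", "office": "Building 5"})
```

A relational database would reject a bad value through a constraint; here the check lives in `put_employee`, and any caller that bypasses it can store inconsistent data.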

7 XML databases
–Store XML documents
–Examples: MongoDB
  Stores JSON documents
  { Name: "Dinkar", Attributes: { Sex: "M", Title: "Architect" } }
  Documents can have pointers to other documents
  Index on any attribute (including embedded): db.orders.ensureIndex()
  Searching: db.orders.find()
–XML DBs are midway between key-value stores and RDBMS
  Explicitly create indices
  More complex structures
  Some XML DBs, e.g., CouchDB, offer transactions
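Indexing on an embedded attribute can be sketched with plain dictionaries. This is a toy model, not the MongoDB API: `ToyCollection`, `ensure_index`, `find`, and the dotted-path convention are assumptions made for illustration.

```python
from collections import defaultdict


def get_path(doc, dotted_path):
    """Follow a dotted path such as 'Attributes.Title' into nested dicts."""
    value = doc
    for part in dotted_path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


class ToyCollection:
    def __init__(self):
        self.docs = []
        self.indexes = {}  # dotted path -> {value -> [positions in self.docs]}

    def insert(self, doc):
        self.docs.append(doc)
        pos = len(self.docs) - 1
        for path, index in self.indexes.items():
            index[get_path(doc, path)].append(pos)

    def ensure_index(self, path):
        # Indexes are created explicitly, as the slide notes.
        index = defaultdict(list)
        for pos, doc in enumerate(self.docs):
            index[get_path(doc, path)].append(pos)
        self.indexes[path] = index

    def find(self, path, value):
        if path in self.indexes:  # indexed lookup
            return [self.docs[i] for i in self.indexes[path].get(value, [])]
        return [d for d in self.docs if get_path(d, path) == value]  # full scan


employees = ToyCollection()
employees.insert({"Name": "Dinkar", "Attributes": {"Sex": "M", "Title": "Architect"}})
employees.insert({"Name": "Example", "Attributes": {"Title": "Engineer"}})
employees.ensure_index("Attributes.Title")
matches = employees.find("Attributes.Title", "Architect")
print([d["Name"] for d in matches])
```

Queries on a path without an index still work but fall back to scanning every document, which is why explicit index creation matters.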

8 Building cloud-scale storage

9 Cloud storage requirements
–Scaling to cloud-scale: partitioning
–Availability: replication

10 Partitioning strategies
–Similar to methods for partitioning databases
–Round-robin on partitioning attributes
  Loses associativity
–Hash partitioning
–Range-based
–Directory-based
  Memcached
  Can provide, e.g., geographical partitioning
–References: Parallel database systems: the future of high performance database systems, by DeWitt, D and Gray, J, Communications of the ACM, Volume 35 Issue 6, June 1992.
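The four strategies can be compared side by side as small placement functions. This is a sketch under assumptions: the function names, the use of MD5 as the stable hash, and the sample boundaries and directory are all illustrative choices, not taken from any particular system.

```python
import bisect
import hashlib


def round_robin_partition(record_number, num_partitions):
    # Spreads load evenly, but a record's location cannot be derived from its key.
    return record_number % num_partitions


def hash_partition(key, num_partitions):
    # A stable hash (not Python's per-process salted hash()) keeps placement repeatable.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def range_partition(key, upper_bounds):
    # upper_bounds is sorted; partition i holds keys <= upper_bounds[i],
    # and the last partition holds everything above the final bound.
    return bisect.bisect_left(upper_bounds, key)


def directory_partition(key, directory, default=0):
    # An explicit lookup table allows arbitrary placement, e.g. by geography.
    return directory.get(key, default)


print(range_partition("dinkar", ["g", "p"]))
print(directory_partition("EU", {"US": 0, "EU": 1, "APAC": 2}))
```

Round-robin needs only a counter, which is why lookups by key must then search every partition; the other three let a client compute or look up the partition from the key alone.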

11 Amazon availability
–Multiple availability zones per region
  Zone failures are isolated from each other
–Data replicated across 3 availability zones by default

12 Challenges: Theoretical considerations

13 CAP theorem
–Fundamental limitation of distributed systems
–No distributed system can satisfy all three properties below
  Conjectured in [Brewer00], proved in [LynGil02] by considering a two-node cluster
  Consistency: all operations appear to be serialized on a non-distributed object
  Availability: every operation returns a result
  Partition-tolerance: the system keeps operating even if an arbitrary number of messages between service nodes are lost
–References
  1. [Brewer00] Towards Robust Distributed Systems, by Eric A. Brewer, ACM Symposium on Principles of Distributed Computing, July 2000, Portland, Oregon
  2. [LynGil02] Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services, by Nancy Lynch and Seth Gilbert, ACM SIGACT News, Volume 33 Issue 2 (2002)

14 2-node example
1. Servers replicated for availability
2. If the network partitions
3. Allow servers to operate independently (inconsistent) OR
4. Bring servers down (no availability)
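The choice between steps 3 and 4 can be simulated with two replicas of a single value. This is a toy model under assumptions: `TwoNodeService`, its `mode` flag ("AP" or "CP"), and the `partitioned` switch are hypothetical names, and real systems detect partitions rather than being told about them.

```python
class Replica:
    def __init__(self):
        self.value = None


class TwoNodeService:
    """Two replicas of one value; mode is 'AP' (stay available) or 'CP' (stay consistent)."""

    def __init__(self, mode):
        self.mode = mode
        self.replicas = [Replica(), Replica()]
        self.partitioned = False

    def write(self, node, value):
        if self.partitioned:
            if self.mode == "CP":
                # Step 4: refuse service rather than let the replicas diverge.
                raise RuntimeError("unavailable during partition")
            # Step 3: accept the write locally; the other replica is now stale.
            self.replicas[node].value = value
            return
        for replica in self.replicas:  # normal case: replicate to both
            replica.value = value

    def read(self, node):
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable during partition")
        return self.replicas[node].value


ap = TwoNodeService("AP")
ap.write(0, "v1")
ap.partitioned = True
ap.write(0, "v2")
print(ap.read(0), ap.read(1))  # the two replicas now disagree
```

Constructing the service with `"CP"` instead makes every operation during the partition raise an error, which is the loss of availability described in step 4.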

15 Practical example: Netflix
–Netflix: video on demand over the Internet
–Runs on Amazon cloud
–Consider the following scenario
  User at TV updates list of favorites
  Load balancer sends update to server 1
  Set-top box requests favorites list
  Load balancer sends request to server 2
  Is the returned result consistent? Depends on whether the update has reached server 2!
–Comparing NoSQL Availability Models by Adrian Cockcroft, 0/comparing-nosql-availability-models.html

16 Dealing with inconsistency predicted by CAP theorem

17 Relaxed consistency
–Consistency can be relaxed
  Weak consistency: system does not guarantee to return consistent results
  Eventual consistency: if there are no further updates, the system will become consistent. If updates are infrequent, can wait for some time to get a consistent value
  Read-your-writes consistency: a client performing a read after a write will always see its own updates
  Session consistency: consistency within a session
–Amazon S3
  US Standard Region: eventual consistency
  US West, EU, Asia Pacific Regions: read-your-writes consistency for new object creation, eventual consistency for overwrites and deletes
–Reference: Eventually Consistent, by Werner Vogels, Communications of the ACM, January 2009
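Read-your-writes consistency within a session can be sketched with version numbers. This is one possible approach under assumptions, not how Amazon S3 implements it: `VersionedReplica`, `Session`, and the fall-back-to-primary rule are hypothetical.

```python
class VersionedReplica:
    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, version, value):
        if version > self.version:
            self.version, self.value = version, value


class Session:
    """Client session that refuses to read state older than its own last write."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self.last_written_version = 0

    def write(self, value):
        # Writes go to the primary; propagation to replicas happens later.
        self.primary.apply(self.primary.version + 1, value)
        self.last_written_version = self.primary.version

    def read(self, replica_index):
        replica = self.replicas[replica_index]
        if replica.version >= self.last_written_version:
            return replica.value
        # The replica has not caught up with this session's write: use the primary.
        return self.primary.value


primary, lagging = VersionedReplica(), VersionedReplica()
session = Session(primary, [lagging])
session.write("favorites-v1")
print(session.read(0))  # served from the primary because the replica lags
lagging.apply(primary.version, primary.value)  # asynchronous replication catches up
print(session.read(0))  # now served from the replica
```

A second session that has written nothing would accept the stale replica, so other clients still see only eventual consistency.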

18 Example: Handling inconsistency
–BASE: an alternative to ACID [Brewer00]
  Basically Available
  Soft-state
  Eventually consistent
–Example: online shopping portal
  User table: transactions by user
  Transaction table: transactions used for billing
  How do we update both tables after a purchase?
–Traditional database method
  Begin transaction
  Update User table
  Update Transaction table
  End transaction
–A common cloud method
  Queue update to user table
  Queue update to transaction table
–Databases could be inconsistent
–Will become eventually consistent
–Reference: BASE, an ACID Alternative, by D. Pritchett, ACM Queue, June 2008
(Diagram: Application, User table, Transaction table)
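The queued-update method can be sketched in a few lines. This is a toy model under assumptions: an in-process `deque` stands in for a persistent message queue, and `record_purchase`, `apply_one`, and the sample purchase are hypothetical.

```python
from collections import deque

user_table = {}          # user -> list of purchases
transaction_table = []   # rows used for billing
pending = deque()        # assumption: stands in for a persistent message queue


def record_purchase(user, item, amount):
    # Instead of one transaction spanning both tables, enqueue two independent updates.
    pending.append(("user", user, item, amount))
    pending.append(("transaction", user, item, amount))


def apply_one():
    """Apply a single queued update; returns False when the queue is empty."""
    if not pending:
        return False
    table, user, item, amount = pending.popleft()
    if table == "user":
        user_table.setdefault(user, []).append((item, amount))
    else:
        transaction_table.append((user, item, amount))
    return True


record_purchase("alice", "book", 12)
apply_one()
# Only the user table has been updated so far: the tables are temporarily inconsistent.
inconsistent = len(user_table["alice"]) != len(transaction_table)
while apply_one():
    pass
# With no further purchases, the queue drains and both tables agree.
consistent = len(user_table["alice"]) == len(transaction_table)
print(inconsistent, consistent)
```

The window between the first `apply_one` call and the drained queue is the soft state in BASE; a real deployment would also need the queue to survive crashes and the updates to be safe to retry.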

19 Conclusions

20 Conclusions
–Many alternatives for building cloud storage exist
–Choosing among them requires a careful trade-off between consistency and availability