Database Scalability, Elasticity, and Autonomy in the Cloud Agrawal et al. Oct 24, 2011.

Framing
Survey paper
Identifies necessary qualities of cloud storage
  – Scalability
  – Sensible consistency / programming model
  – Scale-down and migration
  – Autonomic management
Pointers to different work in the space

Scalability
Add more resources, get more performance
  – Handle more requests per second
  – Store more data
Achievable with scale-up or scale-out
  – Scale-out is the only paradigm for the cloud
App’s parallelism is limited by Amdahl’s Law
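The Amdahl's Law bullet can be made concrete with a small calculation. This is a minimal sketch of the standard formula, speedup = 1 / ((1 - p) + p / n), where p is the parallelizable fraction of the work and n is the number of workers; the function names are illustrative and not from the slides.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the work scales out."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is unaffected by extra workers; the rest divides evenly.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def speedup_limit(parallel_fraction: float) -> float:
    """Limit of the speedup as the number of workers grows without bound."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0.0 else 1.0 / serial_fraction
```

For example, a workload that is 95% parallelizable cannot exceed a 20x speedup no matter how many nodes are added, which is why the serial portion (for instance, cross-partition coordination) matters so much in a scale-out design.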

Finding the right design point
What’s the right consistency / programming model?
Pure key-value stores are too weak
  – Only have transactions on single records
Traditional RDBMSs are too strong
  – Can’t just run MySQL at scale
Instead, provide strong consistency within a portion of the data
  – Megastore
  – Vertica, Aster, Teradata, Greenplum, …

Data Fusion vs. Data Fission
[Figure: systems placed along a consistency axis running from weak to strong. Labels on the axis: Dynamo; BigTable, PNUTS; MySQL. Fusion: Megastore, G-Store. Fission: Azure, ElasTraS, Rel Cloud.]

Data Fusion
Start with a key-value store
Partition records into groups
Provide multi-record updates within a group
Cross-group operations handled separately
Assumes that cross-group ops are rare
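The fusion idea can be sketched as a toy in-memory store. This is only an illustration under stated assumptions: the class `KeyGroupStore` and its methods are hypothetical, a per-group lock stands in for whatever transaction mechanism a real system such as Megastore or G-Store uses, and cross-group updates are simply rejected instead of being handled by a separate protocol.

```python
import threading


class KeyGroupStore:
    """Toy key-value store with atomic multi-record updates inside one group."""

    def __init__(self):
        self._data = {}      # key -> value
        self._group_of = {}  # key -> group id
        self._locks = {}     # group id -> lock guarding that group's records

    def put(self, group, key, value):
        """Single-record write; also assigns the key to its group."""
        lock = self._locks.setdefault(group, threading.Lock())
        with lock:
            self._group_of[key] = group
            self._data[key] = value

    def get(self, key):
        return self._data[key]

    def update_group(self, group, updates):
        """Apply every entry of `updates` (key -> value) under one lock, or none."""
        outside = [k for k in updates if self._group_of.get(k) != group]
        if outside:
            # Assumption of this sketch: cross-group work is refused outright.
            raise ValueError(f"keys outside group {group!r}: {outside}")
        lock = self._locks.setdefault(group, threading.Lock())
        with lock:
            self._data.update(updates)
```

The membership check runs before any write, so a rejected update leaves every record untouched.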

Data Fission
Start with a relational database
Partition tables into shards
Provide ACID within each shard
Cross-shard ops are expensive
Assumes that cross-shard ops are rare
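A rough sketch of the fission approach, assuming a hypothetical `accounts(id, balance)` table split across in-memory SQLite databases by a hash of the primary key. Each shard is an ordinary relational database with local ACID transactions; the sketch refuses cross-shard transfers, since those would need a distributed commit protocol that is out of scope here. None of the names come from the slides.

```python
import sqlite3
import zlib


class ShardedAccounts:
    """Hypothetical `accounts` table partitioned across in-memory SQLite shards."""

    def __init__(self, num_shards):
        self.shards = [sqlite3.connect(":memory:") for _ in range(num_shards)]
        for conn in self.shards:
            conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")

    def shard_for(self, account_id):
        # Stable hash, so a given key always routes to the same shard.
        return zlib.crc32(account_id.encode("utf-8")) % len(self.shards)

    def insert(self, account_id, balance):
        conn = self.shards[self.shard_for(account_id)]
        with conn:  # commits on success, rolls back on an exception
            conn.execute("INSERT INTO accounts VALUES (?, ?)", (account_id, balance))

    def balance(self, account_id):
        conn = self.shards[self.shard_for(account_id)]
        row = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (account_id,)
        ).fetchone()
        return row[0]

    def transfer(self, src, dst, amount):
        """Transactional transfer, allowed only when both rows share a shard."""
        if self.shard_for(src) != self.shard_for(dst):
            raise NotImplementedError("cross-shard transfer needs a distributed protocol")
        conn = self.shards[self.shard_for(src)]
        with conn:  # both updates commit together or not at all
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
```

The application sees one logical table, while the router decides which shard serves each key. That hiding of partitioning is what makes cross-shard operations both invisible and expensive.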

What’s the difference?
Is Fusion vs. Fission a worthwhile distinction?
Seems like they both arrive at the same place
Megastore “Fusion” vs. ElasTraS “Fission”
  – Shard tables based on a table’s primary key
  – Shard is co-located on the same machine
  – ACID transactions within a shard
  – Primary and secondary indexes
  – All Megastore is missing is an SQL interface!

The difference
Different targeted users
  – Fusion is for people who own datacenters
  – Fission is for people who want SQL in the cloud
Different exposed API
  – Fusion is more explicit about performance
  – Fission tries to hide partitioning from user
Anything else?

Elasticity
Dynamically scaling up and down on-demand
Important with pay-as-you-go cloud pricing
Consolidate to reduce costs
Expand to increase performance
Need to move state and processing duties around within the system

Live migration of databases
Shared-disk
  – “Global disk” shared by all DB nodes
  – Just need to copy in-memory state
  – Iterative copy: sync up cached pages + transaction state to minimize the availability hit
Shared-nothing
  – Each DB node is its own separate DB instance
  – Need to copy both local disk state and memory
  – Push/pull: gradually shift new requests to the new node, sync state in the background
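The iterative-copy idea for the shared-disk case can be sketched as a loop that re-sends whatever changed during the previous round until the remaining delta is small enough to hand over. This is a simplified model, not the algorithm of any specific system: the cache is a plain dict, transaction state is ignored, and `take_dirty_pages` is a hypothetical callback standing in for the engine's dirty-page tracking.

```python
def iterative_copy(source_cache, take_dirty_pages, handover_threshold=2, max_rounds=10):
    """Copy cached pages to a new node in rounds, then hand over.

    source_cache: dict of page id -> contents, still modified by live traffic.
    take_dirty_pages: callable returning (and clearing) the set of page ids
        written since the previous call. Hypothetical hook for this sketch.
    Returns (destination_cache, rounds_used, pages_copied_during_handover).
    """
    destination = dict(source_cache)  # initial snapshot; the source keeps serving
    rounds = 0
    dirty = take_dirty_pages()        # pages written while the snapshot was copied
    while len(dirty) > handover_threshold and rounds < max_rounds:
        for page_id in dirty:
            destination[page_id] = source_cache[page_id]
        rounds += 1
        dirty = take_dirty_pages()    # pages written during this round
    # Handover: assume the source stops accepting writes here, so `dirty` is final.
    for page_id in dirty:
        destination[page_id] = source_cache[page_id]
    return destination, rounds, len(dirty)
```

Only the final handover step requires pausing writes, and it touches at most `handover_threshold` pages unless `max_rounds` is exhausted first. That bounded pause is the sense in which the availability hit is minimized. A shared-nothing migration would also have to move on-disk state, which this sketch does not model.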

Database Autonomy
Need management to be more automatic
Elasticity and load balancing based on usage and ML predictions
Performance modeling
  – Migration costs (availability, performance, $$$)
  – Resource isolation (consolidated services)
  – SLAs

Questions?

Tree schema
Primary table’s primary key used for sharding
Secondary tables are sharded into row groups
  – Row groups are co-located and transactional
Global tables are write-rarely, and replicated on all nodes
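One way to picture the tree schema is a router that places every row by the primary table's key and copies global rows everywhere. The sketch below is illustrative only: `TreeShardedStore`, the table names used with it, and the CRC-based placement are assumptions, and it models placement but not transactions.

```python
import zlib
from collections import defaultdict


class TreeShardedStore:
    """Places rows by the primary (root) table's key; replicates global tables."""

    def __init__(self, num_nodes):
        # Per node: table name -> list of (root_key, row) for sharded tables.
        self.sharded = [defaultdict(list) for _ in range(num_nodes)]
        # Per node: table name -> list of rows for replicated global tables.
        self.replicated = [defaultdict(list) for _ in range(num_nodes)]

    def node_for(self, root_key):
        return zlib.crc32(str(root_key).encode("utf-8")) % len(self.sharded)

    def insert(self, table, root_key, row):
        """Primary or secondary table row, placed by the primary table's key."""
        self.sharded[self.node_for(root_key)][table].append((root_key, row))

    def insert_global(self, table, row):
        """Global table row, copied to every node (cheap only if writes are rare)."""
        for node in self.replicated:
            node[table].append(row)

    def row_group(self, root_key):
        """All rows across tables that share one primary-table key."""
        tables = self.sharded[self.node_for(root_key)]
        group = {}
        for table, rows in tables.items():
            matching = [row for key, row in rows if key == root_key]
            if matching:
                group[table] = matching
        return group
```

Because a secondary row is routed by its parent's key and not its own, a whole row group always sits on one node, which is what lets a transaction over that group stay local.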
