Relational Cloud: A Database-as-a-Service for the Cloud
Carlo Curino, Evan Jones, Raluca Ada Popa, Nirmesh Malviya, Eugene Wu, Sam Madden, Hari Balakrishnan, Nickolai Zeldovich
Presented by Arka Bhattacharya (for CS 294, Berkeley)
(Some slides are taken from the CIDR ’11 talk.)
THE STARTUP STORY
Motivation
  Why move to the cloud?
    Economies of scale (hardware & licensing costs)
    Pay per use & lower administrative costs
  Present players:
    Amazon RDS (MySQL on EC2)
    Microsoft SQL Azure
Problems!
  Problems arising:
    Efficient multi-tenancy (provider)
    Elastic scalability (provider)
    Privacy (user)
  Note: Relational Cloud is mainly for OLTP workloads & DAS architectures, consistency guarantees
1. Efficient Multi-tenancy: Placement & Migrations
  Problem: consolidate databases onto the smallest number of servers, balancing load without affecting performance
  Solution: Kairos, SIGMOD ’11
    Up to 17:1 consolidation
  Key insight: a single database server per machine hosting many logical databases (as opposed to a DB in a VM, or multiple DB servers per machine)
    Reduces redundant work; allows group commits, lower RAM wastage, code sharing, cheaper context switches
Kairos (contd.)
  Measure the RAM, CPU & disk usage of each database, and estimate the combined load
    RAM: probe table to gauge working-set size; additive
    Disk: derive a model by running the DBMS at different write rates & working-set sizes and measuring the resulting I/O
    CPU: additive
  Frame as an optimization problem (non-linear programming)
    Solving takes time
    With many heuristics, the optimization terminates in 8 minutes for 20 servers & 100 workloads
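The estimate-then-place idea on this slide can be illustrated with a toy sketch. It is not the Kairos solver: the slide describes a non-linear program, while the code below swaps in a simple first-fit-decreasing greedy heuristic. CPU and RAM are summed, as the slide states they are additive. The disk model `combined_disk_io`, the capacity numbers, and all class and field names are hypothetical placeholders for what would be an empirically measured model.

```python
from dataclasses import dataclass, field


@dataclass
class Workload:
    name: str
    cpu: float         # fraction of one server's CPU (assumed unit)
    ram_gb: float      # measured working-set size
    write_rate: float  # updates per second (assumed unit)


@dataclass
class Server:
    cpu_cap: float = 1.0       # hypothetical capacities
    ram_cap_gb: float = 32.0
    io_cap: float = 1000.0
    workloads: list = field(default_factory=list)


def combined_disk_io(workloads):
    """Hypothetical stand-in for a measured disk model: I/O grows with the
    total write rate and, mildly, with the total working-set size."""
    total_rate = sum(w.write_rate for w in workloads)
    total_ws = sum(w.ram_gb for w in workloads)
    return total_rate * (1.0 + 0.01 * total_ws)


def fits(server, w):
    """Check the combined load if workload w were added to the server."""
    ws = server.workloads + [w]
    return (sum(x.cpu for x in ws) <= server.cpu_cap            # CPU: additive
            and sum(x.ram_gb for x in ws) <= server.ram_cap_gb  # RAM: additive
            and combined_disk_io(ws) <= server.io_cap)          # disk: model


def consolidate(workloads):
    """First-fit decreasing by RAM: a greedy heuristic, not the NLP solver."""
    servers = []
    for w in sorted(workloads, key=lambda w: w.ram_gb, reverse=True):
        for s in servers:
            if fits(s, w):
                s.workloads.append(w)
                break
        else:
            s = Server()
            if not fits(s, w):
                raise ValueError(f"{w.name} does not fit on an empty server")
            s.workloads.append(w)
            servers.append(s)
    return servers


demo = [Workload("a", 0.3, 10, 100), Workload("b", 0.3, 10, 100),
        Workload("c", 0.3, 10, 100), Workload("d", 0.5, 20, 100)]
placement = consolidate(demo)
print([[w.name for w in s.workloads] for s in placement])
```

A greedy pass like this gives no optimality guarantee; it only shows how per-resource load estimates feed a placement decision.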
2. Elastic Scalability: Database Partitioning
  Problem: partition an OLTP database into N chunks so as to maximize performance
  Solution: Schism, VLDB 2010
    Close to optimal
  Key insight: minimize the number of distributed transactions
    Advantage over hashing and round-robin
    Use a workload trace to find good partitions
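To make the key insight concrete, the sketch below counts how many transactions in a small, made-up workload trace touch more than one partition. It compares a hash partitioning with a hand-picked workload-aware assignment. The trace, the two partitioning functions, and the function names are illustrative assumptions, not data or code from Schism.

```python
def count_distributed(trace, partition_of):
    """Number of transactions whose tuples span more than one partition."""
    return sum(1 for txn in trace
               if len({partition_of(key) for key in txn}) > 1)


# Each transaction is the set of tuple keys it accesses (made-up trace).
trace = [(1, 2), (1, 2), (3, 4), (3, 4), (2, 3)]


def hash_partition(key):
    """Workload-oblivious placement over two partitions."""
    return key % 2


def workload_aware(key):
    """Keeps tuples that are accessed together on the same partition."""
    return 0 if key in (1, 2) else 1


hash_cost = count_distributed(trace, hash_partition)
aware_cost = count_distributed(trace, workload_aware)
print(hash_cost, aware_cost)  # prints "5 1"
```

Under hashing every transaction in this trace is distributed, while the workload-aware assignment leaves only the one transaction that genuinely spans both groups.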
Schism (contd.)
Schism (contd.)
  Use a classifier to capture the partitioning in compact form, for efficient query routing
  Many heuristics to choose a good workload sample
    Sampling, blanket-statement filtering, etc.
  Graph partitioning is fast (< 40 sec)
  Achieves almost linear scalability
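The role of the classifier can be sketched as follows: given per-tuple partition labels (assumed here to come from a graph partitioner), learn a compact rule over the key so that queries can be routed without a per-tuple lookup table. The sketch uses a single-threshold rule over two partitions as a minimal stand-in for whatever classifier is actually used; the labels and function names are hypothetical.

```python
def learn_stump(labels):
    """Find the rule 'key < t -> low, else 1 - low' that best matches labels.

    labels maps key -> partition (0 or 1). Returns (t, low, accuracy).
    """
    keys = sorted(labels)
    best = None
    for t in keys:
        for low in (0, 1):
            correct = sum(1 for k in keys
                          if labels[k] == (low if k < t else 1 - low))
            if best is None or correct > best[0]:
                best = (correct, t, low)
    correct, t, low = best
    return t, low, correct / len(keys)


def route(key, t, low):
    """Route a key to a partition using the learned rule."""
    return low if key < t else 1 - low


# Hypothetical output of a graph partitioner over six tuples.
labels = {1: 0, 2: 0, 3: 0, 10: 1, 11: 1, 12: 1}
t, low, accuracy = learn_stump(labels)
print(t, low, accuracy)                  # prints "10 0 1.0"
print(route(5, t, low), route(20, t, low))
```

The learned rule also covers keys that never appeared in the sample, such as 5 and 20, which is the practical benefit of a compact representation over an explicit tuple-to-partition map.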
3. Privacy
  Problem: prevent the DBA from snooping on data; ensure data security during application and DBMS server compromise
  Solution: CryptDB, SOSP 2011
    Low overhead (~22.5%)
  Key insight: adjustable security
CryptDB: Onions
  [Figure: onion encryption layers, listed outermost to innermost]
  Onion 1: RND, DET (equality selection), DET (equality join), any value
  Onion 2: RND, OPE (inequality select), OPE (inequality join), any value
  Onion 3: HOM, int value
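The onion layering can be modeled as a small state machine: each column starts at the outermost layer of every onion, and a query that needs a given operation peels only as many layers as that operation requires. The sketch below tracks layer names only and performs no encryption. The operation names, the `ColumnState` class, and the mapping from operations to layers are illustrative assumptions based on the labels in the figure.

```python
# Layers per onion, outermost first, using the labels from the figure.
ONIONS = {
    1: ["RND", "DET (equality selection)", "DET (equality join)"],
    2: ["RND", "OPE (inequality select)", "OPE (inequality join)"],
    3: ["HOM"],
}

# Hypothetical operation names -> (onion, index of the layer that must be exposed).
REQUIRED = {
    "equality_select": (1, 1),
    "equality_join": (1, 2),
    "inequality_select": (2, 1),
    "inequality_join": (2, 2),
    "sum": (3, 0),
}


class ColumnState:
    """Tracks which layer of each onion is currently exposed for one column."""

    def __init__(self):
        self.level = {onion: 0 for onion in ONIONS}

    def exposed(self, onion):
        return ONIONS[onion][self.level[onion]]

    def adjust_for(self, op):
        """Peel layers until op is supported; return the layers removed."""
        onion, needed = REQUIRED[op]
        peeled = ONIONS[onion][self.level[onion]:needed]
        self.level[onion] = max(self.level[onion], needed)
        return peeled


col = ColumnState()
first = col.adjust_for("equality_select")   # removes the RND layer of onion 1
second = col.adjust_for("equality_select")  # nothing left to remove
third = col.adjust_for("inequality_join")   # removes two layers of onion 2
print(first, second, third)
```

Layers are never re-applied in this model, so a column's exposure only ever moves inward, and columns that are never queried stay at the outermost layer.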
Overall architecture
  [Figure: architecture diagram; surviving labels: DB stats, Partitions & placements]
Relational Cloud
  Advantages:
    Unmodified DB backends
    Workload-aware consolidation
    Workload-aware sharding
    High availability via replication of front-end servers
    SQL over encrypted data