Scaling MySQL in the Cloud
Moshe Shadmon, ScaleDB
Shared Disk vs. Shared Nothing
[Diagram: a shared-nothing cluster of masters and slaves, contrasted with a shared-disk cluster.]
Shared Disk Advantages
–Start small, grow incrementally
–Scalable AND highly available
–Add capacity on demand with zero downtime
–Simplicity: no need to partition data, no need for master-slave replication
The Virtualized Cloud Database
[Diagram: MySQL servers with an OSS DBMS storage engine on local disks (shared nothing), contrasted with ScaleDB VMs on each server accessing shared storage (shared disk).]
ScaleDB as the Storage Engine
[Diagram: within the MySQL server, the database management level sits above the storage engine level, where the ScaleDB storage engine plugs in.]
ScaleDB’s Internal Architecture
–ScaleDB Node: ScaleDB API, Transaction Manager, Index Manager, Data Manager, Local Lock Manager, Log Manager, Recovery Manager, Buffer Manager, Threads Manager, Local Sync Coordinator, Storage Manager
–ScaleDB Cluster Manager: Global Lock Manager, Global Sync Manager, Global Recovery Manager
–ScaleDB Storage System: cache & storage devices
Deploying ScaleDB
[Diagram: Application Layer → Database Layer (physical or VM nodes 1..N, each running a DBMS with ScaleDB) → Storage Layer (shared storage), coordinated by the ScaleDB Cluster Manager.]
The Storage Engine
Pluggable storage engine
–Transactional storage engine
–Supports the MySQL Storage Engine API
–Reads/writes go over the network to shared storage
–Maintains a local cache
–Local Lock Manager – manages locking at the node level
–Connector to the Cluster Manager – synchronizes operations at the cluster level
The Cluster Manager
Distributed Lock Manager – manages cluster-level locks
–Locks can be held over any type of resource: DBMS, table, partition, file, block, row, etc.
–Supports multiple lock modes: read, read/write, exclusive, etc.
–Synchronizes state using messaging
Local Lock Manager – manages locks at the node level
–Maintains locks at the node level
–Synchronizes state using shared memory
Identifies node failures and manages recovery
The Cluster Manager
Distributed Lock Manager
–Synchronizes conflicting processes between nodes in the cluster
 Example: two nodes need to update the same resource at the same time
–The challenge: requests travel over the network, which is expensive – internal operations take nanoseconds, network operations take milliseconds
–The solution: requests are sent only when conflicts occur
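The conflict-only protocol above can be sketched as follows. This is a minimal single-process model, not ScaleDB’s implementation; the class and mode names (`DistributedLockManager`, `LockMode`) are hypothetical. The key point it shows: a lock request triggers a message exchange only when another node already holds a conflicting mode, and Read/Read never conflicts.

```python
from enum import Enum

class LockMode(Enum):
    READ = 1
    WRITE = 2

def conflicts(a, b):
    # Two READ locks are compatible; any WRITE conflicts with anything.
    return a == LockMode.WRITE or b == LockMode.WRITE

class DistributedLockManager:
    """Toy cluster-level lock table: one entry per resource name."""
    def __init__(self):
        self.grants = {}  # resource -> {node: mode}

    def request(self, node, resource, mode):
        holders = self.grants.setdefault(resource, {})
        # A network round-trip is needed only when another node holds
        # a conflicting mode on the same resource.
        blockers = [n for n, m in holders.items()
                    if n != node and conflicts(m, mode)]
        if blockers:
            return ("wait", blockers)  # conflict: revoke/wait protocol kicks in
        holders[node] = mode
        return ("granted", [])
```

Readers never block each other, so in a read-heavy workload most requests are granted locally without any messaging.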
The Storage
Independent storage nodes
–Accessible via the network
–Each node has a cache layer and a persistent layer
–Database nodes can force a write to disk based on transactional requirements
–Data can be distributed over multiple storage nodes
–Each storage node can be mirrored
–Each storage node may have a Hot Backup node
The Storage Node
[Diagram: interface to storage, an LRU-based cache, and disks.]
–Manages the data in cache and flushes to disk when required
–Supports the storage engine calls for Read, Write, etc.
–Supports pushed calls from the storage engine such as Count Rows, Search, etc.
–Each node is a Linux machine; no need for a Network File System (NFS)
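The LRU-based block cache a storage node keeps in front of its disks can be sketched like this. It is a simplified model (the `StorageNodeCache` name and the write-through policy are assumptions for illustration; a real node would flush asynchronously and honor forced writes from the database tier):

```python
from collections import OrderedDict

class StorageNodeCache:
    """Fixed-size block cache with least-recently-used eviction."""
    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk            # persistent layer: block_id -> bytes
        self.cache = OrderedDict()  # insertion order doubles as LRU order

    def read(self, block_id):
        if block_id in self.cache:
            self.cache.move_to_end(block_id)  # mark as recently used
            return self.cache[block_id]
        data = self.disk[block_id]            # cache miss: go to disk
        self._insert(block_id, data)
        return data

    def write(self, block_id, data):
        self._insert(block_id, data)
        self.disk[block_id] = data            # write-through, for simplicity

    def _insert(self, block_id, data):
        self.cache[block_id] = data
        self.cache.move_to_end(block_id)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict the LRU block
```

Because each database node also keeps its own local cache, a hot block can be served without touching the storage node at all; this layer only absorbs the misses.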
Scaling the Storage Tier
[Diagram: DBMS nodes 1..N, each with a ScaleDB local cache, reach the shared storage nodes over TCP/UDP; each storage node has its own cache, and together those caches form a global cache, coordinated by the ScaleDB Cluster Manager.]
Global Cache
–Guarantees cache coherency
–Manages caching of shared data
–Minimizes access time to data which is not in the local cache and would otherwise be read from disk
–Implements fast direct memory access over high-speed interconnects for all data blocks and types
–Uses an efficient and scalable messaging protocol
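The lookup order implied by these points can be sketched as a three-tier resolution: local cache first, then a peer’s cache over the interconnect (far cheaper than a disk read), then disk as the last resort. The function name `fetch_block` and the dict-based caches are hypothetical simplifications:

```python
def fetch_block(block_id, local_cache, peer_caches, disk):
    """Resolve a block: local cache, then peer caches over the
    interconnect, then the disk itself."""
    if block_id in local_cache:
        return local_cache[block_id], "local"
    for peer in peer_caches:          # the global cache: other nodes' buffers
        if block_id in peer:
            local_cache[block_id] = peer[block_id]  # keep a local copy
            return peer[block_id], "interconnect"
    data = disk[block_id]             # last resort: disk read
    local_cache[block_id] = data
    return data, "disk"
```

Coherency is the hard part that this sketch omits: before a cached copy may be served, the global layer must guarantee it has not been invalidated by a writer elsewhere.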
HA of the Storage Tier
[Diagram: DBMS nodes 1..N over a storage layer where the shared storage is paired with mirrored storage and a ScaleDB Hot Backup, coordinated by the ScaleDB Cluster Manager.]
Scaling the Storage Tier
[Diagram: DBMS nodes 1..N over storage partitions 1..Q; each partition has its own partitioned storage, partitioned mirror, and partitioned hot backup.]
Scaling the Storage Tier
[Diagram: a MySQL/ScaleDB node with a local cache in front of main and mirror storage nodes, each with its own cache and storage.]
Read
–From the local cache
–From the main or mirror node: get from its cache, else get from storage
Write
–To the local cache
–At the end of the transaction, multicast to the main and mirror nodes
–Optional acknowledgement: after receive, or after write
Traditional Query Processing
"What were yesterday's sales?" – the DBMS server asks the storage array for the Sales table, retrieves the entire table over the network, and processes the table data itself.
ScaleDB Query Processing
"What were yesterday's sales?" – the DBMS server asks the storage nodes only for the October 15 sales; the filtering happens at the storage layer.
Scaling the Storage Tier
Advantages
–Parallel processing: I/O calls are executed simultaneously on multiple storage nodes
 Logic pushed to the storage layer: "SELECT customer_name FROM calls WHERE amount > 200"
 Traditional approach – return all rows to the database
 ScaleDB storage – return only the selected rows to the database
–Leverages the cache on multiple storage nodes
–The storage layer can be expanded without downtime
–Data is mirrored
–Support for Hot Backup
–Low cost
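The difference between the two paths can be made concrete with a small sketch. The function names are hypothetical; the point is what crosses the network: the traditional path ships every row of `calls` to the DBMS and filters there, while the pushed-down path evaluates `amount > 200` on the storage node and ships only the matches.

```python
def traditional_scan(storage_rows, predicate):
    """Traditional path: ship every row to the DBMS, filter there."""
    shipped = list(storage_rows)          # the full table crosses the network
    result = [r for r in shipped if predicate(r)]
    return result, len(shipped)           # rows returned, rows shipped

def pushed_down_scan(storage_rows, predicate):
    """Pushdown path: the storage node filters, ships matches only."""
    shipped = [r for r in storage_rows if predicate(r)]
    return shipped, len(shipped)
```

Both return identical results; only the shipped-row count differs, and for a selective predicate on a large table that difference dominates query cost.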
High Availability
Failure of a node
–Detected by the Cluster Manager; a surviving node is requested to undo the uncommitted transactions
Failure of the Cluster Manager
–Detected by the standby Cluster Manager, which requests all nodes to undo uncommitted transactions
Failure of a Storage Node
–Continue with the mirrored storage – or –
–Use the Storage Node log to recover
Performance / Tuning
Contention occurs when two or more nodes want the same resource at the same time.
Types of contention:
–Read/Read contention – never a problem, because of the shared-disk design
–Read/Write contention – the reader is requested to release the block, and the grant is given to the writer
–Write/Read or Write/Write contention – the writer sends the block to the global cache layer, a buffer-invalidate message is sent to the other nodes, and the requestor receives the grant
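The Write/Read and Write/Write case can be sketched as a three-step handoff: publish the current version to the global cache layer, invalidate the stale copies on the other nodes, then grant the block to the requestor. The function name `transfer_block_on_conflict` and the dict-based caches are illustrative assumptions:

```python
def transfer_block_on_conflict(requestor, block_id, global_cache, node_caches):
    """Model the Write/Write handoff: the latest version is assumed
    to already be in the global cache layer (published by the writer);
    stale copies elsewhere are invalidated, then the requestor gets it."""
    for node, cache in node_caches.items():
        if node != requestor:
            cache.pop(block_id, None)      # buffer-invalidate message
    block = global_cache[block_id]         # latest version, from the writer
    node_caches[requestor][block_id] = block
    return requestor                       # the grant goes to the requestor
```

Read/Read needs none of this, which is why shared-disk clusters scale well on read-heavy workloads and why the tuning advice below focuses on keeping writers to the same data on the same node.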
Performance / Tuning
Fast network between the nodes
–2 logical networks: between the database nodes and the Cluster Manager, and between the database nodes and the storage
–Optimize socket receive buffers (256 KB – 1 MB)
Partition requests to maintain locality of data
–Send requests that update/query the same data to the same node: by database, by table, or by table with PK
–The routing logic can change dynamically to adapt to changes: changes in data distribution, changes in user behavior, additional DBMS nodes
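The "by table with PK" routing rule can be sketched as a stable hash over the table name and primary key. The function name `route_request` is hypothetical; any deterministic hash works, as long as the same key always lands on the same node, keeping that node's cache hot and cluster-level lock conflicts rare:

```python
import hashlib

def route_request(table, pk, nodes):
    """Route statements touching the same table/PK to the same DBMS
    node so its local cache stays warm and cross-node lock conflicts
    on those rows are avoided."""
    key = f"{table}:{pk}".encode()
    # md5 gives a stable digest across processes (unlike Python's hash()).
    digest = int(hashlib.md5(key).hexdigest(), 16)
    return nodes[digest % len(nodes)]
```

A simple modulo scheme reshuffles most keys when a DBMS node is added; a production router adapting to "additional DBMS nodes" would more likely use consistent hashing or a remappable routing table.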
ScaleDB: Elastic/Enterprise Database

Function                                             SimpleDB       RDS  ScaleDB
Transactions                                         No             Yes  Yes
Joins                                                No             Yes  Yes
Data Consistency                                     No (Eventual)  Yes  Yes
SQL Support                                          No             Yes  Yes
ACID Compliant                                       No             Yes  Yes
Supports MySQL applications without modification     No             Yes  Yes
Dynamic Elasticity (w/o interruption)                Yes            No   Yes
High-Availability                                    Yes            No   Yes
Eliminates Partitioning                              Yes            No   Yes
Eliminates possible 5-minute data loss upon failure  Yes            No   Yes
Value Proposition
–Runs on low-cost cloud infrastructures (e.g. Amazon)
–High availability, no single point of failure
–Dramatically easier set-up & maintenance: no partitioning/repartitioning, no slave and replication headaches, simplified tuning
–Scales up/down without interrupting your application
–Lower TCO