Elastic Data Partitioning for Cloud-based SQL Processing Systems
Lipyeow Lim
Information & Computer Science Department
University of Hawai`i at Mānoa
9/8/2010
Outline
Shared Nothing Parallel DBMS
[Figure. Labels: query, results, Parallel DB layer, Network, DBMS]
1/14/2013 Lipyeow Lim -- University of Hawaii at Manoa
Cloud-based Architecture
[Figure. Labels: Virtual Machines, (Virtualized) Network, Physical Resources (Disk, Memory, CPU, shown four times), Amazon EC2]
“Scaling” Up and Down
[Figure. Labels: query, results, Parallel DB layer, Network, DBMS]
Problem Statement
Given
  A relation T
  A partitioning function F on a fixed partitioning key
  An initial number p of partitions/fragments
  An initial mapping of p fragments to p database nodes
  A target number q of partitions
Find
  A mapping of {T1, T2, ..., Tp} to {T1, T2, ..., Tq} and an assignment of the q fragments to q database nodes
Such that we minimize
  The number of tuples re-partitioned
  The number of tuples moved between database nodes
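To make the second objective concrete, here is a minimal sketch that counts how many tuples change database nodes when the number of fragments goes from p to q. It assumes the partitioning function F is `key mod n` and that fragment i is stored on node i. Both assumptions, and the name `tuples_moved`, are illustrative and not taken from the slides.

```python
def tuples_moved(keys, p, q):
    """Count tuples whose node changes when going from p to q fragments.

    Assumes F(key) = key mod n and that fragment i lives on node i
    (illustrative assumptions, not from the slides).
    """
    return sum(1 for k in keys if k % p != k % q)


keys = range(12)  # hypothetical partitioning-key values 0..11
print(tuples_moved(keys, 2, 3))  # scale up from 2 to 3 nodes
print(tuples_moved(keys, 2, 4))  # scale up from 2 to 4 nodes
```

Under these assumptions a plain modulo scheme moves most of the tuples on a resize, which is the cost the problem statement asks to minimize.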
Partitioning a Relation
  Partitioning attribute/key.
  Partitioning type, e.g. range or hash.
  Partitioning constraint, e.g. equi-width or equi-size.
  Number of partitions/fragments.
[Figure. Label: hash function]
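The two partitioning constraints can be contrasted with a small sketch. It assumes "equi-width" means ranges of equal key width and "equi-size" means ranges holding roughly the same number of tuples. The function names and the sample key values are hypothetical.

```python
def equi_width_bounds(lo, hi, n):
    """Split [lo, hi] into n ranges of equal width; returns the n - 1 cut points."""
    width = (hi - lo) / n
    return [lo + i * width for i in range(1, n)]


def equi_size_bounds(values, n):
    """Pick n - 1 cut points so each range holds about the same number of tuples."""
    ordered = sorted(values)
    return [ordered[i * len(ordered) // n] for i in range(1, n)]


keys = [1, 2, 2, 3, 9, 10]  # hypothetical, skewed key values
print(equi_width_bounds(0, 10, 2))  # cut by key width
print(equi_size_bounds(keys, 2))    # cut by tuple count
```

On skewed data like this, the equal-width cut leaves four tuples in the lower range and two in the upper one, while the equal-count cut puts three in each.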
Horizontal Fragmentation: Range Partition

sid  sname    rating  age
22   dustin   7       45
29   brutus   1       33
31   lubber   8       55
32   andy     4       23
58   rusty    10      35
64   horatio  7       35

Range Partition on rating column
  Partition 1: 0 <= rating < 5
  Partition 2: 5 <= rating <= 10

Partition 1
sid  sname    rating  age
29   brutus   1       33
32   andy     4       23

Partition 2
sid  sname    rating  age
22   dustin   7       45
31   lubber   8       55
58   rusty    10      35
64   horatio  7       35
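The range split above can be sketched in a few lines. The rows are the Sailors tuples from the slides; the tuple layout and the function name `range_partition` are assumptions made for illustration.

```python
# (sid, sname, rating, age) rows from the Sailors example
SAILORS = [
    (22, "dustin", 7, 45),
    (29, "brutus", 1, 33),
    (31, "lubber", 8, 55),
    (32, "andy", 4, 23),
    (58, "rusty", 10, 35),
    (64, "horatio", 7, 35),
]


def range_partition(rows):
    """Split rows on the rating column (index 2) using the slide's two ranges."""
    p1 = [r for r in rows if 0 <= r[2] < 5]
    p2 = [r for r in rows if 5 <= r[2] <= 10]
    return p1, p2


p1, p2 = range_partition(SAILORS)
print([r[0] for r in p1])  # sids in Partition 1
print([r[0] for r in p2])  # sids in Partition 2
```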
Range Partition: Query Processing
Which partitions? Better than non-parallel?

Partition 1
sid  sname    rating  age
29   brutus   1       33
32   andy     4       23

Partition 2
sid  sname    rating  age
22   dustin   7       45
31   lubber   8       55
58   rusty    10      35
64   horatio  7       35

Queries
  SELECT * FROM Sailors S
  SELECT * FROM Sailors S WHERE rating = 2
  SELECT * FROM Sailors S WHERE rating < 2 AND age < 30
  SELECT * FROM Sailors S WHERE age > 30
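One way to approach the "which partitions?" question is to check which partition ranges overlap the query's predicate on rating. The sketch below assumes integer ratings, so Partition 1 covers 0..4 and Partition 2 covers 5..10. The names `RANGES` and `partitions_for_rating` are hypothetical.

```python
RANGES = {1: (0, 4), 2: (5, 10)}  # partition -> inclusive rating range (integer ratings assumed)


def partitions_for_rating(lo, hi):
    """Return the partitions whose rating range overlaps the inclusive range [lo, hi]."""
    return [p for p, (a, b) in RANGES.items() if a <= hi and lo <= b]


print(partitions_for_rating(2, 2))   # WHERE rating = 2
print(partitions_for_rating(0, 1))   # WHERE rating < 2 AND age < 30 (age does not prune)
print(partitions_for_rating(0, 10))  # no predicate on rating, e.g. WHERE age > 30
```

Predicates on rating can rule out a partition, while a predicate on age alone cannot, because the data is not partitioned on age.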
Horizontal Fragmentation: Hash Partition
Hash partitioning using hash function: Partition = rating mod 2

sid  sname    rating  age
22   dustin   7       45
29   brutus   1       33
31   lubber   8       55
32   andy     4       23
58   rusty    10      35
64   horatio  7       35

Partition 1
sid  sname    rating  age
31   lubber   8       55
32   andy     4       23
58   rusty    10      35

Partition 2
sid  sname    rating  age
22   dustin   7       45
29   brutus   1       33
64   horatio  7       35
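A minimal sketch of the hash split follows. The rows are the Sailors tuples from the slides; the tuple layout and the function name `hash_partition` are assumptions. Bucket 0 (even ratings) corresponds to the slide's Partition 1 and bucket 1 (odd ratings) to Partition 2, judging by the tuples listed in each.

```python
# (sid, sname, rating, age) rows from the Sailors example
SAILORS = [
    (22, "dustin", 7, 45),
    (29, "brutus", 1, 33),
    (31, "lubber", 8, 55),
    (32, "andy", 4, 23),
    (58, "rusty", 10, 35),
    (64, "horatio", 7, 35),
]


def hash_partition(rows, n=2):
    """Group rows into n buckets by rating mod n (rating is at index 2)."""
    parts = {i: [] for i in range(n)}
    for row in rows:
        parts[row[2] % n].append(row)
    return parts


parts = hash_partition(SAILORS)
print([r[0] for r in parts[0]])  # sids with even rating
print([r[0] for r in parts[1]])  # sids with odd rating
```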
Hash Partition: Query Processing
Which partitions? Better than non-parallel?

Queries
  SELECT * FROM Sailors S
  SELECT * FROM Sailors S WHERE rating = 2
  SELECT * FROM Sailors S WHERE rating < 2 AND age < 30
  SELECT * FROM Sailors S WHERE age > 30

Partition 1
sid  sname    rating  age
31   lubber   8       55
32   andy     4       23
58   rusty    10      35

Partition 2
sid  sname    rating  age
22   dustin   7       45
29   brutus   1       33
64   horatio  7       35
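For hash partitioning, a simple rule of thumb is that only an equality predicate on the partitioning key pins the query to one bucket. The sketch below assumes the `rating mod n` function from the previous slide; the name `partitions_for_hash` is hypothetical and buckets are numbered from 0.

```python
def partitions_for_hash(n, rating_equals=None):
    """Buckets to scan when rows are placed by rating mod n.

    An equality predicate on rating maps to exactly one bucket; any other
    predicate (a range, or one on another column) is treated as needing all.
    """
    if rating_equals is not None:
        return [rating_equals % n]
    return list(range(n))


print(partitions_for_hash(2, rating_equals=2))  # WHERE rating = 2
print(partitions_for_hash(2))                   # range or non-key predicates
```

Unlike range partitioning, a range predicate such as `rating < 2` gives no pruning here, since neighbouring rating values land in different buckets.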
Method N: Naive Resize
Method C: Chunk-based
Method T: Tree-based
Method H: Hash-based