Gueyoung Jung, Nathan Gnanasambandam, and Tridib Mukherjee International Conference on Cloud Computing 2012
Introduction
Related Work
Problem Statement
Maximally Overlapped Cloud-Bursting (MOBB) approach
Experimental Evaluation
Conclusion
Introduction
Collected data can exceed hundreds of terabytes and is continuously generated
  ◦ sensors, social media, click-stream, log files, and mobile devices
The solution: Cloud Computing
  ◦ Analyze big data by leveraging vast amounts of computing resources available on demand with low resource usage cost
Parallel data mining
  ◦ topic mining, pattern mining
  ◦ analyze large amounts of unstructured data
  ◦ time constraint
Big data is partly analyzed on local private resources while the rest is transferred to external computing nodes
  ◦ greater flexibility and clear cost benefits
The considerations for optimizing parallel data mining
  ◦ Node determination
  ◦ Synchronized completion
  ◦ Data partition determination
Maximally Overlapped Bin-packing driven Bursting (MOBB)
The goals of the MOBB algorithm
  ◦ Load balancing across computing nodes
  ◦ Time overlap between data transfer delay and computation time in each computing node
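
The overlap goal can be written as a small recurrence. This is a sketch under assumptions, not a formula taken from the slides: the symbols $s_i$ (size of the $i$-th chunk sent to a node), $d$ (transfer delay per unit of data), $p$ (computation time per unit of data), $T_i$ (arrival time of chunk $i$), and $F_i$ (time at which chunk $i$ finishes computing) are all introduced here for illustration, and the node is assumed to transfer and compute one chunk at a time.

```latex
\begin{align*}
T_i &= d \sum_{j=1}^{i} s_j, \\
F_0 &= 0, \qquad F_i = \max(F_{i-1},\, T_i) + p\, s_i, \\
F_n^{\text{no overlap}} &= (d + p) \sum_{i=1}^{n} s_i .
\end{align*}
```

With overlap, chunk $i+1$ is transferred while chunk $i$ is being processed, so $F_n$ can approach $d\, s_1 + p \sum_i s_i$ when computation dominates, instead of the fully serialized total in the last line.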
Related Work
Load distribution
  ◦ the overhead of data transfer
Maximum overlap between data transfer and computation
  ◦ determine the order of different sizes of data chunks transferred to each node
Task scheduling among computing nodes
  ◦ load-balancing (CometCloud)
  ◦ heterogeneous clouds
Problem Statement
SLA: Service Level Agreement
Maximally Overlapped Cloud-Bursting (MOBB) approach
made by the unit of data
Estimation of computation time
  ◦ Response surface model
  ◦ Queueing model
Estimation of data transfer delay
  ◦ more dynamic than computation time
  ◦ Auto-regressive moving average (ARMA) model
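
To make the ARMA bullet concrete, here is a minimal sketch of a one-step-ahead ARMA(1,1) forecast of the next transfer delay. The slides do not give the model order or coefficients, so the order (1,1), the function name `arma11_forecast`, and the parameters `phi`, `theta`, and `mean` are assumptions for illustration; in practice the coefficients would be fitted from measured delays.

```python
def arma11_forecast(delays, phi, theta, mean):
    """One-step-ahead ARMA(1,1) forecast of the next transfer delay.

    Model (assumed): x_t - mean = phi * (x_{t-1} - mean) + e_t + theta * e_{t-1}
    `delays` is the list of observed delays, oldest first.
    """
    prediction = mean  # before any observation, forecast the long-run mean
    for observed in delays:
        error = observed - prediction  # innovation for this step
        # Forecast for the next step from the latest observation and error.
        prediction = mean + phi * (observed - mean) + theta * error
    return prediction


# Illustrative numbers only (not measurements from the experiments).
history = [10.0, 12.0, 11.0]
next_delay = arma11_forecast(history, phi=0.5, theta=0.2, mean=10.0)
```

Because the forecast is updated after every observation, it can follow short-term changes in network conditions, which fits the note that transfer delay is more dynamic than computation time.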
Determine the bucket size of each node
Sort data chunks in descending order
Sort node bucket sizes in descending order (higher delay means a smaller bucket)
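
The three preparation steps can be sketched as follows. The slides do not give the sizing formula, so this sketch assumes each bucket is proportional to the node's processing rate, taken as the inverse of an estimated time per unit of data (transfer and computation combined). The function name `bucket_sizes` and all numbers are hypothetical.

```python
def bucket_sizes(total_data, unit_times):
    """Split `total_data` across nodes in proportion to their rates.

    `unit_times` maps node name -> estimated time per unit of data
    (assumption: transfer delay and computation time are already combined).
    A slower node (higher time per unit) receives a smaller bucket.
    """
    rates = {node: 1.0 / t for node, t in unit_times.items()}
    total_rate = sum(rates.values())
    return {node: total_data * r / total_rate for node, r in rates.items()}


# Illustrative values only; node names reuse the labels from the evaluation.
sizes = bucket_sizes(120, {"HLW": 1.0, "LLW": 2.0, "MRW": 2.0})

# Sort buckets and chunks in descending order, as the steps describe.
ranked_buckets = sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)
ranked_chunks = sorted([3, 8, 2, 5], reverse=True)
```

With these numbers every node would need the same time to work through its bucket (60 units at 1.0 per unit, 30 units at 2.0 per unit), which is the synchronized completion the approach aims for.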
Weighted load distribution
Delay-based preference
Buckets are completely filled one at a time
  ◦ reduce fragmentation of buckets
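
Filling buckets one at a time can be sketched as a greedy bin-packing pass. This is an illustrative reading of the bullets above, not the authors' exact procedure: buckets are visited from largest to smallest, each takes the largest remaining chunks that still fit, and any chunk that fits nowhere goes to the bucket with the most free space. The function name `fill_buckets` and the fallback rule are assumptions.

```python
def fill_buckets(chunks, buckets):
    """Assign chunk sizes to node buckets, filling one bucket at a time.

    `chunks` is a list of chunk sizes; `buckets` maps node -> capacity.
    Returns a dict mapping node -> list of assigned chunk sizes.
    """
    remaining = sorted(chunks, reverse=True)          # largest chunks first
    order = sorted(buckets, key=buckets.get, reverse=True)  # largest bucket first
    free = dict(buckets)
    assignment = {}
    for node in order:
        assignment[node] = []
        leftover = []
        for chunk in remaining:
            if chunk <= free[node]:
                assignment[node].append(chunk)
                free[node] -= chunk
            else:
                leftover.append(chunk)
        remaining = leftover
    # Assumed fallback: chunks that fit nowhere go to the emptiest bucket.
    for chunk in remaining:
        node = max(free, key=free.get)
        assignment[node].append(chunk)
        free[node] -= chunk
    return assignment


plan = fill_buckets([8, 5, 4, 3, 2, 2], {"A": 12, "B": 8, "C": 4})
```

In this example each bucket ends up exactly full, so no capacity is left fragmented across nodes.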
Organize the sequence of chunks to maximize the overlap between data transfer and computation
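
The effect of chunk order on overlap can be explored with a small simulation. This is a sketch under simplifying assumptions (one transfer and one computation at a time per node, fixed per-unit costs); it searches all orderings by brute force, which is only practical for a handful of chunks and is not claimed to be the method used in MOBB. The function names and numbers are hypothetical.

```python
from itertools import permutations


def completion_time(chunks, transfer_per_unit, compute_per_unit):
    """Finish time on one node when transfer of the next chunk
    overlaps with computation of the previous one."""
    transfer_done = 0.0
    compute_done = 0.0
    for size in chunks:
        transfer_done += size * transfer_per_unit
        # A chunk can start computing only after it has arrived
        # and the previous chunk has finished computing.
        compute_done = max(compute_done, transfer_done) + size * compute_per_unit
    return compute_done


def best_order(chunks, transfer_per_unit, compute_per_unit):
    """Exhaustively pick the ordering with the smallest finish time."""
    return min(
        permutations(chunks),
        key=lambda order: completion_time(order, transfer_per_unit, compute_per_unit),
    )


# Illustrative values: computation is twice as slow as transfer.
order = best_order([4, 2, 1], 1.0, 2.0)
```

For these values the search picks `(1, 2, 4)`, finishing at 15.0 time units, against 18.0 for `(4, 2, 1)` and 21.0 with no overlap at all: sending a small chunk first lets computation start early while larger chunks are still in transit.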
Experimental Evaluation
Frequent Pattern Mining
  ◦ A phone call log obtained from a call center and a web access log
  ◦ Size: 200 GB (collected for one year)
  ◦ Objective: obtain patterns of each user's activities on human resource information systems
Four computing nodes
  ◦ Low-end Local Central node (LLC): 5 VMs, each has two 2.8 GHz cores, 1 GB memory, 1 TB hard drive
  ◦ Low-end Local Worker (LLW): similar to LLC
  ◦ High-end Local Worker (HLW): 6 non-virtualized servers, each has GHz cores, 48 GB memory, 10 TB hard drive; shared by other applications
  ◦ Mid-end Remote Worker (MRW): 9 VMs, each has two 2.8 GHz cores, 4 GB memory, 1 TB hard drive
HLW+MRW
Ideal optimal data allocation
  ◦ The slack time must be 0
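
The slides do not define slack time formally. As a sketch, assume a node's slack is the time it sits idle waiting for the slowest node to finish; under that assumption an ideal allocation has zero slack everywhere. The function name `slack_times` and the completion times below are hypothetical.

```python
def slack_times(completion):
    """Idle time of each node relative to the last node to finish.

    `completion` maps node name -> time at which the node finished its share.
    """
    latest = max(completion.values())
    return {node: latest - t for node, t in completion.items()}


# Illustrative completion times only (not results from the experiments).
slack = slack_times({"LLW": 100, "HLW": 80, "MRW": 100})
is_ideal = all(s == 0 for s in slack.values())
```

Here HLW finishes 20 time units early, so the allocation is not ideal: part of the load of the other nodes could have been shifted to HLW.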
Conclusion
A cloud-bursting approach based on a maximally overlapped load-balancing algorithm is proposed to optimize the performance of big-data analytics
Results show that performance can be improved by 20% to 60% against other approaches