
Supporting Load Balancing for Distributed Data-Intensive Applications
Leonid Glimcher, Vignesh Ravi, and Gagan Agrawal
Department of Computer Science and Engineering


1 Supporting Load Balancing for Distributed Data-Intensive Applications
Leonid Glimcher, Vignesh Ravi, and Gagan Agrawal
Department of Computer Science and Engineering
The Ohio State University
Columbus, Ohio 43210

2 Outline
– Introduction
– Motivation
– FREERIDE-G Processing Structure
– Run-time Load Balancing System
– Experimental Results
– Conclusions
December 24, 2015

3 Introduction
– Growing abundance of data
  – Sensors, scientific simulations, and business transactions
– Data analysis
  – Translate raw data into knowledge
– Grid/cloud computing
  – Enables distributed processing

4 Motivation
– Resources are geographically distributed
  – Data nodes
  – Compute nodes
  – Middleware user
– Remote data analysis is important
– Heterogeneity of resources
  – Differences in network bandwidth
  – Differences in compute power
[Figure: data nodes, compute nodes, and the middleware user in a grid/cloud environment]

5 FREERIDE-G Processing Structure
(Framework for Rapid Implementation of Datamining Engines – Grid)
– A MapReduce-like system
– Remote data analysis
– Middleware API components: Process, Reduce, Global Combine, Reduction Object
Generalized reduction loop:
    while ( ) {
        forall (data instances d) {
            (I, d') = process(d)
            R(I) = R(I) op d'
        }
        ...
    }
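A minimal Python sketch of the generalized reduction loop above, assuming the reduction object R is a dictionary keyed by I and that op is associative and commutative, so per-node reduction objects can be merged in any order during the global combine. The functions local_reduce, global_combine, and kmeans_pass, and the one-dimensional K-means pass used to exercise them, are hypothetical illustrations, not the FREERIDE-G API.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def local_reduce(chunk: Iterable, process: Callable, op: Callable) -> Dict:
    """Inner forall loop: fold every data instance d into the reduction object R."""
    R: Dict = {}
    for d in chunk:
        i, d_prime = process(d)                           # (I, d') = process(d)
        R[i] = op(R[i], d_prime) if i in R else d_prime   # R(I) = R(I) op d'
    return R


def global_combine(objects: Iterable[Dict], op: Callable) -> Dict:
    """Merge per-node reduction objects with the same op (assumed associative and commutative)."""
    G: Dict = {}
    for R in objects:
        for i, v in R.items():
            G[i] = op(G[i], v) if i in G else v
    return G


def kmeans_pass(chunks: List[List[float]], centroids: List[float]) -> List[float]:
    """One iteration of the outer while loop for 1-D K-means (hypothetical example)."""
    def process(d: float) -> Tuple[int, Tuple[float, int]]:
        nearest = min(range(len(centroids)), key=lambda k: abs(d - centroids[k]))
        return nearest, (d, 1)            # key I = cluster id, value d' = (sum, count)

    def add(a: Tuple[float, int], b: Tuple[float, int]) -> Tuple[float, int]:
        return (a[0] + b[0], a[1] + b[1])

    per_node = [local_reduce(chunk, process, add) for chunk in chunks]
    combined = global_combine(per_node, add)
    # Keep the old centroid if no point was assigned to that cluster.
    return [combined[k][0] / combined[k][1] if k in combined else centroids[k]
            for k in range(len(centroids))]


print(kmeans_pass([[1.0, 2.0, 11.0], [9.0, 3.0]], [0.0, 10.0]))  # prints [2.0, 10.0]
```

Each inner list stands in for the chunk one compute node processes; in this sketch the outer while loop would call kmeans_pass repeatedly until the centroids stop changing.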

6 A Real-time Grid/Cloud Scenario
[Figure: nodes A, B, C, and D with compute and data resources]

7 Run-time Load Balancing
– Two factors of load imbalance
  – Computational factor, w1
  – Remote data transfer (wait time), w2
– Case 1: w1 > w2
– Case 2: w2 > w1
– We use a weighted sum to account for both components

8 Dynamic Load Balancing Algorithm
– Input: bandwidth matrix, weights W1 and W2
– Consider every chunk, Ci, and each candidate processor, Pj:
  – Calculate compute cost, Cc
  – Calculate data transfer cost, Tc
  – Total cost = W1*Cc + W2*Tc
  – If total cost < Min: update Min and assign Ci to Pj
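The greedy assignment above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: compute cost Cc is modeled as the accumulated processing time on Pj (chunk size divided by a hypothetical per-processor compute_rate), and data transfer cost Tc as the accumulated transfer time over the bandwidth matrix entry for the chunk's data node and Pj. Accumulating both costs per processor is also an assumption; without it, every chunk would land on the single cheapest node.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Chunk:
    name: str
    size_mb: float
    data_node: str        # node hosting the chunk (hypothetical field)


def assign_chunks(chunks: List[Chunk],
                  compute_rate: Dict[str, float],
                  bandwidth: Dict[Tuple[str, str], float],
                  w1: float, w2: float) -> Dict[str, str]:
    """Assign each chunk Ci to the processor Pj with the lowest W1*Cc + W2*Tc.

    compute_rate[p]: MB/s that processor p can process (hypothetical input).
    bandwidth[(data_node, p)]: MB/s from a data node to processor p.
    """
    compute_time = {p: 0.0 for p in compute_rate}    # accumulated Cc per processor
    transfer_time = {p: 0.0 for p in compute_rate}   # accumulated Tc per processor
    assignment: Dict[str, str] = {}
    for c in chunks:                                 # consider every chunk Ci
        best_p, best_cost = "", float("inf")         # Min starts at infinity
        for p in compute_rate:                       # candidate processor Pj
            cc = compute_time[p] + c.size_mb / compute_rate[p]
            tc = transfer_time[p] + c.size_mb / bandwidth[(c.data_node, p)]
            total = w1 * cc + w2 * tc
            if total < best_cost:                    # update Min
                best_p, best_cost = p, total
        assignment[c.name] = best_p                  # assign Ci to Pj
        compute_time[best_p] += c.size_mb / compute_rate[best_p]
        transfer_time[best_p] += c.size_mb / bandwidth[(c.data_node, best_p)]
    return assignment


chunks = [Chunk(f"c{i}", 100.0, "D") for i in range(4)]
rate = {"P1": 100.0, "P2": 25.0}                 # P1 computes faster...
bw = {("D", "P1"): 10.0, ("D", "P2"): 50.0}      # ...but P2 has the faster link
print(assign_chunks(chunks, rate, bw, 0.75, 0.25))
# {'c0': 'P1', 'c1': 'P2', 'c2': 'P1', 'c3': 'P2'}
print(assign_chunks(chunks, rate, bw, 0.25, 0.75))
# {'c0': 'P2', 'c1': 'P2', 'c2': 'P2', 'c3': 'P1'}
```

With weights skewed towards the compute component (75-25), the sketch splits the chunks evenly; skewed towards data transfer (25-75), it favors the processor with the faster link. The weight pairs mirror the combinations on the adaptability slides, but the rates and bandwidths here are made up.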

9 Experimental Setup
– Settings
  – Organizational grid
  – Wide area network (WAN)
– Goals: evaluate
  – Scalability
  – Dynamic load balancing overhead
  – Adaptability to scenarios: compute bound, I/O bound, WAN setting
– Applications
  – K-means
  – Vortex detection

10 Scalability and Overhead of Dynamic Balancing
– Vortex detection, 14.8 GB data
– Organizational setting
– Bandwidth
  – 50 mb/sec
  – 100 mb/sec
– 31% benefit
– Overhead within 10%

11 Model Adaptability – Compute Bound Scenario
– K-means clustering, 25.6 GB data
– Bandwidth
  – 50 MB/s
  – 200 MB/s
– Best result: 75-25 combination, skewed towards the workload component
– Initial (unbalanced) overhead: 57% over balanced
– Dynamic overhead: 5% over balanced
[Figure: ideal case vs. dynamic case, split into compute and data transfer components]

12 Model Adaptability – I/O Bound Scenario
– K-means clustering, 25.6 GB data
– Bandwidth
  – 15 mb/s
  – 60 mb/s
– Best result: 25-75 combination, skewed towards the data transfer component
– Initial (unbalanced) overhead: 40% over balanced
– Dynamic overhead: 4% over balanced

13 Model Adaptability – WAN Setting
– Vortex detection, 14.6 GB data
– Best result: 25-75 combination gives the lowest overhead (favoring the data delivery component)
– Unbalanced configuration: 20% overhead over balanced
– Our approach: overhead reduced to 8%
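The adaptability slides report overheads relative to a balanced run. Assuming both percentages on each slide share the same balanced baseline, a short calculation converts them into the fraction of the unbalanced runtime that dynamic balancing saves. The function name runtime_saved is hypothetical, and the results are only arithmetic on the reported figures, not additional measurements.

```python
def runtime_saved(unbalanced_overhead: float, dynamic_overhead: float) -> float:
    """Fraction of the unbalanced runtime saved by dynamic balancing.

    Both overheads are fractions over the balanced runtime, e.g. 0.57 for 57%.
    """
    return 1 - (1 + dynamic_overhead) / (1 + unbalanced_overhead)


# (unbalanced overhead, dynamic overhead) as reported on slides 11-13
scenarios = {
    "compute bound": (0.57, 0.05),
    "I/O bound": (0.40, 0.04),
    "WAN": (0.20, 0.08),
}
for name, (unbalanced, dynamic) in scenarios.items():
    print(f"{name}: {runtime_saved(unbalanced, dynamic):.0%} of unbalanced runtime saved")
# compute bound: 33% of unbalanced runtime saved
# I/O bound: 26% of unbalanced runtime saved
# WAN: 10% of unbalanced runtime saved
```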

14 Conclusions
– Dynamic load balancing solution for grid environments
– Both workload and data transfer factors are important
– Scalability is good and overheads are within 10%
– Adaptable to compute-bound, I/O-bound, and WAN settings

15 Thank You! Questions?
Contacts:
– Leonid Glimcher: glimcher@cse.ohio-state.edu
– Vignesh Ravi: raviv@cse.ohio-state.edu
– Gagan Agrawal: agrawal@cse.ohio-state.edu

16 Setup 1: Organizational Grid
– Data hosted on Opteron 250’s
– Processed on Opteron 254’s
– 2 clusters connected through two 10 GB optical fibers
– Both clusters are in the same city (0.5 mile apart)
– Evaluating: scalability, adaptability, integration overhead
[Figure: compute cluster (cse-ri) and repository cluster (bmi-ri)]
DataGrid Lab

17 Setup 2: WAN
– Data repository: Opteron 250’s (OSU), Opteron 258’s (Kent State)
– Processed on Opteron 254’s
– No dedicated link between the processing and repository clusters
– Evaluating: scalability, adaptability
[Figure: compute cluster (OSU), repository clusters (Kent State and OSU)]

18 FREERIDE-G System Design

