01/22/09, ICDCS 2006. Load Unbalancing to Improve Performance under Autocorrelated Traffic. Ningfang Mi (College of William and Mary). Joint work with Qi Zhang (College of William and Mary), Alma Riska (Seagate Research), and Evgenia Smirni (College of William and Mary).
Outline: Motivation; Our solution; Conclusion and future work.
Clustered Servers: a front-end dispatcher in front of back-end nodes. Load balancing under heavy-tailed service times: Round Robin (RR), Random, Join Shortest Queue (JSQ), Join Shortest Weighted Queue (JSWQ), AdaptLoad. With autocorrelated interarrival times: performance?
Why Consider Dependence? Bad performance effect: higher ACF means higher dependence and worse performance.
Effect of ACF on Load Balancing: size-based policies do NOT win! Why?
Possible Reason: What is the ACF in each node? Load plus ACF.
Outline: Motivation; Our solution; Conclusion and future work.
Solution: EqAL (Equally distribute work guided by Autocorrelation and Load). Balancing load: AdaptLoad. Balancing ACF: move jobs from a strongly correlated node to a weakly correlated node.
Review: AdaptLoad. Each node serves only requests whose size falls in a certain range: $[s_0 = 0, s_1), [s_1, s_2), \ldots, [s_{N-1}, s_N = \infty)$. The size ranges self-adjust by predicting the incoming workload from the histogram of previous requests.
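The size-range rule above can be sketched as a lookup over the boundary list. This is a minimal illustration, not the authors' implementation: the function name `pick_server` and the boundary values are hypothetical, and servers are indexed from 0.

```python
from bisect import bisect_right


def pick_server(size, boundaries):
    """Return i such that boundaries[i] <= size < boundaries[i + 1].

    boundaries is [s_0, s_1, ..., s_N] with s_0 = 0 and s_N = infinity,
    so there are N = len(boundaries) - 1 servers, indexed 0..N-1.
    """
    if size < boundaries[0]:
        raise ValueError("request size below s_0")
    # bisect_right counts the boundaries that are <= size.
    index = bisect_right(boundaries, size) - 1
    # A size equal to s_N (infinity) is clamped into the last range.
    return min(index, len(boundaries) - 2)


# Illustrative boundaries for 4 servers (values are made up).
boundaries = [0, 10, 100, 1000, float("inf")]
print(pick_server(5, boundaries))     # 0
print(pick_server(10, boundaries))    # 1 (left-closed range)
print(pick_server(5000, boundaries))  # 3
```

Binary search keeps the dispatch cost logarithmic in the number of servers, which matters little for 4 nodes but keeps the sketch general.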
Review: AdaptLoad. Step 1: Build the histogram online, e.g., from the request sizes in arrival sequence: …
Review: AdaptLoad. Step 1: Build the histogram online. Step 2: At the end of the monitoring window, find the boundaries s0, s1, s2, s3, s4 that partition the total work (area) equally among Server 1, Server 2, Server 3, and Server 4.
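The two steps can be sketched as follows. This is an illustration under stated assumptions rather than the AdaptLoad code: work is taken to be size times count, the histogram uses exact sizes as bins, and the name `equal_work_boundaries` is hypothetical.

```python
from collections import Counter
from math import inf


def equal_work_boundaries(sizes, n_servers):
    """Return [s_0, ..., s_N] so each range [s_i, s_{i+1}) holds about 1/N of the work."""
    hist = Counter(sizes)                   # Step 1: histogram, size -> count
    keys = sorted(hist)
    total = sum(s * hist[s] for s in keys)  # total work (area)
    boundaries = [0]
    acc = 0
    # Step 2: walk the sizes in ascending order and cut after each 1/N of the work.
    for idx, s in enumerate(keys[:-1]):
        acc += s * hist[s]
        # len(boundaries) is the number k of the next cut; cut once acc >= k * total / N.
        while len(boundaries) < n_servers and acc * n_servers >= total * len(boundaries):
            boundaries.append(keys[idx + 1])  # next range starts at the next larger size
    # Close the last range at infinity (and pad if too few distinct sizes were seen).
    boundaries += [inf] * (n_servers + 1 - len(boundaries))
    return boundaries


# Each distinct size contributes 8 units of work, so the cuts fall between sizes.
sizes = [1] * 8 + [2] * 4 + [4] * 2 + [8]
print(equal_work_boundaries(sizes, 4))  # [0, 2, 4, 8, inf]
```

When one size carries more than 1/N of the work, the sketch emits repeated boundaries, which leaves some ranges empty; a production policy would need a tie-breaking rule that the slides do not describe.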
S_EQAL. Server i increases its work by p_i. Corrective factors p_i satisfy ∑ p_i = 0: a negative p_i reduces work, a positive p_i increases work. p_1 = -R, where R is a pre-determined corrective constant; the remaining p_i are decided by a semi-geometric method. [Figure: two panels showing Server 1, Server 2, Server 3, Server 4.]
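The slide does not spell out the semi-geometric method, so the following is only one possible reading, labeled as an assumption: the work R removed from server 1 is split by repeated halving, with the last server taking the remainder so that the factors sum to zero. The function names are hypothetical, and interpreting "increase p_i of its work" as scaling the equal share 1/N by (1 + p_i) is also an assumption.

```python
def corrective_factors(r, n_servers):
    """ASSUMED semi-geometric split: p_1 = -R, then R/2, R/4, ..., remainder to the last server."""
    if n_servers < 2:
        raise ValueError("need at least two servers")
    p = [-r]                                # p_1 = -R (from the slide)
    for i in range(2, n_servers):           # servers 2 .. N-1
        p.append(r / 2 ** (i - 1))          # assumed halving pattern
    p.append(r / 2 ** (n_servers - 2))      # last server absorbs the remainder
    return p


def work_shares(r, n_servers):
    """Fraction of total work per server: the equal share 1/N scaled by (1 + p_i)."""
    return [(1 + p_i) / n_servers for p_i in corrective_factors(r, n_servers)]


p = corrective_factors(0.2, 4)
print(p)                      # [-0.2, 0.1, 0.05, 0.05]
print(work_shares(0.2, 4))
```

The resulting shares would replace the equal 1/N targets when the size boundaries are computed, so server 1 receives less work than under plain AdaptLoad.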
Performance of S_EQAL. Service time: WorldCup 1998 trace. Inter-arrival time: MMPP(2), with the same statistical moments as the WorldCup trace and with short-range dependence (SRD). 4 servers in the cluster. Average utilization per server: 62%.
Average Slowdown by R. [Figure annotation: best slowdown.]
Average Response Time by R. [Figure annotation: best response time.] How to get the optimal R?
Dynamic Policy: D_EQAL. Self-adjusts R. R is initialized to 0. At the end of each monitoring window, adjust R by a small value Adj. The adjustment should improve both slowdown and response time; if it does not, the direction was wrong. Then recalculate p_i and set the size boundaries.
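The adjustment loop can be sketched as a small controller. This is a hedged illustration, not the authors' code: the class name `RController`, the default `adj` value, and the choice to reverse direction whenever either metric fails to improve are assumptions.

```python
class RController:
    """Adjusts R by +/- Adj at the end of each monitoring window."""

    def __init__(self, adj=0.01):
        self.r = 0.0           # R is initialized to 0 (from the slide)
        self.adj = adj         # small adjustment step Adj (value is an assumption)
        self.direction = 1     # +1 raises R, -1 lowers it
        self.prev = None       # (slowdown, response_time) of the previous window

    def end_of_window(self, slowdown, response_time):
        """Record this window's metrics and return the R to use in the next window."""
        if self.prev is not None:
            improved = slowdown < self.prev[0] and response_time < self.prev[1]
            if not improved:
                # The last step did not improve both metrics: wrong direction.
                self.direction = -self.direction
        self.prev = (slowdown, response_time)
        self.r += self.direction * self.adj
        return self.r


ctl = RController(adj=0.1)
ctl.end_of_window(10.0, 5.0)   # first window: step up
ctl.end_of_window(8.0, 4.0)    # both metrics improved: keep going up
ctl.end_of_window(9.0, 4.0)    # no improvement: reverse and step down
```

After each call, the new R would feed the recalculation of p_i and of the size boundaries for the next window.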
Performance of D_EQAL
Effectiveness of D_EQAL
Outline: Motivation; Our solution; Conclusion and future work.
Conclusion and Future Work. Load balancing policies should also consider the dependence structure in traffic. D_EQAL balances the load and the correlation; it is self-adaptive and effective. Future work: more adaptive (detect changes in the dependence structure); multiple classes (consider different priorities).
Thank you! Questions?
Why Consider Dependence? Bad performance effect. Metric: autocorrelation function (ACF). Inter-arrival time of the i-th request: X_i. The ACF at lag k is the correlation between inter-arrival times that are k requests apart. [Figure: sequence x0, x1, …, x8 with lag(1) and lag(2) marked.] Higher ACF, higher dependence, worse performance.
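As a concrete reading of the metric, the sketch below computes the standard sample autocorrelation at lag k. The slides do not say which estimator the authors used, so this common form and the function name `acf` are assumptions.

```python
def acf(x, k):
    """Sample autocorrelation of the sequence x at lag k (standard biased estimator)."""
    n = len(x)
    if not 0 <= k < n:
        raise ValueError("lag must satisfy 0 <= k < len(x)")
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0:
        raise ValueError("constant sequence has undefined autocorrelation")
    # Covariance between values that are k positions apart.
    cov = sum((x[i] - mean) * (x[i + k] - mean) for i in range(n - k))
    return cov / var


# Alternating inter-arrival times: negative correlation at lag 1, positive at lag 2.
x = [1, 2, 1, 2, 1, 2]
print(acf(x, 0))  # 1.0
print(acf(x, 1))  # negative
print(acf(x, 2))  # positive
```

Values near 0 at all lags indicate independent inter-arrival times; values that stay high across many lags indicate bursty, dependent traffic.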
Examples of ACF
Effect of ACF on a Single Server
Load Balancing Policies. No a priori knowledge: Round Robin (RR), Random. Size-based: Join Shortest Queue (JSQ), Join Shortest Weighted Queue (JSWQ), AdaptLoad, … Proved optimal performance.
Inside Each Server
Inside Each Server: too biased. How to get the optimal R?