Tuan V. Dinh, Lachlan Andrew and Yoni Nazarathy Modelling a supercomputer with the model Australia and New Zealand Applied Probability Workshop.


1 Modelling a supercomputer with the model
Tuan V. Dinh, Lachlan Andrew and Yoni Nazarathy
Australia and New Zealand Applied Probability Workshop

2 Supercomputer clusters
http://caia.swin.edu.au/cv/tdinh, 10 July 2013, Australia and New Zealand Applied Probability Workshop
Uses: large-scale simulation (climate, genome, astronomy, etc.); foundation of cloud computing.
Big data and exascale computing: more computing power is desired.
Costs: electricity bills; heat (thermal management); investment (cooling systems, hardware, etc.).

3 Power proportionality
[Figure: power versus load for a single server, ideal versus reality; the real curve sits at about 60% of peak power when idle (1).]
An idle server draws roughly 60% of peak power, so the natural remedy is to turn off idle servers.
Challenges: switching cost (setup, wear and tear) and performance impacts.
How does this apply to the Swinburne supercomputer?
(1) Barroso and Hölzle, "The Case for Energy-Proportional Computing", 2007.

4 An energy saving framework
[Diagram: a system congestion model feeds a CONTROL FRAMEWORK.]
Question for the control framework: how many active servers are needed?
Considerations: historical implications? ongoing system states? arrival characteristics? job elapsed times?
Objective: min (energy + performance penalty + switching).

5 Congestion model
[Diagram: the CONTROL FRAMEWORK, with the congestion model as the current focus.]
Question for the control framework: how many active servers are needed?
Considerations: historical implications? ongoing system states? arrival characteristics? job elapsed times?
Objective: min (energy + performance penalty + switching).

6 Congestion model
[Diagram: servers 1, 2, 3, …]
Arrivals: batch Poisson, described by a rate function.
Batch sizes: drawn from a distribution with a given c.d.f.
Service times: i.i.d.
Why this model?
Jobs arrive in a "batch" manner, i.e. within seconds of each other and from the same user.
The system is mostly under-utilized, which justifies an infinite-server approximation.
Arrivals show substantial daily variations.
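As a rough illustration of the congestion model on this slide, the sketch below simulates batch Poisson arrivals with a time-varying rate into an infinite-server system. It is only a sketch under stated assumptions: the function names, the thinning method used to generate the non-homogeneous arrivals, and the caller-supplied `rate`, `batch_size` and `service_time` callables are all hypothetical choices, not taken from the talk.

```python
import random


def simulate_jobs(rate, rate_max, batch_size, service_time, horizon, rng):
    """Simulate batch arrivals on [0, horizon) with infinitely many servers.

    rate(t) is the batch arrival rate at time t (assumed <= rate_max),
    batch_size(rng) draws the number of jobs in a batch, and
    service_time(rng) draws one job's service time.
    Returns a list of (arrival_time, departure_time) pairs, one per job.
    """
    jobs = []
    t = 0.0
    while True:
        # Candidate batch epochs come from a homogeneous Poisson process.
        t += rng.expovariate(rate_max)
        if t >= horizon:
            break
        # Thinning: keep the candidate with probability rate(t) / rate_max.
        if rng.random() < rate(t) / rate_max:
            for _ in range(batch_size(rng)):
                # Infinite servers: every job starts service on arrival.
                jobs.append((t, t + service_time(rng)))
    return jobs


def jobs_in_system(jobs, t):
    """Count jobs that have arrived by time t and have not yet departed."""
    return sum(1 for arrival, departure in jobs if arrival <= t < departure)
```

Because no job ever waits, `jobs_in_system` is also the number of busy servers, which is the quantity a controller would need to track.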

7 Discrete-time cost
[Timeline: current time t, look-ahead slot t+k, end of horizon T+t.]
At time t the currently running jobs are known. The jobs present at t+k split into two groups:
{jobs arriving in (t, t+k], still around at t+k} and {jobs arriving before t, still around at t+k}.
Cost in slot k: $C(k) = C_1(k) + C_2(k) + C_3(k) = n(k) + |n(k) - n(k-1)| + C_3(k)$,
where $C_1(k)$ is the energy cost, $C_2(k)$ the switching cost and $C_3(k)$ the performance penalty.
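The per-slot cost above can be sketched in a few lines. The energy and switching terms follow the slide; the performance penalty is NOT legible in the transcript, so the shortfall form `max(0, load - n)` and the two weights below are purely illustrative assumptions.

```python
def total_cost(n, load, n_prev=0, switch_weight=1.0, penalty_weight=1.0):
    """Sum C(k) = energy + switching + penalty over a server schedule.

    n[k] is the number of active servers in slot k and load[k] the number
    of jobs present. The penalty term (unserved jobs) and both weights are
    assumptions made for illustration only.
    """
    cost = 0.0
    for k, servers in enumerate(n):
        energy = servers                                   # C_1(k) = n(k)
        switching = switch_weight * abs(servers - n_prev)  # C_2(k)
        penalty = penalty_weight * max(0, load[k] - servers)  # assumed C_3(k)
        cost += energy + switching + penalty
        n_prev = servers
    return cost
```

For example, `total_cost([2, 3], [2, 4], n_prev=2)` charges 2 in the first slot (energy only) and 3 + 1 + 1 in the second.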

8 Optimization formulation
Cost in slot k: $C(k) = C_1(k) + C_2(k) + C_3(k) = n(k) + |n(k) - n(k-1)| + C_3(k)$,
where $C_1(k)$ is the energy cost, $C_2(k)$ the switching cost and $C_3(k)$ the performance penalty.
(*) is the optimization problem built from this cost.
Solving (*) requires load estimates far into the future.
However, the system can feed back the ACTUAL load U(s) for s < k.

9 A Model Predictive Control framework
[Diagram: the CONTROL FRAMEWORK, now driven by MPC.]
Question for the control framework: how many active servers are needed?
Considerations: historical implications? ongoing system states? arrival characteristics? job elapsed times?
Objective: min (energy + performance penalty + switching).
Method: MPC.

10 Model Predictive Control execution
[Timeline: at time t the look-ahead window ends at T+t; at time t+1 it shifts to end at T+t+1.]
At time t: solve (**), obtain {n*(0), n*(1), …}, and "execute" ONLY n*(0).
At time t+1: solve (**) again over the shifted window, obtain {n*(0), n*(1), …}, and again "execute" ONLY n*(0).
(**) is the limited look-ahead problem. Its advantages:
1. It is less sensitive to load estimation accuracy.
2. It uses "on-going" information: at t+1 we know how many jobs actually arrived in (t, t+1].
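The execution pattern on this slide is a receding-horizon loop, which can be sketched as below. The callables `solve_horizon` and `observe_load` are hypothetical placeholders: the first stands in for whatever solves (**), the second for the feedback of the actual system state.

```python
def run_mpc(solve_horizon, observe_load, steps, n_init=0):
    """Receding-horizon control: re-plan every slot, apply only the first move.

    solve_horizon(t, n_prev, observed) must return a non-empty plan
    [n*(0), n*(1), ...]; observe_load(t) returns what is known at time t.
    Both are supplied by the caller and are assumptions of this sketch.
    """
    executed = []
    n_prev = n_init
    for t in range(steps):
        observed = observe_load(t)           # "on-going" information
        plan = solve_horizon(t, n_prev, observed)
        n_prev = plan[0]                     # execute ONLY n*(0)
        executed.append(n_prev)              # the rest of the plan is discarded
    return executed
```

Discarding the tail of each plan is what makes the scheme tolerant of poor long-range load estimates: every later decision is recomputed with fresher data.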

11 Solving the optimization problem
Cost in slot k: $C(k) = C_1(k) + C_2(k) + C_3(k) = n(k) + |n(k) - n(k-1)| + C_3(k)$,
where $C_1(k)$ is the energy cost, $C_2(k)$ the switching cost and $C_3(k)$ the performance penalty.
Applying a Normal approximation gives problem (***): minimize $\{ n(k) + |u(k)| \}$ over the slots $k = 0, 1, \dots, K-1$, subject to one constraint per slot, $k = 0, 1, \dots, K-1$.
(***) is solved numerically using LP.
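A standard way to hand an objective with absolute values to an LP solver is to split each switching variable into two non-negative parts. The block below is a sketch of that step only. It assumes $u(k) = n(k) - n(k-1)$ and a per-slot lower bound $d(k)$ on $n(k)$; both are assumptions, since the constraint of (***) is not legible in the transcript.

```latex
\begin{aligned}
\min_{n,\,u^{+},\,u^{-}} \quad & \sum_{k=0}^{K-1} \bigl( n(k) + u^{+}(k) + u^{-}(k) \bigr) \\
\text{s.t.} \quad & n(k) - n(k-1) = u^{+}(k) - u^{-}(k), \\
& n(k) \ge d(k), \qquad u^{+}(k) \ge 0, \qquad u^{-}(k) \ge 0, \qquad k = 0, 1, \dots, K-1 .
\end{aligned}
```

Because both $u^{+}(k)$ and $u^{-}(k)$ carry a positive cost, at most one of them is non-zero at an optimum, so $u^{+}(k) + u^{-}(k) = |u(k)|$ and the linear program has the same optimal value as the original problem.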

12 X(k): new arrivals
X(k) counts {jobs arriving in (t, t+k], still around at t+k}.
[Carrillo, 89]: X(k) is a compound Poisson RV, with a batch rate evaluated at s = (k + 1/2)Δ, where Δ is the slot time.
N ~ Poisson( ) is the number of batches; the b_i are i.i.d. batch sizes with given mean and variance.
See [Whitt, 99] for the case where the arrival process is NOT Poisson.
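For a compound Poisson variable only the Poisson parameter and the first two moments of the batch size are needed to form a Normal approximation. The sketch below uses the standard identities E[X] = λ·E[b] and Var[X] = λ·E[b²]; the function names and the idea of sizing capacity at the (1 − eps) quantile are assumptions for illustration, and `lam` must be supplied by the caller because the batch-rate expression is not legible in the transcript.

```python
from statistics import NormalDist


def compound_poisson_moments(lam, batch_mean, batch_var):
    """Mean and variance of sum_{i=1..N} b_i with N ~ Poisson(lam)."""
    mean = lam * batch_mean
    # E[b^2] = Var[b] + E[b]^2
    var = lam * (batch_var + batch_mean ** 2)
    return mean, var


def normal_quantile(mean, var, eps):
    """Level exceeded with probability about eps under a Normal fit."""
    z = NormalDist().inv_cdf(1.0 - eps)
    return mean + z * var ** 0.5
```

With `lam=2.0`, `batch_mean=3.0` and `batch_var=1.0` the moments are a mean of 6.0 and a variance of 20.0.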

13 U(k): existing jobs
U(k) counts {jobs arriving before t, still around at t+k}.
[Carrillo, 91]: U(k) is a binomial RV whose two parameters are evaluated at s = (k + 1/2)Δ, where Δ is the slot time.
One can use job elapsed runtimes in this calculation [Whitt, 99].
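One way elapsed runtimes can enter the calculation is through conditional survival probabilities: a job that has already run for e seconds is still present s seconds later with probability (1 − G(e + s)) / (1 − G(e)). The sketch below is an illustration under that assumption, not the talk's own formula. It treats jobs as independent, so the count is a sum of Bernoulli variables (binomial when all probabilities are equal); `sf` is a hypothetical caller-supplied survival function 1 − G.

```python
import math


def survival_prob(sf, elapsed, s):
    """P(service > elapsed + s | service > elapsed) for survival function sf."""
    denom = sf(elapsed)
    return sf(elapsed + s) / denom if denom > 0 else 0.0


def existing_jobs_moments(sf, elapsed_times, s):
    """Mean and variance of the number of current jobs still present after s."""
    probs = [survival_prob(sf, e, s) for e in elapsed_times]
    mean = sum(probs)
    var = sum(p * (1.0 - p) for p in probs)  # independent Bernoulli terms
    return mean, var
```

With an exponential service time the elapsed runtimes drop out (memorylessness); for heavier-tailed service times a long-running job is more likely to keep running, which is why the elapsed times carry information.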

14 Summary of analytical framework
[Diagram: the CONTROL FRAMEWORK, annotated with the methods used.]
Question for the control framework: how many active servers are needed?
Considerations: historical implications? ongoing system states? arrival characteristics? job elapsed times?
Objective: min (energy + performance penalty + switching).
Methods: MPC, LP optimization, Normal approximation.

15 Numerical evaluation
[Diagram: Swinburne supercomputer logs drive a supercomputer simulator; the simulator passes system states to the CONTROLLER, which returns a control decision; the output is cost performance.]

16 Scheme 1: All up (no turn off)
[Diagram: Swinburne supercomputer logs drive the supercomputer simulator; system states, control decision, cost performance.]
Controller: NO CONTROL.

17 Scheme 2: t_wait heuristic
[Diagram: Swinburne supercomputer logs drive the supercomputer simulator; system states, control decision, cost performance.]
Controller: t_wait heuristic. A server that has been idle for t_wait is turned OFF.
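The t_wait rule is simple enough to state as code. The sketch below is one possible reading of it; the data layout (a mapping from a server id to the time it became idle, or `None` while busy) and the function name are assumptions.

```python
def servers_to_turn_off(idle_since, now, t_wait):
    """Return ids of servers that have been idle for at least t_wait.

    idle_since maps a server id to the time it became idle, or None if
    the server is currently busy. This layout is hypothetical.
    """
    return sorted(
        server_id
        for server_id, since in idle_since.items()
        if since is not None and now - since >= t_wait
    )
```

The rule needs no model of arrivals or service times, which is what makes it a natural baseline for a predictive controller.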

18 Scheme 3: predictive control
[Diagram: Swinburne supercomputer logs drive the supercomputer simulator; system states, control decision, cost performance.]
Controller: MPC, with its model parameters estimated from historical data.

19 S.3: rate function
[Figure: arrival rate versus time of day, for 2010 and 2011.]
Use daily periodic rates.
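A daily periodic rate can be estimated by folding all arrival timestamps onto a single day and counting per time-of-day bin. The sketch below is a minimal version of that idea; the bin count, the use of seconds as the time unit and the function name are assumptions, and the talk does not say how its rates were actually estimated.

```python
SECONDS_PER_DAY = 86400.0


def daily_rate_profile(arrival_times, n_days, bins_per_day=24):
    """Estimate arrivals per second for each time-of-day bin.

    arrival_times are timestamps in seconds from the start of day 0 and
    n_days is the number of days the log covers.
    """
    width = SECONDS_PER_DAY / bins_per_day
    counts = [0] * bins_per_day
    for t in arrival_times:
        # Fold the timestamp onto one day, then locate its bin.
        index = min(int((t % SECONDS_PER_DAY) // width), bins_per_day - 1)
        counts[index] += 1
    # Each bin was observed for n_days * width seconds in total.
    return [c / (n_days * width) for c in counts]
```

Counting batches rather than individual jobs in `arrival_times` would give a batch rate instead of a job rate.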

20 S.3: service time & batch size
Distributions used in the literature: [Lublin et al., 2003]: Hyper-Gamma, Log-uniform; [Li et al., 2005]: Log Normal, Weibull.
[Figure: c.d.f. of service time G versus time (sec), with curves labelled Empirical (2010) and Gamma.]
[Figure: c.d.f. of batch size X versus size (CPU), 2010.]
Our approximations only concern the MEAN and VARIANCE of X.

21 S.3: cost performance
[Figure: normalised cost versus ε, where ε relates to service availability.]
A normalised cost of 1 is the total cost when there is NO CONTROL (energy only).
Simulation period: 1 year.

22 Cost performance: all schemes
[Figure: cost of S.1, S.2 and S.3 (ε = 0.58), compared with the "offline" optimal cost [Lu et al., 12], which has no performance penalty.]
Consider the predictive setting (S.3) whose demand penalty cost is the same as that of the t_wait heuristic (S.2).
After all, the model is there to estimate the θ(k)s.
There is still more than 20% to gain.

23 Remarks and considerations
1. Room for improvement: ~20% to gain!
2. Examining our estimations: the rate function is not accurate; use job elapsed times; is the Normal approximation adequate?
3. Is there a fundamental bound on what can be achieved given uncertainty? [Dinh, Andrew and Branch, CCGrid13]

24 Thank you
[Diagram: the CONTROL FRAMEWORK, annotated with the methods used.]
Question for the control framework: how many active servers are needed?
Considerations: historical implications? ongoing system states? arrival characteristics? job elapsed times?
Objective: min (energy + performance penalty + switching).
Methods: MPC, LP optimization, Normal approximation.

25 The objective cost
[Diagram: the CONTROL FRAMEWORK, with the objective as the current focus.]
Question for the control framework: how many active servers are needed?
Considerations: historical implications? ongoing system states? arrival characteristics? job elapsed times?
Objective: min (energy + performance penalty + switching).

