Slide 1: Modelling a supercomputer with the $M_t^X/G/\infty$ model
Tuan V. Dinh, Lachlan Andrew and Yoni Nazarathy
Australia and New Zealand Applied Probability Workshop
http://caia.swin.edu.au/cv/tdinh | 10 July 2013 | Australia and New Zealand Applied Probability Workshop

Slide 2: Supercomputer clusters
- Used for large-scale simulation: climate, genome, astronomy, etc.
- The foundation of cloud computing.
- Big data and exascale computing: more computing power is desired.
- Costs: electricity bills; heat and thermal management; investment in cooling systems, hardware, etc.
Slide 3: Power proportionality
[Figure: power vs. load for a single server, ideal (power proportional to load) vs. reality (an idle server already draws about 60% of peak) (1)]
- An idle server draws ~60% of its peak power, so turning off idle servers saves energy.
- Challenges: switching costs (setup time, wear and tear) and possible performance impacts.
[Image: Swinburne supercomputer]
(1) L. A. Barroso and U. Hölzle, "The case for energy-proportional computing", IEEE Computer, 2007.
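To make the power-proportionality argument concrete, the sketch below assumes (this is not stated on the slide) that a server's power rises linearly from its idle draw, taken as 60% of peak, to its peak at full load. Under that assumption, packing a light load onto fewer servers and switching the rest off cuts power sharply. The function names and the 300 W peak are illustrative only.

```python
def server_power(load, peak_watts, idle_fraction=0.6):
    """Hypothetical linear power model for one server.

    load is in [0, 1]; an idle server (load 0) draws idle_fraction * peak_watts,
    matching the ~60% figure on the slide; a fully loaded server draws peak_watts.
    """
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be in [0, 1]")
    idle_watts = idle_fraction * peak_watts
    return idle_watts + (peak_watts - idle_watts) * load


def cluster_power(total_load, n_active, peak_watts, idle_fraction=0.6):
    """Power when total_load (in server-equivalents) is spread evenly over
    n_active servers and all other servers are off (assumed to draw 0 W)."""
    if n_active <= 0 or total_load > n_active:
        raise ValueError("n_active servers cannot carry this load")
    per_server = total_load / n_active
    return n_active * server_power(per_server, peak_watts, idle_fraction)


if __name__ == "__main__":
    # 100 servers at 20% utilisation, 300 W peak each (illustrative numbers).
    print(cluster_power(20.0, 100, 300.0))  # all servers on: roughly 20.4 kW
    print(cluster_power(20.0, 20, 300.0))   # 20 servers on, fully loaded: 6 kW
```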
Slide 4: An energy saving framework
[Diagram: a CONTROL FRAMEWORK built on a system congestion model, which must answer: number of active servers needed? historical implications? ongoing system states? arrival characteristics? job elapsed times?]
Objective: min (energy + switching + performance penalty)
Slide 5: Congestion model
[Control-framework diagram as on Slide 4, highlighting the congestion model.]
Slide 6: Congestion model
[Diagram: batches of jobs arriving at servers 1, 2, 3, …, ∞]
- Arrivals: batch Poisson process with rate function λ(t).
- Batch sizes: i.i.d., with a given c.d.f.
- Service times: i.i.d.
Why this model?
- Jobs arrive in "batches", i.e. within seconds of each other, from the same user.
- The system is mostly under-utilized, so an infinite-server approximation is used.
- Arrival rates show substantial daily variations, hence a time-varying rate function.
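A minimal Monte Carlo sketch of the congestion model above: batches arrive as a Poisson process with rate function λ(t), generated here by thinning (Lewis-Shedler), and every batch starts service immediately because servers are unlimited. Two assumptions of this sketch, not stated on the slide: a batch's size is counted in CPUs, and all CPUs of a batch share one service time. The helper names are hypothetical.

```python
import math
import random


def simulate_busy_cpus(rate_fn, rate_max, batch_sampler, service_sampler, t_end, rng):
    """Busy CPUs at time t_end in an infinite-server system that starts empty at 0.

    rate_fn(t):           batch arrival rate lambda(t); must satisfy rate_fn(t) <= rate_max.
    batch_sampler(rng):   draws a batch size (hypothetical helper).
    service_sampler(rng): draws one service time, shared by the whole batch
                          (an assumption of this sketch).
    """
    busy = 0
    t = 0.0
    while True:
        # Thinning: propose arrivals at the constant rate rate_max ...
        t += rng.expovariate(rate_max)
        if t > t_end:
            return busy
        # ... and keep each proposal with probability lambda(t) / rate_max.
        if rng.random() < rate_fn(t) / rate_max:
            size = batch_sampler(rng)
            # Infinite servers: each batch starts service at once, no queueing.
            if t + service_sampler(rng) > t_end:
                busy += size


if __name__ == "__main__":
    # Usage: a daily-periodic rate (time in hours) with Gamma service times.
    rng = random.Random(1)
    rate = lambda t: 2.0 + 1.5 * math.sin(2 * math.pi * t / 24.0)
    sample = simulate_busy_cpus(
        rate, 3.5,
        batch_sampler=lambda r: r.choice([1, 2, 4, 8]),
        service_sampler=lambda r: r.gammavariate(2.0, 1.5),
        t_end=48.0, rng=rng,
    )
    print(sample)
```

Thinning is used because it needs only an upper bound on λ(t), which suits an empirically estimated, daily-varying rate.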
Slide 7: Discrete-time cost
[Timeline: current time t, future slot t+k, end of horizon T+t.]
At time t the currently running jobs are known. The jobs present at time t+k are:
- {jobs arriving in (t, t+k], still around at t+k};
- {jobs arriving before t, still around at t+k}.
With n(k) the number of active servers in slot k, the per-slot cost is
$$C(k) = \underbrace{n(k)}_{C_1(k):\ \text{energy}} + \underbrace{|n(k) - n(k-1)|}_{C_2(k):\ \text{switching}} + \underbrace{C_3(k)}_{\text{performance penalty}}$$
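The per-slot cost can be evaluated for a candidate schedule as below. The slide does not spell out the form of the performance penalty C_3(k), so this sketch takes it as caller-supplied values; the function name is hypothetical.

```python
def slot_costs(n, n_prev, penalty):
    """Per-slot cost C(k) = n(k) + |n(k) - n(k-1)| + C3(k) for a planned schedule.

    n:       active-server counts n(0), ..., n(K-1)
    n_prev:  active servers just before slot 0
    penalty: C3(k) values supplied by the caller (form not fixed by the slide)
    """
    if len(n) != len(penalty):
        raise ValueError("n and penalty must have the same length")
    costs = []
    previous = n_prev
    for n_k, c3 in zip(n, penalty):
        energy = n_k                      # C1(k)
        switching = abs(n_k - previous)   # C2(k)
        costs.append(energy + switching + c3)
        previous = n_k
    return costs
```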
Slide 8: Optimization formulation
With the per-slot cost $C(k) = n(k) + |n(k) - n(k-1)| + C_3(k)$ (energy $C_1$, switching $C_2$, performance penalty $C_3$), problem (*) is to choose n(k) to minimise the total cost over the horizon.
Solving (*) directly requires load estimates far into the future. However, the system can feed back the ACTUAL load U(s) for s < k.
Slide 9: A Model Predictive Control framework
[Control-framework diagram as on Slide 4, with Model Predictive Control (MPC) added as the control method.]
Slide 10: Model Predictive Control execution
[Timeline: the window from t to T+t, then the window from t+1 to T+t+1.]
- At time t: solve (**) and obtain {n*(0), n*(1), …}. ONLY "execute" n*(0).
- At time t+1: the horizon shifts to T+t+1; solve (**) again, obtain a new {n*(0), n*(1), …}, and again ONLY "execute" n*(0).
(**) uses a limited look-ahead, which:
1. is less sensitive to the accuracy of load estimation;
2. uses "ongoing" information, e.g. how many jobs actually arrived in (t, t+1].
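The receding-horizon logic of MPC can be sketched as a loop that re-solves a K-slot problem every slot and commits only the first decision. Both `forecast` and `solve` are hypothetical placeholders for the load estimator and the optimizer of (**); any functions with these signatures can be plugged in.

```python
from typing import Callable, List, Sequence


def run_mpc(
    n_initial: int,
    n_slots: int,
    horizon: int,
    forecast: Callable[[int, int], Sequence[float]],
    solve: Callable[[int, Sequence[float]], List[int]],
) -> List[int]:
    """Receding-horizon loop: re-plan at every slot, apply only the first decision.

    forecast(t, K): hypothetical load forecast for slots t, ..., t+K-1, recomputed
        at each t so it can use the jobs actually observed up to t.
    solve(n_prev, loads): hypothetical planner for (**), returning n*(0), ..., n*(K-1).
    """
    applied: List[int] = []
    n_prev = n_initial
    for t in range(n_slots):
        plan = solve(n_prev, forecast(t, horizon))  # solve (**) over the look-ahead
        n_prev = plan[0]                            # ONLY execute n*(0)
        applied.append(n_prev)
    return applied
```

Because the forecast is recomputed at every step, errors in far-ahead estimates only affect decisions that are later discarded.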
Slide 11: Solving the optimization problem
$$\min \sum_{k=0}^{K-1} \{ n(k) + |u(k)| \} \qquad (***)$$
subject to per-slot constraints for k = 0, 1, …, K−1, in which the load is handled by a normal approximation. Here n(k) is the energy term $C_1(k)$ and |u(k)| corresponds to the switching term $C_2(k) = |n(k) - n(k-1)|$ of $C(k) = n(k) + |n(k) - n(k-1)| + C_3(k)$.
(***) is solved numerically using LP.
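A sketch of how a problem shaped like (***) can be solved, under stated assumptions. The slide's constraints are not reproduced here, so this assumes a chance constraint P(load(k) > n(k)) ≤ p, turned into n(k) ≥ mean + z·sd by the normal approximation, and takes u(k) = n(k) − n(k−1). It also swaps in a different technique: instead of the slide's LP, it runs an exact dynamic program over integer server counts, which is practical for modest cluster sizes. The names `min_servers`, `plan_servers`, `overflow_prob` and `n_max` are hypothetical.

```python
import math
from statistics import NormalDist


def min_servers(mean, var, overflow_prob):
    """Normal approximation: smallest integer n with P(load > n) <= overflow_prob."""
    z = NormalDist().inv_cdf(1.0 - overflow_prob)
    return max(0, math.ceil(mean + z * math.sqrt(var)))


def plan_servers(n_prev, means, variances, overflow_prob, n_max):
    """Minimise sum_k n(k) + |n(k) - n(k-1)| subject to n(k) >= min_servers(...).

    Dynamic programming over integer server counts 0..n_max (swapped in for the LP).
    """
    lower = [min_servers(m, v, overflow_prob) for m, v in zip(means, variances)]
    if not 0 <= n_prev <= n_max or any(b > n_max for b in lower):
        raise ValueError("n_max is too small")
    inf = float("inf")
    cost = [inf] * (n_max + 1)   # cost[n]: best cost so far, ending at n servers
    cost[n_prev] = 0.0
    parents = []
    for b in lower:
        new_cost = [inf] * (n_max + 1)
        parent = [-1] * (n_max + 1)
        for n in range(b, n_max + 1):          # only feasible counts n >= b
            for m in range(n_max + 1):         # previous count n(k-1)
                c = cost[m] + n + abs(n - m)   # energy + switching
                if c < new_cost[n]:
                    new_cost[n], parent[n] = c, m
        cost = new_cost
        parents.append(parent)
    n = min(range(n_max + 1), key=cost.__getitem__)
    plan = []
    for parent in reversed(parents):           # walk back through the slots
        plan.append(n)
        n = parent[n]
    return plan[::-1]
```

With this cost, switching a server off and back on costs 2 while keeping it idle costs 1 per slot, so the planner keeps servers up across short dips and switches them off across dips longer than two slots.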
Slide 12: X(k): new arrivals
X(k) counts {jobs arriving in (t, t+k], still around at t+k}.
By [Carrillo,89], X(k) is a compound Poisson RV,
$$X(k) = \sum_{i=1}^{N} b_i,$$
where N is Poisson with mean given by the batch rate, evaluated at s = (k+1/2)Δ with Δ the slot time, and the b_i are i.i.d. batch sizes with known mean and variance.
Even if the arrival process is NOT Poisson: see [Whitt,99].
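The moments of X(k) follow from standard compound-Poisson identities: E[X] = Λ·E[b] and Var[X] = Λ·E[b²], where Λ is the Poisson mean of N. The sketch below computes them and, as an assumption about how the midpoint s = (k+1/2)Δ is used, approximates Λ by a midpoint sum over arrival slots, each weighted by the probability of still being in service. The symbol Λ and the helper names are hypothetical.

```python
def expected_batches(rate_fn, survival_fn, t0, k, delta):
    """Hypothetical midpoint-rule estimate of the Poisson mean for X(k).

    Counts batches that arrive in (t0, t0 + k*delta] and are still in service at
    t0 + k*delta; arrival slot j is represented by its midpoint (j + 1/2)*delta.
    rate_fn(t): batch arrival rate; survival_fn(x): P(service time > x).
    """
    total = 0.0
    for j in range(k):
        s = (j + 0.5) * delta          # midpoint of slot j, measured from t0
        age = k * delta - s            # time in service at t0 + k*delta
        total += rate_fn(t0 + s) * delta * survival_fn(age)
    return total


def compound_poisson_moments(poisson_mean, batch_mean, batch_var):
    """Mean and variance of X = b_1 + ... + b_N with N ~ Poisson(poisson_mean)."""
    mean = poisson_mean * batch_mean
    var = poisson_mean * (batch_var + batch_mean ** 2)  # Poisson mean * E[b^2]
    return mean, var
```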
Slide 13: U(k): existing jobs
U(k) counts {jobs arriving before t, still around at t+k}.
By [Carrillo,91], U(k) is a binomial RV whose parameters are evaluated at s = (k+1/2)Δ, with Δ the slot time; its mean and variance follow directly from these parameters.
One can use job elapsed runtimes to calculate these parameters [Whitt,99].
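Using elapsed runtimes, a job that has already run for a_i survives another h time units with probability S(a_i + h)/S(a_i), where S = 1 − G is the service-time survival function. The sketch sums these Bernoulli terms to get the mean and variance of U(k); when all jobs share one survival probability this is exactly the binomial case. The Weibull survival function is only an illustrative choice (Weibull is one of the fits cited on Slide 20), and the helper names are hypothetical.

```python
import math


def weibull_survival(x, shape, scale):
    """P(service time > x) for a Weibull distribution (illustrative choice)."""
    return math.exp(-((x / scale) ** shape))


def existing_jobs_moments(elapsed, horizon, survival_fn):
    """Mean and variance of how many running jobs are still running `horizon`
    time units from now, given each job's elapsed runtime a_i.

    Job i survives with p_i = S(a_i + horizon) / S(a_i); with a common p the count
    is Binomial(m, p), with job-specific p_i it is a sum of independent Bernoullis.
    """
    mean = 0.0
    var = 0.0
    for a in elapsed:
        s_a = survival_fn(a)
        p = survival_fn(a + horizon) / s_a if s_a > 0.0 else 0.0
        mean += p
        var += p * (1.0 - p)
    return mean, var
```

With shape 1 (exponential) the elapsed time has no effect; with shape above 1, long-running jobs are predicted to finish sooner, which is where elapsed runtimes add information.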
Slide 14: Summary of analytical framework
[Control-framework diagram as on Slide 4, now annotated with its components: MPC, LP optimization and a normal approximation.]
Objective: min (energy + switching + performance penalty)
Slide 15: Numerical evaluation
[Diagram: Swinburne supercomputer logs drive a supercomputer simulator; a CONTROLLER reads the system states and returns control decisions; the simulator reports cost and performance.]
Slide 16: Scheme 1: All up (no turn off)
[Evaluation setup as on Slide 15, with NO CONTROL: all servers stay on.]
Slide 17: Scheme 2: t_wait heuristic
[Evaluation setup as on Slide 15, with the t_wait heuristic as the controller.]
Rule: if a server has been idle for t_wait, turn it OFF.
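The t_wait rule can be costed per server from its idle periods: an idle gap shorter than t_wait is spent powered on, while a longer gap costs t_wait of idle power plus one off/on cycle. This sketch assumes instantaneous switching and that an arriving job powers the server back on; the function name is hypothetical.

```python
def twait_idle_cost(idle_gaps, t_wait):
    """Idle powered-on time and number of power cycles for one server
    under the t_wait heuristic.

    idle_gaps: lengths of the server's idle periods between consecutive jobs.
    """
    idle_on_time = 0.0
    switches = 0
    for gap in idle_gaps:
        if gap > t_wait:
            idle_on_time += t_wait   # idle but powered for t_wait, then OFF
            switches += 1            # one OFF (and a later ON) per long gap
        else:
            idle_on_time += gap      # a job arrived before the timer expired
    return idle_on_time, switches
```

A small t_wait saves idle energy but increases switching (and, in practice, setup delays for arriving jobs); a large t_wait does the opposite.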
Slide 18: Scheme 3: predictive control
[Evaluation setup as on Slide 15, with MPC as the controller; model inputs are estimated from historical data.]
Slide 19: S.3: rate function
[Figure: arrival rate vs. time of day, for 2010 and 2011.]
Use daily periodic rates.
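One simple way to build a daily periodic rate function from logs is to fold arrival timestamps onto the time of day and average the per-bin counts over the days observed. The hourly bins and the function name are assumptions; the slides do not say how the rates were estimated.

```python
def daily_rate_profile(arrival_times_sec, n_days, bin_sec=3600):
    """Daily-periodic arrival rate (arrivals per second) in bins of bin_sec.

    Timestamps (seconds from the start of the log) are folded onto time of day,
    counted per bin, and averaged over n_days.
    """
    day = 86400
    if n_days <= 0 or day % bin_sec:
        raise ValueError("need n_days > 0 and bin_sec dividing one day")
    counts = [0] * (day // bin_sec)
    for ts in arrival_times_sec:
        counts[int(ts % day) // bin_sec] += 1   # fold onto time of day
    return [c / (n_days * bin_sec) for c in counts]
```

The result is a step function of time of day, which can serve as λ(t) in a thinning-based simulator by using its maximum as the thinning bound.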
Slide 20: S.3: service time & batch size
Distributions proposed in the literature: [Lublin et al.,2003]: Hyper-Gamma, Log-uniform; [Li et al.,2005]: Log-normal, Weibull.
[Figure: empirical (2010) and fitted Gamma c.d.f.s for G, the service time (time, sec), and X, the batch size (size, CPUs).]
Our approximations only concern the MEAN and VARIANCE of X.
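Since only the mean and variance of X enter the approximations, a Gamma fit can be obtained by matching moments: shape = mean²/variance and scale = variance/mean. The slides do not state how the Gamma curves in the figure were fitted, so treat this method-of-moments sketch (with a hypothetical function name) as one option among several.

```python
from statistics import mean, variance


def gamma_moment_fit(samples):
    """Method-of-moments Gamma fit: returns (shape k, scale theta) with
    k * theta = sample mean and k * theta**2 = sample variance."""
    m = mean(samples)
    v = variance(samples)          # sample variance (n - 1 denominator)
    if m <= 0 or v <= 0:
        raise ValueError("need a positive mean and variance")
    return m * m / v, v / m
```

The fit can be sanity-checked by drawing samples with `random.gammavariate(shape, scale)` from the standard library and comparing c.d.f.s.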
Slide 21: S.3: cost vs. performance
[Figure: normalised cost vs. ε, where ε ~ service availability.]
Cost 1 = total cost when there is NO CONTROL (energy only). Simulation period: 1 year.
Slide 22: Cost vs. performance: all schemes
[Figure: cost of S.1, S.2 and S.3 (ε = 0.58), compared with the "offline" optimal cost [Lu et al., 12] with no performance penalty.]
- Consider predictive settings (S.3) whose demand penalty cost is the same as that of the t_wait heuristic (S.2).
- After all, the model is used to estimate the θ(k)s.
- There is still > 20% to gain.
Slide 23: Remarks and considerations
1. Room for improvement: ~20% to gain!
2. Examining our estimations: the rate function is not accurate; use job elapsed times; is the normal approximation adequate?
3. Is there a fundamental bound on what can be achieved given uncertainty? [Dinh, Andrew and Branch, CCGrid13]
Slide 24: Thank you
[Summary diagram as on Slide 14: control framework with MPC, LP optimization and a normal approximation; objective: min (energy + switching + performance penalty).]
Slide 25: The objective cost
[Control-framework diagram as on Slide 4, highlighting the objective: min (energy + switching + performance penalty).]