
GWpilot: a personal pilot system A.J. Rubio-Montero, E. Huedo and R. Mayo-García EGI Technical Forum 2012 Prague – 20 Sep 2012.


1 GWpilot: a personal pilot system A.J. Rubio-Montero, E. Huedo and R. Mayo-García EGI Technical Forum 2012 Prague – 20 Sep 2012

2 EGI TF 2012 – Prague, 20 Sep 2012
Outline
- Common problems in Grid computation
- Pilot Jobs
- GWpilot
  - Advantages
  - Utilisation
  - Design and improvements
  - Suitability
- DKEsG
  - Revision
  - Description of the calculation performed
  - Performance measurements
  - Some results
- Conclusions

3 Common problems in Grid computation
- Variable overheads: queue waiting times, overload...
- Variable error rate: connection cuts, jobs arbitrarily aborted...
- Diverse configurations: complexity, unexpected types of WN...
- To increase performance by means of scheduling, the resources must be completely characterised, but this is impossible with generic middleware:
  - Design defects: the GLUE specification is incomplete (no bandwidth, latency, queue policy, average waiting time, resource profile...)
  - Misconfigurations and lack of maintenance
- Solutions
  - Self-scheduling
  - Models
  - Heuristics
  - Statistics
  - Pilot jobs

4 Pilot Jobs: basics
[Diagram: slot appropriation. Labels: Pilot Factory, Coordinator Server, CE, LRMS queue, pilot, task, task monitoring.]

5 Pilot jobs: benefits
- Reduce the Grid complexity:
  - direct use and characterization of the assigned WNs
  - direct monitoring of user tasks
- Fix task dispatching overheads:
  - remove the waiting time in remote queues
  - remove middleware overheads and errors (CREAM, GRAM)
- Reduce the task error rate: middleware, hardware or connectivity
- Increase compatibility:
  - creating special configurations
  - implementing legacy communication protocols
- Allow the implementation of advanced scheduling techniques

6 Pilot Jobs: implementations
- Centralized frameworks: daunting maintenance, deployment and customization.
  - AliEn and PanDA (suitable for HEP users)
  - DIRAC
  - glideinWMS
  - EDGeS (XtremWeb) and GridBot (BOINC)
- Application-oriented: mono-user, mono-application.
  - DIANE
- These either do not exploit all the scheduling advantages provided by pilot jobs, or lack compatibility or adaptability.
- Alternative
  - GWpilot

7 GWpilot: features
- Easy to install and standalone from remote middleware
- Highly customizable and tuneable, even by unskilled users
- Multi-user with fair-share policies
- Compatible with previously ported applications
- Interoperable with diverse Grid infrastructures
- Lightweight and scalable, achieving nearly optimal performance
- Advanced scheduling policies for any kind of task

8 GWpilot: simplicity of use and configuration
- GWpilot makes the use of pilot jobs automatic and unattended for both users and developers. Users only have to set the REQUIREMENTS line below in their tasks:

    # cat ls_template.jt
    EXECUTABLE = /bin/ls
    STDOUT_FILE = logs/ls.out.${ARCH}.${JOB_ID}
    STDERR_FILE = logs/ls.err.${ARCH}.${JOB_ID}
    REQUIREMENTS = LRMS_NAME = "jobmanager-pilot"
    RANK = CPU_MHZ
    # gwsubmit -t ls_template.jt

- Usual configuration parameters:
  - maximum number of submitted pilots
  - dispatching suspension timeout (maximum time spent at the remote LRMS)
  - pilot pulling interval against GWpilot and number of retries

    # cat gwd.conf
    …
    IM_MAD = pilot_im:gw_im_mad_pilot::dummy:pilot_em
    EM_MAD = pilot_em:gw_em_mad_pilot:-m 550 -t 18000 -i 45 -f 20 :rsl_nsh
    …
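The option string in the EM_MAD line can be read against the parameter list on this slide. The following Python sketch splits the line and parses the flags. The mapping of flags to meanings is an inference, not something the slide states: `-t 18000` matches the 5-hour suspension timeout and `-i 45 -f 20` match the 45-second pulling interval with 20 retries quoted on slide 13, which leaves `-m` as the maximum number of pilots. The field and attribute names (`arguments`, `max_pilots`, and so on) are hypothetical labels chosen for this example.

```python
import argparse
import shlex


def split_em_mad(line):
    """Split an 'EM_MAD = name:executable:arguments:rsl' line into its four fields.

    The field names are labels chosen for this sketch, not GridWay terminology.
    """
    _, value = line.split("=", 1)
    name, executable, arguments, rsl = (field.strip() for field in value.split(":"))
    return {"name": name, "executable": executable, "arguments": arguments, "rsl": rsl}


def parse_pilot_em_options(option_string):
    """Parse the pilot driver flags; the flag meanings are assumed, see the lead-in."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-m", dest="max_pilots", type=int)            # assumed: max submitted pilots
    parser.add_argument("-t", dest="suspension_timeout_s", type=int)  # assumed: seconds at remote LRMS
    parser.add_argument("-i", dest="pull_interval_s", type=int)       # assumed: seconds between pulls
    parser.add_argument("-f", dest="pull_retries", type=int)          # assumed: number of pull retries
    return parser.parse_args(shlex.split(option_string))


if __name__ == "__main__":
    line = "EM_MAD = pilot_em:gw_em_mad_pilot:-m 550 -t 18000 -i 45 -f 20 :rsl_nsh"
    options = parse_pilot_em_options(split_em_mad(line)["arguments"])
    print(options.max_pilots, options.suspension_timeout_s / 3600, "hours")
```

Under this reading, the configuration shown allows up to 550 pilots, each of which may wait at most 5 hours in a remote queue.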

9 GWpilot: integrated into the GridWay metascheduler
[Architecture diagram. Labels: Applications; CLI, DRMAA, BES, JSDL; GridWay Core; Scheduler; CREAM, GRAM, GWpilot Server; MDS2 GLUE, BDII; GWpilot Factory; CREAM CE and GLOBUS CE, each with a site-BDII; pilots pulling tasks over HTTPS.]
- Allows submitting a given percentage more pilots than the estimated free slots
- More accurate estimation of free slots
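The pull model in this architecture (pilots asking the server for tasks rather than having tasks pushed to them) can be illustrated with a minimal Python sketch. It is not GWpilot code: the real pilots talk to the GWpilot Server over HTTPS, whereas here a hypothetical in-memory `InMemoryServer` stands in for it, and treating "retries" as the number of consecutive empty pulls before the pilot exits is an assumption. The default interval and retry count are the values quoted elsewhere in the slides (45 seconds, 20 retries).

```python
import time
from collections import deque


class InMemoryServer:
    """Hypothetical stand-in for the pilot server: queues tasks and stores results."""

    def __init__(self, tasks):
        self.pending = deque(tasks)  # each task is (task_id, callable, args)
        self.results = {}

    def pull(self):
        """Return the next pending task, or None when nothing is queued."""
        return self.pending.popleft() if self.pending else None

    def report(self, task_id, result):
        self.results[task_id] = result


def run_pilot(server, pull_interval_s=45, max_empty_pulls=20, sleep=time.sleep):
    """Pull and run tasks until `max_empty_pulls` consecutive pulls return nothing.

    Returns the number of tasks executed by this pilot.
    """
    empty_pulls = 0
    executed = 0
    while empty_pulls < max_empty_pulls:
        task = server.pull()
        if task is None:
            # Nothing to do: wait one interval and count the failed attempt.
            empty_pulls += 1
            sleep(pull_interval_s)
            continue
        empty_pulls = 0  # a successful pull resets the retry counter
        task_id, func, args = task
        server.report(task_id, func(*args))
        executed += 1
    return executed


if __name__ == "__main__":
    server = InMemoryServer([("t1", pow, (2, 10)), ("t2", sum, ([1, 2, 3],))])
    # Replace the real sleep with a no-op so the demonstration finishes immediately.
    print(run_pilot(server, sleep=lambda seconds: None), server.results)
```

Because the pilot already holds the slot on the worker node, each task it pulls starts without waiting in the remote queue again, which is the source of the overhead reduction described in the benefits slide.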

10 GWpilot: suitability for distributed applications
- Could give a boost to your computational challenges!
- Legacy applications previously ported to GridWay or to the DRMAA/BES/JSDL standards can directly benefit from GWpilot.
- Examples:
  - Truba/MaRaTra
  - VMEC
  - ISDEP
  - FAFNER-2
  - gGEM
  - DKEsG: Drift Kinetic Equation solver for Grid

11 DKEsG: calculating NC transport of fusion devices
- Fluxes through the surfaces generated by the magnetic field lines (equations shown as an image on the slide)
- The DKEsG workflow *1, updated with Spong's DKEs code *2
- NC transport coefficients
- DRMAA-enabled producer-consumer workflow: task chunking and the polling time for BoT states are customizable
*1 A. J. Rubio-Montero et al., "Drift Kinetic Equation Solver for Grid (DKEsG)," IEEE Trans. Plasma Sci., 38(9), 2010.
*2 D. A. Spong, "Generation and damping of neoclassical plasma flows in stellarators," Phys. Plasmas, 12(5), 2005.

12 Experiment: DKEsG parameter scan with the TJ-II standard configuration
- r[2…141] × EFIELD[-250…250:10] × CMUL[(1…9)·10^(-7…0)] = 514,080 independent tasks (1 to 12 min each, proportional to radius)
- 420 tasks
- 103,236 independent BoTs
- 5 tasks per BoT
- 6.58 years on an Intel Xeon X5365 3 GHz (64-bit)
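The task count on this slide can be reproduced by enumerating the three scanned parameters. The Python sketch below assumes that all ranges are inclusive and that the CMUL notation means every mantissa from 1 to 9 combined with every power of ten from 10^-7 to 10^0; both readings are assumptions, supported by the fact that they yield exactly the quoted total.

```python
# Parameter ranges as read from the slide (inclusive bounds assumed).
radii = range(2, 142)            # r = 2 ... 141
efields = range(-250, 251, 10)   # EFIELD = -250 ... 250 in steps of 10
cmuls = [mantissa * 10.0 ** exponent
         for exponent in range(-7, 1)   # 10^-7 ... 10^0
         for mantissa in range(1, 10)]  # 1 ... 9

tasks_per_radius = len(efields) * len(cmuls)
total_tasks = len(radii) * tasks_per_radius

print(len(radii), len(efields), len(cmuls))  # sizes of the three ranges
print(tasks_per_radius, total_tasks)         # tasks per radial position and in total
```

This gives 140 × 51 × 72 = 514,080 tasks, with 3,672 tasks per radial position. Grouping 514,080 tasks five at a time gives 102,816 BoTs, which is 420 fewer than the 103,236 BoTs quoted; one possible reading, not stated on the slide, is that the separate "420 tasks" each run as their own BoT.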

13 Resources used and configuration bounds
- GISELA infrastructure (prod.vo.eu-eela.eu), used for submitting pilots; 32-bit, duplicate and CERN resources were discarded.
- DKEsG only submits 500 BoTs
- BoT suspension timeout: 60 secs
- Max pilots submitted by GWpilot:
  - limited to 100 jobs per resource
  - queues overloaded with 15% more pilots
  - suspension timeout: 5 hours
- Other configuration parameters:
  - Pilot pulling interval: 45 seconds with 20 retries
  - DKEsG polling time: 15 secs
  - Resources are ranked/prioritised based on CPU speed

14 Experiment: measured computational results
[Plot annotations: many overloaded sites; pilots die while a BoT is running inside them; BoTs are suspended because they were assigned to dead pilots.]
- Only 0.4% failing BoTs
- 62% failing grid jobs
- Makespan: 94 h 27 m 31 s, 610 times faster than sequential
- Total time wasted at remote queues: 1 year and 20 days, not noticeable to the user
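The "610 times faster" figure can be checked against the sequential estimate of 6.58 years quoted on slide 12. The short Python calculation below assumes a 365-day year; since the sequential time is only given to two decimals, the result is approximate.

```python
SEQUENTIAL_YEARS = 6.58      # sequential estimate quoted on slide 12
HOURS_PER_YEAR = 365 * 24    # assumption: 365-day year

# Makespan of 94 h 27 m 31 s expressed in hours.
makespan_hours = 94 + 27 / 60 + 31 / 3600

speedup = SEQUENTIAL_YEARS * HOURS_PER_YEAR / makespan_hours
print(f"makespan = {makespan_hours:.2f} h, speedup = {speedup:.1f}")
```

The ratio comes out slightly above 610, in line with the slide.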

15 Experiment: turnaround measurements
[Plot annotations: DKEsG cannot supply enough BoTs; the number of available pilots is lower than 500.]
- Pilot overhead is always between 41 and 43 secs
  - Scalability of GWpilot
- Accumulated turnaround overhead is only 6.21%. If only GridWay were used, the resulting overhead* would be 61.48%.
* A.J. Rubio-Montero et al., "Executions of a Drift Kinetic Equation solver on Grid," in PDP 2010, Pisa, Italy.

16 Plasma results: bootstrap current (D13) of the outer radial plasma position in TJ-II (negative polarization)
- This surface needs 3672 DKEsG-Mono tasks = 663 CPU hours consumed from the Grid
- [Plot annotation: bootstrap current tends to zero]
- The collisionless asymptotic value (which depends on the configuration) is recovered.
- Larger uncertainties appear in the long mean free path regime.
- As expected, the coefficients are even in the electric field.

17 Plasma results: normalized NC transport coefficients (L33) from the resistivity enhancement (positive polarization)
- CMUL and EFIELD parameters decrease monotonically as K increases.
- By solving the integration in K, the data stored in the database are continuously reused.

18 Conclusions
- Summary
  - GWpilot is suitable to easily improve the performance of several kinds of fusion codes.
  - New features have been implemented in GridWay and DKEsG.
  - DKEsG execution shows impressive improvements in terms of makespan and turnaround.
- Future work
  - Continue the DKEsG calculations in order to build an extensive database for several fusion devices that allows the user to read the monoenergetic coefficients and to obtain the final fluxes without performing the calculations again.
  - We are evaluating GWpilot with other applications from other scientific areas.
- More information at:
  - www.ciemat.es/portal.do?IDR=343&TR=C
  - www.gridway.org

19 Thanks for your attention

