AliEn job agents. P. Saiz (IT-ES), Experiment Support, CERN IT Department, CH-1211 Geneva 23, Switzerland, www.cern.ch/it


1 AliEn job agents (P. Saiz, IT-ES)

2 Summary (CERN IT Department, CH-1211 Geneva 23, Switzerland, www.cern.ch/it; Pablo Saiz, Workload management TEG workshop, 03 Nov 11)
What does an AliEn Job Agent do?
AliEn framework
TaskQueue
JobAgent
Challenges
Summary

3 AliEn
All components to create a GRID
File Catalogue
–UNIX-like file system
–Mapping to physical files
–Metadata information
–SE discovery
Transfer Model
–With different plugins
TaskQueue
–Job Agent & pull model
–Automatic installation of software packages
–Simulation, reconstruction, analysis...
Developed by ALICE
–Used also by PANDA and CBM (FAIR)

4 AliEn TaskQueue
Distribution of jobs among CE
With priorities and quotas
According to job requirements
–InputData, memory, partition...
Installation of software packages
Multiple backends
–LSF, PBS, CREAMCE, CONDOR, FORK
Scales up to at least 40k concurrent jobs (150k per day)

5 Job execution
[Figure: job execution diagram. Central boxes: Job Manager, TaskQueue, Job Broker, File catalogue (LFN, GUID, metadata). Site A, Site B and Site C each show CM, Packman and MonALISA; CE, xrootd and JOB boxes also appear.]

6 Job optimizers
Splitting
–By file, directory, storage, production
Priority
–Based on user quotas
Automatic resubmission
–Depending on type of error and thresholds
Merging
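The splitting optimizer can be pictured with a small sketch. It groups the input files of one master job into subjobs, either one per file or one per directory. The function name `split_job` and the file paths are hypothetical and only illustrate the idea; this is not AliEn code, and splitting by storage or production is not shown.

```python
from collections import defaultdict
from posixpath import dirname


def split_job(input_files, strategy="file"):
    """Group input LFNs into subjobs (hypothetical helper, not the AliEn API)."""
    if strategy == "file":
        # One subjob per input file.
        return [[lfn] for lfn in input_files]
    if strategy == "directory":
        # One subjob per parent directory, in first-seen order.
        groups = defaultdict(list)
        for lfn in input_files:
            groups[dirname(lfn)].append(lfn)
        return list(groups.values())
    raise ValueError(f"unknown split strategy: {strategy}")


# Made-up paths, for illustration only.
files = ["/alice/run1/a.root", "/alice/run1/b.root", "/alice/run2/c.root"]
by_file = split_job(files, "file")
by_dir = split_job(files, "directory")
```

With these three files, splitting by file gives three subjobs and splitting by directory gives two.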

7 Job Broker
Multiple instances
–And let the single database deal with concurrent processes
Classad matching between site and waiting jobs ordered by priority
Requirements on:
–Data location, site/queue name, TTL, disk, memory, partition, user, (available software)...
Extract most common fields
–Reduces the number of matches to evaluate
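A minimal sketch of priority-ordered matching follows. It uses plain dictionary comparisons as a stand-in for ClassAd matching, and every field name (`memory_mb`, `ttl_s`, `packages`) and value is an assumption made for illustration rather than the broker's real schema.

```python
def matches(job, site):
    """Return True if the site satisfies every requirement of the job."""
    req = job["requirements"]
    return (
        req.get("memory_mb", 0) <= site["memory_mb"]
        and req.get("ttl_s", 0) <= site["ttl_s"]
        and req.get("site", site["name"]) == site["name"]
        and set(req.get("packages", [])) <= set(site["packages"])
    )


def pick_job(waiting_jobs, site):
    """Return the highest-priority waiting job that matches the site, or None."""
    for job in sorted(waiting_jobs, key=lambda j: j["priority"], reverse=True):
        if matches(job, site):
            return job
    return None


# Hypothetical site description and waiting jobs.
site = {"name": "SiteA", "memory_mb": 2000, "ttl_s": 86400, "packages": ["pkgA"]}
waiting = [
    {"id": 1, "priority": 10, "requirements": {"memory_mb": 4000}},
    {"id": 2, "priority": 5, "requirements": {"memory_mb": 1000, "packages": ["pkgA"]}},
    {"id": 3, "priority": 1, "requirements": {}},
]
chosen = pick_job(waiting, site)
```

Job 1 has the highest priority but asks for more memory than the site offers, so the site receives job 2.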

8 AliEn CE
Deployed on the vobox of each site
Checks the number of JobAgents running/queued
Asks Broker for things to do
–If match, send agents to the batch system: CREAMCE, LSF, PBS, CONDOR...
Can install software packages
–And the JobAgent can as well
Can submit to several CREAMCE
Possible improvements:
–Bulk submission
–Debug

9 AliEn Job Agent
Pilot running on the worker node
Asks for jobs to execute
Prepares software packages, input files
Executes and monitors payload
Uploads results
And asks for another job
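The pull model listed on this slide can be sketched as a loop. `FakeBroker` is an in-memory stand-in for the central services, and the job dictionaries with a callable `payload` are assumptions made so the example runs on its own. Package installation and payload monitoring are reduced to comments.

```python
from collections import deque


class FakeBroker:
    """In-memory stand-in for the central services (hypothetical)."""

    def __init__(self, jobs):
        self.queue = deque(jobs)
        self.results = {}

    def get_job(self):
        # Hand out the next waiting job, or None when nothing is left.
        return self.queue.popleft() if self.queue else None

    def upload(self, job_id, result):
        self.results[job_id] = result


def run_agent(broker, max_jobs=10):
    """Pull jobs until the broker has nothing left or max_jobs is reached."""
    done = 0
    while done < max_jobs:
        job = broker.get_job()            # ask for a job to execute
        if job is None:
            break                         # no job for this agent: exit
        inputs = list(job["inputs"])      # placeholder for preparing packages and inputs
        result = job["payload"](inputs)   # execute the payload (monitoring omitted)
        broker.upload(job["id"], result)  # upload results
        done += 1                         # then ask for another job
    return done


broker = FakeBroker([
    {"id": "j1", "inputs": [1, 2, 3], "payload": sum},
    {"id": "j2", "inputs": [4, 5], "payload": max},
])
executed = run_agent(broker)
```

The design point is that the worker node initiates every exchange, so the central services never need to contact the agent.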

10 Torrent installation
Already deployed in multiple sites:
–CERN, RAL, CCIN2P3, Aalborg, UIB, UiO, KIAE, SUT...
No need for shared file system
Installing AliEn & all software components (300 MB)
Aria torrent client
Clean up after the job execution
Challenge
–Deploy independent trackers per site?

11 Torrent technology
[Figure: diagram with alitorrent.cern.ch, Site A and Site B]

12 JobAgent monitoring
Send heartbeat:
Monitor TTL, space usage and memory consumption
–If job misbehaves, stop it and report it
[Figure: state diagram with states RUNNING, ZOMBIE and EXPIRED; edge labels "4 hours" and "If heartbeat returns"]
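One possible reading of the RUNNING, ZOMBIE, EXPIRED diagram is sketched below as a transition function. The slide only shows the labels "4 hours" and "If heartbeat returns", so the assignment of "4 hours" to the ZOMBIE to EXPIRED edge and the one-hour silence threshold are both assumptions, not values taken from AliEn.

```python
ZOMBIE_AFTER_S = 1 * 3600  # assumed silence before RUNNING -> ZOMBIE (not on the slide)
EXPIRE_AFTER_S = 4 * 3600  # assumed: extra silence before ZOMBIE -> EXPIRED


def next_state(state, silent_s):
    """Return the new job state given the seconds since the last heartbeat."""
    if state == "RUNNING" and silent_s >= ZOMBIE_AFTER_S:
        return "ZOMBIE"
    if state == "ZOMBIE":
        if silent_s < ZOMBIE_AFTER_S:
            return "RUNNING"  # the heartbeat returned
        if silent_s >= ZOMBIE_AFTER_S + EXPIRE_AFTER_S:
            return "EXPIRED"
    return state  # EXPIRED is terminal; otherwise nothing changes


# A job goes silent, is flagged, then recovers when a heartbeat arrives.
history = [next_state("RUNNING", 2 * 3600), next_state("ZOMBIE", 10)]
```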

13 Memory check (Jeff Porter, LBNL)
[Figure: two flowcharts of the AliEn JobAgent and its forked child. Parent labels: fork child, check child (alive / dead), monitor job: rsz, vsz, disk space (Ok / Not Ok), kills job, report & repeat, finish. Child labels: user job run as a system command, monitors job, job runs ok, exceeds allocation, kills job, register output, report & finish.]
In the child JobAgent,
$err=system(user-command);
will become
$err=system(ulimit -S -v FASTKILL_MEMORY -c 0; user-command);

14 Writing the output
[Figure: write sequence between Client, Authen, File Catalogue and SERank Optimizer. Request: "I’m in ‘Madrid’. Give me SEs!" Reply: "Try: CCIN2P3, CNAF, Kosice".]
Similar process for read (limited to SE having the file)
Can select number of SE, QoS, particular user, avoid SE...
DEFAULT ARGUMENTS SHOULD BE USED WHENEVER POSSIBLE

15 Challenges
Remote data access
–Access data over WAN?
Multicore jobagent
–One agent per core (overkill) or
–One agent per machine (needs development)
Interactive jobs
–PoD
File level brokering
–Change JDL depending on who picks the job

16 File level brokering
[Figure: File 1 to File 5 replicated across Site A, Site B and Site C]
Current schema. Submit 4 jobs:
Job 1: files 1,4, in Site A or B
Job 2: file 2, in Site B or C
Job 3: file 3, in Site A or C
Job 4: file 5, in Site A, B or C
File level brokering. Submit 3 jobs:
Job 1: for Site A
Job 2: for Site B
Job 3: for Site C
Job analyzes all available files on site not processed yet
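The two schemas can be compared with a small sketch built on the replica layout implied by the job list on this slide. `current_schema` produces one job per distinct set of sites, and `file_level` produces one job per site that claims local files nobody has processed yet. Claiming one file per turn in round-robin order is a simplifying assumption, as are the function names.

```python
from collections import defaultdict

# Replica locations implied by the slide's job list.
replicas = {
    "file1": {"A", "B"},
    "file2": {"B", "C"},
    "file3": {"A", "C"},
    "file4": {"A", "B"},
    "file5": {"A", "B", "C"},
}


def current_schema(replicas):
    """One job per distinct set of sites that hold the same files."""
    jobs = defaultdict(list)
    for name, sites in sorted(replicas.items()):
        jobs[tuple(sorted(sites))].append(name)
    return dict(jobs)


def file_level(replicas, sites):
    """One job per site; each turn a job claims one unprocessed local file."""
    processed = set()
    assignment = {site: [] for site in sites}
    progress = True
    while progress:
        progress = False
        for site in sites:
            local = sorted(f for f, where in replicas.items()
                           if site in where and f not in processed)
            if local:
                assignment[site].append(local[0])
                processed.add(local[0])
                progress = True
    return assignment


by_site_set = current_schema(replicas)
by_site = file_level(replicas, ["A", "B", "C"])
```

In the file-level variant the split of files among sites is decided at run time by whichever job asks first, which is why the JDL would have to change depending on who picks the job.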

17 Summary
AliEn Job Agents
–Pull model
–Can install everything (even AliEn) with bittorrent
–Sanity checks
–Monitors payload
 And kills it if needed
–Automatic SE discovery for read/write

