1 Using Stork
Barcelona, 2006
Condor Project
Computer Sciences Department
University of Wisconsin-Madison
condor-admin@cs.wisc.edu
http://www.cs.wisc.edu/condor
2 Meet Friedrich* Friedrich is a scientist with a BIG problem. *Frieda’s twin brother
3 I have a lot of data to process.
4 Friedrich's Problem…
Friedrich has many large data sets to process. For each data set:
1. stage the data in from a remote server
2. run a job to process the data
3. stage the data out to a remote server
5 The Classic Data Transfer Job
#!/bin/sh
globus-url-copy source dest
Scripts often work fine for short, simple data transfers, but…
6 Many Things Can Go Wrong!
These errors are more likely with large data sets:
The network is down.
The data server is unavailable.
The transferred data is corrupt.
The workflow does not know that the data was bad.
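Without Stork, Friedrich has to hand-roll this robustness himself. A minimal sketch of what that might look like (purely illustrative; it assumes globus-url-copy exits non-zero on failure, and the retry count and delay are arbitrary):

#!/bin/sh
# Hypothetical hand-rolled retry loop; Stork provides this behavior for free.
SRC=gsiftp://server/path
DEST=file:///dir/file
tries=0
until globus-url-copy "$SRC" "$DEST"; do   # assumes a non-zero exit code on failure
    tries=$((tries + 1))
    if [ "$tries" -ge 5 ]; then
        echo "transfer failed after $tries attempts" >&2
        exit 1
    fi
    sleep 60   # wait before retrying
done

Even then, the script cannot detect corrupt data or report the failure back to a larger workflow.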
7 Stork Solves These Problems
Stork introduces the concept of the data placement job
Data placement jobs are managed and scheduled the same as any other Condor job
Friedrich's jobs benefit from built-in fault tolerance
8 Supported Data Transfer Protocols
local file system
GridFTP
FTP
HTTP
SRB
NeST
SRM
…and it is extensible to other protocols (see the sketch below)
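The protocol is chosen through the URL schemes in the submit file, as the later examples show. A minimal sketch assuming an FTP source (only the gsiftp://, http://, and file:// schemes actually appear in these slides; other scheme names should be checked against the Stork documentation):

// Illustrative only: transfer from an FTP server to the local file system
[
    dap_type = "transfer";
    src_url  = "ftp://ftp.example.org/pub/data1";
    dest_url = "file:///home/friedrich/data1";
    log      = "data1.log";
]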
9 Fault Tolerance
Retries failed jobs
Can also retry a failed data transfer job using an alternate protocol: for example, first try GridFTP, then try FTP (see the sketch below)
Retries stuck jobs
Configurable fault responses
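A rough sketch of how the GridFTP-then-FTP fallback might be expressed in a submit file. The alt_protocols field name and its value format are assumptions here, not confirmed syntax; see the Stork section of the Condor Manual for the actual keyword:

// Hypothetical sketch of protocol fallback (field name and syntax assumed)
[
    dap_type = "transfer";
    src_url  = "gsiftp://server/path";   // try GridFTP first
    dest_url = "file:///dir/file";
    alt_protocols = "ftp-file";          // assumed syntax: fall back to FTP -> local file
    log      = "stage-in.log";
]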
10 Getting Stork
Stork is part of Condor, so get Condor…
Available as a free download from http://www.cs.wisc.edu/condor
Currently available for Linux platforms
11 Personal Condor Works Well with Stork
This is Condor/Stork on your own workstation: no root access required, no system administrator intervention needed
After installation, Friedrich submits his jobs to his Personal Stork…
12 Friedrich's Personal Condor
(Diagram) Friedrich's workstation runs the Condor daemons (Master, Central Manager, SchedD, StartD) plus the Stork server. The DAG sends data placement jobs to Stork, which talks to external data servers, and CPU jobs to the SchedD, which runs them on N compute elements.
13 Stork will...
Keep an eye on data placement jobs, and keep you posted on their progress
Throttle the maximum number of jobs running
Keep a log of job activities
Add fault tolerance to all jobs
Detect and retry failed data placement jobs
14 The Submit Description File
Just like the rest of Condor, a plain ASCII text file, but with a different format
Written in the new ClassAd language
Neither Stork nor Condor cares about file name extensions
The contents of the file tell Stork about the job: data placement type, source/destination location and protocol, proxy location, and alternate protocols to try
15 Simple Submit File
// c++ style comment lines
// file name is stage-in.stork
[
    dap_type = "transfer";
    src_url  = "http://server/path";
    dest_url = "file:///dir/file";
    log      = "stage-in.log";
]
Note: different format from Condor submit files
16 Another Simple Submit File
// c++ style comment lines
// file name is stage-in.stork
[
    dap_type = "transfer";
    src_url  = "gsiftp://server/path";
    dest_url = "file:///dir/file";
    x509proxy = "default";
    log      = "stage-in.log";
]
Note: different format from Condor submit files
17 Running stork_submit
Give stork_submit the name of the submit file:
% stork_submit stage-in.stork
stork_submit parses the submit file, checks it for errors, and sends the job to the Stork server.
stork_submit returns the created job id (a job handle).
18 Sample stork_submit
% stork_submit stage-in.stork
================
Sending request:
[
    dest_url = "file:///dir/file";
    src_url  = "http://server/path";
    dap_type = "transfer";
    log      = "path/stage-in.log";
]
================
Request assigned id: 1   (the job id)
19 The Job Queue
stork_submit sends the job to the Stork server
The Stork server manages the local job queue
View the queue with stork_q or stork_status
20 Job Status
stork_q queries all active jobs:
% stork_q
stork_status queries the given job id, which may be active or complete:
% stork_status 12
21 Removing Jobs
To remove a data placement job from the queue, use stork_rm
You may only remove jobs that you own (Unix root may remove anyone's jobs)
Give a specific job id to remove a single job:
% stork_rm 21
22 Use Log Files
// c++ style comment lines
[
    dap_type = "transfer";
    src_url  = "gsiftp://server/path";
    dest_url = "file:///dir/file";
    x509proxy = "default";
    log      = "stage-in.log";
]
23 Sample Stork User Log
000 (001.-01.-01) 04/17 19:30:00 Job submitted from host: ...
001 (001.-01.-01) 04/17 19:30:01 Job executing on host: ...
008 (001.-01.-01) 04/17 19:30:01 job type: transfer ...
008 (001.-01.-01) 04/17 19:30:01 src_url: gsiftp://server/path ...
008 (001.-01.-01) 04/17 19:30:01 dest_url: file:///dir/file ...
005 (001.-01.-01) 04/17 19:30:02 Job terminated.
    (1) Normal termination (return value 0)
    Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
    Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
    Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage
    Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
    0 - Run Bytes Sent By Job
    0 - Run Bytes Received By Job
    0 - Total Bytes Sent By Job
    0 - Total Bytes Received By Job
...
24 Stork and DAGMan Data placement jobs are integrated with Condor’s DAGMan, and Friedrich benefits
25 Defining Friedrich's DAG
(Diagram) Three stages: stage the data in → crunch the data → stage the data out
26 Friedrich's DAG
(Diagram) Nodes input1 and input2 both feed into crunch, which feeds into result
27 The DAG Input File
# file name is friedrich.dag
DATA input1 input1.stork
DATA input2 input2.stork
JOB crunch process.submit
DATA result result.stork
PARENT input1 input2 CHILD crunch
PARENT crunch CHILD result
28 One of the Stork Submit Files
// file name is input1.stork
[
    dap_type = "transfer";
    src_url  = "http://north.cs.wisc.edu/~friedrich/data1";
    dest_url = "file:///home/friedrich/in1";
    log      = "in1.log";
]
29 Condor Submit Description File
# file name is process.submit
universe   = vanilla
executable = process
input      = in1
output     = crunch.result
error      = crunch.err
log        = crunch.log
queue
30 Stork Submit File
// file name is result.stork
[
    dap_type = "transfer";
    src_url  = "file:///home/friedrich/crunch.result";
    dest_url = "http://north.cs.wisc.edu/~friedrich/final.results";
    log      = "result.log";
]
31 Friedrich Submits the DAG
While Friedrich's current working directory is /home/friedrich:
% condor_submit_dag friedrich.dag
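While the DAG runs, Friedrich can check on it from the same directory. A sketch of one possible session (condor_q is standard Condor usage rather than something shown in these slides, and the DAGMan log file name is an assumption based on Condor's usual naming):

% condor_q                         # the DAGMan job and any running CPU jobs
% stork_q                          # active data placement jobs
% tail friedrich.dag.dagman.out    # DAGMan's own log (file name assumed)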
32 In Review
With Stork, Friedrich can now submit his data processing jobs and go home!
Stork manages the data transfers, including fault detection and retry
Condor DAGMan manages the dependencies
33 Additional Resources
http://www.cs.wisc.edu/condor/stork/
Condor Manual, Stork section
stork-announce@cs.wisc.edu list
stork-discuss@cs.wisc.edu list
34 Additional Slides
35 Important Parameters
STORK_MAX_NUM_JOBS limits the number of active jobs
STORK_MAX_RETRY limits job attempts before a job is marked as failed
STORK_MAXDELAY_INMINUTES specifies the "hung job" threshold
(see the configuration sketch below)
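A minimal configuration sketch using these parameters. The parameter names come from this slide; the values, and the assumption that they live in the Condor configuration file, are illustrative only:

# Illustrative values only
STORK_MAX_NUM_JOBS       = 10   # at most 10 active data placement jobs
STORK_MAX_RETRY          = 5    # mark a job as failed after 5 attempts
STORK_MAXDELAY_INMINUTES = 10   # treat a job as hung after 10 minutes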
36 Current Restrictions
Currently best suited for "Personal Stork" mode
Local file paths must be valid on the Stork server, including the submit directory
To share data, successive jobs in a DAG must use a shared file system
37 Future Work
Enhance multi-user fair share
Enhance support for DAGs without a shared file system
Enhance scheduling with configurable job requirements and rank
Add DAP (data placement) job matchmaking
Additional platform ports