Published by Emilia Góra. Modified over 5 years ago.
1
Recent Developments in the ICAT Job Portal: a generic job submission system built on a scientific data catalog
NOBUGS, October 2016
Brian Ritchie, Rebecca Fair, Steve Fisher, Kevin Phipps, Tom Griffin, Dan Rolfe, Jianguo Rao
STFC Rutherford Appleton Laboratory
The ICAT Job Portal, or IJP, is a generic job submission system built on the ICAT scientific data catalog. In this presentation, I'm going to give an overview of the IJP and cover some recent developments. Initial design and development of the IJP was carried out by Steve Fisher and Kevin Phipps, with more recent work by Rebecca Fair and myself. As our main (ahem, only) customers, Dan Rolfe and Jianguo Rao of Octopus in the Central Laser Facility have set the requirements for much of the design.
2
Overview
Search for datasets / datafiles using ICAT
Configure and submit jobs to process selected datasets / datafiles on one or more batch servers
  Submit single job for all selected datasets/datafiles, or separate jobs for each
  Jobs use IDS to retrieve data (or ICAT for metadata)
Monitor progress of jobs and inspect output
The basic aims of the IJP are fairly straightforward: to allow users to search for datasets and datafiles using the ICAT catalog, then configure and submit jobs to process that data on one or more batch servers. Depending on the job configuration, users can submit a single job for multiple datasets or datafiles, or submit separate jobs for each one. The jobs can use the ICAT Data Service (IDS) to retrieve datasets or datafiles, or query ICAT for their metadata. The IJP allows users to monitor the progress of their jobs and inspect their output (including while they are running).
3
Architecture
[Diagram. Labels: User's desktop with GUI (browser) and CLI; REST; IJP Server (IJP web app); ICAT & IDS (search, retrieve); Batch System 1 … Batch System N, each with an IJP batch connector and a batch server; Worker Node 1 … Worker Node n; "submit batch job"; "batch server magic".]
This diagram shows a rough outline of the logical architecture of the IJP. The central IJP server runs as a web application (currently within Glassfish). The primary interface is via the TopCAT GUI that runs in the user's web browser. The IJP extends TopCAT via a plugin developed by Rebecca Fair. This communicates with the IJP server via a REST interface. There is also a command-line client that uses the REST interface. The interface supports job submission and monitoring. The IJP server can be connected to one or more "batch connectors" (via another RESTful interface). Each batch connector manages a particular batch processing system (e.g. Torque, Platform LSF, Unix batch) and handles job submission and monitoring requests for that particular system. How the batch system handles jobs, e.g. farming them out to individual worker nodes, is up to it (for now, at least).
Logical vs. physical: the IJP server and batch connector are usually on the same machine; ICAT/IDS tend to be on a separate machine. The demo system has it all in one box!
4
Finding data
Use TopCAT to find data in ICAT
Configure job for single dataset or datafile
or build a cart with multiple datasets/datafiles
Configure job for cart
Here is an outline of how the IJP GUI can be used to find data. You won't be able to see the fine details on these screenshots, but I just want to concentrate on the overall shape. The top screenshot shows the Browse tab; it has the usual TopCAT tools for filtering and for ordering results, but there's an extra option to filter results by job type (more on that later). Also note that each row has a green Configure Job button to run a job on just that one dataset. To run jobs on multiple datasets or datafiles, the user can add them to a cart; the cart view has an extra (green) button to configure a job on the entire cart.
5
Job Types
Part of the IJP configuration
Each Job Type specifies:
  Program (job script) to run
  Dataset types for which the job can be run
  If job is batch or interactive
  If job accepts datasets, datafiles or both
  If single job can take multiple datasets/datafiles
  Other job parameters / options
GUI filters job types depending on selected data
  or filter data by selecting job type first
A key part of the IJP configuration is the specification of the job types that it can run. Job types are defined in XML. Each job type specifies… … whether a single job instance can … … other parameters and options that will be added to the command line for the job … The GUI filters job types depending on the dataset type of the selected data.
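The slides do not show the job type XML itself, so the following is only a sketch of the idea. Every element name (`jobType`, `executable`, `datasetType`, `multiple` and so on), the example job, and the helper functions are assumptions made for illustration, not the real IJP schema or API. It shows a job type being read from XML and job types being filtered by the dataset type of the selected data.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical job type definition: element names are illustrative only,
# not the real IJP schema.
JOB_TYPE_XML = """
<jobType>
  <name>track-particles</name>
  <executable>track_particles.py</executable>
  <type>batch</type>
  <datasetType>raw-images</datasetType>
  <datasetType>registered-images</datasetType>
  <acceptsDatasets>true</acceptsDatasets>
  <acceptsDatafiles>false</acceptsDatafiles>
  <multiple>true</multiple>
</jobType>
"""


@dataclass
class JobType:
    name: str
    executable: str          # program (job script) to run
    interactive: bool        # batch or interactive
    dataset_types: list      # dataset types the job can be run for
    accepts_datasets: bool
    accepts_datafiles: bool
    multiple: bool           # one job may take several datasets/datafiles


def parse_job_type(xml_text):
    """Build a JobType from the hypothetical XML layout above."""
    root = ET.fromstring(xml_text)

    def flag(tag):
        return root.findtext(tag, default="false").strip().lower() == "true"

    return JobType(
        name=root.findtext("name").strip(),
        executable=root.findtext("executable").strip(),
        interactive=root.findtext("type", default="batch").strip() == "interactive",
        dataset_types=[e.text.strip() for e in root.findall("datasetType")],
        accepts_datasets=flag("acceptsDatasets"),
        accepts_datafiles=flag("acceptsDatafiles"),
        multiple=flag("multiple"),
    )


def job_types_for(dataset_type, job_types):
    """Return the job types that can run on data of the given dataset type."""
    return [jt for jt in job_types if dataset_type in jt.dataset_types]
```

Filtering in the other direction (choosing a job type first, then narrowing the data) would use the same `dataset_types` list as the search constraint.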
6
Job configuration
Job options
Submit options
Again, you probably can't see the details, but this just shows a dialog to configure a job; the contents of the dialog are determined by the options specified in the job type. When multiple datasets/datafiles are selected, submit options depend on whether the job type allows multiple datasets/datafiles per job instance. If it does, then the user can still choose to submit one job on the lot, or submit a separate job for each item in the cart. If the job type doesn't allow multiple items per job, then Submit will always submit multiple jobs, one per item (mouseover makes this doubly clear!)
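The submit rule described above can be sketched as a small function. The function name and arguments are hypothetical and only model the decision; they are not taken from the IJP code.

```python
def plan_submissions(item_ids, allows_multiple, one_job_for_all):
    """Return the jobs to submit, each given as a list of dataset/datafile IDs.

    allows_multiple: whether the job type accepts several items per job.
    one_job_for_all: the user's choice in the dialog; it only has an
    effect when the job type allows multiple items.
    """
    if not item_ids:
        return []
    if allows_multiple and one_job_for_all:
        # A single job instance processes the whole cart.
        return [list(item_ids)]
    # Otherwise one job per selected item.
    return [[item_id] for item_id in item_ids]
```

With a cart of three items, a job type that allows multiple items yields either one job or three depending on the user's choice; a job type that does not always yields three.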
7
Job submission
IJP server gets estimates from each batch connector
  Chooses one of the best
Batch connector submits job to its batch system
  Jobscript executable defined in job type
  Job is passed dataset/file IDs, ICAT/IDS session tokens and job options
Batch connector monitors submitted jobs
  Queue status, standard/error output
IJP server monitors batch connectors
IJP server holds job status and output
  Until user deletes the job
Session tokens are required so that jobscripts can resolve the IDs. … and the GUI monitors the IJP server (see next slide). Deleting a job removes it from the IJP's memory, but not any provenance records that the job creates in ICAT.
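As a rough illustration of "gets estimates, chooses one of the best", the sketch below assumes that each connector returns a single number where lower is better, and that ties are broken at random. The slides specify neither the form of an estimate nor the tie-breaking rule, so both are assumptions, as are the function and connector names.

```python
import random


def choose_connector(estimates, rng=random):
    """Pick a batch connector from a mapping of connector name -> estimate.

    Assumption: a lower estimate is better (e.g. expected wait).
    When several connectors share the best estimate, one is picked at random.
    """
    if not estimates:
        raise ValueError("no batch connectors available")
    best = min(estimates.values())
    candidates = sorted(name for name, est in estimates.items() if est == best)
    return rng.choice(candidates)
```

For example, `choose_connector({"torque": 5, "lsf": 2})` selects `"lsf"`, while two connectors with equal estimates are equally likely to be chosen.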
8
Monitoring batch jobs
Job history, status, management
Output of running job
Here are a couple of screenshots showing how users can monitor jobs from the IJP GUI. The IJP plugin extends TopCAT with a My Jobs tab that lists all of the user's jobs. The second-last column shows the job status (Queued, Executing, Completed etc.). The last column contains buttons to Cancel a job (if it is queued or running) or Delete it (after it has completed). Clicking on a job displays its standard and/or error output; if the job is still running this will be updated as it changes.
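The button logic in the last column can be sketched as follows. Only the three statuses named in the notes are modeled; the "etc." covers statuses not listed here, and how the real GUI treats them is not stated, so this function (a hypothetical name) returns no actions for them.

```python
def available_actions(status):
    """Return the management buttons offered for a job in the given status."""
    if status in ("Queued", "Executing"):
        # A job that is waiting or running can be cancelled.
        return ["Cancel"]
    if status == "Completed":
        # A finished job can be deleted from the IJP's job list.
        return ["Delete"]
    # Statuses not described in the notes: no assumption is made here.
    return []
```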
9
Interactive jobs
Batch connector selects a worker node
  Node is removed from pool of available workers
  Sets up RDP session to run interactive executable
  RDP connection details passed back to IJP server
GUI launches Remote Desktop (Windows) or gives pasteable command line (Linux)
Batch connector releases worker once session is closed
  tries hard not to leave dangling interactive sessions
As well as batch jobs, the IJP supports interactive jobs: jobs that launch an interactive GUI. When the user asks to run an interactive job, the chosen batch connector… The Linux version is a little clunky; we need to think about that… It's possible to close Remote Desktop without exiting the RDP session, so the batch connector will kill idle sessions.
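The worker bookkeeping for interactive jobs might look something like the sketch below. The class and method names are hypothetical, and the RDP setup itself is left out; only the "remove from pool, release when the session ends" behavior is modeled.

```python
class WorkerPool:
    """Tracks which worker nodes are free for interactive sessions."""

    def __init__(self, nodes):
        self._available = list(nodes)
        self._sessions = {}  # node name -> user holding the session

    def start_interactive(self, user):
        """Reserve a node for the user and remove it from the pool."""
        if not self._available:
            raise RuntimeError("no worker nodes available")
        node = self._available.pop(0)
        self._sessions[node] = user
        return node

    def release(self, node):
        """Return a node to the pool once its session has closed.

        Safe to call more than once, so a sweep that kills idle sessions
        can release a node without checking whether it was already freed.
        """
        if self._sessions.pop(node, None) is not None:
            self._available.append(node)

    def available(self):
        return list(self._available)
```

Making `release` idempotent is one simple way to support the "tries hard not to leave dangling sessions" goal: both the normal session-closed path and an idle-session sweep can call it freely.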
10
Jobscripts
Executable that runs on batch system workers
Receives dataset/datafile IDs, options, session tokens on command line
Uses IDS to retrieve datasets/datafiles (or ICAT for metadata)
Should add provenance records to ICAT
Does not communicate with the IJP
Provenance: e.g. this job was run with these datasets and parameters, and created these new datasets.
11
Developing jobs
Create jobscript
  Python utility library for argument processing
  Python-icat or similar to work with ICAT / IDS
Deploy jobscript on batch system
Define jobtype XML
Add to IJP server configuration (dynamic)
The utility library mainly just extends the Python argument processor to declare the IJP argument options. The Torque batch processor uses Puppet to manage the worker nodes, so add jobscripts to the Puppet config.
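A utility of this kind could be as small as a subclass of the standard `argparse.ArgumentParser` that pre-declares the arguments every jobscript receives. The sketch below is not the IJP library: the class name and all option names (`--session-id`, `--dataset-ids`, `--datafile-ids`, `--threshold`) are made up for illustration, and the real option names may differ.

```python
import argparse


def _id_list(text):
    """Parse a comma-separated list of numeric IDs, e.g. "11,12"."""
    return [int(part) for part in text.split(",") if part]


class JobArgumentParser(argparse.ArgumentParser):
    """Hypothetical helper that declares the standard jobscript arguments."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Session token so the jobscript can resolve IDs via ICAT / IDS.
        self.add_argument("--session-id", required=True)
        self.add_argument("--dataset-ids", type=_id_list, default=[])
        self.add_argument("--datafile-ids", type=_id_list, default=[])


def parse_example(argv):
    """A jobscript adds its own job options on top of the standard ones."""
    parser = JobArgumentParser(description="Example jobscript")
    parser.add_argument("--threshold", type=float, default=0.5)
    return parser.parse_args(argv)
```

A jobscript would then call something like `parse_example(sys.argv[1:])` and pass the session token and IDs on to python-icat or a similar client.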
12
Recent developments
New GUI
  TopCAT plugin (AngularJS)
  RESTful interface to IJP server
  Original GWT GUI still part of server, but won't be developed further
OK, that should be Recent Development (singular). See Future Development for what else we'd hoped to have done by now. The GUI work was done by Rebecca Fair, initially as a variant of TopCAT, then as a plugin.
13
Current status
One active customer, Octopus (CLF)
  Test system in place (ingest, jobscript development)
  Not yet in production
Batch connectors
  Torque
  Unix batch (for demos/tests)
  Platform LSF (incomplete)
Perhaps should say more about Octopus, but may not have time! CLF is the Central Laser Facility. The ingest process does not use the IJP. Hopefully in production next year.
14
Future development
Improve batch system brokering
  Add batch requirements to job types (e.g. requires GPU)
Support versioning of datasets
  Specific requirement from Octopus: post-ingestion modifications
  IJP GUI should only show latest version of each dataset (custom results filtering)
  Version management separate from IJP, but may be developed as jobs
Integration with / adoption of DaaS architecture?
(Is DaaS specific to STFC?)
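The "only show latest version" filter is future work, and the slides do not say how versions would be recorded. Purely as an illustration, the sketch below assumes each dataset record carries a `name` and a comparable `version` field (both hypothetical) and keeps the highest version per name.

```python
def latest_versions(datasets):
    """Keep only the highest-version record for each dataset name.

    Assumes each record is a dict with "name" and "version" keys;
    this layout is illustrative, not the ICAT schema.
    """
    latest = {}
    for ds in datasets:
        current = latest.get(ds["name"])
        if current is None or ds["version"] > current["version"]:
            latest[ds["name"]] = ds
    # Names appear in the order they were first seen.
    return list(latest.values())
```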