Published by Gyles Harvey. Modified over 9 years ago.
1
FBSNG Overview
Jim Fromm
Farms and Clustered Systems Group, Computing Division, Fermilab
November 3, 2000
2
November 3, 2000, http://www-isd.fnal.gov/fbsng
People
FCS Group:
  Jim Fromm (Fermilab)
  Tanya Levshina (Fermilab)
  Igor Mandrichenko (Fermilab)
  Krzysztof Genser (Fermilab)
Former FCS Group members:
  Mark Breitung
  Marilyn Schweitzer
FBSNG users/testers who provided significant feedback:
  Antonio Wong Chan (Academia Sinica, Taiwan, CDF)
  Yen-Chu Chen (Academia Sinica, Taiwan, CDF)
  Miroslav Siket (Academia Sinica, Taiwan, CDF)
  Heidi Schellman (Northwestern University, D0)
  Steve Wolbers (Fermilab, CDF)
  Ping Yeh (Academia Sinica, Taiwan, CDF)
  Thomas Las (Minooka Junior High, Minooka, IL)
3
History and Status
FBS project goals:
  Replace CPS batch in the PC farm environment
  Develop a farm batch system for the file-based parallel data processing style adopted for Run II
  Do not preclude event-based parallelism
Milestones:
  Spring 1998 - initial FBS design, first working prototypes
  Fall 1998 - first production users (E871)
  Spring 1999 - review by CDF, D0
  Fall 1999 - FBS v2.2
  Fall 1999 - beginning of FBSNG project
  July 2000 - FBSNG v1.0 released
  Oct 2000 - FBSNG v1.1 released
FBSNG is currently installed on:
  Fixed target farm
  CDF, D0 farms
  NIKHEF (D0 collaborators)
4
FBSNG Immediate Project Goals
Stop using LSF as the scheduler and job storage, as was done in previous versions of FBS:
  Reduce maintenance and support costs
  Avoid possible scalability problems
  Simplify maintenance
  Allow for the addition of new features (such as resource management)
The decision was made to release FBSNG as soon as a core set of features was implemented.
The first requirement was to “Not break anything!”
  Preserve as many features of FBS as reasonably possible
Add a few fundamental features, such as:
  Abstract resources
  Customizable scheduler
5
Long Term Goals
Dynamic re-configuration (implemented in v1.1)
Further development of resource management (resource pools implemented in v1.0)
Integration with FIPC, the Farms Interprocess Communications toolkit developed at Fermilab
Make FBSNG an “open” system, accomplished through the API
6
FBSNG Redesign
Whenever possible, features were carried forward from FBS to FBSNG.
The look and feel of FBSNG is very much like FBS, but FBS and FBSNG are not compatible.
Feedback from users is that converting to FBSNG was relatively easy.
7
FBSNG Design (Big Picture)
BMGR functions:
  Scheduling
  Resource management
  Job storage
  Communication with API clients
8
FBSNG Concepts: Farm Model
9
FBSNG Concepts: Job Consists of Sections
10
FBSNG Resources
FBSNG allows for several new ideas in resource management.
Global resources:
  Visible to the entire farm
  Examples: disk space on an NFS server, network bandwidth
Local resources:
  Visible on individual nodes
  Examples: CPU, local disk
Attributes:
  Attributes are local to a particular node.
  Attributes are simply present; they are not “used up”.
  Examples: special software installed (Fortran compiler), version of OS
FBSNG assumes that users know how a job will use resources and that they will give this information to FBSNG.
To FBSNG, resources are just counters; it knows nothing about what they represent.
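The counter view of resources on this slide can be illustrated with a minimal Python sketch. It is not FBSNG code: the `Node` and `Farm` classes, the `try_allocate` method, and the resource names are all hypothetical, chosen only to show global counters, per-node counters, and attributes that are checked but never consumed.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical worker node: countable local resources plus attributes."""
    name: str
    local: dict = field(default_factory=dict)      # e.g. {"cpu": 2, "disk": 20}
    attributes: set = field(default_factory=set)   # e.g. {"Linux", "Fortran"}


@dataclass
class Farm:
    """Hypothetical farm: farm-wide (global) counters and a list of nodes."""
    global_free: dict
    nodes: list

    def try_allocate(self, local_req, global_req, attrs):
        """Return the first node that satisfies the request, or None.

        Counters are decremented only when the whole request fits.
        Attributes are only tested for presence; they are never decremented.
        """
        if any(self.global_free.get(r, 0) < n for r, n in global_req.items()):
            return None
        for node in self.nodes:
            if not set(attrs) <= node.attributes:
                continue
            if any(node.local.get(r, 0) < n for r, n in local_req.items()):
                continue
            for r, n in local_req.items():
                node.local[r] -= n
            for r, n in global_req.items():
                self.global_free[r] -= n
            return node
        return None


farm = Farm(
    global_free={"nfs_disk": 100},
    nodes=[
        Node("node1", {"cpu": 2, "disk": 10}, {"Linux"}),
        Node("node2", {"cpu": 2, "disk": 40}, {"Linux", "Fortran"}),
    ],
)
# Ask for 1 CPU and 20 units of local disk, 30 units of NFS disk, on a Linux node.
chosen = farm.try_allocate({"cpu": 1, "disk": 20}, {"nfs_disk": 30}, {"Linux"})
print(chosen.name if chosen else "no node fits")
```

In this sketch the system never interprets what "disk" or "nfs_disk" means; it only compares and decrements numbers, which is the point the slide makes.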
11
Resource Pools
Pools are collections of similar resources.
The actual resources in a resource pool are referred to as underlying resources.
Examples:
  Multiple scratch disks on a given host. Users could specify 2 GB of scratch disk without caring which specific disk has 2 GB free.
  A user could have a job that needs to run on any version of Linux. A resource pool named Linux could be created with underlying resources (attributes) of Linux52 and Linux61.
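The scratch-disk example can be sketched in a few lines of Python. This is an illustration under assumptions, not FBSNG's implementation: the function name `allocate_from_pool`, the first-fit choice, and the disk names are hypothetical.

```python
def allocate_from_pool(pool, amount):
    """Take `amount` from the first underlying resource with enough free.

    `pool` maps underlying resource names to free amounts. Returns the name
    of the underlying resource used, or None if no single one can satisfy
    the request. First-fit is an assumption made for this sketch.
    """
    for name, free in pool.items():
        if free >= amount:
            pool[name] = free - amount
            return name
    return None


# A pool of two scratch disks with 1 GB and 3 GB free (hypothetical names).
scratch = {"scratch1": 1, "scratch2": 3}
used = allocate_from_pool(scratch, 2)   # the user asks only for "2 GB of scratch"
print(used, scratch)
```

The user names the pool and an amount; which underlying disk is charged is decided by the allocator.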
12
Resource Management
Process type per task or project
Soft association between queue and process type (user can override)
User can request additional resources
Queue is more of a scheduling entity than a resource management entity
More flexibility for users
Only per-process-type resource quotas
13
User Interface
Job submission:
  Users issue the submit command and provide the name of a Job Description File (JDF).
  The JDF contains control information needed by FBSNG, such as:
    Section name
    Executable name
    Queue
    Number of processes to spawn
Job control:
  Monitor status
  Kill/cancel
  Hold/release
History
Farm node status
Resource utilization statistics
Scheduler status
14
FBSNG: Example of Job Description File (JDF)

  JDF line                              Annotation
  SECTION Stage                         First section: pre-stage data
  QUEUE=IO_Q                            Queue to submit to
  EXEC=stage.sh VSN123 /stage01         Command to execute
  NUMPROC=1                             1 process

  SECTION Reconstructor                 Second section: reconstruction
  QUEUE=Long_Q
  EXEC=reco123.sh /stage01 /stage02
  NUMPROC=10                            10 processes
  DEPEND=done(Stage)                    Only if pre-staging succeeds
  PROC_RESOURCES=disk:5 Linux Blue      5 (GB) of local disk, Linux, Blue node

  SECTION Clean-up                      Emergency clean-up section
  QUEUE=Short_Q
  PROC_TYPE = Light                     Override default process type
  EXEC=clean.sh /stage02 VSN123
  NUMPROC=1
  PRIO_INC = 5                          Run at higher priority
  DEPEND=failed(Reconstructor)          Run only if reconstructor fails
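To make the section structure of the example concrete, here is a small Python sketch that reads the JDF text into a dictionary of sections. It is not the FBSNG parser: `parse_jdf` is a hypothetical helper, and the one-keyword-per-line layout is an assumption, since the slide does not specify the exact file syntax.

```python
def parse_jdf(text):
    """Parse JDF-like text into {section_name: {keyword: value}}.

    Assumes one "SECTION <name>" or "KEY=VALUE" entry per line, which is a
    guess at the layout rather than the documented JDF grammar.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("SECTION "):
            current = {}
            sections[line.split(None, 1)[1]] = current
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
        else:
            raise ValueError(f"unexpected line: {raw!r}")
    return sections


JDF_TEXT = """
SECTION Stage
QUEUE=IO_Q
EXEC=stage.sh VSN123 /stage01
NUMPROC=1

SECTION Reconstructor
QUEUE=Long_Q
EXEC=reco123.sh /stage01 /stage02
NUMPROC=10
DEPEND=done(Stage)
PROC_RESOURCES=disk:5 Linux Blue

SECTION Clean-up
QUEUE=Short_Q
PROC_TYPE = Light
EXEC=clean.sh /stage02 VSN123
NUMPROC=1
PRIO_INC = 5
DEPEND=failed(Reconstructor)
"""

sections = parse_jdf(JDF_TEXT)
for name, keys in sections.items():
    print(name, keys.get("QUEUE"), keys.get("DEPEND", "(no dependency)"))
```

The parsed result makes the dependency chain visible: Reconstructor waits on Stage finishing, and Clean-up runs only if Reconstructor fails.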
15
Batch Process Environment
Environment variables:
  FBS_JOB_ID
  FBS_SECTION_NAME - name of the section
  FBS_JOB_SIZE - number of processes in the section
  FBS_SCRATCH - scratch disk area for user processes
  FBS_PROC_NO - logical process ID (1…FBS_JOB_SIZE)
  FBS_HOSTS - list of nodes assigned to this job
  FBS_PROC_STDOUT - path of the process's stdout
  FBS_PROC_STDERR - path of the process's stderr
  HOME - home directory
  Others…
Current working directory is HOME
Stderr and stdout are as specified in the JDF
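As an illustration of how a user process might use these variables for file-based parallelism, the Python sketch below splits a list of input files among the processes of a section using FBS_PROC_NO and FBS_JOB_SIZE. The round-robin split, the helper names, and the file names are assumptions made for this example; FBSNG itself only supplies the variables.

```python
import os


def my_share(items, proc_no, job_size):
    """Return the items for logical process `proc_no` (1-based), round-robin."""
    return items[proc_no - 1::job_size]


def files_for_this_process(environ=os.environ):
    """Read the FBSNG-provided variables and pick this process's files."""
    proc_no = int(environ["FBS_PROC_NO"])     # 1…FBS_JOB_SIZE
    job_size = int(environ["FBS_JOB_SIZE"])   # processes in the section
    # Hypothetical input list; a real job would list its staged data files.
    files = [f"run{i:03d}.dat" for i in range(1, 8)]
    return my_share(files, proc_no, job_size)


# Simulate process 2 of a 3-process section without a real batch environment.
print(files_for_this_process({"FBS_PROC_NO": "2", "FBS_JOB_SIZE": "3"}))
```

Because every process sees the same FBS_JOB_SIZE and a distinct FBS_PROC_NO, the shares are disjoint and together cover the whole list without any communication between processes.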
16
Scheduler
Algorithms are based on the idea of dynamic priorities
Controllable fair-share scheduling:
  Projects are assigned relative shares of farm resources
Guaranteed scheduling:
  Avoids starvation
  No infinite delays for big jobs
  Will hold small jobs if necessary
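The slide does not give the scheduling formulas, so the following Python sketch is only one possible reading of these ideas. The priority formula, the `aging` parameter, and both function names are assumptions: priority grows with waiting time and shrinks as a project uses more than its share, and only the top-priority job is considered, so small jobs are held rather than allowed to starve a big one.

```python
def dynamic_priority(job, usage, shares, aging=1.0):
    """Higher is better: waiting raises priority, over-using a share lowers it.

    This formula is a toy stand-in, not the FBSNG algorithm.
    """
    project = job["project"]
    return aging * job["waited"] - usage[project] / shares[project]


def pick_next(queue, free_cpus, usage, shares):
    """Return the job to start now, or None if the top job must wait.

    Only the highest-priority job is considered; smaller jobs behind it are
    held so that a big job is not delayed indefinitely.
    """
    if not queue:
        return None
    top = max(queue, key=lambda j: dynamic_priority(j, usage, shares))
    return top if top["cpus"] <= free_cpus else None


queue = [
    {"name": "big", "project": "A", "cpus": 8, "waited": 10},
    {"name": "small", "project": "B", "cpus": 1, "waited": 1},
]
usage = {"A": 4, "B": 4}    # resources consumed so far per project
shares = {"A": 2, "B": 1}   # relative shares assigned to projects

print(pick_next(queue, 4, usage, shares))   # big job does not fit yet; small job is held
print(pick_next(queue, 8, usage, shares))   # enough CPUs are free; big job starts
```

Holding the small job costs some utilization in the short term, which is the trade-off implied by "will hold small jobs if necessary".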
17
API
FBSNG provides a Python API that allows:
  Job submission
  Job monitoring and control
  Resource management and monitoring
UI and GUI are layered on top of the API
18
FBSNG Requirements
On the control node:
  bmgr daemon
  logd daemon (optional)
On each worker node:
  Launcher (root)
  Rstatd (optional)
Software/hardware requirements:
  Python (most of the FBSNG sources are Python)
  Tcl/Tk, Tkinter (for GUI)
  FCSLIB package, available from Fermitools. FCSLIB contains some Python modules used by FBSNG.
  Configuration files synchronized on all worker nodes (NFS works well)
19
FBSNG Project Status and Plans for Future
FBSNG V1.0 (IRIX, Linux) released July 2000.
FBSNG V1.1 (IRIX, Linux) released October 2000.
Available in Fermitools.
V1.1 is in production.
Feedback on V1.0 and V1.1 thus far has been positive.
See http://www-isd.fnal.gov/fbsng for project details.