The OxGrid Resource Broker
David Wallom
Overview
  OxGrid
  Resource Broking
  Why build our own
  Job Submission and other tools
  Future developments
OxGrid, a University Campus Grid
  Single entry point for users to shared and dedicated resources
  Seamless access to NGS and OSC for registered users
Resource Broking
  The original idea of the grid relied on efficient resource broking to abstract the user away from the resources
  This has been significantly neglected by grid software developers
    –Push or pull type of mechanism; each has significant advantages and disadvantages
    –Resources that have multiple job sources increase complexity manyfold
Why build our own?
  OxGrid is intended to be a lightweight development
  Replacement of individual components should be simple
    –Use of service-based interfaces is the goal
  Current solutions do not allow this, because of massive dependencies and non-trivial maintenance requirements
  Condor-G is a simple, off-the-shelf Grid metascheduler; why make it so much more complicated?
Condor Matchmaking
  Matchmaking is a methodology for Distributed Resource Management
  Conceptually simple:
    –Service providers and requesters advertise
    –Compatible advertisements are matched
    –Matched entities cooperate to perform service
  Developed for opportunistic environments
    –Use resources as and when available
  Thanks to Miron and the Condor Team
Condor Matchmaking (Cont.)
  Customers and Servers advertise to a Matchmaking Service
  Advertisements describe advertising entities
    –Characteristics
    –Requirements and Constraints
    –Preferences
  These descriptions are called classified advertisements (classads)
  Thanks to Miron and the Condor Team
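The matching step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Condor's implementation: advertisements are plain dictionaries, requirements and preferences are ordinary functions, and the names `matches`, `best_match`, `cluster-a` and so on are hypothetical.

```python
from typing import Any, Dict, List, Optional

Ad = Dict[str, Any]  # an advertisement: characteristics plus requirement/rank functions


def matches(a: Ad, b: Ad) -> bool:
    """Two ads match only if each side's requirements accept the other."""
    return a["requirements"](a, b) and b["requirements"](b, a)


def best_match(job: Ad, machines: List[Ad]) -> Optional[Ad]:
    """Return the compatible machine that the job ranks highest, or None."""
    candidates = [m for m in machines if matches(job, m)]
    if not candidates:
        return None
    return max(candidates, key=lambda m: job["rank"](job, m))


def accepts_more_jobs(me: Ad, job: Ad) -> bool:
    # Resource-side constraint: refuse new work once 20 jobs are matched.
    return me["cur_matches"] < 20


# Hypothetical resource advertisements.
machines = [
    {"name": "cluster-a", "opsys": "LINUX", "memory": 501,
     "cur_matches": 3, "requirements": accepts_more_jobs},
    {"name": "cluster-b", "opsys": "LINUX", "memory": 2048,
     "cur_matches": 20, "requirements": accepts_more_jobs},
    {"name": "cluster-c", "opsys": "SOLARIS", "memory": 4096,
     "cur_matches": 0, "requirements": accepts_more_jobs},
]

# Hypothetical job advertisement: needs Linux, prefers more memory.
job = {
    "owner": "alice",
    "requirements": lambda me, m: m["opsys"] == "LINUX",
    "rank": lambda me, m: m["memory"],
}

chosen = best_match(job, machines)
print(chosen["name"] if chosen else "no match")
```

Here `cluster-b` is rejected by its own requirements (it is full) and `cluster-c` by the job's, which shows why both sides have to agree before a match is made.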
Static and Dynamic Information
  Static information
    –e.g. processor architecture, physical memory, operating system, scheduling system, no. of nodes
  Dynamic information
    –e.g. system availability, scheduler load, queue length, used disk or memory
OxGrid Virtual Organisation Manager Database
  Final repository for authorisation information
  Stores additional static information for each resource, such as capability and maximum number of submitted jobs for that node
Data Harvesting cycle
  Information sources can be added or removed at will
  A source is either a single repository for information aggregation (e.g. ngsinfo) or an individual machine
  A simple internal representation of information makes it easy to add new types of information source
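One way to picture the harvesting cycle is a registry of pluggable sources whose reports are merged into a single dictionary per resource. The sketch below is an assumption about how such a loop could look, not the OxGrid code: `Harvester`, `aggregate_source`, `single_machine_source` and the attribute names are all hypothetical.

```python
from typing import Callable, Dict

# Simple internal representation: resource name -> attribute dictionary.
ResourceInfo = Dict[str, Dict[str, object]]
Source = Callable[[], ResourceInfo]


class Harvester:
    """Collects resource information from a changeable set of sources."""

    def __init__(self) -> None:
        self._sources: Dict[str, Source] = {}

    def add_source(self, name: str, source: Source) -> None:
        self._sources[name] = source

    def remove_source(self, name: str) -> None:
        self._sources.pop(name, None)

    def harvest(self) -> ResourceInfo:
        """Run one cycle: query every source and merge the results."""
        merged: ResourceInfo = {}
        for source in self._sources.values():
            try:
                report = source()
            except Exception:
                # One unreachable source should not abort the whole cycle.
                continue
            for resource, attrs in report.items():
                merged.setdefault(resource, {}).update(attrs)
        return merged


def aggregate_source() -> ResourceInfo:
    # Stands in for a repository that reports on several resources at once.
    return {"cluster-a": {"queue_length": 4}, "cluster-b": {"queue_length": 0}}


def single_machine_source() -> ResourceInfo:
    # Stands in for a probe of one individual machine.
    return {"cluster-a": {"free_memory_mb": 501}}


harvester = Harvester()
harvester.add_source("aggregate", aggregate_source)
harvester.add_source("cluster-a-probe", single_machine_source)
snapshot = harvester.harvest()
print(snapshot)
```

Because every source returns the same dictionary shape, adding a new kind of source only means writing one more function and registering it.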
Generated classad
  MyType = "Machine"
  TargetType = "Job"
  Name = "bedrock.oucs.ox.ac.uk-condor"
  gatekeeper_url = "bedrock.oucs.ox.ac.uk/jobmanager-condor"
  Requirements = (CurMatches < 20) && (TARGET.JobUniverse == 9)
  WantAdRevaluate = True
  UpdateSequenceNumber =
  CurMatches = 0
  OpSys = "LINUX"
  Arch = "INTEL"
  Memory = 501
  MPI = False
  INTEL_COMPILER = True
  GCC3 = True
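A classad like the one above can be produced from harvested information with a small formatter. The following is a minimal sketch under stated assumptions: the helper names `Expr`, `format_value` and `render_classad` are hypothetical, and the quoting rule (strings in double quotes, with embedded quotes backslash-escaped) is a simplification rather than a full ClassAd serialiser.

```python
from typing import Dict


class Expr(str):
    """Marks a value as a raw ClassAd expression, emitted without quotes."""


def format_value(value: object) -> str:
    if isinstance(value, Expr):
        return str(value)
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "True" if value else "False"
    if isinstance(value, (int, float)):
        return str(value)
    # Everything else is treated as a string literal (assumed escaping rule).
    return '"' + str(value).replace('"', '\\"') + '"'


def render_classad(attrs: Dict[str, object]) -> str:
    """Render one attribute per line in 'Name = value' form."""
    return "\n".join(f"{name} = {format_value(value)}" for name, value in attrs.items())


# Values taken from the slide; only a subset of the attributes is shown.
ad = render_classad({
    "MyType": "Machine",
    "TargetType": "Job",
    "Name": "bedrock.oucs.ox.ac.uk-condor",
    "gatekeeper_url": "bedrock.oucs.ox.ac.uk/jobmanager-condor",
    "Requirements": Expr("(CurMatches < 20) && (TARGET.JobUniverse == 9)"),
    "CurMatches": 0,
    "OpSys": "LINUX",
    "Memory": 501,
    "MPI": False,
})
print(ad)
```

Keeping expressions in a separate wrapper type avoids accidentally quoting `Requirements`, which would turn the constraint into a plain string.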
Tuning Condor to act as a metascheduler
  The default configuration of Condor is as a cycle scavenger
  Alter this by ensuring that the Negotiator attempts to match every available task on each pass
  Since we are a Condor-G-only system, we change the default universe of the system to grid
Changes to Condor configuration
  DEFAULT_UNIVERSE = GLOBUS
  CLASSAD_LIFETIME = 900
  NEGOTIATE_ALL_JOBS_IN_CLUSTER = True
  NEGOTIATOR_INTERVAL = 30
  JOB_START_DELAY = 10
  GRIDMANAGER_JOB_PROBE_INTERVAL = 60
Job Submission
  Most users are comfortable with command-line applications
    –Condor submission scripts would be another language for our users to learn…
    –Submission step as a scriptable application with arguments
  Created job-submission
job-submission
  -h / -e
  -t Boolean transfer exe?
  -a EXE arguments
  -i Input files to be transferred
  -o Output files to be transferred
Job classad
  executable = update_file
  Transfer_Executable = True
  globusscheduler = $$(gatekeeper_url)
  Requirements = (TARGET.gatekeeper_url == "t2ce02.physics.ox.ac.uk/jobmanager-lcgpbs" || TARGET.gatekeeper_url == "condor.oucs.ox.ac.uk/jobmanager-condor" || TARGET.gatekeeper_url == "grid-compute.oesc.ox.ac.uk/jobmanager-pbsox" || TARGET.gatekeeper_url == "bedrock.oucs.ox.ac.uk/jobmanager-sge") && TARGET.gatekeeper_url =!= UNDEFINED && TARGET.OpSys == "LINUX"
  match_list_length = 1
  arguments = TEST_3_2.in TEST_3_2.out
  transfer_input_files = TEST_3_2.in
  transfer_output_files = TEST_3_2.out
  WhenToTransferOutput = ON_EXIT
  universe = grid
  grid_type = gt2
  notification = ERROR
  output = temp out
  error = temp err
  log = temp log
  queue
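The job description above follows a regular pattern, so a wrapper can build it from a handful of arguments. The sketch below shows one possible way to do that; it is not the actual job-submission tool. The function names `build_requirements` and `build_submit` are hypothetical, and only a subset of the attributes from the slide is generated.

```python
from typing import Sequence


def build_requirements(gatekeepers: Sequence[str], opsys: str = "LINUX") -> str:
    """Restrict the job to the listed gatekeepers and one operating system."""
    if not gatekeepers:
        raise ValueError("at least one gatekeeper URL is required")
    allowed = " || ".join(f'TARGET.gatekeeper_url == "{url}"' for url in gatekeepers)
    return (f"({allowed}) && TARGET.gatekeeper_url =!= UNDEFINED"
            f' && TARGET.OpSys == "{opsys}"')


def build_submit(executable: str, arguments: Sequence[str],
                 inputs: Sequence[str], outputs: Sequence[str],
                 gatekeepers: Sequence[str], transfer_exe: bool = True) -> str:
    """Assemble a submit description in the style shown on the slide."""
    lines = [
        f"executable = {executable}",
        f"Transfer_Executable = {transfer_exe}",
        "globusscheduler = $$(gatekeeper_url)",
        f"Requirements = {build_requirements(gatekeepers)}",
        f"arguments = {' '.join(arguments)}",
        f"transfer_input_files = {','.join(inputs)}",
        f"transfer_output_files = {','.join(outputs)}",
        "WhenToTransferOutput = ON_EXIT",
        "universe = grid",
        "grid_type = gt2",
        "queue",
    ]
    return "\n".join(lines)


submit_text = build_submit(
    executable="update_file",
    arguments=["TEST_3_2.in", "TEST_3_2.out"],
    inputs=["TEST_3_2.in"],
    outputs=["TEST_3_2.out"],
    gatekeepers=["condor.oucs.ox.ac.uk/jobmanager-condor",
                 "bedrock.oucs.ox.ac.uk/jobmanager-sge"],
)
print(submit_text)
```

The user supplies only the executable, its arguments and the files to move; the list of permitted gatekeepers would come from the broker's own records of what that user may access.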
Additional User Tools
  oxgrid_certificate_import
    –Simplifies the installation of a user digital certificate to a single command
  oxgrid_q
    –Displays the user's current queue at the resource broker, with an option to show the full task queue
  oxgrid_status
    –Displays the resources that are available to the user, with an option to show all resources currently registered with the resource broker
  oxgrid_cleanup
    –Removes either a single submitted process or a range of child processes together with their master
oxgrid_status
Users
  Statistics
  Materials science
  Inorganic chemistry
  Theoretical chemistry
  Biochemistry
  Computational biology
  Astrophysics
  Condensed matter physics
  Zoology
  Researchers and students
OxGrid, Users
  Orbitals and Electron Charge Distribution in Boron Nitride Nanostructures
    Dr Amanda Barnard (Materials Science)
  Simulation of the quantum dynamics of correlated electrons in a laser field. OxGrid made serious computational power easily available and was crucial for making the simulating algorithm work.
    Dr Dmitrii Shalashilin (Theoretical Chemistry)
  Molecular evolution of a large antigen gene family in African trypanosomes. OeRC/OxGrid has been key to my research and has allowed me to complete within a few weeks calculations which would have taken months to run on my desktop.
    Dr Jay Taylor (Statistics)
Future Developments
  As part of GridBS project development:
    –Additional direct submission into MS CCS using
        GridSAM
        BLAH
    –Addition of new types of data sources
        EGEE
        Grimoires
  Continue to improve packaging to ensure ease of installation and re-distribution
Conclusion
  We have designed a resource broker that is orders of magnitude smaller than current solutions, with minimal external dependencies
  Simple tools have given users of OxGrid easy access to resources in many different institutions
  Over 65k individual tasks have been submitted to connected resources since January