The OxGrid Resource Broker
David Wallom

Overview
OxGrid
Resource broking
Why build our own?
Job submission and other tools
Future developments

OxGrid, a University Campus Grid
Single entry point for users to shared and dedicated resources
Seamless access to the National Grid Service (NGS) and the Oxford Supercomputing Centre (OSC) for registered users

Resource Broking
The original idea of the grid relied on efficient resource broking to abstract the user away from the resources
This has been significantly neglected by grid software developers
– Push or pull mechanisms: each has significant advantages and disadvantages
– Resources that have multiple job sources increase the complexity many-fold

Why build our own?
OxGrid is intended to be a lightweight development
Replacing individual components should be simple
– Service-based interfaces are the goal
Current solutions do not allow this: they have massive dependencies and non-trivial maintenance requirements
Condor-G is a simple, off-the-shelf Grid meta-scheduler, so why make it so much more complicated?

Condor Matchmaking
Matchmaking is a methodology for distributed resource management
Conceptually simple:
– Service providers and requesters advertise
– Compatible advertisements are matched
– Matched entities cooperate to perform the service
Developed for opportunistic environments
– Use resources as and when they are available
Thanks to Miron Livny and the Condor Team

Condor Matchmaking (Cont.)
Customers and servers advertise to a matchmaking service
Advertisements describe the advertising entities:
– Characteristics
– Requirements and constraints
– Preferences
These descriptions are called classified advertisements (classads)
Thanks to Miron Livny and the Condor Team
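The matchmaking idea on these two slides can be sketched in a few lines of Python. This is a toy model under our own assumptions. It is not the ClassAd language or Condor's negotiator. The hypothetical ClassAdLike type holds an advertisement's characteristics as a dictionary and its Requirements and Rank as Python callables. A match requires each side's Requirements to accept the other, and the job's Rank chooses among the compatible machines.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Ad = Dict[str, Any]
Predicate = Callable[[Ad, Ad], bool]   # (my attributes, target attributes) -> accept?
RankFn = Callable[[Ad, Ad], float]     # higher means more preferred


def no_preference(my: Ad, target: Ad) -> float:
    return 0.0


@dataclass
class ClassAdLike:
    """Toy stand-in for a classad: characteristics plus Requirements and Rank."""
    attrs: Ad
    requirements: Predicate
    rank: RankFn = no_preference


def matches(a: ClassAdLike, b: ClassAdLike) -> bool:
    # Matching is symmetric: each side's Requirements must accept the other side
    return a.requirements(a.attrs, b.attrs) and b.requirements(b.attrs, a.attrs)


def best_match(job: ClassAdLike, machines: List[ClassAdLike]) -> Optional[ClassAdLike]:
    compatible = [m for m in machines if matches(job, m)]
    if not compatible:
        return None
    # Among compatible machines, the job's Rank expresses its preference
    return max(compatible, key=lambda m: job.rank(job.attrs, m.attrs))


if __name__ == "__main__":
    small = ClassAdLike({"Name": "small", "OpSys": "LINUX", "Memory": 501}, lambda my, t: True)
    large = ClassAdLike({"Name": "large", "OpSys": "LINUX", "Memory": 2048}, lambda my, t: True)
    job = ClassAdLike({"Owner": "alice"},
                      lambda my, t: t.get("OpSys") == "LINUX",
                      rank=lambda my, t: t.get("Memory", 0))
    print(best_match(job, [small, large]).attrs["Name"])  # large
```

In Condor itself, Requirements and Rank are ClassAd expressions evaluated by the negotiator, and machine ads can also carry a Rank. The callables here only stand in for those expressions.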

Static and Dynamic Information
Static information
– e.g. processor architecture, physical memory, operating system, scheduling system, number of nodes
Dynamic information
– e.g. system availability, scheduler load, queue length, disk or memory in use

OxGrid Virtual Organisation Manager Database
Final repository for authorisation information
Stores additional static information for each resource, such as its capability and the maximum number of jobs that may be submitted to that node

Data Harvesting Cycle
Information sources can be added or removed at will
A source can be either a single repository that aggregates information (e.g. ngsinfo) or an individual machine
A simple internal representation of the information makes it easy to add new types of information source
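Here is a minimal sketch of such a harvesting cycle. It assumes each information source is simply a Python callable that returns per-resource attributes. The Harvester class, its source registry and to_classad are hypothetical names and do not come from the OxGrid code. A static dictionary stands in for the VOM database. Advertising only resources already registered there is our own assumption. Dynamic values are overlaid on the static ones, and each resource is then rendered as ClassAd-style text, similar to the generated classad on the next slide.

```python
from typing import Any, Callable, Dict

Attrs = Dict[str, Any]
Info = Dict[str, Attrs]          # resource name -> attributes
Source = Callable[[], Info]      # an information source returns per-resource attributes


class Harvester:
    """Hypothetical harvesting cycle with pluggable information sources."""

    def __init__(self, static_info: Info) -> None:
        # Static per-resource records standing in for the VOM database
        self.static_info = static_info
        self.sources: Dict[str, Source] = {}

    def add_source(self, name: str, source: Source) -> None:
        self.sources[name] = source

    def remove_source(self, name: str) -> None:
        self.sources.pop(name, None)

    def harvest(self) -> Info:
        # Start from a copy of the static information, then overlay dynamic values
        merged: Info = {res: dict(attrs) for res, attrs in self.static_info.items()}
        for source in self.sources.values():
            for res, attrs in source().items():
                if res in merged:  # assumption: unregistered resources are ignored
                    merged[res].update(attrs)
        return merged


def to_classad(name: str, attrs: Attrs) -> str:
    lines = ['MyType = "Machine"', 'TargetType = "Job"', f'Name = "{name}"']
    for key, value in sorted(attrs.items()):
        if isinstance(value, bool):          # check bool before int: bool subclasses int
            rendered = "True" if value else "False"
        elif isinstance(value, (int, float)):
            rendered = str(value)
        else:
            rendered = f'"{value}"'
        lines.append(f"{key} = {rendered}")
    return "\n".join(lines)
```

A source such as a wrapper around an aggregator like ngsinfo, or one callable per machine, is added with one add_source call and dropped with one remove_source call. How such a wrapper would actually query ngsinfo is not shown here.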

Generated classad
MyType = "Machine"
TargetType = "Job"
Name = "bedrock.oucs.ox.ac.uk-condor"
gatekeeper_url = "bedrock.oucs.ox.ac.uk/jobmanager-condor"
Requirements = (CurMatches < 20) && (TARGET.JobUniverse == 9)
WantAdRevaluate = True
UpdateSequenceNumber =
CurMatches = 0
OpSys = "LINUX"
Arch = "INTEL"
Memory = 501
MPI = False
INTEL_COMPILER = True
GCC3 = True
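The Requirements expression in this ad acts as a throttle. It accepts only grid-universe jobs (JobUniverse 9), and only while CurMatches is below 20. The toy simulation below rests on two assumptions: the negotiator increments CurMatches each time the ad is matched, and a fresh ad starts the count at zero. GridResourceAd and negotiate are hypothetical names, and the code does not describe how Condor actually implements negotiation.

```python
from dataclasses import dataclass
from typing import List

GRID_UNIVERSE = 9  # the JobUniverse value tested by the ad's Requirements


@dataclass
class GridResourceAd:
    name: str
    max_matches: int = 20   # mirrors the (CurMatches < 20) clause
    cur_matches: int = 0    # CurMatches; assumed to start at 0 for each fresh ad

    def requirements(self, job_universe: int) -> bool:
        return self.cur_matches < self.max_matches and job_universe == GRID_UNIVERSE


def negotiate(ad: GridResourceAd, job_universes: List[int]) -> List[int]:
    """Return the indices of the jobs matched to this ad in one pass."""
    matched = []
    for i, universe in enumerate(job_universes):
        if ad.requirements(universe):
            ad.cur_matches += 1  # assumed: each match increments CurMatches
            matched.append(i)
    return matched


if __name__ == "__main__":
    ad = GridResourceAd("bedrock.oucs.ox.ac.uk-condor")
    print(len(negotiate(ad, [GRID_UNIVERSE] * 25)))  # 20
```

Under these assumptions, the cap stops a single resource ad from absorbing every queued grid job in one pass. That risk exists because WantAdRevaluate = True keeps the ad available for repeated matching.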

Tuning Condor to act as a metascheduler
By default, Condor is configured as a cycle scavenger
We change this by making the Negotiator attempt to match all available tasks on each pass
Because we run only Condor-G, we change the system's default universe to grid

Changes to Condor configuration
DEFAULT_UNIVERSE = GLOBUS
CLASSAD_LIFETIME = 900
NEGOTIATE_ALL_JOBS_IN_CLUSTER = True
NEGOTIATOR_INTERVAL = 30
JOB_START_DELAY = 10
GRIDMANAGER_JOB_PROBE_INTERVAL = 60

Job Submission
Most users are comfortable with command-line applications
– Condor submission scripts would be yet another language for our users to learn…
– Instead, the submission step is provided as a scriptable application with arguments
We created job-submission

job-submission
-h / -e
-t  Boolean: transfer the executable?
-a  Executable arguments
-i  Input files to be transferred
-o  Output files to be transferred
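The sketch below shows, using Python's argparse, one way a front end like this could turn its options into a Condor-G submit description. It is a hypothetical re-creation and not the OxGrid tool. Three things are assumptions: -e is taken to mean the executable, -t is given a true/false form, and the fixed universe, grid_type and globusscheduler lines are copied from the job classad slide.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # argparse supplies -h automatically
    p = argparse.ArgumentParser(prog="job-submission")
    p.add_argument("-e", dest="executable", required=True,
                   help="executable to run (assumed meaning of -e)")
    p.add_argument("-t", dest="transfer", choices=["true", "false"], default="true",
                   help="transfer the executable to the resource?")
    p.add_argument("-a", dest="arguments", default="", help="arguments for the executable")
    p.add_argument("-i", dest="inputs", nargs="*", default=[], help="input files to transfer")
    p.add_argument("-o", dest="outputs", nargs="*", default=[], help="output files to transfer back")
    return p.parse_args(argv)


def submit_description(ns: argparse.Namespace) -> str:
    lines = [
        "universe = grid",
        "grid_type = gt2",
        "globusscheduler = $$(gatekeeper_url)",  # filled in from the matched machine ad
        f"executable = {ns.executable}",
        f"Transfer_Executable = {'True' if ns.transfer == 'true' else 'False'}",
    ]
    if ns.arguments:
        lines.append(f"arguments = {ns.arguments}")
    if ns.inputs:
        lines.append("transfer_input_files = " + ",".join(ns.inputs))
    if ns.outputs:
        lines.append("transfer_output_files = " + ",".join(ns.outputs))
        lines.append("WhenToTransferOutput = ON_EXIT")
    lines.append("queue")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = ["-e", "update_file", "-a", "TEST_3_2.in TEST_3_2.out",
            "-i", "TEST_3_2.in", "-o", "TEST_3_2.out"]
    print(submit_description(parse_args(demo)))
```

The user-facing interface is therefore a handful of flags. The broker-specific submit language stays hidden inside the tool.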

Job classad
executable = update_file
Transfer_Executable = True
globusscheduler = $$(gatekeeper_url)
Requirements = (TARGET.gatekeeper_url == "t2ce02.physics.ox.ac.uk/jobmanager-lcgpbs" || \
    TARGET.gatekeeper_url == "condor.oucs.ox.ac.uk/jobmanager-condor" || \
    TARGET.gatekeeper_url == "grid-compute.oesc.ox.ac.uk/jobmanager-pbsox" || \
    TARGET.gatekeeper_url == "bedrock.oucs.ox.ac.uk/jobmanager-sge") && \
    TARGET.gatekeeper_url =!= UNDEFINED && TARGET.OpSys == "LINUX"
match_list_length = 1
arguments = TEST_3_2.in TEST_3_2.out
transfer_input_files = TEST_3_2.in
transfer_output_files = TEST_3_2.out
WhenToTransferOutput = ON_EXIT
universe = grid
grid_type = gt2
notification = ERROR
output = temp.out
error = temp.err
log = temp.log
queue
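The Requirements clause in this job is a disjunction over the gatekeepers the job may run on, plus an OS check. The sketch below generates it. It assumes the list of authorised gatekeeper URLs comes from the VOM database, and gatekeeper_requirements is a hypothetical helper. The =!= operator is ClassAd's "is not identical to". Unlike !=, it yields a definite true or false even when gatekeeper_url is UNDEFINED.

```python
from typing import Iterable


def quote(value: str) -> str:
    # ClassAd string literals use double quotes; escape backslashes and quotes
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def gatekeeper_requirements(urls: Iterable[str], opsys: str = "LINUX") -> str:
    """Build a job Requirements expression restricted to the given gatekeepers."""
    url_list = list(urls)
    if not url_list:
        raise ValueError("no authorised gatekeepers for this user")
    alternatives = " || ".join(f"TARGET.gatekeeper_url == {quote(u)}" for u in url_list)
    return (f"({alternatives}) && TARGET.gatekeeper_url =!= UNDEFINED "
            f"&& TARGET.OpSys == {quote(opsys)}")


if __name__ == "__main__":
    print(gatekeeper_requirements([
        "condor.oucs.ox.ac.uk/jobmanager-condor",
        "bedrock.oucs.ox.ac.uk/jobmanager-sge",
    ]))
```

Generating the clause this way keeps each user's job within the resources the VOM database authorises, without the user writing any ClassAd syntax.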

Additional User Tools
oxgrid_certificate_import
– Reduces the installation of a user's digital certificate to a single command
oxgrid_q
– Displays the user's current queue at the resource broker, with options to show the full task queue
oxgrid_status
– Displays the resources available to the user, with options to show all resources currently registered with the resource broker
oxgrid_cleanup
– Removes either a single submitted process or a master process together with a range of its child processes

oxgrid_status

User Statistics
Materials science
Inorganic chemistry
Theoretical chemistry
Biochemistry
Computational biology
Astrophysics
Condensed matter physics
Zoology
Researchers and students

OxGrid Users
Orbitals and electron charge distribution in boron nitride nanostructures
– Dr Amanda Barnard (Materials Science)
Simulation of the quantum dynamics of correlated electrons in a laser field
– "OxGrid made serious computational power easily available and was crucial for making the simulating algorithm work." Dr Dmitrii Shalashilin (Theoretical Chemistry)
Molecular evolution of a large antigen gene family in African trypanosomes
– "OeRC/OxGrid has been key to my research and has allowed me to complete within a few weeks calculations which would have taken months to run on my desktop." Dr Jay Taylor (Statistics)

Future Developments
As part of the GridBS project development:
– Additional direct submission into Microsoft Compute Cluster Server (CCS) using GridSAM and BLAH
– Addition of new types of data source: EGEE, Grimoires
Continue to improve packaging to make installation and redistribution easy

Conclusion
We have designed a resource broker that is orders of magnitude smaller than current solutions and has minimal external dependencies
Simple tools have given OxGrid users easy access to resources across many different institutions
Over 65k individual tasks have been submitted to connected resources since January