Oxford Interdisciplinary e-Research Centre (IeRC)
OxGrid, A Campus Grid for the University of Oxford
Dr. David Wallom, Campus Grid Manager


Outline
What is a grid?
Why make a campus grid?
How are we building it?
–Central Systems
–Software
–Resources
–Users
How can the ICT/ECE help this activity?

What makes a Grid a Grid?
Single sign-on to multiple resources located in different administrative domains.
A Virtual Organisation of users that spans physical organisational boundaries.

The Problem
Many new research problems need massive computation and data access.
Research work is increasingly limited by the capacity of accessible resources.

The Solution
If the computational or data need is too large for any single existing resource, construct a system able to use a number of appropriate resources concurrently.
Designed so that:
–users sign on once to access multiple resources and switch between them seamlessly
–the layout can be dynamically altered without interrupting users
–once a job has started or data has been placed on a remote resource, its status is monitored to make sure it stays running/available

Why make a campus grid?
Many computers throughout the University are under-utilised:
–PCs already purchased are depreciating daily; their idle time and unused disk space are being wasted. e.g. OULS has up to 1200 desktop computers.
–Clusters are expensive to purchase, house and run (extra FTEs), and are rarely 100% utilised.
Users are forced to queue to find suitable resources for their research.

Why make a campus grid? (continued)
Develop and deploy Grid technology to use under-utilised resources:
–Higher utilisation: connect systems together so that, more often than not, a free resource is available, minimising queue time.
–Amplify system administrator effort.
–Substantially increase the research computing power available.
Ensure that applications reach a suitable resource as soon as possible, certainly quicker than on a single cluster.

OxGrid, a University Campus Grid
Single entry point for Oxford users to shared and dedicated resources.
Seamless access to the National Grid Service and OSC for registered users.
Single sign-on using PKI technology integrated with current methods.
[Diagram: Oxford users enter through the OxGrid central management (Resource Broker, MDS/VOM, Storage), which connects college and departmental resources with the NGS and OSC.]

Authorisation and Authentication
Initially use the standard UK e-Science Certification Authority:
–X.509 digital certificates issued on a per-user basis.
–OUCS is a Registration Authority for this CA.
For users who only wish to access internal (University) resources, a Kerberos CA has been installed, controlled by the Oxford central Kerberos system (Herald username).
Use an online credential repository to minimise user–certificate interaction.

Central System Components
Information Service
–Contains all the system status information on which the resource broker makes decisions, retrieved from all clients in the system.
Resource Broker
–User access and distribution of submitted tasks to appropriate resources.
Systems Monitoring
–Monitoring system for the helpdesk, the first point of contact in case of problems.
Virtual Organisation Management and Resource Usage Service
–Controls a virtual community whose members can use various resources.
–Creates accounting information so that whole-system as well as single-resource use can be recorded and hence possibly charged for.
Storage
–Creates a dynamic multi-homed virtual file system.
–User metadata mark-up for improved data mining.

Grid Middleware
Virtual Data Toolkit, chosen for:
–Stability and support structure.
–Platform-independent installation method.
–Wide use in other European production grid systems.
Contains Globus Toolkit™ version 2.4 with several enhancements:
–GSI-enhanced OpenSSH
–MyProxy client and server

Information Server
Globus Grid Resource Information Index: a central LDAP database for system information.
–System information: CPU, memory, etc.
–Scheduler queue status: number of running and queued tasks.
Further additions to the published data are easily managed.
Pull model for retrieving data from clients.

Resource Broker
Uses the Condor-G™ meta-scheduler:
–Can be considered a large batch-processing system.
–Condor-G allows a remote resource (cluster, PC pool) to be treated as a local resource.
–Command-line tools perform job management (submit, query, cancel, etc.) with detailed logging.
–Simple job submission language, translated into the remote scheduler's own language.
Custom script for determination of resource status and priority.
Integrates the Condor resource description mechanism with the Globus Monitoring and Discovery Service.
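A Condor-G submit description of the kind this translation step consumes might look like the sketch below. Hostnames, filenames and the GT2-era `globus` universe are illustrative assumptions, not OxGrid's actual configuration:

```text
# Hypothetical Condor-G submit file targeting a Globus 2.4 resource
universe        = globus
globusscheduler = cluster.example.ox.ac.uk/jobmanager-pbs
executable      = my_simulation
arguments       = input.dat
output          = job.out
error           = job.err
log             = job.log
queue
```

Condor-G translates this description into an RSL request for the remote Globus gatekeeper, which in turn hands the job to the local scheduler (here PBS).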

OxGrid-specific information added:
–Priority of a resource, dependent on current load measured against possible load.
–List of installed software on each node.
–Resource usage permissions (registered users of NGS, OSC).

Job to Resource Matching
For each resource accessible to the Resource Broker, a machine advertisement is created.
–Contains information such as CPU type, available memory, and any additional information such as load.
For each job submitted to the Resource Broker, a job advertisement is created.
–This holds the job requirements: CPU type, memory needed, etc.
A dedicated daemon within the system performs matchmaking between the job requirements and the resource properties.
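The matchmaking step above can be sketched in a few lines of Python. This is an illustrative toy, not Condor's actual ClassAd implementation; the advertisement fields (`cpu_type`, `memory_mb`, `load`, `min_memory_mb`) are invented for the example:

```python
# Toy ClassAd-style matchmaking: a job matches a machine when every job
# requirement is satisfied by the machine advertisement.

def matches(job_ad, machine_ad):
    """Return True if machine_ad satisfies every requirement in job_ad."""
    if job_ad.get("cpu_type") and job_ad["cpu_type"] != machine_ad["cpu_type"]:
        return False
    if job_ad.get("min_memory_mb", 0) > machine_ad["memory_mb"]:
        return False
    return True

def matchmake(job_ad, machine_ads):
    """Pick the least-loaded matching machine, or None if nothing matches."""
    candidates = [m for m in machine_ads if matches(job_ad, m)]
    return min(candidates, key=lambda m: m["load"]) if candidates else None

machines = [
    {"name": "condor-pool", "cpu_type": "x86", "memory_mb": 512,  "load": 0.9},
    {"name": "pbs-cluster", "cpu_type": "x86", "memory_mb": 2048, "load": 0.2},
]
job = {"cpu_type": "x86", "min_memory_mb": 1024}
print(matchmake(job, machines)["name"])  # pbs-cluster
```

The real matchmaker additionally ranks candidates by user-supplied expressions and honours the priority and permission data described on the previous slide.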

Resource Broker Operation (diagram)

Virtual Organisation Management
Globus uses a mapping between the Distinguished Name (DN), as defined in a digital certificate, and local usernames on each resource.
It is important that, for each resource a user expects to use, their DN is mapped locally.
We also have to make sure the correct resources are registered.
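On each resource this mapping lives in a grid-mapfile: one quoted DN per line, followed by the local account it maps to. The DNs and usernames below are invented for illustration:

```text
"/C=UK/O=eScience/OU=Oxford/L=OeSC/CN=jane smith" janesmith
"/C=UK/O=eScience/OU=Oxford/L=OeSC/CN=john doe"   oxgrid001
```

Keeping these files consistent across every resource is exactly the bookkeeping problem that the VO management service automates.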

Virtual Organisation Management and Accounting
OxVOM
–Custom in-house designed web-based user interface.
–Persistent information stored in a relational database.
–User DN list retrieved by remote resources using standard tools.
Resource Usage Service
–Installed software altered to include commands that determine job start and stop times and interface with the host scheduling system.
–Uses the Global Grid Forum Usage Record and Resource Usage Service standards.
–Information is returned from the client to the RUS server when a job completes, and stored in a persistent database.
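A usage record of the kind returned to the RUS server is a small XML document. The sketch below follows the GGF Usage Record format; the element values (job name, host, times) are invented:

```xml
<!-- Illustrative GGF Usage Record; all values are invented -->
<UsageRecord xmlns="http://schema.ogf.org/urf/2003/09/usagerecord">
  <JobName>my_simulation</JobName>
  <Status>completed</Status>
  <MachineName>cluster.example.ox.ac.uk</MachineName>
  <StartTime>2005-09-01T10:00:00Z</StartTime>
  <EndTime>2005-09-01T11:30:00Z</EndTime>
  <WallDuration>PT1H30M</WallDuration>
</UsageRecord>
```

Storing records in this common format is what lets whole-system and per-resource usage be aggregated, and later charged for, without per-scheduler parsing.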

Oxford Interdisciplinary e-Research Centre I e R C OxGrid VOM

Resource Usage Service
Enables presentation of system use to users as well as system owners.
Can form the basis of a charging model.

Systems Monitoring
'Ganglia' monitoring tool for system status and graphical representation.
Simple interface showing immediate hardware problems as well as system load.
Well understood by helpdesk and support staff.
Open source, with simple configuration.

Oxford Interdisciplinary e-Research Centre I e R C Ganglia System Monitoring

Core Resources
Individual departmental clusters (PBS, LSF, SGE)
–Grid software interfaces.
–Management of users.
–Owner-controlled access through local management software.
Condor clusters of PCs
–A single master running up to ~500 nodes.
–Condor masters run either by owners or by the IeRC.

External Resources
Only accessible to users who have registered with them.
–National Grid Service: peered access with the individual systems.
–OSC: gatekeeper system providing controlled grid submission to the Oxford Supercomputing Centre.
User management is done through standard account-issuing procedures and manual DN mapping.

Services Necessary to Connect to OxGrid
A system connecting to OxGrid must support a minimum software set, without which it is impossible to submit jobs from the Resource Broker:
–Globus 2.4 job management and a RUS-compatible jobmanager.
–MDS-compatible information server.
Desirable though not mandated:
–OxVOM-compatible grid-mapfile installation scripts.
With a scheduling system installed, the system administrator remains in control.

Connecting Clusters into OxGrid, 1
Direct connection: install the middleware etc. onto the system head nodes.
–Automated installation script; well-known procedure.
–Known port numbers for services, and a port range for data transfer.
–Addition of ~30 user pool accounts.
An example of this type of setup is the Oxford NGS node (contact Steven Young, OeSC).

Connecting Clusters into OxGrid, 2
Indirect: a separate gatekeeper system running the submission components of the local scheduler.
–Transfer queues on each gatekeeper decouple Globus from the local resources.
–Hides internals from the Grid users.
–Many clusters can be handled by one system jobmanager.
An example of this type of installation is the old OSC gatekeeper (contact Jon Lockley, OSC).

Connecting PCs, 1
Student labs, libraries and college terminal rooms.
Very different usage patterns for this type of resource:
–Systems inaccessible out of hours: greatest performance from dual boot using Windows/Scientific Linux.
–Where there are environmental and power considerations but 24-hour access: a coLinux virtual machine installation running in parallel with the native OS.
Both of these types of system use Condor and a Linux Condor master server.

Connecting PCs, 2
Install the Windows Condor client, which runs as a system service.
Configured either to hold jobs when a local user is active, or to run at all times at low priority.
–Studies by several groups have shown that, on modern systems, a student user sees no performance difference between the two.
–Downside: significant extra effort is needed for code recompiling and porting, and some code will not run because external libraries are unavailable.
'Services for Unix' is being investigated to run Linux jobs natively on Windows systems.
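The "hold when a local user is active" policy maps onto the Condor startd policy expressions. A sketch of such a configuration fragment (the thresholds are illustrative, not OxGrid's actual settings):

```text
# Run jobs only when the console has been idle for 5 minutes
# and local CPU load is low; suspend if a user returns.
START   = KeyboardIdle > 300 && LoadAvg < 0.3
SUSPEND = KeyboardIdle < 60
PREEMPT = $(SUSPEND)
```

The "run at all times" alternative collapses to `START = TRUE` with the job's process priority lowered, which is what the studies mentioned above compared against.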

Environmentally Aware Condor Systems
Increasingly, system owners shut down machines that are not being used, to save electricity.
Develop a scheme to still use these systems within OxGrid:
–Take advantage of Wake-on-LAN technology.
–Automate load balancing to start and stop worker nodes as necessary.
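Wake-on-LAN itself is simple: the server broadcasts a "magic packet" of 6 bytes of 0xFF followed by the target's MAC address repeated 16 times, and the powered-down NIC wakes the machine. A minimal Python sketch (the MAC address is illustrative):

```python
# Sketch of Wake-on-LAN: build and broadcast a magic packet over UDP.
import socket

def magic_packet(mac: str) -> bytes:
    """Build the 102-byte magic packet for a MAC like 'aa:bb:cc:dd:ee:ff'."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    return b"\xff" * 6 + mac_bytes * 16

def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet so the target NIC powers the machine on."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(magic_packet(mac), (broadcast, port))

print(len(magic_packet("aa:bb:cc:dd:ee:ff")))  # 102
```

A load-balancing daemon of the kind proposed above would call `wake()` when queued jobs exceed the running worker count, and let idle workers shut themselves down.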

Connecting Others
Sun: create Sun Grid Engine clusters and then use the direct connection method.
Mac: Apple has its own grid software, Xgrid.
–Not fully tested.
–Supported by Condor.

Data Management
Engagement of data-intensive as well as computationally intensive research groups.
Provide a remote store for those groups that cannot resource their own.
Distribute the client software as widely as possible, including to departments that are not currently engaged in e-Research.

Data Management (continued)
Software for creation of the system:
–Storage Resource Broker (SRB), to create a large virtual datastore.
–Through a central metadata catalogue, users interface with a single virtual file system, though the physical volumes may be on several network resources.
–Built-in metadata capability.

SRB Architecture
[Diagram: the user connects to the MCAT server, which consults the MCAT database and brokers access to Disk Server 1 and Disk Server 2.]

SRB as a Data Grid
[Diagram: multiple SRB servers sharing a single MCAT database.]
A data grid has an arbitrary number of servers; the complexity is hidden from users.

SRB Client Implementations
inQ: Windows GUI browser.
Jargon: Java SRB client classes (pure Java implementation).
mySRB: web-based GUI, run using a web browser.
Java Admin Tool: GUI for user and resource management.
Matrix: web service for SRB workflow.

How Users Interact with OxGrid
Log in to the system head node (the Resource Broker).
Create a digital credential.
Use the 'job-submission' script to create and submit jobs onto the Condor-G system.

Supporting OxGrid
The first point of contact is the OUCS Helpdesk, through the support e-mail.
–Preset list of questions to ask and log files to check, if available.
–Not expected to do any actual debugging.
–Problems are passed on to Grid experts, who pass hardware problems on a system-by-system basis to their own maintenance staff and answer grid software problems themselves.
Significant cluster support expertise exists within OeSC/IeRC.
As one of the UK e-Science Centres we also have access to the Grid Support Centre.

Users
Installed several example applications:
–Plasma physics
–Polymer physics
–Biochemistry (protein docking)
–Graphics rendering
We have our first Oxford user code example: Dr Peter Grout, Chemistry.
Contacting currently registered users of both the OSC and the NGS.
–It benefits these systems to move off 'serial' users who do not need to be there, freeing capability for those who must be there.
Data provision is an integral component of the grid: contacting Humanities and other large data users.

Collaboration
Configuring computational components to share resources between Harvard and Monash Universities as a proof of principle of global campus grids.
Configuring the storage system to allow safe, secure multi-site storage of data with Monash.

How the ICT Strategy & ECE Can Help
Produce a single uniform configuration of ~2000 systems.
Willingness at the design outset to include the capacity to use systems for computation, and hence include this as a key criterion in the final system choice.
Consider using a supported architecture that is popular with computationally active researchers.
Use underlying system-management software that is flexible enough to allow for usage changes of resources, e.g. Altiris.
Persuade people that efficient usage and sharing of resources is in everyone's best interests.

The Future
Improve the RB system usage algorithm.
Install service-based grid software on a test system to provide transition information.
Package the central server modules for public distribution.

The Future, 2
Develop Windows/Linux Condor pools so that all shared systems can be included.
Continue contacting users to expand the user base.
Design and construct user training courses.

Conclusions
Users are already able to log on to the Resource Broker and schedule work onto the NGS, OSC and OUCS Condor systems.
We are working as quickly as possible to engage more users.
We need these users to then go out and evangelise, to bring in both more users and more resources.

Contact
Telephone: