3-2.1 Topics Grid Computing Meta-schedulers –Condor-G –Gridway Distributed Resource Management Application (DRMAA) © 2010 B. Wilkinson/Clayton Ferner.


Topics: Grid computing meta-schedulers (Condor-G, Gridway); Distributed Resource Management Application API (DRMAA). © 2010 B. Wilkinson/Clayton Ferner. Spring 2010 Grid computing course. Modification date: Feb 15, 2010

Meta-schedulers: Schedule jobs across distributed sites. Highly desirable in a Grid computing environment. For a Globus installation, the meta-scheduler interfaces to the local Globus GRAM installation, which in turn interfaces with the local job scheduler. It uses whatever local scheduler is present at each site.

Fig 3-17 Meta-scheduler interfacing to Globus GRAM

Condor-G: A version of Condor that interfaces to a Globus environment. Jobs are submitted to Condor through the grid universe and directed to the Globus job manager (GRAM). Fig 3-18

Simple job description file

    universe = grid
    grid_resource = gt4 wsrf/services/ManagedJobFactoryService Fork
    executable = /usr/bin/uptime
    log = condor_test1.log
    output = condor_test1.out
    error = condor_test1.error
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT
    queue
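
A job description file like the one above is a list of key = value settings followed by the queue command. As a minimal sketch (not part of Condor-G), the Python below renders such a file from a dictionary. The function name render_submit_description is hypothetical, and the grid_resource value is copied from the slide as it appears there, without a service address.

```python
def render_submit_description(settings):
    """Render 'key = value' lines followed by the 'queue' command."""
    lines = [f"{key} = {value}" for key, value in settings.items()]
    lines.append("queue")
    return "\n".join(lines) + "\n"


# Settings taken from the slide; the grid_resource string is reproduced
# as shown there and would need a real service address before use.
job = {
    "universe": "grid",
    "grid_resource": "gt4 wsrf/services/ManagedJobFactoryService Fork",
    "executable": "/usr/bin/uptime",
    "log": "condor_test1.log",
    "output": "condor_test1.out",
    "error": "condor_test1.error",
    "should_transfer_files": "YES",
    "when_to_transfer_output": "ON_EXIT",
}

submit_text = render_submit_description(job)
print(submit_text)
```

The rendered text could be saved to a file and passed to condor_submit on a machine where Condor-G is installed.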

Communication between user, MyProxy server, and Condor-G for long-running jobs. Fig 3.19

Gridway: A meta-scheduler designed specifically for a Grid computing environment. Interfaces to Globus components. Project began in [year missing]. Now open source. Became part of the Globus distribution from version [number missing] onwards (June 2007).

Has the ability to match jobs to resources using both static information about the job and resources and dynamic information (resource load). Dynamic scheduling. Automatic job migration, including migration controlled by the job during execution. Checking for both fault tolerance and dynamic job migration. Reporting and accounting facilities. Basic job dependencies (workflow).

Can be installed on client machines, to interact with a distributed system, or on a server that multiple users access. Uses the file transfer, execution management, and information services of Globus.

Globus components used with Gridway Fig 3-20

Submitting a Gridway Job: Jobs are described in a Gridway job template (GWJT). Minimal sample:

    EXECUTABLE = /bin/ls

Sample that also names the output files:

    EXECUTABLE = /usr/bin/uptime
    STDOUT_FILE = stdout.${JOB_ID}
    STDERR_FILE = stderr.${JOB_ID}

If the above is saved as file myJob.jt, the command

    gwsubmit -t myJob.jt

would cause it to be submitted for execution on a Grid resource.

Job matching: Uses REQUIREMENTS and RANK expressions (similar to Condor). The REQUIREMENTS expression has to evaluate to TRUE for an execution host to be considered at all for the job. The RANK expression is computed for each host, and hosts with higher rank are used first for the job.

Example: RANK = CPU_MHZ
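
The two-stage matching described above (filter by REQUIREMENTS, then order by RANK) can be sketched in a few lines of Python. This is an illustration of the idea only, not Gridway's implementation: the function match_hosts, the host names, and the FREE_MEM_MB attribute are hypothetical, while CPU_MHZ is the rank attribute used in the example.

```python
def match_hosts(hosts, requirements, rank):
    """Keep hosts whose requirements hold, then order them best rank first."""
    eligible = [host for host in hosts if requirements(host)]
    return sorted(eligible, key=rank, reverse=True)


# Hypothetical host records; only CPU_MHZ comes from the slide.
hosts = [
    {"name": "hostA", "CPU_MHZ": 2400, "FREE_MEM_MB": 512},
    {"name": "hostB", "CPU_MHZ": 2000, "FREE_MEM_MB": 2048},
    {"name": "hostC", "CPU_MHZ": 3000, "FREE_MEM_MB": 4096},
]

ordered = match_hosts(
    hosts,
    requirements=lambda host: host["FREE_MEM_MB"] >= 1024,  # assumed requirement
    rank=lambda host: host["CPU_MHZ"],                      # RANK = CPU_MHZ
)
print([host["name"] for host in ordered])
```

Here hostA is excluded because it fails the requirement, even though its clock rate is higher than hostB's. A host that fails REQUIREMENTS is never ranked at all.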

Array Jobs: Specify multiple instances of a job, with different arguments.

    EXECUTABLE = myJob.exe
    ARGUMENTS = ${TASK_ID}
    STDOUT_FILE = stdout_file.${TASK_ID}
    STDERR_FILE = stderr_file.${TASK_ID}
    RANK = CPU_MHZ

With this RANK expression, processors with higher clock frequency are preferred (this should be coupled with processor type if higher clock frequency is to mean higher performance). An array of 10 instances of myJob.exe could be submitted with:

    gwsubmit -t myJob.jt -n
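
To picture what an array job expands to, the sketch below substitutes ${TASK_ID} into the template fields once per task, using Python's string.Template, which happens to share the ${NAME} syntax. It is a hedged illustration, not Gridway code: the function expand_array_job is hypothetical, and numbering tasks from 0 is an assumption.

```python
from string import Template

# Template fields from the slide (RANK omitted, as it does not use TASK_ID).
JOB_TEMPLATE = {
    "EXECUTABLE": "myJob.exe",
    "ARGUMENTS": "${TASK_ID}",
    "STDOUT_FILE": "stdout_file.${TASK_ID}",
    "STDERR_FILE": "stderr_file.${TASK_ID}",
}


def expand_array_job(template, n_tasks):
    """Return one concrete job description per task, with TASK_ID filled in."""
    return [
        {
            key: Template(value).substitute(TASK_ID=task_id)
            for key, value in template.items()
        }
        for task_id in range(n_tasks)  # assumption: task IDs start at 0
    ]


tasks = expand_array_job(JOB_TEMPLATE, 10)
for task in tasks[:2]:
    print(task)
```

Each task receives its own argument and its own output files, so the instances do not overwrite one another's results.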

Distributed Resource Management (DRM) systems: Term used to cover job schedulers and the like. There are several choices of DRM for a system, each having different characteristics, modes of operation, commands, and APIs.

Distributed Resource Management Application API (DRMAA) (pronounced “drama”): Standard set of APIs for submission and control of jobs to DRMs. Bindings in C/C++, Java, Perl, Python, and Ruby for a range of DRMs including (Sun) Grid Engine, Condor, PBS/Torque, LSF, and Gridway.
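
The value of a standard API is that application code is written once against a common interface and runs unchanged on any DRM that implements it. The toy sketch below illustrates that idea only. It does not use the DRMAA bindings: JobSession, LocalSession, and run_and_wait are hypothetical names, and the stand-in back end simply runs jobs as local processes.

```python
import subprocess
import sys
from abc import ABC, abstractmethod


class JobSession(ABC):
    """Hypothetical common interface that each DRM back end would implement."""

    @abstractmethod
    def run_job(self, command, args):
        """Submit a job and return its identifier."""

    @abstractmethod
    def wait(self, job_id):
        """Block until the job finishes; return (exit_status, output)."""


class LocalSession(JobSession):
    """Stand-in back end: runs each job as a local process."""

    def __init__(self):
        self._jobs = {}

    def run_job(self, command, args):
        proc = subprocess.Popen([command, *args], stdout=subprocess.PIPE, text=True)
        job_id = str(proc.pid)
        self._jobs[job_id] = proc
        return job_id

    def wait(self, job_id):
        proc = self._jobs.pop(job_id)
        output, _ = proc.communicate()
        return proc.returncode, output


def run_and_wait(session, command, args):
    """Application code: depends only on the JobSession interface."""
    job_id = session.run_job(command, args)
    return session.wait(job_id)


status, output = run_and_wait(LocalSession(), sys.executable, ["-c", "print('hello')"])
print(status, output.strip())
```

Swapping LocalSession for a class backed by a different scheduler would leave run_and_wait untouched, which is the portability a standard API is meant to provide.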

Scheduler with DRMAA interface Fig 3.21

Example of the use of DRMAA Fig 3.22

Questions