Heterogeneous Grid Design and Implementation Thesis Presentation By Jeffrey Wells State University New York Institute of Technology May 7, 2008 CSC 599.



Outline
- Purpose
- Overview
- Intro to Globus Toolkit and Condor
- Interoperability Experiments
- Results
- Conclusion

Purpose
This thesis investigates the extent to which two open source approaches to Grid computing achieve interoperability. The Globus Alliance’s Globus Toolkit and the University of Wisconsin-Madison’s Condor scheduler serve as the worked example of interoperability.

Overview
- What is a Grid?
- Condor Scheduler
- Globus Toolkit
- BITS Regional Grid
- SUNYIT Local Grid Network
- Grid Security

What is a Grid?
According to the definition given by Ian Foster of the University of Chicago, a Grid is a system that:
- coordinates resources that are not subject to centralized control
- uses standardized, open, general-purpose protocols and interfaces
- delivers non-trivial qualities of service
Examples of Grids:
- TeraGrid, with 20 teraflops of computing power and 1 petabyte of storage
- Access Grid, used for scheduling and conducting meetings
- eDiaMoND, used for medical research in England

Condor Scheduler
Condor is a High Throughput Computing (HTC) system: it ties idle machines together and harnesses their unused capacity in a distributed fashion. Condor was developed by the University of Wisconsin-Madison.
Other distributed schedulers:
- PBS (Portable Batch System)
- LSF (Load Sharing Facility)
- CSF (Community Scheduler Framework)
SETI (Search for Extraterrestrial Intelligence)

Globus Toolkit
The Globus Toolkit is an open source software toolkit for building Grid systems and applications. It is under continuous development by the Globus Alliance at the University of Chicago and by many other contributors around the world.
Another Grid toolkit:
- Virtual Data Toolkit (VDT)

BITS Regional Grid
[Diagram labels: bitsgw, qw.cs. sunyit. edu, Corning Community College, SUNY Geneseo, SUNYIT]

SUNY IT Local Grid Network
[Diagram labels: Globus nodes, Condor nodes, 405, 605, bitsgw]

Grid Security
The Grid Security Infrastructure (GSI) uses public key cryptography as the backbone of its functionality.
The reasons behind GSI are:
- the requirement for secure communication between the resources of a Grid
- the need to avoid a centrally managed security system
- the need for a "single sign-on" for users of the Grid, including delegation of credentials for jobs that require more than one resource and/or site

SUNY Geneseo
[Diagram labels: Debian Linux Cluster, Condor Execute/Submit, Globus Services]
Services used, tested and evaluated:
- GridFTP and RFT (Reliable File Transfer)
- Delegation, authentication and authorization
- Credential management
- Grid Security Infrastructure (GSI)
- Various Condor submits

Condor Central Manager (Scheduler)
[Diagram: a Condor Central Manager (Scheduler), a Submit/Execute machine and a Globus machine, linked by arrows labeled "Job Request" and "ClassAd/Results"]
- The Central Manager submits jobs either to a Condor Submit/Execute machine or to a Globus machine.
- Each machine "advertises" its resources to the Central Manager via a ClassAd.
- The Central Manager matches resources against the requirements of the submitted job.
- The Central Manager sends the executable to the remote resource that matches the requirements.
- Once the job is completed, the Execute machine reports back to the Central Manager.
- The Central Manager reports the final results.
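The advertise-and-match cycle on this slide can be illustrated with a small sketch. This is a toy model in Python, not Condor's ClassAd language or its negotiator: the MachineAd and JobAd classes, the attribute names (OpSys, Memory) and the host names are all assumptions made up for the illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class MachineAd:
    """What a machine advertises about itself (hypothetical attributes)."""
    name: str
    attrs: Dict[str, object]


@dataclass
class JobAd:
    """A submitted job plus a predicate describing what it requires."""
    executable: str
    requirements: Callable[[Dict[str, object]], bool]


def match(job: JobAd, machines: List[MachineAd]) -> Optional[MachineAd]:
    """Return the first advertised machine that satisfies the job, if any."""
    for machine in machines:
        if job.requirements(machine.attrs):
            return machine
    return None


# Two machines advertise their resources to the (toy) central manager.
pool = [
    MachineAd("node-a", {"OpSys": "LINUX", "Memory": 512}),
    MachineAd("node-b", {"OpSys": "LINUX", "Memory": 2048}),
]

# The job requires a Linux machine with at least 1024 MB of memory.
job = JobAd("myscript.sh",
            lambda m: m["OpSys"] == "LINUX" and m["Memory"] >= 1024)

chosen = match(job, pool)
print(chosen.name if chosen else "no match")  # prints "node-b"
```

A real pool also ranks candidate machines and lets machines state requirements of their own; the sketch keeps only the one-directional check described on the slide.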

Various Jobs Implemented
Condor jobs:
- Vanilla
- Standard
- Java
- Parallel
- Grid (Globus)
Globus jobs:
- Forwarded a job to a Condor machine with a scheduler
- Forwarded a job from a Condor scheduler to a Globus machine (Globus job)
- Forwarded jobs to other Globus machines

Interoperability Experiments
- Globus, Condor and Condor-G
- Condor-G Interface
- Job Examples
- Condor to Globus Job Submit
- Globus to Condor Job Submit
- Test Scripts
- Swift Workflow
- Some More Test Scripts

Globus, Condor and Condor-G
[Diagram labels: Linux Cluster, Condor Workstation Pool, Globus Services, Condor Scheduler]
Condor-G manages jobs through the resource manager of the Globus Toolkit. Results of a job passed to the Globus Toolkit are returned via the Condor-G interface.
Condor daemons:
- condor_startd advertises the machine's resources and executes jobs on it.
- condor_starter spawns the remote job.
- condor_shadow runs on the submit machine and acts as the resource manager for the job.
- condor_master is responsible for keeping all the other Condor daemons running.
- condor_schedd manages the job queue and submits its jobs to remote resources.
- condor_negotiator is responsible for matchmaking.

Condor-G Interface
[Diagram labels: Linux Cluster, Globus Services, Condor Workstation Pool, Condor-G interface]
Condor-G uses the Globus resource manager to start a job on the remote machine and manages the job while it runs there. Condor-G waits for the job to complete and then returns the results.

Job Examples

Condor Job and Globus Script

Submit description file (test.submit):

    # Condor to Globus: test.submit
    universe = grid
    executable = myscript.sh
    arguments = TestJob 10
    JobManager_type = Condor
    grid_type = gt4
    globusscheduler = es/ ManagedJobFactoryService/
    log = test.log
    output = test.output
    error = test.error
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT
    Queue

Shell script:

    #!/bin/sh
    # Report where the job ran, sleep for the requested time, then exit.
    echo "I'm process id $$ on" `hostname`
    echo "This is sent to standard error" 1>&2
    date
    echo "Running as binary $0"
    echo "My name (argument 1) is $1"
    echo "My sleep duration (argument 2) is $2"
    sleep $2
    echo "Sleep of $2 seconds finished. Exiting"
    echo "RESULT: 0 SUCCESS"

Condor Job and MPI Program

Submit description file:

    ##########################
    # Submit description file
    # for /bin/hostname
    # (Parallel)
    ##########################
    universe = parallel
    executable = /bin/hostname
    machine_count = 2
    log = parallellogfile
    output = outfileMPI.$(NODE)
    error = errfileMPI.$(NODE)
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT
    queue

MPI Program:

    #include "mpi.h"
    #include <stdio.h>

    int main( int argc, char* argv[] )
    {
        int rank, size;

        MPI_Init( &argc, &argv );
        /* Each process learns its own rank and the total process count. */
        MPI_Comm_rank( MPI_COMM_WORLD, &rank );
        MPI_Comm_size( MPI_COMM_WORLD, &size );
        printf( "I am %d of %d\n", rank, size );
        MPI_Finalize();
        return 0;
    }

Condor to Globus Job Submit
[Diagram labels: Condor (Scheduler), Condor-G, Globus Toolkit, Gate Keeper, Job Manager, GASS Server, Job]
1.) The Central Manager submits a grid job.
2.) The job passes through Condor-G to the Globus gatekeeper.
3.) Security is verified via the gatekeeper.
4.) The job is forwarded to the job manager.
5.) The job is processed and the result is returned to the Central Manager.

Globus to Condor Job Submission
[Diagram labels: Local Machine, Remote Machine, GRAM Client, GASS Server, GRAM Gatekeeper, GRAM Job Manager, Batch System (Condor), GASS Client, GRAM Job Request, Creation, Job Request, Data, Callback, Grid-Proxy]

Sample Test Scripts
Perl scripts were created to test most of the functionality of the BITS regional Grid.

Job submit from Globus to Condor:

    print "\n------> Submitting a Job to Condor on Stengel <------\n";
    system "globusrun-ws -submit -Ft Condor -S -c /bin/date";

Job submit from Condor to Globus:

    print "-----> Submitting a Condor Globus Job <-----\n";
    system "condor_submit /home/wells/testjobs/condorjobs/globussubmits/submitGFork";
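The same two checks could be driven from a small harness. The following is a minimal sketch in Python rather than the Perl used in the thesis; the function names build_command and run_test, the dry-run default and the file name example.submit are assumptions for illustration. Only the globusrun-ws and condor_submit command lines come from the slide.

```python
import subprocess
from typing import List


def build_command(kind: str, target: str) -> List[str]:
    """Build the argv list for one interoperability test."""
    if kind == "globus_to_condor":
        # Flags copied from the slide; target is the program to run.
        return ["globusrun-ws", "-submit", "-Ft", "Condor", "-S", "-c", target]
    if kind == "condor_to_globus":
        # target is the path of a grid-universe submit description file.
        return ["condor_submit", target]
    raise ValueError(f"unknown test kind: {kind}")


def run_test(label: str, argv: List[str], dry_run: bool = True) -> int:
    """Print a banner, then run the command (or only show it in a dry run)."""
    print(f"------> {label} <------")
    if dry_run:
        print(" ".join(argv))
        return 0
    return subprocess.run(argv, check=False).returncode


# Dry run: show what would be executed without needing a Grid installation.
run_test("Submitting a Job to Condor",
         build_command("globus_to_condor", "/bin/date"))
run_test("Submitting a Condor Globus Job",
         build_command("condor_to_globus", "example.submit"))  # hypothetical file
```

Passing dry_run=False runs the command and returns its exit status, which lets a wrapper count failures instead of reading the output by eye.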

Swift Workflow
- Swift is a data-oriented, coarse-grained scripting language that supports dataset typing and mapping, dataset iteration, conditional branching, and sub-workflow composition.
- Swift programs, also known as workflows, are written in a language called SwiftScript.
- Swift handles the execution of these programs on remote sites.

Sample Test Scripts cont.
Swift job submit to SUNYIY3 (Geneseo):

    print "\n \n";
    system "swift -sites.file /home/wells/testjobs/swiftjobs/sites3.xml /home/wells/testjobs/swiftjobs/first.swift";

Results
- Condor.pm is malformed for job submits from Globus to Condor: should_transfer_files = YES and when_to_transfer_output = ON_EXIT must be added to the script.
- -S is now used in the Globus Toolkit where -s was used previously.
- Mpiexe.py and mpdlib.py were modified so that ws-gram was able to send a distributed job to MPICH2. Thanks to Dr. Ralph Butler of Middle Tennessee State University.
- Another application layer can easily be added to the Globus Toolkit.
- Applications are changing and maturing faster than the documentation.
- Mailing lists and groups are not always helpful, and questions do not always get a response.
- Documentation on the MPI-2 and Globus Toolkit connection is scarce and outdated.
- Documentation on the Condor and Globus interface is outdated. This was resolved by installing Condor first and then Globus with the Condor scheduler.
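The effect of the Condor.pm fix on a generated submit description can be illustrated as follows. This is a sketch under stated assumptions, not the actual patch: the thesis changed the Perl script Condor.pm, whose modified source is not shown in the slides, whereas the Python function ensure_transfer_settings below is a hypothetical helper that only demonstrates the intended result on the submit text.

```python
# The two settings the slides say must be present in the submit description.
REQUIRED = {
    "should_transfer_files": "YES",
    "when_to_transfer_output": "ON_EXIT",
}


def ensure_transfer_settings(submit_text: str) -> str:
    """Insert any missing file-transfer settings before the queue statement."""
    lines = submit_text.splitlines()
    # Collect the attribute names already set (case-insensitive, skip comments).
    present = {
        line.split("=", 1)[0].strip().lower()
        for line in lines
        if "=" in line and not line.lstrip().startswith("#")
    }
    missing = [f"{key} = {value}" for key, value in REQUIRED.items()
               if key not in present]
    for index, line in enumerate(lines):
        # Settings must appear before the first "queue" statement to apply.
        if line.strip().lower().split()[:1] == ["queue"]:
            return "\n".join(lines[:index] + missing + lines[index:]) + "\n"
    # No queue statement found: append the settings at the end.
    return "\n".join(lines + missing) + "\n"


original = "universe = vanilla\nexecutable = /bin/date\nqueue\n"
print(ensure_transfer_settings(original))
```

Running the function a second time on its own output changes nothing, because both settings are then already present.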

Conclusion
1. It is necessary to modify the Condor.pm script in order to allow the Globus Toolkit to submit jobs to the Condor Scheduler.
2. It is necessary to correct Mpiexe.py and mpdlib.py in order for the Globus Toolkit to submit a distributed job to MPICH2.
3. Investigation found that -S is now used to submit a job to Condor, versus the -s used previously.
4. Another application layer can easily be added to the Globus Toolkit without affecting interoperability with the Condor Scheduler.
5. Documentation on the MPI-2 and Globus Toolkit connection is scarce and outdated.
6. Applications are changing and maturing faster than the documentation.

References
- The Globus Security Team. Globus Toolkit Version 4 Grid Security Infrastructure: A Standards Perspective. Version 4, updated September 12. Retrieved September 26, 2007 from Overview.pdf/
- Tanenbaum, A. (2003). Computer Networks, Fourth Edition. New Jersey: Prentice Hall PTR.
- Condor Users Manual Version 6.8 (2007). Retrieved September 24, 2007.
- Globus Toolkit Administration Manual (2007). Retrieved September 24, 2007.
- Swift Users Guide (Change Revision 1700). Retrieved February 16, 2008.
- Swift - Home (2007). Retrieved February 16, 2008.
- Yong Zhao, Mihael Hategan, Ben Clifford, Ian Foster, Gregor von Laszewski, Ioan Raicu, Tiberiu Stef-Praun, Mike Wilde (2007). Swift: Fast, Reliable, Loosely Coupled Parallel Computation. Retrieved March 2, 2008.

References (cont.)
- Mausolf, J. (2005, August 09). Grid In Action: Implementation SOA and Web Services In Grid. Retrieved September 24, 2007.
- Foster, I. (2002). What is a Grid? A Three Point Checklist. Argonne National Laboratory & University of Chicago. Retrieved September 2, 2007.
- Globus Alliance. Overview of the Grid Security Infrastructure, Globus Toolkit. Retrieved May 6, 2008.
- Noel, C. (2007). What is a Grid? CETIC’s Tentative Definition. Retrieved September 6, 2007.