Exposing Computational Resources Across Administrative Domains
How Shibboleth can work with job schedulers to create grids that support everyone
H. David Lambert, Stephen Moore, Arnie Miles, Chad La Joie, Brent Putman, Jess Cannata

The Paradox of Grid Computing
Large amounts of computing power go untapped, yet researchers typically cannot find the computing power they need. Resource owners must be able to set policies for the use of their equipment. Users must be able to find and leverage the resources that match their needs.

Secure grid-like installations are not growing beyond small groups of known players. But why? The only method currently available for ensuring the security of a resource involves personal interaction between resource owners and resource consumers: enabling a user to access a resource requires manually adding that user to a local map file. Various methods of grouping users and resources to share certificates have sprung up.
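For illustration, the map file in question is typically the Globus grid-mapfile, which pairs a certificate subject with a local account. A minimal sketch, with hypothetical names and DNs:

# /etc/grid-security/grid-mapfile
# Every new user costs the resource owner a hand-edited line like these.
"/O=Grid/OU=Example University/CN=Jane Researcher" jresearch
"/O=Grid/OU=Example University/CN=John Student" jstudent

This hand maintenance is exactly the administrative burden that keeps resource owners from opening their machines to large communities.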

On the other hand, grids that encourage resource owners to connect their machines to a central portal that only allows specific efforts to run have exploded: SETI@home, United Devices, Grid.org, IBM's World Community Grid. What does this mean? Historically, getting massive quantities of resources onto the grid has been a challenge. However, in situations where potential resource owners are relieved of heavy administrative burdens, they flock to the grid. And when massive numbers of resources are made available to researchers, real work gets accomplished.

How are jobs executed? Modern job-scheduling software includes:
Condor
Sun Grid Engine (N1)
PBS (Pro and Open)
Platform LSF

Job-scheduling software is unsurpassed in environments where there is only one administrative domain:
Beowulf clusters
High-performance n-way devices
Unfortunately, as soon as you begin to cross any sort of administrative line, these products become less robust:
Intra-campus grids
Inter-campus grids
Attempts to leverage existing grid tools to handle this have resulted in compromises:
Groups of users sharing one certificate
User-management issues
Accounting issues

In general, job-scheduling software accepts a job description file that describes the work to be done. The job file is free-form text containing name-value pairs. We can therefore add anything we want to these files, as long as we teach the execution machines to understand the additions.

Example submission file (Condor):

# Example condor_submit input file
# (Lines beginning with # are comments)
Universe = vanilla
Executable = /home/arnie/condor/my_job.condor
Input = my_job.stdin
Output = my_job.stdout
Error = my_job.stderr
Arguments = -arg1 -arg2
InitialDir = /home/arnie/condor/run_1
Queue

Condor in the Beowulf, supercomputer, or campus-grid world:

Universe = vanilla
Executable = /home/arnie/condor/my_job.condor
Input = my_job.stdin
Output = my_job.stdout
Error = my_job.stderr
Arguments = -arg1 -arg2
InitialDir = /home/arnie/condor/run_1
Queue

The user has an account on the cluster or HP device, and all nodes are in a closely controlled administrative domain.

Condor Grid with Flocking
[Diagram: a submit machine's Schedd talks to its own central manager (CONDOR_HOST) and to the central managers (Collector + Negotiator) of the remote pools Pool-Foo and Pool-Bar.]
"Flocks" are introduced to each other by hostname or IP address, as sketched below.
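A minimal configuration sketch of that introduction, assuming hypothetical central-manager hostnames: the submitting pool lists where it may flock to, and the remote pool lists who may flock in.

# condor_config on Pool-Foo's submit side (hostnames hypothetical)
FLOCK_TO = cm.pool-bar.example.edu

# condor_config in Pool-Bar
FLOCK_FROM = cm.pool-foo.example.edu

Note that both sides must name each other explicitly, which is why flocking does not scale beyond pools whose administrators already know one another.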

Job scheduling with conventional "grid" products: Globus and Condor-G. The user submits a job via a Globus-enabled version of Condor. Any number of resources "on the grid" accept jobs from the Globus Gatekeeper, which hands them to Globus Job Managers for distribution to the resources. Each resource must physically map a Globus x.509 certificate to a local user account.
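For reference, a Condor-G submit file of this era named a specific gatekeeper directly. A minimal sketch, with a hypothetical gatekeeper host:

# Condor-G submit file (gatekeeper host is hypothetical)
Universe = globus
GlobusScheduler = gatekeeper.example.edu/jobmanager-pbs
Executable = my_job
Output = my_job.out
Error = my_job.err
Queue

The user still needs a certificate that the remote site has mapped to a local account before this job will run.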

Summary of limitations from the previous examples: user and resource management problems.
How does the owner of a grid resource grant access to large numbers of individuals?
How does the owner of a grid resource know when a user granted access by membership in an organization leaves that organization?
How does a user easily get added to a resource?
How does a user find available resources?

SAML-based solutions give a resource secure access to attributes about a user, making SAML a powerful partner for existing batch job schedulers. While Condor was already able to leverage user attributes from a local LDAP store, this project demonstrates for the first time that Condor can consume user attributes from a remote store.

What we are doing now with Shibboleth, LDAP, and Condor
[Diagram: a user at Site 'A' submits through a Shib/Condor portal backed by an LDAP DB; authentication flows through the federation WAYF to the user's IdP, and the job ClassAd flows from the portal's Condor Schedd to the resource's Condor Schedd and Startd at Site 'B', where the job runs.]
The user at Site 'A' is aware of a resource at Site 'B', and the owner of resource 'B' has granted access to Site 'A'. We leverage the free-text job submission files to add attributes from SAML to our jobs, as sketched below.
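A sketch of that leverage: Condor submit files can define custom job-ClassAd attributes by prefixing them with '+'. The attribute names and values below are hypothetical stand-ins for what the portal would inject from the SAML assertion.

# Custom SAML-derived attributes (names and values hypothetical)
Universe = vanilla
Executable = my_job
+ShibIdentityProvider = "https://idp.example.edu/shibboleth"
+ShibAffiliation = "faculty@example.edu"
Queue

An execute machine could then admit or reject jobs by referencing these attributes in its START policy expression, so access decisions follow attributes rather than hand-maintained account mappings.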

New Work, Phase II: a SAML-based grid work engine with intelligent resource management
[Diagram: a job submission client, fed by an identity provider and a user job file, feeds a resource discovery network whose nodes span Company 'A' and University 'B'; each site's resource scheduler runs the jobs.]
Now resource owners can grant access to users based upon their attributes instead of their identities. Management of users is again the responsibility of the local administration, as it should be. When resource owners can easily set policies without worrying about user management and group memberships, they will become willing to attach their resources to this new computational grid.

Intelligent Resource Management
Users have their own policy decisions to make: processor type, operating system, executable location, data location, memory requirements, and so on. In the perfect world, users will have multiple resources to choose from. These resources will have different configurations that can match the user's policy requirements, and their availability will be ever-changing. An intelligent resource management system will let users launch jobs from their portal and trust that the work will be sent to the resource that not only correctly matches the job's policy but also carries the least load, without the user being aware of where the work will be executed. This solution will be scheduler-agnostic. In Condor terms, such a policy can be written directly into the job, as sketched below.
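A minimal sketch of such a job policy as Condor ClassAd expressions in a submit file (the thresholds are illustrative, not from the talk):

# Match only 64-bit Linux machines with at least 2 GB of memory...
Requirements = (Arch == "X86_64") && (OpSys == "LINUX") && (Memory >= 2048)
# ...and among the matching machines, prefer the least loaded.
Rank = -LoadAvg

The matchmaker then sends the job to the highest-ranked matching machine, which is exactly the "correct match, least load" behavior described above; a scheduler-agnostic system would express the same policy in a scheduler-neutral form.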

Example of Intelligent Agent
[Diagram: an identity provider and a user job file feed the job submission client, which queries resource discovery network nodes at Company 'A' and University 'B'; a scheduler at each site runs the jobs.]

Acknowledgments
Georgetown University: Charlie Leonhardt, Steve Moore, Arnie Miles, Chad La Joie, Brent Putman, Jess Cannata
University of Wisconsin: Miron Livny, Todd Tannenbaum, Ian Alderman
Internet2: Ken Klingenstein, Mike McGill