Condor Project Computer Sciences Department University of Wisconsin-Madison Grids and Condor Barcelona,

Presentation transcript:

Condor Project Computer Sciences Department University of Wisconsin-Madison Grids and Condor Barcelona, 2006

2 Agenda
  Extended user’s tutorial
  Advanced Uses of Condor
    Java programs
    DAGMan
    Stork
    MW
    Grid Computing
  Case studies, and a discussion of your application’s needs

3 Resources
  There are many resources (machines) in the world, and many are or can be made available!
  Groups of machines may be labeled as grids
  Welcome to the power of the grid!

4 Condor and Grids
  Condor has always been a tool to harness grid computing
  Condor’s mechanisms have evolved as technologies have evolved. Roughly categorized:
    Flocking
    Glidein
    The grid universe

5 Flocking
  A way for jobs to run within a different, separate Condor pool
  Condor runs here, and Condor runs there
  [Figure labels: here, there]

6 Connect Condor Pools with Flocking
  Flocking is a Condor-specific technology
  Flocking is enabled with configuration
  Jobs flock from here to there when they cannot be run here due to lack of available machines
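
The rule on this slide (run here if possible, otherwise try the other pools) can be sketched as a toy model. This is only an illustration of the idea, not Condor's negotiation algorithm; the `Pool` class, its idle-machine counter, and `place_job` are hypothetical names invented for the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pool:
    """Toy stand-in for a Condor pool: a name and a count of idle machines."""
    name: str
    idle_machines: int

    def try_run(self, job: str) -> bool:
        """Claim one idle machine for the job, if any is available."""
        if self.idle_machines > 0:
            self.idle_machines -= 1
            return True
        return False


def place_job(job: str, here: Pool, flock_to: List[Pool]) -> Optional[str]:
    """Return the name of the pool that accepted the job, or None.

    The local pool is always tried first; the job flocks to the other
    pools, in list order, only when no local machine is available.
    """
    if here.try_run(job):
        return here.name
    for there in flock_to:
        if there.try_run(job):
            return there.name
    return None
```

With one idle machine in each of two pools, the first job stays local, the second flocks, and a third finds no machine anywhere and has to wait.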

7 Configuration
  Configuration files contain much of the administrative information used by Condor
  The format is like that of submit description files:
      AttributeName = Value

8 Configuration here
  For jobs to be able to flock from here to there
  In the configuration file on the pool where jobs flock from:
      FLOCK_TO =
      FLOCK_COLLECTOR_HOSTS = $(FLOCK_TO)
      FLOCK_NEGOTIATOR_HOSTS = $(FLOCK_TO)
      HOSTALLOW_NEGOTIATOR_SCHEDD = $(COLLECTOR_HOST), $(FLOCK_NEGOTIATOR_HOSTS)
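
The `$(NAME)` references in these settings are macro substitutions: one setting reuses the value of another. The following is a minimal sketch of how such references resolve, not Condor's actual configuration parser. The host names `cm.here.example.org` and `cm.there.example.org` are made up for illustration, and treating names as case-insensitive and undefined macros as empty are simplifying choices of this sketch.

```python
import re

# Matches a macro reference such as $(FLOCK_TO).
_MACRO = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def parse_config(text):
    """Parse 'AttributeName = Value' lines into a dict keyed by upper-cased name."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, value = line.partition("=")
        settings[name.strip().upper()] = value.strip()
    return settings


def expand(settings, name, _depth=0):
    """Return the value of `name` with every $(OTHER) reference substituted."""
    if _depth > 20:
        raise ValueError("macro nesting too deep (possible cycle)")
    value = settings.get(name.upper(), "")
    return _MACRO.sub(
        lambda m: expand(settings, m.group(1), _depth + 1), value
    )


# Hypothetical host names, for illustration only.
EXAMPLE = """
COLLECTOR_HOST = cm.here.example.org
FLOCK_TO = cm.there.example.org
FLOCK_COLLECTOR_HOSTS = $(FLOCK_TO)
FLOCK_NEGOTIATOR_HOSTS = $(FLOCK_TO)
HOSTALLOW_NEGOTIATOR_SCHEDD = $(COLLECTOR_HOST), $(FLOCK_NEGOTIATOR_HOSTS)
"""

if __name__ == "__main__":
    cfg = parse_config(EXAMPLE)
    print(expand(cfg, "HOSTALLOW_NEGOTIATOR_SCHEDD"))
```

In this example the last setting expands to the local central manager followed by the remote one, which is why the flock-to pool's negotiator is allowed to contact the local schedd.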

9 Configuration there
  In the configuration file on the pool where jobs flock to:
      FLOCK_FROM =,...,
  To make security work:
      HOSTALLOW_WRITE_COLLECTOR = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
      HOSTALLOW_WRITE_STARTD = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
      HOSTALLOW_READ_COLLECTOR = $(HOSTALLOW_READ), $(FLOCK_FROM)
      HOSTALLOW_READ_STARTD = $(HOSTALLOW_READ), $(FLOCK_FROM)

10 Submit Description File
  Enable file transfer:
      universe = vanilla
      executable = myjob.exe
      input = myjob.input
      output = myjob.output
      log = myjob.log
      should_transfer_files = YES
      when_to_transfer_output = ON_EXIT
      queue

11 The Glidein Concept
  Assume: We need more machines, and we have permission to use a set of machines
  Glidein temporarily adds a set of machines to the local pool

12 Glidein
  In addition, Glidein solves the problem: “My job needs to run on that particular resource, and my job needs Condor.”
  For example: a job that must run under the standard universe

13 Glidein
  Condor sends and runs its own executables on the resource
  The needed resource appears to temporarily join the local Condor pool!

14 Glidein
  Run condor_glidein to add the remote resource to the local pool
  The master and startd daemons become grid universe jobs using gt2
  [Figure labels: local pool, remote resource]

15 Making Glidein Work
  Change the configuration to give access permission (HOSTALLOW_WRITE) to the remote resource
  No changes to jobs’ submit description files!
  But, do enable file transfer in the submit description file:
      universe = vanilla
      executable = myjob.exe
      input = myjob.input
      output = myjob.output
      log = myjob.log
      should_transfer_files = YES
      when_to_transfer_output = ON_EXIT
      queue

16 Force Job to Glidein Resource
  In the submit description file:
      universe = standard
      executable = ajob.exe
      input = ajob.input
      output = ajob.output
      log = ajob.log
      requirements = \
        ( machine == "example.mcs.anl.gov" ) \
        && Arch != "" && OpSys != ""
      queue

17 The Grid Universe
  Most useful when
    1. We want to send a job off to a far away machine
    2. We want to hand a job to another batch processing system on the local machine
    3. We want to send a job off to a far away machine, in order to hand that job to another batch processing system on that machine

18 The Grid Universe
  All handled in the submit description file
  Supports several back end types:
    Globus: GT2, GT3, GT4
    NorduGrid
    UNICORE
    Condor
    PBS
    LSF

19 Condor-G
  Condor-G describes jobs that are handed off to a machine that uses Globus middleware
    gt2: Globus Toolkit 1 or 2, or the pre-web services GRAM
    gt3: Globus Toolkit 3
    gt4: Globus Toolkit 4 or WS GRAM

20 Submit Description File
  For gt2:
      universe = grid
      input = job1.input
      output = job1.result
      log = job1.log
      grid_resource = gt2 example.wisc.edu/jobmanager
      queue
  The jobmanager part of grid_resource is one of: jobmanager, jobmanager-condor, jobmanager-pbs, jobmanager-lsf, jobmanager-sge
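
When many gt2 jobs are submitted, it can help to generate the description rather than edit it by hand. The helper below is a hedged sketch: `gt2_submit_description` is a hypothetical function, not part of Condor, and it reproduces only the lines shown on the slide. A real submit description also names an executable, as in the vanilla universe example on slide 10.

```python
# The jobmanager names listed on the slide.
GT2_JOBMANAGERS = (
    "jobmanager",
    "jobmanager-condor",
    "jobmanager-pbs",
    "jobmanager-lsf",
    "jobmanager-sge",
)


def gt2_submit_description(host, jobmanager, input_file, output_file, log_file):
    """Return the text of a gt2 grid universe submit description.

    Raises ValueError if the jobmanager is not one of the listed names.
    """
    if jobmanager not in GT2_JOBMANAGERS:
        raise ValueError("unknown jobmanager: %r" % (jobmanager,))
    lines = [
        "universe = grid",
        "input = %s" % input_file,
        "output = %s" % output_file,
        "log = %s" % log_file,
        "grid_resource = gt2 %s/%s" % (host, jobmanager),
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(gt2_submit_description(
        "example.wisc.edu", "jobmanager-pbs",
        "job1.input", "job1.result", "job1.log",
    ))
```

Rejecting an unknown jobmanager name up front catches a typo before the job is handed to the remote site.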

21 Submit Description File
  For gt3:
      universe = grid
      input = job2.input
      output = job2.result
      log = job2.log
      grid_resource = gt3 /gram/XXXManagedJobFactoryService
      queue
  XXX is one of: Fork, Condor, PBS, LSF, SGE
  The address is given as IP address:Port number

22 Submit Description File
  For gt4:
      universe = grid
      input = job3.input
      output = job3.result
      log = job3.log
      grid_resource = gt4 service/ManagedJobFactoryService XXX
      queue
  XXX is one of: Fork, Condor, PBS, LSF, SGE
  The address is given as IP address:Port number OR Host name:Port number

23 Nordugrid and the Submit Description File
      universe = grid
      input = job4.input
      output = job4.result
      log = job4.log
      grid_resource = nordugrid ngexample.com
      queue

24 Unicore and the Submit Description File
      universe = grid
      input = job5.input
      output = job5.result
      log = job5.log
      grid_resource = unicore usite.example.com vsite
      keystore_file = /frieda/certificates/keystore
      keystore_alias = "frieda"
      keystore_passphrase_file = /frieda/private/passphrase
      queue
  vsite is the name of the Unicore virtual resource

25 PBS and the Submit Description File
  Details of the PBS installation are in $(GLITE_LOCATION)/etc/batch_gahp.config
      universe = grid
      input = job6.input
      output = job6.result
      log = job6.log
      grid_resource = pbs
      queue

26 LSF and the Submit Description File
  Details of the LSF installation are in $(GLITE_LOCATION)/etc/batch_gahp.config
      universe = grid
      input = job7.input
      output = job7.result
      log = job7.log
      grid_resource = lsf
      queue

27 Condor-C
  Condor is running here, and Condor is running over there
  For the case where we want to send a job off to a far away machine, in order to hand that job to another batch processing system on that machine

28 Condor-C and the Submit Description File
      universe = grid
      input = job8.input
      output = job8.result
      log = job8.log
      grid_resource = condor remotecentralmanager.example.com
      +remote_jobuniverse = 5
      +remote_requirements = True
      +remote_ShouldTransferFiles = "YES"
      +remote_WhenToTransferOutput = "ON_EXIT"
      queue
  Callouts: grid_resource names the schedd and the collector machine; remote job universe 5 is the vanilla universe

29 Credentials
  Not just anybody can use any resource at any time...
  Key concepts:
    Authentication: verification of an identity
    Authorization: permission to do something

30 Authentication
  If Frieda says “I am Frieda,” how do we distinguish this from Frieda saying “I am George Bush”?

31 Authentication
  Bush can do whatever he pleases
  If Frieda claims to be Bush (and this is accepted), then Frieda can do whatever she pleases
  Authentication attempts to verify the identity of the entity that is communicating

32 Authorization
  Who is allowed (permitted) to do what
    Frieda may run gt4 jobs on the Open Science Grid machines
    Fred may write to files in /usr/bin
    The Unix user root may do anything!
  Can be implemented with a list of those authorized
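
The "list of those authorized" can be pictured as a lookup table. The sketch below is illustrative only and is not how Condor stores or checks permissions; the table contents are taken from the slide's examples, and `is_authorized` is a hypothetical name. It assumes authentication has already happened, so `user` is a verified identity.

```python
from typing import Dict, Set

# Who may do what, using the slide's examples as entries.
ALLOWED: Dict[str, Set[str]] = {
    "frieda": {"run gt4 jobs on Open Science Grid machines"},
    "fred": {"write to files in /usr/bin"},
}

# Identities that may do anything.
SUPERUSERS = {"root"}


def is_authorized(user: str, action: str) -> bool:
    """Return True if the (already authenticated) user may perform the action."""
    if user in SUPERUSERS:
        return True
    return action in ALLOWED.get(user, set())
```

A user who is missing from the table is denied everything, which keeps the default closed.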

33 Condor and Authentication
  Authentication within Condor comes in many forms. Here are three.
    1. File system: Have the entity write a file. The OS attaches a name to the file owner. Condor checks that the entity’s claim is the same as the file owner.
    2. GSI (Grid Security Infrastructure)
    3. Kerberos
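
The file system method can be sketched in a few lines. This is a simplified illustration of the idea on a Unix system, not Condor's implementation: the function names are hypothetical, and for simplicity the claim is a numeric user id rather than a user name. The verifier picks a fresh path, the entity creates a file there, and the verifier asks the operating system who owns it.

```python
import os


def challenge_path(directory):
    """Verifier side: choose a fresh, not-yet-existing path for the entity to create."""
    return os.path.join(directory, "fs_auth_" + os.urandom(8).hex())


def respond(path):
    """Entity side: create the requested file; the OS records the creator as owner."""
    # O_EXCL makes creation fail if the path already exists.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    os.close(fd)


def verify(path, claimed_uid):
    """Verifier side: accept the claim only if the OS reports claimed_uid as owner."""
    try:
        owner_uid = os.lstat(path).st_uid  # lstat does not follow symlinks
    except FileNotFoundError:
        return False
    return owner_uid == claimed_uid
```

The check only makes sense when both sides share a file system, since the verifier relies on the ownership recorded by its own operating system.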

34 Authentication Idea
  A centralized certificate authority (CA) does verification of an entity’s identity.
  When satisfied, the CA issues a signed certificate (also called a credential).
  [Figure labels: I am Frieda, CA]

35 Authentication
  To authenticate, the entity presents the certificate.
  All is well, if we trust the CA and the remote machine.
  [Figure labels: I am Frieda, CA]

36 GSI Authentication
  GSI uses X.509 certificates
  Grid universe jobs submitted to back end types that use Globus middleware (gt2, gt3, gt4), as well as nordugrid and unicore, use X.509 certificates
  Condor can also use GSI

37 Revocation, Trust, and Proxies
  The CA may revoke a credential
  Frieda gives the signed credential to the remote machine. If the remote machine is malicious, it could impersonate Frieda. Therefore, a password protects the credential.
  A proxy is a credential derived from Frieda’s own that can be used without the password, but is only valid for a specific (short) time period.
  MyProxy software enables GSI proxy management
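
The point of a proxy's short lifetime can be shown with a toy model. The sketch below involves no cryptography and is not the X.509 or GSI format: `Credential`, `make_proxy`, the "/proxy" suffix, and the 12-hour default lifetime are all assumptions made for illustration. It shows only that a proxy expires long before the credential it was derived from, which limits the damage if a remote machine misuses it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Credential:
    """Toy credential: who it names, who vouches for it, and when it expires."""
    subject: str
    issuer: str
    not_after: datetime

    def is_valid(self, now: datetime) -> bool:
        """A credential is usable only before its expiry time."""
        return now < self.not_after


def make_proxy(user_cred: Credential, now: datetime,
               lifetime: timedelta = timedelta(hours=12)) -> Credential:
    """Derive a short-lived proxy that is vouched for by the user's credential."""
    if not user_cred.is_valid(now):
        raise ValueError("cannot derive a proxy from an expired credential")
    # A proxy never outlives the credential it was derived from.
    expires = min(now + lifetime, user_cred.not_after)
    return Credential(
        subject=user_cred.subject + "/proxy",
        issuer=user_cred.subject,
        not_after=expires,
    )


if __name__ == "__main__":
    now = datetime(2006, 1, 1, tzinfo=timezone.utc)
    frieda = Credential("Frieda", "CA", now + timedelta(days=365))
    proxy = make_proxy(frieda, now)
    print(proxy.is_valid(now + timedelta(hours=1)))
    print(proxy.is_valid(now + timedelta(hours=13)))
```

In the example, the proxy is usable one hour after creation but not thirteen hours after, while Frieda's long-term credential remains valid throughout.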