Introduction to the Grid and the glideinWMS architecture Tuesday morning, 11:15am Igor Sfiligoi Leader of the OSG Glidein Factory Operations University.

Slides:



Advertisements
Similar presentations
ANTHONY TIRADANI AND THE GLIDEINWMS TEAM glideinWMS in the Cloud.
Advertisements

Dealing with real resources Wednesday Afternoon, 3:00 pm Derek Weitzel OSG Campus Grids University of Nebraska.
Condor and GridShell How to Execute 1 Million Jobs on the Teragrid Jeffrey P. Gardner - PSC Edward Walker - TACC Miron Livney - U. Wisconsin Todd Tannenbaum.
Intermediate Condor: DAGMan Monday, 1:15pm Alain Roy OSG Software Coordinator University of Wisconsin-Madison.
Jaime Frey Computer Sciences Department University of Wisconsin-Madison Condor-G: A Case in Distributed.
First steps implementing a High Throughput workload management system Massimo Sgaravatto INFN Padova
Evaluation of the Globus GRAM Service Massimo Sgaravatto INFN Padova.
Intermediate HTCondor: Workflows Monday pm Greg Thain Center For High Throughput Computing University of Wisconsin-Madison.
DIRAC API DIRAC Project. Overview  DIRAC API  Why APIs are important?  Why advanced users prefer APIs?  How it is done?  What is local mode what.
SCD FIFE Workshop - GlideinWMS Overview GlideinWMS Overview FIFE Workshop (June 04, 2013) - Parag Mhashilkar Why GlideinWMS? GlideinWMS Architecture Summary.
Workload Management WP Status and next steps Massimo Sgaravatto INFN Padova.
Campus Grids Report OSG Area Coordinator’s Meeting Dec 15, 2010 Dan Fraser (Derek Weitzel, Brian Bockelman)
Use of Condor on the Open Science Grid Chris Green, OSG User Group / FNAL Condor Week, April
Grid job submission using HTCondor Andrew Lahiff.
Evolution of the Open Science Grid Authentication Model Kevin Hill Fermilab OSG Security Team.
Dealing with real resources Wednesday Afternoon, 3:00 pm Derek Weitzel OSG Campus Grids University of Nebraska.
Turning science problems into HTC jobs Wednesday, July 29, 2011 Zach Miller Condor Team University of Wisconsin-Madison.
Report from USA Massimo Sgaravatto INFN Padova. Introduction Workload management system for productions Monte Carlo productions, data reconstructions.
Intermediate Condor: Workflows Rob Quick Open Science Grid Indiana University.
July 11-15, 2005Lecture3: Grid Job Management1 Grid Compute Resources and Job Management.
Derek Wright Computer Sciences Department University of Wisconsin-Madison Condor and MPI Paradyn/Condor.
GLIDEINWMS - PARAG MHASHILKAR Department Meeting, August 07, 2013.
Intermediate Condor: Workflows Monday, 1:15pm Alain Roy OSG Software Coordinator University of Wisconsin-Madison.
6/23/2005 R. GARDNER OSG Baseline Services 1 OSG Baseline Services In my talk I’d like to discuss two questions:  What capabilities are we aiming for.
Pilot Factory using Schedd Glidein Barnett Chiu BNL
CERN IT Department CH-1211 Geneva 23 Switzerland t A proposal for improving Job Reliability Monitoring GDB 2 nd April 2008.
Eileen Berman. Condor in the Fermilab Grid FacilitiesApril 30, 2008  Fermi National Accelerator Laboratory is a high energy physics laboratory outside.
Grid Compute Resources and Job Management. 2 Grid middleware - “glues” all pieces together Offers services that couple users with remote resources through.
© 2015 albert-learning.com How to talk to your boss How to talk to your boss!!
HTCondor’s Grid Universe Jaime Frey Center for High Throughput Computing Department of Computer Sciences University of Wisconsin-Madison.
Campus Grid Technology Derek Weitzel University of Nebraska – Lincoln Holland Computing Center (HCC) Home of the 2012 OSG AHM!
Job submission overview Marco Mambelli – August OSG Summer Workshop TTU - Lubbock, TX THE UNIVERSITY OF CHICAGO.
3 Compute Elements are manageable By hand 2 ? We need middleware – specifically a Workload Management System (and more specifically, “glideinWMS”) 3.
UCS D OSG Summer School 2011 Life of an OSG job OSG Summer School A peek behind the scenes The life of an OSG job by Igor Sfiligoi University of.
How to get the needed computing Tuesday afternoon, 1:30pm Igor Sfiligoi Leader of the OSG Glidein Factory Operations University of California San Diego.
Introduction to Distributed HTC and overlay systems Tuesday morning, 9:00am Igor Sfiligoi Leader of the OSG Glidein Factory Operations University of California.
UCS D OSG Summer School 2011 Intro to DHTC OSG Summer School An introduction to Distributed High-Throughput Computing with emphasis on Grid computing.
OSG Consortium Meeting - March 6th 2007Evaluation of WMS for OSG - by I. Sfiligoi1 OSG Consortium Meeting Evaluation of Workload Management Systems for.
UCS D OSG Summer School 2011 Overlay systems OSG Summer School An introduction to Overlay systems Also known as Pilot systems by Igor Sfiligoi University.
UCS D OSG Summer School 2011 Single sign-on OSG Summer School Single sign-on in Open Science Grid by Igor Sfiligoi University of California San Diego.
Security in OSG Tuesday afternoon, 4:15pm Igor Sfiligoi Member of the OSG Security team University of California San Diego.
Rome, Sep 2011Adapting with few simple rules in glideinWMS1 Adaptive 2011 Adapting to the Unknown With a few Simple Rules: The glideinWMS Experience by.
Madison, Apr 2010Igor Sfiligoi1 Condor World 2010 Condor-G – A few lessons learned by Igor UCSD.
Condor Week 09Condor WAN scalability improvements1 Condor Week 2009 Condor WAN scalability improvements A needed evolution to support the CMS compute model.
UCS D OSG School 11 Grids vs Clouds OSG Summer School Comparing Grids to Clouds by Igor Sfiligoi University of California San Diego.
Condor Week May 2012No user requirements1 Condor Week 2012 An argument for moving the requirements out of user hands - The CMS experience presented.
Honolulu - Oct 31st, 2007 Using Glideins to Maximize Scientific Output 1 IEEE NSS 2007 Making Science in the Grid World - Using Glideins to Maximize Scientific.
Turning science problems into HTC jobs Tuesday, Dec 7 th 2pm Zach Miller Condor Team University of Wisconsin-Madison.
Dealing with real resources Wed July 21st, 3:15pm Igor Sfiligoi, OSG Scalability Area coordinator and OSG glideinWMS factory manager.
Introduction to CSCI 1311 Dr. Mark C. Lewis
Quick Architecture Overview INFN HTCondor Workshop Oct 2016
Dynamic Deployment of VO Specific Condor Scheduler using GT4
Operating a glideinWMS frontend by Igor Sfiligoi (UCSD)
Outline Expand via Flocking Grid Universe in HTCondor ("Condor-G")
Workload Management System
Glidein Factory Operations
The CMS use of glideinWMS by Igor Sfiligoi (UCSD)
CREAM-CE/HTCondor site
Security in OSG Rob Quick
Building Grids with Condor
Create, connect and share respect: A better internet starts with you!
Basic Grid Projects – Condor (Part I)
Pierre Girard ATLAS Visit
Genre1: Condor Grid: CSECCR
Accelerated Introduction to Computer Science
Therapeutic Communication
Parent - Teacher Meetings As easy as A-B-C
Out of this world: 24/7 Online Chat
Credential Management in HTCondor
Condor-G: An Update.
Presentation transcript:

Introduction to the Grid and the glideinWMS architecture Tuesday morning, 11:15am Igor Sfiligoi Leader of the OSG Glidein Factory Operations University of California San Diego

2012 OSG User School Logistical reminder It is OK to ask questions  During the lecture  During the demos  During the exercises  During the breaks If I don't know the answer, I will find someone who likely does

2012 OSG User School Reminder - DHTC DHTC is about computing on more than one HTC system 3 Schedul er

2012 OSG User School This lecture goes into details of DHTC I just want to do my science. I will leave the direct DHTC tasks to the Overlay admins. But then it is up to me to fix your screw ups! Good. This is the spirit. You still should learn.

2012 OSG User School The Grid One instance of DHTC The idea behind the Grid is to provide a single interface to any HTC system  No matter where it is located  No matter who operates it  No matter what technology it uses Based on two principles  Single sign-on  An abstraction layer for job submission

2012 OSG User School Single sign-on The idea is simple  The user should use the same mechanism to submit jobs to any site (and there can be 100s of them!) Hi. I am Igor

2012 OSG User School OSG uses Certificates Think of it as a passport  It is issued once to you  You present it for inspection when doing immigration  The immigration officer uses the information in the passport to let you in In OSG it is essentially a file More details in the afternoon

2012 OSG User School OSG uses Certificates Think of it as a passport  It is issued once to you  You present it for inspection when doing immigration  The immigration officer uses the information in the passport to let you in In OSG it is essentially a file More details in the afternoon Make sure you get one today! You will need it for the storage exercises tomorrow. (We have no Grid exercises)

2012 OSG User School Abstraction layer for submission Again, you want the same mechanism to submit jobs to any site We put an abstraction layer between the user and the site-specific technology 3 jobs, please Known as CE = Compute Element

2012 OSG User School Theory and practice In practice, no single abstraction layer  Several products: GRAM, CREAM, ARC Although OSG mostly uses GRAM  But even this is under discussion

2012 OSG User School Theory and practice In practice, no single abstraction layer  Several products: GRAM, CREAM, ARC Although OSG mostly uses GRAM  But even this is under discussion Need a flexible submission tool

2012 OSG User School Enter Condor-G Condor happens to be the best, most flexible submit tool  Indeed, the recommended tool in OSG Condor-G is just a name for the components handling “Grid universe” jobs  You would still be using condor_submit and condor_q

2012 OSG User School Condor-G details Condor-G doesn't manage remote resources  It just forwards and monitors jobs sent to remote HTC systems  No condor_status No matchmaking  User explicitly specifies API and site to use condor_sche dd Schedul er CE condor_submit “Grid UNL” Grid_submit Grid_q

2012 OSG User School CE as a black box Practically all CE implementations provide only minimal functionality  Job submission  Basic job monitoring  Job removal If anything goes wrong, very hard to discover the core reason  Not always, but way too often  Requires contact with the remote admins Side effect of abstraction

2012 OSG User School CE as a black box Practically all CE implementations provide only minimal functionality  Job submission  Basic job monitoring  Job removal If anything goes wrong, very hard to discover the core reason  Not always, but way too often  Requires contact with the remote admins Side effect of abstraction Avoid direct use of the Grid, if you can. Find someone else who does it for you.

2012 OSG User School Questions so far?

2012 OSG User School Reminder - glideinWMS A Condor based overlay system  i.e. looks like a regular Condor system to the users  Adds a resource provisioning service (i.e. the lease manager) Schedul er Schedd glideinWMS Collector Negotiato r

2012 OSG User School The inner structure The glideinWMS is really composed of two components  A VO Frontend – The matchmaker  A Glidein Factory – The pilot submitter Schedul er Schedd Glidein Factory Collector Negotiato r VO Fronten d

2012 OSG User School The Glidein Factory The Glidein Factory is the interface to the Grid  Essentially, an additional abstraction layer Meant to be operated by an expert team on behalf of many user communities  Think of it as a service, not as a piece of SW The factory operators will deal with Grid details  Including debugging misbehaving glideins

2012 OSG User School Glidein Factory internals Essentially a slave to VO Frontends  Will submit on their behalf  Using their certificate  Main role is monitoring and debugging Uses Condor-G under the hood Site scheduler Glidein Factory VO Fronten d CE condor_sche dd VO Frontend

2012 OSG User School Glidein Factory internals Essentially a slave to VO Frontends  Will submit on their behalf  Using their certificate  Main role is monitoring and debugging Uses Condor-G under the hood Site scheduler Glidein Factory VO Fronten d CE condor_sche dd VO Frontend I don't expect you will ever need to operate a Glidein Factory. But you never know

2012 OSG User School VO Frontend The “brain” of a glideinWMS system  Decides when and where to send glideins  Will talk to one or more gfactories Each user community needs one  i.e each VO == Virtual Organization  Alongside the Condor daemons Not much Grid knowledge needed here  Apart from when things go really wrong!

2012 OSG User School VO Frontend Matchmaking Can we skip over this? I just want to do my science! She will be a great scientist! But first you have to graduate! And operating a VO Frontend will be your service work. Maybe you can get away without. But you are a likely candidate.

2012 OSG User School VO Frontend Matchmaking The VO Frontend config defines the matchmaking policy  For both levels of matchmaking Unfortunately, the two levels expressed in two different languages  Python expression – Frontend logic  ClassAd expression – Startd requirements Example config

2012 OSG User School Monitoring and debugging The system mostly run itself  But sometimes things do go wrong Two major sources of monitoring  Condor itself (condor_q, condor_status)  The VO Frontend Web page and logs Similar for debugging More details in the demo.

2012 OSG User School Questions? Questions? Comments?  Feel free to ask me questions later: Igor Sfiligoi Upcoming sessions  Now – 12:00pm  Demo  12:00pm – 1:30pm  Lunch + Tour  1:30pm –  Next lecture - How to get the needed computing 26

2012 OSG User School Copyright notice 27 This presentation contains images copyrighted by ToonClipart.com These images have been licensed to Igor Sfiligoi for use in his presentations Any other use of them is prohibited