UCS D OSG Summer School 2011 Overlay systems1 2011 OSG Summer School An introduction to Overlay systems Also known as Pilot systems by Igor Sfiligoi University.

Slides:



Advertisements
Similar presentations
Jaime Frey Computer Sciences Department University of Wisconsin-Madison OGF 19 Condor Software Forum Routing.
Advertisements

Dan Bradley Computer Sciences Department University of Wisconsin-Madison Schedd On The Side.
Building Campus HTC Sharing Infrastructures Derek Weitzel University of Nebraska – Lincoln (Open Science Grid Hat)
1 Concepts of Condor and Condor-G Guy Warner. 2 Harvesting CPU time Teaching labs. + Researchers Often-idle processors!! Analyses constrained by CPU time!
Condor and GridShell How to Execute 1 Million Jobs on the Teragrid Jeffrey P. Gardner - PSC Edward Walker - TACC Miron Livney - U. Wisconsin Todd Tannenbaum.
Jaime Frey Computer Sciences Department University of Wisconsin-Madison Condor-G: A Case in Distributed.
First steps implementing a High Throughput workload management system Massimo Sgaravatto INFN Padova
DIRAC API DIRAC Project. Overview  DIRAC API  Why APIs are important?  Why advanced users prefer APIs?  How it is done?  What is local mode what.
SCD FIFE Workshop - GlideinWMS Overview GlideinWMS Overview FIFE Workshop (June 04, 2013) - Parag Mhashilkar Why GlideinWMS? GlideinWMS Architecture Summary.
Alain Roy Computer Sciences Department University of Wisconsin-Madison An Introduction To Condor International.
Track 1: Cluster and Grid Computing NBCR Summer Institute Session 2.2: Cluster and Grid Computing: Case studies Condor introduction August 9, 2006 Nadya.
glideinWMS: Quick Facts  glideinWMS is an open-source Fermilab Computing Sector product driven by CMS  Heavy reliance on HTCondor from UW Madison and.
An Introduction to High-Throughput Computing Monday morning, 9:15am Alain Roy OSG Software Coordinator University of Wisconsin-Madison.
The Glidein Service Gideon Juve What are glideins? A technique for creating temporary, user- controlled Condor pools using resources from.
Campus Grids Report OSG Area Coordinator’s Meeting Dec 15, 2010 Dan Fraser (Derek Weitzel, Brian Bockelman)
Use of Condor on the Open Science Grid Chris Green, OSG User Group / FNAL Condor Week, April
GLIDEINWMS - PARAG MHASHILKAR Department Meeting, August 07, 2013.
Pilot Factory using Schedd Glidein Barnett Chiu BNL
Ian D. Alderman Computer Sciences Department University of Wisconsin-Madison Condor Week 2008 End-to-end.
Workload management, virtualisation, clouds & multicore Andrew Lahiff.
Eileen Berman. Condor in the Fermilab Grid FacilitiesApril 30, 2008  Fermi National Accelerator Laboratory is a high energy physics laboratory outside.
Parag Mhashilkar Computing Division, Fermi National Accelerator Laboratory.
First evaluation of the Globus GRAM service Massimo Sgaravatto INFN Padova.
CMS Experience with the Common Analysis Framework I. Fisk & M. Girone Experience in CMS with the Common Analysis Framework Ian Fisk & Maria Girone 1.
Campus Grid Technology Derek Weitzel University of Nebraska – Lincoln Holland Computing Center (HCC) Home of the 2012 OSG AHM!
Job submission overview Marco Mambelli – August OSG Summer Workshop TTU - Lubbock, TX THE UNIVERSITY OF CHICAGO.
3 Compute Elements are manageable By hand 2 ? We need middleware – specifically a Workload Management System (and more specifically, “glideinWMS”) 3.
UCS D OSG Summer School 2011 Life of an OSG job OSG Summer School A peek behind the scenes The life of an OSG job by Igor Sfiligoi University of.
How to get the needed computing Tuesday afternoon, 1:30pm Igor Sfiligoi Leader of the OSG Glidein Factory Operations University of California San Diego.
Introduction to Distributed HTC and overlay systems Tuesday morning, 9:00am Igor Sfiligoi Leader of the OSG Glidein Factory Operations University of California.
Ways to Connect to OSG Tuesday, Wrap-Up Lauren Michael, CHTC.
Condor Week 2007Glidein Factories - by I. Sfiligoi1 Condor Week 2007 Glidein Factories (and in particular, the glideinWMS) by Igor Sfiligoi.
Introduction to the Grid and the glideinWMS architecture Tuesday morning, 11:15am Igor Sfiligoi Leader of the OSG Glidein Factory Operations University.
UCS D OSG Summer School 2011 Intro to DHTC OSG Summer School An introduction to Distributed High-Throughput Computing with emphasis on Grid computing.
Condor Week Apr 30, 2008Pseudo Interactive monitoring - I. Sfiligoi1 Condor Week 2008 Pseudo-interactive monitoring in Condor by Igor Sfiligoi.
OSG Consortium Meeting - March 6th 2007Evaluation of WMS for OSG - by I. Sfiligoi1 OSG Consortium Meeting Evaluation of Workload Management Systems for.
European Condor Week CDF experience with Condor glide-ins and GCB - Igor Sfiligoi1 European Condor Week 2006 Using Condor Glide-Ins and GCB to run.
UCS D OSG Summer School 2011 Single sign-on OSG Summer School Single sign-on in Open Science Grid by Igor Sfiligoi University of California San Diego.
Rome, Sep 2011Adapting with few simple rules in glideinWMS1 Adaptive 2011 Adapting to the Unknown With a few Simple Rules: The glideinWMS Experience by.
Madison, Apr 2010Igor Sfiligoi1 Condor World 2010 Condor-G – A few lessons learned by Igor UCSD.
Condor Week 09Condor WAN scalability improvements1 Condor Week 2009 Condor WAN scalability improvements A needed evolution to support the CMS compute model.
UCS D OSG School 11 Grids vs Clouds OSG Summer School Comparing Grids to Clouds by Igor Sfiligoi University of California San Diego.
Condor Week May 2012No user requirements1 Condor Week 2012 An argument for moving the requirements out of user hands - The CMS experience presented.
Honolulu - Oct 31st, 2007 Using Glideins to Maximize Scientific Output 1 IEEE NSS 2007 Making Science in the Grid World - Using Glideins to Maximize Scientific.
Arlington, Dec 7th 2006 Glidein Based WMS 1 A pilot-based (PULL) approach to the Grid An overview by Igor Sfiligoi.
Dealing with real resources Wed July 21st, 3:15pm Igor Sfiligoi, OSG Scalability Area coordinator and OSG glideinWMS factory manager.
WLCG IPv6 deployment strategy
Running LHC jobs using Kubernetes
Quick Architecture Overview INFN HTCondor Workshop Oct 2016
Dynamic Deployment of VO Specific Condor Scheduler using GT4
Operating a glideinWMS frontend by Igor Sfiligoi (UCSD)
Outline Expand via Flocking Grid Universe in HTCondor ("Condor-G")
Workload Management System
POW MND section.
CREAM Status and Plans Massimo Sgaravatto – INFN Padova
Glidein Factory Operations
How to enable computing
I2G CrossBroker Enol Fernández UAB
The CMS use of glideinWMS by Igor Sfiligoi (UCSD)
Building Grids with Condor
Condor Glidein: Condor Daemons On-The-Fly
Basic Grid Projects – Condor (Part I)
WMS Options: DIRAC and GlideIN-WMS
Brian Lin OSG Software Team University of Wisconsin - Madison
The Condor JobRouter.
Condor-G Making Condor Grid Enabled
GLOW A Campus Grid within OSG
JRA 1 Progress Report ETICS 2 All-Hands Meeting
Condor-G: An Update.
Presentation transcript:

UCS D OSG Summer School 2011 Overlay systems OSG Summer School An introduction to Overlay systems Also known as Pilot systems by Igor Sfiligoi University of California San Diego

UCS D OSG Summer School 2011 Overlay systems2 Summary of past lessons ● HTC is maximizing CPU use over long periods ● And getting lots of computation done ● DHTC is HTC over many sites ● Grid sites have a CE with an abstract API ● Direct Grid submission requires job partitioning ● Job partitioning is hard

UCS D OSG Summer School 2011 Overlay systems3 Introduction to Overlay systems in the DHTC context

UCS D OSG Summer School 2011 Overlay systems4 Why is job partitioning hard? ● Intermediate queues with unknown policies Condor LSF PBS CE Grid_Submit You would need to know the future to do correct partitioning

UCS D OSG Summer School 2011 Overlay systems5 If only we could have a global scheduler ● No need to explicitly partition Scheduler Not going to happen

UCS D OSG Summer School 2011 Overlay systems6 Why we cannot have a global scheduler ● Existing infrastructure ● Local users, local policies ● Money & politics ● Being able to work when WAN goes down ●...

UCS D OSG Summer School 2011 Overlay systems7 Scheduler What about a subset? ● Can we convince the sites to lease some of the resources to a 3 rd party HTC system? Yes, can be done

UCS D OSG Summer School 2011 Overlay systems8 Scheduler Resource leasing ● The global scheduler owns the leased resources Just like simple HTC

UCS D OSG Summer School 2011 Overlay systems9 How do we lease in the Grid ● Each Grid job is a lease ● So let's submit a HTC system as a Grid job Scheduler CE Submit htc.jdl Sites don't limit what users submit

UCS D OSG Summer School 2011 Overlay systems10 Scheduler CE Scheduler Overlay system ● We effectively create an overlay system ● A HTC system on top of another HTC system ● The “HTC Grid job” is called a Pilot job Resource provisio ning

UCS D OSG Summer School 2011 Overlay systems11 Hiding the D from DHTC ● Just a simple HTC from a user point of view Condor LSF PBS CE ? Scheduler But didn't we just move the problem?

UCS D OSG Summer School 2011 Overlay systems12 Provisioning not as hard ● Main problem in user job partitioning ● All jobs are important! ● User interested in when the last job finishes ● In pilot job “partitioning” ● All jobs are the same ● User interested in the total number of resources provisioned Much easier

UCS D OSG Summer School 2011 Overlay systems13 Pilot systems in real life ● glideinWMS ● Used by several OSG VOs, including CMS ● PANDA ● Used mostly by ATLAS ● DIRAC ● Not used in OSG, used by LHCb Will concentrate on glideinWMS

UCS D OSG Summer School 2011 Overlay systems14 Overlay systems A high level overview of glideinWMS

UCS D OSG Summer School 2011 Overlay systems15 What is glideinWMS ● A pilot system based on Condor ● Condor as a global HTC system ● Additional glideinWMS processes used to create and submit the pilot jobs ● Developed by CMS (as a generalization of CDF work) ● Based on original Condor glidein work ● Home page:

UCS D OSG Summer School 2011 Overlay systems16 Glidein = Condor pilot ● Glidein ● A properly configured condor_startd as a Grid job Scheduler CE Collector Schedd ? Startd I am ready, give me work Matc h

UCS D OSG Summer School 2011 Overlay systems17 Condor LSF PBS CE ? Schedd Glidein pool ● Just like a regular Condor pool for the user Collecto r Startd CE Including condor_sta tus

UCS D OSG Summer School 2011 Overlay systems18 Scheduler CE Collector Schedd I am ready, give me work Matc h glideinWMS ● glideinWMS processes are the ones that actually configure and submit the glideins ● The user does not need to do anything ● Condor-G used under the hood Schedd-G glideinWMS Startd

UCS D OSG Summer School 2011 Overlay systems19 Resource selection ● Users may want to run only on a subset of resources ● i.e. have some requirements ● You don't want to provision resources that user jobs will not use! ● glideinWMS thus does matchmaking

UCS D OSG Summer School 2011 Overlay systems20 glideinWMS matchmaking ● Not as sophisticated as the rest of Condor ● Policy centralized in glideinWMS ● No “requirements” expression in job ClassAd ● On the plus side, very easy on users ● Just add an attribute ● Typical basic setup has +DESIRED_Sites=”...” (startd requirements contain stringListMember(GLIDEIN_Site,DESIRED_Sites)=?=True)

UCS D OSG Summer School 2011 Overlay systems21 Architecture ● Separates glidein submission from matchmaking ● Factory knows about sites and advertises their existence (w/attrs) ● Frontend does the matchmaking and regulates number of glideins ● Can be N:M glidein factory frontend Grid glidein factory Schedd Collector Submit 1 Submit 2 FE

UCS D OSG Summer School 2011 Overlay systems22 PANDA ● High level overview, just for comparison ● Heavily based on Web standards

UCS D OSG Summer School 2011 Overlay systems23 Get your hands dirty ● This is all the theory you need to know for now ● Exercise time ● Feel free to ask question