How I learned to Stop Worrying and Love Preemption

Slides:



Advertisements
Similar presentations
Greg Thain Computer Sciences Department University of Wisconsin-Madison Condor Parallel Universe.
Advertisements

Building Campus HTC Sharing Infrastructures Derek Weitzel University of Nebraska – Lincoln (Open Science Grid Hat)
Condor Project Computer Sciences Department University of Wisconsin-Madison Condor's Use of the Cisco Unified Computing System.
Making HTCondor Energy Efficient by identifying miscreant jobs Stephen McGough, Matthew Forshaw & Clive Gerrard Newcastle University Stuart Wheater Arjuna.
Condor and GridShell How to Execute 1 Million Jobs on the Teragrid Jeffrey P. Gardner - PSC Edward Walker - TACC Miron Livney - U. Wisconsin Todd Tannenbaum.
Chapter 1 CSF 2009 Computer Performance. Defining Performance Which airplane has the best performance? Chapter 1 — Computer Abstractions and Technology.
High Throughput Urgent Computing Jason Cope Condor Week 2008.
Lecture 3: Computer Performance
Simulating Condor Stephen McGough, Clive Gerrard & Jonathan Noble Newcastle University Paul Robinson, Stuart Wheater Arjuna Technologies Limited Condor.
SCD FIFE Workshop - GlideinWMS Overview GlideinWMS Overview FIFE Workshop (June 04, 2013) - Parag Mhashilkar Why GlideinWMS? GlideinWMS Architecture Summary.
The material in this presentation is the property of Fair Isaac Corporation. This material has been provided for the recipient only, and shall not be used,
1 Integrating GPUs into Condor Timothy Blattner Marquette University Milwaukee, WI April 22, 2009.
Large, Fast, and Out of Control: Tuning Condor for Film Production Jason A. Stowe Software Engineer Lead - Condor CORE Feature Animation.
Ian Alderman A Little History…
1 Evolution of OSG to support virtualization and multi-core applications (Perspective of a Condor Guy) Dan Bradley University of Wisconsin Workshop on.
Sep 21, 20101/14 LSST Simulations on OSG Sep 21, 2010 Gabriele Garzoglio for the OSG Task Force on LSST Computing Division, Fermilab Overview OSG Engagement.
Building a Real Workflow Thursday morning, 9:00 am Greg Thain University of Wisconsin - Madison.
June 10, D0 Use of OSG D0 relies on OSG for a significant throughput of Monte Carlo simulation jobs, will use it if there is another reprocessing.
Purdue Campus Grid Preston Smith Condor Week 2006 April 24, 2006.
OSG Production Report OSG Area Coordinator’s Meeting Aug 12, 2010 Dan Fraser.
Pre-GDB 2014 Infrastructure Analysis Christian Nieke – IT-DSS Pre-GDB 2014: Christian Nieke1.
Review of Condor,SGE,LSF,PBS
The impacts of climate change on global hydrology and water resources Simon Gosling and Nigel Arnell, Walker Institute for Climate System Research, University.
Condor Week 2004 The use of Condor at the CDF Analysis Farm Presented by Sfiligoi Igor on behalf of the CAF group.
GLIDEINWMS - PARAG MHASHILKAR Department Meeting, August 07, 2013.
ITFN 2601 Introduction to Operating Systems Lecture 4 Scheduling.
Mar 27, gLExec Accounting Solutions in OSG Gabriele Garzoglio gLExec Accounting Solutions in OSG Mar 27, 2008 Middleware Security Group Meeting Igor.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks APEL CPU Accounting in the EGEE/WLCG infrastructure.
Landing in the Right Nest: New Negotiation Features for Enterprise Environments Jason Stowe.
Time Management.  Time management is concerned with OS facilities and services which measure real time.  These services include:  Keeping track of.
How High Throughput was my cluster? Greg Thain Center for High Throughput Computing.
LHCbDirac and Core Software. LHCbDirac and Core SW Core Software workshop, PhC2 Running Gaudi Applications on the Grid m Application deployment o CVMFS.
John Kewley e-Science Centre CCLRC Daresbury Laboratory 15 th March 2005 Paradyn / Condor Week Madison, WI Caging the CCLRC Compute Zoo (Activities at.
COMPUTER ARCHITECTURE & OPERATIONS I Instructor: Yaohang Li.
Computer Model Simulation STEPHEN GOWDY FNAL 30th March 2015 Computing Model Simulation 1.
CPU Scheduling CS Introduction to Operating Systems.
Purdue RP Highlights TeraGrid Round Table November 5, 2009 Carol Song Purdue TeraGrid RP PI Rosen Center for Advanced Computing Purdue University.
Building on virtualization capabilities for ExTENCI Carol Song and Preston Smith Rosen Center for Advanced Computing Purdue University ExTENCI Kickoff.
Creating Grid Resources for Undergraduate Coursework John N. Huffman Brown University Richard Repasky Indiana University Joseph Rinkovsky Indiana University.
Job Priorities and Resource sharing in CMS A. Sciabà ECGI meeting on job priorities 15 May 2006.
How to get the needed computing Tuesday afternoon, 1:30pm Igor Sfiligoi Leader of the OSG Glidein Factory Operations University of California San Diego.
CHTC Policy and Configuration
Condor A New PACI Partner Opportunity Miron Livny
Computer Architecture & Operations I
Connecting LRMS to GRMS
Examples Example: UW-Madison CHTC Example: Global CMS Pool
Computer Architecture & Operations I
High Availability in HTCondor
Moving from CREAM CE to ARC CE
ATLAS Sites Jamboree, CERN January, 2017
Accounting in HTCondor
Chapter 2.2 : Process Scheduling
Process Scheduling B.Ramamurthy 9/16/2018.
Accounting, Group Quotas, and User Priorities
Haiyan Meng and Douglas Thain
Lecture 21: Introduction to Process Scheduling
كيــف تكتـب خطـة بحـث سيئـة ؟؟
الدكتـور/ عبدالناصـر محمـد عبدالحميـد
Welcome to HTCondor Week #18 (year 33 of our project)
Basic Grid Projects – Condor (Part I)
What’s Different About Overlay Systems?
Chapter 9 Uniprocessor Scheduling
Engaging Volunteers Online Scott Druery
Engaging Volunteers Online Luke Mansfield
Lecture 21: Introduction to Process Scheduling
Layered Queueing Network Modeling of Software Systems Murray Woodside 5201 Canal Building Building Security System (buffering)
Grid Laboratory Of Wisconsin (GLOW)
GLOW A Campus Grid within OSG
Welcome to (HT)Condor Week #19 (year 34 of our project)
PU. Setting up parallel universe in your pool and when (not
Presentation transcript:

How I learned to Stop Worrying and Love Preemption

Best thing about Condor? The Community Community as Metaphor

Two Questions How much more room in OSG? How long can jobs (effectively) run?

Not an OSG talk

3 method for performance analysis Modeling Difficult in opportunistic pools Simulation Measurement

Measurement Simply submit a ton of jobs All measurement via user log

User log trick: +TheSite="$$(GLIDEIN_Site)" job_ad_information_attrs = TheSite

028 (4.1.0) 02/17 18:38:03 Job ad information event triggered. TriggerEventTypeNumber = 1 EventTypeNumber = 28 TriggerEventTypeName = "ULOG_EXECUTE" Proc = 87 Subproc = 0 TheSite = "Purdue" CurrentTime = time() MyType = "ExecuteEvent"

76,161 total cpu-hours

Wall clock slowdown

How to fix? Mix and match Opportunistic for throughput Local resources for tail chopping

Summary Opportunistic cycles are still very useful Fairness and Throughput can be duals Opportunistic + dedicated very powerful