Condor Overview Bill Hoagland. Condor Workload management system for compute-intensive jobs Harnesses collection of dedicated or non-dedicated hardware.

Presentation transcript:

Condor Overview Bill Hoagland

Condor Workload management system for compute-intensive jobs Harnesses collection of dedicated or non-dedicated hardware under distributed ownership

Condor History Developed by the University of Wisconsin-Madison Computer Sciences Department First put into production use 15 years ago –Mature and stable

Condor Availability Freely available under a BSD-style license Not open source; code is not distributed publicly

Supported Systems Solaris 8, 9, & 10 (SPARC) Red Hat & Fedora Core (x86) MS Windows 2000, XP & 2003 Server (x86) Mac OS X 10.3 & 10.4 (PPC) Other Unixes (SuSE, AIX, HP-UX, Yellow Dog, Debian)

Condor Design Originally developed for “cycle stealing” from idle machines Retains robustness to failures and changing availability from this legacy

Condor Goal “High throughput” vs “High performance” –High performance - fast machines (e.g. Cray) –High throughput - many machines, fault tolerant infrastructure (e.g.

Condor Components Job queueing Scheduling policy Priority mechanism Resource monitoring Resource management

Condor Highlights Checkpointing –Checkpointing saves complete running process and I/O state to disk

Checkpointing Allows recovery from failures –Roll back to the last saved state Allows process migration –Move saved state and restart
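The recovery idea on this slide can be illustrated with a minimal application-level sketch. This is not Condor's mechanism (Condor checkpoints the whole process image through its relinked library); the function names, the pickle file format, and the `fail_at` parameter are all assumptions made for illustration. The sketch only shows the pattern: save state periodically, and after a failure resume from the last saved state, possibly on another machine that can read the checkpoint file.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temp file, then rename it over the old checkpoint,
    so a crash during the write never leaves a half-written checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as handle:
        pickle.dump(state, handle)
    os.replace(tmp_path, path)


def load_checkpoint(path, default):
    """Return the last saved state, or the default if no checkpoint exists."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as handle:
        return pickle.load(handle)


def run_sum(path, limit, checkpoint_every=100, fail_at=None):
    """Sum the integers below limit, checkpointing every few steps.

    fail_at is a hypothetical knob that simulates a crash at a given step.
    """
    state = load_checkpoint(path, {"next": 0, "total": 0})
    while state["next"] < limit:
        if fail_at is not None and state["next"] == fail_at:
            raise RuntimeError("simulated failure")
        state["total"] += state["next"]
        state["next"] += 1
        if state["next"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["total"]
```

Calling `run_sum` again with the same checkpoint path after a failure rolls back to the last saved step instead of starting from zero; copying the checkpoint file to another machine and calling it there is the analogue of migration.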

Checkpointing continued Can compress checkpoint images Checkpoint mechanism can be used outside of Condor

Checkpointing continued Some limitations –Single process space –Single kernel thread –Cannot save state of file open for both read and write Not supported on all platforms

Checkpointing continued Must have object files Usually requires no changes Relink code to include condor library layer, e.g. $ condor_compile gcc -o foo foo.c

Condor Highlights Remote system calls –Preserves user environment on remote machine –Users need not make files available or have access to remote machine

Condor Highlights Pools of Machines can be Hooked Together –Jobs submitted to one pool can migrate to a second –Subject to the policies of each pool's owner

Condor Highlights Jobs can be Ordered –Jobs can easily be ordered according to their dependencies –Dependencies are described in a directed acyclic graph
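As a rough illustration of ordering jobs from a directed acyclic graph, the sketch below uses Python's standard `graphlib` module rather than any Condor tool. The job names and the `run_order` helper are hypothetical; the point is only that a dependency graph yields batches of jobs that are safe to run together.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical job names; each key maps to the set of jobs it depends on.
dependencies = {
    "prepare": set(),
    "simulate_a": {"prepare"},
    "simulate_b": {"prepare"},
    "collect": {"simulate_a", "simulate_b"},
}


def run_order(deps):
    """Return batches of jobs; every job in a batch has all dependencies done."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose dependencies finished
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to release dependents
    return batches


batches = run_order(dependencies)
```

Here the two simulation jobs land in the same batch because neither depends on the other, which is where a scheduler could run them in parallel.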

Condor Highlights Condor Enables Grid Computing –Condor has been designed with grid support hooks –Globus-controlled resources

Condor Highlights Sensitive to the Desires of Machine Owners –Machine owners may set almost any usage policy

Condor Highlights Powerful priority policy mechanism –Requirements and preferences are associated with jobs and machines –A negotiation process matches job requirements then ranks on preferences
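The match-then-rank idea can be sketched in a few lines. This is a simplified illustration, not Condor's negotiator or its ClassAd language: the dictionary layout, the attribute names, and the `match` function are assumptions, and machine-side preferences are left out for brevity. Requirements are checked in both directions first, and only then is the job's preference used to pick among the survivors.

```python
def match(job, machines):
    """Return the name of the best acceptable machine for the job, or None."""
    # Step 1: keep machines where the job accepts the machine
    # and the machine's owner policy accepts the job.
    candidates = [
        m for m in machines
        if job["requirements"](m) and m["requirements"](job)
    ]
    if not candidates:
        return None
    # Step 2: rank the remaining machines by the job's preference.
    best = max(candidates, key=job["rank"])
    return best["name"]


# Hypothetical machines; each carries its owner's policy as a predicate.
machines = [
    {"name": "lab1", "os": "LINUX", "memory_mb": 512,
     "requirements": lambda job: True},
    {"name": "lab2", "os": "LINUX", "memory_mb": 2048,
     "requirements": lambda job: job["owner"] != "guest"},
    {"name": "desk1", "os": "WINDOWS", "memory_mb": 4096,
     "requirements": lambda job: True},
]


def make_job(owner):
    """Build a hypothetical job that needs Linux and prefers more memory."""
    return {
        "owner": owner,
        "requirements": lambda m: m["os"] == "LINUX" and m["memory_mb"] >= 256,
        "rank": lambda m: m["memory_mb"],
    }
```

With these definitions, `match(make_job("alice"), machines)` picks the larger Linux machine, while a job owned by `guest` falls back to the smaller one because the larger machine's owner policy rejects it.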

Condor Security Condor's purpose is to allow users to run arbitrary code on large numbers of machines Assumes users are trustworthy

Condor Security continued Cannot protect against users who can elevate their privileges Does not run user jobs in sandboxes

Condor Security continued Can prevent unauthorized access to Condor Optional authentication e.g. Kerberos, Grid Security Infrastructure (GSI), others

Condor Security continued Can ensure that user data has not been examined or tampered with Optional encryption and integrity checking of all network traffic

Condor Backfill When a machine is completely idle… –Configure default job –Support for BOINC

Condor Configuration Controlled by hierarchical config files –Well commented –Human readable –In some cases, clearer than the manual
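One way to picture hierarchical configuration is as layers where a more specific file overrides a more general one. The sketch below is a generic illustration in Python, not Condor's config parser: the setting names, the values, and the simple `KEY = value` format handled by `parse_settings` are assumptions.

```python
from collections import ChainMap


def parse_settings(text):
    """Parse simple 'KEY = value' lines, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


# Illustrative pool-wide defaults (hypothetical names and values).
GLOBAL_TEXT = """
# pool-wide defaults
POOL_NAME = example-pool
MAX_JOBS = 10
"""

# Illustrative per-machine overrides.
LOCAL_TEXT = """
# this machine runs fewer jobs
MAX_JOBS = 2
"""

# Lookups try the local layer first, then fall back to the global layer.
config = ChainMap(parse_settings(LOCAL_TEXT), parse_settings(GLOBAL_TEXT))
```

Looking up `config["MAX_JOBS"]` returns the local override, while `config["POOL_NAME"]` falls through to the pool-wide default.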

Condor Administration CondorView –Web based statistics –Machine and user data

Condor Website