Condor Project, Computer Sciences Department, University of Wisconsin-Madison. Condor: A Project and a System. Scientific Data Intensive Computing Workshop '04, Microsoft Research, May 2004

2 Outline
What is the Condor Project?
What is the Condor HTC Software?
Recipe for using desktops for science
Data!

3 The Condor Project (Established 1985)
Distributed High Throughput Computing research performed by a team of ~35 faculty, full-time staff, and students.

4 The Condor Project (Established 1985)
Distributed High Throughput Computing research performed by a team of ~35 faculty, full-time staff, and students who:
face software engineering challenges in a heterogeneous distributed environment
are involved in national and international grid collaborations
actively interact with academic and commercial users
maintain and support large distributed production environments
educate and train students
Funding: US Govt. (DoD, DoE, NASA, NSF, NIH), AT&T, IBM, INTEL, Microsoft, UW-Madison, …

5 A Multifaceted Project
Harnessing the power of clusters, opportunistic and/or dedicated (Condor)
Job management services for Grid applications (Condor-G, Stork)
Fabric management services for Grid resources (Condor, GlideIns, NeST)
Distributed I/O technology (Parrot, Kangaroo, NeST)
Job-flow management (DAGMan, Condor, Hawk)
Distributed monitoring and management (HawkEye)
Technology for Distributed Systems (ClassAd, MW)
Packaging and Integration (NMI, VDT)

6 Outline
What is the Condor Project?
What is the Condor HTC Software?
Recipe for using desktops for science
Data!

7 What is Condor?
Condor converts collections of distributively owned workstations and dedicated clusters into a distributed, fault-tolerant, high-throughput computing (HTC) facility.
Distributed Ownership: the decrease in the cost-performance ratio caused
a huge increase in an organization's aggregate computing capacity, but
a much smaller increase in the capacity accessible by a single person.
HTC: large amounts of processing capacity sustained over very long time periods.

8 Condor can manage a large number of jobs
You specify the jobs in a file and submit them to Condor, which runs them all and keeps you notified of their progress
Mechanisms to help you manage huge numbers of jobs (1000s), the data, etc.
Condor can handle workflow / inter-job dependencies (DAGMan)
Condor users can set job priorities
Condor administrators can set user priorities
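The "specify the jobs in a file" step uses a submit description file. A minimal sketch might look like the following (the file and executable names are hypothetical; the keywords are standard condor_submit syntax, and $(Process) expands to the per-job index):

```text
# sketch of a condor_submit description file (names are hypothetical)
universe   = vanilla
executable = my_sim
arguments  = -seed $(Process)
output     = out.$(Process)
error      = err.$(Process)
log        = sim.log
queue 1000
```

Submitting this one file queues 1000 jobs, each with its own output and error files, all tracked in a single user log.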

9 Condor can manage dedicated resources…
Dedicated resources: compute clusters
Manages node monitoring and scheduling
Handles job launch, monitoring & cleanup

10 …and Condor can manage non-dedicated resources
Examples of non-dedicated resources: desktop workstations in offices, workstations in student labs
Non-dedicated resources are often idle --- ~70% of the time!
Condor can effectively harness the otherwise wasted compute cycles from non-dedicated resources

11 Some HTC Challenges
Condor does whatever it takes to run your jobs, even if some machines…
Crash (or are disconnected)
Run out of disk space
Don't have your software installed
Are frequently needed by others
Are far away & managed by someone else

12 The Condor System
Unix and Win2k/XP
Operational since 1986
Just at UW: more than 1,800 CPUs in 10 pools on our campus
Software available free on the web
Open license
Adopted by the real world (Galileo, Maxtor, Micron, Oracle, Tigr, Xerox, NASA, Texas Instruments, …)

13 Downloads and Deployments

14

15 Outline
What is the Condor Project?
What is the Condor HTC Software?
Recipe for using desktops for science
Data!

16 Recipe Tip: Useful Distributed Ownership Mechanisms in Condor
Checkpoint / Migration: a checkpoint is a picture of process state; enables preempt/resume scheduling and migration, and ensures forward progress
Remote System Calls: redirect I/O and other system calls back to the submit machine
Matchmaking with ClassAds
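The checkpoint/resume idea can be illustrated with a minimal user-level sketch (this is not Condor's actual mechanism, which transparently snapshots the whole process image; here the application saves just enough state to resume after preemption):

```python
import json
import os

CKPT = "checkpoint.json"  # hypothetical checkpoint file name

def run(total_steps):
    """Accumulate sum(0..total_steps-1), checkpointing every 100 steps."""
    # Resume from a prior checkpoint if one exists, else start fresh.
    if os.path.exists(CKPT):
        with open(CKPT) as f:
            state = json.load(f)
    else:
        state = {"step": 0, "acc": 0}

    while state["step"] < total_steps:
        state["acc"] += state["step"]   # the "work" being done
        state["step"] += 1
        if state["step"] % 100 == 0:    # periodic checkpoint to disk
            with open(CKPT, "w") as f:
                json.dump(state, f)
    return state["acc"]
```

If the process is preempted and restarted (possibly on another machine, given shared or shipped files), it picks up from the last checkpoint instead of from scratch, which is what guarantees forward progress.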

17 ClassAds
Set of bindings of attribute names to expressions
Self-describing (no separate schema)
Combine query and data
Arbitrarily composed and nested
Bilateral
Resource owners are generous if it doesn't cost them anything!

18 Examples
[
  Type = "Job";
  Owner = "raman";
  Cmd = "run_sim";
  Args = "-Q ";
  Cwd = "/u/raman";
  Memory = 31;
  Qdate = …;
  ...
  Rank = other.Kflops …
  Requirements = other.Type == …
]
[
  Type = "Machine";
  Name = "xxy.cs....";
  Arch = "iX86";
  OpSys = "Solaris";
  Mips = 104;
  Kflops = 21893;
  State = "Unclaimed";
  LoadAvg = …;
  ...
  Rank = …;
  Requirements = …;
]

19 Attribute Expressions
Constants: 104, …, "iX86"
References: attr, self.attr, other.attr, expr.attr
Operators: +, *, >>, =, &&, ...
Functions: strcat, substr, floor, member, ...
Lists: { expr, expr, ... }
ClassAds: [ name=expr; name=expr; ... ]

20 Examples: Descriptive attributes
Type = "Job";
Owner = "raman";
Arch = "iX86";
OpSys = "Solaris";
Memory = 64;   // megabytes
Disk = …;      // k bytes

21 Examples: Current state
Daytime = 36017;      // secs past midnight
KeyboardIdle = 1432;  // seconds
State = "Unclaimed";
LoadAvg = …;

22 Examples: Parameters
ResearchGrp = { "raman", "miron", "solomon", "jbasney" };
Friends = { "tannenba", "wright" };
Untrusted = { "rival", "riffraff" };
WantCheckpoint = 1;

23 Examples: Derived data
Rank =   // machine's rank for job
  10 * member(other.Owner, ResearchGrp) + member(other.Owner, Friends);
Rank =   // job's rank for machine
  Kflops/1E3 + other.Memory/32;

24 Examples: Job constraint
Requirements = other.Type == "Machine" && Arch == "iX86" &&
  OpSys == "Solaris" && Disk > … && other.Memory >= self.Memory;

25 Examples: Machine constraint
Requirements = !member(other.Owner, Untrusted) &&
  (Rank >= 10 ? true :
   Rank > 0 ? (LoadAvg < 0.3 && KeyboardIdle > 15*60) :
   DayTime < 6*60*60 || DayTime > 18*60*60);

26 Matching Algorithm
To match two ads A and B:
Set up an environment such that, in A, self evaluates to A, other evaluates to B, and unqualified attributes are searched for first in A and then in B (and vice versa, with A and B interchanged)
Check that A.Requirements and B.Requirements both evaluate to true
Use A.Rank and B.Rank for preferences
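A toy version of this bilateral check can be sketched with ads as dicts and Requirements/Rank as functions of (self, other). This is a simplification of real ClassAd evaluation (no expression language, no attribute-scoping rules); attribute names follow the slides:

```python
def matches(a, b):
    """True if each ad's Requirements accepts the other ad."""
    return a["Requirements"](a, b) and b["Requirements"](b, a)

# A job ad: wants a machine with at least as much memory as it needs,
# and prefers faster machines (higher Kflops).
job = {
    "Type": "Job", "Memory": 31,
    "Requirements": lambda self, other:
        other["Type"] == "Machine" and other["Memory"] >= self["Memory"],
    "Rank": lambda self, other: other["Kflops"],
}

# A machine ad: accepts any job (a real ad would constrain owners, load, etc.).
machine = {
    "Type": "Machine", "Memory": 64, "Kflops": 21893,
    "Requirements": lambda self, other: other["Type"] == "Job",
    "Rank": lambda self, other: 0,
}
```

Among mutually acceptable pairs, the matchmaker would then prefer the candidates for which each side's Rank expression evaluates highest.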

27 Three-valued Logic
other.Memory > 32, other.Memory == 32, other.Memory != 32, !(other.Memory == 32): all UNDEFINED if other has no "Memory" attribute
other.Mips >= 10 || other.Kflops >= 1000: TRUE if either attribute exists and satisfies the given condition
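The UNDEFINED semantics above can be sketched in Python with a sentinel that strict operators propagate but || can absorb (a simplification of full ClassAd semantics, which also include an ERROR value):

```python
UNDEFINED = object()  # sentinel standing in for a missing attribute

def attr(ad, name):
    # Attribute lookup: missing attributes evaluate to UNDEFINED.
    return ad.get(name, UNDEFINED)

def ge(x, y):
    # Strict operator: any UNDEFINED operand makes the result UNDEFINED.
    if x is UNDEFINED or y is UNDEFINED:
        return UNDEFINED
    return x >= y

def or_(x, y):
    # Non-strict ||: TRUE if either side is TRUE, even if the other is UNDEFINED.
    if x is True or y is True:
        return True
    if x is UNDEFINED or y is UNDEFINED:
        return UNDEFINED
    return x or y

machine = {"Mips": 104}  # ad with no Kflops attribute

# other.Mips >= 10 || other.Kflops >= 1000  evaluates to TRUE
result = or_(ge(attr(machine, "Mips"), 10), ge(attr(machine, "Kflops"), 1000))
```

The non-strict || is what lets heterogeneous ads match: a machine advertising only Mips can still satisfy a requirement written in terms of either Mips or Kflops.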

28 Recipe Tip: Build from the Bottom Up!
Start with a service for a single user, on a single machine: a Personal Condor
Condor on your own workstation: no local system/root access required, no system administrator intervention needed

29 [Diagram: a personal Condor on your workstation managing 600 Condor jobs]

30 Personal Condor?! What's the benefit of a Condor pool with just one user and one machine?

31 Your Personal Condor will...
… keep an eye on your jobs and keep you posted on their progress
… implement your policy on the execution order of the jobs
… keep a log of your job activities
… add fault tolerance to your jobs
… implement your policy on when the jobs can run on your workstation

32 Expand from your desktop…
Build a Condor pool inside your organization: install Condor on multiple machines, pointing them to your initial machine as the manager
Utilize Condor resources at remote organizations (build a grid): take advantage of your Condor-using friends
Get permission to access their resources, then configure your Condor pool to flock to those pools
The accounting system is flocking-aware
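Flocking is turned on in the pool configuration. A minimal sketch might be (FLOCK_TO and FLOCK_FROM are standard condor_config macros; the host names are hypothetical):

```text
# condor_config fragment on the submitting side (host names hypothetical)
FLOCK_TO = central-manager.friendly-pool.example.edu

# ...and on the friendly pool, to accept those submitters:
FLOCK_FROM = submit.our-pool.example.edu
```

With this in place, jobs that cannot be matched locally are advertised to the friendly pool's matchmaker as well.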

33 [Diagram: your personal Condor with 600 jobs on your workstation, flocking from your Condor pool to a friendly Condor pool]

34 Condor-G
What about resources at remote organizations that are NOT managed via Condor? (perhaps they are managed via PBS, SGE, LSF, …)
Condor-G: a job task-broker for Grid middleware
Submit jobs to resources managed via grid middleware such as Globus (GT2 & GT3), NorduGrid, Unicore, or Oracle (or Condor)
Oracle: run PL/SQL programs on Oracle just like a normal job, via transactions, put in DAGs, etc.

35 Condor GlideIn
Problems: What if the grid middleware or remote scheduler doesn't provide services I want? What about end-to-end semantic guarantees?
Solution: submit the Condor daemons to the remote schedulers instead of the job
When the resources run these GlideIn jobs, they temporarily join your Condor pool and run the job as usual

36 [Diagram: your personal Condor with 600 jobs, a friendly Condor pool, and glide-in jobs extending the pool through a Globus grid onto PBS, LSF, and Condor resources]

37 Outline
What is the Condor Project?
What is the Condor HTC Software?
Recipe for using desktops for science
Data! Harmonize computation with data storage and data movement.

38 Data Movement: Stork
A scheduler for wide-area data transfer
Condor historically focused on CPU allocation; data movement was an implicit side-effect
Stork elevates data movement to be a first-class citizen:
Data movement is another type of node within a job dependency graph
Data movement is now queued, scheduled, monitored, managed, and checkpointed
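In a DAGMan input file, Stork transfers appear as first-class nodes alongside compute jobs. A sketch might look like the following (file names are hypothetical; DATA was DAGMan's keyword for Stork-managed nodes, while JOB names ordinary Condor jobs):

```text
# hypothetical DAG mixing computation and data placement
DATA   stage_in   transfer_in.stork     # fetch input with Stork
JOB    crunch     analysis.submit       # run the computation with Condor
DATA   stage_out  transfer_out.stork    # ship results back with Stork
PARENT stage_in  CHILD crunch
PARENT crunch    CHILD stage_out
```

DAGMan then queues, retries, and orders the transfers exactly as it does the compute jobs, which is what "first-class citizen" means in practice.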

39 Data Access: Parrot
Useful in distributed batch systems where one has access to many CPUs but no consistent distributed filesystem (BYOFS!)
Works with legacy programs:
% gv /gsiftp/
% grep Yahoo /http/

40 Data Storage: NeST
Storage management software; a complementary piece of Condor software that adds storage management to the traditional CPU management
Key features:
User level
Guaranteed storage reservations that allow higher-level scheduling and planning (e.g., Stork)
Flexible, extensible protocol layer allows easy integration with existing middleware and applications
Easily deployable via glide-in

41 Gliding in Storage Management
Practical and easily deployable: user-level, requires no privilege
Package NeST as standard batch jobs
Result: managed storage
General: glide-in works everywhere
[Diagram: NeST glided in over the Internet onto an SGE cluster, alongside a home storage server]

42 BirdBath: SOAP Interfaces to Condor Services
LBNL: workflow, ZSI (soon: LIGO, the Laser Interferometer Gravitational-Wave Observatory)
IU: portals
UK: University College London | Cambridge: .NET

43 The Idea
Computing power is everywhere; we try to make it usable by anyone.

44 Thank you! Condor Project on the Web: