Published by Angel Marjorie Haynes. Modified over 9 years ago.
1
High-Throughput Computing With Condor
Peter Couvares
Computer Sciences Department
University of Wisconsin-Madison
pfc@cs.wisc.edu
http://www.cs.wisc.edu/~pfc
2
www.cs.wisc.edu/condor Who Are We?
3
The Condor Project (Established ’85)
Distributed systems CS research performed by a team that faces:
› software engineering challenges in a Unix/Linux/NT environment,
› active interaction with users and collaborators,
› daily maintenance and support challenges of a distributed production environment, and
› educating and training students.
Funding: NSF, NASA, DoE, DoD, IBM, INTEL, Microsoft and the UW Graduate School.
4
The Condor System
5
The Condor System
› Unix and NT
› Operational since 1986
› More than 1300 CPUs at UW-Madison
› Available on the web
› More than 150 clusters worldwide in academia and industry
6
What is Condor?
› Condor converts collections of distributively owned workstations and dedicated clusters into a high-throughput computing facility.
› Condor uses matchmaking to make sure that everyone is happy.
7
What is High-Throughput Computing?
› High-performance: CPU cycles/second under ideal circumstances.
  “How fast can I run simulation X on this machine?”
› High-throughput: CPU cycles/day (week, month, year?) under non-ideal circumstances.
  “How many times can I run simulation X in the next month using all available machines?”
8
What is High-Throughput Computing?
› Condor does whatever it takes to run your jobs, even if some machines…
  › Crash! (or are disconnected)
  › Run out of disk space
  › Don’t have your software installed
  › Are frequently needed by others
  › Are far away & admin’ed by someone else
9
What is Matchmaking?
› Condor uses Matchmaking to make sure that work gets done within the constraints of both users and owners.
› Users (jobs) have constraints: “I need an Alpha with 256 MB RAM”
› Owners (machines) have constraints: “Only run jobs when I am away from my desk and never run jobs owned by Bob.”
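The two-sided constraint check on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not Condor's implementation: the names `Job`, `Machine`, `match`, `needs_alpha_256` and `desk_policy`, and the three example machines, are hypothetical. A job is placed only when the user's requirement and the owner's policy both say yes.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Job:
    owner: str
    # User constraint: is this machine acceptable to the job?
    requires: Callable[["Machine"], bool]


@dataclass
class Machine:
    name: str
    arch: str
    memory_mb: int
    owner_away: bool  # True when the owner is away from the desk
    # Owner constraint: may this job run on this machine?
    accepts: Callable[["Machine", Job], bool]


def match(job: Job, machines: List[Machine]) -> Optional[Machine]:
    """Return the first machine where BOTH sides' constraints hold."""
    for m in machines:
        if job.requires(m) and m.accepts(m, job):
            return m
    return None


# The two example constraints quoted on the slide (hypothetical encoding).
def needs_alpha_256(m: Machine) -> bool:
    return m.arch == "ALPHA" and m.memory_mb >= 256


def desk_policy(m: Machine, job: Job) -> bool:
    return m.owner_away and job.owner != "bob"


machines = [
    Machine("m1", "INTEL", 512, True, desk_policy),   # wrong architecture
    Machine("m2", "ALPHA", 256, False, desk_policy),  # owner is at the desk
    Machine("m3", "ALPHA", 512, True, desk_policy),   # satisfies both sides
]

alice_match = match(Job("alice", needs_alpha_256), machines)
bob_match = match(Job("bob", needs_alpha_256), machines)
```

Here Alice's job lands on `m3`, while Bob's job finds no machine at all, because every owner in this toy pool refuses his jobs.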
10
“What can Condor do for me?”
Condor can…
› …do your housekeeping.
› …improve reliability.
› …give performance feedback.
› …increase your throughput!
11
Some Numbers

UW-CS Pool 6/98-6/00         4,000,000 hours   ~450 years

“Real” Users                 1,700,000 hours   ~260 years
  CS-Optimization              610,000 hours
  CS-Architecture              350,000 hours
  Physics                      245,000 hours
  Statistics                    80,000 hours
  Engine Research Center        38,000 hours
  Math                          90,000 hours
  Civil Engineering             27,000 hours
  Business                         970 hours

“External” Users               165,000 hours   ~19 years
  MIT                           76,000 hours
  Cornell                       38,000 hours
  UCSD                          38,000 hours
  CalTech                       18,000 hours
12
Condor & Physics
13
Current CMS Activity
› Simulation (CMSIM) for CalTech
  › provided >135,000 CPU hours to date
  › peak day ~4000 CPU hours
  › via NCSA Alliance, Condor has allocated 1,000,000 hours total to CalTech
› Simulation and Reconstruction (CMSIM + ORCA) for HEP group at UW-Madison
14
INFN Condor Pool - Italy
› Italian National Institute for Research in Nuclear and Subnuclear Physics
› 19 locations, each running a Condor pool
› from as few as 1 CPU to >100 CPUs
› each locally controlled
› each “flocks” jobs to other pools when available
15
Particle Physics Data Grid
› The PPDG Project is... a software engineering effort to design, implement, experiment, evaluate, and prototype HEP-specific data-transfer and caching software tools for Grid environments
› For example...
16
Condor PPDG Work
› Condor Data Manager
  › technology to automate & coordinate data movement from a variety of long-term repositories to available Condor computing resources & back again
  › keeping the pipeline full!
  › SRB (SDSC), SAM (Fermi), PPDG HRM
17
PPDG Collaborators
18
National Grid Efforts
› GriPhyN (Grid Physics Network)
› National Technology Grid - NCSA Alliance (NSF-PACI)
› Information Power Grid - IPG (NASA)
› close collaboration with the Globus project
19
I have 600 simulations to run. How can Condor help me?
20
My Application …
Simulate the behavior of F(x,y,z) for 20 values of x, 10 values of y and 3 values of z (20*10*3 = 600)
› F takes on average 3 hours to compute on a “typical” workstation (total = 1800 hours)
› F requires a “moderate” (128MB) amount of memory
› F performs “moderate” I/O - (x,y,z) is 5 MB and F(x,y,z) is 50 MB
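The sizing on this slide is plain arithmetic, and it is worth working through once. The sketch below only restates the slide's numbers; the totals for input and output data are derived here and are not on the slide, and the variable names are illustrative.

```python
# Parameter sweep size: every combination of x, y and z is one job.
x_values, y_values, z_values = 20, 10, 3
jobs = x_values * y_values * z_values             # 600 jobs

# CPU time if everything runs on one "typical" workstation.
hours_per_job = 3
total_cpu_hours = jobs * hours_per_job            # 1800 hours
days_on_one_machine = total_cpu_hours / 24        # 75.0 days, about 2.5 months

# Data volume: 5 MB in and 50 MB out per job (derived totals).
total_input_mb = jobs * 5                         # 3000 MB
total_output_mb = jobs * 50                       # 30000 MB

print(jobs, total_cpu_hours, days_on_one_machine)
print(total_input_mb, total_output_mb)
```

Run back to back on a single workstation, the sweep takes 75 days of uninterrupted computing and produces roughly 30 GB of output.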
21
Step I - get organized!
› Write a script that creates 600 input files, one for each of the (x,y,z) combinations
› Write a script that will collect the data from the 600 output files
› Turn your workstation into a “Personal Condor”
› Submit a cluster of 600 jobs to your personal Condor
› Go on a long vacation … (2.5 months)
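The first two bullets and the job submission can be sketched as follows. Everything named here is an assumption made for illustration: the file names `in.N`, `out.N` and `err.N`, the executable name `F`, and the helper functions. The input files below hold only the three parameters, not the 5 MB inputs of the real application. The submit description uses common Condor submit commands (`universe`, `executable`, `input`, `output`, `error`, `log`, `queue`, and the `$(Process)` macro), but check it against the manual for your Condor version before relying on it.

```python
import itertools
from pathlib import Path


def write_inputs(xs, ys, zs, workdir):
    """Create one input file per (x, y, z) combination: in.0, in.1, ..."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    count = 0
    for i, (x, y, z) in enumerate(itertools.product(xs, ys, zs)):
        (workdir / f"in.{i}").write_text(f"{x} {y} {z}\n")
        count = i + 1
    return count


def collect_outputs(workdir, n_jobs):
    """Read out.0 .. out.(n_jobs-1); report which jobs have no output yet."""
    workdir = Path(workdir)
    results, missing = {}, []
    for i in range(n_jobs):
        path = workdir / f"out.{i}"
        if path.exists():
            results[i] = path.read_text()
        else:
            missing.append(i)
    return results, missing


def submit_description(n_jobs, executable="F"):
    """Build the text of a submit description for one cluster of n_jobs jobs."""
    return "\n".join([
        "universe   = vanilla",
        f"executable = {executable}",
        "input      = in.$(Process)",   # $(Process) runs from 0 to n_jobs-1
        "output     = out.$(Process)",
        "error      = err.$(Process)",
        f"log        = {executable}.log",
        f"queue {n_jobs}",
    ]) + "\n"
```

Numbering the files by job index keeps the three pieces in step: job N reads `in.N` and writes `out.N`, so the collection script can tell exactly which jobs still have to finish.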
22
[Figure: your workstation, personal Condor, 600 Condor jobs]
23
Step II - build your personal Grid
› Install Condor on the desktop machine next door
› …and on the machines in the classroom.
› Install Condor on the department’s Linux cluster or the O2K in the basement.
› Configure these machines to be part of your Condor pool.
› Go on a shorter vacation...
24
[Figure: your workstation, personal Condor, 600 Condor jobs, Group Condor]
25
Step III - take advantage of your friends
› Get permission from “friendly” Condor pools to access their resources
› Configure your personal Condor to “flock” to these pools
› Reconsider your vacation plans...
26
[Figure: your workstation, personal Condor, 600 Condor jobs, Group Condor, friendly Condor]
27
Think BIG. Go to the Grid.
28
Upgrade to Condor-G
A Grid-enabled version of Condor that uses the inter-domain services of Globus to bring Grid resources into the domain of your Personal Condor
› Easy to use on different platforms
› Robust
› Supports SMPs & dedicated schedulers
29
Step IV - Go for the Grid
› Get access (account(s) + certificate(s)) to a “Computational” Grid
› Submit 599 “Grid Universe” Condor-glide-in jobs to your personal Condor
› Take the rest of the afternoon off...
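The shrinking vacations in Steps I to IV follow from dividing the same 600 three-hour jobs across more machines. The sketch below is an idealized estimate: it assumes every machine is as fast as the “typical” workstation and stays available for the whole run, which real pools do not guarantee. The function name `wall_clock_hours` and the pool size of 100 are hypothetical.

```python
import math


def wall_clock_hours(n_jobs, hours_per_job, n_machines):
    """Idealized elapsed time: jobs run in rounds, one job per machine."""
    rounds = math.ceil(n_jobs / n_machines)
    return rounds * hours_per_job


# Step I: the personal Condor on one workstation.
print(wall_clock_hours(600, 3, 1))     # 1800 hours, i.e. 75 days

# A hypothetical pool of 100 machines (Steps II and III).
print(wall_clock_hours(600, 3, 100))   # 18 hours

# Step IV: the workstation plus 599 glide-ins gives one machine per job.
print(wall_clock_hours(600, 3, 600))   # 3 hours
```

With one machine per job, the whole sweep finishes in about the time of a single run of F, which is where "the rest of the afternoon" comes from.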
30
[Figure: your workstation, personal Condor, 600 Condor jobs, Group Condor, friendly Condor, 599 glide-ins, Globus Grid (PBS, LSF, Condor)]
31
What Have We Done with the Grid Already?
› NUG30
  › quadratic assignment problem
  › 30 facilities, 30 locations
  › minimize cost of transferring materials between them
  › posed in 1968 as challenge, long unsolved
  › but with a good pruning algorithm & high-throughput computing...
32
NUG30 Personal Condor Grid
For the run we will be flocking to
› the main Condor pool at Wisconsin (600 processors)
› the Condor pool at Georgia Tech (190 Linux boxes)
› the Condor pool at UNM (40 processors)
› the Condor pool at Columbia (16 processors)
› the Condor pool at Northwestern (12 processors)
› the Condor pool at NCSA (65 processors)
› the Condor pool at INFN (200 processors)
We will be using glide_in to access the Origin 2000 (through LSF) at NCSA.
We will use "hobble_in" to access the Chiba City Linux cluster and Origin 2000 here at Argonne.
33
NUG30 - Solved!!!
Sender: goux@dantec.ece.nwu.edu
Subject: Re: Let the festivities begin.

Hi dear Condor Team,
you all have been amazing. NUG30 required 10.9 years of Condor Time. In just seven days!
More stats tomorrow!!! We are off celebrating!
condor rules!
cheers,
JP.
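The two figures in the message give a rough sense of scale. As a back-of-the-envelope estimate only, assuming a 365-day year and that "Condor Time" means accumulated CPU time, the average number of processors busy at once works out as follows. The result is derived here and was not reported in the message.

```python
cpu_years = 10.9      # "10.9 years of Condor Time"
wall_days = 7         # "In just seven days"

cpu_days = cpu_years * 365
avg_busy_cpus = cpu_days / wall_days

print(round(avg_busy_cpus))   # roughly 568 processors busy on average
```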
34
Conclusion
Computing power is everywhere; we try to make it usable by anyone.
35
Need more info?
› Condor Web Page (http://www.cs.wisc.edu/condor)
› Peter Couvares (pfc@cs.wisc.edu)