Condor Parallel Universe
Greg Thain
Computer Sciences Department
University of Wisconsin-Madison

Overview
› Task vs. Job Parallelism
› New Condor support for Task-Parallelism
› Other goodies

The Talk in One Slide
Parallel Universe can run any* task-parallel job:
› Not just MPICH
› Not just MPI…

Job vs. Task Parallelism
› Condor has historically focused on job parallelism
› Job parallelism is handled either manually or via DAGMan
› The rest of this talk is about task parallelism
› Task parallelism is also available via PVM or MW

Parallel Universe
› Adaptation of the MPI universe
› Modifications based on experience with MPI
› User feedback
› But more than just MPI

MPI Lifecycle without Condor
› LAM version:
  1. lamboot:  lamboot -ssi boot ssh machine_file
  2. mpirun:   mpirun -np 8 exe arg1 arg2
  3. lamhalt:  lamhalt
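
Run by hand, the same lifecycle looks roughly like the sketch below (a sketch only: machine_file, exe, and the arguments are placeholders, and the exact flags vary between LAM/MPI versions):

    #!/bin/sh
    # 1. Boot the LAM runtime on every host listed in machine_file
    #    (boot flags mirror the slide; see lamboot(1) for your LAM version).
    lamboot -ssi boot ssh machine_file

    # 2. Run the MPI program on 8 processes; remaining arguments
    #    are passed through to the program itself.
    mpirun -np 8 exe arg1 arg2

    # 3. Tear the LAM runtime back down once the run finishes.
    lamhalt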

Scheduling
› Need the "Dedicated Scheduler"
  • "Dedicated" has a specific Condor meaning
  • Nodes running MPI require a dedicated scheduler
  • A given machine can have many opportunistic schedulers...
  • ...but only one dedicated scheduler
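
Execute machines have to be told which schedd they are dedicated to before the dedicated scheduler can claim them. A minimal sketch of the execute-node configuration, assuming a submit host named submit.example.com (the attribute names follow the dedicated-resource example in the Condor manual; older Condor versions spell STARTD_ATTRS as STARTD_EXPRS):

    ## Which schedd may run dedicated (parallel) jobs on this node
    DedicatedScheduler = "DedicatedScheduler@submit.example.com"
    STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler

    ## Prefer jobs from the dedicated scheduler, and accept work
    RANK  = Scheduler =?= $(DedicatedScheduler)
    START = True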

DedicatedScheduler Surprises
› The DedicatedScheduler co-opts the normal negotiation cycle
› Preemption and scheduling work differently than in the opportunistic case
› The DedicatedScheduler schedules First-Fit, sorted by UserJobPrio
› ...so condor_q -analyze can be a mystery!

Job Startup
› Same file transfer, etc. as Vanilla
› One shadow, many starters
› Starter runs sshd on all machines, does key exchange
› Starter runs the exe on the first machine
  • (head node, Rank 0)

Your Script Here
› The script on the head node has the contact file
› We provide samples for LAM and MPICH
› We try to mimic "by hand" startup
› Use condor_ssh to start remote jobs
› When the script exits, Condor cleans up

Parallel Example
[Diagram: the submit machine runs the schedd and one shadow; each execute machine runs a startd and a starter, which start sshd on every node and the script/job on the head node.]

Example submit file

Universe = Parallel
# executable is a script
executable = script
# the real binary
transfer_input_files = executable
arguments = arg1 arg2 arg3
machine_count = 8
output = out.$(Cluster).$(NODE)
queue
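
Submission itself works the same as for any other universe; assuming the file above is saved as parallel.sub (a placeholder name), something like:

    condor_submit parallel.sub
    condor_q

submits a single parallel job that stays idle until the dedicated scheduler can claim all eight nodes at once.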

Example Script

chmod 755 simple
lamboot -ssi boot rsh $MACHINE_FILE
mpirun -np $NO_MACHINES simple
lamhalt

Example submit file 2

Universe = Parallel
Requirements = (Hostname == "somemachine")
queue
Requirements = (Hostname != "somemachine")
queue 7
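
Combined with the boilerplate from the first example, a complete version of this submit file might look like the sketch below (the executable and hostname are placeholders; the Requirements expressions are taken straight from the slide). The first queue statement requests one node on the named machine, and the second requests seven more anywhere else, giving eight nodes in total:

    Universe   = Parallel
    executable = script          # wrapper script, as in the first example
    output     = out.$(Cluster).$(NODE)

    # one node pinned to a particular machine
    Requirements = (Hostname == "somemachine")
    queue

    # seven more nodes anywhere else in the pool
    Requirements = (Hostname != "somemachine")
    queue 7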

Example Script 2

mach1=`sed -n 1p $MACHINE_FILE`
mach2=`sed -n 2p $MACHINE_FILE`
./server &
ssh $mach1 client_app
ssh $mach2 client_app
wait

Summary
› With the Parallel Universe in Condor 6.8 comes:
› Support for most MPI implementations (some scripting required)
› Somewhat better MPI scheduling
› Better node placement via Condor matchmaking

Questions?
› Thank you!