Condor to Every Corner of Campus Condor Week 2010 Preston Smith Purdue University.


Outline
Campus Grids
Condor Nests at Purdue
– Central Clusters
– Computing Labs
– Departmental Resources
– TeraGrid
Budget Realities in 2010
– IT Cost Reduction
Spreading Wings Across Campus
– Making Condor Easy for IT
– Dashboard
Virtualization
– VMGlide

Campus Grids
Open Science Grid Campus Grids Workshop held in January
– Identified themes common to many campus grid implementations
– Barriers are often diplomatic rather than technological
At the core, a campus grid is a way for an institution to share resources and maximize its investment in computing
– There are many different ways to share resources
– Purdue, FermiGrid, GLOW, and others all implement it in their own way

Community Clusters
Purdue’s model for resource sharing begins here.
Peace of Mind
– Professional systems administration, so faculty and graduate students can concentrate on research
Low Overhead
– The central data center provides infrastructure such as networking, storage, racks, floor space, cooling, and power
Cost Effective
– Works with vendors to obtain the best price for computing resources, pooling funds from different disciplines to leverage greater group purchasing power
– Large purchases are also leveraged for departmental server acquisitions

Community Clusters
Backfilling on idle HPC cluster nodes
– Condor runs on idle cluster nodes (nearly 16,000 cores today) when a node isn’t busy with jobs from PBS, the primary scheduler
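A backfill policy like this can be expressed in the startd configuration. The sketch below is illustrative, not Purdue's actual setup: it assumes a local probe script (pbs_check.sh) that publishes a PBSRunning attribute via STARTD_CRON.

```
# Sketch: a STARTD_CRON probe advertises whether PBS owns the node.
# pbs_check.sh and the PBSRunning attribute are assumptions for illustration.
STARTD_CRON_JOBLIST = PBSCHECK
STARTD_CRON_PBSCHECK_EXECUTABLE = /usr/local/libexec/pbs_check.sh
STARTD_CRON_PBSCHECK_PERIOD = 60

# Start Condor jobs only while PBS is idle; vacate as soon as PBS claims the node
START   = (PBSRunning =?= False)
PREEMPT = (PBSRunning =?= True)
KILL    = $(PREEMPT)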

Central Cluster Usage
Maximizing value from the investment
– Condor: 15.7%
– PBS: 81%
Harvesting 15% of this many machines’ availability is 22 million potential hours per year (roughly 16,000 cores × 8,760 hours/year × 15.7% ≈ 22 million core-hours)!

Student Labs
ITaP operates nearly 2000 lab machines used in classrooms, general student labs, and for departments.
– Nearly 6000 cores among those 2000 machines

Distributed IT
Less than half of IT at Purdue is centralized
– The remainder is in individual colleges and departments
– 27,317 desktop machines at West Lafayette, relatively few of which are operated by ITaP
Many of these islands of IT are quite large
– Agriculture, Computer Science, Engineering, Management, Physical Facilities, Liberal Arts, Education
– 1000+ machines each
Many of these IT organizations are in the Condor Grid already, but many are not…
This is where the room for growth is!

Grid Overview

TeraGrid
Purdue provides the campus Condor pool to the nation via the TeraGrid
– 50% of jobs on TeraGrid were single-CPU
– Of those, 64% ran for an hour or less (Arvind Gopu, Indiana University, TG’07)
The Robetta gateway and many others regularly use Purdue Condor on TeraGrid
Condor will continue to be a TeraGrid resource through the end of the TeraGrid project

Blah Blah Blah
You say, “Sure, we’ve heard all this before. What’s new?”

State Budgets in 2010
A common conversation on campuses today
– Higher Ed in Indiana has been directed to reduce its budget by 5.5%
At Purdue, we have been given the following charge for IT:
– “Identifying cost savings approaches that will generate at least $15M recurring over time while providing high quality information technology (IT) services to meet the University’s strategic goals.”
Areas under review:
– Data Centers
– Computer Labs
– Power Savings
– Strategic Sourcing for Purchasing

What does this have to do with Condor?
The Campus Grid ties into several of these areas:
– Data Centers: build community clusters instead of private ones, and then maximize usage with Condor
– Computer Labs: centralize management of labs, and run Condor on the machines
– Strategic Sourcing in Purchasing: for example, community cluster purchases for good pricing
– Power Savings: virtualized data centers, powering off idle computers, and “power credits” for running Condor

Power Off or Install Condor
Recommendations from the committee report:
– “Thou shalt turn off thy computer or install Condor and join the Campus Grid”
– “Thou should power-save your machines and we should find tools to manage their waking-up”

Power?
The Blue-Ribbon committee making the recommendations probably didn’t know this, but this also sounds like a job for Condor!
– Killing two “birds” with one stone
– Add machines to the Campus Grid and harvest the cycles, as already recommended
– Power-save machines by policy
– Wake them up when needed
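Condor's hibernation support can express both halves of that policy in configuration. A minimal sketch, with illustrative thresholds (the two-hour idle window and S4 state are assumptions, not the committee's numbers):

```
# Check every 5 minutes whether this machine may power down
HIBERNATE_CHECK_INTERVAL = 300

# Suspend to disk after two idle hours with no user activity and no Condor job
HIBERNATE = ifThenElse( (KeyboardIdle > 7200) && \
                        (State == "Unclaimed") && (Activity == "Idle"), \
                        "S4", "NONE" )

# Elsewhere in the pool, condor_rooster wakes hibernating machines
# (via Wake-on-LAN) when jobs match their offline ads
ROOSTER_INTERVAL = 300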

Now What?
Currently a “recommendation”, not quite a “policy” yet
What happens when it becomes a policy?
– The bet is that IT folks won’t want to shut their machines down overnight, which is when they currently do backups, software distribution, and deep freezing…
– Expect a tsunami of demand coming our way

How to Prepare?
Host periodic on-campus Condor “boot camps” for users and sysadmins
– These are very much like the User and Administrator tutorials many of you were in yesterday (thanks to the Condor team for letting us base things on their materials)
Ease of deployment
– Provide pre-configured binaries for Windows and Linux (RHEL, Debian, Ubuntu, Fedora)
Configurability
– Centrally manage Condor configurations on machines with distributed ownership…
– …while also leaving configuration in the hands of the machines’ owners
– Machine owners need to be confident that they remain in control of how their machines are used, and Condor is perfect for this!
Scoreboard
– “My Dean wants to know how much work our machines have provided”
– The president asks how much work her individual machine has done
Security questions?
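One way to split control between central admins and machine owners is Condor's layered configuration: central files set the defaults, and an owner-managed file read last can override them. A sketch (the file paths and the OWNER_START macro are illustrative, not an actual Purdue layout):

```
# Central defaults are read first; the owner's file is read last and wins
LOCAL_CONFIG_FILE = /etc/condor/central_policy.conf, /etc/condor/owner.conf

# Central policy defers to the owner: if owner.conf defines OWNER_START,
# it controls when jobs may run; otherwise the machine is always available
START = $(OWNER_START:True)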

Manageability
So, given thousands upon thousands of Windows lab machines, and all sorts of machines around campus that my staff don’t administratively control, how do we manage Condor on them?
(PLUG ALERT) We use Cycle Computing’s CycleServer
– VM appliances are configured to report in to CycleServer for management
– As are the native OS installers that we distribute

Managing Configurations Around Campus

Scoreboard
A common question: how much work has my machine done for Condor?
– Even the president has asked…
– A system-tray application (a la condor_birdwatcher) is being developed to query startd history to answer that very question
For a high-level view:
– CondorView is helpful, but not quite what we want
– CycleServer has been able to help

FAQs about Security
Can any ding-dong submit any code to my machine?
– No: only specific machines, with access limited to people with Purdue Career Accounts, run a schedd
Ok, fine, but what if they submit something nasty to our machine?
– Then we know who they are, and we go club them with the appropriate IT policies.
What about data on our faculty members’ workstations? Is it safe? Could a job steal it?
– Well, maybe. Are their file permissions set appropriately?
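The first answer can be enforced in the execute nodes' configuration: accept work only from designated, authenticated submit hosts. A sketch under assumptions (the hostname pattern is hypothetical, and the authentication methods shown are just one plausible choice):

```
# Only the central submit hosts may send jobs to this machine
ALLOW_WRITE = condor-submit-*.example.purdue.edu

# Require authenticated daemon-to-daemon traffic
SEC_DEFAULT_AUTHENTICATION = REQUIRED
SEC_DEFAULT_AUTHENTICATION_METHODS = PASSWORD, FS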

Sandbox
The College of Engineering asks:
– Can we sandbox Condor jobs away from the execution host?
We think “sure,” and it will also make those Windows boxes more generally useful.
– Maximizing investment again

Condor in VM Appliances
There are many ways to skin that cat:
– CoLinux, from several years ago
– Marquette’s approach, from a few minutes ago
– Condor as a virtual machine manager, from a few minutes before that
We have spent some effort on an approach similar to what Marquette is doing, but mostly on what we’ve dubbed VM-Glide
– Using Condor to submit VM “nodes” as jobs to the lab machines
– Lab machines run VMware Workstation, which is OK for universities to use for “instruction and research” and for “grid and utility computing” if you enroll in a partner program
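Submitting an appliance as a job this way uses Condor's VM universe. A minimal submit-file sketch (the paths, memory size, and job name are illustrative, not VM-Glide's actual settings):

```
# VM-universe job sketch: ship a VMware appliance to a lab machine and boot it
universe      = vm
executable    = vmglide-node        # label only; vm universe runs no binary
vm_type       = vmware
vm_memory     = 512
vm_networking = true

# Directory holding the appliance's .vmx/.vmdk files, transferred to the node
vmware_dir = /var/condor/vm-appliance
vmware_should_transfer_files = true

log = vmglide.log
queue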

VM-Glide
Our solution is based on the Grid Appliance infrastructure from Florida’s ACIS lab
The IPOP P2P network fabric
– Solves the NAT issues and IP space problems that come with bridged networking
– No requirement for a single VPN router to connect the real network with the virtual overlay network
– See the talk from Condor Week 2009
We only need to run IPOP services (a userland application) on all central submit nodes to access nodes in the virtual pool

How well did this work?
Set up a dedicated schedd with lots of disk to hand out VMs to student labs
– (Fast) disk is important: checkpointing memory adds up!
Configure lab machines to claim the entire machine when a single VM Universe job runs
– Apparently users notice when 4 or more VMs all try to evict when someone sits down
Now we’re cooking: we got nearly 1000 VMs running in labs over a weekend
– All of which are running user jobs
– The IPOP fabric holds up great
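Claiming the entire machine for one VM job can be done by advertising a single slot that owns all of the machine's resources. A startd configuration sketch (assuming one whole-machine slot per lab machine; not necessarily the exact knobs used here):

```
# Advertise one slot covering every core, so a single VM Universe job
# claims the whole machine and only one VM ever needs to evict
SLOT_TYPE_1 = cpus=100%, memory=100%, disk=100%
NUM_SLOTS_TYPE_1 = 1
NUM_SLOTS = 1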

So, you run this all the time now, right?
Well, not quite
– Even with just 1 VM per machine, vacating is still noticeable to the end users
– Lab admins say: “Maybe it’s the 100Mbit connection the machines are on”
After we cried a little inside… how to deal with this?
– Use squids local to the labs to cache VM images? Nope, the lab network architecture doesn’t lend itself to that
– Pre-stage VMs on machines and just start them with Condor? Nope, the VM-GAHP doesn’t actually let you do that
– Upgrade the network in the labs? Cost-prohibitive: the switch gear is old enough that it’s not gigabit-capable

Next Steps?
Fortunately, a campus network upgrade is in progress
– With new switches, we will benchmark again
Lab admins are enabling VT support in the BIOS
– Allows for 64-bit VMs (more jobs want this)
– Will probably make VMware run faster, too
Pre-stage VMs
– Hack the VM-GAHP to start pre-staged VMs
– Or use a file transfer plugin to copy from the local hard drive

What’s Next
We expect to add Condor to machines from all across campus
– And system-wide…
We hope to use Condor as the tool to manage power on machines across campus
Virtualization of compute environments will be a key characteristic of this environment
– In labs and on desktops, as well as on cluster nodes (KVM)
Thanks to the Condor Team for all the software!

The End
Questions?
OSG Campus Grids Workshop