Unclipped Condor in Windows via coLinux Unclipped Condor in Windows ® via coLinux Henry Neeman, Horst Severini, Chris Franklin, Josh Alexander University of Oklahoma Sumanth J.V. University of Nebraska-Lincoln Condor Week, University of Wisconsin, Tuesday May
Unclipped Condor in Windows via coLinux Condor Week, Tuesday May Condor: Linux vs Windows Condor inside Linux: full featured Condor inside Windows®: “clipped” No autocheckpointing No job automigration No remote system calls No Standard Universe
Unclipped Condor in Windows via coLinux Condor Week, Tuesday May Lots of PCs in IT Labs At many institutions, there are lots of PC labs managed by a central IT organizations. If the head of IT (e.g., CIO) is on board, then all of these PCs can be Condorized. But, these labs tend to be Windows® labs, not Linux. So you can’t take the Windows® desktop experience away from the desktop users, just to get Condor. So, how can we have Linux Condor AND Windows® desktop on the same PC at the same time?
Unclipped Condor in Windows via coLinux Condor Week, Tuesday May Solution Attempt #1: VMware Attempted solution: VMware Linux as native host OS Condor inside Linux VMware inside Linux Windows® inside VMware Tested on ~200 PCs in IT PC labs (Union, library, dorms, Physics Dept) In production for over a year
Unclipped Condor in Windows via coLinux Condor Week, Tuesday May VMware Disadvantages Attempted solution: VMware Linux as native host OS Condor inside Linux VMware inside Linux Windows® inside VMware Disadvantages VMware costs money! (Less so now than then.) Crashy VMware performance tuning (straight to disk) was unstable Sensitive to hardware heterogeneity Painful to manage CD/DVD burners and USB drives didn’t work in some PCs.
Unclipped Condor in Windows via coLinux Condor Week, Tuesday May A Better Solution: coLinux Cooperative Linux (coLinux) FREE! Runs inside native Windows® No sensitivity to hardware type Better performance Easier to customize Smaller disk footprint and lower CPU usage in idle Minimal management required (~10 hours/month)
Unclipped Condor in Windows via coLinux Condor Week, Tuesday May Compatibility Issue About 30 of the 200 lab PCs we installed coLinux on had problems with it, so those PCs now run a prerelease version of coLinux. We have no idea why the production version of coLinux was a miserable failure on these 30 PCs, nor why the prerelease version succeeded.
Unclipped Condor in Windows via coLinux Condor Week, Tuesday May Preventing BSOD The Data Execution Prevention feature inside Windows®, when running on some newer processors, can conflict with coLinux and cause system failure. The solution to this problem is to add the /NOEXECUTE switch to the Windows® boot.ini.
Unclipped Condor in Windows via coLinux Condor Week, Tuesday May Network Issue Networking options Bridged: Each PC has to have a second IP address, so the institution has to have plenty of spare IP addresses available. (Oklahoma solution) NAT: The Condor pool requires a Generic Connection Broker (GCB) on a separate, dedicated PC (hardware $), and has some instability. Switched to OpenVPN.(Nebraska solution) Nebraska experimented with port forwarding in Windows®, but abandoned it for OpenVPN because of security and usability.
Unclipped Condor in Windows via coLinux Condor Week, Tuesday May Traversing NATs and Firewalls What is GCB (Connection Broker)? Socket level approach. A broker arranges connections between machines inside the firewall and machines outside the firewall. What is OpenVPN (Open Virtual Private Network)? A network within a network. Virtual network adapter. Virtual IP (static/dynamic). TCP within UDP. Client/Server architecture. All to All communication. All traffic is encrypted by default.
Unclipped Condor in Windows via coLinux Condor Week, Tuesday May OpenVPN When using GCB, each machine is represented by a unique port on the broker. Central Manager sees all the machines as, etc. Only applications linked against GCB work. (Condor is already linked) When using OpenVPN, each machine has a unique virtual IP address in the VPN. Simplifies troubleshooting. Central Manager is also part of the OpenVPN and runs in server mode. ClientConnect.py Determines Virtual IP of a new client based on its Real IP. E.g. node has real IP gets virtual IP Pushes this configuration to the clients. Updates /etc/hosts. OpenVPN lockups can be fixed with mssfix 1200 and fragment 1200 options?
Unclipped Condor in Windows via coLinux Condor Week, Tuesday May OpenVPN No modification of application required to use OpenVPN. We have successfully mounted NFS shares (CMS stack). Inbound SSH access Since all-to-all communication is present, even MPICH works. Remember all-to-all communication still has to go through the OpenVPN server. Secure No firewall required in coLinux.
Unclipped Condor in Windows via coLinux Condor Week, Tuesday May Monitoring Issue Condor inside Linux monitors keyboard and mouse usage to decide when to suspend a job. In coLinux, this is tricky. We had to set up a Visual Basic script on the Windows® side to send the keyboard and mouse information to coLinux. UNL implements a similar idea in C++, and OU is now doing likewise. UNL collects all the keyboard and mouse data on a server, while OU does it on each local machine. But the result is the same.
Unclipped Condor in Windows via coLinux Condor Week, Tuesday May Monitoring coLinux Labs How to determine whether all the machines in each lab are up an running? condor_status only displays working machines. What about missing machines? We need a list of expected but missing nodes per lab. We need a physical layout of the nodes in each lab. MYSQL database to store lab info. Need to separately handle static and dynamic IP labs. Static IPs are easy to handle. Store IP and relative co-ordinates of the node. Dynamic IP store a lambda function expressing how to determine if a machine belongs to a lab. E.g. lambda x: '18-' in x - matches node-18-2, node-18-3 … Store expected number of machines per lab, known hardware/software issues as notes per machine. Compare output of condor_status and MYSQL database. Demo: Web front-end developed for mod_python.
Unclipped Condor in Windows via coLinux Condor Week, Tuesday May How to Build a Multistate Grid To make a prairie It takes a clover and one bee. One clover, and a bee, and reverie. The reverie alone will do, If bees are few. – Emily Dickinson, /08/24/on-our-walk/
Unclipped Condor in Windows via coLinux Condor Week, Tuesday May OU’s NSF CI-TEAM Project OU recently received a grant from the National Science Foundation’s Cyberinfrastructure Training, Education, Advancement, and Mentoring for Our 21st Century Workforce (CI-TEAM) program. Objectives: Teach general HPC concepts to a broad audience Provide Condor resources to the national community Teach users to use Condor and sysadmins to deploy and administer it Teach bioinformatics students to use BLAST over Condor
Unclipped Condor in Windows via coLinux Condor Week, Tuesday May OU NSF CI-TEAM Project teach students and faculty to use FREE Condor middleware, stealing computing time on idle PCs; teach system administrators to deploy and maintain Condor on PCs; teach bioinformatics students to use BLAST on Condor; provide Condor Cyberinfrastructure to the national community (FREE). Condor pool of 750 desktop PCs (already part of the Open Science Grid); Supercomputing in Plain English workshops via videoconferencing; Cyberinfrastructure rounds (consulting) via videoconferencing; drop-in CDs for installing full-featured Condor on a Windows PC (Cyberinfrastructure for FREE); sysadmin consulting for installing and maintaining Condor on desktop PCs. OU’s team includes: High School, Minority Serving, 2-year, 4-year, masters-granting; 11 of the 15 institutions are in 4 EPSCoR states (AR, KS, NE, OK). Cyberinfrastructure Education for Bioinformatics and Beyond Objectives:OU will provide:
Unclipped Condor in Windows via coLinux Condor Week, Tuesday May OU NSF CI-TEAM Project Participants at OU (29 faculty/staff in 16 depts) Information Technology OSCER: Neeman (PI) College of Arts & Sciences Botany & Microbiology: Conway, Wren Chemistry & Biochemistry: Roe (Co-PI), Wheeler Mathematics: White Physics & Astronomy: Kao, Severini (Co-PI), Skubic, Strauss Zoology: Ray College of Earth & Energy Sarkeys Energy Center: Chesnokov College of Engineering Aerospace & Mechanical Engr: Striz Chemical, Biological & Materials Engr: Papavassiliou Civil Engr & Environmental Science: Vieux Computer Science: Dhall, Fagg, Hougen, Lakshmivarahan, McGovern, Radhakrishnan Electrical & Computer Engr: Cruz, Todd, Yeary, Yu Industrial Engr: Trafalis OU Health Sciences Center, Oklahoma City Biochemistry & Molecular Biology: Zlotnick Radiological Sciences: Wu (Co-PI) Surgery: Gusev Participants at other institutions (19 faculty/staff at 14 institutions) California State U Pomona (masters-granting, minority serving): Lee Contra Costa College (2-year, minority serving): Murphy Earlham College (4-year): Peck Emporia State U (masters-granting, EPSCoR): Pheatt, Ballester Kansas State U (EPSCoR): Andresen, Monaco Langston U (masters-granting, minority serving, EPSCoR): Snow Oklahoma Baptist U (4-year, EPSCoR): Chen, Jett, Jordan Oklahoma School of Science & Mathematics (high school, EPSCoR): Samadzadeh St. Gregory’s U (4-year, EPSCoR): Meyer U Arkansas (EPSCoR): Apon U Central Oklahoma (masters-granting, EPSCoR): Lemley, Wilson U Kansas (EPSCoR): Bishop U Nebraska-Lincoln (EPSCoR): Swanson U Northern Iowa (masters-granting): Gray E E E E
Unclipped Condor in Windows via coLinux Condor Week, Tuesday May How to Create a Multistate Grid? Grids aren’t primarily about technology! You need to recruit people, by offering them more than you ask them to provide. 1.Go to their institution. 2.Give a really fun and interesting talk about your stuff. 3.Tell them that they can use your stuff for free. 4.Make them commit to using your stuff. 5.Help them use your stuff. 6.If possible, get them to visit you and see your stuff.
Unclipped Condor in Windows via coLinux Condor Week, Tuesday May OU NSF CI-TEAM Project Participants at OU (29 faculty/staff in 16 depts) Information Technology OSCER: Neeman (PI) College of Arts & Sciences Botany & Microbiology: Conway, Wren Chemistry & Biochemistry: Roe (Co-PI), Wheeler Mathematics: White Physics & Astronomy: Kao, Severini (Co-PI), Skubic, Strauss Zoology: Ray College of Earth & Energy Sarkeys Energy Center: Chesnokov College of Engineering Aerospace & Mechanical Engr: Striz Chemical, Biological & Materials Engr: Papavassiliou Civil Engr & Environmental Science: Vieux Computer Science: Dhall, Fagg, Hougen, Lakshmivarahan, McGovern, Radhakrishnan Electrical & Computer Engr: Cruz, Todd, Yeary, Yu Industrial Engr: Trafalis OU Health Sciences Center, Oklahoma City Biochemistry & Molecular Biology: Zlotnick Radiological Sciences: Wu (Co-PI) Surgery: Gusev Participants at other institutions (19 faculty/staff at 14 institutions) California State U Pomona (masters-granting, minority serving): Lee Contra Costa College (2-year, minority serving): Murphy Earlham College (4-year): Peck Emporia State U (masters-granting, EPSCoR): Pheatt, Ballester Kansas State U (EPSCoR): Andresen, Monaco Langston U (masters-granting, minority serving, EPSCoR): Snow Oklahoma Baptist U (4-year, EPSCoR): Chen, Jett, Jordan Oklahoma School of Science & Mathematics (high school, EPSCoR): Samadzadeh St. Gregory’s U (4-year, EPSCoR): Meyer U Arkansas (EPSCoR): Apon U Central Oklahoma (masters-granting, EPSCoR): Lemley, Wilson U Kansas (EPSCoR): Bishop U Nebraska-Lincoln (EPSCoR): Swanson U Northern Iowa (masters-granting): Gray E E E E
Thanks for your attention! Questions?