The Inferno Grid (and the Reading Campus Grid)
Jon Blower, Reading e-Science Centre
Many others, School of Systems Engineering, IT Services

Introduction
• Reading is in the early stages of building a Campus Grid
• Currently this consists of two flocked Condor pools
  – More on these later
• Also experimenting with the Inferno Grid
  – A Condor-like system for pooling ordinary desktops
  – Although (like Condor) it could be used for more than this
• The Inferno Grid is commercial software, but free to the UK e-Science community
• Secure, low maintenance, firewall-friendly
• Perhaps not (yet) as feature-rich as Condor

The Inferno operating system
• The Inferno Grid is based upon the Inferno OS
• Inferno OS is built from the ground up for distributed computing
  – Mature technology, good pedigree (Bell Labs, Pike & Ritchie)
• Extremely lightweight (~1 MB RAM), so it can run as an emulated application identically on multiple platforms (Linux, Windows, etc.)
• Hence it is a powerful base for Grid middleware
• Everything in Inferno is represented as a file or set of files
  – cf. /dev/mouse in Unix
• So to create a distributed system, you just have to know how to share “files” – Inferno uses a protocol called Styx for this (see the sketch below)
• Inferno OS is released under a Liberal Licence (free and open source) for non-commercial use
• Can run applications in the host OS (Linux, Windows, etc.)
• Secure: certificate-based authentication, plus strong encryption built in at OS level
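To make the “everything is a file” idea concrete, here is a minimal Python sketch of the spirit of the model: a toy job-submission service whose state and operations are exposed purely as named “files” that clients read and write. The names (submit, status) and the JobService class are invented for illustration; this is not the Styx wire protocol, which is what would serve such a namespace to remote clients over the network.

    # Illustrative sketch only: the spirit of Inferno's "everything is a file"
    # model, not the real Styx/9P wire protocol. A service exposes its state
    # and operations as named files; clients interact purely via read/write.

    class SyntheticFile:
        """A 'file' whose contents are produced/consumed by code, like /dev/mouse."""
        def __init__(self, on_read=None, on_write=None):
            self.on_read = on_read
            self.on_write = on_write

        def read(self):
            return self.on_read() if self.on_read else b""

        def write(self, data):
            if self.on_write:
                self.on_write(data)

    class JobService:
        """A toy job queue exposed entirely as a namespace of synthetic files."""
        def __init__(self):
            self.jobs = []
            # The namespace a remote client would see if this were served over Styx.
            self.namespace = {
                "submit": SyntheticFile(on_write=lambda d: self.jobs.append(d.decode())),
                "status": SyntheticFile(on_read=lambda: f"{len(self.jobs)} job(s) queued\n".encode()),
            }

    svc = JobService()
    svc.namespace["submit"].write(b"run_model --year 2005")   # "write to the submit file"
    print(svc.namespace["status"].read().decode(), end="")    # "read the status file"

Because every operation is just a read or write on a named file, sharing the namespace over the network (which is what Styx does) is all that is needed to make the service distributed.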

The Inferno Grid
• Built as an application in the Inferno OS
  – Hence uses the OS’s built-in security and ease of distribution
  – Can run on all platforms that the Inferno OS runs on
• Essentially high-throughput computing, cf. Condor
• Created by Vita Nuova
• Free academic licence, but also used “for real”:
  – Evotec OAI (speeds up drug discovery): 90% utilisation of machines
  – “Major government department” modelling disease spread in mammals
  – Another major company (can’t say more!)
• University installations at Reading and York
• (At AHM2004 an Inferno Grid was created from scratch easily)

[Diagram: the software stack, from bottom to top]
• Host OS (Windows, Linux, Mac OS X, Solaris, FreeBSD)
• Inferno OS (virtual OS; can also run Inferno natively on bare hardware)
• Inferno Grid software (a Limbo program)
Could write all applications in Limbo (Inferno’s own language) and run them on all platforms, guaranteed!

Inferno Grid system overview
• Matches submitted jobs to the abilities of “worker nodes”
  – The whole show is run by a scheduler machine
• Jobs are ordinary Windows/Linux/Mac executables
• The process is different from that of Condor
  – Unless Condor has changed/is changing…
• In Condor, workers run daemon processes that wait for jobs to be sent to them
  – i.e. “scheduler-push”
  – Requires incoming ports to be open on each worker node
• In the Inferno Grid, workers “dial into” the scheduler and ask “have you got any work for me?” (see the sketch below)
  – i.e. “worker-pull” or “labour exchange”
  – No incoming ports need to be open on workers
  – Doesn’t poll – uses persistent connections
  – Studies have shown this to be more efficient (not sure which ones… ;-)
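A minimal Python sketch of the worker-pull pattern described above, with both ends in one process so it runs on its own. The port number and the message strings (“GIVE ME WORK”, “NO WORK”) are invented for this illustration, not the Inferno Grid’s actual protocol; the point is that only the scheduler listens for connections, while the worker makes a single outgoing connection and keeps it open.

    # Sketch of "worker-pull": only the scheduler listens; the worker's connection
    # is outgoing, so no incoming ports need to be open on the worker's firewall.
    # The port and message format are made up for this illustration.
    import socket, subprocess, threading

    SCHED_PORT = 6700                                   # hypothetical scheduler port

    srv = socket.socket()
    srv.bind(("127.0.0.1", SCHED_PORT))                 # the ONLY listening socket
    srv.listen()

    def scheduler(jobs):
        conn, _ = srv.accept()                          # a worker "reporting for duty"
        with conn:
            while jobs:
                if conn.recv(1024) == b"GIVE ME WORK":  # request over the persistent connection
                    conn.sendall(jobs.pop(0).encode())  # hand out the next job
            conn.recv(1024)                             # final request from the worker
            conn.sendall(b"NO WORK")
        srv.close()

    def worker():
        s = socket.socket()
        s.connect(("127.0.0.1", SCHED_PORT))            # single OUTGOING connection
        with s:
            while True:
                s.sendall(b"GIVE ME WORK")
                cmd = s.recv(1024).decode()
                if cmd == "NO WORK":
                    break
                out = subprocess.run(cmd, shell=True, capture_output=True, text=True)
                print(out.stdout, end="")               # output of the toy job

    threading.Thread(target=scheduler, args=(["echo job one", "echo job two"],)).start()
    worker()

Scheduler-push would invert this: the scheduler would connect out to each worker, which is why Condor-style workers need an open incoming port.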

Architecture
[Diagram: scheduler behind a firewall, with workers connecting in from multiple administrative domains]
• Scheduler – listens for job submissions and for workers “reporting for duty”
• Job submission is via the supplied GUI; other front-ends (command-line, Web interface) could be created
• Scheduler firewall: a single incoming port open
• Workers can be in different administrative domains
• Worker firewalls: no incoming ports open; a single outgoing port open (to the fixed, known server)
• Workers can connect and disconnect at will

Job Administration
[Screenshot of the job administration GUI, with annotated panels]
• Display: all scheduled jobs with current status and priority
• Control: create, start, stop, delete and change job priority with immediate effect
• Information: job description and parameters
• Status: detailed progress report for the currently selected job

Node Administration
[Screenshot of the node administration GUI, with annotated panels]
• Display: all known nodes and their current status (Connected, Not Connected, Dead, Blacklisted)
• Control: include, exclude and delete nodes
• Job Group: assign nodes to individual job groups
• Information: see node operating system and list of installed packages
• Power Bar: see how much of the grid is being utilised (% available, % in use)
• At-a-glance viewing: quickly see the current state of the grid with colour-coded job ids and task ids

Pros and Cons
• Pros:
  – Easy to install and maintain
  – Good security: see next slide
  – “Industry quality”
• Cons:
  – Small user base and documentation is not great, hence a learning curve
  – Doesn’t have all of Condor’s features, e.g. migration, the MPI universe, reducing impact on primary users
  – No Globus integration yet, but probably not hard to do – a JobManager for Inferno?
  – Security mechanism is Inferno’s own, but we might see other mechanisms in Inferno in future
  – Question over scalability (100s of machines, fine; 1000s… not sure), and Inferno Grids don’t “flock” yet

Security and impact on primary users
• Only one incoming port on the scheduler needs to be open through the firewall
• Nothing runs as root
• All connections in the Inferno Grid can be authenticated and encrypted
  – Public-key certificates for authentication, plus a variety of encryption algorithms
  – Certificate usage is transparent; the user is not aware it’s there
  – Similar to SSL in principle
• Can set up worker nodes to only run certain jobs
  – So can prevent arbitrary code from being run
• Doesn’t have all of Condor’s options for pausing jobs on keyboard press, etc.
  – Runs jobs under low priority
• But could be set up so that workers don’t ask for work if they are loaded (see the sketch below)
  – But what happens to a job that has already started?
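A rough Python sketch of that last idea, to show how little the worker side needs: it only asks the scheduler for work when the host looks idle. The load threshold and the names ask_scheduler_for_job and run_at_low_priority are hypothetical, invented for this illustration; note it only gates new requests and does not answer the open question of what to do with a job that has already started.

    # Sketch only (names and threshold invented): a worker that asks for work
    # only when the host machine appears idle, to limit impact on the primary user.
    import os, time

    LOAD_THRESHOLD = 0.5               # hypothetical 1-minute load-average cut-off

    def host_is_idle():
        # os.getloadavg() exists on Unix-like hosts; a Windows worker would need
        # a different probe (e.g. CPU usage via a library such as psutil).
        one_minute_load, _, _ = os.getloadavg()
        return one_minute_load < LOAD_THRESHOLD

    def work_when_idle(ask_scheduler_for_job):
        while True:
            if host_is_idle():
                job = ask_scheduler_for_job()      # outgoing request, as in the earlier sketch
                if job is None:                    # scheduler has no work left
                    break
                job.run_at_low_priority()          # hypothetical: run under a low OS priority
            else:
                time.sleep(60)                     # back off while the desktop is busy
            # Note: this only delays asking for NEW work; it does not pause a job
            # that has already started, which is the open question on this slide.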

Other points
• Slow-running tasks are reallocated until the whole job is finished (see the sketch below)
• Could fairly easily write different front-ends to the Inferno Grid for job submission and monitoring
  – Don’t have to use the supplied GUI
  – ReSC’s JStyx library could be used to write a Java GUI or JSPs
• In fact, the code base is small enough to make significant customisation realistic
  – e.g. customise worker node behaviour
• “Flocking” is probably not hard to do
  – Schedulers could exchange jobs
  – Or workers could know about more than one scheduler
• The Inferno OS can be used to very easily create a distributed data store
  – This data store can link directly with the Inferno Grid
• Caveat: we haven’t really used this in anger yet!
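For illustration, a small Python sketch of the slow-task reallocation idea in the first bullet – my own reconstruction of the general technique, not Vita Nuova’s algorithm: given the run times of tasks that have already finished, decide which still-running tasks look slow enough to be handed out to a second worker, with the first copy to finish winning.

    # Sketch (invented heuristic, not the Inferno Grid's actual logic): decide which
    # still-running tasks are slow enough to be re-dispatched to another idle worker.
    import statistics

    def tasks_to_reallocate(running, finished_durations, now, factor=3.0):
        """running: {task_id: start_time}. Returns task ids worth handing out again."""
        if not finished_durations:                   # nothing to compare against yet
            return []
        typical = statistics.median(finished_durations)
        return [task for task, started in running.items()
                if now - started > factor * typical]

    # Example: tasks "a" and "b" started at t=0; finished tasks took ~10 s each.
    # At t=100 both look slow, so they would be offered to another idle worker,
    # and whichever copy finishes first provides the result.
    print(tasks_to_reallocate({"a": 0, "b": 0}, [9, 10, 11, 12], now=100))   # -> ['a', 'b']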

Building an Inferno Grid in this room…
• These are conservative estimates (I think)
• Install scheduler (Linux machine) – 10 minutes
• Install worker node software (Windows) – 2 minutes each
• Run a toy job and monitor it within 15 minutes of the start
• Set up an Inferno Certificate Authority – 1 minute
• Provide Inferno certificates to all worker nodes – 2 minutes per node
• Provide Inferno certificates to users and admins – 2 minutes each
• A fully-secured (small) Inferno Grid up and ready in an hour or two
• If you know what you’re doing!! (remember that the docs aren’t so good…)

Reading Campus Grid so far
• A collaboration between the School of Systems Engineering, IT Services and the e-Science Centre
• Haven’t had as much time as we’d like to investigate the Inferno Grid
• But we have an embryonic “Campus Grid” of two flocked Condor pools
  – Although both are at Reading, they come under different admin domains
  – Getting them to share data space was challenging, and firewalls caused initial problems
  – (Incidentally, the Inferno Grid had no problems at all crossing the domains)
• Small number of users running MPI and batch jobs
  – Animal and Microbial Sciences, Environmental Systems Science Centre
• Ran a demo project for the University
• An “heroic effort” at the moment, but we are trying to secure funding

Novel features of RCG
• Problem: most machines are Windows but most people want a *nix environment for scientific programs
• “Diskless Condor”:
  – Windows machines reboot into Linux overnight
  – Linux is loaded from a network-shared disk image
  – Uses networked resources only (zero impact on the hard drive)
  – In the morning, the machine reboots back into Windows
• Looking into coLinux
  – Free VM technology for running Linux under Windows
  – Early days, but the initial look is promising

Future work
• Try to get funding!
• Intention is to make the Campus Grid a key part of campus infrastructure
  – IT Services are supportive
• Installation of SRB for a distributed data store
• Add clusters/HPC resources to the Campus Grid
• Working towards NGS compatibility

Conclusions
• The Inferno Grid has lots of good points, especially in terms of security and ease of installation and maintenance
  – Should be attractive to IT Services…
• We haven’t used it “in anger” yet, but it is used successfully by others (in academia, industry and government)
  – Caveat: these users tend to run a single app (or a small number of apps) rather than general code
• Doesn’t have all of Condor’s features
• We don’t want to fragment effort or become marginalised
  – Would be great to see the good features of Inferno appear in Condor, especially the “worker-pull” mechanism