HYDRA: Using Windows Desktop Systems in Distributed Parallel Computing Arvind Gopu, Douglas Grover, David Hart, Richard Repasky, Joseph Rinkovsky, Steve.

Slides:



Advertisements
Similar presentations
System Area Network Abhiram Shandilya 12/06/01. Overview Introduction to System Area Networks SAN Design and Examples SAN Applications.
Advertisements

SALSA HPC Group School of Informatics and Computing Indiana University.
NSF Site Visit HYDRA Using Windows Desktop Systems in Distributed Parallel Computing.
Dinker Batra CLUSTERING Categories of Clusters. Dinker Batra Introduction A computer cluster is a group of linked computers, working together closely.
Dr. David Wallom Use of Condor in our Campus Grid and the University September 2004.
Windows HPC Server 2008 Presented by Frank Chism Windows and Condor: Co-Existence and Interoperation.
VL-e PoC Introduction Maurice Bouwhuis VL-e work shop, April 7 th, 2006.
A CHAT CLIENT-SERVER MODULE IN JAVA BY MAHTAB M HUSSAIN MAYANK MOHAN ISE 582 FALL 2003 PROJECT.
Introduction to windows operating system i
David A. Lifka Chief Technical Officer Cornell Theory Center Cycle Scavenging with Windows The Virtues of Virtual Server David
Virtual Network Servers. What is a Server? 1. A software application that provides a specific one or more services to other computers  Example: Apache.
Keeping Everyone Happy: Serving Windows Users in a Macintosh Environment Laurie Sutch, University of Michigan.
IBIS System: Requirements and Components Lois M. Haggard Office of Public Health Assessment.
Windows Server MIS 424 Professor Sandvig. Overview Role of servers Performance Requirements Server Hardware Software Windows Server IIS.
Web Servers Web server software is a product that works with the operating system The server computer can run more than one software product such as .
Chapter 4: What is an operating system?. What is an operating system? A program or collection of programs that coordinate computer usage among users and.
Track 1: Cluster and Grid Computing NBCR Summer Institute Session 2.2: Cluster and Grid Computing: Case studies Condor introduction August 9, 2006 Nadya.
1 Chapter Client-Server Interaction. 2 Functionality  Transport layer and layers below  Basic communication  Reliability  Application layer.
Connecting OurGrid & GridSAM A Short Overview. Content Goals OurGrid: architecture overview OurGrid: short overview GridSAM: short overview GridSAM: example.
1 Web Server Administration Chapter 1 The Basics of Server and Web Server Administration.
 What is an operating system? What is an operating system?  Where does the OS fit in? Where does the OS fit in?  Services provided by an OS Services.
March 3rd, 2006 Chen Peng, Lilly System Biology1 Cluster and SGE.
Grids and Portals for VLAB Marlon Pierce Community Grids Lab Indiana University.
Cluster Computers. Introduction Cluster computing –Standard PCs or workstations connected by a fast network –Good price/performance ratio –Exploit existing.
Grid Computing I CONDOR.
SUMA: A Scientific Metacomputer Cardinale, Yudith Figueira, Carlos Hernández, Emilio Baquero, Eduardo Berbín, Luis Bouza, Roberto Gamess, Eric García,
المحاضرة الاولى Operating Systems. The general objectives of this decision explain the concepts and the importance of operating systems and development.
HYDRA: Using Windows Desktop Systems in Distributed Parallel Computing Arvind Gopu, Douglas Grover, David Hart, Richard Repasky, Joseph Rinkovsky, Steve.
Evaluation of Agent Teamwork High Performance Distributed Computing Middleware. Solomon Lane Agent Teamwork Research Assistant October 2006 – March 2007.
Example: Sorting on Distributed Computing Environment Apr 20,
Remote Access Using Citrix Presentation Server December 6, 2006 Matthew Granger IT665.
Frontiers in Massive Data Analysis Chapter 3.  Difficult to include data from multiple sources  Each organization develops a unique way of representing.
Condor: High-throughput Computing From Clusters to Grid Computing P. Kacsuk – M. Livny MTA SYTAKI – Univ. of Wisconsin-Madison
NGS Innovation Forum, Manchester4 th November 2008 Condor and the NGS John Kewley NGS Support Centre Manager.
Hands-On Microsoft Windows Server Implementing Microsoft Internet Information Services Microsoft Internet Information Services (IIS) –Software included.
OS Services And Networking Support Juan Wang Qi Pan Department of Computer Science Southeastern University August 1999.
Derek Wright Computer Sciences Department University of Wisconsin-Madison MPI Scheduling in Condor: An.
Processes Introduction to Operating Systems: Module 3.
Chapter 1 Computer Abstractions and Technology. Chapter 1 — Computer Abstractions and Technology — 2 The Computer Revolution Progress in computer technology.
SMBL and Blast Joe Rinkovsky Unix Systems Support Group Indiana University.
Derek Wright Computer Sciences Department University of Wisconsin-Madison Condor and MPI Paradyn/Condor.
Computer Science and Engineering Copyright by Hesham El-Rewini Advanced Computer Architecture CSE 8383 April 11, 2006 Session 23.
Parallel Computing using Condor on Windows PCs Peng Wang and Corey Shields Research and Academic Computing Division University Information Technology Services.
CMPF124 Basic Skills For Knowledge Workers Chapter 1 – Part 1 Introduction To Windows Operating Systems.
By Chi-Chang Chen.  Cluster computing is a technique of linking two or more computers into a network (usually through a local area network) in order.
Cluster Computers. Introduction Cluster computing –Standard PCs or workstations connected by a fast network –Good price/performance ratio –Exploit existing.
1 Network Communications A Brief Introduction. 2 Network Communications.
Creating Grid Resources for Undergraduate Coursework John N. Huffman Brown University Richard Repasky Indiana University Joseph Rinkovsky Indiana University.
A Web Based Job Submission System for a Physics Computing Cluster David Jones IOP Particle Physics 2004 Birmingham 1.
CernVM and Volunteer Computing Ivan D Reid Brunel University London Laurence Field CERN.
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING CLOUD COMPUTING
Chapter 1: Introduction
2. OPERATING SYSTEM 2.1 Operating System Function
Chapter 1: Introduction
Chapter 1: Introduction
Chapter 1: Introduction
PHP / MySQL Introduction
Chapter 1: Introduction
University of Technology
Chapter 1: Introduction
Chapter 1: Introduction
Chapter 3: Windows7 Part 1.
Chapter 1: Introduction
Internet Protocols IP: Internet Protocol
Chapter 1: Introduction
Chapter 1: Introduction
Chapter 1: Introduction
Chapter 1: Introduction
Operating Systems Structure
Chapter 1: Introduction
Presentation transcript:

HYDRA: Using Windows Desktop Systems in Distributed Parallel Computing Arvind Gopu, Douglas Grover, David Hart, Richard Repasky, Joseph Rinkovsky, Steve Simms, Adam Sweeny, Peng Wang University Information Technology Services Indiana University

Problem Description  Turn Windows desktop systems at IUB student labs into a scientific resource.  2300 systems, 3 year replacement cycle  1.5 Teraflops  Fast ethernet or better  Harvest idle cycles.  Turn Windows desktop systems at IUB student labs into a scientific resource.  2300 systems, 3 year replacement cycle  1.5 Teraflops  Fast ethernet or better  Harvest idle cycles.

Constraints  Systems dedicated to students using desktop office applications — not parallel scientific computing  Microsoft Windows environment  Daily software rebuild  Systems dedicated to students using desktop office applications — not parallel scientific computing  Microsoft Windows environment  Daily software rebuild

What are they good for?  Many small computations and a few small messages  Master-worker  Parameter studies  Monte Carlo  Many small computations and a few small messages  Master-worker  Parameter studies  Monte Carlo

Assembling small ephemeral resources  Coordinate through message passing — very simple to implement top-level solution  MPI not designed to handle ephemeral resources  Not simple to implement solution on the ground  Coordinate through message passing — very simple to implement top-level solution  MPI not designed to handle ephemeral resources  Not simple to implement solution on the ground

Solution  Simple Message Brokering Library (SMBL)  Limited replacement for MPI  Process and Port Manager (PPM)  Message broker  Condor NT  Job management  Web portal  Job submission  Simple Message Brokering Library (SMBL)  Limited replacement for MPI  Process and Port Manager (PPM)  Message broker  Condor NT  Job management  Web portal  Job submission

The Big Picture We’ll discuss each part in more detail next… The shaded box indicates components hosted on multiple desktop computers

Portal  Creates and submits Condor files, handles data files  Apache based  PHP web interface  Creates and submits Condor files, handles data files  Apache based  PHP web interface 

Condor  Condor for Windows NT/2000/XP  “Vanilla universe” -- no support for check- pointing or parallelism  Provides:  Security  Match-making  Fair sharing  File transfer  Job submission, suspension, preemption, restart  Condor for Windows NT/2000/XP  “Vanilla universe” -- no support for check- pointing or parallelism  Provides:  Security  Match-making  Fair sharing  File transfer  Job submission, suspension, preemption, restart

SMBL  IO multiplexing server in charge of message delivery for each parallel session (serialize communication)  Client library implements selected MPI-like calls  Both server and client library based on TCP socket abstraction  IO multiplexing server in charge of message delivery for each parallel session (serialize communication)  Client library implements selected MPI-like calls  Both server and client library based on TCP socket abstraction

SMBL (Contd … ) Managing Temporary Workers  SMBL server maintains a dynamic pool of client process connections  Worker job manager hides details of ephemeral workers at the application level  Porting from MPI is fairly straight forward  SMBL server maintains a dynamic pool of client process connections  Worker job manager hides details of ephemeral workers at the application level  Porting from MPI is fairly straight forward

Process and Port Manager (PPM)  Assigns port/host to each parallel session  Starts the SMBL server and application processes on demand  Direct workers to their servers  Assigns port/host to each parallel session  Starts the SMBL server and application processes on demand  Direct workers to their servers

Once again … the big picture The shaded box indicates components hosted on multiple desktop computers

System Layout  PPM, SMBL server and web portal running on Linux server -- Dual Intel Xeon 1.7 GHz, 2 GB memory and GigE inter-connect  STC Windows worker machines -- Combination of different OS (Windows 2000/XP) and network inter-connect speeds (GigE/100 Mbps/10 Mbps)  PPM, SMBL server and web portal running on Linux server -- Dual Intel Xeon 1.7 GHz, 2 GB memory and GigE inter-connect  STC Windows worker machines -- Combination of different OS (Windows 2000/XP) and network inter-connect speeds (GigE/100 Mbps/10 Mbps)

Applications  FastDNAml-p  Parallel application, master-worker model, small granularity of work  Provides generic interface for parallel communication library (MPI, PVM, SMBL)  Reliability built in: Foreman process copes with delayed or lost workers  Blast  Meme  FastDNAml-p  Parallel application, master-worker model, small granularity of work  Provides generic interface for parallel communication library (MPI, PVM, SMBL)  Reliability built in: Foreman process copes with delayed or lost workers  Blast  Meme

Portal

Applications – FastDNAml

FastDNAml-p Performance

Other Applications – Parallel MEME

Other Applications – BLAST

Experience Red: total owner Blue: total idle Green: total Condor

Work in Progress/Future Direction  Teragrid’ize Hydra cluster – allow TG users to access resource  New Portal – JSR 168 compliant, certificate based authentication capability  Range of applications – Virtual machines, so forth  Teragrid’ize Hydra cluster – allow TG users to access resource  New Portal – JSR 168 compliant, certificate based authentication capability  Range of applications – Virtual machines, so forth

Summary  Large parallel computing facility created at very low cost  SMBL parallel message passing library that can deal with ephemeral resources  PPM port broker that can handle multiple parallel sessions  SMBL (Open Source) Home –  Large parallel computing facility created at very low cost  SMBL parallel message passing library that can deal with ephemeral resources  PPM port broker that can handle multiple parallel sessions  SMBL (Open Source) Home –

Links and References  Hydra Portal –  SMBL home page –  Condor home page --  IU Teragrid home page –  Parallel FastDNAml –  Blast --  Meme --  Hydra Portal –  SMBL home page –  Condor home page --  IU Teragrid home page –  Parallel FastDNAml –  Blast --  Meme --