HYDRA: Using Windows Desktop Systems in Distributed Parallel Computing (NSF Site Visit, 2-23-2006)

Presentation transcript:

HYDRA: Using Windows Desktop Systems in Distributed Parallel Computing
NSF Site Visit, February 23, 2006

Introduction
Windows desktop systems at IUB student labs:
– 2,300 systems, 3-year replacement cycle
– Pentium IV (>= 1.6 GHz), 256/512/1024 MB memory, 10/100 Mbps or GigE networking, Windows XP
– More than 1.5 TFLOPS in aggregate

Possibly Utilize Idle Cycles?
[Condor usage plot: red = total owner, blue = total idle, green = total Condor]

Problem Description
Once again: Windows desktop systems at IUB student labs
– Use them as a scientific resource
– Harvest their idle cycles

Constraints
– Systems are dedicated to students using desktop office applications, not parallel scientific computing, which makes their availability unpredictable and sporadic
– Microsoft Windows environment
– Daily software rebuild (updates)

What could these systems be used for?
Many small computations and a few small messages:
– Foreman-worker
– Parameter studies
– Monte Carlo
Goal: High Throughput Computing (HTC), not HPC
– Parallel runs of the aforementioned small computations to make better use of the resource (see the foreman-worker sketch below)
– Parallel libraries such as MPI and PVM have constraints when the availability of resources is ephemeral, i.e., unpredictable
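
To make the foreman-worker pattern concrete, here is a minimal parameter-study skeleton in conventional MPI form, the starting point the deck assumes before introducing SMBL. All names are illustrative, not from the Hydra code base: each worker repeatedly receives a parameter, evaluates it, and returns the result until the foreman sends a stop tag.

/* Minimal foreman-worker parameter study in plain MPI (illustrative;
 * assumes the worker count does not exceed the number of parameters).
 * Hydra replaces MPI with SMBL because MPI assumes a fixed, reliable
 * set of workers. Compile with: mpicc -std=c99 foreman.c */
#include <mpi.h>
#include <stdio.h>

#define TAG_WORK 1
#define TAG_STOP 2

/* Stand-in for the real computation (e.g., one Monte Carlo trial). */
static double evaluate(double p) { return p * p; }

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                                   /* foreman */
        const int n_params = 100;
        int next = 0, active = 0;
        /* Prime every worker with one parameter. */
        for (int w = 1; w < size && next < n_params; w++, active++) {
            double p = next++;
            MPI_Send(&p, 1, MPI_DOUBLE, w, TAG_WORK, MPI_COMM_WORLD);
        }
        /* Collect results; refill workers until the sweep is done. */
        while (active > 0) {
            double result;
            MPI_Status st;
            MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            active--;
            printf("rank %d returned %g\n", st.MPI_SOURCE, result);
            double p = (next < n_params) ? next : 0;
            int tag = (next < n_params) ? TAG_WORK : TAG_STOP;
            if (tag == TAG_WORK) { next++; active++; }
            MPI_Send(&p, 1, MPI_DOUBLE, st.MPI_SOURCE, tag, MPI_COMM_WORLD);
        }
    } else {                                           /* worker */
        for (;;) {
            double p;
            MPI_Status st;
            MPI_Recv(&p, 1, MPI_DOUBLE, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP) break;
            double r = evaluate(p);
            MPI_Send(&r, 1, MPI_DOUBLE, 0, TAG_WORK, MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}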

Solution
Simple Message Brokering Library (SMBL)
– A limited replacement for MPI
– Both the server and the client library are built on a TCP socket abstraction
– Porting from MPI is fairly straightforward
Process and Port Manager (PPM)
Plus:
– Condor for job management and file transfer (its checkpointing and parallel features are not used)
– Web portal for job submission
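
For context, a Condor job in this setup would be described by a small submit file; the sketch below is a hypothetical example, not the Hydra portal's actual output. The vanilla universe matches the slide's point: plain job management and file transfer, with no checkpointing. OpSys "WINNT51" is Condor's identifier for Windows XP.

# Hypothetical submit description; executable and file names are placeholders.
universe     = vanilla
executable   = smbl_worker.exe
# Match Windows XP execute nodes.
requirements = (OpSys == "WINNT51") && (Arch == "INTEL")
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = input.dat
output = worker.out
error  = worker.err
log    = worker.log
queue 4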

The Big Picture
We’ll discuss each part in more detail next.
[Architecture diagram; the shaded box indicates components hosted on multiple desktop computers]

SMBL (Server)
The SMBL server maintains a dynamic pool of client process connections
The worker job manager hides the details of ephemeral workers at the application level

SMBL server process table for a 4-CPU parallel session:
SMBL Rank    | Condor-Assigned Node
0 (Foreman)  | Wrubel Computing Center, sacramento
1            | Chemistry Student Lab, computer_14
2            | CS Student Lab, computer_8
3            | Library, computer_6

SMBL (Server) (cont’d...)
The SMBL server maintains a dynamic pool of client process connections
The worker job manager hides the details of ephemeral workers at the application level

The same 4-CPU parallel session after a worker has left the pool; rank 2 now maps to a different machine:
SMBL Rank    | Condor-Assigned Node
0 (Foreman)  | Wrubel Computing Center, sacramento
1            | Chemistry Student Lab, computer_14
2            | Physics Student Lab, computer_11
3            | Library, computer_6

SMBL (Client)
The client library implements selected MPI-like calls:
– MPI_Send()  →  SMBL_Send()
– MPI_Recv()  →  SMBL_Recv()
It is in charge of message delivery for each parallel process
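
As an illustration of such a port, here is the worker loop from the earlier MPI sketch rewritten against SMBL. Only SMBL_Send() and SMBL_Recv() appear on the slide; the header name, initialization call, status type, and argument order below are assumptions made to mirror MPI, in the spirit of "porting is fairly straightforward".

/* Worker loop ported from MPI to SMBL. Reuses evaluate(), TAG_WORK,
 * and TAG_STOP from the MPI sketch above. Everything here except the
 * names SMBL_Send/SMBL_Recv is an assumed, MPI-like interface. */
#include "smbl.h"                    /* assumed header name */

void worker_loop(void) {
    SMBL_Init();                     /* assumed analogue of MPI_Init */
    for (;;) {
        double p, r;
        SMBL_Status st;              /* assumed analogue of MPI_Status */
        SMBL_Recv(&p, 1, SMBL_DOUBLE, 0, SMBL_ANY_TAG, &st);
        if (st.tag == TAG_STOP)      /* assumed field name */
            break;
        r = evaluate(p);
        SMBL_Send(&r, 1, SMBL_DOUBLE, 0, TAG_WORK);
    }
}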

Process and Port Manager (PPM)
– Starts the SMBL server and application processes on demand
– Assigns a port/host pair to each parallel session
– Directs workers to their servers
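
The deck does not specify the wire protocol behind "directs workers to their servers"; purely as a sketch, one minimal realization is for a worker to connect to the PPM and read back a "host port" line naming its SMBL server. The hostname, port, and protocol below are all hypothetical (POSIX sockets shown for brevity; the production workers ran on Windows).

/* Hypothetical worker-side redirect via the PPM. The protocol,
 * host name, and port are invented for illustration only. */
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netdb.h>

static int tcp_connect(const char *host, const char *port) {
    struct addrinfo hints = {0}, *res;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0) return -1;
    int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd >= 0 && connect(fd, res->ai_addr, res->ai_addrlen) != 0) {
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}

int main(void) {
    char buf[512], host[256], port[16];
    int ppm = tcp_connect("ppm.example.edu", "9000");   /* hypothetical */
    if (ppm < 0) return 1;
    ssize_t n = read(ppm, buf, sizeof buf - 1);         /* "host port\n" */
    close(ppm);
    if (n <= 0) return 1;
    buf[n] = '\0';
    if (sscanf(buf, "%255s %15s", host, port) != 2) return 1;
    int server = tcp_connect(host, port);   /* join the SMBL session */
    /* ... hand this socket to the SMBL client library ... */
    return server < 0;
}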

PPM (cont’d...)
PPM with two SMBL servers (two parallel sessions):

Parallel Session 1
SMBL Rank    | Condor-Assigned Node
0 (Foreman)  | Wrubel Computing Center, sacramento
1            | Chemistry Student Lab, computer_14
2            | CS Student Lab, computer_8
3            | Wells Library, computer_6

Parallel Session 2
SMBL Rank    | Condor-Assigned Node
0 (Foreman)  | Wrubel Computing Center, sacramento
1            | Wells Library, computer_27
2            | Biology Student Lab, computer_4
3            | CS Student Lab, computer_2

Once again … the big picture
[Architecture diagram; the shaded box indicates components hosted on multiple desktop computers]

Recent Development
Hydra cluster is TeraGrid-enabled! (Nov 2005)
– Allows TeraGrid users to use the resource
– Virtual-host-based solution: two different URLs for IU and TeraGrid users
– TeraGrid users authenticate against PSC’s Kerberos server
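
A name-based virtual host split of that kind would look roughly like the Apache sketch below. The hostnames and paths are placeholders, since the deck's actual URLs did not survive transcription, and the directives Hydra really used are not shown.

# Hypothetical Apache virtual hosts; hostnames and paths are placeholders.
NameVirtualHost *:80

# IU users
<VirtualHost *:80>
    ServerName   hydra.example.indiana.edu
    DocumentRoot /var/www/hydra-iu
</VirtualHost>

# TeraGrid users (authenticated against PSC's Kerberos realm)
<VirtualHost *:80>
    ServerName   tg-hydra.example.indiana.edu
    DocumentRoot /var/www/hydra-tg
</VirtualHost>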

System Layout
– PPM, SMBL server, Condor, and the web portal run on a Linux server: dual Intel Xeon 3.0 GHz, 4 GB memory, GigE
– A second Linux server runs Samba to serve the BLAST database
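
Serving a read-only BLAST database over Samba takes little more than one share definition; a minimal sketch, with a hypothetical share name and path:

# Minimal smb.conf share for a read-only BLAST database.
# The share name and path are hypothetical.
[blastdb]
   path = /export/blastdb
   read only = yes
   guest ok = no
   browseable = yes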

Portal
– Creates and submits Condor files, handles data files
– Apache/PHP based
– Kerberos authentication
URLs:
– (IU users)
– (TeraGrid users)
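
In outline, "creates and submits Condor files" can be as simple as writing a submit description and shelling out to condor_submit. The PHP sketch below is a hypothetical reconstruction; the portal's real code, paths, and file names are not shown in the deck.

<?php
// Hypothetical portal back end: stage an uploaded input file, write a
// Condor submit description, and hand it to condor_submit.
// All paths and names are placeholders.
$jobdir = '/var/portal/jobs/' . uniqid('job_');
mkdir($jobdir, 0700, true);
move_uploaded_file($_FILES['input']['tmp_name'], "$jobdir/input.dat");

$submit = <<<EOT
universe   = vanilla
executable = smbl_worker.exe
transfer_input_files = input.dat
output = job.out
error  = job.err
log    = job.log
queue 4
EOT;
file_put_contents("$jobdir/job.sub", $submit);

// condor_submit prints the new cluster id on success.
exec("cd $jobdir && condor_submit job.sub 2>&1", $out, $status);
echo $status === 0 ? "submitted\n" : "submit failed\n";
?>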

Utilization of Idle Cycles
[Condor usage plot: red = total owner, blue = total idle, green = total Condor]

Summary
A large parallel computing facility created at low cost:
– SMBL: a parallel message-passing library that can deal with ephemeral resources
– PPM: a port broker that can handle multiple parallel sessions
SMBL homepage: (Open Source)

Links and References
Hydra Portal:
– (IU users)
– (TeraGrid users)
SMBL home page:
Condor home page:
IU TeraGrid home page:

Links and References (cont’d...)
Parallel fastDNAml:
BLAST:
MEME: