PVM and MPI: What Else is Needed For Cluster Computing?
Al Geist, Oak Ridge National Laboratory, www.csm.ornl.gov/~geist
DAPSYS/EuroPVM-MPI, Balatonfured, Hungary, September 11, 2000

EuroPVM-MPI
Dedicated to the hottest developments of PVM and MPI. PVM and MPI are the most widely used tools for parallel programming, and the hottest trend driving them today is PC clusters running Linux and/or Windows. This talk looks at the gaps in what PVM and MPI provide for cluster computing, what role the GRID may play, and what is happening to fill those gaps.

PVM Latest News
New release this summer – PVM includes:
Optimized msgbox routines – more scalable, more robust
New Beowulf-Linux port – allows clusters to be behind firewalls yet work together
Smart virtual machine startup – automatically determines the reason for "Can't start pvmd"
Works with Windows 2000 – InstallShield version available, improved Win32 communication performance
New third-party PVM software: PythonPVM 0.9, JavaPVM interface, PVM port using the SCI interface
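For readers unfamiliar with the message-box ("msgbox") routines mentioned above, the sketch below shows the general shape of the PVM 3.4 mailbox calls (pvm_putinfo / pvm_recvinfo). It is a minimal illustration written from memory of the PVM 3.4 API, not taken from the slides; the flag constant PvmMboxDefault and the index argument follow that API, and error handling is deliberately simplified.

```c
/* Minimal sketch of the PVM 3.4 message-box ("msgbox") API.
 * Build against libpvm3, e.g.:  cc mbox.c -lpvm3
 * Illustrative only; see the PVM 3.4 man pages for full semantics. */
#include <stdio.h>
#include <pvm3.h>

int main(void)
{
    int mytid = pvm_mytid();                      /* enroll in the virtual machine */

    /* Publish a string under a well-known name in the global mailbox. */
    int bufid = pvm_initsend(PvmDataDefault);
    pvm_pkstr("hello from the msgbox");
    pvm_putinfo("demo-key", bufid, PvmMboxDefault);

    /* Any task in the VM can now look the entry up by name. */
    int recvbuf = pvm_recvinfo("demo-key", 0, PvmMboxDefault);
    if (recvbuf >= 0) {
        char msg[64];
        pvm_setrbuf(recvbuf);                     /* make it the active receive buffer */
        pvm_upkstr(msg);
        printf("task t%x read: %s\n", mytid, msg);
    }

    pvm_delinfo("demo-key", 0, PvmMboxDefault);   /* clean up the mailbox entry */
    pvm_exit();
    return 0;
}
```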

Ten Years of Cluster Computing
[Timeline slide: PVM-1, PVM-2, PVM-3, PVM-3.4 and MPI-1, MPI-2, I-MPI evolving toward Harness and wide-area GRID experiments, as the platform shifts from networks of workstations to PC clusters – "Building a Cluster Computing Environment for the 21st Century".]

TOP500 Trends – The Next 10 Years
[Plot: projected TOP500 performance in GFlop/s from Jun-93 through Nov-09 for the N=1, N=10, and N=500 entries, with ASCI, IBM, and Compaq machines marked; the extrapolation shows a 1 TFlop/s entry point around 2005 and a 1 PFlop/s peak around 2010.] Even the largest machines are clusters.

Trend in Affordable PC Clusters
PC clusters are cost effective from a hardware perspective: many universities and companies can afford 16 to 100 nodes. System administration, however, is an overlooked cost – people are needed to maintain cluster software written for each cluster, and COTS components have higher failure rates. There is presently a lack of tools for managing large clusters.

C3 – Command Line Cluster Toolset
C3 is a command-line toolset for system administration and user-level operations on a single cluster. C3 functions may also be called from a program; C3 is multithreaded and each function executes in parallel across the nodes. Software available.
Functions:
cl_pushimage() – push a system image across the cluster (executable only by the sysadmin)
cl_shutdown() – shut down specified nodes
cl_push() – push files/directories across the cluster
cl_rm() – remove files from multiple nodes
cl_get() – gather cluster files to one location
cl_ps() – return the results of a multi-node ps
cl_kill() – kill an application across the entire cluster
cl_exec() – execute any command across specified nodes
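The slide notes that the C3 functions may also be called from a program. The fragment below is a purely hypothetical sketch of one way to drive the tools from C, by shelling out with system(3); the actual C3 programmatic interface and command-line syntax are not shown on the slide, so the bare "cl_ps" invocation is an assumption made only for illustration.

```c
/* Hypothetical sketch: driving a C3 tool from a program via system(3).
 * The exact C3 command-line syntax is assumed here ("cl_ps" with no
 * arguments) and may not match the real tool. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Run a cluster-wide ps; a nonzero status means the tool was missing
     * or at least one node did not respond. */
    int status = system("cl_ps");
    if (status != 0)
        fprintf(stderr, "cl_ps exited with status %d\n", status);
    return 0;
}
```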

Visualization Using Clusters – Lowering the Cost of High-Performance Graphics
VTK ported to Linux clusters – visualization toolkit making use of the C3 cluster package
AVS Express ported to PC clusters – expensive but standard package asked for by applications; requires an AVS site license to eliminate the cost of individual node licenses
Cumulvs plug-in for AVS Express – combines the interactive visualization and computational steering of Cumulvs with the visualization tools of AVS

M3C Tool Suite
A suite of user-interface tools for system administration and simultaneous monitoring of multiple PC clusters. Written in Java, with web-based remote access and a growing list of plug-in modules:
Reserve nodes within or across clusters
Submit jobs to the queue system
Monitor cluster nodes – a PAPI interface is also being added
Install software on selected nodes
Reboot, shut down, add users, etc. on selected cluster nodes
Display properties of nodes

OSCAR – National Consortium for Cluster Software
OSCAR is a collection of the best-known software for building, programming, and using clusters. The collection effort is led by a national consortium that includes IBM, SGI, Intel, ORNL, NCSA, and MCS Software; other vendors are invited.
Goals: bring uniformity to cluster creation and use, make clusters more broadly acceptable, and foster commercial versions of the cluster software.
For more details see Stephen Scott's talk (Monday 12:00, DAPSYS track).

M3C Tool Architecture – Designed to Work Both Within and Across Organizations
[Architecture diagram: a Java-applet-based M3C GUI front-end talks through a proxy to back-ends at multiple sites (ORNL, UTK, SDSC); each back-end exposes a URL/CGI layer driving C3 scripts and monitors, custom scripts, or third-party scripts on its clusters, with front-end and back-end interfacing through XML files.]
The M3C proxy allows one sysadmin to monitor and update multiple clusters; the M3C GUI allows a user to submit and monitor jobs.

GRID – "Ubiquitous" Computing
The GRID Forum is helping define higher-level services: information services; uniform naming, locating, and allocating of distributed resources; data management and access; single log-on security. MPI and PVM are often seen as lower-level capabilities that GRID frameworks support.
Example frameworks: Globus, NetSolve, Condor, Legion, NEOS, SinRG.

Cumulvs – Collaborative Computational Steering
Recent highlights: release of a new version, development of a CAVE viewer, works with PNL Global Arrays, made CCA compliant (collective port). Cumulvs was the initial reason we started Harness.
Common Component Architecture (CCA): a DOE effort to provide a standard for interoperability of high-performance components developed by many different groups in different languages or frameworks.

HARNESS – Exploring New Capabilities in Heterogeneous Distributed Computing
Goal: building on our experience and success with PVM, create a fundamentally new heterogeneous virtual machine based on three research concepts:
Parallel plug-in environment – extend the concept of a plug-in to the parallel computing world; dynamic, with no restrictions on functions.
Distributed peer-to-peer control – no single point of failure, unlike typical client/server models.
Multiple distributed virtual machines that merge/split – provide a means for short-term sharing of resources and collaboration between teams.

HARNESS Virtual Machine – Scalable Distributed Control and Component-Based Daemon
[Diagram: a virtual machine spanning Hosts A–D, each running a component-based HARNESS daemon (process control, user features); operation within the VM uses distributed control, the daemon is customized and extended by dynamically adding plug-ins, and the VM can merge with or split from another VM.]

HARNESS Latest News – Provide a Practical Environment and Illustrate Extensibility
Harness core (beta release ready; see talk Monday 16:30, Track 1) – task library and Harness daemon software; provides an API to load and unload plug-ins, plus distributed control.
PVM plug-in (stalled over the summer, now back on track) – provides a PVM API veneer to support existing PVM applications.
Fault-tolerant MPI plug-in (see talk Monday 16:50, Track 1) – provides the MPI API for the 30 most-used functions, with semantics adjusted to allow recovery from a corrupted communicator.
VIA communication plug-in (looking at multi-interface transfer) – illustrates how different low-level communication plug-ins can be used within Harness, and provides high performance.

Parallel Plug-in Research for a Heterogeneous Distributed Virtual Machine
One research goal is to understand and implement a dynamic parallel plug-in environment: it provides a method for many users to extend Harness in much the same way that third-party serial plug-ins extend Netscape, Photoshop, and Linux.
Tough research problems include: heterogeneity (which has delayed the 'C' H-core development); synchronization and dynamic installation; and interoperation – between the same plug-in on different tasks, between a task plug-in and a daemon plug-in, and between daemon plug-ins – with partial success so far.
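To make the serial plug-in analogy concrete, here is a minimal sketch of a single process dynamically loading a plug-in with the standard POSIX dlopen interface. The descriptor struct, header, and symbol name are invented for illustration and are not the Harness plug-in interface; the hard parts listed above (synchronization and interoperation across tasks and daemons) are exactly what this single-process sketch leaves out.

```c
/* Hypothetical sketch: dynamic loading of a plug-in in one process using
 * POSIX dlopen/dlsym.  This is NOT the Harness plug-in API; the descriptor
 * struct and symbol name are invented for illustration.
 * Build: cc load_plugin.c -ldl */
#include <stdio.h>
#include <dlfcn.h>

/* Invented descriptor that a plug-in shared object might export. */
struct plugin_desc {
    const char *name;
    int  (*init)(void);
    void (*shutdown)(void);
};

int main(void)
{
    void *handle = dlopen("./example_plugin.so", RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    /* Look up the (hypothetical) exported descriptor symbol. */
    struct plugin_desc *desc = dlsym(handle, "plugin_descriptor");
    if (!desc) {
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
        dlclose(handle);
        return 1;
    }

    printf("loaded plug-in: %s\n", desc->name);
    if (desc->init() == 0)
        desc->shutdown();

    dlclose(handle);   /* a real parallel environment must coordinate unload */
    return 0;
}
```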

Fault-Tolerant MPI – Motivation
As application and machine sizes grow, the MTBF becomes shorter than the application run time. The MPI standard is based on a static model, so any decrease in the number of tasks leads to a corrupted communicator (MPI_COMM_WORLD). The goal is to develop an MPI plug-in that takes advantage of Harness robustness to offer an MPI application a range of recovery alternatives.
Not just another MPI implementation: FT-MPI follows the syntax of the MPI standard, communication performance is on par with MPICH, and it presently uses PVM 3.4.3 fault recovery until Harness is ready.

Fault-Tolerant MPI – Recovery Requires MPI Semantic Changes
The key step in MPI recovery is creating a communicator that the application can use to continue. This is accomplished by modifying the semantics of two MPI functions:
MPI_COMM_CREATE(comm, group, newcomm)
MPI_COMM_SPLIT(comm, color, key, newcomm)
Under the new semantics these create a new communicator that contains all surviving processes, and MPI_COMM_WORLD may be specified as both the input and the output communicator.
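A minimal sketch of the recovery pattern this implies, written against the standard MPI C bindings: after a failure is detected, the application rebuilds its world communicator by splitting the old one, which under FT-MPI's modified semantics yields a communicator of the survivors. How a lost process is detected and reported is not spelled out on the slide, so that step is a placeholder.

```c
/* Sketch of an FT-MPI-style recovery loop using the standard MPI C bindings.
 * The failure-detection step is a placeholder; the FT-MPI-specific behaviour
 * is that, after a failure, MPI_Comm_split over the old world communicator
 * returns a new communicator containing only the surviving processes. */
#include <mpi.h>

static int do_work_step(MPI_Comm comm)
{
    /* Application communication goes here.  Return nonzero if this step
     * detected a failed process (placeholder for the real mechanism). */
    (void)comm;
    return 0;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Comm world = MPI_COMM_WORLD;   /* the communicator the app works with */
    int rank;

    for (;;) {
        MPI_Comm_rank(world, &rank);
        if (do_work_step(world) == 0)
            break;                     /* step completed without failures */

        /* Rebuild the world: with FT-MPI's modified semantics this yields a
         * communicator of all surviving processes, which then replaces the
         * application's world communicator. */
        MPI_Comm_split(world, 0, rank, &world);
    }

    MPI_Finalize();
    return 0;
}
```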

Symmetric Peer-to-Peer Distributed Control – Characteristics
No single point (or set of points) of failure: Harness survives as long as one member still lives. All members know the state of the virtual machine, and their knowledge is kept consistent with respect to the order of state changes (an important parallel programming requirement). No member is more important than any other at any instant, i.e. there isn't a pass-around "control token".

Harness Distributed Control – Two-Phase Arbitration
Harness kernels on each host have an arbitrary priority assigned to them (new kernels are always given the lowest priority), and the VM state is held by each kernel.
[Diagram: a task on one host requests that a new host be added to the virtual machine. Step 1: the request (host/T#/data) is sent to the neighbor in the ring. Step 2: each kernel adds the request to its list of pending changes. The remaining commit phase is shown only graphically.]
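As a reading aid, below is a small sketch of ring-based two-phase arbitration in the spirit of the diagram. It is an interpretation, not Harness code: the slide shows only the propose phase explicitly (circulate the request, each kernel records it as pending), so the commit rule used here – apply the change everywhere once the request has traversed the whole ring – is an assumption, and the kernel priorities that would order concurrent proposals are carried but not exercised.

```c
/* Sketch of ring-based two-phase arbitration for VM state changes.
 * Interpretation of the diagram, not Harness code; the commit rule is an
 * assumption, and only a single proposal is shown (priorities would be
 * used to order concurrent proposals). */
#include <stdio.h>
#include <string.h>

#define RING_SIZE 4

struct change {                 /* a proposed VM state change */
    int  origin;                /* kernel that proposed it */
    int  priority;              /* proposing kernel's priority */
    char desc[64];              /* e.g. "add host X" */
};

struct kernel {
    int           id;
    int           priority;     /* new kernels get the lowest priority */
    struct change pending[16];  /* phase 1: recorded but not yet applied */
    int           npending;
};

static struct kernel ring[RING_SIZE];

/* Phase 1: pass the proposal around the ring; every kernel records it. */
static void propose(const struct change *c)
{
    for (int hop = 0; hop < RING_SIZE; hop++) {
        struct kernel *k = &ring[(c->origin + hop) % RING_SIZE];
        k->pending[k->npending++] = *c;
    }
}

/* Phase 2 (assumed): once the proposal has gone all the way around,
 * every kernel applies it and removes it from its pending list. */
static void commit(const struct change *c)
{
    for (int i = 0; i < RING_SIZE; i++) {
        struct kernel *k = &ring[i];
        for (int j = 0; j < k->npending; j++) {
            if (k->pending[j].origin == c->origin &&
                strcmp(k->pending[j].desc, c->desc) == 0) {
                k->pending[j] = k->pending[--k->npending];
                break;
            }
        }
        printf("kernel %d applies: %s\n", k->id, c->desc);
    }
}

int main(void)
{
    for (int i = 0; i < RING_SIZE; i++)
        ring[i] = (struct kernel){ .id = i, .priority = RING_SIZE - i };

    struct change c = { .origin = 2, .priority = ring[2].priority };
    strcpy(c.desc, "add host newnode01");

    propose(&c);   /* phase 1: everyone records the pending change   */
    commit(&c);    /* phase 2: everyone applies it in the same order */
    return 0;
}
```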

Harness Distributed Control – Control is Scalable, Asynchronous, and Parallel
[Diagram: scalable design with 1 <= S <= P control points, illustrating an "add host" update.] Supports multiple simultaneous updates, fast host adding, fast host delete or recovery from a fault, and parallel recovery from multiple host failures.

For More Information
Follow the links from my web site, www.csm.ornl.gov/~geist. A copy of these slides is also available there.