Cluster Operating System Support For Parallel Autonomic Computing
Andrzej M. Goscinski, J. Silcock, M. Hobbs, School of Information Technology, Deakin University

Presentation transcript:

Cluster Operating System Support For Parallel Autonomic Computing
Andrzej M. Goscinski, J. Silcock, M. Hobbs
School of Information Technology, Deakin University, Geelong, Vic 3217, Australia

A Need for More than Execution Performance
June 2004, COSET’2004
• Performance is a critical assessment criterion
• Security, reliability, and ease of programming are neglected
• Furthermore
  – Parallel computers are seen as being user unfriendly
  – Parallel processing is not used on a daily basis
  – Ordinary users have to be involved in programming activities that are of an operating system nature
  – Ordinary engineers, managers, etc. do not have, and should not have, the specialized knowledge needed to program operating system oriented activities

Aim of Our Research
• IBM has launched a comprehensive program
  – “to re-examine an obsession with faster, smaller, and more powerful”
  – “to look at the evolution of computing from a more holistic perspective”
• IBM’s Autonomic Computing is one of the Grand Challenges
• Parallel processing on non-dedicated clusters could benefit from the Autonomic Computing vision
• Aim: to show a general design of services and an initial implementation of a system that moves parallel processing on clusters to the computing mainstream using the Autonomic Computing vision

IBM’s Autonomic Computing
• The name “autonomic” has not caught on everywhere, if only because it’s IBM’s
  – Microsoft prefers “trustworthy”
  – Others prefer the more generic “self-managing”
• Many see “autonomic computing” as one of the basic parts of a revolutionary technology that
  – Will start the new .com boom
  – Will move parallel computing on clusters to the computing mainstream

IBM’s Autonomic Computing
• Characteristics of an autonomic computing system
  – knows itself
  – configures and reconfigures itself under varying and unpredictable conditions
  – optimizes its working
  – performs something akin to healing
  – provides self-protection
  – knows its surrounding environment
  – exists in an open (non-hermetic) environment
  – anticipates the optimized resources needed while keeping its complexity hidden

Related Work
• A number of projects related to Autonomic Computing are mentioned on the IBM website
• While many of the reported projects engage in some aspects of Autonomic Computing, none engages in research to develop a system that has all eight of the required characteristics
• None of the projects addresses parallel processing, in particular parallel processing on non-dedicated clusters

Design of Autonomic Elements (Services) Providing Autonomic Computing on Non-dedicated Clusters
• We have proposed and designed a set of autonomic elements that must be provided to develop an autonomic computing environment on a non-dedicated cluster
• Three component levels
  – Services
  – Computers
  – Non-dedicated cluster
• Note: we have not addressed
  – Hardware aspects
  – Administration aspects

Cluster Knows Itself
• A need for resource discovery
• This autonomic element runs on each computer
• Activities
  – Acquires knowledge of static parameters of computers
    ▪ processor type (e.g., speed)
    ▪ memory size
    ▪ available software
  – Acquires knowledge of dynamic parameters of clusters
    ▪ computers’ load
    ▪ available memory
    ▪ communication pattern and volume
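The split between static and dynamic parameters can be illustrated with a minimal Python sketch. It is not the Holos implementation: the `ResourceSnapshot` record and the `discover` function are hypothetical names, the operating system name stands in for "available software", and memory size and communication volume are left out because the standard library does not expose them portably.

```python
import os
import platform
from dataclasses import dataclass, field


@dataclass
class ResourceSnapshot:
    """Hypothetical record kept by a resource discovery element for one computer."""
    static: dict = field(default_factory=dict)   # rarely changes
    dynamic: dict = field(default_factory=dict)  # re-sampled periodically


def discover() -> ResourceSnapshot:
    """Collect static and dynamic parameters of the local computer."""
    static = {
        "processor": platform.machine(),     # processor type
        "cpu_count": os.cpu_count() or 1,    # cpu_count() may return None
        "os": platform.system(),             # stand-in for "available software"
    }
    try:
        # 1-minute load average; only available on Unix-like systems.
        load = os.getloadavg()[0]
    except (AttributeError, OSError):
        load = None                          # unknown on this platform
    return ResourceSnapshot(static=static, dynamic={"load_1min": load})


if __name__ == "__main__":
    print(discover())
```

In a cluster setting, one such element per computer would re-sample the dynamic part periodically and report it to its peers.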

Resource Discovery Service Design
[Figure. Labels: Computer i and Computer j, each with a Resource Discovery element, CPU, Main Memory and computation elements; Computational Load & Parameters; Local Communication Load; Remote Communication Load; Communication Pattern & Load.]

Cluster Configures and Reconfigures Itself under Varying and Unpredictable Conditions
• In a non-dedicated cluster there are times when
  – Some computers are lightly loaded or idle
  – Some computers cannot be used because
    ▪ their owners have removed them from the shared pool of resources
    ▪ they are heavily loaded
• To offer high availability, i.e., to configure and reconfigure itself, the system
  – Forms parallel virtual clusters adaptively and dynamically
  – Bases the forming on load and changing resources
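As a rough illustration of forming a virtual parallel cluster from load and availability, the sketch below filters a pool of computers. The `Computer` record, the `form_virtual_cluster` function and the `max_load` threshold are assumptions made for the example, not part of Holos.

```python
from dataclasses import dataclass


@dataclass
class Computer:
    name: str
    load: float      # dynamic load reported by resource discovery (0.0 = idle)
    in_pool: bool    # False when the owner has withdrawn the machine


def form_virtual_cluster(computers, max_load=0.5):
    """Return the computers usable right now, least loaded first."""
    usable = [c for c in computers if c.in_pool and c.load <= max_load]
    return sorted(usable, key=lambda c: c.load)


pool = [
    Computer("a", load=0.1, in_pool=True),
    Computer("b", load=0.9, in_pool=True),    # heavily loaded
    Computer("c", load=0.0, in_pool=False),   # removed by its owner
    Computer("d", load=0.3, in_pool=True),
]
print([c.name for c in form_virtual_cluster(pool)])
```

Calling the function again after loads change yields a different virtual cluster, which is the adaptive behaviour the slide describes.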

Availability Service Design
[Figure. Labels: Availability Services; RD (Resource Discovery); Virtual Parallel Cluster (t0), Virtual Parallel Cluster (t1), Virtual Parallel Cluster (t2), Virtual Parallel Cluster (t3), where times t0 < t1 < t2 < t3.]

Cluster Should Optimize Its Working
• Application computation elements should be placed optimally
• To improve performance there is a need for knowledge of
  – Computation load
  – Available memory
  – Communication costs
• To optimize the cluster’s working there is
  – Static allocation and load balancing
  – Ability to change performance indices that reflect user objectives
  – Computation element migration, creation and duplication
  – Setting of computation priorities of applications
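Static allocation can be sketched with a simple greedy heuristic: place the heaviest computation element first, always on the currently least-loaded computer. This heuristic is an assumption chosen for illustration, not the Holos scheduling policy, and it considers computation load only (memory and communication costs are ignored). The function name `allocate` and the cost figures are hypothetical.

```python
import heapq


def allocate(process_costs, computer_loads):
    """Greedy static allocation: heaviest process first, onto the least-loaded computer.

    process_costs: {process_name: estimated computation cost}
    computer_loads: {computer_name: current load}
    Returns {process_name: computer_name}.
    """
    # Min-heap of (load, computer); ties are broken by computer name.
    heap = [(load, name) for name, load in computer_loads.items()]
    heapq.heapify(heap)
    placement = {}
    for proc, cost in sorted(process_costs.items(), key=lambda kv: -kv[1]):
        load, name = heapq.heappop(heap)
        placement[proc] = name
        heapq.heappush(heap, (load + cost, name))   # account for the new work
    return placement


if __name__ == "__main__":
    print(allocate({"P1": 4, "P2": 3, "P3": 2}, {"C1": 0, "C2": 1}))
```

Load balancing would then correct this initial placement at run time, for example by migrating a computation element when loads drift apart.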

High Performance Service Design
[Figure. Labels: Virtual Parallel Cluster with computers C1, C2, C3, ..., Cn and processes P1, P2, Pi, Pj; Availability Services; Global Scheduler (Static Allocation, Load Balancing); Migration. Annotations: {where: P1 → C1, P2 → C2, ..., {Pi, Pj} → Cn} and {where, which, when: Pi: Cn → C3}.]

Cluster Should Perform Something Akin To Healing
• Hardware and software faults can occur
• Failures lead to the termination of computations
• To provide something akin to healing
  – Faults are identified and reported
  – Checkpointing of the parallel computation elements of applications is provided
  – Recovery from failures is employed
  – Migrating applications from faulty computers to healthy computers is carried out automatically
  – Redundant/replicated services are provided
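The checkpoint-then-recover cycle can be shown with a single-process Python sketch. It is a deliberate simplification: real parallel applications need coordinated checkpoints across computation elements, which this example does not attempt. The function names and the state layout are hypothetical.

```python
import os
import pickle
import tempfile


def checkpoint(state, path):
    """Write state to disk atomically so a crash never leaves a partial checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)   # atomic rename on the same filesystem


def recover(path, initial):
    """Return the last checkpointed state, or the initial state if none exists."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return initial


def run(path, steps, crash_at=None):
    """Sum 0..steps-1, checkpointing after each step; optionally simulate a crash."""
    state = recover(path, {"i": 0, "total": 0})
    while state["i"] < steps:
        if crash_at is not None and state["i"] == crash_at:
            raise RuntimeError("simulated crash")
        state["total"] += state["i"]
        state["i"] += 1
        checkpoint(state, path)
    return state["total"]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "element_i.ckpt")
    try:
        run(path, steps=10, crash_at=5)
    except RuntimeError:
        pass                       # the computation element "crashed"
    print(run(path, steps=10))     # resumes from the last checkpoint
```

Because the second call starts from the saved state rather than from zero, the work done before the crash is not repeated.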

Self-Healing Service Design
[Figure. Labels: Computation Element i; Checkpointing (coordinated); Recovery; Checkpoint for Computation Element i (shown several times, including on Disk); Computation Element i after crash recovery; computers C1, C2, Cj, Ck.]

Clusters Should Provide Self-Protection
• Computation elements of parallel applications are distributed
• Computation elements communicate using messages
• They are subject to passive and active attacks
• To provide self-protection:
  – Virus detection and recovery must be offered
  – Resource protection should be a mandatory service
  – Encryption, as a countermeasure against passive attacks, should be used
  – Authentication, as a countermeasure against active attacks, should be used
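Message authentication against active attacks can be sketched with an HMAC from the Python standard library. This is a generic technique chosen for illustration, not the mechanism used in Holos; it assumes the two computation elements already share a secret key, and it does not cover encryption, which needs a library outside the standard distribution.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> bytes:
    """Return an HMAC-SHA256 tag for the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check the tag in constant time to avoid leaking timing information."""
    return hmac.compare_digest(sign(key, message), tag)


if __name__ == "__main__":
    key = b"shared-secret"                 # hypothetical pre-shared key
    tag = sign(key, b"migrate P1 to C3")
    print(verify(key, b"migrate P1 to C3", tag))   # genuine message
    print(verify(key, b"migrate P1 to C9", tag))   # tampered message
```

A receiver that rejects any message whose tag fails verification cannot be fooled by a message altered or forged in transit by a party without the key.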

To Allow a System to Know Its Surrounding Environment and to Prevent a System From Existing in a Hermetic Environment
• There are applications that require
  – More computation power
  – Specialized software
  – Unique peripheral devices, etc.
• Many owners cannot afford such resources
• Some owners can offer their services and resources to appropriate users

To Allow a System to Know Its Surrounding Environment and to Prevent a System From Existing in a Hermetic Environment
• To benefit from existing unique resources
  – Resource discovery of other clusters is provided
  – Advertising of services is in place
  – Systems are able to cooperate
  – Negotiation is in use
  – Brokerage of resources and services is used
  – Resources are shared in a distributed manner
  – “The move toward a grid” should be in place

Grid-like Service Design
[Figure. Labels: Brokerage Services on Cluster 1, Cluster 2, Cluster 3, ..., Cluster n; Computational Services; Storage/Memory Services; Printer Services; Information Services; Advertisement; Exporting Services; Withdrawal Services; Import Requests.]
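The exporting, withdrawal and import operations named in the brokerage design can be pictured with a small in-memory registry. This is only a sketch under strong assumptions: the `Broker` class and its method names are hypothetical, there is a single registry rather than cooperating brokers on each cluster, and negotiation is not modelled.

```python
class Broker:
    """Hypothetical in-memory registry of services exported by clusters."""

    def __init__(self):
        self._offers = {}   # service name -> set of exporting cluster names

    def export(self, cluster, service):
        """Advertise that a cluster offers a service."""
        self._offers.setdefault(service, set()).add(cluster)

    def withdraw(self, cluster, service):
        """Revoke a previously exported service; unknown offers are ignored."""
        providers = self._offers.get(service, set())
        providers.discard(cluster)
        if not providers:
            self._offers.pop(service, None)

    def import_request(self, service):
        """Return the clusters currently offering the service, sorted by name."""
        return sorted(self._offers.get(service, set()))


if __name__ == "__main__":
    broker = Broker()
    broker.export("cluster1", "printer")
    broker.export("cluster2", "printer")
    broker.withdraw("cluster1", "printer")
    print(broker.import_request("printer"))
```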

A Cluster Should Anticipate the Optimized Resources Needed While Keeping Its Complexity Hidden
• The scarcity of software to assist ordinary programmers limits the harnessing of the computing power of non-dedicated clusters
• This implies
  – A programming environment that is simple to use
  – Knowledge of resource distribution is not needed
  – Message passing and shared memory programming are supported transparently

Easy Programming Service Design
[Figure. Labels: Programming Environment (Shared Memory; Message Passing or PVM / MPI); DSM; Communication Primitives; System Services of an Operating System; Kernel Services of an Operating System.]

The Holos Services for Autonomic Computing Clusters
• Holos is built to demonstrate that it is possible to develop an autonomic non-dedicated cluster that
  – could be routinely employed by ordinary engineers, managers, etc.
  – is able to support next generation application software executing on clusters
• We followed IBM’s recommendations regarding autonomic elements
• We decided to view autonomic elements as processes
  – Each computer is a multi-process system with its objectives
  – A cluster is a set of multi-process systems with its objectives

Holos
[Figure. Labels: System Servers; Kernel Servers; Global Scheduler; Execution Server; Migration Server; Checkpoint Server; Resource Discovery Server; DSM Server; Brokerage Server; IPC Server; Process Manage Server; Space Manage Server; GENESIS Microkernel; Parallel Processes; MP / PVM / MPI Process; DSM Process.]
• Holos was developed based on the P2P and microkernel paradigms
• The microkernel provides services such as
  – local IPC
  – basic paging operations
  – interrupt handling
  – context switching
• Three groups of processes:
  – kernel servers
  – system servers
  – application processes
• Kernel and system servers are stationary, application processes are mobile
• All processes communicate using messages

System Servers Form a Basis of an Autonomic Operating System for Non-dedicated Clusters
• Resource Discovery Server: collects data about computation and communication load
• Availability Server: dynamically and adaptively forms a parallel virtual cluster for the application
• Global Scheduling Server: maps application processes onto the computers of the virtual parallel cluster using static allocation and dynamic load balancing

System Servers Form a Basis of an Autonomic Operating System for Non-dedicated Clusters
• Execution Server: coordinates the single, multiple and group creation and duplication of application processes on both local and remote computers
• Migration Server: coordinates moving application processes to other computers
• DSM Server: hides the distributed nature of the cluster’s memory and allows writing code as though using physically shared memory

System Servers Form a Basis of an Autonomic Operating System for Non-dedicated Clusters
• Checkpoint Server: coordinates creation of checkpoints for an executing application
• Fault Recovery Server: recovers application processes and applications using checkpoints
• IAC Server: supports remote interprocess communication and group communication within sets of application processes
• Brokerage Server: supports advertising and sharing services through service exporting, importing and revoking

Holos Possesses the Autonomic Computing Characteristics
Autonomic Computing Requirement | Cooperating Holos Servers (Relationships Among Autonomic Elements)
To allow a system to know itself | Resource Discovery Server
A system must configure and reconfigure itself under varying and unpredictable conditions | Resource Discovery Server, Global Scheduling Server, Migration Server, Execution Server, and Availability Server
A system must optimize its working | Global Scheduling Server, Migration Server, and Execution Server
A system must perform something akin to healing | Checkpoint Server, Recovery Server, Migration Server, Global Scheduling Server
A system must provide self-protection | Capabilities in the form of System Names
A system must know its surrounding environment | Resource Discovery Server and Brokerage Server
A system cannot exist in a hermetic environment | Interprocess Communication Server and Brokerage Server
A system must anticipate the optimized resources needed while keeping its complexity hidden (most critical for the user) | DSM Server, Execution Server, DSM Programming Environment, Message Passing Programming Environment, PVM/MPI Programming Environment

Conclusion
• Autonomic computing has been shown to be a basic part of a revolutionary technology that
  – Could move parallel computing on non-dedicated clusters to the computing mainstream
  – (Whether it will start the new .com boom is still to be shown)
• The development of the Holos cluster operating system demonstrates that it is possible to build an autonomic non-dedicated cluster
• The Holos cluster operating system has been built from scratch