Presentation transcript:

Software Tools for Dynamic Resource Management
Irina V. Shoshmina, Dmitry Yu. Malashonok, Sergey Yu. Romanov
Institute of High-Performance Computing and Information Systems

State of the art

Resources:
- CONVEX(es)
- Parsytec CC/16
- Parsytec CCid
- Parsytec Power Mouse System
- SPP1600
- SGI OCTANE workstations
- Sun Ultra 450
- Paritet (Intel cluster)

Scientific problems:
- hydro- and aerodynamics
- plasma physics
- nuclear physics
- medicine
- biology
- chemistry
- astronomy

Difficulties
- shortage of resources for the scientific problems being solved
- unsatisfactory task management (the majority of tasks are parallel)

Shortage of resources

Proposed remedy: integrate the computational resources of several scientific centres.

Advantages of integration:
- wider access to, and more active use of, computational resources
- closer integration of the scientific community
- a broader range of scientific and technical problems that can be solved

Management of tasks

Goal: optimisation of task distribution across computational nodes.

Existing tools:
- Codine
- Sun Grid Engine
- PBS
- Condor

Disadvantages of these tools:
- weak support for migration of parallel tasks
- unsatisfactory load balancing
- dependence on particular versions of PVM and MPI

Main goals of the project:
- increase the efficiency of use of computing resources
- improve the quality of service for users

Main tasks:
- migration of parallel tasks
- optimisation of distributed resource management
- integration of the resources of several scientific centres

Dynamite

Software developed at the University of Amsterdam within the Esprit project.

Dynamite advantages:
- migration and checkpointing of PVM tasks
- automatic workload balancing of PVM tasks (on a cluster of workstations)
- migration of dynamically linked tasks
- migration of communication end points
- reallocation of tasks
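For context, a minimal sketch of the kind of PVM master/worker program that Dynamite can migrate and checkpoint; the "worker" executable name and the message tag are hypothetical, and only the master side is shown:

    #include <stdio.h>
    #include <pvm3.h>

    #define TAG_RESULT 1  /* hypothetical message tag */

    int main(void) {
        int tids[4], i, part, sum = 0;

        /* spawn four copies of a (hypothetical) "worker" executable */
        pvm_spawn("worker", NULL, PvmTaskDefault, "", 4, tids);

        /* collect one integer result from each worker; Dynamite can
           migrate a worker to another node while the master waits */
        for (i = 0; i < 4; i++) {
            pvm_recv(-1, TAG_RESULT);
            pvm_upkint(&part, 1, 1);
            sum += part;
        }
        printf("sum = %d\n", sum);
        pvm_exit();
        return 0;
    }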

Dynamite disadvantages:
- dependence on particular PVM versions
- no migration of MPI tasks
- no satisfactory monitoring system
- no advanced scheduling system
- no modules for global distribution

Main steps of the project:
- migration of MPI and PVM tasks
- checkpointing of parallel tasks
- monitoring
- resource management
- additional architectures

Two-level system
[diagram: a global level coordinating several local levels]
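As an illustration only (not the project's actual data structures), the two levels can be modelled roughly as follows: the global level chooses among clusters, and each cluster's local level chooses among nodes. All type and function names here are hypothetical:

    /* hypothetical model of the two-level scheme */
    typedef struct { int id; double load; } Node;

    typedef struct {          /* local level: one cluster */
        Node *nodes;
        int   nnodes;         /* assumed >= 1 */
    } Cluster;

    typedef struct {          /* global level: several centres */
        Cluster *clusters;
        int      nclusters;
    } Environment;

    /* global level: pick the cluster with the lowest average load */
    int pick_cluster(const Environment *env) {
        int best = 0;
        double best_avg = 1e30;
        for (int c = 0; c < env->nclusters; c++) {
            double avg = 0;
            for (int n = 0; n < env->clusters[c].nnodes; n++)
                avg += env->clusters[c].nodes[n].load;
            avg /= env->clusters[c].nnodes;
            if (avg < best_avg) { best_avg = avg; best = c; }
        }
        return best;
    }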

Main problems of migration:
- migration of PVM tasks
- migration of MPI tasks
- independence from the versions and implementations of PVM and MPI
- support for additional architectures

Process state that must move with a PVM or MPI task:
- open files (see the sketch below)
- sockets
- kernel-supported threads, etc.
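To make the file problem concrete: a migrated process must reopen its files on the destination node and seek back to the old position. A minimal sketch, assuming the checkpointer records a path, open flags, and offset per descriptor; all names are illustrative:

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    /* illustrative record of one open file, saved at checkpoint time */
    struct file_state {
        char  path[256];
        int   flags;    /* flags originally passed to open() */
        off_t offset;   /* position at the moment of checkpoint */
    };

    /* before migration: capture the current file position */
    void save_file(int fd, const char *path, int flags, struct file_state *fs) {
        strncpy(fs->path, path, sizeof fs->path - 1);
        fs->path[sizeof fs->path - 1] = '\0';
        fs->flags  = flags;
        fs->offset = lseek(fd, 0, SEEK_CUR);
    }

    /* after restart on the destination node: reopen and reposition */
    int restore_file(const struct file_state *fs) {
        int fd = open(fs->path, fs->flags);
        if (fd >= 0)
            lseek(fd, fs->offset, SEEK_SET);
        return fd;
    }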

Checkpointing of parallel tasks:
- trace the progress of parallel tasks
- migrate parallel tasks at two levels:
  - migrate one process of a parallel task (local level)
  - migrate a parallel task as a whole (global level)
- handle exceptional situations
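A minimal user-level sketch of checkpointing one process on request. Dynamite itself checkpoints transparently at the library level; the state variables, signal choice, and file name below are hypothetical:

    #include <signal.h>
    #include <stdio.h>

    /* hypothetical application state that must survive migration */
    static double grid[1024];
    static int    iteration;
    static volatile sig_atomic_t ckpt_requested;

    static void on_ckpt(int sig) { (void)sig; ckpt_requested = 1; }

    static void write_checkpoint(void) {
        FILE *f = fopen("task.ckpt", "wb");
        if (!f) return;
        fwrite(&iteration, sizeof iteration, 1, f);
        fwrite(grid, sizeof grid, 1, f);
        fclose(f);
    }

    int main(void) {
        signal(SIGUSR1, on_ckpt);   /* scheduler asks for a checkpoint */
        for (iteration = 0; iteration < 100000; iteration++) {
            /* ... one step of the computation ... */
            if (ckpt_requested) {
                write_checkpoint();
                ckpt_requested = 0;
            }
        }
        return 0;
    }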

Checkpointing of parallel tasks
[diagram: interaction between the global level and the local level]

Monitoring

Monitored parameters:
- computational resources (processor load, memory, network)
- tasks and queues
- users
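For instance, the per-node resource parameters can be sampled from the Linux /proc filesystem. A minimal sketch (Linux-specific, and covering only two of the parameters such a monitor would collect):

    #include <stdio.h>

    int main(void) {
        double load1;
        char line[128];
        long kb;
        FILE *f;

        /* 1-minute load average */
        f = fopen("/proc/loadavg", "r");
        if (f) {
            if (fscanf(f, "%lf", &load1) == 1)
                printf("load (1 min): %.2f\n", load1);
            fclose(f);
        }

        /* free memory */
        f = fopen("/proc/meminfo", "r");
        if (f) {
            while (fgets(line, sizeof line, f))
                if (sscanf(line, "MemFree: %ld kB", &kb) == 1)
                    printf("free memory: %ld kB\n", kb);
            fclose(f);
        }
        return 0;
    }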

Resource management:
- immediate distribution of tasks and queues
- long-term scheduling
- dynamic load balancing at the global and local levels
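One simple dynamic load-balancing policy, as an illustration rather than the project's actual algorithm: send the next task to the least-loaded node, and migrate a running task only when the imbalance between two nodes exceeds a threshold (so the gain outweighs the migration cost):

    #include <stddef.h>

    /* pick the least-loaded node for the next task (greedy heuristic) */
    int least_loaded(const double *load, size_t nnodes) {
        size_t best = 0, i;
        for (i = 1; i < nnodes; i++)
            if (load[i] < load[best])
                best = i;
        return (int)best;
    }

    /* migrate only when the imbalance exceeds a tunable threshold */
    int should_migrate(double src_load, double dst_load, double threshold) {
        return src_load - dst_load > threshold;
    }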

Integration with Globus
[diagram: a global environment built on Globus, connecting the local levels]