Performance Evaluation of Load Sharing Policies with PANTS on a Beowulf Cluster. James Nichols, Mark Claypool. Worcester Polytechnic Institute, Department of Computer Science, Worcester, MA.


1 Performance Evaluation of Load Sharing Policies with PANTS on a Beowulf Cluster James Nichols Mark Claypool Worcester Polytechnic Institute Department of Computer Science Worcester, MA

2 Introduction
What is a Beowulf cluster? A cluster of inexpensive personal computers networked together via Ethernet, typically running the Linux operating system.
Load sharing spreads work across nodes, decreasing response times and increasing overall throughput.
Drawback: load sharing typically requires expertise in a particular load distribution mechanism such as PVM or MPI.

3 Introduction
Load measurement typically uses CPU as the load metric. What about disk and memory load? Or system events like interrupts and context switches?
PANTS: PANTS Application Node Transparency System. Removes the need for the implementation-specific knowledge required by some load distribution mechanisms.

4 Contributions
Propose new load metrics.
Design benchmarks.
Evaluate performance.
Thesis: there is some benefit to incorporating new types of load metrics into load distribution systems like PANTS.

5 Outline Introduction PANTS Methodology Results Conclusions

6 PANTS
PANTS: PANTS Application Node Transparency System.
Intercepts exec() system calls.
By default, uses the /proc file system to calculate CPU load and classify a node as "busy" or "free".
Any workload which does not generate CPU load will not be distributed, hence the need for new load metrics and policies!
Early results showed near-linear speedup for computationally intensive applications.
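The default policy can be sketched by sampling the aggregate "cpu" line of /proc/stat twice and computing the non-idle fraction of the interval. A minimal Python illustration (the function names are assumptions for this sketch, not PANTS source code; the 95% threshold matches the value reported later in the talk):

```python
def cpu_utilization(prev, curr):
    """Non-idle fraction of time between two /proc/stat "cpu" samples.

    Each sample is a (user, nice, system, idle) tuple of jiffy counters,
    as read from the aggregate "cpu" line of /proc/stat.
    """
    deltas = [c - p for c, p in zip(curr, prev)]
    total = sum(deltas)
    if total == 0:
        return 0.0
    return (total - deltas[3]) / total  # deltas[3] is idle time

def is_busy(prev, curr, threshold=0.95):
    """Classify a node as "busy" when CPU utilization exceeds the threshold."""
    return cpu_utilization(prev, curr) > threshold

# 100 jiffies elapsed, 98 of them non-idle: busy under a 95% threshold
print(is_busy((0, 0, 0, 0), (50, 0, 48, 2)))  # True
```

Note the limitation the slide points out: a job that is blocked on disk or memory barely moves these counters, so it would never look "busy" to this metric.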

7 PANTS Algorithm We have implemented a variation of the multi-leader load-balancing algorithm proposed in [FW95] A node is elected to be the leader Leader keeps track of which machines in the cluster are free

8 PANTS Algorithm
A random free node is chosen by the leader and returned to a node upon request.
Some algorithms use broadcast messages to find free nodes and communicate; busy nodes must then receive and process all of the messages.
PANTS avoids such "busy-machine messages" by sending messages only to the leader multicast address.
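The leader's bookkeeping reduces to a set of free nodes plus a random pick on request. A hypothetical Python sketch (the real leader receives status updates as multicast messages, not method calls; class and method names are illustrative):

```python
import random

class Leader:
    """Tracks which cluster nodes are free. Busy nodes talk only to the
    leader, so no broadcast traffic reaches already-loaded machines."""

    def __init__(self, nodes):
        self.free = set(nodes)

    def mark(self, node, busy):
        """Update a node's status when its message reaches the leader."""
        if busy:
            self.free.discard(node)
        else:
            self.free.add(node)

    def pick_free_node(self):
        """Return a random free node, or None if the whole cluster is busy."""
        return random.choice(sorted(self.free)) if self.free else None

leader = Leader(["node1", "node2", "node3"])
leader.mark("node2", busy=True)
print(leader.pick_free_node())  # "node1" or "node3"
```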

9 PANTS Multicast Communication

10 PANTS Implementation
Two major software components: the PANTS daemon and prex (PANTS remote execute).
A C-library object intercepts execve() for processing by prex.
prex queries the PANTS daemon for a node on which to execute the process; the daemon handles load measurement and leader communication.
prex uses RSH to execute processes on remote nodes.
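The prex dispatch path amounts to: ask the daemon for a target node, run locally if the answer is this node, otherwise re-issue the command over rsh. A sketch of just that decision, with the daemon query left out (the function name and argument handling are assumptions):

```python
def build_command(argv, target, this_node="localhost"):
    """Run locally if the daemon picked this node, else wrap the
    command in an rsh invocation to the chosen remote node."""
    if target == this_node:
        return argv
    return ["rsh", target, " ".join(argv)]

print(build_command(["gcc", "-c", "main.c"], "node3"))
# ['rsh', 'node3', 'gcc -c main.c']
```

This is also why, as noted later in the talk, relative paths must be expanded to absolute ones: the rsh-launched process runs with the remote node's idea of the working directory.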

11 PREX

12 Extensions to PANTS
PANTS is compatible with DIPC, the distributed interprocess communication package. DIPC requires some code modifications, but provides a wide range of IPC primitives.
Modify the PANTS daemon configuration by altering /etc/pantsd.conf; send Unix signals to make the daemon use the new configuration.
Easily modify thresholds, exponential weighted averaging settings, and multicast addresses.
Wide range of logging options.
New load metrics and policies.
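The exponential weighted averaging mentioned here smooths raw load samples so a momentary spike does not flip a node between "busy" and "free". A sketch of the averaging itself (the weight 0.3 is a made-up example, not the daemon's default):

```python
def ewma(prev_avg, sample, alpha=0.3):
    """Exponentially weighted moving average of a load metric: a larger
    alpha reacts faster to new samples, a smaller one damps spikes."""
    return alpha * sample + (1 - alpha) * prev_avg

avg = 0.0
for sample in [1.0, 1.0, 1.0]:   # a sustained burst of full load
    avg = ewma(avg, sample)
print(round(avg, 3))  # 0.657: the average lags the raw samples
```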

13 WPI's Beowulf Cluster
Made possible through equipment grants from Compaq and Alpha Processor, Inc.
Seven 600 MHz Alpha machines (EV56)
Physical memory from MB; 128 MB swap space
PCI Ultra-Wide SCSI hard drives
100BaseT Ethernet
Red Hat 7.1, Linux kernel
Files shared over NFS

14 Performance Evaluation of Load Sharing Policies with PANTS on a Beowulf Cluster Introduction PANTS Methodology Results Conclusions

15 Methodology
Identified load parameters.
Implemented ways to measure the parameters.
Built micro-benchmarks which stress each load metric, for testing and verification.
Selected a real-world benchmark to evaluate performance.

16 Load Metrics
Metric             Unit
CPU                usage (%)
I/O                blocks/sec
Context switches   switches/sec
Memory             page operations/sec
Interrupts         interrupts/sec
All metrics are read from /proc/stat.

17 Micro-Benchmarks
A set of simple benchmarks, each designed to generate a certain type of workload.
Used for verification of our load metrics and determination of thresholds.
CPU: perform many FLOPS.
I/O: copy a large directory and files.
Memory: malloc() a block of memory, copy data structures using mmap().
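The CPU micro-benchmark, for instance, can be as simple as a tight floating-point loop; a sketch (the iteration count is arbitrary, and the real benchmarks were likely C programs):

```python
import time

def cpu_benchmark(iterations=1_000_000):
    """Burn floating-point operations and report elapsed wall-clock time."""
    start = time.perf_counter()
    acc = 0.0
    for _ in range(iterations):
        acc += 1.0001 * 1.0001     # the FLOPs being measured
    return time.perf_counter() - start

print(cpu_benchmark(100_000) > 0.0)  # True
```

The I/O and memory benchmarks follow the same pattern: a loop that stresses exactly one resource, so each load metric can be verified in isolation.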

18 Application benchmark: Linux kernel compile Distributed compilation of the Linux kernel Executed by the standard GNU program make Loads I/O and memory resources

19 Application Benchmark: Details
Linux kernel version; files compiled
Mean source file size: 19 KB
Marked gcc compiler binaries migratable
Needed to expand relative paths into absolute paths

20 Application Benchmark

21 Application Benchmark

22 Thresholds
Obtained idle measurements, then iteratively established thresholds.
Metric                            Idle      Threshold
CPU (%)                           0%        95%
I/O (blocks/sec)                  250       1,000
Context switches (switches/sec)   950       6,000
Memory (pages/sec)                0         4,000
Interrupts (interrupts/sec)       103K      115K
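With idle baselines and thresholds in hand, the busy/free decision generalizes naturally to: a node is busy if any metric exceeds its threshold. A sketch (the dictionary keys are illustrative; the threshold values are the ones from the table):

```python
THRESHOLDS = {                 # per-second rates, except cpu (a fraction)
    "cpu": 0.95,
    "io_blocks": 1000,
    "ctxt_switches": 6000,
    "mem_pages": 4000,
    "interrupts": 115_000,
}

def node_is_busy(metrics, thresholds=THRESHOLDS):
    """A node is busy if any measured rate crosses its threshold."""
    return any(metrics.get(k, 0) > t for k, t in thresholds.items())

idle = {"cpu": 0.0, "io_blocks": 250, "ctxt_switches": 950,
        "mem_pages": 0, "interrupts": 103_000}
print(node_is_busy(idle))                          # False
print(node_is_busy({**idle, "io_blocks": 1500}))   # True
```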

23 Performance Evaluation of Load Sharing Policies with PANTS on a Beowulf Cluster Introduction PANTS Methodology Results Conclusions

24 Micro-benchmarks Results Default Load Metrics

25 Micro-benchmarks Results New Load Metrics

26 Application Benchmark Results

27 I/O Load

28 Results: Compile Time

29 Conclusions
PANTS has several attractive features: transparency, reduced busy-node communication, fault tolerance, and intelligent load distribution decisions.
PANTS achieves better throughput and more balanced load distribution when its metrics include I/O, memory, interrupts, and context switches.

30 Future Work
Use preemptive migration?
Include a network usage load metric.
Adaptive thresholds.
Heuristic-based load distribution: migrate certain types of jobs to nodes that perform well when processing those types of workloads.

31 Questions? Acknowledgements Jeffrey Moyer, Kevin Dickson, Chuck Homic, Bryan Villamin, Michael Szelag, David Terry, Jennifer Waite, Seth Chandler, David Finkel, Alpha Processor, Inc. and Compaq Computer Corporation

32 Performance Evaluation of Load Sharing Policies with PANTS on a Beowulf Cluster James Nichols Mark Claypool Worcester Polytechnic Institute Department of Computer Science Worcester, MA