Dynamic Monitoring and Tuning in Multicluster Environment
Genaro Costa, Anna Morajko, Paola Caymes Scutari, Tomàs Margalef and Emilio Luque
Universitat Autònoma de Barcelona
Paradyn Week 2006, March 2006

Slide 2: Outline
 Introduction
 Multicluster Systems
 Applications on Wide Systems
 MATE
 New Requirements
 Design
 Conclusions

Slide 3: Introduction
System performance
 New problems require more computation power, and performance is a key issue.
 New wide systems are built over the available resources, and the user does not have total control over where the application will run.
 Reaching high performance and efficiency on these wide systems has become more difficult.

Slide 4: Introduction (II)
 To reach performance goals, users need to find and solve bottlenecks.
 Dynamic monitoring and tuning is a promising approach.
 Because the system's properties change dynamically, efficient resource use is hard to achieve even for expert users.

Slide 5: Multicluster Systems
 New systems are built from existing resources; examples are NOWs (networks of workstations) and HNOWs (heterogeneous NOWs) linked by multistage network interconnections.
 Intra-cluster communications have different latencies than inter-cluster communications.
 Generally, multiclusters are built of homogeneous or heterogeneous clusters interconnected by a WAN.

Slide 6: Multicluster Systems (II)
 Each cluster can have its own scheduler and can be exposed either through a head node or through all of its nodes.

Slide 7: Applications on Wide Systems
Hierarchical Master/Worker Applications
 Raise the possibility of performance bottlenecks:
   Load imbalance
   Inefficient resource use
   Non-deterministic inter-cluster bandwidth
[Figure: a master distributes work through sub-masters to workers across Clusters A and B; a sub-master exploits data locality, so common data are transmitted across the inter-cluster link only once.]

Slide 8: Applications on Wide Systems (II)
Hierarchical Master/Worker Applications
 The master sees a sub-master as a single high-capacity processing node.
 Work distribution from the master to a sub-master should be based on:
   Available bandwidth
   Computing power
 These characteristics may behave dynamically (a proportional-share sketch follows this list).
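The slides leave the distribution policy abstract. As one plausible reading, the sketch below splits work units among sub-masters in proportion to an effective capacity derived from the two factors named above; the struct layout, the min-based capacity formula, and all names are assumptions of this sketch, not part of MATE.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical view of a sub-master: how fast its workers consume work
// units (units/s) and how fast the inter-cluster link can deliver them.
struct SubMaster { double computeRate; double deliveryRate; };

// A sub-master cannot compute faster than work arrives, so this sketch
// takes the minimum of the two rates as its effective capacity.
std::vector<int> proportionalShares(int totalUnits,
                                    const std::vector<SubMaster>& sms) {
    std::vector<double> cap(sms.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < sms.size(); ++i) {
        cap[i] = std::min(sms[i].computeRate, sms[i].deliveryRate);
        sum += cap[i];
    }
    std::vector<int> share(sms.size());
    int assigned = 0;
    for (std::size_t i = 0; i < sms.size(); ++i) {
        share[i] = static_cast<int>(totalUnits * cap[i] / sum);
        assigned += share[i];
    }
    share.back() += totalUnits - assigned;  // rounding leftovers to the last
    return share;
}

int main() {
    // Cluster A: fast CPUs behind a slow link; Cluster B: the reverse.
    std::vector<SubMaster> sms = {{120.0, 80.0}, {60.0, 200.0}};
    for (int s : proportionalShares(1000, sms)) std::printf("%d units\n", s);
}
```

Because both rates drift at run time, such shares would be recomputed whenever monitoring reports new values, which is exactly the kind of adjustment dynamic tuning automates.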

Slide 9: MATE
 Monitoring, Analysis and Tuning Environment: dynamic automatic tuning of parallel/distributed applications.
[Figure: a closed loop between application development/execution and the tool. DynInst-based instrumentation of the running application produces events and performance data (monitoring); performance analysis maps problems to solutions; tuning applies the resulting modifications back to the execution.]

Slide 10: MATE (II)
MATE consists of three cooperating components:
 Application Controller (AC)
 Dynamic Monitoring Library (DMLib)
 Analyzer
[Figure: Tasks 1–3 run across Machines 1–3, each machine hosting an AC; the DMLib inside each task sends events to the Analyzer, and the Analyzer drives instrumentation (instr.) and modifications (modif.) through the ACs.]

Slide 11: MATE (III)
 Each tuning technique is implemented in MATE as a "tunlet", a C/C++ library dynamically loaded into the Analyzer process. A tunlet defines:
   Measure points – what events are needed
   Performance model – how to determine bottlenecks and solutions
   Tuning actions/points/synchronization – what to change, where, and when
[Figure: tunlets plug into the Analyzer through the DTAPI; each tunlet bundles its performance model, measure points, and tuning point/action/sync.]
A skeleton of this three-part structure is sketched below.
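Here is a minimal C++ skeleton of the three parts, assuming illustrative type names (Metrics, TuningAction) and a simple load-balance model; MATE's actual DTAPI declarations are not reproduced here.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Illustrative types; the real DTAPI types differ.
struct Metrics { double workerIdleTime; double iterationTime; int batchSize; };
struct TuningAction {
    std::string variable;  // tuning point: which variable to change
    int newValue;          // tuning action: the value to write into it
    std::string when;      // synchronization: when it is safe to apply
};

class LoadBalanceTunlet {
public:
    // Measure points: events the Analyzer must collect for this model.
    std::vector<std::string> measurePoints() const {
        return {"iteration_start", "work_sent", "result_received"};
    }
    // Performance model: workers idling too long signals imbalance
    // (the 20% threshold is an assumption of this sketch).
    bool isBottleneck(const Metrics& m) const {
        return m.workerIdleTime > 0.2 * m.iterationTime;
    }
    // Tuning action: halve the batch size at the next iteration boundary.
    TuningAction propose(const Metrics& m) const {
        return {"batch_size", std::max(1, m.batchSize / 2), "iteration_start"};
    }
};
```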

Slide 12: New Requirements
Transparent process tracking
 The AC should be able to follow application processes to any cluster.
Lower inter-cluster instrumentation communication overhead
 Inter-cluster links generally have higher latency and lower bandwidth.

Slide 13 (Design): Transparent process tracking
System service
 A machine or cluster can have MATE enabled as a daemon that detects the startup of new processes; the resident AC then attaches to the new task for control, loads DMLib, and completes the Analyzer subscription (a DynInst attach sketch follows).
[Figure: each MATE-enabled machine runs an AC; on startup detection of Task n, the AC attaches to it, receives Analyzer information, and subscribes it to the Analyzer.]
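The attach step can be sketched against the Dyninst BPatch API that MATE builds on. The snippet assumes the daemon already knows the new process's path and pid; the DMLib file name is hypothetical, exact BPatch signatures vary across Dyninst versions, and snippet insertion plus the subscription protocol are omitted.

```cpp
#include "BPatch.h"          // DyninstAPI
#include "BPatch_process.h"
#include <cstdio>

static BPatch bpatch;        // Dyninst expects a single BPatch instance

// What the AC could do once the daemon reports a freshly started process.
bool trackNewTask(const char* path, int pid) {
    // Attach to the new task; Dyninst leaves it stopped under our control.
    BPatch_process* proc = bpatch.processAttach(path, pid);
    if (proc == nullptr) return false;

    // Inject the monitoring library so the task can emit events.
    if (!proc->loadLibrary("libdmlib.so")) {   // hypothetical DMLib name
        std::fprintf(stderr, "DMLib injection failed for pid %d\n", pid);
        return false;
    }

    // ... insert instrumentation snippets at the measure points and
    //     subscribe the task to the Analyzer (omitted) ...

    proc->continueExecution();   // resume the now-instrumented task
    return true;
}
```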

Slide 14 (Design II): Transparent process tracking
Application plug-in
 The AC can be packaged with the application binary, so it accompanies every job submission.
[Figure: at job submission, the bundled AC detects each new 'Task', uses Dyninst to create and control it, loads DMLib, and handles the Analyzer subscription; tasks created on remote machines are wrapped the same way.]

Slide 15 (Design III): Lower communication overhead
Smart event collection
 A full application trace may generate too much overhead.
Event aggregation
 Remote trace events should be aggregated into trace-event abstractions, saving bandwidth (see the sketch after this list).
Inter-cluster trace event routing
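A minimal sketch of such aggregation, assuming raw events are (name, duration) records folded into per-event summaries before they cross the inter-cluster link; the record layout and the print-instead-of-send flush are illustrative only.

```cpp
#include <cstdio>
#include <map>
#include <string>

// One running summary per event type replaces a stream of raw records.
struct Summary { long count = 0; double sum = 0, minUs = 1e300, maxUs = 0; };

class EventAggregator {
    std::map<std::string, Summary> table_;
public:
    // Fold one raw trace record into the local summary (cheap and local).
    void record(const std::string& event, double micros) {
        Summary& s = table_[event];
        ++s.count; s.sum += micros;
        if (micros < s.minUs) s.minUs = micros;
        if (micros > s.maxUs) s.maxUs = micros;
    }
    // Periodically ship one abstraction per event type over the WAN link
    // (printed here in place of sending) and reset the table.
    void flush() {
        for (auto& [name, s] : table_)
            std::printf("%s n=%ld avg=%.1fus min=%.1f max=%.1f\n",
                        name.c_str(), s.count, s.sum / s.count, s.minUs, s.maxUs);
        table_.clear();
    }
};
```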

Slide 16: Analyzer Approaches
Centralized
 Requires modifying tunlets to distinguish the instrumentation data of local application processes.
Hierarchical
 Requires splitting tunlets into local tunlets and global tunlets.
Distributed
 Requires tunlet instances located on different Analyzer instances to cooperate in tuning an application.
A toy local/global split is sketched after this list.
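To illustrate the hierarchical split, a local tunlet might reduce raw per-task measurements to one abstract value per cluster, which a global tunlet then compares across clusters; every type, name, and threshold below is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Per-cluster abstraction a local tunlet would forward (illustrative).
struct ClusterSummary { int cluster; double avgIterationTime; };

// Local tunlet side: reduce raw per-task iteration times to one value.
ClusterSummary localReduce(int cluster, const std::vector<double>& times) {
    double sum = 0;
    for (double t : times) sum += t;           // assumes times is non-empty
    return {cluster, sum / times.size()};
}

// Global tunlet side: flag clusters running well behind the fastest one.
void globalAnalyze(const std::vector<ClusterSummary>& sums) {
    if (sums.empty()) return;
    double best = sums.front().avgIterationTime;
    for (const auto& s : sums) best = std::min(best, s.avgIterationTime);
    for (const auto& s : sums)
        if (s.avgIterationTime > 1.5 * best)   // 1.5x: assumed threshold
            std::printf("cluster %d is a candidate for retuning\n", s.cluster);
}
```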

Slide 17 (Design IV): Lower communication overhead (II)
Centralized Analyzer approach
[Figure: one Analyzer in Cluster A serves both clusters; the ACs and tasks on Machines A1–A3 report to it directly, while Machines B1–B3 in Cluster B reach it through an Event Router spanning the inter-cluster link.]

Slide 18 (Design V): Hierarchical Analyzer Approach
[Figure: each cluster runs a Local Analyzer that performs local performance-model analysis over its machines' ACs and tasks; the Local Analyzers forward abstract events to a Global Analyzer on Machine A4.]

Slide 19 (Design VI): Distributed Analyzer Approach
Distributed Monitoring, Analysis and Tuning Environment
[Figure: each cluster runs its own Analyzer over the local ACs and tasks; the tunlet instances in the two Analyzers cooperate across the inter-cluster link to tune the application.]

Slide 20: Conclusions and future work
 Conclusions
   The interference of instrumentation traffic with inter-cluster communication should be minimal.
   Process tracking enables MATE on multicluster systems.
   The centralized Analyzer approach is the easiest for tunlet developers but does not scale.
   The distributed Analyzer approach scales but requires a different, model-based analysis.

Slide 21: Conclusions and future work (II)
 Future work
   Development of new tunlets for the distributed and hierarchical Analyzer approaches.
   Tuning based only on local instrumentation data.
   Semantics of aggregation for instrumentation events.
   Patterns of distributed tunlet cooperation.
   Scenarios of distributed Analyzer cooperation in multiclusters.

Slide 22: Thank you…