20.07.2015 Dennis Hoppe (HLRS) ATOM: A near-real time Monitoring Framework for HPC and Embedded Systems

Presentation transcript:

Dennis Hoppe (HLRS)
ATOM: A near-real time Monitoring Framework for HPC and Embedded Systems

WHY?

It’s all about saving energy
Energy consumption is a major challenge in HPC (Exascale Challenge) [Ashby et al., 2010]
– Energy consumption must be a design goal in future algorithm design
– Standardization of interfaces and APIs to collect energy consumption data (cf. PAPI) is needed
– Use of fine-grained measurement tools to evaluate energy saving effects on performance and vice versa
Greening of the HPC domain will become as important as the greening movement of the automotive domain

High Demand throughout European Projects
EXCESS [EXCESS, 2013]
– Build an energy-aware programming framework
DreamCloud [DreamCloud, 2013]
– Enable dynamic resource allocation to satisfy performance guarantees and minimize energy consumption
→ Predict performance and energy consumption of applications at run-time for further optimizations
→ Employ monitoring to retrieve detailed application profiles at run-time and for post-processing

Exploiting Monitoring Data in DreamCloud

Properties of Monitoring Solutions
– Timeliness
– Granularity
– Extensibility
– Architecture
– Scalability
– Adaptability
– Data Storage
– Visualization
– Predictability
– Non-Intrusiveness
[Aceto et al., 2013], [Katsaros et al., 2011], [Telesca et al., 2014]

Requirements Analysis (Selection)
[Table: Zabbix, Nagios, and OpenNMS rated against the key properties Architecture, Non-Intrusiveness, Scalability, Timeliness, Granularity, Extensibility, Data Storage, Visualization, Adaptability, and Predictability; the per-tool rating symbols are not recoverable]

Requirements Analysis (Selection)
None of the existing monitoring solutions (Zabbix, Nagios, OpenNMS) satisfies the requirements imposed by current projects!
→ Towards a novel monitoring framework

WHAT?

Key Features of ATOM
– Analyzing the system’s run-time context
– Low-intrusive, highly scalable architecture
– Flexible, language-independent plug-in system
– Light-weight and easy-to-grasp user library
– Integration with PBS resource manager for on-demand monitoring of applications
– Interactive web-based front-end for data exploration and analysis

Design Decisions
Key Property: ATOM
Architecture:
  - agent-based (producer-consumer principle)
  - implementation using the programming language C
Non-Intrusiveness:
  - a minimal impact on the application’s performance at run-time
Scalability:
  - allow high update rates of metrics while being low-intrusive
  - allow online analytics (planned)
Timeliness:
  - allow high update rates of metrics while being low-intrusive
Granularity:
  - monitoring at different levels (infrastructure, applications, …)
Extensibility:
  - easy-to-use plug-in system via RESTful API
Data Storage:
  - efficient data storage, export, and analysis at run-time
Visualization:
  - provide basic visualization functionality
Adaptability:
  - platform-specific plug-ins support
  - re-configure plug-ins at run-time
Predictability:
  - via plug-in system (outlook)
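The agent-based, producer-consumer principle named under Architecture can be illustrated with a minimal sketch. ATOM itself is implemented in C; the Python code below only shows the general pattern, and all names in it (`run_agent`, `power_w`, the queue size) are assumptions for illustration, not part of ATOM.

```python
import queue
import threading


def run_agent(samples, sink):
    """Move samples from a producer thread to a consumer thread via a bounded queue."""
    buffer = queue.Queue(maxsize=8)  # bounded: a slow consumer throttles the producer
    done = object()                  # sentinel that marks the end of the stream

    def producer():
        # Stands in for a metric collector that samples values periodically.
        for sample in samples:
            buffer.put(sample)
        buffer.put(done)

    def consumer():
        # Stands in for the component that forwards metrics to storage.
        while True:
            item = buffer.get()
            if item is done:
                break
            sink.append(item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sink


# Hypothetical power readings in watts, used only as example data.
readings = [{"metric": "power_w", "value": v} for v in (510.0, 498.5, 502.1)]
collected = run_agent(readings, [])
```

Decoupling sampling from forwarding in this way is one common route to high update rates with low intrusiveness, since the collector never blocks on storage unless the buffer is full.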

HOW?

ATOM Deployment on the HLRS/EXCESS Cluster
– Used for software development, testing, profiling, and evaluations within HLRS and for external project partners
– Cluster is highly configurable and extensible; current power consumption is roughly between 0.5 and 2.0 kW
– Power measurement framework integrated with PBS system; no further performance overhead is induced while profiling applications

HLRS Power and Performance Measurement System

ATOM Architecture
– MONITOR: ATOM monitoring server
– ACTOR: ATOM metric collector
– Rickshaw (D3.js)
– NodeJS
– Elasticsearch

EXTENSIBILITY

List of Plug-ins
Performance Metrics
– PAPI-C
– Infiniband
– NVIDIA SMI
– /proc/meminfo
– /proc/vmstat
– Iostat
Energy Metrics
– RAPL
– Likwid
– hw_power (external measurement system)

Plug-in Development
Plug-ins are in general language-independent, as long as they satisfy the communication interface (RESTful API). We have currently implemented two types of plug-ins:
– Shell-based plug-ins
  - Good for prototyping
  - Need to handle configuration on their own, i.e., the update frequency and enabling/disabling the plug-in at run-time
  - Induce extra performance costs
– C-based plug-ins
  - Initial extra cost of implementation
  - Simple configuration and integration with the monitoring framework

Shell-based Plugin (/proc/vmstat)
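The listing shown on this slide is not part of the transcript. As a rough illustration of what a /proc/vmstat plug-in has to do, the sketch below (in Python rather than shell) parses the `name value` lines of that file and serializes selected counters as JSON. The payload fields `type` and `timestamp`, the helper names, and the sample values are assumptions for illustration, not ATOM's actual message format.

```python
import json


def parse_vmstat(text):
    """Turn '/proc/vmstat'-style 'name value' lines into a dict of integers."""
    metrics = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not exactly a name followed by a number.
        if len(parts) == 2 and parts[1].isdigit():
            metrics[parts[0]] = int(parts[1])
    return metrics


def build_payload(metrics, wanted, timestamp):
    """Serialize the selected counters as a JSON document (hypothetical schema)."""
    selected = {name: metrics[name] for name in wanted if name in metrics}
    document = {"type": "vmstat", "timestamp": timestamp, **selected}
    return json.dumps(document, sort_keys=True)


# Example input in the format of /proc/vmstat; on Linux the text could instead
# be read with open("/proc/vmstat").read().
SAMPLE = "nr_free_pages 12345\npgfault 678\npgmajfault 9\n"
payload = build_payload(parse_vmstat(SAMPLE), ["pgfault", "pgmajfault"], 1437350400)
```

A real plug-in would additionally send the payload to the monitoring server through the RESTful interface and repeat at the configured update frequency.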

Supporting Application-specific Metrics
– Provided plug-ins based on shell scripting and C applications measure the impact of an application on the infrastructure (cf. PAPI-C, RAPL)
– To capture application-specific data, we need code instrumentation!
  - ATOM user library available in C and Python
– Our API is based on the Application Response Measurement (ARM) standard for monitoring applications [Elarde et al., 2000]:
  - init()
  - start()
  - update()
  - stop()
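The four calls above suggest a simple life cycle for instrumented code. The sketch below mimics that life cycle with a hypothetical stand-in class; only the method names init(), start(), update(), and stop() come from the slide, while the class name, arguments, and return values are assumptions and do not reflect the real ATOM user library.

```python
import time


class MetricSession:
    """Hypothetical stand-in for an ARM-style user library; not the ATOM API."""

    def __init__(self):
        self.state = "new"
        self.application = None
        self.records = []
        self.started_at = None

    def init(self, application):
        # Register the application that is about to be monitored.
        self.application = application
        self.state = "initialized"

    def start(self):
        if self.state != "initialized":
            raise RuntimeError("call init() before start()")
        self.started_at = time.monotonic()
        self.state = "running"

    def update(self, name, value):
        # Record one application-specific metric value.
        if self.state != "running":
            raise RuntimeError("call start() before update()")
        self.records.append((name, value))

    def stop(self):
        if self.state != "running":
            raise RuntimeError("call start() before stop()")
        self.state = "stopped"
        return time.monotonic() - self.started_at  # elapsed seconds


# Instrumenting a toy loop: one update per iteration.
session = MetricSession()
session.init("solver")
session.start()
for step in range(3):
    session.update("iteration", step)
elapsed = session.stop()
```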

ATOM User Library

DATA ANALYSIS [MF.EXCESS-PROJECT.EU]

Summary of Experiments

Visualization of Metric Data

ATOM Export API
Applications = Workflows & Tasks (Register and retrieve workflows)
– GET /mf/workflows
– PUT /mf/workflows/:wid
– GET /mf/workflows/:wid
Experiments (Retrieve information on experiments)
– GET /mf/experiments
– GET /mf/experiments/:eid
Application Profiles (Retrieve application data)
– GET /mf/profiles/:wid
– GET /mf/profiles/:wid/:tid
– GET /mf/profiles/:wid/:tid/:eid
Energy Profiles (HLRS power measurement system)
– GET /mf/energy/:wid/:eid
– GET /mf/energy/:wid/:tid/:eid
Legend: wid = Workflow ID, tid = Task ID, eid = Experiment ID
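To make the route patterns concrete, the following sketch builds request paths for the profile and energy routes listed above. It performs no network access. The base URL, the helper names, and the example IDs are hypothetical; only the `/mf/...` path patterns are taken from the slide.

```python
from urllib.parse import quote, urljoin

BASE_URL = "http://localhost:3000"  # hypothetical server address, not from the slides


def _path(*segments):
    """Join path segments under /mf/, percent-encoding each one."""
    return "/mf/" + "/".join(quote(str(s), safe="") for s in segments)


def profile_path(wid, tid=None, eid=None):
    """Path for GET /mf/profiles/:wid[/:tid[/:eid]]."""
    if eid is not None and tid is None:
        raise ValueError("the profile routes need a task ID before an experiment ID")
    segments = ["profiles", wid] + [s for s in (tid, eid) if s is not None]
    return _path(*segments)


def energy_path(wid, eid, tid=None):
    """Path for GET /mf/energy/:wid/:eid or GET /mf/energy/:wid/:tid/:eid."""
    segments = ["energy", wid] + ([tid] if tid is not None else []) + [eid]
    return _path(*segments)


# Example IDs are made up for illustration.
url = urljoin(BASE_URL, profile_path("ms2", "t1", "exp-42"))
```

The resulting URL could then be passed to any HTTP client to issue the GET request.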

SUMMARY

Preliminary Experimental Results

Take Away Messages
ATOM
– is a light-weight and easy-to-use monitoring framework focusing on HPC and embedded system support
– has fundamental performance and energy metric support that can be easily extended through a user-friendly plug-in system
– offers users various interfaces to explore the profiling data (i.e., front-end, RESTful service, C and Python libraries)

References
[Aceto et al., 2013] – Cloud Monitoring: A Survey, Computer Networks 57(9)
[Ashby et al., 2010] – The Opportunities and Challenges of Exascale Computing, Summary Report of the Advanced Scientific Computing Advisory Committee (ASCAC) Subcommittee at the US Department of Energy Office of Science
[DreamCloud, 2013]
[Elarde et al., 2000] – Performance analysis of application response measurement (ARM) version 2.0 measurement agent software implementations, Performance, Computing, and Communications Conference
[EXCESS, 2013]
[Katsaros et al., 2011] – Monitoring: A fundamental Process to provide QoS Guarantees in Cloud based Platforms, Cloud Computing: Methodology, System, and Applications
[Telesca et al., 2014] – System Performance Monitoring of the ALICE Data Acquisition System with Zabbix, Journal of Physics