Published by Dustin Brown. Modified over 9 years ago.
1
20.07.2015 :: Dennis Hoppe (HLRS) :: fEEDBACk@PODC
ATOM: A Near-Real-Time Monitoring Framework for HPC and Embedded Systems
2
WHY?
3
It’s all about saving energy
Energy consumption is a major challenge in HPC (Exascale Challenge) [Ashby et al., 2010]
– Energy consumption must be a design goal in future algorithm design
– Standardization of interfaces and APIs to collect energy consumption data (cf. PAPI) is needed
– Use of fine-grained measurement tools to evaluate energy saving effects on performance and vice versa
Greening of the HPC domain will become as important as the greening movement of the automotive domain
4
High Demand throughout European Projects
EXCESS [EXCESS, 2013]
– Build an energy-aware programming framework
DreamCloud [DreamCloud, 2013]
– Enable dynamic resource allocation to satisfy performance guarantees and minimize energy consumption
Predict performance and energy consumption of applications at run-time for further optimizations
Employ monitoring to retrieve detailed application profiles at run-time and for post-processing
5
Exploiting Monitoring Data in DreamCloud
6
Properties of Monitoring Solutions
– Timeliness
– Granularity
– Extensibility
– Architecture
– Scalability
– Adaptability
– Data Storage
– Visualization
– Predictability
– Non-Intrusiveness
[Aceto et al., 2013], [Katsaros et al., 2011], [Telesca et al., 2014]
7
Requirements Analysis (Selection)
[Table: Zabbix, Nagios, and OpenNMS rated against each key property; the per-tool ratings are not preserved in this transcript]
Key properties compared: Architecture, Non-Intrusiveness, Scalability, Timeliness, Granularity, Extensibility, Data Storage, Visualization, Adaptability, Predictability
8
Requirements Analysis (Selection)
None of the existing monitoring solutions satisfies the requirements imposed by current projects!
Towards a novel monitoring framework
9
WHAT?
10
Key Features of ATOM
– Analyzing the system’s run-time context
– Low-intrusive, highly scalable architecture
– Flexible, language-independent plug-in system
– Light-weight and easy-to-grasp user library
– Integration with PBS resource manager for on-demand monitoring of applications
– Interactive web-based front-end for data exploration and analysis
11
Design Decisions
Key Property: ATOM
Architecture:
– agent-based (producer-consumer principle)
– implementation using the programming language C
Non-Intrusiveness:
– a minimal impact on the application’s performance at run-time
Scalability:
– allow high update rates of metrics while being low-intrusive
– allow online analytics (planned)
Timeliness:
– allow high update rates of metrics while being low-intrusive
Granularity:
– monitoring at different levels (infrastructure, applications, …)
Extensibility:
– easy-to-use plug-in system via RESTful API
Data Storage:
– efficient data storage, export, and analysis at run-time
Visualization:
– provide basic visualization functionality
Adaptability:
– platform-specific plug-in support
– re-configure plug-ins at run-time
Predictability:
– via plug-in system (outlook)
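The producer-consumer principle named under Architecture can be illustrated with a small sketch. This is not ATOM's implementation (the slides state that ATOM is written in C); it is a Python illustration in which plug-ins act as producers and the agent acts as the single consumer. All names (`producer`, `consumer`, `run_agent`) and the sample fields are hypothetical.

```python
import queue
import threading
import time


def producer(name, read_metric, outbox, samples, interval_s):
    """Plug-in side: sample one metric repeatedly and queue each reading."""
    for _ in range(samples):
        outbox.put({"plugin": name, "timestamp": time.time(), "value": read_metric()})
        time.sleep(interval_s)


def consumer(inbox, sink, stop_token):
    """Agent side: drain the queue and forward every sample to a sink."""
    while True:
        item = inbox.get()
        if item is stop_token:
            break
        sink.append(item)


def run_agent(plugins, samples=3, interval_s=0.01):
    """Run all plug-ins concurrently and return what the consumer received."""
    inbox = queue.Queue()
    stop_token = object()
    received = []
    worker = threading.Thread(target=consumer, args=(inbox, received, stop_token))
    worker.start()
    threads = [
        threading.Thread(target=producer, args=(name, fn, inbox, samples, interval_s))
        for name, fn in plugins.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    inbox.put(stop_token)  # all producers are done; tell the consumer to exit
    worker.join()
    return received


if __name__ == "__main__":
    data = run_agent({"constant": lambda: 1.0, "clock": time.perf_counter})
    print(len(data), "samples collected")
```

Decoupling the samplers from the forwarding step through a queue is one common way to keep the sampling path short, which fits the low-intrusiveness goal listed above.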
12
HOW?
13
ATOM Deployment on the HLRS/EXCESS Cluster
– Used for software development, testing, profiling, and evaluations within HLRS and for external project partners
– Cluster is highly configurable and extensible; current power consumption is roughly between 0.5 and 2.0 kW
– Power measurement framework integrated with PBS system; no further performance overhead is induced while profiling applications
14
HLRS Power and Performance Measurement System
15
ATOM Architecture
– MONITOR: ATOM monitoring server
– ACTOR: ATOM metric collector
– Rickshaw (D3.js)
– NodeJS
– Elasticsearch
16
EXTENSIBILITY
17
List of Plug-ins
Performance Metrics
– PAPI-C
– InfiniBand
– NVIDIA SMI
– /proc/meminfo
– /proc/vmstat
– iostat
Energy Metrics
– RAPL
– LIKWID
– hw_power (external measurement system)
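To make the RAPL entry more concrete, the following sketch shows one way an energy plug-in could turn two RAPL counter readings into an average power value. It assumes the Linux powercap sysfs interface (`energy_uj` and `max_energy_range_uj` under `/sys/class/powercap/intel-rapl:0`); the slides do not say how ATOM's RAPL plug-in reads its data, and the function names are hypothetical. The wrap-around correction is approximate.

```python
import time
from pathlib import Path

# Assumption: CPU package 0 exposed through the Linux powercap interface.
RAPL_DOMAIN = Path("/sys/class/powercap/intel-rapl:0")


def read_counter(path):
    """Read a single integer value from a sysfs file."""
    return int(Path(path).read_text().strip())


def average_power_watts(e_start_uj, e_end_uj, elapsed_s, max_range_uj):
    """Average power between two energy readings given in microjoules."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    delta_uj = e_end_uj - e_start_uj
    if delta_uj < 0:
        # The counter wrapped around once between the two readings.
        delta_uj += max_range_uj
    return delta_uj / 1e6 / elapsed_s


if __name__ == "__main__":
    try:
        max_range = read_counter(RAPL_DOMAIN / "max_energy_range_uj")
        t0, e0 = time.perf_counter(), read_counter(RAPL_DOMAIN / "energy_uj")
        time.sleep(1.0)
        t1, e1 = time.perf_counter(), read_counter(RAPL_DOMAIN / "energy_uj")
        print(f"{average_power_watts(e0, e1, t1 - t0, max_range):.2f} W")
    except OSError as err:
        # No RAPL support, or the counter is readable by root only.
        print("RAPL counter not available:", err)
```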
18
Plug-in Development
Plug-ins are in general language-independent, as long as they satisfy the communication interface (RESTful API).
We have currently implemented two types of plug-ins:
– Shell-based plug-ins
  Good for prototyping
  Need to handle configuration on their own, i.e.,
  – update frequency
  – enable/disable plug-in at run-time
  Induce extra performance costs
– C-based plug-ins
  Initial extra cost of implementation
  Simple configuration and integration with the monitoring framework
19
Shell-based Plug-in (/proc/vmstat)
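The code shown on this slide is not part of the transcript. As a rough stand-in, the sketch below shows what a plug-in that reports /proc/vmstat counters has to do: read the file, parse the `name value` pairs, and package the selected counters as a message. It is written in Python rather than shell, and the message layout (`type`, `timestamp`, `metrics`) is an assumption, not ATOM's actual format.

```python
import json
import os
import time

VMSTAT_PATH = "/proc/vmstat"


def parse_vmstat(text):
    """Parse '/proc/vmstat'-style text into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Each valid line has exactly two fields: a name and a number.
        if len(fields) == 2 and fields[1].isdigit():
            counters[fields[0]] = int(fields[1])
    return counters


def build_message(counters, wanted, timestamp=None):
    """Package the selected counters; the field names are hypothetical."""
    return {
        "type": "vmstat",
        "timestamp": time.time() if timestamp is None else timestamp,
        "metrics": {name: counters[name] for name in wanted if name in counters},
    }


if __name__ == "__main__":
    if os.path.exists(VMSTAT_PATH):
        with open(VMSTAT_PATH) as handle:
            parsed = parse_vmstat(handle.read())
        print(json.dumps(build_message(parsed, ["pgfault", "pgmajfault"])))
```

A real plug-in would additionally send the message to the monitoring agent over its RESTful interface and honor the configured update frequency.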
20
Supporting Application-specific Metrics
Provided plug-ins based on shell scripting and C applications measure the impact of an application on the infrastructure (cf. PAPI-C, RAPL)
To capture application-specific data, we need code instrumentation!
– ATOM user library available in C and Python
Our API is based on the Application Response Measurement (ARM) standard for monitoring applications [Elarde et al., 2000]:
– init()
– start()
– update()
– stop()
21
ATOM User Library
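The library code from this slide did not survive in the transcript. The sketch below only illustrates the init/start/update/stop call sequence named on the previous slide. The class `MonitoringSession`, its arguments, and its return values are hypothetical and should not be read as the actual ATOM user library API.

```python
import time


class MonitoringSession:
    """Hypothetical ARM-style session: init -> start -> update* -> stop."""

    def __init__(self):
        self.state = "new"
        self.application = None
        self.samples = []
        self._t0 = None

    def init(self, application):
        """Register the application before any measurement takes place."""
        self.application = application
        self.state = "initialized"

    def start(self):
        """Mark the beginning of the instrumented region."""
        if self.state != "initialized":
            raise RuntimeError("call init() before start()")
        self._t0 = time.perf_counter()
        self.state = "running"

    def update(self, metric, value):
        """Record one application-specific value with its time offset."""
        if self.state != "running":
            raise RuntimeError("call start() before update()")
        self.samples.append((time.perf_counter() - self._t0, metric, value))

    def stop(self):
        """Close the region and return the elapsed time in seconds."""
        if self.state != "running":
            raise RuntimeError("call start() before stop()")
        elapsed = time.perf_counter() - self._t0
        self.state = "stopped"
        return elapsed


if __name__ == "__main__":
    session = MonitoringSession()
    session.init("demo-app")
    session.start()
    for step in range(3):
        session.update("iteration", step)
    elapsed = session.stop()
    print(f"{len(session.samples)} samples in {elapsed:.6f} s")
```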
22
DATA ANALYSIS [MF.EXCESS-PROJECT.EU]
23
Summary of Experiments
24
Visualization of Metric Data
25
ATOM Export API
Legend: wid = Workflow ID, tid = Task ID, eid = Experiment ID
Applications = Workflows & Tasks (Register and retrieve workflows)
– GET /mf/workflows
– PUT /mf/workflows/:wid
– GET /mf/workflows/:wid
Experiments (Retrieve information on experiments)
– GET /mf/experiments
– GET /mf/experiments/:eid
Application Profiles (Retrieve application data)
– GET /mf/profiles/:wid
– GET /mf/profiles/:wid/:tid
– GET /mf/profiles/:wid/:tid/:eid
Energy Profiles (HLRS power measurement system)
– GET /mf/energy/:wid/:eid
– GET /mf/energy/:wid/:tid/:eid
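A client for the GET routes listed on this slide might be sketched as follows. The route patterns come from the slide; everything else is an assumption: the base URL, the helper names, and the expectation that responses are JSON (plausible given the NodeJS and Elasticsearch back end, but not stated).

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:3000"  # hypothetical; host and port are not given


def _join(*segments):
    """Build a path below /mf/, percent-encoding every segment."""
    return "/mf/" + "/".join(quote(str(s), safe="") for s in segments)


def workflow_path(wid=None):
    """GET /mf/workflows or GET /mf/workflows/:wid"""
    return _join("workflows") if wid is None else _join("workflows", wid)


def experiment_path(eid=None):
    """GET /mf/experiments or GET /mf/experiments/:eid"""
    return _join("experiments") if eid is None else _join("experiments", eid)


def profile_path(wid, tid=None, eid=None):
    """GET /mf/profiles/:wid[/:tid[/:eid]]"""
    if eid is not None and tid is None:
        raise ValueError("an experiment ID requires a task ID")
    return _join("profiles", *[s for s in (wid, tid, eid) if s is not None])


def energy_path(wid, eid, tid=None):
    """GET /mf/energy/:wid/:eid or GET /mf/energy/:wid/:tid/:eid"""
    segments = (wid, eid) if tid is None else (wid, tid, eid)
    return _join("energy", *segments)


def fetch_json(path, base_url=BASE_URL, timeout=10):
    """Issue the GET request; assumes the service answers with JSON."""
    with urlopen(base_url + path, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only print the paths; fetch_json() needs a reachable server.
    print(workflow_path())
    print(profile_path("my_workflow", "task_1"))
    print(energy_path("my_workflow", "exp_42", tid="task_1"))
```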
26
SUMMARY
27
Preliminary Experimental Results
28
Take Away Messages
ATOM
– is a light-weight, easy-to-use monitoring framework focusing on HPC and embedded system support
– has fundamental performance and energy metric support that can be easily extended through a user-friendly plug-in system
– offers users various interfaces to explore the profiling data (i.e., front-end, RESTful service, C and Python libraries)
29
References
[Aceto et al., 2013] – Cloud Monitoring: A Survey, Computer Networks 57(9), 2013.
[Ashby et al., 2010] – The Opportunities and Challenges of Exascale Computing, Summary Report of the Advanced Scientific Computing Advisory Committee (ASCAC) Subcommittee at the US Department of Energy Office of Science, 2010.
[DreamCloud, 2013] – www.dreamcloud-project.eu
[Elarde et al., 2000] – Performance Analysis of Application Response Measurement (ARM) Version 2.0 Measurement Agent Software Implementations, Performance, Computing, and Communications Conference, 2000.
[EXCESS, 2013] – www.excess-project.eu
[Katsaros et al., 2011] – Monitoring: A Fundamental Process to Provide QoS Guarantees in Cloud-based Platforms, Cloud Computing: Methodology, System, and Applications, 2011.
[Telesca et al., 2014] – System Performance Monitoring of the ALICE Data Acquisition System with Zabbix, Journal of Physics, 2014.