ATOM: A near-real time Monitoring Framework for HPC and Embedded Systems
Dennis Hoppe (HLRS), 20.07.2015

1 ATOM: A near-real time Monitoring Framework for HPC and Embedded Systems
Dennis Hoppe (HLRS), fEEDBACk@PODC, 20.07.2015

2 WHY?

3 It’s all about saving energy
Energy consumption is a major challenge in HPC (Exascale Challenge) [Ashby et al., 2010]
– Energy consumption must be a design goal in future algorithm design
– Standardization of interfaces and APIs to collect energy consumption data (cf. PAPI) is needed
– Use of fine-grained measurement tools to evaluate energy saving effects on performance and vice versa
Greening of the HPC domain will become as important as the greening movement of the automotive domain

4 High Demand throughout European Projects
EXCESS [EXCESS, 2013]
– Build an energy-aware programming framework
DreamCloud [DreamCloud, 2013]
– Enable dynamic resource allocation to satisfy performance guarantees while minimizing energy consumption
→ Predict performance and energy consumption of applications at run-time for further optimizations
→ Employ monitoring to retrieve detailed application profiles at run-time and for post-processing

5 Exploiting Monitoring Data in DreamCloud

6 Properties of Monitoring Solutions
– Timeliness
– Granularity
– Extensibility
– Architecture
– Scalability
– Adaptability
– Data Storage
– Visualization
– Predictability
– Non-Intrusiveness
[Aceto et al., 2013], [Katsaros et al., 2011], [Telesca et al., 2014]

7 Requirements Analysis (Selection)
[Table: Zabbix, Nagios, and OpenNMS compared against the key properties Architecture, Non-Intrusiveness, Scalability, Timeliness, Granularity, Extensibility, Data Storage, Visualization, Adaptability, and Predictability; the per-cell marks are not recoverable]

8 Requirements Analysis (Selection)
[Table: Zabbix, Nagios, and OpenNMS compared against the key properties Architecture, Non-Intrusiveness, Scalability, Timeliness, Granularity, Extensibility, Data Storage, Visualization, Adaptability, and Predictability; the per-cell marks are not recoverable]
None of the existing monitoring solutions satisfies the requirements imposed by current projects!
→ Towards a novel monitoring framework

9 WHAT?

10 Key Features of ATOM
– Analyzing the system’s run-time context
– Low-intrusive, highly scalable architecture
– Flexible, language-independent plug-in system
– Light-weight and easy-to-grasp user library
– Integration with PBS resource manager for on-demand monitoring of applications
– Interactive web-based front-end for data exploration and analysis

11 Design Decisions
Key Property: ATOM
– Architecture: agent-based (producer-consumer principle); implementation using the programming language C
– Non-Intrusiveness: a minimal impact on the application’s performance at run-time
– Scalability: allow high update rates of metrics while being low-intrusive; allow online analytics (planned)
– Timeliness: allow high update rates of metrics while being low-intrusive
– Granularity: monitoring at different levels (infrastructure, applications, …)
– Extensibility: easy-to-use plug-in system via RESTful API
– Data Storage: efficient data storage, export, and analysis at run-time
– Visualization: provide basic visualization functionality
– Adaptability: platform-specific plug-in support; re-configure plug-ins at run-time
– Predictability: via plug-in system (outlook)
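
The producer-consumer principle named under Architecture can be illustrated with a minimal Python sketch. ATOM itself is implemented in C, so this is not its code; the function names `collect`, `forward`, and `run_agent` and the bounded in-memory queue are assumptions made purely for illustration.

```python
import queue
import threading


def collect(samples, out):
    """Producer: push each metric sample into the shared buffer."""
    for sample in samples:
        out.put(sample)      # blocks while the bounded buffer is full
    out.put(None)            # sentinel: no more samples


def forward(inp, sink):
    """Consumer: drain the buffer and hand samples to a sink (here a list)."""
    while True:
        sample = inp.get()
        if sample is None:   # sentinel received, stop consuming
            break
        sink.append(sample)


def run_agent(samples, maxsize=64):
    """Run one producer and one consumer thread over a bounded queue."""
    buffer = queue.Queue(maxsize=maxsize)
    sink = []
    producer = threading.Thread(target=collect, args=(samples, buffer))
    consumer = threading.Thread(target=forward, args=(buffer, sink))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return sink
```

Decoupling sampling from forwarding in this way is one common route to high update rates: the collector never waits on storage unless the buffer is full.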

12 HOW?

13 ATOM Deployment on the HLRS/EXCESS Cluster
– Used for software development, testing, profiling, and evaluations within HLRS and for external project partners
– Cluster is highly configurable and extensible; current power consumption is roughly between 0.5 and 2.0 kW
– Power measurement framework integrated with PBS system; no further performance overhead is induced while profiling applications

14 HLRS Power and Performance Measurement System

15 ATOM Architecture
– MONITOR: ATOM monitoring server
– ACTOR: ATOM metric collector
– Rickshaw (D3.js)
– NodeJS
– Elasticsearch

16 EXTENSIBILITY

17 List of Plug-ins
Performance Metrics
– PAPI-C
– Infiniband
– NVIDIA SMI
– /proc/meminfo
– /proc/vmstat
– Iostat
Energy Metrics
– RAPL
– Likwid
– hw_power (external measurement system)

18 Plug-in Development
Plug-ins are in general language-independent, as long as they satisfy the communication interface (RESTful API).
We have currently implemented two types of plug-ins:
– Shell-based plug-ins
  – Good for prototyping
  – Need to handle configuration on their own, i.e.,
    – update frequency
    – enable/disable plug-in at run-time
  – Induce extra performance costs
– C-based plug-ins
  – Initial extra cost of implementation
  – Simple configuration and integration with the monitoring framework
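
To make the configuration duties of a self-managed plug-in concrete (update frequency, enable/disable at run-time), here is a rough Python sketch. The JSON payload layout, the `PluginConfig` class, and the `send` callback are all hypothetical; the slides do not specify the actual message format of ATOM's RESTful interface.

```python
import json
import time


class PluginConfig:
    """Hypothetical run-time configuration a plug-in manages on its own."""

    def __init__(self, enabled=True, interval_s=1.0):
        self.enabled = enabled        # can be flipped while the loop runs
        self.interval_s = interval_s  # update frequency as a sleep interval


def build_payload(plugin, metrics, timestamp):
    """Serialize one sample as JSON (field names are assumptions)."""
    return json.dumps(
        {"plugin": plugin, "timestamp": timestamp, "metrics": metrics},
        sort_keys=True,
    )


def run_plugin(name, read_metrics, config, send, iterations,
               clock=time.time, sleep=time.sleep):
    """Sample `iterations` times, skipping samples while disabled.

    `send` stands in for the HTTP call to the monitoring server.
    Returns the number of payloads actually sent.
    """
    sent = 0
    for _ in range(iterations):
        if config.enabled:
            send(build_payload(name, read_metrics(), clock()))
            sent += 1
        sleep(config.interval_s)
    return sent
```

Passing `send`, `clock`, and `sleep` as parameters keeps the loop testable without a server or real delays.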

19 Shell-based Plugin (/proc/vmstat)
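
The body of this slide is not part of the transcript. As a rough illustration of the core task of a /proc/vmstat plug-in, the sketch below parses the file's `name value` lines into a dictionary. It is written in Python rather than shell, and the function names are made up for this example.

```python
def parse_vmstat(text):
    """Turn '/proc/vmstat'-style text ('name value' per line) into a dict."""
    metrics = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only well-formed lines: one counter name and one integer value.
        if len(parts) == 2 and parts[1].lstrip("-").isdigit():
            metrics[parts[0]] = int(parts[1])
    return metrics


def read_vmstat(path="/proc/vmstat"):
    """Read and parse the kernel's virtual memory statistics (Linux only)."""
    with open(path) as handle:
        return parse_vmstat(handle.read())
```

On a Linux node, `read_vmstat()` returns the current counters; a plug-in would sample it at its configured update frequency.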

20 Supporting Application-specific Metrics
Provided plug-ins based on shell scripting and C applications measure the impact of an application on the infrastructure (cf. PAPI-C, RAPL)
To capture application-specific data, we need code instrumentation!
– ATOM user library available in C and Python
Our API is based on the Application Response Measurement (ARM) standard for monitoring applications [Elarde et al., 2000]:
– init()
– start()
– update()
– stop()
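
The four calls suggest an init/start/update/stop life cycle around the instrumented code. The following Python sketch mimics that life cycle with an in-memory recorder. Only the four method names come from the slide; the `Monitor` class, its arguments, and its return values are assumptions and do not reflect the actual ATOM user library.

```python
import time


class Monitor:
    """Hypothetical ARM-style monitor; not the actual ATOM user library."""

    def __init__(self, clock=time.time):
        self._clock = clock        # injectable clock, handy for testing
        self._application = None
        self._running = False
        self.records = []          # (event, timestamp, metrics) tuples

    def init(self, application):
        """Register the application name before any measurement."""
        self._application = application

    def start(self):
        """Open a measurement interval."""
        if self._application is None:
            raise RuntimeError("init() must be called before start()")
        self._running = True
        self.records.append(("start", self._clock(), {}))

    def update(self, **metrics):
        """Report application-specific metrics inside an open interval."""
        if not self._running:
            raise RuntimeError("update() requires a started interval")
        self.records.append(("update", self._clock(), dict(metrics)))

    def stop(self):
        """Close the interval and return its duration in clock units."""
        if not self._running:
            raise RuntimeError("stop() requires a started interval")
        self._running = False
        now = self._clock()
        self.records.append(("stop", now, {}))
        started = [t for event, t, _ in self.records if event == "start"][-1]
        return now - started
```

An instrumented application would call `init("solver")` once, wrap the region of interest in `start()` and `stop()`, and call `update(iterations=100)` whenever it has a value worth reporting.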

21 ATOM User Library

22 DATA ANALYSIS [MF.EXCESS-PROJECT.EU]

23 Summary of Experiments

24 Visualization of Metric Data

25 ATOM Export API
Applications = Workflows & Tasks (Register and retrieve workflows)
– GET /mf/workflows
– PUT /mf/workflows/:wid
– GET /mf/workflows/:wid
Experiments (Retrieve information on experiments)
– GET /mf/experiments
– GET /mf/experiments/:eid
Application Profiles (Retrieve application data)
– GET /mf/profiles/:wid
– GET /mf/profiles/:wid/:tid
– GET /mf/profiles/:wid/:tid/:eid
Energy Profiles (HLRS power measurement system)
– GET /mf/energy/:wid/:eid
– GET /mf/energy/:wid/:tid/:eid
Legend: wid = Workflow ID, tid = Task ID, eid = Experiment ID
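
A small client can assemble these routes from the three IDs. The Python sketch below builds the URLs listed on the slide. The base address `http://localhost:3000`, the helper names, and the assumption that responses are JSON are illustrative only and are not taken from the slides.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:3000"  # assumption: replace with the real server


def _url(*segments):
    """Join path segments under /mf, percent-encoding each identifier."""
    return BASE_URL + "/mf/" + "/".join(quote(str(s), safe="") for s in segments)


def workflows_url(wid=None):
    """/mf/workflows or /mf/workflows/:wid"""
    return _url("workflows") if wid is None else _url("workflows", wid)


def experiments_url(eid=None):
    """/mf/experiments or /mf/experiments/:eid"""
    return _url("experiments") if eid is None else _url("experiments", eid)


def profiles_url(wid, tid=None, eid=None):
    """/mf/profiles/:wid[/:tid[/:eid]]"""
    if eid is not None and tid is None:
        raise ValueError("an experiment ID requires a task ID")
    return _url("profiles", *[s for s in (wid, tid, eid) if s is not None])


def energy_url(wid, eid, tid=None):
    """/mf/energy/:wid/:eid or /mf/energy/:wid/:tid/:eid"""
    parts = (wid, eid) if tid is None else (wid, tid, eid)
    return _url("energy", *parts)


def fetch_json(url, timeout=10):
    """GET a URL and decode the body as JSON (response format is assumed)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_json(profiles_url("my_workflow", "task_1"))` would request the profile of one task, provided a server is reachable at `BASE_URL`.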

26 SUMMARY

27 Preliminary Experimental Results

28 Take Away Messages
ATOM
– is a light-weight and easy-to-use monitoring framework focusing on HPC and embedded system support
– has fundamental performance and energy metric support that can be easily extended by a user-friendly plug-in system
– offers users various interfaces to explore the profiling data (i.e., front-end, RESTful service, C and Python libraries)

29 References
[Aceto et al., 2013] – Cloud Monitoring: A Survey, Computer Networks 57(9), 2013.
[Ashby et al., 2010] – The Opportunities and Challenges of Exascale Computing, Summary Report of the Advanced Scientific Computing Advisory Committee (ASCAC) Subcommittee at the US Department of Energy Office of Science, 2010.
[DreamCloud, 2013] – www.dreamcloud-project.eu
[Elarde et al., 2000] – Performance analysis of application response measurement (ARM) version 2.0 measurement agent software implementations, Performance, Computing, and Communications Conference, 2000.
[EXCESS, 2013] – www.excess-project.eu
[Katsaros et al., 2011] – Monitoring: A fundamental Process to provide QoS Guarantees in Cloud based Platforms, Cloud Computing: Methodology, System, and Applications, 2011.
[Telesca et al., 2014] – System Performance Monitoring of the ALICE Data Acquisition System with Zabbix, Journal of Physics, 2014.

