Automated Grid Monitoring for LHCb Experiment through HammerCloud
Bradley Dice, Valentina Mancinelli


1 Automated Grid Monitoring for LHCb Experiment through HammerCloud
Bradley Dice, Valentina Mancinelli

2 WLCG [1]: 170 sites, 42 countries, 30 PB/year
[1] WLCG, http://wlcg.web.cern.ch/

3 HammerCloud: Distributed Analysis Testing System [2]
[2] HammerCloud v4, https://twiki.cern.ch/twiki/bin/viewauth/IT/HammerCloud

4 Why Testing?

5
• Between 5% and 10% of jobs fail [3]
• Intermittent failures?
• Systemic problems?
• Need testing to diagnose
[3] J. Elmsheuser, F. Legger, R. Medrano Llamas, G. Sciacca, and D. van der Ster, J. Phys. Conf. Ser. 396, 032066 (2012).

6 Why Testing?
• Between 5% and 10% of jobs fail [3]
• Intermittent failures?
• Systemic problems?
• Need testing to diagnose
• Purpose of HammerCloud:
  • Validates grid health
  • Helps test new sites
  • Verifies correct operation of new software
  • Allows performance comparisons
[3] J. Elmsheuser, F. Legger, R. Medrano Llamas, G. Sciacca, and D. van der Ster, J. Phys. Conf. Ser. 396, 032066 (2012).
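One way to tell intermittent failures from systemic ones is to compare per-site failure rates over repeated test jobs. The sketch below is illustrative only: the `(site, status)` record format, the site names, the `"failed"` status label, and the 50% threshold are all assumptions, not part of HammerCloud.

```python
from collections import defaultdict


def failure_rates(jobs):
    """Return {site: fraction of failed jobs} from (site, status) pairs."""
    totals = defaultdict(int)
    failed = defaultdict(int)
    for site, status in jobs:
        totals[site] += 1
        if status == "failed":
            failed[site] += 1
    return {site: failed[site] / totals[site] for site in totals}


def classify_sites(jobs, systemic_threshold=0.5):
    """Label each site 'ok', 'intermittent', or 'systemic'.

    The labels and the threshold are hypothetical choices for illustration.
    """
    labels = {}
    for site, rate in failure_rates(jobs).items():
        if rate == 0:
            labels[site] = "ok"
        elif rate < systemic_threshold:
            labels[site] = "intermittent"
        else:
            labels[site] = "systemic"
    return labels


# Hypothetical test results: SITE_A fails 1 job in 10, SITE_B fails 4 in 5.
sample_jobs = (
    [("SITE_A", "complete")] * 9
    + [("SITE_A", "failed")]
    + [("SITE_B", "failed")] * 4
    + [("SITE_B", "complete")]
    + [("SITE_C", "complete")] * 3
)
site_labels = classify_sites(sample_jobs)
```

A single threshold is a crude rule. A real deployment would more likely also look at how failures are distributed over time before calling a problem systemic.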

7

8

9 Project Overview
• Use HammerCloud LHCb to…
  • Test LHCb data storage access
  • Test new releases of user analysis programs
  • Report data to Resource Status System

10 Project Overview
• Use HammerCloud LHCb to…
  • Test LHCb data storage access
  • Test new releases of user analysis programs
  • Report data to Resource Status System
• Tools
  • Django/Python (web interface)
  • Ganga (job submission)
  • OpenStack/Puppet (virtual machines, system management)

11 Levels of HammerCloud: Front End, Back End, Grid Tests

12 Front End
• User interface shows list of current and past tests and offers management tools

13 Front End
• User interface shows list of current and past tests and offers management tools
• Data visualizations categorize errors and the sites they affect (right)

14 Back End
• The test manager interfaces between Ganga (to submit grid jobs) and Django (to display data)

15 Back End
• The test manager interfaces between Ganga (to submit grid jobs) and Django (to display data)
• The backend produces data visualizations, e.g. jobs by status: complete, running, scheduled, or failed (right)

16 Back End
• The test manager interfaces between Ganga (to submit grid jobs) and Django (to display data)
• The backend produces data visualizations, e.g. jobs by status: complete, running, scheduled, or failed (right)
• HammerCloud sites automatically update to match the WLCG topology
• Reports data via a REST API to DIRAC Resource Status System
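A jobs-by-status summary of the kind described above could be computed and serialized for a REST call roughly as follows. This is a minimal sketch under stated assumptions: the status labels, the field names, and the payload layout are hypothetical, and no real HammerCloud or DIRAC endpoint or schema is used. The block only builds the JSON body and does not send it anywhere.

```python
import json
from collections import Counter

# Assumed status labels, following the categories named on the slide.
STATUSES = ("complete", "running", "scheduled", "failed")


def summarize(site, job_statuses):
    """Count jobs per status for one site and derive a failure fraction."""
    counts = Counter(job_statuses)
    total = sum(counts[s] for s in STATUSES)
    return {
        "site": site,  # hypothetical field name
        "jobs": {s: counts[s] for s in STATUSES},
        "failure_fraction": counts["failed"] / total if total else None,
    }


def to_json(summary):
    """Serialize a summary as a JSON request body (keys sorted for stability)."""
    return json.dumps(summary, sort_keys=True)


# Hypothetical site name and job list.
summary = summarize("SITE_A", ["complete", "complete", "running", "failed"])
body = to_json(summary)
```

Keeping the summary as a plain dictionary means the same structure can feed both a plotting view and a REST client without conversion.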

17 Grid Tests (Getting Results)
• Detecting and classifying data access failure is the key purpose of HammerCloud

18 Grid Tests (Getting Results)
• Detecting and classifying data access failure is the key purpose of HammerCloud
• Grid metrics like Time to Start (right) give an indication of site load

19 Grid Tests (Getting Results)
• Detecting and classifying data access failure is the key purpose of HammerCloud
• Grid metrics like Time to Start (right) give an indication of site load
• Analyzing logs to determine reasons for failure / failover
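As a rough illustration of the two ideas above, Time to Start can be taken as the delay between job submission and the start of execution, and log analysis can begin with simple substring matching. Everything specific here is an assumption: the record layout, the site names, the failure categories, and the log phrases are hypothetical and are not taken from HammerCloud, Ganga, or DIRAC.

```python
from datetime import datetime
from statistics import median


def time_to_start(submitted, started):
    """Seconds between job submission and the start of execution."""
    return (started - submitted).total_seconds()


def median_time_to_start(records):
    """records: iterable of (site, submitted, started); returns {site: median seconds}."""
    by_site = {}
    for site, submitted, started in records:
        by_site.setdefault(site, []).append(time_to_start(submitted, started))
    return {site: median(values) for site, values in by_site.items()}


# Hypothetical log phrases mapped to hypothetical failure categories.
FAILURE_PATTERNS = [
    ("data_access", ("could not open file", "no such file")),
    ("timeout", ("timed out",)),
]


def classify_log(text):
    """Return the first category whose phrase appears in the log, else 'unknown'."""
    lowered = text.lower()
    for category, needles in FAILURE_PATTERNS:
        if any(needle in lowered for needle in needles):
            return category
    return "unknown"


# Hypothetical timestamps for two jobs at one site.
records = [
    ("SITE_A", datetime(2014, 7, 1, 12, 0, 0), datetime(2014, 7, 1, 12, 5, 0)),
    ("SITE_A", datetime(2014, 7, 1, 13, 0, 0), datetime(2014, 7, 1, 13, 1, 40)),
]
medians = median_time_to_start(records)
```

The median is used instead of the mean so that a single job stuck in a queue does not dominate a site's figure.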

20 Future Work
• New testing architecture: the LHCb “mesh”
• More useful data visualizations and metrics
• Provide grid site status information to RSS (Resource Status System) via REST API
• Long-term plan: Testing as a Service [4]
[4] R. M. Llamas et al., J. Phys. Conf. Ser. 513, 062031 (2014).

21 At CERN, I…
• Experienced global-scale computing
• Learned the inner workings of the Grid
• Improved understanding of Django framework
• Engaged in a variety of cultural activities & scientific studies
• Refined my career interests
• Had an amazing summer!

22 Thank you for your time.

