Online Monitoring with MonALISA. Dan Protopopescu, Glasgow, UK.


1 Online Monitoring with MonALISA
Dan Protopopescu, Glasgow, UK

2 MonALISA
MonALISA is a distributed service able to:
- collect any type of information from different systems
- analyze this information in real time
- take automated decisions and perform actions based on it
- optimize workflows in complex environments
Read more at http://monalisa.caltech.edu

3 Uses
- Monitoring distributed computing, i.e. Grids
- Optimizing flow in complex systems (VRVS, optical cable networks)
- ALICE also uses MonALISA (ML) for monitoring online reconstruction
- Some benchmark figures for the service:
  - ~800k monitored parameters at 50k updates/second
  - > 10k running AliEn jobs monitored simultaneously
  - > 100 WAN links
We are proposing ML as a high-level monitoring and possible control system alongside (or on top of) existing slow controls systems such as EPICS, PVSS, etc.

4 Advantages
- MonALISA is simple to install, configure and use
- ApMon APIs are available in C, C++, Java, Python and Perl
- A ROOT plugin allows macros to send data directly to MonALISA
- Can easily interface with (or sit on top of) any existing or future slow controls subsystem (EPICS, PVSS)
- Data is stored in a standard PostgreSQL (or MySQL) database that can be accessed by other applications, independently of ML
- Automatic data summarizing
- Several data repositories (and hence DBs) can exist (local and remote)
- Easy access via WebService (WS) from service and/or repository
- Fully supported by the development team; work is being done in this direction

5 Capabilities
Based on monitored information, actions can be taken in:
- ML Service
- ML Repository
Actions can be triggered by:
- Values above/below given thresholds
- Absence/presence of values
- Correlations between several values
Possible action types:
- External command
- Plain event logging
- Annotation of repository charts; RSS feeds
- Email
- Instant messaging
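The first two trigger types on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MonALISA's own implementation or configuration syntax: the function names (`threshold_trigger`, `absence_trigger`, `evaluate`), the argument layout and the 60-second default are all hypothetical, and correlations between several values are not covered.

```python
from typing import List, Optional


def threshold_trigger(value: float, low: Optional[float] = None,
                      high: Optional[float] = None) -> bool:
    """Hypothetical helper: True when value is below `low` or above `high`."""
    if low is not None and value < low:
        return True
    if high is not None and value > high:
        return True
    return False


def absence_trigger(last_seen: Optional[float], now: float,
                    max_age: float) -> bool:
    """Hypothetical helper: True when no value arrived within `max_age` seconds."""
    return last_seen is None or (now - last_seen) > max_age


def evaluate(value: float, last_seen: Optional[float], now: float,
             low: Optional[float] = None, high: Optional[float] = None,
             max_age: float = 60.0) -> List[str]:
    """Return the names of the triggers that fire for one monitored parameter.

    A stale parameter is reported as absent only; its last value is not
    compared against the thresholds.
    """
    fired = []
    if absence_trigger(last_seen, now, max_age):
        fired.append("absent")
    elif threshold_trigger(value, low=low, high=high):
        fired.append("threshold")
    return fired
```

An action (external command, log entry, email and so on) would then be dispatched for each name returned by `evaluate`.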

6 Components
[Diagram: Service, Repository, LUS/Proxies, ApMon, Web Server, GUI. Labels: actions based on local information, actions based on aggregated information, quick actions]

7 Service setup
ML Service setup:
    wget http://nuclear.gla.ac.uk/~protopop/ML/MonaLisa.tar.gz
    tar -zxvf MonaLisa.tar.gz
    cd MonaLisa/
    ./install.sh
    cd ../MonaLisa/Service/CMD/
    ./MLD start

8 Repository setup
ML Repository setup:
    wget http://nuclear.gla.ac.uk/~protopop/ML/MLrepository.tgz
    tar -zxvf MLrepository.tgz
    [configure it]
    cd MLrepository
    ./start.sh

9 ApMon setup
ApMon setup:
    wget http://nuclear.gla.ac.uk/~protopop/ML/ApMon_perl.tar.gz
    tar -xzvf ApMon_perl.tar.gz
    cd ApMon_perl
    [create your script, say mysend.pl]
    perl mysend.pl

10 Simple monitoring script
[monalisa@glasgow]$ cat mysend.pl
    use strict;
    use warnings;
    use ApMon;

    # Send to the ML service at glasgow.jlab.org:8884, with the
    # sys_monitoring and general_info options switched off.
    my $apm = ApMon->new({"glasgow.jlab.org:8884" =>
        {"sys_monitoring" => 0, "general_info" => 0}});
    my @pair;
    while (1) {    # loop forever
        # get values from somewhere; getmypar() is a user-supplied
        # routine and is not defined in this script
        @pair = getmypar("pspec_logic_ai_0");
        # cluster "Detector", node "MOR", then the values in @pair
        $apm->sendParameters("Detector", "MOR", @pair);
        sleep(20);
    }

11 Time history
Time history example:
[monalisa@glasgow]$ cat mor.properties
    page=hist
    Farms=JlabML
    Clusters=Detector
    Nodes=MOR
    Functions=pspec_logic_ai_0
    ylabel=Tagger rate
    title=MOR
    annotation.groups=2
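To inspect such a chart definition from another tool, the key=value lines can be read with a few lines of Python. This is a hedged sketch, not the repository's own parser: `parse_properties` is a hypothetical helper that handles only plain `key=value` lines and `#` comments, and ignores the richer syntax (escapes, line continuations) that full Java-style properties files allow.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines into a dict (hypothetical helper)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line; ignore it
        props[key.strip()] = value.strip()
    return props


# The chart definition from the slide, reproduced as a string.
MOR_PROPERTIES = """\
page=hist
Farms=JlabML
Clusters=Detector
Nodes=MOR
Functions=pspec_logic_ai_0
ylabel=Tagger rate
title=MOR
annotation.groups=2
"""

props = parse_properties(MOR_PROPERTIES)
# The farm/cluster/node/function keys together name the plotted series.
series = "/".join(props[k] for k in ("Farms", "Clusters", "Nodes", "Functions"))
print(series)
```

The cluster, node and parameter names here match the ones used in the `sendParameters` call of the monitoring script, which is how the chart finds the data sent by ApMon.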

12 Web interface

13 Java GUI

14 Application control
- ML Clients: TCP-based subscribe mechanism; serialized, compressed objects with optional encryption
- ML Proxies: application commands are encrypted
- ML Services: standard and/or user's sensors and/or application modules
[Diagram labels: Key, Keystore, ML Service, ApMon, Your Application, Your custom Java client, GUI client, ML Repository, Your mon module, Your custom view, App MonC, bash, Your application, Your app module, LUS]

15 Alert-based Actions
- The MySQL daemon is automatically restarted when it runs out of memory. Trigger: threshold on VSZ memory usage
- The ALICE production job queue is automatically kept full by automatic resubmission. Trigger: threshold on the number of aliprod waiting jobs
- Administrators are kept up to date on the services' status via instant messaging, RSS feeds, toolbar alerts, etc. Trigger: presence/absence of monitored information
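The queue-refill action can be illustrated with a small calculation. The slides do not describe the actual ALICE resubmission logic, so this is only a sketch under assumptions: `jobs_to_resubmit`, the `threshold` and `target` parameters and the numbers in the usage note are hypothetical.

```python
def jobs_to_resubmit(waiting: int, threshold: int, target: int) -> int:
    """Return how many jobs to submit to bring the queue back to `target`.

    Nothing is submitted while the number of waiting jobs is at or above
    `threshold`; once it drops below, the queue is topped up to `target`.
    """
    if threshold > target:
        raise ValueError("threshold must not exceed target")
    if waiting < 0:
        raise ValueError("waiting must be non-negative")
    if waiting >= threshold:
        return 0
    return target - waiting
```

For example, with a threshold of 100 and a target of 200, a queue holding 50 waiting jobs would receive 150 new submissions, while a queue holding 120 would be left alone.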

16 Summary
- MonALISA is a very promising tool for online experiment monitoring and interfacing with a variety of slow control subsystems; GlueX are seriously considering ML for this task
- Easy to configure, understand and use
- Experience from Grid monitoring and more
- Support from the developers' group for implementation of new modules/features
- Online experiment monitoring tests of CLAS@JLab were recently carried out; the demo repository is at http://mlr1.gla.ac.uk:7002

17 More examples / Extras

18 Integrated Pie Charts

19 History Plots, Annotations

20 AliEn Services Monitoring
- AliEn services
- Periodically checked
- PID check + SOAP call
- Simple functional tests
- SE space usage
- Efficiency

21 Job Network Traffic Monitoring
- Based on the xrootd transfer from every job
- Aggregated statistics for:
  - Sites (incoming, outgoing, site to site, internal)
  - Storage Elements (incoming, outgoing)
- Of:
  - Read and written files
  - Transferred MB/s

22 Individual Job Tracking
- Based on AliEn shell commands: top, ps, spy, jobinfo, masterjob
- Using the GUI ML Client
- Status and resource usage, per job

23 Head Node Monitoring
- Machine parameters, real-time and history: load, memory and swap usage, processes, sockets

24 MonALISA in AliEn
- The MonALISA framework has been used as a primary monitoring tool for the ALICE Grid since 2004
- Presently the system is used for monitoring all (identified) services, jobs and network parameters necessary for Grid operation and debugging
- The number of concurrently monitored and stored parameters today is ~300,000 in 75 ML Services
- The add-on tools for automatic event notification allow for a more efficient reaction to problems
- The framework's design and flexibility meet all requirements for a monitoring system
- The accumulated information makes it possible to construct and implement automated decision-making algorithms, thus further increasing the efficiency of Grid operations

