Published by Horace Potter. Modified over 9 years ago.
1
Online Monitoring with MonALISA
Dan Protopopescu, Glasgow, UK
2
MonALISA
Is a distributed service able to:
- collect any type of information from different systems
- analyze this information in real time
- take automated decisions and perform actions based on it
- optimize workflows in complex environments
Read more at http://monalisa.caltech.edu
3
Uses
- Monitoring distributed computing, i.e. Grids
- Optimizing flow in complex systems (VRVS, optical cable networks)
- ALICE also uses ML for monitoring online reconstruction
Some benchmark figures for the service:
- ~800k monitored parameters at 50k updates/second
- > 10k running (AliEn) jobs monitored simultaneously
- > 100 WAN links
We are proposing ML as a high-level monitoring and possible control system alongside (or on top of) existing slow controls systems such as EPICS, PVSS etc.
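As a rough reading of the benchmark figures on this slide, and assuming (the slides do not say this) that updates are spread evenly over all monitored parameters, the two numbers imply an average refresh interval per parameter. The short Python calculation below is only illustrative; the variable names are made up for the example.

```python
# Benchmark figures quoted on the slide (approximate values).
monitored_parameters = 800_000   # ~800k monitored parameters
updates_per_second = 50_000      # ~50k updates/second

# Assumption: updates are spread evenly over all parameters.
mean_refresh_interval_s = monitored_parameters / updates_per_second
print(f"mean refresh interval per parameter: {mean_refresh_interval_s:.0f} s")
```

Under that assumption each parameter would be refreshed about every 16 seconds on average, which is the same order as the 20 second send period used in the monitoring script later in the deck.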
4
Advantages
- MonALISA is simple to install, configure and use
- ApMon APIs are available in C, C++, Java, Python and Perl
- A ROOT plugin allows macros to send data directly to MonALISA
- Can easily interface with (or sit on top of) any existing or future slow controls subsystem (EPICS, PVSS)
- Data is stored in a standard PostgreSQL (or MySQL) database that can be accessed by other applications, independently of ML
- Automatic data summarizing
- Several data repositories (and hence DBs) can exist (local and remote)
- Easy access via WebService (WS) from service and/or repository
- Fully supported by the development team; work is being done in this direction
5
Capabilities
Based on monitored information, actions can be taken in:
- ML Service
- ML Repository
Actions can be triggered by:
- Values above/below given thresholds
- Absence/presence of values
- Correlations between several values
Possible action types:
- External command
- Plain event logging
- Annotation of repository charts; RSS feeds
- Email
- Instant messaging
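The trigger types listed on this slide can be sketched in a few lines of Python. This is a minimal illustration, not MonALISA's own action framework: the class names `ThresholdTrigger` and `AbsenceTrigger`, the `log_event` helper and all numeric values are assumptions introduced only for the example, and the "action" is plain event logging into a list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdTrigger:
    """Fires when a value is above `high` or below `low` (hypothetical helper)."""
    low: Optional[float] = None
    high: Optional[float] = None

    def fires(self, value: float) -> bool:
        if self.high is not None and value > self.high:
            return True
        if self.low is not None and value < self.low:
            return True
        return False


@dataclass
class AbsenceTrigger:
    """Fires when no value has arrived for longer than `timeout_s` seconds."""
    timeout_s: float

    def fires(self, last_seen_s: float, now_s: float) -> bool:
        return (now_s - last_seen_s) > self.timeout_s


def log_event(message: str, log: list) -> None:
    """Plain event logging, the simplest of the action types on the slide."""
    log.append(message)


# Illustrative values only; none of these numbers come from the slides.
events: list = []
rate_trigger = ThresholdTrigger(low=10.0, high=1000.0)
silence_trigger = AbsenceTrigger(timeout_s=60.0)

if rate_trigger.fires(1500.0):
    log_event("rate above threshold", events)
if silence_trigger.fires(last_seen_s=0.0, now_s=90.0):
    log_event("no data for more than 60 s", events)
```

A correlation trigger would follow the same pattern, with `fires` taking several values instead of one.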
6
Components
[Diagram. Components shown: ApMon, Service, LUS/Proxies, Repository, Web Server, GUI. Annotations: actions based on local information; actions based on aggregated information; quick actions.]
7
Service setup
ML Service setup:
    wget http://nuclear.gla.ac.uk/~protopop/ML/MonaLisa.tar.gz
    tar -zxvf MonaLisa.tar.gz
    cd MonaLisa/
    ./install.sh
    cd ../MonaLisa/Service/CMD/
    ./MLD start
8
Repository setup
ML Repository setup:
    wget http://nuclear.gla.ac.uk/~protopop/ML/MLrepository.tgz
    tar -zxvf MLrepository.tgz
    [configure it]
    cd MLrepository
    ./start.sh
9
ApMon setup
ApMon setup:
    wget http://nuclear.gla.ac.uk/~protopop/ML/ApMon_perl.tar.gz
    tar -xzvf ApMon_perl.tar.gz
    cd ApMon_perl
    [create your script, say mysend.pl]
    perl mysend.pl
10
Simple monitoring script
    [monalisa@glasgow]$ cat mysend.pl
    use ApMon;
    # Send to the ML service at glasgow.jlab.org:8884, with the built-in
    # system monitoring and general host information switched off.
    my $apm = new ApMon({"glasgow.jlab.org:8884" => {"sys_monitoring" => 0, "general_info" => 0}});
    my @pair;
    while (1) {  # loop forever
        # get values from somewhere (getmypar is a user-supplied routine)
        @pair = getmypar("pspec_logic_ai_0");
        # cluster "Detector", node "MOR", followed by the values to send
        $apm->sendParameters("Detector", "MOR", @pair);
        sleep(20);
    }
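The same send loop can be sketched in Python, one of the languages for which the slides say ApMon APIs exist. To stay self-contained, the sketch below does not import ApMon at all: `send` is a stand-in callable, and `read_value`, `run_monitor` and the bounded `iterations` argument are hypothetical names introduced only for this illustration.

```python
import time
from typing import Callable, List, Tuple

Sample = Tuple[str, str, str, float]  # (cluster, node, parameter, value)


def run_monitor(read_value: Callable[[], float],
                send: Callable[[str, str, str, float], None],
                iterations: int,
                period_s: float = 20.0,
                sleep: Callable[[float], None] = time.sleep) -> None:
    """Read a value and hand it to `send` every `period_s` seconds.

    `send` stands in for whatever the real monitoring client provides;
    it is NOT the ApMon API.
    """
    for _ in range(iterations):
        value = read_value()
        send("Detector", "MOR", "pspec_logic_ai_0", value)
        sleep(period_s)


# Usage with stand-ins: a fake reading, a list as the "transport",
# and a no-op sleep so the example finishes immediately.
sent: List[Sample] = []
run_monitor(read_value=lambda: 42.0,
            send=lambda c, n, p, v: sent.append((c, n, p, v)),
            iterations=3,
            sleep=lambda s: None)
```

Passing the reader, sender and sleep function as arguments keeps the loop testable without a running ML service; in real use the loop would run indefinitely, as in the Perl script.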
11
Time history
Time history example:
    [monalisa@glasgow]$ cat mor.properties
    page=hist
    Farms=JlabML
    Clusters=Detector
    Nodes=MOR
    Functions=pspec_logic_ai_0
    ylabel=Tagger rate
    title=MOR
    annotation.groups=2
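To make the structure of `mor.properties` explicit, the sketch below parses the same key=value lines into a Python dictionary. It is not how the ML repository reads the file; `parse_properties` is a hypothetical helper written for this illustration, and the handling of blank lines and `#` comments is an assumption.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines; blank lines and '#' comments are skipped."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        props[key.strip()] = value.strip()
    return props


MOR_PROPERTIES = """\
page=hist
Farms=JlabML
Clusters=Detector
Nodes=MOR
Functions=pspec_logic_ai_0
ylabel=Tagger rate
title=MOR
annotation.groups=2
"""

config = parse_properties(MOR_PROPERTIES)
print(config["Farms"], config["Clusters"], config["Nodes"], config["Functions"])
```

The Clusters, Nodes and Functions values match the cluster, node and parameter names used by the monitoring script (Detector, MOR, pspec_logic_ai_0), which is how the chart is tied to the data being sent.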
12
Web interface
13
Java GUI
14
Application control
[Diagram. Layers: ML Clients; ML Proxies; ML Services. Annotations: TCP-based subscribe mechanism; serialized, compressed objects with optional encryption; application commands are encrypted; standard and/or user's sensors and/or application modules. Other labels: Key, Keystore, LUS, ML Service, ML Repository, ApMon, Your application, Your custom Java client, GUI client, Your mon module, Your custom view, App MonC, bash, Your app module.]
15
Alert-based Actions
- The MySQL daemon is automatically restarted when it runs out of memory
  Trigger: threshold on VSZ memory usage
- The ALICE production jobs queue is automatically kept full by automatic resubmission
  Trigger: threshold on the number of aliprod waiting jobs
- Administrators are kept up to date on the services' status via instant messaging, RSS feeds, toolbar alerts etc.
  Trigger: presence/absence of monitored information
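The queue top-up case on this slide can be sketched as a small decision function. The slides only state that the trigger is a threshold on the number of waiting jobs; the function name `jobs_to_resubmit`, the `target_waiting` and `max_batch` parameters, and all numbers below are assumptions made for the illustration.

```python
def jobs_to_resubmit(waiting_jobs: int, target_waiting: int, max_batch: int) -> int:
    """How many jobs to submit so the waiting queue returns to its target.

    All three parameters are hypothetical; the slides only say that the
    trigger is a threshold on the number of waiting jobs.
    """
    if waiting_jobs >= target_waiting:
        return 0  # queue is full enough, no action
    return min(target_waiting - waiting_jobs, max_batch)


# Illustrative values only.
print(jobs_to_resubmit(waiting_jobs=350, target_waiting=500, max_batch=100))
print(jobs_to_resubmit(waiting_jobs=480, target_waiting=500, max_batch=100))
print(jobs_to_resubmit(waiting_jobs=600, target_waiting=500, max_batch=100))
```

Capping each top-up at `max_batch` is a design choice of this sketch: it avoids submitting a very large burst if the monitored count is briefly wrong or missing.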
16
Summary
- MonALISA is a very promising tool for online experiment monitoring and for interfacing with a variety of slow control subsystems; GlueX is seriously considering ML for this task
- Easy to configure, understand and use
- Experience from Grid monitoring and more
- Support from the developers' group for implementation of new modules/features
- Online experiment monitoring tests of CLAS@JLab were recently carried out; a demo repository is at http://mlr1.gla.ac.uk:7002
17
More examples / Extras
18
Integrated Pie Charts
19
History Plots, Annotations
20
AliEn Services Monitoring
- AliEn services, periodically checked:
  - PID check + SOAP call
  - Simple functional tests
- SE space usage
- Efficiency
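The PID check mentioned on this slide can be illustrated with a standard POSIX idiom: sending signal 0 to a process tests whether it exists without affecting it. The sketch below is a generic Python version of that idiom, not the check AliEn or MonALISA actually runs; `pid_is_running` is a hypothetical name, and the SOAP call is not covered.

```python
import os


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check, nothing is sent
    except ProcessLookupError:
        return False     # no such process
    except PermissionError:
        return True      # process exists but belongs to another user
    return True


# The current process certainly exists, so this reports True.
print(pid_is_running(os.getpid()))
```

A PID check alone only shows that the process is alive, which is presumably why the slide pairs it with a SOAP call and simple functional tests.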
21
Job Network Traffic Monitoring
- Based on the xrootd transfer from every job
- Aggregated statistics for:
  - Sites (incoming, outgoing, site to site, internal)
  - Storage Elements (incoming, outgoing)
- Of: read and written files; transferred MB/s
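The per-site aggregation described on this slide can be sketched as follows. This is a minimal illustration under stated assumptions: each transfer is reduced to a (source site, destination site, megabytes) record, the site names `SiteA` and `SiteB` are placeholders, and `aggregate_by_site` is a hypothetical helper rather than anything taken from MonALISA.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One record per transfer: (source_site, destination_site, megabytes).
Transfer = Tuple[str, str, float]


def aggregate_by_site(transfers: Iterable[Transfer]) -> Dict[str, Dict[str, float]]:
    """Sum transferred MB per site into incoming, outgoing and internal totals."""
    totals: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"incoming": 0.0, "outgoing": 0.0, "internal": 0.0}
    )
    for source, destination, megabytes in transfers:
        if source == destination:
            totals[source]["internal"] += megabytes
        else:
            totals[source]["outgoing"] += megabytes
            totals[destination]["incoming"] += megabytes
    return dict(totals)


# Placeholder records; the numbers are invented for the example.
sample = [
    ("SiteA", "SiteB", 120.0),
    ("SiteB", "SiteA", 30.0),
    ("SiteB", "SiteB", 5.0),
]
stats = aggregate_by_site(sample)
print(stats["SiteB"])
```

Site-to-site totals would use the (source, destination) pair as the key instead of a single site, and dividing by the length of the reporting interval would turn the sums into MB/s.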
22
Individual Job Tracking
- Based on AliEn shell commands: top, ps, spy, jobinfo, masterjob
- Using the GUI ML Client
- Status and resource usage, per job
23
Head Node Monitoring
Machine parameters, real-time & history: load, memory & swap usage, processes, sockets
24
MonALISA in AliEn
- The MonALISA framework has been used as a primary monitoring tool for the ALICE Grid since 2004
- Presently the system is used for monitoring all (identified) services, jobs and network parameters necessary for Grid operation and debugging
- The number of concurrently monitored and stored parameters today is ~300,000 in 75 ML Services
- The add-on tools for automatic event notification allow for more efficient reaction to problems
- The framework's design and flexibility answer all requirements for a monitoring system
- The accumulated information makes it possible to construct and implement automated decision-making algorithms, further increasing the efficiency of Grid operations