Online Monitoring with MonALISA
Dan Protopopescu, Glasgow, UK


MonALISA
MonALISA is a distributed service able to:
- collect any type of information from different systems
- analyze this information in real time
- take automated decisions and perform actions based on it
- optimize workflows in complex environments
Read more at

Uses
- Monitoring distributed computing, e.g. Grids
- Optimizing flows in complex systems (VRVS, optical networks)
- ALICE also uses ML to monitor online reconstruction
- Some benchmark figures for the service:
  - ~800k monitored parameters at 50k updates/second
  - >10k running AliEn jobs monitored simultaneously
  - >100 WAN links
We are proposing ML as a high-level monitoring and possibly control system, alongside (or on top of) existing slow controls systems such as EPICS, PVSS, etc.
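As a rough reading of the first benchmark figure, and assuming updates are spread evenly over all monitored parameters (the slide does not say so), each parameter would be refreshed on average about every 16 seconds:

```latex
\bar{T}_{\text{update}} \approx \frac{N_{\text{params}}}{R_{\text{updates}}}
  = \frac{8 \times 10^{5}}{5 \times 10^{4}\ \text{s}^{-1}} = 16\ \text{s}
```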

Advantages
- MonALISA is simple to install, configure and use
- ApMon APIs are available in C, C++, Java, Python and Perl
- A ROOT plugin allows macros to send data directly to MonALISA
- Can easily interface with (or sit on top of) any existing or future slow controls subsystem (EPICS, PVSS)
- Data is stored in a standard PostgreSQL (or MySQL) database that other applications can access independently of ML
- Automatic data summarizing
- Several data repositories (and hence DBs) can coexist, both local and remote
- Easy access via Web Services (WS) from the service and/or the repository
- Fully supported by the development team; work is being done in this direction
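Automatic data summarizing can be pictured as coarsening raw samples into fixed time bins so that long histories stay cheap to store and plot. The following is a minimal sketch, assuming a fixed bin width and a min/mean/max summary per bin; the function `summarize` and the sample data are hypothetical, and this is not MonALISA's own implementation.

```python
from collections import defaultdict
from statistics import mean

def summarize(samples, bin_seconds):
    """Group (timestamp, value) samples into fixed-width time bins and
    return {bin_start: (min, mean, max)} for every non-empty bin."""
    bins = defaultdict(list)
    for ts, value in samples:
        start = int(ts // bin_seconds) * bin_seconds  # left edge of the bin
        bins[start].append(value)
    return {start: (min(vals), mean(vals), max(vals))
            for start, vals in sorted(bins.items())}

# Hypothetical raw samples taken every 20 s, summarized into 60 s bins.
raw = [(0, 10.0), (20, 12.0), (40, 14.0), (60, 20.0), (80, 22.0)]
print(summarize(raw, 60))  # {0: (10.0, 12.0, 14.0), 60: (20.0, 21.0, 22.0)}
```

Keeping min and max alongside the mean preserves short spikes that a plain average would hide.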

Capabilities
Based on monitored information, actions can be taken in:
- the ML Service
- the ML Repository
Actions can be triggered by:
- values above/below given thresholds
- absence/presence of values
- correlations between several values
Possible action types:
- external command
- plain event logging
- annotation of repository charts; RSS feeds
- instant messaging
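The three trigger kinds listed above (thresholds, absence/presence, correlations) can be modelled as predicates over the latest parameter values, each paired with an action. This is a minimal sketch under that assumption: `Trigger`, `threshold`, `absent`, `evaluate`, the 1000.0 limit and the parameter names `heartbeat`, `in_rate` and `out_rate` are all hypothetical, and the action here just records the trigger name where a real setup would run a command, log an event or send a message.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Latest value of each monitored parameter, keyed by parameter name.
Values = Dict[str, float]

@dataclass
class Trigger:
    name: str
    condition: Callable[[Values], bool]  # decides whether the trigger fires
    action: Callable[[str], None]        # stand-in for command/log/notification

def threshold(param: str, above: Optional[float] = None,
              below: Optional[float] = None) -> Callable[[Values], bool]:
    """Fire when `param` is present and above/below the given limit."""
    def check(values: Values) -> bool:
        if param not in values:
            return False
        x = values[param]
        return ((above is not None and x > above) or
                (below is not None and x < below))
    return check

def absent(param: str) -> Callable[[Values], bool]:
    """Fire when no value has been reported for `param`."""
    return lambda values: param not in values

def evaluate(triggers: List[Trigger], values: Values) -> List[str]:
    """Run the action of every trigger whose condition holds; return their names."""
    fired = []
    for t in triggers:
        if t.condition(values):
            t.action(t.name)
            fired.append(t.name)
    return fired

events: List[str] = []
triggers = [
    Trigger("high_rate", threshold("pspec_logic_ai_0", above=1000.0), events.append),
    Trigger("no_heartbeat", absent("heartbeat"), events.append),
    # Correlation: data is coming in but nothing is going out.
    Trigger("stalled",
            lambda v: v.get("in_rate", 0) > 100 and v.get("out_rate", 0) < 1,
            events.append),
]
fired = evaluate(triggers, {"pspec_logic_ai_0": 1500.0, "in_rate": 250.0, "out_rate": 0.0})
print(fired)  # ['high_rate', 'no_heartbeat', 'stalled']
```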

Components
[Diagram: Service, Repository, LUS/Proxies, ApMon, Web Server, GUI; annotated with "Actions based on local information", "Actions based on aggregated information" and "Quick actions".]

Service setup
ML Service setup:
    wget [MonaLisa.tar.gz URL]
    tar -zxvf MonaLisa.tar.gz
    cd MonaLisa/
    ./install.sh
    cd ../MonaLisa/Service/CMD/
    ./MLD start

Repository setup
ML Repository setup:
    wget [MLrepository.tgz URL]
    tar -zxvf MLrepository.tgz
    [configure it]
    cd MLrepository
    ./start.sh

ApMon setup
ApMon setup:
    wget [ApMon_perl.tar.gz URL]
    tar -xzvf ApMon_perl.tar.gz
    cd ApMon_perl
    [create your script, say mysend.pl]
    perl mysend.pl

Simple monitoring script
    cat mysend.pl
    use strict;
    use warnings;
    use ApMon;

    # Destination ML service (host:port); ApMon's own system and
    # general-info monitoring are switched off
    my $apm = new ApMon({"glasgow.jlab.org:8884" => {"sys_monitoring" => 0, "general_info" => 0}});
    while (1) {    # loop forever
        # get the current value; getmypar() is the author's own helper (not shown)
        my $value = getmypar("pspec_logic_ai_0");
        # cluster "Detector", node "MOR", parameter name => value
        $apm->sendParameters("Detector", "MOR", "pspec_logic_ai_0", $value);
        sleep(20);
    }
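The same polling pattern can be expressed in Python, one of the other listed ApMon languages. This is a minimal sketch in which the transport is injected: `read_param` stands in for the author's `getmypar` helper and `send` for an ApMon sender, and both, like `poll_and_send` itself, are hypothetical. Wiring `send` to ApMon's Python bindings is left out because their exact call signature is not shown in these slides.

```python
import time
from typing import Callable, Dict, Optional

def poll_and_send(read_param: Callable[[str], float],
                  send: Callable[[str, str, Dict[str, float]], None],
                  cluster: str, node: str, param: str,
                  interval_s: float = 20.0,
                  iterations: Optional[int] = None) -> int:
    """Read one parameter every `interval_s` seconds and hand it to `send`.
    Runs forever when `iterations` is None, like the Perl loop; returns
    the number of values sent otherwise."""
    sent = 0
    while iterations is None or sent < iterations:
        value = read_param(param)
        send(cluster, node, {param: value})
        sent += 1
        if iterations is None or sent < iterations:
            time.sleep(interval_s)  # no sleep after the final send
    return sent

# Usage with fake readings and an in-memory "sender" (hypothetical values).
outbox = []
readings = iter([101.0, 102.5, 99.0])
n = poll_and_send(lambda p: next(readings),
                  lambda c, nd, params: outbox.append((c, nd, params)),
                  "Detector", "MOR", "pspec_logic_ai_0",
                  interval_s=0.0, iterations=3)
print(n, outbox[0])  # 3 ('Detector', 'MOR', {'pspec_logic_ai_0': 101.0})
```

Injecting the reader and sender keeps the loop testable without a running ML service.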

Time history
Time history example:
    cat mor.properties
    page=hist
    Farms=JlabML
    Clusters=Detector
    Nodes=MOR
    Functions=pspec_logic_ai_0
    ylabel=Tagger rate
    title=MOR
    annotation.groups=2
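The chart file selects one data series through the Farm/Cluster/Node/Function keys. Here `Clusters=Detector`, `Nodes=MOR` and `Functions=pspec_logic_ai_0` match what the Perl script sends, and `JlabML` is presumably the name of the ML service. The following is a minimal sketch of reading such a file. `parse_properties` is hypothetical, and it only handles plain `key=value` lines and `#`/`!` comments, not the full Java `.properties` syntax (`:` separators, escapes, line continuations).

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and #/! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without '='
            props[key.strip()] = value.strip()
    return props

MOR = """page=hist
Farms=JlabML
Clusters=Detector
Nodes=MOR
Functions=pspec_logic_ai_0
ylabel=Tagger rate
title=MOR
annotation.groups=2
"""

p = parse_properties(MOR)
series = (p["Farms"], p["Clusters"], p["Nodes"], p["Functions"])
print(series)  # ('JlabML', 'Detector', 'MOR', 'pspec_logic_ai_0')
```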

Web interface

Java GUI

Application control
- ML Clients: TCP-based subscribe mechanism; serialized, compressed objects with optional encryption
- ML Proxies: application commands are encrypted
- ML Services: standard and/or user sensors and/or application modules
[Diagram labels: Key, Keystore, ML Service, ApMon, Your Application, Your custom Java client, GUI client, ML Repository, Your mon module, Your custom view, App MonC, bash, Your app module, LUS.]
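To illustrate what "serialized, compressed objects" over a TCP subscription involves, here is a sketch of the general framing idea: each message is length-prefixed and zlib-compressed, so a subscriber can split complete messages back out of a byte stream that may end mid-message. This is only an illustration under stated assumptions. MonALISA is Java-based and uses its own serialization, so JSON is a stand-in here, encryption is omitted, and `encode_message`/`decode_stream` are hypothetical names.

```python
import json
import struct
import zlib
from typing import List, Tuple

def encode_message(obj: dict) -> bytes:
    """Serialize to JSON, compress, and prefix with a 4-byte big-endian length."""
    payload = zlib.compress(json.dumps(obj, sort_keys=True).encode("utf-8"))
    return struct.pack("!I", len(payload)) + payload

def decode_stream(buf: bytes) -> Tuple[List[dict], bytes]:
    """Extract every complete message from `buf`; return (messages, leftover)."""
    messages = []
    while len(buf) >= 4:
        (length,) = struct.unpack("!I", buf[:4])
        if len(buf) < 4 + length:
            break  # rest of this message has not arrived yet
        payload = buf[4:4 + length]
        messages.append(json.loads(zlib.decompress(payload).decode("utf-8")))
        buf = buf[4 + length:]
    return messages, buf

first = {"cluster": "Detector", "node": "MOR", "pspec_logic_ai_0": 101.0}
second = {"cluster": "Detector", "node": "MOR", "pspec_logic_ai_0": 102.5}
stream = encode_message(first) + encode_message(second)[:6]  # second still in flight
msgs, leftover = decode_stream(stream)
print(len(msgs), len(leftover))  # 1 6
```

The leftover bytes would be kept and prepended to the next chunk read from the socket.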

Alert-based Actions
- The MySQL daemon is automatically restarted when it runs out of memory
  Trigger: threshold on VSZ memory usage
- The ALICE production job queue is kept full automatically by resubmission
  Trigger: threshold on the number of waiting aliprod jobs
- Administrators are kept up to date on the services' status via instant messaging, RSS feeds, toolbar alerts, etc.
  Trigger: presence/absence of monitored information
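The first two actions above reduce to simple threshold decisions. The following is a minimal sketch of that logic. The function names, the low-water/target parameters and all numbers are hypothetical, since the slides do not give the actual thresholds.

```python
def should_restart(vsz_kb: int, limit_kb: int) -> bool:
    """Trigger: restart the daemon once its virtual memory size exceeds the limit."""
    return vsz_kb > limit_kb

def jobs_to_submit(waiting: int, low_water: int, target: int) -> int:
    """Trigger: when fewer than `low_water` jobs are waiting, resubmit enough
    jobs to bring the queue back up to `target`; otherwise do nothing."""
    if target < low_water:
        raise ValueError("target must be >= low_water")
    if waiting >= low_water:
        return 0
    return target - waiting

print(should_restart(3_000_000, 2_500_000))            # True
print(jobs_to_submit(120, low_water=200, target=500))  # 380
print(jobs_to_submit(250, low_water=200, target=500))  # 0
```

Using a low-water mark below the refill target means the resubmission action fires in batches rather than on every small fluctuation of the queue.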

Summary
- MonALISA is a very promising tool for online experiment monitoring and for interfacing with a variety of slow control subsystems; GlueX is seriously considering ML for this task
- Easy to configure, understand and use
- Experience from Grid monitoring and more
- Support from the developer group for implementing new modules/features
- Tests of online experiment monitoring were recently carried out; a demo repository is at

More examples / Extras

Integrated Pie Charts

History Plots, Annotations

AliEn Services Monitoring
- AliEn services, periodically checked:
  - PID check + SOAP call
  - simple functional tests
- SE space usage
- Efficiency
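The "PID check" half of the periodic service check can be done on POSIX systems by sending signal 0, which only tests whether the process exists. The following is a minimal sketch of that check alone. `pid_alive` is a hypothetical name, the SOAP call is not reproduced, and the code is POSIX-only: on Windows `os.kill` with signal 0 does not act as a probe.

```python
import errno
import os

def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX signal-0 probe)."""
    if pid <= 0:
        return False  # 0 and negative values address process groups, not one PID
    try:
        os.kill(pid, 0)  # signal 0: permission/existence check only, nothing delivered
    except OSError as e:
        if e.errno == errno.ESRCH:
            return False  # no such process
        if e.errno == errno.EPERM:
            return True   # process exists but belongs to another user
        raise
    return True

print(pid_alive(os.getpid()))  # True
```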

Job Network Traffic Monitoring
- Based on the xrootd transfers from every job
- Aggregated statistics for:
  - sites (incoming, outgoing, site to site, internal)
  - Storage Elements (incoming, outgoing)
- Metrics:
  - read and written files
  - transferred MB/s
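The per-site categories above can be derived from individual transfer records. The following is a minimal sketch, assuming each record is a (source site, destination site, megabytes) tuple collected over a fixed time window. `traffic_rates` and the site names `SiteA`/`SiteB` are hypothetical, and the same grouping would apply to Storage Element names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def traffic_rates(transfers: Iterable[Tuple[str, str, float]],
                  window_s: float) -> Dict[str, dict]:
    """Aggregate (source_site, destination_site, megabytes) records observed
    over `window_s` seconds into MB/s per category."""
    incoming = defaultdict(float)
    outgoing = defaultdict(float)
    internal = defaultdict(float)
    site_to_site = defaultdict(float)
    for src, dst, mb in transfers:
        if src == dst:
            internal[src] += mb  # traffic that stays inside one site
        else:
            outgoing[src] += mb
            incoming[dst] += mb
            site_to_site[(src, dst)] += mb

    def rate(totals):
        return {k: v / window_s for k, v in totals.items()}

    return {"incoming": rate(incoming), "outgoing": rate(outgoing),
            "internal": rate(internal), "site_to_site": rate(site_to_site)}

# Made-up records for a 10-second window.
sample = [("SiteA", "SiteB", 120.0), ("SiteA", "SiteA", 40.0),
          ("SiteB", "SiteA", 30.0), ("SiteA", "SiteB", 10.0)]
r = traffic_rates(sample, window_s=10.0)
print(r["outgoing"])  # {'SiteA': 13.0, 'SiteB': 3.0}
```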

Individual Job Tracking
- Based on AliEn shell commands: top, ps, spy, jobinfo, masterjob
- Using the ML GUI client
- Status and resource usage, per job

Head Node Monitoring
- Machine parameters, real-time & history: load, memory & swap usage, processes, sockets

MonALISA in AliEn
- The MonALISA framework has been used as a primary monitoring tool for the ALICE Grid since 2004
- Presently the system monitors all (identified) services, jobs and network parameters needed for Grid operation and debugging
- The number of concurrently monitored and stored parameters today is ~ in 75 ML Services
- The add-on tools for automatic event notification allow a more efficient reaction to problems
- The framework's design and flexibility answer all the requirements for a monitoring system
- The accumulated information makes it possible to construct and implement automated decision-making algorithms, further increasing the efficiency of Grid operations