ALICE Monitoring Costin.Grigoras@cern.ch.

Slides:



Advertisements
Similar presentations
ALICE G RID SERVICES IP V 6 READINESS
Advertisements

Sergey Belov, LIT JINR 15 September, NEC’2011, Varna, Bulgaria.
LHC Experiment Dashboard Main areas covered by the Experiment Dashboard: Data processing monitoring (job monitoring) Data transfer monitoring Site/service.
ALICE DATA ACCESS MODEL Outline ALICE data access model - PtP Network Workshop 2  ALICE data model  Some figures.
ALICE data access WLCG data WG revival 4 October 2013.
1. There are different assistant software tools and methods that help in managing the network in different things such as: 1. Special management programs.
ATLAS Off-Grid sites (Tier-3) monitoring A. Petrosyan on behalf of the ATLAS collaboration GRID’2012, , JINR, Dubna.
Test Of Distributed Data Quality Monitoring Of CMS Tracker Dataset H->ZZ->2e2mu with PileUp - 10,000 events ( ~ 50,000 hits for events) The monitoring.
Online Monitoring with MonALISA Dan Protopopescu Glasgow, UK Dan Protopopescu Glasgow, UK.
Publication and Protection of Site Sensitive Information in Grids Shreyas Cholia NERSC Division, Lawrence Berkeley Lab Open Source Grid.
Module 10: Monitoring ISA Server Overview Monitoring Overview Configuring Alerts Configuring Session Monitoring Configuring Logging Configuring.
Costin Grigoras ALICE Offline. In the period of steady LHC operation, The Grid usage is constant and high and, as foreseen, is used for massive RAW and.
Cracow Grid Workshop October 2009 Dipl.-Ing. (M.Sc.) Marcus Hilbrich Center for Information Services and High Performance.
N EWS OF M ON ALISA SITE MONITORING
And Tier 3 monitoring Tier 3 Ivan Kadochnikov LIT JINR
PHENIX and the data grid >400 collaborators Active on 3 continents + Brazil 100’s of TB of data per year Complex data with multiple disparate physics goals.
Site operations Outline Central services VoBox services Monitoring Storage and networking 4/8/20142ALICE-USA Review - Site Operations.
PPDG update l We want to join PPDG l They want PHENIX to join NSF also wants this l Issue is to identify our goals/projects Ingredients: What we need/want.
Monitoring with MonALISA Costin Grigoras. What is MonALISA ?  Caltech project started in 2002
PHENIX and the data grid >400 collaborators 3 continents + Israel +Brazil 100’s of TB of data per year Complex data with multiple disparate physics goals.
Xrootd Monitoring and Control Harsh Arora CERN. Setting Up Service  Monalisa Service  Monalisa Repository  Test Xrootd Server  ApMon Module.
Tier3 monitoring. Initial issues. Danila Oleynik. Artem Petrosyan. JINR.
JAliEn Java AliEn middleware A. Grigoras, C. Grigoras, M. Pedreira P Saiz, S. Schreiner ALICE Offline Week – June 2013.
ALICE DATA ACCESS MODEL Outline 05/13/2014 ALICE Data Access Model 2  ALICE data access model  Infrastructure and SE monitoring.
CERN IT Department CH-1211 Genève 23 Switzerland t CERN IT Monitoring and Data Analytics Pedro Andrade (IT-GT) Openlab Workshop on Data Analytics.
LCG WLCG Accounting: Update, Issues, and Plans John Gordon RAL Management Board, 19 December 2006.
03/09/2007http://pcalimonitor.cern.ch/1 Monitoring in ALICE Costin Grigoras 03/09/2007 WLCG Meeting, CHEP.
Gennaro Tortone, Sergio Fantinel – Bologna, LCG-EDT Monitoring Service DataTAG WP4 Monitoring Group DataTAG WP4 meeting Bologna –
CMS: T1 Disk/Tape separation Nicolò Magini, CERN IT/SDC Oliver Gutsche, FNAL November 11 th 2013.
ATLAS Off-Grid sites (Tier-3) monitoring A. Petrosyan on behalf of the ATLAS collaboration GRID’2012, , JINR, Dubna.
1 R. Voicu 1, I. Legrand 1, H. Newman 1 2 C.Grigoras 1 California Institute of Technology 2 CERN CHEP 2010 Taipei, October 21 st, 2010 End to End Storage.
Meeting with University of Malta| CERN, May 18, 2015 | Predrag Buncic ALICE Computing in Run 2+ P. Buncic 1.
TIFR, Mumbai, India, Feb 13-17, GridView - A Grid Monitoring and Visualization Tool Rajesh Kalmady, Digamber Sonvane, Kislay Bhatt, Phool Chand,
Efi.uchicago.edu ci.uchicago.edu Sharing Network Resources Ilija Vukotic Computation and Enrico Fermi Institutes University of Chicago Federated Storage.
HEPiX IPv6 Working Group David Kelsey (STFC-RAL) GridPP33 Ambleside 22 Aug 2014.
PART1 Data collection methodology and NM paradigms 1.
Storage discovery in AliEn
HPDC Grid Monitoring Workshop June 25, 2007 Grid monitoring from the VO/user perspectives Shava Smallen.
How to integrate portals with EGI accounting system R.Graciani EGI TF 2012.
Federating Data in the ALICE Experiment
Daniele Bonacorsi Andrea Sciabà
Data Formats and Impact on Federated Access
Monitoring Windows Server 2012
Monitoring Evolution and IPv6
Monitoring Storage Systems for Oracle Enterprise Manager 12c
WLCG Network Discussion
WP18, High-speed data recording Krzysztof Wrona, European XFEL
ALICE internal and external network
Overview of the Belle II computing
POW MND section.
Chapter 2: System Structures
Update on Plan for KISTI-GSDC
Offline shifter training tutorial
New monitoring applications in the dashboard
A Messaging Infrastructure for WLCG
Job workflow Pre production operations:
Artem Trunov and EKP team EPK – Uni Karlsruhe
Storage elements discovery
Simulation use cases for T2 in ALICE
TYPES OF SERVER. TYPES OF SERVER What is a server.
Offline shifter training tutorial
Monitoring Of XRootD Federation
Monitoring Storage Systems for Oracle Enterprise Manager 12c
A portal interface to myGrid workflow technology
Publishing ALICE data & CVMFS infrastructure monitoring
Get to know SysKit Monitor
Monitoring of the infrastructure from the VO perspective
Initial job submission and monitoring efforts with JClarens
Chapter 8 I/O.
Presentation transcript:

ALICE Monitoring Costin.Grigoras@cern.ch

Summary Monitoring infrastructure Information sources Collecting and storing Presentation Feedback to production system, dashboard and SAM 05.07.2013 ALICE Monitoring

Monitoring infrastructure All components instrumented with ApMon Distributed MonALISA collectors, one per site in each VoBox also doing data aggregation Central collector for long term data archival and presentation and specialized collectors for sites and other interested parties 05.07.2013 ALICE Monitoring

Information sources Jobs: ~real time monitoring of job resource usage and of host health (CPU and Wall time and specs, memory, traffic, disk IO ...) Services: VoBoxes and Central Services in detail AliEn services, box health, inter-site connectivity tests also controlling the DNS balancing 05.07.2013 ALICE Monitoring

Information sources (2) Storage: all xrootd data nodes, all xrdcp/xrd3cp transfers, health, occupancy Networking: tracepath and available bw per stream between all sites, kernel buffer sizes, IPv6 readiness Other data: production accounting, I/O eff. of sites vs storages, SE functional status, experiment data taking and electronic logbook ... 05.07.2013 ALICE Monitoring

Collecting and storing Centrally collecting as much as possible aggregated data only (per site, user, masterjob, production etc) details are quickly forgotten and old data is further aggregated 7 years of data in some 370GB Same database for monitoring data, accounting, production management, experiment data taking info, detector QA info and many more allowing for quick production assessment and control plus all kinds of correlations we have or haven't imagined yet 05.07.2013 ALICE Monitoring

Presentation A single web interface for all operations with separate sections for the interested parties: site accounting and alerts for sites real time view of users' jobs, following their progress, memory usage, with access to logs, Grid catalogue browser, file editor etc managers set up meta production requests and follow their automatic progress users can also join organized analysis trains by a few clicks 200K pages per day, 0.1s avg response time 05.07.2013 ALICE Monitoring

Presentation (2) 05.07.2013 ALICE Monitoring

Feedback to production system Monitoring data is most useful when automatic actions can be implemented based on it storage replica selection based on SE distance and status (both read and write) automatic production management (job submission and resubmission, following complex meta processing chains) remote services control, deploying Grid-wide software upgrades central services load balancing alerts on any event with open subscription, annotating downtime periods on the time series display 05.07.2013 ALICE Monitoring

Feedback to/from dashboard and SAM Some part of the monitoring information has high relevance for 'general' monitoring case in point - SAM The reverse is also the case - instead of gathering information ourselves, external sources can provide meaningful data case in point - job information from BDII Other cases can be identified and consolidated for more harmonious picture our hope is that this TF will be able to do just that 05.07.2013 ALICE Monitoring

Summary ALICE has a vertically integrated monitoring system raw and aggregated data is open for anybody to consume dedicated exports are in place for filling common monitoring databases Aiming to a tighter integration with the common tools in order to offer a clearer picture to the users 05.07.2013 ALICE Monitoring