A Messaging Infrastructure for WLCG

A Messaging Infrastructure for WLCG
James Casey, Lionel Cons, Wojciech Lapka, Massimo Paladin, Konstantin Skaburskas
Grid Technology Group, Information Technology Department, CERN
CHEP 2010, Taipei

Message-Oriented Middleware

WLCG infrastructure is complex...

Message-Oriented Middleware
What is a messaging system?
  Standardized, asynchronous and scalable communication between distributed entities
  Broker-mediated communication
  Messaging is for applications what instant messaging (IM) is for people
Aims of messaging systems
  Simplify the WLCG infrastructure and make it more effective
  Allow sharing of information between distributed components

Point-to-Point Messaging
  [Diagram: Sender -> Queue -> Receiver 1, Receiver 2]
  Each message is received by only one receiver
  Guaranteed message delivery
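A minimal in-memory sketch of the point-to-point model, using Python's standard `queue.Queue` as a stand-in for a broker queue. The function name, the receiver names and the round-robin dispatch are assumptions made for illustration; a real broker applies its own dispatch policy.

```python
import queue

def drain_point_to_point(messages, receiver_names):
    """Deliver each queued message to exactly one receiver (round-robin here)."""
    q = queue.Queue()
    for message in messages:
        q.put(message)

    received = {name: [] for name in receiver_names}
    turn = 0
    while not q.empty():
        message = q.get()  # once taken, no other receiver can see this message
        name = receiver_names[turn % len(receiver_names)]
        received[name].append(message)
        turn += 1
    return received

result = drain_point_to_point(["m1", "m2", "m3"], ["receiver1", "receiver2"])
print(result)
```

Every message ends up in exactly one receiver's list, which is the property that distinguishes a queue from a topic.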

Publish/Subscribe Messaging
  [Diagram: Publisher -> Topic -> Subscriber 1, Subscriber 2]
  Each message is received by all available subscribers
  Message delivery is not guaranteed (except for durable subscribers)
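The publish/subscribe behaviour can be sketched in a few lines of Python. The `Topic` class below is a hypothetical in-memory stand-in, not a broker API; it models non-durable subscribers only, so a subscriber that joins late misses earlier messages.

```python
class Topic:
    """Hypothetical in-memory topic with non-durable subscribers."""

    def __init__(self):
        self._inboxes = {}  # subscriber name -> list of received messages

    def subscribe(self, name):
        self._inboxes[name] = []
        return self._inboxes[name]

    def publish(self, message):
        # Every subscriber present at publish time gets its own copy.
        for inbox in self._inboxes.values():
            inbox.append(message)

topic = Topic()
inbox1 = topic.subscribe("subscriber1")
topic.publish("early")
inbox2 = topic.subscribe("subscriber2")  # joins after the first message
topic.publish("late")
print(inbox1, inbox2)
```

A durable subscription would require the broker to store messages for an absent subscriber, which this sketch deliberately leaves out.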

Messaging protocols
Java Message Service (JMS)
  “is a messaging standard that allows application components to create, send, receive, and read messages”
STOMP (Streaming Text Orientated Messaging Protocol)
  Simple text-based protocol designed for messaging systems
  Supported by most messaging systems
  Clients available in all major programming languages
OpenWire
  JMS compliant
  Native Apache ActiveMQ protocol
  Clients available in C, C++, C#, Java
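To show how simple the STOMP text format is, the sketch below builds and parses a frame by hand: a command line, `name:value` header lines, a blank line, the body, and a terminating NUL byte. The helper names and the destination `/topic/example.monitoring` are made up for illustration, and the sketch ignores header escaping and the `content-length` header.

```python
def build_frame(command, headers, body=""):
    """Serialize a STOMP-style frame: command, headers, blank line, body, NUL."""
    lines = [command] + [f"{name}:{value}" for name, value in headers.items()]
    return ("\n".join(lines) + "\n\n" + body + "\x00").encode("utf-8")

def parse_frame(raw):
    """Split a serialized frame back into (command, headers, body)."""
    text = raw.decode("utf-8").rstrip("\x00")
    head, _, body = text.partition("\n\n")
    command, *header_lines = head.split("\n")
    headers = dict(line.split(":", 1) for line in header_lines)
    return command, headers, body

frame = build_frame("SEND", {"destination": "/topic/example.monitoring"}, "hello")
print(frame)
print(parse_frame(frame))
```

Because the frame is plain text, a client can be written in almost any language with a socket library, which fits the slide's point about client availability.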

Apache ActiveMQ
  ActiveMQ is a top-level Apache project
  Mature open-source product
  Written in Java (implements JMS semantics)
  Excellent cross-language interoperation
  Enterprise features (persistence, failover, clustering, ...)
  Many large reference customers: Vodafone, GE, FAA, (CERN)
  CERN has an industry support contract with Progress Software
  Good performance characteristics
    1K to 20K messages per second, depending on features used

Network of Brokers

Broker monitoring
We use Nagios to monitor:
Brokers
  OS: filesystem full, processes running, socket counts, open file counts
  ActiveMQ statistics: store usage, JVM stats, queues with stalled messages
Functionality testing
  Nagios sends and receives test messages
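A functionality test of this kind can be sketched as a round-trip probe that maps its result onto the conventional Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function `check_round_trip`, its thresholds and the in-memory stand-in for the broker are assumptions for illustration; this is not the check actually deployed.

```python
import time
import uuid

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # conventional Nagios plugin exit codes

def check_round_trip(send, receive, warn_seconds=1.0, timeout_seconds=5.0):
    """Send a unique token and verify that the same token comes back."""
    token = uuid.uuid4().hex
    start = time.monotonic()
    try:
        send(token)
        echoed = receive(timeout_seconds)
    except Exception as exc:
        return UNKNOWN, f"UNKNOWN - probe failed: {exc}"
    elapsed = time.monotonic() - start
    if echoed != token:
        return CRITICAL, "CRITICAL - test message not received"
    if elapsed > warn_seconds:
        return WARNING, f"WARNING - round trip took {elapsed:.2f}s"
    return OK, f"OK - round trip took {elapsed:.2f}s"

# In-memory stand-in for a broker: a list used as a FIFO buffer.
buffer = []
status, text = check_round_trip(buffer.append, lambda timeout: buffer.pop(0))

# A "broker" that silently drops the message is reported as CRITICAL.
lost_status, lost_text = check_round_trip(lambda token: None, lambda timeout: None)
print(text)
print(lost_text)
```

In a real plugin the `send` and `receive` callables would wrap a messaging client, and the script would print the text and exit with the status code.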

Broker monitoring

Live broker monitoring

Common messaging use cases
  Gather: a wide set of nodes send back data, which is collected in a DB for further processing
  Broadcast: sensors publish information that can be used by external alarm systems
  Task queues: tasks are sent to the queue and executed on distributed worker nodes
  Remote Procedure Call (RPC): distributed query
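Of these use cases, RPC is the least obvious to build on asynchronous messaging. One common way is the request-reply pattern with a reply destination and a correlation identifier, sketched below with in-memory queues. All names (`rpc_call`, `serve_once`, the message fields) are hypothetical, and the server is invoked inline only to keep the example deterministic.

```python
import queue
import uuid

request_queue = queue.Queue()  # stand-in for a broker queue the server listens on

def serve_once():
    """Hypothetical server: take one request and reply with the sum of its body."""
    request = request_queue.get()
    reply = {"correlation_id": request["correlation_id"], "body": sum(request["body"])}
    request["reply_to"].put(reply)

def rpc_call(payload):
    """Send a request and wait for the reply that carries the same correlation id."""
    reply_queue = queue.Queue()  # temporary reply destination for this caller
    correlation_id = uuid.uuid4().hex
    request_queue.put(
        {"correlation_id": correlation_id, "reply_to": reply_queue, "body": payload}
    )
    serve_once()  # in a real deployment the server runs independently
    reply = reply_queue.get(timeout=1)
    if reply["correlation_id"] != correlation_id:
        raise RuntimeError("reply does not match request")
    return reply["body"]

answer = rpc_call([1, 2, 3])
print(answer)
```

The correlation identifier lets a caller with several outstanding requests match each reply to the request that caused it.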

CERN Control Room
  LHC monitoring
  High availability required (no messages = beam dumped)
  Achieved by simple configuration
  41M messages per day
  370 GB incoming data and 2.6 TB outgoing data
  7-10k consumers, 7k topics
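The daily figures are easier to compare with broker throughput when converted to per-second and per-message values. The calculation below assumes that the data volumes are also per day and use decimal units (1 GB = 10^9 bytes); neither is stated on the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60

messages_per_day = 41_000_000
incoming_bytes = 370e9   # assumed to be per day, decimal GB
outgoing_bytes = 2.6e12  # assumed to be per day, decimal TB

average_rate = messages_per_day / SECONDS_PER_DAY          # messages per second
average_message_size = incoming_bytes / messages_per_day   # bytes per incoming message
fan_out = outgoing_bytes / incoming_bytes                  # outgoing bytes per incoming byte

print(f"average rate: {average_rate:.0f} messages/s")
print(f"average incoming message size: {average_message_size / 1000:.1f} kB")
print(f"fan-out: {fan_out:.1f}x")
```

Under these assumptions the average is roughly 475 messages per second of about 9 kB each, with each incoming byte delivered to about seven consumers on average.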

Service Availability Monitoring (SAM)
  Monitoring of reliability and availability of the distributed computing infrastructure
  27 national-level Nagios servers
    Will grow to ~40 in the next 3 months
  4 experiment Nagios servers (ALICE, ATLAS, CMS, LHCb)
  Data also received from OSG
  700,000 test results per day
  Long-lived and short-lived connections

Service Availability Monitoring (SAM)
  [Diagram components: Sites, Nagios, RSV, ActiveMQ, Data Warehouse, GGUS, Alarm System]

SAM – CE tests
  [Diagram components: Nagios, ActiveMQ, WMS, CE, Worker Nodes]

Site Wide Area Testing (SWAT)
  [Diagram components: Distributed Tests, ActiveMQ, Consumer, Database, Summary Exporter, SAM, Web frontend]
  2.6M messages per day
  Short-lived connections

APEL – replacement of R-GMA transport mechanism
  [Diagram components: APEL nodes, APEL tests, Nagios, ActiveMQ, Consumer, Database]
  6k messages per day
  Encrypted messages
  X.509 authentication and authorisation

BDII update prototype
  “Designing the Next Generation Grid Information Systems” by Laurence Field (today, 17:30)

ATLAS DDM
  DQ2 tracing data
  Data sent on every DQ2 command: files transferred, performance data, ...
  1-1.5M messages per day of ~2-3 KB each
  Clients on WN, VOBOX, ...
  Short-lived connections
  Low reliability requirements: some loss is acceptable

Catalogue sync demonstrator
  [Diagram components: LFC, ActiveMQ, SE (x4)]
  LFC -> SE: propagation of changes in file permissions
  SE -> LFC: propagation of information about lost files

CERN batch service
  Use case: analyze usage of resources (CPU, IO, network, ...) by the experiment jobs on 35,000 CPU cores
  Solution: measure live (per-minute) per-job usage statistics and collect them via messaging for subsequent analysis and aggregation
  Expected rate: ~600 messages per second
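The expected rate can be sanity-checked against the size of the farm. The estimate below assumes one running job per core and one message per job per minute; both are simplifications not stated on the slide.

```python
SECONDS_PER_MINUTE = 60
SECONDS_PER_DAY = 24 * 60 * 60

cpu_cores = 35_000
expected_rate = 600  # messages per second, as quoted on the slide

# Assumption: one job per core, each job reporting once per minute.
rate_if_one_job_per_core = cpu_cores / SECONDS_PER_MINUTE
messages_per_day = expected_rate * SECONDS_PER_DAY

print(f"{rate_if_one_job_per_core:.0f} messages/s from {cpu_cores:,} per-minute reporters")
print(f"{messages_per_day:,} messages per day at {expected_rate} messages/s")
```

The estimate of about 583 messages per second is consistent with the quoted ~600, and corresponds to roughly 52 million messages per day.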

Links
  http://activemq.apache.org/
  https://tomtools.cern.ch/confluence/display/MIG
  “Enterprise Integration Patterns”, http://www.eaipatterns.com/
  Contact us: tom-developers@cern.ch

Summary
  Messaging is a key technology for WLCG