Messaging at CERN
Lionel Cons – CERN IT/CM
18 Jan 2017

In the Beginning…
- messaging started at CERN ~10 years ago
- the goal was to simplify grid middleware
- the initiator was the Operations Automation Team (OAT) of the Enabling Grids for E-sciencE (EGEE) project
- the driving force has been the European Middleware Initiative (EMI) project
- ☞ "Using ActiveMQ at CERN for the Large Hadron Collider" (FUSE day, 2010)
- messaging proved to be useful, so its use grew…

Main Use Case
- messaging is used to decouple information producers from consumers:
  - using different software stacks
  - managed by different teams
  - only sharing the "schema" of the JSON payload
- published to topics, consumed from virtual queues
- much more WAN than LAN
- a code change could take months to get deployed
- mostly STOMP, with very few OpenWire or AMQP
- frequent use of X.509 authentication (grid)
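To make the pattern concrete, here is a minimal sketch of a producer publishing a JSON payload to a topic with the stomp.py client library; the broker endpoint, credentials, topic name and payload fields are all hypothetical:

```python
import json

import stomp  # stomp.py, one of the STOMP client libraries usable here

# Hypothetical broker endpoint and credentials.
conn = stomp.Connection([("broker.example.org", 61613)])
conn.connect("producer", "secret", wait=True)

# Producer and consumers share nothing but the "schema" of this payload.
payload = {"host": "lxplus001.cern.ch", "metric": "cpu_load", "value": 0.42}
conn.send(
    destination="/topic/monitoring.metrics",
    body=json.dumps(payload),
    headers={"content-type": "application/json"},
)
conn.disconnect()
```

A consuming team would subscribe independently (e.g. through a virtual queue on the broker) and deploy on its own schedule, which is what makes the months-long deployment cycles tolerable.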

STOMP
- CERN pushed for standardization (STOMP 1.2)
- advantages:
  - supported by most brokers
  - decent client libraries available for all languages
  - lightweight (e.g. can publish from PL/SQL)
- drawbacks:
  - none for us?
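Part of why STOMP is so lightweight: a frame is just a command line, header lines, a blank line, a body and a NUL byte, so it can be spoken over a raw TCP socket with no client library at all (which is what makes publishing from something like PL/SQL feasible). A minimal sketch, with hypothetical host and destination:

```python
import socket

NUL = b"\x00"  # every STOMP frame ends with a NUL byte

sock = socket.create_connection(("broker.example.org", 61613))

# CONNECT frame: command, headers, blank line, (empty) body, NUL.
sock.sendall(b"CONNECT\naccept-version:1.2\nhost:broker.example.org\n\n" + NUL)
print(sock.recv(4096))  # the broker should answer with a CONNECTED frame

# SEND frame publishing a small text message to a topic.
sock.sendall(b"SEND\ndestination:/topic/test\ncontent-type:text/plain\n\nhello" + NUL)

sock.sendall(b"DISCONNECT\n\n" + NUL)
sock.close()
```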

Other Use Case
- messaging is used inside an application
- control over which messaging solution is used (and how!) is very limited
- applications used at CERN:
  - Celery with RabbitMQ
  - MCollective with ActiveMQ
  - OpenStack with RabbitMQ
  - …
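Celery illustrates why the service side has so little control: the broker choice is baked into the application's own configuration. A minimal sketch, with a hypothetical broker URL and task:

```python
from celery import Celery

# The AMQP broker (here a RabbitMQ instance) is fixed by the application's
# own configuration, not chosen by the messaging service operators.
app = Celery("tasks", broker="amqp://guest:guest@rabbit.example.org//")

@app.task
def add(x, y):
    return x + y

# A client would then enqueue work with: add.delay(2, 3)
```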

MCollective Use Case
- MCollective needs a network of brokers
- initial requirements were challenging:
  - 2 data centers 1000 km apart
  - 30k concurrent connections
  - 300k subscriptions
- worked with Puppet Labs to reduce the number of subscriptions: now only 150k
- works fine (except for an abnormally large number of connection timeout warnings)
- must scale with the growth of the nodes we manage
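A network of brokers is configured on the broker side rather than in the clients; a sketch of what one leg of such an ActiveMQ topology can look like (broker and host names are hypothetical):

```xml
<!-- In activemq.xml: link this broker to its peer in the other data center. -->
<broker xmlns="http://activemq.apache.org/schema/core" brokerName="dc1-broker">
  <networkConnectors>
    <!-- duplex="true": messages and subscriptions flow in both directions -->
    <networkConnector name="dc1-to-dc2"
                      uri="static:(tcp://dc2-broker.example.org:61616)"
                      duplex="true"/>
  </networkConnectors>
</broker>
```

Every subscription has to be propagated across such links, which is why cutting 300k subscriptions down to 150k mattered.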

IT Managed Messaging Services
- 17 different clusters (test and production)
- 44 brokers
- 267 applications
- average message rates: ~1 kHz in and ~5 kHz out
- ~6k destinations (topics and queues)
- ~25k concurrent connections
- ~120k subscriptions
- all run Red Hat A-MQ 6

Messaging Monitoring
- all log files analyzed
- 103 different metrics collected
- messaging metrics collected through Jolokia
- 1350 checks every minute
  - e.g. per-client messages received per second too low/high
- expert system named Metis, using Esper:
  - time aggregations like min/max/average
  - other aggregations like "all brokers in a cluster"
  - hysteresis patterns
  - …
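Jolokia exposes the brokers' JMX MBeans over HTTP/JSON, so one check can be a single HTTP GET. A sketch against an ActiveMQ 5.x broker, with hypothetical host, credentials and alarm threshold:

```python
import requests

# Jolokia "read" request: fetch one JMX attribute of the broker MBean.
URL = (
    "http://broker.example.org:8161/api/jolokia/read/"
    "org.apache.activemq:type=Broker,brokerName=localhost/TotalMessageCount"
)

value = requests.get(URL, auth=("admin", "admin")).json()["value"]
if value > 1_000_000:  # hypothetical alarm threshold
    print(f"WARNING: broker message count suspiciously high ({value})")
```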

Accelerators Controls (1/2)
- transports data from middle-tier servers to GUIs and the storage system
- but also log messages, infrastructure monitoring & audit information
- broad usage pattern in message size & frequency
- service criticality: "No JMS, No Beam"
- ☞ "Large Scale Messaging with ActiveMQ for Particle Accelerators at CERN" (CamelOne 2012)
- courtesy of Felix Ehm (BE/CO)

Accelerators Controls (2/2)
- 25 brokers organized in dedicated services: 5 HA clusters and 15 single instances, running on physical machines
- up to >270 GB per day, 8k messages per second
- currently: Apache ActiveMQ 5.12.2
- the middleware team is reviewing the usage of JMS: the idea is to simplify the environment and extend the use of ØMQ, which is already being used for Remote Device Access (RDA)
- courtesy of Felix Ehm (BE/CO)

Pushing the Limits
- creative network-of-brokers topology (MCollective)
- peaks of several thousand messages per second (from a single client)
- some large messages, up to 100 MB
- sometimes huge backlogs (several days) while some consumers are down
- tens of new connections per second
- X.509 authentication (JAAS) with ~70k entries
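With X.509 authentication, the client presents a grid certificate during the TLS handshake and the broker (through JAAS on the ActiveMQ side) maps the certificate DN to a user. A stomp.py sketch, with hypothetical host and file paths:

```python
import stomp

# Hypothetical TLS-enabled STOMP port of the broker.
conn = stomp.Connection([("broker.example.org", 61612)])

# Authenticate with a client certificate instead of user/password;
# the broker maps the certificate DN to an authorized user via JAAS.
conn.set_ssl(
    for_hosts=[("broker.example.org", 61612)],
    key_file="/etc/grid-security/hostkey.pem",
    cert_file="/etc/grid-security/hostcert.pem",
    ca_certs="/etc/grid-security/ca-bundle.pem",
)
conn.connect(wait=True)
```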

Possible Evolutions (1/2)
- Red Hat A-MQ 6.x:
  - end of full support: Jan 2018
- Apache ActiveMQ 5.x:
  - will be supported as long as it is widely used
- Red Hat A-MQ 7.x:
  - only one alpha and one beta released so far
  - beta1 is based on Artemis 1.3.0
- Apache ActiveMQ 6.x (aka Artemis):
  - latest version is 1.5.1
  - major changes coming with 2.0.0
  - ~50 unresolved bugs in Jira that are important for our use cases

Possible Evolutions (2/2)
- ØMQ:
  - speed and low latency
  - already used in accelerator controls
  - CERN is even quoted in ØMQ's "Learn the Basics"
- Kafka:
  - high volumes and scalability
  - big overlap with traditional messaging
  - seen more as a complement than a replacement
- RabbitMQ:
  - another widely used messaging broker
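For a feel of the ØMQ model: it is brokerless, so publisher and subscriber sockets talk to each other directly, which is where the speed and low latency come from. A minimal pyzmq sketch (endpoint and message format are hypothetical, not the actual RDA protocol):

```python
import time

import zmq

ctx = zmq.Context()

# Brokerless pub/sub: the publisher binds and subscribers connect to it
# directly, with no intermediate broker process.
pub = ctx.socket(zmq.PUB)
pub.bind("tcp://*:5556")

sub = ctx.socket(zmq.SUB)
sub.connect("tcp://localhost:5556")
sub.setsockopt_string(zmq.SUBSCRIBE, "rda")  # prefix-based topic filter

time.sleep(0.2)  # let the subscription propagate before publishing

pub.send_string("rda PS.BPM42 position 0.17")  # hypothetical device update
print(sub.recv_string())
```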

Summary
- messaging started at CERN ~10 years ago
- one use was the simplification of grid middleware
- messaging was also used in accelerator controls
- it also spread to several very different areas
- messaging is still widely used at CERN
- some use cases are quite challenging
- some overlapping technologies are appearing
- messaging at CERN will evolve to adapt to changes in both requirements and solutions