Experience in CMS with Analytic Tools for Data Flow and HLT Monitoring Dainius Šimelevičius – Vilnius University (LT) on behalf of CMS DAQ Group ALICE, ATLAS, CMS & LHCb Second Joint Workshop on DAQ@LHC 12-14 April 2016, Château de Bossey, Switzerland
Main Authors Emilio Meschi Srećko Morović Luciano Orsini Dainius Šimelevičius
Introduction
Data Flow and HLT Monitoring
Monitoring characteristics:
- DAQ and HLT are independent from monitoring
- Monitoring runs in parallel with the DAQ and HLT processes
- Monitoring shares resources with the DAQ and HLT processes
Monitoring functions:
- Collection of data from multiple sources
- Transformation/reduction (e.g., sum, average)
- Presentation (e.g., raw data, HTML or JSON formatted data)
- Storage
Users:
- Shifters
- DAQ on-call experts
- Control system (e.g., to perform automatic actions)
- Diagnostic system (e.g., expert systems)
The Concept
This is a follow-up to the presentation made at DAQ@LHC 2013 by Luciano Orsini, “Dataflow Monitoring: Future Perspectives”
http://indico.cern.ch/event/217480/session/1/contribution/35/material/slides/0.pdf
- That presentation highlighted the approach chosen to support CMS DAQ monitoring, based on SaaS (Software as a Service)
- The current architecture is service oriented (XaaS)
- It indicated that a cloud architecture is well suited for running the monitoring services
- The natural next step is to use off-the-shelf software instead of custom development, reducing maintenance and support effort
Where We’re Going
From custom components and custom protocols to off-the-shelf components and standard protocols
Proposed Approach
Monitoring as an IT service:
- Cloud administrator (sysadmin) – manages resources, provides the cloud
- Cloud facilitator (monitoring expert) – developers and sysadmins; requires certain expertise
- Consumers/users (sub-detectors and/or central DAQ) – use the services in the cloud
The cloud service could be private, IT or commercial; currently P5 resources are used, and OpenStack in IT has been tested.
Cloud services and DAQ resources are fully decoupled: no plugins or specialized programming, configuration only.
What do we have today, and why replace it with Escaped? Less to maintain means less manpower: the current DAQ monitoring system requires heavy maintenance and, due to lack of manpower and time, can no longer be fully supported. Escaped provides a new way of monitoring that, while in some ways similar to the current system, is far easier to maintain.
Elasticsearch
Elasticsearch was chosen as a good candidate:
- NoSQL database, ranked 11th in http://db-engines.com/en/ranking
- Search server based on Apache Lucene
- RESTful API with JSON over HTTP
- Scalable architecture
- Document search throughout the cluster with a single query (see the sketch below)
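A minimal sketch of such a single query, assuming the hostname placeholder and the busy field from the terminology example on a later slide; it is purely illustrative:
# One request against the cluster-wide /_search endpoint searches every index in the cluster
curl -XGET 'http://[hostname]:9200/_search?q=busy:true'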
Elasticsearch
Elasticsearch Terminology
An index contains types, a type contains documents, and a document consists of fields.
Example JSON document:
{ "name": "ferol2", "slotNumber": 2, "rate": 1.2, "busy": true, "values": [4.5, 2.3, 12.1] }
Flashlists are structures used for monitoring. They translate nicely into JSON documents.
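As an illustration, a hedged sketch of how the fields of the example document above could be mapped when creating an index; the index name flashlists is reused from the next slide, the type name ferolStatus is hypothetical, and the mappings actually used in CMS may differ:
# Create an index with an explicit mapping for one document type
# (any field may hold an array of values, so "values" needs no special array type)
curl -XPUT 'http://[hostname]:9200/flashlists' -d '{
  "mappings": {
    "ferolStatus": {
      "properties": {
        "name":       { "type": "string" },
        "slotNumber": { "type": "integer" },
        "rate":       { "type": "double" },
        "busy":       { "type": "boolean" },
        "values":     { "type": "double" }
      }
    }
  }
}'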
RESTful API with JSON over HTTP
The CRUD (create-retrieve-update-delete) part of the API allows using Elasticsearch as a NoSQL database:
curl -XPUT 'http://[hostname]:9200/flashlists/hostInfo/1' -d '{"context": "http://bu-c2f16-23-01.cms:9999", "cpuUsage": 9.68}'
curl -XGET 'http://[hostname]:9200/flashlists/hostInfo/1'
curl -XPOST 'http://[hostname]:9200/flashlists/hostInfo/1/_update' -d '{"doc": {"cpuUsage": 13.73}}'
curl -XDELETE 'http://[hostname]:9200/flashlists/hostInfo/1'
Management of indices:
curl -XPUT 'http://[hostname]:9200/shelflist'
Data searching:
curl -XGET 'http://[hostname]:9200/flashlists/hostInfo/_search' -d '{"query": {"range": {"cpuUsage": {"gt": 10}}}}'
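The search API also covers the transformation/reduction functions (sum, average) listed earlier. A minimal sketch, reusing the hostInfo type above and computing the average cpuUsage; the aggregation name avgCpu is arbitrary:
# size 0 returns only the aggregation result, not the matching documents
curl -XGET 'http://[hostname]:9200/flashlists/hostInfo/_search' -d '{
  "size": 0,
  "aggs": { "avgCpu": { "avg": { "field": "cpuUsage" } } }
}'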
Elasticsearch Scalability
An Elasticsearch cluster distributes each index over its nodes in shards, and each shard has replicas on other nodes. Example with a two-node cluster:
- Elasticsearch node 1: Index 1 shard 1 (primary), Index 1 shard 2 (primary), Index 2 shard 1 (primary), Index 2 shard 2 (primary)
- Elasticsearch node 2: Index 1 shard 1 (replica), Index 1 shard 2 (replica), Index 2 shard 1 (replica), Index 2 shard 2 (replica)
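A hedged sketch of how the number of shards and replicas can be chosen at index creation time; the shelflist index name is reused from the previous slide and the values simply mirror the two-shard, one-replica picture above:
# Create an index with explicit shard and replica counts
curl -XPUT 'http://[hostname]:9200/shelflist' -d '{
  "settings": { "number_of_shards": 2, "number_of_replicas": 1 }
}'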
Limitations
- Fields with the same name must have the same type within an index
- Elasticsearch only provides a limited number of data types
- Most integer types map to a single Elasticsearch type rather than to specialized ones
Elasticsearch for DAQ and HLT Monitoring
Scenario for DAQ and HLT Monitoring
- Index creation
- Mapping creation
- Document creation
- Search for the document which was created last
- Documents are removed automatically by Elasticsearch according to TTL (Time To Live) values preset for the document types
The read, update and delete operations of the API are not used: search replaces read, data are only added (never updated), and TTL replaces explicit deletes. A sketch of this scenario follows below.
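A minimal sketch of this scenario, assuming an Elasticsearch 1.x/2.x release in which the _ttl and _timestamp metadata fields are still available (both were removed in later releases); index, type and field names are illustrative:
# 1. Create the index
curl -XPUT 'http://[hostname]:9200/flashlists'
# 2. Create the mapping; _ttl makes documents expire automatically, _timestamp records indexing time
curl -XPUT 'http://[hostname]:9200/flashlists/_mapping/hostInfo' -d '{
  "hostInfo": {
    "_ttl": { "enabled": true, "default": "2m" },
    "_timestamp": { "enabled": true },
    "properties": { "cpuUsage": { "type": "double" } }
  }
}'
# 3. Create a document; with POST Elasticsearch assigns the id, and the document is never updated or deleted explicitly
curl -XPOST 'http://[hostname]:9200/flashlists/hostInfo' -d '{"cpuUsage": 9.68}'
# 4. Instead of reading by id, search for the most recently created document
curl -XGET 'http://[hostname]:9200/flashlists/hostInfo/_search' -d '{
  "size": 1,
  "sort": [ { "_timestamp": { "order": "desc" } } ]
}'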
Elasticsearch Usage Patterns
                      DAQ monitoring                        HLT monitoring
Data location         In memory                             On disk
Data injector         C++ program                           Python script
Source nodes          ~200                                  ~1000
Index operation rate  ~3.4 kHz                              ~5 kHz
Data persistency      4 seconds to 2 minutes;               Data stored permanently
                      some data stored permanently
Indices               2                                     One index per run + 3
Types                 24                                    8 + 15
Fields                678 (16000 including sub-detectors)    87 + 180
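At kHz-level index rates documents are normally sent in batches; the slide does not say whether the injectors actually do this, but as an illustration, a sketch of the Elasticsearch bulk API with two hypothetical hostInfo documents in one request (--data-binary preserves the newlines that the bulk format requires):
curl -XPOST 'http://[hostname]:9200/_bulk' --data-binary '{"index": {"_index": "flashlists", "_type": "hostInfo"}}
{"cpuUsage": 9.68}
{"index": {"_index": "flashlists", "_type": "hostInfo"}}
{"cpuUsage": 13.73}
'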
DAQ Monitoring Option 1
[Diagram: monitoring sources and a single Elasticsearch cluster]
DAQ Monitoring Option 2
[Diagram: an Elasticsearch cluster accessed through round-robin DNS]
HLT Monitoring
[Diagram: two Elasticsearch clusters behind round-robin DNS; part of the data is periodically copied to Cluster 2 for permanent storage]
Using IT with OpenStack
[Diagram: monitoring sources in .CMS, a cloud web service with a load balancer in CERN IT, and clients]
DAQMon
F3Mon GUI
Summary
- Elasticsearch is a highly scalable and robust system
- Elasticsearch fits the requirements of DAQ and HLT monitoring nicely
- We are working to bring Elasticsearch-based DAQ monitoring to production
- Future perspectives: use Elasticsearch for other purposes besides monitoring (exception storing, log storing, etc.), include all sub-detector monitoring, and keep all data persistent (e.g. 24 h, 48 h, 1 week) for play-back
Thank you for your attention