Next Generation of Post Mortem Event Storage and Analysis

Serhiy Boychenko, on behalf of TE-MPE-MS
Databases Future Workshop, CERN, 29/05/2017

Outline
- What is Post Mortem?
- Why does it need improvement?
- How can the storage be improved?


Post Mortem for the Accelerators
- The Post Mortem system allows the storage and analysis of transient data recordings from accelerator equipment systems.
- Post Mortem data is complementary to Logging data: data buffers of shorter length (a few seconds to minutes) are acquired at high frequency (kHz to GHz) around relevant events such as beam dumps, injection/extraction or powering events.
- Post Mortem data is vital for analysing and understanding the performance and protection systems of the machines.
- Correct transmission and storage of Post Mortem data has to be guaranteed with high reliability.

Post Mortem Architecture
Courtesy: Matthias Poeschl, Jean-Christophe Garnier

Post Mortem Use Cases

| Event Name | Amount of Files        | Dump Size      | Frequency                               |
|------------|------------------------|----------------|-----------------------------------------|
| Global     | thousands of PMD files | hundreds of MB | several times per day                   |
| Powering   | hundreds of PMD files  | dozens of MB   |                                         |
| XPOC       |                        | couple of MB   | several times per hour                  |
| IQC        | ~30 PMD files          |                |                                         |
| SPSQC      | ~20 PMD files          | ~250 KB        | few-second intervals (every SPS cycle)  |

Variable file size: 1 KB to 12.7 MB (compressed data)
Variable load: depends on the operation mode of the accelerators

Post Mortem Users
- Continuously used during operation cycles to validate accelerator safety.
- An essential data source for analyses orchestrated through the AccTesting framework (mostly during hardware and beam commissioning phases).
- Many users access PM data for ad-hoc queries through LabVIEW or other data analysis tools.

Post Mortem in Numbers
- Operating since 2008 (originally proposed in 2002).
- Deployed on 3 nodes: 2 storage nodes with RAID 1+0 and 1 analysis node with 24 GB of RAM for in-memory processing.
- The storage already contains 20+ million files (raw data, event data and analysis results).
- Total storage size to date is in the order of ~10 TB.
- Frequent traffic bursts: ~1.0 MB/s incoming and ~12 MB/s outgoing.

Outline
- What is Post Mortem?
- Why does it need improvement?
- How can the storage be improved?

Shortcomings
- Static load distribution
- Data consistency and integrity
- Limited write performance in case of simultaneous events
- Direct access to the raw data storage via NFS
- Decentralization (CALS + PM integration)

Requirements
- Backward compatibility for all users
- Horizontal scalability
- High reliability for data ingestion
- Flexibility to integrate new use cases and extensions to additional accelerators

Outline
- What is Post Mortem?
- Why does it need improvement?
- How can the storage be improved?

The New Architecture
Courtesy: Matthias Poeschl, Jean-Christophe Garnier

Data Retrieval Layer - PM REST API
- Abstracts storage implementation details from users (potential for a common API with CALS).
- Intended use of a cache to increase performance for low-latency use cases (IQC, XPOC) and to offload the data storage layer.
- Enables access to the data using standard serialization formats.
- Provides advanced data filtering capabilities.
- Enhanced with scaling and load-balancing features.
- Detailed monitoring of Post Mortem system usage.
- Already up and running! (http://pm-api-pro.cern.ch/) See the sketch below.
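
A minimal sketch of what querying the PM REST API over HTTP could look like, using Python's requests library. The base URL is the one from the slide; the endpoint path and query parameters are illustrative assumptions, not the documented API.

```python
# Hypothetical client sketch for the PM REST API.
# Base URL from the slide; endpoint and parameters are assumptions.
import requests

BASE_URL = "http://pm-api-pro.cern.ch"

# Assumed endpoint and parameters: fetch events for a system in a time window.
params = {
    "system": "LHC",                 # hypothetical parameter
    "from": "2017-05-29T00:00:00Z",  # hypothetical parameter
    "to": "2017-05-29T23:59:59Z",    # hypothetical parameter
}
response = requests.get(f"{BASE_URL}/v1/events", params=params, timeout=10)
response.raise_for_status()

# The API exposes standard serialization formats; JSON is assumed here.
for event in response.json():
    print(event)
```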

Data Collection Layer
- Support dynamic load distribution.
- Enable horizontal scalability.
- Increase service maintainability and availability.
- Provide multiple, pluggable protocols for data collection.
- The Kafka architecture studied by the CALS team is an option (research is ongoing); see the sketch below.
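
Since Kafka is only stated as an option under study, the following is a hedged sketch of how a PM dump could be pushed into a Kafka-based collection layer, using the kafka-python client. The broker address, topic name, file name and keying scheme are all assumptions for illustration.

```python
# Hypothetical ingestion sketch, assuming a Kafka-based collection layer.
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="pm-kafka:9092",  # hypothetical broker
    acks="all",                         # wait for all replicas: reliability over latency
    retries=5,                          # retry transient failures during ingestion
)

with open("global_dump.pmd", "rb") as f:  # hypothetical PMD file
    payload = f.read()

# Keying by event class keeps related dumps in one partition,
# preserving their ordering.
producer.send("pm-events", key=b"GLOBAL", value=payload)
producer.flush()
```

The `acks="all"` setting reflects the requirement of high reliability for data ingestion: a send is only acknowledged once all replicas have the record.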

Data Storage Layer
- Distributed storage solution.
- High availability with advanced fault tolerance.
- Consistency ensured by the underlying implementation.
- Flexibility to support different file formats.
- High throughput and low latency.
Research is mostly finished; development is being planned.

Data Storage Layer

Serialization format study
- Multiple serialization formats were studied: JSON, BSON, Avro, ...
- Multiple compression techniques were evaluated: Deflate, Gzip2, Snappy, xz, ...
- Avro presented the best results with Deflate (see the sketch below).

Storage solution evaluation
- Multiple storage systems were compared using representative sets of PM data: CEPH, MongoDB, HDFS, GlusterFS.
- GlusterFS performed best for separated write/read workloads.
- CEPH performed best for mixed workloads, especially with small files.
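
To make the winning combination concrete, here is a small sketch of writing Avro with the Deflate codec using the fastavro library. The record schema is a made-up stand-in for a real PM event record, not the actual PM data model.

```python
# Avro + Deflate sketch with fastavro; the schema is hypothetical.
from fastavro import writer, parse_schema

schema = parse_schema({
    "name": "PMEvent",
    "type": "record",
    "fields": [
        {"name": "event_name", "type": "string"},
        {"name": "timestamp", "type": "long"},
        {"name": "payload", "type": "bytes"},
    ],
})

records = [
    {"event_name": "XPOC", "timestamp": 1496059200000, "payload": b"\x00\x01"},
]

# codec="deflate" enables Deflate block compression inside the Avro file.
with open("events.avro", "wb") as out:
    writer(out, schema, records, codec="deflate")
```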

NEXTGEN CALS Integration
- Low-latency use cases (IQC, XPOC, SPSQC) still prevent full integration with the NEXTGEN CALS infrastructure.
- Very small files (in large quantities) might significantly impact query execution time.
- For the remaining use cases, the data collection, storage and retrieval API might be shared to provide the user the best possible data.

Thank you for your attention! Questions are welcome!
Besides the authors stated in INDICO, additional credit goes to former members of the team: Mateusz Koza, Roberto Orlandi, Matthias Poeschl.