Plans for the renovation of the Post Mortem infrastructure

Plans for the renovation of the Post Mortem infrastructure
TE-MPE-TM 82, 15/09/2016
Matthias Pöschl – TE-MPE-MS

Agenda
- Current Post Mortem Architecture
- Shortcomings
- Improvements
- New Post Mortem Storage
- New File Format
- Collaboration with CALS 2.0
- Shared Data and Infrastructure
- New Post Mortem Architecture
- Conclusion

Current Architecture

Shortcomings
- Direct user access to the underlying filesystem and file format
- Outdated data collection stack
- Manual load balancing
- Very limited horizontal scaling
- Unfit for future use cases with strict time constraints

Improvements
- Update the storage technology
- Allow for dynamic load balancing
- Update the data collection stack and file format
- User access only through a REST API

New Post Mortem Storage
Benchmark of Ceph, MongoDB and GlusterFS, using real Post Mortem data (all of 2015).
Constraints:
- The technology has to handle drive and node failures, avoid data inconsistencies and degrade gracefully
- Three replicas of each object have to be stored
- A write is acknowledged only after all three copies have been written (see the sketch below)
- Adding more nodes should increase capacity and throughput by a known factor ("linear scaling")
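To make the replication constraint concrete, here is a minimal sketch of a client write using the python-rados bindings, assuming a pool named "postmortem" that has been configured with three replicas; the pool and object names are invented for illustration:

```python
# Minimal sketch using python-rados; pool and object names are hypothetical.
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()

ioctx = cluster.open_ioctx("postmortem")  # pool assumed created with size=3
try:
    # RADOS acknowledges the write only once every replica in the acting set
    # (three, given the pool's replication factor) has persisted the object.
    ioctx.write_full("pm-event-2015-06-21-001", b"<avro-encoded PM event>")
finally:
    ioctx.close()
    cluster.shutdown()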

New Post Mortem Storage
GlusterFS performed best in the read-only and write-only benchmarks, very closely followed by Ceph. Ceph showed the best mixed-workload performance:

Technology   Time to complete   Objects/s   Throughput
NFS          26.3 h              340         8,018 KB/s
Ceph         20.4 h              438        12,154 KB/s
MongoDB      23.5 h              381         8,973 KB/s
GlusterFS    20.7 h              432        10,187 KB/s

(Speaker note: explain the test bench, servers, HDD RPMs, …)

New Post Mortem Storage
Meeting with Dan van der Ster and Herve Rousseau from IT-ST-FDO to discuss Ceph:
- IT has had good experiences with Ceph
- Biggest Ceph test so far was a ~30 PB cluster
- IT runs its VMs on top of Ceph
- Not a single byte lost in 5+ years of operation
- IT sees no problem in using it for Post Mortem
- IT is willing to provide support and assistance

New Post Mortem Storage
Test with a test Ceph cluster provided by IT:

Benchmark    Time to complete   Objects/s   Throughput
Import       12.9 h              345         8,135 KB/s
Read          3.5 h             1269        30,080 KB/s
Read/Write   16.8 h              795        12,552 KB/s

(Speaker note: explain the variations in the graph -> file size)

New file format
- Aiming to use Apache Avro as the file format, both for storage and for serving the users
- "Raw" RDA data will still be stored for safety
- Avro offers many useful features (sketched below):
  - Partial data retrieval (specific signals from a dump)
  - Efficient and fast compression
  - Self-describing schema
  - Libraries for almost every programming language
  - Direct conversion to JSON
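As a sketch of how a Post Mortem dump could be written and read back with Avro, using the fastavro Python library; the schema and field names are invented for illustration:

```python
# Hedged sketch with fastavro; schema and field names are hypothetical.
from io import BytesIO
from fastavro import parse_schema, reader, writer

schema = parse_schema({
    "type": "record",
    "name": "PMEvent",
    "fields": [
        {"name": "device",    "type": "string"},
        {"name": "timestamp", "type": "long"},
        {"name": "signals",   "type": {"type": "map", "values": "double"}},
    ],
})

buf = BytesIO()
# codec="deflate" enables Avro's built-in block compression.
writer(buf, schema, [{
    "device": "BLM.A.67L1",
    "timestamp": 1442300000000,
    "signals": {"loss_rs01": 0.42},
}], codec="deflate")

buf.seek(0)
# The schema travels inside the file, so no external definition is needed.
for record in reader(buf):
    print(record["signals"])  # pick out only the signals of interest
```

Because each file embeds its schema, the same blob can also be converted to JSON on the fly when serving users.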

Collaboration with CALS 2.0
- The Logging Service team is updating their whole infrastructure
- The Oracle cluster will be replaced with Hadoop

Collaboration with CALS 2.0
The new CALS design will allow Big Data queries, e.g. "Show me the biggest deviations from the mean of a certain BLM in sector 67 in the year 2015 when dumping at 6.5 TeV".
Idea: feed the Logging Service the high-resolution Post Mortem data to make these kinds of queries even more valuable for the users.
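A rough sketch of what such a query might look like against the CALS 2.0 Hadoop store with Spark; the data path, column names and filter values are all invented for illustration, and reading Avro assumes the spark-avro package is on the classpath:

```python
# Purely illustrative PySpark sketch; paths and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("blm-deviations").getOrCreate()

blm = (spark.read.format("avro")            # requires the spark-avro package
       .load("/cals/blm_readings/2015")
       .where((F.col("sector") == "67") & (F.col("beam_energy_tev") == 6.5)))

mean_loss = blm.agg(F.avg("loss_value")).first()[0]

(blm.withColumn("deviation", F.abs(F.col("loss_value") - F.lit(mean_loss)))
    .orderBy(F.col("deviation").desc())
    .limit(10)
    .show())
```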

Collaboration with CALS 2.0
Meetings with Chris Roderick, Jakub Wozniak and Marcin Sobieszek from BE-CO-DS to evaluate common use of technologies and services, as well as details of data ingestion. Results:
- CALS might also use Avro for their data ingestion
- Possible shared use of a Kafka cluster (sketch below)
- Shared data storage is not (yet) possible
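A minimal sketch of what ingestion into a shared Kafka cluster could look like from the Post Mortem side, using the kafka-python client; the broker address and topic name are assumptions:

```python
# Hedged sketch with kafka-python; broker and topic names are hypothetical.
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="pm-kafka.cern.ch:9092",
    acks="all",  # wait until all in-sync replicas have the record
)

def publish_pm_event(avro_bytes: bytes) -> None:
    """Ship one Avro-encoded Post Mortem event to the shared topic."""
    producer.send("postmortem-events", value=avro_bytes)

# e.g. publish_pm_event(buf.getvalue()) with the fastavro buffer from above
producer.flush()
```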

New Post Mortem Architecture

Conclusion
- A Ceph cluster for Post Mortem data seems feasible and suitable to tackle future use cases
- No common storage for CALS and PM (yet):
  - Different timing constraints and data sizes
  - Different preferred storage technologies
- But: data and infrastructure can be shared between both systems, providing a good trade-off for the users

Conclusion
- Avro allows efficient data storage, convenience for the users and easy integration with CALS 2.0
- All data access goes through a REST API, serving uncompressed JSON or compressed Avro (illustrated below)
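For illustration, a client round-trip against such a REST API might look like the following; the endpoint URL, route and media types are assumptions, not the final interface:

```python
# Hypothetical client sketch; URL, route and media types are invented.
import requests

BASE = "https://pm-api.cern.ch/v1"  # assumed endpoint

# Uncompressed JSON, convenient for interactive exploration:
event = requests.get(f"{BASE}/events/12345",
                     headers={"Accept": "application/json"}).json()

# Compressed Avro, efficient for bulk consumers:
blob = requests.get(f"{BASE}/events/12345",
                    headers={"Accept": "application/avro"}).content
# ...decode blob with an Avro library such as fastavro
```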