Database Workshop Report


Eva Dafonte Perez, IT-DB

Workshop Aims
Database Futures Workshop, 2nd edition, 29th-30th May
https://indico.cern.ch/event/615499/overview
74 participants registered; approximately 35 attended in person, with 5-10 more joining via Vidyo.
Aims:
- Discuss future requirements in the database area
- Identify common needs between user communities
- Evaluate new trends & technologies
- Understand how services should evolve/improve to fulfil new requirements

[Timeline slide, 1970-2010] Evolution of database technologies: relational databases (DBMS), distributed databases, object-oriented databases (OODBMS) and object-relational databases (ORDBMS), followed by NoSQL, Big Data, analytics and time-series databases (TSDB). Example products along the timeline: Oracle (RSI), MySQL, PostgreSQL, Hadoop, ElasticSearch, InfluxDB. A parallel timeline shows CERN milestones: Oracle version 2.3 (accelerator control), the Hadoop Service, Database On Demand (MySQL, PostgreSQL), ElasticSearch, InfluxDB, and the 1st edition of the Database Futures workshop.

Tracks
1. Requirements for Run3&4
Major database users present their current database application areas and likely future needs, highlighting requirements in terms of: data volumes, update and access rates; application criticality; high availability (replication needs, business continuity, cloud services, virtualization); machine learning; and the development and deployment process, e.g. strongly managed or favouring ease of development by end users.
2. Implementations & Technologies
Presentations covering the various database engines and technologies, and interesting and/or innovative database applications, preferably focussing on an explanation of the technology chosen with reasons and, if possible, comparisons with rejected technologies.
3. Going beyond relational
Presentations covering the various technologies and use cases: Big Data, Hadoop, Spark, ...

Requirements for Run3&4
- Traditional applications will still be key elements in the HL-LHC era
- Increase of database applications to index events and analysis applications, based on relational databases and NoSQL technologies
- Overall, the relational model remains valid
- Oracle is the preferred solution for critical applications
  - Exceptions: ALICE ("zoo of solutions"), LHCb (mainly MySQL)
  - Cost-effective platform in terms of functionality and performance
  - Expertise and support are key factors

Requirements for Run3&4
- Run3&4 will bring larger insert and update rates, i.e. an increased database workload
- Advanced Oracle 12c features (in-memory, new partitioning, ...); migration to Oracle 12cR2 in LS2
- Powerful hardware to improve response times
- Scheduling interventions becomes more difficult: move towards zero downtime and fast-switching standby services
- Alternatives to Oracle for smaller projects, to facilitate collaboration with other institutes and open sourcing
The evolution of hardware components, in particular new types of memory, is expected to have considerable impact on database applications; this is seen as an opportunity to help solve some of the challenges. Another aspect of the hardware evolution is that servers are increasingly capable of handling the load of several applications and are well suited for consolidation efforts. This is beneficial for reducing cost, but comes with the additional challenges typical of multi-tenant systems, such as resource management, isolation of workloads and security.

Implementations & Technologies
New systems / evolution: a move towards NoSQL solutions
- New accelerator logging service (NXCALS)
- Next generation archiver
- Next generation Post Mortem event storage and analysis
- Conditions data management system for HEP experiments
- CMS Big Data project

NXCALS: the Logging Service stores accelerator beam & equipment data on-line, to be kept beyond the lifetime of the LHC (>20 years), and is used to analyse and improve the behaviour of the accelerators & their sub-systems over time. In 2016, after 13 years in service and with a throughput of >1.5 TB/day, it was acknowledged that the service cannot satisfy new demands to analyse huge data sets efficiently. A new system, "NXCALS", is under development based on "Big Data" technologies, in collaboration with IT. The current "CALS" system is foreseen to be turned off during 2019 (LS2).

Next generation archiver: the archiver is an essential element of the SCADA (supervision) layer of a control system. It stores the history of values and alarms (into a "database"), allows the operator to see the evolution of the process (historical trends), and supports diagnostics and post-mortem analysis ("Event Screen"). Still in the research & development phase.

Post Mortem: the Post Mortem system allows the storage and analysis of transient data recordings from accelerator equipment systems; Post Mortem data is complementary to Logging data. Research concluded; starting the development phase.

Conditions DB infrastructure: good experience with COOL, but there are limitations (e.g. caching is problematic); looking for common solutions, also for non-LHC experiments, in particular Belle II and NA62.

CMS Big Data project: CMS is working together with CERN openlab and Intel on the CMS Big Data Reduction Facility. The goal is to reduce 1 PB of official CMS data to 1 TB of ntuple output for analysis. The progress of this 2-year project was presented, with first results of scaling up Spark-based HEP analysis, along with studies on using Apache Spark for a CMS Dark Matter physics search, investigating Spark's feasibility, usability and performance compared to the traditional ROOT-based analysis.
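The reduction step described above can be pictured with a small, purely illustrative sketch in plain Python (not the actual CMS or Spark code; the event fields "run", "pt", "eta" and the cut value are invented for the example): filter events by a selection cut and keep only the columns needed downstream, which is essentially what the 1 PB to 1 TB ntuple reduction does at scale.

```python
# Toy illustration of an analysis data-reduction step: select events
# passing a cut and project out only the fields needed downstream.
# The field names and cut value are invented, not the real CMS schema.

def reduce_events(events, min_pt=30.0, keep=("run", "pt")):
    """Filter events and keep only the requested fields (an 'ntuple')."""
    return [
        {k: ev[k] for k in keep}
        for ev in events
        if ev["pt"] > min_pt
    ]

events = [
    {"run": 1, "pt": 45.2, "eta": 0.3},
    {"run": 1, "pt": 12.9, "eta": -1.1},
    {"run": 2, "pt": 78.0, "eta": 2.0},
]
print(reduce_events(events))
# [{'run': 1, 'pt': 45.2}, {'run': 2, 'pt': 78.0}]
```

In a Spark-based analysis the same pattern is expressed over a distributed DataFrame, e.g. `df.filter(df.pt > 30).select("run", "pt")`, which lets the cluster apply the cut in parallel across the full dataset.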

Implementations & Technologies
Motivation:
- Scale out
- Enable data analytics
- Newer technologies are more appropriate for solving specific use cases
- No SQL vs. NoSQL antagonism anymore
Risks in the medium term:
- Technologies may lose interest / disappear
- Systems may become difficult to maintain

Implementations & Technologies
Provenance (LHCb bookkeeping, CMS analysis, ...): integrating origin / meta information is important for further analysis.
Database on Demand:
- Supports MySQL, PostgreSQL (relational) and InfluxDB (time series)
- Backup & recovery, HA, monitoring updates
- Working to offer instances in the TN
- Help to use different DBMSs: open-source tools are available to facilitate migration; the DBoD team can be contacted

Going beyond relational
Data analytics: Hadoop, Spark, Sqoop, Impala, HBase, Hive and Pig.
Centralised Elasticsearch service: a distributed, RESTful search and analytics engine. Hadoop and Elasticsearch are becoming critical to ATLAS.
Growing interest in time-series databases: easier analysis, improved storage and ingestion rates. InfluxDB use cases: DBoD monitoring, IT monitoring.
Streams processing: Kafka pilot service use cases: accelerator logging service, computing infrastructure monitoring.
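To give a concrete taste of the time-series data model mentioned above, the sketch below builds an InfluxDB line-protocol point as a string. It is a simplified builder for numeric fields only (real line protocol also quotes string field values and marks integers with an `i` suffix), and the measurement, tag and field names are invented for the example; the timestamp is in nanoseconds, the protocol's default precision.

```python
# Build a simplified InfluxDB line-protocol point:
#   measurement,tag1=v1,tag2=v2 field1=val1,field2=val2 timestamp
# Tags are indexed metadata; fields carry the measured values.
# Numeric fields only; keys are sorted for a deterministic output.

def line_protocol(measurement, tags, fields, ts_ns):
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {ts_ns}"

point = line_protocol(
    "db_load",                               # measurement (invented name)
    {"instance": "dbod42", "host": "srv1"},  # tags: indexed metadata
    {"qps": 1520.0, "sessions": 34},         # fields: the actual values
    1496222400000000000,                     # timestamp in nanoseconds
)
print(point)
# db_load,host=srv1,instance=dbod42 qps=1520.0,sessions=34 1496222400000000000
```

Points in this format are what a monitoring agent would POST to an InfluxDB write endpoint; the tag/field split is what makes queries such as "ingestion rate per instance over the last week" cheap, which is the draw for the monitoring use cases listed above.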

In general…
- Positive feedback on the database services provided by IT
- Fruitful discussions: synergies (even overlaps), collaboration, scope to optimise resources for all
- Next similar workshop in 2019 (LS2), given the dynamic nature of the technologies and the many projects in development

Thank you!