Database Workshop Report
Eva Dafonte Perez IT-DB
Database Futures Workshop Aims
29th-30th May, 2nd edition. 74 participants registered; approximately 35 attended in person, with between 5 and 10 more joining via Vidyo.
Aims:
- Discuss future requirements in the database area
- Identify common needs between user communities
- Evaluate new trends & technologies
- Understand how services should evolve/improve to fulfil new requirements
[Timeline figure: evolution of database technologies, 1970-2010 - Relational Model (1970; Oracle, then RSI), Object Oriented Databases (OODBMS), Object Relational Databases (ORDBMS), Distributed Databases, NoSQL / Big Data and Analytics (Hadoop, MySQL, PostgreSQL, ElasticSearch), Time Series Databases (TSDB, e.g. InfluxDB). CERN service milestones along the same timeline: Oracle version 2.3 for accelerator control, ElasticSearch service, Database On Demand (MySQL, PostgreSQL), Database Futures workshop (1st edition).]
Tracks
- Requirements for Run3&4: data volumes, update and access rates; application criticality; HA, replication needs, business continuity, cloud services, virtualization; machine learning; development and deployment process
- Implementations & Technologies: various database engines and technologies, innovative database applications; comparisons
- Going beyond relational: technologies and use cases such as Big Data, Hadoop, Spark, ...

Requirements for Run3&4
Major database users will present their current database application areas and likely future needs, highlighting requirements in terms of data volumes, update and access rates; application criticality; high availability (replication needs, business continuity, cloud services, virtualization); machine learning; and the development and deployment process, e.g. strongly managed or favouring ease of development by end users.
Implementations & Technologies
Presentations, or suggestions for presentations, are solicited to cover the various database engines and technologies, and interesting and/or innovative database applications, preferably focussing on an explanation of the technology chosen, with reasons and, if possible, comparisons with rejected technologies.
Going beyond relational
Presentations, or suggestions for presentations, are solicited to cover the various technologies and use cases: Big Data, Hadoop, Spark, ...
Requirements for Run3&4
- Traditional applications will still be key elements in the HL-LHC era
- Increase of database applications to index events and analysis applications, based on relational databases and NoSQL technologies
- Overall, the relational model remains valid
- Oracle is the preferred solution for critical applications; exceptions: ALICE ("zoo of solutions"), LHCb (mainly MySQL)
- Cost-effective platform in terms of functionality and performance
- Expertise and support are key factors
Requirements for Run3&4
- Run3&4: larger insert and update rates mean a database workload increase
- Advanced Oracle 12c features (in-memory, new partitioning, ...)
- Migration to Oracle 12cR2 in LS2
- Powerful hardware to improve response times
- Increasing difficulty scheduling interventions: move towards zero downtime, with fast-switching standby services
- Alternatives to Oracle for smaller projects, to facilitate collaboration with other institutes and open sourcing

The evolution of hardware components, in particular new types of memory, is expected to have a considerable impact on database applications. This is seen as an opportunity to help solve some of the challenges. Another aspect of the hardware evolution is that servers are increasingly capable of handling the load of several applications and are well suited for consolidation efforts. This is beneficial for reducing cost, but comes with the additional challenges typical of multi-tenant systems, such as resource management, isolation of workloads and security.
Implementations & Technologies
New systems/evolution - move towards NoSQL solutions
- New accelerator logging service (NXCALS)
- Next generation archiver
- Next generation for Post Mortem event storage and analysis
- Conditions data management system for HEP experiments
- CMS Big Data project

NXCALS: the Logging Service stores accelerator beam & equipment data on-line, to be kept beyond the lifetime of the LHC (>20 years). It is used to analyse and improve the behaviour of the accelerators and their sub-systems over time. In 2016, after 13 years in service and with a throughput of >1.5 TB/day, it was acknowledged that the service cannot satisfy new demands to analyse huge data sets in an efficient manner. A new system, "NXCALS", is under development based on "Big Data" technologies and in collaboration with IT. The current "CALS" system is foreseen to be turned off during 2019 (LS2).

Next generation archiver: the archiver in a control system is an essential element of the SCADA (supervision) layer. It stores the history of values and alarms (into a "database"), allows the operator to see the evolution of the process (historical trends), and supports diagnostics and post-mortem analysis ("Event Screen"). Still in the research & development phase.

Post Mortem: the Post Mortem system allows the storage and analysis of transient data recordings from accelerator equipment systems; Post Mortem data is complementary to Logging data. Research concluded; starting the development phase.

Conditions DB: good experience with COOL, but there are limitations (e.g. caching is problematic); looking for common solutions, also for non-LHC experiments, in particular Belle II and NA62.

CMS Big Data project: CMS is working together with CERN openlab and Intel on the CMS Big Data Reduction Facility. The goal is to reduce 1 PB of official CMS data to 1 TB of ntuple output for analysis. The progress of this 2-year project was presented, with first results of scaling up Spark-based HEP analysis. Studies were also presented on using Apache Spark for a CMS Dark Matter physics search, investigating Spark's feasibility, usability and performance compared to the traditional ROOT-based analysis.
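The "reduction" step behind such a facility is conceptually a filter-and-project pass: drop events that fail selection cuts, then keep only the few columns needed for analysis (the ntuple). The sketch below illustrates that idea with plain Python structures; the field names (pt, eta, trigger) and the cut values are hypothetical, and the actual CMS facility performs this at scale with Apache Spark over CMS data formats, not with code like this.

```python
# Illustrative sketch of a filter-and-project "data reduction" step,
# in the spirit of the CMS Big Data Reduction Facility (1 PB -> 1 TB).
# Field names and thresholds are hypothetical, not the real CMS schema.

def reduce_events(events, pt_threshold=25.0):
    """Keep events passing a trigger and a kinematic cut, then project
    each surviving event down to the analysis columns (the "ntuple")."""
    ntuple = []
    for ev in events:
        if not ev["trigger"]:           # drop events failing the trigger
            continue
        if ev["pt"] < pt_threshold:     # drop soft events
            continue
        # projection: discard the bulky raw payload, keep analysis columns
        ntuple.append({"pt": ev["pt"], "eta": ev["eta"]})
    return ntuple

events = [
    {"pt": 40.2, "eta": 0.5,  "trigger": True,  "raw": "..."},
    {"pt": 12.0, "eta": 1.1,  "trigger": True,  "raw": "..."},
    {"pt": 55.9, "eta": -2.0, "trigger": False, "raw": "..."},
]
print(reduce_events(events))  # only the first event survives the cuts
```

The large size reduction comes from both directions at once: the filter removes most rows, and the projection removes most bytes per surviving row.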
Implementations & Technologies
Motivation
- Scale out
- Enable data analytics
- Newer technologies more appropriate to solve specific use cases
- No antagonism SQL vs NoSQL anymore

Risks in the medium term
- Less interest / technologies disappearing
- Difficult to maintain
Implementations & Technologies
Provenance
- LHCb bookkeeping, CMS analysis, ...
- Integrating origin / meta information is important for further analysis

Database on Demand
- Supports MySQL, PostgreSQL (relational) and InfluxDB (time series)
- Backup & recovery, HA, monitoring updates
- Working to offer instances in the TN
- Help to use different DBMSs: open source tools are available to facilitate migration; the DBoD team can be contacted
Going beyond relational
Data Analytics
- Hadoop, Spark, Sqoop, Impala, HBase, Hive and Pig
- Centralised Elasticsearch service: a distributed, RESTful search and analytics engine
- Hadoop and Elasticsearch are becoming critical to ATLAS

Growing interest in Time Series databases
- Easier analysis
- Improved storage and ingestion rates
- InfluxDB use cases: DBoD monitoring, IT monitoring

Streams processing
- Kafka pilot service use cases: accelerator logging service, computing infrastructure monitoring
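Part of why time-series engines ingest so efficiently is their very simple write format: InfluxDB accepts points as text "line protocol" (measurement, tags, fields, timestamp). The sketch below formats such lines in Python; the measurement, tag and field names are hypothetical examples, not the actual schema of the DBoD or IT monitoring services, and a real client would then POST the lines to the server's HTTP write endpoint.

```python
# Minimal sketch: building InfluxDB line-protocol strings by hand.
# Names used here (db_monitoring, instance, qps, ...) are hypothetical.

def field_value(v):
    """Render a field value per line-protocol typing rules."""
    if isinstance(v, bool):
        return "true" if v else "false"
    if isinstance(v, int):
        return f"{v}i"      # integers carry an 'i' suffix
    if isinstance(v, float):
        return str(v)       # bare numbers are floats
    return f'"{v}"'         # strings are double-quoted

def influx_line(measurement, tags, fields, timestamp_ns):
    """Format one point: measurement,tag=v field=v timestamp."""
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(
        f"{k}={field_value(v)}" for k, v in sorted(fields.items())
    )
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"

line = influx_line(
    "db_monitoring",                         # hypothetical measurement
    {"instance": "dbod42", "engine": "mysql"},
    {"qps": 153.2, "connections": 40},
    1496160000000000000,                     # nanosecond epoch timestamp
)
print(line)
# db_monitoring,engine=mysql,instance=dbod42 connections=40i,qps=153.2 1496160000000000000
```

Because each point is one append-only line with a timestamp, the server can batch and compress writes aggressively, which is the ingestion-rate advantage noted above.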
In general...
- Positive feedback on the database services provided by IT
- Fruitful discussions
- Synergies (even overlaps): collaboration, with scope to optimise resources for all
- Next similar workshop in 2019 (LS2), given the dynamic nature of the technologies and the many projects in development
Thank you!