Future Database Challenges

Presentation transcript:

Future Database Challenges
Katarzyna Dziedziniewicz-Wojcik

Agenda
- Where are we now?
- Community requirement challenges
- Technical challenges
Note: Short overview of the current status of DB services.
08/12/2016 Future Database Challenges


Databases: 3 main pillars
- Oracle
  - For critical, transactional load
  - Administered by DBA team (CERN IT-DB)
- DBoD: Database as a Service
  - Different database engines: MySQL, PostgreSQL, InfluxDB
  - Instance owners have most of the DBA rights
- Hadoop
  - For sequential workload
Note: I will refer to all 3 pillars as database services.

Questions? 20/04/2016 DB Services

Size more than tripled in the last 4 years.

Agenda
- Where are we now?
- Community requirement challenges
- Technical challenges

What is our challenge
- To respond to challenges and needs stated by HEP projects
  - By adapting our service to the needs of our users (constant evolution)
  - By providing consultancy to facilitate the choice of the optimal solution
- Your projects and needs are our challenges
Note: The presentation will focus on examples of challenges and how we see the service evolving to respond to them.

Time series workload challenge
- Workload assumptions
  - Billions of individual data points
  - High write and read throughput
    - 250k rec/s from LHC sensors
  - Mostly an insert/append workload
  - Large deletes
- Multiple use cases
  - LHCb monitoring
  - Sensor data from LHC
- Solution
  - Currently InfluxDB
  - Large-scale time series solution is to be developed
Note: And this will lead us to a long-term challenge.
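A workload that is mostly appends plus occasional large deletes is often served by partitioning data by time, so that a large delete becomes the removal of whole partitions rather than of individual points. The sketch below illustrates that idea only; it is not how the service or InfluxDB is implemented, and the class name, the one-hour bucket width and the sensor name are all hypothetical.

```python
from collections import defaultdict

BUCKET_SECONDS = 3600  # assumed partition width: one hour


class TimePartitionedStore:
    """Toy append-only store that groups points into fixed time buckets."""

    def __init__(self, bucket_seconds=BUCKET_SECONDS):
        self.bucket_seconds = bucket_seconds
        self.buckets = defaultdict(list)  # bucket start time -> list of points

    def append(self, timestamp, sensor, value):
        # Integer timestamps (seconds) are assumed; align to the bucket start.
        start = timestamp - (timestamp % self.bucket_seconds)
        self.buckets[start].append((timestamp, sensor, value))

    def drop_before(self, cutoff):
        """Large delete: discard every bucket that ends at or before cutoff."""
        expired = [s for s in self.buckets if s + self.bucket_seconds <= cutoff]
        return sum(len(self.buckets.pop(s)) for s in expired)

    def count(self):
        return sum(len(points) for points in self.buckets.values())


store = TimePartitionedStore()
for t in range(0, 3 * 3600, 600):  # one point every 10 minutes for 3 hours
    store.append(t, "sensor-a", float(t))

dropped = store.drop_before(2 * 3600)  # retire the two oldest hourly buckets
print(dropped, store.count())  # 12 points dropped, 6 remain
```

The cost of `drop_before` depends on the number of buckets, not on the number of points, which is the property that matters when billions of points have to be retired at once.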

High ingest rate challenge
- Accelerator logging 2.0
  - Data coming from different sources
  - Possibly 250k records/s
  - Data latency <~30 s
  - No data losses acceptable
- Temporary data store in the ingestion layer
- Provide data transformation features
  - Enhancing by adding some context
  - Filtering
Notes:
- Different data sources
- Stores acquisition device/property data
- Mainly numeric time series, not text logs
- Temporary store: prevents data loss in case the storage layer is not available
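The ingestion-layer requirements above (filter, add context, hold data while storage is unavailable) can be sketched in a few lines. This is an in-memory toy under stated assumptions, not the accelerator logging design: `IngestBuffer`, `FlakyStorage`, the "drop empty readings" filter rule and the `source` context field are all made up for illustration, and a real ingestion layer would need a durable store rather than a Python deque.

```python
from collections import deque


class IngestBuffer:
    """Stages records between the data sources and the storage layer."""

    def __init__(self, context):
        self.context = context  # fields added to every record (enrichment)
        self.pending = deque()  # temporary store, oldest record first

    def accept(self, record):
        """Filter, enrich and stage one record; return True if it was kept."""
        if record.get("value") is None:  # assumed filter rule: drop empty readings
            return False
        self.pending.append({**record, **self.context})
        return True

    def flush(self, write):
        """Hand staged records to storage; a record is removed only after
        write() succeeded for it, so a storage outage loses nothing."""
        written = 0
        while self.pending:
            try:
                write(self.pending[0])
            except ConnectionError:
                break  # storage is down: keep everything and retry later
            self.pending.popleft()
            written += 1
        return written


class FlakyStorage:
    """Hypothetical storage layer that can be switched on and off."""

    def __init__(self):
        self.available = False
        self.rows = []

    def write(self, record):
        if not self.available:
            raise ConnectionError("storage layer is not available")
        self.rows.append(record)


storage = FlakyStorage()
buf = IngestBuffer(context={"source": "demo-device"})
buf.accept({"ts": 1, "value": 3.2})
buf.accept({"ts": 2, "value": None})  # filtered out
buf.accept({"ts": 3, "value": 3.4})

first_try = buf.flush(storage.write)   # storage down: nothing written, nothing lost
storage.available = True
second_try = buf.flush(storage.write)  # storage back: both staged records delivered
```

At the quoted rate of 250k records/s, a 60-second storage outage would stage 15 million records, which gives a first idea of how large the temporary store has to be.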

Data ingestion
- Short term: provisioning of Kafka
- Long term: find the best solution covering all requirements
  - Multitude of coming and going solutions
  - Assess both enterprise and open source possibilities
  - Make sure that APIs/interfaces are as stable as possible

High frequency query challenge
- ALICE file catalog
  - Currently
    - Single instance of MySQL
    - Few TBs of data, ~8.5B rows
    - Read access rate: 11500 Hz
  - Run 3
    - 10x increase in data size and insert rate
    - Robust solution is needed
Note: Used to find files containing certain events in CASTOR and grid. What could be a possible solution?

Memory evolution
- NVMe
  - Flash over PCI Express
  - High bandwidth at low latency
- 3D XPoint
  - Upcoming NV technology
  - Faster and more stable than traditional PCM
  - Density 4 times that of DRAM
  - An option for huge in-memory databases, Spark analysis
  - Imagine an event index in memory
Note: Oracle is using flash for in-memory databases for their engineered systems. We can imagine running an in-memory OS database using NV memory as storage.

Physics analysis in DB type of workload
- Ongoing projects in experiments
  - CMS Big Data project
    - Speed up HEP analysis 100 times
    - Process an input sample of 1 PB within 5 hours
  - SWAN project
    - Cloud HEP data analysis
    - Analyse data without the need to install any software, using the Jupyter notebook interface
    - Access experiments' and user data in the CERN cloud
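To get a feel for the CMS target of processing 1 PB within 5 hours, it helps to convert it into a sustained read rate. The arithmetic below assumes a decimal petabyte (10^15 bytes), which the slide does not specify, and the per-node rate of 1 GB/s is purely hypothetical, included only to show how a cluster size estimate would follow.

```python
import math

PETABYTE = 10**15  # bytes; decimal PB is an assumption
sample_bytes = 1 * PETABYTE
window_seconds = 5 * 3600

# Sustained rate needed to read the whole sample inside the window.
required_gb_per_s = sample_bytes / window_seconds / 10**9

ASSUMED_NODE_GB_PER_S = 1.0  # hypothetical per-node read rate
nodes_needed = math.ceil(required_gb_per_s / ASSUMED_NODE_GB_PER_S)

print(f"{required_gb_per_s:.1f} GB/s sustained")  # 55.6 GB/s sustained
print(f"{nodes_needed} nodes at {ASSUMED_NODE_GB_PER_S} GB/s each")  # 56 nodes
```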

Support for Spark
- Our goal: to expand support for Spark-based physics analysis
- Spark addresses use cases for machine learning at scale
- Pilot in the pipeline using private cloud at CERN
  - Collaboration with OpenStack team
- Future: test public cloud?

Optimizing data placement
- Ongoing investigations in CMS
  - Intelligent dataset placement across sites
    - 70 sites
    - ~20 petabytes of data produced/year
    - For HL-LHC: event size x3, event number x5
  - Reduction of transfer and storage costs
  - A base for a study on optimal job scheduling
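The HL-LHC factors can be combined into a rough volume estimate. The calculation below assumes that yearly data volume scales with event size times event count and that both factors apply to the ~20 PB/year figure; the slide states neither, so the result is an order-of-magnitude illustration only.

```python
current_pb_per_year = 20   # ~20 PB/year produced today (from the slide)
event_size_factor = 3      # HL-LHC: event size x3
event_count_factor = 5     # HL-LHC: event number x5

# Assumption: volume is proportional to (event size) * (number of events).
growth = event_size_factor * event_count_factor
projected_pb_per_year = current_pb_per_year * growth

print(f"x{growth} growth, about {projected_pb_per_year} PB/year")  # x15, about 300 PB/year
```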

Machine learning
- Linking machine learning with big data
  - Optimizing algorithms for HEP needs
  - Library to integrate Keras + Spark
- The analytics infrastructure will be evolving
  - Reviewing application and architecture choices on short cycles (~2 years?)
  - New components will be evaluated
- More in Luca Canali's talk tomorrow
Note: Possible field of cooperation with Oracle Labs.

Agenda
- Where are we now?
- Community requirement challenges
- Technical challenges

Introduction of Oracle In-Memory
- Huge performance boost for full table scans
- Deployment is transparent to applications
- Limited impact on DML operations
- Challenges
  - Selection of proper workload
  - Requires machines with loads of memory
    - The NV memory solutions already mentioned will be evaluated
Note: Impact on DML will diminish once 12.2 is in place. In-memory structures saved to TBS.

High availability challenge
- Ensuring disaster recovery
- Increasing backup capabilities
- Providing sufficient and flexible computing resources
  - Resource expansion for peak usage
- Optimizing cost of operations
Note: Cost: hardware, licences, manpower needed for cloud migration vs. need for on-premise support.

Cloud
- One of the options
- Oracle cloud is currently being tested
- Key factors
  - Feasibility of using cloud with complex environments
    - Experiment's and technical network, SSO…
  - Performance/latency
  - Effort needed for data migration
  - Cost
    - Models differ between companies and even regions
Note: We are having a look at other options.

Changing technologies
- The change is more rapid than ever
  - Which technologies will stay on the market?
- We have to provide a consistent interface to the developer community
  - Oracle Big Data Discovery example
  - Background analytics engines change, but interfaces remain
- Key challenge
  - Select proper technologies for all 3 pillars of the service
  - Combine commercial and open source
Note: As you have noticed, I have not mentioned many technologies as solutions to our challenges.
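The point that engines change while interfaces remain can be illustrated with a small sketch in which callers depend on one stable service class and the engine behind it is swapped without touching caller code. All names here (`AnalyticsEngine`, `AnalyticsService`, the two engines) are hypothetical and unrelated to any Oracle or CERN API; computing an average stands in for a real analytics job.

```python
import statistics
from typing import Iterable, Protocol


class AnalyticsEngine(Protocol):
    """Contract every backend has to satisfy."""

    def run(self, values: Iterable[float]) -> float: ...


class LoopEngine:
    """First backend: a hand-written loop."""

    def run(self, values):
        total, count = 0.0, 0
        for v in values:
            total += v
            count += 1
        return total / count


class StatisticsEngine:
    """Replacement backend built on the standard library."""

    def run(self, values):
        return statistics.fmean(values)


class AnalyticsService:
    """The stable interface that developers code against."""

    def __init__(self, engine: AnalyticsEngine):
        self._engine = engine

    def swap_engine(self, engine: AnalyticsEngine):
        self._engine = engine  # callers of average() are unaffected

    def average(self, values):
        return self._engine.run(values)


service = AnalyticsService(LoopEngine())
before = service.average([1.0, 2.0, 3.0])
service.swap_engine(StatisticsEngine())
after = service.average([1.0, 2.0, 3.0])
print(before, after)  # 2.0 2.0
```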

Summary
- Data size and ingest requirements will grow
  - Possibly from 5 bln records/day to 21 bln in a few years for accelerator logging
  - Number and size of experiment events will also grow
- Systems we plan today need to answer future requirements
- New technologies are at hand
- We need to be prepared for evolving needs
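The daily record counts in the summary line up with the per-second rate quoted earlier in the talk (possibly 250k records/s for accelerator logging). A quick conversion, assuming a constant rate over the day, makes the link explicit; the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(records_per_day):
    """Average rate implied by a daily total, assuming a constant rate."""
    return records_per_day / SECONDS_PER_DAY


def per_day(records_per_second):
    """Daily total implied by a sustained per-second rate."""
    return records_per_second * SECONDS_PER_DAY


today = per_second(5e9)       # about 57.9k records/s
future = per_second(21e9)     # about 243k records/s
peak_day = per_day(250_000)   # 21.6 billion records/day

print(f"{today:,.0f} rec/s now, {future:,.0f} rec/s projected")
print(f"250k rec/s sustained = {peak_day / 1e9:.1f} bln records/day")
```

So the projected 21 bln records/day is roughly the same statement as a sustained ingest rate of 250k records/s.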

Acknowledgments
- I would like to thank
  - The Experiments and BE department for coming up with challenges and input
  - Colleagues from IT-DB for solutions and help in preparing this presentation