2
Future Database Challenges
Katarzyna Dziedziniewicz-Wojcik
3
Agenda
Where are we now?
Community requirement challenges
Technical challenges
A short overview of the current status of the DB services.
08/12/2016 Future Database Challenges
4
Agenda
Where are we now?
Community requirement challenges
Technical challenges
5
Databases: 3 main pillars
Oracle
  For critical, transactional load
  Administered by the DBA team (CERN IT-DB)
DBoD - Database as a Service
  Different database engines: MySQL, PostgreSQL, InfluxDB
  Instance owners have most of the DBA rights
Hadoop
  For sequential workload
I will refer to all 3 pillars as database services.
6
Questions? 20/04/2016 DB Services
7
Size more than tripled in the last 4 years.
8
Agenda
Where are we now?
Community requirement challenges
Technical challenges
What is our challenge?
  To respond to the challenges and needs stated by HEP projects
    By adapting our service to the needs of our users (constant evolution)
    By providing consultancy to facilitate the choice of the optimal solution
  Your projects and needs are our challenges
The presentation will focus on examples of challenges and on how we see the service evolving to respond to them.
9
Time series workload challenge
Workload assumptions
  Billions of individual data points
  High write and read throughput: 250k records/s from LHC sensors
  Mostly an insert/append workload
  Large deletes
Multiple use cases
  LHCb monitoring
  Sensor data from the LHC
Solution
  Currently InfluxDB
  A large-scale time series solution is to be developed
This leads us to a long-term challenge.
10
High ingest rate challenge
Accelerator logging 2.0
  Data coming from different sources
  Possibly 250k records/s
  Data latency < ~30 s
  No data loss acceptable
Temporary data store in the ingestion layer
  Prevents data loss in case the storage layer is not available
Data transformation features
  Enrichment by adding context
  Filtering
Stores acquisition device/property data: mainly numeric time series, not text logs
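The ingestion-layer requirements above (filtering, enrichment with context, and a temporary store that holds data while the storage layer is unavailable) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Accelerator Logging design: the `IngestBuffer` class, the `sink` callable, and the record fields are hypothetical names, and the sink is assumed to signal an outage by raising `ConnectionError`.

```python
from collections import deque
from typing import Callable, Deque, Dict, Optional

Record = Dict[str, object]


class IngestBuffer:
    """Hypothetical ingestion stage: filter, enrich, then forward or hold."""

    def __init__(self, sink: Callable[[Record], None], context: Dict[str, object]):
        self.sink = sink          # writes one record to the storage layer; may raise
        self.context = context    # static context attached to every record
        self.pending: Deque[Record] = deque()  # temporary store for undelivered records

    def transform(self, record: Record) -> Optional[Record]:
        # Filtering: keep numeric values only (mainly numeric time series).
        value = record.get("value")
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return None
        # Enrichment: add context without mutating the caller's record.
        return {**record, **self.context}

    def ingest(self, record: Record) -> None:
        enriched = self.transform(record)
        if enriched is not None:
            self.pending.append(enriched)
        self.flush()

    def flush(self) -> None:
        # Deliver in arrival order; stop at the first failure and keep the rest.
        while self.pending:
            try:
                self.sink(self.pending[0])
            except ConnectionError:
                return  # storage unavailable: records stay in the temporary store
            self.pending.popleft()
```

An in-memory deque would not survive a process crash, so a real deployment meeting the "no data loss" requirement would need a durable temporary store instead.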
11
Data ingestion
Short term: provisioning of Kafka
Long term: find the best solution covering all requirements
  Multitude of solutions coming and going
  Assess both enterprise and open source possibilities
  Make sure that APIs/interfaces are as stable as possible
12
High frequency query challenge
ALICE file catalog
  Used to find files containing certain events in CASTOR and on the grid
Currently
  Single instance of MySQL
  A few TB of data, ~8.5B rows
  Read access rate – Hz
Run 3
  10x increase in data size and insert rate
  A robust solution is needed
What could be a possible solution?
13
Memory evolution
NVMe
  Flash over PCI Express
  High bandwidth at low latency
3D XPoint
  Upcoming non-volatile (NV) memory technology
  Faster and more stable than traditional PCM
  Density 4 times that of DRAM
An option for huge in-memory databases and Spark analysis
  Imagine an event index in memory
Oracle is using flash for in-memory databases in their engineered systems. We can imagine running an in-memory open source database using NV memory as storage.
14
Physics analysis in DB type of workload
Ongoing projects in experiments
CMS Big Data project
  Speed up HEP analysis 100 times
  Process an input sample of 1 PB within 5 hours
SWAN project
  Cloud HEP data analysis
  Analyse data without the need to install any software, using a Jupyter notebook interface
  Access experiments' and user data in the CERN cloud
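To give a feel for the CMS Big Data target, the average read throughput implied by "1 PB within 5 hours" can be worked out as below. The sketch assumes a decimal petabyte (10^15 bytes) and a perfectly even rate over the whole window; the function name is illustrative.

```python
PETABYTE = 10**15  # decimal petabyte in bytes (assumption)
GIGABYTE = 10**9   # decimal gigabyte in bytes


def required_throughput(sample_bytes: int, hours: float) -> float:
    """Average bytes per second needed to read sample_bytes within hours."""
    return sample_bytes / (hours * 3600)


rate_bytes_per_s = required_throughput(1 * PETABYTE, 5)
rate_gb_per_s = rate_bytes_per_s / GIGABYTE  # roughly 55.6 GB/s sustained
```

Under these assumptions the target corresponds to a sustained aggregate rate of about 55.6 GB/s.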
15
Support for Spark
Our goal: to expand support for Spark-based physics analysis
  Spark addresses use cases for machine learning at scale
Pilot in the pipeline using the private cloud at CERN, in collaboration with the OpenStack team
Future: test public cloud?
16
Optimizing data placement
Ongoing investigations in CMS
  Intelligent dataset placement across sites
  70 sites
  ~20 petabytes of data produced per year
For HL-LHC
  Event size x3
  Event number x5
Reduction of transfer and storage costs
A base for a study on optimal job scheduling
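As a rough back-of-the-envelope illustration of the HL-LHC factors, the sketch below assumes that yearly data volume scales with event size times event number and that both factors apply to the ~20 PB/year figure. The slide does not state either assumption, so the result is an illustration rather than a projection.

```python
def scaled_volume(current_pb_per_year: float,
                  size_factor: float,
                  count_factor: float) -> float:
    """Yearly volume if it scales with event size times event number (assumption)."""
    return current_pb_per_year * size_factor * count_factor


combined_factor = 3 * 5                       # x15 overall
hl_lhc_pb_per_year = scaled_volume(20, 3, 5)  # 300 PB/year under these assumptions
```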
17
Machine learning
Linking machine learning with big data
  Optimizing algorithms for HEP needs
  Library to integrate Keras + Spark
The analytics infrastructure will be evolving
  Reviewing application and architecture choices on short cycles (~2 years?)
  New components will be evaluated
More in Luca Canali's talk tomorrow
Possible field of cooperation with Oracle Labs
18
Agenda
Where are we now?
Community requirement challenges
Technical challenges
19
Introduction of Oracle In-Memory
Huge performance boost for full table scans
Deployment is transparent to applications
Limited impact on DML operations
  The impact will diminish once 12.2 is in place
Challenges
  Selection of the proper workload
  Requires machines with large amounts of memory
  The NV memory solutions already mentioned will be evaluated
In-memory structures saved to TBS.
20
High availability challenge
Ensuring disaster recovery
Increasing backup capabilities
Providing sufficient and flexible computing resources
  Resource expansion for peak usage
Optimizing the cost of operations
  Cost: hardware, licences, manpower (needed for cloud migration vs. needed for on-premise support)
21
Cloud
One of the options
  Oracle cloud is currently being tested
  We are having a look at other options
Key factors
  Feasibility of using cloud with complex environments: experiment and technical networks, SSO…
  Performance/latency
  Effort needed for data migration
  Cost: models differ between companies and even regions
22
Changing technologies
The change is more rapid than ever
  Which technologies will stay on the market?
We have to provide a consistent interface to the developer community
  Oracle Big Data Discovery example: the background analytics engine changes, but interfaces remain
Key challenge
  Select proper technologies for all 3 pillars of the service
  Combine commercial and open source
As you have noticed, I have not mentioned many technologies as solutions to our challenges.
23
Summary
Data size and ingest requirements will grow
  Possibly from 5 bln records/day to 21 bln in a few years for accelerator logging
  The number and size of experiment events will also grow
Systems we plan today need to answer future requirements
New technologies are at hand
We need to be prepared for evolving needs
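The daily record counts in the summary can be converted to average per-second rates, which makes them comparable with the ~250k records/s figure quoted for accelerator logging. The sketch assumes "bln" means 10^9 and that records arrive evenly over 24 hours; real traffic is likely burstier.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def records_per_second(records_per_day: float) -> float:
    """Average rate implied by a daily record count."""
    return records_per_day / SECONDS_PER_DAY


current_rate = records_per_second(5e9)   # about 57,900 records/s
future_rate = records_per_second(21e9)   # about 243,000 records/s
growth = 21e9 / 5e9                      # a 4.2x increase
```

Under these assumptions, 21 bln records/day averages out to roughly 243k records/s, in line with the "possibly 250k records/s" requirement.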
24
Acknowledgments
I would like to thank:
  The experiments and the BE department for coming up with challenges and input
  Colleagues from IT-DB for solutions and help in preparing this presentation