National Research Center “Kurchatov Institute”

1 The Laboratory of BigData Technologies for mega-science projects
Data management in heterogeneous metadata storage and access infrastructures
Marina Golosova

2 Outline
- ETL subsystem
- Development framework for ETL subsystems
29/09/2017 Marina Golosova, NEC 2017

3 ETL subsystem
ETL = (E)xtract, (T)ransform, (L)oad
[Diagram: Data source → Extract → Transformation → Load → Data sink; variant with two branches from one data source, each with its own Extract and Transformation steps, ending in Data sink 1 and Data sink 2]
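A minimal Python sketch of the two schemes on this slide, assuming in-memory lists stand in for the data source and the sinks. The names `extract`, `normalise`, `titles_only`, `load` and `run_etl` are hypothetical and not part of the framework described in the talk. Each call to `run_etl` is one complete Extract → Transformation → Load chain, so feeding two sinks this way reads the source twice.

```python
from typing import Callable, Iterable, Iterator, List

Record = dict  # hypothetical: a metadata record is a plain dict


def extract(source: Iterable[Record]) -> Iterator[Record]:
    """Extract: read raw records one by one from a data source (any iterable here)."""
    for record in source:
        yield record


def normalise(records: Iterable[Record]) -> Iterator[Record]:
    """Transformation for Data sink 1: lower-case field names, drop records without an id."""
    for record in records:
        normalised = {key.lower(): value for key, value in record.items()}
        if "id" in normalised:
            yield normalised


def titles_only(records: Iterable[Record]) -> Iterator[Record]:
    """Transformation for Data sink 2: keep only the id and title fields."""
    for record in normalise(records):
        yield {"id": record["id"], "title": record.get("title")}


def load(records: Iterable[Record], sink: List[Record]) -> int:
    """Load: append records to a data sink (a list here) and return how many were written."""
    count = 0
    for record in records:
        sink.append(record)
        count += 1
    return count


def run_etl(source: Iterable[Record],
            transform: Callable[[Iterable[Record]], Iterator[Record]],
            sink: List[Record]) -> int:
    """One Extract -> Transformation -> Load chain."""
    return load(transform(extract(source)), sink)


source = [{"ID": 1, "Title": "a", "Size": 10}, {"Title": "no id"}, {"id": 2, "TITLE": "b"}]
sink1: List[Record] = []
sink2: List[Record] = []
run_etl(source, normalise, sink1)    # first chain: extracts from the source
run_etl(source, titles_only, sink2)  # second chain: extracts from the source again
```

Keeping the three stages as separate functions mirrors the E/T/L split: only `transform` differs between the two branches.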

4 ETL subsystem
ETL = (E)xtract, (T)ransform, (L)oad
[Diagram: Data source → Extract → Transformation → Load → Data sink; variant in which a single Extract step on the data source feeds two Transformation branches, ending in Data sink 1 and Data sink 2]
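The variant with a single Extract step can be sketched, again in Python with hypothetical names, by extracting once and sharing the resulting stream between the transformation branches. Here `itertools.tee` is an in-memory stand-in for whatever buffer sits between the steps (in a Kafka-based design, a topic could play that role); note that `tee` keeps items in memory until every branch has consumed them.

```python
import itertools
from typing import Callable, Dict, Iterable, Iterator, List

Transformation = Callable[[Iterable[dict]], Iterator[dict]]


def extract(source: Iterable[dict], stats: Dict[str, int]) -> Iterator[dict]:
    """Extract each record from the data source once, counting the reads."""
    for record in source:
        stats["reads"] += 1
        yield record


def ids_only(records: Iterable[dict]) -> Iterator[dict]:
    """Hypothetical transformation feeding Data sink 1."""
    for record in records:
        yield {"id": record["id"]}


def upper_titles(records: Iterable[dict]) -> Iterator[dict]:
    """Hypothetical transformation feeding Data sink 2."""
    for record in records:
        yield {"id": record["id"], "title": record.get("title", "").upper()}


def run_fan_out(source: Iterable[dict],
                transformations: List[Transformation],
                sinks: List[List[dict]]) -> Dict[str, int]:
    """Single Extract step whose output is shared by several transformation branches."""
    stats = {"reads": 0}
    extracted = extract(source, stats)
    # tee() gives each branch its own iterator over the one extracted stream.
    branches = itertools.tee(extracted, len(transformations))
    for branch, transformation, sink in zip(branches, transformations, sinks):
        sink.extend(transformation(branch))
    return stats


source = [{"id": 1, "title": "a"}, {"id": 2}, {"id": 3, "title": "c"}]
sink1: List[dict] = []
sink2: List[dict] = []
stats = run_fan_out(source, [ids_only, upper_titles], [sink1, sink2])
```

After the run, `stats["reads"]` equals the number of source records, confirming that the source was read only once even though two sinks were filled.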

5 Subject-area-specific and non-specific components
ETL supervisor:
- Failure handling: restart process, reprocess data
- Ensure data delivery
- Run every X min
Parallelization coordinator
[Diagram: Data source → Extractor → two branches, each with several parallel Transformation instances and a Loader, writing to Data sink 1 and Data sink 2]
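One way to picture the supervisor's failure handling is a wrapper that restarts a failed step on the same batch and keeps batches it could not process. This is a hypothetical Python sketch: `Supervisor`, `process` and `run_periodically` are illustrative names, and the retry count and polling interval are assumptions. Rerunning a batch gives at-least-once rather than exactly-once delivery, so a step that failed halfway may write some records twice unless the sink deduplicates them.

```python
import time
from typing import Callable, List, Optional


class Supervisor:
    """Hypothetical ETL supervisor: restarts a failed step and reprocesses its batch."""

    def __init__(self, step: Callable[[list], None], max_restarts: int = 3) -> None:
        self.step = step
        self.max_restarts = max_restarts
        self.failed_batches: List[list] = []  # kept for later reprocessing

    def process(self, batch: list) -> bool:
        """Run the step; on an exception, restart it with the same batch."""
        for _ in range(1 + self.max_restarts):
            try:
                self.step(batch)
                return True
            except Exception:
                continue  # restart the step, reprocessing the same input
        self.failed_batches.append(batch)  # give up for now, but keep the data
        return False

    def run_periodically(self, fetch_batch: Callable[[], Optional[list]],
                         interval_s: float, iterations: int) -> None:
        """Poll for new data every interval_s seconds ("run every X min")."""
        for i in range(iterations):
            batch = fetch_batch()
            if batch:
                self.process(batch)
            if i < iterations - 1:
                time.sleep(interval_s)


calls = {"n": 0}


def flaky_step(batch: list) -> None:
    """Hypothetical step that fails on its first two attempts."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")


supervisor = Supervisor(flaky_step, max_restarts=3)
ok = supervisor.process([1, 2, 3])  # True: succeeds on the third attempt
```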

6 ETL subsystem development framework
Tasks:
- process supervision (run, stop, restart, …)
- data delivery between processes (exactly-once, …)
- parallelization management
Project: a framework based on Apache Kafka:
- data delivery via Kafka topics and the producer/consumer API
- process management with the Kafka Streams library
- parallelization management via topic partitioning and Kafka Streams application configuration
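Parallelization through topic partitioning rests on two rules: records with the same key go to the same partition, and within one application (consumer group) each partition is processed by only one worker at a time. The Python sketch below imitates both under simplifying assumptions: CRC32 stands in for the murmur2 hash used by Kafka's default partitioner, and a round-robin split stands in for Kafka Streams' own partition assignment. It also shows why the partition count caps parallelism: with more workers than partitions, some workers receive nothing.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition (CRC32 stand-in, not Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(num_partitions: int, workers: List[str]) -> Dict[str, List[int]]:
    """Round-robin assignment: every partition is owned by exactly one worker."""
    assignment: Dict[str, List[int]] = {worker: [] for worker in workers}
    for partition in range(num_partitions):
        assignment[workers[partition % len(workers)]].append(partition)
    return assignment


def route(records: Iterable[Tuple[str, str]], num_partitions: int,
          workers: List[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Deliver each (key, value) record to the worker that owns its partition."""
    owner = {partition: worker
             for worker, partitions in assign_partitions(num_partitions, workers).items()
             for partition in partitions}
    per_worker: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for key, value in records:
        per_worker[owner[partition_for(key, num_partitions)]].append((key, value))
    return dict(per_worker)
```

In Kafka Streams itself, the number of workers comes from application configuration (for example `num.stream.threads`) and from how many application instances are running, while the number of input partitions bounds how many of them can do useful work.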

7 DKB ETL subsystem
Kafka extensions:
- topology constructor (config files instead of Java code)
- external process adapters for Processing/Source/Sink (allow running any executable as a topology node)
- primitive data transfer protocol (to be improved)
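A hypothetical Python sketch of the first two extensions: a topology described in a JSON config rather than in code, and an adapter that runs an arbitrary executable as a processing node. The config layout (`processors`, `name`, `command`) and the one-message-per-line stdin/stdout exchange are assumptions made for illustration, not the DKB protocol; only the Processing adapter is shown, and the nodes are plain `subprocess` calls rather than Kafka Streams processors.

```python
import json
import subprocess
import sys
from typing import Callable, List

Stage = Callable[[List[str]], List[str]]


def external_processor(command: List[str]) -> Stage:
    """Adapter that runs any executable as a processing node.

    Assumed (hypothetical) protocol: one message per line on the child's
    stdin, one output message per line on its stdout.
    """
    def run(messages: List[str]) -> List[str]:
        result = subprocess.run(
            command,
            input="".join(message + "\n" for message in messages),
            capture_output=True,
            text=True,
            check=True,  # a non-zero exit status raises CalledProcessError
        )
        return [line for line in result.stdout.splitlines() if line]
    return run


def build_topology(config_text: str) -> Stage:
    """Topology constructor: chain the processors listed in a JSON config, in order."""
    config = json.loads(config_text)
    stages = [external_processor(node["command"]) for node in config["processors"]]

    def topology(messages: List[str]) -> List[str]:
        for stage in stages:
            messages = stage(messages)  # output of one node feeds the next
        return messages
    return topology


# Example config: two nodes, each a small Python script reading stdin line by line.
DEMO_CONFIG = json.dumps({
    "processors": [
        {"name": "upper",
         "command": [sys.executable, "-c",
                     "import sys\nfor line in sys.stdin: print(line.strip().upper())"]},
        {"name": "reverse",
         "command": [sys.executable, "-c",
                     "import sys\nfor line in sys.stdin: print(line.strip()[::-1])"]},
    ]
})
```

With this config, `build_topology(DEMO_CONFIG)(["abc", "de"])` passes each message through both scripts and returns `["CBA", "ED"]`. Changing the pipeline then means editing the config, not the Python code.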

8 Acknowledgements
Many thanks to the wonderful people who helped me with this work: Maria Grigorieva, Alexei Klimentov, Torre Wenaus, Eugene Ryabinkin, and the ATLAS collaboration.
The work was supported by the Russian Ministry of Science and Education under contract №14.Z

