1
National Research Center “Kurchatov Institute”
The Laboratory of BigData Technologies for mega-science projects
Data management in heterogeneous metadata storage and access infrastructures
Marina Golosova
2
Outline
- ETL subsystem
- Development framework for ETL subsystems
29/09/2017 Marina Golosova, NEC 2017
3
ETL subsystem
ETL = (E)xtract, (T)ransform, (L)oad
[Diagram: Data source → Extract → Transformation → Load → Data sink; below it, one data source feeding two Extract → Transformation pipelines that write to Data sink 1 and Data sink 2]
4
ETL subsystem
ETL = (E)xtract, (T)ransform, (L)oad
[Diagram: Data source → Extract → Transformation → Load → Data sink; below it, one data source and a single Extract step feeding two Transformation branches that write to Data sink 1 and Data sink 2]
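The Extract / Transform / Load split on these slides can be sketched in a few lines of Python. This is a minimal illustration, not the DKB code: records are assumed to be plain dicts, each data sink is an in-memory list, and the field names and transformations are hypothetical. The source is extracted once, and every sink gets its own transformation and load step.

```python
from typing import Callable, Dict, Iterable, List

Record = dict  # hypothetical: one metadata record


def extract(source: Iterable[Record]) -> List[Record]:
    """Extract step: read all records from the data source once."""
    return list(source)


def run_etl(source: Iterable[Record],
            transforms: Dict[str, Callable[[Record], Record]],
            sinks: Dict[str, List[Record]]) -> None:
    """Extract once, then transform and load separately for every sink."""
    records = extract(source)
    for sink_name, transform in transforms.items():
        # Load step: append the transformed records to the named sink.
        sinks[sink_name].extend(transform(r) for r in records)


# Usage: one data source, two sinks with different transformations.
source = [{"id": 1, "title": "Run 1"}, {"id": 2, "title": "Run 2"}]
sinks: Dict[str, List[Record]] = {"sink1": [], "sink2": []}
run_etl(
    source,
    transforms={
        "sink1": lambda r: {**r, "title": r["title"].upper()},
        "sink2": lambda r: {"doc_id": r["id"]},
    },
    sinks=sinks,
)
print(sinks["sink1"])
print(sinks["sink2"])
```

Sharing the extraction avoids querying the same source twice; the price is that both branches now depend on one Extract step.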
5
Subject area (non)specific components
ETL supervisor:
- failure handling: restart process, reprocess data
- ensure data delivery
- run every X min
[Diagram: Data source → Extractor → Parallelization coordinator → two branches, each with several parallel Transformation instances and a Loader, writing to Data sink 1 and Data sink 2]
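The supervisor duties listed above (restart a failed process, reprocess its data, run on a schedule) can be illustrated with a small Python sketch. The function names `supervise` and `run_periodically` and the retry/backoff parameters are hypothetical, chosen only for this example; they do not describe the actual DKB supervisor.

```python
import time
from typing import Callable, List


def supervise(stage: Callable[[List[dict]], None], batch: List[dict],
              max_restarts: int = 3, backoff_s: float = 0.0) -> int:
    """Run one ETL stage on a batch. On failure, restart the stage and
    reprocess the whole batch, up to `max_restarts` times.
    Returns the number of attempts that were needed."""
    attempt = 0
    while True:
        attempt += 1
        try:
            stage(batch)
            return attempt
        except Exception as exc:
            if attempt > max_restarts:
                raise  # give up: let the caller (or an operator) handle it
            print(f"attempt {attempt} failed: {exc}; restarting")
            time.sleep(backoff_s)


def run_periodically(job: Callable[[], None], interval_s: float, runs: int) -> None:
    """Start `job` every `interval_s` seconds ("run every X min" would be
    interval_s = 60 * X), `runs` times."""
    for i in range(runs):
        started = time.monotonic()
        job()
        if i < runs - 1:
            time.sleep(max(0.0, interval_s - (time.monotonic() - started)))


# Usage: a loader whose data sink is unavailable on the first call.
loaded: List[dict] = []
calls = {"n": 0}


def flaky_loader(batch: List[dict]) -> None:
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("data sink unavailable")
    loaded.extend(batch)


attempts = supervise(flaky_loader, [{"id": 1}, {"id": 2}])
print(attempts, loaded)  # 2 [{'id': 1}, {'id': 2}]

ticks: List[float] = []
run_periodically(lambda: ticks.append(time.monotonic()), interval_s=0.01, runs=3)
```

Because the whole batch is reprocessed after a failure, a loader that fails halfway through may write some records twice. Loads therefore need to be idempotent (for example, upserts keyed by record id), or the framework must track what was already delivered.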
6
ETL subsystem development framework
Tasks:
- process supervision (run, stop, restart, …)
- data delivery between processes (exactly once, …)
- parallelization management
Project: a framework based on Apache Kafka:
- data delivery via Kafka topics and the producer/consumer API
- process management with the Kafka Streams library
- parallelization management via topic partitioning and Kafka Streams application configuration
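To show why topic partitioning gives both parallelism and reliable delivery, here is a toy in-memory model in Python. It is NOT the Kafka client API: the classes `Topic` and `Consumer` and the function `assign` are hypothetical, and crc32 stands in for the murmur2 hash used by Kafka's default partitioner. Records with the same key go to the same partition, partitions are spread over parallel workers, and committed offsets let a restarted worker resume where it stopped. The commit-after-processing order gives at-least-once delivery; exactly-once requires additional machinery (Kafka provides transactions for this) that the sketch omits.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, str]  # (key, value)


class Topic:
    """Toy in-memory model of a partitioned topic (not the Kafka client API)."""

    def __init__(self, partitions: int) -> None:
        self.partitions: List[List[Record]] = [[] for _ in range(partitions)]

    def produce(self, key: str, value: str) -> int:
        # Records with the same key always land in the same partition, which
        # keeps per-key ordering. crc32 stands in for Kafka's murmur2 hash.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p


def assign(num_partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Spread partitions round-robin over the consumers of one group."""
    result: Dict[str, List[int]] = {c: [] for c in consumers}
    for p in range(num_partitions):
        result[consumers[p % len(consumers)]].append(p)
    return result


class Consumer:
    def __init__(self, topic: Topic, partitions: List[int],
                 committed: Dict[int, int]) -> None:
        self.topic = topic
        self.partitions = partitions
        self.committed = committed  # partition -> next offset to read

    def poll_and_process(self, handle: Callable[[Record], None]) -> None:
        for p in self.partitions:
            log = self.topic.partitions[p]
            for offset in range(self.committed.get(p, 0), len(log)):
                handle(log[offset])
                # Commit only after processing: a crash before this line makes
                # the record be read again after restart (at-least-once).
                self.committed[p] = offset + 1


# Usage: 8 records, 4 partitions, 2 parallel workers in one consumer group.
topic = Topic(partitions=4)
for i in range(8):
    topic.produce(key=f"dataset-{i}", value=f"meta-{i}")

committed: Dict[int, int] = {}  # offset store that outlives consumer restarts
seen: Dict[str, List[str]] = defaultdict(list)
for worker, parts in assign(4, ["worker-a", "worker-b"]).items():
    Consumer(topic, parts, committed).poll_and_process(
        lambda rec, w=worker: seen[w].append(rec[1]))
```

The degree of parallelism is bounded by the number of partitions: adding a fifth worker to this 4-partition topic would leave it idle.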
7
DKB ETL subsystem Kafka extensions:
- topology constructor (config files instead of Java code)
- external process adapters for Processing/Source/Sink (allow running any executable as a topology node)
- primitive data transfer protocol (to be improved)
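The combination of a config-driven topology constructor and external process adapters can be sketched in Python as follows. Everything here is a hypothetical stand-in: the JSON config layout, the node names, the linear (chain-only) topology, and the newline-delimited text protocol are assumptions for illustration, not the DKB config format or its data transfer protocol. Each node is an arbitrary executable; its stdout feeds the next node's stdin, one record per line.

```python
import json
import subprocess
import sys
from typing import List

# Hypothetical topology config: a linear chain of external executables.
# Each node's stdout feeds the next node's stdin, one record per line.
CONFIG = json.dumps({
    "nodes": [
        {"name": "source", "type": "source",
         "cmd": [sys.executable, "-c", "for i in range(3): print(i)"]},
        {"name": "square", "type": "processor",
         "cmd": [sys.executable, "-c",
                 "import sys\nfor line in sys.stdin: print(int(line) ** 2)"]},
        {"name": "sink", "type": "sink",
         "cmd": [sys.executable, "-c",
                 "import sys\nprint(sum(int(line) for line in sys.stdin))"]},
    ]
})


def run_node(cmd: List[str], records: List[str]) -> List[str]:
    """External process adapter: write input records to the executable's
    stdin (newline-delimited) and read its output records from stdout."""
    proc = subprocess.run(cmd, input="".join(r + "\n" for r in records),
                          capture_output=True, text=True, check=True)
    return proc.stdout.splitlines()


def run_topology(config_text: str) -> List[str]:
    """Construct the topology from the config text and push data through it."""
    nodes = json.loads(config_text)["nodes"]
    if nodes[0]["type"] != "source" or nodes[-1]["type"] != "sink":
        raise ValueError("a topology must start with a source and end with a sink")
    records: List[str] = []
    for node in nodes:
        records = run_node(node["cmd"], records)
    return records


print(run_topology(CONFIG))  # ['5']
```

This sketch runs each node once over the whole batch; a streaming adapter would instead keep the child process alive and exchange records continuously, and in Kafka Streams the nodes would be connected through topics rather than direct pipes.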
8
Acknowledgements
Many thanks to the wonderful people who helped me with this work:
- Maria Grigorieva
- Alexei Klimentov
- Torre Wenaus
- Eugene Ryabinkin
- ATLAS collaboration
The work was supported by the Russian Ministry of Science and Education under contract №14.Z