Published by Deirdre Tamsin Houston. Modified over 8 years ago.
1
Efficient Data Management Tools for the Heterogeneous Big Data Warehouse
Authors: Aleksandr Alekseev (Programmer), Victoria Osipova (Associate Professor), Alexei Klimentov (Project Leader), Maksim Ivanov (Department Head), Nina Grigorjeva (Student)
XXV Symposium on Nuclear Electronics and Computing, NEC'2015
2
Problem: SQL vs NoSQL
Criteria for comparing the two technologies (SQL, NoSQL):
Data structure
 1. Formalized structure
 2. Scaling
 3. Data consistency
Data processing
 4. Atomicity
 5. Isolation
 6. Reliability
 7. Data managing
 8. Processing of Big Data
 9. Map/Reduce
10. Replication
Victoria Osipova / NEC'2015
3
Challenge
Heterogeneous Big Data Warehouse consisting of:
1) SQL Database
2) NoSQL System
3) Data Management System
There are no good or bad tools; there are efficient tools for a specific task.
4
1. SQL Database
– Relational DBMS: Oracle 11g on Real Application Clusters (RAC) with 3 nodes
– 23 normalized relational tables for the domain of seismic geological exploration
5
2. NoSQL System
3 classes (depending on data model):
– columnar
– key-value
– document-oriented
3 representatives of the classes:
– Apache Cassandra (DataStax)
– Apache Hadoop (Cloudera), in particular Hive and Impala
– MongoDB
Hardware:
– Server: HP ProLiant DL360 G6
– Processor: 2 x Intel Xeon X5550, 2.67 GHz
– Memory: 12 GB
– HDD: 500 GB, RAID 1
– OS: Ubuntu Server 14.04.3 LTS (Linux)
6
Experiment Results for MongoDB
Average: 2.62 sec, maximum: 3 sec, minimum: 2.44 sec.
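The slides report an average, maximum, and minimum query time per system but do not show how the timings were taken. A minimal sketch of one way to collect such figures follows; `time_query`, `fake_query`, and the repeat count are hypothetical names and values, and the stand-in query only sleeps instead of calling a real database driver.

```python
import statistics
import time


def time_query(run_query, repeats=10):
    """Call `run_query` `repeats` times and summarize wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return {
        "average": statistics.mean(durations),
        "maximum": max(durations),
        "minimum": min(durations),
    }


def fake_query():
    # Hypothetical stand-in for a real driver call (assumption, not from the slides).
    time.sleep(0.01)


if __name__ == "__main__":
    stats = time_query(fake_query, repeats=5)
    print(
        f"Average: {stats['average']:.2f} sec, "
        f"maximum: {stats['maximum']:.2f} sec, "
        f"minimum: {stats['minimum']:.2f} sec"
    )
```

Passing the query as a callable keeps the harness independent of any particular driver, so the same function could wrap a MongoDB, Hive, Impala, or Cassandra call.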
7
Experiment Results for Hadoop + Hive
Average: 13.09 sec, maximum: 13.44 sec, minimum: 12.76 sec.
8
Experiment Results for Hadoop + Impala
Average: 1.41 sec, maximum: 1.76 sec, minimum: 1.26 sec.
9
Experiment Results for Apache Cassandra Using Original Drivers
Average: 0.17 sec, maximum: 0.48 sec, minimum: 0.07 sec.
10
Experiment Results for Apache Cassandra Using DataStax Drivers
Average: 0.15 sec, maximum: 0.26 sec, minimum: 0.04 sec.
11
Aggregate Experiment Results
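The aggregate chart is not part of this transcript. As a rough substitute, the per-system figures reported on the preceding slides can be collected into one table and ordered by average query time. The sketch below uses only those reported numbers; the function names are hypothetical.

```python
# (average, maximum, minimum) query time in seconds, as reported on the slides.
RESULTS = {
    "MongoDB": (2.62, 3.00, 2.44),
    "Hadoop + Hive": (13.09, 13.44, 12.76),
    "Hadoop + Impala": (1.41, 1.76, 1.26),
    "Cassandra (original drivers)": (0.17, 0.48, 0.07),
    "Cassandra (DataStax drivers)": (0.15, 0.26, 0.04),
}


def ranked_by_average(results):
    """Return (system, average) pairs ordered from fastest to slowest."""
    pairs = [(name, timings[0]) for name, timings in results.items()]
    return sorted(pairs, key=lambda pair: pair[1])


def format_table(results):
    """Render the results as fixed-width plain text, one system per line."""
    lines = [f"{'System':<30}{'avg':>8}{'max':>8}{'min':>8}"]
    for name, (avg, high, low) in results.items():
        lines.append(f"{name:<30}{avg:>8.2f}{high:>8.2f}{low:>8.2f}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_table(RESULTS))
    for name, avg in ranked_by_average(RESULTS):
        print(f"{avg:6.2f} sec  {name}")
```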
12
NoSQL Systems Ranking
R_i: rank of the i-th NoSQL system; V_ij: rank of the j-th requirement for the i-th system; L_ij: weight of the j-th requirement for the i-th system.

Criterion                                      Weight  Hadoop  MongoDB  Cassandra
Query execution time for fetching                  50      30       35         50
NoSQL-system monitoring                            10      10        6          8
Ease of writing queries                            10       9        5         10
Additional tools for processing data               20      20        8         10
Ease of system configuration and deployment         5       2        5          3
Completeness of documentation and manuals           5       5        4          5
Rank R                                            100      76       63         86
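The ranking formula itself is not reproduced in this transcript. The totals in the table are consistent with R_i being the plain sum of the per-criterion scores, each bounded by the weight of its criterion; that reading is an assumption, and the sketch below only checks it against the table. The third weight and one Cassandra score are not legible in the transcript and are inferred here from the stated totals (100 and 86).

```python
# (criterion, weight) pairs in table order; the third weight is inferred
# from the stated total of 100 (assumption).
CRITERIA = [
    ("Query execution time for fetching", 50),
    ("NoSQL-system monitoring", 10),
    ("Ease of writing queries", 10),
    ("Additional tools for processing data", 20),
    ("Ease of system configuration and deployment", 5),
    ("Completeness of documentation and manuals", 5),
]

# Per-criterion scores in the same order; one Cassandra score is inferred
# from the stated total of 86 (assumption).
SCORES = {
    "Hadoop": [30, 10, 9, 20, 2, 5],
    "MongoDB": [35, 6, 5, 8, 5, 4],
    "Cassandra": [50, 8, 10, 10, 3, 5],
}


def rank(scores, criteria=CRITERIA):
    """Sum the scores after checking that each one lies between 0 and its weight."""
    if len(scores) != len(criteria):
        raise ValueError("one score per criterion is required")
    for score, (name, weight) in zip(scores, criteria):
        if not 0 <= score <= weight:
            raise ValueError(f"score {score} is out of range for {name!r}")
    return sum(scores)


if __name__ == "__main__":
    ordered = sorted(SCORES, key=lambda system: rank(SCORES[system]), reverse=True)
    for system in ordered:
        print(f"{system:<10} R = {rank(SCORES[system])}")
```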
13
3. Data Management System of the Heterogeneous Warehouse
Functions:
– Data export from Oracle to NoSQL
– Visualization of data exported from Oracle to NoSQL
– NoSQL data updating
– Query performance estimation for NoSQL
– Reporting and data export from NoSQL
– Remote access to the system from any web browser
14
3. Data Management System of the Heterogeneous Warehouse
Modules:
Data conversion from SQL to NoSQL
– Dataset generation for Cassandra, Hadoop, MongoDB
– Data export to Cassandra, Hadoop, MongoDB
– NoSQL data updating
Query performance estimation
– Query performance estimation for NoSQL
– Reporting of query performance estimation for NoSQL
NoSQL data representation
– Query results visualization
– Query results export in PDF, DOC, and XML formats
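The slides do not show how the conversion module is implemented. As an illustration of the dataset-generation step only, the sketch below reads rows from a relational table and turns each row into a document. Everything in it is an assumption: the standard-library `sqlite3` module stands in for Oracle, and the `well` table, its columns, and the function names are hypothetical.

```python
import json
import sqlite3


def build_demo_database():
    """Create an in-memory database with a small hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE well (id INTEGER PRIMARY KEY, name TEXT, depth_m REAL)")
    conn.executemany(
        "INSERT INTO well (id, name, depth_m) VALUES (?, ?, ?)",
        [(1, "W-1", 2150.5), (2, "W-2", 3010.0)],
    )
    return conn


def rows_to_documents(connection, table):
    """Return one dict per row of `table`, keyed by column name.

    `table` is interpolated into the SQL text, so it must come from
    trusted code, never from user input.
    """
    connection.row_factory = sqlite3.Row
    cursor = connection.execute(f'SELECT * FROM "{table}"')
    return [dict(row) for row in cursor]


def to_json_lines(documents):
    """Serialize documents as line-delimited JSON, one document per line."""
    return "\n".join(json.dumps(doc, sort_keys=True) for doc in documents)


if __name__ == "__main__":
    docs = rows_to_documents(build_demo_database(), "well")
    print(to_json_lines(docs))
```

Line-delimited JSON is a common interchange form for bulk loading into document stores; a columnar or key-value target would need its own mapping from rows to its data model.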
15
Architecture of Heterogeneous Big Data Warehouse
16
Summary
– Comparative analysis of the pros and cons of SQL and NoSQL technologies
– A series of experiments on data processing with 3 NoSQL systems: Cassandra, Hadoop, MongoDB
– Data Management System of the Heterogeneous Big Data Warehouse with 11 modules
17
Thank you for your attention!