Efficient Data Management Tools for the Heterogeneous Big Data Warehouse. Authors: Aleksandr Alekseev (Programmer), Victoria Osipova (Associate professor), et al.


1 Efficient Data Management Tools for the Heterogeneous Big Data Warehouse
Authors: Aleksandr Alekseev (Programmer), Victoria Osipova (Associate professor), Alexei Klimentov (Project leader), Maksim Ivanov (Department head), Nina Grigorjeva (Student)
XXV Symposium on Nuclear Electronics and Computing, NEC'2015

2 Problem: SQL vs NoSQL
Criteria on which SQL and NoSQL technologies are compared (the per-technology marks on the slide are not recoverable):
Data structure: 1) Formalized structure, 2) Scaling, 3) Data consistency
Data processing: 4) Atomicity, 5) Isolation, 6) Reliability, 7) Data management, 8) Big Data processing, 9) Map/Reduce, 10) Replication
Victoria Osipova / NEC'2015
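Atomicity (criterion 4) means that the statements of a transaction either all take effect or none do. A minimal sketch, using Python's built-in sqlite3 module as a stand-in relational engine; the table `well` and its rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; NOT NULL lets us force a failure inside the transaction.
conn.execute("CREATE TABLE well (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

try:
    # The connection context manager commits on success and rolls back on an exception.
    with conn:
        conn.execute("INSERT INTO well (id, name) VALUES (1, 'W-1')")
        conn.execute("INSERT INTO well (id, name) VALUES (2, NULL)")  # violates NOT NULL
except sqlite3.IntegrityError:
    pass

# The valid first insert was rolled back together with the failing second one.
count = conn.execute("SELECT COUNT(*) FROM well").fetchone()[0]
print(count)
```
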

3 Challenge
A heterogeneous Big Data warehouse consisting of: 1) an SQL database, 2) a NoSQL system, 3) a data management system.
There are no good or bad tools, only tools that are efficient for a specific task.

4 1. SQL Database
Relational DBMS: Oracle 11g on Real Application Clusters with 3 nodes
23 normalized relational tables for the seismic geological exploration domain

5 2. NoSQL Systems
3 classes (depending on data model):
– columnar
– key-value
– document-oriented
3 representatives of these classes:
– Apache Cassandra (DataStax)
– Apache Hadoop (Cloudera), in particular Hive and Impala
– MongoDB
Hardware:
– Server: HP ProLiant DL360 G6
– Processor: 2 × Intel Xeon X5550, 2.67 GHz
– Memory: 12 GB
– HDD: 500 GB, RAID 1
– OS: Linux Ubuntu Server Edition 14.04.3 LTS
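To make the three data-model classes concrete, here is one hypothetical record from a seismic-exploration table shaped for each model. The field names, keys and values are invented for illustration, plain Python dicts stand in for the stores, and this is not claimed to be the mapping the authors used:

```python
import json

# Hypothetical relational row (column names are illustrative only).
row = {"survey_id": 17, "area": "Block A", "line_no": 3, "depth_m": 2450.0}

# Key-value model: a composite string key mapped to an opaque serialized value.
kv_key = f"survey:{row['survey_id']}:line:{row['line_no']}"
kv_value = json.dumps({"area": row["area"], "depth_m": row["depth_m"]})

# Document model: a self-describing record with nested sub-documents.
document = {
    "_id": row["survey_id"],
    "area": row["area"],
    "lines": [{"line_no": row["line_no"], "depth_m": row["depth_m"]}],
}

# Columnar (wide-column) model: partition key -> clustering key -> named columns.
wide_column = {
    row["survey_id"]: {
        row["line_no"]: {"area": row["area"], "depth_m": row["depth_m"]},
    },
}

print(kv_key, kv_value)
print(document)
print(wide_column)
```
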

6 Experiment Results for MongoDB
Average: 2.62 s, maximum: 3 s, minimum: 2.44 s.

7 Experiment Results for Hadoop + Hive
Average: 13.09 s, maximum: 13.44 s, minimum: 12.76 s.

8 Experiment Results for Hadoop + Impala
Average: 1.41 s, maximum: 1.76 s, minimum: 1.26 s.

9 Experiment Results for Apache Cassandra Using Original Drivers
Average: 0.17 s, maximum: 0.48 s, minimum: 0.07 s.

10 Experiment Results for Apache Cassandra Using DataStax Drivers
Average: 0.15 s, maximum: 0.26 s, minimum: 0.04 s.
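Each result slide reports an average, maximum and minimum query time. A minimal sketch of how such figures can be collected by running the same query repeatedly; `run_query` is a hypothetical callable standing in for a fetch against one NoSQL system, and the repeat count is an assumption, since the number of runs is not given:

```python
import statistics
import time
from typing import Callable


def time_query(run_query: Callable[[], object], repeats: int = 10) -> dict:
    """Run a query `repeats` times; return average/maximum/minimum wall-clock seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()  # hypothetical: the fetch query against the system under test
        durations.append(time.perf_counter() - start)
    return {
        "average": statistics.mean(durations),
        "maximum": max(durations),
        "minimum": min(durations),
    }


# Usage with a stand-in CPU workload instead of a real database call.
stats = time_query(lambda: sum(range(100_000)), repeats=5)
print({name: round(seconds, 4) for name, seconds in stats.items()})
```
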

11 Aggregate Experiment Results
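The averages from the five result slides can be put side by side to show how far apart the configurations are. A small sketch that orders them and computes each one's slowdown relative to the fastest; the ratios are only meaningful for the workload used in these experiments:

```python
# Average query times in seconds, copied from the five result slides.
averages = {
    "MongoDB": 2.62,
    "Hadoop + Hive": 13.09,
    "Hadoop + Impala": 1.41,
    "Cassandra (original drivers)": 0.17,
    "Cassandra (DataStax drivers)": 0.15,
}

fastest = min(averages.values())
ranking = sorted(averages, key=averages.get)  # fastest configuration first

for name in ranking:
    avg = averages[name]
    # Slowdown factor relative to the fastest configuration.
    print(f"{name:30} {avg:6.2f} s  x{avg / fastest:.1f}")
```
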

12 NoSQL-Systems Ranking
R_i: rank of the i-th NoSQL system; V_ij: rank of the j-th requirement for the i-th NoSQL system; L_ij: weight of the j-th requirement for the i-th NoSQL system.
Requirements, in table order: query execution time for data fetching; NoSQL-system monitoring; ease of writing queries; additional tools for data processing; ease of system configuration and deployment; completeness of documentation and manuals.
Weights (one value not recoverable): 50, 10, 20, 5, 5; total 100.
Hadoop: 30, 10, 9, 20, 2, 5; R = 76
MongoDB: 35, 6, 5, 8, 5, 4; R = 63
Cassandra (one value not recoverable): 50, 8, 10, 3, 5; R = 86
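The ranking formula itself did not survive extraction. For the two complete rows as read from the table, Hadoop (30, 10, 9, 20, 2, 5) and MongoDB (35, 6, 5, 8, 5, 4), the values add up exactly to the reported R of 76 and 63. That suggests, as an assumption, that each cell already holds a weighted contribution L_ij·V_ij and that R_i is their sum. A sketch under that assumption; Cassandra's row is incomplete, so its reported R = 86 is used as-is:

```python
# Per-requirement contributions as read from the ranking table.
# Assumption: each cell already equals L_ij * V_ij, so R_i is their plain sum.
contributions = {
    "Hadoop": [30, 10, 9, 20, 2, 5],
    "MongoDB": [35, 6, 5, 8, 5, 4],
}


def rank_total(cells):
    """R_i = sum over j of the weighted requirement scores."""
    return sum(cells)


totals = {name: rank_total(cells) for name, cells in contributions.items()}
totals["Cassandra"] = 86  # reported total; one of its cells is not recoverable

# Higher R means a better overall fit for the task.
ranking = sorted(totals, key=totals.get, reverse=True)
print(totals)
print(ranking)
```
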

13 3. Data Management System of the Heterogeneous Warehouse
Functions:
– Data export from Oracle to NoSQL
– Visualization of data transferred from Oracle to NoSQL
– NoSQL data updating
– Query performance estimation for NoSQL
– Reporting and data export from NoSQL
– Remote access to the system from any web browser
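Exporting relational data to a NoSQL store typically means denormalizing related tables into self-contained records. A minimal sketch, with Python's sqlite3 standing in for Oracle and two hypothetical tables (`survey`, `seismic_line`) in place of the real 23-table schema; it prints one JSON document per line, a shape that document stores such as MongoDB can bulk-load:

```python
import json
import sqlite3
from collections import defaultdict

# Stand-in for the Oracle schema: two hypothetical tables linked by a foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE survey (survey_id INTEGER PRIMARY KEY, area TEXT);
    CREATE TABLE seismic_line (line_id INTEGER PRIMARY KEY,
                               survey_id INTEGER REFERENCES survey(survey_id),
                               length_km REAL);
    INSERT INTO survey VALUES (1, 'Block A'), (2, 'Block B');
    INSERT INTO seismic_line VALUES (10, 1, 12.5), (11, 1, 8.0), (12, 2, 20.0);
""")


def export_documents(conn):
    """Denormalize survey + seismic_line into one nested document per survey."""
    docs = {}
    for survey_id, area in conn.execute(
            "SELECT survey_id, area FROM survey ORDER BY survey_id"):
        docs[survey_id] = {"_id": survey_id, "area": area}
    # Group child rows under their parent key.
    lines = defaultdict(list)
    for line_id, survey_id, length_km in conn.execute(
            "SELECT line_id, survey_id, length_km FROM seismic_line ORDER BY line_id"):
        lines[survey_id].append({"line_id": line_id, "length_km": length_km})
    for survey_id, doc in docs.items():
        doc["lines"] = lines[survey_id]
    return list(docs.values())


for doc in export_documents(conn):
    print(json.dumps(doc))  # one JSON document per line
```
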

14 3. Data Management System of the Heterogeneous Warehouse
Modules:
Data conversion from SQL to NoSQL
– Dataset generation for Cassandra, Hadoop, MongoDB
– Data export to Cassandra, Hadoop, MongoDB
– NoSQL data updating
Query performance estimation
– Query performance estimation for NoSQL
– Reporting of query performance estimates for NoSQL
NoSQL data representation
– Query results visualization
– Query results export in PDF, DOC and XML formats
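For the XML export format listed among the modules, a sketch using the standard library's xml.etree.ElementTree. The element names (`results`, `row`, `system`, `avg_s`) are hypothetical, and the sample rows reuse two averages from the result slides; PDF and DOC export would need third-party libraries and are not shown:

```python
import xml.etree.ElementTree as ET


def results_to_xml(rows, root_tag="results", row_tag="row"):
    """Serialize query result rows (dicts) to an XML string, one child element per column."""
    root = ET.Element(root_tag)
    for row in rows:
        item = ET.SubElement(root, row_tag)
        for column, value in row.items():
            ET.SubElement(item, column).text = str(value)
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


rows = [{"system": "Cassandra", "avg_s": 0.15}, {"system": "MongoDB", "avg_s": 2.62}]
xml_text = results_to_xml(rows)
print(xml_text)
```
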

15 Architecture of the Heterogeneous Big Data Warehouse

16 Summary
– Comparative analysis of the pros and cons of SQL and NoSQL technologies
– A series of data-processing experiments with 3 NoSQL systems: Cassandra, Hadoop, MongoDB
– A data management system for the heterogeneous Big Data warehouse with 11 modules

17 Thank you for your attention!


Download ppt "Efficient Data Management Tools for the Heterogeneous Big Data Warehouse Autors: Aleksandr Alekseev (Programmer), Victoria Osipova (Associate professor),"

Similar presentations


Ads by Google