HadoopDB project: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads. Anssi Salohalla

1 HadoopDB project: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads. Anssi Salohalla

2 Background
- The amount of data that must be stored for analysis is exploding
- At the same time, analytical performance cannot be compromised despite the growing data volume
- Efficient high-end proprietary machines are expensive

3 Parallel databases
- Shared-nothing MPP architecture: a collection of independent machines, each with its own local disk and main memory, connected by a high-speed network
- The machines are cheaper, lower-end commodity hardware
- Scale well up to a point (tens of nodes)
- Good performance
- Poor fault tolerance
- Problems in heterogeneous environments (all machines are assumed to be roughly equal in performance)
- Good support for a flexible query interface
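
In a shared-nothing system, a table is typically split across nodes by hashing a partitioning key, so that each node works only on its own local rows. The Python sketch below simulates this in a single process. The helper names, the use of `zlib.crc32` as the hash function, and the sample rows are illustrative assumptions, not the implementation of any particular parallel database.

```python
import zlib
from collections import defaultdict


def node_for(key: str, num_nodes: int) -> int:
    # Stable hash (CRC-32) so a given key always maps to the same node.
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def hash_partition(rows, key, num_nodes):
    # Each row is stored on exactly one node; nodes share no disk or memory.
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for(row[key], num_nodes)].append(row)
    return nodes


def local_sum(partition, key, value):
    # Work done by a single node on its local partition only.
    totals = defaultdict(float)
    for row in partition:
        totals[row[key]] += row[value]
    return dict(totals)


def parallel_group_sum(rows, key, value, num_nodes=4):
    result = {}
    for partition in hash_partition(rows, key, num_nodes):
        # Partitioning on the grouping key means no group spans two nodes,
        # so the per-node results can simply be combined.
        result.update(local_sum(partition, key, value))
    return result


# Hypothetical sample rows.
rows = [
    {"source_ip": "10.0.0.1", "revenue": 1.5},
    {"source_ip": "10.0.0.2", "revenue": 2.0},
    {"source_ip": "10.0.0.1", "revenue": 0.5},
]
print(parallel_group_sum(rows, "source_ip", "revenue"))
```

Because every node sees only its own partition, adding nodes spreads the work, but a slow or failed node holds up the whole query, which matches the weaknesses listed on the slide.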

4 MapReduce systems
- Cheap
- Scale well to thousands of nodes
- Good support for heterogeneous environments
- Good fault tolerance
- Lower performance than parallel databases
- Generally no SQL support (with exceptions such as Hive)
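
Without SQL, the programmer writes the parsing and aggregation logic as map and reduce functions. The single-process Python sketch below shows the map, shuffle and reduce phases for a "sum revenue per source IP" job. The `map_reduce` helper, the comma-separated input format and the sample lines are hypothetical; this is not Hadoop's Java API.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    groups = defaultdict(list)
    for record in records:
        # Map phase: each input record yields zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle: collect all values emitted for the same key.
            groups[key].append(value)
    # Reduce phase: one reducer call per distinct key.
    return {key: reducer(key, values) for key, values in groups.items()}


def parse_visit(line):
    # Hypothetical input format: "sourceIP,adRevenue"
    ip, revenue = line.split(",")
    yield ip, float(revenue)


def sum_reducer(key, values):
    return sum(values)


lines = ["10.0.0.1,1.5", "10.0.0.2,2.0", "10.0.0.1,0.5"]
print(map_reduce(lines, parse_visit, sum_reducer))
# prints {'10.0.0.1': 2.0, '10.0.0.2': 2.0}
```

Real Hadoop additionally sorts the keys, distributes map and reduce tasks across machines, and materializes intermediate results, none of which this sketch models.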

5 What is HadoopDB
- A recent study by the database research group at Yale University
- A hybrid architecture combining parallel databases and a MapReduce system
- The idea is to combine the best qualities of both technologies
- Multiple single-node databases are connected using Hadoop as the task coordinator and network communication layer
- Queries are parallelized across the nodes by the MapReduce framework, but as much of the work as possible is pushed down into the database on each node
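
The division of labor described on this slide can be sketched in Python, with in-memory SQLite databases standing in for the single-node databases and plain functions standing in for Hadoop's map and reduce tasks. The table, the query and all function names are hypothetical; this is not HadoopDB's actual interface, only an illustration of running SQL inside each node and merging the partial results afterwards.

```python
import sqlite3
from collections import defaultdict


def make_node(rows):
    # Stand-in for one single-node DBMS holding one partition of the data.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (source_ip TEXT, ad_revenue REAL)")
    conn.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    return conn


# SQL pushed down to every node: each node pre-aggregates its own rows.
NODE_QUERY = "SELECT source_ip, SUM(ad_revenue) FROM visits GROUP BY source_ip"


def map_task(conn):
    # "Map": the database does the work; emit (key, partial_sum) pairs.
    return conn.execute(NODE_QUERY).fetchall()


def reduce_task(partials):
    # "Reduce": merge partial sums for keys that appear on several nodes.
    totals = defaultdict(float)
    for ip, partial in partials:
        totals[ip] += partial
    return dict(totals)


nodes = [
    make_node([("10.0.0.1", 1.5), ("10.0.0.2", 2.0)]),
    make_node([("10.0.0.1", 0.5), ("10.0.0.3", 4.0)]),
]
partials = [pair for conn in nodes for pair in map_task(conn)]
print(reduce_task(partials))
```

Here the data is not partitioned on the grouping key, so the same IP can appear on several nodes and a reduce step is needed to finish the aggregation.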

6 HadoopDB architecture
Reference: Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel Abadi, Avi Silberschatz, Alexander Rasin. HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads

7 Desired properties of HadoopDB
- Performance
- Fault tolerance
- Support for heterogeneous environments
- Flexible query interface

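8 Systems compared in the study's benchmark
- Hadoop
- HadoopDB
- Vertica (a column-store parallel database)
- DBMS-X (an unnamed commercial row-store parallel database)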

9 Benchmark tasks
- Data loading
- Grep task
- Selection task
- Aggregation task
- Join task
- UDF aggregation task
- Fault tolerance and heterogeneous environment
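
For a rough idea of what the grep, selection, aggregation and join tasks ask of each system, here they are as SQL queries on a tiny single-node SQLite database. The tables, columns and rows are hypothetical stand-ins, not the benchmark's actual schema or data set, and the queries only illustrate the shape of each task.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (doc_id TEXT, field TEXT);
    CREATE TABLE rankings  (page_url TEXT, page_rank INTEGER);
    CREATE TABLE visits    (source_ip TEXT, dest_url TEXT, ad_revenue REAL);
""")
conn.executemany("INSERT INTO documents VALUES (?, ?)",
                 [("d1", "aaXYZbb"), ("d2", "nothing here")])
conn.executemany("INSERT INTO rankings VALUES (?, ?)",
                 [("a.html", 5), ("b.html", 42)])
conn.executemany("INSERT INTO visits VALUES (?, ?, ?)",
                 [("10.0.0.1", "a.html", 1.5),
                  ("10.0.0.1", "b.html", 0.5),
                  ("10.0.0.2", "b.html", 2.0)])

# Grep-style task: full scan looking for a substring pattern.
grep = conn.execute(
    "SELECT doc_id, field FROM documents WHERE field LIKE '%XYZ%'").fetchall()

# Selection task: filter rows on a predicate.
selection = conn.execute(
    "SELECT page_url, page_rank FROM rankings WHERE page_rank > ?",
    (10,)).fetchall()

# Aggregation task: group rows and sum a column.
aggregation = conn.execute(
    "SELECT source_ip, SUM(ad_revenue) FROM visits "
    "GROUP BY source_ip ORDER BY source_ip").fetchall()

# Join task: combine two tables on a shared key.
join = conn.execute(
    "SELECT v.source_ip, r.page_rank FROM visits AS v "
    "JOIN rankings AS r ON v.dest_url = r.page_url "
    "ORDER BY v.source_ip, r.page_rank").fetchall()

print(grep, selection, aggregation, join, sep="\n")
```

In a parallel database or HadoopDB, each of these queries would run against the data partitions on many nodes; in plain Hadoop, each would be hand-written as map and reduce functions.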

10 Results 1/2
Reference: Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel Abadi, Avi Silberschatz, Alexander Rasin. HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads

11 Results 2/2
Reference: Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel Abadi, Avi Silberschatz, Alexander Rasin. HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads

12 Conclusions
- HadoopDB's performance is close to that of parallel databases
- HadoopDB can operate in a truly heterogeneous environment and inherits the fault tolerance of Hadoop
- Same licensing costs as Hadoop
- Further performance improvements are expected in the future

13 Further reading
- HadoopDB Project. Web page:
- Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel Abadi, Avi Silberschatz, Alexander Rasin. HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads.
- Hadoop Project. Hadoop Cluster Setup. Web page:

14 Questions?