Published by Estella Johns. Modified over 9 years ago.
1
HadoopDB project
An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads
Anssi Salohalla
2
Background
The amount of data that must be stored for analysis is exploding
At the same time, analysis performance can't be compromised despite the growth in data volume
Efficient high-end proprietary machines are expensive
3
Parallel databases
Shared-nothing MPP architecture (a collection of independent machines, each with a local hard disk and main memory, connected by a high-speed network)
Machines are cheaper, lower-end commodity hardware
Scales well up to a point (tens of nodes)
Good performance
Poor fault tolerance
Problems in heterogeneous environments (machines must be equal in performance)
Good support for a flexible query interface
4
MapReduce systems
Cheap
Scales well to thousands of nodes
Good support for heterogeneous environments
Good fault tolerance
Performance issues compared to parallel databases
Generally no support for SQL (with exceptions, e.g. Hive)
5
What is HadoopDB
A recent study at Yale University, Database Research Dep.
Hybrid architecture of parallel databases and MapReduce systems
The idea is to combine the best qualities of both technologies
Multiple single-node databases are connected using Hadoop as the task coordinator and network communication layer
Queries are distributed across the nodes by the MapReduce framework, but as much work as possible is done inside the database on each node
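The idea of pushing work into the node databases can be illustrated with a small Python sketch. This is not HadoopDB code: in-memory SQLite connections stand in for the single-node databases, plain Python functions stand in for Hadoop's map and reduce phases, and the table `visits` and its columns are hypothetical names chosen for illustration.

```python
import sqlite3
from collections import defaultdict


def make_node(rows):
    """Create one single-node database (in-memory SQLite as a stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (source_ip TEXT, ad_revenue REAL)")
    conn.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    return conn


def map_on_node(conn):
    """Map phase: the aggregation runs as SQL inside the node's database,
    so only small per-node partial sums leave the node."""
    return conn.execute(
        "SELECT source_ip, SUM(ad_revenue) FROM visits GROUP BY source_ip"
    ).fetchall()


def reduce_partials(partials):
    """Reduce phase: merge the per-node partial sums into global totals."""
    totals = defaultdict(float)
    for node_result in partials:
        for source_ip, subtotal in node_result:
            totals[source_ip] += subtotal
    return dict(totals)


def run_query(nodes):
    """Coordinator: run the map step on every node, then reduce the results."""
    return reduce_partials(map_on_node(node) for node in nodes)


# Two hypothetical nodes, each holding one partition of the data.
nodes = [
    make_node([("10.0.0.1", 1.5), ("10.0.0.2", 2.0)]),
    make_node([("10.0.0.1", 0.5), ("10.0.0.3", 4.0)]),
]
result = run_query(nodes)
```

In this sketch the coordinator never sees raw rows, only grouped subtotals, which is the sense in which as much work as possible stays in the database layer.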
6
HadoopDB architecture
Reference: Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel Abadi, Avi Silberschatz, Alexander Rasin. HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads
7
Desired properties of HadoopDB
Performance
Fault tolerance
Support for heterogeneous environments
Flexible query interface
8
Study benchmark systems
Hadoop system
HadoopDB
Vertica
DBMS-X
9
Benchmark tasks
Data loading
Grep task
Selection task
Aggregation task
Join task
UDF aggregation task
Fault tolerance and heterogeneous environment
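To give a feel for what the query-style tasks above compute, here is a minimal Python sketch using SQLite. The schema (`rankings`, `user_visits`), the column names, the sample rows, and the exact SQL are all assumptions made for illustration; they are not the benchmark's actual definitions or data.

```python
import sqlite3

# Hypothetical miniature dataset; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rankings (page_url TEXT, page_rank INTEGER)")
conn.execute(
    "CREATE TABLE user_visits (source_ip TEXT, dest_url TEXT, ad_revenue REAL)"
)
conn.executemany(
    "INSERT INTO rankings VALUES (?, ?)",
    [("a.com", 5), ("b.com", 20), ("c.com", 15)],
)
conn.executemany(
    "INSERT INTO user_visits VALUES (?, ?, ?)",
    [("1.1.1.1", "a.com", 1.0), ("1.1.1.1", "b.com", 2.0), ("2.2.2.2", "c.com", 0.5)],
)


def grep_task(conn, pattern):
    """Grep-style task: find rows whose text field contains a substring."""
    return conn.execute(
        "SELECT dest_url FROM user_visits WHERE dest_url LIKE ? ORDER BY dest_url",
        ("%" + pattern + "%",),
    ).fetchall()


def selection_task(conn, threshold):
    """Selection task: filter rows with a simple predicate."""
    return conn.execute(
        "SELECT page_url, page_rank FROM rankings "
        "WHERE page_rank > ? ORDER BY page_url",
        (threshold,),
    ).fetchall()


def aggregation_task(conn):
    """Aggregation task: total a measure per group."""
    return conn.execute(
        "SELECT source_ip, SUM(ad_revenue) FROM user_visits "
        "GROUP BY source_ip ORDER BY source_ip"
    ).fetchall()


def join_task(conn):
    """Join task: combine both tables, then aggregate per visitor."""
    return conn.execute(
        "SELECT v.source_ip, AVG(r.page_rank), SUM(v.ad_revenue) "
        "FROM user_visits AS v JOIN rankings AS r ON v.dest_url = r.page_url "
        "GROUP BY v.source_ip ORDER BY v.source_ip"
    ).fetchall()
```

The data loading, UDF aggregation, and fault tolerance tasks are not covered here, since they depend on the systems under test rather than on a single query.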
10
Results 1/2
Reference: Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel Abadi, Avi Silberschatz, Alexander Rasin. HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads
11
Results 2/2
Reference: Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel Abadi, Avi Silberschatz, Alexander Rasin. HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads
12
Conclusions
HadoopDB is close in performance to parallel databases
HadoopDB is able to operate in a truly heterogeneous environment and has the fault tolerance of a Hadoop environment
Same licensing costs as Hadoop
Better performance is expected in the future
13
Further reading
HadoopDB Project. Web page:
Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel Abadi, Avi Silberschatz, Alexander Rasin. HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads
Hadoop Project. Hadoop Cluster Setup. Web page:
14
Questions?