Published by Nickolas Richardson. Modified over 9 years ago.
1
Zois Vasileios, Student ID (A.M.): 4183
University of Patras, Department of Computer Engineering & Informatics
Diploma Thesis
2
Distributed Systems
Hadoop Distributed File System (HDFS)
Distributed Database (HBase)
MapReduce Programming Model
Study of B and B+ Trees
Building Trees on HBase
Range Queries on B+ and B Trees
Experiments in the Construction of Trees
Analyzing Results
Conclusions
3
Open Source Implementation of GFS
  Google File System: Distributed File System Used by Google
Distributed File System
  Management of Large Amounts of Data
  Failure Detection & Automatic Recovery
  Scalability
Designed Using Java
  Independent of Operating System
  Computers with Different Hardware
4
HBase
  Open Source Implementation of BigTable
NoSQL System
  Organizes Data in Tables
  Tables Divided into Column Families
  Category: Column Family Stores
Architecture Similar to HDFS
  Works on Top of HDFS
5
Distributed Programming Model
  Data-Intensive Applications
  Distributed Computing in a Cluster of Machines
Functional Programming
  Map Function
  Reduce Function
Operations
  Data Structured as (key, value) Pairs
  Process Input Data in Parallel (Mapper)
  Process Intermediate Results (Reducer)
  Map(k1, v1) → List(k2, v2)
  Reduce(k2, List(v2)) → List(v3)
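The two signatures above can be illustrated with a small single-process sketch. It is not Hadoop code: `run_mapreduce`, `map_fn` and `reduce_fn` are hypothetical names, and the input lines only imitate the (cust_id, order_id) pairs used later in the experiments.

```python
from collections import defaultdict


def map_fn(_k1, line):
    # Map(k1, v1) -> List(k2, v2): parse "cust_id,order_id" into one pair.
    cust_id, order_id = line.split(",")
    return [(int(cust_id), int(order_id))]


def reduce_fn(_k2, values):
    # Reduce(k2, List(v2)) -> List(v3): here, the sorted order ids of one customer.
    return sorted(values)


def run_mapreduce(records, map_fn, reduce_fn):
    groups = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in map_fn(k1, v1):
            groups[k2].append(v2)  # "shuffle": group values by intermediate key
    # Reducers see the intermediate keys in sorted order.
    return {k2: reduce_fn(k2, vs) for k2, vs in sorted(groups.items())}


records = [(0, "7,30"), (1, "3,10"), (2, "7,20")]
result = run_mapreduce(records, map_fn, reduce_fn)
```

In a real cluster the grouping step is the shuffle between machines; here it is a dictionary in one process.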
6
Mapper
  Input Data Processing
  Pairing in the Form (key, value)
Custom Partitioner
  Data Clustering
  Specific Range of Values on Each Reducer
Reducer
  Tree Building (BulkInsert, BulkLoading)
  Some Data Kept in Memory During the Process
Cleanup
  Write Tree to HBase Table
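A range partitioner of the kind described above might look like the following sketch. The split points and the function name `make_range_partitioner` are assumptions for illustration; the thesis does not state how its partitioner chooses the ranges.

```python
import bisect


def make_range_partitioner(split_points):
    """split_points: sorted keys at which the next reducer's range begins.

    With n split points there are n + 1 reducers (hypothetical layout).
    """
    def partition(key):
        # Index of the reducer whose key range contains `key`.
        return bisect.bisect_right(split_points, key)
    return partition


# Four reducers: keys < 100, 100..199, 200..299, and >= 300.
partition = make_range_partitioner([100, 200, 300])
assigned = [partition(k) for k in (5, 100, 250, 300, 999)]
```

Because every reducer receives one contiguous key range, each reducer can build its own tree over that range independently of the others.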
7
More Efficient
  Lower Physical Memory Requirements
  Completion in Fewer Steps: O(n/B)
  Relatively Easy Implementation
Execution Steps
  Sorted Keys from the Map Phase
  Divide into Leaves
  Save Information for the Next Level
  Write Created Nodes when Buffer is Full
  Repeat Procedure Until the Root is Reached
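The execution steps above can be sketched as a bottom-up build over already sorted keys. This is a minimal in-memory illustration under stated assumptions: `order` is taken to mean the maximum number of keys per node, a node is a `(level, keys)` tuple, and `write_batch` is a hypothetical callback standing in for flushing a buffer of rows to the HBase table.

```python
def bulk_load(sorted_keys, order, buffer_size, write_batch):
    """Build the tree level by level and return the root as (level, keys)."""
    current = list(sorted_keys)
    if not current:
        raise ValueError("need at least one key")
    buffer = []

    def emit(node):
        buffer.append(node)
        if len(buffer) >= buffer_size:  # write created nodes when buffer is full
            write_batch(list(buffer))
            buffer.clear()

    level = 0
    while True:
        # Divide the keys of this level into nodes of at most `order` keys.
        nodes = [current[i:i + order] for i in range(0, len(current), order)]
        for keys in nodes:
            emit((level, keys))
        if len(nodes) == 1:  # a single node on this level is the root
            break
        # Save information for the next level: the last key of each node.
        current = [keys[-1] for keys in nodes]
        level += 1
    if buffer:
        write_batch(list(buffer))  # flush what is left
    return (level, nodes[0])


batches = []
root = bulk_load(range(1, 11), order=3, buffer_size=2, write_batch=batches.append)
```

Each level holds roughly 1/B as many nodes as the level below it, which is consistent with the O(n/B) step count quoted on the slide.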
8
Tree Node = Row in Table
Define Node
  Column Family
  Row Key
    Internal Nodes: Last Key of the Respective Node
    Leaves: Special Tag Added in Front of the Last Node Key (Sorting in Lexicographic Order)
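One way to read the row key scheme above is sketched below. The tag character, the zero padding and both function names are assumptions; the slide only says that leaves carry a special tag in front of their last key and that rows sort lexicographically.

```python
KEY_WIDTH = 10   # assumed fixed width so numeric keys sort correctly as strings
LEAF_TAG = "~"   # hypothetical tag; "~" sorts after every digit in ASCII


def internal_row_key(last_key):
    # Internal node: row key is the (zero-padded) last key of the node.
    return f"{last_key:0{KEY_WIDTH}d}"


def leaf_row_key(last_key):
    # Leaf: same key with the tag in front, so all leaves sort together.
    return LEAF_TAG + internal_row_key(last_key)


row_keys = sorted([
    leaf_row_key(42),
    internal_row_key(900),
    leaf_row_key(7),
    internal_row_key(42),
])
```

With this layout the leaves occupy one contiguous, ordered region of the table, so a scan between two leaf row keys visits exactly the leaves in between. The sketch does not address how internal nodes on different levels that share a last key would be told apart.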
9
Check Tree Range
Find Leaf
  Leaf Including Left End of Range
  Leaf Including Right End of Range
HBase Table Scan to Find Keys
  Use Row Key from Each Leaf to Scan
Complexity
  T Trees, E Keys in Tree, B Tree Order
  O(2 * (T + log_B(E)))
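The B+ tree range query above amounts to locating two boundary leaves and scanning everything between them. The sketch below works on an in-memory list of leaves and replaces the tree descent with a binary search over the last key of each leaf; `range_query` is a hypothetical name, and no HBase scan is performed.

```python
import bisect


def range_query(leaves, lo, hi):
    """leaves: list of sorted key lists, in key order. Returns keys in [lo, hi]."""
    last_keys = [leaf[-1] for leaf in leaves]
    start = bisect.bisect_left(last_keys, lo)  # leaf including the left end
    if start == len(leaves):
        return []  # the range lies beyond the largest key
    # Leaf including the right end, clamped to the last leaf.
    end = min(bisect.bisect_left(last_keys, hi), len(leaves) - 1)
    # "Scan" from the first boundary leaf to the second one.
    return [k for leaf in leaves[start:end + 1] for k in leaf if lo <= k <= hi]


leaves = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
hits = range_query(leaves, 3, 7)
```

The two boundary lookups correspond to the factor 2 in the complexity on the slide: each lookup first picks one of the T trees and then descends about log_B(E) levels.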
10
As with B+ Trees
  Find Trees with Required Range
  Pinpoint Individual Trees from Start to End
Execution of Depth First Search on Each Tree
  Depth First Search Retrieves Keys in Internal Nodes
Complexity
  Depth First Search Complexity: O(|V| + |E|) * T
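Because a B-tree stores keys in its internal nodes as well, a leaf scan is not enough and each candidate tree has to be traversed. The following is a minimal in-memory sketch of such a depth first search with range pruning; the `Node` class and `dfs_range` are hypothetical and do not reflect how the nodes are laid out in HBase.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted, non-empty
        self.children = children or []  # empty for leaves, len(keys) + 1 otherwise


def dfs_range(node, lo, hi, out):
    """Append all keys in [lo, hi] under `node` to `out`, in sorted order."""
    for i, key in enumerate(node.keys):
        # Child i holds keys smaller than `key`; skip it if they are all below lo.
        if node.children and key > lo:
            dfs_range(node.children[i], lo, hi, out)
        if key > hi:
            return out  # everything further right is larger than hi
        if key >= lo:
            out.append(key)
    # The rightmost child holds keys larger than the last key of this node.
    if node.children and node.keys[-1] < hi:
        dfs_range(node.children[-1], lo, hi, out)
    return out


root = Node([10, 20], [Node([2, 5]), Node([12, 15]), Node([25, 30])])
found = dfs_range(root, 5, 25, [])
```

Without the pruning checks this is a plain in-order traversal that visits every node and edge once, which is the O(|V| + |E|) cost per tree given on the slide.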
11
Hadoop & HBase
  Hadoop Version 1.0.1
  HBase Version 0.94.1
Operating System
  Debian Base 6.0.5
Machines (4): Okeanos
  4 CPUs (Virtual) per Machine
  RAM 2048 MB per Machine
  HDD 40 GB per Machine
Data
  TPC-H Orders Table (cust_id, order_id)
12
Experiment Observation
  Tree Order
  Execution Time
  Necessary Storage Space
  Physical Memory
  Number of Reducers
13
Comparison of Trees with Order 5 & 101
  Increased Execution Time
    Rebalance Operation
  Physical Memory & HDD Space
    Necessary Information for Tree Structure
Conclusion
  Problem in Scalability
  Large Physical Memory Requirements
  Increased Execution Time
16
Tree Order 5                             B+ Tree     B-Tree
Input Data Size                          230 MB      230 MB
Output Tree Size                         2.2 GB      1.4 GB
Execution Time (sec)                     900         451
Median Execution Time, Map (sec)         56.29       55
Median Execution Time, Shuffle (sec)     28          28.75
Median Execution Time, Reduce (sec)      125.5       88.25
Number of Reducers                       8           8
Physical Memory Allocated                19525 MB    15222 MB

Tree Order 101                           B+ Tree     B-Tree
Input Data Size                          230 MB      230 MB
Output Tree Size                         598.2 MB    256 MB
Execution Time (sec)                     263         246
Median Execution Time, Map (sec)         52          49.86
Median Execution Time, Shuffle (sec)     28.63       29.75
Median Execution Time, Reduce (sec)      68.25       66.25
Number of Reducers                       8           8
Physical Memory Allocated                9501 MB     9286 MB
17
BulkLoading vs BulkInsert Comparison
  Shorter Execution Time
  Lower Physical Memory Requirements
  Less Required HDD Space
Testing Buffer Fluctuation
  Buffer 128, 512
  Shorter Execution Time
  Adjustable Physical Memory Requirements
20
Tree Order 101, Buffer 128               B+ Tree     B-Tree
Input Data Size                          230 MB      230 MB
Output Tree Size                         267.1 MB    256 MB
Execution Time (sec)                     132         125
Median Execution Time, Map (sec)         51.14       53.57
Median Execution Time, Reduce (sec)      43.5        37.75
Number of Reducers                       8           8
Buffer Size (Put Objects)                128         128
Physical Memory Allocated                6517 MB     6165 MB

Tree Order 101, Buffer 512               B+ Tree     B-Tree
Input Data Size                          230 MB      230 MB
Output Tree Size                         267.1 MB    256 MB
Execution Time (sec)                     114         108
Median Execution Time, Map (sec)         52          55.14
Median Execution Time, Reduce (sec)      33          30.63
Number of Reducers                       8           8
Buffer Size (Put Objects)                512         512
Physical Memory Allocated                6613 MB     6678 MB
21
Comparing Building Techniques
BulkInsert
  Precise Choice of Tree Order
  Increased Execution Time with Small-Order Trees Due to Constant Rebalancing
  High Physical Memory Requirements
  Not So Scalable
BulkLoading
  Created Tree is Full (Next Insert Could Cause a Tree Rebalancing)
  Shorter Execution Time
  Adjustable Physical Memory Requirements
  More Complicated Implementation
Why Use B & B+ Trees
  In Collaboration with Pre-Warm Techniques
  Less Burden on Master
  Communication Between Slaves