Zois Vasileios, Student ID 4183, University of Patras, Department of Computer Engineering & Informatics, Diploma Thesis


2
• Distributed Systems
• Hadoop Distributed File System (HDFS)
• Distributed Database (HBase)
• MapReduce Programming Model
• Study of B and B+ Trees
• Building Trees on HBase
• Range Queries on B+ and B Trees
• Experiments in the Construction of Trees
• Analyzing Results
• Conclusions

3
• Open Source Implementation of GFS
  ◦ Google File System
  ◦ Distributed File System Used by Google
• Distributed File System
  ◦ Management of Large Amounts of Data
  ◦ Failure Detection & Automatic Recovery
  ◦ Scalability
• Designed Using Java
  ◦ Independent of the Operating System
  ◦ Runs on Computers with Different Hardware

4
• HBase
  ◦ Open Source Implementation of BigTable
• NoSQL Systems
  ◦ Organize Data in Tables
  ◦ Tables Divided into Column Families
  ◦ Category: Column Family Stores
• Architecture Similar to HDFS
• Works on Top of HDFS

5
• Distributed Programming Model
  ◦ Data-Intensive Applications
  ◦ Distributed Computing on a Cluster of Machines
• Functional Programming
  ◦ Map Function
  ◦ Reduce Function
• Operations
  ◦ Data Structured as (key, value) Pairs
  ◦ Parallel Processing of the Input Data (Mapper)
  ◦ Processing of the Intermediate Results (Reducer)
  ◦ Map(k1, v1) → list(k2, v2)
  ◦ Reduce(k2, list(v2)) → list(v3)
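The Map and Reduce signatures above can be illustrated with a small single-process simulation in Python. This is only a sketch of the data flow (map, group by key, reduce), not Hadoop's API: the function `map_reduce` and the order-counting example are hypothetical, and on a real cluster the map and reduce calls run in parallel on different machines.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Single-process simulation of the MapReduce data flow.

    mapper(k1, v1)         yields (k2, v2) pairs
    reducer(k2, list(v2))  yields v3 values
    """
    # Map phase, plus shuffle: group every intermediate value by its key k2.
    intermediate = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in mapper(k1, v1):
            intermediate[k2].append(v2)
    # Reduce phase: one reducer call per distinct key, in sorted key order.
    return {k2: list(reducer(k2, intermediate[k2])) for k2 in sorted(intermediate)}


# Hypothetical example: count orders per customer from (order_id, cust_id) records.
def orders_mapper(order_id, cust_id):
    yield cust_id, 1


def count_reducer(cust_id, ones):
    yield sum(ones)


orders = [(1, "c2"), (2, "c1"), (3, "c2")]
print(map_reduce(orders, orders_mapper, count_reducer))  # {'c1': [1], 'c2': [2]}
```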

6
• Mapper
  ◦ Input Data Processing
  ◦ Pairing in the Form (key, value)
• Custom Partitioner
  ◦ Data Clustering
  ◦ Specific Range of Values Sent to Each Reducer
• Reducer
  ◦ Tree Building (BulkInsert, BulkLoading)
  ◦ Some Data Kept in Memory During Processing
• Cleanup
  ◦ Writes the Tree to an HBase Table
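One way to send a specific range of values to each reducer is a range partitioner: given sorted split points, each key goes to the reducer whose interval contains it, so every reducer receives a contiguous slice of the key space and can build its own tree. The sketch below assumes the split points are already known (for example, from sampling the input). `make_range_partitioner` is a hypothetical helper that mirrors the shape of Hadoop's `getPartition(key, value, numPartitions)` contract; it is not Hadoop code.

```python
import bisect


def make_range_partitioner(split_points):
    """Return get_partition(key, num_reducers) for sorted split points.

    Reducer 0 receives keys below split_points[0]; reducer i receives keys k
    with split_points[i-1] <= k < split_points[i]; the last reducer gets the rest.
    """
    def get_partition(key, num_reducers):
        # bisect_right counts the split points that are <= key.
        partition = bisect.bisect_right(split_points, key)
        # Guard against having more split points than reducers.
        return min(partition, num_reducers - 1)

    return get_partition


partition = make_range_partitioner([100, 200, 300])
print([partition(k, 4) for k in (50, 100, 250, 999)])  # [0, 1, 2, 3]
```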

7
• More Efficient
  ◦ Lower Physical Memory Requirements
  ◦ Completes in Fewer Steps: O(n/B)
  ◦ Relatively Easy Implementation
• Execution Steps
  ◦ Sorted Keys from the Map Phase
  ◦ Divide into Leaves
  ◦ Save Information for the Next Level
  ◦ Write Created Nodes When the Buffer Is Full
  ◦ Repeat the Procedure Until the Root Is Reached
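The execution steps above amount to building the tree bottom-up from sorted keys. A minimal Python sketch, assuming that `order` means the maximum number of entries per node and that each parent entry is the last key of a child (the same convention used for internal-node row keys). Buffered writing of the nodes is left out, and `bulk_load` is a hypothetical name.

```python
def bulk_load(sorted_keys, order):
    """Build a B+ tree bottom-up from keys that are already sorted.

    `order` is taken here as the maximum number of entries per node.
    Returns the levels from the leaves (index 0) up to the root (last).
    """
    if order < 2:
        raise ValueError("order must be at least 2")
    if not sorted_keys:
        return []
    # Leaves: consecutive nodes of `order` keys (the last one may be smaller).
    level = [sorted_keys[i:i + order] for i in range(0, len(sorted_keys), order)]
    levels = [level]
    # Next level: one entry per child, namely the child's last key; repeat
    # until a single node (the root) remains.
    while len(level) > 1:
        separators = [node[-1] for node in level]
        level = [separators[i:i + order] for i in range(0, len(separators), order)]
        levels.append(level)
    return levels


for depth, nodes in enumerate(bulk_load(list(range(1, 11)), order=3)):
    print(depth, nodes)
# 0 [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
# 1 [[3, 6, 9], [10]]
# 2 [[9, 10]]
```

Each level has roughly 1/B as many nodes as the level below, so the total number of nodes created is about (n/B)(1 + 1/B + 1/B² + …), which is O(n/B). In this sketch the last node on a level may be underfull; a production loader would redistribute entries to respect the minimum fill.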

8
• Tree Node = Row in Table
• Define a Node Column Family
• Row Key
  ◦ Internal Nodes: Last Key of the Respective Node
  ◦ Leaves: a Special Tag in Front of the Node's Last Key (Rows Are Sorted in Lexicographic Order)
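A sketch of this row-key convention, assuming non-negative integer keys. Two details are assumptions, not taken from the slides: keys are zero-padded to a fixed width (`KEY_WIDTH`) so that lexicographic order matches numeric order (otherwise "10" would sort before "9"), and the leaf tag is the hypothetical letter `L`, which sorts after the digits, so all leaf rows form one contiguous block in HBase's row order.

```python
KEY_WIDTH = 10  # hypothetical: keys zero-padded to 10 digits
LEAF_TAG = "L"  # hypothetical tag; "L" sorts after the digits "0"-"9"


def internal_row_key(node_keys):
    # Internal node: the row key is the node's last (largest) key.
    return str(node_keys[-1]).zfill(KEY_WIDTH)


def leaf_row_key(node_keys):
    # Leaf: the same last key with the tag in front, so that all leaf rows
    # sort after the internal-node rows and stay contiguous.
    return LEAF_TAG + str(node_keys[-1]).zfill(KEY_WIDTH)


rows = [leaf_row_key([7, 8, 9]), internal_row_key([3, 6, 9]),
        leaf_row_key([1, 2, 3]), leaf_row_key([10]), internal_row_key([9, 10])]
for row in sorted(rows):  # HBase keeps rows sorted lexicographically by row key
    print(row)
# 0000000009
# 0000000010
# L0000000003
# L0000000009
# L0000000010
```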

9
• Check Tree Range
• Find Leaves
  ◦ Leaf Including the Left End of the Range
  ◦ Leaf Including the Right End of the Range
• HBase Table
  ◦ Scan to Find Keys
  ◦ Use the Row Key of Each Leaf to Scan
• Complexity
  ◦ T: Trees, E: Keys in Tree, B: Tree Order
  ◦ O(2·(T + log_B E))
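The leaf lookup and scan can be sketched with a sorted list standing in for the leaf rows of one tree, with `bisect` playing the role of positioning an HBase scan at a start row. The leaf tag `L`, the padding width and the function names are hypothetical, and for brevity the sketch jumps straight to the leaf rows instead of descending through the internal nodes. The leaf that includes the left end of the range is the first leaf whose last key is ≥ the lower bound; the right end is found the same way.

```python
import bisect

KEY_WIDTH = 10  # hypothetical zero-padding width
LEAF_TAG = "L"  # hypothetical leaf tag


def leaf_row_key(last_key):
    # Row key of the leaf whose last key is last_key.
    return LEAF_TAG + str(last_key).zfill(KEY_WIDTH)


def range_query(leaf_rows, lo, hi):
    """Return every key k with lo <= k <= hi, in ascending order.

    leaf_rows: (row_key, keys) pairs sorted by row_key, standing in for the
    leaf rows of one tree in an HBase table.
    """
    row_keys = [row_key for row_key, _ in leaf_rows]
    # Leaf including the left end: first leaf whose last key is >= lo.
    start = bisect.bisect_left(row_keys, leaf_row_key(lo))
    # Leaf including the right end: first leaf whose last key is >= hi.
    end = bisect.bisect_left(row_keys, leaf_row_key(hi))
    result = []
    # Scan the leaf rows from the left leaf through the right leaf.
    for _, keys in leaf_rows[start:end + 1]:
        result.extend(k for k in keys if lo <= k <= hi)
    return result


leaves = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
leaf_rows = sorted((leaf_row_key(leaf[-1]), leaf) for leaf in leaves)
print(range_query(leaf_rows, 5, 9))  # [5, 6, 7, 8, 9]
```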

10
• As with B+ Trees
  ◦ Find the Trees Covering the Required Range
  ◦ Pinpoint the Individual Trees from Start to End
  ◦ Execute a Depth-First Search on Each Tree
• Depth-First Search
  ◦ Retrieves Keys Stored in Internal Nodes as Well
• Complexity
  ◦ Depth-First Search Complexity (V: Nodes, E: Edges of a Tree)
  ◦ O((|V| + |E|) · T)

11
• Hadoop & HBase
  ◦ Hadoop Version 1.0.1
  ◦ HBase Version 0.94.1
• Operating System
  ◦ Debian Base 6.0.5
• Machines (4): Okeanos
  ◦ 4 Virtual CPUs per Machine
  ◦ 2048 MB RAM per Machine
  ◦ 40 GB HDD per Machine
• Data
  ◦ TPC-H
  ◦ Orders Table (cust_id, order_id)

12
• Experiment Observations
  ◦ Tree Order
  ◦ Execution Time
  ◦ Necessary Storage Space
  ◦ Physical Memory
  ◦ Number of Reducers

13
• Comparison of Trees with Order 5 & 101
  ◦ Increased Execution Time
    – Rebalance Operations
  ◦ Physical Memory & HDD Space
    – Information Needed for the Tree Structure
• Conclusion
  ◦ Scalability Problem
  ◦ Large Physical Memory Requirements
  ◦ Increased Execution Time


16
Tree Order 5                          B+ Tree     B-Tree
Input Data Size                       230 MB      230 MB
Output Tree Size                      2.2 GB      1.4 GB
Execution Time (sec)                  900         451
Median Execution Time, Map (sec)      56.29       55
Median Execution Time, Shuffle (sec)  28          28.75
Median Execution Time, Reduce (sec)   125.5       88.25
Number of Reducers                    8           8
Physical Memory Allocated             19525 MB    15222 MB

Tree Order 101                        B+ Tree     B-Tree
Input Data Size                       230 MB      230 MB
Output Tree Size                      598.2 MB    256 MB
Execution Time (sec)                  263         246
Median Execution Time, Map (sec)      52          49.86
Median Execution Time, Shuffle (sec)  28.63       29.75
Median Execution Time, Reduce (sec)   68.25       66.25
Number of Reducers                    8           8
Physical Memory Allocated             9501 MB     9286 MB

17
• BulkLoading vs BulkInsert Comparison
  ◦ Shorter Execution Time
  ◦ Lower Physical Memory Requirements
  ◦ Less Required HDD Space
• Testing Buffer Size Variation
  ◦ Buffer Sizes 128 and 512
  ◦ Shorter Execution Time
  ◦ Adjustable Physical Memory Requirements
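The buffer sizes 128 and 512 count Put objects: rows are collected and written in one batch when the buffer fills. A hypothetical Python sketch of that batching logic (not the HBase client API) shows the trade-off: a larger buffer means fewer write round trips, but more rows held in memory at once.

```python
class BufferedWriter:
    """Collects rows and hands them to write_batch once buffer_size rows are buffered."""

    def __init__(self, buffer_size, write_batch):
        self.buffer_size = buffer_size
        self.write_batch = write_batch  # callable that receives a list of rows
        self.buffer = []

    def put(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        # Write any buffered rows as one batch, then start a new buffer.
        if self.buffer:
            self.write_batch(self.buffer)
            self.buffer = []


def count_batches(num_rows, buffer_size):
    batches = []
    writer = BufferedWriter(buffer_size, batches.append)
    for i in range(num_rows):
        writer.put(("row-%d" % i, "node data"))
    writer.flush()  # final partial batch, e.g. at the end of the reduce task
    return len(batches)


print(count_batches(10_000, 128), count_batches(10_000, 512))  # 79 20
```

For 10,000 rows, a buffer of 128 gives 79 batches and a buffer of 512 gives 20, while the peak number of buffered rows grows from 128 to 512.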


20
Tree Order 101                        B+ Tree     B-Tree
Input Data Size                       230 MB      230 MB
Output Tree Size                      267.1 MB    256 MB
Execution Time (sec)                  132         125
Median Execution Time, Map (sec)      51.14       53.57
Median Execution Time, Reduce (sec)   43.5        37.75
Number of Reducers                    8           8
Buffer Size (Put Objects)             128         128
Physical Memory Allocated             6517 MB     6165 MB

Tree Order 101                        B+ Tree     B-Tree
Input Data Size                       230 MB      230 MB
Output Tree Size                      267.1 MB    256 MB
Execution Time (sec)                  114         108
Median Execution Time, Map (sec)      52          55.14
Median Execution Time, Reduce (sec)   33          30.63
Number of Reducers                    8           8
Buffer Size (Put Objects)             512         512
Physical Memory Allocated             6613 MB     6678 MB

21
• Comparing Building Techniques
  ◦ BulkInsert
    – Tree Order Must Be Chosen Precisely
    – Increased Execution Time with Small-Order Trees Due to Constant Rebalancing
    – High Physical Memory Requirements
    – Not Very Scalable
  ◦ BulkLoading
    – The Created Tree Is Full (the Next Insert Could Cause Tree Rebalancing)
    – Shorter Execution Time
    – Adjustable Physical Memory Requirements
    – More Complicated Implementation
• Why Use B & B+ Trees
  ◦ In Collaboration with Pre-Warm Techniques
  ◦ Less Burden on the Master
  ◦ Communication Between Slaves
