1
Hadoop Clustering: Performance Testing on the Small Scale
Jonathan Pingilley, Garrison Vaughan, Calvin Sauerbier, Joshua Nester, Adam Albertson
2
Hadoop – A Quick Look What is Hadoop?
3
A distributed computing framework for data-intensive distributed applications
Commonly used in large clusters of commercial-off-the-shelf hardware
Noted for reliability, speed, and failure/fault tolerance
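A concrete picture of the framework in action helps here. Below is a minimal sketch, assuming a Hadoop 1.x-era install; the examples jar name and the /books paths are assumptions, not taken from the slides. It loads a text corpus into the distributed file system and runs the bundled word-count example across the cluster:

    hadoop fs -mkdir /books                          # create an input directory on the DFS
    hadoop fs -put ~/corpus/*.txt /books             # distribute the books across the cluster
    hadoop jar $HADOOP_HOME/hadoop-examples-*.jar \
        wordcount /books /books-out                  # map/reduce word count over all nodes
    hadoop fs -cat /books-out/part-r-00000 | head    # inspect the first few counts (part file name may vary by version)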
4
THE QUESTION How do Hadoop's performance and reliability hold up on a small cluster?
5
Testing Overview
Three main tests
– Speed and data loss
– Fault tolerance
– Node recovery
Hardware
– Repurposed Dell OptiPlex 270 and 280 units, chosen for compatibility reasons
6
Test 1 Data Loss Tolerance
The single simplest test of our testing procedure
A word count run on the cluster, with all books deleted from the DFS 1 minute in and the result monitored (a sketch follows below)
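One way this run could be scripted, assuming the 1.x-era shell commands and the hypothetical /books paths from the earlier sketch:

    hadoop jar $HADOOP_HOME/hadoop-examples-*.jar wordcount /books /books-out &
    sleep 60                # let the job run for 1 minute
    hadoop fs -rmr /books   # delete every book from the DFS mid-job (1.x syntax; -rm -r on newer versions)
    wait                    # then monitor whether and how the job finishes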
7
Test 2 Speed Baselines
Baseline test with only a single node
– The exact command is not usable on just a single node, but a close shell equivalent was used to simulate similar results:
» cat *.txt | tr ' ' '\n' | sort | uniq -ic
Baseline with the cluster
– Nearly identical to the single-node test, but run on the cluster as a whole, with 1-4 nodes
Tests run 3 times and averaged for consistency (a sketch of both baselines follows)
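A sketch of how both baselines might be timed, using the corrected pipeline above and the hypothetical paths from the earlier sketches; the repeat-3-and-average loop is per the slide, while file names are assumptions:

    # Single-node stand-in: count case-insensitive word frequencies locally
    time (cat *.txt | tr ' ' '\n' | sort | uniq -ic > local-counts.txt)

    # Cluster baseline: same word count on 1-4 nodes, repeated 3 times and averaged
    for run in 1 2 3; do
        hadoop fs -rmr /books-out 2>/dev/null    # clear the previous run's output
        time hadoop jar $HADOOP_HOME/hadoop-examples-*.jar wordcount /books /books-out
    done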
8
Test 3 Speed with Node Failure
Variable tests with 1 to 3 nodes removed mid-run, and the completed task analyzed (a sketch follows)
Each variation run 3 times and averaged for time comparisons
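The slides don't say how nodes were disconnected; one plausible way to script it, assuming 1.x daemon scripts, hypothetical worker hostnames, and HADOOP_HOME set on the workers (physically pulling the network cable would work equally well):

    sleep 60    # 1 minute into the running job
    ssh worker1 '$HADOOP_HOME/bin/hadoop-daemon.sh stop tasktracker; \
                 $HADOOP_HOME/bin/hadoop-daemon.sh stop datanode'
    # repeat for worker2 and worker3 in the 2- and 3-node variations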
9
Test 4 Speed with Node Recovery
Variable tests with 1 to 3 nodes removed 1 minute in, reconnected 1 minute later, and the completed task analyzed (a sketch follows)
Each variation run 3 times and averaged for time comparisons
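The recovery variant extends the failure sketch above, restarting the same daemons a minute later (same assumptions about hostnames and scripts):

    sleep 60
    ssh worker1 '$HADOOP_HOME/bin/hadoop-daemon.sh stop tasktracker'    # drop the node 1 minute in
    sleep 60
    ssh worker1 '$HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker'   # bring it back 1 minute later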
10
Test Parameters
All books loaded onto the master node and the DFS
Default node timeout changed from 10 minutes to 30 seconds to allow for timely testing (a configuration sketch follows)
Node removal occurred 1 minute into each run
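The slides don't name the property that was changed, but the 10-minute default matches the TaskTracker expiry interval in Hadoop 1.x; a sketch of the change, treating the property name as an assumption:

    # In conf/mapred-site.xml on the master, inside <configuration>:
    #   <property>
    #     <name>mapred.tasktracker.expiry.interval</name>   <!-- assumed property; default 600000 ms = 10 min -->
    #     <value>30000</value>                               <!-- 30 seconds, for timely failure detection -->
    #   </property>
    # Then restart the MapReduce daemons so the change takes effect:
    $HADOOP_HOME/bin/stop-mapred.sh
    $HADOOP_HOME/bin/start-mapred.sh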
11
RESULTS You are required to maneuver straight down this trench…
12
Data Loss Tolerance Test – Group 1 Presentation
13
Hadoop Speed Test
Group 1 Presentation – independent test: 22m 33s
– 1 node: 29m 50s w/ 22s deviation
– 2 nodes: 17m 32s w/ 18s deviation
– 3 nodes: 15m 6s w/ 16s deviation
– 4 nodes: 3m 54s w/ 6s deviation
14
Speed w/ Node Failure
– 1 node removed: 13m 57s w/ 17s deviation
– 2 nodes: 16m 5s w/ 25s deviation
– 3 nodes: 28m 19s w/ 19s deviation
15
Speed w/ Node Recovery
– 1 node removed and recovered: 5m 9s w/ 6s deviation; recovery: 1m 3s w/ 3s deviation
– 2 nodes: 5m 27s w/ 8s deviation; recovery: 51s w/ 2s deviation
– 3 nodes: 5m 31s w/ 6s deviation; recovery: 54s w/ 5s deviation
16
CONCLUSION Is this the end?
17
Conclusion
Hadoop overhead is large on clusters of fewer than 4 nodes
– Roughly 24% overhead, with a performance degradation of 50%
Upon introduction of a 4th node, average per-node performance increases dramatically, by up to 144%, due to optimizations
These numbers were reflected in the tests performed, and the loss of nodes had minimal impact on total compute time
18
Conclusion, Part Deux
Recovery performance was outstanding – nodes were disconnected for 1 minute and, aside from a couple of seconds of resync and overhead, reintegrated without trouble.
19
The Final Word
Ultimately, Hadoop performed above and beyond expectations, proving to be a valid and relatively inexpensive way to manage large volumes of certain kinds of data when used at 4 or more nodes. Excellent recovery and performance, and relatively easy to use.