1
Programming in Hadoop
Guangda HU (tarlou.gd@gmail.com)
Huayang GUO (dragonghy@gmail.com)
2
Hadoop Overview: About Hadoop
– Apache Hadoop is a Java software framework, released under the free Apache License, that supports data-intensive distributed applications.
– It enables applications to work with thousands of nodes and petabytes of data.
3
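Hadoop Overview: Architecture
– HDFS (Hadoop Distributed File System): a NameNode on the master keeps the file-system metadata, and DataNodes on the slaves store the replicated data blocks
– JobTracker: runs on the master, splits each MapReduce job into tasks and schedules them
– TaskTracker: runs on each slave and executes the map and reduce tasks assigned to it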
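The Java sketch below gives a rough sense of how HDFS stores a file. It computes how many blocks the file occupies and how many block replicas the DataNodes hold in total. It assumes the Hadoop 0.20 defaults of a 64 MB block size (dfs.block.size) and a replication factor of 3 (dfs.replication), and both settings are configurable. The class HdfsBlockSketch and its methods are hypothetical. This is plain arithmetic, not Hadoop code.

```java
/**
 * Back-of-the-envelope HDFS storage arithmetic (illustration only, not Hadoop code).
 * The constants are assumed Hadoop 0.20 defaults.
 */
class HdfsBlockSketch {
    // Assumed defaults for dfs.block.size and dfs.replication; both are configurable.
    static final long BLOCK_SIZE = 64L * 1024 * 1024;
    static final int REPLICATION = 3;

    /** Number of HDFS blocks a file occupies; the last block may be only partly full. */
    static long blockCount(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes; // ceiling division
    }

    /**
     * Total block replicas stored. HDFS keeps at most one replica of a block per
     * DataNode, so a cluster with fewer DataNodes than the replication factor
     * cannot reach full replication.
     */
    static long storedReplicas(long fileBytes, long blockBytes, int replication, int dataNodes) {
        return blockCount(fileBytes, blockBytes) * Math.min(replication, dataNodes);
    }

    public static void main(String[] args) {
        long file = 200L * 1024 * 1024; // hypothetical 200 MB input file
        System.out.println(blockCount(file, BLOCK_SIZE));                     // prints 4
        System.out.println(storedReplicas(file, BLOCK_SIZE, REPLICATION, 3)); // prints 12
    }
}
```

Take a cluster of three slave nodes, and assume (hypothetically) that each runs one DataNode. With the default replication factor of 3, every block would then have a copy on every slave.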
4
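Hadoop Overview: Mechanism
– Map and Reduce: map tasks turn input records into intermediate key/value pairs, the framework sorts and groups these pairs by key, and reduce tasks combine all values for each key into the final output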
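The single-process Java sketch below walks through the map, shuffle/sort and reduce steps for word count. It only mimics the data flow. The class WordCountSketch and its methods are hypothetical, not Hadoop's Mapper/Reducer API. Tokenizing each line with StringTokenizer and emitting (word, 1) follows the WordCount example shipped with Hadoop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** Local simulation of the MapReduce data flow for word count (not Hadoop's API). */
class WordCountSketch {

    /** Map phase: emit (word, 1) for every whitespace-separated token of one input line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(Map.entry(itr.nextToken(), 1));
        }
        return out;
    }

    /** Reduce phase: sum all counts emitted for one key. */
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    /** Runs map on every line, groups intermediate values by key ("shuffle"), then reduces. */
    static Map<String, Integer> run(List<String> lines) {
        // A TreeMap keeps keys sorted, loosely mirroring Hadoop's sort of map output by key.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("the quick fox", "the lazy dog", "the end")));
        // prints {dog=1, end=1, fox=1, lazy=1, quick=1, the=3}
    }
}
```

On a real cluster the map calls run in parallel on the TaskTrackers, and the grouping happens across the network. The key/value contract is the same as in this sketch.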
5
Hadoop Overview: Applications
– Facebook (Hadoop, Hive, Scribe)
– Yahoo! (Hadoop in Yahoo! Search)
– Veritas (San Point Direct, Veritas File System)
– IBM Transarc (Andrew File System)
– UW Computer Science Alumni (Condor Project)
6
Our Work
Setting up the running environment
– Single-node setup
– Multi-node cluster setup
– Network access
Experiments and analysis
– Word count
– Integration
– Largest number
7
Environment Setup
Hardware
– Two multi-core machines running Linux
– Ethernet connection
Software
– Ubuntu 9.04
– Hadoop 0.20.1
– Five virtual machines on VirtualBox
8
Environment Setup: Cluster structure
– Two machines: 166.111.69.85 and 59.66.132.161
– One master node
– Three slave nodes
9
Experiments: Benchmark
– Word count (the WordCount example bundled with Hadoop)
– Super word count (SuperWordCount.java)
– Integration (Integration.java)
– Largest numbers (LargestGen.java)
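The Integration and Largest numbers experiments both fit MapReduce's divide-and-combine pattern. Each map task handles one slice of the input, and the reduce step merges the partial results. The source of Integration.java and LargestGen.java is not shown, so the sketch below is only an assumption about how such jobs could be structured. The midpoint rule, the integrand 4/(1+x^2) (whose integral over [0, 1] is π) and all class and method names are hypothetical choices. The code runs in one process rather than on Hadoop.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical single-process sketches of map-per-split / reduce-by-combining jobs. */
class ExperimentSketches {

    /** "Map" task: midpoint-rule contribution of grid steps [firstStep, lastStep) on [a, b]. */
    static double mapPartialIntegral(DoubleUnaryOperator f, double a, double b,
                                     long totalSteps, long firstStep, long lastStep) {
        double h = (b - a) / totalSteps;
        double sum = 0.0;
        for (long i = firstStep; i < lastStep; i++) {
            sum += f.applyAsDouble(a + (i + 0.5) * h); // evaluate at the midpoint of step i
        }
        return sum * h;
    }

    /** Splits the grid into contiguous ranges (one per map task) and sums the partial results. */
    static double integrate(DoubleUnaryOperator f, double a, double b, long totalSteps, int splits) {
        if (splits <= 0 || totalSteps <= 0) throw new IllegalArgumentException("need positive sizes");
        double total = 0.0; // the "reduce" step: add up the partial integrals
        for (int s = 0; s < splits; s++) {
            long first = totalSteps * s / splits;
            long last = totalSteps * (s + 1) / splits;
            total += mapPartialIntegral(f, a, b, totalSteps, first, last);
        }
        return total;
    }

    /** "Map" task: largest value in data[from, to); an empty split yields Long.MIN_VALUE. */
    static long mapLocalMax(long[] data, int from, int to) {
        long best = Long.MIN_VALUE;
        for (int i = from; i < to; i++) {
            best = Math.max(best, data[i]);
        }
        return best;
    }

    /** "Reduce": the global maximum is the maximum of the per-split maxima. */
    static long largest(long[] data, int splits) {
        if (data.length == 0 || splits <= 0) throw new IllegalArgumentException("empty input");
        long best = Long.MIN_VALUE;
        for (int s = 0; s < splits; s++) {
            int from = (int) ((long) data.length * s / splits);
            int to = (int) ((long) data.length * (s + 1) / splits);
            best = Math.max(best, mapLocalMax(data, from, to));
        }
        return best;
    }

    public static void main(String[] args) {
        // Integral of 4/(1+x^2) over [0, 1] equals pi; split across 4 hypothetical map tasks.
        System.out.println(integrate(x -> 4.0 / (1.0 + x * x), 0.0, 1.0, 1_000_000L, 4));
        System.out.println(largest(new long[] {7, 42, -3, 19, 5}, 2)); // prints 42
    }
}
```

A top-k variant of the largest-number job could keep the k largest values per split (for example in a java.util.PriorityQueue) instead of a single maximum.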
10
Benchmark Analysis
12
More experiments
Files | Computation | Time (s)
24 | 2.4 × 10^9 | 102
120 | 2.4 × 10^9 | 179

Nodes | Files | Slope (s per 10^9)
4 | 24 | ≈ 30
2 | 24 | ≈ 40
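The worked arithmetic below gives a rough reading of these two tables. It rests on two assumptions. First, "Slope" is seconds per 10^9 units of computation, so a lower value is faster. Second, the two rows of the first table differ only in the number of input files.

```latex
\begin{aligned}
S_{2\to 4} &= \frac{40\ \text{s}/10^9}{30\ \text{s}/10^9} \approx 1.33
  \qquad (\text{ideal: } 4/2 = 2),\\
E_{2\to 4} &= \frac{S_{2\to 4}}{4/2} \approx 0.67,\\
\Delta t_{\text{file}} &= \frac{179\ \text{s} - 102\ \text{s}}{120 - 24}
  = \frac{77\ \text{s}}{96} \approx 0.80\ \text{s per additional file}.
\end{aligned}
```

Under these assumptions, going from 2 to 4 nodes gives a speedup well below linear. Each extra input file also costs roughly 0.8 s. That cost may reflect per-task start-up overhead, since every input file yields at least one map task.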
13
Challenges & Gains
– Network and virtual-cluster communication
– Surveying Hadoop techniques
– Cooperation
15
Thanks