李智宇、 林威宏、 施閔耀
+ Outline: Introduction; Architecture of Hadoop; HDFS; MapReduce; Comparison; Why Hadoop; Conclusion
+ What is Hadoop? An open-source software framework that processes and stores big data. Easy to use and implement, economical, and flexible. Runs on many nodes (servers). Written in Java. Free license. Created by Doug Cutting and Mike Cafarella in
+ Advantages of Interpreted Language: cross-platform (e.g., Windows, Ubuntu, Mac OS X); smaller executable program size; easier to modify during both development and execution.
+ Architecture of Hadoop
+ Hadoop in Enterprise. Figure: The Dell representation of the Hadoop ecosystem.
+ Hadoop in Enterprise
+ Who is using Hadoop? More than half of the Fortune 50 use Hadoop by
+ HDFS (Hadoop Distributed File System). Client: the user. Name node: manages and stores the metadata and the file namespace. Data node: stores the files. Each data node periodically sends its status to the name node.
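The periodic status report can be pictured as a heartbeat that the name node uses to tell live data nodes from silent ones. The sketch below is a toy model, not Hadoop's API: the `NameNode` class, its method names, and the 30-second timeout are all assumptions made for illustration.

```python
class NameNode:
    """Toy name node that tracks the last status report of each data node."""

    def __init__(self, timeout_seconds=30.0):
        # Assumed timeout; the slides do not state how long the name node waits.
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # node id -> time of the most recent report

    def heartbeat(self, node_id, now):
        """Record a status report from a data node at time `now` (seconds)."""
        self.last_seen[node_id] = now

    def live_nodes(self, now):
        """Data nodes that reported within the timeout window."""
        return sorted(n for n, t in self.last_seen.items()
                      if now - t <= self.timeout_seconds)

    def dead_nodes(self, now):
        """Data nodes whose last report is older than the timeout."""
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout_seconds)
```

Passing the clock in as `now` keeps the sketch deterministic; a real service would read the system clock instead.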
+ HDFS: Writing data in HDFS. Each file is divided into blocks (64 MB or 128 MB each), and each block has three copies on different data nodes. The client asks the name node for a list of data nodes sorted by distance and sends the file to the nearest one; that data node then forwards the file to the remaining nodes. When the operation is done, the data node sends “done” to the name node.
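The write path described on this slide can be sketched in a few functions. This is an in-memory illustration under stated assumptions, not the HDFS client API: the function names, the `distances` dictionary, and the `storage` dictionary are hypothetical, and the 64 MB block size is simply one of the two sizes the slide names.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, one of the two block sizes on the slide
REPLICATION = 3                # three copies per block, as on the slide


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) for every block of a file of `file_size` bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def choose_pipeline(distances, replication=REPLICATION):
    """Name node side: pick the closest data nodes, nearest first.

    `distances` maps a data node id to its (hypothetical) distance from the client.
    """
    ranked = sorted(distances, key=distances.get)
    return ranked[:replication]


def write_block(block_id, pipeline, storage):
    """The client sends the block to pipeline[0]; each node stores it and
    forwards it to the next node. Returns the final message to the name node."""
    for node in pipeline:
        storage.setdefault(node, set()).add(block_id)
    return "done"
```

For example, a 150 MB file splits into two full 64 MB blocks and one 22 MB block, and each block is then written through its own three-node pipeline.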
+ HDFS: Reading data in HDFS. The client sends the filename to the name node, and the name node returns a list of the file’s blocks, sorted by distance. The client uses the list to get the file from the data nodes.
+ HDFS: failure. Node failure; communication failure; data corruption.
+ HDFS: handle failure. Handling a write failure: the name node skips any data node that does not return an ACK. Handling a read failure: recall that when reading a file, the client gets a list of the data nodes containing the file, so it can fetch the data from another node on the list.
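Read-failure handling amounts to walking the list of replica locations until one data node answers. The following is a minimal sketch of that idea; `DataNodeDown`, `read_block`, and the `fetch` callback are hypothetical names for illustration, not part of Hadoop.

```python
class DataNodeDown(Exception):
    """Raised by the (hypothetical) fetch callback when a data node is unreachable."""


def read_block(block_id, locations, fetch):
    """Try each data node in `locations` (nearest first) until one returns the block.

    `fetch(node, block_id)` is a caller-supplied function that returns the
    block's data or raises DataNodeDown.
    """
    for node in locations:
        try:
            return fetch(node, block_id)
        except DataNodeDown:
            continue  # fall through to the next replica on the list
    raise IOError(f"no live replica for block {block_id}")
```

If every node on the list is down, the sketch raises an error instead of returning partial data.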
+ HDFS: handle failure. Handling a node failure: the name node finds out which data the failed node held, copies that data from the other data nodes, and restores it to another data node. Note that HDFS cannot guarantee that at least one copy of the data is alive.
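The recovery step can be sketched as a pass over the name node's block map. This is an illustrative model only: `rereplicate`, the shape of `block_map`, and the rule for choosing the source and target nodes are assumptions. It also shows the caveat from the slide: a block whose only copy sat on the failed node cannot be restored.

```python
def rereplicate(block_map, failed_node, live_nodes, replication=3):
    """Plan copies that restore the replication level after a node failure.

    `block_map` maps a block id to the set of data nodes holding a copy and is
    updated in place. Returns (copies, lost): `copies` is a list of
    (block_id, source_node, target_node) orders, `lost` lists the blocks
    with no surviving copy.
    """
    copies = []
    lost = []
    for block_id in sorted(block_map):
        holders = block_map[block_id]
        holders.discard(failed_node)
        if not holders:
            lost.append(block_id)  # every replica was on the failed node
            continue
        candidates = [n for n in live_nodes
                      if n != failed_node and n not in holders]
        while len(holders) < replication and candidates:
            source = min(holders)        # any surviving holder would do
            target = candidates.pop(0)
            copies.append((block_id, source, target))
            holders.add(target)
    return copies, lost
```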
+ MapReduce: similar to divide-and-conquer. First, use “Map” to divide the tasks. Second, use “Shuffle” to “transfer the data from the mapper nodes to a reducer’s node and decompress if needed.” Third, use “Reduce” to “execute the user-defined reduce function to produce the final output data.”
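The three phases can be illustrated with the classic word-count task. The sketch below runs in a single process and uses plain Python rather than the Hadoop API; the function names are chosen for illustration, and in a real cluster the shuffle would move data between machines rather than group it in a dictionary.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: group all values by key so each key reaches a single reducer."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key, values):
    """Reduce: user-defined function, here summing the counts for one word."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

Calling `word_count(["Hadoop stores data", "Hadoop processes data"])` counts "hadoop" and "data" twice each and the other two words once each.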
+ MapReduce-Map
+ MapReduce-shuffle
+ MapReduce-Reduce
+ MapReduce
+ Comparison
+ Comparison
+ Why Hadoop? (Technically) Figure: Comparison of Grep task results with Vertica and DBMS-X.
+ Why Hadoop? (Technically) Simple structure vs. optimization: transaction time is not minimized; lower performance with the same number of nodes. No compelling reason to choose Hadoop.
+ Why Hadoop? (Commercially)
+ Why Hadoop? (Commercially) Cheap (buy more servers to beat a DBMS). Flexible (in both design and deployment). Easier to design. Easier to scale up. Can be combined with other systems to achieve better performance.
+ Conclusion. Hadoop is much easier for users to implement and more economical. MapReduce advocates should study the techniques used in parallel DBMSs. Hybrid systems are also popular. As its performance improves, we believe Hadoop will lead the trend in big data computing.
+ Reference: A Comparison of Approaches to Large-Scale Data Analysis, by Sam Madden