
+ 100062108 李智宇、 100062116 林威宏、 100062220 施閔耀

+ Outline
Introduction, Architecture of Hadoop, HDFS, MapReduce, Comparison, Why Hadoop, Conclusion

+ What is Hadoop?
An open-source software framework to process and store big data.
Easy to use and implement, economical, and flexible.
Runs on many nodes (servers).
Written in Java and released under a free license (Apache License 2.0).
Created by Doug Cutting and Mike Cafarella in 2005.

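+ Advantages of an Interpreted Language
Java programs are compiled to bytecode, which the Java Virtual Machine interprets (or compiles just in time) at run time. This gives Hadoop the usual advantages of an interpreted language:
Cross-platform (e.g., Windows, Ubuntu, Mac OS X).
Smaller executable program size.
Easier to modify during both development and execution.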

+ Architecture of Hadoop

+ Hadoop in Enterprise
[Figure: The Dell representation of the Hadoop ecosystem.]

+ Hadoop in Enterprise

+ Who is using Hadoop?
More than half of the Fortune 50 companies were using Hadoop by 2013.

+ HDFS
Hadoop Distributed File System.
Client: the user, who reads and writes files.
Name node: manages and stores the metadata, i.e., the namespace of files.
Data node: stores the file data. Each data node periodically sends its status (a heartbeat) to the name node.
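The name node's bookkeeping can be illustrated with a minimal Python sketch. It assumes a simplified model in which the name node keeps only two things: a namespace (file name to block IDs) and the time of each data node's last status report. A data node counts as dead once no report has arrived within a timeout. The class `NameNode`, its methods, and the 30-second timeout are hypothetical illustrations. They are not Hadoop's actual API or default settings.

```python
from dataclasses import dataclass, field


@dataclass
class NameNode:
    """Toy name node (hypothetical, not Hadoop's API): metadata and liveness only."""

    # Hypothetical timeout in seconds; real HDFS uses its own configurable values.
    heartbeat_timeout: float = 30.0
    # Namespace metadata: file name -> list of block IDs. No file data lives here.
    namespace: dict = field(default_factory=dict)
    # Time of the most recent status report (heartbeat) from each data node.
    last_heartbeat: dict = field(default_factory=dict)

    def receive_heartbeat(self, datanode_id: str, now: float) -> None:
        """Record that a data node reported its status at time `now`."""
        self.last_heartbeat[datanode_id] = now

    def live_datanodes(self, now: float) -> list:
        """Data nodes that reported within the timeout."""
        return sorted(dn for dn, t in self.last_heartbeat.items()
                      if now - t <= self.heartbeat_timeout)

    def dead_datanodes(self, now: float) -> list:
        """Data nodes whose last report is older than the timeout."""
        return sorted(dn for dn, t in self.last_heartbeat.items()
                      if now - t > self.heartbeat_timeout)


nn = NameNode()
nn.namespace["/logs/a.txt"] = ["blk_1", "blk_2"]
nn.receive_heartbeat("dn1", now=0.0)
nn.receive_heartbeat("dn2", now=0.0)
nn.receive_heartbeat("dn1", now=40.0)  # dn2 stays silent after t=0
print(nn.live_datanodes(now=45.0))  # ['dn1']
print(nn.dead_datanodes(now=45.0))  # ['dn2']
```

The example passes `now` explicitly so that it is deterministic. A running service would read a clock instead.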

+ HDFS: Writing data in HDFS
Each file is divided into blocks (64 MB or 128 MB in size), and each block has three copies (the default replication factor) on different data nodes.
The client asks the name node for a list of data nodes sorted by distance and sends the data to the nearest one. That data node then forwards it to the remaining nodes in the list.
When this operation is done, the data nodes send "done" to the name node.
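The write path above can be sketched in Python under a few simplifying assumptions:
- The distances between the client and each data node are made-up numbers.
- The name node simply returns the three nearest data nodes.
- "Storing" a block means adding its ID to an in-memory set.

The helper names (`split_into_blocks`, `choose_pipeline`, `pipeline_write`) are hypothetical and are not HDFS APIs.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, one of the block sizes named on the slide
REPLICATION = 3                 # three copies of each block


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list:
    """Byte length of each block; only the last block may be shorter."""
    n_blocks = math.ceil(file_size / block_size)
    sizes = [block_size] * n_blocks
    if n_blocks and file_size % block_size:
        sizes[-1] = file_size % block_size
    return sizes


def choose_pipeline(distances: dict, replication: int = REPLICATION) -> list:
    """Name node side: data nodes sorted by distance to the client, nearest first."""
    return sorted(distances, key=distances.get)[:replication]


def pipeline_write(block_id: str, pipeline: list, storage: dict, block_map: dict) -> None:
    """The client sends the block to pipeline[0]; each data node stores its copy,
    reports "done" (recorded in block_map), then forwards to the next node."""
    if not pipeline:
        return
    head, rest = pipeline[0], pipeline[1:]
    storage.setdefault(head, set()).add(block_id)
    block_map.setdefault(block_id, []).append(head)  # "done" reported to the name node
    pipeline_write(block_id, rest, storage, block_map)


# Hypothetical distances from the client to four data nodes (smaller = closer).
distances = {"dn1": 4, "dn2": 0, "dn3": 2, "dn4": 6}
blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file -> 3 blocks
storage, block_map = {}, {}
pipeline = choose_pipeline(distances)          # ['dn2', 'dn3', 'dn1']
for i, _size in enumerate(blocks):
    pipeline_write(f"blk_{i}", pipeline, storage, block_map)
print(block_map["blk_0"])                      # ['dn2', 'dn3', 'dn1']
```

Reusing one pipeline for every block keeps the sketch short. In HDFS the name node can choose different data nodes for each block.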

+ HDFS: Reading data in HDFS
The client sends the file name to the name node. The name node replies with a list of the file's blocks and, for each block, the data nodes holding it, sorted by distance.
The client uses this list to read each block from the nearest data node.
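The read path can be shown with a small Python sketch. In-memory dictionaries stand in for the name node's metadata and for the data nodes' disks, and the distance values are invented for illustration. `get_block_locations` and `read_file` are hypothetical names, not HDFS client calls.

```python
def get_block_locations(namespace: dict, block_map: dict, filename: str,
                        distances: dict) -> list:
    """Name node side: for each block of the file, its replicas sorted nearest first."""
    return [(blk, sorted(block_map[blk], key=distances.get))
            for blk in namespace[filename]]


def read_file(locations: list, storage: dict) -> bytes:
    """Client side: read every block, in order, from its nearest replica."""
    parts = []
    for blk, replicas in locations:
        nearest = replicas[0]
        parts.append(storage[nearest][blk])
    return b"".join(parts)


# Hypothetical cluster state.
namespace = {"/data/f.txt": ["blk_0", "blk_1"]}
block_map = {"blk_0": ["dn1", "dn2", "dn3"], "blk_1": ["dn2", "dn3", "dn4"]}
contents = {"blk_0": b"hello ", "blk_1": b"world"}
storage = {}  # data node -> {block ID: bytes}
for blk, holders in block_map.items():
    for dn in holders:
        storage.setdefault(dn, {})[blk] = contents[blk]

distances = {"dn1": 4, "dn2": 2, "dn3": 0, "dn4": 6}  # smaller = closer to the client
locations = get_block_locations(namespace, block_map, "/data/f.txt", distances)
print(locations)  # [('blk_0', ['dn3', 'dn2', 'dn1']), ('blk_1', ['dn3', 'dn2', 'dn4'])]
print(read_file(locations, storage))  # b'hello world'
```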

+ HDFS: failure
HDFS must cope with three kinds of failure: node failure, communication failure, and data corruption.

+ HDFS: handle failure
Handling write failure: the name node skips any data node that does not send back an ACK.
Handling read failure: when reading a file, the client receives a list of the data nodes that contain it. If one data node fails, the client simply reads from the next one in the list.
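Both rules can be sketched in Python with a simulated network. The sketch assumes that a data node either answers or fails outright, and it does not model how the missing copy is restored afterward. The `send` and `fetch` callbacks, the `DataNodeDown` exception, and both function names are hypothetical.

```python
class DataNodeDown(Exception):
    """Raised by the simulated network when a data node does not respond."""


def write_with_acks(block_id, pipeline, send):
    """Send the block to every node in the pipeline; keep only nodes that ACK.
    `send(node, block_id)` returns True when an ACK comes back."""
    return [node for node in pipeline if send(node, block_id)]


def read_with_fallback(replicas, fetch):
    """Try replicas in the distance order given by the name node;
    on failure, move on to the next data node in the list."""
    for node in replicas:
        try:
            return node, fetch(node)
        except DataNodeDown:
            continue
    raise OSError("no live data node holds this block")


down = {"dn2"}  # hypothetical: dn2 has crashed


def send(node, block_id):
    return node not in down


def fetch(node):
    if node in down:
        raise DataNodeDown(node)
    return b"block-data"


print(write_with_acks("blk_0", ["dn1", "dn2", "dn3"], send))  # ['dn1', 'dn3']
print(read_with_fallback(["dn2", "dn1", "dn3"], fetch))        # ('dn1', b'block-data')
```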

+ HDFS: handle failure
Handling node failure: the name node finds out which data the failed node held. It copies that data from the other data nodes that still have it and restores it onto other data nodes.
Note that HDFS cannot guarantee that at least one copy of the data stays alive. If every node holding a copy fails, there is nothing left to restore from.
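This recovery step can be sketched in Python under two assumptions. First, the name node's block map is a dictionary from block ID to the data nodes holding it. Second, any live node that does not already hold a block is a valid target. `rereplicate` is a hypothetical helper, and real placement policies (racks, load) are ignored. The block stored only on the failed node illustrates the caveat on the slide: there is nothing left to copy from.

```python
def rereplicate(block_map: dict, failed: str, live: list, replication: int = 3):
    """Remove `failed` from every block's replica list, then copy under-replicated
    blocks from a surviving replica to live nodes that lack them.

    Returns (copies, lost): copies are (block, source, target) tuples and lost
    lists blocks whose every replica was on the failed node. Mutates block_map."""
    copies, lost = [], []
    for blk, holders in block_map.items():
        if failed not in holders:
            continue
        holders.remove(failed)
        if not holders:
            lost.append(blk)  # no surviving copy: this block cannot be restored
            continue
        candidates = [dn for dn in live if dn not in holders]
        while len(holders) < replication and candidates:
            target = candidates.pop(0)
            copies.append((blk, holders[0], target))  # copy from a surviving replica
            holders.append(target)
    return copies, lost


block_map = {
    "blk_0": ["dn1", "dn2", "dn3"],
    "blk_1": ["dn1", "dn4", "dn5"],
    "blk_2": ["dn2", "dn3", "dn4"],
    "blk_3": ["dn1"],  # hypothetical block that had only one copy
}
copies, lost = rereplicate(block_map, failed="dn1", live=["dn2", "dn3", "dn4", "dn5"])
print(copies)  # [('blk_0', 'dn2', 'dn4'), ('blk_1', 'dn4', 'dn2')]
print(lost)    # ['blk_3']
```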

+ MapReduce
MapReduce is similar to divide-and-conquer.
First, "Map" divides the task: each mapper processes part of the input in parallel and emits intermediate key/value pairs.
Second, "Shuffle" is used to "transfer the data from the mapper nodes to a reducer's node and decompress if needed."
Third, "Reduce" is used to "execute the user-defined reduce function to produce the final output data."
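The three phases can be illustrated with the classic word-count job. This is a single-process Python sketch, not Hadoop's Java API, and the function names are hypothetical. Each input line stands in for one map task's input split. The shuffle is modeled as grouping values by key in a dictionary, without the network transfer or decompression a real cluster performs.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str) -> list:
    """User-defined map: emit (word, 1) for every word in one input record."""
    return [(word, 1) for word in line.split()]


def shuffle_phase(pairs) -> dict:
    """Group intermediate values by key, as the shuffle delivers them to reducers."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list) -> tuple:
    """User-defined reduce: combine all values for one key into the final output."""
    return key, sum(values)


def run_job(lines: list) -> dict:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


print(run_job(["big data big", "data hadoop"]))  # {'big': 2, 'data': 2, 'hadoop': 1}
```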

+ MapReduce-Map

+ MapReduce-shuffle

+ MapReduce-Reduce

+ MapReduce

+ Comparison

+ Comparison

+ Why Hadoop? (technically)
[Figure: Comparison of Grep task results with Vertica and DBMS-X.]

+ Why Hadoop? (technically)
Simple structure vs. optimization: Hadoop's transaction time is not minimized.
With the same number of nodes, its performance is lower than that of parallel DBMSs.
Technically, there is no compelling reason to choose Hadoop.

+ Why Hadoop? (commercially)

+ Why Hadoop? (commercially)
Cheap: buy more servers to beat a DBMS.
Flexible, both in design and in deployment.
Easier to design.
Easier to scale up.
Can be combined with other systems to achieve better performance.

+ Conclusion
Hadoop is much easier for users to implement and is more economical.
MapReduce advocates should study the techniques used in parallel DBMSs.
Hybrid systems are also popular.
As its performance improves, we believe Hadoop will lead the trend in big data computing.

+ References
http://hadoop.apache.org/
http://www.runpc.com.tw/content/cloud_content.aspx?id=105318
http://en.wikipedia.org/wiki/Apache_Hadoo
https://www.facebookbrand.com/
http://assets.fontsinuse.com/static/use-media-items/15/14246/full-2048x768/522903b7/Yahoo_Logo.png
http://wiki.apache.org/hadoop/PoweredBy
http://semiaccurate.com/assets/uploads/2011/09/Amazon-logo.jpg
http://www.conceptcupboard.com/blog/wp-content/uploads/2013/09/google.jpg
http://datashieldcorp.com/files/2013/11/adobe-LOGO-2.jpg
http://upload.wikimedia.org/wikipedia/commons/7/77/The_New_York_Times_logo.png
http://i.dell.com/sites/content/business/solutions/whitepapers/en/Documents/hadoop-introduction.pdf
http://hadoop.intel.com/pdfs/IntelDistributionReferenceArchitecture.pdf
http://www.classcloud.org/cloud/raw-attachment/wiki/Hinet100402/02.HadoopOverview.pdf
http://www.accenture.com/SiteCollectionDocuments/PDF/Accenture-Hadoop-Deployment-Comparison-Study.pdf
http://www.psgtech.edu/yrgcc/attach/MAP%20REDUCE%20PROGRAMMING.ppt
https://www.cs.duke.edu/starfish/files/hadoop-models.pdf
http://dotnetmis91.blogspot.tw/2010/04/hdfs-hadoop-mapreduce.html
http://wiki.apache.org/hadoop/HDFS
http://www.ewdna.com/2013/04/Hadoop-HDFS-Comics.html
http://en.wikipedia.org/wiki/Interpreted_language
A Comparison of Approaches to Large-Scale Data Analysis, by Sam Madden
http://www.cc.ntu.edu.tw/chinese/epaper/0011/20091220_1106.htm
http://web.cs.wpi.edu/~cs561/s12/Lectures/6/Hadoop.pdf
http://www.mobilemartin.com/mobile/show-me-the-mobile-money.jpg
