By: Joel Dominic and Carroll Wongchote 4/18/2012
Outline: Cloud Computing, Hadoop, Fault Tolerance, Mishaps, Solutions, Techniques, Results
The Cloud: many computers working together to solve a problem
[Figure: a big problem divided into smaller problems]
Hadoop
  Software framework for distributed computing, written in Java
  Apache software project
  Two components: HDFS and MapReduce
  Mimics the Google File System and Google MapReduce
  Used for processing large amounts of text data, e.g. logs and web pages
[Figure: Hadoop Distributed File System (HDFS). Source:]
MapReduce: built on two functional programming paradigms, map and reduce
  map applies a function to every element:
    map (+2) [1, 2, 3, 4, 5, 6] = [(1+2), (2+2), (3+2), (4+2), (5+2), (6+2)] = [3, 4, 5, 6, 7, 8]
  reduce combines the elements with an operator:
    reduce (+) [3, 4, 5, 6, 7, 8] = (3 + 4 + 5 + 6 + 7 + 8) = 33
    reduce (*) [3, 4, 5, 6, 7, 8] = (3 * 4 * 5 * 6 * 7 * 8) = 20160
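The two paradigms can be tried directly in Python. This is a minimal sketch using the built-in map and functools.reduce, not Hadoop code; the variable names are chosen here for illustration. It uses the same input list as the slide: the sum of the mapped list is 33 and its product is 20160.

```python
from functools import reduce
import operator

numbers = [1, 2, 3, 4, 5, 6]

# map: apply "+2" to every element independently of the others
mapped = list(map(lambda x: x + 2, numbers))

# reduce: fold the mapped list into a single value with a binary operator
total = reduce(operator.add, mapped)
product = reduce(operator.mul, mapped)

print(mapped)   # [3, 4, 5, 6, 7, 8]
print(total)    # 33
print(product)  # 20160
```

Because each map call touches only one element, the map step can be split across many machines; only the reduce step has to bring results together.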
[Figure: MapReduce data flow. Each Object goes to a Mapper, the Mapper Results go to the Reducer, and the Reducer produces the Final Result]
Facebook
  “A 1100-machine cluster with 8800 cores and about 12 PB raw storage.”
  “A 300-machine cluster with 2400 cores and about 3 PB raw storage.”
Yahoo!
  “More than 100,000 CPUs in >40,000 computers running Hadoop”
  “Our biggest cluster: 4500 nodes (2*4cpu boxes w 4*1TB disk & 16GB RAM)”
What is fault tolerance?
  Examples of fault-tolerant systems: the brake system in cars, the columns on a patio
Hadoop was built with fault tolerance in mind
  Failures happen
  Instead of trying to prevent failures, Hadoop replicates data and processes
  Hadoop handles failures at the application layer
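The idea of tolerating failures by replicating data can be sketched in a few lines of Python. This is a toy model, not HDFS: the round-robin placement, the node names and the block names are assumptions made for illustration, and real HDFS uses its own placement policy. The replication factor of 3 matches the HDFS default.

```python
def place_replicas(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (hypothetical round-robin policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for index, block in enumerate(blocks):
        # Start at a different node for each block and take the next nodes in order
        placement[block] = {
            nodes[(index + offset) % len(nodes)] for offset in range(replication)
        }
    return placement


def available_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    failed = set(failed_nodes)
    return {block for block, holders in placement.items() if holders - failed}


nodes = ["node1", "node2", "node3", "node4", "node5"]   # hypothetical cluster
blocks = [f"blk_{i}" for i in range(10)]                # hypothetical data blocks
placement = place_replicas(blocks, nodes)

# With 3 replicas per block, any 2 node failures leave every block readable
after_two_failures = available_blocks(placement, ["node1", "node2"])

# Losing all 3 holders of a block makes that block unavailable
after_three_failures = available_blocks(placement, ["node1", "node2", "node3"])
```

In this model a block is lost only when every node holding a copy fails, which is why a cluster can keep working through individual machine failures.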
Topology, Machine Specifications, Methods
  Physical computers and virtualized computers, all in the same room
  Software (OS, Hadoop, etc.) installed manually on each physical machine
4 Virtual Machines
  3GHz single-core processors, 512MB RAM, 8GB HDD
7 Physical Machines
  Dell (2)
    3GHz dual-core processor, 2GB RAM, 160GB HDD
    3.4GHz single-core processor, 1GB RAM, 120GB HDD
  Lenovo (5)
    2.4GHz dual-core processor, 2GB RAM, 250GB HDD
Running
  Ubuntu LTS
  Sun Java 6 JDK
  Hadoop 0.20
[Figure: topology showing the Master Node and the Slave Node]
Mishaps
  Campus blocking ports
  MapReduce WARN: Attempt failure
  MapReduce WARN: Connection failure
  MapReduce job not completing
  Virtualization: copying machines, connecting to the network
Solutions
  Campus blocking ports: moved from the campus network to a private network
  MapReduce WARN: Attempt failure, MapReduce WARN: Connection failure, MapReduce job not completing: all solved by editing the /etc/hosts file, which resolves hostnames on the local computer
  Virtualization: solved with determination
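Since the MapReduce failures came down to hostname resolution, it can help to check that every node's hosts file lists every other node. The sketch below parses text in the /etc/hosts format and reports missing names. The addresses and the hostnames master, slave1 and slave2 are hypothetical, not the ones used in this project, and keeping the first entry for a name is a simplifying assumption.

```python
def parse_hosts(text):
    """Map each hostname or alias to its IP address, using the /etc/hosts line format."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and surrounding whitespace
        if not line:
            continue                            # skip blank and comment-only lines
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, ip)          # keep the first entry seen for a name
    return table


# Hypothetical hosts file for a small cluster
SAMPLE = """
127.0.0.1   localhost
192.168.0.1 master    # hypothetical master node
192.168.0.2 slave1
192.168.0.3 slave2
"""

table = parse_hosts(SAMPLE)
expected = ["master", "slave1", "slave2", "slave3"]
missing = [name for name in expected if name not in table]
print(missing)  # ['slave3']
```

To check a real machine, the same function could be given the contents of its /etc/hosts file instead of SAMPLE.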
Downloaded 164 books from gutenberg.org (~200MB of text data)
Control group: ran a word count on the books with all nodes active
Ran the same program with different failure times and failure percentages
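The structure of a word count job can be sketched in plain Python. This is an in-memory illustration of the map, shuffle and reduce steps, not the Hadoop API and not the program used in the experiment; the function names and the three short strings standing in for the books are made up for this example.

```python
from collections import defaultdict
import re


def mapper(text):
    """Map step: emit a (word, 1) pair for every word in one piece of input."""
    for word in re.findall(r"[a-z']+", text.lower()):
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(word, counts):
    """Reduce step: add up the counts collected for one word."""
    return word, sum(counts)


def word_count(documents):
    pairs = (pair for doc in documents for pair in mapper(doc))
    return dict(reducer(word, counts) for word, counts in shuffle(pairs).items())


# Stand-in input; the experiment used 164 books from gutenberg.org
books = ["the quick brown fox", "the lazy dog", "The fox"]
counts = word_count(books)
print(counts["the"], counts["fox"], counts["dog"])  # 3 2 1
```

Each call to mapper depends only on its own input, so in a cluster a failed map task can be rerun on another node without affecting the rest of the job.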
Increase in networking skills
Strong Unix skills
Basic scripting
Network troubleshooting
Virtualization experience
Installing operating systems (30+)
Understanding of Hadoop and fault tolerance
Programming routers
noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/