By: Joel Dominic and Carroll Wongchote 4/18/2012

- Cloud Computing
- Hadoop
- Fault Tolerance
- Mishaps
- Solutions
- Techniques
- Results

- Many computers working together to complete a problem: "the cloud"

[Diagram: a big problem broken into smaller problems distributed across the cloud]

- Software framework for distributed computing
- Written in Java
- Two components: HDFS and MapReduce
- Apache software project
- Modeled on the Google File System and Google MapReduce
- Used for processing large amounts of text data, e.g. logs, web pages, etc.

- Hadoop Distributed File System (HDFS)
  [Diagram: HDFS architecture; source URL missing from the transcript]
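To illustrate how an application talks to HDFS, here is a minimal sketch using Hadoop's Java FileSystem API. It is not taken from the presentation; the file path is hypothetical and the snippet assumes a Hadoop 0.20-era client with core-site.xml/hdfs-site.xml on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        // Picks up the cluster configuration from the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical path; any HDFS directory the user can write to would do
        Path file = new Path("/user/demo/hello.txt");

        // Write a small file into HDFS
        FSDataOutputStream out = fs.create(file);
        out.writeUTF("Hello, HDFS");
        out.close();

        // Read it back
        FSDataInputStream in = fs.open(file);
        System.out.println(in.readUTF());
        in.close();
    }
}
```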

- Built on two functional programming primitives: map and reduce
- map: apply a function to every element of a list
  - map (+2) [1, 2, 3, 4, 5, 6] → [(1+2), (2+2), (3+2), (4+2), (5+2), (6+2)] = [3, 4, 5, 6, 7, 8]
- reduce: combine the elements of a list with a binary operator
  - reduce (+) [3, 4, 5, 6, 7, 8] → (3 + 4 + 5 + 6 + 7 + 8) = 33
  - reduce (*) [3, 4, 5, 6, 7, 8] → (3 * 4 * 5 * 6 * 7 * 8) = 60480
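The same idea can be sketched with Java streams (a newer Java feature than the Hadoop 0.20 setup described later, used here only to illustrate the map/reduce pattern):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MapReduceIdea {
    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3, 4, 5, 6);

        // map (+2): [3, 4, 5, 6, 7, 8]
        List<Integer> mapped = input.stream()
                                    .map(x -> x + 2)
                                    .collect(Collectors.toList());

        // reduce (+): 33
        int sum = mapped.stream().reduce(0, Integer::sum);

        // reduce (*): 60480
        int product = mapped.stream().reduce(1, (a, b) -> a * b);

        System.out.println(mapped + " sum=" + sum + " product=" + product);
    }
}
```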

[Diagram: input objects are processed in parallel by Mappers; their intermediate results are combined by a Reducer into the final result]
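In Hadoop itself, the mapper and reducer are written as Java classes. The sketch below follows the well-known WordCount example (the class names are the conventional ones from that example, not code from this presentation; in the standard example they are static nested classes of the driver class):

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: emits (word, 1) for every word in a line of input
public class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
        }
    }
}

// Reducer: sums the counts emitted for each word
class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
```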

- Facebook
  - "A 1100-machine cluster with 8800 cores and about 12 PB raw storage."
  - "A 300-machine cluster with 2400 cores and about 3 PB raw storage."
- Yahoo!
  - "More than 100,000 CPUs in >40,000 computers running Hadoop"
  - "Our biggest cluster: 4500 nodes (2*4cpu boxes w 4*1TB disk & 16GB RAM)"

- What is fault tolerance?
- Examples of fault-tolerant systems:
  - Brake system in cars
  - Columns on a patio

- Hadoop was built with fault tolerance in mind
- Failures happen
- Rather than trying to prevent every failure, Hadoop replicates data and re-runs failed tasks (a replication sketch follows below)
- Hadoop handles failures at the application layer
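HDFS block replication is the main data-side mechanism. A minimal sketch of controlling it from the Java client API is shown below; the replication factor of 3 is HDFS's usual default, and the file path is hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Default replication factor for files created by this client
        // (the same setting as dfs.replication in hdfs-site.xml)
        conf.setInt("dfs.replication", 3);

        FileSystem fs = FileSystem.get(conf);

        // Change the replication factor of an existing (hypothetical) file;
        // HDFS copies or drops block replicas in the background
        fs.setReplication(new Path("/user/demo/hello.txt"), (short) 3);
    }
}
```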

- Topology
- Machine specifications
- Methods:
  - Physical computers
  - Virtualized computers
  - All in the same room
  - Manually installing the software (OS, Hadoop, etc.) on each physical machine

- 4 virtual machines
  - 3 GHz single-core processor, 512 MB RAM, 8 GB HDD each
- 7 physical machines
  - Dell (2):
    - 3 GHz dual-core processor, 2 GB RAM, 160 GB HDD
    - 3.4 GHz single-core processor, 1 GB RAM, 120 GB HDD
  - Lenovo (5):
    - 2.4 GHz dual-core processor, 2 GB RAM, 250 GB HDD
- Running Ubuntu LTS
- Sun Java 6 JDK
- Hadoop 0.20

[Diagram: cluster topology showing the master node and the slave nodes]
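In Hadoop 0.20, which machines act as master and slaves is configured on the master with the plain-text conf/masters and conf/slaves files, as described in the multi-node tutorial cited in the references. The hostnames below are hypothetical:

```
# conf/slaves on the master: hosts that run the DataNode and TaskTracker daemons
slave1
slave2
slave3

# conf/masters on the master: host that runs the SecondaryNameNode
master
```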

- Campus blocking ports
- MapReduce WARN: attempt failure
- MapReduce WARN: connection failure
- MapReduce job not completing
- Virtualization
  - Copying machines
  - Connecting to the network

- Campus blocking ports
  - Moved from the campus network to a private network
- MapReduce WARN: attempt failure
- MapReduce WARN: connection failure
- MapReduce job not completing
  - Solved by editing the /etc/hosts file on each node (see the sketch below)
  - /etc/hosts resolves hostnames locally on each computer
- Virtualization
  - Solved with determination
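A minimal sketch of the kind of /etc/hosts entries involved; the IP addresses and hostnames are hypothetical, and every node needs consistent entries:

```
# /etc/hosts (same entries on every node; addresses and names are hypothetical)
127.0.0.1    localhost
# Avoid mapping the node's own hostname to 127.0.1.1 (the Ubuntu default),
# which makes Hadoop daemons bind to the loopback interface.
192.168.0.1  master
192.168.0.2  slave1
192.168.0.3  slave2
```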

- Downloaded 164 books from gutenberg.org, ~200 MB of text data
- Ran a word count on the books with all nodes active (the control group)
- Ran the same program with failures introduced at different times and affecting different percentages of nodes
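For reference, a driver class tying together the mapper and reducer sketched earlier, in the style of the standard WordCount example (TokenizerMapper and IntSumReducer are the classes shown above; the input and output paths are whatever the job is given on the command line):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "word count");       // Job.getInstance(conf, ...) in later Hadoop versions
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation on each mapper
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. the Gutenberg texts in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory must not exist yet
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

A job like this is submitted with something along the lines of `bin/hadoop jar wordcount.jar WordCount /user/demo/gutenberg /user/demo/gutenberg-output` (jar name and paths hypothetical).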

- Increased networking skills
- Strong Unix skills
- Basic scripting
- Network troubleshooting
- Virtualization experience
- Installing operating systems (~30+ installs)
- Understanding of Hadoop and fault tolerance
- Programming routers

- Cloud Computing
- Hadoop
- Fault Tolerance
- Mishaps
- Solutions
- Techniques
- Results

- michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/