Youngil Kim Awalin Sopan Sonia Ng Zeng.  Introduction  Concept of the Project  System architecture  Implementation – HDFS  Implementation – System.

Presentation transcript:

Youngil Kim Awalin Sopan Sonia Ng Zeng

 Introduction  Concept of the Project  System architecture  Implementation – HDFS  Implementation – System Analysis ◦ System Information Logger (SIL) ◦ System Information Gatherer (SIG) ◦ Map/Reduce  Implementation – Visualization  Implementation – P2P Application  Demo

 How can we monitor system information across many nodes? ◦ With many nodes, it is hard to track down which one has a problem  HDFS and Map/Reduce make this easier ◦ Gather each node's system information into HDFS ◦ Analyze the system information using Map/Reduce ◦ The result is a kind of network management system, similar to HP OpenView

 Tool that gives an overview of the nodes in the P2P network ◦ Preserves the decentralized nature of P2P ◦ Can run on any computer, inside or outside the P2P network, so the computer running the tool is not necessarily the “master” ◦ If the tool is not running, the P2P network remains intact  The P2P network can still be controlled from the tool  The tool provides one interface for both overview and control ◦ The user therefore does not need to be an expert to work with a networked system

[Diagram: P2P Network of nodes, each running a local P2P app.]

[Diagram: P2P Network of nodes, each running a local P2P app. and the Sys Info Logger; Hadoop slave nodes with HDFS; System Info Gatherer (Hadoop Master).]

[Diagram: P2P Network of nodes, each running a local P2P app. and the Sys Info Logger; Hadoop slave nodes with HDFS; System Info Gatherer (Hadoop Master); System Manager (Visualization); links labeled System Control Network and System Information.]

 Implemented a minimal P2P application to show how our tool works ◦ Demonstrates how to control an application or system on each node through the visualization ◦ Supports STOP/RESUME operations  Functions ◦ Responds to “QUERY”  Reports active/inactive status (overview) ◦ Responds to “CONTROL”  Changes the node status based on the control argument (active/inactive)
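The QUERY/CONTROL behaviour described above can be sketched as a small message handler. This is only an illustration: the class name `P2PNode`, the text wire format (`"QUERY"`, `"CONTROL <arg>"`), and the reply strings are assumptions, since the slides do not show the actual protocol.

```python
class P2PNode:
    """Toy stand-in for the minimal P2P application (all names hypothetical)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.active = True  # assumption: a node starts in the active state

    def handle(self, message):
        """Dispatch a text message of the assumed form 'QUERY' or 'CONTROL <arg>'."""
        parts = message.strip().split()
        if not parts:
            return "ERROR empty message"
        command, args = parts[0].upper(), parts[1:]
        if command == "QUERY":
            # Overview request: report the current status.
            return "active" if self.active else "inactive"
        if command == "CONTROL" and len(args) == 1 and args[0] in ("active", "inactive"):
            # Control request: STOP/RESUME expressed as a target status.
            self.active = (args[0] == "active")
            return "OK " + args[0]
        return "ERROR unsupported message"
```

In this sketch, stopping a node corresponds to `handle("CONTROL inactive")`, after which `handle("QUERY")` reports `inactive` until a `CONTROL active` message resumes it.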

 Hadoop for DFS & Map/Reduce framework ◦ We use the bug cluster ◦ Master: brood00 ◦ Slaves: currently tested with 5 nodes (bug51 ~ bug55) ◦ Uses each node's local storage  Uses the “/tmp” directory because the home directory is an NFS volume, not local storage ◦ Network ports:  HDFS (9000), JobTracker (9001)  NameNode interface (50070), JobTracker interface (50030)

 mr_syslog.py ◦ Implemented in Python ◦ Saves information to both local storage and HDFS ◦ Gathers information every 10 seconds ◦ Creates a log file based on time  Each node's information is saved in the following format ◦ bug : mem(75.50), cpu(1.00), disk(10.00) ◦ bug : mem(75.50), cpu(1.50), disk(10.00) ◦ bug : mem(75.51), cpu(0.40), disk(10.00) ◦ bug : mem(75.51), cpu(0.50), disk(10.00) ◦ bug : mem(75.50), cpu(0.50), disk(10.00) ◦ bug : mem(75.50), cpu(0.40), disk(10.00)
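As a rough illustration of the record format shown above, the following sketch formats and parses the `mem(...), cpu(...), disk(...)` portion of a log line. It is not the code of mr_syslog.py; the function names are hypothetical, and whatever precedes the colon in a real line is treated as opaque here because the transcript does not show it in full.

```python
import re

# Matches the three metric fields as they appear in the sample log lines.
METRIC_RE = re.compile(r"(mem|cpu|disk)\(([0-9]+(?:\.[0-9]+)?)\)")


def format_metrics(mem, cpu, disk):
    """Render usage values with two decimals, as in the sample lines."""
    return "mem(%.2f), cpu(%.2f), disk(%.2f)" % (mem, cpu, disk)


def parse_metrics(line):
    """Extract the metric fields from one log line into a dict of floats."""
    return {name: float(value) for name, value in METRIC_RE.findall(line)}
```

A logger loop could call `format_metrics` every 10 seconds and append the result to the current log file; how the real script samples the values and writes to HDFS is not shown in the slides.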

 Functions ◦ Finds the current resource usage of each node using Map/Reduce  Currently reports the maximum values per one-minute time slot ◦ Acts as the communication gateway between the nodes and the visualization tool  Sends “QUERY” to each P2P application to check the node's status  Sends the node status to the visualization tool  Node ID  Status (active/inactive)  CPU usage  Memory usage  Disk storage

 Map: ◦ Input – each node's log file  Key: position in the file  Value: raw data, one line per key ◦ Output  Key: node ID  Value: set of system information (CPU/memory/storage usage)  E.g.:

 Reduce: ◦ Input – from Map  Key: node ID  Value: list of system-information sets  E.g.: ◦ Output  Key: node ID  Value: maximum value for each piece of information  E.g.:
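The Map and Reduce steps above can be sketched in plain Python, simulated in-process rather than run on Hadoop. This is an assumption-laden illustration: the slides do not say how the job was written, the function names are hypothetical, and the text before the first colon of each line is assumed to be the node ID.

```python
import re
from collections import defaultdict

METRIC_RE = re.compile(r"(mem|cpu|disk)\(([0-9]+(?:\.[0-9]+)?)\)")


def map_line(line):
    """Map: one raw log line -> (node ID, dict of system information)."""
    node_id, _, rest = line.partition(":")  # assumed layout: '<node ID> : <metrics>'
    metrics = {name: float(value) for name, value in METRIC_RE.findall(rest)}
    return node_id.strip(), metrics


def reduce_node(node_id, metric_sets):
    """Reduce: all metric dicts of one node -> maximum value per metric."""
    maxima = {}
    for metrics in metric_sets:
        for name, value in metrics.items():
            maxima[name] = max(value, maxima.get(name, value))
    return node_id, maxima


def run_job(lines):
    """Stand-in for the framework's shuffle: group map output by key, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        if line.strip():
            node_id, metrics = map_line(line)
            grouped[node_id].append(metrics)
    return dict(reduce_node(node_id, sets) for node_id, sets in grouped.items())
```

Calling `run_job` on a list of log lines returns one dictionary of maxima per node ID. Restricting the input to the lines of a single one-minute slot would give the per-slot maxima that the SIG reports.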

 Written in Java  Uses the Prefuse toolkit for a tabular visualization of the node status  Nodes are controlled through the right-click menu alone  Live communication with the nodes ◦ Queries the node status from the SIG ◦ Sends commands to the nodes in the P2P network in real time

 Initial view of all nodes  After stopping Bug53

 System set-up and initialization (video file)  Show NameNode & JobTracker interfaces  Show Map/Reduce jobs  Show visualization tool ◦ Status changes of each node ◦ Control of each P2P application