Analyzing Yellowstone’s Network with a Raspberry Pi Cluster Lauren Patterson.


Objective of the Project: Use a low-cost Raspberry Pi cluster to find the interconnect path between two nodes on Yellowstone in order to analyze the performance of jobs.

Assembling the Raspberry Pi cluster

Yellowstone Interconnect (Credit: Siddhartha Ghosh)

Files Used
job1_nodes.txt – Gives the job ID and nodes used
ibnetdiscover.log (Discover File) – Lists connections between switches
LFTS.txt – Routing table for each switch
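The three files above suggest how a path could be traced: start at the source node, then at each switch look up the output port for the destination in that switch's routing table and follow the link on that port. The following is a minimal Python sketch of that idea, not the project's actual code. The dictionaries `LINKS` and `LFTS`, the device names, and the function `find_path` are hypothetical stand-ins; the real file formats are not parsed here.

```python
# Hypothetical in-memory stand-ins for the parsed files; the real
# ibnetdiscover.log and LFTS.txt formats are not reproduced here.

# From the discover file: (device, port) -> neighbouring device.
LINKS = {
    ("node_a", 1): "switch_1",
    ("switch_1", 2): "switch_2",
    ("switch_2", 3): "node_b",
}

# From the routing tables: switch -> {destination: output port}.
LFTS = {
    "switch_1": {"node_b": 2},
    "switch_2": {"node_b": 3},
}


def find_path(src, dst, src_port=1, max_hops=16):
    """Follow routing-table entries hop by hop from src to dst."""
    path = [src]
    current = LINKS[(src, src_port)]  # first hop: the node's own uplink
    while current != dst:
        path.append(current)
        if len(path) > max_hops:
            raise RuntimeError("no route found within max_hops")
        out_port = LFTS[current][dst]          # routing-table lookup
        current = LINKS[(current, out_port)]   # follow the cable
    path.append(dst)
    return path


route = find_path("node_a", "node_b")
print(" -> ".join(route))
```

The `max_hops` guard is an added safety measure for this sketch so that a routing loop in the input data cannot make the lookup run forever.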

What is Hadoop? HDFS, MapReduce

HDFS

MapReduce: Input Data → Map Phase → Shuffle Phase → Reduce Phase → Output Data
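The phases on this slide can be illustrated with a small single-process Python sketch. It only mimics the data flow (map, then shuffle, then reduce) and does not use Hadoop; the input records and the function names `map_phase`, `shuffle_phase`, and `reduce_phase` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical input records: one (switch, neighbour) link per record.
records = [
    ("switch_1", "node_a"),
    ("switch_1", "switch_2"),
    ("switch_2", "node_b"),
]


def map_phase(record):
    """Emit a (key, value) pair for each input record."""
    switch, _neighbour = record
    yield switch, 1


def shuffle_phase(pairs):
    """Group all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse the values for one key into a single result."""
    return key, sum(values)


mapped = [pair for record in records for pair in map_phase(record)]
grouped = shuffle_phase(mapped)
link_counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(link_counts)
```

Here the output data is a count of links per switch. In a real cluster the map and reduce calls would run on different machines and the shuffle would move data between them over the network.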

Pig: Apache Pig, Pig Latin, Grunt

Pig Latin Script: Created a Pig Latin script to find the path between two nodes on Yellowstone.

JOIN Operations in Pig
Default (inner) join returns the intersection of Set A and Set B (A ∩ B).
Full, right, and left outer joins return A and B with different parts nulled out (shown in white in the diagram).
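The join variants described on this slide can be sketched in plain Python. This is an illustration of the semantics only: it is not Pig Latin, it does not reproduce Pig's output schema, and the relations `A` and `B` and the helper `join` are hypothetical. `None` stands in for the nulled-out side.

```python
# Hypothetical relations keyed on the first field.
A = [("switch_1", "port 2"), ("switch_2", "port 3")]
B = [("switch_2", "rack 7"), ("switch_3", "rack 9")]


def join(left, right, how="inner"):
    """Join two lists of (key, value) tuples; None marks a nulled-out side."""
    rows = []
    left_keys = {key for key, _ in left}
    for lk, lv in left:
        matches = [rv for rk, rv in right if rk == lk]
        for rv in matches:
            rows.append((lk, lv, rv))
        # Left and full outer joins keep unmatched rows from the left side.
        if not matches and how in ("left", "full"):
            rows.append((lk, lv, None))
    # Right and full outer joins keep unmatched rows from the right side.
    if how in ("right", "full"):
        for rk, rv in right:
            if rk not in left_keys:
                rows.append((rk, None, rv))
    return rows


for how in ("inner", "left", "right", "full"):
    print(how, join(A, B, how))
```

Only `switch_2` appears in both relations, so the inner join yields a single row, while each outer join adds the unmatched rows from one or both sides with the missing field set to `None`.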

Path Finder Code Flow

Results [figure: chart; surviving error-bar labels: ±3, ±82, ±19, ±15, ±3, ±4]

Python Single Path Python Parallel Python – Mpi4py 1.3.1

[Figure: chart; surviving error-bar labels: ±0.02, ±0.07, ±0.006, ±0.11, ±0.004, ±0.11]

[Figure: chart; surviving error-bar labels: ±18, ±4, ±20, ±2, ±7, ±4, ±1, ±2, ±0.5]

What Do All Of These Have In Common? Raspberry Pi, Hadoop, Pig, Python

Acknowledgments: Richard Loft, Karina Hauser, Stephanie Barr, Bruce Chittenden, Amogh Simha, Raghu Raj Prasanna Kumar

Questions?