A Tutorial on Hadoop Cloud Computing: Future Trends

 Big Data  IOT and Big Data  Hadoop  Hadoop Components  MapReduce  HDFS  Projects  Scalability  Usage Model  Usage Areas  Companies  Future outlook Agenda

Big Data
 Data sets that exceed the boundaries and sizes of normal processing capabilities, forcing you to take a non-traditional approach.
 "Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few." This data is "big data."

Big Data
 Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured.
 Big data may be as important to business and society as the Internet has become. Why? More data may lead to more accurate analyses.

Big Data Challenges
 Volume: a big volume of data is gathered and is continuously growing.
 Velocity: a lot of data arrives at very high speed via different sources.
 Variety: data comes in all sorts of forms (audio, video, text, etc.).

What is the Internet of Things (IoT)?
 The Internet of Things (IoT) is the interconnection of uniquely identifiable embedded computing devices within the existing Internet infrastructure.
 Objective: a smart world.
 Main components:
 The things (or assets) themselves.
 The communication networks connecting them.
 The computing systems that make use of the data flowing to and from our things.

Tools for IoT
 Different companies provide solutions for developers to build and deploy powerful IoT applications, including tools for device manufacturers to quickly add new connected services to products.
 Some of these companies are: Mnubo, Oracle, Swarm, OpenRemote, Etherios, ioBridge, ThingWorx, Arrayent, etc.

What is Different in IoT Regarding Big Data?
 Big Data is characterized by a particular challenge in one or more of the 3Vs: Volume, Velocity, and Variety.
 IoT presents challenges in combinations of them.
 The most challenging IoT applications impact both Velocity and Volume, and sometimes also Variety.

IoT and Big Data
 One of the most prominent features of IoT is its real-time or near-real-time communication of information about the "connected things."
 For challenging IoT applications, the difficulty lies in doing this at scale (e.g., from tens of thousands of devices to tens of millions and above).
 Smart-* scenarios are in many cases also characterized by the variety of data sources and of data to be stored and processed.

IoT and Big Data – Challenges
 Another important feature of the vision of IoT is that by observing the behavior of "many things" it becomes possible to gain important insights, optimize processes, etc.
 This vision boils down to several challenges:
 Storing all events (Velocity and Volume challenge).
 Running analytical queries over the stored events (Velocity and Volume challenge).
 Performing analytics (data mining and machine learning) over the data to gain insights (Velocity, Volume, and Variety challenge).

IoT and Big Data – Challenges
 Another angle in the vision of IoT is the ability to perform real-time analytics: how to detect and react in real time to opportunities and threats to any business.
 This requirement presents multiple challenges:
 How to process streaming events on the fly (Velocity challenge).
 How to store streaming events in the operational database (Velocity challenge).
 How to correlate streaming events with stored data in the operational database (Velocity and Volume challenge).
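The velocity challenge of processing streaming events on the fly can be sketched with a simple tumbling-window count; the sensor events and window size below are made up for illustration, and a real system would consume an unbounded stream rather than a list.

```python
def count_per_window(events, window_size):
    """Group (timestamp, value) events into tumbling windows of
    `window_size` seconds and count the events in each window."""
    counts = {}
    for ts, _value in events:
        window_start = ts - (ts % window_size)  # bucket the timestamp
        counts[window_start] = counts.get(window_start, 0) + 1
    return counts

# Hypothetical sensor readings: (timestamp in seconds, value)
events = [(0, 21.5), (2, 21.7), (5, 22.0), (7, 21.9), (11, 22.3)]
print(count_per_window(events, 5))  # {0: 2, 5: 2, 10: 1}
```

Correlating such windowed aggregates with stored data is the harder Velocity-and-Volume combination the slide refers to.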

Big Data Processing Techniques for IoT
[Diagram: IoT devices and environments generate data, which is handled by big data processing techniques.]

What is Hadoop?
 The name Hadoop is not an acronym; it is a made-up name. The project's creator, Doug Cutting, says it was "the name my kid gave a stuffed yellow elephant. Short, relatively easy to spell and pronounce, meaningless, and not used elsewhere."

What is Hadoop?
 Hadoop is a framework of tools: an open-source software project that enables the distributed processing of large data sets across clusters of commodity servers.

Objective
 Hadoop supports running applications on big data.

Apache
 Hadoop is an Apache project, released as open source under the Apache License.

Traditional Approach
 The enterprise approach to big data: powerful computers.
 Even powerful computers have a processing limit; only so much data can be processed.
 Not scalable.

Breaking the Data
 The Hadoop approach: big data is broken into pieces.

Breaking the Data
 Computation runs on each piece in parallel, and the results are combined.
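The split-compute-combine idea above can be illustrated with a minimal single-process sketch; this simulates the approach in plain Python, whereas real Hadoop distributes the pieces across machines.

```python
def split_into_pieces(data, num_pieces):
    """Break the data into roughly equal chunks, as HDFS does with blocks."""
    size = max(1, len(data) // num_pieces)
    return [data[i:i + size] for i in range(0, len(data), size)]

def compute(piece):
    """Per-node computation: here, a simple partial sum."""
    return sum(piece)

def combine(partials):
    """Combine the partial results into the final answer."""
    return sum(partials)

data = list(range(1, 101))           # the "big data": 1..100
pieces = split_into_pieces(data, 4)  # pretend each piece goes to a node
result = combine(compute(p) for p in pieces)
print(result)  # 5050
```

Because each `compute(piece)` is independent, the pieces could run on different machines at the same time; only the small partial results travel back to be combined.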

Architecture
 High-level components:
 MapReduce: how Hadoop breaks the data and computation into pieces, then combines the results and sends them back to the application.
 HDFS: the Hadoop Distributed File System.
 Projects: tools that assist Hadoop in performing different activities.

Distributed Model
 Hadoop runs on low-cost Linux computers (commodity hardware); it does not require expensive computers.

Task Trackers and Data Nodes
 Each slave node runs a task tracker and a data node.
 Task tracker: processes the small piece of the task assigned to that particular node.
 Data node: manages the chunk of data assigned to that particular node.

Master
 The master node runs the job tracker and the name node; the slave nodes run the task trackers and data nodes.

MapReduce
 MapReduce consists of the job tracker on the master node and the task trackers on the slave nodes.
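Hadoop's native MapReduce API is Java, but the map and reduce steps can be sketched in Python in the spirit of Hadoop Streaming. The word-count example below is a local simulation, not code that runs on a cluster, and the sample lines are invented.

```python
def mapper(lines):
    """Map step: emit a (word, 1) pair for every word, as a streaming
    mapper would write 'word<TAB>1' lines to stdout."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reducer(pairs):
    """Reduce step: sum the counts for each word. Real Hadoop sorts
    the mapper output by key before the reducer sees it."""
    counts = {}
    for word, n in pairs:
        counts[word] = counts.get(word, 0) + n
    return counts

lines = ["Hadoop breaks data into pieces", "Hadoop combines the results"]
print(reducer(mapper(lines)))  # {'hadoop': 2, 'breaks': 1, ...}
```

On a cluster, many mappers run on different task trackers over different chunks, and the framework groups their output by key before reducing.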

HDFS
 HDFS consists of the name node on the master node and the data nodes on the slave nodes.

Applications
 An application contacts the master node, which coordinates the slaves.

Batch Processing
 Applications are placed in a queue and executed in batch mode.

Role of the Job Tracker
 Divides the task.
 Assigns tasks to different nodes.
 Receives and combines the results.
 Sends the final result back to the application.

Role of the Name Node
 Keeps the index of each chunk of data: which chunk of data resides on which data node.
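The name node's index can be pictured as two small lookup tables: which chunks make up each file, and which data nodes hold each chunk. The file and node names below are invented for this sketch.

```python
class NameNodeIndex:
    """Toy version of the name node's metadata index."""
    def __init__(self):
        self.file_chunks = {}   # file name -> list of chunk ids
        self.chunk_nodes = {}   # chunk id  -> list of data nodes holding it

    def add_chunk(self, filename, chunk_id, nodes):
        self.file_chunks.setdefault(filename, []).append(chunk_id)
        self.chunk_nodes[chunk_id] = list(nodes)

    def locate(self, filename):
        """Return {chunk_id: data nodes} for a file, so the client can
        fetch the chunks directly from the data nodes."""
        return {c: self.chunk_nodes[c] for c in self.file_chunks[filename]}

index = NameNodeIndex()
index.add_chunk("logs.txt", "chunk-0", ["node1", "node2", "node3"])
index.add_chunk("logs.txt", "chunk-1", ["node2", "node4", "node5"])
print(index.locate("logs.txt"))
```

Note that only metadata lives on the name node; the chunks themselves stay on the data nodes.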

Data
 Data goes directly between the data nodes and the application; it does not flow through the master.

Fault Tolerance for Data
 Hardware failure is expected, so HDFS has built-in fault tolerance.
 HDFS keeps 3 copies of each chunk of data, so the data remains available from different nodes.
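The 3-copy rule can be sketched as a toy placement function. Real HDFS placement is rack-aware, so the round-robin choice below is only an illustration, and the node names are made up.

```python
def place_replicas(chunk_id, nodes, replication=3):
    """Choose `replication` distinct data nodes for a chunk, spreading
    chunks round-robin from a deterministic starting node."""
    start = sum(ord(c) for c in chunk_id) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication)]

nodes = ["node1", "node2", "node3", "node4", "node5"]
print(place_replicas("chunk-0", nodes))  # ['node1', 'node2', 'node3']
```

If any one of the three chosen nodes fails, the chunk can still be read from the other two, which is exactly the availability the slide describes.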

Fault Tolerance for Processing
 In MapReduce, the job tracker detects any service failure and asks another task tracker to perform the same task.

Master Backup
 If the master dies, it is a single point of failure.
 The name node's tables containing the indexes are backed up, and copies are replicated to multiple nodes.

Secondary Master Node
 At the enterprise level, a secondary master node is available as a backup in case the master dies.

Easy Programming
 Programmers do not have to worry about:
 Where the file is located.
 How to manage failures.
 How to break computations into pieces.
 How to program for scaling.

Scalability
 Hadoop is scalable to thousands of slave nodes.

Scalability Cost
 The scalability cost is linear: to increase processing power, increase the number of computers.

Projects
 A set of tools that assist Hadoop, managed by Apache.

Projects
 Hive, HBase, Mahout, Pig, Oozie, Flume, Sqoop.

Usage Model
 Administrators: installation, monitoring and managing the system, tuning the system.
 Users: the overall working of the software, designing applications, importing and exporting data, working with the tools.
 Overlapping roles: users work with admins to tune the system, and admins help end users write applications.

Usage Areas
 Social media, retail, financial services, search tools, government, intelligence.
 Big users of Hadoop include Yahoo, Facebook, Amazon, Twitter, Google, etc.
 Hadoop can be used anywhere there is big data.

Companies Using Hadoop
 eBay, American Airlines, The New York Times, Federal Reserve Board, Chevron, IBM, Facebook, Yahoo, Amazon.

Examples of Applications
 Advertisement: mining user behavior to generate recommendations.
 Searches: grouping related documents.
 Security: searching for uncommon patterns.

Future Outlook
 Yahoo estimated that by 2015, 50% of enterprise data would be processed by Hadoop.

Thanks