Published by Adela Roberts. Modified over 8 years ago.
A Tutorial on Hadoop
Cloud Computing: Future Trends
Agenda
- Big Data
- IoT and Big Data
- Hadoop
- Hadoop Components: MapReduce, HDFS, Projects
- Scalability
- Usage Model
- Usage Areas
- Companies
- Future Outlook
Big Data
Data sets that exceed the boundaries and sizes of normal processing capabilities, forcing you to take a non-traditional approach. "Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few." This data is "big data."
Big Data
Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured. Big data may be as important to business and society as the Internet has become. Why? More data may lead to more accurate analyses.
Big Data Challenges: The 3Vs
- Volume: a big volume of data is gathered, and it is continuously growing.
- Velocity: a lot of data arrives at very high speed from different sources.
- Variety: data comes in all sorts of varieties (audio, video, text, etc.).
What is the Internet of Things (IoT)?
The Internet of Things (IoT) is the interconnection of uniquely identifiable embedded computing devices within the existing Internet infrastructure. Objective: a smart world. Main components:
- The things (or assets) themselves.
- The communication networks connecting them.
- The computing systems that make use of the data flowing to and from our things.
Tools for IoT
Different companies provide solutions for developers to build and deploy powerful IoT applications, and tools for device manufacturers to quickly add new connected services to products. Some of these companies are listed below:
Mnubo, Oracle, Swarm, OpenRemote, Etherios, ioBridge, ThingWorx, Arrayent, etc.
What is Different in IoT Regarding Big Data
Big Data is characterized by a particular challenge in one or more of the 3Vs: Volume, Velocity, and Variety. IoT presents challenges in combinations of them. The most challenging IoT applications impact both Velocity and Volume, and sometimes also Variety.
IoT and Big Data
One of the most prominent features of IoT is its real-time or near-real-time communication of information about the "connected things". For challenging IoT applications, the difficulty lies in doing this at scale (e.g., from tens of thousands of things to tens of millions and above). Smart-* scenarios are in many cases also characterized by the variety of data sources and of data to be stored and processed.
IoT and Big Data: Challenges
Another important feature of the vision of IoT is that by observing the behavior of "many things" it will be possible to gain important insights, optimize processes, etc. This vision boils down to many challenges:
- Storing all events (Velocity and Volume challenge).
- Running analytical queries over the stored events (Velocity and Volume challenge).
- Performing analytics (data mining and machine learning) over the data to gain insights (Velocity, Volume, and Variety challenge).
IoT and Big Data: Challenges
Another angle in the vision of IoT is the ability to perform real-time analytics: how to detect and react in real time to opportunities and threats to any business. This requirement presents multiple challenges:
- How to process streaming events on the fly (Velocity challenge).
- How to store streaming events in the operational database (Velocity challenge).
- How to correlate streaming events with stored data in the operational database (Velocity and Volume challenge).
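To make the "processing streaming events on the fly" challenge concrete, here is a minimal Python sketch. The windowed spike-detection rule, the (timestamp, value) event format, and the threshold are hypothetical illustrations, not part of any specific IoT product:

```python
from collections import deque

def detect_spikes(events, window=5, threshold=2.0):
    """Flag events whose value exceeds `threshold` times the average
    of the previous `window` readings (a made-up example rule)."""
    recent = deque(maxlen=window)  # sliding window of recent values
    alerts = []
    for ts, value in events:
        if len(recent) == window:
            avg = sum(recent) / window
            if value > threshold * avg:
                alerts.append((ts, value))
        recent.append(value)
    return alerts

stream = list(enumerate([10, 11, 9, 10, 10, 55, 10, 12]))
print(detect_spikes(stream))  # [(5, 55)]
```

The point of the sketch is that each event is examined once, with bounded memory, which is what "on the fly" processing at IoT velocities requires.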
Big Data Processing Techniques for IoT
IoT devices and environments generate the data; big data processing techniques consume and analyze it.
What is Hadoop?
The name Hadoop is not an acronym; it is a made-up name. The project's creator, Doug Cutting, explains: "The name my kid gave a stuffed yellow elephant. Short, relatively easy to spell and pronounce, meaningless, and not used elsewhere."
What is Hadoop?
Hadoop is a framework of tools: an open source software project that enables the distributed processing of large data sets across clusters of commodity servers.
Objective
Hadoop supports running applications on Big Data.
Apache
Hadoop is an Apache project: open source, released under the Apache License.
Traditional Approach
The traditional enterprise approach to big data relied on ever more powerful computers. Each computer has a processing limit: only so much data could be processed, and the approach is not scalable.
Breaking the Data
In the Hadoop approach, big data is broken into pieces.
Breaking the Data
In the Hadoop approach, the computation runs on each piece in parallel, and the results are combined.
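The split-compute-combine idea above can be pictured with a toy Python sketch. It is purely illustrative: real Hadoop splits files into HDFS blocks and runs each computation as a MapReduce task on a separate node.

```python
def split_compute_combine(data, pieces=4):
    # Break the big data into pieces (as HDFS splits a file into blocks).
    chunks = [data[i::pieces] for i in range(pieces)]
    # Run the computation on each piece independently (one task per chunk).
    partial_sums = [sum(chunk) for chunk in chunks]
    # Combine the partial results into the final answer.
    return sum(partial_sums)

print(split_compute_combine(list(range(1, 101))))  # 5050
```

Because each chunk is processed independently, the chunks could just as well be handled by different machines, which is exactly what Hadoop does.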
Architecture
High-level components:
- MapReduce and HDFS: how Hadoop breaks the data and computation into pieces, combines the results, and sends them back to the application.
- Projects: provide assistance to Hadoop in performing different activities.
Distributed Model
Hadoop uses a distributed model running on Linux. It does not use expensive computers; it runs on low-cost commodity hardware.
Task Trackers and Data Nodes
Each slave node runs a Task Tracker, which processes the small piece of the task assigned to that particular node, and a Data Node, which manages the chunk of data assigned to that particular node.
Master Node
The master node runs the Job Tracker and the Name Node; the slave nodes run the Task Trackers and Data Nodes.
MapReduce
MapReduce is the processing layer: the Job Tracker on the master and the Task Trackers on the slaves.
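The classic illustration of the MapReduce model is word count. Below is a single-process Python sketch of the map, shuffle, and reduce phases; in real Hadoop these run in parallel across Task Trackers, and the shuffle is performed by the framework rather than by a local sort.

```python
from itertools import groupby

def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reducer(pairs):
    """Reduce phase: sum the counts per word. Input must be sorted
    by key, which is what the shuffle phase guarantees."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

pairs = sorted(mapper(["to be or", "not to be"]))  # sort stands in for shuffle
print(dict(reducer(pairs)))  # {'be': 2, 'not': 1, 'or': 1, 'to': 2}
```

The mapper and reducer never see the whole data set, only one record (or one key group) at a time, which is what lets Hadoop spread them across many nodes.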
HDFS
HDFS is the storage layer: the Name Node on the master and the Data Nodes on the slaves.
Applications
Applications contact the master (the Job Tracker and Name Node), not the slave nodes directly.
Batch Processing
Applications submit jobs to the master, where they wait in a queue; Hadoop processes them in batch.
Role of the Job Tracker
- Divides the task.
- Assigns tasks to different nodes.
- Receives and combines the results.
- Sends the final result back to the application.
Role of the Name Node
The Name Node keeps the index of each chunk of data: which chunk of data resides at which data node.
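The Name Node's index can be pictured as a mapping from (file, chunk) to the data nodes holding that chunk. The file and node names below are made up for illustration; the real Name Node also tracks block sizes, replication state, and much more.

```python
# Toy model of the Name Node's index: which data nodes hold which chunk.
chunk_index = {
    ("weblog.txt", 0): ["node1", "node4", "node7"],
    ("weblog.txt", 1): ["node2", "node5", "node7"],
}

def locate(filename, chunk):
    """Return the data nodes that hold the given chunk of the file."""
    return chunk_index.get((filename, chunk), [])

print(locate("weblog.txt", 1))  # ['node2', 'node5', 'node7']
```

Note that the index holds only metadata: the data itself never passes through the Name Node.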
Data
Data does not pass through the master: it goes directly from the data nodes to the application.
Fault Tolerance for Data
Hardware failure is expected, so HDFS has built-in fault tolerance: it keeps 3 copies of each chunk of data, so the data remains available from different nodes.
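The three-copies policy can be sketched as picking three distinct data nodes per chunk. Real HDFS placement is rack-aware; the random choice here is a simplification for illustration.

```python
import random

def place_replicas(nodes, copies=3):
    """Choose `copies` distinct nodes to hold one chunk of data
    (HDFS defaults to 3 copies; real placement is rack-aware)."""
    return random.sample(nodes, copies)

nodes = ["node%d" % i for i in range(1, 9)]
print(place_replicas(nodes))  # e.g. ['node3', 'node8', 'node1']
```

With copies on three different nodes, any single node (and often a whole rack) can fail without losing the chunk.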
Fault Tolerance for Processing
MapReduce handles service failures: the Job Tracker detects any failure and asks another Task Tracker to do the same task.
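The detect-and-retry behavior can be sketched as follows; the worker functions are hypothetical stand-ins for Task Trackers, and the sequential loop stands in for the Job Tracker's reassignment logic.

```python
def run_with_failover(task, workers):
    """Try the task on each worker in turn; if one fails, hand the
    same task to another (a toy model of Job Tracker failover)."""
    for worker in workers:
        try:
            return worker(task)
        except Exception:
            continue  # failure detected; try the next worker
    raise RuntimeError("task failed on every worker")

def dead_node(task):
    raise RuntimeError("node down")

workers = [dead_node, dead_node, lambda task: task * 2]
print(run_with_failover(21, workers))  # 42
```

Because tasks are deterministic and side-effect free, re-running one on a different node yields the same result, which is what makes this failover safe.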
Master Backup
If the master dies, it is a single point of failure. The Name Node's tables containing the indexes are therefore backed up, and copies are replicated over to multiple nodes.
Secondary Master Node
At the enterprise level, a secondary master node is available as a backup in case the master dies.
Easy Programming
Programmers do not have to worry about:
- Where the file is located.
- How to manage failures.
- How to break computations into pieces.
- How to program for scaling.
Scalability
Hadoop is scalable to thousands of slave nodes.
Scalability Cost
The scalability cost is linear: to increase processing power (processing speed), increase the number of computers.
Projects
A set of tools, managed by Apache, that assist Hadoop alongside MapReduce and HDFS.
Projects
Hive, HBase, Mahout, Pig, Oozie, Flume, Sqoop, etc.
Usage Model
- Administrators: installation, monitoring and managing the system, tuning the system.
- Users: overall working of the software, designing applications, importing/exporting data, working with the tools.
- Overlapping roles: users work with admins to tune the system; admins help end users write applications.
Usage Areas
Social media, retail, financial services, search tools, government, intelligence. Big users of Hadoop include Yahoo, Facebook, Amazon, Twitter, Google, etc. Hadoop can be used anywhere there is big data.
Companies Using Hadoop
Facebook, Yahoo, Amazon, eBay, American Airlines, The New York Times, Federal Reserve Board, Chevron, IBM.
Examples of Applications
- Grouping related documents.
- Searching for uncommon patterns.
- Mining user behavior to generate recommendations.
- Advertisement.
- Searches.
- Security.
Future Outlook
Yahoo predicted that by 2015, 50% of enterprise data would be processed by Hadoop.
Thanks