Published by Adela Roberts. Modified over 8 years ago.
A Tutorial on Hadoop
Cloud Computing: Future Trends
Agenda
- Big Data
- IoT and Big Data
- Hadoop
- Hadoop Components: MapReduce, HDFS, Projects
- Scalability
- Usage Model
- Usage Areas
- Companies
- Future Outlook
Big Data
Data sets that exceed the boundaries and sizes of normal processing capabilities, forcing you to take a non-traditional approach. "Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few." This data is "big data."
Big Data
Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured. Big data may be as important to business and society as the Internet has become. Why? More data may lead to more accurate analyses.
Big Data Challenges: The 3Vs
- Volume: a big volume of data is gathered, and it is continuously growing.
- Velocity: a lot of data arrives at very high speed from different sources.
- Variety: data comes in all sorts of varieties (audio, video, text, etc.).
What is the Internet of Things (IoT)?
The Internet of Things (IoT) is the interconnection of uniquely identifiable embedded computing devices within the existing Internet infrastructure. Objective: a smart world. Main components:
- The things (or assets) themselves.
- The communication networks connecting them.
- The computing systems that make use of the data flowing to and from our things.
Tools for IoT
Different companies provide solutions for developers to build and deploy powerful IoT applications, and tools for device manufacturers to quickly add new connected services to products. Some of these companies are listed below:
Mnubo, Oracle, Swarm, OpenRemote, Etherios, ioBridge, ThingWorx, Arrayent, etc.
What is Different in IoT Regarding Big Data
Big Data is characterized by a particular challenge in one or more of the 3Vs: Volume, Velocity, and Variety. IoT presents challenges in combinations of them. The most challenging IoT applications impact both Velocity and Volume, and sometimes also Variety.
IoT and Big Data
One of the most prominent features of IoT is its real-time or near-real-time communication of information about the "connected things". For challenging IoT applications, the difficulty lies in doing this at scale (e.g., from tens of thousands of things to tens of millions and above). Smart-* scenarios are in many cases also characterized by the variety of data sources and of data to be stored and processed.
IoT and Big Data: Challenges
Another important feature of the vision of IoT is that by observing the behavior of "many things" it will be possible to gain important insights, optimize processes, etc. This vision boils down to many challenges:
- Storing all events (Velocity and Volume challenge).
- Running analytical queries over the stored events (Velocity and Volume challenge).
- Performing analytics (data mining and machine learning) over the data to gain insights (Velocity, Volume, and Variety challenge).
IoT and Big Data: Challenges
Another angle in the vision of IoT is the ability to perform real-time analytics: how to detect and react in real time to opportunities and threats to any business. This requirement presents multiple challenges:
- How to process streaming events on the fly (Velocity challenge).
- How to store streaming events in the operational database (Velocity challenge).
- How to correlate streaming events with stored data in the operational database (Velocity and Volume challenge).
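To make the "processing streaming events on the fly" challenge concrete, here is a minimal Python sketch. The windowed spike-detection rule, the (timestamp, value) event format, and the threshold are hypothetical illustrations, not part of any specific IoT product:

```python
from collections import deque

def detect_spikes(events, window=5, threshold=2.0):
    """Flag events whose value exceeds `threshold` times the average
    of the previous `window` readings (a made-up example rule)."""
    recent = deque(maxlen=window)  # sliding window of recent values
    alerts = []
    for ts, value in events:
        if len(recent) == window:
            avg = sum(recent) / window
            if value > threshold * avg:
                alerts.append((ts, value))
        recent.append(value)
    return alerts

stream = list(enumerate([10, 11, 9, 10, 10, 55, 10, 12]))
print(detect_spikes(stream))  # [(5, 55)]
```

The point of the sketch is that each event is examined once, with bounded memory, which is what "on the fly" processing at IoT velocities requires.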
Big Data Processing Techniques for IoT
IoT devices and environments generate the data; big data processing techniques consume and analyze it.
What is Hadoop?
The name Hadoop is not an acronym; it is a made-up name. The project's creator, Doug Cutting, explains: "The name my kid gave a stuffed yellow elephant. Short, relatively easy to spell and pronounce, meaningless, and not used elsewhere."
What is Hadoop?
Hadoop is a framework of tools: an open source software project that enables the distributed processing of large data sets across clusters of commodity servers.
Objective
Hadoop supports running applications on Big Data.
Apache
Hadoop is an Apache project: open source, released under the Apache License.
Traditional Approach
The traditional enterprise approach to big data relied on ever more powerful computers. Each computer has a processing limit: only so much data could be processed, and the approach is not scalable.
Breaking the Data
In the Hadoop approach, big data is broken into pieces.
Breaking the Data
In the Hadoop approach, the computation runs on each piece in parallel, and the results are combined.
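The split-compute-combine idea above can be pictured with a toy Python sketch. It is purely illustrative: real Hadoop splits files into HDFS blocks and runs each computation as a MapReduce task on a separate node.

```python
def split_compute_combine(data, pieces=4):
    # Break the big data into pieces (as HDFS splits a file into blocks).
    chunks = [data[i::pieces] for i in range(pieces)]
    # Run the computation on each piece independently (one task per chunk).
    partial_sums = [sum(chunk) for chunk in chunks]
    # Combine the partial results into the final answer.
    return sum(partial_sums)

print(split_compute_combine(list(range(1, 101))))  # 5050
```

Because each chunk is processed independently, the chunks could just as well be handled by different machines, which is exactly what Hadoop does.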
Architecture
High-level components:
- MapReduce and HDFS: how Hadoop breaks the data and computation into pieces, combines the results, and sends them back to the application.
- Projects: provide assistance to Hadoop in performing different activities.
Distributed Model
Hadoop uses a distributed model running on Linux. It does not use expensive computers; it runs on low-cost commodity hardware.
Task Trackers and Data Nodes
Each slave node runs a Task Tracker, which processes the small piece of the task assigned to that particular node, and a Data Node, which manages the chunk of data assigned to that particular node.
Master Node
The master node runs the Job Tracker and the Name Node; the slave nodes run the Task Trackers and Data Nodes.
MapReduce
MapReduce is the processing layer: the Job Tracker on the master and the Task Trackers on the slaves.
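The classic illustration of the MapReduce model is word count. Below is a single-process Python sketch of the map, shuffle, and reduce phases; in real Hadoop these run in parallel across Task Trackers, and the shuffle is performed by the framework rather than by a local sort.

```python
from itertools import groupby

def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reducer(pairs):
    """Reduce phase: sum the counts per word. Input must be sorted
    by key, which is what the shuffle phase guarantees."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

pairs = sorted(mapper(["to be or", "not to be"]))  # sort stands in for shuffle
print(dict(reducer(pairs)))  # {'be': 2, 'not': 1, 'or': 1, 'to': 2}
```

The mapper and reducer never see the whole data set, only one record (or one key group) at a time, which is what lets Hadoop spread them across many nodes.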
HDFS
HDFS is the storage layer: the Name Node on the master and the Data Nodes on the slaves.
Applications
Applications contact the master (the Job Tracker and Name Node), not the slave nodes directly.
Batch Processing
Applications submit jobs to the master, where they wait in a queue; Hadoop processes them in batch.
Role of the Job Tracker
- Divides the task.
- Assigns tasks to different nodes.
- Receives and combines the results.
- Sends the final result back to the application.
Role of the Name Node
The Name Node keeps the index of each chunk of data: which chunk of data resides at which data node.
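The Name Node's index can be pictured as a mapping from (file, chunk) to the data nodes holding that chunk. The file and node names below are made up for illustration; the real Name Node also tracks block sizes, replication state, and much more.

```python
# Toy model of the Name Node's index: which data nodes hold which chunk.
chunk_index = {
    ("weblog.txt", 0): ["node1", "node4", "node7"],
    ("weblog.txt", 1): ["node2", "node5", "node7"],
}

def locate(filename, chunk):
    """Return the data nodes that hold the given chunk of the file."""
    return chunk_index.get((filename, chunk), [])

print(locate("weblog.txt", 1))  # ['node2', 'node5', 'node7']
```

Note that the index holds only metadata: the data itself never passes through the Name Node.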
Data
Data does not pass through the master: it goes directly from the data nodes to the application.
Fault Tolerance for Data
Hardware failure is expected, so HDFS has built-in fault tolerance: it keeps 3 copies of each chunk of data, so the data remains available from different nodes.
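The three-copies policy can be sketched as picking three distinct data nodes per chunk. Real HDFS placement is rack-aware; the random choice here is a simplification for illustration.

```python
import random

def place_replicas(nodes, copies=3):
    """Choose `copies` distinct nodes to hold one chunk of data
    (HDFS defaults to 3 copies; real placement is rack-aware)."""
    return random.sample(nodes, copies)

nodes = ["node%d" % i for i in range(1, 9)]
print(place_replicas(nodes))  # e.g. ['node3', 'node8', 'node1']
```

With copies on three different nodes, any single node (and often a whole rack) can fail without losing the chunk.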
Fault Tolerance for Processing
MapReduce handles service failures: the Job Tracker detects any failure and asks another Task Tracker to do the same task.
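The detect-and-retry behavior can be sketched as follows; the worker functions are hypothetical stand-ins for Task Trackers, and the sequential loop stands in for the Job Tracker's reassignment logic.

```python
def run_with_failover(task, workers):
    """Try the task on each worker in turn; if one fails, hand the
    same task to another (a toy model of Job Tracker failover)."""
    for worker in workers:
        try:
            return worker(task)
        except Exception:
            continue  # failure detected; try the next worker
    raise RuntimeError("task failed on every worker")

def dead_node(task):
    raise RuntimeError("node down")

workers = [dead_node, dead_node, lambda task: task * 2]
print(run_with_failover(21, workers))  # 42
```

Because tasks are deterministic and side-effect free, re-running one on a different node yields the same result, which is what makes this failover safe.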
Master Backup
If the master dies, it is a single point of failure. The Name Node's tables containing the indexes are therefore backed up, and copies are replicated over to multiple nodes.
Secondary Master Node
At the enterprise level, a secondary master node is available as a backup in case the master dies.
Easy Programming
Programmers do not have to worry about:
- Where the file is located.
- How to manage failures.
- How to break computations into pieces.
- How to program for scaling.
Scalability
Hadoop is scalable to thousands of slave nodes.
Scalability Cost
The scalability cost is linear: to increase processing power (processing speed), increase the number of computers.
Projects
A set of tools, managed by Apache, that assist Hadoop alongside MapReduce and HDFS.
Projects
Hive, HBase, Mahout, Pig, Oozie, Flume, Sqoop, etc.
Usage Model
- Administrators: installation, monitoring and managing the system, tuning the system.
- Users: overall working of the software, designing applications, importing/exporting data, working with the tools.
- Overlapping roles: users work with admins to tune the system; admins help end users write applications.
Usage Areas
Social media, retail, financial services, search tools, government, intelligence. Big users of Hadoop include Yahoo, Facebook, Amazon, Twitter, Google, etc. Hadoop can be used anywhere there is big data.
Companies Using Hadoop
Facebook, Yahoo, Amazon, eBay, American Airlines, The New York Times, Federal Reserve Board, Chevron, IBM.
Examples of Applications
- Grouping related documents.
- Searching for uncommon patterns.
- Mining user behavior to generate recommendations.
- Advertisement.
- Searches.
- Security.
Future Outlook
Yahoo predicted that by 2015, 50% of enterprise data would be processed by Hadoop.
Thanks