A Tutorial on Hadoop Cloud Computing: Future Trends

1 A Tutorial on Hadoop Cloud Computing: Future Trends

2 Agenda
- Big Data
- IoT and Big Data
- Hadoop
- Hadoop Components
- MapReduce
- HDFS
- Projects
- Scalability
- Usage Model
- Usage Areas
- Companies
- Future Outlook

3 Big Data
- Data sets that exceed the boundaries and sizes of normal processing capabilities, forcing you to take a non-traditional approach.
- “Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals to name a few.” This data is “big data.”

4 Big Data
- Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured.
- Big data may be as important to business and society as the Internet has become. Why?
- More data may lead to more accurate analyses.

5 Big Data Challenges
- Volume: a big volume of data is gathered, and it is continuously growing.
- Velocity: a lot of data arrives at very high speed from different sources.
- Variety: data comes in all sorts of varieties (audio, video, text, etc.).

7 What is the Internet of Things (IoT)?
- The Internet of Things (IoT) is the interconnection of uniquely identifiable embedded computing devices within the existing Internet infrastructure.
- Objective: a smart world.
- Main components:
  - The things (or assets) themselves.
  - The communication networks connecting them.
  - The computing systems that make use of the data flowing to and from our things.

8 Tools for IoT
- Different companies provide solutions for developers to build and deploy powerful IoT applications, and tools for device manufacturers to quickly add new connected services to their products.
- Some of these companies: Mnubo, Oracle, Swarm, OpenRemote, Etherios, ioBridge, ThingWorx, Arrayent, etc.

9 What Is Different in IoT Regarding Big Data
- Big data is characterized by a particular challenge in one or more of the 3 Vs: Volume, Velocity, and Variety.
- IoT presents challenges in combinations of them.
- The most challenging IoT applications involve both Velocity and Volume, and sometimes also Variety.

10 IoT and Big Data
- One of the most prominent features of IoT is the real-time or near-real-time communication of information about the "connected things".
- For challenging IoT applications, the difficulty lies in doing this at scale (e.g., from tens of thousands to tens of millions of things and above).
- "Smart *" scenarios (smart homes, smart cities, and so on) are in many cases also characterized by the Variety of data sources and of data to be stored and processed.

11 IoT and Big Data - Challenges
- Another important feature of the IoT vision is that by observing the behavior of "many things" it will be possible to gain important insights, optimize processes, etc.
- This vision boils down to several challenges:
  - Storing all events (Velocity and Volume challenge).
  - Running analytical queries over the stored events (Velocity and Volume challenge).
  - Performing analytics (data mining and machine learning) over the data to gain insights (Velocity, Volume, and Variety challenge).

12 IoT and Big Data - Challenges
- Another angle of the IoT vision is the ability to perform real-time analytics: detecting and reacting in real time to opportunities and threats to any business.
- This requirement presents multiple challenges:
  - How to process streaming events on the fly (Velocity challenge).
  - How to store streaming events in the operational database (Velocity challenge).
  - How to correlate streaming events with stored data in the operational database (Velocity and Volume challenge).
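
The three real-time challenges above can be sketched in a few lines of Python. This is a minimal illustration, not any IoT platform's API: it assumes events arrive as hypothetical `(device_id, reading)` tuples, uses a plain dict as a stand-in for the operational database, and uses a per-device threshold and a maximum jump as hypothetical rules for detecting threats.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape: (device_id, reading).
Event = Tuple[str, float]


def process_stream(
    events: Iterable[Event],
    operational_db: Dict[str, List[float]],
    thresholds: Dict[str, float],
    max_jump: float,
) -> List[str]:
    """Process events on the fly, store them, and correlate them with stored data.

    `operational_db` is a plain dict standing in for the operational database;
    `thresholds` and `max_jump` are hypothetical alerting rules.
    """
    alerts: List[str] = []
    for device_id, reading in events:
        history = operational_db.setdefault(device_id, [])
        # React in real time: compare the event against a stored per-device limit.
        if reading > thresholds.get(device_id, float("inf")):
            alerts.append(f"{device_id}: threshold exceeded ({reading})")
        # Correlate the streaming event with the last stored reading for the device.
        if history and abs(reading - history[-1]) > max_jump:
            alerts.append(f"{device_id}: jump from {history[-1]} to {reading}")
        # Store the event so later events (and queries) can use it.
        history.append(reading)
    return alerts


db = {"sensor-1": [20.0]}
stream = [("sensor-1", 21.0), ("sensor-1", 35.0), ("sensor-2", 5.0)]
print(process_stream(stream, db, {"sensor-1": 30.0}, max_jump=10.0))
```

Running it reports two alerts for the 35.0 reading: one because it exceeds the stored threshold, and one because it jumps more than 10.0 from the previously stored reading. A real deployment would use a stream processor and a database rather than a loop and a dict.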

13 Big Data Processing Techniques for IoT
[Diagram: IoT devices and environments generate data, which is handled by big data processing techniques]

14 What is Hadoop?
- The name Hadoop is not an acronym; it's a made-up name.
- The project's creator, Doug Cutting, says: "The name my kid gave a stuffed yellow elephant. Short, relatively easy to spell and pronounce, meaningless, and not used elsewhere."

15 What is Hadoop?
- Hadoop is a framework of tools.
- Hadoop is an open source software project that enables the distributed processing of large data sets across clusters of commodity servers.

16 Objective
- Hadoop supports running applications on big data.

17 Apache
- Hadoop is open source, released under the Apache License.

18 Traditional Approach
- Enterprise approach: big data is processed on powerful computers.
- Processing limit: only so much data can be processed.
- Not scalable.

19 Breaking the Data
- Hadoop approach: big data is broken into pieces.

20 Breaking the Data
- Hadoop approach: the computation runs on each piece, and the results are combined.
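
The break-compute-combine idea on these two slides can be illustrated with a small Python sketch. It is an in-process analogy, not Hadoop code: a thread pool stands in for cluster nodes, and the hypothetical `piece_size` parameter plays the role of a block size.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split_into_pieces(data: Sequence[int], piece_size: int) -> List[Sequence[int]]:
    """Break the data set into fixed-size pieces (the last piece may be shorter)."""
    if piece_size <= 0:
        raise ValueError("piece_size must be positive")
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def compute_piece(piece: Sequence[int]) -> int:
    """The computation run independently on each piece (here, a partial sum)."""
    return sum(piece)


def process(data: Sequence[int], piece_size: int, workers: int = 4) -> int:
    pieces = split_into_pieces(data, piece_size)
    # Each piece is processed independently; the threads stand in for nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(compute_piece, pieces))
    # Combine the partial results into the final answer.
    return sum(partial_results)


print(process(list(range(1, 101)), piece_size=10))
```

Splitting works here because a sum can be computed per piece and then added up; computations that need the whole data set at once do not decompose this easily.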

21 Architecture
- High-level components: MapReduce, HDFS, and Projects.
  - MapReduce and HDFS: how Hadoop is able to break the data and computation into pieces, combine the results, and send them back to the application.
  - Projects: provide assistance to Hadoop in performing different activities.

22 Distributed Model
- Runs on Linux, on low-cost computers (commodity hardware).
- Does not use expensive computers.

23 Task Trackers and Data Nodes
[Diagram: four slave nodes, each running a Task Tracker and a Data Node]
- Task Tracker: processes the small piece of the task assigned to this particular node.
- Data Node: manages the chunk of data assigned to this particular node.
- These nodes are the slaves.

24 Master
[Diagram: a master node running the Job Tracker and the Name Node, with five slave nodes, each running a Task Tracker and a Data Node]

25 MapReduce
[Diagram: the Job Tracker on the master and the Task Trackers on the slaves form the MapReduce layer]

26 HDFS
[Diagram: the Name Node on the master and the Data Nodes on the slaves form the HDFS layer]

27 Applications
- Applications contact the master (Job Tracker and Name Node), not the slaves, to submit their work.

28 Batch Processing
- Applications submitted to the master are placed in a queue and run as batch processing.

29 Role of Job Tracker
- Divides the task.
- Assigns tasks to different nodes.
- Receives and combines the results.
- Sends the final result back to the application.
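
The four duties listed above can be modeled with a toy, single-process class. `ToyJobTracker`, the tracker names, and the round-robin assignment are hypothetical simplifications: the real Job Tracker also weighs free task slots and data locality, and results are combined by reduce tasks running on the Task Trackers rather than inside the Job Tracker itself.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class ToyJobTracker:
    """Hypothetical single-process model of the Job Tracker's duties."""

    def __init__(self, task_trackers: Sequence[str]) -> None:
        if not task_trackers:
            raise ValueError("need at least one task tracker")
        self.task_trackers = list(task_trackers)

    def divide(self, job_input: Sequence[T]) -> List[Sequence[T]]:
        """Divide the job into roughly one task per task tracker."""
        n = len(self.task_trackers)
        size = max(1, -(-len(job_input) // n))  # ceiling division, at least 1
        return [job_input[i:i + size] for i in range(0, len(job_input), size)]

    def assign(self, tasks: List[Sequence[T]]) -> List[Tuple[str, Sequence[T]]]:
        """Assign tasks to task trackers round-robin."""
        n = len(self.task_trackers)
        return [(self.task_trackers[i % n], task) for i, task in enumerate(tasks)]

    def run(self, job_input: Sequence[T],
            work: Callable[[Sequence[T]], R],
            combine: Callable[[List[R]], R]) -> R:
        assignments = self.assign(self.divide(job_input))
        # Each (simulated) task tracker processes its piece of the task.
        partial_results = [work(task) for _, task in assignments]
        # Receive and combine the results; the caller gets the final result back.
        return combine(partial_results)


tracker = ToyJobTracker(["tt1", "tt2", "tt3"])
words = ["a", "b", "c", "d", "e", "f", "g"]
print(tracker.assign(tracker.divide(words)))
print(tracker.run(words, work=len, combine=sum))
```

With seven inputs and three trackers, the job is split into pieces of three, three, and one item, and the combined count is 7.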

30 Role of Name Node
- Keeps the index of each chunk of data.
- Records which chunk of data resides on which data node.
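
The Name Node's index can be pictured as two maps. The sketch below is a hypothetical in-memory model (the class name, file path, block IDs, and node names are invented for illustration); the real Name Node also persists the file namespace and rebuilds block locations from Data Node block reports, which the sketch omits.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ToyNameNode:
    """Hypothetical in-memory index of which chunk lives on which data node."""

    # File path -> ordered list of the chunk (block) IDs that make up the file.
    file_blocks: Dict[str, List[str]] = field(default_factory=dict)
    # Chunk ID -> data nodes that hold a copy of that chunk.
    block_locations: Dict[str, Set[str]] = field(default_factory=dict)

    def add_block(self, path: str, block_id: str, data_nodes: List[str]) -> None:
        """Record a new chunk of `path` and the data nodes that store it."""
        self.file_blocks.setdefault(path, []).append(block_id)
        self.block_locations.setdefault(block_id, set()).update(data_nodes)

    def locate(self, path: str) -> List[Tuple[str, List[str]]]:
        """Return (chunk ID, sorted data nodes) for every chunk of `path`, in order."""
        return [(block_id, sorted(self.block_locations[block_id]))
                for block_id in self.file_blocks.get(path, [])]


name_node = ToyNameNode()
name_node.add_block("/logs/day1.txt", "blk_1", ["dn1", "dn2"])
name_node.add_block("/logs/day1.txt", "blk_2", ["dn3"])
print(name_node.locate("/logs/day1.txt"))
```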

31 Data
- Data goes directly from the data nodes to the application; it does not pass through the master.
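
The read path on this slide can be sketched as follows: the application asks the master only for chunk locations, then fetches the bytes from the data nodes itself. The dictionaries, file path, and node names below are hypothetical stand-ins, not the HDFS client API.

```python
from typing import Dict, List, Tuple

# Hypothetical metadata on the master (Name Node): file -> ordered (chunk ID, data node).
# Only locations live here, never file contents.
NAME_NODE_INDEX: Dict[str, List[Tuple[str, str]]] = {
    "/logs/day1.txt": [("blk_1", "dn1"), ("blk_2", "dn2")],
}

# Hypothetical storage on the slaves (Data Nodes): node -> chunk ID -> bytes.
DATA_NODES: Dict[str, Dict[str, bytes]] = {
    "dn1": {"blk_1": b"hello, "},
    "dn2": {"blk_2": b"hadoop"},
}


def read_file(path: str) -> bytes:
    # Step 1: a small metadata request to the master.
    locations = NAME_NODE_INDEX[path]
    # Step 2: the data itself comes straight from each data node, in chunk order.
    return b"".join(DATA_NODES[node][block_id] for block_id, node in locations)


print(read_file("/logs/day1.txt"))
```

Keeping the bulk data off the master means many applications can read in parallel without the master becoming a bottleneck.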

32 Fault Tolerance for Data
- Hardware failure is handled by the built-in fault tolerance of HDFS.
- HDFS keeps 3 copies of the data, so the data remains available from different nodes.
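
A minimal sketch of how three copies keep data available when a node fails. The rotating placement used here is a hypothetical simplification (the default HDFS placement policy is rack-aware), though 3 does match the default value of HDFS's `dfs.replication` setting; node and block names are invented.

```python
from typing import Dict, List, Sequence, Set

REPLICATION = 3  # the slide's "keeps 3 copies of data"


def place_replicas(block_ids: Sequence[str], data_nodes: Sequence[str],
                   replication: int = REPLICATION) -> Dict[str, List[str]]:
    """Put each block on `replication` distinct nodes, rotating the starting node."""
    if replication > len(data_nodes):
        raise ValueError("not enough data nodes for the requested replication")
    return {
        block: [data_nodes[(i + k) % len(data_nodes)] for k in range(replication)]
        for i, block in enumerate(block_ids)
    }


def pick_replica(block: str, placement: Dict[str, List[str]],
                 dead_nodes: Set[str]) -> str:
    """Return the first live node holding a copy; fail only if every copy is lost."""
    for node in placement[block]:
        if node not in dead_nodes:
            return node
    raise IOError(f"all replicas of {block} are unavailable")


nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]
placement = place_replicas(["blk_1", "blk_2"], nodes)
print(placement)
print(pick_replica("blk_1", placement, dead_nodes={"dn1"}))
```

Here `blk_1` lives on dn1, dn2, and dn3, so losing dn1 still leaves the block readable from dn2; the block is lost only if all three of its nodes fail.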

33 Fault Tolerance for Processing
- If any service fails, the Job Tracker detects the failure and asks another Task Tracker to do the same task (MapReduce).
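
The re-execution described above can be modeled as a retry loop. `TaskTrackerFailure`, `run_with_retry`, and the simulated trackers are hypothetical; real Hadoop detects failures through missed heartbeats and failed-task reports and caps the number of attempts per task, which this sketch simplifies to trying each tracker once.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class TaskTrackerFailure(Exception):
    """Raised by a simulated task tracker that fails while running a task."""


def run_with_retry(task: T, trackers: Sequence[str],
                   run_on: Callable[[str, T], R]) -> Tuple[str, R, List[str]]:
    """Give the same task to one tracker after another until one succeeds.

    Returns (tracker that succeeded, result, trackers that failed).
    """
    failed: List[str] = []
    for tracker in trackers:
        try:
            return tracker, run_on(tracker, task), failed
        except TaskTrackerFailure:
            # The failure is detected; the same task goes to the next tracker.
            failed.append(tracker)
    raise RuntimeError(f"task failed on every tracker: {failed}")


def simulated_run(tracker: str, task: List[int]) -> int:
    if tracker == "tt1":  # pretend this node's hardware has died
        raise TaskTrackerFailure(tracker)
    return sum(task)


print(run_with_retry([1, 2, 3], ["tt1", "tt2", "tt3"], simulated_run))
```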

34 Master Backup
- If the master dies, the whole cluster is affected: the master is a single point of failure.
- The Name Node backs up the tables containing the indexes, and these copies are stored on multiple nodes.

35 Secondary Master Node
- If the master dies, a secondary master is available as a backup at the enterprise level.
- In Hadoop 1, the Secondary Name Node only checkpoints the Name Node's metadata and does not take over automatically; a standby Name Node with failover came with HDFS High Availability in Hadoop 2.

36 Easy Programming
Programmers do not have to worry about:
- Where the file is located.
- How to manage failures.
- How to break computations into pieces.
- How to program for scaling.
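
To illustrate this division of labor, here is a word count where the programmer writes only a map function and a reduce function. `run_job` is a hypothetical in-process stand-in for the framework's grouping step; real Hadoop jobs are typically written as Java `Mapper` and `Reducer` classes, and the framework also handles file locations, failures, and distribution, which this sketch leaves out.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


# --- What the programmer writes ---------------------------------------------
def map_fn(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word in a line of input."""
    for word in line.split():
        yield word.lower(), 1


def reduce_fn(word: str, counts: List[int]) -> int:
    """Add up all the 1s emitted for one word."""
    return sum(counts)


# --- What the framework does (hypothetical, single-process stand-in) --------
def run_job(
    lines: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    grouped: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)  # "shuffle": group values by key
    return {key: reducer(key, values) for key, values in sorted(grouped.items())}


print(run_job(["the cat sat", "The dog sat"], map_fn, reduce_fn))
```

Only `map_fn` and `reduce_fn` contain job-specific logic; everything else could be reused unchanged for a different job.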

37 Scalability
- Hadoop is scalable to thousands (1000s) of nodes.

38 Scalability
[Chart: cost and processing speed vs. number of computers]
- Scalability cost is linear.
- To increase processing power, increase the number of computers.
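
The linear relationship on this slide can be made concrete with an idealized model. It is an assumption, not a measurement: each computer is taken to process a fixed, hypothetical amount of data per hour and to cost the same, and coordination overhead is ignored, so real clusters scale somewhat less than linearly.

```python
def cluster_cost(computers: int, cost_per_computer: float) -> float:
    """Idealized linear cost: every added computer costs the same."""
    return computers * cost_per_computer


def job_hours(data_gb: float, computers: int, gb_per_computer_hour: float) -> float:
    """Idealized processing time: total throughput grows linearly with computers."""
    if computers <= 0:
        raise ValueError("need at least one computer")
    return data_gb / (computers * gb_per_computer_hour)


# Hypothetical figures: 1000 GB of data, 10 GB per computer per hour, 1000 per computer.
for n in (10, 20, 40):
    print(n, cluster_cost(n, 1000.0), job_hours(1000.0, n, 10.0))
```

In this model, doubling the number of computers doubles the cost and halves the processing time.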

39 Projects
- A set of tools to assist Hadoop, managed by Apache.
- They sit alongside the two core components, MapReduce and HDFS.

40 Projects
- Alongside MapReduce and HDFS: Hive, HBase, Mahout, Pig, Oozie, Flume, Sqoop.

41 Usage Model
- Administrators: installation; monitoring and managing the system; tuning the system; the overall working of the software.
- Users: designing the application; importing and exporting data; working with the tools.
- Overlapping roles: users work with admins to tune the system, and admins help end users write applications.

42 Usage Areas
- Social media, retail, financial services, search tools, government, and intelligence.
- Big users of Hadoop include Yahoo, Facebook, Amazon, Twitter, Google, etc.
- Hadoop can be used anywhere there is big data.

43 Companies Using Hadoop
- eBay, American Airlines, The New York Times, Federal Reserve Board, Chevron, IBM, Facebook, Yahoo, Amazon.

44 Examples of Applications
- Grouping related documents
- Searching for uncommon patterns
- Mining user behavior to generate recommendations
- Advertisement
- Searches
- Security

45 Future Outlook
- Forecast (Yahoo): by 2015, 50% of enterprise data will be processed by Hadoop.

46 Thanks

