Published by Elinor Tyler. Modified over 9 years ago.
2
Big Data & Hadoop, by Mr. Nataraj
3
The smallest unit is the bit.
1 byte = 8 bits
1 KB (kilobyte) = 1024 bytes = 1024 * 8 bits
1 MB (megabyte) = 1024 KB = (1024)^2 * 8 bits
1 GB (gigabyte) = 1024 MB = (1024)^3 * 8 bits
1 TB (terabyte) = 1024 GB = (1024)^4 * 8 bits
1 PB (petabyte) = 1024 TB = (1024)^5 * 8 bits
1 EB (exabyte) = 1024 PB = (1024)^6 * 8 bits
1 ZB (zettabyte) = 1024 EB = (1024)^7 * 8 bits
1 YB (yottabyte) = 1024 ZB = (1024)^8 * 8 bits
1 XB (xenottabyte) = 1024 YB = (1024)^9 * 8 bits (an informal name, not a standardized unit)
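The unit ladder can be checked with a few lines of Python. This is a minimal sketch that assumes the 1024-based convention used on the slide; the names `UNITS`, `unit_sizes`, and `to_bits` are made up for illustration.

```python
# Units from the slide, smallest to largest (standardized names only).
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def unit_sizes():
    """Return {unit: size in bytes}, each unit being 1024 times the previous one."""
    return {name: 1024 ** (i + 1) for i, name in enumerate(UNITS)}


def to_bits(n_bytes):
    """Convert a byte count to bits (1 byte = 8 bits)."""
    return n_bytes * 8


sizes = unit_sizes()
for name, n_bytes in sizes.items():
    print(f"1 {name} = {n_bytes} bytes = {to_bits(n_bytes)} bits")
```

Python integers have arbitrary precision, so even the yottabyte value is computed exactly.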
4
1 byte: a single character
1 KB: a very short story
1 MB: a small novel (or 6 seconds of TV-quality video)
1 GB: a pickup truck filled with paper
1 TB: 50,000 trees made into paper
2 PB: all US academic research libraries
5 EB: all words ever spoken by human beings
5
WHAT IS BIG DATA
6
SOME INTERESTING FACTS
Google: 2,000,000 queries per second
Facebook: 34,000 likes per minute
Online shopping: USD 300,000 per minute
Twitter: 100,000 tweets per minute
YouTube: 600 new videos uploaded per minute
Barack Obama's campaign used Big Data to win the election
Driverless cars use Big Data processing to drive
7
AT&T transfers about 30 petabytes of data through its networks each day.
Google processed about 24 petabytes of data per day in 2009.
The 2009 movie Avatar is reported to have taken over 1 petabyte of local storage at Weta Digital for the rendering of the 3D CGI effects.
8
As of January 2013, Facebook users had uploaded over 240 billion photos, with 350 million new photos every day. For each uploaded photo, Facebook generates and stores four images of different sizes, which translates to a total of 960 billion images and an estimated 357 petabytes of storage.
Processing capability
Google: processes 20 PB a day
Facebook: 2.5 PB of user data + 15 TB/day
eBay: 6.5 PB of data + 50 TB/day
9
Doug Cutting, while working on the Lucene project (a search engine for searching documents), ran into problems of storage and computation and was looking for distributed processing.
Google published a paper on GFS (Google File System).
Doug Cutting and Michael Cafarella implemented the ideas from GFS and came out with Hadoop.
10
WHAT IS HADOOP
A framework written in Java for running applications on large clusters of commodity hardware.
It mainly contains 2 parts:
– HDFS for storing data
– MapReduce for processing data
It maintains fault tolerance by replicating data (the replication factor).
11
employee.txt (eno, ename, empAge, empSal, empDes)
101,prasad,t20,1000,lead
12
Assume you have around 10^22 records and you would like to find all the employees above 60 years of age. How would you program this traditionally?
At a rate of 10 GB in 10 minutes, 1 TB takes about 1,000 minutes, or roughly 16 hours.
Google processes 20 PB of data per day. At the same rate, 20 PB (20,000 TB) would take about 20,000,000 minutes on a single machine: roughly 333,000 hours, or about 38 years.
INSPIRATION FOR HADOOP
To store huge data (unlimited)
To process huge data
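A traditional, single-machine answer to the question above is a sequential scan. The sketch below is only an illustration under stated assumptions: the sample rows are hypothetical (they follow the employee.txt layout, with a numeric empAge), the function names are invented, and the scan rate is the slide's 10 GB in 10 minutes (1 GB per minute) with decimal units (1 TB = 1,000 GB).

```python
import csv
import io

# Hypothetical sample rows in the layout eno,ename,empAge,empSal,empDes.
SAMPLE = """101,prasad,62,1000,lead
102,anita,45,2000,manager
103,ravi,61,1500,engineer
"""


def employees_older_than(lines, min_age):
    """Scan records one by one and yield names whose empAge exceeds min_age."""
    for row in csv.reader(lines):
        if len(row) != 5:
            continue  # skip blank or malformed rows
        _eno, ename, emp_age, _emp_sal, _emp_des = row
        if int(emp_age) > min_age:
            yield ename


def scan_minutes(size_gb, gb_per_minute=1.0):
    """Estimated minutes for one machine to scan size_gb at the assumed rate."""
    return size_gb / gb_per_minute


older = list(employees_older_than(io.StringIO(SAMPLE), 60))
print(older)                                  # ['prasad', 'ravi']

one_tb_hours = scan_minutes(1_000) / 60       # about 16.7 hours
twenty_pb_days = scan_minutes(20_000_000) / 60 / 24
print(round(one_tb_hours, 1), round(twenty_pb_days))
```

The scan itself is trivial; the problem is the time. Because every record is independent, the work can be divided across many machines, which is the motivation for a distributed framework.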
13
Node: a single computer with its own processor and memory.
Cluster: a combination of nodes working as a single unit.
Commodity hardware: cheap, non-reliable hardware.
Replication factor: data is duplicated and saved in more than one place; the factor is the number of copies.
Data locality optimization: data is processed on the node where it is stored.
14
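Block: a part of the data. A file is split into blocks that are spread across nodes (node1: data1, node2: data2, node3: data3). For example, 1 file of 200 MB can be split into four 50 MB blocks.
Block size: the size of data that can be stored as a single unit. In Apache Hadoop it is 64 MB (configurable).
1 GB in Apache Hadoop = 16 blocks
65 MB in Apache Hadoop = one 64 MB block + one 1 MB block
Replication: duplicating the data. The replication factor is 3.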
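The block arithmetic above can be reproduced with a short sketch. It is not how HDFS is implemented: `split_into_blocks` and `place_replicas` are hypothetical helpers, and the round-robin placement is a simplifying assumption (real HDFS placement also considers racks and free space).

```python
def split_into_blocks(file_size_mb, block_size_mb=64):
    """Return the list of block sizes (in MB) for a file of the given size."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    blocks = [block_size_mb] * full_blocks
    if remainder:
        blocks.append(remainder)  # the last block holds only the leftover data
    return blocks


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin (assumed policy)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


print(len(split_into_blocks(1024)))   # 16 blocks for 1 GB
print(split_into_blocks(65))          # [64, 1]
print(split_into_blocks(200, 50))     # [50, 50, 50, 50]
print(place_replicas(2, ["node1", "node2", "node3"]))
```

Note that the 1 MB tail of a 65 MB file occupies only 1 MB, not a full 64 MB block.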
16
SCALING
Vertical scaling: adding more powerful hardware to an existing system. It scales only up to a certain limit.
Horizontal scaling: adding a completely new node to an existing cluster. It scales up to many nodes.
17
3 V's of Big Data
Volume: the amount of data generated
Variety: structured data (for example, database tables) and unstructured data
Velocity: the frequency at which data is generated
18
1. Hadoop believes in scaling out instead of scaling up: when more power is needed, buy more oxen rather than trying to grow a more powerful ox.
2. Hadoop works on structured as well as unstructured data, whereas an RDBMS only works with structured data. (However, nowadays many NoSQL databases have come onto the market, such as MongoDB and Couchbase.)
3. Hadoop works with key-value pairs rather than data in columns.
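To make the key-value point concrete, here is a word count written in the map, shuffle, and reduce style. It is a plain-Python sketch that runs in one process and does not use the Hadoop API; the function names are chosen for illustration only.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) key-value pair for every word in the line."""
    for word in line.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Group all values by key, as happens between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single (key, total) pair."""
    return (key, sum(values))


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(word_count(["big data", "big cluster"]))
# {'big': 2, 'data': 1, 'cluster': 1}
```

Because each line is mapped independently and each key is reduced independently, both steps can be spread across many nodes.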
19
No doubt Hadoop is a framework for processing big data, but it is not the only one. Below are a few alternatives:
Apache Spark
GraphLab
HPCC Systems (High Performance Computing Cluster)
Dryad
Stratosphere
Storm
R3
Disco
Phoenix
Plasma
20
DOWNLOADING HADOOP
You can download Hadoop from:
http://hadoop.apache.org/releases.html
http://apache.bytenet.in/hadoop/common/
· 18 November, 2014: Release 2.6.0 available
· 27 June, 2014: Release 0.23.11 available
· 1 Aug, 2013: Release 1.2.1 (stable) available
21
HADOOP
1. Storing Huge Data
2. Processing Huge Data
Hadoop Daemons
1. Name Node
2. Secondary Name Node
3. Job Tracker
4. Task Tracker
5. Data Node
23
Modes in Hadoop
Standalone Mode
Pseudo Distributed Mode
Fully Distributed Mode

Standalone Mode
It is the default mode
Runs on 1 node
No separate processes (daemons) will be running
Everything runs in a single JVM
Used for small development, testing, and debugging
24
Pseudo Distributed Mode
1. A single node, but a cluster is simulated
2. Each daemon runs as a separate process in its own JVM
3. Used for development and debugging
25
Fully Distributed Mode
1. Multiple nodes
2. Hadoop runs on a cluster of machines/nodes
3. Used in production environments
26
Hadoop Architecture
28
Hive, Pig, Sqoop, Avro, Flume, Oozie, HBase, Cassandra
29
Job Tracker
30
Job Tracker contd..
32
Job Tracker Contd..
33
HDFS write