1
Hadoop Architecture Mr. Sriram
2
Objectives
Hadoop Cluster – A typical usage
Hadoop 2.X Cluster Architecture
Analyze Hadoop 2.X Cluster Architecture - Federation
Analyze Hadoop 2.X Cluster Architecture - High Availability
Hadoop 2.X Resource Management
Run Hadoop in different Cluster modes
Installation & Configuration of Hadoop
Implement basic Hadoop commands on terminal
Prepare Hadoop 2.X configuration files and analyze the parameters in them
Implement different loading techniques
3
Hadoop Architecture
Hadoop Cluster Typical Use
Hadoop 2.X Cluster Architecture
Hadoop 2.X Cluster Architecture – Federation
Hadoop 2.X Cluster Architecture – High Availability
Hadoop 2.X Resource Management
Hadoop Cluster – Facebook
Hadoop Cluster Modes
4
Hadoop Cluster – A Typical Use Case
5
Hadoop 1.0 Cluster
6
Hadoop 2.0 Cluster
7
Hadoop 2.X Cluster Master-Slave Architecture
8
Hadoop 2.X Cluster Architecture
9
Hadoop 2.X Cluster Architecture - Federation
10
Hadoop 2.X Cluster Architecture – High Availability
11
Hadoop 2.X Resource Management
12
Hadoop 2.X Resource Management..
13
Hadoop Cluster - Facebook
14
Hadoop Cluster Modes
15
Hadoop Installation & Configuration
Hadoop FS Shell Commands
Terminal Commands
Hadoop 2.X Configuration Files
-> core-site.xml
-> hdfs-site.xml
-> mapred-site.xml
-> yarn-site.xml
Slaves & Masters
Per-Process Run Time Environment
Hadoop Daemons
Hadoop Web UI Parts
16
Hadoop Installation Pre-requisites
Install Java (version 6 or later)
Use java -version to check the Java installation
Use which java to locate the Java directory
Hadoop runs on Unix and Windows
Linux is the only supported production platform
Windows and Mac are supported only as development platforms
Windows additionally requires Cygwin to run
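The two Java checks on this slide can be combined into one small script. This is a minimal sketch, not part of the original deck: the function names have_cmd and check_java are hypothetical, and the script only reports what it finds on PATH.

```shell
#!/bin/sh
# Return success if the named command is on PATH.
# (have_cmd is a hypothetical helper name, not a Hadoop tool.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether Java is available, mirroring the slide's two checks.
check_java() {
    if have_cmd java; then
        # Same idea as `which java`: show where the binary lives.
        echo "java found at: $(command -v java)"
        # `java -version` writes to stderr, so redirect it to stdout.
        java -version 2>&1 | head -n 1
    else
        echo "java not found on PATH; install a JDK before installing Hadoop"
        return 1
    fi
}
```

Calling check_java before starting the installation gives an early, readable failure instead of an error from a Hadoop script later on.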
17
Hadoop Installation
We can install Hadoop in any of the following ways:
1) Automated method using Cloudera Manager
2) Manual methods described below:
i. Install from a CDH5 tarball
ii. Install from RPMs
18
Hadoop Installation from a Tarball
Downloading the Tarball
Download a stable Hadoop release from the Cloudera Hadoop release page
Unpacking the Tarball
Unpack the Hadoop archive to /home/$USER/ using
$ tar xzf hadoop-x.y.z.tar.gz
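The unpacking step can be wrapped so that the destination directory is explicit. A minimal sketch, assuming a tar that supports the -C option (GNU and BSD tar both do); unpack_tarball is a hypothetical helper name, and hadoop-x.y.z.tar.gz remains the slide's placeholder for the real file name.

```shell
#!/bin/sh
# Unpack a gzipped tarball into a destination directory.
# $1 = path to the archive, $2 = destination directory (created if missing).
unpack_tarball() {
    mkdir -p "$2" && tar xzf "$1" -C "$2"
}
```

For the slide's case this would be called as unpack_tarball hadoop-x.y.z.tar.gz "/home/$USER", which leaves the release under /home/$USER/hadoop-x.y.z.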
19
Hadoop Installation from a Tarball
Setting Environment Variables
Edit conf/hadoop-env.sh in the Hadoop folder
Specify the JAVA_HOME variable by adding export JAVA_HOME=/usr/java/
Edit the .profile file using any text editor and set HADOOP_HOME, JAVA_HOME, and the necessary CLASSPATHs
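As an illustration of the .profile edit, the sketch below appends the export lines to a file of your choice. The helper name append_hadoop_env and the example paths in the usage note are assumptions, not values from the slides; the PATH line is an extra convenience that the slide does not mention.

```shell
#!/bin/sh
# Append JAVA_HOME / HADOOP_HOME exports to a profile file.
# $1 = profile file, $2 = Java install dir, $3 = Hadoop install dir.
# $2 and $3 are expanded now; \$PATH and \$HADOOP_HOME are written
# literally so they are expanded each time the profile is sourced.
append_hadoop_env() {
    cat >> "$1" <<EOF
export JAVA_HOME=$2
export HADOOP_HOME=$3
export PATH=\$PATH:\$HADOOP_HOME/bin:\$HADOOP_HOME/sbin
EOF
}
```

A call might look like append_hadoop_env "$HOME/.profile" /usr/java/default "$HOME/hadoop-x.y.z", where both directories are placeholders for your own install locations. Open a new login shell or source the file for the variables to take effect.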
20
Hadoop Installation from RPM
Download the CDH package that matches your Red Hat or CentOS system from the Cloudera one-click install page: archive.cloudera.com/cdh4/one-click-install/redhat
Install the RPM:
$ sudo yum --nogpgcheck localinstall cloudera-cdh-4-0.x86_64.rpm
Optionally add a repository key:
$ sudo rpm --import http://archive.cloudera.com/cdh4/redhat/5/x86_64/cdh/RPM
Install Hadoop in pseudo-distributed mode:
$ sudo yum install hadoop-0.20-conf-pseudo
21
Hadoop Installation from Cloudera Manager
1. Download and run the Cloudera Manager Installer
Download cloudera-manager-installer.bin from the Cloudera Downloads page
2. After downloading cloudera-manager-installer.bin, give it executable permission
$ chmod u+x cloudera-manager-installer.bin
3. Run cloudera-manager-installer.bin
$ sudo ./cloudera-manager-installer.bin
Read the Cloudera Manager Readme and then press Enter to choose Next
22
Hadoop Installation from Cloudera Manager
4. Start the Cloudera Manager Admin Console
Log into Cloudera Manager. The default credentials are:
Username: admin
Password: admin
Use Cloudera Manager for automated CDH installation and configuration
Find the cluster hosts by specifying hostnames and IP address ranges
Click Search
Cloudera Manager identifies the hosts on your cluster so that you can configure them for CDH
Choose the CDH version to install
23
Hadoop FS Shell Commands
24
Terminal Commands
25
Terminal Commands – mkdir, touchz, ls, count
Terminal type: admin terminal # | user terminal $
To make a directory
$ hadoop fs -mkdir /user/cloudera/Monday
To create an empty file
$ hadoop fs -touchz /user/cloudera/Monday/one.txt
To list the files and directories present in an HDFS location
$ hadoop fs -ls /user/cloudera/Monday
To count the number of files and directories available in an HDFS location
$ hadoop fs -count /user/cloudera/Monday
26
Terminal Commands - Copy
To copy a file from the local file system (LFS) to HDFS
$ hadoop fs -put /home/cloudera/Desktop/two.txt /user/cloudera/Monday
(or)
$ hadoop fs -copyFromLocal /home/cloudera/Desktop/three.txt /user/cloudera/Monday
To copy a file from HDFS to LFS
$ hadoop fs -get /user/cloudera/Monday/two.txt /home/cloudera/Desktop/Tuesday
(or)
$ hadoop fs -copyToLocal /user/cloudera/Monday/one.txt /home/cloudera/Desktop/Tuesday
27
Terminal Commands – cat, rm
To print the contents of an HDFS file
$ hadoop fs -cat /user/cloudera/Monday/two.txt
(or)
$ hadoop fs -text /user/cloudera/Monday/two.txt
To remove a directory from an HDFS location
$ hadoop fs -rm -r /user/cloudera/Monday
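The commands from the last three slides can be chained into one walkthrough. This is a sketch under stated assumptions: hfs and hdfs_walkthrough are hypothetical helper names, and the DRY_RUN switch is an addition so that the sequence can be previewed on a machine without Hadoop installed.

```shell
#!/bin/sh
# Run `hadoop fs ...`, or only print the command when DRY_RUN=1.
hfs() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "hadoop fs $*"
    else
        hadoop fs "$@"
    fi
}

# Create a directory, upload a file, inspect it, then clean up.
# $1 = HDFS working directory, $2 = local file to upload.
# Each step runs only if the previous one succeeded.
hdfs_walkthrough() {
    hfs -mkdir "$1" &&
    hfs -put "$2" "$1" &&
    hfs -ls "$1" &&
    hfs -count "$1" &&
    hfs -cat "$1/$(basename "$2")" &&
    hfs -rm -r "$1"
}
```

With DRY_RUN=1 set, hdfs_walkthrough /user/cloudera/Monday /home/cloudera/Desktop/two.txt prints the six hadoop fs commands in order without touching the cluster; unset DRY_RUN to execute them.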
28
Hadoop Configuration Files
Each component in Hadoop is configured using an XML file
These XML files are located in the conf subdirectory of the Hadoop folder
The three most important XML files are:
core-site.xml: core properties
hdfs-site.xml: HDFS properties
mapred-site.xml: MapReduce properties
To run Hadoop in a particular mode, you need to do two things:
Set the appropriate properties in the configuration files
Start the Hadoop daemons
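To make the file layout concrete, the sketch below writes one-property versions of the three files for a pseudo-distributed setup. The property names fs.defaultFS, dfs.replication, and mapreduce.framework.name are standard Hadoop 2.x properties; the host, port, and replication values are illustrative assumptions rather than values from the slides, and write_site_xml and write_pseudo_conf are hypothetical helper names.

```shell
#!/bin/sh
# Write a one-property Hadoop *-site.xml file.
# $1 = output file, $2 = property name, $3 = property value.
# Values are written as-is, so they must not contain XML special characters.
write_site_xml() {
    cat > "$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

# Write minimal pseudo-distributed settings into directory $1.
# The host, port, and replication factor below are assumptions.
write_pseudo_conf() {
    mkdir -p "$1" &&
    write_site_xml "$1/core-site.xml"   fs.defaultFS hdfs://localhost:8020 &&
    write_site_xml "$1/hdfs-site.xml"   dfs.replication 1 &&
    write_site_xml "$1/mapred-site.xml" mapreduce.framework.name yarn
}
```

Writing to a scratch directory first, for example write_pseudo_conf /tmp/hadoop-conf, lets you review the generated files before copying them over a real configuration directory. Real deployments usually carry many more properties per file.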
29
Hadoop 2.x Configuration Files
30
Hadoop 2.x Configuration Files – Apache Hadoop
31
core-site.xml
32
hdfs-site.xml
33
mapred-site.xml
34
yarn-site.xml
35
Slaves & Masters
36
Per-Process Run Time Environment
37
All Properties
38
Running Hadoop
Before Hadoop can be used, a brand-new HDFS installation needs to be formatted
To format the NameNode:
$ hadoop namenode -format
To start the HDFS and MapReduce daemons:
$ start-dfs.sh
$ start-mapred.sh
Or use $ start-all.sh to start all daemons
If you have placed configuration files outside the default conf directory, start the daemons with the --config option:
$ start-xyz.sh --config path-to-config-directory
To stop the daemons:
$ stop-dfs.sh
$ stop-mapred.sh
Or use $ stop-all.sh to stop all daemons
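The effect of the --config option on the start sequence can be sketched as a small function that prints the commands in order. startup_commands is a hypothetical helper name. It uses the script names from this slide; on a Hadoop 2.x cluster the YARN daemons are started with start-yarn.sh instead of start-mapred.sh. Formatting the NameNode is left out because it is a one-time step for a new installation.

```shell
#!/bin/sh
# Print the daemon start-up commands in order.
# $1 (optional) = configuration directory outside the default location.
startup_commands() {
    if [ -n "${1:-}" ]; then
        cfg=" --config $1"
    else
        cfg=""
    fi
    echo "start-dfs.sh$cfg"      # HDFS daemons first
    echo "start-mapred.sh$cfg"   # then the MapReduce daemons
}
```

Running startup_commands /etc/hadoop/conf.custom (a made-up directory) prints both commands with the same --config argument, which is a quick way to confirm that every daemon will read the same configuration.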
39
Hadoop 2.x Daemons
40
Hadoop Daemons
41
Hadoop Web UI Parts
42
Hadoop Web UI URLs
43
Hadoop Stack
44
Data Loading Techniques and Data Analysis
45
Data Loading using Flume
46
Data Loading using SQOOP
47
Further Reading
48
Further Reading..
49
Thank You !!!!!!!!!!!