Published by Milo Thornton, modified over 6 years ago
CSE 482: Big Data Analysis
Lecture 16B: Instructions on how to use Hadoop on Amazon Web Services
Using Amazon Web Services (AWS)
First, you must sign up for an AWS account: go to the AWS website, click Sign Up Now, and follow the on-screen instructions. (You need to provide your credit or debit card information.) AWS will notify you by email when your account is available to use.
Next, obtain your AWS education grant credit. Once approved, add the promo code (which will be sent to your email address) to your AWS account.
You will be given a $35 AWS credit and will not be charged until your credits run out.
Services available on AWS
Amazon Elastic Compute Cloud (EC2): provides access to virtual servers on the AWS cloud computing platform. You can launch an SSH terminal to interact with a server.
Amazon Elastic MapReduce (EMR): a cluster of EC2 servers with Hadoop, Pig, Hive, and other software pre-installed. You can launch an SSH terminal to interact with the cluster's master node.
Logging in to AWS
After Logging in to AWS Account
AWS Management Console
My Account: account information (can also be used to close your account)
My Billing Dashboard: check your bill; redeem your AWS credit
My Security Credentials: create authentication tokens
Billing & Cost Management Dashboard
Click here to redeem your AWS credit
Security Credentials
Click here
Creating Access Keys
Click here
Click here to download and save the access key file (needed to use the AWS API).
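The downloaded access key file holds an access key ID and a secret access key. If you later call the AWS API from the command-line tools or an SDK such as boto3 (an assumption; the slides only use the web console), the pair is usually stored in the shared credentials file `~/.aws/credentials`. The sketch below writes that INI-style file; `write_credentials` is a hypothetical helper and the key values you pass in should come from your own downloaded file.

```python
import configparser
from pathlib import Path


def write_credentials(path, access_key_id, secret_access_key, profile="default"):
    """Store an access key pair in the INI layout of the AWS shared
    credentials file (normally ~/.aws/credentials)."""
    path = Path(path)
    config = configparser.ConfigParser()
    if path.exists():
        # Keep any profiles that are already stored in the file.
        config.read(path)
    config[profile] = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w") as f:
        config.write(f)
    # The secret key grants access to your account: make the file owner-only.
    path.chmod(0o600)
    return path
```

Treat the secret key like a password: never commit the credentials file to a shared repository.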
Using SSH to Connect to AWS Elastic MapReduce
To connect to an AWS cloud computer:
1. Create a public/private key pair for access to EC2.
2. Copy the key file to the machine from which you want to run SSH.
3. Edit the security group. The security group specifies who can connect to the compute cluster you have launched.
4. Launch the AWS EMR cluster.
5. Connect to the cluster using SSH.
6. Terminate the cluster (*VERY IMPORTANT*).
Steps 1-3 need to be performed only once.
Step 1: Creating Key Pairs on EC2
Sign in to AWS (aws.amazon.com).
Click on the "Amazon Elastic EC2" tab.
Click on the "Key Pairs" link.
Click on the "Create Key Pair" button.
Enter a name and save the key file (*.pem).
Download the key file onto the machine from which you want to run SSH. For example, if you want to run ssh from arctic, you need to save the *.pem file in your CSE account on arctic.
Click on Services
Click on EC2
Click on Key Pairs and then the Create Key Pair button
Step 2: Copy the Key File
After creating the key pair, you will obtain a private key file (*.pem).
Connecting from Mac/Linux: save the file to the directory from which you will run SSH.
Connecting from Windows: convert the *.pem private key file to a *.ppk file using PuTTYgen. Download puttygen.exe, then follow the conversion instructions to turn the .pem file into a .ppk file.
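On Mac/Linux, OpenSSH refuses to use a private key file that other users can read, so the saved key is normally locked down with `chmod 400 mykey.pem`. The sketch below does the same from Python and checks the result; the function names are hypothetical and `mykey.pem` is a placeholder for your own key file.

```python
import os
import stat


def protect_key_file(pem_path):
    """Make the key readable by its owner only (same effect as `chmod 400`)."""
    os.chmod(pem_path, stat.S_IRUSR)  # 0o400: owner read, no other bits
    return stat.S_IMODE(os.stat(pem_path).st_mode)


def key_is_protected(pem_path):
    """True if no group or other permission bits are set on the key file,
    which is the condition ssh checks before accepting a private key."""
    mode = stat.S_IMODE(os.stat(pem_path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0
```

For example, `protect_key_file("mykey.pem")` run in the directory holding your key returns the new mode, 0o400.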
Step 3: Edit the Security Group
Click on EC2
Click here to edit the security group
Select the Elastic MapReduce master security group and then click on the Inbound tab.
Add the rule below to the security group:
Type | Protocol | Port Range | Source
SSH  | TCP      | 22         | Anywhere
This allows you to connect to the master node from anywhere. You can instead specify a particular IP address to prevent anyone else from accessing it.
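The Source column is an address range in CIDR notation: in the console, Anywhere corresponds to the IPv4 range 0.0.0.0/0, while a single machine is written as its address with a /32 suffix. The sketch below is a simplified model of how one such inbound rule matches a connecting client; `ssh_rule_allows` is hypothetical, it is not how AWS implements security groups, and the example addresses come from the reserved documentation ranges.

```python
import ipaddress

# "Anywhere" for IPv4; a single address would be e.g. "203.0.113.25/32".
ANYWHERE = "0.0.0.0/0"


def ssh_rule_allows(source_cidr, client_ip, port, rule_port=22):
    """Simplified model of one inbound rule (TCP on rule_port from source_cidr).

    Returns True if a connection from client_ip to the given port matches it.
    """
    if port != rule_port:
        return False
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(source_cidr)
```

With `ANYWHERE` as the source, any IPv4 client can reach port 22; with a /32 source, only that one address can, which is why restricting the rule to your own IP is the safer choice.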
Step 4: Launch AWS EMR Cluster
Click on EMR
Click on Create Cluster
Specify m1.medium as the instance type and set the number of instances to 2 (the larger the number of instances, the more costly the cluster is). Select your key pair from the list provided.
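Because every instance in the cluster is billed for as long as it runs, the cost grows linearly with the number of instances and the running time. A back-of-the-envelope sketch, assuming a single combined hourly rate per instance (in practice EMR adds its own charge on top of the EC2 price); the rate below is a hypothetical placeholder, not a real AWS price, so look up the current figure for your instance type and region.

```python
def estimate_cost(instances, hours, hourly_rate):
    """Approximate cost in dollars: each instance is billed for each hour it runs.

    hourly_rate is a per-instance placeholder (EC2 price plus EMR charge).
    """
    return instances * hours * hourly_rate


def hours_until_credit_used(instances, hourly_rate, credit=35.0):
    """How long a cluster of this size can run before the credit is exhausted."""
    return credit / (instances * hourly_rate)


if __name__ == "__main__":
    rate = 0.10  # hypothetical $/instance-hour, NOT a real AWS price
    print(f"2 instances for a week: ${estimate_cost(2, 24 * 7, rate):.2f}")
    print(f"Credit lasts about {hours_until_credit_used(2, rate):.0f} hours")
```

With that placeholder rate, a 2-instance cluster forgotten for a week would use about $33.60 of the $35 credit, and doubling the instance count halves how long the credit lasts.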
After you have launched the new cluster, wait several minutes until the cluster has started.
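The slides wait by watching the console. If you script your work instead (an assumption; the course only uses the web console), the wait can be a polling loop. This sketch assumes a boto3 EMR client, whose `describe_cluster(ClusterId=...)` call reports the cluster state; `wait_until_ready` is a hypothetical helper, and the client is passed in so the logic can be exercised without an AWS account. A cluster in the WAITING state is up and idle, ready to accept work.

```python
import time

READY_STATES = {"WAITING", "RUNNING"}
STOPPED_STATES = {"TERMINATING", "TERMINATED", "TERMINATED_WITH_ERRORS"}


def wait_until_ready(emr_client, cluster_id, poll_seconds=30,
                     timeout_seconds=1800, sleep=time.sleep):
    """Poll the cluster until it is ready, has stopped, or the timeout expires.

    emr_client is assumed to behave like boto3.client("emr"), whose
    describe_cluster returns {"Cluster": {"Status": {"State": ...}}}.
    """
    waited = 0
    while True:
        cluster = emr_client.describe_cluster(ClusterId=cluster_id)["Cluster"]
        state = cluster["Status"]["State"]
        if state in READY_STATES:
            return state
        if state in STOPPED_STATES:
            raise RuntimeError(f"cluster {cluster_id} stopped in state {state}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"cluster {cluster_id} still {state} after {waited} s")
        sleep(poll_seconds)
        waited += poll_seconds
```

boto3 also provides a ready-made `cluster_running` waiter for EMR; the explicit loop is shown here only to make the states visible.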
Click on the SSH link and read the instructions it displays carefully. Follow them to connect to the cluster using PuTTY (on Windows) or SSH (on Linux).
Step 5A: Connect to EMR from Windows
Read the instructions on how to connect to the EMR cluster from Windows. The key file they refer to is the .ppk file generated in Step 2.
Start PuTTY and enter the cluster name into the Host Name field.
Click on SSH -> Auth and provide the private key file (*.ppk) you generated in Step 2.
Click Open.
Success! You’re now connected to the EMR cluster from PuTTY.
Step 5B: Connect to EMR from Linux
Read the instructions on how to connect to the EMR cluster from Mac/Linux, and note the host name used to connect to the EMR cluster.
Log in to one of the machines on the CSE server (e.g., arctic.cse.msu.edu or black.cse.msu.edu).
Go to the directory that contains the *.pem file (see Step 2).
Invoke ssh to connect to the cluster. Cluster-name is the name given on the previous slide; replace mykey.pem with the name of your *.pem file.
arctic> ssh -i mykey.pem
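The ssh command needs both the key file and the cluster's host name. The sketch below assembles the full command line, assuming the default EMR login user `hadoop`; the host name is a made-up placeholder (copy the real one from the EMR console) and `build_ssh_command` is a hypothetical helper. It rejects non-.pem files because a PuTTY .ppk key from Step 2 cannot be used by OpenSSH directly.

```python
import shlex
from pathlib import Path


def build_ssh_command(key_file, host, user="hadoop"):
    """Return the argument list for: ssh -i <key_file> <user>@<host>.

    user defaults to "hadoop" (assumed EMR login account);
    host is the master node's public DNS name shown in the EMR console.
    """
    if Path(key_file).suffix != ".pem":
        raise ValueError(f"{key_file!r} is not a *.pem private key file")
    return ["ssh", "-i", str(key_file), f"{user}@{host}"]


if __name__ == "__main__":
    # Hypothetical host name used only for illustration.
    cmd = build_ssh_command("mykey.pem", "ec2-198-51-100-7.compute-1.amazonaws.com")
    print(shlex.join(cmd))
```

Passing the returned list to `subprocess.run` would open the same interactive session as typing the command at the arctic prompt.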
Result after opening the SSH connection
Step 6: Terminate Your Cluster
After you have completed the task, terminate the cluster you launched. This is a VERY IMPORTANT step: you will be charged as long as your cluster is still running, and you have only $35 of credit on AWS.
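Terminating from the EMR console is all the slides require. As an optional sketch for anyone scripting their work (an assumption beyond the slides), a boto3 EMR client's `list_clusters` can find clusters still in an active state and `terminate_job_flows` shuts them down. `terminate_all` is a hypothetical helper; pass it a client such as `boto3.client("emr", region_name=...)`. Note that it terminates every active cluster in that region, and only the first page of `list_clusters` results is read.

```python
# Cluster states that still accrue charges.
ACTIVE_STATES = ["STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING"]


def active_cluster_ids(emr_client):
    """IDs of clusters that are still active (and therefore still billed).

    Only the first page of results is read; pagination via Marker is ignored.
    """
    response = emr_client.list_clusters(ClusterStates=ACTIVE_STATES)
    return [c["Id"] for c in response["Clusters"]]


def terminate_all(emr_client):
    """Request termination of every active cluster; return the IDs affected."""
    ids = active_cluster_ids(emr_client)
    if ids:
        emr_client.terminate_job_flows(JobFlowIds=ids)
    return ids
```

Whichever way you terminate, check the console afterwards and confirm the cluster's state shows it has terminated.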