Lecture 16B: Instructions on how to use Hadoop on Amazon Web Services


1 Lecture 16B: Instructions on how to use Hadoop on Amazon Web Services
CSE 482: Big Data Analysis

2 Using Amazon Web Service (AWS)
First, you must sign up for an AWS account. Go to the AWS website and click Sign Up Now. Follow the on-screen instructions. (You need to provide your credit or debit card information.) AWS will notify you by email when your account is available for use.
Next, obtain your AWS education grant credit. Once approved, add the promo code (which will be sent to your email address) to your AWS account. You will be given a $35 AWS credit and will not be charged until your credit runs out.

3 Services available on AWS
Amazon Elastic Compute Cloud (EC2): Provides access to the cloud computing platform. You can launch an SSH terminal to interact with the server.
Amazon Elastic MapReduce (EMR): An EC2 server with Hadoop, Pig, Hive, and other software pre-installed. You can launch an SSH terminal to interact with the EMR server.

4 Logging in to AWS

5 Logging in to AWS

6 After Logging in to AWS Account

7 AWS Management Console
My Account: account information (can use this to close your account)
My Billing Dashboard: check your bill; redeem your AWS credit
My Security Credentials: create authentication tokens

8 Billing & Cost Management Dashboard
Click here to redeem your AWS credit

9 Security Credentials
Click here

10 Creating Access Keys
Click here

11 Creating Access Keys
Click here to download and save the access key file (needed to use the AWS API)

12 Using SSH to AWS Elastic MapReduce
To connect to an AWS cloud computer:
1. Create a public/private key pair for access to EC2.
2. Copy the key file to the machine from which you want to run SSH.
3. Edit the security group. The security group specifies who can connect to the compute cluster you have launched.
4. Launch the AWS EMR cluster.
5. Connect to the cluster using SSH.
6. Terminate the cluster (*VERY IMPORTANT*).
Steps 1-3 need to be performed only once.

13 Step 1: Creating Key Pairs on EC2
1. Sign in to AWS (aws.amazon.com).
2. Click on the "Amazon Elastic EC2" tab.
3. Click on the "Key Pairs" link.
4. Click on the "Create Key Pair" button.
5. Enter a name and save the key file (*.pem).
6. Download the key file onto the machine from which you want to run SSH. For example, if you want to run ssh from arctic, you will need to save the *.pem file in your CSE account on arctic.

14 Step 1: Creating Key Pairs on EC2
Click on Services

15 Step 1: Creating Key Pairs on EC2
Click on EC2

16 Step 1: Creating Key Pairs on EC2
Click on Key Pairs and then the Create Key Pair button

17 Step 1: Creating Key Pairs on EC2

18 Step 1: Creating Key Pairs on EC2

19 Step 2: Copy the Key File
After creating the key pair, you will obtain a private key, a "*.pem" file.
Connecting from Mac/Linux: Save the file to the directory from which you will run SSH.
Connecting from Windows: Convert the "*.pem" private key file to a ".ppk" file using puttygen. Download puttygen.exe, then follow the linked instructions to convert the .pem file to a .ppk file.
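
On Mac/Linux, OpenSSH generally refuses a private key file that other users can read, so it is usually worth tightening the permissions on the copied *.pem file. The following is a minimal sketch, not part of the original slides; the function name `restrict_key_permissions` and the file name `mykey.pem` are illustrative assumptions.

```python
import os
import stat
from pathlib import Path


def restrict_key_permissions(key_path: str) -> int:
    """Make a private key readable by its owner only (same effect as `chmod 400`).

    Returns the resulting permission bits so the caller can check them.
    The function name is hypothetical; it is not part of any AWS tool.
    """
    path = Path(key_path)
    if not path.is_file():
        raise FileNotFoundError(f"No key file at {key_path}")
    os.chmod(path, stat.S_IRUSR)  # 0o400: owner may read, nobody else has access
    return stat.S_IMODE(path.stat().st_mode)


if __name__ == "__main__":
    # "mykey.pem" is a placeholder for the file saved in Step 1.
    print(oct(restrict_key_permissions("mykey.pem")))
```

This applies to the Mac/Linux path only; on Windows the converted .ppk file is handled by PuTTY instead.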

20 Step 3: Edit the Security Group
Click on EC2

21 Step 3: Edit the Security Group
Click here to edit security group

22 Step 3: Edit the Security Group
Select the Elastic MapReduce master and then click on Inbound tab

23 Step 3: Edit the Security Group
Add the rule below to the security group:
Type    Protocol    Port Range    Source
SSH     TCP         22            Anywhere
This allows you to connect to the master node from anywhere. You can instead specify a single IP address as the source so that no one else can connect.
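
The rule above can also be written down as data, which makes the difference between "Anywhere" and a single IP address explicit. This is a sketch under assumptions: the dictionary keys follow the `IpPermissions` shape that the boto3 EC2 client accepts for `authorize_security_group_ingress`, but the slides only use the web console, and the function name and example address are hypothetical.

```python
import ipaddress


def ssh_ingress_rule(source: str = "0.0.0.0/0") -> dict:
    """Describe an inbound rule: SSH over TCP on port 22 from `source`.

    `source` may be a CIDR block or a single IPv4 address; a single
    address is normalized to a /32 block. "0.0.0.0/0" means "Anywhere".
    """
    network = ipaddress.IPv4Network(source, strict=False)
    return {
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "IpRanges": [{"CidrIp": str(network)}],
    }


if __name__ == "__main__":
    print(ssh_ingress_rule())               # open to any address
    print(ssh_ingress_rule("203.0.113.7"))  # documentation-range example address
```

Restricting the source to your own address is the safer choice when your IP address is stable.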

24 Step 4: Launch AWS EMR Cluster
Click on EMR

25 Step 4: Launch AWS EMR Cluster
Click on Create Cluster

26 Step 4: Launch AWS EMR Cluster
Specify m1.medium as the instance type and set the number of instances to 2 (the more instances you use, the higher the cost). Select the key pair from the list provided.
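
To see how the instance count affects cost, the arithmetic can be sketched as follows. The hourly rate used here is a made-up placeholder, not a real AWS price; check the current pricing page before relying on any number. The function names are hypothetical.

```python
def hourly_cost(instance_count: int, rate_per_instance_hour: float) -> float:
    """Total cost per hour for a cluster of identical instances."""
    return instance_count * rate_per_instance_hour


def hours_of_credit(credit: float, instance_count: int,
                    rate_per_instance_hour: float) -> float:
    """How many cluster-hours a given credit covers."""
    cost = hourly_cost(instance_count, rate_per_instance_hour)
    if cost <= 0:
        raise ValueError("instance count and rate must be positive")
    return credit / cost


if __name__ == "__main__":
    PLACEHOLDER_RATE = 0.125  # assumed dollars per instance-hour, NOT a real price
    for count in (2, 4):
        print(count, "instances:", hours_of_credit(35.0, count, PLACEHOLDER_RATE), "hours")
```

Doubling the number of instances halves the time the $35 credit lasts, whatever the actual rate is.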

27 Step 4: Launch AWS EMR Cluster
After you have launched the new cluster, wait for several minutes until the cluster has started.
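
The waiting step can also be scripted. The sketch below polls the cluster state until it is usable; it assumes an EMR client object with a `describe_cluster(ClusterId=...)` method that returns the state under `["Cluster"]["Status"]["State"]`, as the boto3 EMR client does. The slides only use the web console, so the function name and defaults are illustrative.

```python
import time

READY_STATES = {"WAITING", "RUNNING"}
FAILED_STATES = {"TERMINATING", "TERMINATED", "TERMINATED_WITH_ERRORS"}


def wait_until_ready(emr_client, cluster_id, poll_seconds=30, max_polls=40,
                     sleep=time.sleep):
    """Poll the cluster state until it is ready, has failed, or polling times out.

    `emr_client` is assumed to behave like boto3.client("emr").
    `sleep` is injectable so the loop can be exercised without real delays.
    """
    for _ in range(max_polls):
        response = emr_client.describe_cluster(ClusterId=cluster_id)
        state = response["Cluster"]["Status"]["State"]
        if state in READY_STATES:
            return state
        if state in FAILED_STATES:
            raise RuntimeError(f"Cluster {cluster_id} ended in state {state}")
        sleep(poll_seconds)  # still STARTING or BOOTSTRAPPING
    raise TimeoutError(f"Cluster {cluster_id} not ready after {max_polls} polls")
```

With boto3 installed and credentials configured, a call might look like `wait_until_ready(boto3.client("emr"), cluster_id)`, where `cluster_id` is the ID shown for your cluster in the console.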

28 Step 4: Launch AWS EMR Cluster
Click on the SSH link. Read this document carefully. Follow the instructions to connect to the cluster using PuTTY (for Windows) or SSH on Linux.

29 Step 5A: Connect to EMR from Windows
Read the instructions on how to connect to the EMR cluster on Windows.
This is the ppk file generated in Step 2.

30 Step 5A: Connect to EMR from Windows
Start the PuTTY program and enter the cluster name into the Host Name field

31 Step 5A: Connect to EMR from Windows
Click on SSH -> Auth.
Provide the private key file (*.ppk) you generated in Step 2.

32 Step 5A: Connect to EMR from Windows
Click open

33 Step 5A: Connect to EMR from Windows
Click open

34 Step 5A: Connect to EMR from Windows
Success! You're now connected to the EMR cluster from PuTTY

35 Step 5A: Connect to EMR from Windows

36 Step 5B: Connect to EMR from Linux
Read the instructions on how to connect to the EMR cluster on Mac/Linux.
Host name to connect to the EMR cluster.

37 Step 5B: Connect to EMR from Linux
1. Log in to one of the machines on the CSE server (e.g. arctic.cse.msu.edu or black.cse.msu.edu).
2. Go to the directory that contains the *.pem file (see Step 2).
3. Invoke ssh to connect to the cluster. Cluster-name is the name given on the previous slide. Replace mykey.pem with the name of your *.pem file.
arctic> ssh -i mykey.pem
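
The command above is cut off after the key file in this transcript. A sketch of how the full command is typically assembled follows, assuming the login user is `hadoop` (the usual login on an EMR master node) and that the host is the cluster name from the previous slide. Both values are assumptions, not recovered from the slide, and the function name is hypothetical.

```python
import shlex


def build_ssh_command(key_file: str, host: str, user: str = "hadoop") -> list:
    """Return the ssh argument list: identity file first, then user@host.

    `user="hadoop"` is an assumption about the EMR master node login;
    `host` stands for the cluster name shown on the cluster's SSH page.
    """
    return ["ssh", "-i", key_file, f"{user}@{host}"]


def as_shell_line(command: list) -> str:
    """Render the argument list as a single, safely quoted shell line."""
    return shlex.join(command)


if __name__ == "__main__":
    # "cluster-name" is a placeholder; substitute the host name from the console.
    print(as_shell_line(build_ssh_command("mykey.pem", "cluster-name")))
```

Keeping the command as an argument list means it can also be passed directly to `subprocess.run` without shell quoting problems.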

38 Step 5B: Connect to EMR from Linux
Result after opening the SSH connection

39 Step 6: Terminate Your Cluster
After you've completed the task, terminate the cluster you have launched (VERY IMPORTANT). You will be charged for as long as your cluster is running. Note that you have only $35 of credit on AWS.
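
Because a forgotten cluster keeps accruing charges, it can help to script the shutdown. The sketch below lists clusters that are still active and terminates them; it assumes a client that behaves like the boto3 EMR client (`list_clusters` and `terminate_job_flows`). The slides themselves terminate the cluster from the console, and the function name is hypothetical.

```python
ACTIVE_STATES = ["STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING"]


def terminate_active_clusters(emr_client) -> list:
    """Terminate every cluster that is still active and return the IDs.

    `emr_client` is assumed to behave like boto3.client("emr").
    Only the first page of results is examined, which is enough for
    the one or two clusters this walkthrough creates.
    """
    response = emr_client.list_clusters(ClusterStates=ACTIVE_STATES)
    cluster_ids = [cluster["Id"] for cluster in response["Clusters"]]
    if cluster_ids:
        emr_client.terminate_job_flows(JobFlowIds=cluster_ids)
    return cluster_ids
```

With boto3 installed and the access keys from slide 11 configured, a call might look like `terminate_active_clusters(boto3.client("emr"))`. It is still worth confirming in the console that the cluster shows as terminated.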

