How to Download, Configure and Run a MapReduce Program in a Cloudera VM
Presented by: Mehakdeep Singh (8868237), Amrit Singh Chaggar (8868228), Ranjodh Singh (8740179)

Outline
- Cloudera Introduction
- Download and Configuration
- MapReduce Implementation

Cloudera - Introduction
Cloudera provides a data platform that helps enterprises manage their rapidly increasing volume and variety of data. Its products and solutions let you:
- deploy and manage Apache Hadoop and related projects
- manipulate and analyze your data
- keep that data secure and protected

Cloudera - Products and Tools
CDH: CDH (Cloudera's Distribution Including Apache Hadoop) is an open-source distribution of Apache Hadoop and related open-source projects, including Cloudera Impala and Cloudera Search.
Cloudera Manager: an end-to-end application for managing CDH clusters. It automates the installation process and reduces deployment time from weeks to minutes.
Cloudera Navigator: a fully integrated data management tool for the Hadoop platform. It is used to audit data access and verify access privileges.

CDH
CDH is a complete, tested, and widely used distribution of Apache Hadoop. It provides:
- Flexibility
- Integration
- Security
- Scalability
- High availability
QuickStarts for CDH 5.8
The Cloudera QuickStart VM (a single-node cluster) makes it easy to get hands-on with CDH for testing, demos, and self-learning. It includes a tutorial, sample data, and scripts for getting started.

Download and Configuration
Prerequisites:
- The QuickStart VMs are 64-bit, so they require a 64-bit host OS and a virtualization product that supports 64-bit guests.
- To use the VMware VM, you need a player compatible with Workstation 8.x or higher: Player 4.x or higher, or Fusion 4.x or higher.
- The VM needs at least 4 GB of RAM to run CDH 5.

Download and Configuration
Versions used:
- VMware Workstation 12.5.3 Player for Windows 64-bit operating systems. Download link: https://my.vmware.com/en/web/vmware/free#desktop_end_user_computing/vmware_workstation_player/12_0
- Cloudera QuickStarts for CDH 5.8: https://www.cloudera.com/downloads/quickstart_vms/5-8.html
NOTE: On a Windows host, make sure that virtualization is enabled in the BIOS settings.

VMware Workstation 12.5.3 Player

Cloudera VM Download Screen

How to Set Up and Configure

MapReduce
A MapReduce job splits a large data set into independent chunks and organizes them into key-value pairs for parallel processing. Processing the chunks in parallel lets the cluster return results more quickly, and a failed task can be rerun on its own, which improves reliability.
In the map phase, the InputFormat divides the input into ranges and a map task is created for each range. The JobTracker distributes those tasks to the worker nodes (under YARN, the default in CDH 5, the ResourceManager and a per-job ApplicationMaster take over this role). The output of each map task is partitioned into groups of key-value pairs, one partition for each reduce task.
In the reduce phase, each reduce task pulls its partition from the machines where the map tasks ran, so it receives the data from all of the maps for its keys. It combines that data to answer the larger problem and writes its output back to HDFS.
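The data flow described above can be imitated on a single machine with ordinary shell tools. The following is only a rough sketch and does not use Hadoop at all: the function name wordcount_local and the sample sentences are made up for illustration. It maps lines to words, sorts so that equal keys end up together (the shuffle), and then reduces each group to a count.

```shell
#!/usr/bin/env bash
# Single-machine imitation of the MapReduce phases for word count.
# wordcount_local is a hypothetical helper, not part of Hadoop.
#   map:     tr splits every line into one word per line (each word acts as a key)
#   shuffle: sort brings identical keys next to each other
#   reduce:  uniq -c counts each group; awk prints "word<TAB>count"
wordcount_local() {
  tr -s '[:space:]' '\n' | grep -v '^$' | LC_ALL=C sort | uniq -c | awk '{ print $2 "\t" $1 }'
}

# Sample input (assumed text, two "records")
printf 'Hello World Bye World\nHello Hadoop Goodbye Hadoop\n' | wordcount_local
```

On a real cluster the same three phases run on many nodes at once, and the sort step is replaced by the framework's partitioning and shuffling of map output.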

MapReduce Processing

How to Run a MapReduce Program
1) Create the input location in HDFS with the following commands:
    sudo su hdfs                                # switch to the hdfs superuser (the default user is cloudera)
    hadoop fs -mkdir /user/cloudera             # create the cloudera folder inside /user
    hadoop fs -chown cloudera /user/cloudera    # make cloudera the owner of /user/cloudera
    exit                                        # leave the hdfs shell and return to the cloudera user
    hadoop fs -mkdir /user/cloudera/wordcount /user/cloudera/wordcount/input
The last command creates two folders:
1. wordcount in /user/cloudera
2. input in wordcount
The output folder is not created here; the job creates it when it runs.

The next step is to create the input files and copy them to HDFS:
    hadoop fs -put file* /user/cloudera/wordcount/input    # copy the files from local storage to HDFS
    cd /usr/lib/hadoop-mapreduce/                          # change to the hadoop-mapreduce directory
This directory contains several sample JAR files, including the one for WordCount.
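The slides do not show how the local input files are produced. One possible way, sketched here with made-up sentences, is to write three small text files named file0, file1 and file2 into a scratch directory so that the file* pattern matches only them. The file contents and the scratch directory are assumptions; the upload runs only if the hadoop command is available.

```shell
#!/usr/bin/env bash
# Create three small sample inputs in a fresh scratch directory.
# The sentences below are illustrative, not the presenters' data.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

echo "Hello World Bye World" > file0
echo "Hello Hadoop Goodbye Hadoop" > file1
echo "Hello Cloudera Hello MapReduce" > file2

ls file*

# Upload to HDFS only when the hadoop CLI exists (for example inside the QuickStart VM).
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -put file* /user/cloudera/wordcount/input
fi
```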

From the hadoop-mapreduce directory, run the example JAR:
    hadoop jar hadoop-mapreduce-examples.jar wordcount /user/cloudera/wordcount/input /user/cloudera/wordcount/output/
This command runs the JAR file. The argument wordcount selects the word count example, and the two paths give the location of the input files and of the output.
The job log reports "Total input paths : 3", which equals the number of input files (file0, file1, file2).
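A MapReduce job fails if its output directory already exists, so running the command a second time needs a cleanup first. The wrapper below is a sketch under that assumption; run_wordcount and the variable names are hypothetical, while the paths are the ones used on the slides. It does nothing outside a machine with the hadoop command installed.

```shell
#!/usr/bin/env bash
# Paths taken from the slides; the variable and function names are illustrative.
INPUT_DIR=/user/cloudera/wordcount/input
OUTPUT_DIR=/user/cloudera/wordcount/output
EXAMPLES_JAR=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar

run_wordcount() {
  # Remove the output of an earlier run; the job refuses to overwrite it.
  if hadoop fs -test -d "$OUTPUT_DIR"; then
    hadoop fs -rm -r "$OUTPUT_DIR"
  fi
  hadoop jar "$EXAMPLES_JAR" wordcount "$INPUT_DIR" "$OUTPUT_DIR"
}

if command -v hadoop >/dev/null 2>&1; then
  run_wordcount
else
  echo "hadoop not found; run this inside the QuickStart VM" >&2
fi
```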

Checking the Output
There are two ways to check the output:
1. through the terminal
2. from the user interface
1. Terminal
    hadoop fs -cat /user/cloudera/wordcount/output/part-r-00000

2. User Interface
Open Firefox and, from Hadoop > Utilities, select "Browse the file system". Navigate to /user/cloudera/wordcount/output and open the file part-r-00000.

Opening the part-r-00000 file shows the output of the job.
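The word count example writes one word and its count per line, separated by a tab, so the result can be ranked with standard tools. The helper below is a sketch; top_words is a hypothetical name and the sample rows are invented. Inside the VM it could be fed from hadoop fs -cat /user/cloudera/wordcount/output/part-r-00000.

```shell
#!/usr/bin/env bash
# top_words N: read "word<TAB>count" lines on stdin and print the N most frequent words.
# Hypothetical helper; N defaults to 10.
top_words() {
  LC_ALL=C sort -t "$(printf '\t')" -k2,2nr -k1,1 | head -n "${1:-10}"
}

# Invented sample rows in the same layout as part-r-00000
printf 'Bye\t1\nHadoop\t2\nHello\t3\n' | top_words 2
```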

References
https://my.vmware.com/en/web/vmware/free#desktop_end_user_computing/vmware_workstation_player/12_0
https://www.cloudera.com/downloads/quickstart_vms/5-8.html
https://www.cloudera.com/documentation/enterprise/latest/topics/cloudera_quickstart_vm.html#xd_583c10bfdbd326ba-3ca24a24-13d80143249--7f9d
https://www.cloudera.com/documentation/other/tutorial/CDH5/topics/ht_usage.html