By Chris immanuel, Heym Kumar, Sai janani, Susmitha

Presentation transcript:

How to download, configure and run a MapReduce program in a Cloudera VM? By Chris immanuel, Heym Kumar, Sai janani, Susmitha

Overview
- What is Big Data?
- MapReduce
- MapReduce Architecture
- Why virtualize Hadoop?
- VMware
- Key features of VMware Server Virtualization
- VMPlayer Configuration
- Running the Wordcount Program

What is Big Data?
- Big data is being generated around us at all times.
- Big data arrives at high velocity, in large volume and in great variety.
- Extracting meaningful value from big data requires adequate processing power, analytics capability and skills.
- Big data can be processed and analysed with the MapReduce technique.

MapReduce
- MapReduce is the programming abstraction behind Hadoop.
- The unit of execution is the job.
- A job has an input, an output, a Map function and a Reduce function.
- The input and output consist of key-value pairs; the Map and Reduce functions are provided by the developer.
- A job has two phases: the Mapping phase and the Reducing phase.
- In the Mapping phase, the Map function is applied to the input data and intermediate data is generated.
- In the Reducing phase, the Reduce function is applied to the intermediate data and the final output is generated.
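The two phases can be illustrated with a minimal in-memory sketch of a word count job. This is not Hadoop code: the names map_words, reduce_counts and run_job are hypothetical helpers chosen for this illustration, and the grouping step stands in for the shuffle that a real cluster would perform between the phases.

```python
from collections import defaultdict


def map_words(_key, line):
    """Map function: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce function: sum all counts that were emitted for one word."""
    return word, sum(counts)


def run_job(records, map_fn, reduce_fn):
    """Hypothetical single-process job runner (an assumption, not a Hadoop API)."""
    # Mapping phase: apply map_fn to every input pair and group the
    # intermediate pairs by key.
    intermediate = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            intermediate[out_key].append(out_value)
    # Reducing phase: apply reduce_fn to each key and its grouped values.
    return dict(reduce_fn(k, vs) for k, vs in sorted(intermediate.items()))


# Input as key-value pairs: (line number, line text).
lines = [(0, "big data is big"), (1, "data is everywhere")]
word_counts = run_job(lines, map_words, reduce_counts)
print(word_counts)
```

On a real cluster the Map and Reduce functions run on different nodes and the framework handles the grouping; the sketch only shows how the key-value pairs flow from input to final output.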

MapReduce Architecture

Why virtualize Hadoop?
- Simplified Hadoop cluster configuration and provisioning.
- Support for Hadoop usage in existing virtualized datacenters.
- Support for multi-tenant environments.

VMware
- VMware, a subsidiary of Dell Technologies at the time of this presentation, provides cloud and virtualization software and services that are changing the IT landscape and the way people compute.
- With virtualization it is possible to run multiple operating systems and multiple applications on the same server at the same time, increasing the utilization and flexibility of the hardware.

Key Features of VMware Server Virtualization
Partitioning:
- Different operating systems can run on one physical machine.
- System resources can be divided between virtual machines.
Isolation:
- Fault and security isolation at the hardware level.
- Extended resource control for consistent performance.
Encapsulation:
- The complete state of a virtual machine can be stored in a file.
- Moving and copying a virtual machine is as easy as moving and copying files.

VMPlayer Configuration
The VMPlayer host system should have the following configuration:
- 64-bit x86 CPU with a core speed of 1.3 GHz or faster.
- Multiprocessor systems should have 8 GB of RAM.
The virtual machine we used:
- Runs a Linux operating system.
- Runs on 2 processor cores.
- Occupies 3.5 GB of memory.
- Has a 40 GB hard disk.

Running the Wordcount Program

OUTPUT

Configuring HDFS to run in Pseudo-distributed mode
- To start the NameNode and the other HDFS daemons, use the command start-dfs.sh.
- To list the running Java processes, including the Hadoop daemons, use the command jps.
- To stop HDFS, use the command stop-dfs.sh.
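After starting HDFS, the jps listing can be checked automatically. The sketch below is a small helper under stated assumptions: it assumes that a pseudo-distributed setup should show the NameNode, DataNode and SecondaryNameNode processes, and that jps prints one "<pid> <name>" pair per line. The function names and the sample output (including the process IDs) are hypothetical and only serve as an illustration.

```python
# Daemons assumed to be running after start-dfs.sh on a single-node setup.
EXPECTED_DAEMONS = {"NameNode", "DataNode", "SecondaryNameNode"}


def running_daemons(jps_output):
    """Return the process names found in jps output ("<pid> <name>" per line)."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            names.add(parts[1].strip())
    return names


def missing_daemons(jps_output, expected=EXPECTED_DAEMONS):
    """Return the expected daemons that do not appear in the jps output."""
    return sorted(set(expected) - running_daemons(jps_output))


# Hypothetical jps output; the process IDs are made up for this example.
# On the VM, real output could be captured with:
#   subprocess.run(["jps"], capture_output=True, text=True).stdout
sample_output = """4821 NameNode
4950 DataNode
5210 Jps
"""
print(missing_daemons(sample_output))
```

An empty list means every expected daemon was found; any name in the list points to a daemon that did not start or has already stopped.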

THANK YOU!