
SHARE THE KNOWLEDGE CopyLeft © 2012 BIWORLD — Hadoop Installation Overview — 2012. 10. 15 (Mon) — BIWORLD administrator 김기선

1 / 15 BI WORLD — Prerequisites
1. Hadoop installation prerequisites – installing Linux
Prerequisites: a Red Hat-family distribution (Red Hat Enterprise Linux / Fedora; CentOS 6 is used in the steps below)

2 / 15 BI WORLD — Prerequisites
** Linux installation overview
- Install only the minimum set of packages.
- Run only the minimum set of daemons.

Linux installation (CentOS 6)
1. Media Test: select [Skip] / [Enter]
2. CentOS 6 installer: [Next]
3. Language: [한국어 (Korean)]
4. Keyboard: [Next]
5. Type of installation devices: select [Basic Storage Devices]
6. If a system is already installed: select [Fresh Installation]
7. Hostname: enter namenode (on the other machines: secondnode, datanode1, datanode2, ...); [Configure Network]: network settings option
8. Time zone (Seoul): [Next]
9. Root password: enter ***
10. Installation type: select [Replace Existing Linux System(s)] and check [Review and modify partitioning layout]
11. Edit partitions: size the partitions as appropriate, then [Next] -> on the format warning popup click [Format]; on the "Writing storage configuration to disk" popup click [Write changes to disk]
12. Boot Loader: [Next]
13. Software selection: select [Minimal] + customized installation
   * Development: Development Tools
   * Base System: Base, Networking Tools, Legacy UNIX Compatibility
   * Servers: FTP Server, System Administration Tools
   * System Management: SNMP Support, System Management
14. Installation complete: reboot

Prerequisites – partition plan
Partitions: /boot, /, /usr, /home, /data, swap
Sizes: 100M, 20G, 30G, 8G

3 / 15 BI WORLD — Prerequisites
Network configuration (local IP)

# /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=namenode
GATEWAY=

# /etc/sysconfig/network-scripts/ifcfg-eth0
NETMASK=
IPADDR=
GATEWAY=

$ /etc/rc.d/init.d/network restart

# /etc/resolv.conf
nameserver

Install the other required utilities
$ yum -y install nmap
$ yum -y install ntsysv wget

Run only the required daemons
$ ntsysv

Reboot the server
$ shutdown -r now
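The slide leaves the actual addresses blank. As a minimal sketch only, a filled-in interface file might look like the following; the 192.168.0.x addresses and the public DNS server are assumptions for illustration, not values from the slide:

# /etc/sysconfig/network-scripts/ifcfg-eth0  (example values only)
DEVICE=eth0
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.0.101      # hypothetical static address for namenode
NETMASK=255.255.255.0
GATEWAY=192.168.0.1       # hypothetical gateway

# /etc/resolv.conf
nameserver 8.8.8.8        # any DNS server reachable from the cluster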

4 / 15 BI WORLD — Prerequisites
1. Hadoop installation prerequisites – JDK installation (or a later version)
- Hadoop is a distributed file-system framework written in Java.

# Install the JDK from RPM
$ rpm -ivh jdk/jdk-7u4-linux-x64.rpm

# Create a /usr/local/java link so the Hadoop configuration does not have to change with the JDK version
$ ln -s /usr/java/jdk1.7.0_04 /usr/local/java
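A quick sanity check of the link and the JDK, assuming the paths above:

$ ls -l /usr/local/java                # should be a symlink to /usr/java/jdk1.7.0_04
$ /usr/local/java/bin/java -version    # should report java version "1.7.0_04"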

5 / 15 BI WORLD — Hadoop installation · Hadoop ports
2. Installing Hadoop
1) Firewall configuration
- To check the status of iptables, run this command with root privileges:
$ /etc/init.d/iptables status
- You can simply turn iptables off, or at least open these ports: 9000, 9001, 50010, 50020, 50030, 50060, 50070, 50075. For how to configure iptables, see online references.

Hadoop daemons expose some information over HTTP. All Hadoop daemons expose the following web interfaces:

Daemon                         | Default Port | Configuration Parameter
HDFS  Namenode                 | 50070        | dfs.http.address
HDFS  Datanodes                | 50075        | dfs.datanode.http.address
HDFS  Secondarynamenode        | 50090        | dfs.secondary.http.address
HDFS  Backup/Checkpoint node * | 50105        | dfs.backup.http.address
MR    Jobtracker               | 50030        | mapred.job.tracker.http.address
MR    Tasktrackers             | 50060        | mapred.task.tracker.http.address

* Replaces the secondarynamenode in later Hadoop releases.
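As one hedged example (not from the slide), the ports could be opened on the stock CentOS 6 iptables service like this, repeating the ACCEPT rule for each port the cluster needs:

# run as root
$ iptables -I INPUT -p tcp --dport 9000 -j ACCEPT
$ iptables -I INPUT -p tcp --dport 50070 -j ACCEPT
$ service iptables save        # persist the rules across reboots

# or, for a lab environment only, disable the firewall entirely:
$ service iptables stop
$ chkconfig iptables off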

6 / 15 BI WORLD — Hadoop installation · Hadoop ports
2. Installing Hadoop
1) Firewall configuration (continued)

Daemon      | Default Port    | Configuration Parameter             | Protocol                                                              | Used for
Namenode    | 8020 (1)        | fs.default.name                     | IPC: ClientProtocol                                                   | Filesystem metadata operations
Datanode    | 50010           | dfs.datanode.address                | Custom Hadoop Xceiver: DataNode and DFSClient                         | DFS data transfer
Datanode    | 50020           | dfs.datanode.ipc.address            | IPC: InterDatanodeProtocol, ClientDatanodeProtocol, ClientProtocol    | Block metadata operations and recovery
Backupnode  | 50100           | dfs.backup.address                  | Same as namenode                                                      | HDFS metadata operations
Jobtracker  | Ill-defined (2) | mapred.job.tracker                  | IPC: JobSubmissionProtocol, InterTrackerProtocol                      | Job submission, task tracker heartbeats
Tasktracker | :0 (3)          | mapred.task.tracker.report.address  | IPC: TaskUmbilicalProtocol                                            | Communicating with child jobs

(1) This is the port part of hdfs://host:8020/.
(2) Default is not well-defined. Common values are 8021, 9001, or others; see MAPREDUCE-566.
(3) Binds to an unused local port.
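Once the daemons are running (section 3 below), the listening ports can be verified with the nmap utility installed during the prerequisites — a quick sketch, assuming the hostname namenode resolves:

$ nmap -p 8020,50010,50020,50030,50060,50070,50075 namenode
$ netstat -tlnp | grep java    # list the ports the Java daemons are listening on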

7 / 15 BI WORLD — Hadoop installation · creating the user
2. Installing Hadoop
2) Creating the user
Create the group hadoop_user:
$ groupadd hadoop_user
Create the user hadoop:
$ useradd -g hadoop_user -s /bin/bash -d /home/hadoop hadoop
  in which -g specifies that user hadoop belongs to group hadoop_user, -s specifies the shell to use, and -d specifies the home folder for user hadoop.
Set a password for user hadoop:
$ passwd hadoop
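A quick check that the account is set up as intended:

$ id hadoop       # should show the hadoop_user group
$ su - hadoop     # log in as hadoop; the session should start in /home/hadoop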

8 / 15 BI WORLD — Hadoop installation · SSH configuration
2. Installing Hadoop
3) SSH configuration
Log in to each node with the account hadoop and run the following command:
$ ssh-keygen -t rsa
This command generates a public/private key pair. "-t" specifies the type of key; here we use the RSA algorithm. When questions are asked, simply press Enter to continue. Two files, id_rsa and id_rsa.pub, are then created under the folder /home/hadoop/.ssh/.

Establish authentication: now we can copy the public key of the masternode to all the slavenodes. Log in to the masternode with the account hadoop and run the following commands:
$ cat /home/hadoop/.ssh/id_rsa.pub >> /home/hadoop/.ssh/authorized_keys
$ scp /home/hadoop/.ssh/id_rsa.pub hadoop@<ip address of slavenode-i>:/home/hadoop/.ssh/master.pub
The second command should be executed several times, until the public key has been copied to all the slavenodes. Note that the IP address of a slavenode can be replaced with its domain name. Then log in to each slavenode with the account hadoop and run:
$ cat /home/hadoop/.ssh/master.pub >> /home/hadoop/.ssh/authorized_keys
Then log back in to the masternode with the account hadoop, and run ssh <ip address of slavenode-i> to test whether the masternode can log in to the slavenodes without password authentication.
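With several slavenodes the copy step can be scripted; a minimal sketch, assuming the hostnames datanode1 and datanode2 from slide 2 (each scp/ssh still prompts for the hadoop password until the key is in place):

$ for host in datanode1 datanode2; do
>   ssh hadoop@$host 'mkdir -p ~/.ssh'
>   scp /home/hadoop/.ssh/id_rsa.pub hadoop@$host:/home/hadoop/.ssh/master.pub
>   ssh hadoop@$host 'cat ~/.ssh/master.pub >> ~/.ssh/authorized_keys'
> done
$ ssh datanode1 hostname    # should print "datanode1" without asking for a password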

9 / 15 BI WORLD — Hadoop installation · Hadoop environment – /etc/profile
2. Installing Hadoop
4) Hadoop environment configuration

# Edit /etc/profile
JAVA_HOME=/usr/local/java
CLASSPATH=/usr/local/java/jre/lib/*
export JAVA_HOME CLASSPATH
pathmunge /usr/local/java after
pathmunge /usr/local/java/bin after
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_CONF=$HADOOP_HOME/conf
export PATH=$HADOOP_HOME/bin:$PATH

$ source /etc/profile
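After sourcing, the environment can be sanity-checked (the hadoop command itself only works once the tarball has been unpacked on the next slide):

$ echo $HADOOP_HOME $HADOOP_CONF
$ java -version
$ hadoop version    # after the Hadoop tarball is unpacked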

10 / 15 BI WORLD — Hadoop installation · Hadoop environment – $HADOOP_CONF
2. Installing Hadoop
4) Hadoop environment configuration (continued)
- Unpack the hadoop tar.gz archive.

# Configure the 3 xml files in the $HADOOP_CONF folder. The available properties and their defaults are documented in:
HADOOP_HOME/src/core/core-default.xml
HADOOP_HOME/src/hdfs/hdfs-default.xml
HADOOP_HOME/src/mapred/mapred-default.xml
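The slide does not show the site files themselves. As a hedged sketch for a Hadoop 1.x cluster, the three files would be core-site.xml, hdfs-site.xml, and mapred-site.xml in $HADOOP_CONF; the hostname namenode, the /data partition, and ports 9000/9001 come from earlier slides, while the replication factor and directory names are assumptions for illustration:

# $HADOOP_CONF/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode:9000</value>
  </property>
</configuration>

# $HADOOP_CONF/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>                      <!-- assumed; one copy per datanode here -->
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/data/dfs/name</value>         <!-- assumed path on the /data partition -->
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/data/dfs/data</value>         <!-- assumed path on the /data partition -->
  </property>
</configuration>

# $HADOOP_CONF/mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>namenode:9001</value>
  </property>
</configuration>

start-all.sh on the next slides also relies on the masters/slaves lists and on JAVA_HOME in hadoop-env.sh; with the hostnames from slide 2:

# $HADOOP_CONF/hadoop-env.sh
export JAVA_HOME=/usr/local/java

# $HADOOP_CONF/masters   (host of the secondary namenode)
secondnode

# $HADOOP_CONF/slaves    (datanode / tasktracker hosts)
datanode1
datanode2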

11 / 15 BI WORLD — Hadoop installation · configuring the remaining nodes and copying the hadoop folder
2. Installing Hadoop
5) Configure the remaining nodes and copy the configured hadoop folder/files
- Set the environment variables on each slavenode, as on the previous slides.
- Remote-copy the Hadoop folder to the slavenodes: now that we have configured hadoop on the masternode, we can use the remote copy command to replicate the hadoop folder to all the slavenodes:
$ scp -r /home/hadoop/hadoop hadoop@<ip address of slavenode-i>:/home/hadoop/
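As before, this can be looped over the hostnames assumed from slide 2:

$ for host in secondnode datanode1 datanode2; do
>   scp -r /home/hadoop/hadoop hadoop@$host:/home/hadoop/
> done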

12 / 15 BI WORLD — Hadoop installation · running Hadoop
3. Running Hadoop
Formatting the namenode is simple. Log in to the masternode with the account "hadoop" and run:
$ hadoop namenode -format
A message reports whether formatting succeeded. Then we can start the cluster:
$ start-all.sh
Alternatively, you can start the file system only with start-dfs.sh, or start map-reduce with start-mapred.sh. To stop the cluster, use:
$ stop-all.sh
If there were no mistakes in the preceding installation and configuration, we should find no errors or exceptions in the log files under $HADOOP_HOME/logs/. We can use a web browser to get more information about the hadoop cluster. Some useful links:
- Hadoop Distributed File System (HDFS): http://<address or domain name of namenode>:50070
- Hadoop Jobtracker: http://<address or domain name of jobtracker>:50030
- Hadoop Tasktracker: http://<address or domain name of map-reduce processor>:50060
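As a final smoke test (a sketch, not from the slides), the wordcount example that ships with Hadoop 1.x can be run against a small input; the exact examples jar name varies with the Hadoop version:

$ hadoop fs -mkdir /user/hadoop/input
$ hadoop fs -put $HADOOP_CONF/*.xml /user/hadoop/input
$ hadoop jar $HADOOP_HOME/hadoop-examples-*.jar wordcount /user/hadoop/input /user/hadoop/output
$ hadoop fs -cat /user/hadoop/output/part-r-00000    # word counts from the config files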

13 / 15 BI WORLD — Hadoop installation · cluster layout diagram
2. Installing Hadoop
Appendix) Hadoop cluster diagram – a 5-node layout
[Diagram: NameNode, Secondary NN, and JobTracker on the master side; DataNode 1–3 each running a TaskTracker and a Chukwa agent/collector; surrounding ecosystem components: Zookeeper, Pig, Hive, Oozie, Flume, Sqoop, Hue, Chukwa HICC, HBase Master/Slave; plus a standalone Hadoop node.]

14 / 15 BI WORLD — Hadoop installation · review
2. Installing Hadoop – review
1) Install Linux
2) Install the JDK (or a later version)
3) Configure the firewall (iptables – requires root privileges)
4) Create the user / user group
5) Configure ssh
6) Configure /etc/profile, unpack the hadoop tar.gz archive, then configure the Hadoop xml files
7) Repeat steps 1)–6) on the remaining nodes
8) Format HDFS and start Hadoop

15 / 15 BI WORLD — Hadoop installation
Q & A — Thank you.