Hive Installation Guide and Practical Example
Lecturer: Prof. Kyungbaek Kim
Presenter: Alvin Prayuda Juniarta Dwiyantoro



Installation Guide(1)
How to install Hive
Requirements:
    Java 1.6 (this example uses java-7-openjdk)
    Hadoop 0.20.x, 0.23.x, or 2.0.x (this example uses Hadoop in pseudo-distributed mode)

Installation Guide(2)
Download Hive from a stable release (the bin.tar.gz archive).
Extract the archive and move it to your preferred location (this example uses /usr/local/hive):
    tar -xvzf hive-x.y.z.tar.gz
    mv hive-x.y.z /usr/local/hive
Add the following statements at the end of ~/.bashrc:
    export HIVE_HOME=/usr/local/hive
    export PATH=$HIVE_HOME/bin:$PATH
Reload the file so the changes take effect in the current shell:
    source ~/.bashrc
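Editing ~/.bashrc by hand works, but repeating the step can leave duplicate lines behind. The sketch below adds the two Hive export lines only when they are missing. The function name add_hive_env is a hypothetical helper, not part of Hive or Hadoop, and the paths assume the /usr/local/hive location used in this example.

```shell
# add_hive_env RC_FILE
# Hypothetical helper: appends the Hive environment lines to RC_FILE,
# skipping any line that is already present (so it is safe to run twice).
add_hive_env() {
    hive_rc_file=$1
    touch "$hive_rc_file"
    # Single quotes keep $HIVE_HOME and $PATH literal in the written file.
    for hive_rc_line in \
        'export HIVE_HOME=/usr/local/hive' \
        'export PATH=$HIVE_HOME/bin:$PATH'
    do
        # -x matches the whole line, -F treats the pattern as a fixed string.
        grep -qxF -- "$hive_rc_line" "$hive_rc_file" ||
            printf '%s\n' "$hive_rc_line" >> "$hive_rc_file"
    done
}
```

A typical call would be add_hive_env "$HOME/.bashrc", followed by source ~/.bashrc.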

Configuration Guide(1)
Hive runs on top of Hadoop, so either add Hadoop to the PATH in ~/.bashrc or add the following statement, pointing it at your Hadoop installation directory (this example uses /usr/local/hadoop):
    export HADOOP_HOME=/usr/local/hadoop
Start HDFS and YARN:
    start-dfs.sh
    start-yarn.sh

Configuration Guide(2)
Create /tmp and /user/hive/warehouse in HDFS and make them group-writable (chmod g+w). On Hadoop 2.x, the -p option creates any missing parent directories:
    hadoop fs -mkdir -p /tmp
    hadoop fs -mkdir -p /user/hive/warehouse
    hadoop fs -chmod g+w /tmp
    hadoop fs -chmod g+w /user/hive/warehouse

Configuration Guide(3)
Go to the Hive configuration directory:
    cd /usr/local/hive/conf
Rename the following configuration templates by dropping the .template suffix:
    hive-env.sh.template -> hive-env.sh
    hive-default.xml.template -> hive-default.xml
    hive-exec-log4j.properties.template -> hive-exec-log4j.properties
    hive-log4j.properties.template -> hive-log4j.properties
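The four renames can be done in one loop. The sketch below copies each template instead of renaming it, which is a deliberate variation on the slide: the originals stay available for reference, and an existing configuration file is never overwritten. The function name copy_templates is hypothetical.

```shell
# copy_templates CONF_DIR
# Hypothetical helper: for each configuration file named on the slide,
# copy NAME.template to NAME inside CONF_DIR.
copy_templates() {
    conf_dir=$1
    for conf_name in \
        hive-env.sh \
        hive-default.xml \
        hive-exec-log4j.properties \
        hive-log4j.properties
    do
        conf_src="$conf_dir/$conf_name.template"
        conf_dst="$conf_dir/$conf_name"
        # Skip templates that are missing and files that already exist.
        if [ -f "$conf_src" ] && [ ! -e "$conf_dst" ]; then
            cp "$conf_src" "$conf_dst"
        fi
    done
}
```

For the layout used in this example, the call would be copy_templates /usr/local/hive/conf.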

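Configuration Guide(4)
Create a new file with the following content and save it as hive-site.xml:
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:50030</value>
      </property>
    </configuration>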

Configuration Guide(5)
Open hive-env.sh, uncomment HADOOP_HOME and HIVE_CONF_DIR, and set them as follows:
    export HADOOP_HOME=/usr/local/hadoop
    export HIVE_CONF_DIR=/usr/local/hive/conf
Start the Hive CLI:
    hive
Note: if the configuration is correct, every table you create is stored in HDFS under /user/hive/warehouse.

Practical Example(1)
Download the example data and extract the archive; we will use the Batting.csv file.
Copy the data into HDFS:
    hadoop fs -put /home/hduser/Downloads/Batting.csv /user/hive
Enter the Hive CLI:
    hive

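Practical Example(2)
Create the staging table temp_batting, which holds each CSV line as a single string:
    create table temp_batting(col_value string);
Load the data from Batting.csv into temp_batting (the HDFS path must be absolute and the quotes must be plain single quotes):
    load data inpath '/user/hive/Batting.csv' overwrite into table temp_batting;
Inspect the data format:
    select * from temp_batting;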

Practical Example(2), continued
Create the new table batting:
    create table batting(player_id string, year int, runs int);
Extract the player ID, year, and runs from temp_batting into batting. Each regular expression captures one comma-separated field: the repeat count ({1}, {2}, {9}) selects the first, second, and ninth field of the line:
    insert overwrite table batting
    select
      regexp_extract(col_value, '^(?:([^,]*)\,?){1}', 1) player_id,
      regexp_extract(col_value, '^(?:([^,]*)\,?){2}', 1) year,
      regexp_extract(col_value, '^(?:([^,]*)\,?){9}', 1) runs
    from temp_batting;
View the resulting table:
    select * from batting;
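To check locally which columns the extraction targets, the same field selection can be approximated with awk before loading anything into Hive. This is a stand-in technique, not Hive's regexp_extract, and the sample row is made up for illustration rather than taken from Batting.csv.

```shell
# extract_fields: read comma-separated lines on stdin and print
# fields 1, 2 and 9 (player ID, year, runs) separated by tabs.
extract_fields() {
    awk -F',' '{ print $1 "\t" $2 "\t" $9 }'
}

# Made-up sample row with ten fields; the ninth field is 42.
printf '%s\n' 'player01,1999,1,TEAM,LG,100,100,350,42,90' | extract_fields
```

Running the same function over the first few lines of the real file (for example with head piped into extract_fields) is a quick way to confirm that the ninth column really holds the runs before running the insert.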

Practical Example(3)
Find the highest number of runs for each year:
    select year, max(runs) from batting group by year;
Find the player who scored the highest number of runs in each year:
    select a.year, a.player_id, a.runs
    from batting a
    join (select year, max(runs) runs from batting group by year) b
      on (a.year = b.year and a.runs = b.runs);
Delete the table temp_batting:
    drop table temp_batting;
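The join query returns every player whose runs equal the yearly maximum, so a tie produces more than one row for that year. The sketch below reproduces that logic with awk on a few made-up rows in player_id,year,runs order. It only illustrates the grouping and matching steps and is not how Hive executes the query; the function name top_scorers is hypothetical.

```shell
# top_scorers: read "player_id,year,runs" lines on stdin and print
# "year,player_id,runs" for every row that matches its year's maximum.
top_scorers() {
    awk -F',' '
        {
            rows[NR] = $0
            # Track the highest runs value seen so far for each year.
            if (!($2 in best) || $3 + 0 > best[$2]) best[$2] = $3 + 0
        }
        END {
            # Second pass: keep rows whose runs equal the yearly maximum.
            for (i = 1; i <= NR; i++) {
                split(rows[i], f, ",")
                if (f[3] + 0 == best[f[2]]) print f[2] "," f[1] "," f[3]
            }
        }
    ' | LC_ALL=C sort
}

# Made-up sample data: p3 and p4 tie for the 2001 maximum.
printf '%s\n' 'p1,2000,10' 'p2,2000,30' 'p3,2001,25' 'p4,2001,25' | top_scorers
```

With this sample, the year 2000 yields one row (p2) and the year 2001 yields two rows (p3 and p4), which mirrors what the join does with ties.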

Screenshot of Practical Example