Hive Installation Guide and Practical Example
Lecturer: Prof. Kyungbaek Kim
Presenter: Alvin Prayuda Juniarta Dwiyantoro
Installation Guide(1)
How to install Hive
Requirements:
    Java 1.6 or later (this example uses java-7-openjdk)
    Hadoop 0.20.x, 0.23.x, or 2.0.x (this example uses Hadoop in pseudo-distributed mode)
Installation Guide(2)
Download Hive from a stable release (the bin.tar.gz archive).
Extract the tar file and move it to your preferred location (this example uses /usr/local/hive):
    tar -xvzf hive-x.y.z.tar.gz
    mv hive-x.y.z /usr/local/hive
Add the following lines at the end of ~/.bashrc:
    export HIVE_HOME=/usr/local/hive
    export PATH=$HIVE_HOME/bin:$PATH
Then reload the file:
    source ~/.bashrc
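The ~/.bashrc step can also be scripted. The following is a minimal sketch, assuming a Bash-compatible shell; the helper name append_once is hypothetical and not part of Hive. It appends a line only if the file does not already contain it, so repeating the step does not duplicate the exports. The demo writes to a scratch file rather than to the real ~/.bashrc.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hive): append a line to a file
# only if that exact line is not already present.
append_once() {
  local file="$1" line="$2"
  touch "$file"
  # -F: fixed string, -x: match the whole line, -q: no output
  grep -Fxq -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demo on a scratch file: the second pair of calls adds nothing.
demo_rc="$(mktemp)"
append_once "$demo_rc" 'export HIVE_HOME=/usr/local/hive'
append_once "$demo_rc" 'export PATH=$HIVE_HOME/bin:$PATH'
append_once "$demo_rc" 'export HIVE_HOME=/usr/local/hive'
append_once "$demo_rc" 'export PATH=$HIVE_HOME/bin:$PATH'
cat "$demo_rc"
rm -f "$demo_rc"
```

To apply it for real, pass ~/.bashrc as the first argument instead of the scratch file, then run source ~/.bashrc.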
Configuration Guide(1)
Hive uses Hadoop, so either add Hadoop to the PATH in ~/.bashrc or set HADOOP_HOME to the Hadoop installation directory (this example uses /usr/local/hadoop):
    export HADOOP_HOME=<hadoop-install-dir>
Start HDFS and YARN:
    start-dfs.sh
    start-yarn.sh
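Configuration Guide(2)
Create /tmp and /user/hive/warehouse in HDFS and make them group-writable (chmod g+w):
    hadoop fs -mkdir /tmp
    hadoop fs -mkdir /user/hive/warehouse
    hadoop fs -chmod g+w /tmp
    hadoop fs -chmod g+w /user/hive/warehouse
Note: on Hadoop 2.x, -mkdir does not create missing parent directories; use hadoop fs -mkdir -p /user/hive/warehouse if /user/hive does not exist yet.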
Configuration Guide(3)
Go to /usr/local/hive/conf:
    cd /usr/local/hive/conf
Rename these configuration templates by removing the .template suffix:
    hive-env.sh.template                 ->  hive-env.sh
    hive-default.xml.template            ->  hive-default.xml
    hive-exec-log4j.properties.template  ->  hive-exec-log4j.properties
    hive-log4j.properties.template       ->  hive-log4j.properties
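The four renames can be done in one loop. This is a sketch under stated assumptions: the function name activate_templates is hypothetical, and it copies each template instead of renaming it so the originals stay available for reference, which differs slightly from the slide. Files that are not present are reported and skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy each listed template to its active name
# (the template name minus ".template"). Copying rather than renaming
# is a choice made for this sketch; the slide itself renames the files.
activate_templates() {
  local conf_dir="$1" name
  for name in hive-env.sh hive-default.xml \
              hive-exec-log4j.properties hive-log4j.properties; do
    if [ -f "$conf_dir/$name.template" ]; then
      cp "$conf_dir/$name.template" "$conf_dir/$name"
    else
      echo "skipped: $conf_dir/$name.template not found" >&2
    fi
  done
}

# Use HIVE_CONF_DIR if it is set, else the location used in this guide.
activate_templates "${HIVE_CONF_DIR:-/usr/local/hive/conf}"
```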
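Configuration Guide(4)
Create a new file with the following content and save it as hive-site.xml:
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:50030</value>
      </property>
    </configuration>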
Configuration Guide(5)
Open hive-env.sh, uncomment HADOOP_HOME and HIVE_CONF_DIR, and set them as follows:
    export HADOOP_HOME=/usr/local/hadoop
    export HIVE_CONF_DIR=/usr/local/hive/conf
Run the Hive CLI:
    hive
Note: if the configuration is correct, every table you create will be stored in HDFS under /user/hive/warehouse.
Practical Example(1)
Download the example data and extract the archive; we will use the Batting.csv file.
Copy the data into HDFS:
    hadoop fs -put /home/hduser/Downloads/Batting.csv /user/hive
Enter the Hive CLI:
    hive
Practical Example(2)
Create the staging table temp_batting, which holds each CSV line as a single string:
    create table temp_batting(col_value string);
Load the data from Batting.csv into temp_batting (the HDFS path must be absolute):
    load data inpath '/user/hive/Batting.csv' overwrite into table temp_batting;
To see the data format:
    select * from temp_batting;
Practical Example(2, continued)
Create a new table batting:
    create table batting(player_id string, year int, runs int);
Extract the player ID, year, and runs (CSV fields 1, 2, and 9) from temp_batting into batting:
    insert overwrite table batting
    select
      regexp_extract(col_value, '^(?:([^,]*)\,?){1}', 1) player_id,
      regexp_extract(col_value, '^(?:([^,]*)\,?){2}', 1) year,
      regexp_extract(col_value, '^(?:([^,]*)\,?){9}', 1) runs
    from temp_batting;
View the resulting table:
    select * from batting;
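Each regexp_extract pattern repeats the group "one comma-free field plus an optional comma" n times, so capture group 1 ends up holding the n-th comma-separated field. A quick way to check those positions before loading is to split a line locally. The sketch below assumes awk is available; the function name preview_fields and the sample row are made up for illustration and are not taken from Batting.csv.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print fields 1, 2 and 9 of each comma-separated
# line read from standard input, the same positions the regexp_extract
# calls target. Output columns are tab-separated.
preview_fields() {
  awk -F',' 'BEGIN { OFS = "\t" } { print $1, $2, $9 }'
}

# Made-up sample row with nine fields; the 9th is the value read as runs.
printf '%s\n' 'player01,1990,1,TEAM,LG,10,10,30,7' | preview_fields
```

Against the real file, something like head -n 5 Batting.csv | preview_fields would show whether the three positions line up with the intended columns.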
Practical Example(3)
Find the highest number of runs for each year:
    select year, max(runs) from batting group by year;
Find the player who scored the highest number of runs in each year:
    select a.year, a.player_id, a.runs
    from batting a
    join (select year, max(runs) runs from batting group by year) b
      on (a.year = b.year and a.runs = b.runs);
Delete the table temp_batting:
    drop table temp_batting;
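The join works in two steps: the subquery computes the maximum runs per year, and the join keeps every row whose runs equal that maximum, so a tie in one year returns several players. The same logic can be tried locally on a small file. This is only a sketch for checking results, assuming awk and input lines of the form player_id,year,runs; the function name top_per_year and the sample rows are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for "player_id,year,runs" lines, print the row(s)
# holding the highest runs in each year as "year,player_id,runs".
top_per_year() {
  local file="$1"
  # Pass 1 (NR == FNR): record the maximum runs seen for each year.
  # Pass 2: print rows whose runs equal that year's maximum (ties included).
  awk -F',' '
    NR == FNR { if (!($2 in max) || $3 + 0 > max[$2]) max[$2] = $3 + 0; next }
    $3 + 0 == max[$2] { print $2 "," $1 "," $3 }
  ' "$file" "$file"
}

# Made-up sample data: 1991 has a tie, so both players are printed.
sample="$(mktemp)"
printf '%s\n' 'p1,1990,50' 'p2,1990,80' 'p3,1991,60' 'p4,1991,60' > "$sample"
top_per_year "$sample"
rm -f "$sample"
```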
Screenshot of Practical Example