Hadoop & HBase Development in Java Using the NetBeans IDE

OUTLINE
 Development environment setup: JDK, NetBeans, a first project
 Hadoop development
 HBase development: architecture, basic environment setup, basic HBase operations, Put Data To HBase, Scan Data In HBase
 Basic HIVE operations

About NetBeans
 NetBeans is an open-source software development tool originally created by Sun Microsystems. It is both an IDE and an extensible development platform that can be used to develop programs in Java, C/C++, PHP, HTML5, and more; its functionality can be extended by adding plug-in modules.
 On the NetBeans Platform, applications are built from a set of modular software components. Each module is a JAR (Java archive) file containing Java classes that implement the public interfaces defined by NetBeans, together with a manifest file that distinguishes the module from other modules. Thanks to this modularity, an application built from modules can be extended simply by adding new modules, and because modules can be developed independently, applications built on the NetBeans platform can incorporate third-party software and be extended easily and efficiently.

Download NetBeans

Download the JDK

Install the JDK. We need the JDK for development, and the NetBeans installer also requires it, so the JDK must be installed first.

Installing the JDK

JDK installation complete
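To confirm the installation succeeded (a quick check, not shown on the original slides), you can run the standard Java version command from a terminal:

java -version      # prints the installed JDK version
javac -version     # confirms the compiler is available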

Installing NetBeans

Here you must choose the JDK path: select the directory where the JDK was installed.

Installing NetBeans

The NetBeans development environment

A first project

Project name

A first project: run the code and view the result in the output window.
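The code and output on this slide are screenshots; as a minimal sketch of what such a first project typically contains (the class name FirstProject and the message are assumptions for illustration):

public class FirstProject {
    public static void main(String[] args) {
        // printed in the NetBeans Output window when you press Run
        System.out.println("Hello, NetBeans!");
    }
}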

Hadoop program development

 IMPORT CLASS

import java.io.IOException;            // required by map()/reduce(); not shown on the original slide
import java.util.StringTokenizer;      // required by the Mapper below; not shown on the original slide
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {
    // code ...
}

Hadoop program development
 MAP (the generic type parameters were stripped by the web page and are restored here)

public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}

Hadoop program development
 Reduce

public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}

Hadoop program development
 main

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "WordCount");
    job.setJarByClass(WordCount.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.waitForCompletion(true);
}

Hadoop program development: build the project into a JAR file (the screenshot shows the resulting JAR path).

Hadoop program development
 Copy the JAR to the Hadoop host.
 Run the JAR with a command whose arguments select the input and output locations (see the sketch below).
 While it runs you can watch the job's progress messages.
 Finally, inspect the result.
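A minimal sketch of these steps; the JAR name WordCount.jar, the host name, and the HDFS paths are placeholders for illustration:

scp WordCount.jar user@hadoop-host:                          # copy the JAR to the Hadoop host
hadoop jar WordCount.jar WordCount /user/hadoop/input /user/hadoop/output
hdfs dfs -cat /user/hadoop/output/part-r-00000               # view the word counts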

Hadoop 程式開發 (2)  Main import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Job; import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; public class MaxTemperature { public static void main(String[] args) throws Exception { if (args.length != 2) { System.err.println("Usage: MaxTemperature "); System.exit(-1); } Job job = new Job(); job.setJarByClass(MaxTemperature.class); job.setJobName("Max temperature"); FileInputFormat.addInputPath(job, new Path(args[0])); FileOutputFormat.setOutputPath(job, new Path(args[1])); job.setMapperClass(MaxTemperatureMapper.class); job.setReducerClass(MaxTemperatureReducer.class); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); System.exit(job.waitForCompletion(true) ? 0 : 1); }

Hadoop 程式開發 (2)  Mapper import java.io.IOException; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.LongWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Mapper; public class MaxTemperatureMapper extends Mapper { private static final int MISSING = public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { String line = value.toString(); String year = line.substring(15, 19); int airTemperature; if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs airTemperature = Integer.parseInt(line.substring(88, 92)); } else { airTemperature = Integer.parseInt(line.substring(87, 92)); } String quality = line.substring(92, 93); if (airTemperature != MISSING && quality.matches("[01459]")) { context.write(new Text(year), new IntWritable(airTemperature)); }

Hadoop 程式開發 (2)  Reducer import java.io.IOException; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Reducer; public class MaxTemperatureReducer extends Reducer public void reduce(Text key, Iterable values, Context context) throws IOException, InterruptedException { int maxValue = Integer.MIN_VALUE; for (IntWritable value : values) { maxValue = Math.max(maxValue, value.get()); } context.write(key, new IntWritable(maxValue)); }

Hadoop 程式開發 (2)  透過下列網只取得 sample 檔  c/all/1901.gz?raw=true c/all/1901.gz?raw=true  將檔案上傳至 HDFS 上  執行 JAR 檔  查 看結果

HBase program development

Basic architecture: connect to HBase through ZooKeeper.

[Architecture diagram: a Java application (JAR) accesses HBase data via ZooKeeper; the HBase region servers store their data on HDFS.]

Basic architecture: connect to HBase through Hive and query it with SQL.

[Architecture diagram: a Java application connects to Hive over JDBC; Hive runs MapReduce jobs through the Hadoop JobTracker and accesses HBase data via ZooKeeper and the region servers on HDFS.]

基礎環境設定  本篇將使用 JAVA 透過 Zookeeper 與 HBase 連線  使用的 JAR 檔 ( 附錄 /lib 中 )  log4j jar  zookeeper jar  hbase-client hadoop2.jar  hadoop-mapreduce-client-core jar  hbase-common hadoop2.jar ….

基礎環境設定  加入 JAR 檔案至專案 右鍵 Add JAR 需先將 JAR 複製至專案目錄 開啟 確認出現在專案中

基礎環境設定  IMPORT CLASS import java.io.IOException; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.hbase.HBaseConfiguration; import org.apache.hadoop.hbase.HColumnDescriptor; import org.apache.hadoop.hbase.HTableDescriptor; import org.apache.hadoop.hbase.client.HBaseAdmin;

HBase 基本操作  連線設定 public static Configuration GetConfig(){ /*Config*/ Configuration hBaseConfig = HBaseConfiguration.create(); // 連線 zookeeper 主機 IP hBaseConfig.set(“hbase.zookeeper.quorum”,“ ”); //zookeeper PORT hBaseConfig.set(“hbase.zookeeper.property.clientPort”, “2181”); //hbase.maste hBaseConfig.set(“hbase.master”, “ :9000”); }

HBase 基本操作  建立資料表及加入 Family public static void main(String[] args) throws IOException { /* Config */ Configuration hBaseConfig = GetConfig(); HBaseAdmin hBaseAdmin = new HBaseAdmin(hBaseConfig); /*Create Table*/ HTableDescriptor tableDescriptor = new HTableDescriptor("TEST"); /*AddFamily*/ tableDescriptor.addFamily(new HColumnDescriptor("Name")); tableDescriptor.addFamily(new HColumnDescriptor("Birth")); tableDescriptor.addFamily(new HColumnDescriptor("Address")); tableDescriptor.addFamily(new HColumnDescriptor("Sex")); hBaseAdmin.createTable(tableDescriptor); }

HBase 基本操作  在資料表中加入 Column public static void main(String[] args) throws IOException { /* Config */ Configuration hBaseConfig = GetConfig(); HBaseAdmin hBaseAdmin = new HBaseAdmin(hBaseConfig); /*addColumn*/ hBaseAdmin.addColumn("TEST", new HColumnDescriptor ("Chinese")); hBaseAdmin.addColumn("TEST", new HColumnDescriptor ("Type")); hBaseAdmin.addColumn("TEST", new HColumnDescriptor ("Day")); hBaseAdmin.addColumn("TEST", new HColumnDescriptor ("Home")); hBaseAdmin.addColumn("TEST", new HColumnDescriptor ("Sex")); }

HBase 基本操作  HBaseWebUI: 透過網頁介面可以看到新增的資料表

HBase 基本操作  刪除資料表 public static void DeleteTable(String TableName) throws IOException { HBaseAdmin admin = new HBaseAdmin(GetConfig()); admin.disableTable(TableName); admin.deleteTable(TableName); System.out.println("delete table success"); admin.close(); }

Put Data to HBase  增加下列 import import org.apache.hadoop.hbase.client.HTable; import org.apache.hadoop.hbase.client.Put; import org.apache.hadoop.hbase.util.Bytes;

Put Data to HBase  Put 一個 value 到 HBase HTable table = new HTable(GetConfig(), "TEST");// 指定 table String row="1";// 寫入的 row /* 寫入的 family& 寫入的 column*/ String family[] = {"Name", "Birth","Birth","Address","Sex"}; String column[] = {"Chinese","Type","Day","Home","Sex"}; String value="40";// 寫入的 value byte[] brow = Bytes.toBytes(row); byte[] bfamily = Bytes.toBytes(family[0]); byte[] bcolumn = Bytes.toBytes(column[0]); byte[] bvalue = Bytes.toBytes(value); Put p = new Put(brow); p.add(bfamily, bcolumn, bvalue); table.put(p); TABLE NAME 要寫入的 value

Scan Data In HBase  掃描資料表的所有內容 HTablePool pool = new HTablePool(GetConfig(), 1000); HTable table = (HTable) pool.getTable(tableName); Scan scan = new Scan(); ResultScanner rs = table.getScanner(scan); for (Result r : rs) { out.println(new String(r.getRow(),"UTF-8")); for (KeyValue keyValue : r.raw()) { out.println(new String(keyValue.getFamily(),"UTF-8")); out.println(new String(keyValue.getValue(),"UTF-8")); } TABLE NAME 取得 ROW KEY 逐一掃瞄每個列的內容 取得 Family 名稱 取得 Value 內容

Scan Data In HBase  掃描指定 ROWKEY HTablePool pool = new HTablePool(configuration, 1000); HTable table = (HTable) pool.getTable(tableName); Get scan = new Get(Rowkey.getBytes());// 根據 ROWKEY 查詢 Result r = table.get(scan); out.println (new String(r.getRow(),"UTF-8")); for (KeyValue keyValue : r.raw()) { out.println(new String(keyValue.getFamily(),"UTF-8")); out.println(new String(keyValue.getValue(),"UTF-8")); } TABLE NAME 取得 ROW KEY 逐一掃瞄每個列的內容 取得 Family 名稱 取得 Value 內容

Scan Data In HBase  透過關鍵字查詢指定的 Column HTablePool pool = new HTablePool(configuration, 1000); HTable table = (HTable) pool.getTable(tableName); Filter filter = new SingleColumnValueFilter( Bytes.toBytes(col), Bytes.toBytes("Chinese"), CompareOp.EQUAL, Bytes.toBytes(keyword)); Scan sc = new Scan(); sc.setFilter(filter); ResultScanner rs = table.getScanner(sc); for (Result r : rs) { out.println(new String(r.getRow(),"UTF-8")); for (KeyValue keyValue : r.raw()) { out.println(new String(keyValue.getFamily(),"UTF-8")); out.println(new String(keyValue.getValue(),"UTF-8")); } TABLE NAME 取得 ROW KEY 逐一掃瞄每個列的內容 取得 Family 名稱 取得 Value 內容 關鍵字 指定 Column

Getting started with Hive
 Enter the hive command to open the Hive shell:

hive
hive>

HIVE 基本操作  Creating Hive Tables hive> CREATE TABLE pokes (foo INT, bar STRING);  Browsing through Tables hive> SHOW TABLES;  Dropping Tables hive> DROP TABLE pokes;  INSERT hive> INSERT OVERWRITE TABLE tablename [PARTITON(partcol1=val1,partclo2=val2)]select_statement FROM from_statement insert overwrite table test_insert select * from test_table;

HIVE 基本操作  SELECT hive> SELECT * FROM tablename LIMIT 20;

HIVE 基本操作  SELECT+WHERE hive> SELECT * FROM tablename WHERE key = 'r1';