05 | Processing Big Data with Hive

1 05 | Processing Big Data with Hive
Graeme Malcolm | Data Technology Specialist, Content Master
Pete Harris | Learning Product Planner, Microsoft

2 Module Overview
What is Hive?
Creating Hive Tables
Loading Data into Hive Tables
Querying Hive Tables
Using Hive with PowerShell

3 What is Hive?
A metadata service that projects tabular schemas over HDFS folders
Enables the contents of folders to be queried as tables, using SQL-like query semantics
Queries are translated into Map/Reduce jobs
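The idea of a table as a schema projected over a folder can be sketched in a few lines of Python. This is a local-file-system simulation, not how Hive is implemented: `read_folder_as_table` is a hypothetical helper, and the comma delimiter and file names are assumptions for the demo. Every file in the folder contributes rows, so adding a file to the folder adds rows to the "table".

```python
import os
import tempfile


def read_folder_as_table(folder, columns, delimiter=","):
    """Yield one dict per line from every file in `folder`.

    Hypothetical helper: treats the folder, not a single file, as the table.
    """
    # Sorted only to make the demo output deterministic; without ORDER BY,
    # Hive makes no promise about row order.
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue  # skip subfolders in this simple sketch
        with open(path, encoding="utf-8") as f:
            for line in f:
                fields = line.rstrip("\n").split(delimiter)
                yield dict(zip(columns, fields))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        # Two files in one folder behave as a single table.
        with open(os.path.join(folder, "part-0.txt"), "w", encoding="utf-8") as f:
            f.write("a,1\nb,2\n")
        with open(os.path.join(folder, "part-1.txt"), "w", encoding="utf-8") as f:
            f.write("c,3\n")
        for row in read_folder_as_table(folder, ["col1", "col2"]):
            print(row)
```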

4 Creating Hive Tables
Use the CREATE TABLE HiveQL statement
Defines schema metadata that is projected onto the data in a folder when the table is queried (not when it is created)
Specify the file format and file location
Defaults to sequencefile format in the /hive/warehouse/<table_name> folder
Create internal or external tables
Internal tables manage the lifetime of the underlying folders
External tables are managed independently of their folders: dropping the table removes only its metadata
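Projecting the schema at query time rather than at creation time is often called schema-on-read. Below is a minimal Python sketch of the idea, assuming only STRING and INT columns and the space delimiter used in the slide's `FIELDS TERMINATED BY ' '`; `TableDef` and `cast_field` are hypothetical names. Defining the table checks nothing. Types are applied only when rows are read, and values that do not fit become `None`, loosely mirroring the NULLs Hive returns for unparseable or missing fields in delimited text files.

```python
from dataclasses import dataclass


def cast_field(raw, hive_type):
    """Convert raw text to the declared type, or None (NULL) if it can't be converted."""
    if raw is None:
        return None
    if hive_type == "INT":
        try:
            return int(raw)
        except ValueError:
            return None
    return raw  # STRING columns keep the text unchanged


@dataclass(frozen=True)
class TableDef:
    """Schema metadata only: constructing it reads and validates no data."""
    columns: tuple  # ((name, type), ...)
    delimiter: str = " "

    def project(self, lines):
        """Apply the schema to raw lines at query time."""
        for line in lines:
            fields = line.split(self.delimiter)
            yield {
                # Missing trailing fields are treated as NULL.
                name: cast_field(fields[i] if i < len(fields) else None, hive_type)
                for i, (name, hive_type) in enumerate(self.columns)
            }


if __name__ == "__main__":
    table1 = TableDef(columns=(("col1", "STRING"), ("col2", "INT")))
    # Data can arrive after the table was defined; bad or missing values become None.
    data = ["x 10", "y abc", "z"]
    print(list(table1.project(data)))
```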

5 CREATE TABLE
Internal table (folders deleted when the table is dropped):
CREATE TABLE table1 (col1 STRING, col2 INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';
Default location (/hive/warehouse/table1)
CREATE TABLE table2 (col1 STRING, col2 INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' STORED AS TEXTFILE LOCATION '/data/table2';
Stored in a custom location (but still internal, so the folder is deleted when the table is dropped)
CREATE EXTERNAL TABLE table3 (col1 STRING, col2 INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' STORED AS TEXTFILE LOCATION '/data/table3';
External table (folders and files are left intact in Azure Blob storage when the table is dropped)
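The difference between these statements only shows up at DROP TABLE time. Here is a toy Python model of that rule, where `ToyMetastore` is a hypothetical stand-in for Hive's metastore and a local temporary directory stands in for Azure Blob storage. Dropping an internal table removes its folder even when a custom `LOCATION` was given, while dropping an external table leaves the files in place.

```python
import os
import shutil
import tempfile


class ToyMetastore:
    """Hypothetical catalogue mapping table name -> (folder, is_external)."""

    def __init__(self):
        self.tables = {}

    def create_table(self, name, folder, external=False):
        # The folder is created here only so the demo has somewhere to put files.
        os.makedirs(folder, exist_ok=True)
        self.tables[name] = (folder, external)

    def drop_table(self, name):
        folder, external = self.tables.pop(name)
        if not external:
            # Internal table: dropping it deletes the folder and its files.
            shutil.rmtree(folder, ignore_errors=True)
        # External table: only the catalogue entry is removed; the files stay.


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        store = ToyMetastore()
        for name, external in (("table2", False), ("table3", True)):
            folder = os.path.join(root, "data", name)
            store.create_table(name, folder, external=external)
            with open(os.path.join(folder, "rows.txt"), "w", encoding="utf-8") as f:
                f.write("a 1\n")
            store.drop_table(name)
            print(name, "folder exists after DROP:", os.path.isdir(folder))
```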

6 Loading Data into Hive Tables
Save data files directly in the table folders
Use the LOAD statement
Copies files from the local file system (LOAD DATA LOCAL) or moves files already in HDFS (LOAD DATA without LOCAL) into the table's folder, without transforming them
Use the INSERT statement
Inserts data from one table into another
LOAD DATA LOCAL INPATH '/data/source' INTO TABLE MyTable;
FROM StagingTable INSERT INTO TABLE MyTable SELECT Col1, Col2;
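The two loading routes can be sketched in Python on the local file system. `load_data` and `insert_into` are hypothetical helpers, and a temporary directory stands in for the table's folder. LOAD only places files, so the file content arrives byte-for-byte unchanged; the `LOCAL` keyword decides copy versus move. `INSERT INTO` appends the selected columns of the staging rows (unlike `INSERT OVERWRITE`, which replaces the table's contents).

```python
import os
import shutil
import tempfile


def load_data(source_path, table_folder, local=False):
    """Place a file in the table folder unchanged, as LOAD DATA does.

    local=True copies the file (like LOAD DATA LOCAL INPATH); local=False moves
    it (like LOAD DATA INPATH with a path already in the cluster's file system).
    """
    os.makedirs(table_folder, exist_ok=True)
    target = os.path.join(table_folder, os.path.basename(source_path))
    if local:
        shutil.copy(source_path, target)
    else:
        shutil.move(source_path, target)
    return target


def insert_into(target_rows, staging_rows, columns):
    """FROM staging INSERT INTO TABLE target SELECT <columns>: append projected rows."""
    target_rows.extend({c: row[c] for c in columns} for row in staging_rows)
    return target_rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "source.txt")
        with open(src, "w", encoding="utf-8") as f:
            f.write("a 1\n")
        load_data(src, os.path.join(root, "mytable"), local=True)
        print("source still there after LOCAL load:", os.path.exists(src))
    print(insert_into([], [{"Col1": "a", "Col2": 1, "Col3": "x"}], ["Col1", "Col2"]))
```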

7 Querying Hive Tables with HiveQL
Query data using the SELECT statement
Hive translates the query into Map/Reduce jobs and applies the table schema to the underlying data files
SELECT Col1, SUM(Col2) AS TotalCol2
FROM MyTable
WHERE Col1 >= ' ' AND Col1 <= ' '
GROUP BY Col1
ORDER BY Col1;
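Conceptually, a GROUP BY query like this one maps onto the Map/Reduce phases as in the following Python simulation. The WHERE bounds in the slide did not survive this transcript, so `low` and `high` are parameters here, and the sample rows are invented for illustration. Real Hive plans can differ, for example by computing partial sums on the map side or by running ORDER BY as a separate stage; the sketch only shows which part of the query each phase is responsible for.

```python
from collections import defaultdict


def map_phase(rows, low, high):
    """Map: apply the WHERE filter and emit (Col1, Col2) key/value pairs."""
    for row in rows:
        if low <= row["Col1"] <= high:
            yield row["Col1"], row["Col2"]


def shuffle(pairs):
    """Shuffle: group values by key (the framework does this between the phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: SUM(Col2) per key, then ORDER BY Col1 on the output."""
    return [(key, sum(values)) for key, values in sorted(groups.items())]


def run_query(rows, low, high):
    return reduce_phase(shuffle(map_phase(rows, low, high)))


if __name__ == "__main__":
    my_table = [
        {"Col1": "a", "Col2": 1},
        {"Col1": "b", "Col2": 2},
        {"Col1": "a", "Col2": 3},
        {"Col1": "z", "Col2": 100},
    ]
    print(run_query(my_table, "a", "b"))  # [('a', 4), ('b', 2)]
```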

8 Demo: Using Hive
In this demonstration, you will see how to:
Create Hive Tables
Load Data into Hive Tables
Query a Hive Table with HiveQL
Drop a Hive Table

9 Using Hive in PowerShell
The New-AzureHDInsightHiveJobDefinition cmdlet
Create a job definition
Use Query for explicit HiveQL statements, or File to reference a saved script
Run the job with the Start-AzureHDInsightJob cmdlet
The Invoke-Hive cmdlet
Simpler syntax to run a HiveQL query
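The slide's "Query ... or File" choice can be modelled as a job definition that carries exactly one of the two. This is a hypothetical Python sketch of that rule only: `HiveJobDefinition` is not part of any Azure SDK, and nothing here submits work to a cluster. In the PowerShell workflow, the equivalent object comes from the job-definition cmdlet and is then passed to `Start-AzureHDInsightJob`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HiveJobDefinition:
    """Hypothetical model: a Hive job carries either inline HiveQL or a script path."""
    query: Optional[str] = None  # explicit HiveQL statements
    file: Optional[str] = None   # path to a saved HiveQL script

    def __post_init__(self):
        # Reject definitions with both or neither source of HiveQL.
        if (self.query is None) == (self.file is None):
            raise ValueError("specify exactly one of query or file")


if __name__ == "__main__":
    job = HiveJobDefinition(query="SELECT Col1, SUM(Col2) FROM MyTable GROUP BY Col1;")
    print(job)
    try:
        HiveJobDefinition()
    except ValueError as err:
        print("rejected:", err)
```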

10 Demo: Using Hive in PowerShell
In this demonstration, you will see how to:
Use PowerShell to Run a HiveQL Command
Use PowerShell to Query a Hive Table

11 Module Summary
Hive enables Map/Reduce processing through SQL-like syntax
Internal tables manage the lifetime of their data; external tables are metadata only
Use HiveQL queries in PowerShell scripts to perform Hive operations and retrieve data
