05 | Processing Big Data with Hive

Graeme Malcolm | Data Technology Specialist, Content Master
Pete Harris | Learning Product Planner, Microsoft

Module Overview
- What is Hive?
- Creating Hive Tables
- Loading Data into Hive Tables
- Querying Hive Tables
- Using Hive with PowerShell

What is Hive?
- A metadata service that projects tabular schemas over HDFS folders
- Enables the contents of folders to be queried as tables, using SQL-like query semantics
- Queries are translated into Map/Reduce jobs
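The idea of "projecting a schema over a folder" can be illustrated with a small, self-contained model. This is a minimal sketch, not Hive's implementation. It assumes a plain dict of path -> text stands in for HDFS, and `Metastore`, `TableDef`, and `query_table` are hypothetical names. The table is only metadata (a folder plus column types), and every file in that folder is read as rows when the table is queried.

```python
# Minimal sketch of schema projection over a folder. A dict of path -> text
# stands in for HDFS; Metastore, TableDef, and query_table are hypothetical.
from dataclasses import dataclass


@dataclass
class TableDef:
    folder: str              # folder whose files hold the table's data
    columns: list            # list of (name, python_type) pairs
    delimiter: str = " "


class Metastore:
    """Holds only table metadata; the data itself stays in the folder."""

    def __init__(self):
        self.tables = {}

    def create_table(self, name, table_def):
        self.tables[name] = table_def


def query_table(fs, metastore, name):
    """Read every file under the table's folder and project the schema onto each line."""
    t = metastore.tables[name]
    prefix = t.folder.rstrip("/") + "/"
    rows = []
    for path in sorted(p for p in fs if p.startswith(prefix)):
        for line in fs[path].splitlines():
            fields = line.split(t.delimiter)  # assumes well-formed lines
            rows.append({col: typ(val) for (col, typ), val in zip(t.columns, fields)})
    return rows


fs = {"/data/table2/part1.txt": "a 1\nb 2"}
ms = Metastore()
ms.create_table("table2", TableDef("/data/table2", [("col1", str), ("col2", int)]))
print(query_table(fs, ms, "table2"))
# [{'col1': 'a', 'col2': 1}, {'col1': 'b', 'col2': 2}]

# A file copied into the folder later is picked up by the next query.
fs["/data/table2/part2.txt"] = "c 3"
print(len(query_table(fs, ms, "table2")))  # 3
```

Nothing is imported into the table when a file is added. The next query simply sees the extra file in the folder.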

Creating Hive Tables
- Use the CREATE TABLE HiveQL statement
  - Defines schema metadata that is projected onto the data in a folder when the table is queried (not when it is created)
- Specify the file format and file location
  - Defaults to the format set by hive.default.fileformat (TEXTFILE in a standard Hive installation) in the /hive/warehouse/<table_name> folder
- Create internal or external tables
  - Internal tables manage the lifetime of the underlying folders
  - External tables are managed independently of their folders
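Because the schema is applied when the table is queried rather than when it is created, a CREATE TABLE statement succeeds even if the files do not match the declared columns. Problems only surface as odd values at query time. The sketch below models that behaviour for a space-delimited text table, as used in this module. It assumes Hive's usual handling of delimited text: a value that cannot be converted to the declared type comes back as NULL (here `None`), missing trailing fields are NULL, and extra fields are ignored. `create_table` and `parse_row` are hypothetical helpers, not Hive APIs.

```python
# Schema-on-read sketch for a delimited text table. create_table and
# parse_row are hypothetical helpers that model Hive's behaviour.

def create_table(columns):
    # Creating the table only records metadata; no data is read or validated.
    return {"columns": columns}


def parse_row(table, line, delimiter=" "):
    """Project the schema onto one line at query time."""
    fields = line.split(delimiter)
    row = {}
    for i, (name, typ) in enumerate(table["columns"]):
        if i >= len(fields):
            row[name] = None          # missing trailing field -> NULL
            continue
        try:
            row[name] = typ(fields[i])
        except ValueError:
            row[name] = None          # value doesn't fit the declared type -> NULL
    return row                        # extra fields beyond the schema are ignored


t = create_table([("col1", str), ("col2", int)])   # succeeds even if the data is bad
lines = ["a 1", "b two", "c"]
print([parse_row(t, ln) for ln in lines])
# [{'col1': 'a', 'col2': 1}, {'col1': 'b', 'col2': None}, {'col1': 'c', 'col2': None}]
```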

CREATE TABLE
Internal table (folders deleted when the table is dropped), in the default location (/hive/warehouse/table1):
    CREATE TABLE table1 (col1 STRING, col2 INT)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';
Internal table stored in a custom location (still internal, so the folder is deleted when the table is dropped):
    CREATE TABLE table2 (col1 STRING, col2 INT)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
    STORED AS TEXTFILE LOCATION '/data/table2';
External table (folders and files are left intact in Azure Blob storage when the table is dropped):
    CREATE EXTERNAL TABLE table3 (col1 STRING, col2 INT)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
    STORED AS TEXTFILE LOCATION '/data/table3';
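The difference between the three tables only shows up when they are dropped. The toy model below mirrors table1, table2, and table3. Dropping always removes the metadata, but only an internal table also deletes its folder's files, even when a custom LOCATION was given. This is a sketch under stated assumptions: a dict of path -> text stands in for the file system, and `Catalog` is a hypothetical name.

```python
# Toy model of DROP TABLE for internal vs. external tables. A dict of
# path -> text stands in for the file system; Catalog is a hypothetical name.
class Catalog:
    def __init__(self, fs):
        self.fs = fs
        self.tables = {}  # table name -> (folder, is_external)

    def create(self, name, location=None, external=False):
        # Without LOCATION, the folder defaults to /hive/warehouse/<table_name>.
        self.tables[name] = (location or f"/hive/warehouse/{name}", external)

    def drop(self, name):
        folder, external = self.tables.pop(name)  # metadata is always removed
        if external:
            return  # external: files are left where they are
        # Internal: the folder's files are deleted too, even for a custom LOCATION.
        prefix = folder.rstrip("/") + "/"
        for path in [p for p in self.fs if p.startswith(prefix)]:
            del self.fs[path]


fs = {
    "/hive/warehouse/table1/f.txt": "a 1",
    "/data/table2/f.txt": "b 2",
    "/data/table3/f.txt": "c 3",
}
cat = Catalog(fs)
cat.create("table1")
cat.create("table2", location="/data/table2")
cat.create("table3", location="/data/table3", external=True)
for name in ("table1", "table2", "table3"):
    cat.drop(name)
print(sorted(fs))  # ['/data/table3/f.txt']
```

This is why external tables suit data that other tools also read. The table definition can be dropped and recreated without touching the files.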

Loading Data into Hive Tables
- Save data files in table folders
- Use the LOAD statement
  - Moves or copies files to the appropriate folder (LOAD DATA INPATH moves files that are already in HDFS; LOAD DATA LOCAL INPATH copies files from the local file system)
- Use the INSERT statement
  - Inserts data from one table into another
    LOAD DATA LOCAL INPATH '/data/source' INTO TABLE MyTable;
    FROM StagingTable INSERT INTO TABLE MyTable SELECT Col1, Col2;
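The two loading routes differ in what happens to the files.

- LOAD places files in the table folder unchanged. It moves them if they are already in HDFS and copies them with LOCAL.
- INSERT ... SELECT runs a query and writes its output as new files in the target table's folder.

The sketch models both, assuming plain dicts stand in for the local file system and HDFS. `load_data`, `insert_select`, and the `insert_<n>` output file name are hypothetical and chosen only for illustration. Hive picks its own output file names.

```python
# Toy model of the two loading routes. Plain dicts (path -> text) stand in
# for the local file system and HDFS; load_data, insert_select, and the
# output file name are hypothetical, chosen only for illustration.
import posixpath


def load_data(local_fs, hdfs, src, table_folder, local=False):
    """LOAD DATA [LOCAL] INPATH: put the file in the table folder unchanged."""
    dest = posixpath.join(table_folder, posixpath.basename(src))
    if local:
        hdfs[dest] = local_fs[src]   # LOCAL: copied; the local file remains
    else:
        hdfs[dest] = hdfs.pop(src)   # HDFS source: moved; the original path is gone


def insert_select(hdfs, src_folder, dest_folder, col_indexes, delimiter=" "):
    """FROM src INSERT INTO TABLE dest SELECT ...: query output becomes a new file."""
    src_prefix = src_folder.rstrip("/") + "/"
    dest_prefix = dest_folder.rstrip("/") + "/"
    out = []
    for path in sorted(p for p in hdfs if p.startswith(src_prefix)):
        for line in hdfs[path].splitlines():
            fields = line.split(delimiter)
            out.append(delimiter.join(fields[i] for i in col_indexes))
    # INSERT INTO appends: existing files in the destination folder are kept.
    n = sum(1 for p in hdfs if p.startswith(dest_prefix))
    hdfs[f"{dest_prefix}insert_{n}"] = "\n".join(out)


local_fs = {"/data/source/log1.txt": "a 1 x\nb 2 y"}
hdfs = {"/staging/in.txt": "c 3 z"}
load_data(local_fs, hdfs, "/data/source/log1.txt", "/hive/warehouse/stagingtable", local=True)
load_data(local_fs, hdfs, "/staging/in.txt", "/hive/warehouse/stagingtable")
insert_select(hdfs, "/hive/warehouse/stagingtable", "/hive/warehouse/mytable", [0, 1])
print(hdfs["/hive/warehouse/mytable/insert_0"].splitlines())  # ['c 3', 'a 1', 'b 2']
```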

Querying Hive Tables with HiveQL
- Query data using the SELECT statement
- Hive translates the query into Map/Reduce jobs and applies the table schema to the underlying data files
    SELECT Col1, SUM(Col2) AS TotalCol2
    FROM MyTable
    WHERE Col1 >= '2013-06-01' AND Col1 <= '2013-06-30'
    GROUP BY Col1
    ORDER BY Col1;
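To see roughly how the example query becomes Map/Reduce work, here is a simplified, single-process sketch. The mappers apply the WHERE filter and emit (Col1, Col2) pairs, the shuffle groups values by key, and the reducers compute SUM(Col2) per Col1. A final sort stands in for ORDER BY. This is an illustration only. Hive's real plans may differ, for example by using map-side partial aggregation or a separate stage for the global ORDER BY. As in the query, the date filter compares ISO-format strings lexicographically.

```python
# Simplified, single-process sketch of how a GROUP BY query maps onto
# map, shuffle, and reduce phases. Real Hive plans may differ in detail.
from collections import defaultdict


def map_phase(rows):
    # WHERE is applied in the mappers; each surviving row emits (key, value).
    for col1, col2 in rows:
        if "2013-06-01" <= col1 <= "2013-06-30":
            yield col1, col2


def shuffle(pairs):
    # The framework groups all values that share a key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Each reducer call computes SUM(Col2) for one Col1 value.
    return [(key, sum(values)) for key, values in groups.items()]


rows = [("2013-06-02", 5), ("2013-05-31", 7), ("2013-06-01", 3), ("2013-06-02", 4)]
result = sorted(reduce_phase(shuffle(map_phase(rows))))  # ORDER BY Col1
print(result)  # [('2013-06-01', 3), ('2013-06-02', 9)]
```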

Demo: Using Hive
In this demonstration, you will see how to:
- Create Hive Tables
- Load Data into Hive Tables
- Query a Hive Table with HiveQL
- Drop a Hive Table

Using Hive in PowerShell
- The New-AzureHDInsightHiveJobDefinition cmdlet
  - Creates a job definition
  - Use the -Query parameter for explicit HiveQL statements, or -File to reference a saved script
  - Run the job with the Start-AzureHDInsightJob cmdlet
- The Invoke-Hive cmdlet
  - Simpler syntax for running a HiveQL query

Demo: Using Hive in PowerShell
In this demonstration, you will see how to:
- Use PowerShell to Run a HiveQL Command
- Use PowerShell to Query a Hive Table

Module Summary
- Hive enables Map/Reduce processing through SQL-like syntax
- Internal tables manage the lifetime of their data; external tables are metadata only
- Use HiveQL queries in PowerShell scripts to perform Hive operations and retrieve data