HIVE CSCE 587 Spring 2018.


Step 1: Data
Start by downloading the data:
https://raw.githubusercontent.com/hortonworks/data-tutorials/master/tutorials/hdp/how-to-process-data-with-apache-hive/assets/driver_data.zip
This data set contains two files that we saw last week with PIG:
drivers.csv
timesheet.csv
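For anyone who would rather script the download than use a browser, here is a minimal Python sketch. It is not part of the original slides. The helper names `fetch_archive` and `extract_csvs` are hypothetical, the URL is the one from the slide with the line-wrap space removed, and the internal layout of the archive is not assumed: every .csv member is written flat into the output directory.

```python
import urllib.request
import zipfile
from pathlib import Path
from typing import List

# URL as given on the slide (line-wrap space removed).
DATA_URL = (
    "https://raw.githubusercontent.com/hortonworks/data-tutorials/master/"
    "tutorials/hdp/how-to-process-data-with-apache-hive/assets/driver_data.zip"
)


def fetch_archive(url: str, dest: Path) -> Path:
    """Download url to dest and return dest (hypothetical helper)."""
    urllib.request.urlretrieve(url, str(dest))
    return dest


def extract_csvs(archive: Path, out_dir: Path) -> List[str]:
    """Write every .csv member of the archive flat into out_dir.

    Returns the sorted list of extracted file names. Hidden entries
    (names starting with '.') are skipped.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    names = []
    with zipfile.ZipFile(archive) as zf:
        for member in zf.namelist():
            name = Path(member).name
            if name.lower().endswith(".csv") and not name.startswith("."):
                (out_dir / name).write_bytes(zf.read(member))
                names.append(name)
    return sorted(names)
```

A typical call would be `extract_csvs(fetch_archive(DATA_URL, Path("driver_data.zip")), Path("data"))`, after which the two CSV files can be uploaded as described in Step 2.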

Step 2: Load the files into HDFS
Unlike last week, we will use the GUI to load the files. Start by logging on to Ambari at vm-Hadoop-xx.cse.sc.edu:8080 using your maria_dev credentials.

Step 2: Load the files into HDFS
Click on the icon that resembles a 3x3 grid on the menu bar (it is at the top of the window on the far right side). Select “Files View”.

Step 2: Load the files into HDFS
Navigate to /user/maria_dev:
1. Scroll to the bottom of the list to find “user”.
2. Click on “user”.
3. Scroll down the list to find “maria_dev”.
4. Click on “maria_dev”.

Navigating to /user/maria_dev You should see something like this, although you should also see the other files that you created last week with Hadoop and PIG.

Click on the “Upload” button
Then select the “Browse” button and navigate to where you stored the files on the Linux file system. Select drivers.csv to upload. Do the same for timesheet.csv.

Had you started with a tabula rasa, your directory would look like this:

3. HIVE View 2.0
Switch context from the “Files” view to “Hive View 2.0”:

Step 3.1: This brings up the query editor.

Step 3.2: Create an empty table
Enter:
create table temp_drivers (col_value STRING);
Then click on “Execute”.

Result

Step 3.3
Enter:
LOAD DATA INPATH '/user/maria_dev/drivers.csv' OVERWRITE INTO TABLE temp_drivers;
Then click on “Execute”.

Go back to the “Files” view
What has changed? The file drivers.csv is no longer there. Loading the file into Hive has “consumed” drivers.csv.
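The file disappears because LOAD DATA INPATH (without LOCAL) moves the HDFS file into the table's storage location rather than copying it. The Python sketch below is only a local-filesystem analogy, not Hive or HDFS code. The directory layout under a temporary folder and the helper `load_into_table` are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def load_into_table(src: Path, table_dir: Path) -> Path:
    """Mimic LOAD DATA INPATH ... OVERWRITE on a local disk.

    Clears the (hypothetical) table directory, then moves the source
    file into it. The source path no longer exists afterwards.
    """
    if table_dir.exists():
        shutil.rmtree(table_dir)
    table_dir.mkdir(parents=True)
    return Path(shutil.move(str(src), str(table_dir / src.name)))


# Stand-in for /user/maria_dev inside a throwaway directory.
root = Path(tempfile.mkdtemp())
home = root / "user" / "maria_dev"
home.mkdir(parents=True)
source = home / "drivers.csv"
source.write_text("placeholder row\n")

moved = load_into_table(source, root / "warehouse" / "temp_drivers")
print(source.exists())  # False: the original location is now empty
print(moved.exists())   # True: the data lives under the table directory
```

If you need to keep the original file in /user/maria_dev, keep a second copy before loading.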

Take a peek at the table “temp_drivers”
Enter:
select * from temp_drivers limit 10;
Then click on “Execute”.

Step 3.4: Extract the fields we want
Enter:
CREATE TABLE drivers (driverId INT, name STRING, ssn BIGINT, location STRING, certified STRING, wageplan STRING);
Then click on “Execute”.

Step 3.5: Query to extract data from temp_drivers
insert overwrite table drivers
SELECT
  regexp_extract(col_value, '^(?:([^,]*),?){1}', 1) driverId,
  regexp_extract(col_value, '^(?:([^,]*),?){2}', 1) name,
  regexp_extract(col_value, '^(?:([^,]*),?){3}', 1) ssn,
  regexp_extract(col_value, '^(?:([^,]*),?){4}', 1) location,
  regexp_extract(col_value, '^(?:([^,]*),?){5}', 1) certified,
  regexp_extract(col_value, '^(?:([^,]*),?){6}', 1) wageplan
from temp_drivers;
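To see why the pattern '^(?:([^,]*),?){n}' returns the n-th field, it helps to try it outside Hive. The sketch below uses Python's `re` module as a stand-in for the Java regular expressions that Hive uses. For this pattern both keep the capture from the last repetition of the group, which is the n-th comma-separated field. The helper `extract_field` and the sample row are invented for illustration and are not taken from drivers.csv.

```python
import re


def extract_field(line: str, n: int) -> str:
    """Return the n-th (1-based) comma-separated field of line.

    Each repetition of the non-capturing group consumes one field plus
    an optional comma; after n repetitions, capture group 1 holds the
    text matched by the last repetition.
    """
    pattern = r'^(?:([^,]*),?){%d}' % n
    match = re.search(pattern, line)
    return match.group(1) if match else ""


# Invented sample row with the six columns of the drivers table.
row = "7,Jane Doe,123456789,1 Main St,Y,hours"
fields = [extract_field(row, n) for n in range(1, 7)]
print(fields)
# ['7', 'Jane Doe', '123456789', '1 Main St', 'Y', 'hours']
```

Note that the pattern splits on every comma, so it would not handle a quoted field that itself contains a comma.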

Take a peek at the first 10 rows of the resulting table
Enter:
select * from drivers limit 10;
Then click on “Execute”.

Step 3.6: Processing timesheet.csv
Create similar tables from timesheet.csv. Start by creating temp_timesheet with the following command:
CREATE TABLE temp_timesheet (col_value string);
Then populate it with data from timesheet.csv:
LOAD DATA INPATH '/user/maria_dev/timesheet.csv' OVERWRITE INTO TABLE temp_timesheet;
Finally, look at the first 10 lines of the table as a sanity check:
select * from temp_timesheet limit 10;

Creating timesheet from temp_timesheet
Start by creating an empty table timesheet. Enter:
CREATE TABLE timesheet (driverId INT, week INT, hours_logged INT, miles_logged INT);
Then click on “Execute”.
Next, populate the table by extracting columns from temp_timesheet.

Extracting columns from temp_timesheet
Enter:
insert overwrite table timesheet
SELECT
  regexp_extract(col_value, '^(?:([^,]*),?){1}', 1) driverId,
  regexp_extract(col_value, '^(?:([^,]*),?){2}', 1) week,
  regexp_extract(col_value, '^(?:([^,]*),?){3}', 1) hours_logged,
  regexp_extract(col_value, '^(?:([^,]*),?){4}', 1) miles_logged
from temp_timesheet;
Then click on “Execute”.
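regexp_extract returns strings, while all four timesheet columns are declared INT, so each extracted value is converted when it is inserted. The Python sketch below illustrates that conversion under one assumption: a value that is not a valid integer ends up as NULL (shown here as None). The helpers `nth_field`, `to_int` and `parse_timesheet`, the sample data row, and the header-like row are all invented for illustration.

```python
import re
from typing import Optional, Tuple


def nth_field(line: str, n: int) -> str:
    """Return the n-th (1-based) comma-separated field of line."""
    match = re.search(r'^(?:([^,]*),?){%d}' % n, line)
    return match.group(1) if match else ""


def to_int(text: str) -> Optional[int]:
    """Convert text to int; None stands in for NULL on bad input (assumption)."""
    try:
        return int(text)
    except ValueError:
        return None


def parse_timesheet(line: str) -> Tuple[Optional[int], ...]:
    """Build a (driverId, week, hours_logged, miles_logged) tuple."""
    return tuple(to_int(nth_field(line, n)) for n in range(1, 5))


print(parse_timesheet("10,1,70,3300"))
# (10, 1, 70, 3300)
print(parse_timesheet("driverId,week,hours-logged,miles-logged"))
# (None, None, None, None)
```

This is one reason to look at the first rows of the timesheet table after loading: a row of NULLs at the top usually means a header line was loaded as data.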

Take a peek at the first 10 rows
Enter:
select * from timesheet limit 10;
Then click on “Execute”.

Now group timesheet data by driverId so that we can sum the hours logged and sum the miles logged
Enter:
SELECT driverId, sum(hours_logged), sum(miles_logged)
FROM timesheet
GROUP BY driverId;
Then click on “Execute”.
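The grouping query can be tried on a small scale without a cluster. The sketch below runs the same SELECT in an in-memory SQLite database through Python's `sqlite3` module. SQLite stands in for Hive here, the three rows are invented, and an ORDER BY is added only to make the printed order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE timesheet "
    "(driverId INT, week INT, hours_logged INT, miles_logged INT)"
)
# Invented rows: driver 10 has two weeks, driver 11 has one.
conn.executemany(
    "INSERT INTO timesheet VALUES (?, ?, ?, ?)",
    [(10, 1, 70, 3300), (10, 2, 70, 3300), (11, 1, 50, 2500)],
)

# Same query as on the slide, plus ORDER BY for a stable result order.
totals = conn.execute(
    "SELECT driverId, sum(hours_logged), sum(miles_logged) "
    "FROM timesheet GROUP BY driverId ORDER BY driverId"
).fetchall()
print(totals)
# [(10, 140, 6600), (11, 50, 2500)]
```

Each driverId appears once in the result, with its weekly rows collapsed into a single pair of sums.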

Results after grouping by driverID and summing logged hours and logged miles

Combine columns from drivers and timesheet tables
Columns from drivers table: driverId, name
Columns derived from timesheet table: total_hours, total_miles
Join column: driverId

Combine columns from drivers and timesheet tables
SELECT d.driverId, d.name, t.total_hours, t.total_miles
from drivers d
JOIN (SELECT driverId,
             sum(hours_logged) total_hours,
             sum(miles_logged) total_miles
      FROM timesheet
      GROUP BY driverId) t
ON (d.driverId = t.driverId);
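The join can also be checked on toy data. The sketch below runs an equivalent query in an in-memory SQLite database via Python's `sqlite3`, again as a stand-in for Hive. The driver names and timesheet rows are invented, the column aliases are written with AS, and ORDER BY is added only to fix the output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE drivers (driverId INT, name TEXT);
CREATE TABLE timesheet (driverId INT, week INT, hours_logged INT, miles_logged INT);
-- Invented rows: driver 12 has no timesheet entries.
INSERT INTO drivers VALUES (10, 'Driver A'), (11, 'Driver B'), (12, 'Driver C');
INSERT INTO timesheet VALUES (10, 1, 70, 3300), (10, 2, 70, 3300), (11, 1, 50, 2500);
""")

rows = conn.execute("""
SELECT d.driverId, d.name, t.total_hours, t.total_miles
FROM drivers d
JOIN (SELECT driverId,
             sum(hours_logged) AS total_hours,
             sum(miles_logged) AS total_miles
      FROM timesheet
      GROUP BY driverId) t
  ON (d.driverId = t.driverId)
ORDER BY d.driverId
""").fetchall()
print(rows)
# [(10, 'Driver A', 140, 6600), (11, 'Driver B', 50, 2500)]
```

Because this is an inner join, a driver with no timesheet rows (driver 12 in the toy data) does not appear in the result.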

Results after entering command and executing