Hive Index Yongqiang He Software Engineer Facebook Data Infrastructure Team.



Agenda
1. Create Index
2. Update Index / Rebuild Index
3. Use Index
4. Metastore upgrade script

Create Index

CREATE INDEX index_name
  ON TABLE table_name (col_name…)
  AS 'index handler class name'
  [WITH DEFERRED REBUILD]
  [INDEXPROPERTIES (prop_key=prop_value, …)]
  [IN TABLE index_table_name]
  [[ROW FORMAT …] STORED AS …]

EXAMPLE 1:
  CREATE TABLE src (key int, value string);
  CREATE INDEX src_index ON TABLE src(key) AS 'COMPACT' WITH DEFERRED REBUILD STORED AS RCFILE;

EXAMPLE 2:
  CREATE TABLE srcpart_rc (key int, value string) PARTITIONED BY (ds string, hr int) STORED AS RCFILE;
  CREATE INDEX src_part_index ON TABLE srcpart_rc (key) AS 'COMPACT' WITH DEFERRED REBUILD;

Update Index / Rebuild Index

ALTER INDEX index_name ON table_name [partitionSpec] REBUILD;

EXAMPLE:
  ALTER INDEX src_index ON src REBUILD;
  ALTER INDEX src_part_index ON srcpart_rc REBUILD;
  ALTER INDEX src_part_index ON srcpart_rc PARTITION(ds=' ') REBUILD;

Use Index

The index is designed to help queries with a filter clause (point queries and range queries).

No index optimizer is available right now, so the index has to be used manually in the query. (An index optimizer that directs a query against the index is being worked on.)

EXAMPLE:

Original query:
  SELECT key, value FROM srcpart_rc WHERE key=100;

Query with index:
  INSERT OVERWRITE DIRECTORY "/tmp/index_result"
    SELECT `_bucketname`, `_offsets` FROM default__srcpart_rc_srcpart_rc_index__ WHERE key=100;
  SET hive.index.compact.file=/tmp/index_result;
  SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
  SELECT key, value FROM srcpart_rc WHERE key=100 ORDER BY key;
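The two-step pattern above (filter the index table first, then read only the locations it lists) can be illustrated with a small Python model. This is a simplified sketch, not Hive's implementation. The file names, offsets and function names are made up for illustration, and rows are addressed by a plain per-row offset. Only the `_bucketname` and `_offsets` column names come from the slide.

```python
from collections import defaultdict

# Toy base table: file name -> list of (offset, key, value) rows.
# File names and offsets are hypothetical, chosen only for this sketch.
TABLE_FILES = {
    "/warehouse/srcpart_rc/000000_0": [(0, 100, "val_100"), (64, 7, "val_7")],
    "/warehouse/srcpart_rc/000001_0": [(0, 42, "val_42"), (64, 100, "val_100b")],
}


def rebuild_index(table_files):
    """Model of ALTER INDEX ... REBUILD: scan the base table once and record,
    for every key, which files and offsets contain it."""
    locations = defaultdict(lambda: defaultdict(list))
    for bucketname, rows in table_files.items():
        for offset, key, _value in rows:
            locations[key][bucketname].append(offset)
    # Flatten into (key, _bucketname, _offsets) rows, like an index table.
    return [
        (key, bucketname, offsets)
        for key, buckets in locations.items()
        for bucketname, offsets in buckets.items()
    ]


def query_with_index(index_rows, table_files, wanted_key):
    """Model of the manual two-step query: step 1 filters the index rows,
    step 2 reads only the offsets they list instead of scanning every row."""
    hits = [(b, offs) for key, b, offs in index_rows if key == wanted_key]
    result = []
    for bucketname, offsets in hits:
        rows_by_offset = {off: (k, v) for off, k, v in table_files[bucketname]}
        for off in offsets:
            result.append(rows_by_offset[off])
    return result


index_rows = rebuild_index(TABLE_FILES)
print(query_with_index(index_rows, TABLE_FILES, 100))
# prints [(100, 'val_100'), (100, 'val_100b')]
```

In this model an index created WITH DEFERRED REBUILD corresponds to an empty index_rows list until rebuild_index is called, which is why the rebuild step has to run before the index can answer anything.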

Metastore upgrade script MySQL: