Published by Domenic Lambert. Modified over 8 years ago.
1
MSBIC Hadoop Series
Querying Data with Hive
Bryan Smith
email: bryan.smith@microsoft.com
twitter: @smithbryanc
2
MSBIC Hadoop Series
http://msbic.sqlpass.org/
Learn the basics of Hadoop through a combination of demonstration and lecture. Session participants are invited to follow along using emulation environments and Azure-based clusters, the setup of which we will address in our first session.
March – Getting Started
April – Understanding the File System
May – Implementing MapReduce Jobs
June – Querying the Data with Hive
July – On Vacation
August – Processing the Data with Pig
September – Hadoop & MS BI
October – To Be Announced
November – Loading Social Media Data
December – DW Integration
3
Today’s Session
Objectives:
1. Understand the basics of Hive
2. Demonstrate the use of Hive with a sample data set
4
The Hive Data Warehouse
Hive query: SELECT MyCol, COUNT(*) FROM MyTable GROUP BY MyCol;
Runs as: MapReduce Job
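One way to see the relationship on this slide is to prefix the query with Hive's EXPLAIN statement, which prints the execution plan Hive generates instead of running the query. This is only a sketch: MyTable and MyCol are the slide's placeholder names, and the plan's exact contents depend on the Hive version and execution engine in use.

```sql
-- Show the plan Hive would use for the query, without executing it.
-- MyTable and MyCol are placeholders taken from the slide.
EXPLAIN
SELECT MyCol, COUNT(*)
FROM MyTable
GROUP BY MyCol;
```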
5
Demonstration
6
Demo Script 1: Create Database
show databases;
create database ufo location '/ufo.db';
dfs -ls /;
7
Demo Script 2: Create & Load Table
use ufo;
create table sightings (
  dateobs string,
  daterpt string,
  `location` string,
  shape string,
  duration string,
  `description` string)
row format delimited fields terminated by '\t';
load data inpath '/demo/ufo/in/ufo_awesome.tsv' overwrite into table sightings;
dfs -ls /ufo.db;
dfs -ls /ufo.db/sightings;
dfs -ls /demo/ufo/in;
8
Demo Script 3: Query Table
select * from sightings limit 10;
select substring(dateobs, 0, 4) as year, shape, count(*)
from sightings
group by year, shape;
create table SightingsSummary as
select substring(dateobs, 0, 4) as year, shape, count(*)
from sightings
group by year, shape;
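Some Hive versions do not resolve a select-list alias such as year inside GROUP BY, and an unaliased count(*) receives a generated column name in the new table. If either is a problem in your environment, a variant along the following lines may help. It is a sketch, not the presenter's script: the alias names obs_year and sighting_count are assumptions introduced here.

```sql
use ufo;

-- Repeat the expression in GROUP BY rather than relying on the alias,
-- and name the aggregate so the summary table gets a readable column.
-- obs_year and sighting_count are illustrative names, not from the slides.
create table SightingsSummary as
select
  substring(dateobs, 0, 4) as obs_year,
  shape,
  count(*) as sighting_count
from sightings
group by substring(dateobs, 0, 4), shape;
```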
9
Managed vs. External Tables
Managed Tables
  Table definition & associated data files managed by Hive
  Loaded data files moved to table-associated folders
  Dropping table drops data files
  Use for transformed data only needed by Hive
External Tables
  Table definition only managed by Hive
  Loaded data files remain in original location
  Dropping table does not drop data files
  Use for initial staging or when data needs to be accessible across a wide range of applications
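The contrast above can be sketched by defining an external table over the staging folder used in the demo scripts. The table name sightings_ext is hypothetical, and the sketch assumes the data file is present in /demo/ufo/in (the earlier load data inpath statement moves it into the managed table's folder, so it would need to be copied back first).

```sql
use ufo;

-- sightings_ext is a hypothetical name; columns mirror the demo's sightings table.
-- The location clause points Hive at files that stay where they are.
create external table sightings_ext (
  dateobs string,
  daterpt string,
  `location` string,
  shape string,
  duration string,
  `description` string)
row format delimited fields terminated by '\t'
location '/demo/ufo/in';

-- Dropping an external table removes only the table definition.
drop table sightings_ext;

-- The data files should still be listed in the original folder.
dfs -ls /demo/ufo/in;
```

Dropping the managed sightings table from the demo, by contrast, would also remove the files under /ufo.db/sightings.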
10
File Formats
Default format is row-delimited text for input & output
Default format is tab-delimited input and Ctrl-A delimited field output
File access controlled by LazySimpleSerDe (default SerDe)
Default data types include: int, bigint, tinyint, smallint, float, double, boolean, string, binary, timestamp
Complex structures supported with array, map & struct types
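To make the type and delimiter points concrete, here is a sketch of a delimited text table that mixes primitive and complex types. The table and column names are hypothetical and not part of the demo data set; the delimiters are chosen only for illustration.

```sql
-- Hypothetical table showing primitive and complex column types
-- stored as delimited text.
create table sighting_details (
  id bigint,
  observed timestamp,
  confirmed boolean,
  tags array<string>,
  attributes map<string, string>,
  observer struct<name:string, city:string>)
row format delimited
  fields terminated by '\t'
  collection items terminated by ','
  map keys terminated by ':'
stored as textfile;

-- Array elements are addressed by index, map values by key,
-- and struct fields with dot notation.
select tags[0], attributes['color'], observer.city
from sighting_details
limit 10;
```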
11
HCatalog
Table & storage management layer for Hadoop
Database defs, table defs, etc. presented through accessible interface
Integrated with Hive but accessible via Hive, Pig & MapReduce
Stored by default in Apache Derby database
Other databases can be substituted for better performance, HA, etc.
12
A Few Key Points
Object definitions are not case-sensitive…
But string comparisons and HDFS references are
Names conflicting with reserved keywords can be used by quoting them with the `grave accent`
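These points can be illustrated against the sightings table from the demo scripts. The sketch below makes no claim about which shape values the data actually contains; 'circle' is used only as an example literal.

```sql
use ufo;

-- Object names are not case-sensitive: both statements query the same table.
select count(*) from sightings;
SELECT COUNT(*) FROM SIGHTINGS;

-- String comparisons are case-sensitive: the first filter matches only the
-- exact lowercase value, while the second normalizes case before comparing.
select count(*) from sightings where shape = 'circle';
select count(*) from sightings where lower(shape) = 'circle';

-- Column names that collide with keywords are quoted with the grave accent.
select `location`, `description` from sightings limit 10;
```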
13
Reserved Keywords
Add         Comment     Float        Lines         Partitions       String
All         Create      Format       Load          Rename           Table
Alter       Data        From         Local         Reduce           Tables
And         Date        Full         Location      Regexp           Tablesample
Array       Datetime    Function     Map           Replace          Tblproperties
As          Delimited   Group        Msck          Right            Tblproperties
Asc         Desc        Inpath       Not           Rlike            Temporary
Bigint      Describe    Inputformat  Null          Row              Terminated
Binary      Directory   Insert       Of            Select           Textfile
Boolean     Distinct    Int          On            Sequencefile     Timestamp
Bucket      Distribute  Into         Or            Serde            Tinyint
Buckets     Double      Is           Order         Serdeproperties  To
By          Drop        Items        Out           Set              Transform
Cast        Explain     Join         Outer         Show             True
Cluster     Extended    Keys         Outputformat  Smallint         Union
Clustered   External    Left         Overwrite     Sort             Using
Collection  False       Like         Partition     Sorted           Where
Columns     Fields      Limit        Partitioned   Stored           With
14
Resources
15
Today’s Session
Objectives:
1. Understand the basics of Hive
2. Demonstrate the use of Hive with a sample data set
16
For Next Session
Topic: Processing Data with Pig
Requested Action(s):
  Come with a working HDInsight Emulator
  Load sample data sets into HDFS on the Emulator