David J. DeWitt Microsoft Jim Gray Systems Lab Madison, Wisconsin graysystemslab.com

Gaining Insight in the Two-Universe World
Many businesses now have data in both universes: the RDBMS and Hadoop. What is the best solution for answering questions that span the two and combining the insight? In my "Big Data" keynote a year ago I introduced the idea of an enterprise data manager as a solution. What I did not tell you is that we were building one, called Polybase. Polybase's goal: make it easy to answer questions that require data from both universes.

Disclaimer
Future Polybase versions continue to be designed, developed, and evaluated at the Microsoft Gray Systems Lab in Madison, WI. The SQL PDW team does all the hard work of productizing our prototypes. Ted announced "Phase 1" as a product, and Quentin demoed the "Phase 2" prototype. Just because I am going to tell you about Phases 2 and 3, you should NOT reach any conclusions about what features might find their way into future products. Don't forget, I am just a techie type and I have ZERO control over what will actually ship. This is not a keynote talk; I thought you might enjoy a deep dive into how Polybase works. So: no conclusions, please.

Talk Outline

The Hadoop Ecosystem
At the core: HDFS, MapReduce, Hive & Pig, Sqoop, ZooKeeper, Avro (serialization), and HBase, surrounded by ETL tools, BI reporting, and RDBMSs.

HDFS – Hadoop Distributed File System

HDFS Node Types
NameNode – the master. CheckpointNode or BackupNode – backups for the NameNode. DataNodes – the slaves.

Hadoop MapReduce (MR)
map() – sub-divide & conquer. reduce() – combine & reduce cardinality.

MapReduce
(Diagram: a DoWork() job is split across parallel Map tasks; their outputs feed the Reduce tasks, which produce the final output.)

MapReduce Components
How does this all fit with HDFS? The JobTracker (master) controls the TaskTracker nodes (slaves): it manages job queues and scheduling, maintains and controls the TaskTrackers, and moves/restarts map and reduce tasks if needed. The TaskTrackers execute the individual map and reduce tasks assigned to them by the JobTracker. The MapReduce layer runs alongside the HDFS layer on the same machines: the JobTracker next to the NameNode (e.g., hadoop-namenode), and a TaskTracker next to the DataNode on each worker (e.g., hadoop-datanode1 through hadoop-datanode4).

Hive

Hadoop Summary So Far
HDFS – a distributed, scalable, fault-tolerant file system. MapReduce – a framework for writing fault-tolerant, scalable distributed applications. Hive – a relational DBMS that stores its tables in HDFS and uses MapReduce as its target execution language. Next, Sqoop – a library and framework for moving data between HDFS and a relational DBMS.

Sqoop Use Case #1 – As a Load/Unload Utility
Sqoop transfers data between a Hadoop cluster and SQL Server (in and out), but every transfer gets serialized through both the Sqoop process and the PDW control node. Instead, transfers should (a) take place in parallel, and (b) go directly from the Hadoop DataNodes to the PDW compute nodes.

Sqoop Use Case #2 – As a DB Connector
(Diagram: a MapReduce job uses Sqoop to read from SQL Server.)

Sqoop's Limitations as a DB Connector
Suppose the Map tasks of a MapReduce job want the results of the query Q: SELECT a,b,c FROM T WHERE P, and each map() must see a distinct subset of the result. Step (1): Sqoop first obtains the result count Cnt, costing one scan of T. Step (2): Sqoop generates a unique query Q' for each Map task: SELECT a,b,c FROM T WHERE P ORDER BY a,b,c LIMIT L OFFSET X, where X is different for each Map task. For example, if Cnt is 100 and 3 Map instances are to be used, Map 1 gets X=0, Map 2 gets X=33, and Map 3 gets X=66, with L=33 or L=34. Step (3): Each of the 3 Map tasks runs its query Q'. Performance is bound to be pretty bad, as table T gets scanned 4 times! In general, with M Map tasks, table T would be scanned M + 1 times.
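Concretely, for the slide's numbers (Cnt = 100, three Map tasks) the generated queries would look roughly as follows; the exact 33/33/34 split is my assumption, since the slide lists only the two L values and the three offsets:

  -- Step (1): one full scan of T just to size the result (the "+1" scan)
  SELECT count(*) FROM T WHERE P;                                      -- Cnt = 100
  -- Steps (2)-(3): one query per Map task, each scanning T again
  SELECT a, b, c FROM T WHERE P ORDER BY a, b, c LIMIT 33 OFFSET 0;    -- Map 1
  SELECT a, b, c FROM T WHERE P ORDER BY a, b, c LIMIT 33 OFFSET 33;   -- Map 2
  SELECT a, b, c FROM T WHERE P ORDER BY a, b, c LIMIT 34 OFFSET 66;   -- Map 3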

Hadoop Summary
HDFS – a distributed, scalable, fault-tolerant file system. MapReduce – a framework for writing fault-tolerant, scalable distributed applications. Hive – a relational DBMS that stores its tables in HDFS and uses MapReduce as its target execution language. Sqoop – a library and framework for moving data between HDFS and a relational DBMS.

Gaining Insight in the Two-Universe World
Assume that you have data in both universes: the RDBMS and Hadoop. What is the best solution for answering questions that span the two and combining the insight?

Two alternatives: Sqoop, used to export HDFS data into SQL Server PDW, versus Polybase, which leverages PDW and Hadoop together to run queries against both the RDBMS and HDFS.

Alternative #1 – Sqoop
Approach: do the analysis using MapReduce, with Sqoop used to pull data from the RDBMS. Drawbacks: you must be able to program MapReduce jobs, and Sqoop as a DB connector has terrible performance.
Approach: do the analysis using the RDBMS, with Sqoop used to export HDFS data into the RDBMS. Drawbacks: the data might not fit in the RDBMS, and performance is bad since the load is not parallelized.

Polybase – A Superior Alternative
Polybase = SQL Server PDW V2 querying HDFS data, in situ. Standard T-SQL as the query language eliminates the need for writing MapReduce jobs. It leverages PDW's parallel query execution framework; data moves in parallel directly between Hadoop's DataNodes and PDW's compute nodes. It also exploits PDW's parallel query optimizer to selectively push computations on HDFS data into Hadoop as MapReduce jobs (Phase 2 release).

Polybase Assumptions
1. Polybase makes no assumptions about where the HDFS data is: PDW compute nodes can also be used as HDFS DataNodes, or the HDFS data could be on some other Hadoop cluster.
2. Nor any assumptions about the OS of the DataNodes.
3. Nor about the format of the HDFS files (i.e., text format, sequence file format, RCFile format, custom format, …).

Polybase “Phases”

Polybase Phase 1 (PDW V2)
Two usage modes: SQL in, results out; and SQL in, results stored in HDFS. The key technical challenges are covered next.

Challenge #1 – Parallel Data Transfers
(Unlike Sqoop, data should move directly between the Hadoop DataNodes and the PDW compute nodes.)

SQL Server PDW Architecture

DMS Role #1 – Shuffling
DMS (Data Movement Service) instances will "shuffle" (redistribute) a table when: (1) it is not hash-partitioned on the column being used for a join, or (2) likewise, when it is not hash-partitioned on the group-by column in an aggregate query.
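To illustrate when a shuffle is triggered, here is a hypothetical pair of PDW tables and a query (my example, not from the slides):

  -- customer is distributed on c_custkey, orders on o_orderkey
  CREATE TABLE customer (c_custkey BIGINT, c_name VARCHAR(25))
  WITH (DISTRIBUTION = HASH(c_custkey));
  CREATE TABLE orders (o_orderkey BIGINT, o_custkey BIGINT, o_totalprice FLOAT)
  WITH (DISTRIBUTION = HASH(o_orderkey));

  -- orders is NOT hash-partitioned on the join column o_custkey, so the
  -- DMS instances must shuffle it on o_custkey before each compute node
  -- can perform its piece of the join locally
  SELECT c.c_name, SUM(o.o_totalprice)
  FROM customer c JOIN orders o ON c.c_custkey = o.o_custkey
  GROUP BY c.c_name;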

DMS Role #2 – Loading
The input file lands on the PDW landing zone. The DMS instance on the landing zone distributes (round-robin) the input records to the DMS instances on the compute nodes, which then: (1) convert the records to binary format, and (2) redistribute each record by hashing on the partitioning key of the table.

HDFS Bridge in Polybase
1. The DMS instances (one per PDW compute node) are extended to host an "HDFS Bridge".
2. The HDFS Bridge hides the complexity of HDFS; DMS components are reused for type conversions.
3. All HDFS file types (text, sequence, RCFiles) are supported through the use of the appropriate RecordReaders by the HDFS Bridge.

Reading HDFS Files in Parallel
The HDFS NameNode returns the locations of the blocks of the file, and each DMS instance reads block buffers directly from the HDFS DataNodes. 1. The HDFS Bridge employs Hadoop RecordReaders to obtain records from HDFS, which enables all HDFS file types to be easily supported. 2. Existing DMS components are leveraged, as necessary, for doing type conversions. 3. After conversion, the DMS instances redistribute each record by hashing on the partitioning key of the table – identical to what happens in a load.

Phase 1 Technical Challenges
The DMS processes are extended with the HDFS Bridge. The HDFS Bridge uses standard RecordReaders (from InputFormats) and RecordWriters (from OutputFormats).

Challenge #3 – Imposing Structure
Unless they are pure text, all HDFS files consist of a set of records, and those records must have some inherent structure to them if they are to be useful. Here "structure" means that each record consists of one or more fields of some known type. A MapReduce job typically uses a Java class to specify the structure of its input records; Polybase instead employs the notion of an "external table", as shown in the syntax example on the next slide.

Phase 2 Syntax Example

CREATE HADOOP CLUSTER GSL_HDFS_CLUSTER
WITH (namenode = 'localhost', nnport = 9000,
      jobtracker = 'localhost', jtport = 9010);

CREATE HADOOP FILEFORMAT TEXT_FORMAT
WITH (INPUTFORMAT = 'org.apache.hadoop.mapreduce.lib.input.TextInputFormat',
      OUTPUTFORMAT = 'org.apache.hadoop.mapreduce.lib.output.TextOutputFormat',
      ROW_DELIMITER = '0x7c0x0d0x0a',
      COLUMN_DELIMITER = '0x7c');

CREATE EXTERNAL TABLE hdfsCustomer
( c_custkey   bigint      not null,
  c_name      varchar(25) not null,
  c_address   varchar(40) not null,
  c_nationkey integer     not null,
  … )
WITH (LOCATION = hdfs('/tpch1gb/customer.tbl',   -- HDFS file path
                      GSL_HDFS_CLUSTER, TEXT_FORMAT));

Disclaimer: for illustrative purposes only.

Polybase Phase 1 - Example #1
Execution plan generated by the PDW query optimizer:
1. CREATE temp table T – on the PDW compute nodes.
2. DMS SHUFFLE FROM HDFS – the Hadoop file is read into T; the HDFS parameters are passed into DMS.
3. RETURN OPERATION – Select * from T where T.c_nationkey = 3 and T.c_acctbal < 0.
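The slide shows only the plan, not the query. Reconstructing from the predicate in the RETURN OPERATION, the user query would presumably be a plain T-SQL selection over the external table:

  SELECT *
  FROM hdfsCustomer
  WHERE c_nationkey = 3
    AND c_acctbal < 0;

Note that in Phase 1 the predicate is applied inside PDW, after the entire file has been shuffled in.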

Polybase Phase 1 - Example #2
Execution plan generated by the query optimizer:
1. CREATE table pdwCustomer – on the PDW compute nodes.
2. CREATE temp table T – on the PDW compute nodes.
3. DMS SHUFFLE FROM HDFS – from hdfsCustomer into T; the HDFS parameters are passed into DMS.
4. ON OPERATION – Insert into pdwCustomer select * from T.
A fully parallel load from HDFS into PDW! CTAS back to HDFS is also supported.
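A CTAS statement along these lines would produce such a plan (a sketch; the DISTRIBUTION clause is my assumption about how pdwCustomer is partitioned):

  CREATE TABLE pdwCustomer
  WITH (DISTRIBUTION = HASH(c_custkey))
  AS SELECT * FROM hdfsCustomer;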

Polybase Phase 1 - Example #3
Execution plan generated by the query optimizer:
1. DMS SHUFFLE FROM HDFS on o_custkey – from hdfsOrders into oTemp, on the PDW compute nodes.
2. RETURN OPERATION – Select c.*, o.* from Customer c, oTemp o where c.c_custkey = o.o_custkey and c_nationkey = 3 and c_acctbal < 0.
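Reconstructing the user query behind this plan: a join of the PDW-resident Customer table with the hdfsOrders external table (whose schema I am assuming follows TPC-H):

  SELECT c.*, o.*
  FROM Customer c, hdfsOrders o
  WHERE c.c_custkey = o.o_custkey
    AND c.c_nationkey = 3
    AND c.c_acctbal < 0;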

Polybase Phase 1 - Wrap-Up
Key limitations:
1. The data is always pulled into PDW to be processed.
2. It does not exploit the computational resources of the Hadoop cluster.

Polybase “Phases”

Polybase Phase 2 (PDW V2 AU1/AU2)
SQL operations on HDFS data are pushed into Hadoop as MapReduce jobs, with a cost-based decision on how much computation to push. Remember: (1) this is what is working in our prototype branch; (2) there is no commitment to ship any/all of this functionality; (3) but it was demoed by Quentin on Thursday.

Polybase Phase 2 Query Execution
1. The query is parsed; "external tables" stored on HDFS are identified.
2. Parallel query optimization is performed; statistics on the HDFS tables are used in the standard fashion.
3. The DSQL plan generator walks the optimized query plan, converting subtrees whose inputs are all HDFS files into a sequence of MapReduce jobs. The generated Java code uses a library of PDW-compatible type-conversion routines to ensure semantic compatibility.
4. The PDW Engine Service submits the MapReduce jobs (as a JAR file) to the Hadoop cluster, leveraging the computational capabilities of the Hadoop cluster.

Phase 2 Challenge – Semantic Equivalence
Polybase Phase 2 splits query execution between Hadoop and PDW. In Phase 1 there was only one plan shape: DMS SHUFFLE FROM HDFS followed by PDW query execution. In Phase 2 there are alternative plans in which Hadoop MR execution runs over the HDFS data before the results are shuffled into PDW. Java expression semantics differ from SQL language semantics in terms of types, nullability, etc., yet the semantics (i.e., the results) of a query should not depend on which alternative the query optimizer picks.
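As a concrete instance of the problem (my example, not from the slide), consider a predicate over a nullable column:

  -- In SQL, a NULL c_acctbal makes the predicate UNKNOWN, and the row is
  -- quietly filtered out:
  SELECT count(*) FROM hdfsCustomer WHERE c_acctbal < 0;
  -- Naively generated Java for the same predicate would instead throw a
  -- NullPointerException (or compare against a default value), so the
  -- generated code must explicitly emulate SQL's three-valued logic.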

Polybase Phase 2 - Example #1
Execution plan:
1. Run an MR job on Hadoop – apply the filter and compute the aggregate on hdfsCustomer; the output is left in hdfsTemp.
What really happens here? Step 1: the query optimizer compiles the predicate into Java and generates a MapReduce job. Step 2: the query executor submits the MR job to the Hadoop cluster.
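The slide omits the query text. Given the C_ACCTBAL < 0 filter shown on the MR-job slide below, one plausible shape for it (the particular aggregate is my guess) is:

  SELECT c_nationkey, count(*)
  FROM hdfsCustomer
  WHERE c_acctbal < 0
  GROUP BY c_nationkey;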

MapReduce Review
Key components:
1. JobTracker – one per cluster; manages cluster resources; accepts & schedules MR jobs.
2. TaskTracker – one per node; runs Map and Reduce tasks; restarts failed tasks.
In Polybase Phase 2, the PDW Query Executor submits the MR job to the Hadoop JobTracker.

The MR Job in a Little More Detail
(Diagram: Mappers read from the HDFS DataNodes and apply the C_ACCTBAL < 0 filter; Reducers combine the Mappers' outputs. The output is left in hdfsTemp.)

Polybase Phase 2 - Example #1 (continued)
Execution plan:
1. Run the MR job on Hadoop – apply the filter and compute the aggregate on hdfsCustomer; the output is left in hdfsTemp.
2. CREATE temp table T – on the PDW compute nodes.
3. DMS SHUFFLE FROM HDFS – read hdfsTemp into T.
4. RETURN OPERATION – Select * from T.
Notes: (1) the predicate and aggregate are pushed into the Hadoop cluster as a MapReduce job; (2) the query optimizer makes a cost-based decision on which operators to push.

Polybase Phase 2 - Example #2
Execution plan:
1. Run a Map job on Hadoop – apply the filter to hdfsOrders; the output is left in hdfsTemp.
2. DMS SHUFFLE FROM HDFS on o_custkey – read hdfsTemp into oTemp, partitioned on o_custkey, on the PDW compute nodes.
3. RETURN OPERATION – Select c.*, o.* from Customer c, oTemp o where c.c_custkey = o.o_custkey.
Notes: (1) the predicate on orders is pushed into the Hadoop cluster; (2) the DMS shuffle ensures that the two tables are "like-partitioned" for the join.
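As before, the underlying user query is not on the slide; it would be a join whose orders-side predicate gets pushed to Hadoop (the particular filter below is purely illustrative):

  SELECT c.*, o.*
  FROM Customer c, hdfsOrders o
  WHERE c.c_custkey = o.o_custkey
    AND o.o_totalprice > 1000;   -- hypothetical predicate, pushed as a Map job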

Extends the capabilities of Polybase Phase 1 by pushing operations on HDFS files into Hadoop as MapReduce jobs. PDW statistics are extended to provide detailed column-level stats on external tables stored in HDFS files. The PDW query optimizer is extended to make a cost-based decision on which operators to push. The generated Java code uses a library of PDW-compatible type conversions to ensure semantic compatibility. So, what are the performance benefits of pushing work?

Test Configuration
PDW cluster: 16 nodes – commodity HP servers with 32GB memory and ten 300GB SAS drives; SQL Server 2008 running in a VM on Windows 2012.
Hadoop cluster: 48 nodes – same hardware & OS; Isotope (HDInsight) Hadoop distribution.
Networking: 1 Gigabit Ethernet to the top-of-rack switches (Cisco 2350s); 10 Gigabit rack-to-rack; nodes distributed across 6 racks.

Test Database

Selection on an HDFS Table
(Chart comparing Polybase Phase 1 and Phase 2.) Crossover point: above a selectivity factor of ~80%, Polybase Phase 2 is slower.

Join an HDFS Table with a PDW Table
(Chart comparing Polybase Phase 1 and Phase 2 execution times.)

Join Two HDFS Tables
Plans compared: PB.1 – all operators run on PDW. PB.2H – the selections & join run on Hadoop. PB.2P – the selections on T1 and T2 are pushed to Hadoop, and the join is performed on PDW.

Up to a 10X performance improvement! A cost-based optimizer is clearly required to decide when an operator should be pushed, and the optimizer must also incorporate the relative cluster sizes in its decisions. Split query processing really works!

Polybase “Phases”

Hadoop V2 (YARN)
In Hadoop V1, the JobTracker can only run MR jobs. In Hadoop V2 (YARN), the JobTracker has been refactored into: (1) a Resource Manager – one per cluster, which manages the cluster resources; and (2) an Application Master – one per job type. Hadoop V2 clusters are thus capable of executing a variety of job types: MPI, MapReduce, even trees of relational operators!

Polybase Phase 3
Key ideas: PDW generates trees of relational operators instead of MapReduce jobs, submitting them to a PDW YARN Application Master. How much, and which part, of the query tree is executed in Hadoop vs. PDW is again decided by the PDW query optimizer.

Some Polybase Phase 3 Musings
One could use the Hadoop cluster to provide additional fault tolerance for PDW; e.g., one could replicate all PDW tables on HDFS. There are lots of interesting query optimization challenges. The effort is just beginning.

Summary
SQL Server PDW with Polybase: combining insight across the RDBMS and Hadoop universes.

SQL Server w. Polybase
"But, I am a PC SQL Server!" Needless to say, don't ask me how soon.

Acknowledgments
(+ many more)

Thanks!