Building Petabyte scale Interactive Data Warehouse on Azure HDInsight

Presentation on theme: "Building Petabyte scale Interactive Data Warehouse on Azure HDInsight"— Presentation transcript:

1 Building Petabyte scale Interactive Data Warehouse on Azure HDInsight
BRK3355: Building Petabyte scale Interactive Data Warehouse on Azure HDInsight
Ashish Thapliyal, Principal Program Manager, Azure HDInsight
Dharmesh Kakadia, Software Engineer, Azure HDInsight
© Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

2 Building blocks of PB scale DW

3 Building PB scale DW on Azure HDInsight
- Data Lake with no limits
- Rich options to bring data into the Data Lake
- Extract and transform the data
- Blazing fast query engine to serve complex DW-style queries
- Rich tools for end users to gain insights
- Security

4 Fundamental Requirements from Enterprise DW
User Experience, Security, Interactive Query Performance, Query Concurrency, Manageability, Cost, Data Volume (Scale)

5 Open Source Hadoop Journey into Enterprise Data Warehouse solutions

6 Evolution of Big Data into DW solutions
Basic Business Intelligence architecture, designed to process end-to-end within a given SLA.
Diagram labels: Interactive Query Performance, Query Concurrency, Data Volume, Active Directory

7 Evolution of Big Data into DW solutions
Bottlenecks start to appear
Problem #1: Data volume bottlenecks both ends
- Increased data volume puts pressure on SMP ingestion and queries
- Data minimization / incremental scheduling / over-engineered ETL
Diagram labels: Interactive Query Performance, Query Concurrency, Data Volume

8 Evolution of Big Data into DW solutions
Bottlenecks solved via Big Data "v1": MPP
Solution #1: Parallelize everything
- Migrate to MPP platforms (PDW, Teradata, Netezza, Greenplum, Vertica)
Diagram labels: Massive Parallel Processing, Interactive Query Performance, Query Concurrency (mostly), Data Volume

9 Evolution of Big Data into DW solutions
More Speed + More Data == More Problems
Problem #2
- Scaling MPP starts getting expensive
- Doesn't support modern workloads / multi-workload
Diagram labels: Massive Parallel Processing, Interactive Query Performance, Query Concurrency, Data Volume

10 Evolution of Big Data into DW solutions
Offload DW tasks to Big Data
Solution #2: Hadoop
- HDFS provides a low-cost storage alternative
- Commercial RDBMS vendors deliver connectivity

11 Evolution of Big Data into DW solutions
Business could not replace Data Warehouse without interactive
Problem #3
- Interactive Performance
- Concurrency
- Security

12 Fundamental Requirements from Enterprise DW
? User Experience ? Security Interactive Query Performance Query Concurrency ? ? ? Manageability Cost Data Volume (Scale)

13 HDInsight Introduction

14 Microsoft Tech Summit FY17
Azure HDInsight: open source analytics service for the Enterprise
Fully-managed Hadoop and Spark for the cloud. 99.9% SLA.
- 100% open source Hortonworks Data Platform
- Clusters up and running in minutes
- Familiar BI tools, interactive open source notebooks
- Multiple IDE tooling support, including remote debugging
- 63% lower TCO than deploying your own Hadoop on-premises*
- Scale clusters on demand
- Secure Hadoop and Spark via Active Directory and Ranger
- Best-in-class monitoring and predictive operations via OMS
- Native integration with leading ISVs
*IDC study "The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight"

15 HDInsight Demo

16 Building PB scale DW on Azure
- Data Lake with no limits
- Rich options to bring data into the Data Lake
- Extract and transform the data
- Blazing fast query engine to serve complex DW-style queries
- Rich tools for end users to gain insights
- Security

17 Data Lake Options
Capability | ADLS | Azure Blob
Geographic Availability | East US 2, Central US, North Europe | All Data Centers
HDFS | Yes (Web HDFS) | No
Scale | No limit on bandwidth or storage size | Limits: 5PB storage, 50GBps bandwidth
File/Folder level ACLs | Yes | No [Object Store]
Role Based Access | |
Encryption | |
Geo-Replication | | Yes [LRS, GRS, RA-GRS]
Cost [1PB] | $40K | Hot $20K, Cool $16K
GA Date | Nov 16th 2016 | Feb 1st 2010

18 Higher scale storage accounts [Just Announced]
Resource | New Limit
Max capacity for Blob storage accounts | 5PB (10x increase)
Max TPS/IOPS for Blob storage accounts | 50K (2.5x increase)
Max ingress for Blob storage accounts | 50Gbps (2.5-10x increase)
Max egress for Blob storage accounts | 50Gbps (2.5-5x increase)
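To put the ingress limit in perspective, here is a back-of-the-envelope sketch of how long a bulk load would take at the stated 50Gbps. It assumes decimal units (1 PB = 10^15 bytes) and a fully saturated link, which real transfers rarely sustain; the function name is illustrative only.

```python
def transfer_hours(size_bytes: float, link_gbps: float) -> float:
    """Hours needed to move size_bytes over a link rated at link_gbps (gigabits per second)."""
    bits = size_bytes * 8
    seconds = bits / (link_gbps * 1e9)
    return seconds / 3600


PB = 10**15  # decimal petabyte, an assumption made for this estimate

hours_for_1pb = transfer_hours(1 * PB, 50)
print(f"1 PB at 50 Gbps: {hours_for_1pb:.1f} hours")  # roughly 44.4 hours
```

Even at the ceiling, a petabyte takes close to two days to land, which is why the ingestion options discussed later in the deck matter.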

19 What to pick? HDInsight works with both storage types
Possible combinations:
- ADLS only
- WASB only
- ADLS + WASB, both attached to a cluster
Ultimately it boils down to:
- Geographic feasibility
- Existing data
- File/folder level ACLs

20 Building PB scale DW on Azure
- Data Lake with no limits
- Rich options to bring data into the Data Lake
- Extract and transform the data
- Blazing fast query engine to serve complex DW-style queries
- Rich tools for end users to gain insights
- Security and operationalization

21 Bring Data to Data Lake Streaming Data
Streaming data:
- Event Hub, Kafka, Storm, Spark Streaming
Data from traditional sources:
- Born in Azure
- On premises, other clouds

22 Streaming Data
Diagram labels: HDInsight Kafka / Event Hub; Apache Storm / Spark Streaming; CosmosDB / HBase / Phoenix; Azure Blob Store

23 Data from Traditional Sources
- Azure Data Factory
- Oozie + Sqoop
- Ecosystem partners [Talend, Talena]
- Import Export Service: one-time/occasional large data load
- AzCopy

24 ADF Data Movement Orchestrate, monitor & schedule
Orchestrate, monitor & schedule:
- Compose data processing, storage & movement services (on premises & cloud)
Automatic infrastructure management:
- Combine pipeline intent with resource allocation & management
- Data movement as a service (global footprint & on premises)
Single pane of glass:
- One place to manage your network of data flows

25 ADF – Supported Data Sources
Databases: Amazon Redshift, DB2, MySQL, Oracle, PostgreSQL, SAP Business Warehouse*, SAP HANA, SQL Server, Sybase, Teradata
Azure: Azure Blob storage, Azure Cosmos DB (DocumentDB API), Azure Data Lake Store, Azure SQL Database, Azure SQL Data Warehouse, Azure Search Index, Azure Table storage
File Systems: Amazon S3, File System, FTP, HDFS, SFTP
NoSQL: Cassandra, MongoDB
Others: Generic HTTP, Generic OData, Generic ODBC, Salesforce, Web Table (table from HTML), GE Historian

26 HDInsight/Azure Data Factory Pattern 1
Each pipeline invokes its own HDInsight cluster for ETL.
Diagram labels: Source 1, HDInsight Cluster, Result Set 1; Source 2, HDInsight Cluster, Result Set 2; Source 3, HDInsight Cluster, Result Set 3; Source n, HDInsight Cluster, Result Set n
Recommended pattern if each pipeline is producing sufficiently large data.

27 HDInsight/Azure Data Factory Pattern 2
All pipelines use the same HDInsight cluster.
Diagram labels: Source 1, Result Set 1; Source 2, Result Set 2; Source 3, Result Set 3; Source n, Result Set n; one shared HDInsight Cluster
Recommended pattern if data volume for ETL is low for an individual pipeline.

28 HDInsight/Azure Data Factory Pattern 3
Mix mode, depending upon data volume.
Diagram labels: Source 1, HDInsight Cluster, Result Set 1; Source 2, Result Set 1; Source 3, HDInsight Cluster, Result Set 1; Source n, Result Set n
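The three patterns differ only in whether a pipeline gets its own HDInsight cluster or shares one. A minimal sketch of that decision follows. The threshold value, the pipeline names, and the function name are hypothetical; they stand in for whatever "sufficiently large" means for a given workload and are not part of Azure Data Factory.

```python
from typing import Dict

SHARED = "shared-cluster"


def assign_clusters(volume_gb: Dict[str, float], threshold_gb: float) -> Dict[str, str]:
    """Map each pipeline to a dedicated cluster (large ETL) or to the shared cluster (small ETL)."""
    plan = {}
    for pipeline, gb in volume_gb.items():
        plan[pipeline] = f"dedicated-{pipeline}" if gb >= threshold_gb else SHARED
    return plan


# Hypothetical per-pipeline ETL volumes in GB.
plan = assign_clusters({"source1": 5000, "source2": 20, "source3": 8000}, threshold_gb=1000)
print(plan)
```

If every pipeline ends up on a dedicated cluster the result is Pattern 1, if all share one it is Pattern 2, and a mixture like the one above corresponds to Pattern 3.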

29 Building PB scale DW on Azure
- Data Lake with no limits
- Rich options to bring data into the Data Lake
- Powerful and flexible engines in HDInsight to extract and transform the data
- Optimal file formats
- Blazing fast query engine to serve complex DW-style queries
- Rich tools for end users to gain insights
- Security

30 HDInsight Best Practices
- External Metastore
- Storage account sharding
Don't forget good old Hive best practices:
- Table partitioning
- *De-normalizing data
- Bucketing
- Compress Map/Reduce output
- ORC file format
- Vectorization
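Several of these practices (partitioning, bucketing, ORC, compressed intermediate output, vectorization) come together in the table definition and session settings. The sketch below only assembles HiveQL text; it does not connect to a cluster. The table name, column names, and bucket count are made up for illustration, and the helper function is not part of any Hive client library.

```python
def orc_table_ddl(table, columns, partition_col, bucket_col, buckets):
    """Build a HiveQL CREATE TABLE statement for a partitioned, bucketed ORC table."""
    cols = ", ".join(f"{name} {typ}" for name, typ in columns)
    return (
        f"CREATE TABLE {table} ({cols}) "
        f"PARTITIONED BY ({partition_col} STRING) "
        f"CLUSTERED BY ({bucket_col}) INTO {buckets} BUCKETS "
        f"STORED AS ORC"
    )


# Hive session settings for vectorized execution and compressed intermediate output.
SESSION_SETTINGS = [
    "SET hive.vectorized.execution.enabled=true",
    "SET hive.exec.compress.intermediate=true",
]

# Hypothetical table used only for illustration.
ddl = orc_table_ddl(
    "sales",
    [("order_id", "BIGINT"), ("amount", "DOUBLE")],
    partition_col="sale_date",
    bucket_col="order_id",
    buckets=32,
)

for statement in SESSION_SETTINGS + [ddl]:
    print(statement + ";")
```

The generated statements could then be submitted through Beeline, JDBC, or any other HiveServer2 client.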

31 HDInsight Metastore Options
Two options: Default and Custom

32 SSH Ambari YarnUI HMaster UI Spark History Server Notebooks
Diagram labels: Subnet, VNET

33 HDInsight Metastore Options
Default | Custom
No additional cost | Bring your own Azure SQL DB
Scope: cluster lifecycle | Scope: beyond cluster lifecycle
5 DTU [Database Transaction limit] | Bring your own [S2 or above]
Sharing: No | Sharing: multiple clusters can share
Caution (shared metastore): different Hive versions

34 File Formats
ORC (Optimized Row Columnar) is a file format that provides a highly efficient way to store Hive data on Hadoop.
Pros:
- High level of compression (~1/3 of CSV)
- High read speed
Cons:
- Conversion to ORC is a time-consuming process
- Not a human-readable format
Write speed is equal to or marginally better than text.
With text caching in Hive on LLAP, read performance has improved significantly.

35 Have you considered ACID?
Myth:
- Write once, read many times, never update
Reality:
- Bad data flows into the system
- Individual rows need to be updated
Good news:
- Hive 2.0 supports transactions with ORC files
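As a rough illustration of what row-level updates look like in practice, the sketch below assembles the HiveQL typically involved: enabling the transaction manager, creating a bucketed ORC table flagged as transactional, and correcting a single row. The table, columns, and values are hypothetical, and the exact settings required can vary by Hive version, so treat this as a starting point rather than a recipe.

```python
# HiveQL statements for an ACID (transactional) table; names and values are illustrative.
ACID_STATEMENTS = [
    "SET hive.support.concurrency=true",
    "SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager",
    "CREATE TABLE customers (id BIGINT, email STRING) "
    "CLUSTERED BY (id) INTO 8 BUCKETS STORED AS ORC "
    "TBLPROPERTIES ('transactional'='true')",
    "UPDATE customers SET email = 'fixed@example.com' WHERE id = 42",
]


def as_script(statements):
    """Join statements into a semicolon-terminated script suitable for a Hive client."""
    return ";\n".join(statements) + ";"


print(as_script(ACID_STATEMENTS))
```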

36 Building PB scale DW in Azure
- Data Lake with no limits
- Rich options to bring data into the Data Lake
- Powerful and flexible engines in HDInsight to extract and transform the data
- Blazing fast query engine to serve complex DW-style queries
- Rich tools for end users to gain insights
- Security

37 Evolution of Big Data into DW solutions
Designed to process end-to-end within a given SLA.
Diagram labels: Interactive Query Performance, Query Concurrency, Data Volume, Active Directory

38 Fundamental Requirements from Enterprise DW
? User Experience ? Security Interactive Query Performance Query Concurrency ? ? Operationalization Cost Data Volume (Scale)

39 Query Engine
Two approaches:
- Move data to SQL DW, SQL or other relational stores
- Serve queries from Data Lake [Interactive Query]
Diagram labels: Extract, Transform, Load (shown for both approaches); X

40 Considerations for data movement
Pros:
- Existing usage pattern
- Millisecond-level SLAs on queries
Cons:
- Managing schemas across different systems
- Data load times can be expensive
- Data freshness challenges
- Operational challenges of managing multiple systems
- Cost

41 Use HDInsight Interactive Query [LLAP]
Diagram labels:
- Job orchestration / Azure Data Factory
- HiveView 2.0 / ODBC / JDBC / Zeppelin / VS Code / Visual Studio
- Common Azure SQL (Metastore)
- ETL & Batch Cluster
- Interactive Query Cluster
- Data Lake [Azure Blob Storage & ADLS]
- Components: Templeton/HiveServer2, HiveServer2, Oozie, Hive, Yarn, Sqoop, Pig, Map Reduce, Tez

42 So How Fast is it?

43 HDInsight Interactive Query Demo
- TPCDS data set
- Azure Blob Store
- 64 node cluster (D14_V2 VM)

44 TPCDS 1 TB [Comparison with Hive on Tez]

45 TPCDS – Total Time [1 TB], All 99 queries
4X

46 LLAP and Hive
Transparent to users, BI tools, etc.: HS2/JDBC is the access point.
Diagram labels: SQL queries; ODBC; JDBC; HiveServer2; Query Coordinators (QC); DAGs; YARN cluster; four LLAP daemons, each with query executors; shared in-memory cache; ADLS; WASB

47 What if underlying data changes?
Diagram labels: BLOB Store, SSD, DRAM
- Cache eviction is based on the source file's last modified date (external tables)
- Every external table query checks the modified date and reloads if a new file has arrived
- Segments are reloaded into the cache if the modified date changes
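The invalidation rule described on this slide can be illustrated with a small toy cache that reloads an entry whenever the source file's modified time differs from the one recorded at load time. This is a conceptual sketch only, not LLAP's actual implementation; the class name, the callbacks, and the in-memory "blob store" are all stand-ins invented for the example.

```python
from typing import Callable, Dict, Tuple


class ModifiedDateCache:
    """Caches file contents and reloads an entry when the source's modified time changes."""

    def __init__(self, read_file: Callable[[str], bytes], modified_time: Callable[[str], float]):
        self._read_file = read_file
        self._modified_time = modified_time
        self._entries: Dict[str, Tuple[float, bytes]] = {}
        self.reloads = 0  # number of times data was fetched from the source

    def get(self, path: str) -> bytes:
        current = self._modified_time(path)  # checked on every access
        cached = self._entries.get(path)
        if cached is None or cached[0] != current:
            self._entries[path] = (current, self._read_file(path))
            self.reloads += 1
        return self._entries[path][1]


# In-memory stand-in for remote storage: path -> (modified time, contents).
store = {"part-0.orc": (1.0, b"v1")}
cache = ModifiedDateCache(lambda p: store[p][1], lambda p: store[p][0])

first = cache.get("part-0.orc")       # cold read, loaded from the store
second = cache.get("part-0.orc")      # unchanged modified time, served from cache
store["part-0.orc"] = (2.0, b"v2")    # the file is rewritten
third = cache.get("part-0.orc")       # modified time changed, entry reloaded
print(first, second, third, cache.reloads)
```

The trade-off is one cheap metadata check per access in exchange for never serving stale data after a file is replaced.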

48 Why Interactive Query (LLAP)?
Maturity:
- LLAP is Hive
- Supports all file formats that Hive does [ORC, Parquet, JSON, Avro, CSV, TSV, ...]
- Hive operators used for processing, same compiler, etc.
- Rich SQL support, joins
Speed:
- Intelligent in-memory caching
- Hybrid model combining daemons and containers for fast, concurrent execution of analytical workloads
- Asynchronous IO
Optimized for cloud:
- Remote-storage-aware architecture
- Metastore caching
- Scale up/scale down handling

49 100 TB TPCDS Benchmark Runs all 99 queries
- Cluster: 100 D14_V2 worker nodes
- Custom SQL Metastore: S2
- Storage: Azure Blob Store
- Out-of-the-box settings

50 Interactive Query TPCDS (100 TB) Hot Runs
- 71% of queries under 2 min
- 41% of queries under 30 sec
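Figures like these are straightforward to derive from a list of per-query runtimes. The sketch below shows the calculation on made-up numbers; the sample runtimes are hypothetical and are not the benchmark results reported on this slide.

```python
def percent_under(runtimes_sec, limit_sec):
    """Percentage of queries whose runtime is strictly below limit_sec."""
    return 100.0 * sum(1 for t in runtimes_sec if t < limit_sec) / len(runtimes_sec)


# Hypothetical runtimes in seconds for ten queries (not the slide's data).
sample = [12, 25, 48, 95, 110, 130, 300, 20, 75, 640]

under_30s = percent_under(sample, 30)
under_2min = percent_under(sample, 120)
print(f"{under_30s:.0f}% under 30 sec, {under_2min:.0f}% under 2 min")
```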

51 Fundamental Requirements from Enterprise DW
? User Experience ? Security Interactive Query Performance Query Concurrency ? Manageability Cost Data Volume (Scale)


53 Interactive Query - Concurrency
Ability to support concurrent users:
- 64 concurrent queries [default is up to 32]
- Only limited by cluster size and resources
- Multi-cluster architecture lets you add more clusters
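A simple way to reason about the multi-cluster option is to divide the target concurrency by the per-cluster limit and spread incoming queries across the resulting clusters. The sketch below does this with round-robin assignment. The target of 200 concurrent queries, the cluster names, and the routing scheme are assumptions for illustration; HDInsight does not provide this router.

```python
import math
from itertools import cycle


def clusters_needed(target_concurrent_queries: int, per_cluster_limit: int) -> int:
    """Smallest number of clusters whose combined limit covers the target concurrency."""
    return math.ceil(target_concurrent_queries / per_cluster_limit)


# Hypothetical target of 200 concurrent queries at 64 per cluster.
n = clusters_needed(200, 64)
endpoints = [f"interactive-query-{i + 1}" for i in range(n)]

# Round-robin assignment of incoming queries to cluster endpoints.
router = cycle(endpoints)
assignments = [next(router) for _ in range(6)]
print(n, assignments)
```

Because all clusters read the same data lake and can share a metastore, adding one more endpoint to the list is the only change the router needs.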

54 Demo – Query Concurrency


56 How can I run my own TPCDS?
HDInsight TPCDS Test Bench: automated data generation and querying for
- Hive LLAP
- Spark
- Presto

57 How do I serve even more concurrent users?

58 HDInsight Multi- Cluster Architecture
Diagram labels: Metastore; ETL Cluster [Hive, Spark]; Interactive Query 1; Interactive Query 2; Interactive Query 3; ADLS/Azure Blob Store

59 HDInsight Multi- Cluster Architecture
Diagram labels: Metastore; ETL Cluster [Hive, Spark]; Interactive Query; Spark; Presto; ADLS/Azure Blob Store

60 Comparing OSS Big Data Engines [DW Scenario]
Capability Interactive Query Spark SQL Presto Interactive Query Speed High Medium Scale Low Caching Yes No Intelligent Cache Eviction Complex Fact to Fact Joins ACID Query Concurrency Row, Column level security Yes [Apache Ranger + AAD] Rich end user Tools Language Support SQL, UDF SQL, Scala, Python SQL Data Source Connector Support Storage Handlers Data Sources High number of connectors

61 Comparing OSS Big Data Engines [DW Scenario]

62 Fundamental Requirements from Enterprise DW
? User Experience ? Security Interactive Query Performance Query Concurrency Manageability Cost Data Volume (Scale)

63 Building PB scale DW in Azure
- Data Lake with no limits
- Rich options to bring data into the Data Lake
- Powerful and flexible engines in HDInsight to extract and transform the data
- Blazing fast query engine to serve complex DW-style queries
- Rich tools for end users to gain insights
- Security

64 Interactive Query Cluster
Rich user experience
Diagram labels (tools and interfaces around the Interactive Query cluster): Power BI, Excel, Tableau, ODBC, Beeline, JDBC, Visual Studio, Hive CLI, Visual Studio Code, SQuirreL SQL, Apache Zeppelin

65 Demo

66 Fundamental Requirements from Enterprise DW
User Experience ? Security Interactive Query Performance Query Concurrency Manageability Cost Data Volume (Scale)

67 Building PB scale DW in Azure
- Data Lake with no limits
- Rich options to bring data into the Data Lake
- Powerful and flexible engines in HDInsight to extract and transform the data
- Blazing fast query engine to serve complex DW-style queries
- Rich tools for end users to gain insights
- Security

68 Interactive Query Security – rings of defense
Microsoft Ignite 2016
Perimeter level security:
- Virtual network
- Network security (i.e. firewalls)
- Gateway service tunneling
Authentication:
- Kerberos
- Active Directory
Authorization:
- Hive policies
- File and folder level ACLs
Data security:
- At rest
- In transit: HTTPS/TLS

69 Fundamental Requirements from Enterprise DW
User Experience, Security, Interactive Query Performance, Query Concurrency, Manageability, Cost, Data Volume (Scale)

70 XBOX
Diagram labels: XBOX Telemetry; Event Hub; Blob Storage; HDInsight Hive; Spokes (SQL/ADW/Power BI/Hive queries)
Data:
- PBs of ORC data on Blob Storage
ADF:
- 100s of Azure Data Factories (ADF)
- 1000s of active pipelines in ADF with all sorts of activities (Hive, SQL, Blob, Custom, etc.)
HDI (D14):
- Shared Hive Metastore
- Multiple processing clusters
- Multiple query clusters
Designed to process end-to-end within a given SLA.
Diagram labels: Interactive Query Performance, Query Concurrency, Data Volume, Active Directory

71 Thank You

72 Please evaluate this session
Tech Ready 15, 5/13/2018
From your PC or tablet: visit MyIgnite
Phone: download and use the Microsoft Ignite mobile app
Your input is important!
