Building BI Solutions with SQL Server PDW AU3 Ruwen Hess Senior Program Manager Microsoft Corporation DBI321.

Presentation transcript:

Building BI Solutions with SQL Server PDW AU3 Ruwen Hess Senior Program Manager Microsoft Corporation DBI321

Source: TDWI Report – Next Generation DW
"Data Warehousing has shifted almost entirely towards the appliance model due to speed of the balanced appliance and scalability of scale out (MPP) solutions."
Jim Kobielus, Forrester Research

Source: MS internal analysis, DBSMIT Cloud Market Opportunity Forecast
CAGR: -0.3%, 26.2%, 7.1%
Share ('15): 4.6%, 5.0%, 30.0%, 60.4%, 7.1%

Scale out: Scalable, Standards Based, Flexible, Cost Effective

Control Rack: Control Node (query submitted here), Management Node, Landing Zone, Backup Node
Data Rack: query is executed on all nodes
- Multiple queries are simultaneously executed across all nodes
- PDW supports querying while data is loading

Example schema (PDW Compute Nodes)
Time Dim: Date Dim ID, Calendar Year, Calendar Qtr, Calendar Mo, Calendar Day
Store Dim: Store Dim ID, Store Name, Store Mgr, Store Size
Product Dim: Prod Dim ID, Prod Category, Prod Sub Cat, Prod Desc
Mktg Campaign Dim: Mktg Camp ID, Camp Name, Camp Mgr, Camp Start, Camp End
Sales Facts: Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp Id, Qty Sold, Dollars Sold

Smaller Dimension Tables are Replicated on Every Compute Node
Each of the four compute nodes shown holds its own copy of the Product (PD), Time (TD), Mktg Campaign (MD), and Store (SD) dimensions. The Sales Facts table is handled separately.

Larger Fact Table is Hash Distributed Across All Compute Nodes
The Sales Facts table is split into SF-1, SF-2, SF-3, and SF-4, one slice per compute node, next to that node's replicated dimension copies (PD, TD, MD, SD).
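The layout on the last two slides (small dimensions replicated, large fact table hash distributed) can be sketched in a few lines of Python. This is a minimal illustration, not PDW's implementation: the hash function (`zlib.crc32`), the choice of `prod_dim_id` as distribution key, and the sample rows are all assumptions made for the example.

```python
import zlib

NUM_NODES = 4  # the slides show four compute nodes (SF-1 .. SF-4)

def node_for(key, num_nodes=NUM_NODES):
    """Map a distribution-key value to a compute node with a stable hash.

    crc32 is a stand-in; the slides do not say which hash PDW uses.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes

def distribute(rows, key_column, num_nodes=NUM_NODES):
    """Split fact rows into one slice per node, based on the distribution key."""
    slices = [[] for _ in range(num_nodes)]
    for row in rows:
        slices[node_for(row[key_column], num_nodes)].append(row)
    return slices

def replicate(rows, num_nodes=NUM_NODES):
    """Give every node its own full copy of a small dimension table."""
    return [list(rows) for _ in range(num_nodes)]

# Hypothetical sample data, loosely following the slide's column names.
store_dim = [
    {"store_dim_id": 1, "store_name": "North"},
    {"store_dim_id": 2, "store_name": "South"},
]
sales_facts = [
    {"store_dim_id": 1 + (i % 2), "prod_dim_id": i % 50, "qty_sold": i % 7 + 1}
    for i in range(1000)
]

fact_slices = distribute(sales_facts, "prod_dim_id")  # SF-1 .. SF-4
dim_copies = replicate(store_dim)                     # one SD copy per node

def local_join(node):
    """Join a node's fact slice with its local dimension copy; no data moves."""
    store_names = {d["store_dim_id"]: d["store_name"] for d in dim_copies[node]}
    return [(store_names[f["store_dim_id"]], f["qty_sold"]) for f in fact_slices[node]]

joined = [pair for node in range(NUM_NODES) for pair in local_join(node)]
```

Because every node already holds the whole dimension, each fact slice can be joined where it sits, which is the point of replicating the small tables.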

SQL Server PDW Appliance

Shuffle Movement
Example: SELECT [color], SUM([qty]) FROM [Store Sales] GROUP BY [color];

Store Sales (distributed table) on Compute Node 1:
  ss_id  color   qty
  1      Red     5
  3      Blue    11
  5      Red     12
  7      Green   7

Store Sales (distributed table) on Compute Node 2:
  ss_id  color   qty
  2      Red     8
  4      Blue    10
  6      Yellow  12

DMS redistributes the data by color values (hash) in parallel into Temp_1:
  Temp_1 on one node:        Red 5, Red 12, Red 8, Green 7
  Temp_1 on the other node:  Blue 11, Yellow 12, Blue 10

Parallel Merge and Aggregate, then Return:
  color   qty
  Blue    21
  Red     25
  Green   7
  Yellow  12
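The shuffle on this slide can be replayed in Python with the slide's own rows. This is a sketch under assumptions: `zlib.crc32` stands in for whatever hash DMS uses, and the function names are made up for the example. Which node a given color lands on may therefore differ from the slide, but the final totals do not depend on that.

```python
import zlib
from collections import defaultdict

# (ss_id, color, qty) rows as they sit on each compute node before the shuffle.
node_1 = [(1, "Red", 5), (3, "Blue", 11), (5, "Red", 12), (7, "Green", 7)]
node_2 = [(2, "Red", 8), (4, "Blue", 10), (6, "Yellow", 12)]

def target_node(color, num_nodes):
    """Pick the node that owns a color; crc32 is a stand-in hash (assumption)."""
    return zlib.crc32(color.encode("utf-8")) % num_nodes

def shuffle(nodes):
    """Redistribute (color, qty) pairs so all rows of one color share a node."""
    temp = [[] for _ in nodes]  # one Temp_1 table per node
    for rows in nodes:
        for _ss_id, color, qty in rows:
            temp[target_node(color, len(nodes))].append((color, qty))
    return temp

def aggregate(rows):
    """Local SUM(qty) GROUP BY color on a single node."""
    totals = defaultdict(int)
    for color, qty in rows:
        totals[color] += qty
    return dict(totals)

def group_by_color(nodes):
    """Shuffle, aggregate on each node, then merge the per-node results."""
    result = {}
    for partial in map(aggregate, shuffle(nodes)):
        result.update(partial)  # safe: each color lives on exactly one node
    return result

temp_tables = shuffle([node_1, node_2])
totals = group_by_color([node_1, node_2])
print(totals)
```

The merge step is a plain union because the shuffle guarantees that no color is split across nodes, so no second aggregation pass is needed.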

Control Rack
- Control Node: Client Interface (JDBC, ODBC, OLE-DB, ADO.NET), PDW Engine, DMS Manager
- Landing Zone Node: Bulk Data Loader, PDW Agent
- Management Node: Active Directory, PDW Agent
- ETL Interface
Data Rack (up to 4)
- Compute Node 1, Compute Node 2, … Compute Node 10: each with DMS Core and PDW Agent
Legend: PDW service; DMS = Data Movement Service; PDW = Parallel Data Warehouse

SQL Server Compatibility
- Broader functionality
- Full alignment
Performance At Scale
- Less work for the same results
- Do the same work more efficiently
BI, Analytics, & ETL Integration
- Native support for Analysis Services, Reporting Services, PowerPivot
- Lay the foundation for broad connectivity support

Control Node

[Diagram: "SELECT foo" enters the Control Node (Engine Service, Shell Appliance (SQL Server)); Plan Steps go to six Compute Nodes (SQL Server).]

[Diagram: SELECT enters the Control Node (Engine Service, Shell Appliance (SQL Server), MEMO); Plan Steps go to six Compute Nodes (SQL Server); Return.]

1. Simplification and space exploration
   - Query standardization and simplification (e.g. column reduction, predicate push-down)
   - Logical space exploration (e.g. join re-ordering, local/global aggregation)
   - Space expansion (e.g. bushy trees, dealing with intermediate result sets)
   - Physical space exploration
   - Serializing MEMO into binary XML (logical plans)
   - De-serializing binary XML into PDW Memo
2. Parallel optimization and pruning
   - Injecting data move operations (expansion)
   - Costing different alternatives
   - Pruning and selecting lowest cost distributed plan
3. SQL Generation
   - Generating SQL statements to be executed

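SELECT * FROM orders
  JOIN lineitem ON (o_orderkey = l_orderkey)
  JOIN part ON (l_partkey = p_partkey)
  WHERE p_name LIKE '%smoke%';

Plan trees shown, using O (o_o) for orders, LI (l_o) for lineitem, and P (p_pk) for part:
- O (o_o) and LI (l_o) are joined on (l_o = o_o)
- the join on (l_pk = p_pk) with P (p_pk) uses a data move: shuffle on (l_pk), or broadcast of P (p_pk)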
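The "costing different alternatives" step for this query can be illustrated with a toy cost model that only counts bytes moved between nodes. Everything here is an assumption for illustration: the `Relation` class, the cost formulas, and the row counts are invented and are not PDW's actual cost model.

```python
from dataclasses import dataclass

@dataclass
class Relation:
    name: str
    rows: int       # estimated row count (hypothetical numbers below)
    row_bytes: int  # estimated average row width

def shuffle_cost(rel, num_nodes):
    """Bytes moved when rows are re-hashed: about (n-1)/n of them change node."""
    return rel.rows * rel.row_bytes * (num_nodes - 1) / num_nodes

def broadcast_cost(rel, num_nodes):
    """Bytes moved when every row is copied to the other n-1 nodes."""
    return rel.rows * rel.row_bytes * (num_nodes - 1)

def cheapest_move(left, right, num_nodes):
    """Cost two ways to make `left JOIN right` local and keep the cheaper one.

    Assumes `right` is already distributed on the join column, so shuffling
    `left` onto that column is enough; the alternative is to broadcast `right`.
    """
    alternatives = [
        (shuffle_cost(left, num_nodes), f"shuffle {left.name}"),
        (broadcast_cost(right, num_nodes), f"broadcast {right.name}"),
    ]
    return min(alternatives)  # pruning: keep the lowest-cost alternative

# orders JOIN lineitem is local (both keyed on order key), so only the join
# with part needs a data move. The LIKE filter leaves few part rows.
orders_lineitem = Relation("O join LI", rows=6_000_000, row_bytes=200)
filtered_part = Relation("P", rows=2_000, row_bytes=150)

cost, choice = cheapest_move(orders_lineitem, filtered_part, num_nodes=8)
print(choice, cost)
```

With these made-up sizes the small filtered part table is far cheaper to broadcast than the large join result is to shuffle; swapping the sizes flips the decision.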

[Chart axes: Seconds, Queries]
5x improvement in terms of total elapsed time out of the box

Goal
- Eliminate CPU utilization spent on data conversions
- Further parallelize operations during data moves
Functionality
- Using ODBC instead of ADO.NET for reading and writing data
- Minimizing appliance resource utilization for data moves
Benefits
- Better resource (CPU) utilization
- 6x or more faster move operations
- Increased concurrency
- Mixed workload (loads + queries)

[Diagram: SQL PDW Clients (ODBC, OLE-DB, ADO.NET); SQL Server Clients (ADO.NET, ODBC, OLE-DB, JDBC); TDS; SequeLink; Server: , Server: ,]
Goal
- 'Look' just like a normal SQL Server
- Better integration with other BI tools
Functionality
- Use existing SQL Server drivers to connect to SQL Server PDW
- Implement SQL Server TDS protocol
- Named Parameter support
- SQLCMD connectivity to PDW
Benefits
- Use known tools and proven technology stack
- Existing SQL Server 'eco-system'
- 2x performance improvement for return operations
- 5x reduction of connection time

Goal
- Support common scenarios of code encapsulation and reuse in Reporting and ETL
Functionality
- System and user-defined stored procedures
- Invocation using RPC or EXECUTE
- Control flow logic, input parameters
Benefits
- Enables common logic re-use
- Big impact for Reporting Services scenarios
- Allows porting existing scripts
- Increases compatibility with SQL Server
Syntax
    CREATE { PROC | PROCEDURE } [dbo.]procedure_name
        [ { @parameter data_type } [ = default ] ] [ ,...n ]
    AS { [ BEGIN ] sql_statement [;] [ ...n ] [ END ] } [;]

    ALTER { PROC | PROCEDURE } [dbo.]procedure_name
        [ { @parameter data_type } [ = default ] ] [ ,...n ]
    AS { [ BEGIN ] sql_statement [;] [ ...n ] [ END ] } [;]

    DROP { PROC | PROCEDURE } { [dbo.]procedure_name } [;]

    [ { EXEC | EXECUTE } ]
        { { [database_name.][schema_name.]procedure_name } [ { value } ] [ ,...n ] } [;]

    { EXEC | EXECUTE } ( { @string_variable | [ N ]'tsql_string' } [ +...n ] ) [;]
Unsupported Functionality
- Stored Proc Nesting
- Output Params
- Return
- Try-Catch

Goal
- Support local and international data
Functionality
- Fixed server level collation
- User-defined column level collation
- Supporting all Windows collations
- Allow COLLATE clauses in Queries and DML
Benefits
- Store all the data in PDW w/ additional querying flexibility
- Existing T-SQL DDL and Query scripts
- SQL Server alignment and functionality
Syntax
    CREATE TABLE T ( c1 varchar(3) COLLATE traditional_Spanish_ci_ai, c2 varchar(10) COLLATE …)
    SELECT c1 COLLATE Latin1_General_Bin2 FROM T
    SELECT * FROM T ORDER BY c1 COLLATE Latin1_General_Bin2
Unsupported Functionality
- Cannot specify DB collation during DB creation
- Cannot alter column collations for existing tables

Connector for Hadoop
- Bi-directional (import/export) interface between MSFT Hadoop and PDW
- Delimited file support
- Adapter uses existing PDW tools (bulk loader, dwsql)
- Low cost solution that handles all the data: structured and unstructured
- Additional agility, flexibility and choice
Connector for Informatica
- Connector providing PDW source and target (mappings, transformations)
- Informatica uses PDW bulk loader for fast loads
- Leverage existing toolset and knowledge
Connector for Business Objects

[Chart axes: Seconds, Queries]

[Diagram: Portal, ETL, PDW, Operational DBs]

Infiniband GBit link

Demo: PowerPivot with SQL Server PDW … just like any other SQL Server

[Diagram: Sensor/RFID Data; Blogs, Docs; Web Data; HADOOP]

[Diagram: Sensor/RFID Data; Blogs, Docs; Web Data; SQOOP; SQL Server PDW; Interactive BI/Data Visualization. Users shown: Application Programmers, DBMS Admin, Power BI Users]

PDW Hadoop Connector
Linux/Hadoop side: HDFS, PDW configuration file
Windows/PDW side: Landing Zone, Control Node, Compute Nodes (Compute Node 1 … Compute Node 8)
Steps shown in the diagram:
- SQOOP export with source (HDFS path) & target (PDW DB & table)
- Read HDFS data via mappers
- FTP Server: copies incoming data on Landing Zone
- Telnet Server: invokes 'DWLoader'

Demo: Hadoop Sqoop Connector with SQL Server PDW … integrating unstructured data into your end-to-end DW/BI solution

Roadmap (timeline: Q1 to Q4 of calendar year 2011, Q1 to Q2 of calendar year 2012)
Appliance Update 1 (shipped)
- Improved node manageability
- Better performance and reduced overhead
- OEM requests
Appliance Update 2 (shipped)
- Programmability: batches, control flow, variables, temp tables
- QDR infiniband switch
- Onboard Dell
Appliance Update 3
- Cost based optimizer
- Native SQL Server drivers, including JDBC
- Collations
- More expressive query language
- Data Movement Services performance
- SCOM pack
- Stored procedures (subset)
- Half-rack
- 3rd party integration (Informatica, MicroStrategy, Business Objects, HADOOP)
V-Next
- Columnar store index
- Stored procedures
- Integrated Authentication
- PowerView integration
- Workload management
- LZ/BU redundancy
- Windows 8
- SQL Server 2012
- Hardware refresh

Related content
- DBI209 – Big Data, Big Deal
- Lots of BI tool specific related sessions (PowerPivot, Analysis Services, etc.)
- Breakthrough Insights: Big Data Analytics & Data Warehousing Demo Station
- PDW Deep Dive Session online from TechEd 2010
