The Data Warehouse of the Future: Where to Now?

Data Lake or Data Tsunami?

Where in the world are we?
“… data warehousing has reached the most significant tipping point since its inception. The biggest, possibly most elaborate data management system in IT is changing.” – Gartner, “The State of Data Warehousing in 2012”
[Diagram: Data sources → ETL → Data warehouse → BI and analytics]

Is a Data Warehouse “Old School”?
Traditional BI is built on traditional architecture.

Is a Data Warehouse “Old School”?
• Predefined reports and dashboards are designed to answer questions tailored to individual roles within the organization.
• Interactive reports and dashboards rely on the IT department or “super users.”
• To collect data from disparate systems, you need to land it in a common data store and then connect your analytics platform.
• ELT, not ETL

The Cool Kids’ Data Warehouse

The Data Warehouse of the Future?
• Diverse big data
• Workload-centric approach
• Data stored on multiple platforms
• Physically distributed data warehouse:
  - data warehouse appliances
  - columnar RDBMSs
  - NoSQL databases
  - MapReduce tools and HDFS

The Data Warehouse of the Future… It’s Here!

SQL Server Technology Drivers
• PolyBase
• JSON Data
• Temporal Tables
• In-Memory Tables
• Columnstore Index

PolyBase

PolyBase
• Use T-SQL to query data in Hadoop or Azure Blob Storage through external tables, or to import it into SQL Server tables.
• No knowledge of Hadoop or Azure is required.
• Pushes computation to where the data resides.
• Exports relational data into Hadoop or Azure.
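To make “pushes computation to where the data resides” concrete, here is a minimal Python model of predicate pushdown. It is an illustration under stated assumptions, not PolyBase's implementation: RemoteSource is a hypothetical stand-in for a Hadoop cluster, and rows_shipped simply counts the rows sent back to the querying engine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, float]
Predicate = Callable[[Row], bool]


@dataclass
class RemoteSource:
    """Hypothetical stand-in for an external store such as a Hadoop cluster."""
    rows: List[Row]
    rows_shipped: int = 0  # rows transferred back to the querying engine

    def scan(self, predicate: Optional[Predicate] = None) -> List[Row]:
        # When a predicate is supplied, it is evaluated here, next to the data.
        result = [r for r in self.rows if predicate is None or predicate(r)]
        self.rows_shipped += len(result)
        return result


def query_without_pushdown(src: RemoteSource, predicate: Predicate) -> List[Row]:
    # Ship every row, then filter locally.
    return [r for r in src.scan() if predicate(r)]


def query_with_pushdown(src: RemoteSource, predicate: Predicate) -> List[Row]:
    # Ship only the rows that pass the filter.
    return src.scan(predicate)


def fast(r: Row) -> bool:
    return r["Speed"] > 35


sensor = [{"CustomerKey": 1, "Speed": 30.0},
          {"CustomerKey": 2, "Speed": 50.0},
          {"CustomerKey": 3, "Speed": 40.0}]

naive, pushed = RemoteSource(list(sensor)), RemoteSource(list(sensor))
assert query_without_pushdown(naive, fast) == query_with_pushdown(pushed, fast)
print(naive.rows_shipped, pushed.rows_shipped)  # 3 2
```

Both queries return the same rows; the difference is how much data crosses the network, which is the benefit of running the filter at the source.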

PolyBase - External Tables, Data Sources & File Formats
[Diagram: SQL Server w/ PolyBase connects users (data scientists, BI users, DB admins) and tools (your apps, PowerPivot, Power View) to Hadoop and the relational DW through external tables, external data sources, and external file formats, using split-based query processing. Data sources shown: social apps, sensor & RFID, mobile apps, web apps.]

PolyBase Scenarios
• Querying
  - Run T-SQL over HDFS
  - Combine data from different Hadoop clusters
  - Join relational with non-relational data
• ETL
  - Land a subset of Hadoop data in columnar format
  - Enable data aging to more economical storage
• Build multi-temperature DW platforms
  - SQL Server acts as the hot query engine, processing the most recent data sets
  - Aged data remains immediately accessible via external tables
  - No need to groom data
• Hybrid (Azure integration)
  - Mash up on-premises and cloud apps
  - Bridge between on-premises and Azure
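The multi-temperature scenario above can be sketched in Python as two tiers. This is a hypothetical model, not a SQL Server feature: age_out and query_all are illustrative names, the hot list stands in for a local SQL Server table, and the cold list stands in for an external table over cheaper storage.

```python
from datetime import date
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


def age_out(hot: List[Row], cold: List[Row], cutoff: date) -> Tuple[List[Row], List[Row]]:
    """Move rows dated before `cutoff` from the hot tier to the cold tier."""
    still_hot = [r for r in hot if r["date"] >= cutoff]
    aged = [r for r in hot if r["date"] < cutoff]
    return still_hot, cold + aged


def query_all(hot: List[Row], cold: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    # Aged data remains queryable: one query spans both tiers,
    # much like a UNION ALL over a local table and an external table.
    return [r for r in hot + cold if predicate(r)]


hot = [{"id": 1, "date": date(2014, 1, 5)}, {"id": 2, "date": date(2016, 3, 1)}]
hot, cold = age_out(hot, [], cutoff=date(2015, 1, 1))
print([r["id"] for r in hot], [r["id"] for r in cold])  # [2] [1]
```

Because the cold tier is reached through the same query, data can move to cheaper storage without being groomed or reloaded.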

PolyBase
1. Create an external data source (Hadoop).
2. Create an external file format (delimited text file).
3. Create an external table pointing to the file stored in Hadoop.

    -- 1. External data source for the Hadoop cluster
    CREATE EXTERNAL DATA SOURCE hdp2 WITH (
        TYPE = HADOOP,
        LOCATION = 'hdfs://10.xxx.xx.xxx:xxxx',
        RESOURCE_MANAGER_LOCATION = '10.xxx.xx.xxx:xxxx'
    );

    -- 2. External file format: pipe-delimited text
    CREATE EXTERNAL FILE FORMAT ff2 WITH (
        FORMAT_TYPE = DELIMITEDTEXT,
        FORMAT_OPTIONS (FIELD_TERMINATOR = '|', USE_TYPE_DEFAULT = TRUE)
    );

    -- 3. External table over the file in HDFS
    CREATE EXTERNAL TABLE [dbo].[CarSensor_Data] (
        [SensorKey] int NOT NULL,
        [CustomerKey] int NOT NULL,
        [GeographyKey] int NULL,
        [Speed] float NOT NULL,
        [YearMeasured] int NOT NULL
    )
    WITH (
        LOCATION = '/Demo/car_sensordata.tbl',
        DATA_SOURCE = hdp2,
        FILE_FORMAT = ff2,
        REJECT_TYPE = VALUE,
        REJECT_VALUE = 0
    );
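The file format and reject options above can be modeled in Python to show what they control. This sketch rests on assumptions: read_delimited and RejectLimitExceeded are hypothetical names, and the rules it encodes are a simplified reading of the documented behavior. With USE_TYPE_DEFAULT = TRUE a missing field takes its type's default (0 for numbers, an empty string for strings) instead of NULL, and the model fails the read once the number of rejected rows exceeds REJECT_VALUE, so REJECT_VALUE = 0 fails on the first bad row.

```python
from typing import Any, Dict, List, Tuple

# (column name, Python type, nullable), mirroring dbo.CarSensor_Data
SCHEMA: List[Tuple[str, type, bool]] = [
    ("SensorKey", int, False),
    ("CustomerKey", int, False),
    ("GeographyKey", int, True),
    ("Speed", float, False),
    ("YearMeasured", int, False),
]
TYPE_DEFAULTS: Dict[type, Any] = {int: 0, float: 0.0, str: ""}


class RejectLimitExceeded(Exception):
    """Raised when more rows are rejected than REJECT_VALUE allows."""


def read_delimited(lines: List[str], schema=SCHEMA, terminator: str = "|",
                   use_type_default: bool = True, reject_value: int = 0) -> List[Dict[str, Any]]:
    rows: List[Dict[str, Any]] = []
    rejected = 0
    for line in lines:
        fields = line.rstrip("\n").split(terminator)
        try:
            if len(fields) != len(schema):
                raise ValueError("wrong number of fields")
            row: Dict[str, Any] = {}
            for raw, (name, typ, nullable) in zip(fields, schema):
                if raw == "":
                    # Missing value: type default or NULL, depending on USE_TYPE_DEFAULT.
                    value = TYPE_DEFAULTS[typ] if use_type_default else None
                else:
                    value = typ(raw)  # raises ValueError for unparsable data
                if value is None and not nullable:
                    raise ValueError(f"NULL in NOT NULL column {name}")
                row[name] = value
            rows.append(row)
        except ValueError:
            rejected += 1
            if rejected > reject_value:
                raise RejectLimitExceeded(f"{rejected} row(s) rejected, limit {reject_value}")
    return rows


good = ["1|10||42.5|2015", "2|11|7|30|2016"]
print(read_delimited(good)[0]["GeographyKey"])                          # 0
print(read_delimited(good, use_type_default=False)[0]["GeographyKey"])  # None
```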

PolyBase - Ad-Hoc Query joining relational with Hadoop data
Who drives faster than 35 miles per hour? Join structured customer data stored in SQL Server with sensor data stored in Hadoop:

    SELECT DISTINCT
        Insured_Customers.FirstName,
        Insured_Customers.LastName,
        Insured_Customers.YearlyIncome,
        Insured_Customers.MaritalStatus
    INTO Fast_Customers
    FROM Insured_Customers
    INNER JOIN (
        SELECT * FROM CarSensor_Data WHERE Speed > 35
    ) AS SensorD
        ON Insured_Customers.CustomerKey = SensorD.CustomerKey
    ORDER BY YearlyIncome;

    -- Store the result as a columnstore table
    CREATE CLUSTERED COLUMNSTORE INDEX CCI_FastCustomers ON Fast_Customers;
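The query's logic (not PolyBase's execution plan) can be restated in Python for readers who think more easily in code. fast_customers is a hypothetical helper, and rows are plain dictionaries keyed by the column names used in the query.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def fast_customers(customers: List[Row], sensor: List[Row], threshold: float = 35) -> List[Tuple]:
    # Subquery: sensor readings with Speed > threshold.
    fast_keys = {s["CustomerKey"] for s in sensor if s["Speed"] > threshold}
    # INNER JOIN on CustomerKey, then SELECT DISTINCT over the projected columns.
    projected = {
        (c["FirstName"], c["LastName"], c["YearlyIncome"], c["MaritalStatus"])
        for c in customers
        if c["CustomerKey"] in fast_keys
    }
    # ORDER BY YearlyIncome
    return sorted(projected, key=lambda t: t[2])
```

Collecting the matching rows into a set plays the role of DISTINCT: a customer with many fast readings still appears only once, and a reading of exactly 35 does not qualify.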

JSON Data
• What is JSON?

    {
      "ProductID": 709,
      "Name": "Mountain Bike Socks, M",
      "Color": "White",
      "Reviews": [
        {
          "Reviewer": {
            "Name": "John Smith"
          },
          "ReviewDate": " T00:00:00",
          "Rating": 5,
          "ModifiedDate": " T00:00:00"
        }
      ]
    }

[Diagram: Product → Reviews (1,n)]

JSON Data – Export data as JSON
• Ability to format query results as JSON text:

    SELECT 1 AS firstKey, GETDATE() AS dateKey, 'Value of key' AS thirdKey
    FOR JSON PATH, WITHOUT_ARRAY_WRAPPER;
    -- Result is: { "firstKey": 1, "dateKey": "…:35:21", "thirdKey": "Value of key" }
    -- dateKey holds the time the query ran; without WITHOUT_ARRAY_WRAPPER the object is wrapped in [ ]
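The output options mentioned above can be imitated with Python's json module. This is a rough model for illustration with a hypothetical function name: it keeps the array wrapper unless asked not to and, like FOR JSON by default, leaves out properties whose value is NULL (None here) unless INCLUDE_NULL_VALUES is requested.

```python
import json
from typing import Any, Dict, List


def for_json_path(rows: List[Dict[str, Any]], without_array_wrapper: bool = False,
                  include_null_values: bool = False) -> str:
    """Rough model of FOR JSON PATH output for flat (non-nested) column names."""
    objects = [
        {k: v for k, v in row.items() if include_null_values or v is not None}
        for row in rows
    ]
    compact = dict(separators=(",", ":"))
    if without_array_wrapper:
        # The brackets are simply omitted; several rows become comma-separated objects.
        return ",".join(json.dumps(o, **compact) for o in objects)
    return json.dumps(objects, **compact)


row = {"firstKey": 1, "missing": None, "thirdKey": "Value of key"}
print(for_json_path([row]))                              # [{"firstKey":1,"thirdKey":"Value of key"}]
print(for_json_path([row], without_array_wrapper=True))  # {"firstKey":1,"thirdKey":"Value of key"}
```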

JSON Data – Transform JSON text to a relational table
The input is JSON text, typically held in a text variable, that contains an array of JSON objects in the OrdersArray property. Here it is passed to OPENJSON as a literal:

    SELECT Number, Customer, Date, Quantity
    FROM OPENJSON(N'{"OrdersArray": [
        {"Number":1, "Date": "8/10/2012", "Customer": "Adventure works", "Quantity": 1200},
        {"Number":4, "Date": "5/11/2012", "Customer": "Adventure works", "Quantity": 100},
        {"Number":6, "Date": "1/3/2012", "Customer": "Adventure works", "Quantity": 250},
        {"Number":8, "Date": "12/7/2012", "Customer": "Adventure works", "Quantity": 2200}
    ]}', '$.OrdersArray')
    WITH (
        Number varchar(200),
        Date datetime,
        Customer varchar(200),
        Quantity int
    );

JSON Data

Number | Date      | Customer        | Quantity
1      | 8/10/2012 | Adventure works | 1200
4      | 5/11/2012 | Adventure works | 100
6      | 1/3/2012  | Adventure works | 250
8      | 12/7/2012 | Adventure works | 2200
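What OPENJSON ... WITH does (locate an array by path, then project each object onto typed columns) can be approximated in Python. This sketch assumes a path of the single-step form '$.Property', returns columns in WITH-clause order, and reads dates as month/day/year (an assumption matching the us_english default); openjson_with is a hypothetical name, not a library function. Missing properties come back as None, much like NULL in OPENJSON's default lax mode.

```python
import json
from datetime import datetime
from typing import Any, Callable, List, Tuple

Column = Tuple[str, Callable[[Any], Any]]


def openjson_with(text: str, path: str, columns: List[Column]) -> List[Tuple[Any, ...]]:
    key = path[2:]
    if not path.startswith("$.") or not key or "." in key:
        raise ValueError("this sketch only supports paths of the form '$.Property'")
    array = json.loads(text)[key]
    # One output row per array element; each column is converted to its declared type.
    return [
        tuple(convert(obj[name]) if obj.get(name) is not None else None for name, convert in columns)
        for obj in array
    ]


def us_date(s: str) -> datetime:
    return datetime.strptime(s, "%m/%d/%Y")


# Mirrors the WITH clause: varchar -> str, datetime -> us_date, int -> int
ORDER_COLUMNS: List[Column] = [("Number", str), ("Date", us_date), ("Customer", str), ("Quantity", int)]
```

Applied to the OrdersArray JSON above, openjson_with(text, "$.OrdersArray", ORDER_COLUMNS) yields four tuples, the first being ("1", datetime(2012, 8, 10), "Adventure works", 1200).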

JSON Data
• In PATH mode, you can use dot syntax in column aliases to format nested output.
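Dot syntax means that a column aliased as [Reviewer.Name] becomes a Name property inside a Reviewer object. The Python function below is a simplified, hypothetical model of that rule applied to one row; it is not how SQL Server implements FOR JSON, and it assumes that aliases sharing a prefix should merge into one object.

```python
import json
from typing import Any, Dict


def nest_dotted(row: Dict[str, Any]) -> Dict[str, Any]:
    """Turn dotted column aliases into nested objects, e.g. 'Reviewer.Name' -> {'Reviewer': {'Name': ...}}."""
    out: Dict[str, Any] = {}
    for alias, value in row.items():
        *parents, leaf = alias.split(".")
        node = out
        for key in parents:
            child = node.setdefault(key, {})
            if not isinstance(child, dict):
                raise ValueError(f"alias {alias!r} conflicts with column {key!r}")
            node = child
        node[leaf] = value
    return out


row = {"ProductID": 709, "Name": "Mountain Bike Socks, M", "Reviewer.Name": "John Smith"}
print(json.dumps(nest_dotted(row), separators=(",", ":")))
# {"ProductID":709,"Name":"Mountain Bike Socks, M","Reviewer":{"Name":"John Smith"}}
```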

Temporal Tables
• A temporal table is really two tables:
  - Current (data) table
  - History table
• A temporal table is a table with a PERIOD FOR SYSTEM_TIME definition over two system-maintained columns (period start and period end).
• Slowly changing dimensions:
  - The current table behaves like Type 1 (latest values only)
  - The history table provides Type 2 (full history)
• Recover from accidental data changes

Temporal Tables
• Requirements/limitations:
  - A primary key is required
  - Two period columns (start and end) of type datetime2
  - The history table cannot be a memory-optimized (In-Memory) table
  - Values in the SYSTEM_TIME period columns cannot be set directly with INSERT or UPDATE
  - History table data cannot be modified while system versioning is ON
  - Regular queries (without FOR SYSTEM_TIME) only return data from the current table

Temporal Tables
Example:

    CREATE TABLE dbo.TestTemporal (
        ID int PRIMARY KEY,
        A int,
        B int,
        C AS A * B,
        SysStartTime datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
        SysEndTime datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
        PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime)
    )
    -- No HISTORY_TABLE is named, so SQL Server creates one with a generated name
    WITH (SYSTEM_VERSIONING = ON);
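The mechanics behind SYSTEM_VERSIONING = ON can be modeled in a few lines of Python. It is a sketch under assumptions, not SQL Server's implementation: TemporalTable and Version are hypothetical names, timestamps are plain numbers standing in for transaction times, and END_OF_TIME stands in for the maximum datetime2 value stored as the period end of current rows. An UPDATE or DELETE closes the old version's period and moves it to the history table, which is why the pair behaves like a Type 2 slowly changing dimension.

```python
from dataclasses import dataclass, field
from typing import Dict, List

END_OF_TIME = float("inf")  # stands in for datetime2's maximum value


@dataclass
class Version:
    row_id: int
    a: int
    b: int
    sys_start: float
    sys_end: float = END_OF_TIME

    @property
    def c(self) -> int:
        # Mirrors the computed column C AS A*B.
        return self.a * self.b


@dataclass
class TemporalTable:
    current: Dict[int, Version] = field(default_factory=dict)
    history: List[Version] = field(default_factory=list)

    def insert(self, row_id: int, a: int, b: int, now: float) -> None:
        if row_id in self.current:
            raise KeyError(f"duplicate primary key {row_id}")
        self.current[row_id] = Version(row_id, a, b, sys_start=now)

    def update(self, row_id: int, a: int, b: int, now: float) -> None:
        old = self.current[row_id]
        old.sys_end = now          # close the old version's period...
        self.history.append(old)   # ...and move it to the history table
        self.current[row_id] = Version(row_id, a, b, sys_start=now)

    def delete(self, row_id: int, now: float) -> None:
        old = self.current.pop(row_id)
        old.sys_end = now
        self.history.append(old)


t = TemporalTable()
t.insert(1, a=2, b=3, now=10)
t.update(1, a=4, b=5, now=20)
print(t.current[1].c, [(v.sys_start, v.sys_end) for v in t.history])  # 20 [(10, 20)]
```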

Temporal Tables

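Temporal Tables
• The FROM clause of a SELECT statement has a new FOR SYSTEM_TIME clause with temporal-specific sub-clauses for querying data across the current and history tables. A row version qualifies when:
  - AS OF t (point in time): SysStartTime <= t AND SysEndTime > t
  - FROM start TO end: SysStartTime < end AND SysEndTime > start
  - BETWEEN start AND end: SysStartTime <= end AND SysEndTime > start (like FROM…TO, but also includes versions that became active exactly at end)
  - CONTAINED IN (start, end): SysStartTime >= start AND SysEndTime <= end (only versions that both opened and closed within the range)
• ALL returns every row version from both tables.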

Temporal Tables
• For example, to see the values that were active for customer 27 on the first of the year:
    … FROM Customer FOR SYSTEM_TIME AS OF '…' WHERE CustomerID = 27
• To see instead every version of that customer’s records for that day:
    … FROM Customer FOR SYSTEM_TIME BETWEEN '…' AND '…' WHERE CustomerID = 27
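The qualifying conditions listed for FOR SYSTEM_TIME translate directly into predicates. The Python below applies them to a hypothetical list of row versions for customer 27 (current and history rows together, with integer timestamps in place of datetime2 values); the function names are illustrative, and the conditions follow SQL Server's documented rule for each sub-clause.

```python
from typing import List, NamedTuple


class RowVersion(NamedTuple):
    customer_id: int
    value: str
    sys_start: int  # inclusive
    sys_end: int    # exclusive; the current row carries a far-future end


def as_of(rows: List[RowVersion], t: int) -> List[RowVersion]:
    return [r for r in rows if r.sys_start <= t and r.sys_end > t]


def from_to(rows: List[RowVersion], start: int, end: int) -> List[RowVersion]:
    return [r for r in rows if r.sys_start < end and r.sys_end > start]


def between_and(rows: List[RowVersion], start: int, end: int) -> List[RowVersion]:
    return [r for r in rows if r.sys_start <= end and r.sys_end > start]


def contained_in(rows: List[RowVersion], start: int, end: int) -> List[RowVersion]:
    return [r for r in rows if r.sys_start >= start and r.sys_end <= end]


versions = [
    RowVersion(27, "A", 0, 10),
    RowVersion(27, "B", 10, 20),
    RowVersion(27, "C", 20, 9999),  # current row
]
print([r.value for r in from_to(versions, 10, 20)])      # ['B']
print([r.value for r in between_and(versions, 10, 20)])  # ['B', 'C']
```

The two calls differ only for version C, which began exactly at the upper bound: BETWEEN includes it and FROM…TO does not.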

In-Memory Tables
• Held in memory at all times
• Lock-free writes
• A single columnstore index is allowed:
  - Defined at table creation
  - Includes all columns in the base table
  - Cannot be a filtered index
• Durability types:
  - SCHEMA_AND_DATA (schema and data are persisted)
  - SCHEMA_ONLY (only the schema is persisted; data is lost on restart)

In-Memory Tables – ETL example
• Data warehouse data loading:
  - Time series data (date and value)
  - Multiple files (nightly reload)
  - Calculate correlation
• SSIS for ETL:
  - Load time: 14 hours
  - Tried parallel processing of packages
• SSIS and BULK INSERT:
  - T-SQL BULK INSERT from file
  - Achieved a 20% improvement

In-Memory Tables
• In-memory staging tables:
  - Solution scaled linearly
  - Minimized writes to data and log files
  - No disk writes other than the final merge command
• Execute T-SQL commands asynchronously
“With my final solution, I was able to re-process all data series in under 15 minutes.”
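The staging pattern described in this ETL example, where all intermediate work stays in memory and only the final result is written, can be sketched in Python. This is a hypothetical illustration, not the author's solution: nightly_reload and pearson are made-up names, a dictionary stands in for the in-memory staging tables, and target.update stands in for the single final merge. Correlation is computed pairwise between series on their shared dates with the Pearson formula.

```python
import math
from typing import Dict, List, Tuple

Series = Dict[str, float]  # date -> value


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length samples (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def nightly_reload(files: Dict[str, List[Tuple[str, float]]],
                   target: Dict[Tuple[str, str], float]) -> int:
    # 1. Stage every file in memory (stand-in for in-memory staging tables).
    staging: Dict[str, Series] = {name: dict(rows) for name, rows in files.items()}
    # 2. Correlate every pair of series on their shared dates, entirely in memory.
    results: Dict[Tuple[str, str], float] = {}
    names = sorted(staging)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dates = sorted(staging[a].keys() & staging[b].keys())
            results[(a, b)] = pearson([staging[a][d] for d in dates],
                                      [staging[b][d] for d in dates])
    # 3. One final merge into the persistent target: the only write.
    target.update(results)
    return len(results)
```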

Columnstore Index
• A columnstore is data that is logically organized as a table with rows and columns and physically stored in a column-wise data format.
• A rowstore is data that is logically organized as a table with rows and columns and physically stored in a row-wise data format.
• A clustered columnstore index is the physical storage for the entire table.
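The difference between the two layouts can be shown with a small Python model. It is a deliberate simplification (real columnstores work on compressed segments within row groups of up to about a million rows, not Python lists), and to_columnstore, scan_rowstore, and scan_columnstore are hypothetical names. The sample rows are made up, shaped like the t_account table in the example that follows. Both layouts hold the same logical table; what changes is how many stored values an aggregate over one column has to touch.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def to_columnstore(rows: List[Row]) -> Dict[str, List[Any]]:
    """Pivot row-wise records into one list per column (same logical table)."""
    columns: Dict[str, List[Any]] = {name: [] for name in rows[0]} if rows else {}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def scan_rowstore(rows: List[Row], column: str) -> Tuple[Any, int]:
    # Row-wise layout (simplified): every stored value of every row is read.
    touched = sum(len(row) for row in rows)
    return sum(row[column] for row in rows), touched


def scan_columnstore(columns: Dict[str, List[Any]], column: str) -> Tuple[Any, int]:
    # Column-wise layout: only the requested column's values are read.
    values = columns[column]
    return sum(values), len(values)


accounts = [
    {"accountkey": 1, "accountdescription": "Cash", "accounttype": "Asset", "unitsold": 5},
    {"accountkey": 2, "accountdescription": "Loans", "accounttype": "Liability", "unitsold": 7},
    {"accountkey": 3, "accountdescription": "Stock", "accounttype": "Asset", "unitsold": 3},
]
print(scan_rowstore(accounts, "unitsold"))                     # (15, 12)
print(scan_columnstore(to_columnstore(accounts), "unitsold"))  # (15, 3)
```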

Columnstore Index
• The standard for storing and querying large data warehouse fact tables
• Uses column-based data storage and query processing
• Up to 10x gains in:
  - Query performance
  - Data compression
• In SQL Server 2016 you can define one or more nonclustered (B-tree) indexes on a table with a clustered columnstore index.
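Much of the compression comes from storing each column's values together, where long runs of repeated values are common. Run-length encoding, one of the encodings columnstore segments can use alongside dictionary encoding, is easy to sketch in Python; rle_encode and rle_decode are illustrative names, and the sample column is made up.

```python
from itertools import groupby
from typing import Any, List, Tuple


def rle_encode(values: List[Any]) -> List[Tuple[Any, int]]:
    """Collapse consecutive repeats into (value, run length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(runs: List[Tuple[Any, int]]) -> List[Any]:
    return [value for value, count in runs for _ in range(count)]


column = ["Asset"] * 4 + ["Liability"] * 3 + ["Asset"] * 2
print(rle_encode(column))          # [('Asset', 4), ('Liability', 3), ('Asset', 2)]
print(rle_encode(sorted(column)))  # [('Asset', 6), ('Liability', 3)]
```

Nine stored values shrink to three runs, or two once the column is sorted, which is why row order can affect how well a column compresses.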

Columnstore Index
Example:

    CREATE TABLE t_account (
        accountkey int NOT NULL,
        accountdescription nvarchar(50),
        accounttype nvarchar(50),
        unitsold int
    );
    GO
    -- Store the table as a columnstore.
    CREATE CLUSTERED COLUMNSTORE INDEX taccount_cci ON t_account;
    GO
    -- Add a nonclustered index.
    CREATE UNIQUE INDEX taccount_nc1 ON t_account (accountkey);
