The Data Warehouse of the Future: Where to Now?


Data Lake or Data Tsunami?

Where in the world are we?

“… data warehousing has reached the most significant tipping point since its inception. The biggest, possibly most elaborate data management system in IT is changing.”
– Gartner, “The State of Data Warehousing in 2012”

[Diagram: Data sources → ETL → Data warehouse → BI and analytics]

Is a Data Warehouse “Old School”?

Traditional BI is built on traditional architecture.
• Predefined reports and dashboards are designed to answer questions tailored to individual roles within the organization.
• Interactive reports and dashboards rely on the IT department or “super users”.
• To collect data from disparate systems, you need to land it in a common data store. Then you connect your analytics platform.
• ELT, not ETL

The Cool Kid’s Data Warehouse

The Data Warehouse of the Future?
• Diverse big data
• Workload-centric approach
• Data stored on multiple platforms
• Physically distributed data warehouse
  - Data warehouse appliances
  - Columnar RDBMSs
  - NoSQL databases
  - MapReduce tools and HDFS

The Data Warehouse of the Future… It's Here!

SQL Server Technology Drivers
• PolyBase
• JSON Data
• Temporal Tables
• In-Memory Tables
• Columnstore Index

PolyBase

PolyBase
• Use T-SQL to access data in Hadoop or Azure as tables and store it in SQL Server.
• Knowledge of Hadoop or Azure is not required to use it.
• Pushes computation to where the data resides
• Export relational data into Hadoop or Azure

PolyBase - External Tables, Data Sources & File Formats

[Diagram: SQL Server with PolyBase (split-based query processing) sits between the data (social apps, sensor and RFID, mobile apps, web apps, your apps; stored in Hadoop and the relational DW) and its consumers (data scientists, BI users, DB admins; PowerPivot, PowerView). Key objects: external table, external data source, external file format.]

PolyBase Scenarios
• Querying
  - Run T-SQL over HDFS
  - Combine data from different Hadoop clusters
  - Join relational with non-relational data
• ETL
  - Subset of Hadoop data in columnar format
  - Enable data aging scenarios that move data to more economical storage
• Allows building of multi-temperature DW platforms
  - SQL Server acts as the hot query engine, processing the most recent data sets
  - Aged data is immediately accessible via external tables
  - No need to groom data
• Hybrid (Azure integration)
  - Mash up on-premises and cloud apps
  - Bridge between on-premises and Azure
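The data aging scenario could look roughly like the sketch below. It assumes an external data source `hdp2` and file format `ff2` have already been created (the next slide shows how); the tables `dbo.SensorReadings` and `dbo.SensorReadings_Archive`, the HDFS folder, and the cutoff year are hypothetical names chosen for illustration.

```sql
-- PolyBase export (INSERT into an external table) is off by default.
EXEC sp_configure 'allow polybase export', 1;
RECONFIGURE;
GO

-- External table backed by a folder in Hadoop (hypothetical path).
CREATE EXTERNAL TABLE dbo.SensorReadings_Archive (
    SensorKey    int   NOT NULL,
    CustomerKey  int   NOT NULL,
    Speed        float NOT NULL,
    YearMeasured int   NOT NULL
)
WITH (
    LOCATION    = '/Archive/sensor_readings/',
    DATA_SOURCE = hdp2,
    FILE_FORMAT = ff2
);
GO

-- Copy aged rows out to the cheaper storage tier ...
INSERT INTO dbo.SensorReadings_Archive
SELECT SensorKey, CustomerKey, Speed, YearMeasured
FROM dbo.SensorReadings            -- hypothetical local (hot) table
WHERE YearMeasured < 2010;

-- ... then remove them from the local table.
DELETE FROM dbo.SensorReadings
WHERE YearMeasured < 2010;
```

The archived rows stay queryable through `dbo.SensorReadings_Archive`, which is what makes the "aged data immediately accessible" point work in practice.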

PolyBase
1. Create external data source (Hadoop).
2. Create external file format (delimited text file).
3. Create external table pointing to file stored in Hadoop.

    -- 1. External data source: where the Hadoop cluster lives.
    CREATE EXTERNAL DATA SOURCE hdp2 WITH (
        TYPE = HADOOP,
        LOCATION = 'hdfs://10.xxx.xx.xxx:xxxx',
        RESOURCE_MANAGER_LOCATION = '10.xxx.xx.xxx:xxxx'
    );

    -- 2. External file format: pipe-delimited text.
    CREATE EXTERNAL FILE FORMAT ff2 WITH (
        FORMAT_TYPE = DELIMITEDTEXT,
        FORMAT_OPTIONS (FIELD_TERMINATOR = '|', USE_TYPE_DEFAULT = TRUE)
    );

    -- 3. External table: schema over the file stored in Hadoop.
    CREATE EXTERNAL TABLE [dbo].[CarSensor_Data] (
        [SensorKey]    int   NOT NULL,
        [CustomerKey]  int   NOT NULL,
        [GeographyKey] int   NULL,
        [Speed]        float NOT NULL,
        [YearMeasured] int   NOT NULL
    )
    WITH (
        LOCATION     = '/Demo/car_sensordata.tbl',
        DATA_SOURCE  = hdp2,
        FILE_FORMAT  = ff2,
        REJECT_TYPE  = VALUE,
        REJECT_VALUE = 0
    );

PolyBase - Ad-Hoc Query Joining Relational with Hadoop Data

Who drives faster than 35 miles per hour? Join structured customer data stored in SQL Server with sensor data stored in Hadoop.

    SELECT DISTINCT
        Insured_Customers.FirstName,
        Insured_Customers.LastName,
        Insured_Customers.YearlyIncome,
        Insured_Customers.MaritalStatus
    INTO Fast_Customers
    FROM Insured_Customers
    INNER JOIN (
        SELECT * FROM CarSensor_Data WHERE Speed > 35
    ) AS SensorD
        ON Insured_Customers.CustomerKey = SensorD.CustomerKey
    ORDER BY YearlyIncome;

    -- Store the result table as a columnstore.
    CREATE CLUSTERED COLUMNSTORE INDEX CCI_FastCustomers ON Fast_Customers;
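As a small sketch of "pushing computation to where the data resides": PolyBase query hints let you force or disable pushdown of work to the Hadoop cluster for a single query. This assumes the `dbo.CarSensor_Data` external table from the previous slide and a data source that defines `RESOURCE_MANAGER_LOCATION`; the aggregate itself is only an illustrative choice.

```sql
-- Ask PolyBase to run the filter and aggregation on the Hadoop side.
SELECT CustomerKey, MAX(Speed) AS MaxSpeed
FROM dbo.CarSensor_Data
WHERE Speed > 35
GROUP BY CustomerKey
OPTION (FORCE EXTERNALPUSHDOWN);

-- The same query with pushdown disabled: rows are streamed to
-- SQL Server and processed there instead.
SELECT CustomerKey, MAX(Speed) AS MaxSpeed
FROM dbo.CarSensor_Data
WHERE Speed > 35
GROUP BY CustomerKey
OPTION (DISABLE EXTERNALPUSHDOWN);
```

Without a hint, the optimizer decides on its own whether pushdown is worthwhile, so the hints are mostly useful for comparing the two strategies.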

JSON Data
• What is JSON?

    {
      "ProductID": 709,
      "Name": "Mountain Bike Socks, M",
      "Color": "White",
      "Reviews": [
        {
          "Reviewer": {
            "Name": "John Smith",
            "Email": "john@fourthcoffee.com"
          },
          "ReviewDate": "2007-10-20T00:00:00",
          "Rating": 5,
          "ModifiedDate": "2007-10-20T00:00:00"
        }
      ]
    }

[Diagram: Product → Reviews (1:n)]
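To connect the document above to T-SQL, here is a minimal sketch using the built-in JSON functions in SQL Server 2016. The variable name `@product` and the column aliases are assumptions made for the example.

```sql
DECLARE @product nvarchar(max) = N'{
  "ProductID": 709,
  "Name": "Mountain Bike Socks, M",
  "Color": "White",
  "Reviews": [
    { "Reviewer": { "Name": "John Smith", "Email": "john@fourthcoffee.com" },
      "ReviewDate": "2007-10-20T00:00:00",
      "Rating": 5,
      "ModifiedDate": "2007-10-20T00:00:00" }
  ]
}';

SELECT
    ISJSON(@product)                                   AS IsValidJson,     -- 1 when the text is valid JSON
    JSON_VALUE(@product, '$.Name')                     AS ProductName,     -- scalar value
    JSON_VALUE(@product, '$.Reviews[0].Reviewer.Name') AS FirstReviewer,   -- nested scalar via path
    JSON_VALUE(@product, '$.Reviews[0].Rating')        AS FirstRating,
    JSON_QUERY(@product, '$.Reviews')                  AS ReviewsFragment; -- array returned as JSON text
```

`JSON_VALUE` extracts scalars while `JSON_QUERY` returns an object or array fragment, which is why the reviews array needs the latter.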

JSON Data – Export Data as JSON
• Ability to format query results as JSON text

    DECLARE @json nvarchar(max);

    SET @json = (
        SELECT 1 AS firstKey,
               GETDATE() AS dateKey,
               'Value of key' AS thirdKey
        FOR JSON PATH
    );

    -- Result is a JSON array holding one object (the timestamp will differ):
    -- [{"firstKey":1,"dateKey":"2016-06-15T11:35:21.000","thirdKey":"Value of key"}]

JSON Data – Transform JSON Text to a Relational Table

    SELECT Number, Customer, Date, Quantity
    FROM OPENJSON (@JSalesOrderDetails, '$.OrdersArray')
    WITH (
        Number   varchar(200),
        Date     datetime,
        Customer varchar(200),
        Quantity int
    ) AS OrdersArray

@JSalesOrderDetails is a text variable that contains an array of JSON objects in the property OrdersArray, as shown in the following example:

    '{"OrdersArray": [
        {"Number":1, "Date": "8/10/2012", "Customer": "Adventure works", "Quantity": 1200},
        {"Number":4, "Date": "5/11/2012", "Customer": "Adventure works", "Quantity": 100},
        {"Number":6, "Date": "1/3/2012", "Customer": "Adventure works", "Quantity": 250},
        {"Number":8, "Date": "12/7/2012", "Customer": "Adventure works", "Quantity": 2200}
    ]}'

JSON Data – Query Result

    Number  Date        Customer         Quantity
    1       8/10/2012   Adventure works  1200
    4       5/11/2012   Adventure works  100
    6       1/3/2012    Adventure works  250
    8       12/7/2012   Adventure works  2200

JSON Data
• In PATH mode, you can use dot syntax to format nested output.
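A brief sketch of that dot syntax: column aliases containing dots become nested objects in the output. The literal values reuse the product example from the earlier slide; the aliases are illustrative.

```sql
SELECT
    709                      AS ProductID,
    'Mountain Bike Socks, M' AS Name,
    'John Smith'             AS [Reviewer.Name],   -- nested under "Reviewer"
    'john@fourthcoffee.com'  AS [Reviewer.Email]   -- nested under "Reviewer"
FOR JSON PATH;
```

Both `Reviewer.*` columns end up inside a single nested "Reviewer" object, so the shape of the JSON is controlled entirely by the aliases.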

Temporal Tables
• A temporal table is really two tables.
  - Data table
  - Historical table (PERIOD)
• A temporal table can be defined as a table for which a PERIOD definition exists, comprising system columns.
• Slowly changing dimension
  - Data table is Type 1
  - Historical table is Type 2
• Recover from accidental data changes

Temporal Tables
• Requirements/Limitations
  - Primary key
  - Two columns (start and end date as datetime2)
  - In-Memory tables cannot be used
  - INSERT and UPDATE are not allowed on SYSTEM_TIME period columns
  - History table data cannot be changed
  - Regular queries only affect data in the current table

Temporal Tables
• Example:

    CREATE TABLE dbo.TestTemporal (
        ID int PRIMARY KEY,
        A int,
        B int,
        C AS A*B,
        SysStartTime datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
        SysEndTime   datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
        PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime)
    )
    WITH (SYSTEM_VERSIONING = ON);


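Temporal Tables
• The FROM clause of a SELECT statement has a new FOR SYSTEM_TIME clause with four temporal-specific sub-clauses to query data across the current and history tables.
  - AS OF <date_time>: the rows as they were at a point in time
  - FROM <start> TO <end>: all row versions active at any time in the range; the upper bound is exclusive
  - BETWEEN <start> AND <end>: same as FROM ... TO, but also includes versions that became active exactly at the upper bound
  - CONTAINED IN (<start>, <end>): only row versions that both opened and closed within the range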

Temporal Tables
• For example, if you want to look at the values active for customer 27 on the first of the year:

    … FROM Customer FOR SYSTEM_TIME AS OF '2015-1-1' WHERE CustomerID = 27

• If instead you want to see every version of the customer's records for that day, you could write:

    … FROM Customer FOR SYSTEM_TIME BETWEEN '2015-1-1' AND '2015-1-2' WHERE CustomerID = 27
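The sub-clauses can be tried end to end with a sketch like the following. It assumes the `dbo.TestTemporal` table from the earlier CREATE TABLE example exists; the inserted values and the date range are arbitrary. Note that the period columns are recorded in UTC.

```sql
-- Create a row, then change it; the old version moves to the history table.
INSERT INTO dbo.TestTemporal (ID, A, B) VALUES (1, 2, 3);
UPDATE dbo.TestTemporal SET A = 10 WHERE ID = 1;

-- Point in time: the row as it looked at a given UTC instant.
DECLARE @asOf datetime2 = SYSUTCDATETIME();
SELECT ID, A, B, SysStartTime, SysEndTime
FROM dbo.TestTemporal FOR SYSTEM_TIME AS OF @asOf
WHERE ID = 1;

-- Every version that was active at some point in the range
-- (upper bound exclusive).
SELECT ID, A, B, SysStartTime, SysEndTime
FROM dbo.TestTemporal FOR SYSTEM_TIME FROM '2015-01-01' TO '2030-01-01'
WHERE ID = 1;

-- Only versions that both started and ended inside the range,
-- so the still-current row is not returned.
SELECT ID, A, B, SysStartTime, SysEndTime
FROM dbo.TestTemporal FOR SYSTEM_TIME CONTAINED IN ('2015-01-01', '2030-01-01')
WHERE ID = 1;
```

A plain `SELECT` without `FOR SYSTEM_TIME` returns only the current row, which matches the earlier point that regular queries touch just the current table.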

In-Memory Tables
• Held in memory at all times
• Lock-free writes
• A single columnstore index is allowed
  - Defined at table creation
  - Includes all columns in the base table
  - Cannot be a filtered index
• Durability types
  - SCHEMA_AND_DATA
  - SCHEMA_ONLY
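The two durability options might be declared as in the sketch below. It assumes the database already has a MEMORY_OPTIMIZED_DATA filegroup; the table, column, and constraint names are hypothetical.

```sql
-- Staging table: schema survives a restart, data does not.
-- Suited to transient ETL data that can be reloaded.
CREATE TABLE dbo.Staging_TimeSeries (
    SeriesId int   NOT NULL,
    ObsDate  date  NOT NULL,
    ObsValue float NOT NULL,
    CONSTRAINT PK_Staging_TimeSeries
        PRIMARY KEY NONCLUSTERED (SeriesId, ObsDate)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);
GO

-- Durable table with the single columnstore index defined at
-- table creation, covering all columns.
CREATE TABLE dbo.Fact_TimeSeries (
    SeriesId int   NOT NULL,
    ObsDate  date  NOT NULL,
    ObsValue float NOT NULL,
    CONSTRAINT PK_Fact_TimeSeries
        PRIMARY KEY NONCLUSTERED (SeriesId, ObsDate),
    INDEX cci_Fact_TimeSeries CLUSTERED COLUMNSTORE
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
```

SCHEMA_ONLY avoids logging and checkpoint I/O for the table's data, which is the property the staging approach on the following slides relies on.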

In-Memory Tables – ETL Example
• Data warehouse data loading
  - Time series data (date and value)
  - Multiple files (nightly reload)
  - Calculate correlation
• SSIS for ETL
  - Load time: 14 hours
  - Tried parallel processing of packages
• SSIS and bulk insert
  - T-SQL bulk insert from file
  - Achieved 20% improvement

In-Memory Tables
• In-memory staging tables
  - Solution scaled linearly
  - Minimized writing to data and log files
  - No disk writes, other than the final merge command
• Execute T-SQL commands asynchronously

“With my final solution, I was able to re-process all data series in under 15 minutes.”

Columnstore Index
• A columnstore is data that is logically organized as a table with rows and columns, and physically stored in a column-wise data format.
• A rowstore is data that is logically organized as a table with rows and columns, and physically stored in a row-wise data format.
• A clustered columnstore index is the physical storage for the entire table.

Columnstore Index
• Standard for storing and querying large data warehousing fact tables
• Uses column-based data storage and query processing
• Up to 10x gains in
  - Query performance
  - Data compression
• In SQL Server 2016 you can define one or more nonclustered (B-tree) indexes on a table that has a clustered columnstore index.

Columnstore Index
• Example:

    CREATE TABLE t_account (
        accountkey int NOT NULL,
        Accountdescription nvarchar(50),
        accounttype nvarchar(50),
        unitsold int
    );
    GO

    --Store the table as a columnstore.
    CREATE CLUSTERED COLUMNSTORE INDEX taccount_cci ON t_account;
    GO

    --Add a nonclustered index.
    CREATE UNIQUE INDEX taccount_nc1 ON t_account (accountKey);
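To illustrate how the two indexes on `t_account` complement each other, here is a rough sketch of the kinds of queries each one serves, plus a look at the columnstore row groups. The lookup key 42 is an arbitrary value.

```sql
-- Analytic scan: reads only the two referenced columns from the
-- clustered columnstore index.
SELECT accounttype, SUM(unitsold) AS total_units
FROM t_account
GROUP BY accounttype;

-- Point lookup: can seek on the unique nonclustered index
-- instead of scanning the columnstore.
SELECT accountkey, Accountdescription, unitsold
FROM t_account
WHERE accountkey = 42;

-- Inspect row groups to see how rows are compressed and stored.
SELECT row_group_id, state_description, total_rows, size_in_bytes
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID('t_account');
```

Small tables may sit entirely in an open delta row group, so compression benefits typically only show up once enough rows have been loaded.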

Try any of our tools for free! Twitter: @MSBI_Stan Email: stan.geiger@idera.com www.idera.com
