Published by Octavia Wiggins. Modified over 8 years ago.
1
The Data Warehouse of the Future: Where to Now?
2
Data Lake or Data Tsunami?
3
Where in the world are we?
"… data warehousing has reached the most significant tipping point since its inception. The biggest, possibly most elaborate data management system in IT is changing."
(Gartner, "The State of Data Warehousing in 2012")
[Diagram: Data sources → ETL → Data warehouse → BI and analytics]
4
Is a Data Warehouse “Old School”?
Traditional BI is built on traditional architecture.
5
Is a Data Warehouse “Old School”?
Predefined reports and dashboards are designed to answer questions tailored to individual roles within the organization.
Interactive reports and dashboards rely on the IT department or “super users.”
To collect data from disparate systems, you need to land it in a common data store, then connect your analytics platform to that store.
ELT, not ETL: load the raw data first, then transform it inside the target store.
6
The Cool Kids’ Data Warehouse
7
The Data Warehouse of the Future?
Diverse big data
Workload-centric approach
Data stored on multiple platforms
Physically distributed data warehouse
-Data warehouse appliances
-Columnar RDBMSs
-NoSQL databases
-MapReduce tools and HDFS
8
The Data Warehouse of the Future… It’s Here!
9
SQL Server Technology Drivers
-PolyBase
-JSON Data
-Temporal Tables
-In-Memory Tables
-Columnstore Index
10
PolyBase
11
PolyBase
Use T-SQL to query data stored in Hadoop or Azure Blob Storage through external tables, or import it into SQL Server tables.
No knowledge of Hadoop or Azure is required to use it.
Pushes computation to where the data resides.
Exports relational data into Hadoop or Azure Blob Storage.
12
PolyBase - External Tables, Data Sources & File Formats
[Diagram: SQL Server with PolyBase uses split-based query processing over an external table, an external data source, and an external file format to reach Hadoop alongside the relational DW. Data arrives from social apps, sensor & RFID, mobile apps, and web apps; consumers include data scientists, BI users, DB admins, your apps, PowerPivot, and Power View.]
13
PolyBase Scenarios
Querying
-Run T-SQL over HDFS
-Combine data from different Hadoop clusters
-Join relational with non-relational data
ETL
-Import a subset of Hadoop data in columnar format
-Enable data aging scenarios that move data to more economical storage
Allows building multi-temperature DW platforms
-SQL Server acts as the hot query engine, processing the most recent data sets
-Aged data is immediately accessible via external tables
-No need to groom data
Hybrid (Azure integration)
-Mash up on-premises and cloud apps
-Bridge between on-premises and Azure
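The multi-temperature idea above can be sketched as a single view that unions a hot local table with the cold data left in Hadoop. This is a minimal sketch, assuming the CarSensor_Data external table from the CREATE EXTERNAL TABLE example exists; the names dbo.CarSensor_Recent and dbo.CarSensor_All are hypothetical.

```sql
-- Hypothetical hot table for recent readings, with the same columns as CarSensor_Data.
CREATE TABLE dbo.CarSensor_Recent
(
    SensorKey    int   NOT NULL,
    CustomerKey  int   NOT NULL,
    GeographyKey int   NULL,
    Speed        float NOT NULL,
    YearMeasured int   NOT NULL
);
CREATE CLUSTERED COLUMNSTORE INDEX CCI_CarSensor_Recent ON dbo.CarSensor_Recent;
GO

-- Hypothetical view over both temperatures: hot rows come from SQL Server,
-- aged rows are read in place from Hadoop through the external table.
CREATE VIEW dbo.CarSensor_All
AS
SELECT SensorKey, CustomerKey, GeographyKey, Speed, YearMeasured
FROM dbo.CarSensor_Recent
UNION ALL
SELECT SensorKey, CustomerKey, GeographyKey, Speed, YearMeasured
FROM dbo.CarSensor_Data;
GO
```

Reports then query dbo.CarSensor_All without knowing where each row lives; aged rows never have to be reloaded into SQL Server.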
14
PolyBase
1. Create an external data source (Hadoop).
2. Create an external file format (delimited text file).
3. Create an external table pointing to the file stored in Hadoop.

CREATE EXTERNAL DATA SOURCE hdp2 WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://10.xxx.xx.xxx:xxxx',
    RESOURCE_MANAGER_LOCATION = '10.xxx.xx.xxx:xxxx');

CREATE EXTERNAL FILE FORMAT ff2 WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|', USE_TYPE_DEFAULT = TRUE));

-- REJECT_VALUE = 0: the query fails on the first row that cannot be parsed.
CREATE EXTERNAL TABLE [dbo].[CarSensor_Data] (
    [SensorKey] int NOT NULL,
    [CustomerKey] int NOT NULL,
    [GeographyKey] int NULL,
    [Speed] float NOT NULL,
    [YearMeasured] int NOT NULL
) WITH (
    LOCATION = '/Demo/car_sensordata.tbl',
    DATA_SOURCE = hdp2,
    FILE_FORMAT = ff2,
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 0);
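Exporting relational data into Hadoop works through the same objects: you create an external table over a target folder and INSERT into it. A minimal sketch, assuming the hdp2 data source and ff2 file format above and the Insured_Customers table used in the next query; the table name dbo.Customer_Export, its folder, and the column types are hypothetical.

```sql
-- Server setting that allows INSERT INTO external tables (SQL Server 2016 PolyBase).
EXEC sp_configure 'allow polybase export', 1;
RECONFIGURE;
GO

-- Hypothetical export target: a folder in HDFS written as pipe-delimited text (ff2).
CREATE EXTERNAL TABLE dbo.Customer_Export (
    CustomerKey int          NOT NULL,
    FirstName   nvarchar(50) NULL,
    LastName    nvarchar(50) NULL
)
WITH (LOCATION = '/Demo/customer_export/', DATA_SOURCE = hdp2, FILE_FORMAT = ff2);
GO

-- Write relational rows out to Hadoop.
INSERT INTO dbo.Customer_Export (CustomerKey, FirstName, LastName)
SELECT CustomerKey, FirstName, LastName
FROM dbo.Insured_Customers;
```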
15
PolyBase - Ad-hoc query joining relational with Hadoop data
Who drives faster than 35 mph? Join structured customer data stored in SQL Server with sensor data stored in Hadoop, save the result, and store it as a columnstore.
SELECT DISTINCT Insured_Customers.FirstName, Insured_Customers.LastName,
       Insured_Customers.YearlyIncome, Insured_Customers.MaritalStatus
INTO Fast_Customers
FROM Insured_Customers
INNER JOIN (SELECT * FROM CarSensor_Data WHERE Speed > 35) AS SensorD
    ON Insured_Customers.CustomerKey = SensorD.CustomerKey
ORDER BY YearlyIncome;

CREATE CLUSTERED COLUMNSTORE INDEX CCI_FastCustomers ON Fast_Customers;
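"Pushes computation to where data resides" can also be requested explicitly. A minimal sketch of the same question with the SQL Server 2016 query hint FORCE EXTERNALPUSHDOWN, which asks the optimizer to run the filter on the Hadoop side as a MapReduce job; it assumes the data source has RESOURCE_MANAGER_LOCATION set, as hdp2 does.

```sql
-- Same question as above, queried directly. The hint forces the Speed > 35
-- filter on the external table to be computed in Hadoop (MapReduce)
-- before the matching rows are joined with the local customer table.
SELECT DISTINCT c.FirstName, c.LastName, c.YearlyIncome, c.MaritalStatus
FROM Insured_Customers AS c
INNER JOIN dbo.CarSensor_Data AS s
    ON c.CustomerKey = s.CustomerKey
WHERE s.Speed > 35
OPTION (FORCE EXTERNALPUSHDOWN);
```

The opposite hint, OPTION (DISABLE EXTERNALPUSHDOWN), imports the external rows and filters them in SQL Server, which can be faster for small files.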
16
JSON Data
What is JSON?
{
  "ProductID": 709,
  "Name": "Mountain Bike Socks, M",
  "Color": "White",
  "Reviews": [
    {
      "Reviewer": { "Name": "John Smith", "Email": "john@fourthcoffee.com" },
      "ReviewDate": "2007-10-20T00:00:00",
      "Rating": 5,
      "ModifiedDate": "2007-10-20T00:00:00"
    }
  ]
}
Product to Reviews: one-to-many (1,n)
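The one-to-many product/review document above can be read back with the SQL Server 2016 JSON functions. A minimal sketch, assuming database compatibility level 130 (required for OPENJSON); the column names chosen in the WITH clause are hypothetical.

```sql
DECLARE @product nvarchar(max) = N'{
  "ProductID": 709,
  "Name": "Mountain Bike Socks, M",
  "Color": "White",
  "Reviews": [
    { "Reviewer": { "Name": "John Smith", "Email": "john@fourthcoffee.com" },
      "ReviewDate": "2007-10-20T00:00:00", "Rating": 5,
      "ModifiedDate": "2007-10-20T00:00:00" }
  ]
}';

-- Scalar values from the parent (product) object.
SELECT JSON_VALUE(@product, '$.ProductID') AS ProductID,
       JSON_VALUE(@product, '$.Name')      AS Name;

-- One row per review (the "n" side); nested reviewer fields are mapped by path.
SELECT r.ReviewerName, r.ReviewerEmail, r.ReviewDate, r.Rating
FROM OPENJSON(@product, '$.Reviews')
WITH (
    ReviewerName  nvarchar(100) '$.Reviewer.Name',
    ReviewerEmail nvarchar(100) '$.Reviewer.Email',
    ReviewDate    datetime2(0)  '$.ReviewDate',
    Rating        int           '$.Rating'
) AS r;
```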
17
JSON Data – Export data as JSON
Ability to format query results as JSON text.
DECLARE @json nvarchar(max);
SET @json = (
    SELECT 1 AS firstKey,
           CAST(GETDATE() AS datetime2(0)) AS dateKey,
           'Value of key' AS thirdKey
    FOR JSON PATH, WITHOUT_ARRAY_WRAPPER);
-- Result is (dateKey reflects the current time):
-- {"firstKey":1,"dateKey":"2016-06-15T11:35:21","thirdKey":"Value of key"}
18
JSON Data – Transform JSON text to a relational table
@JSalesOrderDetails is a text variable that contains an array of JSON objects in the property OrdersArray:
DECLARE @JSalesOrderDetails nvarchar(max) = N'{"OrdersArray": [
    {"Number":1, "Date": "8/10/2012", "Customer": "Adventure works", "Quantity": 1200},
    {"Number":4, "Date": "5/11/2012", "Customer": "Adventure works", "Quantity": 100},
    {"Number":6, "Date": "1/3/2012", "Customer": "Adventure works", "Quantity": 250},
    {"Number":8, "Date": "12/7/2012", "Customer": "Adventure works", "Quantity": 2200}
]}';

SELECT Number, Customer, Date, Quantity
FROM OPENJSON (@JSalesOrderDetails, '$.OrdersArray')
WITH (
    Number varchar(200),
    Date datetime,
    Customer varchar(200),
    Quantity int
) AS OrdersArray;
19
JSON Data
Result:
Number | Date      | Customer        | Quantity
1      | 8/10/2012 | Adventure works | 1200
4      | 5/11/2012 | Adventure works | 100
6      | 1/3/2012  | Adventure works | 250
8      | 12/7/2012 | Adventure works | 2200
20
JSON Data
In PATH mode, you can use dot syntax in column aliases (for example, [Reviewer.Name]) to format nested output.
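A minimal sketch of the dot syntax, rebuilding part of the product/review document from the "What is JSON" slide; the literal values are copied from that slide, and the variable name @nested is hypothetical. Columns whose aliases share the prefix Reviewer are grouped into one nested object.

```sql
DECLARE @nested nvarchar(max) = (
    SELECT 709                      AS ProductID,
           'Mountain Bike Socks, M' AS Name,
           'John Smith'             AS [Reviewer.Name],   -- becomes Reviewer.Name
           'john@fourthcoffee.com'  AS [Reviewer.Email]   -- becomes Reviewer.Email
    FOR JSON PATH, WITHOUT_ARRAY_WRAPPER);

SELECT @nested AS NestedJson;
-- {"ProductID":709,"Name":"Mountain Bike Socks, M","Reviewer":{"Name":"John Smith","Email":"john@fourthcoffee.com"}}
```

Keep columns with the same prefix adjacent in the select list; PATH mode builds the nested object from consecutive columns.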
21
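Temporal Tables
A temporal table is really two tables:
-Current (data) table
-History table
A system-versioned temporal table is a table with a PERIOD FOR SYSTEM_TIME definition over two system-maintained datetime2 columns that record when each row version was valid.
Slowly changing dimensions
-The current table behaves like a Type 1 dimension (latest values only)
-The history table, together with the current table, provides Type 2 history (every prior version)
Recover accidental data changes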
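Recovering from an accidental change amounts to copying values back from an earlier version. A minimal sketch on a hypothetical dbo.Price table (with history table dbo.PriceHistory); the WAITFOR only guarantees that the "mistake" happens strictly after the recorded point in time.

```sql
-- Hypothetical demo table; not from the slides.
CREATE TABLE dbo.Price
(
    ProductID    int           NOT NULL PRIMARY KEY,
    ListPrice    decimal(10,2) NOT NULL,
    SysStartTime datetime2     GENERATED ALWAYS AS ROW START NOT NULL,
    SysEndTime   datetime2     GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.PriceHistory));
GO

INSERT INTO dbo.Price (ProductID, ListPrice) VALUES (709, 9.50);

-- A point in time before the mistake. Period columns are recorded in UTC.
DECLARE @beforeMistake datetime2 = SYSUTCDATETIME();
WAITFOR DELAY '00:00:01';

-- The accidental change; the old version moves to dbo.PriceHistory.
UPDATE dbo.Price SET ListPrice = 0 WHERE ProductID = 709;

-- Repair: copy the value from the version that was current at @beforeMistake.
UPDATE cur
SET cur.ListPrice = old.ListPrice
FROM dbo.Price AS cur
INNER JOIN dbo.Price FOR SYSTEM_TIME AS OF @beforeMistake AS old
    ON old.ProductID = cur.ProductID
WHERE cur.ProductID = 709;
```

The repair is itself versioned, so the mistaken value stays in the history table for auditing.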
22
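Temporal Tables
Requirements/Limitations
-A primary key
-Two period columns (start and end) of type datetime2
-Memory-optimized (In-Memory) tables can be system-versioned, but their history table is always disk-based
-Period columns cannot be set by INSERT or UPDATE (they are GENERATED ALWAYS)
-History table data cannot be changed while system versioning is on
-Regular queries (without FOR SYSTEM_TIME) only return data from the current table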
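An existing table that meets these requirements can be converted in place. A minimal sketch on a hypothetical dbo.Customer table (history table dbo.CustomerHistory); the defaults fill the period columns for rows that already exist, and HIDDEN keeps the period columns out of SELECT * results.

```sql
-- Hypothetical existing table, created here only to make the sketch runnable.
CREATE TABLE dbo.Customer
(
    CustomerID int           NOT NULL PRIMARY KEY,   -- requirement: primary key
    Name       nvarchar(100) NOT NULL
);
INSERT INTO dbo.Customer (CustomerID, Name) VALUES (27, N'Adventure Works');
GO

-- Requirement: two datetime2 period columns plus the PERIOD definition.
ALTER TABLE dbo.Customer ADD
    SysStartTime datetime2 GENERATED ALWAYS AS ROW START HIDDEN
        CONSTRAINT DF_Customer_SysStart DEFAULT SYSUTCDATETIME(),
    SysEndTime datetime2 GENERATED ALWAYS AS ROW END HIDDEN
        CONSTRAINT DF_Customer_SysEnd DEFAULT CONVERT(datetime2, '9999-12-31 23:59:59.9999999'),
    PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime);
GO

-- Turn on system versioning with a named history table.
ALTER TABLE dbo.Customer
    SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CustomerHistory));
GO
```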
23
Temporal Tables
Example:
CREATE TABLE dbo.TestTemporal
(
    ID int PRIMARY KEY,
    A int,
    B int,
    C AS A * B,
    SysStartTime datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    SysEndTime datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime)
)
WITH (SYSTEM_VERSIONING = ON);
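Once dbo.TestTemporal exists, ordinary DML is enough to build history. A minimal sketch, assuming the table above exists and has no row with ID = 1; the values are illustrative.

```sql
INSERT INTO dbo.TestTemporal (ID, A, B) VALUES (1, 2, 3);  -- C is computed as 6
UPDATE dbo.TestTemporal SET B = 4 WHERE ID = 1;            -- previous version moves to history
DELETE FROM dbo.TestTemporal WHERE ID = 1;                 -- last version moves to history

-- Regular query: reads only the current table, so nothing comes back for ID = 1.
SELECT ID, A, B, C FROM dbo.TestTemporal WHERE ID = 1;

-- FOR SYSTEM_TIME ALL: both earlier versions with their validity periods.
SELECT ID, A, B, C, SysStartTime, SysEndTime
FROM dbo.TestTemporal FOR SYSTEM_TIME ALL
WHERE ID = 1
ORDER BY SysStartTime;
```

Because SYSTEM_VERSIONING = ON was given without a name, SQL Server generates the history table's name. To drop the demo later, first run ALTER TABLE dbo.TestTemporal SET (SYSTEM_VERSIONING = OFF); then drop the current table and the history table separately.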
24
Temporal Tables
25
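Temporal Tables
The FROM clause of a SELECT statement has a new FOR SYSTEM_TIME clause with temporal-specific sub-clauses for querying data across the current and history tables:
-Point in time: AS OF <date_time>
-Range, excluding versions that start exactly at the upper bound: FROM <start> TO <end>
-Range, including versions that start exactly at the upper bound: BETWEEN <start> AND <end>
-Only versions that opened and closed inside the range: CONTAINED IN (<start>, <end>)
-Every version in both tables: ALL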
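The sub-clauses differ only in the predicate applied to the period columns. A minimal sketch against dbo.TestTemporal from the earlier example; the dates are illustrative, and the comments show the period-column predicate each sub-clause applies.

```sql
DECLARE @start datetime2 = '2015-01-01', @end datetime2 = '2015-01-02';

-- AS OF: SysStartTime <= @start AND SysEndTime > @start
SELECT ID, A, B FROM dbo.TestTemporal FOR SYSTEM_TIME AS OF @start;

-- FROM ... TO: SysStartTime < @end AND SysEndTime > @start
SELECT ID, A, B FROM dbo.TestTemporal FOR SYSTEM_TIME FROM @start TO @end;

-- BETWEEN ... AND: SysStartTime <= @end AND SysEndTime > @start
SELECT ID, A, B FROM dbo.TestTemporal FOR SYSTEM_TIME BETWEEN @start AND @end;

-- CONTAINED IN: SysStartTime >= @start AND SysEndTime <= @end
SELECT ID, A, B FROM dbo.TestTemporal FOR SYSTEM_TIME CONTAINED IN (@start, @end);

-- ALL: every row version from the current and history tables
SELECT ID, A, B FROM dbo.TestTemporal FOR SYSTEM_TIME ALL;
```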
26
Temporal Tables
For example, to see the values that were active for customer 27 at the start of 2015:
… FROM Customer FOR SYSTEM_TIME AS OF '2015-01-01' WHERE CustomerID = 27
To see instead every version of that customer's record that was active during that day:
… FROM Customer FOR SYSTEM_TIME BETWEEN '2015-01-01' AND '2015-01-02' WHERE CustomerID = 27
Period columns are recorded in UTC, so these literals are interpreted as UTC times.
27
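In-Memory Tables
Held in memory at all times
Lock-free writes (optimistic, multiversion concurrency)
A single columnstore index allowed
-Defined at table creation
-Includes all columns in the base table
-Cannot be a filtered index
Durability types
-SCHEMA_AND_DATA (schema and data persist)
-SCHEMA_ONLY (only the schema persists; data is lost on restart)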
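Both durability types are declared in the table's WITH clause. A minimal sketch, assuming SQL Server 2016 and a database that already has a MEMORY_OPTIMIZED_DATA filegroup; both table names are hypothetical.

```sql
-- Durable: schema and data survive a restart. A primary key is required, and the
-- single clustered columnstore index is declared inline and covers every column.
CREATE TABLE dbo.Reading_Durable
(
    ReadingID bigint       NOT NULL PRIMARY KEY NONCLUSTERED,
    SeriesID  int          NOT NULL,
    ObsDate   datetime2(0) NOT NULL,
    Value     float        NOT NULL,
    INDEX CCI_Reading_Durable CLUSTERED COLUMNSTORE
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

-- Non-durable: only the schema survives a restart, so row changes are not logged.
-- A primary key is optional here, but at least one index is required.
CREATE TABLE dbo.Reading_SchemaOnly
(
    SeriesID int          NOT NULL,
    ObsDate  datetime2(0) NOT NULL,
    Value    float        NOT NULL,
    INDEX IX_Reading_SchemaOnly NONCLUSTERED (SeriesID, ObsDate)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);
```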
28
In-Memory Tables – ETL example
Data warehouse data loading
-Time series data (date and value)
-Multiple files (nightly reload)
-Calculate correlation
SSIS for ETL
-Load time: 14 hours
-Tried parallel processing of packages
SSIS and BULK INSERT
-T-SQL BULK INSERT from file
-Achieved a 20% improvement
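The slides do not show how the correlation step was computed. One set-based option is the Pearson coefficient built from aggregates, r = (n·Σxy − Σx·Σy) / √((n·Σx² − (Σx)²)(n·Σy² − (Σy)²)). A minimal sketch on a hypothetical table variable with two perfectly linear series.

```sql
-- Hypothetical sample: two series observed on the same dates.
DECLARE @obs TABLE (ObsDate date PRIMARY KEY, X float NOT NULL, Y float NOT NULL);
INSERT INTO @obs (ObsDate, X, Y)
VALUES ('2016-01-01', 1, 2), ('2016-01-02', 2, 4),
       ('2016-01-03', 3, 6), ('2016-01-04', 4, 8);

-- Pearson correlation from aggregates in a single pass over the data.
DECLARE @r float;
SELECT @r = (COUNT(*) * SUM(X * Y) - SUM(X) * SUM(Y))
          / SQRT((COUNT(*) * SUM(X * X) - SQUARE(SUM(X)))
               * (COUNT(*) * SUM(Y * Y) - SQUARE(SUM(Y))))
FROM @obs;

SELECT @r AS Correlation;  -- 1 for this perfectly linear sample
```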
29
In-Memory Tables
In-memory staging tables
-Solution scaled linearly
-Minimized writing to data and log files
-No disk writes, other than the final MERGE command
Execute T-SQL commands asynchronously
"With my final solution, I was able to re-process all data series in under 15 minutes."
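The staging pattern above can be sketched as a SCHEMA_ONLY memory-optimized table that receives the load, followed by one MERGE into a disk-based target. This is a minimal sketch, not the presenter's actual solution: it assumes a MEMORY_OPTIMIZED_DATA filegroup exists, the table names are hypothetical, and a VALUES list stands in for BULK INSERT from a file.

```sql
-- Staging: memory-optimized and SCHEMA_ONLY, so loading it writes no data or log pages.
CREATE TABLE dbo.SeriesStage
(
    SeriesID int   NOT NULL,
    ObsDate  date  NOT NULL,
    Value    float NOT NULL,
    CONSTRAINT PK_SeriesStage PRIMARY KEY NONCLUSTERED (SeriesID, ObsDate)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);

-- Target: an ordinary disk-based table.
CREATE TABLE dbo.SeriesValue
(
    SeriesID int   NOT NULL,
    ObsDate  date  NOT NULL,
    Value    float NOT NULL,
    CONSTRAINT PK_SeriesValue PRIMARY KEY (SeriesID, ObsDate)
);
GO

-- Nightly load lands in memory (VALUES stands in for BULK INSERT here).
INSERT INTO dbo.SeriesStage (SeriesID, ObsDate, Value)
VALUES (1, '2016-01-01', 10.0), (1, '2016-01-02', 11.5), (2, '2016-01-01', 7.25);

-- The only disk write: merge the staged rows into the target.
MERGE dbo.SeriesValue AS tgt
USING dbo.SeriesStage AS src
    ON tgt.SeriesID = src.SeriesID AND tgt.ObsDate = src.ObsDate
WHEN MATCHED THEN
    UPDATE SET tgt.Value = src.Value
WHEN NOT MATCHED BY TARGET THEN
    INSERT (SeriesID, ObsDate, Value) VALUES (src.SeriesID, src.ObsDate, src.Value);

-- Empty the stage for the next run (TRUNCATE TABLE is not supported on memory-optimized tables).
DELETE FROM dbo.SeriesStage;
```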
30
Columnstore Index
A columnstore is data that is logically organized as a table with rows and columns, and physically stored in a column-wise data format.
A rowstore is data that is logically organized as a table with rows and columns, and physically stored in a row-wise data format.
A clustered columnstore index is the physical storage for the entire table.
31
Columnstore Index
The standard for storing and querying large data warehousing fact tables
Uses column-based data storage and query processing
Up to 10x gains in:
-Query performance
-Data compression
In SQL Server 2016, you can define one or more nonclustered (rowstore) indexes on a clustered columnstore index.
32
Columnstore Index
Example:
CREATE TABLE t_account (
    accountkey int NOT NULL,
    Accountdescription nvarchar(50),
    accounttype nvarchar(50),
    unitsold int
);
GO
--Store the table as a columnstore.
CREATE CLUSTERED COLUMNSTORE INDEX taccount_cci ON t_account;
GO
--Add a nonclustered index.
CREATE UNIQUE INDEX taccount_nc1 ON t_account (accountkey);
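Small inserts into a clustered columnstore index do not go straight into columnar storage; they land in a row-wise delta rowgroup first. A minimal sketch on a hypothetical dbo.t_account_demo table (kept separate from t_account above) that forces the delta rowgroup into compressed columnar format and inspects the rowgroups.

```sql
-- Hypothetical demo table.
CREATE TABLE dbo.t_account_demo
(
    accountkey int NOT NULL,
    unitsold   int NULL
);
CREATE CLUSTERED COLUMNSTORE INDEX t_account_demo_cci ON dbo.t_account_demo;
GO

-- A small trickle insert lands in an OPEN delta rowgroup (row-wise storage).
INSERT INTO dbo.t_account_demo (accountkey, unitsold)
VALUES (1, 10), (2, 20), (3, 30);

-- Force open and closed delta rowgroups into compressed columnar format (SQL Server 2016).
ALTER INDEX t_account_demo_cci ON dbo.t_account_demo
    REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON);

-- Inspect the rowgroups and their states (OPEN, CLOSED, COMPRESSED, TOMBSTONE).
SELECT row_group_id, state_description, total_rows
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID('dbo.t_account_demo');
```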
33
Try any of our tools for free!
Twitter: @MSBI_Stan
Email: stan.geiger@idera.com
www.idera.com