Data Warehousing Enhancements Dr Keith Burns Data Architect DPE, Microsoft Ltd.


Microsoft SQL Server 2008
−Transparent Data Encryption
−External Key Management
−Data Auditing
−Pluggable CPU
−Transparent Failover for Database Mirroring
−Declarative Management Framework
−Server Group Management
−Streamlined Installation
−Enterprise System Management
−Performance Data Collection
−System Analysis
−Data Compression
−Query Optimization Modes
−Resource Governor
−Entity Data Model
−LINQ
−Visual Entity Designer
−Entity Aware Adapters
−SQL Server Change Tracking
−Synchronized Programming Model
−Visual Studio Support
−SQL Server Conflict Detection
−FILESTREAM data type
−Integrated Full Text Search
−Sparse Columns
−Large User Defined Types
−Date/Time Data Type
−LOCATION data type
−SPATIAL data type
−Virtual Earth Integration
−Partitioned Table Parallelism
−Query Optimizations
−Persistent Lookups
−Change Data Capture
−Backup Compression
−MERGE SQL Statement
−Data Profiling
−Star Join
−Enterprise Reporting Engine
−Internet Report Deployment
−Block Computations
−Scale out Analysis
−BI Platform Management
−Export to Word and Excel
−Author reports in Word and Excel
−Report Builder Enhancements
−TABLIX
−Rich Formatted Data
−Personalized Perspectives
−… and many more


MERGE
New DML statement that combines multiple DML operations
−Building block for more efficient ETL
−SQL-2006 compliant implementation
Source can be any table or query
Target can be any table or updatable view
−If a source row matches a target row, UPDATE
−If a source row has no match in the target, INSERT
−If a target row is not matched by any source row, DELETE

MERGE
MERGE Stock S
USING Trades T
ON S.Stock = T.Stock
WHEN MATCHED AND (Qty + Delta = 0) THEN
    DELETE                       -- delete stock if Qty reaches 0
WHEN MATCHED THEN                -- the DELETE clause above takes precedence over this UPDATE
    UPDATE SET Qty += Delta
WHEN NOT MATCHED THEN
    INSERT VALUES (Stock, Delta)
OUTPUT $action, T.Stock, inserted.Qty;  -- inserted exposes target columns, so the new Qty is returned

INSERT over DML
Ability to have an INSERT statement consume the results of DML
−Enhancement over the OUTPUT INTO clause
DML OUTPUT can be filtered with a WHERE clause
−Data-accessing predicates not allowed (sub-queries, data-accessing UDFs and full-text)
Why?
−History tracking of slowly changing dimensions
−Dumping DML data stream to a secondary table for post-processing

INSERT over DML
INSERT INTO Books (ISBN, Price, Shelf, EndValidDate)
SELECT ISBN, Price, Shelf, GetDate()
FROM (MERGE Books T
      USING WeeklyChanges AS S
      ON T.ISBN = S.ISBN AND T.EndValidDate IS NULL
      WHEN MATCHED AND (T.Price <> S.Price OR T.Shelf <> S.Shelf) THEN
          UPDATE SET Price = S.Price, Shelf = S.Shelf
      WHEN NOT MATCHED THEN
          INSERT VALUES (S.ISBN, S.Price, S.Shelf, NULL)
      OUTPUT $action, S.ISBN, Deleted.Price, Deleted.Shelf
     ) Changes(Action, ISBN, Price, Shelf)
WHERE Action = 'UPDATE';

MERGE statement

Logging Enhancements
Minimal logging = log only what is strictly necessary for rollback
−Normally individual rows are logged
−Page allocations are sufficient to UNDO insertions
Recovery model must be simple or bulk-logged
Previous releases
−CREATE INDEX
−SELECT INTO
−BULK INSERT/BCP with TABLOCK

Logging Enhancements
SQL Server 2008
−INSERT into table supports minimal logging
−3X-5X performance boost in run time over fully logged INSERT
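A minimal sketch of a minimally logged INSERT, under stated assumptions: the database SalesDW and the tables dbo.FactSales_Stage and dbo.FactSales_Source are hypothetical, the target is a heap, and the TABLOCK hint is included because, as far as I understand the documented rules, it is one of the conditions for minimal logging on a heap. The slide itself only states that the recovery model must be simple or bulk-logged.

```sql
-- Hypothetical database and table names; adjust to your environment.
ALTER DATABASE SalesDW SET RECOVERY BULK_LOGGED;  -- simple would also qualify
GO
USE SalesDW;
GO
-- Target is a heap (no clustered index) in this sketch.
CREATE TABLE dbo.FactSales_Stage
(
    SaleId   int   NOT NULL,
    SaleDate date  NOT NULL,
    Amount   money NOT NULL
);
GO
-- INSERT ... SELECT with a table lock: page allocations are logged
-- instead of every individual row when the conditions are met.
INSERT INTO dbo.FactSales_Stage WITH (TABLOCK) (SaleId, SaleDate, Amount)
SELECT SaleId, SaleDate, Amount
FROM dbo.FactSales_Source;   -- assumed existing source table
GO
```

Comparing transaction log growth for the same load with and without the hint is a simple way to check whether minimal logging actually took effect.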

Logging demo

Change Data Capture
Mechanism to easily track changes on a table
−Changes captured from the log asynchronously
−Information on what changed at the source
Table-Valued Functions (TVF) to query change data
−Easily consumable from Integration Services
Diagram: Source Table -> Transaction Log -> Capture Process -> Change Table -> CDC Functions
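The flow above can be sketched in T-SQL roughly as follows. The database SalesDW and table dbo.Orders are hypothetical; the stored procedures and functions are the documented CDC entry points, and the query function name is derived from the default capture instance name (schema_table). The capture process runs as a SQL Server Agent job, so Agent is assumed to be running.

```sql
USE SalesDW;   -- hypothetical database
GO
-- Enable CDC for the database, then for one source table.
EXEC sys.sp_cdc_enable_db;
GO
EXEC sys.sp_cdc_enable_table
     @source_schema = N'dbo',
     @source_name   = N'Orders',   -- hypothetical source table
     @role_name     = NULL;        -- no gating role in this sketch
GO
-- Later: read every change captured so far through the generated TVF.
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_Orders');
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();

SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_Orders(@from_lsn, @to_lsn, N'all');
GO
```

An ETL package would normally persist the last LSN it processed and use it as the next @from_lsn, so each run picks up only new changes.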

Change Data Capture Simon Sabin Onarc Consulting

Data Compression
Problem:
−Database sizes are growing
−Storage costs are becoming the dominant hardware cost
Main goal: Shrink DW fact tables
Secondary goal: Improve query performance
Enabled per table or index
Tradeoff on CPU usage

Data Compression
Fixed-length Column
SQL Server 2005 SP2
−VarDecimal storage: enables decimal values to be stored as variable-length data
SQL Server 2008 extends the logic to all fixed-length data types
−e.g. int, bigint, etc.

Data Compression
Prefix Compression
A prefix list is stored in the page for common prefixes
Individual values are replaced by
−Token for the prefix
−Suffix for the value

Data Compression
Dictionary Compression
A common value dictionary is stored in the page
Common values are replaced by tokens
1.5X to 7X compression ratio anticipated for real DW fact data, depending on data
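As a rough illustration of how these options surface in T-SQL: in the released product, ROW compression corresponds to the variable-length storage of fixed-length types, and PAGE compression adds the prefix and dictionary steps on top of it. The table dbo.FactSales and index IX_FactSales_SaleDate below are hypothetical.

```sql
-- Estimate the savings before committing to a rebuild.
EXEC sp_estimate_data_compression_savings
     @schema_name      = N'dbo',
     @object_name      = N'FactSales',   -- hypothetical fact table
     @index_id         = NULL,           -- all indexes
     @partition_number = NULL,           -- all partitions
     @data_compression = N'PAGE';
GO
-- Compression is enabled per table (heap or clustered index)...
ALTER TABLE dbo.FactSales
    REBUILD WITH (DATA_COMPRESSION = PAGE);
GO
-- ...or per index, and the two settings are independent.
ALTER INDEX IX_FactSales_SaleDate ON dbo.FactSales
    REBUILD WITH (DATA_COMPRESSION = ROW);
GO
```

Because of the CPU tradeoff the slides mention, it can make sense to use PAGE on large, mostly read fact partitions and ROW (or none) on heavily updated ones.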

Partitioned Table Parallelism
Table: Orders, partitioned on a weekly basis on OrderDate
−Monday morning: run weekly report, great response time, happy users
−Tuesday morning: run weekly report, poor response time, unhappy users
−Why?
SQL Server 2005 query
−One partition => multiple threads
−Multiple partitions => single thread per partition
SQL Server 2008 query
−Multiple partitions => all threads utilised
−Far more predictable query performance

Partition Aligned Indexed Views
SQL Server 2005:
−Select ProductName, count(*) from ProductSales group by ProductName
−Indexed view is not partition aligned
−Drop the indexed view before switching partitions
SQL Server 2008
−Indexed views can be partition aligned
−Basically:
−Create view with SCHEMABINDING as in 2005
−Create the index on the view, but add the ON “filegroup” clause so the index is placed on the same partition scheme as the table
−Do this for both tables in the switch statement
−Gives the performance of an indexed view without having to drop views when switching partitions
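A sketch of what a partition-aligned indexed view might look like, assuming a hypothetical weekly partition function and scheme (pfOrderWeek, psOrderWeek) and a ProductSales table with an OrderDate partitioning column, none of which appear in the slides. Two details are my additions rather than the author's: the partitioning column is included in the view's GROUP BY so the index can be partitioned on it, and COUNT_BIG(*) is used because indexed views with GROUP BY require it.

```sql
-- Hypothetical partition function/scheme: one partition per week.
CREATE PARTITION FUNCTION pfOrderWeek (date)
    AS RANGE RIGHT FOR VALUES ('2008-01-07', '2008-01-14', '2008-01-21');
CREATE PARTITION SCHEME psOrderWeek
    AS PARTITION pfOrderWeek ALL TO ([PRIMARY]);
GO
CREATE TABLE dbo.ProductSales
(
    OrderDate   date         NOT NULL,
    ProductName nvarchar(50) NOT NULL,
    Qty         int          NOT NULL
) ON psOrderWeek (OrderDate);
GO
-- Schema-bound view, as in SQL Server 2005.
CREATE VIEW dbo.vProductSales
WITH SCHEMABINDING
AS
SELECT OrderDate, ProductName, COUNT_BIG(*) AS SalesCount
FROM dbo.ProductSales
GROUP BY OrderDate, ProductName;
GO
-- The ON clause places the view's index on the table's partition scheme,
-- which is what makes the indexed view partition aligned.
CREATE UNIQUE CLUSTERED INDEX IX_vProductSales
    ON dbo.vProductSales (OrderDate, ProductName)
    ON psOrderWeek (OrderDate);
GO
```

For ALTER TABLE ... SWITCH to work, the staging table on the other side of the switch would need an equivalent indexed view, which is what "do this for both tables" refers to.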

Star Join Query Processing
SQL Server 2005 strategies; SQL Server 2008: additional query plans considered
Diagram: Fact Table Scan joined to Dimension 1 and Dimension 2 through Hash Joins, with bitmap filters applied to the fact table scan
−SQL Server 2005 can create one bitmap filter
−SQL Server 2008 can create multiple bitmap filters
−SQL Server 2008 can move and reorder the filters

Grouping Sets
Extension to the GROUP BY clause
Ability to define multiple groupings in the same query
Produces a single result set that is equivalent to a UNION ALL of differently grouped rows
SQL 2006 standard compatible
Makes aggregation querying and reporting easier and faster
SELECT a, b, c, d, SUM(sales)
FROM Table
GROUP BY GROUPING SETS ((a,b,c), (c,d), ())

Example (GROUPING SETS)
-- Use UNION ALL on dual SELECT statements
SELECT customerType, NULL AS TerritoryID, MAX(ModifiedDate)
FROM Sales.Customer
GROUP BY customerType
UNION ALL
SELECT NULL AS customerType, TerritoryID, MAX(ModifiedDate)
FROM Sales.Customer
GROUP BY TerritoryID
ORDER BY TerritoryID;
-- Use GROUPING SETS on single SELECT statement
SELECT customerType, TerritoryID, MAX(ModifiedDate)
FROM Sales.Customer
GROUP BY GROUPING SETS ((customerType), (TerritoryID))
ORDER BY customerType;

Backup Compression
Pain points:
−Keeping disk-based backups online is expensive
−Backups take longer, windows are shrinking
SQL Server 2008
−WITH COMPRESSION clause to BACKUP
−Less storage required to keep backups online
−Backups run significantly faster, as less IO is done
−Restore automatically detects compression and adjusts accordingly
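A short sketch of the clause in use. The database name and backup path are hypothetical; the msdb query relies on the compressed_backup_size column that backupset exposes from SQL Server 2008 onward.

```sql
-- Take a compressed full backup (hypothetical database and path).
BACKUP DATABASE SalesDW
    TO DISK = N'D:\Backups\SalesDW_full.bak'
    WITH COMPRESSION, INIT;
GO
-- Restore-side commands need no compression option; it is detected automatically.
RESTORE VERIFYONLY
    FROM DISK = N'D:\Backups\SalesDW_full.bak';
GO
-- Compare raw and compressed sizes for recent backups.
SELECT TOP (5)
       database_name,
       backup_finish_date,
       backup_size,
       compressed_backup_size,
       backup_size * 1.0 / NULLIF(compressed_backup_size, 0) AS compression_ratio
FROM msdb.dbo.backupset
WHERE database_name = N'SalesDW'
ORDER BY backup_finish_date DESC;
GO
```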

SQL 2005 Resource Management
Single resource pool
Database engine doesn’t differentiate workloads
Best effort resource sharing
Workloads: Backup, Admin Tasks, Executive Reports, OLTP Activity, Ad-hoc Reports
Resources: Memory, CPU, Threads, …

Resource Governor – Workloads
Ability to differentiate workloads
−e.g. app_name, login
Per-request limits
−Max memory %
−Max CPU time
−Grant timeout
−Max Requests
Resource monitoring
Diagram: Admin Workload (Backup, Admin Tasks), OLTP Workload (OLTP Activity) and Report Workload (Ad-hoc Reports, Executive Reports) share the SQL Server resources (Memory, CPU, Threads, …)

Resource Governor – Importance
A workload can have an importance label
−Low
−Medium
−High
Gives resource allocation preference to workloads based on importance

Resource Governor – Pools
Resource pool: A virtual subset of physical database engine resources
Provides controls to specify
−Min Memory %
−Max Memory %
−Min CPU %
−Max CPU %
−Max DOP
Resource monitoring
Up to 20 resource pools
Diagram: an Application Pool and an Admin Pool with example limits (Min Memory 10%, Max Memory 20%, Max CPU 20%, Max CPU 90%)

Resource Governor – Putting it all together
Workloads are mapped to Resource Pools (n : 1)
Online changes of groups/pools
SQL Server 2005 = default group + default pool
Main benefit: prevent run-away queries
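The pieces above fit together roughly as in this sketch. All pool, group, application and login names are hypothetical, and assigning the 10%/20%/20% limits to the admin pool and the 90% CPU cap to the application pool is my reading of the diagram, not something the transcript states unambiguously. The classifier function is what differentiates workloads by application name or login.

```sql
USE master;
GO
-- Pools: virtual subsets of engine resources (limits assumed from the diagram).
CREATE RESOURCE POOL AdminPool
    WITH (MIN_MEMORY_PERCENT = 10, MAX_MEMORY_PERCENT = 20, MAX_CPU_PERCENT = 20);
CREATE RESOURCE POOL ApplicationPool
    WITH (MAX_CPU_PERCENT = 90);
GO
-- Workload groups map n : 1 onto pools and carry importance and per-request limits.
CREATE WORKLOAD GROUP AdminWorkload
    USING AdminPool;
CREATE WORKLOAD GROUP OltpWorkload
    WITH (IMPORTANCE = HIGH)
    USING ApplicationPool;
CREATE WORKLOAD GROUP ReportWorkload
    WITH (IMPORTANCE = LOW,
          REQUEST_MAX_MEMORY_GRANT_PERCENT = 25,
          REQUEST_MAX_CPU_TIME_SEC = 60,
          GROUP_MAX_REQUESTS = 10)
    USING ApplicationPool;
GO
-- Classifier: runs at login and returns the workload group for the session.
CREATE FUNCTION dbo.fnDemoClassifier()
RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    RETURN CASE
               WHEN APP_NAME() LIKE N'Report%'          THEN N'ReportWorkload'
               WHEN APP_NAME() LIKE N'OrderEntry%'      THEN N'OltpWorkload'
               WHEN SUSER_SNAME() = N'BackupOperator'   THEN N'AdminWorkload'
               ELSE N'default'   -- everything else stays in the default group
           END;
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fnDemoClassifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;   -- changes are applied online
GO
```

Sessions that are already connected keep their existing group; only new connections are classified after the RECONFIGURE.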

Resource Governor Martin Bell Carillon Software Systems Limited

New Date and Time data types
Date
−Date only
−From 1/1/0001 to 12/31/9999
−3 bytes
Time
−Time only
−Optional precision up to 100 nanoseconds
−3 to 5 bytes (default 5 bytes, i.e. full resolution)
DateTimeOffset
−Time zone aware UTC datetime
−Optional precision up to 100 nanoseconds
−8 to 10 bytes (default 10 bytes, i.e. full resolution)
DateTime2
−Large date range
−Optional precision up to 100 nanoseconds
−6 to 8 bytes (default 8 bytes, i.e. full resolution)
Plus assorted new date/time functions, e.g. SYSDATETIMEOFFSET()
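A small sketch for trying the four types and checking their storage sizes yourself. The literal values are arbitrary, and DATALENGTH is used here only as a convenient way to observe how precision affects size.

```sql
-- Declare one variable of each new type at full precision.
DECLARE @d   date              = '2008-02-29';
DECLARE @t   time(7)           = '13:45:30.1234567';
DECLARE @dt2 datetime2(7)      = '0001-01-01 00:00:00.0000001';
DECLARE @dto datetimeoffset(7) = '2008-02-29 13:45:30.1234567 +01:00';

-- Storage used by each value at full precision.
SELECT DATALENGTH(@d)   AS date_bytes,
       DATALENGTH(@t)   AS time_bytes,
       DATALENGTH(@dt2) AS datetime2_bytes,
       DATALENGTH(@dto) AS datetimeoffset_bytes;

-- Lower precision takes less space.
DECLARE @t0 time(0) = @t;
SELECT @t0 AS time_to_the_second, DATALENGTH(@t0) AS time0_bytes;

-- Some of the new functions, plus a time zone conversion.
SELECT SYSDATETIME()                 AS local_datetime2,
       SYSUTCDATETIME()              AS utc_datetime2,
       SYSDATETIMEOFFSET()           AS local_with_offset,
       SWITCHOFFSET(@dto, '+00:00')  AS sample_as_utc;
```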

Sparse Column Storage
The problem
−Need to store sparse data
−Possibly 100’s of columns
−Typically only a few % are populated
Typical solution (one row per populated value)
ID  Column  Value
1   Q1      C
1   Q2      1
1   Q10     9
2   Q1      B
2   Q3      4
2   Q5      Low
3   Q1      C
3   Q7      6
3   Q8      5
Desired schema (one row per ID, one column per attribute)
PK  Q1  Q2  Q3  Q4  Q5   Q6  Q7  Q8  Q9  Q10
1   C   1                                9
2   B       4       Low
3   C                        6   5
…

Sparse Columns
“Sparse” as a storage attribute on a column
−0 bytes for a NULL, 4 byte overhead for non-NULL
−No change in Query/DML behavior
−Same limitations as normal tables, e.g. 1024 columns
Wide Table: defining a “Sparse Column Set”
−An un-typed XML column, with a published format
−Logical grouping for all sparse columns in a table
−Select * returns all non-sparse columns plus the sparse column set (XML)
−Allows generic retrieval/update of all sparse columns as a set
−30,000 sparse columns allowed in a table (2Gb), 1000 indexes
-- Sparse as a storage attribute in Create/Alter table statements
Create Table Products(Id int, Type nvarchar(16)…, Resolution int SPARSE, ZoomLength int SPARSE);
-- Create a sparse column set
Create Table Products(Id int, Type nvarchar(16)…, Resolution int SPARSE, ZoomLength int SPARSE, Properties XML COLUMN_SET FOR ALL_SPARSE_COLUMNS);
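To make the column set behaviour concrete, here is a self-contained sketch using a hypothetical table dbo.ProductsDemo (a cut-down variant of the Products table above, with the elided columns left out). It shows that sparse columns can be written either by name or through the XML column set, and what each style of SELECT returns.

```sql
-- Hypothetical demo table with two sparse columns and a column set.
CREATE TABLE dbo.ProductsDemo
(
    Id         int          NOT NULL PRIMARY KEY,
    Type       nvarchar(16) NOT NULL,
    Resolution int SPARSE NULL,
    ZoomLength int SPARSE NULL,
    Properties xml COLUMN_SET FOR ALL_SPARSE_COLUMNS
);
GO
-- Write a sparse column by name...
INSERT INTO dbo.ProductsDemo (Id, Type, Resolution)
VALUES (1, N'Camera', 12);

-- ...or generically through the column set, as XML.
INSERT INTO dbo.ProductsDemo (Id, Type, Properties)
VALUES (2, N'Camera', N'<ZoomLength>10</ZoomLength>');
GO
-- SELECT * returns the non-sparse columns plus the XML column set.
SELECT * FROM dbo.ProductsDemo;

-- Sparse columns remain individually addressable when named explicitly.
SELECT Id, Resolution, ZoomLength FROM dbo.ProductsDemo;
GO
```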

Filtered Indexes
Filtered Indexes and Statistics
−Indexing a portion of the data in a table
−Filtered/co-related statistics creation and usage
−Query/DML optimization to use filtered indexes and statistics
−Restricted to non-clustered indexes
Benefits
−Lower storage and maintenance costs for a large number of indexes
−Query/DML performance benefits: IO only for qualifying rows
-- Create a filtered index
Create Index ZoomIdx on Products(ZoomLength) where Type = 'Camera';
-- Optimizer can pick the filtered index when query predicates match
Select ProductId, Type, Resolution, ZoomLength from Products where Type = 'Camera';


© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.