Data Warehousing Enhancements Dr Keith Burns Data Architect DPE, Microsoft Ltd.
Microsoft SQL Server 2008
Transparent Data Encryption; External Key Management; Data Auditing; Pluggable CPU; Transparent Failover for Database Mirroring; Declarative Management Framework; Server Group Management; Streamlined Installation; Enterprise System Management; Performance Data Collection; System Analysis; Data Compression; Query Optimization Modes; Resource Governor; Entity Data Model; LINQ; Visual Entity Designer; Entity Aware Adapters; SQL Server Change Tracking; Synchronized Programming Model; Visual Studio Support; SQL Server Conflict Detection; FILESTREAM data type; Integrated Full Text Search; Sparse Columns; Large User Defined Types; Date/Time Data Type; LOCATION data type; SPATIAL data type; Virtual Earth Integration; Partitioned Table Parallelism; Query Optimizations; Persistent Lookups; Change Data Capture; Backup Compression; MERGE SQL Statement; Data Profiling; Star Join; Enterprise Reporting Engine; Internet Report Deployment; Block Computations; Scale out Analysis; BI Platform Management; Export to Word and Excel; Author reports in Word and Excel; Report Builder Enhancements; TABLIX; Rich Formatted Data; Personalized Perspectives; … and many more
MERGE
New DML statement that combines multiple DML operations
−Building block for more efficient ETL
−SQL-2006 compliant implementation
Source can be any table or query
Target can be any table or updateable view
If a source row matches a target row, UPDATE
If a source row has no match in the target, INSERT
If a target row is not matched by any source row, DELETE
MERGE
MERGE Stock S
USING Trades T
ON S.Stock = T.Stock
WHEN MATCHED AND (Qty + Delta = 0) THEN
    DELETE -- delete stock if Qty reaches 0
WHEN MATCHED THEN -- DELETE takes precedence over UPDATE
    UPDATE SET Qty += Delta
WHEN NOT MATCHED THEN
    INSERT VALUES (T.Stock, T.Delta)
OUTPUT $action, T.Stock, inserted.Qty; -- inserted.* exposes target (Stock) columns
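The Stock/Trades statement covers the matched and not-matched-by-target cases. The third case, deleting target rows that no source row matches, uses WHEN NOT MATCHED BY SOURCE. A minimal sketch, assuming two hypothetical tables (dbo.DimProduct and dbo.StagingProduct) that are not part of the original deck:

```sql
-- Hypothetical tables for illustration only (not from the deck).
CREATE TABLE dbo.DimProduct     (ProductKey int PRIMARY KEY, Name nvarchar(50) NOT NULL);
CREATE TABLE dbo.StagingProduct (ProductKey int PRIMARY KEY, Name nvarchar(50) NOT NULL);

INSERT INTO dbo.DimProduct     VALUES (1, N'Widget'), (2, N'Gadget');
INSERT INTO dbo.StagingProduct VALUES (2, N'Gadget v2'), (3, N'Gizmo');

-- Bring the target in line with the source in a single statement.
MERGE dbo.DimProduct AS tgt
USING dbo.StagingProduct AS src
    ON tgt.ProductKey = src.ProductKey
WHEN MATCHED AND tgt.Name <> src.Name THEN
    UPDATE SET tgt.Name = src.Name                               -- key 2: renamed
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ProductKey, Name) VALUES (src.ProductKey, src.Name)  -- key 3: new row
WHEN NOT MATCHED BY SOURCE THEN
    DELETE                                                       -- key 1: no longer in source
OUTPUT $action, inserted.ProductKey, deleted.ProductKey;
```

With this sample data the statement should report one UPDATE, one INSERT and one DELETE. In a real ETL job the NOT MATCHED BY SOURCE branch deserves care, because it removes every target row absent from the source set.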
INSERT over DML
Ability to have INSERT statement consume results of DML
−Enhancement over OUTPUT INTO clause
DML OUTPUT can be filtered with a WHERE clause
−Data accessing predicates not allowed (sub-queries, data accessing UDFs and full-text)
Why?
−History tracking of slowly changing dimensions
−Dumping DML data stream to a secondary table for post-processing
INSERT over DML
INSERT INTO Books (ISBN, Price, Shelf, EndValidDate)
SELECT ISBN, Price, Shelf, GetDate()
FROM (MERGE Books T
      USING WeeklyChanges AS S
      ON T.ISBN = S.ISBN AND T.EndValidDate IS NULL
      WHEN MATCHED AND (T.Price <> S.Price OR T.Shelf <> S.Shelf) THEN
          UPDATE SET Price = S.Price, Shelf = S.Shelf
      WHEN NOT MATCHED THEN
          INSERT VALUES(S.ISBN, S.Price, S.Shelf, NULL)
      OUTPUT $action, S.ISBN, Deleted.Price, Deleted.Shelf
     ) Changes(Action, ISBN, Price, Shelf)
WHERE Action = 'UPDATE';
MERGE statement
Logging Enhancements
Minimal logging = log only what is strictly necessary for rollback
−Normally individual rows are logged
−Page allocations are sufficient to UNDO insertions
Recovery model must be simple or bulk-logged
Previous releases
−CREATE INDEX
−SELECT INTO
−BULK INSERT/BCP with TABLOCK
Logging Enhancements
SQL Server 2008
−INSERT into table supports minimal logging
−3X-5X Performance Boost over fully logged INSERT
[Chart: Run Time]
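As a rough illustration of the INSERT case, the sketch below loads a heap with a table lock under the bulk-logged recovery model. The database and table names (SalesDW, dbo.FactSales_Load, dbo.FactSales_Source) are hypothetical, and whether the insert is actually minimally logged depends on the target's structure and the server version, so treat this as a starting point rather than a guarantee.

```sql
-- Hypothetical database and table names; adjust to your environment.
ALTER DATABASE SalesDW SET RECOVERY BULK_LOGGED;   -- or SIMPLE
GO
USE SalesDW;
GO
-- Target is a heap with no indexes, the simplest case for minimal logging.
CREATE TABLE dbo.FactSales_Load
(
    OrderId   int   NOT NULL,
    OrderDate date  NOT NULL,
    Amount    money NOT NULL
);
GO
-- TABLOCK takes a table-level lock, which the minimally logged path requires.
INSERT INTO dbo.FactSales_Load WITH (TABLOCK) (OrderId, OrderDate, Amount)
SELECT OrderId, OrderDate, Amount
FROM dbo.FactSales_Source;   -- assumed to exist with matching columns
```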
Logging demo
Change Data Capture
Mechanism to easily track changes on a table
−Changes captured from the log asynchronously
−Information on what changed at the source
Table-Valued Functions (TVF) to query change data
−Easily consumable from Integration Services
[Diagram: Source Table, Transaction Log, Capture Process, Change Table, CDC Functions]
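A minimal sketch of the moving parts, assuming a hypothetical database SalesDW containing a table dbo.Orders (neither is from the deck). The capture process runs as a SQL Server Agent job, so Agent must be running for changes to appear.

```sql
-- Hypothetical database and table names.
USE SalesDW;
GO
-- Enable CDC for the database, then for one source table.
EXEC sys.sp_cdc_enable_db;
GO
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Orders',
    @role_name     = NULL;      -- NULL: no gating role restricts access to change data
GO
-- The default capture instance is named <schema>_<table>, here dbo_Orders.
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_Orders');
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();

-- Table-valued function generated for the capture instance.
-- __$operation: 1 = delete, 2 = insert, 4 = update (after image).
SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_Orders(@from_lsn, @to_lsn, N'all');
```

An Integration Services package would typically persist the last processed LSN and pass it as the lower bound on the next run.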
Change Data Capture Simon Sabin Onarc Consulting
Data Compression
Problem:
−Database sizes are growing
−Storage costs are becoming the dominant hardware cost
Main goal: Shrink DW fact tables
Secondary goal: Improve query performance
Enabled per table or index
Tradeoff on CPU usage
Data Compression
SQL Server 2005 SP2
−VarDecimal storage
−Enables decimal values to be stored as variable-length data
Data Compression
Fixed-length Column
SQL Server 2008 extends the logic to all fixed-length data types
−e.g. int, bigint, etc.
Data Compression
Prefix Compression
A prefix list is stored in the page for common prefixes
Individual values are replaced by
−Token for the prefix
−Suffix for the value
Data Compression
Dictionary Compression
A common value dictionary is stored in the page
Common values are replaced by tokens
1.5X to 7X compression ratio for real DW fact data anticipated, depending on data
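In the shipped product these techniques surface as two settings: ROW compression (variable-length storage for fixed-length types) and PAGE compression (row compression plus prefix and dictionary compression). A short sketch, assuming a hypothetical fact table dbo.FactSales with an index IX_FactSales_OrderDate:

```sql
-- Hypothetical table and index names.
-- Estimate the savings first; compression costs CPU, so measure before committing.
EXEC sp_estimate_data_compression_savings
    @schema_name      = N'dbo',
    @object_name      = N'FactSales',
    @index_id         = NULL,     -- all indexes
    @partition_number = NULL,     -- all partitions
    @data_compression = N'PAGE';

-- Compress the table (heap or clustered index) with page compression.
ALTER TABLE dbo.FactSales
REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Compression is set per index, so a nonclustered index can use a lighter setting.
ALTER INDEX IX_FactSales_OrderDate ON dbo.FactSales
REBUILD WITH (DATA_COMPRESSION = ROW);
```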
Partitioned Table Parallelism
Table: Orders
Partitioned on a weekly basis on OrderDate
Monday Morning: Run Weekly Report, Great Response Time, Happy Users
Tuesday Morning: Run Weekly Report, Poor Response Time, Unhappy Users
Why?
SQL Server 2005 query
−One partition => multiple threads
−Multiple partitions => single thread / partition
SQL Server 2008 query
−Multiple partitions => all threads utilised
−Far more predictable query performance
Partition Aligned Indexed Views
SQL Server 2005:
−Select ProductName, count(*) from ProductSales group by ProductName
−Indexed view is not partition aligned
−Drop indexed view before switching partitions
SQL Server 2008
−Indexed views can be partition aligned
−Basically:
−Create view with SCHEMABINDING as in 2005
−Create index on the view but add on the “filegroup” clause
−Do this for both tables in the switch statement
−Gives the performance of an indexed view without having to drop views when switching partitions.
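The steps above can be sketched as follows. All object names (pfWeekly, psWeekly, dbo.ProductSalesDemo, dbo.vProductSalesDemo) are hypothetical, and the sketch assumes that the clause the slide refers to is the ON clause of CREATE INDEX, naming the same partition scheme as the base table. For the view to be aligned, the partitioning column also has to appear in the view and in its index key.

```sql
-- Hypothetical names throughout.
CREATE PARTITION FUNCTION pfWeekly (date)
AS RANGE RIGHT FOR VALUES ('2008-01-07', '2008-01-14');

CREATE PARTITION SCHEME psWeekly
AS PARTITION pfWeekly ALL TO ([PRIMARY]);
GO
CREATE TABLE dbo.ProductSalesDemo
(
    SaleDate    date         NOT NULL,
    ProductName nvarchar(50) NOT NULL,
    Amount      money        NOT NULL
) ON psWeekly (SaleDate);
GO
-- Schema-bound view; indexed views with GROUP BY must use COUNT_BIG(*).
CREATE VIEW dbo.vProductSalesDemo
WITH SCHEMABINDING
AS
SELECT SaleDate, ProductName, COUNT_BIG(*) AS SalesCount
FROM dbo.ProductSalesDemo
GROUP BY SaleDate, ProductName;
GO
-- Creating the index on the same partition scheme aligns the view with its table.
CREATE UNIQUE CLUSTERED INDEX IX_vProductSalesDemo
ON dbo.vProductSalesDemo (SaleDate, ProductName)
ON psWeekly (SaleDate);
```

The staging table used in ALTER TABLE … SWITCH would need an equivalent view and index before the switch can proceed.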
Star Join Query Processing
SQL Server 2005 strategies
SQL Server 2008 additional query plans considered
[Diagram: Table Scan]
Star Join Query Processing
[Diagram: Fact Table Scan, Dimension 1, Dimension 2, Hash Join, Bitmap Filter 1, Bitmap Filter 2]
−SQL Server 2005 can create one bitmap filter
−SQL Server 2008 can create multiple bitmap filters
−SQL Server 2008 can move and reorder the filters
Grouping Sets
Extension to the GROUP BY clause
Ability to define multiple groupings in the same query
Produces a single result set that is equivalent to a UNION ALL of differently grouped rows
SQL 2006 standard compatible
Makes aggregation querying and reporting easier and faster
SELECT a, b, c, d, SUM(sales)
FROM Table
GROUP BY GROUPING SETS ((a,b,c), (c,d), ())
Example (GROUPING SETS)
-- Use UNION ALL on two SELECT statements
SELECT customerType, Null as TerritoryID, MAX(ModifiedDate)
FROM Sales.Customer
GROUP BY customerType
UNION ALL
SELECT Null as customerType, TerritoryID, MAX(ModifiedDate)
FROM Sales.Customer
GROUP BY TerritoryID
order by TerritoryID

-- Use GROUPING SETS on single SELECT statement
SELECT customerType, TerritoryID, max(ModifiedDate)
FROM Sales.Customer
GROUP BY GROUPING SETS ((customerType), (TerritoryID))
order by customerType
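The empty grouping set () adds a grand-total row, and GROUPING() tells a rolled-up NULL apart from a NULL in the data. A small self-contained sketch using a table variable with made-up sample rows (not from the deck):

```sql
-- Made-up sample data for illustration.
DECLARE @Sales TABLE (Region nvarchar(10), Product nvarchar(10), Amount int);

INSERT INTO @Sales VALUES
    (N'East', N'A', 10),
    (N'East', N'B', 20),
    (N'West', N'A', 5);

SELECT Region,
       Product,
       SUM(Amount)       AS Total,
       GROUPING(Region)  AS RegionRolledUp,   -- 1 when Region is aggregated away
       GROUPING(Product) AS ProductRolledUp   -- 1 when Product is aggregated away
FROM @Sales
GROUP BY GROUPING SETS ((Region, Product), (Region), ())
ORDER BY GROUPING(Region), Region, GROUPING(Product), Product;
```

For this data the query returns six rows: three detail rows, one subtotal per region (East 30, West 5) and a grand total of 35.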
Backup Compression
Pain points:
−Keeping disk-based backups online is expensive
−Backups take longer, windows are shrinking
SQL Server 2008
−WITH COMPRESSION clause to BACKUP
−Less storage required to keep backups online
−Backups run significantly faster, as less IO is done
−Restore automatically detects compression and adjusts accordingly
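A minimal sketch of the clause in use. The database name and backup path are hypothetical; the msdb query at the end is one way to check how much space the compression actually saved.

```sql
-- Hypothetical database name and file path.
BACKUP DATABASE SalesDW
TO DISK = N'D:\Backups\SalesDW.bak'
WITH COMPRESSION, INIT, STATS = 10;

-- No special option is needed on the restore side; compression is detected automatically.
RESTORE VERIFYONLY
FROM DISK = N'D:\Backups\SalesDW.bak';

-- Compare the raw and compressed sizes recorded in the backup history.
SELECT database_name, backup_size, compressed_backup_size
FROM msdb.dbo.backupset
WHERE database_name = N'SalesDW'
ORDER BY backup_finish_date DESC;
```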
SQL 2005 Resource Management
Single resource pool
Database engine doesn’t differentiate workloads
Best effort resource sharing
[Diagram: workloads (Backup, Admin Tasks, Executive Reports, OLTP Activity, Ad-hoc Reports) sharing SQL Server resources (Memory, CPU, Threads, …)]
Resource Governor – Workloads
Ability to differentiate workloads
−e.g. app_name, login
Per-request limits
−Max memory %
−Max CPU time
−Grant timeout
−Max Requests
Resource monitoring
[Diagram: Admin Workload (Backup, Admin Tasks), OLTP Workload (OLTP Activity), Report Workload (Ad-hoc Reports, Executive Reports) sharing SQL Server resources (Memory, CPU, Threads, …)]
Resource Governor – Importance
A workload can have an importance label
−Low
−Medium
−High
Gives resource allocation preference to workloads based on importance
[Diagram: Admin Workload (Backup, Admin Tasks), OLTP Workload (OLTP Activity), Report Workload (Ad-hoc Reports, Executive Reports) sharing SQL Server resources (Memory, CPU, Threads, …); one workload labelled High]
Resource Governor – Pools
Resource pool: A virtual subset of physical database engine resources
Provides controls to specify
−Min Memory %
−Max Memory %
−Min CPU %
−Max CPU %
−Max DOP
Resource monitoring
Up to 20 resource pools
[Diagram: Application Pool and Admin Pool inside SQL Server; labels: Min Memory 10%, Max Memory 20%, Max CPU 20%; Max CPU 90%; Admin Workload (Backup, Admin Tasks), OLTP Workload (OLTP Activity), Report Workload (Ad-hoc Reports, Executive Reports); one workload labelled High]
Resource Governor
Putting it all together
Workloads are mapped to Resource Pools (n : 1)
Online changes of groups/pools
SQL Server 2005 = default group + default pool
Main Benefit
−Prevent run-away queries
[Diagram: Application Pool and Admin Pool inside SQL Server; labels: Min Memory 10%, Max Memory 20%, Max CPU 20%; Max CPU 90%; Admin Workload (Backup, Admin Tasks), OLTP Workload (OLTP Activity), Report Workload (Ad-hoc Reports, Executive Reports); one workload labelled High]
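Put together, a configuration along these lines might look like the sketch below. The pool, group, function and login names are hypothetical, and giving the admin pool the 10% / 20% / 20% limits is an assumption, since the extracted diagram does not say which pool carries which settings.

```sql
USE master;
GO
-- Hypothetical pool; the limits are assumed, not taken from the deck.
CREATE RESOURCE POOL AdminPool
WITH (MIN_MEMORY_PERCENT = 10, MAX_MEMORY_PERCENT = 20, MAX_CPU_PERCENT = 20);
GO
-- Hypothetical workload group mapped to that pool (many groups can share one pool).
CREATE WORKLOAD GROUP AdminWorkload
WITH (IMPORTANCE = HIGH, REQUEST_MAX_CPU_TIME_SEC = 300)
USING AdminPool;
GO
-- Classifier: runs at login and returns the workload group name for the session.
CREATE FUNCTION dbo.fnClassifyWorkload()
RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @grp sysname = N'default';
    IF SUSER_SNAME() = N'BackupAdmin'      -- hypothetical login name
        SET @grp = N'AdminWorkload';
    RETURN @grp;
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fnClassifyWorkload);
ALTER RESOURCE GOVERNOR RECONFIGURE;       -- applies the change online
```

Sessions that the classifier does not assign elsewhere land in the default group and default pool, which matches the SQL Server 2005 behaviour described above.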
Resource Governor Martin Bell Carillon Software Systems Limited
New Date and Time data types
Date
−Date only
−From 1/1/0001 to 12/31/9999
−3 bytes
Time
−Time only
−Optional precision up to 100 nanoseconds
−3 to 5 bytes (default 5 bytes, i.e. full resolution)
DateTimeOffset
−Time zone aware UTC datetime
−Optional precision up to 100 nanoseconds
−8 to 10 bytes (default 10 bytes, i.e. full resolution)
DateTime2
−Large date range
−Optional precision up to 100 nanoseconds
−6 to 8 bytes (default 8 bytes, i.e. full resolution)
Plus assorted new date/time functions, e.g. SYSDATETIMEOFFSET()
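A quick way to explore the new types is to declare one variable of each and inspect its storage size with DATALENGTH. The literal values below are arbitrary examples.

```sql
-- Arbitrary sample values, each declared at full (100 ns) precision where applicable.
DECLARE @d   date              = '2008-03-01';
DECLARE @t   time(7)           = '13:45:30.1234567';
DECLARE @dt2 datetime2(7)      = '2008-03-01 13:45:30.1234567';
DECLARE @dto datetimeoffset(7) = '2008-03-01 13:45:30.1234567 +01:00';

-- Storage size in bytes for each value; compare with the sizes listed above.
SELECT DATALENGTH(@d)   AS DateBytes,
       DATALENGTH(@t)   AS TimeBytes,
       DATALENGTH(@dt2) AS DateTime2Bytes,
       DATALENGTH(@dto) AS DateTimeOffsetBytes;

-- Some of the new functions, plus moving an offset-aware value to UTC.
SELECT SYSDATETIME()                AS LocalNow,
       SYSUTCDATETIME()             AS UtcNow,
       SYSDATETIMEOFFSET()          AS LocalNowWithOffset,
       SWITCHOFFSET(@dto, '+00:00') AS SampleAsUtc;
```

Declaring a lower precision, for example time(0), reduces the storage size at the cost of resolution.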
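Sparse Column Storage
The problem
−Need to store sparse data
−Possibly hundreds of columns
−Typically only a few % are populated

Typical Solution (one row per populated value)
ID | Column | Value
1 | Q1 | C
1 | Q2 | 1
1 | Q10 | 9
2 | Q1 | B
2 | Q3 | 4
2 | Q5 | Low
3 | Q1 | C
3 | Q7 | 6
3 | Q8 | 5

Desired schema (one wide row per key)
PK | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | Q9 | Q10
1 | C | 1 | | | | | | | | 9
2 | B | | 4 | | Low | | | | |
3 | C | | | | | | 6 | 5 | |
…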
Sparse Columns
“Sparse” as a storage attribute on a column
−0 bytes for a NULL, 4 byte overhead for non-NULL
−No change in Query/DML behavior
−Same limitations as normal tables, e.g. 1024 columns
Wide Table: defining a “Sparse Column Set”
−An un-typed XML column, with a published format
−Logical grouping for all sparse columns in a table
−Select * returns all non-sparse columns plus the sparse column set (XML)
−Allows generic retrieval/update of all sparse columns as a set
−30,000 sparse columns allowed in a table (2Gb), 1000 indexes

-- Sparse as a storage attribute in Create/Alter table statements
Create Table Products(Id int, Type nvarchar(16)…, Resolution int SPARSE, ZoomLength int SPARSE);

-- Create a sparse column set
Create Table Products(Id int, Type nvarchar(16)…, Resolution int SPARSE, ZoomLength int SPARSE, Properties XML COLUMN_SET FOR ALL_SPARSE_COLUMNS);
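A complete, runnable variant of the column-set example, using a hypothetical table name (dbo.ProductsDemo) and made-up rows so that it does not collide with the Products table on the slide:

```sql
-- Hypothetical table; sparse columns must be nullable.
CREATE TABLE dbo.ProductsDemo
(
    Id         int          NOT NULL PRIMARY KEY,
    Type       nvarchar(16) NOT NULL,
    Resolution int SPARSE NULL,
    ZoomLength int SPARSE NULL,
    Properties xml COLUMN_SET FOR ALL_SPARSE_COLUMNS
);

INSERT INTO dbo.ProductsDemo (Id, Type, Resolution, ZoomLength)
VALUES (1, N'Camera', 10, 3);
INSERT INTO dbo.ProductsDemo (Id, Type)
VALUES (2, N'Tripod');            -- both sparse columns stay NULL

-- SELECT * returns Id, Type and the XML column set, not the individual sparse columns.
SELECT * FROM dbo.ProductsDemo;

-- Sparse columns can still be addressed by name like ordinary columns.
SELECT Id, Resolution, ZoomLength FROM dbo.ProductsDemo;

-- Updating through the column set rewrites ALL sparse columns:
-- anything missing from the XML (here ZoomLength) becomes NULL.
UPDATE dbo.ProductsDemo
SET Properties = '<Resolution>12</Resolution>'
WHERE Id = 1;
```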
Filtered Indexes
Filtered Indexes and Statistics
−Indexing a portion of the data in a table
−Filtered/correlated statistics creation and usage
−Query/DML Optimization to use Filtered indexes and Statistics
−Restricted to non-clustered indexes
Benefits
−Lower storage and maintenance costs for large number of indexes
−Query/DML Performance Benefits: IO only for qualifying rows

-- Create a filtered index
Create Index ZoomIdx on Products(ZoomLength) where Type = 'Camera';

-- Optimizer will pick the filtered index when query predicates match
Select Id, Type, Resolution, ZoomLength from Products where Type = 'Camera'
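The same WHERE clause syntax applies to statistics. A short sketch on a hypothetical table (dbo.ProductsFiltered) showing a covering filtered index and filtered statistics side by side:

```sql
-- Hypothetical table for illustration.
CREATE TABLE dbo.ProductsFiltered
(
    Id         int          NOT NULL PRIMARY KEY,
    Type       nvarchar(16) NOT NULL,
    Resolution int NULL,
    ZoomLength int NULL
);

-- Filtered, covering nonclustered index: only rows with Type = 'Camera' are indexed.
CREATE NONCLUSTERED INDEX IX_Camera_Zoom
ON dbo.ProductsFiltered (ZoomLength)
INCLUDE (Resolution)
WHERE Type = N'Camera';

-- Filtered statistics describe the distribution of Resolution within the same subset.
CREATE STATISTICS ST_Camera_Resolution
ON dbo.ProductsFiltered (Resolution)
WHERE Type = N'Camera';

-- The query predicate includes the filter, so the filtered index is a candidate.
SELECT Id, Resolution, ZoomLength
FROM dbo.ProductsFiltered
WHERE Type = N'Camera' AND ZoomLength > 3;
```

Whether the optimizer actually chooses the filtered index depends on data volume and costing, so check the plan rather than assuming it.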
© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.