Biml Recipes: Automatically Create T-SQL Scripts for Common Tasks Scott Currie @scottcurrie @bimlscript
Samples We Will Cover
- Generation of T-SQL merge statements that removes all the drudgery of manually mapping columns, including complex SCD column handling
- Stale data detection that uses Biml to create queries displaying the ranges of all date/time columns in each table of a target database, perfect for retiring tables from WorkDB and other ad hoc environments
- Sample data creation that automatically produces test data based on the DDL schema information in your data model
- Assumption testing (with bonus pivot tables)
- Incremental schema change detection
- And much more
But First: A Brief Biml Refresher
- What is Biml?
- Biml Syntax
- Biml API
- LINQ
- Biml Utility Methods
What is Biml?
XML Syntax
- Biml is XML based and declarative
- Uses elements and attributes to describe a BI solution
- The root element contains collections of root objects: Connections, Tables, Dimensions, Facts, Cubes, Packages, etc.
- Individual objects are defined within these collections
BimlScript Coding
- <# Code Block #>
- <#+ Module Level Code Block #>
- <#= Inline Code Block #>
  - Calls the .NET ToString() method on whatever expression is in the block
  - Can't be a statement, void, or null
- <#@ Directive #>
Language
- Biml is a homoiconic language
  - Every language element corresponds directly to an object in the Biml API
  - Model objects can be programmatically accessed or modified through the Biml API
  - Always in sync: changes through the API automatically update Biml code and vice versa
- Fully documented Biml API
  - http://varigence.com/Documentation/Language/Index
  - Every page in the Biml Language Documentation has a link to the corresponding Biml API type, e.g. http://www.varigence.com/Documentation/Language/Element/AstPackageNode
Typing Conventions
- All types that correspond to language elements are named in the format Ast<Element>Node, e.g. AstPackageNode
- Namespaces: Varigence.Languages.Biml.*
- Special nodes: AstRootNode, AstNamedNode, AstNode
Language Integrated Query (LINQ)
- Allows flexible querying and transformation of sets in .NET languages
- Two ways to use LINQ:
  - SQL-like syntax: from m in MyCollection where m.Name == "Test" select m
  - Extension methods and lambda expressions: MyCollection.Where(m => m.Name == "Test")
- If you are proficient in LINQ, you already have 80-90% of the .NET programming skills you will need to master BimlScript
SQL-like Syntax

```csharp
var tableNames =
    from t in RootNode.Tables
    where t.Schema != null && t.Schema.Name == "dbo"
    select t.Name;

foreach (var tableName in tableNames)
{
    // ...
}
```
Extension Methods and Lambda Expressions

```csharp
foreach (var tableName in RootNode.Tables
    .Where(t => t.Schema != null && t.Schema.Name == "dbo")
    .Select(t => t.Name))
{
    // ...
}
```
LINQ Resources
- LINQ Portal: http://msdn.microsoft.com/en-us/library/dd264799.aspx
- 101 LINQ Samples: http://code.msdn.microsoft.com/101-LINQ-Samples-3fb9811b
- LINQPad: http://www.linqpad.net/
- LINQ Cheat Sheets:
  - http://download.damieng.com/dotnet/LINQToSQLCheatSheet.pdf
  - http://aspnetresources.com/downloads/linq_standard_query_operators.pdf
Examples of Utility Methods
- Concatenate a collection: RootNode.Tables.Collapse(table => table.Name, "|")
- Get the Biml representation of a node or collection: Table.GetBiml(), RootNode.Tables.GetBiml()
- Schema-qualified name: Table.SchemaQualifiedName
- Get a list of columns: Table.GetColumnList(column => column.IsUsedInKey, "sales", "[", "]");
- Get DDL for table creation: Table.GetTableSql(), Table.GetDropAndCreateDdl()
Let's Get To The Recipes
I'm hungry for code!
Stale Data Detection: Scenario
- Your org is retiring a few old data marts and replacing them with a unified data mart; documented requirements are being implemented.
- BUT there is a WorkDB that analysts have been using against the old data marts for the past 10 years.
- Some of those tables are unused. Others are undocumented components of business-critical "shadow applications."
- The WorkDB is not configured to track any login information whatsoever.
- How do you eliminate unused tables so you can document what remains?
Stale Data Detection: A Partial Solution
1. Inspect the schema of all tables.
2. Identify all columns with a datetime type in each table.
3. Write a query for each table that finds the latest datetime value stored in any of its datetime columns.
4. Put the results into a spreadsheet.
5. Find the tables that contain "stale" data.
6. Publish that list to stakeholders with a deadline, after which the tables are dropped*.

* In reality, you would probably move those tables to a locked-down schema so that they could be restored if something important breaks.
Stale Data Detection: Implementation Options
- Lots of manual effort
- Data profiling tools
- Heavy-duty dynamic SQL
- Biml:

```biml
<# foreach (var table in RootNode.OleDbConnections["Source"].GenerateTableNodes()) { #>
<#     var dateColumns = table.Columns
           .Where(item => item.DataType == DbType.DateTime || item.DataType == DbType.DateTime2)
           .Select(item => "SELECT MAX(" + item.QualifiedName + ") AS Foo FROM " + table.SchemaQualifiedName); #>
<#     var query = "SELECT MAX(Foo) AS MaxData FROM (" + string.Join(" UNION ALL ", dateColumns) + ") AS a"; #>
<#=table.Name#>,<# if (dateColumns.Any()) { #><#=ExternalDataAccess.GetDataTable(RootNode.OleDbConnections["Source"].ConnectionString, query).Rows[0][0]#><# } else { #>NULL<# } #>
<# } #>
```
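The BimlScript loop emits one staleness query per table. As a sketch of the generated output, assuming a hypothetical table [dbo].[Orders] with two datetime columns named OrderDate and ShippedDate, the query would look like:

```sql
-- Generated staleness query (hypothetical table and column names)
SELECT MAX(Foo) AS MaxData
FROM (
    SELECT MAX([OrderDate])   AS Foo FROM [dbo].[Orders]
    UNION ALL
    SELECT MAX([ShippedDate]) AS Foo FROM [dbo].[Orders]
) AS a;
```

If MaxData is older than your staleness threshold, the table is a candidate for retirement.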
Demo: Stale Data Detection
Merge Statements: Scenario
- You have to write a bunch of merge statements to move data: transient to persisted tables, staging/process tables to target tables, etc.
- You are tired of typing the same column names over and over again
Merge Statements: Solution
- Read the schema of the source and target tables (if different)
- Autogenerate the repetitive parts of the merge statement
- BONUS: Access a metadata source that identifies SCD Type 2 columns for special handling
Merge Statements: Implementation Options
- Lots of manual effort
- Heavy-duty dynamic SQL
- Biml:
  - Merge pseudo-task
  - Merge T-SQL
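As a hedged sketch of the kind of statement the recipe autogenerates, here is a plain T-SQL MERGE for a hypothetical staging-to-dimension load (the table and column names are illustrative, not from the demo):

```sql
-- Hypothetical generated MERGE: [stg].[Customer] into [dbo].[DimCustomer]
MERGE [dbo].[DimCustomer] AS tgt
USING [stg].[Customer] AS src
    ON tgt.[CustomerKey] = src.[CustomerKey]
WHEN MATCHED AND (tgt.[Name] <> src.[Name] OR tgt.[City] <> src.[City]) THEN
    UPDATE SET tgt.[Name] = src.[Name],
               tgt.[City] = src.[City]
WHEN NOT MATCHED BY TARGET THEN
    INSERT ([CustomerKey], [Name], [City])
    VALUES (src.[CustomerKey], src.[Name], src.[City])
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;
```

Every column list in this statement is exactly the repetitive part that BimlScript can emit from table metadata; SCD Type 2 columns would instead route the MATCHED branch to an expire-and-insert pattern.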
Demo: Merge Statement Auto-Generation
Sample Data Creation: Scenario
- You just created an empty schema
- It would be nice to have some data in there for testing purposes
Sample Data Creation: Solution
1. Read the schema of the target database
2. Identify column types (and potentially other metadata)
3. Use that metadata to create randomized values suitable for each column
4. Create an INSERT statement that adds the sample data to the table
Sample Data Creation: Implementation Options
- Lots of manual effort
- Data generator tool
- Heavy-duty dynamic SQL or .NET programming
- Biml
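As a hedged sketch of what the generated output could look like, assuming a hypothetical table [dbo].[Customer] with varchar, int, and datetime columns, a per-row INSERT built from column metadata might be:

```sql
-- Hypothetical generated sample-data INSERT
-- ABS(CHECKSUM(NEWID())) yields a random non-negative int per evaluation
INSERT INTO [dbo].[Customer] ([Name], [Age], [SignupDate])
VALUES (
    LEFT(CONVERT(varchar(36), NEWID()), 10),                   -- random string for a varchar column
    ABS(CHECKSUM(NEWID())) % 100,                              -- random int constrained to a plausible range
    DATEADD(DAY, -(ABS(CHECKSUM(NEWID())) % 3650), GETDATE())  -- random date in the past ~10 years
);
```

The BimlScript side of the recipe picks the value expression per column by inspecting each column's DataType, length, and nullability.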
Demo: Sample Data Creation
Incremental Deployment: Scenario
- You have made changes in the Dev environment that need to be deployed to Test or Prod
- The changes were not made via scripts that you can easily run:
  - Changes came from an autogenerated schema
  - Changes were made by a third party
  - Your org doesn't use a migration process
Incremental Deployment: Solution
1. Read the schemas of the source and target environments
2. Identify the differences
3. For each difference, create the matching DDL to change the schema
4. Run the generated DDL
Incremental Deployment: Implementation Options
- Lots of manual effort
- Schema compare tool
- Heavy-duty dynamic SQL or .NET coding
- Biml
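As a hedged sketch of the difference DDL the recipe would emit, assume a hypothetical case where Test is missing one column and one table that exist in Dev:

```sql
-- Hypothetical generated difference DDL (Dev -> Test)
-- Column present in Dev but missing in Test:
ALTER TABLE [dbo].[Customer] ADD [LoyaltyTier] int NULL;

-- Table present in Dev but missing in Test:
CREATE TABLE [dbo].[CustomerNote] (
    [CustomerNoteID] int IDENTITY(1,1) PRIMARY KEY,
    [CustomerID]     int NOT NULL,
    [Note]           nvarchar(max) NULL
);
```

Each statement maps to one detected difference, so the script can be reviewed line by line before it is run against the target.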
Demo: Incremental Deployment
Assumption Testing: Scenario
- Your organization has purchased a new "book of business"
- Its data is stored in a different finance system than the one you use
- You have migrated the data into your existing system
- Verify that the migration didn't break anything
- If there are issues, identify them
Assumption Testing: Solution
1. Create metadata that encodes the analyst rules that define correct behavior
2. Run queries representing those rules against the new and old systems, noting that the logic and data mappings may need significant modification
3. Load the results into your favorite data analysis tool (Excel)
4. Find any mismatches and confer with Data Governance
Assumption Testing: Implementation Options
- Lots of manual effort
- Data comparison tool
- Heavy-duty dynamic SQL or .NET coding
- Biml
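As a hedged sketch of one generated assumption test, assume a hypothetical rule that migrated transaction totals per account must match the legacy system (database, table, and column names below are illustrative):

```sql
-- Hypothetical generated assumption test: per-account totals, old vs. new system
SELECT COALESCE(o.AccountID, n.AccountID) AS AccountID,
       o.Total AS OldTotal,
       n.Total AS NewTotal
FROM (SELECT AccountID, SUM(Amount) AS Total
      FROM [Legacy].[dbo].[Txn] GROUP BY AccountID) AS o
FULL OUTER JOIN
     (SELECT AccountID, SUM(Amount) AS Total
      FROM [Finance].[dbo].[Txn] GROUP BY AccountID) AS n
    ON o.AccountID = n.AccountID
WHERE o.Total <> n.Total
   OR o.AccountID IS NULL   -- account exists only in the new system
   OR n.AccountID IS NULL;  -- account was dropped in migration
```

A zero-row result means the assumption holds; any rows returned are the mismatches to take to Data Governance (and they pivot nicely in Excel).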
Demo: Assumption Testing