Deep Dive into XQuery and XML in Microsoft SQL Server: Common Problems and Best Practice Solutions Michael Rys Principal Program Manager Microsoft Corporation.

Deep Dive into XQuery and XML in Microsoft SQL Server: Common Problems and Best Practice Solutions Michael Rys Principal Program Manager Microsoft DBI404

XML Scenarios and when to store XML
XML Design Optimizations
General Optimizations
XML Datatype method Optimizations
XQuery Optimizations
XML Index Optimizations
Futures: Selective XML Index Preview
Interesting Query Optimization Patterns

→Transport, Store, and Query XML data

[Flowchart: deciding how to store XML; the connections between boxes are not preserved in this transcript]
Questions:
  Does the data fit the relational model?
  Is the data semi-structured?
  Is the data a document?
  Query into the XML?
  Search within the XML?
  Is the XML constrained by schemas?
Branch labels: Yes, No, structured, Known sparse, Open schema
Recommendations:
  Shred the XML into relations
  Shred the structured XML into relations, store semistructured aspects as XML and/or sparse col
  Shred known sparse data into sparse columns
  Store as XML
  Store as varbinary(max)
  Define a full-text index
  Use primary and secondary XML indexes as needed
  Constrain XML if validation cost is ok
  Promote frequently queried properties relationally
  Future: Selective XML Index

[Diagram: XML processing architecture in SQL Server. Labeled components: XML Parser; XML Validation; XML data type (binary XML); Schema Collection; XML Schemata; XML; Relational; Rowsets; OpenXML/nodes(); FOR XML with TYPE directive; XQuery; XML-DML; Node Table; PRIMARY XML INDEX; Secondary Indices (PATH Index, PROP Index, VALUE Index); SELECTIVE XML INDEX (Wide Sparse Table)]

Map SQL value and type into XQuery values and types in context of XQuery or XML-DML
sql:variable(): accesses a SQL variable/parameter
  … int
  select * from T where …
sql:column(): accesses another column value
  tables: T(key int, x xml), S(key int, val int)
  select * from T join S on T.key=S.key where …
Restrictions in SQL Server: No XML, CLR UDT, datetime, or deprecated text/ntext/image
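A minimal sketch of both functions follows. The slide's own predicates did not survive in this transcript, so the table variables, the column name k (KEY is a reserved word in T-SQL), the path /a/b, the attribute @id, and the variable @value are all assumptions made for illustration.

```sql
-- Hypothetical data: @T holds XML, @S holds relational values.
DECLARE @T TABLE (k int PRIMARY KEY, x xml);
DECLARE @S TABLE (k int PRIMARY KEY, val int);
INSERT INTO @T VALUES (1, N'<a><b id="42"/></a>'), (2, N'<a><b id="7"/></a>');
INSERT INTO @S VALUES (1, 42), (2, 99);

-- sql:variable(): compare an XML attribute against a T-SQL variable.
DECLARE @value int = 42;
SELECT t.k
FROM @T AS t
WHERE t.x.exist('/a/b[@id = sql:variable("@value")]') = 1;

-- sql:column(): compare an XML attribute against a column of a joined table.
SELECT t.k
FROM @T AS t
JOIN @S AS s ON t.k = s.k
WHERE t.x.exist('/a/b[@id = sql:column("s.val")]') = 1;
```

With this sample data both queries should select only the row with k = 1.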

Improving Slow XQueries, Bad FOR XML

BAD: CAST( CAST(xmldoc.query('/a/b/text()') as nvarchar(500)) as int)
GOOD: xmldoc.value('(/a/b/text())[1]', 'int')
BAD: … 'nvarchar(50)')
GOOD: … 'nvarchar(50)')
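The first BAD/GOOD pair can be tried side by side with a small script. This is only a sketch: the variable name @xmldoc and the sample document are assumptions, chosen to match the path /a/b used on the slide.

```sql
-- Hypothetical sample document matching the slide's /a/b path.
DECLARE @xmldoc xml = N'<a><b>42</b></a>';

SELECT
    -- BAD: query() builds an XML result that is then cast twice.
    CAST(CAST(@xmldoc.query('/a/b/text()') AS nvarchar(500)) AS int) AS via_query_and_casts,
    -- GOOD: value() extracts the singleton and converts it in one step.
    @xmldoc.value('(/a/b/text())[1]', 'int') AS via_value;
```

Both columns should hold the same integer; the difference lies in the work done to produce it.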

Use exist() method, sql:column()/sql:variable() and an XQuery comparison for checking for a value or joining if a secondary XML index on the PXI is present (*)
BAD: select doc from doc_tab join authors on doc.value('(/doc/mainauthor/lname/text())[1]', 'nvarchar(50)') = lastname
GOOD: select doc from doc_tab join authors on 1 = doc.exist('/doc/mainauthor/lname/text()[. = sql:column("lastname")]')
(*) otherwise, the value() method is more efficient most of the time
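The two join styles can be compared with a self-contained script. The sketch below uses table variables named after the slide's doc_tab and authors; the id column, the aliases, and the sample rows are assumptions. Table variables cannot carry XML indexes, so this shows the syntax only, not the index-dependent performance difference.

```sql
-- Hypothetical stand-ins for the slide's doc_tab and authors tables.
DECLARE @doc_tab TABLE (id int PRIMARY KEY, doc xml);
DECLARE @authors TABLE (lastname nvarchar(50));
INSERT INTO @doc_tab VALUES
    (1, N'<doc><mainauthor><lname>Rys</lname></mainauthor></doc>'),
    (2, N'<doc><mainauthor><lname>Pal</lname></mainauthor></doc>');
INSERT INTO @authors VALUES (N'Rys');

-- value() form: extract the name, then compare relationally.
SELECT d.id
FROM @doc_tab AS d
JOIN @authors AS a
  ON d.doc.value('(/doc/mainauthor/lname/text())[1]', 'nvarchar(50)') = a.lastname;

-- exist() form: push the comparison into the XQuery predicate.
SELECT d.id
FROM @doc_tab AS d
JOIN @authors AS a
  ON d.doc.exist('/doc/mainauthor/lname/text()[. = sql:column("a.lastname")]') = 1;
```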

nodes() without XML index is a Table-valued function (details later)
Bad cardinality estimates can lead to bad plans
BAD: select … 'int') as CustID, … 'nvarchar(50)') as CName from … as N(c) where Customer.ID = … 'int')
BETTER (if only one wrapper doc element): select … 'int') as CustID, … 'nvarchar(50)') as CName from … as D(d) cross apply d.nodes('customer') as N(c) where Customer.ID = … 'int')
Use temp table (insert into #temp select … from nodes()) or Table-valued parameter instead of XML to get better estimates
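The temp-table suggestion could look roughly like the following. The document shape (a single doc wrapper with customer children carrying an id attribute and a name element) and the temp table #cust are assumptions, since the slide's paths were lost in this transcript.

```sql
-- Hypothetical input with a single wrapper element.
DECLARE @x xml = N'<doc>
  <customer id="1"><name>Ann</name></customer>
  <customer id="2"><name>Bob</name></customer>
</doc>';

-- Shred once into a temp table so later joins see real row counts.
CREATE TABLE #cust (CustID int PRIMARY KEY, CName nvarchar(50));

INSERT INTO #cust (CustID, CName)
SELECT c.value('@id', 'int'),
       c.value('(name/text())[1]', 'nvarchar(50)')
FROM @x.nodes('/doc') AS D(d)            -- one wrapper row
CROSS APPLY d.nodes('customer') AS N(c); -- one row per customer

-- Subsequent queries join against #cust instead of calling nodes() again.
SELECT CustID, CName FROM #cust;

DROP TABLE #cust;
```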

Use subqueries
BAD: SELECT CASE isnumeric (doc.value('(/doc/customer/order/price)[1]', 'nvarchar(32)')) WHEN 1 THEN doc.value('(/doc/customer/order/price)[1]', 'decimal(5,2)') ELSE 0 END FROM T
GOOD: SELECT CASE isnumeric (Price) WHEN 1 THEN CAST(Price as decimal(5,2)) ELSE 0 END FROM (SELECT doc.value('(/doc/customer/order/price)[1]', 'nvarchar(32)') as Price FROM T) X
Use subqueries also with NULLIF()
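The NULLIF() remark follows the same reasoning: NULLIF(a, b) is shorthand for a CASE expression that mentions a twice, so an inline value() call may be evaluated more than once. A sketch of the subquery form, with a hypothetical table variable standing in for T:

```sql
-- Hypothetical stand-in for the slide's table T.
DECLARE @T TABLE (doc xml);
INSERT INTO @T VALUES
    (N'<doc><customer><order><price>9.50</price></order></customer></doc>'),
    (N'<doc><customer><order><price></price></order></customer></doc>');

-- Extract the price once in the subquery, then apply NULLIF to the plain column.
SELECT NULLIF(Price, N'') AS Price
FROM (SELECT doc.value('(/doc/customer/order/price)[1]', 'nvarchar(32)') AS Price
      FROM @T) AS X;
```

The intent is that an empty price element surfaces as NULL rather than as an empty string.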

[Diagram: compilation and execution of SELECT x.query('…'), y FROM T WHERE …]
Static Phase:
  SQL Parser, Algebrization, Static Typing
  XQuery Parser, Static Typing, Algebrization (uses XML Schema Collection metadata)
  Static Optimization of combined Logical and Physical Operation Tree
Dynamic Phase:
  Runtime Optimization and Execution of physical Op Tree (uses XML and rel. Indices)

XMLReader node table format example (simplified)
ID | TAG ID | Node | Type-ID | VALUE | HID
… | … (TITLE) | Element | 2 (xs:string) | Bad Bugs | #title#section#book

Serializer UDX: serializes the query result as XML
XQuery String UDX: evaluates the XQuery string() function
XQuery Data UDX: evaluates the XQuery data() function
Check UDX: validates XML being inserted
UDX name visible in SSMS properties window

Value comparisons, XQuery casts and value() method casts require atomization of item
attribute: … = 42] → … = 42]
Atomic typed element: /person[age = 42] → /person[data(age) = 42]
Untyped, mixed content typed element (adds UDX): /person[age = 42] → /person[data(age) = 42] → /person[string(age) = 42]
If only one text node for untyped element (better): /person[age/text() = 42] → /person[data(age/text()) = 42]
value() method on untyped elements: value('/person/age', 'int') → value('/person/age/text()', 'int')
string() aggregates all text nodes, prohibits index use
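For untyped XML, the element form and the text() form can be compared directly. This is a sketch with an assumed sample document; it shows that the expressions return the same answer, while the slide's point concerns the extra atomization work hidden behind the element form.

```sql
-- Hypothetical untyped document with a single text node under age.
DECLARE @p xml = N'<person><age>42</age></person>';

SELECT
    @p.exist('/person[age = 42]')              AS via_element,   -- atomizes the whole element
    @p.exist('/person[age/text() = 42]')       AS via_text_node, -- atomizes one text node
    @p.value('(/person/age/text())[1]', 'int') AS age;           -- value() on the text node
```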

Value comparisons require casts and type promotion
Untyped attribute: … = 42] → … = 42]
Untyped text node(): /person[age/text() = 42] → /person[xs:decimal(age/text()) = 42]
Typed element (typed as xs:int): /person[salary = 3e4] → /person[xs:double(salary) = 3e4]
Casting is expensive and prohibits index lookup
Tips to avoid casting:
  Use appropriate types for comparison (string for untyped)
  Use schema to declare type
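The "string for untyped" tip can be sketched as follows, again with an assumed sample document. Note that a string comparison is not numerically equivalent: a stored value of "042" would match the numeric predicate but not the string one.

```sql
-- Hypothetical untyped document.
DECLARE @p xml = N'<person><age>42</age></person>';

SELECT
    -- String literal: the untyped text node is compared as a string, no numeric cast.
    @p.exist('/person/age/text()[. = "42"]') AS string_compare,
    -- Numeric literal: the untyped value has to be cast before comparing.
    @p.exist('/person/age/text()[. = 42]')   AS numeric_compare;
```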

Single paths are more efficient than twig paths
Avoid predicates in the middle of path expressions
  … = " "]/author[first-name = "Davis"] → … = " "] "∩" /book/author[first-name = "Davis"]
Move ordinals to the end of path expressions
  Make sure you get the same semantics!
  /a[1]/b[1] ≠ (/a/b)[1] ≠ /a/b[1]
  … is better than …
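The warning about semantics is easy to check on a small fragment. The sample XML below is an assumption, chosen so that the three expressions from the slide give three different results.

```sql
-- Hypothetical fragment: the first <a> has no <b> children.
DECLARE @x xml = N'<a/><a><b>1</b><b>2</b></a><a><b>3</b></a>';

SELECT
    @x.query('/a[1]/b[1]') AS first_b_of_first_a, -- empty: the first a has no b
    @x.query('(/a/b)[1]')  AS first_b_overall,    -- <b>1</b>
    @x.query('/a/b[1]')    AS first_b_of_each_a;  -- <b>1</b><b>3</b>
```

Moving an ordinal to the end of the path is therefore only safe when the data guarantees that the results coincide.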

Use context item in predicate to lengthen path in exist()
Existential quantification makes returned node irrelevant
BAD: SELECT * FROM docs WHERE 1 = xCol.exist('/book/subject[text() = "security"]')
GOOD: SELECT * FROM docs WHERE 1 = xCol.exist('/book/subject/text()[. = "security"]')
BAD: SELECT * FROM docs WHERE 1 = xCol.exist … > 9.99 … < 49.99]')
GOOD: SELECT * FROM docs WHERE 1 = xCol.exist … > 9.99 and . < 49.99]')
This does not work with or-predicate
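A runnable sketch of the context-item form follows. The table variable, its rows, and the price path in the second query are assumptions; the slide's own range predicate is only partly preserved in this transcript.

```sql
-- Hypothetical stand-in for the slide's docs table.
DECLARE @docs TABLE (id int PRIMARY KEY, xCol xml);
INSERT INTO @docs VALUES
    (1, N'<book><subject>security</subject><price>19.99</price></book>'),
    (2, N'<book><subject>cooking</subject><price>59.00</price></book>');

-- The path runs all the way to the text node; the predicate tests the context item.
SELECT id FROM @docs
WHERE xCol.exist('/book/subject/text()[. = "security"]') = 1;

-- Range check on one node: both conditions refer to the same context item.
SELECT id FROM @docs
WHERE xCol.exist('/book/price/text()[. > 9.99 and . < 49.99]') = 1;
```

With this sample data both queries should return only id 1.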

Most frequent offender: parent axis with nodes()
BAD: select … 'int') as CustID, … 'int') as OrdID from T cross apply x.nodes('/doc/customer/orders') as N(o)
GOOD: select … 'int') as CustID, … 'int') as OrdID from T cross apply x.nodes('/doc/customer') as N1(c) cross apply c.nodes('orders') as N2(o)
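The GOOD shape, two nested nodes() calls instead of one call plus a parent step, might be filled in as below. The id attributes and the sample data are assumptions, since the value() expressions on the slide were lost.

```sql
-- Hypothetical stand-in for the slide's table T with XML column x.
DECLARE @T TABLE (x xml);
INSERT INTO @T VALUES
    (N'<doc><customer id="1"><orders id="10"/><orders id="11"/></customer></doc>');

-- Walk down once per level; each value() reads from its own context node,
-- so no parent axis (..) is needed to reach the customer.
SELECT c.value('@id', 'int') AS CustID,
       o.value('@id', 'int') AS OrdID
FROM @T AS t
CROSS APPLY t.x.nodes('/doc/customer') AS N1(c)
CROSS APPLY c.nodes('orders') AS N2(o);
```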

Avoid descendant axes and // in the middle of path expressions if the data structure is known.
// still can use the HID lookup, but is less efficient

XQuery construction performs worse than FOR XML
BAD: SELECT notes.query(' { {sql:column("name")}, / } ') FROM Customers WHERE cid=1
GOOD: SELECT cid as name, notes as "*" FROM Customers WHERE cid=1 FOR XML PATH('Customer'), TYPE
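A sketch of the FOR XML PATH approach is given below. The markup in the slide's queries was stripped in this transcript, so the column aliases here (cid as an attribute, name as an element, notes inlined through the "*" wildcard) are an assumption about one reasonable shape, not a reconstruction of the original query.

```sql
-- Hypothetical stand-in for the slide's Customers table.
DECLARE @Customers TABLE (cid int PRIMARY KEY, name nvarchar(50), notes xml);
INSERT INTO @Customers VALUES (1, N'Ann', N'<note>VIP</note>');

-- Build the wrapper element relationally instead of with XQuery constructors.
SELECT cid   AS "@cid",  -- becomes an attribute of <Customer>
       name,             -- becomes a <name> child element
       notes AS "*"      -- XML content is inlined as-is
FROM @Customers
WHERE cid = 1
FOR XML PATH('Customer'), TYPE;
```

The TYPE directive keeps the result as an xml value, so it can be queried further without reparsing.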

Create XML index on XML column
  CREATE PRIMARY XML INDEX idx_1 ON docs (xDoc)
Create secondary indexes on tags, values, paths
Creation:
  Single-threaded only for primary XML index
  Multi-threaded for secondary XML indexes
Uses:
  Primary Index will always be used if defined (not a cost based decision)
  Results can be served directly from index
  SQL's cost based optimizer will consider secondary indexes
Maintenance:
  Primary and Secondary Indices will be efficiently maintained during updates
  Only subtree that changes will be updated
  No online index rebuild
  Clustered key may lead to non-linear maintenance cost
  Schema revalidation still checks whole instance
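The full set of index statements might look as follows. The table definition and the secondary index names are assumptions; the primary index statement is the one from the slide. A primary XML index requires a clustered primary key on the base table.

```sql
-- Hypothetical table; the clustered primary key is a prerequisite for XML indexes.
CREATE TABLE docs (
    id   int NOT NULL PRIMARY KEY CLUSTERED,
    xDoc xml
);

-- Primary XML index (from the slide).
CREATE PRIMARY XML INDEX idx_1 ON docs (xDoc);

-- Secondary XML indexes are built on top of the primary one.
CREATE XML INDEX idx_1_path  ON docs (xDoc) USING XML INDEX idx_1 FOR PATH;
CREATE XML INDEX idx_1_value ON docs (xDoc) USING XML INDEX idx_1 FOR VALUE;
CREATE XML INDEX idx_1_prop  ON docs (xDoc) USING XML INDEX idx_1 FOR PROPERTY;
```

Whether each secondary index pays off depends on the query workload, since every additional index adds maintenance cost on updates.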

insert into Person values (42, '<book ISBN="…"><section><title>Bad Bugs</title>Nobody loves bad bugs.</section><section><title>Tree Frogs</title>All right-thinking people <bold>love</bold> tree frogs.</section></book>')

CREATE PRIMARY XML INDEX PersonIdx ON Person (Pdesc)
Assumes typed data; Columns and Values are simplified, see VLDB 2004 paper for details
PK | XID | TAG ID | Node | Type-ID | VALUE | HID
42 | 1 | 1 (book) | Element | 1 (bookT) | null | #book
… | … | … (ISBN) | Attribute | … | … | …
… | … | … (section) | Element | 3 (sectionT) | null | #section#book
… | … | … (TITLE) | Element | 2 (xs:string) | Bad Bugs | #title#section#book
… | … | … | Text | -- | Nobody loves bad bugs. | #text()#section#book
… | … | … (section) | Element | 3 (sectionT) | null | #section#book
… | … | … (title) | Element | 2 (xs:string) | Tree frogs | #title#section#book
… | … | … | Text | -- | All right-thinking people | #text()#section#book
… | … | … (bold) | Element | 4 (boldT) | love | #bold#section#book
… | … | … | Text | -- | tree frogs | #text()#section#book

[Diagram: XML index architecture]
XML Column in table T(id, x): rows with id 1, 2, 3, each holding Binary XML
Primary XML Index (1 per XML column): Clustered on Primary Key (of table T), XID; columns PK, XID, NID, TID, VALUE, LVALUE, HID, xsinil, …
Non-clustered Secondary Indices (n per primary Index): Value Index, Path Index, Property Index

XQueries And XML Indices

SELECT * FROM docs WHERE 1 = xCol.exist('/book/section/title/text()[contains(.,"optimization")]')

SELECT * FROM docs WHERE contains(xCol, 'optimization') AND 1 = xCol.exist('/book/section/title/text()[contains(.,"optimization")]')

Futures Selective XML Indices

Demo Selective XML Index

Customer Segment | Size reduction | Perf improvement
Banking | ~17x | ~11-28x
Life science | ~3x | ~2-13x
Insurance | ~190x | ~ x

Session Takeaways

Optimization whitepapers Newsgroups & Forum: microsoft.public.sqlserver.xml Find Me Later At… My My Weblog:

Q&A

Michael Rys
