Deep Dive into XQuery and XML in Microsoft SQL Server: Common Problems and Best Practice Solutions Michael Rys Principal Program Manager Microsoft Corporation.

@SQLServerMike DBI404

XML Scenarios and when to store XML
XML Design Optimizations
General Optimizations
XML Datatype method Optimizations
XQuery Optimizations
XML Index Optimizations
Futures: Selective XML Index Preview
Interesting Query Optimization Patterns

→Transport, Store, and Query XML data

Decision flowchart (figure)
Questions:
    Does the data fit the relational model?
    Is the data semistructured?
    Is the data a document?
    Query into the XML?
    Search within the XML?
    Is the XML constrained by schemas?
Outcomes:
    Shred the XML into relations
    Shred the structured XML into relations, store semistructured aspects as XML and/or sparse columns
    Shred known sparse data into sparse columns
    Promote frequently queried properties relationally
    Store as XML
    Store as varbinary(max)
    Define a full-text index
    Use primary and secondary XML indexes as needed
    Future: Selective XML Index
    Constrain XML if validation cost is ok
Branch labels: Yes, No, structured, Known sparse, Open schema

Architecture diagram (figure); labels:
    Inputs and outputs: XML, Relational, Rowsets, XML Schemata
    Components: XML Parser, XML Validation, Schema Collection, XML data type (binary XML)
    Operations: OpenXML/nodes(), FOR XML with TYPE directive, XQuery, XML-DML
    PRIMARY XML INDEX: Node Table, queried through XQuery
    Secondary Indices: PATH Index, PROP Index, VALUE Index
    SELECTIVE XML INDEX: Wide Sparse Table

Map SQL value and type into XQuery values and types in context of XQuery or XML-DML
sql:variable(): accesses a SQL variable/parameter
    declare @value int
    set @value=42
    select * from T where T.x.exist('/a/b[@id=sql:variable("@value")]')=1
sql:column(): accesses another column value
    tables: T(key int, x xml), S(key int, val int)
    select * from T join S on T.key=S.key
    where T.x.exist('/a/b[@id=sql:column("S.val")]')=1
Restrictions in SQL Server: No XML, CLR UDT, datetime, or deprecated text/ntext/image
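The two accessor functions above can be tried with a small self-contained batch. This is a minimal sketch, not the presenter's demo: the table variables, their rows, and the bracketed [key] column name (KEY is a reserved word in T-SQL) are assumptions made for illustration.

```sql
-- Hypothetical stand-ins for the tables T(key, x) and S(key, val) on the slide.
DECLARE @T TABLE ([key] int PRIMARY KEY, x xml);
DECLARE @S TABLE ([key] int PRIMARY KEY, val int);

INSERT INTO @T VALUES (1, N'<a><b id="42"/></a>'), (2, N'<a><b id="7"/></a>');
INSERT INTO @S VALUES (1, 42), (2, 99);

DECLARE @value int = 42;

-- sql:variable(): keep rows whose XML has a <b> with @id equal to the SQL variable.
SELECT T.[key]
FROM @T AS T
WHERE T.x.exist('/a/b[@id = sql:variable("@value")]') = 1;

-- sql:column(): compare @id against a column of the joined row.
SELECT T.[key]
FROM @T AS T
JOIN @S AS S ON T.[key] = S.[key]
WHERE T.x.exist('/a/b[@id = sql:column("S.val")]') = 1;
```

With this sample data, only the row with key 1 satisfies either predicate.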

Improving Slow XQueries, Bad FOR XML

BAD:
    CAST( CAST(xmldoc.query('/a/b/text()') as nvarchar(500)) as int)
GOOD:
    xmldoc.value('(/a/b/text())[1]', 'int')
BAD:
    node.query('.').value('@attr', 'nvarchar(50)')
GOOD:
    node.value('@attr', 'nvarchar(50)')
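To see that the two forms above are interchangeable in result but not in cost, they can be run side by side. A minimal sketch with an assumed one-element document; the variable name mirrors the slide's xmldoc.

```sql
-- Hypothetical sample document.
DECLARE @xmldoc xml = N'<a><b>42</b></a>';

SELECT
    -- Builds an XML result, serializes it to a string, then casts to int.
    CAST(CAST(@xmldoc.query('/a/b/text()') AS nvarchar(500)) AS int) AS via_query_and_casts,
    -- Extracts and converts the single text node in one step.
    @xmldoc.value('(/a/b/text())[1]', 'int') AS via_value;
```

Both columns return the same integer. The value() form avoids the intermediate XML and string conversions.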

Use the exist() method, sql:column()/sql:variable() and an XQuery comparison to check for a value or to join, if a secondary XML index on the primary XML index (PXI) is present (*)
BAD:
    select doc from doc_tab join authors
    on doc.value('(/doc/mainauthor/lname/text())[1]', 'nvarchar(50)') = lastname
GOOD:
    select doc from doc_tab join authors
    on 1 = doc.exist('/doc/mainauthor/lname/text()[. = sql:column("lastname")]')
(*) Otherwise, the value() method is more efficient most of the time

nodes() without XML index is a Table-valued function (details later)
Bad cardinality estimates can lead to bad plans
BAD:
    select c.value('@id', 'int') as CustID, c.value('@name', 'nvarchar(50)') as CName
    from Customer, @x.nodes('/doc/customer') as N(c)
    where Customer.ID = c.value('@id', 'int')
BETTER (if only one wrapper doc element):
    select c.value('@id', 'int') as CustID, c.value('@name', 'nvarchar(50)') as CName
    from Customer, @x.nodes('/doc[1]') as D(d)
    cross apply d.nodes('customer') as N(c)
    where Customer.ID = c.value('@id', 'int')
Use temp table (insert into #temp select … from nodes()) or Table-valued parameter instead of XML to get better estimates
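The temp-table suggestion above could look roughly like the following. This is a sketch under assumptions: #Customer stands in for the slide's Customer table, and the Region column, the sample rows, and the #cust table name are invented for illustration.

```sql
-- Hypothetical stand-in for the Customer table on the slide.
CREATE TABLE #Customer (ID int PRIMARY KEY, Region nvarchar(20));
INSERT INTO #Customer VALUES (1, N'East'), (2, N'West');

DECLARE @x xml =
    N'<doc><customer id="1" name="Ann"/><customer id="2" name="Bob"/></doc>';

-- Shred the XML once into a temp table, so the later join is planned
-- against a table with real row counts and statistics.
CREATE TABLE #cust (CustID int PRIMARY KEY, CName nvarchar(50));

INSERT INTO #cust (CustID, CName)
SELECT c.value('@id', 'int'), c.value('@name', 'nvarchar(50)')
FROM @x.nodes('/doc[1]') AS D(d)
CROSS APPLY d.nodes('customer') AS N(c);

-- Join relationally instead of joining directly against nodes().
SELECT s.CustID, s.CName, k.Region
FROM #cust AS s
JOIN #Customer AS k ON k.ID = s.CustID;

DROP TABLE #cust, #Customer;
```

The extra insert costs one pass over the XML. Whether that pays off depends on how badly the plan suffers from the default nodes() estimate.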

Use subqueries
BAD:
    SELECT CASE isnumeric(doc.value('(/doc/customer/order/price)[1]', 'nvarchar(32)'))
           WHEN 1 THEN doc.value('(/doc/customer/order/price)[1]', 'decimal(5,2)')
           ELSE 0 END
    FROM T
GOOD:
    SELECT CASE isnumeric(Price)
           WHEN 1 THEN CAST(Price as decimal(5,2))
           ELSE 0 END
    FROM (SELECT doc.value('(/doc/customer/order/price)[1]', 'nvarchar(32)') as Price
          FROM T) X
Use subqueries also with NULLIF()
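The NULLIF() remark above follows from NULLIF(a, b) being equivalent to a searched CASE expression that mentions a twice, so an inline value() call would appear twice as well. A minimal sketch of the subquery form for both cases, assuming a table variable in place of T and invented sample rows:

```sql
-- Hypothetical stand-in for table T with an XML column doc.
DECLARE @T TABLE (doc xml);
INSERT INTO @T VALUES
    (N'<doc><customer><order><price>12.50</price></order></customer></doc>'),
    (N'<doc><customer><order><price>n/a</price></order></customer></doc>'),
    (N'<doc><customer><order><price></price></order></customer></doc>');

-- The XML value is extracted once in the derived table X and reused outside.
SELECT
    CASE ISNUMERIC(Price)
         WHEN 1 THEN CAST(Price AS decimal(5,2))
         ELSE 0 END            AS PriceOrZero,
    NULLIF(Price, N'')         AS PriceOrNull
FROM (SELECT doc.value('(/doc/customer/order/price)[1]', 'nvarchar(32)') AS Price
      FROM @T) AS X;
```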

Query processing diagram (figure); labels:
    Input: SELECT x.query('…'), y FROM T WHERE …
    Static Phase: SQL Parser, XQuery Parser, Algebrization, Static Typing, XML Schema Collection Metadata, Static Optimization of combined Logical and Physical Operation Tree
    Dynamic Phase: Runtime Optimization and Execution of physical Op Tree, XML and rel. Indices

XMLReader node table format example (simplified)
ID | TAG ID | Node | Type-ID | VALUE | HID
1.3.1 | 4 (TITLE) | Element | 2 (xs:string) | Bad Bugs | #title#section#book

Serializer UDX: serializes the query result as XML
XQuery String UDX: evaluates the XQuery string() function
XQuery Data UDX: evaluates the XQuery data() function
Check UDX: validates XML being inserted
UDX name visible in SSMS properties window

Value comparisons, XQuery casts and value() method casts require atomization of item
Attribute: /person[@age = 42] → /person[data(@age) = 42]
Atomic typed element: /person[age = 42] → /person[data(age) = 42]
Untyped, mixed content typed element (adds UDX): /person[age = 42] → /person[data(age) = 42] → /person[string(age) = 42]
If only one text node for untyped element (better): /person[age/text() = 42] → /person[data(age/text()) = 42]
value() method on untyped elements: value('/person/age', 'int') → value('/person/age/text()', 'int')
string() aggregates all text nodes, prohibits index use

Value comparisons require casts and type promotion
Untyped attribute: /person[@age = 42] → /person[xs:decimal(@age) = 42]
Untyped text node(): /person[age/text() = 42] → /person[xs:decimal(age/text()) = 42]
Typed element (typed as xs:int): /person[salary = 3e4] → /person[xs:double(salary) = 3e4]
Casting is expensive and prohibits index lookup
Tips to avoid casting:
    Use appropriate types for comparison (string for untyped)
    Use schema to declare type
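The "string for untyped" tip above can be illustrated on an untyped instance. A minimal sketch with an assumed sample document: comparing against a string literal avoids the implicit numeric cast, at the price of string semantics (for example, "42" does not match "42.0").

```sql
-- Hypothetical untyped sample document.
DECLARE @x xml = N'<person age="42"><age>42</age></person>';

SELECT
    -- String literal: no numeric cast of the untyped attribute is needed.
    @x.exist('/person[@age = "42"]')              AS attr_as_string,
    -- Same idea on the single text node of an untyped element.
    @x.exist('/person[age/text() = "42"]')        AS text_as_string,
    -- Convert to a SQL type only once, at the point of extraction.
    @x.value('(/person/age/text())[1]', 'int')    AS age_as_int;
```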

Single paths are more efficient than twig paths
Avoid predicates in the middle of path expressions
    book[@ISBN = "1-8610-0157-6"]/author[first-name = "Davis"] → /book[@ISBN = "1-8610-0157-6"] "∩" /book/author[first-name = "Davis"]
Move ordinals to the end of path expressions
Make sure you get the same semantics!
    /a[1]/b[1] ≠ (/a/b)[1] ≠ /a/b[1]
    (/book/@isbn)[1] is better than /book[1]/@isbn
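The warning about ordinal semantics above is easy to check on a small fragment. A minimal sketch; the sample fragment is an assumption, chosen so that all three paths return different results.

```sql
-- Hypothetical fragment: the first <a> has no <b>, the second has two, the third has one.
DECLARE @x xml = N'<a><c/></a><a><b>1</b><b>2</b></a><a><b>3</b></a>';

SELECT
    @x.query('/a[1]/b[1]') AS first_b_of_first_a,  -- empty: the first <a> has no <b>
    @x.query('(/a/b)[1]')  AS first_b_overall,     -- <b>1</b>
    @x.query('/a/b[1]')    AS first_b_of_each_a;   -- <b>1</b><b>3</b>
```

Moving the ordinal to the end is therefore only a safe rewrite when the data guarantees that both paths select the same node, such as a single root element.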

Use context item in predicate to lengthen path in exist()
Existential quantification makes returned node irrelevant
BAD:
    SELECT * FROM docs WHERE 1 = xCol.exist('/book/subject[text() = "security"]')
GOOD:
    SELECT * FROM docs WHERE 1 = xCol.exist('/book/subject/text()[. = "security"]')
BAD:
    SELECT * FROM docs WHERE 1 = xCol.exist('/book[@price > 9.99 and @price < 49.99]')
GOOD:
    SELECT * FROM docs WHERE 1 = xCol.exist('/book/@price[. > 9.99 and . < 49.99]')
This does not work with or-predicate
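The rewritten predicates above can be exercised on sample rows to confirm that they select the same documents as the originals. A minimal sketch: the table variable stands in for docs, and the id column and sample rows are assumptions. A table variable cannot carry an XML index, so this shows only the equivalence, not the performance gain.

```sql
-- Hypothetical stand-in for the docs table.
DECLARE @docs TABLE (id int PRIMARY KEY, xCol xml);
INSERT INTO @docs VALUES
    (1, N'<book price="19.99"><subject>security</subject></book>'),
    (2, N'<book price="59.99"><subject>databases</subject></book>');

-- Predicate on the context item: the path now ends at the text node.
SELECT id FROM @docs
WHERE 1 = xCol.exist('/book/subject/text()[. = "security"]');

-- Range check on the attribute itself instead of on its parent element.
SELECT id FROM @docs
WHERE 1 = xCol.exist('/book/@price[. > 9.99 and . < 49.99]');
```

With this data both queries return only id 1.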

Most frequent offender: parent axis with nodes()
BAD:
    select o.value('../@id', 'int') as CustID, o.value('@id', 'int') as OrdID
    from T cross apply x.nodes('/doc/customer/orders') as N(o)
GOOD:
    select c.value('@id', 'int') as CustID, o.value('@id', 'int') as OrdID
    from T cross apply x.nodes('/doc/customer') as N1(c)
    cross apply c.nodes('orders') as N2(o)
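The chained nodes() pattern above can be run on a small document to see the customer and order ids paired without any parent-axis step. A minimal sketch; the table variable and sample document are assumptions.

```sql
-- Hypothetical stand-in for table T with XML column x.
DECLARE @T TABLE (x xml);
INSERT INTO @T VALUES
    (N'<doc>
         <customer id="1"><orders id="10"/><orders id="11"/></customer>
         <customer id="2"><orders id="20"/></customer>
       </doc>');

-- Walk down: first to each customer, then to that customer's orders.
SELECT c.value('@id', 'int') AS CustID,
       o.value('@id', 'int') AS OrdID
FROM @T AS T
CROSS APPLY T.x.nodes('/doc/customer') AS N1(c)
CROSS APPLY c.nodes('orders') AS N2(o);
```

This yields the pairs (1, 10), (1, 11) and (2, 20).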

Avoid descendant axes and // in the middle of path expressions if the data structure is known
// can still use the HID lookup, but is less efficient
XQuery construction performs worse than FOR XML
BAD:
    SELECT notes.query('
        <Customer cid="{sql:column("cid")}">{
            <name>{sql:column("name")}</name>,
            /
        }</Customer>')
    FROM Customers WHERE cid=1
GOOD:
    SELECT cid as "@cid", name, notes as "*"
    FROM Customers WHERE cid=1
    FOR XML PATH('Customer'), TYPE
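The FOR XML PATH form above can be tried in isolation. A minimal sketch, assuming a table variable in place of Customers and one invented row; the column names follow the slide.

```sql
-- Hypothetical stand-in for the Customers table.
DECLARE @Customers TABLE (cid int PRIMARY KEY, name nvarchar(50), notes xml);
INSERT INTO @Customers VALUES (1, N'Ann', N'<note>Prefers email</note>');

-- "@cid" maps the column to an attribute, name to a child element,
-- and "*" inlines the XML column's content without a wrapper element.
SELECT cid   AS "@cid",
       name,
       notes AS "*"
FROM @Customers
WHERE cid = 1
FOR XML PATH('Customer'), TYPE;
```

The TYPE directive returns the result as an xml value rather than a string, so it can be queried further or nested in another FOR XML query.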

Create XML index on XML column
    CREATE PRIMARY XML INDEX idx_1 ON docs (xDoc)
Create secondary indexes on tags, values, paths
Creation:
    Single-threaded only for primary XML index
    Multi-threaded for secondary XML indexes
Uses:
    Primary Index will always be used if defined (not a cost based decision)
    Results can be served directly from index
    SQL’s cost based optimizer will consider secondary indexes
Maintenance:
    Primary and Secondary Indices will be efficiently maintained during updates
    Only subtree that changes will be updated
    No online index rebuild
    Clustered key may lead to non-linear maintenance cost
    Schema revalidation still checks whole instance
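The index creation steps above could be written out as follows. A sketch only: the id column and the secondary index names are assumptions. A primary XML index requires a clustered primary key on the base table, which is why the hypothetical docs table declares one.

```sql
-- Hypothetical table; the clustered primary key is required for XML indexes.
CREATE TABLE docs (
    id   int NOT NULL PRIMARY KEY CLUSTERED,
    xDoc xml
);

-- Primary XML index: the shredded node table for the column.
CREATE PRIMARY XML INDEX idx_1 ON docs (xDoc);

-- Secondary XML indexes are built on top of the primary one.
CREATE XML INDEX idx_1_path  ON docs (xDoc) USING XML INDEX idx_1 FOR PATH;
CREATE XML INDEX idx_1_value ON docs (xDoc) USING XML INDEX idx_1 FOR VALUE;
CREATE XML INDEX idx_1_prop  ON docs (xDoc) USING XML INDEX idx_1 FOR PROPERTY;
```

Not every workload needs all three secondary indexes. Each one adds maintenance cost, so it is reasonable to create only those that match the dominant query patterns.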

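insert into Person values (42, '
    <book ISBN="1-55860-438-3">
        <section><title>Bad Bugs</title>Nobody loves bad bugs.</section>
        <section><title>Tree Frogs</title>All right-thinking people <bold>love</bold> tree frogs.</section>
    </book>')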

CREATE PRIMARY XML INDEX PersonIdx ON Person (Pdesc)
Assumes typed data; columns and values are simplified, see VLDB 2004 paper for details
PK | XID | TAG ID | Node | Type-ID | VALUE | HID
42 | 1 | 1 (book) | Element | 1 (bookT) | null | #book
42 | 1.1 | 2 (ISBN) | Attribute | 2 (xs:string) | 1-55860-438-3 | #@ISBN#book
42 | 1.3 | 3 (section) | Element | 3 (sectionT) | null | #section#book
42 | 1.3.1 | 4 (TITLE) | Element | 2 (xs:string) | Bad Bugs | #title#section#book
42 | 1.3.3 | -- | Text | -- | Nobody loves bad bugs. | #text()#section#book
42 | 1.5 | 3 (section) | Element | 3 (sectionT) | null | #section#book
42 | 1.5.1 | 4 (title) | Element | 2 (xs:string) | Tree frogs | #title#section#book
42 | 1.5.3 | -- | Text | -- | All right-thinking people | #text()#section#book
42 | 1.5.5 | 7 (bold) | Element | 4 (boldT) | love | #bold#section#book
42 | 1.5.7 | -- | Text | -- | tree frogs | #text()#section#book

XML index structure diagram (figure); labels:
    XML Column in table T(id, x), stored as Binary XML
    Primary XML Index (1 per XML column), clustered on Primary Key (of table T), XID
    Primary XML Index columns: PK, XID, NID, TID, VALUE, LVALUE, HID, xsinil, …
    Non-clustered Secondary Indices (n per primary Index): Value Index, Path Index, Property Index

XQueries And XML Indices

Without full-text prefilter:
    SELECT * FROM docs
    WHERE 1 = xCol.exist('/book/section/title/text()[contains(.,"optimization")]')
With full-text prefilter:
    SELECT * FROM docs
    WHERE contains(xCol, 'optimization')
    AND 1 = xCol.exist('/book/section/title/text()[contains(.,"optimization")]')

Futures Selective XML Indices

Demo Selective XML Index

Customer Segment | Size reduction | Perf improvement
Banking | ~17x | ~11-28x
Life science | ~3x | ~2-13x
Insurance | ~190x | ~100-200x

Session Takeaways

Optimization whitepapers:
    http://msdn2.microsoft.com/en-us/library/ms345118.aspx
    http://msdn2.microsoft.com/en-us/library/ms345121.aspx
Newsgroups & Forum:
    microsoft.public.sqlserver.xml
    http://social.msdn.microsoft.com/Forums/en-US/sqlxml/threads
Find Me Later At…
    My E-mail: mrys@microsoft.com
    My Weblog: http://sqlblog.com/blogs/michael_rys/
    Twitter: @SQLServerMike


