DAT318 - Deep Dive into XQuery and XML in Microsoft SQL Server: Common Problems and Best Practice Solutions
Michael Rys, Principal Program Manager, SQL Server Engine, Microsoft
http://sqlblog.com/blogs/michael_rys/

Session Prerequisites
- An understanding of the XML and XQuery support in SQL Server 2005
- A good way to prepare is to attend session DAT311 - Best Practices for Optimizing XML Queries in SQL Server 2005 and Beyond

Session Objectives and Agenda
- XML Scenarios and when to store XML
- XML Design Optimizations
- General Optimizations
- XML Datatype method Optimizations
- XQuery Optimizations
- XML Index Optimizations

XML Scenarios
- Data exchange between loosely-coupled systems
  - XML is a ubiquitous, extensible, platform-independent transport format
  - Message envelope in XML: Simple Object Access Protocol (SOAP), RSS, REST
  - Message payload/business data in XML
  - Vertical industry exchange schemas
- Document management
  - XHTML, DocBook, home-grown, domain-specific markup (e.g. contracts), OpenOffice, Microsoft Office XML (both default and user-extended)
- Ad-hoc modeling of semistructured data
  - Storing and querying heterogeneous complex objects
  - Semistructured data with sparse, highly-varying structure at the instance level
  - XML provides self-describing format and extensible schemas
→ Transport, store, and query XML data

XML or Relational?
Data Characteristics | XML | Relational
Flat Structured Data | |
Hierarchical Structured Data | | Not First Class: PK-FK with cascading delete
Semi-structured Data | | Not First Class
Mark-up Data | | Not First Class: FTS
Order preservation | | Not First Class
Recursion | | (Recursive query)

XML and Relational!
Scenarios | XML | Relational
Relational Data Exchange | Use as transport, shred to relational | Storage and Query
Document Management | Use as markup, store natively | Provides framework to manage collections and relationships; provides Full-text search
Semi-structured Data | Represent semistructured parts | Represent structured parts
Message audit | Store natively | Used for querying over promoted properties
Object serialization | Store natively | Used for querying over promoted properties

Decision tree: Processing XML in Database (diagram)
- Questions: Does the data fit the relational model? Is the data semistructured? Is the data a document? Query into the XML? Search within the XML? Is the XML constrained by schemas?
- Outcomes:
  - Shred the XML into relations
  - Shred the structured XML into relations, store semistructured aspects as XML
  - Define a full-text index
  - Use primary and secondary XML indexes as needed
  - Constrain XML if validation cost is ok
  - Store as XML
  - Store as varbinary(max)
  - Promote frequently queried properties relationally
- Branch labels: Yes, No, structured, semistructured

Shredding Approaches
Approach | Complex Shapes | Bulkload | Server vs Midtier | Business logic | Programming | Scale/Performance
SQLXML Bulkload with annotated schema | Yes with limits | Yes | midtier | staging tables on server, XSLT on midtier | annotated XSD and small API | very good/very good
ADO.Net DataSet | No | | midtier | midtier, SSIS | DataSet API or SSIS | good/good
CLR Table-valued function | Yes | No | Server or midtier | C#, VB | custom code | limited/good
OpenXML | Yes | No | Server | T-SQL | declarative T-SQL, XPath against variable | limited/good
nodes() | Yes | No | Server | T-SQL | declarative SQL, XQuery against var or table | good/careful

SQL Server 2005 XML Data Type Architecture (diagram)
- Ingest: XML Parser, XML Validation, Schema Collection (XML Schemata)
- Storage: XML data type (binary XML)
- Between XML and Relational: OpenXML/nodes() (Rowsets), FOR XML with TYPE directive
- Query and update: XQuery, XML-DML
- Indexing: PRIMARY XML INDEX (Node Table), PATH Index, PROP Index, VALUE Index

General Impacts
- Concurrency control
  - Locks on both the XML data type and the relevant rows in primary and secondary XML indices
  - Lock escalation on indices
  - Snapshot isolation reduces locks and lock contention
- Transaction logs
  - Bulk insert into XML indices may fill the transaction log
  - Delay the creation of the XML indexes and use the SIMPLE recovery model
  - Preallocate the database file instead of growing it dynamically
  - Place the log on a different disk
- In-row/out-of-row storage of XML large objects
  - Moving XML into a side table or out-of-row, if mixed with relational data, reduces scan time
- Due to clustering, insertion into the XML index may not be linear
  - Choose an integer/bigint identity column as key

Choose the right XML Model
- Element-centric vs attribute-centric (example value: Joe)
  - +: Attributes → often better performing querying
  - -: Parsing attributes → uniqueness check
- Generic element names with type attribute vs specific element names (example value: Joe)
  - +: Specific names → shorter path expressions
  - +: Specific names → no filter on type attribute
- Wrapper elements
  - +: No wrapper elements → simpler XML, shorter path expressions
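
The original example markup did not survive in this transcript, so the following is only an illustrative sketch of the modeling choices named on this slide. The element and attribute names (Customer, name, Entity, Prop, type) are assumptions made for illustration, not the markup used in the session.

```sql
-- Hypothetical shapes; all element and attribute names are assumptions.
DECLARE @elem xml, @attr xml, @generic xml;
SET @elem    = N'<Customer><name>Joe</name></Customer>';   -- element-centric
SET @attr    = N'<Customer name="Joe"/>';                  -- attribute-centric
SET @generic = N'<Entity type="Customer"><Prop name="name">Joe</Prop></Entity>';  -- generic names + type attribute

SELECT
    -- specific element name: short path, no type filter
    @elem.value('(/Customer/name/text())[1]', 'nvarchar(50)') AS FromElement,
    -- attribute-centric: one attribute step
    @attr.value('(/Customer/@name)[1]', 'nvarchar(50)') AS FromAttribute,
    -- generic names: the path needs extra predicates on the type/name attributes
    @generic.value('(/Entity[@type="Customer"]/Prop[@name="name"]/text())[1]', 'nvarchar(50)') AS FromGeneric;
```

All three expressions extract the same value; the generic form needs the longest path and two attribute filters, which is the cost the slide points at.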

Use an XML Schema Collection?
- Using no XML Schema (untyped XML)
  - Can still use XQuery and XML Index!
  - Atomic values are always weakly typed strings → compare as strings to avoid runtime conversions and loss of index usage
  - No schema validation overhead
  - No schema evolution revalidation costs
- XML Schema provides structural information
  - Atomic typed elements now use only one instead of two rows in the node table/XML index (closer to attributes)
  - Static typing can detect cardinality and feasibility of an expression
- XML Schema provides semantic information
  - Elements/attributes have the correct atomic type for comparison and order semantics
  - No runtime casts required and better use of the index for value lookup

To Promote or Not Promote…
- Promotion precalculates paths
  - Requires a relational query
  - XQuery does not know about promotion
- Promotion during loading of the data
  - Using any of the shredding mechanisms
  - 1-to-1 or 1-to-many relationships
- Promotion using computed columns
  - 1-to-1 only
  - Persist computed column: fast lookup and retrieval
  - Relational index on persisted computed column: fast lookup
- Promotion using triggers
  - 1-to-1 or 1-to-many relationships
  - Trigger overhead
- Relational view over XML data
  - Filters on the relational view are not pushed down due to the different type/value system
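
As a rough sketch of promotion through a computed column: XML data type methods cannot be used directly in a computed column definition, so the usual approach wraps the value() call in a schema-bound scalar function. The table dbo.Docs, the function dbo.udf_GetIsbn, and the /book/@ISBN path are hypothetical names chosen for illustration.

```sql
-- Hypothetical table holding one XML document per row.
CREATE TABLE dbo.Docs (id int IDENTITY PRIMARY KEY, xCol xml NOT NULL);
GO

-- Wrapper function: extracts the single promoted property (1-to-1 only).
CREATE FUNCTION dbo.udf_GetIsbn (@x xml)
RETURNS nvarchar(20)
WITH SCHEMABINDING
AS
BEGIN
    RETURN @x.value('(/book/@ISBN)[1]', 'nvarchar(20)');
END;
GO

-- Promote the property into a persisted computed column and index it relationally.
ALTER TABLE dbo.Docs ADD Isbn AS dbo.udf_GetIsbn(xCol) PERSISTED;
GO
CREATE INDEX IX_Docs_Isbn ON dbo.Docs (Isbn);
GO

-- The promoted property has to be queried relationally; XQuery does not know about it.
SELECT id, xCol FROM dbo.Docs WHERE Isbn = N'1-55860-438-3';
```

GO is the batch separator used by client tools such as SSMS and sqlcmd; CREATE FUNCTION must be the first statement in its batch.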

XQuery Methods
- query() creates a new, untyped XML data type instance
- exist() returns 1 if the XQuery expression returns at least one item, 0 otherwise
- value() extracts an XQuery value into the SQL value and type space
  - Expression has to statically be a singleton
  - String value of the atomized XQuery item is cast to the SQL type
  - SQL type has to be a SQL scalar type (no XML or CLR UDT)
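
A minimal sketch of the three methods side by side, run against an XML variable. The document content is made up for illustration.

```sql
DECLARE @x xml;
SET @x = N'<doc><customer id="7"><firstname>Joe</firstname></customer></doc>';

SELECT
    @x.query('/doc/customer/firstname')       AS FirstNameXml,  -- new untyped xml instance
    @x.exist('/doc/customer[@id = 7]')        AS HasCustomer7,  -- 1 if at least one item matches
    @x.value('(/doc/customer/@id)[1]', 'int') AS CustomerId;    -- [1] makes the expression a static singleton
```

Without the `[1]` ordinal, value() is rejected at compile time for untyped XML because the path could return more than one item.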

XQuery: nodes()
- Provides OpenXML-like functionality on the XML data type column in SQL Server 2005
- Returns a row per selected node
- Each row contains a special XML data type instance that:
  - References one of the selected nodes
  - Preserves the original structure and types
  - Can only be used with the XQuery methods (but not modify()), count(*), and IS (NOT) NULL
- Appears as a Table-valued Function (TVF) if no index is present
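
A small sketch of nodes() producing one row per selected node, with value() applied relative to each context node. The sample document is an assumption made for illustration.

```sql
DECLARE @x xml;
SET @x = N'<doc>
  <customer id="1"><firstname>Ann</firstname></customer>
  <customer id="2"><firstname>Bob</firstname></customer>
</doc>';

-- One row per <customer>; c is the context node for the relative paths.
SELECT c.value('@id', 'int')                            AS CustID,
       c.value('(firstname/text())[1]', 'nvarchar(50)') AS FirstName
FROM @x.nodes('/doc/customer') AS N(c);
```

Against a table, the same pattern is written with CROSS APPLY, for example `FROM T CROSS APPLY T.x.nodes('/doc/customer') AS N(c)`.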

sql:column()/sql:variable()
- Map SQL value and type into XQuery values and types in the context of XQuery or XML-DML
- sql:variable(): accesses a SQL variable/parameter
    declare @value int
    set @value = 42
    select * from T where T.x.exist('/a/b[@id=sql:variable("@value")]') = 1
- sql:column(): accesses another column value
    tables: T([key] int, x xml), S([key] int, val int)
    select * from T join S on T.[key] = S.[key]
    where T.x.exist('/a/b[@id=sql:column("S.val")]') = 1
- Restrictions in SQL Server 2005: no XML, CLR UDT, datetime, or deprecated text/ntext/image

Demo Improving Slow XQueries, bad FOR XML

Optimal Use of Methods: How to Cast from XML to SQL
BAD:  CAST( CAST(xmldoc.query('/a/b/text()') as nvarchar(500)) as int)
GOOD: xmldoc.value('(/a/b/text())[1]', 'int')
BAD:  node.query('.').value('@attr', 'nvarchar(50)')
GOOD: node.value('@attr', 'nvarchar(50)')

Optimal Use of Methods: Grouping value() method
- Group value() methods on the same XML instance next to each other if the path expressions in the value() methods are:
  - Simple path expressions that only use child and attribute axis and do not contain wildcards, predicates, node tests, or ordinals
  - Path expressions that statically infer a singleton
    - The singleton can be statically inferred from the DOCUMENT and XML Schema Collection
    - Relative paths on the context node provided by the nodes() method
- Requires an XML index to be present

Optimal Use of Methods: Grouping value() method
Example (assuming a schema constraint)
BAD:
  select c.value('@id[1]', 'int') as CustID,
         o.value('@id', 'int') as OrdID,
         c.value('firstname[1]', 'nvarchar(50)') as fname,
         c.value('lastname[1]', 'nvarchar(50)') as lname
  from T
  cross apply x.nodes('/doc/customer') as N1(c)
  cross apply c.nodes('orders') as N2(o)
GOOD:
  select c.value('@id', 'int') as CustID,
         c.value('firstname', 'nvarchar(50)') as fname,
         c.value('lastname', 'nvarchar(50)') as lname,
         o.value('@id', 'int') as OrdID
  from T
  cross apply x.nodes('/doc/customer') as N1(c)
  cross apply c.nodes('orders') as N2(o)

Optimal Use of Methods: Using the right method to join and compare
- Use the exist() method, sql:column()/sql:variable() and an XQuery comparison for checking for a value or joining if secondary XML indices are present
BAD:
  select doc from doc_tab join authors
  on doc.value('(/doc/mainauthor/lname/text())[1]', 'nvarchar(50)') = lastname
GOOD:
  select doc from doc_tab join authors
  on 1 = doc.exist('/doc/mainauthor/lname/text()[. = sql:column("lastname")]')
- If applied on an XML variable, the value() method may still be more efficient

Optimal Use of Methods: Avoiding multiple Method Evaluations
- Use subqueries
BAD:
  SELECT CASE isnumeric(doc.value('(/doc/customer/order/price)[1]', 'nvarchar(32)'))
         WHEN 1 THEN doc.value('(/doc/customer/order/price)[1]', 'decimal(5,2)')
         ELSE 0 END
  FROM T
GOOD:
  SELECT CASE isnumeric(Price)
         WHEN 1 THEN CAST(Price as decimal(5,2))
         ELSE 0 END
  FROM (SELECT doc.value('(/doc/customer/order/price)[1]', 'nvarchar(32)') as Price FROM T) X
- Use subqueries also with NULLIF()

Combined SQL and XQuery/DML Processing (diagram)
- Example statement: SELECT x.query('…'), y FROM T WHERE …
- Static phase
  - SQL Parser, Algebrization, Static Typing
  - XQuery Parser, Static Typing, Algebrization
  - Inputs: XML Schema Collection, Metadata
  - Static optimization of the combined logical and physical operation tree
- Dynamic phase
  - Runtime optimization and execution of the physical Op Tree
  - Inputs: XML and relational indices

New XQuery Algebra Operators: XML Reader TVF
- Table-Valued Function XML Reader UDF with XPath Filter
- Used if no Primary XML Index is present
- Creates a node table rowset in the query flow
- Multiple XPath filters can be pushed in to reduce the node table to a subtree
- Base cardinality estimate is always 10,000 rows!
  → Some adjustment based on pushed path filters
- XMLReader node table format example (simplified)

New XQuery Algebra Operators: UDX
- Serializer UDX serializes the query result as XML
- XQuery String UDX evaluates the XQuery string() function
- XQuery Data UDX evaluates the XQuery data() function
- Check UDX validates XML being inserted
- UDX name is visible in the SSMS properties window

Optimal Use of XQuery: Atomization of nodes
- Value comparisons, XQuery casts and value() method casts require atomization of the item
  - Attribute: /person[@age = 42] → /person[data(@age) = 42]
  - Atomic typed element: /person[age = 42] → /person[data(age) = 42]
  - Untyped, mixed content typed element (adds UDX): /person[age = 42] → /person[data(age) = 42] → /person[string(age) = 42]
  - If only one text node for an untyped element (better): /person[age/text() = 42] → /person[data(age/text()) = 42]
- string() aggregates all text nodes and prohibits index use for value lookup

Optimal Use of XQuery: Casting Values
- Value comparisons require casts and type promotion
  - Untyped attribute: /person[@age = 42] → /person[xs:decimal(@age) = 42]
  - Untyped text node(): /person[age/text() = 42] → /person[xs:decimal(age/text()) = 42]
  - Typed element (typed as xs:int): /person[salary = 3e4] → /person[xs:double(salary) = 3e4]
- Casting is expensive and prohibits index lookup
- Tips to avoid casting:
  - Use appropriate types for comparison (string for untyped)
  - Use a schema to declare the type
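
A short sketch of the "string for untyped" tip, assuming untyped XML (no schema collection) and a made-up document. Both predicates match here; the difference is that the numeric literal forces a cast of the untyped attribute value, while the string literal compares it as-is.

```sql
DECLARE @x xml;
SET @x = N'<person age="42"/>';

SELECT
    @x.exist('/person[@age = "42"]') AS StringCompare,   -- untyped value compared as a string, no cast
    @x.exist('/person[@age = 42]')   AS NumericCompare;  -- untyped value is cast to a numeric type first
```

Note that the two forms are not interchangeable for every value: a string comparison treats "42" and "42.0" as different, so the string form is only appropriate when the stored representation is known.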

Optimal Use of XQuery: Maximize XPath expressions
- Single paths are more efficient than twig paths
- Avoid predicates in the middle of path expressions
  /book[@ISBN = "1-8610-0157-6"]/author[firstname = "Davis"]
  → /book[@ISBN = "1-8610-0157-6"] "∩" /book/author[firstname = "Davis"]
- Move ordinals to the end of path expressions
  - Make sure you get the same semantics!
  - /a[1]/b[1] ≠ (/a/b)[1] ≠ /a/b[1]
  - (/book/@isbn)[1] is better than /book[1]/@isbn
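
The three ordinal placements can be checked against a small document. This is an illustrative sketch with made-up content, chosen so that each expression returns something different.

```sql
DECLARE @x xml;
SET @x = N'<a><c/></a><a><b>1</b><b>2</b></a><a><b>3</b></a>';

SELECT
    @x.query('/a[1]/b[1]') AS FirstBOfFirstA,  -- empty: the first <a> has no <b> child
    @x.query('(/a/b)[1]')  AS FirstBOverall,   -- <b>1</b>
    @x.query('/a/b[1]')    AS FirstBOfEachA;   -- <b>1</b><b>3</b>
```

Moving an ordinal to the end of a path is therefore only safe when the data guarantees that both forms select the same node.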

Optimal Use of XQuery: Maximize XPath expressions in exist()
- Use the context item in the predicate to lengthen the path in exist()
- Existential quantification makes the returned node irrelevant
BAD:  SELECT * FROM docs WHERE 1 = xCol.exist('/book/subject[text() = "security"]')
GOOD: SELECT * FROM docs WHERE 1 = xCol.exist('/book/subject/text()[. = "security"]')
BAD:  SELECT * FROM docs WHERE 1 = xCol.exist('/book[@price > 9.99 and @price < 49.99]')
GOOD: SELECT * FROM docs WHERE 1 = xCol.exist('/book/@price[. > 9.99 and . < 49.99]')
- Does not work with "or" in the predicate

Optimal Use of XQuery: Inefficient Operations: parent axis
- Most frequent offender: parent axis with nodes()
BAD:
  select o.value('../@id', 'int') as CustID,
         o.value('@id', 'int') as OrdID
  from T
  cross apply x.nodes('/doc/customer/orders') as N(o)
GOOD:
  select c.value('@id', 'int') as CustID,
         o.value('@id', 'int') as OrdID
  from T
  cross apply x.nodes('/doc/customer') as N1(c)
  cross apply c.nodes('orders') as N2(o)

Optimal Use of XQuery: Inefficient Operations
- Avoid descendant axes and // in the middle of path expressions if the data structure is known
  - // can still use the HID lookup, but is less efficient
- XQuery construction performs worse than FOR XML
BAD:
  SELECT notes.query(' { {sql:column("name")}, / } ')
  FROM Customers WHERE cid=1
GOOD:
  SELECT cid as "@cid", name, notes as "*"
  FROM Customers WHERE cid=1
  FOR XML PATH('Customer'), TYPE

Optimal Use of FOR XML
- Use the TYPE directive when assigning the result to XML
BAD:  declare @x xml; set @x = (select * from Customers for xml raw);
GOOD: declare @x xml; set @x = (select * from Customers for xml raw, type);
- Use FOR XML PATH for complex grouping and additional hierarchy levels over FOR XML EXPLICIT
- Use FOR XML EXPLICIT for complex nesting if FOR XML PATH performance is not appropriate

Optimizations for XML Variables & Parameters
- Variables and parameters of type XML
  - Large values are backed by tempDB → use multiple tempDB files for better scalability
- Inlineable functions with XML parameters
  - If the value passed to the parameter is not the same type as the parameter, conversions will be done for every method invocation → use an xml variable inside to avoid multiple casts (note that this function will otherwise be more expensive, so testing is advised)
- Specify singleton cardinality statically to improve cardinality estimates:
  - Use XML Schemas, and/or
  - Use ordinals in path expressions ([1], [last()] get mapped to TOP)

XML Indices
- Create XML index on XML column
    CREATE PRIMARY XML INDEX idx_1 ON docs (xDoc)
- Create secondary indexes on tags, values, paths
- Creation:
  - Single-threaded only for primary XML index
  - Multi-threaded for secondary XML indexes
- Uses:
  - Primary index will always be used if defined (not a cost-based decision)
  - Results can be served directly from the index
  - SQL's cost-based optimizer will consider secondary indexes
- Maintenance:
  - Primary and secondary indices will be efficiently maintained during updates
  - Only the subtree that changes will be updated
  - No online index rebuild
  - Clustered key may lead to non-linear maintenance cost
  - Schema revalidation still checks the whole instance

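Example Index Contents
insert into Person values (42, '<book ISBN="1-55860-438-3"><section><title>Bad Bugs</title>Nobody loves bad bugs.</section><section><title>Tree Frogs</title>All right-thinking people <bold>love</bold> tree frogs.</section></book>')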

Primary XML Index
CREATE PRIMARY XML INDEX PersonIdx ON Person (Pdesc)
Assumes typed data; columns and values are simplified, see VLDB 2004 paper for details
PK | XID | TAG ID | Node | Type-ID | VALUE | HID
42 | 1 | 1 (book) | Element | 1 (bookT) | null | #book
42 | 1.1 | 2 (ISBN) | Attribute | 2 (xs:string) | 1-55860-438-3 | #@ISBN#book
42 | 1.3 | 3 (section) | Element | 3 (sectionT) | null | #section#book
42 | 1.3.1 | 4 (title) | Element | 2 (xs:string) | Bad Bugs | #title#section#book
42 | 1.3.3 | -- | Text | -- | Nobody loves bad bugs. | #text()#section#book
42 | 1.5 | 3 (section) | Element | 3 (sectionT) | null | #section#book
42 | 1.5.1 | 4 (title) | Element | 2 (xs:string) | Tree frogs | #title#section#book
42 | 1.5.3 | -- | Text | -- | All right-thinking people | #text()#section#book
42 | 1.5.5 | 7 (bold) | Element | 4 (boldT) | love | #bold#section#book
42 | 1.5.7 | -- | Text | -- | tree frogs | #text()#section#book

Architectural Blueprint: Indexing (diagram)
- XML column in table T(id, x), stored as binary XML
- Primary XML Index (1 per XML column)
  - Clustered on primary key (of table T), XID
  - Columns: PK, XID, NID, TID, VALUE, LVALUE, HID, xsinil, …
- Non-clustered secondary indices (n per primary index)
  - Value Index
  - Path Index
  - Property Index

Demo XQueries and XML Indices

Takeaway: XML Indices
- PRIMARY XML Index: use when there are lots of XQuery queries
- FOR VALUE: useful for queries where values are more selective than paths, such as //*[.="Seattle"]
- FOR PATH: useful for path expressions; avoids joins by mapping paths to hierarchical index (HID) numbers. Example: /person/address/zip
- FOR PROPERTY: useful when the optimizer chooses another index (for example, on a relational column, or an FT index) in addition, so the row is already known
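
For reference, a sketch of the DDL behind these index types. The table definition and the secondary index names are assumptions for illustration; docs, xDoc, and idx_1 follow the earlier XML Indices slide.

```sql
-- Hypothetical base table; an XML index requires a clustered primary key on it.
CREATE TABLE docs (id int IDENTITY PRIMARY KEY, xDoc xml NOT NULL);

-- Primary XML index: materializes the node table for the column.
CREATE PRIMARY XML INDEX idx_1 ON docs (xDoc);

-- Secondary XML indexes are built on top of the primary XML index.
CREATE XML INDEX idx_1_path  ON docs (xDoc) USING XML INDEX idx_1 FOR PATH;
CREATE XML INDEX idx_1_value ON docs (xDoc) USING XML INDEX idx_1 FOR VALUE;
CREATE XML INDEX idx_1_prop  ON docs (xDoc) USING XML INDEX idx_1 FOR PROPERTY;
```

Each secondary index adds maintenance cost on updates, so it is usually worth creating only those that match the dominant query shapes.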

Use of Full-Text Index for Optimization
- Can provide improvement for XQuery contains() queries
- Query for documents where the section title contains "optimization"
    SELECT * FROM docs
    WHERE 1 = xCol.exist('/book/section/title/text()[contains(.,"optimization")]')
- Use the full-text index to prefilter candidates (includes false positives)
    SELECT * FROM docs
    WHERE contains(xCol, 'optimization')
    AND 1 = xCol.exist('/book/section/title/text()[contains(.,"optimization")]')

Resources
- Optimization whitepapers
  - http://msdn2.microsoft.com/en-us/library/ms345118.aspx
  - http://msdn2.microsoft.com/en-us/library/ms345121.aspx
- Online WebCasts
  - http://www.microsoft.com/events/series/msdnsqlserver2005.mspx#SQL
- XML Newsgroups & Forum
  - microsoft.public.sqlserver.xml
  - http://communities.microsoft.com/newsgroups/default.asp?ICP=sqlserver2005&sLCID=us
  - http://forums.microsoft.com/msdn/ShowForum.aspx?ForumID=89
- General XML and Databases whitepapers
  - http://msdn2.microsoft.com/en-us/xml/bb190603.aspx
- My E-mail: mrys@microsoft.com
- My Weblog: http://sqlblog.com/blogs/michael_rys/
- Related TechEd Presentations
  - DAT311 - Best Practices for Optimizing XML Queries in SQL Server 2005 and Beyond

Resources
- Technical Communities, Webcasts, Blogs, Chats & User Groups
  - http://www.microsoft.com/communities/default.mspx
- Microsoft Learning and Certification
  - http://www.microsoft.com/learning/default.mspx
- Microsoft Developer Network (MSDN) & TechNet
  - http://microsoft.com/msdn
  - http://microsoft.com/technet
- Trial Software and Virtual Labs
  - http://www.microsoft.com/technet/downloads/trials/default.mspx

