You Got XML In My Database? What’s Up With That?


1 You Got XML In My Database? What’s Up With That?
Stuart R Ainsworth

2 Purpose
Discuss the marriage of XML to relational databases (SQL Server 2005+)
Approach from a design perspective
Brief history of two approaches

3 Why Me?
Data Architect working in the field of financial information security
  Manage data flows from 15+ different vendor systems
  150 million rows of data per day
Prior experience as DBA and a reporting/database developer
A chapter leader for AtlantaMDF

4 Why Me?
NOT an XML “expert”
Agreed to do this presentation as a learning experience
Continues to be a work-in-progress

5 Assumptions About You
Database professional (DBA, database developer, BI specialist)
Well-versed in SQL, especially T-SQL
Basic understanding of what XML is

6 Goals
Explore XML and relational data
  History
  Differences
  Compatibility
XML features in SQL Server 2005+
Classic relational design challenges

7 History Lessons “Set the WABAC machine, Sherman!” -Mr. Peabody

8 History-Rel Paradigm
Relational design based on work of E.F. Codd
  A Relational Model of Data for Large Shared Data Banks (1970, ACM)
  Codd’s 12 Rules for Relational DBs (1985)
Implementation
  Ingres (1974)
  Relational Software (Oracle; 1979)

9 History-Rel Paradigm
Context: Hierarchical databases prevalent
  Tree structure
  Redundant data in attributes
Expense of computer hardware
  Limited storage capability
  Limited expansion possibilities
In a hierarchical data model, data is organized into a tree-like structure, which limits the kinds of relationships it can express. The structure allows repeating information using parent/child relationships. All attributes of a specific record are listed under an entity type. In a database, an entity type is the equivalent of a table; each individual record is represented as a row and each attribute as a column. Entity types are related to each other using 1:N mappings, also known as one-to-many relationships. For example, an organization keeps records of employees in a table (entity type) called Employees, with attributes/columns such as First Name, Last Name, Job Name and Wage. The company also keeps data about each employee’s children in a separate table called Children, with attributes such as First Name, Last Name, and DOB. The Employees table represents a parent segment and the Children table represents a child segment. These two segments form a hierarchy in which an employee may have many children, but each child may have only one parent.

10 History-Rel Paradigm “By the time UNIX began to become popular (1974), a well configured PDP-11 had 768 Kb of core memory, two 200 Mb moving head disks (hard disks), a reel to reel tape drive for backup purposes, a dot-matrix line printer and a bunch of [dumb] terminals. This was a high end machine, and even a minimally configured PDP-11 cost about $40,000. Despite the cost, 600 such installations had been put into service by the end of 1974, mostly at universities.” The History Of Computers During My Lifetime - The 1970's by Jason Patterson

11 History-Rel Paradigm “In 1973, IBM developed what is considered to be the first true sealed hard disk drive... It used two 30 Mb platters. Over the following decade, sealed hard disks (often called Winchester disks) took their place as the primary data storage medium, initially in mainframes, then in minicomputers, and finally in personal computers starting with the IBM PC/XT in 1983.”

12 Database Schemas
Normalization
  1NF: Atomicity of data
  2NF: Definition of the primary key
  3NF: Dependency on the primary key

13 That’s great, but….
Optimize storage of information
  Reduce redundant information
  Increase performance for query engine
Optimize data validity
  Maintain relationships on dependent keys
  Ensure consistent change control
(Later) Optimize information security
  Well-designed security models
  Who has access to what, when, & how

15 History-XML
1970s: Goldfarb, Mosher, & Lorie defined GML (later SGML, Standard Generalized Markup Language)
  Isolate content from presentation
  HTML: the most well-known SGML application
SGML is very complex
HTML standards became polluted
  Netscape vs IE
  Firefox vs IE

16 History-XML
1990s: Bosak, Bray, Clark defined eXtensible Markup Language (XML)
Well-formed
  1 root element
  Matching end tags
  No overlapping elements
Valid
  DTD (Document Type Definition)
  XML Schema

17 History-XML 2002, Microsoft released .NET
Response to Java interoperability .NET relies on XML to pass data

18 That’s great, but….
Isolate content from presentation
  Defines standards for interpretation
  Minimal definitions for implementation
Suggest data validity
  XML documents must be well-formed
  XML documents should be valid

19 XML commands “Never send a human to do a machine’s job!” -Agent Smith

20 XML in SQL Server 2000+
Generation: FOR XML
  RAW, AUTO, EXPLICIT

21 FOR XML RAW
  USE AdventureWorks
  GO
  SELECT Cust.CustomerID,
         OrderHeader.CustomerID AS ohCustID,
         OrderHeader.SalesOrderID,
         OrderHeader.Status,
         Cust.CustomerType
  FROM Sales.Customer Cust
  INNER JOIN Sales.SalesOrderHeader OrderHeader
          ON Cust.CustomerID = OrderHeader.CustomerID
  FOR XML RAW

22 FOR XML RAW
  <row CustomerID="676" ohCustID="676" SalesOrderID="43659" Status="5" CustomerType="S" />
  <row CustomerID="117" ohCustID="117" SalesOrderID="43660" Status="5" CustomerType="S" />
  <row CustomerID="442" ohCustID="442" SalesOrderID="43661" Status="5" CustomerType="S" />
  <row CustomerID="227" ohCustID="227" SalesOrderID="43662" Status="5" CustomerType="S" />
  <row CustomerID="510" ohCustID="510" SalesOrderID="43663" Status="5" CustomerType="S" />
  <row CustomerID="397" ohCustID="397" SalesOrderID="43664" Status="5" CustomerType="S" />
  <row CustomerID="146" ohCustID="146" SalesOrderID="43665" Status="5" CustomerType="S" />
  <row CustomerID="511" ohCustID="511" SalesOrderID="43666" Status="5" CustomerType="S" />
  <row CustomerID="646" ohCustID="646" SalesOrderID="43667" Status="5" CustomerType="S" />
  <row CustomerID="514" ohCustID="514" SalesOrderID="43668" Status="5" CustomerType="S" />
  <row CustomerID="578" ohCustID="578" SalesOrderID="43669" Status="5" CustomerType="S" />
  <row CustomerID="504" ohCustID="504" SalesOrderID="43670" Status="5" CustomerType="S" />
  <row CustomerID="200" ohCustID="200" SalesOrderID="43671" Status="5" CustomerType="S" />
  <row CustomerID="119" ohCustID="119" SalesOrderID="43672" Status="5" CustomerType="S" />
  <row CustomerID="618" ohCustID="618" SalesOrderID="43673" Status="5" CustomerType="S" />
  <row CustomerID="83" ohCustID="83" SalesOrderID="43674" Status="5" CustomerType="S" />

23 FOR XML AUTO
  USE AdventureWorks
  GO
  SELECT Cust.CustomerID,
         OrderHeader.CustomerID,
         OrderHeader.SalesOrderID,
         OrderHeader.Status,
         Cust.CustomerType
  FROM Sales.Customer Cust
  INNER JOIN Sales.SalesOrderHeader OrderHeader
          ON Cust.CustomerID = OrderHeader.CustomerID
  FOR XML AUTO

24 FOR XML AUTO
  <Cust CustomerID="676" CustomerType="S">
    <OrderHeader CustomerID="676" SalesOrderID="43659" Status="5" />
  </Cust>
  <Cust CustomerID="117" CustomerType="S">
    <OrderHeader CustomerID="117" SalesOrderID="43660" Status="5" />
  </Cust>
  <Cust CustomerID="442" CustomerType="S">
    <OrderHeader CustomerID="442" SalesOrderID="43661" Status="5" />
  </Cust>
  <Cust CustomerID="227" CustomerType="S">
    <OrderHeader CustomerID="227" SalesOrderID="43662" Status="5" />
  </Cust>

25 FOR XML EXPLICIT Beyond the scope of this presentation 

26 XML in SQL Server 2000+
Generation: FOR XML
  RAW, AUTO, EXPLICIT
Translation: OPENXML
  sp_xml_preparedocument
  sp_xml_removedocument
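
The OPENXML pattern named on this slide is not demonstrated in the deck, so here is a minimal sketch of it. The document, the variable names (@doc, @hdoc), and the column list are illustrative assumptions, not taken from the presentation.

```sql
-- Hypothetical sample document; not from the presentation.
DECLARE @doc nvarchar(1000);
DECLARE @hdoc int;

SET @doc = N'<Root>
  <row id="1" name="Larry" />
  <row id="2" name="Moe" />
</Root>';

-- Parse the text and get a handle to the in-memory document tree.
EXEC sp_xml_preparedocument @hdoc OUTPUT, @doc;

-- Flag 1 requests attribute-centric mapping: each attribute becomes a column.
SELECT id, name
FROM OPENXML(@hdoc, '/Root/row', 1)
     WITH (id int, name varchar(10));

-- Release the memory held by the parsed document.
EXEC sp_xml_removedocument @hdoc;
```

Forgetting the sp_xml_removedocument call leaves the parsed tree in server memory until the session ends, which is one reason the xml datatype methods introduced in SQL Server 2005 are usually easier to work with.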

28 XML in SQL Server 2005+
Generation: FOR XML
  PATH
  TYPE

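29 FOR XML PATH
  USE AdventureWorks
  GO
  -- The column aliases containing @ were lost in this transcript;
  -- they are restored here from the output shown on the next slide.
  SELECT Cust.CustomerID          AS "Customer/@CustomerID",
         OrderHeader.CustomerID   AS "Order/@CustomerID",
         OrderHeader.SalesOrderID AS "Order/@OrderID",
         OrderHeader.Status       AS "Order/Status",
         Cust.CustomerType        AS "Customer/@Type"
  FROM Sales.Customer Cust
  INNER JOIN Sales.SalesOrderHeader OrderHeader
          ON Cust.CustomerID = OrderHeader.CustomerID
  FOR XML PATH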

30 FOR XML PATH
  <row>
    <Customer CustomerID="676" />
    <Order CustomerID="676" OrderID="43659">
      <Status>5</Status>
    </Order>
    <Customer Type="S" />
  </row>
  <row>
    <Customer CustomerID="117" />
    <Order CustomerID="117" OrderID="43660">
      <Status>5</Status>
    </Order>
    <Customer Type="S" />
  </row>
  ...
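
Slide 28 lists TYPE alongside PATH, but only PATH is demonstrated. The following is a rough sketch of how the TYPE directive is commonly used: it makes a nested FOR XML subquery return an xml value, so the outer query embeds it as child elements instead of as escaped text. The element names and the filter on customer 676 are assumptions chosen for illustration.

```sql
USE AdventureWorks;
GO
-- Nest each customer's orders under a single <Customer> element.
SELECT Cust.CustomerID AS "@CustomerID",
       (SELECT OrderHeader.SalesOrderID AS "@OrderID",
               OrderHeader.Status       AS "Status"
        FROM Sales.SalesOrderHeader AS OrderHeader
        WHERE OrderHeader.CustomerID = Cust.CustomerID
        FOR XML PATH('Order'), TYPE)    -- TYPE keeps the result as xml, not a string
FROM Sales.Customer AS Cust
WHERE Cust.CustomerID = 676             -- illustrative filter only
FOR XML PATH('Customer');
```

Without TYPE, the inner result is treated as character data and its angle brackets are entitized in the outer document.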

31 XML in SQL Server 2005+
Generation: FOR XML
  PATH
  TYPE
Translation:
  xml datatype
  XQuery

32 XML Translation
xml datatype
  Well-formed fragments (no root required)
  2 GB maximum
  Cannot be compared or sorted
  Supports conversion to (n)varchar(max)
  Required for XQuery
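
A small sketch of the xml datatype properties listed above, assuming an untyped xml variable; the variable name and sample fragment are illustrative only.

```sql
-- An untyped xml variable accepts a fragment with no single root element.
DECLARE @x xml;
SET @x = '<Stooge>Moe</Stooge><Stooge>Larry</Stooge>';

-- The value can be returned as xml or converted to nvarchar(max).
SELECT @x AS AsXml,
       CONVERT(nvarchar(max), @x) AS AsText;

-- xml values cannot be compared or sorted directly, so a statement such as
--   SELECT 1 WHERE @x = @x;
-- is rejected. Convert to nvarchar(max) first if a textual comparison is needed.
```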

33 XML Translation
XQuery
  Complete query language outside of SQL Server
  SQL Server implements limited subset
xml methods
  query()
  value()
  exist()
  nodes()
  modify() (beyond scope of presentation)

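34 .query()
  -- The variable name and the query() call were lost in this transcript;
  -- @x is a placeholder and the path is inferred from the output on the next slide.
  DECLARE @x XML = '<Root>
    <ProductDescription ProductID="1" ProductName="Road Bike">
      <Features>
        <Warranty>1 year parts and labor</Warranty>
        <Maintenance>3 year parts and labor extended maintenance is available</Maintenance>
      </Features>
    </ProductDescription>
  </Root>'
  SELECT @x.query('/Root/ProductDescription/Features')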

35 .query()
  <Features>
    <Warranty>1 year parts and labor</Warranty>
    <Maintenance>3 year parts and labor extended maintenance is available</Maintenance>
  </Features>

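36 .exist() & .value()
  -- The variable name and both XPath expressions were lost in this transcript;
  -- @x and the Warranty paths below are assumed placeholders.
  DECLARE @x XML = '<Root>
    <ProductDescription ProductID="1" ProductName="Road Bike">
      <Features>
        <Warranty>1 year parts and labor</Warranty>
        <Maintenance>3 year parts and labor extended maintenance is available</Maintenance>
      </Features>
    </ProductDescription>
  </Root>'
  IF @x.exist('/Root/ProductDescription/Features/Warranty') = 1
  BEGIN
    SELECT @x.value('(/Root/ProductDescription/Features/Warranty)[1]', 'varchar(100)') AS Warranty
  END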

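37 .nodes()
  -- The declaration and FROM clause were lost in this transcript; @x is a placeholder name.
  DECLARE @x xml
  SET @x = '<Root>
    <row id="1"><name>Larry</name><oflw>some text</oflw></row>
    <row id="2"><name>moe</name></row>
    <row id="3" />
  </Root>'
  SELECT T.c.query('.') AS result
  FROM @x.nodes('/Root/row') T(c)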

38 .nodes()
  <row id="1"><name>Larry</name><oflw>some text</oflw></row>
  <row id="2"><name>moe</name></row>
  <row id="3" />

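39 .nodes()
  -- The declaration and FROM clause were lost in this transcript; @x is a placeholder name.
  DECLARE @x xml
  SET @x = '<Root>
    <row id="1"><name>Larry</name><oflw>some text</oflw></row>
    <row id="2"><name>moe</name></row>
    <row id="3" />
  </Root>'
  SELECT T.c.query('.').value('(//name)[1]', 'varchar(10)') AS result
  FROM @x.nodes('/Root/row') T(c)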

40 .nodes()
  result
  Larry
  moe
  NULL

41 XML in SQL Server 2005+
Generation: FOR XML
  PATH
  TYPE
Translation:
  xml datatype
  XQuery
T-SQL:
  APPLY operator

42 T-SQL: APPLY The APPLY operator allows you to invoke a table-valued function for each row returned by an outer table expression of a query. The table-valued function acts as the right input and the outer table expression acts as the left input. The right input is evaluated for each row from the left input and the rows produced are combined for the final output. The list of columns produced by the APPLY operator is the set of columns in the left input followed by the list of columns returned by the right input.

43 T-SQL: APPLY
Important parts:
  Requires a table-valued function
  “joins” TVF with a table
    CROSS :: INNER JOIN
    OUTER :: LEFT OUTER JOIN
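
The CROSS/OUTER distinction above can be sketched as follows, using the nodes() method as the table-valued right input. The table variable, its data, and the element name are hypothetical; the multi-row VALUES clause assumes SQL Server 2008 or later.

```sql
-- Hypothetical table variable and data, for illustration only.
DECLARE @team TABLE (TeamName varchar(20), Members xml);

INSERT INTO @team (TeamName, Members)
VALUES ('Stooges', '<m>Moe</m><m>Larry</m>'),
       ('Empty',   '');                  -- no <m> elements to shred

-- CROSS APPLY behaves like an INNER JOIN:
-- the 'Empty' row produces no nodes, so it drops out of the result.
SELECT t.TeamName, x.c.value('.', 'varchar(10)') AS Member
FROM @team AS t
CROSS APPLY t.Members.nodes('/m') AS x(c);

-- OUTER APPLY behaves like a LEFT OUTER JOIN:
-- the 'Empty' row is kept, with NULL in the Member column.
SELECT t.TeamName, x.c.value('.', 'varchar(10)') AS Member
FROM @team AS t
OUTER APPLY t.Members.nodes('/m') AS x(c);
```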

45 Pulling it together
  -- The table variable name was lost in this transcript; @t is a placeholder.
  DECLARE @t TABLE (LastName varchar(10), Stooges xml)
  INSERT INTO @t (LastName, Stooges)
  VALUES ('Howard', '<Stooge>Moe</Stooge> <Stooge>Curly</Stooge> <Stooge>Shemp</Stooge>'),
         ('Fine', '<Stooge>Larry</Stooge>')
  SELECT t.LastName,
         x.c.query('.').value('(/Stooge)[1]','varchar(10)') AS FirstName
  FROM @t t
  CROSS APPLY Stooges.nodes('/Stooge') x(c)

46 Pulling it together
  LastName  FirstName
  Howard    Moe
  Howard    Curly
  Howard    Shemp
  Fine      Larry

47 XML – SQL Scenarios “If all you have is a hammer, everything looks like a nail” -Bernard Baruch

48 Whole XML documents
Storage of XML documents
  Application transfers complete document
  Need not be stored as xml datatype
  Decide if queried or modified as xml
Depending on doc size, may not perform well
  Large documents require more I/O (disk & network)
  Multiple rows require more I/O (disk & network)
Depending on datatype, may not validate

49 Classic Design Problems
Entity-Attribute-Value (EAV) designs
  Doesn’t solve problem; adds an option
  Adding attributes after release
Complete datasets as parameters
  Additional disconnect between layers
    “object-like” handling
    set-oriented methodology
  Requires strict data handling

50 EAV design
Database storage determined after implementation
Requires multiple subqueries to fetch data
  May cause performance problems
Brittle validity

51 EAV Design
  IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.tables WHERE TABLE_NAME = 'emp_values')
    DROP TABLE emp_values ;
  IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.tables WHERE TABLE_NAME = 'emp')
    DROP TABLE emp ;
  create table emp (empno integer primary key );
  create table emp_values (empno INT references emp, code varchar(20), value varchar(100));
  insert into emp (empno) values (1234);
  insert into emp_values VALUES (1234, 'NAME','ANDREWS');
  insert into emp_values VALUES (1234, 'SAL','1000');
  insert into emp_values VALUES (1234, 'JOB','CLERK');

52 EAV Design
  SELECT * FROM dbo.emp_values ev
  empno  code  value
  1234   NAME  ANDREWS
  1234   SAL   1000
  1234   JOB   CLERK

53 EAV Design
  --structure we want version 1
  SELECT e.empno, NAME = ev1.value, SAL = ev2.value, JOB = ev3.value
  FROM emp e
  JOIN emp_values ev1 ON e.empno = ev1.empno AND ev1.code = 'NAME'
  JOIN emp_values ev2 ON e.empno = ev2.empno AND ev2.code = 'SAL'
  JOIN emp_values ev3 ON e.empno = ev3.empno AND ev3.code = 'JOB'
  --structure we want version 2
  SELECT e.empno,
         NAME = (SELECT ev.value FROM emp_values ev WHERE ev.empno = e.empno AND ev.code = 'NAME'),
         SAL  = (SELECT ev.value FROM emp_values ev WHERE ev.empno = e.empno AND ev.code = 'SAL'),
         JOB  = (SELECT ev.value FROM emp_values ev WHERE ev.empno = e.empno AND ev.code = 'JOB')
  FROM emp e

54 EAV design
XML doesn’t solve design issues, but does mitigate the cost
May cause performance issues
  XML indexes
Brittle validity
  Use XML Schema to validate
Still requires dynamic SQL to transform data to UI

55 EAV Design
  -- The table variable name was lost in this transcript; @e is a placeholder.
  DECLARE @e TABLE (empno int, eav xml)
  INSERT INTO @e (empno, eav)
  VALUES (1234, '<root> <NAME>Andrews</NAME> <JOB>Judge</JOB> <SAL>1000</SAL> </root>')
  SELECT e.empno,
         x.value('local-name(.)','VARCHAR(20)') AS ElementName,
         x.value('.','VARCHAR(20)') AS ElementValue
  FROM @e e
  CROSS APPLY eav.nodes('/*/*') y(x)
  empno  ElementName  ElementValue
  1234   NAME         Andrews
  1234   JOB          Judge
  1234   SAL          1000
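
Slide 54 names XML Schema validation and XML indexes as the mitigations for brittle validity and performance. A minimal sketch of both follows, assuming a permanent table; the schema collection, table, and index names are hypothetical and do not appear in the presentation.

```sql
-- Hypothetical schema collection describing the expected attribute elements.
CREATE XML SCHEMA COLLECTION EmpAttributes AS
N'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="root">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="NAME" type="xsd:string" />
          <xsd:element name="JOB"  type="xsd:string" />
          <xsd:element name="SAL"  type="xsd:decimal" />
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
  </xsd:schema>';
GO
CREATE TABLE emp_xml (
    empno int PRIMARY KEY,       -- XML indexes require a clustered primary key
    eav   xml (EmpAttributes)    -- typed xml: values are validated against the schema
);
GO
-- A primary XML index shreds the xml column once, ahead of query time.
CREATE PRIMARY XML INDEX PXML_emp_xml_eav ON emp_xml (eav);
GO
-- This document matches the schema, so the insert is accepted.
INSERT INTO emp_xml (empno, eav)
VALUES (1234, '<root><NAME>Andrews</NAME><JOB>Judge</JOB><SAL>1000</SAL></root>');
```

The trade-off is that a strict schema brings back some of the rigidity that the EAV design was trying to avoid: adding an attribute after release now means altering the schema collection.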

56 Dataset Parameters
Complete & complex datasets
  “object-like” processing
  Single order with multiple line items
Complex transfers between tables
  Avoid row-by-row inserts
  Minimizes network latency
Application sends over XML document
Stored proc shreds & inserts

57 Dataset Parameters
  <e EmployeeID="1" ContactID="1209">
    <c FirstName="Guy" LastName="Gilbert" />
  </e>
  <e EmployeeID="2" ContactID="1030">
    <c FirstName="Kevin" LastName="Brown" />
  </e>
  <e EmployeeID="3" ContactID="1002">
    <c FirstName="Roberto" LastName="Tamburello" />
  </e>

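58 Dataset Parameters
  -- The parameter name, XPath expressions, and FROM clauses were lost in this transcript.
  -- @x and the paths below are assumed placeholders, chosen to match the input on
  -- slide 57 and the column names that survived.
  CREATE PROC EmployeePerson (@x XML) AS
  SELECT T.c.value('(@EmployeeID)[1]', 'int') as EmployeeID,
         T.c.value('(@ContactID)[1]', 'int') as ContactID
  FROM @x.nodes('/e') T(c)
  SELECT T.c.value('(../@ContactID)[1]', 'int') as ContactID,
         T.c.value('(@FirstName)[1]', 'varchar(25)') as FirstName,
         T.c.value('(@LastName)[1]', 'varchar(25)') as LastName
  FROM @x.nodes('/e/c') T(c)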

59 Dataset Parameters
  EmployeeID  ContactID
  1           1209
  2           1030
  3           1002
  ContactID  FirstName  LastName
  1209       Guy        Gilbert
  1030       Kevin      Brown
  1002       Roberto    Tamburello

60 Dataset Parameters
Maintains axioms of relational paradigm
  Security of stored procs
  Validity of expected inputs
Performance
  If used to minimize row-by-row processing, then yes
  Otherwise, no real impact
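
The earlier slides shred the XML parameter with SELECT statements only. To illustrate the "stored proc shreds & inserts" step, here is a rough sketch that loads the shredded rows with one set-based INSERT. The target table, procedure name, and parameter name are hypothetical.

```sql
-- Hypothetical target table; not part of the presentation.
CREATE TABLE dbo.EmployeeContact (
    EmployeeID int PRIMARY KEY,
    ContactID  int,
    FirstName  varchar(25),
    LastName   varchar(25)
);
GO
CREATE PROC dbo.EmployeeContact_Load (@doc xml)
AS
BEGIN
    SET NOCOUNT ON;

    -- One set-based INSERT replaces one round trip per employee.
    INSERT INTO dbo.EmployeeContact (EmployeeID, ContactID, FirstName, LastName)
    SELECT e.n.value('@EmployeeID', 'int'),
           e.n.value('@ContactID',  'int'),
           e.n.value('(c/@FirstName)[1]', 'varchar(25)'),
           e.n.value('(c/@LastName)[1]',  'varchar(25)')
    FROM @doc.nodes('/e') AS e(n);
END;
GO
-- The application sends the whole dataset in a single call.
EXEC dbo.EmployeeContact_Load
     @doc = '<e EmployeeID="1" ContactID="1209"><c FirstName="Guy" LastName="Gilbert" /></e>
             <e EmployeeID="2" ContactID="1030"><c FirstName="Kevin" LastName="Brown" /></e>';
```

Callers need only EXECUTE permission on the procedure, which keeps the stored-procedure security model intact while still avoiding row-by-row inserts.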

61 Resources; Questions? “It’s a miracle that curiosity survives formal education.” – Albert Einstein

62 Contact Information Stuart R Ainsworth

63 Resources
SQL Server 2008 Books Online
Pro T-SQL 2008 Programmer’s Guide (Coles, 2008, Apress)
Professional SQL Server 2005 XML (Klein, 2006, Wrox)

