You Got XML In My Database? What’s Up With That?

You Got XML In My Database? What’s Up With That? Stuart R Ainsworth stuart@codegumbo.com

Purpose Discuss the marriage of XML to relational databases (SQL Server 2005+) Approach from a design perspective Brief history of two approaches

Why Me? Data Architect working in the field of financial information security Manage data flows from 15+ different vendor systems 150 million rows of data per day Prior experience as DBA and a reporting/database developer A chapter leader for AtlantaMDF

Why Me? NOT an XML “expert” Agreed to do this presentation as a learning experience Continues to be a work-in-progress

Assumptions About You Database professional (DBA, database developer, BI specialist) Well-versed in SQL, especially T-SQL Basic understanding of what XML is

Goals
Explore XML and relational data
  History
  Differences
  Compatibility
XML features in SQL Server 2005+
Classic relational design challenges

History Lessons “Set the WABAC machine, Sherman!” -Mr. Peabody

History-Rel Paradigm
Relational design based on work of E. F. Codd
  "A Relational Model of Data for Large Shared Data Banks" (1970, Communications of the ACM)
  Codd's 12 Rules for relational databases (1985)
Implementation
  Ingres (1974)
  Relational Software (Oracle; 1979)

History-Rel Paradigm
Context:
Hierarchical databases prevalent
  Tree structure
  Redundant data in attributes
Expense of computer hardware
  Limited storage capability
  Limited expansion possibilities
Notes: In a hierarchical data model, data is organized into a tree-like structure in which each record has exactly one parent, which limits the relationships it can express. The structure allows repeating information using parent/child relationships. All attributes of a specific record are listed under an entity type. In a database, an entity type is the equivalent of a table; each individual record is represented as a row and each attribute as a column. Entity types are related to each other using 1:N mapping, also known as one-to-many relationships. For example, an organization keeps records of employees in a table (entity type) called Employees, with attributes/columns such as First Name, Last Name, Job Name and Wage. The company also has data about each employee's children in a separate table called Children, with attributes such as First Name, Last Name, and DOB. The Employees table represents a parent segment and the Children table represents a child segment. These two segments form a hierarchy in which an employee may have many children but each child may have only one parent.

History-Rel Paradigm “By the time UNIX began to become popular (1974), a well configured PDP-11 had 768 Kb of core memory, two 200 Mb moving head disks (hard disks), a reel to reel tape drive for backup purposes, a dot-matrix line printer and a bunch of [dumb] terminals. This was a high end machine, and even a minimally configured PDP-11 cost about $40,000. Despite the cost, 600 such installations had been put into service by the end of 1974, mostly at universities.” The History Of Computers During My Lifetime - The 1970's by Jason Patterson http://www.pattosoft.com.au/jason/Articles/HistoryOfComputers/1970s.html

History-Rel Paradigm “In 1973, IBM developed what is considered to be the first true sealed hard disk drive... It used two 30 Mb platters. Over the following decade, sealed hard disks (often called Winchester disks) took their place as the primary data storage medium, initially in mainframes, then in minicomputers, and finally in personal computers starting with the IBM PC/XT in 1983.”

Database Schemas
Normalization
  1NF: atomicity of data; definition of the primary key
  2NF, 3NF: dependency on the primary key

That’s great, but….
Optimize storage of information
  Reduce redundant information
  Increase performance for query engine
Optimize data validity
  Maintain relationships on dependent keys
  Ensure consistent change control
(Later) Optimize information security
  Well-designed security models
  Who has access to what, when, & how

History-XML
1970s: Goldfarb, Mosher, and Lorie defined GML (later SGML, Standard Generalized Markup Language)
  Isolate content from presentation
  HTML: the best-known SGML application
SGML is very complex
HTML standards became polluted
  Netscape vs. IE
  Firefox vs. IE

History-XML
1990s: Bosak, Bray, and Clark defined eXtensible Markup Language (XML)
Well-formed
  1 root element
  Matching end tags
  No overlapping elements
Valid
  DTD (Document Type Definition)
  XML Schema

History-XML
2002: Microsoft released .NET
  Response to Java interoperability
  .NET relies on XML to pass data

That’s great, but….
Isolate content from presentation
  Defines standards for interpretation
  Minimal definitions for implementation
Suggest data validity
  XML documents must be well-formed
  XML documents should be valid

XML commands “Never send a human to do a machine’s job!” -Agent Smith

XML in SQL Server 2000+ Generation: FOR XML RAW, AUTO, EXPLICIT

FOR XML RAW
USE AdventureWorks
GO
SELECT Cust.CustomerID,
       OrderHeader.CustomerID AS ohCustID,
       OrderHeader.SalesOrderID,
       OrderHeader.Status,
       Cust.CustomerType
FROM Sales.Customer Cust
INNER JOIN Sales.SalesOrderHeader OrderHeader
        ON Cust.CustomerID = OrderHeader.CustomerID
FOR XML RAW

FOR XML RAW
<row CustomerID="676" ohCustID="676" SalesOrderID="43659" Status="5" CustomerType="S" />
<row CustomerID="117" ohCustID="117" SalesOrderID="43660" Status="5" CustomerType="S" />
<row CustomerID="442" ohCustID="442" SalesOrderID="43661" Status="5" CustomerType="S" />
<row CustomerID="227" ohCustID="227" SalesOrderID="43662" Status="5" CustomerType="S" />
<row CustomerID="510" ohCustID="510" SalesOrderID="43663" Status="5" CustomerType="S" />
<row CustomerID="397" ohCustID="397" SalesOrderID="43664" Status="5" CustomerType="S" />
<row CustomerID="146" ohCustID="146" SalesOrderID="43665" Status="5" CustomerType="S" />
<row CustomerID="511" ohCustID="511" SalesOrderID="43666" Status="5" CustomerType="S" />
<row CustomerID="646" ohCustID="646" SalesOrderID="43667" Status="5" CustomerType="S" />
<row CustomerID="514" ohCustID="514" SalesOrderID="43668" Status="5" CustomerType="S" />
<row CustomerID="578" ohCustID="578" SalesOrderID="43669" Status="5" CustomerType="S" />
<row CustomerID="504" ohCustID="504" SalesOrderID="43670" Status="5" CustomerType="S" />
<row CustomerID="200" ohCustID="200" SalesOrderID="43671" Status="5" CustomerType="S" />
<row CustomerID="119" ohCustID="119" SalesOrderID="43672" Status="5" CustomerType="S" />
<row CustomerID="618" ohCustID="618" SalesOrderID="43673" Status="5" CustomerType="S" />
<row CustomerID="83" ohCustID="83" SalesOrderID="43674" Status="5" CustomerType="S" />

FOR XML AUTO
USE AdventureWorks
GO
SELECT Cust.CustomerID,
       OrderHeader.CustomerID,
       OrderHeader.SalesOrderID,
       OrderHeader.Status,
       Cust.CustomerType
FROM Sales.Customer Cust
INNER JOIN Sales.SalesOrderHeader OrderHeader
        ON Cust.CustomerID = OrderHeader.CustomerID
FOR XML AUTO

FOR XML AUTO
<Cust CustomerID="676" CustomerType="S">
  <OrderHeader CustomerID="676" SalesOrderID="43659" Status="5" />
</Cust>
<Cust CustomerID="117" CustomerType="S">
  <OrderHeader CustomerID="117" SalesOrderID="43660" Status="5" />
</Cust>
<Cust CustomerID="442" CustomerType="S">
  <OrderHeader CustomerID="442" SalesOrderID="43661" Status="5" />
</Cust>
<Cust CustomerID="227" CustomerType="S">
  <OrderHeader CustomerID="227" SalesOrderID="43662" Status="5" />
</Cust>

FOR XML EXPLICIT Beyond the scope of this presentation 

XML in SQL Server 2000+
Generation: FOR XML
  RAW, AUTO, EXPLICIT
Translation: OPENXML
  sp_xml_preparedocument
  sp_xml_removedocument
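The deck lists OPENXML and its two helper procedures without showing them in use. The following is a minimal sketch of that translation path, assuming a small attribute-centric document modeled on the FOR XML RAW output; the variable names @hDoc and @xml and the wrapping Root element are illustrative choices, not part of the original slides.

```sql
-- Sketch: shred attribute-centric XML into rows with OPENXML.
DECLARE @hDoc int;
DECLARE @xml nvarchar(max);   -- nvarchar(max) needs SQL Server 2005+; use ntext on 2000

SET @xml = N'<Root>
  <row CustomerID="676" SalesOrderID="43659" Status="5" />
  <row CustomerID="117" SalesOrderID="43660" Status="5" />
</Root>';

-- Parse the text and obtain a handle to the in-memory document.
EXEC sp_xml_preparedocument @hDoc OUTPUT, @xml;

-- Flag 1 requests attribute-centric mapping; WITH declares the output columns.
SELECT CustomerID, SalesOrderID, Status
FROM OPENXML(@hDoc, '/Root/row', 1)
     WITH (CustomerID int, SalesOrderID int, Status tinyint);

-- Release the handle so the parsed document does not linger in server memory.
EXEC sp_xml_removedocument @hDoc;
```

The prepare/remove pair is the main cost of this approach: a handle that is never removed keeps its document in memory until the session ends.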

XML in SQL Server 2005+
Generation: FOR XML
  PATH
  TYPE

FOR XML PATH
USE AdventureWorks
GO
SELECT Cust.CustomerID          AS "Customer/@CustomerID",
       OrderHeader.CustomerID   AS "Order/@CustomerID",
       OrderHeader.SalesOrderID AS "Order/@OrderID",
       OrderHeader.Status       AS "Order/Status",
       Cust.CustomerType        AS "Customer/@Type"
FROM Sales.Customer Cust
INNER JOIN Sales.SalesOrderHeader OrderHeader
        ON Cust.CustomerID = OrderHeader.CustomerID
FOR XML PATH

FOR XML PATH
<row>
  <Customer CustomerID="676" />
  <Order CustomerID="676" OrderID="43659">
    <Status>5</Status>
  </Order>
  <Customer Type="S" />
</row>
<row>
  <Customer CustomerID="117" />
  <Order CustomerID="117" OrderID="43660">
    <Status>5</Status>
  </Order>
  <Customer Type="S" />
</row>
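The TYPE directive is listed alongside PATH but not demonstrated. Here is a hedged sketch of one common use: nesting a correlated subquery so that each customer element contains its own orders. The element names Customer, Order, and Customers and the filter on two customer IDs are assumptions made for illustration; the tables and columns are the ones the slides already query.

```sql
USE AdventureWorks;
GO
-- Sketch: TYPE makes the inner FOR XML return an xml value instead of a string,
-- so the outer FOR XML embeds it as child elements rather than escaped text.
SELECT Cust.CustomerID   AS "@CustomerID",
       Cust.CustomerType AS "@Type",
       (SELECT OrderHeader.SalesOrderID AS "@OrderID",
               OrderHeader.Status       AS "Status"
        FROM Sales.SalesOrderHeader OrderHeader
        WHERE OrderHeader.CustomerID = Cust.CustomerID
        FOR XML PATH('Order'), TYPE)          -- unnamed xml column is inlined
FROM Sales.Customer Cust
WHERE Cust.CustomerID IN (676, 117)           -- keep the sample output small
FOR XML PATH('Customer'), ROOT('Customers');
```

Unlike the flat join above, this shape emits each customer once, with its orders grouped beneath it.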

XML in SQL Server 2005+
Generation: FOR XML
  PATH
  TYPE
Translation: xml datatype
  XQuery

XML Translation
xml datatype
  Well-formed fragments (no root required)
  2 GB maximum
  Cannot be compared or sorted
  Supports conversion to (n)varchar(max)
  Required for XQuery
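Two of these properties interact in practice: because xml values cannot be compared directly, a comparison usually goes through the conversion to nvarchar(max). The sketch below assumes two illustrative variables, @a and @b, holding root-less fragments.

```sql
-- Sketch: xml fragments (no single root) and comparison via string conversion.
DECLARE @a xml, @b xml;
SET @a = '<Stooge>Moe</Stooge><Stooge>Larry</Stooge>';
SET @b = '<Stooge>Moe</Stooge><Stooge>Larry</Stooge>';

-- IF @a = @b PRINT 'equal';   -- not allowed: xml has no equality operator

-- Compare the serialized text instead.
IF CONVERT(nvarchar(max), @a) = CONVERT(nvarchar(max), @b)
    PRINT 'Same serialized text';
ELSE
    PRINT 'Different serialized text';
```

This is a textual comparison only; two documents that differ in attribute order or insignificant whitespace may be logically equivalent yet compare as different.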

XML Translation
XQuery
  Complete query language outside of SQL Server
  SQL Server 2005+ implements limited subset
xml methods
  query()
  value()
  exist()
  nodes()
  modify() (beyond scope of presentation)

.query()
DECLARE @myDoc XML
SET @myDoc = '<Root>
  <ProductDescription ProductID="1" ProductName="Road Bike">
    <Features>
      <Warranty>1 year parts and labor</Warranty>
      <Maintenance>3 year parts and labor extended maintenance is available</Maintenance>
    </Features>
  </ProductDescription>
</Root>'
SELECT @myDoc.query('/Root/ProductDescription/Features')

.query()
<Features>
  <Warranty>1 year parts and labor</Warranty>
  <Maintenance>3 year parts and labor extended maintenance is available</Maintenance>
</Features>

.exist() & .value()
DECLARE @myDoc XML
SET @myDoc = '<Root>
  <ProductDescription ProductID="1" ProductName="Road Bike">
    <Features>
      <Warranty>1 year parts and labor</Warranty>
      <Maintenance>3 year parts and labor extended maintenance is available</Maintenance>
    </Features>
  </ProductDescription>
</Root>'
-- exist() returns 1 when the path matches at least one node
IF @myDoc.exist('/Root/ProductDescription/Features/Warranty') = 1
BEGIN
  -- value() needs a singleton, hence the [1]
  SELECT @myDoc.value('(/Root/ProductDescription/Features/Warranty)[1]', 'varchar(100)') AS Warranty
END
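The same two methods also work against an xml column, where exist() usually appears in a WHERE clause rather than an IF. A minimal sketch, assuming a hypothetical table variable @Products with an xml column named Descr:

```sql
-- Sketch: filter rows with exist(), then extract a scalar with value().
DECLARE @Products TABLE (ProductID int, Descr xml);

INSERT INTO @Products (ProductID, Descr)
VALUES (1, '<Features><Warranty>1 year parts and labor</Warranty></Features>');
INSERT INTO @Products (ProductID, Descr)
VALUES (2, '<Features><Maintenance>none</Maintenance></Features>');

-- Only rows whose document contains a Warranty element are returned.
SELECT p.ProductID,
       p.Descr.value('(/Features/Warranty)[1]', 'varchar(100)') AS Warranty
FROM @Products p
WHERE p.Descr.exist('/Features/Warranty') = 1;
```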

.nodes()
DECLARE @x xml
SET @x = '<Root>
  <row id="1"><name>Larry</name><oflw>some text</oflw></row>
  <row id="2"><name>moe</name></row>
  <row id="3" />
</Root>'
-- query('.') returns each context node itself; query('..') would return its parent (Root)
SELECT T.c.query('.') AS result
FROM @x.nodes('/Root/row') T(c)

.nodes()
<row id="1"><name>Larry</name><oflw>some text</oflw></row>
<row id="2"><name>moe</name></row>
<row id="3" />

.nodes()
DECLARE @x xml
SET @x = '<Root>
  <row id="1"><name>Larry</name><oflw>some text</oflw></row>
  <row id="2"><name>moe</name></row>
  <row id="3" />
</Root>'
SELECT T.c.query('.').value('(//name)[1]', 'varchar(10)') AS result
FROM @x.nodes('/Root/row') T(c)

.nodes()
result
Larry
moe
NULL
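As a variation on the query above, value() can be called directly on each node returned by nodes(), with a path relative to that node, which avoids building an intermediate xml instance with query('.'). This is a sketch of that alternative rather than the presenter's version; the extra id column is added only for illustration.

```sql
-- Sketch: call value() on the context node instead of query('.').value(...).
DECLARE @x xml;
SET @x = '<Root>
  <row id="1"><name>Larry</name><oflw>some text</oflw></row>
  <row id="2"><name>moe</name></row>
  <row id="3" />
</Root>';

SELECT T.c.value('(@id)[1]',  'int')         AS id,     -- attribute of the row
       T.c.value('(name)[1]', 'varchar(10)') AS result  -- NULL when name is absent
FROM @x.nodes('/Root/row') T(c);
```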

XML in SQL Server 2005+
Generation: FOR XML
  PATH
  TYPE
Translation: xml datatype
  XQuery
T-SQL:
  APPLY operator

T-SQL: APPLY The APPLY operator allows you to invoke a table-valued function for each row returned by an outer table expression of a query. The table-valued function acts as the right input and the outer table expression acts as the left input. The right input is evaluated for each row from the left input and the rows produced are combined for the final output. The list of columns produced by the APPLY operator is the set of columns in the left input followed by the list of columns returned by the right input.

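T-SQL: APPLY
Important parts:
  Right input is a table-valued function (the xml nodes() method also qualifies)
  “Joins” the TVF with a table, evaluating it once per outer row
  CROSS APPLY behaves like INNER JOIN
  OUTER APPLY behaves like LEFT OUTER JOIN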

Pulling it together
DECLARE @T TABLE (LastName varchar(10), Stooges xml)
INSERT INTO @T (LastName, Stooges)
VALUES ('Howard', '<Stooge>Moe</Stooge> <Stooge>Curly</Stooge> <Stooge>Shemp</Stooge>'),
       ('Fine', '<Stooge>Larry</Stooge>')
SELECT t.LastName,
       x.c.query('.').value('(/Stooge)[1]', 'varchar(10)') AS FirstName
FROM @T t
CROSS APPLY Stooges.nodes('/Stooge') x(c)

Pulling it together
LastName  FirstName
Howard    Moe
Howard    Curly
Howard    Shemp
Fine      Larry
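The CROSS versus OUTER distinction only becomes visible when some row has nothing to shred. The sketch below reuses the table shape from the example above and adds one assumed row ('Besser') whose xml value is empty, so the two operators return different row counts.

```sql
-- Sketch: CROSS APPLY drops rows with no matching nodes; OUTER APPLY keeps them.
DECLARE @T TABLE (LastName varchar(10), Stooges xml);
INSERT INTO @T (LastName, Stooges)
VALUES ('Howard', '<Stooge>Moe</Stooge><Stooge>Curly</Stooge>');
INSERT INTO @T (LastName, Stooges)
VALUES ('Besser', '');                        -- empty xml: no Stooge elements

-- Like INNER JOIN: only the two Howard rows come back.
SELECT t.LastName, x.c.value('.', 'varchar(10)') AS FirstName
FROM @T t
CROSS APPLY t.Stooges.nodes('/Stooge') x(c);

-- Like LEFT OUTER JOIN: Besser is kept, with FirstName NULL.
SELECT t.LastName, x.c.value('.', 'varchar(10)') AS FirstName
FROM @T t
OUTER APPLY t.Stooges.nodes('/Stooge') x(c);
```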

XML – SQL Scenarios “If all you have is a hammer, everything looks like a nail” -Bernard Baruch

Whole XML documents
Storage of XML documents
  Application transfers complete document
  Need not be stored as xml datatype
  Decide if queried or modified as xml
Depending on doc size, may not perform well
  Large documents require more I/O (disk & network)
  Multiple rows require more I/O (disk & network)
Depending on datatype, may not validate

Classic Design Problems
Entity-Attribute-Value (EAV) designs
  Doesn’t solve problem; adds an option
  Adding attributes after release
Complete datasets as parameters
  Additional disconnect between layers
    “object-like” handling
    set-oriented methodology
  Requires strict data handling

EAV design
Database storage determined after implementation
Requires multiple subqueries to fetch data
  May cause performance problems
Brittle validity

EAV Design
IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.tables WHERE TABLE_NAME = 'emp_values')
  DROP TABLE emp_values;
IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.tables WHERE TABLE_NAME = 'emp')
  DROP TABLE emp;
CREATE TABLE emp (empno integer PRIMARY KEY);
CREATE TABLE emp_values (empno int REFERENCES emp, code varchar(20), value varchar(100));
INSERT INTO emp (empno) VALUES (1234);
INSERT INTO emp_values VALUES (1234, 'NAME', 'ANDREWS');
INSERT INTO emp_values VALUES (1234, 'SAL', '1000');
INSERT INTO emp_values VALUES (1234, 'JOB', 'CLERK');

EAV Design
SELECT * FROM dbo.emp_values ev
empno  code  value
1234   NAME  ANDREWS
1234   SAL   1000
1234   JOB   CLERK

EAV Design
--structure we want, version 1: one self-join per attribute
SELECT e.empno, NAME = ev1.value, SAL = ev2.value, JOB = ev3.value
FROM emp e
JOIN emp_values ev1 ON e.empno = ev1.empno AND ev1.code = 'NAME'
JOIN emp_values ev2 ON e.empno = ev2.empno AND ev2.code = 'SAL'
JOIN emp_values ev3 ON e.empno = ev3.empno AND ev3.code = 'JOB'
--structure we want, version 2: one correlated subquery per attribute
SELECT e.empno,
  NAME = (SELECT ev.value FROM emp_values ev WHERE ev.empno = e.empno AND ev.code = 'NAME'),
  SAL  = (SELECT ev.value FROM emp_values ev WHERE ev.empno = e.empno AND ev.code = 'SAL'),
  JOB  = (SELECT ev.value FROM emp_values ev WHERE ev.empno = e.empno AND ev.code = 'JOB')
FROM emp e

EAV design
XML doesn’t solve design issues, but does mitigate the cost
May cause performance issues
  XML indexes
Brittle validity
  Use XML Schema to validate
Still requires dynamic SQL to transform data to UI

EAV Design
DECLARE @emp TABLE (empno int, eav xml)
INSERT INTO @emp (empno, eav)
VALUES (1234, '<root> <NAME>Andrews</NAME> <JOB>Judge</JOB> <SAL>1000</SAL> </root>')
SELECT e.empno,
       x.value('local-name(.)', 'VARCHAR(20)') AS ElementName,
       x.value('.', 'VARCHAR(20)') AS ElementValue
FROM @emp e
CROSS APPLY eav.nodes('/*/*') y(x)
empno  ElementName  ElementValue
1234   NAME         Andrews
1234   JOB          Judge
1234   SAL          1000
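The two mitigations named for the XML-based EAV design, XML indexes and XML Schema validation, can be sketched together. Everything named here (the schema collection EmpAttributes, the table emp_xml, the index IX_emp_xml_eav) is hypothetical, and the schema is deliberately strict: a real EAV design would probably need a looser schema to keep its flexibility.

```sql
-- Sketch: a typed xml column validated by a schema collection, plus a primary XML index.
CREATE XML SCHEMA COLLECTION EmpAttributes AS
N'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="NAME" type="xs:string" />
        <xs:element name="JOB"  type="xs:string" />
        <xs:element name="SAL"  type="xs:decimal" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>';
GO
CREATE TABLE emp_xml (
    empno int NOT NULL PRIMARY KEY,              -- XML indexes require a clustered primary key
    eav   xml(DOCUMENT EmpAttributes) NOT NULL   -- typed xml: validated on insert and update
);
GO
CREATE PRIMARY XML INDEX IX_emp_xml_eav ON emp_xml (eav);
GO
-- Passes validation.
INSERT INTO emp_xml (empno, eav)
VALUES (1234, '<root><NAME>Andrews</NAME><JOB>Judge</JOB><SAL>1000</SAL></root>');

-- Would be rejected, because SAL is not a decimal:
-- INSERT INTO emp_xml (empno, eav)
-- VALUES (1235, '<root><NAME>Brown</NAME><JOB>Clerk</JOB><SAL>lots</SAL></root>');
```

The trade-off is the one the slide hints at: validation addresses the brittle-validity problem, but every new attribute now means altering the schema collection.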

Dataset Parameters
Complete & complex datasets
  “object-like” processing
  Single order with multiple line items
  Complex transfers between tables
Avoid row-by-row inserts
  Minimizes network latency
  Application sends over XML document
  Stored proc shreds & inserts

Dataset Parameters
<e EmployeeID="1" ContactID="1209">
  <c FirstName="Guy" LastName="Gilbert" />
</e>
<e EmployeeID="2" ContactID="1030">
  <c FirstName="Kevin" LastName="Brown" />
</e>
<e EmployeeID="3" ContactID="1002">
  <c FirstName="Roberto" LastName="Tamburello" />
</e>

Dataset Parameters
CREATE PROC EmployeePerson (@x XML) AS
-- Employee rows: one per <e> element
SELECT T.c.value('(@EmployeeID)[1]', 'int') AS EmployeeID,
       T.c.value('(@ContactID)[1]', 'int') AS ContactID
FROM @x.nodes('/e') T(c)
-- Contact rows: attributes of the nested <c> element
SELECT T.c.value('(@ContactID)[1]', 'int') AS ContactID,
       T.c.value('(c/@FirstName)[1]', 'varchar(25)') AS FirstName,
       T.c.value('(c/@LastName)[1]', 'varchar(25)') AS LastName
FROM @x.nodes('/e') T(c)

Dataset Parameters
EmployeeID  ContactID
1           1209
2           1030
3           1002
ContactID  FirstName  LastName
1209       Guy        Gilbert
1030       Kevin      Brown
1002       Roberto    Tamburello
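The procedure on the slides only selects the shredded rows; the "shreds & inserts" step it stands for could look roughly like this. The table variable @Employee is a stand-in for whatever target table an application would use, and the commented EXEC assumes the EmployeePerson procedure has been created.

```sql
-- Sketch: pass a whole dataset as one xml value, then insert it set-based.
DECLARE @doc xml;
SET @doc = N'<e EmployeeID="1" ContactID="1209"><c FirstName="Guy" LastName="Gilbert" /></e>
<e EmployeeID="2" ContactID="1030"><c FirstName="Kevin" LastName="Brown" /></e>';

-- EXEC EmployeePerson @x = @doc;   -- one round trip, if the procedure above exists

DECLARE @Employee TABLE (EmployeeID int, ContactID int);

-- One INSERT ... SELECT replaces a row-by-row loop of single-row inserts.
INSERT INTO @Employee (EmployeeID, ContactID)
SELECT T.c.value('(@EmployeeID)[1]', 'int'),
       T.c.value('(@ContactID)[1]',  'int')
FROM @doc.nodes('/e') T(c);

SELECT EmployeeID, ContactID FROM @Employee;
```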

Dataset Parameters
Maintains axioms of relational paradigm
  Security of stored procs
  Validity of expected inputs
Performance
  If used to minimize row-by-row processing, then yes
  Otherwise, no real impact

Resources; Questions? “It’s a miracle that curiosity survives formal education.” – Albert Einstein

Contact Information Stuart R Ainsworth stuart@codegumbo.com http://www.codegumbo.com http://www.twitter.com/stuarta

Resources
SQL Server 2008 Books Online
Pro T-SQL 2008 Programmer’s Guide, Coles, 2008, Apress
Professional SQL Server 2005 XML, Klein, 2006, Wrox