4/17/2015 DB2 9 – A DBA Guide to Native XML
Agenda and Purpose
Agenda:
› XML: it looks easy enough
› XML in DB2 9
› XML testing
› XML and performance
› Summary and questions
Purpose:
› Understand XML and related technology
› Understand the DB2 9 implementation of native XML
› Have an idea how it looks in the real world
› Know the performance and design considerations for XML
› Be able to discuss options for DB2 XML applications
Why Pure XML
› 85% of information is unstructured
› 50 separate systems and 2-3 ERP systems in the average company
› 30% of people's time is spent searching for relevant information
› 40 exabytes (4 x 10 to the 19th bytes) of unique information will be generated in 2007
Why XML
› 85% of information is unstructured
› XML can represent just about anything
› XML forces syntax-level interoperability
› Information outlives technology
XML Overview
› XML stands for Extensible Markup Language
› XML is a markup language much like HTML
› XML was designed to describe data
› XML tags are not predefined. You must define your own tags
› XML uses an XML Schema (XSD) to describe the data (DTD is the older technology)
› XML is a W3C Recommendation
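The point that XML tags are not predefined, while the syntax itself is strict, can be illustrated with a minimal Python sketch. The tag names `cd`, `title` and `artist` are assumptions chosen for illustration; the check below only tests well-formedness, not validity against a schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Tags are user-defined: any consistent names are acceptable.
good = "<cd><title>Empire Burlesque</title><artist>Bob Dylan</artist></cd>"
# Mismatched open/close tags break the syntax rules.
bad = "<cd><title>Empire Burlesque</artist></cd>"

print(is_well_formed(good))
print(is_well_formed(bad))
```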
XML Description - TAGS
Mike Sniezek
BMC Software
(713)
XML Validation
[Diagram: an XML document (Mike Sniezek, BMC Software, (713)) validated against an XML Schema (XSD)]
XML Transformation
› XSL consists of three parts:
  – XSLT - a language for transforming XML documents
  – XPath - a language for navigating in XML documents
  – XSL-FO - a language for formatting XML documents
[Diagram: XML document + XSL style sheet + parameters -> XSL transformation -> XML, text, SQL, XHTML, WML, HTML]
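To give a feel for XPath-style navigation, here is a small Python sketch using the limited XPath subset in the standard library's `xml.etree.ElementTree`. The root element `catalog` and the lowercase tag names are assumptions for illustration; a full XPath or XSLT processor supports far more than this subset.

```python
import xml.etree.ElementTree as ET

# Sample document; element names are illustrative assumptions.
CATALOG = (
    "<catalog>"
    "<cd><title>Empire Burlesque</title><artist>Bob Dylan</artist></cd>"
    "<cd><title>Hide your heart</title><artist>Bonnie Tyler</artist></cd>"
    "</catalog>"
)

root = ET.fromstring(CATALOG)

# Path step: every title under every cd.
all_titles = [t.text for t in root.findall("./cd/title")]

# Predicate: only cd elements whose artist child equals 'Bob Dylan'.
dylan_titles = [t.text for t in root.findall("./cd[artist='Bob Dylan']/title")]

print(all_titles)
print(dylan_titles)
```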
XML Data - Spreadsheet used for examples
XML Data from Spreadsheet
TITLE             ARTIST        COUNTRY  COMPANY
Empire Burlesque  Bob Dylan     USA      Columbia
Hide your heart   Bonnie Tyler  UK       CBS Records
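Turning spreadsheet rows into an XML document can be sketched in a few lines of Python. The root element name `CATALOG` and the helper `rows_to_xml` are hypothetical; the element names TITLE, ARTIST, COUNTRY and COMPANY follow the column names used elsewhere in this deck.

```python
import xml.etree.ElementTree as ET

# One dict per spreadsheet row.
ROWS = [
    {"TITLE": "Empire Burlesque", "ARTIST": "Bob Dylan",
     "COUNTRY": "USA", "COMPANY": "Columbia"},
    {"TITLE": "Hide your heart", "ARTIST": "Bonnie Tyler",
     "COUNTRY": "UK", "COMPANY": "CBS Records"},
]

def rows_to_xml(rows):
    """Build one CD element per row under a CATALOG root (assumed name)."""
    root = ET.Element("CATALOG")
    for row in rows:
        cd = ET.SubElement(root, "CD")
        for name, value in row.items():
            ET.SubElement(cd, name).text = value
    return ET.tostring(root, encoding="unicode")

xml_text = rows_to_xml(ROWS)
print(xml_text)
```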
XML Display with XSL
Style sheet (fragment): <xsl:stylesheet version="1.0" xmlns:xsl="
Column headings defined in the style sheet: Title, Artist
Rendered output:
Title             Artist
Empire Burlesque  Bob Dylan
Using XML the Big Picture Apples miracle
XML and DB2 v9
XML Data Type / Storage
CREATE TABLE FAVORITE_CDS
  (NAME   CHAR(20) NOT NULL,
   CDID   BIGINT,
   CDINFO XML);
[Diagram labels: Tablespace, Table, XML Tablespace, XML Table, XML column, DOCID, NODEID, XMLDATA, DOCID index, DOCID/NODEID index, XML user index]
The XML table space is:
› Partitioned by growth, if the base table space is not partitioned
› Partitioned by range, if the base table space is partitioned
SYSXMLRELS
SYSIBM.SYSXMLRELS
TBOWNER  TBNAME          COLNAME      XML TABLE
MKTMBS   CLIENTS         CONTACTINFO  XCLIENTS
MKTMBS   FAVORITE        CDINFO       XFAVORITE_CDS
MKTMBS   PURCHASEORDERS  XMLPO        XPURCHASEORDERS
[Diagram labels: Base Table, XML Table, XML column, DOCID, NODEID, XMLDATA]
SYSXMLSTRINGS
SELECT * FROM "SYSIBM".SYSXMLSTRINGS WHERE STRINGID > 1142

STRINGID  STRING
1143      TITLE
1144      ARTIST
1145      COMPANY
1146      YEAR
1147      CD
NUMBER OF ROWS SELECTED 5

[Diagram: the CD data (Empire Burlesque, Bob Dylan, USA, Columbia; Hide your heart, Bonnie Tyler, UK, CBS Records), with "Hide your heart" marked by string ID 1143 (TITLE)]
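The idea of replacing repeated tag names with numeric string IDs can be sketched in Python. This is only an illustration of dictionary encoding using the STRINGID values listed on the slide; it is not DB2's actual internal storage format, and the `encode` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Tag-name dictionary, using the STRINGID values shown in the query output.
STRING_IDS = {"TITLE": 1143, "ARTIST": 1144, "COMPANY": 1145,
              "YEAR": 1146, "CD": 1147}

def encode(elem):
    """Return (string_id, text, children) with each tag name replaced by its ID."""
    text = (elem.text or "").strip()
    return (STRING_IDS[elem.tag], text, [encode(child) for child in elem])

doc = ET.fromstring(
    "<CD><TITLE>Hide your heart</TITLE><ARTIST>Bonnie Tyler</ARTIST></CD>"
)
encoded = encode(doc)
print(encoded)
```

Because each distinct name is stored once, long or frequently repeated tag names cost only a small integer per occurrence in the stored document.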
Just going through the basics
Getting to DB
Commands entered:
connect to DEDK user MVSMXS1 using ********;
Output:
connect to DEDK user MVSMXS1 using
Database Connection Information
Database server        = DB2 OS/
SQL authorization ID   = MVSMXS1
Local database alias   = DEDK
A JDBC connection to the target has succeeded.
XML Data Type – Purchase Orders
Alice Smith, 123 Maple Street, Mill Valley, CA
Robert Smith, 8 Oak Avenue, Old Town, PA
Hurry, my lawn is going wild
Items:
› Lawnmower (Confirm this is electric)
› Baby Monitor
DDL - Nothing To Worry About
CREATE TABLESPACE relData
  PAGESIZE 4K MANAGED BY AUTOMATIC STORAGE BUFFERPOOL bp4k;

DROP TABLE PurchaseOrders;

CREATE TABLE PurchaseOrders
  (ponumber VARCHAR(10) NOT NULL,
   podate   DATE NOT NULL,
   status   CHAR(1),
   XMLpo    XML,
   PRIMARY KEY (ponumber));

CREATE TABLE PO LIKE PurchaseOrders;

CREATE VIEW ValidPurchaseOrders AS
  SELECT ponumber, podate, XMLpo
  FROM PurchaseOrders
  WHERE status = 'A';

ALTER TABLE PurchaseOrders ADD revisedXMLpo XML;
Manipulation
UPDATE PurchaseOrders
  SET XMLpo = :XMLpo_revised
  WHERE ponumber = '12345';

INSERT INTO PurchaseOrders
  VALUES (' ', CURRENT DATE, 'A', :xmlPo);

INSERT INTO PurchaseOrders
  VALUES (' ', CURRENT DATE, 'A',
          XMLPARSE(DOCUMENT :vchar PRESERVE WHITESPACE));

INSERT INTO PurchaseOrders
  VALUES (' ', CURRENT DATE, 'A',
          DSN_XMLValidate(:lobPo, 'SYSXSR.myPOSchema'));

DELETE FROM PurchaseOrders
  WHERE ponumber = '12345';
Retrieval
SELECT XMLpo INTO :xmlPo
  FROM PurchaseOrders
  WHERE ponumber = ' ';

SELECT XMLpo
  FROM PurchaseOrders
  WHERE XMLEXISTS('//items/item[productName = "Baby Monitor"]' PASSING XMLpo);

SELECT XMLQUERY('//items/item/quantity' PASSING XMLpo)
  FROM PurchaseOrders
  WHERE …;
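The difference between the two XML functions above (XMLEXISTS filters rows, XMLQUERY extracts parts of a document) can be mimicked in a small Python sketch. The order numbers `PO-1` and `PO-2`, the quantity values, and the helpers `xml_exists` and `xml_query` are all hypothetical; `ElementTree` supports only a subset of XPath, so the paths are written relative to the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the PurchaseOrders table: ponumber -> XMLpo.
ROWS = {
    "PO-1": "<purchaseOrder><items>"
            "<item><productName>Lawnmower</productName><quantity>1</quantity></item>"
            "<item><productName>Baby Monitor</productName><quantity>1</quantity></item>"
            "</items></purchaseOrder>",
    "PO-2": "<purchaseOrder><items>"
            "<item><productName>Lawnmower</productName><quantity>2</quantity></item>"
            "</items></purchaseOrder>",
}

def xml_exists(doc, path):
    """Row filter: True if at least one node matches the path."""
    return ET.fromstring(doc).find(path) is not None

def xml_query(doc, path):
    """Extraction: return the text of every node matching the path."""
    return [node.text for node in ET.fromstring(doc).findall(path)]

# Which orders contain a Baby Monitor item?
matching = [po for po, doc in ROWS.items()
            if xml_exists(doc, ".//items/item[productName='Baby Monitor']")]

# Pull only the quantities out of one order (partial document retrieval).
quantities = xml_query(ROWS["PO-1"], ".//items/item/quantity")

print(matching)
print(quantities)
```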
Indexes
CREATE INDEX ON PurchaseOrders(XMLPO)
  GENERATE KEYS USING XMLPATTERN '/purchaseOrder/items/item/productName'
  AS SQL VARCHAR(100);

The index will be used for this query:
SELECT XMLPO
  FROM PurchaseOrders
  WHERE XMLEXISTS('/purchaseOrder/items/item[productName = "Lawnmower"]' PASSING XMLPO);
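Conceptually, an XML pattern index stores the values found at one path together with the documents that contain them, so a lookup by value does not have to parse every document. A minimal Python sketch of that idea follows; it is not how DB2 implements XML indexes, and the order numbers and the `build_pattern_index` helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical stand-in for the PurchaseOrders table: ponumber -> XMLPO.
ROWS = {
    "PO-1": "<purchaseOrder><items>"
            "<item><productName>Lawnmower</productName></item>"
            "<item><productName>Baby Monitor</productName></item>"
            "</items></purchaseOrder>",
    "PO-2": "<purchaseOrder><items>"
            "<item><productName>Lawnmower</productName></item>"
            "</items></purchaseOrder>",
}

def build_pattern_index(rows, pattern):
    """Map each value found at `pattern` to the set of row keys containing it."""
    index = defaultdict(set)
    for key, doc in rows.items():
        for node in ET.fromstring(doc).findall(pattern):
            if node.text is not None:
                index[node.text].add(key)
    return index

# ElementTree paths are relative to the root element (purchaseOrder here),
# so this corresponds to /purchaseOrder/items/item/productName.
index = build_pattern_index(ROWS, "./items/item/productName")

print(sorted(index["Lawnmower"]))
print(sorted(index["Baby Monitor"]))
```

The index only helps predicates on the indexed path; a query on a different path would still have to scan the documents.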
Validation
XML Schema Support
› DB2 requires an SQL identifier to identify each registered schema.
REGISTER XMLSCHEMA FROM file://C:/xmlschema/order.xsd AS ORDERSCHEMA COMPLETE [ENABLE DECOMPOSITION];
REMOVE XMLSCHEMA ORDERSCHEMA;
XML Performance
How are you going to use the XML?
INSERT Performance
› The obvious: the larger and more complex the XML column, the more expensive the insert.
› The more indexes, the more overhead.
› INSERT with validation at least doubles the overhead.
› Use host variables rather than literals.
› Use LOAD instead of SQL INSERT (30 to 40 percent).
Update
› When updating an XML document, an SQL UPDATE statement is equivalent to performing an SQL DELETE and INSERT, and the performance will be about the same.
› If this is going to happen a lot, consider splitting the XML into smaller pieces.
Select Performance
› For SQL itself there is no real change: the size and complexity of the XML document determine the overhead. Well-coded SQL, proper indexes, and good physical design still make the big difference.
› XML indexes are different, and good ones make a great deal of difference.
CREATE INDEX ON PurchaseOrders(XMLPO)
  GENERATE KEYS USING XMLPATTERN '/purchaseOrder/items/item/productName'
  AS SQL VARCHAR(100);
Data Considerations
CREATE TABLE CD_CATALOG
  (TITLE   VARCHAR(30),
   ARTIST  VARCHAR(30),
   COUNTRY VARCHAR(25),
   COMPANY VARCHAR(25),
   PRICE   DECIMAL(5,2),
   YEAR    SMALLINT);

CREATE TABLE CD_CATALOG
  (PID         VARCHAR(10) NOT NULL,
   OWNER       VARCHAR(30),
   DESCRIPTION XML);

CREATE TABLE CD_CATALOG
  (NAME   CHAR(20) NOT NULL,
   CDID   BIGINT,
   CDINFO XML,
   PRIMARY KEY (NAME));

Native XML is good when you need:
› Schema flexibility
› Search performance
› Partial document retrieval
Summary
› XML is the current standard for sharing data
› Database Administrators should understand how their organization is exploiting XML, or will be exploiting it
› You should be familiar with how DB2 9 stores and catalogs the XML data type
› You don't have to be an SQL guru, but be aware of the performance characteristics
› Even if your organization has no immediate plans to use the XML data type, your understanding may be key to the future of new applications.
  – Try it!
› If you have questions