Published by Harry Charles. Modified over 9 years ago.
1
Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com
Native XML Databases
Ronald Bourret
rpbourret@rpbourret.com
http://www.rpbourret.com
2
Overview
What is a native XML database?
Use cases
Code examples
Implementation and performance
Normalization and referential integrity
Common database features
3
What is a Native XML Database?
4
Storing XML in a database
1. Build a model
    » Model the data in the XML document, or...
    » Model the XML document itself
2. Map the model to the database
3. Transfer data according to the model
5
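Sample XML document
    <Order>
        <Number>1234</Number>
        <Customer>Gallagher Co.</Customer>
        <Date>29.10.00</Date>
        <Item Number="1">
            <Part>A-10</Part>
            <Quantity>12</Quantity>
            <Price>10.95</Price>
        </Item>
        <Item Number="2">
            <Part>B-43</Part>
            <Quantity>600</Quantity>
            <Price>3.99</Price>
        </Item>
    </Order>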
6
Modeling data
Objects in model are specific to XML schema
    object Order {
        number=1234;
        customer="Gallagher Co.";
        date=29.10.00;
        items={ptrs to Items};
    }
    object Item {
        number=1;
        part="A-10";
        quantity=12;
        price=10.95;
    }
    object Item {
        number=2;
        part="B-43";
        quantity=600;
        price=3.99;
    }
7
Storing data
Database schema specific to XML schema

Orders
    Number  Customer       Date
    1234    Gallagher Co.  291000
    ...     ...            ...

Items
    SONum  Item  Part  Qty  Price
    1234   1     A-10  12   10.95
    1234   2     B-43  600  3.99
    ...    ...   ...   ...  ...
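The data-centric mapping above can be sketched in a few lines of Java. This is only an illustration, not the deck's own code: the class names (OrderMapper, Order, Item) are hypothetical, and the element names are assumed from the document-model slides. Each Order object corresponds to one Orders row and each Item object to one Items row.

```java
import java.io.StringReader;
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical data-centric mapping: one class per table, one object per row. */
final class OrderMapper {

    /** One row of the Items table. */
    static final class Item {
        final int number;
        final String part;
        final int quantity;
        final BigDecimal price;

        Item(int number, String part, int quantity, BigDecimal price) {
            this.number = number;
            this.part = part;
            this.quantity = quantity;
            this.price = price;
        }
    }

    /** One row of the Orders table plus the Items rows that point to it. */
    static final class Order {
        final String number;
        final String customer;
        final String date;
        final List<Item> items = new ArrayList<>();

        Order(String number, String customer, String date) {
            this.number = number;
            this.customer = customer;
            this.date = date;
        }
    }

    // Sample order; element names are an assumption based on the model slides.
    static final String SAMPLE =
        "<Order><Number>1234</Number><Customer>Gallagher Co.</Customer>"
        + "<Date>29.10.00</Date>"
        + "<Item Number=\"1\"><Part>A-10</Part><Quantity>12</Quantity><Price>10.95</Price></Item>"
        + "<Item Number=\"2\"><Part>B-43</Part><Quantity>600</Quantity><Price>3.99</Price></Item>"
        + "</Order>";

    /** Returns the text of the first direct child element with the given name. */
    private static String childText(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && n.getNodeName().equals(name)) {
                return n.getTextContent();
            }
        }
        throw new IllegalArgumentException("missing child element: " + name);
    }

    /** Transfers data from the document into schema-specific objects. */
    static Order map(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse order document", e);
        }
        Element root = doc.getDocumentElement();
        Order order = new Order(childText(root, "Number"),
                childText(root, "Customer"), childText(root, "Date"));
        NodeList items = root.getElementsByTagName("Item");
        for (int i = 0; i < items.getLength(); i++) {
            Element e = (Element) items.item(i);
            order.items.add(new Item(
                    Integer.parseInt(e.getAttribute("Number")),
                    childText(e, "Part"),
                    Integer.parseInt(childText(e, "Quantity")),
                    new BigDecimal(childText(e, "Price"))));
        }
        return order;
    }

    public static void main(String[] args) {
        Order o = map(SAMPLE);
        System.out.println("Orders: " + o.number + ", " + o.customer + ", " + o.date);
        for (Item i : o.items) {
            System.out.println("Items:  " + o.number + ", " + i.number + ", "
                    + i.part + ", " + i.quantity + ", " + i.price);
        }
    }
}
```

Note that the mapping code only works for this one schema; a different document type would need different classes and tables.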
8
Modeling documents
Objects in model independent of XML schema
    Element (Order)
        Element (Number)
            Text: 1234
        Element (Customer)
            Text: Gallagher Co.
        Element (Date)
            Text: 29.10.00
        Element (Item)
            Attr (Number)
                Text: 1
            Element (Part)
                Text: A-10
            Element (Quantity)
                Text: 12
            Element (Price)
                Text: 10.95
        Element (Item)
            ...
9
Storing documents
Database schema independent of XML schema (order columns not shown)

Elements
    ID  Name      Parent
    1   Order     --
    2   Number    1
    3   Customer  1
    4   Date      1
    5   Item      1
    6   Item      1
    7   Part      5
    8   Quantity  5
    9   Price     5
    10  Part      6
    11  Quantity  6
    12  Price     6

Attributes
    ID  Name    Parent
    13  Number  5
    14  Number  6

Text
    Parent  Value
    2       1234
    3       Gallagher Co.
    4       29.10.00
    7       A-10
    8       12
    9       10.95
    10      B-43
    11      600
    12      3.99
    13      1
    14      2
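A minimal sketch of how a document could be shredded into generic Elements, Attributes, and Text rows like the ones above. The class name DocumentShredder and the "id|name|parent" row format are assumptions made for illustration. Elements are numbered breadth-first and attributes afterwards, which happens to reproduce the IDs shown on the slide; the order columns are left out here too, so sibling order is kept only by list position.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical shredder: works for any well-formed document, whatever its schema. */
final class DocumentShredder {
    final List<String> elements = new ArrayList<>();   // "id|name|parentId"
    final List<String> attributes = new ArrayList<>(); // "id|name|parentId"
    final List<String> text = new ArrayList<>();       // "parentId|value"

    static final String SAMPLE =
        "<Order><Number>1234</Number><Customer>Gallagher Co.</Customer>"
        + "<Date>29.10.00</Date>"
        + "<Item Number=\"1\"><Part>A-10</Part><Quantity>12</Quantity><Price>10.95</Price></Item>"
        + "<Item Number=\"2\"><Part>B-43</Part><Quantity>600</Quantity><Price>3.99</Price></Item>"
        + "</Order>";

    static DocumentShredder shred(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
        DocumentShredder out = new DocumentShredder();
        Map<Node, Integer> ids = new IdentityHashMap<>();
        List<Element> visited = new ArrayList<>();
        Deque<Element> queue = new ArrayDeque<>();
        int nextId = 1;

        // Pass 1: number the elements breadth-first and record their parents.
        Element root = doc.getDocumentElement();
        ids.put(root, nextId);
        out.elements.add(nextId + "|" + root.getNodeName() + "|--");
        nextId++;
        queue.add(root);
        while (!queue.isEmpty()) {
            Element parent = queue.remove();
            visited.add(parent);
            for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c instanceof Element) {
                    ids.put(c, nextId);
                    out.elements.add(nextId + "|" + c.getNodeName() + "|" + ids.get(parent));
                    nextId++;
                    queue.add((Element) c);
                }
            }
        }

        // Pass 2: record non-blank text under the ID of its parent element.
        for (Element e : visited) {
            for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c.getNodeType() == Node.TEXT_NODE && !c.getNodeValue().trim().isEmpty()) {
                    out.text.add(ids.get(e) + "|" + c.getNodeValue().trim());
                }
            }
        }

        // Pass 3: attributes get their own IDs; their values go into the text table.
        for (Element e : visited) {
            NamedNodeMap attrs = e.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                out.attributes.add(nextId + "|" + a.getNodeName() + "|" + ids.get(e));
                out.text.add(nextId + "|" + a.getNodeValue());
                nextId++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        DocumentShredder rows = shred(SAMPLE);
        rows.elements.forEach(System.out::println);
        rows.attributes.forEach(System.out::println);
        rows.text.forEach(System.out::println);
    }
}
```

Unlike a data-centric mapping, nothing in this code mentions Order or Item, so the same three tables can hold documents of any type.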
10
Types of databases (take 1)
Categorize databases based on what they model
Model data => XML-enabled database
Model document => Native XML database
11
Types of databases (take 2)
XML adds a new data model to the world
    » In addition to relational, hierarchical, OO,...
The “XML” data model is
    » A tree of ordered nodes
    » Nodes have different types (element, attribute, etc.)
    » Some nodes are labeled (Date, Quantity, Price, etc.)
    » Data stored in leaf nodes
Modeling language is an XML schema language
    » DTD, XML Schemas, RELAX NG, etc.
12
Types of databases (take 2)
XML-enabled databases
    » Have their own data model (relational, OO, etc.)
    » Map their data model to XML data model
Native XML databases
    » Use XML data model directly
13
Formal definition¹
A native XML database:
    » Defines an XML data model
    » Uses a document as its fundamental unit of (logical) storage
    » Can have any physical storage
¹ Definition from XML:DB Initiative
14
XML data model
XML 1.0 did not define a model
Native XML databases define their own model
    » Model must include elements, attributes, text, and document order
    » Examples are XPath, Info Set, DOM, and SAX
    » XQuery model will be de facto standard in future?
Data transferred according to the model
15
Fundamental unit of storage
Document is fundamental unit of logical storage
    » Equivalent structure in RDBMS is a row
Document usually contains single set of data
16
Physical storage
Can use any physical storage
Examples
    » Tables for SAX “objects” in relational database
    » DOM objects in object-oriented database
    » Binary file format optimized for XPath data model
    » Hashtables of XPaths and values
    » Compressed, indexed XML documents in file system
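As an illustration of one of the storage options listed, a table of XPaths and values, here is a rough Java sketch. The class name XPathTable and the always-indexed path format (for example /Order[1]/Item[2]/Part[1]) are assumptions; the sketch ignores mixed content, comments, and namespaces, which a real store would have to handle.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical "hashtable of XPaths and values" representation of a document. */
final class XPathTable {

    static final String SAMPLE =
        "<Order><Number>1234</Number><Customer>Gallagher Co.</Customer>"
        + "<Date>29.10.00</Date>"
        + "<Item Number=\"1\"><Part>A-10</Part><Quantity>12</Quantity><Price>10.95</Price></Item>"
        + "<Item Number=\"2\"><Part>B-43</Part><Quantity>600</Quantity><Price>3.99</Price></Item>"
        + "</Order>";

    /** Flattens a document into (absolute XPath, value) pairs in document order. */
    static Map<String, String> flatten(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
        Map<String, String> out = new LinkedHashMap<>();
        Element root = doc.getDocumentElement();
        walk(root, "/" + root.getNodeName() + "[1]", out);
        return out;
    }

    private static void walk(Element e, String path, Map<String, String> out) {
        // Attributes are stored under path/@name.
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            out.put(path + "/@" + a.getNodeName(), a.getNodeValue());
        }
        // Child elements get a positional predicate so repeated names stay distinct.
        Map<String, Integer> counts = new HashMap<>();
        boolean hasChildElement = false;
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) {
                hasChildElement = true;
                int position = counts.merge(c.getNodeName(), 1, Integer::sum);
                walk((Element) c, path + "/" + c.getNodeName() + "[" + position + "]", out);
            }
        }
        // Only leaf elements carry a value in this simplified model.
        if (!hasChildElement) {
            out.put(path, e.getTextContent());
        }
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> row : flatten(SAMPLE).entrySet()) {
            System.out.println(row.getKey() + " = " + row.getValue());
        }
    }
}
```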
17
Use Cases
18
Use case: Tabular data
Manufacturing data
Accounting data
Employee data
Scientific data
...
19
Use a native XML DB only if:
You want to lose your job.
Relational databases excel at storing tabular data
Systems are mature, fast
Few reasons to change existing code or DBMS
20
Use case: Documents
Product documentation
Static Web pages
Presentations
Advertisements
...
21
Use a native XML DB because:
Structure too irregular for relational model
Physical info (entities, CDATA,...) is important
Need document-centric queries
    » Get first chapter with a paragraph containing “XML”
    » Get headings of all chapters that contain figures
Want to retain document identity
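The two document-centric queries named above can be written as XPath 1.0 expressions. The sketch below runs them with the JDK's javax.xml.xpath API; the Book/Chapter/Heading/Para/Figure vocabulary and the class name ChapterQueries are invented for the example, not taken from the deck.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical document-centric queries over an invented Book vocabulary. */
final class ChapterQueries {

    static final String BOOK =
        "<Book>"
        + "<Chapter><Heading>Introduction</Heading><Para>Databases store data.</Para></Chapter>"
        + "<Chapter><Heading>Storage</Heading><Para>XML documents can be stored as text.</Para>"
        + "<Figure/></Chapter>"
        + "<Chapter><Heading>Queries</Heading><Para>Query languages for XML.</Para></Chapter>"
        + "<Chapter><Heading>Indexes</Heading><Para>Indexes speed up searches.</Para>"
        + "<Section><Figure/></Section></Chapter>"
        + "</Book>";

    // Heading of the first chapter with a paragraph containing "XML".
    // The parentheses make [1] apply to the whole node-set, in document order.
    static final String FIRST_CHAPTER_MENTIONING_XML =
        "(/Book/Chapter[Para[contains(., 'XML')]])[1]/Heading";

    // Headings of all chapters that contain a figure at any depth.
    static final String HEADINGS_OF_CHAPTERS_WITH_FIGURES =
        "/Book/Chapter[.//Figure]/Heading";

    /** Evaluates the expression and returns the text of each selected node. */
    static List<String> select(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("query failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(BOOK, FIRST_CHAPTER_MENTIONING_XML));
        System.out.println(select(BOOK, HEADINGS_OF_CHAPTERS_WITH_FIGURES));
    }
}
```

Both queries depend on document order and on nesting depth, which is the kind of information a schema-specific relational mapping tends to discard.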
22
Use case: Semi-structured data
Some structure, but not regular
    » Structured data: white pages
    » Semi-structured data: yellow pages
Examples
    » Health data
    » Result of EAI (Enterprise Application Integration)
    » Genealogical data
    » ...
23
Use a native XML DB because:
Difficult to store in a relational database
    » Use single, sparsely populated table or...
    » Many tables (one per subject area)
Documents might contain arbitrary extensions
24
Use case: “Natural” format is XML
Only format is XML
    » XSLT stylesheets
Data stored temporarily as XML
    » Long-running transactions
    » Enterprise application integration (EAI)
    » Documents in a message queue
Schema-less documents
    » No schema or schema not known
25
Use a native XML DB because:
Want query language, security, transactions, etc.
No reason to use an XML-enabled database
    » Natural format is XML
    » Mapping arbitrary documents at run-time is inefficient, error-prone
26
Use case: Large documents
Need entire document in memory
    » DOM tree
    » XSLT transformation
    » etc.
Document too large for memory
27
Use a native XML DB because:
In-memory tree lazily instantiated
    » Unused nodes kept on disk
Tree size limited only by:
    » Disk size
    » Database address space
28
Use case: Archiving
Long-term document storage
Often required by law
    » Pharmaceutical industry
    » Contracts
    » Financial transactions
29
Use a native XML DB because:
Want query language, security, transactions, etc.
    » Better than file system or BLOBs
Retains complete document
    » Retaining original document might be a problem
May support versioning through updates or diffs
30
Use case: Performance
We all want better performance, right?
31
Use a native XML DB because:
May be faster than XML-enabled databases
Speed depends on:
    » Actual query
    » Query and storage engines
    » Output format (DOM, SAX, string)
32
Code Examples
33
XML:DB API
Database-neutral API from XML:DB Initiative
Similar to JDBC, OLE DB, etc.
    » Implemented by database drivers
    » Methods to connect to database, execute queries, etc.
    » Uses separate query language(s)
34
JDBC example: INSERT
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    // Instantiate the Sun JDBC-ODBC bridge driver. The driver registers
    // itself with the driver manager.
    Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");

    // Get a connection to the Orders database and turn auto-commit off.
    Connection conn = DriverManager.getConnection("jdbc:odbc:Orders",
                                                  "rpbourret", "my_password");
    conn.setAutoCommit(false);

    // Get a Statement.
    Statement s = conn.createStatement();
35
JDBC example: INSERT (cont.)
    // Execute INSERT statements to insert new rows into the Orders
    // and Items tables. Items has five columns: SONum, Item, Part,
    // Qty, and Price.
    s.executeUpdate("INSERT INTO Orders " +
                    "VALUES ('1234', 'Gallagher Co.', #10/29/00#)");
    s.executeUpdate("INSERT INTO Items " +
                    "VALUES ('1234', 1, 'A-10', 12, 10.95)");
    s.executeUpdate("INSERT INTO Items " +
                    "VALUES ('1234', 2, 'B-43', 600, 3.99)");

    // Commit the transaction. Close the statement and the connection.
    conn.commit();
    s.close();
    conn.close();
36
XML:DB example: INSERT
    import org.xmldb.api.DatabaseManager;
    import org.xmldb.api.base.Collection;
    import org.xmldb.api.base.Database;
    import org.xmldb.api.modules.TransactionService;
    import org.xmldb.api.modules.XMLResource;

    // Get the database (driver) for dbXML/Xindice. Register it
    // explicitly with the database manager.
    Class<?> c = Class.forName("org.dbxml.client.xmldb.DatabaseImpl");
    Database db = (Database) c.newInstance();
    DatabaseManager.registerDatabase(db);

    // Get the Orders collection.
    Collection coll = DatabaseManager.getCollection("xmldb:dbxml:///db/Orders",
                                                    "rpbourret", "my_password");

    // Get a transaction service. XML:DB is designed to use "services",
    // such as transaction, collection management, and query services.
    TransactionService txn =
        (TransactionService) coll.getService("TransactionService", "1.0");
37
XML:DB example: INSERT (cont.)
    // Start a new transaction.
    txn.begin();

    // Create a new (empty) XMLResource in the Orders collection.
    XMLResource res =
        (XMLResource) coll.createResource("Order1234", "XMLResource");

    // Read the contents of the Order1234.xml file into a string,
    // one character at a time, then close the file.
    int b;
    FileReader r = new FileReader("Order1234.xml");
    StringWriter w = new StringWriter();
    while ((b = r.read()) != -1) {
        w.write(b);
    }
    r.close();
38
XML:DB example: INSERT (cont.)
    // Set the contents of the new resource to the string.
    res.setContent(w.toString());

    // Store the resource in the collection. Without this call the
    // resource exists only in memory.
    coll.storeResource(res);

    // Commit the transaction.
    txn.commit();

    // Close the collection and deregister the database.
    coll.close();
    DatabaseManager.deregisterDatabase(db);
39
JDBC example: SELECT
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Instantiate the Sun JDBC-ODBC bridge driver. The driver registers
    // itself with the driver manager.
    Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");

    // Get a connection to the Orders database.
    Connection conn = DriverManager.getConnection("jdbc:odbc:Orders",
                                                  "rpbourret", "my_password");

    // Execute a SELECT statement to get all orders from 10/29/00.
    Statement s = conn.createStatement();
    ResultSet rs = s.executeQuery("SELECT Number, Date, Customer " +
                                  "FROM Orders WHERE Date=#10/29/00#");
40
JDBC example: SELECT (cont.)
    // Get each row in the result set and print it. The column indexes
    // follow the SELECT list: 1 = Number, 2 = Date, 3 = Customer.
    while (rs.next()) {
        System.out.println("Order: " + rs.getString(1));
        System.out.println("Customer: " + rs.getString(3));
        System.out.println("Date: " + rs.getDate(2));
        System.out.println();
    }

    // Close the result set, the statement, and the connection.
    rs.close();
    s.close();
    conn.close();
41
XML:DB example: SELECT
    import org.xmldb.api.DatabaseManager;
    import org.xmldb.api.base.Collection;
    import org.xmldb.api.base.Database;
    import org.xmldb.api.base.ResourceIterator;
    import org.xmldb.api.base.ResourceSet;
    import org.xmldb.api.modules.XMLResource;
    import org.xmldb.api.modules.XPathQueryService;

    // Get the database (driver) for dbXML/Xindice. Register it
    // explicitly with the database manager.
    Class<?> c = Class.forName("org.dbxml.client.xmldb.DatabaseImpl");
    Database db = (Database) c.newInstance();
    DatabaseManager.registerDatabase(db);

    // Get the Orders collection.
    Collection coll = DatabaseManager.getCollection("xmldb:dbxml:///db/Orders",
                                                    "rpbourret", "my_password");

    // Get an XPath query service over the Orders collection. XML:DB
    // is designed to use a variety of "services", such as different
    // query engines.
    XPathQueryService service =
        (XPathQueryService) coll.getService("XPathQueryService", "1.0");
42
XML:DB example: SELECT (cont.)
    // Execute an XPath query to get all orders from 29.10.00.
    ResourceSet docs = service.query("/Order[Date=\"29.10.00\"]");

    // Iterate over the returned documents and print each one. We do
    // not need to know any metadata (such as data types) here. This
    // is because metadata is contained in the XML document itself.
    ResourceIterator iterator = docs.getIterator();
    while (iterator.hasMoreResources()) {
        XMLResource doc = (XMLResource) iterator.nextResource();
        System.out.println((String) doc.getContent());
        System.out.println();
    }

    // Close the collection and deregister the database.
    coll.close();
    DatabaseManager.deregisterDatabase(db);
43
Implementation and Performance
44
Text-based storage
Stores documents as text
Uses file system, CLOB, etc.
    » Includes XML-aware text in RDBMS
May need to parse documents at run time
Uses indexes to avoid extra parsing, increase search speed
45
Text-based storage
[Figure: an address document stored as text; the markup was lost in extraction. Values: 123 Main St., Chicago, IL, 60609, USA]
46
Text-based databases
Indexed files
    » TextML
CLOBs
    » Oracle 9i release 2, DB2
47
Model-based storage
Stores documents in “object” form
Documents parsed when inserted
For example, store DOM objects in OODBMS
Underlying storage can be relational, object-oriented, hierarchical, proprietary
Uses indexes to speed searches
48
Model-based storage
[Figure: the same address document (123 Main St., Chicago, IL, 60609, USA) stored as a tree: one parent Element with five child Elements, each holding one Text node]
49
Model-based databases
Proprietary
    » Tamino, Xindice, Neocore, Ipedo, XStream DB, XYZFind, Infonyte, Virtuoso, Coherity, Luci, TeraText, Sekaiju, Cerisent, DOM-Safe, XDBM,...
Relational
    » Xfinity, eXist, Sybase, DBDOM
Object-oriented
    » eXcelon, X-Hive, Ozone/Prowler, 4Suite, Birdstep
50
Performance
No public benchmark data?
Native vendors say native is faster
Relational vendors say relational is faster
Native XML databases rely heavily on indexing
    » Slows update times
Some guesses are possible
51
Whole documents and fragments (Text-based databases)
Should be very fast
    » Data is contiguous on disk
    » Retrieval requires index lookup and single disk read
Retrieval steps:
1. Index lookup
2. Position disk head
3. Read to end of document or fragment
52
Whole documents and fragments (Model-based databases)
Databases with proprietary stores should be fast
    » Can use physical pointers between nodes
Databases built on other DBs may be fast or slow
    » Depends on underlying database and implementation
Retrieval steps:
1. Index lookup
2. Position disk head
3. Follow pointers to end
53
Unindexed data
Slow for model-based databases
    » Must read many elements, not just particular type
    » Comparisons may be slower due to converting text
Very slow for text-based databases
    » Must parse document as well as comparing values
Task: Find date 29.10.00
Relational database:
1. Search this column (the date column of the Orders table)
Model-based native XML database:
1. Search all elements for Date elements
2. Search text for all Date elements
[Figure: an Orders table row (1234, 29.10.00, Gallagher Industries) next to a tree of Element, Attr, and Text nodes]
54
Unindexed data: Optimization
/Order[Date="29.10.00"]
    » Only need to search children of Order
    » Schema may help locate Date element
//Order[Date="29.10.00"]
    » Schema may help locate Order and Date elements
    » Without schema, must search entire document
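The difference in search scope between the two paths can be seen with the JDK's XPath engine. In this sketch (class name PathScope and the Archive/Year wrapper elements are made up for the example), /Order only ever inspects the root element and its Date children, while //Order has to consider every element in the document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical comparison of /Order[...] and //Order[...]. */
final class PathScope {

    // Order is the root element.
    static final String ROOT_ORDER =
        "<Order><Date>29.10.00</Date></Order>";

    // Orders are nested at different depths below an invented wrapper.
    static final String WRAPPED_ORDERS =
        "<Archive>"
        + "<Year><Order><Date>29.10.00</Date></Order>"
        + "<Order><Date>30.10.00</Date></Order></Year>"
        + "<Order><Date>29.10.00</Date></Order>"
        + "</Archive>";

    /** Counts the nodes selected by the given path expression. */
    static int count(String xml, String path) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Double n = (Double) XPathFactory.newInstance().newXPath()
                    .evaluate("count(" + path + ")", doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalArgumentException("query failed: " + path, e);
        }
    }

    public static void main(String[] args) {
        String rooted = "/Order[Date='29.10.00']";
        String anywhere = "//Order[Date='29.10.00']";
        System.out.println(count(ROOT_ORDER, rooted));       // root is an Order
        System.out.println(count(WRAPPED_ORDERS, rooted));   // root is not an Order
        System.out.println(count(WRAPPED_ORDERS, anywhere)); // searches all depths
    }
}
```

A schema that says where Order elements may occur would let an engine skip most of that descendant search; without one, the whole tree is a candidate.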
55
Query return types
Text-based databases
    » Very fast returning strings
    » Slow returning DOM or SAX due to parsing
Model-based databases
    » Probably similar to XML-enabled databases
56
Queries not following storage hierarchy
The $64,000 question...
Slower than hierarchical queries
Pointers to parents may help model-based DBs
Example: Get the dates and numbers of all sales orders for part “A-10”
1. Index lookup for part “A-10”
2. Follow pointers or index to Order
3. Search children for Number, Date
[Figure: the sample order document (1234, Gallagher Co., 29.10.00; items A-10 and B-43) and a partial result beginning <Part Number="A-10" with 1234 and 29.10.00...]
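The three steps above can be mimicked on in-memory DOM trees, where getParentNode() plays the role of the parent pointer. This is a sketch under assumptions: the class name PartLookup is hypothetical, the second order document is invented, and a plain //Part scan stands in for the index lookup a real database would use.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical "against the hierarchy" query: from Part up to its Order. */
final class PartLookup {

    static final String[] ORDERS = {
        "<Order><Number>1234</Number><Customer>Gallagher Co.</Customer>"
        + "<Date>29.10.00</Date>"
        + "<Item Number=\"1\"><Part>A-10</Part><Quantity>12</Quantity><Price>10.95</Price></Item>"
        + "<Item Number=\"2\"><Part>B-43</Part><Quantity>600</Quantity><Price>3.99</Price></Item>"
        + "</Order>",
        // Invented second order, used only to show a non-matching document.
        "<Order><Number>1235</Number><Customer>Gallagher Co.</Customer>"
        + "<Date>30.10.00</Date>"
        + "<Item Number=\"1\"><Part>B-43</Part><Quantity>5</Quantity><Price>3.99</Price></Item>"
        + "</Order>"
    };

    /** Returns the text of the first direct child element with the given name. */
    private static String childText(Node parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && n.getNodeName().equals(name)) {
                return n.getTextContent();
            }
        }
        throw new IllegalArgumentException("missing child element: " + name);
    }

    /** Returns "number date" for every Item that uses the given part. */
    static List<String> ordersForPart(String[] docs, String part) {
        List<String> result = new ArrayList<>();
        try {
            XPath xp = XPathFactory.newInstance().newXPath();
            for (String xml : docs) {
                Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                        .parse(new InputSource(new StringReader(xml)));
                // Step 1: locate Part elements (a real database would use an index).
                NodeList parts = (NodeList) xp.evaluate("//Part", doc, XPathConstants.NODESET);
                for (int i = 0; i < parts.getLength(); i++) {
                    Node p = parts.item(i);
                    if (!p.getTextContent().equals(part)) {
                        continue;
                    }
                    // Step 2: follow parent pointers Part -> Item -> Order.
                    Node order = p.getParentNode().getParentNode();
                    // Step 3: search the Order's children for Number and Date.
                    result.add(childText(order, "Number") + " " + childText(order, "Date"));
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("lookup failed", e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(ordersForPart(ORDERS, "A-10"));
    }
}
```

An order that lists the same part in two items would appear twice in this sketch; a real query would deduplicate.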
57
Normalization and Referential Integrity
58
Normalization
Defined with respect to the relational model
Performed on schema
Determines minimal logical structures
Eliminates duplicate data
59
Example: Normalization
Unnormalized:
    ORDERS: OrderNum, Date, CustNum, Address, ItemNum, Qty, PartNum, Price
Normalized:
    ORDERS: OrderNum, Date, CustNum
    CUSTOMERS: CustNum, Address
    ITEMS: OrderNum, ItemNum, Qty, PartNum
    PARTS: PartNum, Price
60
First normal form

Relational schema
One value per column
Define primary key
    » Means one set of data per row

XML schema
Does not apply; multiple children allowed
Root element not a “wrapper”
    » Means one set of data per doc
    » Reduces concurrency problems
    » Reduces parse time in text-based databases
61
First normal form (cont.)

Relational schema
    ORDERS: OrderNum, Date, CustNum, Address, ItemNum, Qty, PartNum, Price

XML schema
    [example markup lost in extraction]
62
Second normal form

Relational schema
Create separate table for one-to-many fields
    » Eliminates duplicate "parent" data

XML schema
Create separate elements for repeated groups
Separate document not needed
    » Hierarchical structure allows multiple children per parent
63
Second normal form (cont.)

Relational schema
Before:
    ORDERS: OrderNum, Date, CustNum, Address, ItemNum, Qty, PartNum, Price
After:
    ORDERS: OrderNum, Date, CustNum, Address
    ITEMS: OrderNum, ItemNum, Qty, PartNum, Price

XML schema
    [example markup lost in extraction]
64
Third normal form

Relational schema
Create separate table for many-to-one fields
    » Eliminates duplicate "child" data

XML schema
Primary information store:
    » Move data to separate document
    » Too many docs => use RDBMS
Secondary information store:
    » Create separate elements
    » Duplicate data often not an issue
    » Historic data is read-only
65
Third normal form (cont.)

Relational schema
Before:
    ORDERS: OrderNum, Date, CustNum, Address
    ITEMS: OrderNum, ItemNum, Qty, PartNum, Price
After:
    ORDERS: OrderNum, Date, CustNum
    ITEMS: OrderNum, ItemNum, Qty, PartNum
    CUSTOMERS: CustNum, Address
    PARTS: PartNum, Price

XML schema (1)
    [example markup lost in extraction]
66
Third normal form (cont.)
XML schema (2)
67
Referential integrity
Means logical pointers point to valid data

    Case                            Referential integrity
    ID/IDREF                        Enforced during validation
    Key/keyref                      Enforced during validation
    XLinks, proprietary links [1]   Not enforced (problem!) [2]
    External entity refs            Not enforced (problem!) [2]

[1] Not widely supported
[2] Not enforceable for entities outside database
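To illustrate the ID/IDREF row of the table, the following sketch validates two small documents against an internal DTD using the JDK's validating parser. The class name IdrefCheck and the Orders/Customer/Order vocabulary are assumptions made for the example. A dangling IDREF is reported as a validity error; it does not stop the parse, so the sketch counts errors through an ErrorHandler.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical check that ID/IDREF integrity is enforced during DTD validation. */
final class IdrefCheck {

    // Invented vocabulary: each Order points at a Customer through an IDREF.
    static final String DTD =
        "<!DOCTYPE Orders [\n"
        + "<!ELEMENT Orders (Customer*, Order*)>\n"
        + "<!ELEMENT Customer (#PCDATA)>\n"
        + "<!ATTLIST Customer id ID #REQUIRED>\n"
        + "<!ELEMENT Order EMPTY>\n"
        + "<!ATTLIST Order customer IDREF #REQUIRED>\n"
        + "]>\n";

    static final String VALID =
        "<Orders><Customer id=\"c1\">Gallagher Co.</Customer>"
        + "<Order customer=\"c1\"/></Orders>";

    // The Order refers to an ID that no element in the document declares.
    static final String DANGLING =
        "<Orders><Customer id=\"c1\">Gallagher Co.</Customer>"
        + "<Order customer=\"c9\"/></Orders>";

    /** Parses DTD + body with validation on and returns the number of validity errors. */
    static int validationErrors(String body) {
        final int[] errors = {0};
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { errors[0]++; }
                @Override public void fatalError(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
        } catch (Exception e) {
            throw new IllegalArgumentException("document is not well-formed", e);
        }
        return errors[0];
    }

    public static void main(String[] args) {
        System.out.println("valid document errors:    " + validationErrors(VALID));
        System.out.println("dangling reference errors: " + validationErrors(DANGLING));
    }
}
```

The check only covers references inside a single document, which is why links and entity references that point outside the document fall into the "not enforced" rows.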
68
Common Database Features
69
Round-tripping
All databases can round-trip documents
Round-trip level depends on database
Text-based DBs should do exact round-tripping
Model-based DBs round-trip at level of model
    » Minimum level is elements, attributes, PCDATA, and document order
    » May be less than canonical XML (comments and processing instructions may be discarded)
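Round-tripping "at the level of the model" can be made concrete with a small sketch: two documents that differ as text (quote style, a comment, a CDATA section) become identical once parsed into a model that keeps only elements, attributes, and text. The class name RoundTripLevel is hypothetical, and the DOM with coalescing and comment-stripping switched on is used here only as a stand-in for a database's internal model.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical demonstration of model-level (not byte-level) equality. */
final class RoundTripLevel {

    // Same information, different physical form.
    static final String AS_AUTHORED =
        "<Note lang='en'><!-- draft --><![CDATA[1 < 2]]></Note>";
    static final String AS_RETURNED =
        "<Note lang=\"en\">1 &lt; 2</Note>";

    /** Parses into a reduced model: no comments, CDATA merged into plain text. */
    static Element parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setCoalescing(true);       // CDATA sections become ordinary text
            factory.setIgnoringComments(true); // comments are dropped from the model
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }

    /** True if both documents produce the same tree in the reduced model. */
    static boolean sameModel(String a, String b) {
        return parse(a).isEqualNode(parse(b));
    }

    public static void main(String[] args) {
        System.out.println("identical as text:  " + AS_AUTHORED.equals(AS_RETURNED));
        System.out.println("identical as model: " + sameModel(AS_AUTHORED, AS_RETURNED));
    }
}
```

A database built on such a model cannot return the authored form byte for byte, because the difference between the two inputs is not stored anywhere.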
70
Document collections
Contain related documents
Similar to directory in file system...
    » Documents can have any schema
    » Some databases allow nested collections
... or similar to table in relational database
    » Documents restricted to a single schema
    » Allows indexes and queries to be optimized
71
Indexes
All databases use indexes
    » Structural indexes index element and attribute names
    » Value indexes index text and attribute values
    » Full text indexes index tokens in values
Some databases index everything
Other databases allow user to specify index fields
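The three kinds of index can be sketched as three maps from a key to the paths of matching nodes. Everything here is illustrative: the class name SimpleIndexes, the path format, and the tokenizer (lower-casing and splitting on non-word characters) are assumptions, and this sketch "indexes everything" rather than letting the user pick fields.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical in-memory structural, value, and full-text indexes. */
final class SimpleIndexes {
    final Map<String, List<String>> structural = new TreeMap<>(); // name -> paths
    final Map<String, List<String>> value = new TreeMap<>();      // exact value -> paths
    final Map<String, List<String>> fullText = new TreeMap<>();   // token -> paths

    static final String SAMPLE =
        "<Order><Number>1234</Number><Customer>Gallagher Co.</Customer>"
        + "<Date>29.10.00</Date>"
        + "<Item Number=\"1\"><Part>A-10</Part><Quantity>12</Quantity><Price>10.95</Price></Item>"
        + "<Item Number=\"2\"><Part>B-43</Part><Quantity>600</Quantity><Price>3.99</Price></Item>"
        + "</Order>";

    static SimpleIndexes build(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
        SimpleIndexes indexes = new SimpleIndexes();
        Element root = doc.getDocumentElement();
        indexes.walk(root, "/" + root.getNodeName() + "[1]");
        return indexes;
    }

    private void walk(Element e, String path) {
        add(structural, e.getNodeName(), path);

        // Attribute names go into the structural index, values into the value index.
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            String attrPath = path + "/@" + a.getNodeName();
            add(structural, "@" + a.getNodeName(), attrPath);
            add(value, a.getNodeValue(), attrPath);
        }

        Map<String, Integer> counts = new HashMap<>();
        boolean leaf = true;
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) {
                leaf = false;
                int position = counts.merge(c.getNodeName(), 1, Integer::sum);
                walk((Element) c, path + "/" + c.getNodeName() + "[" + position + "]");
            }
        }

        // Leaf text goes into the value index whole and into the full-text index by token.
        if (leaf) {
            String text = e.getTextContent().trim();
            if (!text.isEmpty()) {
                add(value, text, path);
                for (String token : text.toLowerCase().split("\\W+")) {
                    if (!token.isEmpty()) {
                        add(fullText, token, path);
                    }
                }
            }
        }
    }

    private static void add(Map<String, List<String>> index, String key, String path) {
        index.computeIfAbsent(key, k -> new ArrayList<>()).add(path);
    }

    public static void main(String[] args) {
        SimpleIndexes indexes = build(SAMPLE);
        System.out.println("structural[Item]    = " + indexes.structural.get("Item"));
        System.out.println("value[A-10]         = " + indexes.value.get("A-10"));
        System.out.println("fullText[gallagher] = " + indexes.fullText.get("gallagher"));
    }
}
```

Every insert or update has to maintain all three maps, which is the cost behind the earlier remark that heavy indexing slows update times.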
72
Query languages
Many databases have proprietary languages
XPath is most common standard language
    » Includes extensions for multi-document queries
XQuery will be standard in the future
73
Updates
Many databases only replace existing document
Common update strategies:
    » Updates through live DOM
    » Update language. Use XPath to identify node, then:
        Insert data before/after node
        Modify node
        Delete node
    » Extensions to XQuery
Validity of updates often not enforced
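The "use XPath to identify a node, then insert, modify, or delete" strategy can be sketched against an in-memory DOM. The class name XPathUpdate and its method names are hypothetical and do not correspond to any particular product's update language. As on the slide, nothing in the sketch checks that the updated document is still valid.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical XPath-addressed updates on a DOM tree. */
final class XPathUpdate {

    static final String SAMPLE =
        "<Order><Number>1234</Number><Customer>Gallagher Co.</Customer>"
        + "<Date>29.10.00</Date>"
        + "<Item Number=\"1\"><Part>A-10</Part><Quantity>12</Quantity><Price>10.95</Price></Item>"
        + "<Item Number=\"2\"><Part>B-43</Part><Quantity>600</Quantity><Price>3.99</Price></Item>"
        + "</Order>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }

    /** Identifies the single target node of an update. */
    private static Node select(Document doc, String xpath) {
        try {
            Node node = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODE);
            if (node == null) {
                throw new IllegalArgumentException("no node matches " + xpath);
            }
            return node;
        } catch (javax.xml.xpath.XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + xpath, e);
        }
    }

    /** Returns the string value of an XPath expression ("" if nothing matches). */
    static String text(Document doc, String xpath) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(xpath, doc);
        } catch (javax.xml.xpath.XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + xpath, e);
        }
    }

    /** Modify node: replace its content with new text. */
    static void modify(Document doc, String xpath, String newText) {
        select(doc, xpath).setTextContent(newText);
    }

    /** Insert data after node: adds a new sibling element right after the target. */
    static void insertAfter(Document doc, String xpath, String name, String content) {
        Node target = select(doc, xpath);
        Element added = doc.createElement(name);
        added.setTextContent(content);
        // insertBefore with a null reference node appends at the end.
        target.getParentNode().insertBefore(added, target.getNextSibling());
    }

    /** Delete node: removes the target and its whole subtree. */
    static void delete(Document doc, String xpath) {
        Node target = select(doc, xpath);
        target.getParentNode().removeChild(target);
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        modify(doc, "/Order/Item[@Number='1']/Quantity", "24");
        insertAfter(doc, "/Order/Date", "Status", "shipped");
        delete(doc, "/Order/Item[@Number='2']");
        System.out.println(text(doc, "/Order/Item[1]/Quantity"));
        System.out.println(text(doc, "/Order/Status"));
    }
}
```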
74
APIs
Similar to ODBC
    » Query language is separate from API
    » Connect, execute query, get results, commit/rollback
    » Results returned as string, DOM tree, or SAX events
Often available over HTTP
Most databases have proprietary APIs
    » XML:DB is database-neutral API
    » Oracle, IBM have submitted API through Sun JCP
75
Transactions, locking, and concurrency
Most databases support transactions
Locking usually at document (not fragment) level
Whether this is an issue depends on
    » What is stored in a single document
    » Number of concurrent users
Fragment locking more common in the future?
76
Links
Few databases support XLink or proprietary links
Some links are live (i.e. allow updates)
Referential integrity not supported
77
External data
Some databases can merge external data
    » Relational data (ODBC, JDBC, OLE DB)
    » Application data (SAP, PeopleSoft, Excel, etc.)
    » ...
Whether data is live depends on database
In the future, many databases will probably support live external data
78
External entity storage
Not clear whether to store entity or URI
    » Storing entity incorrect if URI points to live data
    » Storing URI may be incorrect if entity is a snapshot
Correct answer is probably to let user decide
Not sure what is currently stored
    » All databases store URI of external entities?
    » Some databases also store BLOBs (unparsed entities)
79
Resources
Ronald Bourret’s Papers Page
    » http://www.rpbourret.com/xml/index.htm
XML / Database Links
    » http://www.rpbourret.com/xml/XMLDBLinks.htm
XML:DB Initiative
    » http://www.xmldb.org/
The XML Cover Pages: XML and Databases
    » http://xml.coverpages.org/xmlAndDatabases.html
80
Questions?
Ronald Bourret
rpbourret@rpbourret.com
http://www.rpbourret.com