Benchmarking XML storage systems Information Systems Lab HS 2007 Final Presentation © ETH Zürich | Benchmarking XML
Agenda
- Project Overview: Motivation, Goal of the Project, Benchmark Overview
- Results: RDBMS1, Sedna, MonetDB
Motivation
- Traditional DBMSs use the relational data model.
- Vendors extend their systems to process XML or build new native XML stores.
- XML processing is perceived to be slow.
- Benchmarks for XML are only now being developed.
Goal of the Project
Analyse and compare the performance of different systems for processing XML.
Systems tested:
- RDBMS1: a big player in the relational DBMS market that extended its product with XML capabilities
- Sedna: a free native XML DB designed to be a universal system for a wide range of XML applications
- MonetDB: very fast compared to other XML DBs, but supports only a small part of the XQuery functions
Benchmark
Benchmark used: TPC-X
- currently under development at ETH
- models an Amazon-like online store in XML
- the complete database is one XML file, e.g. users with their history, products with comments
- complex queries that put stress on the query engine
RDBMS1
Impression of the System
- Almost all queries work with few changes.
- Update queries were surprisingly easy to adapt.
Impression of the System (contd.)
Not supported:
- typeswitch (limited schema support)
- user-defined functions
Current Performance
- Data mining queries are about one order of magnitude slower than in Sedna.
- Update and search queries seem a bit faster, but are still slower than in the other systems.
Tuning Possibilities
- Any XPath expression can be indexed.
- Indexes seem to be based on rows rather than on trees.
Issue with Indexing
Indexes help only with „split“ tables, but split tables are slower in general.
Issues
„When the only tool you own is a hammer, every problem begins to resemble a nail.“
Abraham Maslow
Issues with Joins
- There is only a nested-loops join.
- Indexes are no longer used as soon as a join is needed.
- Joins are needed for almost anything.
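A small sketch makes the cost of a nested-loops join visible. Every row of the outer input is compared with every row of the inner input, so the work grows with the product of both input sizes. The Python below contrasts this with a hash join on the same toy data. The record layout and field names are hypothetical and only loosely modelled on TPC-X users and comments. This is an illustration of the two join techniques, not RDBMS1's implementation.

```python
from collections import defaultdict


def nested_loops_join(outer, inner, key_outer, key_inner):
    """Compare every outer row with every inner row: len(outer) * len(inner) comparisons."""
    result = []
    comparisons = 0
    for o in outer:
        for i in inner:
            comparisons += 1
            if o[key_outer] == i[key_inner]:
                result.append((o, i))
    return result, comparisons


def hash_join(outer, inner, key_outer, key_inner):
    """Build a hash table on the inner input once, then probe it once per outer row."""
    table = defaultdict(list)
    for i in inner:
        table[i[key_inner]].append(i)
    result = []
    for o in outer:
        for i in table.get(o[key_outer], []):
            result.append((o, i))
    return result


# Hypothetical data: 100 users and 500 comments, each comment written by one user.
users = [{"id": u, "name": f"user{u}"} for u in range(100)]
comments = [{"author": c % 100, "text": f"comment {c}"} for c in range(500)]

nl_result, nl_comparisons = nested_loops_join(users, comments, "id", "author")
h_result = hash_join(users, comments, "id", "author")
print(len(nl_result), nl_comparisons)  # → 500 50000
```

Both joins return the same 500 pairs. The nested-loops variant needs 50,000 comparisons to find them, while the hash join reads each input only once.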
Summary
- Almost anything works (even the adapter for XCheck!).
- Everything is slow.
Conclusion
- RDBMS1 is not suited for the TPC-X benchmark.
- XML storage works as an improvement for relational data, but not as a stand-alone system.
Sedna
Overview
- Free native XML database
- No schema support
- Bulk load (native XML data storage)
- Document collections
- Indexing
- Full-text indexing (dtSearch)
Impression
- Good introductory example
- Little reference material
- Active development team
XQuery Support
Most of the queries worked with a few changes.
Not supported:
- schema import
- FLWOR expressions containing update statements
Indexing (Value Indices)
- Based on B-trees
- For element and attribute values
- Managing: create an index on nodes by keys
- The query executor does not use indexes automatically: the „index-scan“ function has to be called explicitly in XQuery.
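Sedna's executor does not pick up value indices on its own, so the query has to request the index lookup explicitly. The Python sketch below shows why that pays off. A sorted list of (key, node) pairs searched with binary search stands in for the B-tree, and it is compared against a full scan over all nodes. The node records, the `ValueIndex` class and its `scan_eq` method are hypothetical. They are neither Sedna's storage format nor its `index-scan` API.

```python
import bisect


class ValueIndex:
    """Sorted (key, node_id) pairs; binary search stands in for a B-tree descent."""

    def __init__(self, nodes, key):
        self.entries = sorted((n[key], n["node_id"]) for n in nodes)
        self.keys = [k for k, _ in self.entries]

    def scan_eq(self, value):
        """Return node ids whose key equals value, in O(log n + matches)."""
        lo = bisect.bisect_left(self.keys, value)
        hi = bisect.bisect_right(self.keys, value)
        return [node_id for _, node_id in self.entries[lo:hi]]


def full_scan_eq(nodes, key, value):
    """What the executor does without an explicit index lookup: visit every node."""
    return [n["node_id"] for n in nodes if n[key] == value]


# Hypothetical item elements carrying a price value.
items = [{"node_id": i, "price": i % 50} for i in range(1000)]
index = ValueIndex(items, "price")

print(index.scan_eq(7)[:3])                  # → [7, 57, 107]
print(full_scan_eq(items, "price", 7)[:3])   # → [7, 57, 107]
```

Both lookups return the same nodes. The index touches only the matching range, whereas the full scan visits all 1000 nodes on every query.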
Indexing (cont.)
[Figure: query gainsPerMonth, Normal vs. With Indices, at x-axis values 100, 1’000, 10’000, 50’000 and 100’000]
Indexing (Full-Text Indices)
- Sedna provides full-text indices via dtSearch.
- dtSearch is a commercial text retrieval engine; there is no free download.
Conclusion
- Easy to start with the system
- Little reference material
- Most of the queries work with a few changes
- Execution time grows exponentially with larger datasets
- Value indices deliver better execution times
MonetDB
Overview & Impression of the System
- Well-documented installation and usage
- Many XQuery features not supported
- Good performance
- XML Schema support, but no noticeable effect on performance or functionality
- No support for user-defined indexing ("automatic and self-tuning indexes")
Architecture
- MonetDB: open-source database system for high-performance applications in data mining, OLAP, XML Query, text and multimedia retrieval. It provides the database functionality through the MIL interface (MonetDB Interpreter Language).
- Pathfinder: XQuery compiler that translates XQuery expressions into relational algebra and calls MIL functions.
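To translate XQuery into relational algebra, the XML tree first has to be stored as a table. A well-known encoding for this, used in XPath-accelerator-style approaches, numbers every node by its preorder rank (pre) and records its subtree size. A descendant step then becomes a plain range selection. The sketch below is a simplified illustration of that idea, not Pathfinder's code. The helper names are hypothetical, and Python lists stand in for relational tables.

```python
import xml.etree.ElementTree as ET


def encode(root):
    """Assign each element its preorder rank (pre), subtree size and depth (level)."""
    rows = []

    def visit(elem, level):
        row = {"pre": len(rows), "tag": elem.tag, "level": level, "size": 0}
        rows.append(row)
        for child in elem:
            visit(child, level + 1)
        # size = number of descendants = rows appended after this one
        row["size"] = len(rows) - row["pre"] - 1

    visit(root, 0)
    return rows


def descendants(rows, context, tag):
    """descendant::tag as a range selection: pre(c) < pre(v) <= pre(c) + size(c)."""
    lo, hi = context["pre"], context["pre"] + context["size"]
    return [r for r in rows if lo < r["pre"] <= hi and r["tag"] == tag]


doc = ET.fromstring(
    "<site><people><person><name/></person><person><name/></person></people>"
    "<items><item><name/></item></items></site>"
)
table = encode(doc)
people = next(r for r in table if r["tag"] == "people")
print([r["pre"] for r in descendants(table, people, "name")])  # → [3, 5]
```

The `name` element under `items` (pre 8) lies outside the range of `people` (pre 1, size 4), so the selection excludes it without walking the tree.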
XQuery Support
"… quite complete support for XQuery language …" (monetdb.cwi.nl)
Not supported functions:
- Date/Time functions (0/76)
- String functions (21/32), e.g. fn:contains, fn:tokenize
- Sequence functions (11/19), e.g. fn:insert-before
- …
XML Data Import
- pf:add-doc("url", "file", x%)
- x > 0 is needed for update queries, so XCheck has to be adapted
- Influence on performance not clear
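The XCheck adaptation can be pictured as a small helper that emits the load statement and rejects a zero reserve when the workload contains update queries. The helper name `add_doc_statement`, the integer form of the third argument and the example path are assumptions made for illustration, based only on the `pf:add-doc("url", "file", x%)` form on this slide. The exact signature should be checked against the MonetDB/XQuery documentation.

```python
def add_doc_statement(url: str, name: str, free_percent: int, has_updates: bool) -> str:
    """Build the document-load call for a hypothetical XCheck adapter.

    Assumption: the third argument is an integer percentage (x on the slide);
    per the slide, update queries require x > 0.
    """
    if free_percent < 0:
        raise ValueError("free_percent cannot be negative")
    if has_updates and free_percent == 0:
        raise ValueError("update workloads need free_percent > 0")
    return f'pf:add-doc("{url}", "{name}", {free_percent})'


# Hypothetical file location for the TPC-X document.
print(add_doc_statement("file:///data/tpcx.xml", "tpcx.xml", 10, has_updates=True))
# → pf:add-doc("file:///data/tpcx.xml", "tpcx.xml", 10)
```

Validating the parameter up front makes a misconfigured update run fail at load time instead of partway through the benchmark.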
Performance
"… often achieves a 10-fold raw speed improvement for SQL and XQuery over competitor RDBMSs …" (monetdb.cwi.nl)
Scalability
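Scalability measurements of this kind can be collected with a harness along the lines of the sketch below. It runs a query several times per dataset size and keeps the median wall-clock time. `load_dataset` and `run_query` are hypothetical callables standing in for a system adapter. This is not XCheck's actual interface, and the toy workload is not the benchmark.

```python
import statistics
import time
from typing import Callable, Dict, Iterable


def measure_scalability(
    sizes: Iterable[int],
    load_dataset: Callable[[int], object],
    run_query: Callable[[object], object],
    repetitions: int = 5,
) -> Dict[int, float]:
    """Return the median wall-clock time (seconds) of run_query per dataset size."""
    medians = {}
    for size in sizes:
        db = load_dataset(size)  # loading is excluded from the timing
        timings = []
        for _ in range(repetitions):
            start = time.perf_counter()
            run_query(db)
            timings.append(time.perf_counter() - start)
        medians[size] = statistics.median(timings)
    return medians


# Toy stand-in: the "database" is a list and the "query" sums it.
results = measure_scalability(
    [100, 1_000, 10_000],
    load_dataset=lambda n: list(range(n)),
    run_query=sum,
)
for size, seconds in results.items():
    print(f"{size:>6} records: {seconds * 1e6:.1f} µs")
```

Using the median rather than the mean keeps a single slow run (e.g. a cold cache) from distorting the curve.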
Conclusions
- Very fast, good for large documents and expensive queries
- Small documents: no drawback compared to other DBMSs
- Big problem: lack of function support
- If XQuery function support improves, it is probably the database of our choice!
Project Summary
Project Summary
- RDBMS1: slow, but can process almost anything. XML as a feature.
- Sedna: quite fast, can process a reasonable part of XML.
- MonetDB: very fast, but only limited capabilities.