GISt lunch meeting
OTB Research Institute for Housing, Urban and Mobility Studies
Writing a DBMS buyer’s guide
Wim de Haas, Wilko Quak
Based on a presentation at FOSS4G 2007 on benchmarking

Overview
- Original idea: benchmarking
- Complications of benchmarking
- New idea: buyer’s guide
- What should be in this guide

Benchmark consideration: Weird Cases department
- diagonal query geometry
- flat query geometry

Benchmark consideration: Hot vs Cold

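A minimal sketch of how hot and cold timings could be kept apart in a test script. It assumes that "cold" means the first execution after caches were emptied and "hot" means repeated executions afterwards; the helper name `cold_and_hot` and the `run_query` callable are hypothetical and stand in for whatever executes the query against the DBMS under test.

```python
import time
from statistics import median
from typing import Callable, Tuple


def cold_and_hot(run_query: Callable[[], object], repeats: int = 5) -> Tuple[float, float]:
    """Return (cold_seconds, hot_seconds) for one query.

    cold: wall-clock time of the first execution.
    hot:  median wall-clock time of `repeats` further executions.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")

    # First run: only a true cold measurement if the caller emptied the
    # DBMS and operating-system caches beforehand (not done here).
    start = time.perf_counter()
    run_query()
    cold = time.perf_counter() - start

    # Later runs: data and index pages are likely cached by now.
    hot_samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        hot_samples.append(time.perf_counter() - start)

    return cold, median(hot_samples)
```

The sketch does not clear any cache itself. Restarting the DBMS or dropping the file-system cache before the call remains the responsibility of whoever runs the test.
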
Why bother with benchmarking…
- Stonebraker2007: We define “dramatically outperform” to mean at least a factor 10 advantage […then] customers will be inclined to try the new architecture
- Where to find dramatic differences in Spatial DBMSs?

Where to expect dramatic differences?
- Linux vs Windows (No)
- Choice of DBMS (Only in specific cases)
- Choice of file system (No)
- Functionality difference (Yes)
- Choice of parameters (Maybe)

Problems with testing
- DBMS vendors do not want published results
  - Oracle explicitly forbids publishing benchmark results
- Hardware: Moore’s Law
- Release frequency of software
- Spatial testing cannot be done on synthetic data
- Too many parameters
- Benchmark results are outdated before they are published

Solution
- Don’t spend our time on producing benchmark results:
  - Write a buyer’s guide: we need a classification of users.
  - Let people do their own testing: tell them what and when to test, and help them with a test suite.

Classification of spatial DBMS users
Four classes:
1. Server Builders: publish spatial data via a web server
2. GIS Users: load various datasets and perform complex analyses
3. Data Maintainers: maintain one core dataset
4. Power Users: all of the above and more

Class 1: Web Server Builders
- You do not really need a DBMS for this (you use a fraction of DBMS functionality)
- Only one query counts: find everything within BBOX

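To make the single query of this class concrete, here is a minimal sketch of a bounding-box selection done without a DBMS. It assumes that "within BBOX" means "bounding box intersects the query box", which is how map servers commonly select features; the `BBox` type, the function names and the sample features are hypothetical.

```python
from typing import Dict, List, NamedTuple


class BBox(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: BBox, b: BBox) -> bool:
    """True when the two rectangles share at least one point."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax and
            a.ymin <= b.ymax and b.ymin <= a.ymax)


def find_in_bbox(features: Dict[str, BBox], query: BBox) -> List[str]:
    """Linear scan: names of all features whose box intersects the query box."""
    return sorted(name for name, box in features.items() if intersects(box, query))


# Hypothetical sample data, coordinates in degrees.
features = {
    "station": BBox(4.35, 52.00, 4.36, 52.01),
    "harbour": BBox(4.40, 51.90, 4.50, 51.95),
    "campus": BBox(4.36, 51.99, 4.38, 52.01),
}
query = BBox(4.34, 51.98, 4.37, 52.02)
print(find_in_bbox(features, query))  # ['campus', 'station']
```

A linear scan like this only works for small datasets. A spatial index is what a DBMS, or a dedicated index library, adds for this class of user.
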
Class 2: GIS users
- Main interest is functionality
- Spend more time on loading data
- Need a good query optimiser
- Analysis

Class 3: Dataset Maintainers
- Limited number of queries
- Transactions are an issue
- Clustering of data after updates is interesting

Class 4: Power users
- Do their own testing
- Need a platform to discuss their findings

Test suite proposal
1. Very simple performance test script with few parameters
   - BBOX query
   - Fixed dataset (proposal: OpenStreetMap dataset)
2. Configurable test suite
   - Full suite that tests every corner of the DBMS
   - For specialists only

Test 1: simple BBOX select
- Write a simple script that generates a lot of rectangle queries.
- Parameters:
  - DBMS size
  - query box size
  - experiment length

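The query generator described on this slide could look roughly like the sketch below. It is not the script used for the talk. It assumes that "experiment length" is the number of queries, that query boxes are squares placed uniformly at random inside the dataset extent, and that the target is PostGIS (the `&&` operator and `ST_MakeEnvelope`). The table name `roads`, the column name `geom` and the SRID are hypothetical. DBMS size is not a parameter of the generator, since it depends on how much data is loaded.

```python
import random
from typing import Iterator, Tuple

Box = Tuple[float, float, float, float]  # xmin, ymin, xmax, ymax


def random_boxes(extent: Box, box_size: float, count: int, seed: int = 0) -> Iterator[Box]:
    """Yield `count` square query boxes with side `box_size` inside `extent`."""
    xmin, ymin, xmax, ymax = extent
    if box_size > xmax - xmin or box_size > ymax - ymin:
        raise ValueError("box_size does not fit inside extent")
    rng = random.Random(seed)  # fixed seed makes the experiment repeatable
    for _ in range(count):
        x = rng.uniform(xmin, xmax - box_size)
        y = rng.uniform(ymin, ymax - box_size)
        yield (x, y, x + box_size, y + box_size)


def bbox_sql(box: Box, table: str = "roads", column: str = "geom", srid: int = 4326) -> str:
    """Format one bounding-box count query (PostGIS syntax assumed)."""
    x0, y0, x1, y1 = box
    return (
        f"SELECT count(*) FROM {table} "
        f"WHERE {column} && ST_MakeEnvelope({x0:.6f}, {y0:.6f}, {x1:.6f}, {y1:.6f}, {srid});"
    )


if __name__ == "__main__":
    # Query box size: 0.01 degrees; experiment length: 3 queries.
    for box in random_boxes((4.0, 51.5, 5.0, 52.5), box_size=0.01, count=3):
        print(bbox_sql(box))
```

For another DBMS only `bbox_sql` would need to change; the generated boxes stay the same, so both systems answer identical queries.
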
Test 1: grow DBMS size
- Question: does query response time depend on DBMS size or on core memory?
- Experiment: run the same test on more and more copies of the same database.

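The experiment can be sketched as a small driver loop. This is an illustration under assumptions, not the original test code: `load_copy` and `run_query` are hypothetical callables that add the n-th copy of the dataset and execute one query against the DBMS, and the median time per query is one possible choice of summary statistic.

```python
import time
from statistics import median
from typing import Callable, Dict, List


def time_queries(run_query: Callable[[str], object], queries: List[str]) -> float:
    """Return the median wall-clock seconds per query."""
    samples = []
    for sql in queries:
        start = time.perf_counter()
        run_query(sql)
        samples.append(time.perf_counter() - start)
    return median(samples)


def grow_experiment(
    load_copy: Callable[[int], object],
    run_query: Callable[[str], object],
    queries: List[str],
    max_copies: int,
) -> Dict[int, float]:
    """Run the same queries after loading 1, 2, ..., max_copies copies."""
    results = {}
    for copies in range(1, max_copies + 1):
        load_copy(copies)  # hypothetical: load the n-th copy of the dataset
        results[copies] = time_queries(run_query, queries)
    return results
```

If the median time stays flat until the loaded data no longer fits in core memory and rises afterwards, memory rather than DBMS size is the likely driver.
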
Test 1 – result: PostGIS vs MySQL

Test 2: Comprehensive Test Suite
- Create a set of killer polygons so that every line of source code will be touched by running operations.
- Test query optimizer
- Test join operator
- Must be done with skewed data

What should be in the Buyer’s guide
- Performance is not an issue.
- What are the issues:
  - Details of functionality (topology, coordinate transforms)
  - Total cost of ownership (open-source vs proprietary)
  - Configuration (faster disks or faster CPU)
  - Ease of use (2 days of programming == A LOT OF HARDWARE)
  - Use of standards (vendor lock-in, system integration)
- Can we answer these questions?

Discussion