Spatial DBMS issues OTB Research Institute for Housing, Urban and Mobility Studies Spatial DBMS Research GISt lunch meeting Wilko Quak
Overview
- Introduction to DBMS query processing
- Benchmarking a spatial DBMS
- The GeoInfoNed project
- MonetDB
- Discussion

Introduction to DBMS query processing
(Slides borrowed from Dr. Yang He)

Query processing overview
- Review relational algebra
- Query processing introduction
  - stages of query processing
  - query optimisation
  - relational algebra tree

Relational algebra (1)
- a relational language proposed by Codd
- implementable basis of high-level (SQL) query execution
- a collection of simple, 'low-level' operations used to manipulate relations
  - input is one or more relations
  - output is one relation

Relational algebra (2)
Relational operations:
- unary operators
  - Restrict (Select) σ
  - Project π
- binary operators
  - Cartesian product ×
  - Union ∪
  - Intersection ∩
  - Difference −
  - Join ⋈
  - Divide ÷

Example relations
e.g. two relations, Student and Registration

Student ( SID, Name, Gender )
Registration ( SID, CID, Mark )

Student
SID  Name  Gender
S1   Kate  F
S2   John  M
S3   Kate  F
S4   Fred  M

Registration
SID  CID  Mark
S1   C1   65
S1   C2   45
S2   C2   80
S2   C4   60
S3   C1   50
S3   C2   75
S4   C3   70

Query examples (1): Select
e.g. "Identify all male students"

in SQL:
SELECT SID, Name, Gender FROM Student WHERE Gender = 'M';

in relational algebra:
σ_{Gender='M'}(Student)

Result:
SID  Name  Gender
S2   John  M
S4   Fred  M

Query examples (2): Project
e.g. "List students' names and genders."

in SQL:
SELECT Name, Gender FROM Student;

in relational algebra:
π_{Name, Gender}(Student)

Result:
Name  Gender
Kate  F
John  M
Fred  M

Note: the algebra result is a set, so the duplicate (Kate, F) appears once. SQL keeps duplicate rows by default; SELECT DISTINCT is needed to get exactly this result.

Query examples (3): Project and Natural Join
e.g. "Show student ID, name, their course ID and marks"

in SQL:
SELECT s.SID, Name, CID, Mark FROM Student s, Registration r WHERE s.SID = r.SID;

in relational algebra:
( π_{SID, Name}(Student) ) ⋈ ( π_{SID, CID, Mark}(Registration) )
or
π_{SID, Name, CID, Mark}( Student ⋈ Registration )

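The three example queries can be reproduced with a minimal in-memory sketch of select, project and natural join. This is an illustration only: the function names `select`, `project` and `natural_join` are made up for this sketch, and relations are modelled as plain lists of dictionaries rather than anything a real DBMS would use.

```python
# Relations as lists of dicts; the data is the Student/Registration example.
def rows(names, tuples):
    return [dict(zip(names, t)) for t in tuples]

student = rows(("SID", "Name", "Gender"), [
    ("S1", "Kate", "F"), ("S2", "John", "M"),
    ("S3", "Kate", "F"), ("S4", "Fred", "M"),
])
registration = rows(("SID", "CID", "Mark"), [
    ("S1", "C1", 65), ("S1", "C2", 45), ("S2", "C2", 80), ("S2", "C4", 60),
    ("S3", "C1", 50), ("S3", "C2", 75), ("S4", "C3", 70),
])

def select(relation, predicate):
    """Restrict: keep the rows for which the predicate holds."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """Project: keep the given attributes and drop duplicate rows."""
    result = []
    for row in relation:
        reduced = {a: row[a] for a in attributes}
        if reduced not in result:
            result.append(reduced)
    return result

def natural_join(left, right):
    """Natural join: pair rows that agree on all shared attribute names."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in set(l) & set(r))
    ]

males = select(student, lambda row: row["Gender"] == "M")
names = project(student, ["Name", "Gender"])
marks = project(natural_join(student, registration),
                ["SID", "Name", "CID", "Mark"])
```

Here `project` removes duplicates, which is why `names` has three rows although Student has four.
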
Queries in relational algebra
- A user query may require several operations to be performed.
- Relational algebra is a procedural language, so query operations are evaluated in the order specified.
- A complex query can be executed in different ways. Because efficiency is an important DBMS requirement, an efficient one should be chosen: query optimisation.

Query processing
Four stages are involved in query processing:
- query decomposition or parsing
- query optimization
- code generation
- runtime query execution

Query optimization (1)
- refers to the activity of choosing an efficient execution strategy or plan for processing a query
- rule-based and cost-based strategies
- database statistics in the system catalog are used for cost estimation
- is a prime objective of query processing

Query optimization (3)
- In query processing, disk access takes the most time.
- The main objective of query optimisation is to minimize the number of disk accesses.
- Many DBMSs use heuristic rules for query optimization, e.g. "Perform selection and projection operations as early as possible to reduce the cardinality of the relation and the subsequent processing of that relation."

Query processing – an example
e.g. "Show student ID, name, their course ID and marks"

in SQL:
SELECT s.SID, Name, CID, Mark FROM Student s, Registration r WHERE s.SID = r.SID;

It can be transformed into a relational algebra query:
( π_{SID, Name}(Student) ) ⋈ ( π_{SID, CID, Mark}(Registration) )
or
π_{SID, Name, CID, Mark}( Student ⋈ Registration )

The first one is better: projecting before the join means less data is carried through the join, so it needs much less disk access than the second.

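The difference between the two plans can be made concrete with a small sketch that runs both on the example relations and counts the attribute values ("cells") each plan handles. The cell count is only a rough stand-in for disk access, chosen for this illustration; a real optimizer would use catalog statistics and page counts. The helper names are hypothetical.

```python
# Tuples follow the schemas Student(SID, Name, Gender) and
# Registration(SID, CID, Mark) from the example relations.
student = [("S1", "Kate", "F"), ("S2", "John", "M"),
           ("S3", "Kate", "F"), ("S4", "Fred", "M")]
registration = [("S1", "C1", 65), ("S1", "C2", 45), ("S2", "C2", 80),
                ("S2", "C4", 60), ("S3", "C1", 50), ("S3", "C2", 75),
                ("S4", "C3", 70)]

def cells(relation):
    """Number of attribute values in a relation (proxy for data volume)."""
    return sum(len(row) for row in relation)

def join_on_sid(left, right):
    """Nested-loop join; SID is the first field of both inputs."""
    return [l + r[1:] for l in left for r in right if l[0] == r[0]]

# Plan 1: project first, then join.
narrow_student = [(sid, name) for sid, name, _gender in student]
plan1_input = cells(narrow_student) + cells(registration)
plan1 = join_on_sid(narrow_student, registration)

# Plan 2: join first, then project.
wide = join_on_sid(student, registration)   # (SID, Name, Gender, CID, Mark)
plan2_input = cells(student) + cells(registration)
plan2 = [(sid, name, cid, mark) for sid, name, _gender, cid, mark in wide]

print(plan1 == plan2)            # both plans give the same answer
print(plan1_input, plan2_input)  # cells fed into the join
print(cells(plan1), cells(wide)) # cells produced by the join
```

On this tiny example the saving is small because only the Gender column is dropped early. The gap grows with wider relations and with selections that also remove rows.
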
Relational algebra query tree (2)
e.g. ( π_{SID, Name}(Student) ) ⋈ ( π_{SID, CID, Mark}(Registration) )

[Figure: query tree]
- Root: ⋈
- Intermediate nodes: π_{SID, Name} and π_{SID, CID, Mark}
- Leaf nodes: Student and Registration

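A query tree like this can be represented as a small recursive data structure and evaluated bottom-up, from the leaves to the root. The sketch below is one possible encoding, not a description of any particular DBMS: the `Node` class, the `DATABASE` dictionary and the `evaluate` function are names invented for the illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                                  # "relation", "project" or "join"
    children: list = field(default_factory=list)
    arg: object = None                       # relation name or attribute list

def rows(names, tuples):
    return [dict(zip(names, t)) for t in tuples]

DATABASE = {
    "Student": rows(("SID", "Name", "Gender"), [
        ("S1", "Kate", "F"), ("S2", "John", "M"),
        ("S3", "Kate", "F"), ("S4", "Fred", "M")]),
    "Registration": rows(("SID", "CID", "Mark"), [
        ("S1", "C1", 65), ("S1", "C2", 45), ("S2", "C2", 80), ("S2", "C4", 60),
        ("S3", "C1", 50), ("S3", "C2", 75), ("S4", "C3", 70)]),
}

def evaluate(node):
    """Evaluate the children first, then apply this node's operator."""
    if node.op == "relation":                # leaf node: base relation
        return DATABASE[node.arg]
    if node.op == "project":                 # intermediate node
        result = []
        for row in evaluate(node.children[0]):
            reduced = {a: row[a] for a in node.arg}
            if reduced not in result:
                result.append(reduced)
        return result
    if node.op == "join":                    # natural join on shared names
        left, right = (evaluate(child) for child in node.children)
        return [{**l, **r} for l in left for r in right
                if all(l[a] == r[a] for a in set(l) & set(r))]
    raise ValueError(f"unknown operator: {node.op}")

tree = Node("join", [
    Node("project", [Node("relation", arg="Student")], ["SID", "Name"]),
    Node("project", [Node("relation", arg="Registration")],
         ["SID", "CID", "Mark"]),
])
result = evaluate(tree)
```

An optimizer can then be seen as a program that rewrites one such tree into an equivalent but cheaper one before it is evaluated.
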
Spatial query processing
In spatial query processing the operators are spatial operators; otherwise it works the same as non-spatial query processing:
- Spatial select: find all objects within a given rectangle [99.99%]
- Spatial join (overlay in GIS terms): find all restaurants within national parks

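The two spatial operators can be sketched naively as follows. Everything here is a simplifying assumption made for the illustration: objects are points, parks are axis-aligned rectangles `(xmin, ymin, xmax, ymax)`, the coordinates are invented, and no spatial index is used, so the join compares every pair.

```python
def in_rect(point, rect):
    """True if the point lies inside or on the border of the rectangle."""
    x, y = point
    xmin, ymin, xmax, ymax = rect
    return xmin <= x <= xmax and ymin <= y <= ymax

def spatial_select(objects, window):
    """Spatial select: names of all objects within the query rectangle."""
    return [name for name, point in objects.items() if in_rect(point, window)]

def spatial_join(points, regions):
    """Spatial join: all (point, region) pairs where the point is inside."""
    return [(pname, rname)
            for pname, point in points.items()
            for rname, rect in regions.items()
            if in_rect(point, rect)]

# Hypothetical data, not taken from the slides.
restaurants = {"A": (1, 1), "B": (5, 5), "C": (9, 2)}
parks = {"North": (4, 4, 6, 6), "East": (8, 0, 10, 3)}

in_window = spatial_select(restaurants, (0, 0, 5, 5))
in_parks = spatial_join(restaurants, parks)
```

A spatial DBMS would normally avoid the all-pairs comparison by using a spatial index to discard most candidates first.
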
DBMS benchmarking
- Categorization of DBMS usage
- Implications for benchmarking
- Benchmark choices

Categories of DBMS users
- Static usage:
  - Predefined queries with changing parameters
  - Queries can be hand optimized
- Dynamic usage (browsing):
  - Many different queries
  - Query optimizer is important
- Access via object-relational mapping (e.g. Hibernate):
  - Not discussed here
All categories need different benchmarking.

Benchmarking static DBMS usage
Notes:
- The critical factor is testing the 'query processor'.
- The query optimizer is not important.
Benchmark:
- Make a small set of simple queries that each test one operation.

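A static-usage benchmark of this kind can be sketched as a loop that times a handful of single-operation queries. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for the DBMS under test. The `city` table, its synthetic contents and the three queries are assumptions made for the example, and the "window select" is an ordinary range predicate rather than a true spatial operator.

```python
import sqlite3
import time

def run_benchmark(conn, queries, repeats=5):
    """Return the best wall-clock time in seconds for each labelled query."""
    timings = {}
    for label, sql in queries.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            conn.execute(sql).fetchall()      # force full evaluation
            best = min(best, time.perf_counter() - start)
        timings[label] = best
    return timings

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE city (name TEXT, inhabitants INTEGER, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO city VALUES (?, ?, ?, ?)",
    [(f"city{i}", i * 1000, float(i % 100), float(i // 100))
     for i in range(10000)])

# Each query exercises one operation.
queries = {
    "attribute select": "SELECT name FROM city WHERE inhabitants > 5000000",
    "window select": "SELECT name FROM city "
                     "WHERE x BETWEEN 10 AND 20 AND y BETWEEN 10 AND 20",
    "aggregate": "SELECT COUNT(*) FROM city",
}

timings = run_benchmark(conn, queries)
for label, seconds in timings.items():
    print(f"{label}: {seconds * 1000:.3f} ms")
```

Taking the best of several repeats reduces noise from caching and other processes. Whether best, median or mean is the right summary depends on what the benchmark is meant to show.
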
Benchmarking dynamic DBMS usage
Notes:
- The critical factor is testing the 'query optimizer'.
- It is very hard to get reproducible, good-quality results.
- It is very hard to assess the quality of the query optimizer, but a small test set might give some insight:

select city.name, river.name
from city, river
where city.inhabitants > X
and distance(city.geometry, river.geometry) < Y;

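Why the optimizer matters for such a query can be illustrated by evaluating it in two orders and counting how often the expensive `distance` predicate runs. This is a toy sketch: the cities, rivers and thresholds are invented, and geometries are reduced to single points so that `math.dist` can stand in for a real geometry distance function.

```python
import math

# Hypothetical data: (name, inhabitants, point) and (name, point).
cities = [("A", 900_000, (0, 0)), ("B", 50_000, (1, 1)),
          ("C", 2_000_000, (10, 10)), ("D", 10_000, (3, 0))]
rivers = [("R1", (0, 1)), ("R2", (10, 12))]

def run_query(x, y, filter_first):
    """Evaluate the city/river query; return (result, distance calls)."""
    calls = 0
    result = []
    # Plan choice: apply the cheap attribute filter before or after pairing.
    candidates = [c for c in cities if c[1] > x] if filter_first else cities
    for cname, inhabitants, cgeom in candidates:
        for rname, rgeom in rivers:
            calls += 1                        # one distance evaluation
            if math.dist(cgeom, rgeom) < y and inhabitants > x:
                result.append((cname, rname))
    return result, calls

good_result, good_calls = run_query(100_000, 3, filter_first=True)
bad_result, bad_calls = run_query(100_000, 3, filter_first=False)
print(good_result == bad_result, good_calls, bad_calls)
```

Both orders return the same pairs, but filtering on inhabitants first halves the number of distance computations here. Which order is better depends on X and Y, and an optimizer has to estimate that from statistics.
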
Other benchmarking considerations
- Functionality
- Usability
- Update behaviour

GeoInfoNed – RGI-232

GeoInfoNed -- What and Why
Build a spatially enabled DBMS because:
- A DBMS is at the core of many systems. If you improve the core, the whole system improves.
- There is a need for an (open source) experimentation platform for Geo DBMS research.

Who
- CWI: leading DBMS experts with MonetDB
- TUDelft/OTB: knowledge of spatial processes
- CycloMedia: huge dataset and interesting problems
- RWS/AGI: large and diverse datasets and interesting problems

How
- At CWI there is the MonetDB DBMS.
- First we will extend it with basic spatial types (according to OpenGIS).
- Together with our 'Problem Holder' partners we will find directions for further extensions.
- MonetDB already has support for image data, XML storage and querying, etc.

Example (GeoInfoNed)
- Is there a relationship between traffic accidents and objects near the road?

MonetDB Introduction*
- Hardware trends
- MonetDB design considerations
- MonetDB architecture
*Slides borrowed from CWI

Hardware Trends
50% per year:
- cpu speed
- mem size
- mem bandwidth
- disk bandwidth
1% per year:
- mem latency
10% per year:
- disk latency

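To see how quickly these rates diverge, the sketch below compounds them over ten years. It assumes that the percentages are annual improvement rates that compound; the slide does not say so explicitly, and the ten-year horizon is an arbitrary choice for the illustration.

```python
# Annual improvement rates from the slide, read as compound rates.
rates = {
    "cpu speed, mem size, mem and disk bandwidth": 0.50,
    "disk latency": 0.10,
    "mem latency": 0.01,
}
years = 10  # arbitrary horizon for the illustration

improvement = {name: (1 + rate) ** years for name, rate in rates.items()}
for name, factor in improvement.items():
    print(f"{name}: x{factor:.2f} after {years} years")

# How far bandwidth-type resources pull ahead of memory latency.
gap = (improvement["cpu speed, mem size, mem and disk bandwidth"]
       / improvement["mem latency"])
print(f"gap versus mem latency: x{gap:.1f}")
```

Under these assumptions the fast-growing resources improve by a factor of about 58 in ten years, while memory latency improves by about 10 percent.
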
Latency is the enemy!
- Commercial DBMS products (Oracle, DB2, SQL Server):
  - stem from OLTP roots
  - focus on minimizing random I/Os => depend on latency!
- MonetDB:
  - built for bulk access
  - optimize CPU and memory performance
- Latency is one of the killing factors in Friso's simplicial homology implementation

MonetDB design considerations
- Multi-model database kernel support
- Extensible data types, operators, accelerators
- Database hot-set is memory resident
- Simple data structures are better
- Index management should be automatic
- Do not replicate the operating system
- Optimize when you know the situation
- Cooperative transaction management

MonetDB product family
[Figure: MonetDB product family. Labels: End-user application; JDBC, ODBC, C-mapi lib, Perl, PHP, Python; MAPI protocol; SQL, XQuery; Monet kernels]
Note: with a MATLAB interface here, Frank's life would be easier.

MonetDB - Physical data organization
- Binary Association Tables

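The idea behind Binary Association Tables (BATs) can be sketched as a vertical decomposition: each attribute of a relation is stored as its own two-column table of (object identifier, value) pairs. The code below is a conceptual illustration under that reading, using the Student relation. The function names `to_bats` and `reconstruct` are invented, and this is not MonetDB's actual storage code.

```python
def to_bats(tuples, attributes):
    """Split a relation into one (oid, value) table per attribute."""
    bats = {attribute: [] for attribute in attributes}
    for oid, row in enumerate(tuples):
        for attribute, value in zip(attributes, row):
            bats[attribute].append((oid, value))
    return bats

def reconstruct(bats, oid):
    """Rebuild one tuple by looking up the same oid in every BAT."""
    return tuple(dict(bat)[oid] for bat in bats.values())

student = [("S1", "Kate", "F"), ("S2", "John", "M"),
           ("S3", "Kate", "F"), ("S4", "Fred", "M")]
bats = to_bats(student, ("SID", "Name", "Gender"))

# A selection only has to scan the single column it refers to.
male_oids = [oid for oid, gender in bats["Gender"] if gender == "M"]
male_students = [reconstruct(bats, oid) for oid in male_oids]
```

Scanning one narrow column in bulk fits the design goal of optimizing for memory bandwidth rather than for random access.
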
Discussion