“Codd’s Rules for Spatial ORDBMS”
How the assumptions of the Relational Model have been relaxed to accommodate geographic data
Spatial ORDBMS implementations with Oracle Spatial, PostGIS, and ArcSDE
Matt Stuemky
Presentation based on Final Paper, GEOG 582 (Spring 2008)
University of Southern California, Department of Geography
May 13 2008

Some issues and challenges with storing geographic data in an RDBMS
- The traditional relational model is excellent for storing and managing business and transactional data, but not as well suited to handling spatial data
- Spatial data, even simple features (geometry: points, lines, and polygons), are more complex than business data
- Going beyond storing just vector-based spatial data: how to efficiently store complex, raster-based spatial data?
- All spatial ORDBMS should incorporate SQL-99 MM Spatial / Open Geospatial Consortium (OGC) standards and specifications to assure interoperability and consistency
OODBMS? Why are ORDBMS products favored over true object-oriented DBMS?
- “Square pegs into round holes”
- ORDBMS support object-oriented concepts such as objects, classes, methods, properties, and inheritance
- RDBMS vendors provide high-performance, high-security, scalable, very reliable RDBMS products that the IT industry has been familiar with for several decades, including for GIS usage
- It is easier for RDBMS vendors to incorporate basic OO concepts into their existing products and support complex data types than for new vendors to introduce true OODBMS solutions
Figure: Spatial ORDBMS (object-relational database management system) and database; labels: storage, indexing, security, backup, versioning, long transactions

Spatial ORDBMS vendor solutions
Middleware solutions
- ESRI: ArcSDE. Spatially enables RDBMS including ESRI file geodatabases, Oracle, IBM DB2, Informix, and SQL Server
- MapInfo: SpatialWare
Integrated server solutions
- Oracle: Oracle Spatial
- PostgreSQL: PostGIS, an open-source spatial ORDBMS solution
- Microsoft: SQL Server Spatial (coming, late as usual, in 2008)
Figure: ArcSDE and RDBMS (Oracle, PostgreSQL); Oracle Spatial; PostGIS
Some advantages of direct integration:
- All DBMS core capabilities, including security, replication, and use of triggers, are in one place
- A common DBMS interface and tools give access to both spatial and non-spatial data

Rule 1: Basic framework for spatial databases
Enhancement of Codd’s First Rule: The Information Rule
Spatial ORDBMS must be able to support the storage of geospatial data; all subsequent “rules” are the detailed specifications that support this first rule.
Spatial databases in ORDBMS must support (at minimum):
- Complex (geometry) data types
- Spatial data within related tables: feature classes, feature datasets
- Validation rules: subtypes and domains
- Spatial metadata
- Spatial reference (coordinate) systems and transformations
- Topologies and methods for analyzing spatial relationships
- Geometric networks
- Multi-dimensional, hierarchical indexes for searching spatial data
- Storage of both spatial and non-spatial data in the same database
Figure: PARCEL features; instances of a Feature class; instances of an Object class

Rule 2: Simple features and geometry data types
- Abstract real-world spatial information into distinct, identifiable entities: objects for representing real-world geographic features (Shekhar & Chawla, 2003)
- Amendment to the relational model: support for geometry data types, including points, lines, polygons, and aggregations of these
- Conform to the Open Geospatial Consortium (OGC) Simple Features standard
- Complex (composite) objects, representing real-world geographic features, can be created by incorporating these spatial data types. Examples: Roads from lines, Parcels from polygons, etc.
- Instances of these spatial objects must be able to be stored and managed within the logical framework of relational tables.
Figure: Geometry object model (© 2005 John Wiley & Sons, Ltd)

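The geometry types and composite features described in Rule 2 can be sketched outside the database. The following is a minimal illustration in Python, not the object model of Oracle Spatial, PostGIS, or ArcSDE; the class names, the `wkt()` method, and the `Road` feature are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Tuple

Coord = Tuple[float, float]


def _fmt(coords):
    """Format coordinate pairs as 'x y, x y, ...' for Well-Known Text."""
    return ", ".join(f"{x} {y}" for x, y in coords)


@dataclass(frozen=True)
class Point:
    x: float
    y: float

    def wkt(self) -> str:
        return f"POINT({self.x} {self.y})"


@dataclass(frozen=True)
class LineString:
    coords: Tuple[Coord, ...]

    def wkt(self) -> str:
        return f"LINESTRING({_fmt(self.coords)})"


@dataclass(frozen=True)
class Polygon:
    ring: Tuple[Coord, ...]  # exterior ring only; holes are left out of this sketch

    def __post_init__(self):
        # A ring needs at least four points and must end where it starts.
        if len(self.ring) < 4 or self.ring[0] != self.ring[-1]:
            raise ValueError("polygon ring must be closed (first point == last point)")

    def wkt(self) -> str:
        return f"POLYGON(({_fmt(self.ring)}))"


@dataclass(frozen=True)
class MultiLineString:
    lines: Tuple[LineString, ...]  # an aggregation of simple geometries

    def wkt(self) -> str:
        parts = ", ".join(f"({_fmt(line.coords)})" for line in self.lines)
        return f"MULTILINESTRING({parts})"


@dataclass(frozen=True)
class Road:
    """Hypothetical feature: ordinary attributes plus one geometry, like a table row."""
    road_id: int
    road_name: str
    geom: LineString


jeff_rd = Road(1, "Jeff Rd", LineString(((191232, 243118), (191108, 243242))))
print(jeff_rd.geom.wkt())
```

The `Road` class mirrors the idea that a feature is a row whose columns mix business attributes with a geometry value.
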
Rule 3: Comprehensive spatial data language
Enhancement of Codd’s Rule #5 (comprehensive data sublanguage) and Rule #7 (high-level insert, update, and delete)
- “Spatial SQL”: OGC specification, Simple Features for SQL (SFSQL)
- Use of WKT (Well-Known Text) and WKB (Well-Known Binary) formats for inputting/outputting data
- Basic methods for creating, updating, and deleting spatial data using SQL syntax
- Operators and methods for querying spatial data
- Support for spatial analysis functions. Examples: distance (between two points), length (of a line), area (of a polygon), buffer, etc.
PostGIS: Create a new table; add a spatial column (attribute) to it; create a new instance of a Road feature (spatial object, linestring)
    CREATE TABLE roads (road_id INTEGER, road_name VARCHAR);
    -- 2 = coordinate dimension, matching the 2D linestring inserted below
    SELECT AddGeometryColumn('roads', 'roads_geom', -1, 'GEOMETRY', 2);
    INSERT INTO roads (road_id, roads_geom, road_name)
      VALUES (1, GeomFromText('LINESTRING(191232 243118,191108 243242)', -1), 'Jeff Rd');
PostGIS: Select query
    SELECT road_id, road_name
    FROM roads
    WHERE roads_geom ~= GeomFromText('LINESTRING(191232 243118,191108 243242)', -1);

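To make the WKT format and the analysis functions named in Rule 3 (distance, length, area) concrete, here is a small Python sketch. It is an illustration under stated assumptions, not how any spatial ORDBMS implements them: the helper names `parse_wkt_coords`, `length`, and `area` are invented for this example, only 2D POINT, LINESTRING, and single-ring POLYGON text is handled, and the measures are planar.

```python
import math
import re


def parse_wkt_coords(wkt):
    """Return (geometry type, [(x, y), ...]) for simple 2D WKT text."""
    match = re.fullmatch(
        r"\s*(POINT|LINESTRING|POLYGON)\s*\(+([^()]*)\)+\s*", wkt, re.IGNORECASE
    )
    if match is None:
        raise ValueError(f"unsupported WKT: {wkt!r}")
    kind = match.group(1).upper()
    coords = [tuple(float(v) for v in pair.split()) for pair in match.group(2).split(",")]
    return kind, coords


def length(coords):
    """Planar length of a linestring: sum of its segment lengths."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def area(ring):
    """Planar area of a closed ring using the shoelace formula."""
    twice = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice) / 2.0


_, road = parse_wkt_coords("LINESTRING(191232 243118,191108 243242)")
_, parcel = parse_wkt_coords("POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))")

print(length(road))                 # length of the road segment
print(area(parcel))                 # 100.0
print(math.dist(road[0], road[1]))  # distance between the two endpoints
```
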
Rule 4: Topologies and evaluation of spatial relationships
Enhancement of Codd’s Rule #10: Integrity Independence
Spatial ORDBMS must support topologies
- Store / manage shared geometry (node, edge, and face elements)
- Enforce spatial data integrity rules. Examples: no gaps between parcels, no overlapping parcels, etc.
- SQL operators and methods for evaluating spatial relationships
Spatial relationship methods
- Equals: geometries are the same
- Disjoint: geometries share no common point
- Intersects: geometries share at least one point
- Touches: geometries meet only at a common boundary
- Crosses: geometries share some, but not all, interior points (for example, a line crossing a polygon)
- Within: geometry lies within another geometry
- Contains: geometry completely contains another geometry
- Overlaps: geometries of the same dimension overlap
- Relate: tests the intersections between the interior, boundary, and exterior of two geometries
PostGIS: Select geometry objects based on spatial relationship with other geometry objects
    SELECT id, the_geom
    FROM geotable
    WHERE the_geom && 'POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))'  -- bounding-box filter
      AND _ST_Contains(the_geom, 'POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))');  -- exact containment test

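The relationship methods listed in Rule 4 are easiest to see on a restricted case. The Python sketch below evaluates several of them for axis-aligned rectangles only, assuming every rectangle has positive width and height; real implementations work on arbitrary geometries, and the `Rect` type and function names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle; assumes xmin < xmax and ymin < ymax."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def disjoint(a, b):
    # No point in common: one rectangle lies entirely to one side of the other.
    return a.xmax < b.xmin or b.xmax < a.xmin or a.ymax < b.ymin or b.ymax < a.ymin


def intersects(a, b):
    return not disjoint(a, b)


def interiors_intersect(a, b):
    # Strict inequalities exclude contact along edges or at corners.
    return a.xmin < b.xmax and b.xmin < a.xmax and a.ymin < b.ymax and b.ymin < a.ymax


def touches(a, b):
    # Boundaries meet, interiors do not.
    return intersects(a, b) and not interiors_intersect(a, b)


def within(a, b):
    return b.xmin <= a.xmin and a.xmax <= b.xmax and b.ymin <= a.ymin and a.ymax <= b.ymax


def contains(a, b):
    return within(b, a)


def equals(a, b):
    return a == b


def overlaps(a, b):
    # Interiors intersect, but neither rectangle lies inside the other.
    return interiors_intersect(a, b) and not within(a, b) and not within(b, a)


parcel_a = Rect(0, 0, 10, 10)
parcel_b = Rect(10, 0, 20, 10)   # shares an edge with parcel_a
parcel_c = Rect(5, 5, 15, 15)    # partly covers parcel_a
building = Rect(2, 2, 4, 4)      # inside parcel_a

print(touches(parcel_a, parcel_b))
print(overlaps(parcel_a, parcel_c))
print(contains(parcel_a, building))
```

An integrity rule such as "no overlapping parcels" then amounts to checking that `overlaps` is false for every pair of parcels.
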
Rule 5: Spatial access methods (SAMs)
- Single-dimensional index methods such as B-trees are not sufficient for indexing spatial data; spatial ORDBMS must support multi-dimensional indexing
- Common spatial access methods (SAMs): Grid, Point Quadtrees, Region Quadtrees, R-trees, and variations of these
- Oracle Spatial primarily uses R-tree indexes but supports Quadtree-based indexes; PostGIS uses an R-tree index implemented on top of a GiST (Generalized Search Trees) index
Figure: Region Quadtree; R-tree (© 2005 John Wiley & Sons, Ltd)

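Of the access methods listed in Rule 5, the fixed grid is the simplest to sketch. The Python example below is an illustrative assumption, not the index structure used by Oracle Spatial or PostGIS: the `GridIndex` class and its methods are invented names, and boxes are `(xmin, ymin, xmax, ymax)` tuples. It shows the usual two steps of a spatial query: collect candidates from the index, then refine with an exact bounding-box test.

```python
from collections import defaultdict


def boxes_intersect(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class GridIndex:
    """Fixed-grid spatial index over bounding boxes."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)  # (i, j) grid cell -> keys of boxes touching it
        self.boxes = {}                # key -> bounding box

    def _cells_for(self, box):
        xmin, ymin, xmax, ymax = box
        cs = self.cell_size
        for i in range(int(xmin // cs), int(xmax // cs) + 1):
            for j in range(int(ymin // cs), int(ymax // cs) + 1):
                yield (i, j)

    def insert(self, key, box):
        self.boxes[key] = box
        for cell in self._cells_for(box):
            self.cells[cell].add(key)

    def query(self, box):
        # Filter step: gather everything registered in the cells the query covers.
        candidates = set()
        for cell in self._cells_for(box):
            candidates |= self.cells.get(cell, set())
        # Refine step: keep only boxes that really intersect the query box.
        return sorted(k for k in candidates if boxes_intersect(self.boxes[k], box))


index = GridIndex(cell_size=100)
index.insert("a", (10, 10, 20, 20))
index.insert("b", (150, 150, 160, 160))
index.insert("c", (90, 90, 110, 110))

print(index.query((0, 0, 50, 50)))
print(index.query((100, 100, 200, 200)))
```

A grid works well when features are evenly spread and similar in size; quadtrees and R-trees adapt better to clustered data, which is one reason the products above favor them.
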
Rule 6: Raster-based geospatial features
- No OGC-based standards for raster-based spatial data?
- More complex implementation than simple (geometry) features
- Support for storing both image-based (satellite remote sensing, airborne photos, etc.) and grid-based (digital terrain elevation, land cover information, etc.) raster data
- ORDBMS raster architecture:
  - Raster object type
  - Indexing of raster data
  - Raster metadata
  - SQL API for inserts, updates, and queries
- Requires a substantial amount of storage space and numerous tables in the database
- Proprietary solutions:
  - Oracle Spatial: GeoRaster; now available
  - PostGIS: PGRaster is a very similar implementation to Oracle’s GeoRaster; product is still under development
- Further demonstrates how far the relational model has been relaxed to accommodate geographic data
Figure © 2008 Oracle Corporation

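The pieces of a raster architecture named in Rule 6 (a raster object, its metadata, and some way to break the data into storable units) can be sketched generically. The Python example below is only an illustration under assumptions: it does not reproduce the GeoRaster or PGRaster data models, and the `RasterMetadata` fields, the square-cell georeferencing, and the block layout are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RasterMetadata:
    origin_x: float   # world x of the upper-left corner
    origin_y: float   # world y of the upper-left corner
    cell_size: float  # world units per (square) cell
    rows: int
    cols: int
    block_size: int   # cells per block side


def cell_to_world(meta, row, col):
    """World coordinates of the center of cell (row, col)."""
    x = meta.origin_x + (col + 0.5) * meta.cell_size
    y = meta.origin_y - (row + 0.5) * meta.cell_size
    return x, y


def world_to_cell(meta, x, y):
    """(row, col) of the cell covering world point (x, y)."""
    col = int((x - meta.origin_x) // meta.cell_size)
    row = int((meta.origin_y - y) // meta.cell_size)
    if not (0 <= row < meta.rows and 0 <= col < meta.cols):
        raise ValueError("point lies outside the raster")
    return row, col


def block_of(meta, row, col):
    """Identifier of the block that stores cell (row, col)."""
    return row // meta.block_size, col // meta.block_size


def split_into_blocks(meta, cells):
    """Cut a list-of-rows grid into blocks keyed by (block_row, block_col)."""
    bs = meta.block_size
    blocks = {}
    for r0 in range(0, meta.rows, bs):
        for c0 in range(0, meta.cols, bs):
            blocks[(r0 // bs, c0 // bs)] = [row[c0:c0 + bs] for row in cells[r0:r0 + bs]]
    return blocks


meta = RasterMetadata(500000.0, 4000000.0, 30.0, rows=4, cols=4, block_size=2)
elevation = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
blocks = split_into_blocks(meta, elevation)

print(blocks[(1, 0)])
print(world_to_cell(meta, 500045.0, 3999955.0))
```

Storing one block per row, keyed by its block identifier, is one plausible way such data ends up spread over "numerous tables", with the metadata kept alongside.
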
Rule 7: Spatial data across distributed systems
Enhancement of Codd’s Rule #11: Distribution Independence
Allow seamless storage, retrieval, and exchange of data in spatially-enabled databases (queries with both homogeneous and heterogeneous databases) across distributed systems, including network servers and the Internet for web-based systems.
- Older OGC specifications for simple features distribution between applications and other RDBMS: OLE/COM and CORBA specifications
  - Define how APIs should handle the storage, retrieval, and exchange of vector-based (point, line, polygon) spatial data
- Newer OGC standards and specifications for spatial data distribution on the Internet/World Wide Web:
  - Web Feature Service (WFS) specification for simple features (vector-based) data
  - Web Coverage Service (WCS) standard for raster-based data
  - Geography Markup Language (GML) encoding standard for exchanging geospatial data
- Oracle Spatial, PostGIS, and ArcSDE provide support for:
  - Synchronized replication of spatial data between databases over local and wide area networks (also: version reconciliation and long transaction support)
  - Exchange of spatial data on the web using WFS, WCS, and GML
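As an illustration of exchanging vector data over the web with WFS, the Python sketch below builds a key-value-pair GetFeature request URL. The parameter names follow the WFS 1.1.0 convention (`service`, `version`, `request`, `typeName`, `bbox`); the server address `https://example.org/wfs` and the layer name `gis:roads` are made-up placeholders, and the helper `wfs_getfeature_url` is not part of any product named above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def wfs_getfeature_url(base_url, type_name, bbox=None, version="1.1.0"):
    """Build a WFS GetFeature URL; bbox is (minx, miny, maxx, maxy) or None."""
    params = {
        "service": "WFS",
        "version": version,
        "request": "GetFeature",
        "typeName": type_name,  # WFS 2.0.0 servers expect 'typeNames' instead
    }
    if bbox is not None:
        params["bbox"] = ",".join(str(v) for v in bbox)
    return f"{base_url}?{urlencode(params)}"


# Hypothetical server and layer, for illustration only.
url = wfs_getfeature_url("https://example.org/wfs", "gis:roads", bbox=(0, 0, 10, 10))
print(url)

# Decoding the query string shows the parameters a server would receive.
query = parse_qs(urlsplit(url).query)
print(query["typeName"], query["bbox"])
```

A server that implements WFS would typically answer such a request with the matching features encoded in GML.
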