GIS Data Models: Vector INLS 110-111 GIS Digital Information: Uses, Resources & Software Tools Prepared by: Mary Ruvane PhD Candidate, SILS
GIS Data Models The real world can only be depicted in a GIS through models that define phenomena in a manner that computer systems can interpret and use to perform meaningful analysis. There are two significant model categories at present: Graphic Models: vector and raster; Database Models/Structures: simple lists, sequential files, indexed files, hierarchical files, network files, relational databases, etc. Today we'll touch upon how real-world entities are translated into database objects, but our main focus will be on the models used to represent GIS data graphically (vector vs. raster).
Real World > Data Needed Basic carrier of information = entity: a real-world phenomenon not divisible into phenomena of the same kind. An entity consists of: type classification, attributes, relationships. The real-world model determines which data need to be acquired. Source: Bernhardsen, Tor. (1999). 2nd Ed. Geographic Information Systems: An Introduction. p. 37
Entity: Type Classification Assumes that identical occurrences can be classified together. Each entity type must be unique (no ambiguity), e.g., a detached house is classified under house, not industrial building. Some entity types may need to be divided into categories, e.g., roadways as a class, with categories for national highways, urban roads, and private roads. Entity type is also known as qualitative data or, in statistics, the 'nominal scale'.
Entity: Attributes Each entity type may have one or more attributes, e.g., buildings may have attributes characterizing material (frame or masonry) as well as number of stories. Attributes may describe quantitative data ranked in three levels of accuracy: Ordinal (ranks): good, better, best; Interval (numeric): age, income; Ratio (scale): length, area.
Real World > Data Modeling [Figure 3.1: simplified model of the real world. Source: Bernhardsen, Tor. (1999). 2nd Ed. Geographic Information Systems: An Introduction. p. 38.] The process of interpreting reality by using both a real-world model and a data model is called data modeling.
Real World > Modeling Process [Figure 3.2: the modeling process. Source: Bernhardsen, Tor. (1999). 2nd Ed. Geographic Information Systems: An Introduction. p. 39.] The process of interpreting reality by using both a real-world model and a data model is called data modeling.
Modeling: Geometric & Attribute Data [Figure 3.3. Source: Bernhardsen, Tor. (1999). 2nd Ed. Geographic Information Systems: An Introduction. p. 40.] Geographical data can be divided into geometric data and attribute data. Attribute data can in turn be subdivided into qualitative data and quantitative data.
Modeling: Attribute Data [Figure 3.4. Source: Bernhardsen, Tor. (1999). 2nd Ed. Geographic Information Systems: An Introduction. p. 40.] Attribute data consist of qualitative data (type of object) or quantitative data (ordinal, interval, ratio). Quantitative data can be categorized into ordinal data, which specify quality by the use of text; interval data, arranged into classes along a continuous scale; and ratio data, measured in relation to a zero starting point.
Modeling: Entity Relations [Figure 3.5. Source: Bernhardsen, Tor. (1999). 2nd Ed. Geographic Information Systems: An Introduction. p. 40.] Relationships between entities: Pertains/belongs to: a depth figure pertains to a specific shoal, or a pipe belongs to a larger network of contiguous pipes. Comprises: a country or state comprises counties, which in turn comprise townships. Located in/on: a particular building is located on a specific property. Borders on: two properties have a common border.
Data Model > Entities as Objects Real-world entities correspond to database objects: carrier of information = entity > object(s). [Figure 3.6. Source: Bernhardsen, Tor. (1999). 2nd Ed. Geographic Information Systems: An Introduction. p. 42.]
Objects Characterized by: Type (unique ID, type code/object class); Attributes (qualitative/quantitative data); Relations (calculable vs. attributable); Geometry (point, line, area/polygon); Quality (accuracy, resolution, coverage extent, representation, etc.). Real-world models and entities cannot be realized directly in databases, partly because a single entity may comprise several objects. E.g., 'Oak Tree Road' may be represented as a compilation of all the roadway sections between intersections/stop signs, with each section carrying its own object information. The criteria for dividing a roadway into sections must therefore be chosen before the roadway can be described as GIS data.
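As a rough illustration of the object characteristics listed above, the sketch below models a GIS object as a Python dataclass and splits one real-world road into section objects at chosen break vertices. The class `GISObject`, the helper `split_road`, and every field name and value are hypothetical, made up for this example; real GIS software defines its own object schemas.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Coord = Tuple[float, float]


@dataclass
class GISObject:
    """Hypothetical object record: type, geometry, attributes, relations, quality."""
    object_id: int                      # type: unique ID
    object_class: str                   # type: type code / object class
    geometry: List[Coord]               # geometry: vertices of a point, line, or area
    attributes: Dict[str, object] = field(default_factory=dict)
    relations: Dict[str, object] = field(default_factory=dict)
    quality: Dict[str, object] = field(default_factory=dict)


def split_road(name: str, vertices: List[Coord], breaks: List[int],
               first_id: int = 1) -> List[GISObject]:
    """Split one real-world road (an entity) into section objects.

    `breaks` holds interior vertex indices where a section ends, e.g. at
    intersections or stop signs; neighbouring sections share that vertex.
    """
    cuts = [0] + sorted(breaks) + [len(vertices) - 1]
    sections = []
    for n, (start, end) in enumerate(zip(cuts, cuts[1:])):
        sections.append(GISObject(
            object_id=first_id + n,
            object_class="road_section",
            geometry=vertices[start:end + 1],
            attributes={"name": name},
            relations={"part_of": name},      # each section belongs to the road
            quality={"accuracy_m": 1.0},      # hypothetical +/- 1.0 m accuracy
        ))
    return sections


oak_tree_road = [(0.0, 0.0), (50.0, 0.0), (100.0, 0.0), (100.0, 60.0), (100.0, 120.0)]
sections = split_road("Oak Tree Road", oak_tree_road, breaks=[2])
```

The list of break indices is exactly the "criteria for dividing a roadway" the slide refers to: moving or adding one index changes how many objects the single entity becomes.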
Object: Spatial Component [Figure 3.7. Source: Bernhardsen, Tor. (1999). 2nd Ed. Geographic Information Systems: An Introduction. p. 43.] Graphical Representation of Objects Spatial Component: Points (no dimensions): the simplest graphical representation, e.g., a corner of a property boundary or the coordinate of a building location. Scale determines whether an object is defined as a point or an area: in a large-scale representation a building may be shown as an area, whereas at small scale it would likely be a point (symbol). Lines (one dimension): connect at least two points and are used to represent objects that may be defined in one dimension, e.g., property boundaries, electric power lines, telecommunication cables. Roads and rivers may be either lines or areas, depending on the scale. Areas/polygons (two dimensions): used to represent objects defined in two dimensions, e.g., a lake, an area of woodland, a township. Again, scale determines whether an object is represented by an area or a point. Areas are delineated by at least three connecting lines, each of which comprises points.
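A minimal sketch of the three graphical primitives as plain coordinate tuples, assuming planar (projected) coordinates such as meters. The helper names `line_length` and `polygon_area` are hypothetical; the area uses the standard shoelace formula on a closed ring.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float]


def line_length(line: List[Coord]) -> float:
    """Length of a line: sum of distances between successive vertices."""
    if len(line) < 2:
        raise ValueError("a line connects at least two points")
    return sum(math.dist(a, b) for a, b in zip(line, line[1:]))


def polygon_area(ring: List[Coord]) -> float:
    """Area of a closed ring (first vertex == last) by the shoelace formula."""
    if len(ring) < 4 or ring[0] != ring[-1]:
        raise ValueError("an area needs a closed ring of at least three distinct vertices")
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice_area) / 2.0


building: Coord = (10.0, 20.0)                                      # point: 0 dimensions
power_line = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]                  # line: 1 dimension
lake = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]  # area: 2 dimensions
```

Stored as a point, the building has no measurable area; digitized at a larger scale it would become a closed ring, and `polygon_area` would then apply.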
Object: Attribute Component [Figure 3.7. Source: Bernhardsen, Tor. (1999). 2nd Ed. Geographic Information Systems: An Introduction. p. 43.] Attribute Representation of Objects: Attribute values: the same as the entity attributes of the real-world model. They describe an object's features: qualitative (type of object) or quantitative data (ordinal, interval, ratio). In practice, object attributes are stored in tables, with objects in rows and attributes in columns. Relations: may be calculated from the coordinates of an object, e.g., line intersections or area overlaps; may follow from the object structure, e.g., the beginning and end points of a line, the lines that form a polygon, or the polygons on either side of a line; or must be entered as attributes, e.g., the levels of crossing roads that don't intersect, or the division of a county into townships. Quality: graphical accuracy (such as ±1.0 m); updating (when and how data should be updated); resolution/detailing (whether roads should be represented by a single line or by both edges); extent of geographical coverage, attributes included, etc.; logical consistency between geometry and attributes; representation: discrete vs. continuous; relevance (where input may be a surrogate for original data that are unobtainable).
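To show the difference between a relation calculated from coordinates and one that must be entered as an attribute, here is a sketch using the standard parametric segment-intersection test. The function name and the `road_levels` dictionary are hypothetical illustrations; collinear overlapping segments are deliberately not handled.

```python
from typing import Optional, Tuple

Coord = Tuple[float, float]


def segment_intersection(p1: Coord, p2: Coord, p3: Coord, p4: Coord) -> Optional[Coord]:
    """Crossing point of segments p1-p2 and p3-p4, computed from coordinates alone.

    Returns None when the segments do not meet or are parallel/collinear
    (overlapping collinear segments are not handled in this sketch).
    """
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if denom == 0:
        return None
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / denom   # position along p1-p2
    u = ((x1 - x3) * (y1 - y2) - (y1 - y3) * (x1 - x2)) / denom   # position along p3-p4
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None


# Geometry says these two road centrelines cross at (2, 2) ...
crossing = segment_intersection((0, 0), (4, 4), (0, 4), (4, 0))
# ... but whether they form a junction or one passes over the other (a bridge)
# cannot be calculated; it has to be stored as an attribute.
road_levels = {"Road A": 0, "Road B": 1}   # hypothetical level attribute
```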
Basic Data Models (Graphics) There are two types of GIS data models (models used for the graphic representation of geographic space): Vector and Raster. Note: A database structure seldom needs to be tailored to a data model, but a well-prepared data model is vital for successful GIS analysis. We will discuss database structures/models further in a separate presentation.
Vector vs Raster Graphics [Figure 2.6: the different ways of graphically displaying data encapsulated by (a, left) vector entity models and (b, right) raster models. Image Source: Burrough, Peter A. and Rachael A. McDonnell. (1998). Principles of Geographic Information Systems. p. 27.]
Vector Data Models/Structures One model for representing geographic space. Spatial locations are explicit; relationships between entities/objects are implicit. Points are associated with a single coordinate pair (X, Y). Lines are a connected sequence of coordinate pairs. Areas are a sequence of interconnected lines whose first and last coordinate points are the same. By contrast, in raster models/structures the attribute is stored explicitly and the location of the object/entity is implied by its generalized position in the grid structure.
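The contrast between explicit and implied location can be sketched by converting one vector polygon into a small raster. This assumes a grid whose lower-left corner sits at (0, 0) and samples each cell at its centre with an even-odd (ray-casting) point-in-polygon test; the function names are hypothetical and this is only one of several rasterization rules in use.

```python
from typing import List, Tuple

Coord = Tuple[float, float]


def point_in_polygon(x: float, y: float, ring: List[Coord]) -> bool:
    """Even-odd (ray casting) test: count ring edges crossed by a ray to +x."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):                       # edge straddles the ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(ring: List[Coord], nrows: int, ncols: int,
              cell_size: float, value: int = 1) -> List[List[int]]:
    """Burn a closed vector ring into a grid by sampling each cell centre.

    Row 0 is the top row; the grid's lower-left corner is at (0, 0).
    A cell stores only the attribute value; its location is implied by (row, col).
    """
    grid = [[0] * ncols for _ in range(nrows)]
    for r in range(nrows):
        for c in range(ncols):
            cx = (c + 0.5) * cell_size
            cy = (nrows - r - 0.5) * cell_size
            if point_in_polygon(cx, cy, ring):
                grid[r][c] = value
    return grid


# Vector: location is explicit, stored as coordinates.
field_parcel = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0), (1.0, 1.0)]
# Raster: the same parcel as attribute values in a 4 x 4 grid of 1-unit cells.
grid = rasterize(field_parcel, nrows=4, ncols=4, cell_size=1.0)
```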
Vector Data Models/Structures The model most representative of dimensionality as it appears on a map. Entity data and attribute data are kept in separate files, perhaps in a DBMS, which links them. A line consists of two or more coordinate pairs, with its attributes stored separately; more complex lines are made up of many line segments. Exactness depends on the level of generalization/scale.
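A small sketch of geometry and attributes kept apart and linked by a shared feature ID. Here a Python dictionary stands in for the geometry file and an in-memory SQLite table stands in for the DBMS; the table name `roads`, its columns, and the helper `features_where` are hypothetical.

```python
import sqlite3
from typing import List, Tuple

# Geometry file: feature ID -> coordinate pairs (stand-in for a separate geometry file)
geometry = {
    101: [(0.0, 0.0), (120.0, 0.0), (120.0, 80.0)],
    102: [(120.0, 80.0), (200.0, 80.0)],
}

# Attribute table in a DBMS, keyed by the same feature ID
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE roads (fid INTEGER PRIMARY KEY, name TEXT, surface TEXT)")
db.executemany("INSERT INTO roads VALUES (?, ?, ?)",
               [(101, "Oak Tree Road", "asphalt"), (102, "Oak Tree Road", "gravel")])


def features_where(surface: str) -> List[Tuple[int, str, list]]:
    """Select by attribute in the DBMS, then fetch geometry through the shared ID."""
    rows = db.execute("SELECT fid, name FROM roads WHERE surface = ? ORDER BY fid",
                      (surface,)).fetchall()
    return [(fid, name, geometry[fid]) for fid, name in rows]
```

The ID is the only link: an attribute can be updated in the table without touching coordinates, and vice versa.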
Variety of Vector Models: Spaghetti model; Topological model (most common); Triangulated irregular network (TIN); DIME files and TIGER files; Network model; Digital Line Graph (DLG); Shapefile (ArcView/ArcGIS; ESRI); Others: HPGL, PostScript/ASCII, CAD/.dxf
Vector Model: Spaghetti [Figure 4.10. Source: Lakhan, V. Chris. (1996). Introductory Geographical Information Systems. p. 54.] Non-topological
Vector Model: Topological [Figure 4.12. Source: Bernhardsen, Tor. (1999). 2nd Ed. Geographic Information Systems: An Introduction. p. 62.] The idea behind the topology model is that all geometric objects in digital map data can be represented by nodes and links (a). The objects' attributes and relationships can be described by storing nodes and links in three tables: (b) a polygon table, (c) a node topology table, and (d) a link topology table; (e) an additional table gives the objects' geographical coordinates and is stored separately from the attribute data files.
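The node/link idea can be sketched by deriving a link table (with left and right polygons), a node table, and the "borders on" relation from two adjacent polygons. The table layouts below are a simplified, hypothetical version for illustration, not Bernhardsen's exact tables; the sketch assumes each polygon ring is listed counterclockwise, so its interior lies to the left of every link.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (e) Coordinate table, stored apart from the topology: node ID -> (x, y)
coords = {1: (0, 0), 2: (2, 0), 3: (4, 0), 4: (4, 2), 5: (2, 2), 6: (0, 2)}

# (b) Polygon table: polygon ID -> ring of node IDs, listed counterclockwise
polygons = {"A": [1, 2, 5, 6], "B": [2, 3, 4, 5]}

OUTSIDE = "O"  # label for the area outside all polygons


def build_link_table(polygons: Dict[str, List[int]]) -> Dict[Tuple[int, int], Dict[str, str]]:
    """(d) Link table: (from_node, to_node) -> polygon on the left and on the right.

    Walking a counterclockwise ring keeps its interior on the left, so a link
    first stored in the opposite direction has this polygon on its right.
    """
    links: Dict[Tuple[int, int], Dict[str, str]] = {}
    for pid, ring in polygons.items():
        for a, b in zip(ring, ring[1:] + ring[:1]):
            if (b, a) in links:
                links[(b, a)]["right"] = pid
            else:
                links[(a, b)] = {"left": pid, "right": OUTSIDE}
    return links


def build_node_table(links) -> Dict[int, List[Tuple[int, int]]]:
    """(c) Node table: node ID -> links that meet at that node."""
    nodes: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for a, b in links:
        nodes[a].append((a, b))
        nodes[b].append((a, b))
    return dict(nodes)


def neighbours(links) -> Dict[str, Set[str]]:
    """'Borders on' relation read straight from the left/right polygon columns."""
    adjacent: Dict[str, Set[str]] = defaultdict(set)
    for side in links.values():
        left, right = side["left"], side["right"]
        if OUTSIDE not in (left, right):
            adjacent[left].add(right)
            adjacent[right].add(left)
    return dict(adjacent)


def link_geometry(link: Tuple[int, int]) -> List[Tuple[int, int]]:
    """Coordinates are looked up only when geometry is actually needed."""
    return [coords[n] for n in link]


links = build_link_table(polygons)
nodes = build_node_table(links)
```

Note that `neighbours` never reads a coordinate: adjacency comes entirely from the topology tables.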
Why Topology Matters Connections and relationships between objects are independent of their coordinates. Topology overcomes the major weakness of the spaghetti model, allowing for GIS analysis (overlay, network, contiguity, connectivity). It requires that all lines be connected, polygons closed, and loose ends removed. Source: Clarke, Keith C. (2001). 3rd Ed. Getting Started with Geographic Information Systems. p. 86.
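The cleanup requirements above can be checked mechanically before topology is built. This sketch flags dangling ends (endpoints used by only one segment) and unclosed rings; it assumes nodes are matched by exact coordinate equality, whereas production tools normally apply a snapping tolerance. Function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Coord = Tuple[float, float]
Segment = Tuple[Coord, Coord]


def dangling_ends(segments: List[Segment]) -> List[Coord]:
    """Endpoints used by only one segment: loose ends (dangles) that break
    connectivity and must be removed or snapped before building topology."""
    degree = Counter()
    for a, b in segments:
        degree[a] += 1
        degree[b] += 1
    return sorted(p for p, d in degree.items() if d == 1)


def is_closed(ring: List[Coord]) -> bool:
    """A polygon ring must return to its first vertex."""
    return len(ring) >= 4 and ring[0] == ring[-1]


square = [((0, 0), (1, 0)), ((1, 0), (1, 1)), ((1, 1), (0, 1)), ((0, 1), (0, 0))]
# Digitizing error: the first side stops short of the corner (an undershoot).
undershoot = [((0, 0), (0.98, 0)), ((1, 0), (1, 1)), ((1, 1), (0, 1)), ((0, 1), (0, 0))]
```

An undershoot shows up as two dangles a short distance apart, which is why snapping nearby endpoints is the usual repair.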
Vector Model: TIN [Figure 4.19. Source: Demers, Michael N. (2000). 2nd Ed. Fundamentals of Geographic Information Systems. p. 117.] tessellate: to form into a mosaic pattern. tessellation: a mosaic, typically consisting of small square stones.
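A TIN stores irregularly spaced sample points (x, y, z) joined into non-overlapping triangles. The sketch below assumes the triangulation already exists (building one, e.g. by Delaunay triangulation, is out of scope) and shows how an elevation is estimated anywhere inside a facet by linear interpolation with barycentric weights. Names are hypothetical.

```python
from typing import List, Optional, Tuple

Vertex = Tuple[float, float, float]   # (x, y, z) sample point


def barycentric(p, a, b, c) -> Tuple[float, float, float]:
    """Barycentric weights of 2-D point p with respect to triangle a, b, c."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = p, a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return w1, w2, 1.0 - w1 - w2


def tin_elevation(p: Tuple[float, float],
                  triangles: List[Tuple[Vertex, Vertex, Vertex]]) -> Optional[float]:
    """Linear interpolation of z on the triangular facet that contains p."""
    for v1, v2, v3 in triangles:
        w = barycentric(p, v1[:2], v2[:2], v3[:2])
        if min(w) >= -1e-12:                   # p lies inside or on this facet
            return w[0] * v1[2] + w[1] * v2[2] + w[2] * v3[2]
    return None                                # p is outside the TIN


# Four sample points triangulated into two facets sharing the edge (0,0)-(10,10)
tin = [((0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (10.0, 10.0, 30.0)),
       ((0.0, 0.0, 10.0), (10.0, 10.0, 30.0), (0.0, 10.0, 20.0))]
```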
Vector Model: DIME files and TIGER files [Figure 4.16: examples of topological vector data models: the GBF/DIME model, the TIGER model, and the POLYVRT model. Image Source: Demers, Michael N. (2000). 2nd Ed. Fundamentals of Geographic Information Systems. p. 113. Original source: parts (a) and (b) modified from K.C. Clarke, Analytical and Computer Cartography (1990). Prentice Hall, Inc., Englewood Cliffs, NJ.] A number of topological data models exist and are in common use. GBF/DIME (Geographic Base File/Dual Independent Map Encoding): created by the US Bureau of the Census to automate storage of street map data for the decennial census (1969). Each segment ends when it either changes direction or intersects another line. Nodes are identified with codes. Directional codes are assigned ("from node; to node"), which facilitates error checking. Street addresses and UTM coordinates for each link are explicitly defined, facilitating address matching. Disadvantage: line segments are stored in no particular order, so sequential searching is slow. POLYVRT (POLYgon conVeRTer): developed by Peucker and Chrisman and later implemented at the Harvard Laboratory for Computer Graphics. Like TIGER, POLYVRT allows selective retrieval of specific entity types (points, lines, polygons) based on their codes. It eliminates the storage and search inefficiencies of the basic topological model by storing each type of entity (points, lines, polygons) separately. The separate objects are then linked in a hierarchical data structure, with points relating to lines, which in turn relate to polygons, all through the use of pointers. Each collection of line segments, collectively called a chain, begins and ends with specific nodes (intersections between two chains). Advantage: reduced storage and faster retrieval. Disadvantage: incorrect pointers for a given polygon are difficult to detect until the polygon is actually retrieved.
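The error checking made possible by "from node; to node" coding can be sketched as follows: orient every segment so that a given block lies on its left, then verify that the boundary closes, i.e. every node is entered as often as it is left. The record layout `(segment_id, from_node, to_node, left_block, right_block)` is a hypothetical simplification, not the actual GBF/DIME field layout.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical DIME-style records:
# (segment_id, from_node, to_node, left_block, right_block)
Segment = Tuple[str, str, str, str, str]


def block_is_closed(segments: List[Segment], block: str) -> bool:
    """Topological edit for one block.

    Each segment bounding the block is oriented so the block lies on its left
    (segments with the block on the right are walked to_node -> from_node).
    The boundary forms closed chain(s) only if every node is entered exactly
    as often as it is left.
    """
    leaving, entering = Counter(), Counter()
    for _seg_id, frm, to, left, right in segments:
        if left == block:
            leaving[frm] += 1
            entering[to] += 1
        elif right == block:
            leaving[to] += 1
            entering[frm] += 1
    nodes = set(leaving) | set(entering)
    return bool(nodes) and all(leaving[n] == entering[n] for n in nodes)


# Block B1 bounded by nodes N1 (SW), N2 (SE), N3 (NE), N4 (NW); B0 lies outside.
segments = [
    ("S1", "N1", "N2", "B1", "B0"),
    ("S2", "N2", "N3", "B1", "B0"),
    ("S3", "N3", "N4", "B1", "B0"),
    ("S4", "N1", "N4", "B0", "B1"),   # digitized northward, so B1 is on the right
]
```

A miscoded left/right pair on any segment breaks the balance at its nodes, which is how such coding errors are caught without looking at coordinates.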
Vector Model: TIGER (US Census Bureau) [Figure 3.16: TIGER data structure. Image Source: Clarke, Keith C. (2001). 3rd Ed. Getting Started with Geographic Information Systems. p. 92.] TIGER (Topologically Integrated Geographic Encoding and Referencing) formats derive from the enumeration maps of the US Census Bureau and were designed for use with the 1990 US census. They contain topology and are the successor to the GBF/DIME files, which popularized topological data structures. They consist of an arc/node arrangement, with separate files for points, lines, and areas linked by cross-reference. TIGER terminology calls points zero-cells, lines one-cells, and areas two-cells. Through cross-indexing, some features can be encoded as landmarks, e.g., rivers, roads, buildings. TIGER files exist for the entire US, including Puerto Rico, the Virgin Islands, and Guam. TIGER files are block-level maps of every village, town, and city, and include geocoded block faces with address ranges of street numbers. Address matching is therefore possible, and a large share of GIS use depends on this capability. The data are referenced to the US Census, providing the capability to analyze population, ethnicity, housing, economic data, and much more! While topologically correct, TIGER files have been criticized for their geographical accuracy. Fortunately, GIS functions have been added to improve accuracy, and many data suppliers have enhanced TIGER files.
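Address matching against block faces with address ranges can be sketched as linear interpolation along a street segment. The record layout (`from`, `to`, `left`, `right`) and the odd/even side convention are assumptions for illustration, not TIGER's actual field names; real geocoders also follow the full segment shape and offset the point to the correct side of the street.

```python
from typing import Optional, Tuple

Coord = Tuple[float, float]


def geocode(house_number: int, segment: dict) -> Optional[Coord]:
    """Estimate a position for `house_number` on one block face by linear interpolation.

    `segment` is a hypothetical record with the segment's end coordinates and
    separate address ranges for the left and right side of the street.
    """
    for side in ("left", "right"):
        low, high = segment[side]
        # Assumption: one side carries odd numbers, the other even numbers.
        if low <= house_number <= high and (house_number - low) % 2 == 0:
            frac = 0.0 if high == low else (house_number - low) / (high - low)
            (x0, y0), (x1, y1) = segment["from"], segment["to"]
            return (x0 + frac * (x1 - x0), y0 + frac * (y1 - y0))
    return None


main_street_100_block = {
    "from": (0.0, 0.0), "to": (100.0, 0.0),
    "left": (101, 199),    # odd house numbers
    "right": (100, 198),   # even house numbers
}
```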
Vector Graphic: TIGER Example (Goleta, CA) [Figure 3.15: TIGER files plotted for Goleta, California. Image Source: Clarke, Keith C. (2001). 3rd Ed. Getting Started with Geographic Information Systems. p. 91.]
Vector Model: DLGs [Figure 3.13: DLG coding format. Image Source: Clarke, Keith C. (2001). 3rd Ed. Getting Started with Geographic Information Systems. p. 90.] Widely used format: a great deal of important data is available in it. DLG is the format of the US Geological Survey's (USGS) National Mapping Division. Available in two scales: 1:100,000 (most of the country) and 1:24,000 (covers only a small portion of the country but offers great detail). Uses ground coordinates in UTM, truncated to the nearest 10 meters. Features are handled in separate files: hydrology, hypsography (contours and topographic features), transportation, political, etc. Many GIS packages can import these files, but this often requires data manipulation, especially making records conform to a fixed line length in bytes.
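The "fixed line length in bytes" manipulation can be sketched as padding text lines to a fixed record size (and splitting a blocked byte stream back into lines). The record length is a parameter; 80 bytes is used as the default purely for illustration, so check the DLG specification for the actual layout of a given file. The sample lines are made up.

```python
from typing import List


def to_fixed_records(lines: List[str], record_length: int = 80) -> bytes:
    """Pad each text line with blanks to exactly `record_length` bytes and
    concatenate them without line separators."""
    out = []
    for line in lines:
        if len(line) > record_length:
            raise ValueError(f"line longer than {record_length} bytes: {line!r}")
        out.append(line.ljust(record_length))
    return "".join(out).encode("ascii")


def from_fixed_records(data: bytes, record_length: int = 80) -> List[str]:
    """Split a blocked byte stream back into lines, dropping trailing padding."""
    if len(data) % record_length:
        raise ValueError("data length is not a multiple of the record length")
    text = data.decode("ascii")
    return [text[i:i + record_length].rstrip()
            for i in range(0, len(text), record_length)]


sample_lines = ["HEADER LINE", "N 1 345.0 678.0"]   # made-up content
blocked = to_fixed_records(sample_lines, record_length=20)
```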
Vector Graphic: DLG Example [Figure 3.14: sample DLG obtained from the USGS. Image Source: Clarke, Keith C. (2001). 3rd Ed. Getting Started with Geographic Information Systems. p. 91.]
Vector Model: Network [Figure 3.14: link, turn, and stop impedances affecting the journey of a delivery van (or tourist, salesperson, etc.). Source: Heywood, Ian, Sarah Cornelius and Steve Carver. An Introduction to Geographical Information Systems. p. 60.] Not just for transportation analysis: stream networks, switches/valves in utility networks, rail/air routes.
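Link, turn, and stop impedances can be combined in a least-cost route search. This sketch uses Dijkstra's algorithm with the search state `(node, previous node)` so that a turn cost can depend on the movement a -> b -> c; an infinite turn cost acts as a banned turn. The dictionaries and the function name are hypothetical, and only the total cost is returned.

```python
import heapq
import itertools
import math
from typing import Dict, Hashable, Optional, Tuple

Node = Hashable


def route_cost(links: Dict[Tuple[Node, Node], float],
               turn_impedance: Dict[Tuple[Node, Node, Node], float],
               stop_impedance: Dict[Node, float],
               start: Node, goal: Node) -> Optional[float]:
    """Least total impedance from start to goal (Dijkstra).

    links:          directed link impedance, e.g. travel time on a street segment
    turn_impedance: extra cost for the movement a -> b -> c (math.inf = banned turn)
    stop_impedance: cost incurred on reaching a node, e.g. a delivery stop
    The search state is (node, previous node) so that turn costs can be applied.
    """
    graph: Dict[Node, list] = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, []).append((b, cost))

    tie = itertools.count()                  # breaks ties without comparing nodes
    best = {(start, None): 0.0}
    heap = [(0.0, next(tie), start, None)]
    while heap:
        cost, _, node, prev = heapq.heappop(heap)
        if node == goal:
            return cost
        if cost > best.get((node, prev), math.inf):
            continue                         # stale heap entry
        for nxt, link_cost in graph.get(node, []):
            turn = turn_impedance.get((prev, node, nxt), 0.0)
            new_cost = cost + link_cost + turn + stop_impedance.get(nxt, 0.0)
            if new_cost < best.get((nxt, node), math.inf):
                best[(nxt, node)] = new_cost
                heapq.heappush(heap, (new_cost, next(tie), nxt, node))
    return None


streets = {("A", "B"): 1.0, ("B", "C"): 1.0, ("A", "D"): 1.0, ("D", "C"): 1.5}
```

With no turn or stop costs the route A-B-C wins; penalizing the turn at B makes A-D-C cheaper, and a stop delay at D changes the totals again.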
Vector Model: Shapefile (ArcGIS; ESRI) [Figure 4.17: examples of the shape types of geographic features in a data set for a shapefile. Source: Demers, Michael N. (2000). 2nd Ed. Fundamentals of Geographic Information Systems. p. 114.] Non-topological vector model. Despite the efficiencies of topological data structures, advances in computer processing speed and storage capacity may have reduced the need for explicit topology. A shapefile stores geometry and attribute information for the geographic features in a data set. The geometry for a feature is stored as a shape comprising a set of vector coordinates linked to its attributes; there are 14 shape types (see slide), each describing particular entities or entity combinations. Shapefiles usually comprise three separate and distinct types of files: main files, index files, and database tables. The main file (e.g., counties.shp) is a direct-access, variable-record-length file that contains each shape as a list of vertices. The index file (e.g., counties.shx) contains the offset and content length of each record in the main file, for locating the values. The database table (dBASE; e.g., counties.dbf) contains the attributes that describe the shapes. A shapefile generally requires less processing power than its topological counterpart.
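How the index file locates records can be sketched with Python's `struct` module, following the layout in the ESRI Shapefile Technical Description: a 100-byte header (file code 9994 and file length big-endian; version, shape type, and bounding box little-endian), then one 8-byte record per shape giving its offset and content length in 16-bit words. To stay self-contained, the example builds a synthetic `.shx` in memory rather than reading a real `counties.shx`; the helper names are hypothetical, and real projects would normally use a maintained shapefile library.

```python
import struct
from typing import Dict, List, Tuple

# Shape type codes from the ESRI Shapefile Technical Description (14 types).
SHAPE_TYPES = {
    0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint",
    11: "PointZ", 13: "PolyLineZ", 15: "PolygonZ", 18: "MultiPointZ",
    21: "PointM", 23: "PolyLineM", 25: "PolygonM", 28: "MultiPointM",
    31: "MultiPatch",
}
HEADER_SIZE = 100  # bytes; .shp and .shx share this header layout


def parse_header(data: bytes) -> Dict[str, object]:
    file_code, = struct.unpack(">i", data[0:4])                # big-endian
    length_words, = struct.unpack(">i", data[24:28])           # length in 16-bit words
    version, shape_type = struct.unpack("<ii", data[28:36])    # little-endian
    xmin, ymin, xmax, ymax = struct.unpack("<4d", data[36:68])
    if file_code != 9994:
        raise ValueError("not a shapefile header")
    return {"file_length": length_words * 2, "version": version,
            "shape_type": SHAPE_TYPES.get(shape_type, shape_type),
            "bbox": (xmin, ymin, xmax, ymax)}


def parse_shx_records(data: bytes) -> List[Tuple[int, int]]:
    """Each 8-byte .shx record: offset and content length of one .shp record,
    both big-endian and counted in 16-bit words. Returned here in bytes."""
    records = []
    for pos in range(HEADER_SIZE, len(data), 8):
        offset_words, length_words = struct.unpack(">ii", data[pos:pos + 8])
        records.append((offset_words * 2, length_words * 2))
    return records


def build_header(shape_type: int, file_length: int, bbox: Tuple[float, ...]) -> bytes:
    """Build a synthetic 100-byte header (for the demo below only)."""
    head = struct.pack(">7i", 9994, 0, 0, 0, 0, 0, file_length // 2)
    head += struct.pack("<2i", 1000, shape_type)
    head += struct.pack("<8d", *bbox, 0.0, 0.0, 0.0, 0.0)  # x/y box, then Z and M ranges
    return head


# Synthetic index for two polygons whose .shp records start at bytes 100 and 236.
shx = (build_header(5, 116, (0.0, 0.0, 10.0, 10.0))
       + struct.pack(">ii", 50, 64) + struct.pack(">ii", 118, 40))
```

Because the index gives byte offsets directly, a program can jump to the n-th shape in the main file without reading the shapes before it.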
Vector Model: Others (HPGL, CAD/.dxf, PostScript/ASCII) [Figure 3.12: some alternative industry-standard vector formats, with headers removed; the graphic is the same four-point rectangle in each case. Source: Clarke, Keith C. (2001). 3rd Ed. Getting Started with Geographic Information Systems. p. 89.] Industry-standard and commonly used formats for GIS data can be distinguished as 1) formats that preserve the actual ground coordinates of the data and 2) formats that use an alternative page-coordinate description of the map. The latter are used when a map is drafted for display in a computer mapping program or in the data display (ArcView: Layout View) of a GIS. HPGL (Hewlett-Packard Graphics Language): a page description language designed for use with plotters and printers. Files are plain ASCII text. Each line of the file contains one move command, so a line segment connects two successive lines or points. The format works with a minimum of header information, so files can be easily written or edited. The header can be manipulated to change scaling, size, colors, etc. HPGL is an unstructured format: it does NOT store topology. PostScript: another page description language, developed by Adobe for desktop and professional publishing products. Most laser printers use it as the printer device control format. PostScript uses ASCII files and has complex headers that control a large number of functions: fonts, patterns, scaling, etc. In GIS, PostScript is usually used to export or print a finished map; coordinates are given with respect to a printed page. AutoCAD .dxf: Autodesk's popular CAD package has made .dxf (Drawing Exchange Format) commonplace, and AutoCAD Map GIS software uses these formats as well. DXF files are simple ASCII files with a very large, mandatory header containing huge amounts of metadata and file default information. DXF does NOT support topology, but it does allow the user to maintain information in separate layers. DXF is importable by almost all GIS packages other than those that use raster formats.
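To make the "one move command per line" description concrete, this sketch writes a four-point rectangle as HPGL using the basic commands IN (initialize), SP (select pen), PU (pen up), and PD (pen down), then reads the vertex sequence back. The coordinates are plotter (page) units, and the helper names are hypothetical.

```python
from typing import List, Tuple


def rectangle_hpgl(x0: int, y0: int, x1: int, y1: int, pen: int = 1) -> str:
    """Write a 4-point rectangle as HPGL, one command per line.

    Coordinates are plotter (page) units, not ground coordinates.
    """
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1), (x0, y0)]
    commands = ["IN;", f"SP{pen};", f"PU{x0},{y0};"]       # initialize, select pen, move
    commands += [f"PD{x},{y};" for x, y in corners[1:]]    # draw each side
    commands.append("PU;")                                 # lift the pen
    return "\n".join(commands)


def hpgl_vertices(text: str) -> List[Tuple[int, int]]:
    """Recover the coordinate sequence from PU/PD commands that carry one x,y pair."""
    points = []
    for line in text.splitlines():
        command, args = line[:2], line[2:].rstrip(";")
        if command in ("PU", "PD") and args:
            x, y = (int(v) for v in args.split(","))
            points.append((x, y))
    return points


plot = rectangle_hpgl(0, 0, 100, 50)
```

Reading the file back yields only a sequence of pen moves: nothing records that the vertices form a polygon or what lies next to it, which is what "does NOT store topology" means in practice.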
Vector Data Structures/Models Advantages: Good representation of entity data models; Compact data structure; Topology can be described explicitly, which makes vector structures good for network analysis; Coordinate transformation and rubber sheeting are easy; Accurate graphic representation at all scales; Retrieval, updating, and generalization of graphics and attributes are possible. Source: Burrough, Peter A. and Rachael A. McDonnell. (1998). Principles of Geographic Information Systems. p. 70
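Coordinate transformation is easy in a vector structure because every vertex can be transformed independently with the same formula. The sketch below fits an affine transformation exactly from three control-point pairs (solved with Cramer's rule) and applies it to a small parcel, e.g. to move digitizer units into ground coordinates. The control points are made-up values; rubber sheeting in practice often uses piecewise or higher-order transformations, and only the affine case is shown here.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float]


def det3(m: Sequence[Sequence[float]]) -> float:
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Cramer's rule."""
    d = det3(m)
    if d == 0:
        raise ValueError("control points are collinear")
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        solution.append(det3(mc) / d)
    return solution


def fit_affine(src: List[Coord], dst: List[Coord]) -> Tuple[float, ...]:
    """Exact affine fit from three control-point pairs:
    x' = a*x + b*y + c,  y' = d*x + e*y + f."""
    m = [[x, y, 1.0] for x, y in src]
    a, b, c = solve3(m, [p[0] for p in dst])
    d, e, f = solve3(m, [p[1] for p in dst])
    return a, b, c, d, e, f


def transform(points: List[Coord], params: Tuple[float, ...]) -> List[Coord]:
    """Apply the same affine formula to every vertex independently."""
    a, b, c, d, e, f = params
    return [(a * x + b * y + c, d * x + e * y + f) for x, y in points]


# Digitizer units -> hypothetical ground coordinates via three control points.
params = fit_affine([(0, 0), (1, 0), (0, 1)], [(500, 1000), (502, 1000), (500, 1002)])
parcel = transform([(0.0, 0.0), (0.5, 0.0), (0.5, 0.5), (0.0, 0.0)], params)
```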
Vector Data Structures/Models Disadvantages: Complex data structures; Combining several polygon networks by intersection and overlay is difficult and uses considerable computer power; Display and plotting are often time-consuming and expensive, especially for high-quality drawing, coloring, and shading; Spatial analysis within basic units such as polygons is impossible without extra data, because the units are considered internally homogeneous; Simulation modeling of processes of spatial interaction over paths not defined by explicit topology is more difficult than with raster structures, because each spatial entity has a different shape and form. Source: Burrough, Peter A. and Rachael A. McDonnell. (1998). Principles of Geographic Information Systems. p. 70
Raster Data Structures/Models Advantages: Simple data structures; Location-specific manipulation of attribute data is easy; Many kinds of spatial analysis and filtering may be used; Mathematical modeling is easy because all spatial entities have a simple, regular shape; The technology is cheap; Many forms of data are available. Source: Burrough, Peter A. and Rachael A. McDonnell. (1998). Principles of Geographic Information Systems. p. 70
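Filtering is simple on a raster because every cell is addressed directly by (row, column) and all neighbourhoods have the same regular shape. This sketch is a focal mean (smoothing) filter over a square window, with windows clipped at the grid edges; the function name and the edge rule are choices made for this example, since other packages pad or skip edge cells instead.

```python
from typing import List

Grid = List[List[float]]


def focal_mean(grid: Grid, radius: int = 1) -> Grid:
    """Mean of each cell's (2*radius+1)^2 neighbourhood; windows are clipped at the edges."""
    nrows, ncols = len(grid), len(grid[0])
    out = [[0.0] * ncols for _ in range(nrows)]
    for r in range(nrows):
        for c in range(ncols):
            window = [grid[rr][cc]
                      for rr in range(max(0, r - radius), min(nrows, r + radius + 1))
                      for cc in range(max(0, c - radius), min(ncols, c + radius + 1))]
            out[r][c] = sum(window) / len(window)
    return out


elevation = [[0, 0, 0],
             [0, 9, 0],
             [0, 0, 0]]
smoothed = focal_mean(elevation)
```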
Raster Data Structures/Models Disadvantages: Large data volumes; Using large grid cells to reduce data volumes reduces spatial resolution, causing loss of information and an inability to recognize phenomenologically defined structures; Crude raster maps are inelegant, though graphic elegance is becoming less of a problem; Coordinate transformations are difficult and time-consuming unless special algorithms and hardware are used, and even then may result in loss of information or distortion of grid cell shape. Source: Burrough, Peter A. and Rachael A. McDonnell. (1998). Principles of Geographic Information Systems. p. 70
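The trade-off between data volume and resolution can be sketched by merging blocks of cells into larger cells that keep the most common class. Majority resampling is only one possible aggregation rule, chosen here for illustration; the function name and land-use codes are hypothetical.

```python
from collections import Counter
from typing import List

Grid = List[List[int]]


def coarsen_majority(grid: Grid, factor: int) -> Grid:
    """Merge factor x factor blocks of cells into one cell holding the most common class."""
    nrows, ncols = len(grid), len(grid[0])
    out = []
    for r0 in range(0, nrows, factor):
        row = []
        for c0 in range(0, ncols, factor):
            block = [grid[r][c]
                     for r in range(r0, min(r0 + factor, nrows))
                     for c in range(c0, min(c0 + factor, ncols))]
            row.append(Counter(block).most_common(1)[0][0])
        out.append(row)
    return out


# 1 = field, 3 = a small pond occupying a single cell
land_use = [[1, 1, 1, 1],
            [1, 3, 1, 1],
            [1, 1, 1, 1],
            [1, 1, 1, 1]]
coarse = coarsen_majority(land_use, 2)
```

The coarse grid needs a quarter of the cells, but the pond disappears entirely: a small structure cannot be recognized once the cells are larger than it.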