GIS Data Models GEOG 370 Christine Erlien, Instructor
GIS Data Models: Why? Knowing how GIS data are structured helps us to use GIS programs more effectively –Basic computer file structures –Database structures
Basic computer file structures What is where? –Computer file structures allow the computer to store, order, & search data Types: –Simple list –Ordered sequential –Indexed file
Basic computer file structures: Simple list
–Most basic structure
–No order, no organization
–Input is simple: new records are just added to the end
–Searching is difficult & inefficient, since every record may have to be checked
–Example: a class roster ordered by when each student added the class
Basic computer file structures: Ordered sequential files
–Records ordered by alphabetic or numeric sequence
–How to search? Algorithm: divide and conquer
   Compare the target with the middle record to determine which half (preceding or following) to search
   Repeat on that half until the record is found or no records remain
–Inserting a record is slow
–Searching is more efficient than in a simple list
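The divide-and-conquer search described above is usually called a binary search. A minimal sketch in Python, assuming the ordered file is held in memory as a sorted list of strings; the function name `find_record` and the sample town names are illustrative only:

```python
def find_record(records, target):
    """Binary search over a sorted list; returns the index, or -1 if absent."""
    low, high = 0, len(records) - 1
    while low <= high:
        mid = (low + high) // 2
        if records[mid] == target:
            return mid
        if records[mid] < target:
            low = mid + 1   # target sorts after the middle record
        else:
            high = mid - 1  # target sorts before the middle record
    return -1


towns = ["Cary", "Chapel Hill", "Durham", "Graham", "Greensboro", "Raleigh"]
position = find_record(towns, "Graham")
missing = find_record(towns, "Maggie Valley")
```

Each pass discards half of the remaining records, so the number of comparisons grows much more slowly than the file does. In a simple list, by contrast, every record may need to be checked.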
Basic computer file structures: Ordered sequential files
Example file:
   Cary
   Chapel Hill
   Durham
   Graham
   Greensboro
   Raleigh
To add: Maggie Valley
What’s the process?
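One possible answer to the question above: find the slot where the new record belongs, then shift every later record down to make room. A sketch using Python's standard `bisect` module, assuming the file is an in-memory list kept in alphabetical order (so Cary sorts before Chapel Hill):

```python
import bisect

towns = ["Cary", "Chapel Hill", "Durham", "Graham", "Greensboro", "Raleigh"]

# Binary search for the position that keeps the list in sorted order.
slot = bisect.bisect_left(towns, "Maggie Valley")

# list.insert shifts every later record by one position, which is why
# inserting into an ordered sequential file is slow for large files.
towns.insert(slot, "Maggie Valley")
```

Finding the slot is fast; moving the records that follow it is the expensive part.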
Basic computer file structures: Indexed files
–Database index
   Can be built for a field that uniquely identifies a record (primary key) or for other fields
   Used to determine the location of rows in a file that satisfy some condition
   Keys & indexes can be extracted & sorted, and the original file accessed through them, faster than the original file itself could be sorted
–Types
   Direct: Each record is searched for particular properties
   Inverted: Index is built on anticipated search criteria
Indexed files [Figure: inverted index and direct index]
Basic computer file structures: Indexed files
Advantages
–Quicker (i.e., reduces computational time)
Disadvantages
–Inverted
   Requires knowledge of likely search criteria
   Data additions require recalculation of the index
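The inverted index idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular GIS package stores its indexes: the parcel records, field names, and the helper `build_inverted_index` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical parcel records; field names and values are illustrative only.
parcels = [
    {"id": 1, "land_use": "Row crops", "status": "Active"},
    {"id": 2, "land_use": "Orchards", "status": "Dormant"},
    {"id": 3, "land_use": "Row crops", "status": "Dormant"},
    {"id": 4, "land_use": "Rangeland", "status": "Active"},
]


def build_inverted_index(records, field):
    """Map each value of `field` to the positions of the records holding it."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[record[field]].append(position)
    return index


# The index is built in advance for an anticipated search criterion.
by_land_use = build_inverted_index(parcels, "land_use")

# Lookup goes straight to the matching rows instead of scanning the file.
row_crop_ids = [parcels[pos]["id"] for pos in by_land_use["Row crops"]]

# Adding a record means the index has to be brought up to date as well.
parcels.append({"id": 5, "land_use": "Orchards", "status": "Active"})
by_land_use = build_inverted_index(parcels, "land_use")
orchard_ids = [parcels[pos]["id"] for pos in by_land_use["Orchards"]]
```

A search on `status` would get no help from this index, which is the sense in which an inverted index requires knowing the likely search criteria ahead of time.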
Databases & Database Structures What is where? –Geographic searches require data retrieval –Data retrieval requires data organization
Databases & Database Structures Database: Collection of multiple files –Requires more elaborate structure for management DBMS: Database Management System Database structure types –Hierarchical data structures –Network systems –Relational database systems
Database Structures: Hierarchical
Hierarchical data structures
–One-to-many (parent-child) relationship
–Requires that relationships be defined before the structure & decision rules are developed
–Advantage: Easy to search
–Disadvantages:
   Knowledge of all questions that might be asked is necessary
   Unanticipated criteria make search impossible
   Large index files are memory intensive & slow to access
Hierarchical Database Structures
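A rough sketch of a one-to-many hierarchy, using nested Python dictionaries. The state, county, and town levels and both function names are assumptions chosen for illustration; a real hierarchical DBMS stores and navigates its records differently.

```python
# Hypothetical hierarchy: state -> county -> towns (one parent, many children).
hierarchy = {
    "North Carolina": {
        "Orange": ["Chapel Hill"],
        "Wake": ["Cary", "Raleigh"],
        "Durham": ["Durham"],
    }
}


def towns_in_county(tree, state, county):
    """Follow the predefined parent-child path straight to the answer."""
    return tree[state][county]


def county_of_town(tree, town):
    """A question the structure was not designed for: visit every branch."""
    for counties in tree.values():
        for county, towns in counties.items():
            if town in towns:
                return county
    return None


wake_towns = towns_in_county(hierarchy, "North Carolina", "Wake")
cary_county = county_of_town(hierarchy, "Cary")
```

The first question follows the hierarchy and is answered directly. The second runs against it, so this sketch has to walk the entire tree, which hints at why questions outside the predefined relationships are a problem for hierarchical structures.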
Database Structures: Network Systems Network Systems –Allow users to move from data item to data item through a series of pointers Pointers: Computer structures that direct a piece of data to all others to which it relates (connect one file location to another) –Pointers indicate relationships among data items
Database Structures: Network Systems
Advantages:
–Less rigid than hierarchical structure
–Can handle many-to-many relationships
–Reduces data redundancy
–Greater search flexibility
Disadvantages:
–In very complex GIS databases, the number of pointers can get quite large, which increases the storage space required
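The pointer idea can be sketched with Python dictionaries, where each record stores the keys of the records it relates to. The road and parcel records and both function names are hypothetical; actual network databases use file-level pointers rather than dictionary keys.

```python
# Hypothetical records: each one holds pointers (here, keys) to related records.
roads = {
    "R1": {"name": "Road 1", "crosses": ["P1", "P2"]},
    "R2": {"name": "Road 2", "crosses": ["P2"]},
}
parcels = {
    "P1": {"name": "Parcel 1", "bordered_by": ["R1"]},
    "P2": {"name": "Parcel 2", "bordered_by": ["R1", "R2"]},
}


def roads_for_parcel(parcel_id):
    """Follow pointers from a parcel to its road records."""
    return [roads[rid]["name"] for rid in parcels[parcel_id]["bordered_by"]]


def parcels_for_road(road_id):
    """Follow pointers in the other direction, from a road to its parcels."""
    return [parcels[pid]["name"] for pid in roads[road_id]["crosses"]]


p2_roads = roads_for_parcel("P2")
r1_parcels = parcels_for_road("R1")

# Every relationship is stored as a pointer on both sides, so the pointer
# count grows quickly as more relationships are added.
pointer_count = sum(len(r["crosses"]) for r in roads.values()) + sum(
    len(p["bordered_by"]) for p in parcels.values()
)
```

Because a parcel can point to several roads and a road to several parcels, the many-to-many case is handled without duplicating the records themselves.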
Database Structures: Relational Databases
Predominant in GIS
Tuples: Ordered records/rows of attribute values
Primary Key: Unique identifier for each record in a relational table

| Lu_code | Crop type | Status | Cost |
|  | Row crops | Active | 1000/ha |
|  | Orchards | Dormant | 1500/ha |
|  | Rangeland | Active | 900/ha |
|  | Row crops | Active | 1100/ha |
|  | Garden farms | Active | 1250/ha |
|  | Row crops | Dormant | 1050/ha |
Database Structures: Relational Databases
Joining tables
Relational join
–Matching data from one table to corresponding data in another table
–How? Link the primary key to the foreign key
   Primary key: Unique identifier in 1st table
   Foreign key: Column in 2nd table to which primary key is linked
Database Structures: Relational Databases
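A relational join can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table names `land_use` and `parcel`, the column names, and all code values are hypothetical and were chosen only to show a primary key linked to a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE land_use (
        lu_code INTEGER PRIMARY KEY,   -- primary key in the 1st table
        crop_type TEXT
    );
    CREATE TABLE parcel (
        parcel_id INTEGER PRIMARY KEY,
        lu_code INTEGER REFERENCES land_use(lu_code),  -- foreign key
        status TEXT
    );
    INSERT INTO land_use VALUES (1, 'Row crops'), (2, 'Orchards');
    INSERT INTO parcel VALUES
        (10, 1, 'Active'), (11, 2, 'Dormant'), (12, 1, 'Dormant');
""")

# Match each parcel row to its land-use row through the shared lu_code value.
rows = conn.execute("""
    SELECT parcel.parcel_id, land_use.crop_type, parcel.status
    FROM parcel
    JOIN land_use ON parcel.lu_code = land_use.lu_code
    ORDER BY parcel.parcel_id
""").fetchall()
conn.close()
```

The crop type is stored once per land-use code and is attached to each parcel only when the query runs.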
Relational DB & Normal Forms
Normal forms: A set of rules established to indicate the form tables should take
Goal: Reduce database redundancy, which improves database performance
First normal form
–Table must contain columns & rows
–Columns will be used for searches, so only one value per cell
Relational DB & Normal Forms
Second normal form
–Every column that is not part of the primary key should be dependent on the primary key
   On the entire primary key, if the primary key comprises more than one column

| PART | WAREHOUSE | QUANTITY | WAREHOUSE-ADDRESS |
Key: Part & Warehouse together
Address is dependent only on the warehouse portion of the key

Split into two tables:
| PART | WAREHOUSE | QUANTITY |
| WAREHOUSE | WAREHOUSE-ADDRESS |

Example from William Kent, "A Simple Guide to Five Normal Forms in Relational Database Theory", Communications of the ACM 26(2), Feb. 1983.
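The split into two tables can be sketched with plain Python dictionaries. The column layout follows the PART / WAREHOUSE example above; the row values and the helper `address_for` are made up for illustration.

```python
# Hypothetical inventory rows in the unnormalized layout:
# (PART, WAREHOUSE, QUANTITY, WAREHOUSE-ADDRESS); all values are made up.
inventory = [
    ("bolt", "W1", 100, "12 Main St"),
    ("nut", "W1", 250, "12 Main St"),
    ("bolt", "W2", 40, "9 Oak Ave"),
]

# Table 1: keyed on (part, warehouse); quantity depends on the whole key.
stock = {(part, wh): qty for part, wh, qty, _ in inventory}

# Table 2: keyed on warehouse alone; each address is now stored only once.
warehouses = {wh: address for _, wh, _, address in inventory}


def address_for(part, wh):
    """Rejoin the two tables through the shared warehouse column."""
    if (part, wh) not in stock:
        raise KeyError((part, wh))
    return warehouses[wh]
```

In the original layout the address of warehouse W1 appears on two rows. After the split it appears once, and no information has been lost.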
Relational DB & Normal Forms
Third normal form
–Non-key columns must depend on the primary key, not on another non-key column
–Primary key does not depend on any non-key column

| EMPLOYEE | DEPARTMENT | LOCATION |
Key field: Employee
Location is redundant: it depends on Department, not directly on the key field

Split into two tables:
| EMPLOYEE | DEPARTMENT |
| DEPARTMENT | LOCATION |
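The EMPLOYEE / DEPARTMENT / LOCATION split can be sketched the same way. The names, departments, and buildings below are invented for illustration, as is the helper `location_of`.

```python
# Hypothetical rows in the unnormalized layout: (EMPLOYEE, DEPARTMENT, LOCATION).
staff = [
    ("Avery", "Cartography", "Building A"),
    ("Blake", "Cartography", "Building A"),
    ("Casey", "Remote Sensing", "Building B"),
]

# Table 1: employee -> department (department depends on the key field).
employee_department = {emp: dept for emp, dept, _ in staff}

# Table 2: department -> location (location depends on department only).
department_location = {dept: loc for _, dept, loc in staff}

# Moving a department now means changing exactly one row.
department_location["Cartography"] = "Building C"


def location_of(employee):
    """Join employee -> department -> location."""
    return department_location[employee_department[employee]]
```

In the single-table layout, moving Cartography would mean editing every employee row in that department, and missing one would leave the data inconsistent.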
Normalization of Database Tables Normalization: Process of organizing data in a database –Creating tables & establishing relationships between them according to rules of normal form –Goal: Make the database more flexible by eliminating redundancy and inconsistent dependency
Normalization of Database Tables Problem with data redundancy: –Wastes disk space –Creates maintenance problems If data that exist in more than one place must be changed, they must be changed the same way in each place
Normalization & Normal Forms Describing databases –If the 1st rule is observed, the database is said to be in "first normal form." –If the first 3 rules are observed, the database is considered to be in "third normal form." Additional levels of normalization are possible, but 3rd normal form is considered the highest level necessary for most applications
Recap File types –Simple list –Ordered Sequential –Indexed Databases: Many files –Structure is necessary to make access to data in 1 or more files easier Database types –Hierarchical –Network –Relational