Quick Lesson on Databases Relational databases are key to managing complex data You’ve been using relational databases with “Joins” and “Relates” in ArcGIS.


1 Quick Lesson on Databases
Relational databases are key to managing complex data
You’ve been using relational databases with “Joins” and “Relates” in ArcGIS
GeoDatabases are relational databases
Structured Query Language (SQL) is the primary language for relational databases
You’ve been using SQL statements in ArcGIS to query data

2 Relational Databases
Need to represent data with a complex structure
[Diagram: Plot, Tree, Species]

3 Database Tables
What you’ve seen in ArcGIS, only more flexible
Tables are made up of “fields” (columns) and “records” (rows)
Queries are used to combine and subset tables into new tables
Each table should have a unique integer ID, referred to as a primary key
  – Greatly improves query performance

4 Field Data Types
Numeric
  – Float or integer
  – Auto numbered, use for primary keys
Dates
  – YYYY-MM-DD HH:MM:SS.SS
  – 2013-04-05 14:23:12.34
Text
  – Specified width
  – “Variant” width
Binary Large Objects (BLOB)

5 What’s Wrong With This?
Tree Query
LAT        LON          MEASYEAR  MEASMON  MEASDAY  COMMON_NAME  HT
45.446392  -122.236107  1995      6        22       Douglas-fir  49
45.446392  -122.236107  1995      6        22       Douglas-fir  27
45.446392  -122.236107  1995      6        22       Douglas-fir  95
45.446392  -122.236107  1995      6        22       Douglas-fir  66
45.446392  -122.236107  1995      6        22       Douglas-fir  118
45.446392  -122.236107  1995      6        22       Douglas-fir  76
45.446392  -122.236107  1995      6        22       Douglas-fir  147
45.456116  -122.397774  1995      6        22       Douglas-fir  185
45.456116  -122.397774  1995      6        22       Douglas-fir  105
45.456116  -122.397774  1995      6        22       Douglas-fir  105
45.456116  -122.397774  1995      6        22       Douglas-fir  89
45.193054  -122.51667   1996      6        23       Douglas-fir  90
45.193054  -122.51667   1996      6        23       Douglas-fir  95
45.193054  -122.51667   1996      6        23       Douglas-fir  96
45.193054  -122.51667   1996      6        23       Douglas-fir  99

6 Relational Databases
Allow us to “relate” tables, which:
  – Reduces the overall amount of data by removing duplicates
  – Makes updates much easier
  – Improves search speeds

7 Entity-Relationship Diagram
ERD
  – Unified Modeling Language (UML)
Entities: Plot, Tree, Species
Relationship types:
  – One to one
  – One to many
  – Many to many

8 Plot
ID  Lat        Lon          Year  Month  Day
1   45.446392  -122.236107  1995  6      22
2   45.193054  -122.51667   1995  6      22

Species
ID  Common Name
1   Douglas-fir
2   Ponderosa Pine

Tree
ID  PlotID  SpeciesID  Height
1   1       1          49
2   1       1          27
3   1       1          95
4   1       1          66
5   1       1          118
…   1       …          …
12  2       1          90
13  2       1          95

Primary Key: ID in each table. Foreign Keys: PlotID and SpeciesID in Tree.
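The three tables above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the column types, the in-memory database, and the choice of SQLite are not from the slides, and other databases spell some of this differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")        # throwaway in-memory database (assumption)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE Plot (
    ID INTEGER PRIMARY KEY,
    Lat REAL, Lon REAL, Year INTEGER, Month INTEGER, Day INTEGER
);
CREATE TABLE Species (
    ID INTEGER PRIMARY KEY,
    Common_Name TEXT
);
CREATE TABLE Tree (
    ID INTEGER PRIMARY KEY,
    PlotID INTEGER REFERENCES Plot(ID),        -- foreign key to Plot
    SpeciesID INTEGER REFERENCES Species(ID),  -- foreign key to Species
    Height INTEGER
);
""")

conn.execute("INSERT INTO Plot VALUES (1, 45.446392, -122.236107, 1995, 6, 22)")
conn.execute("INSERT INTO Species VALUES (1, 'Douglas-fir')")
conn.execute("INSERT INTO Tree VALUES (1, 1, 1, 49)")

# A tree that points at a plot that does not exist should be refused.
try:
    conn.execute("INSERT INTO Tree VALUES (2, 99, 1, 27)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

tree_count = conn.execute("SELECT COUNT(*) FROM Tree").fetchone()[0]
print(rejected, tree_count)
```

Declaring the foreign keys lets the database itself refuse a Tree row whose PlotID has no matching Plot.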

9 Database Normalization
1. Eliminate duplicate columns from the same table
2. Move fields that have “duplicate” row entries to a related table
3. All field entries should be dependent on the primary key
4. There should be only one primary key in each table
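As a rough sketch of step 2, the flat "Tree Query" table from slide 5 can be split into Plot, Species, and Tree tables with SELECT DISTINCT. The table name Flat, the column types, and the use of SQLite through Python's sqlite3 module are assumptions for illustration; only a few of the slide's rows are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Flat (LAT REAL, LON REAL, MEASYEAR INTEGER, MEASMON INTEGER,
                   MEASDAY INTEGER, COMMON_NAME TEXT, HT INTEGER);
CREATE TABLE Plot (ID INTEGER PRIMARY KEY, Lat REAL, Lon REAL,
                   Year INTEGER, Month INTEGER, Day INTEGER);
CREATE TABLE Species (ID INTEGER PRIMARY KEY, Common_Name TEXT);
CREATE TABLE Tree (ID INTEGER PRIMARY KEY, PlotID INTEGER, SpeciesID INTEGER,
                   Height INTEGER);
""")

# A few rows of the flat table from slide 5 ("Flat" is a hypothetical name).
flat_rows = [
    (45.446392, -122.236107, 1995, 6, 22, "Douglas-fir", 49),
    (45.446392, -122.236107, 1995, 6, 22, "Douglas-fir", 27),
    (45.456116, -122.397774, 1995, 6, 22, "Douglas-fir", 185),
    (45.456116, -122.397774, 1995, 6, 22, "Douglas-fir", 105),
    (45.456116, -122.397774, 1995, 6, 22, "Douglas-fir", 105),
]
conn.executemany("INSERT INTO Flat VALUES (?, ?, ?, ?, ?, ?, ?)", flat_rows)

# Repeated plot and species values move to their own tables, one row each.
conn.execute("""INSERT INTO Plot (Lat, Lon, Year, Month, Day)
                SELECT DISTINCT LAT, LON, MEASYEAR, MEASMON, MEASDAY FROM Flat""")
conn.execute("""INSERT INTO Species (Common_Name)
                SELECT DISTINCT COMMON_NAME FROM Flat""")

# Each tree keeps its own height plus two integer foreign keys.
# In this sample, Lat/Lon alone identify a plot.
conn.execute("""INSERT INTO Tree (PlotID, SpeciesID, Height)
                SELECT Plot.ID, Species.ID, Flat.HT
                FROM Flat
                INNER JOIN Plot ON Plot.Lat = Flat.LAT AND Plot.Lon = Flat.LON
                INNER JOIN Species ON Species.Common_Name = Flat.COMMON_NAME""")

plot_count = conn.execute("SELECT COUNT(*) FROM Plot").fetchone()[0]
species_count = conn.execute("SELECT COUNT(*) FROM Species").fetchone()[0]
tree_count = conn.execute("SELECT COUNT(*) FROM Tree").fetchone()[0]
print(plot_count, species_count, tree_count)
```

The five flat rows become two plots, one species, and five trees, so the coordinates, dates, and species name are each stored once instead of once per tree.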

10 Database Dictionary
Defines each of the tables and fields in a database
A database forms the basis for data management behind many GIS projects, web sites, and organizations
Proper documentation is key to long term success!
  – Database design (including ERDs)
  – Database Dictionary

11 Geospatial Databases
Not required to store spatial data!
Provide:
  – Field types for spatial data: point, polyline, polygon, etc.
  – Spatial operations: union, intersect, etc.
  – Spatial queries: return records that overlap with a polygon, etc.
  – Some provide spatial reference control

12 Relational Databases
Enterprise-Level
  – SQL Server
  – PostgreSQL
  – MySQL
  – Oracle
  – Sybase
File-Level
  – Geodatabase
  – MS-Access

13 What we really want
What we need from a database:
  – Distributed, concurrent access (concurrency)
  – Automatic backup
  – Version control
  – Unlimited amounts of data
  – Quick data access
  – Inexpensive
  – Broad OS support
  – File-level copying
  – Geospatial queries, operations, data types

14 What we have
Columns, in order: SQL Server | PostgreSQL | ESRI Geodatabase | MS-Access (a value may span adjacent columns)
Concurrency: Yes | No
Automatic backup: Yes | No
Versioning: No
Data size: 100s of millions | 100,000?
Performance: Fast | Good | Poor
Cost: $600 per CPU | Free | ~$10,000 w/ArcGIS | ~$400
OS: Windows | Any | Windows
File-level copy: No | Yes
Spatial queries: Yes | No
Spatial data types: Yes | No
Spatial operations: Yes | No

15 Structured Query Language (SQL)
Comes from the database industry
“INSERT”, “DELETE”, and “SELECT” rows in tables
Very rich syntax
Portions of “SELECT” grammar used heavily in ArcGIS:
  – Selecting attributes
  – Raster calculator
  – Geodatabases

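16 Transact-SQL (T-SQL)
Standard “SQL” is a subset of T-SQL
T-SQL allows full management of a database:
  – Create & drop: tables, fields/columns, relationships, indexes, views, etc.
  – Administrative functions
T-SQL is the SQL Server and Sybase dialect; other databases have their own extensions, so the syntax varies some between databases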

17 Using SQL
All databases have “query editors” that allow us to write, save, edit, and use SQL queries
Programming languages can also “write” queries and “fetch” records from the database
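A minimal sketch of "writing" a query and "fetching" records from a program, here with Python's built-in sqlite3 module. The in-memory database and the variable names are assumptions for illustration; the rows are the first three trees from the Tree table shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Tree (ID INTEGER PRIMARY KEY, PlotID INTEGER, "
    "SpeciesID INTEGER, Height INTEGER)"
)
conn.executemany(
    "INSERT INTO Tree VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 49), (2, 1, 1, 27), (3, 1, 1, 95)],
)

# "Write" the query; the ? placeholder keeps the value separate from the SQL text.
min_height = 40
cursor = conn.execute(
    "SELECT ID, Height FROM Tree WHERE Height > ? ORDER BY ID", (min_height,)
)

# "Fetch" the matching records as a list of tuples.
rows = cursor.fetchall()
for tree_id, height in rows:
    print(tree_id, height)
```

Passing values through placeholders rather than pasting them into the SQL string avoids quoting mistakes and SQL injection.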

18 SQL: SELECT
SELECT Field1, Field2
FROM TableName JOIN TableName2 ON JoinCondition
WHERE Filter1 AND Filter2
GROUP BY Field1, Field2
ORDER BY Field1 [DESC], Field2 [DESC]

19 Selecting Fields
SELECT *
  – Returns all fields as a new table
SELECT Field1, Field2
SELECT Table1.Field1, Table2.Field1
  – Return the specified fields
SELECT Table1.Field1 AS NewName
  – Avoids name collisions

20 Plot
ID  Lat        Lon          Year  Month  Day
1   45.446392  -122.236107  1995  6      22
2   45.193054  -122.51667   1995  6      22

Species
ID  Common Name
1   Douglas-fir
2   Ponderosa Pine

Tree
ID  PlotID  SpeciesID  Height
1   1       1          49
2   1       1          27
3   1       1          95
4   1       1          66
5   1       1          118
…   1       …          …
12  2       1          90
13  2       1          95

21 Example 1: All Fields
SELECT * FROM Tree
Returns all the records and fields in Tree
ID  PlotID  SpeciesID  Height
1   1       1          49
2   1       1          27
3   1       1          95
4   1       1          66
5   1       1          118
…   1       …          …
12  2       1          90
13  2       1          95

22 Example 2: Specific Fields
SELECT PlotID, Height FROM Tree
Returns all rows but only specified fields
PlotID  Height
1       49
1       27
1       95
1       66
1       118
1       …
2       90
2       95

23 Example 3: Specific Rows
SELECT PlotID, Height FROM Tree WHERE Height > 50
Returns only the rows that pass the filter, and only the specified fields
PlotID  Height
1       95
1       66
1       118
1       …
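Examples 1 to 3 can be tried with Python's built-in sqlite3 module. This sketch assumes an in-memory SQLite database and loads only the Tree rows that are actually shown on the slide (the elided rows are left out).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Tree (ID INTEGER PRIMARY KEY, PlotID INTEGER, "
    "SpeciesID INTEGER, Height INTEGER)"
)
conn.executemany(
    "INSERT INTO Tree VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 49), (2, 1, 1, 27), (3, 1, 1, 95), (4, 1, 1, 66),
     (5, 1, 1, 118), (12, 2, 1, 90), (13, 2, 1, 95)],
)

# Example 1: every field of every record.
all_rows = conn.execute("SELECT * FROM Tree").fetchall()

# Example 2: every record, but only two fields.
two_fields = conn.execute("SELECT PlotID, Height FROM Tree").fetchall()

# Example 3: only the records that pass the WHERE filter.
tall = conn.execute(
    "SELECT PlotID, Height FROM Tree WHERE Height > 50"
).fetchall()

print(len(all_rows), len(two_fields), len(tall))
```

The field list narrows the columns and the WHERE clause narrows the rows; the two are independent of each other.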

24 Selecting Tables
FROM Table1
  – Returns contents of one table
FROM Table1 INNER JOIN Table2 ON Table2.ForeignKey = Table1.PrimaryKey
  – Returns records from Table2 that match primary keys in Table1
  – Does not return rows in Table1 that have no match in Table2

25 Example 4: Joining
SELECT PlotID, Height, Lat, Lon
FROM Tree INNER JOIN Plot ON Tree.PlotID = Plot.ID
PlotID  Height  Lat        Lon
1       49      45.446392  -122.236107
1       27      45.446392  -122.236107
1       95      45.446392  -122.236107
1       66      45.446392  -122.236107
…

26 Example 4: Joining
SELECT PlotID, Height, Lat, Lon, Common_Name
FROM Tree
INNER JOIN Plot ON Tree.PlotID = Plot.ID
INNER JOIN Species ON Tree.SpeciesID = Species.ID
PlotID  Height  Lat        Lon          Common_Name
1       49      45.446392  -122.236107  Douglas-fir
1       27      45.446392  -122.236107  Douglas-fir
1       95      45.446392  -122.236107  Douglas-fir
1       66      45.446392  -122.236107  Douglas-fir
…
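The three-table join can be sketched with Python's built-in sqlite3 module. The in-memory database and column types are assumptions for illustration, and only a few of the slide's rows are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Plot (ID INTEGER PRIMARY KEY, Lat REAL, Lon REAL);
CREATE TABLE Species (ID INTEGER PRIMARY KEY, Common_Name TEXT);
CREATE TABLE Tree (ID INTEGER PRIMARY KEY, PlotID INTEGER,
                   SpeciesID INTEGER, Height INTEGER);
INSERT INTO Plot VALUES (1, 45.446392, -122.236107), (2, 45.193054, -122.51667);
INSERT INTO Species VALUES (1, 'Douglas-fir'), (2, 'Ponderosa Pine');
INSERT INTO Tree VALUES (1, 1, 1, 49), (2, 1, 1, 27), (12, 2, 1, 90);
""")

# Each Tree row picks up its plot's coordinates and its species' name.
joined = conn.execute("""
    SELECT Tree.PlotID, Tree.Height, Plot.Lat, Plot.Lon, Species.Common_Name
    FROM Tree
    INNER JOIN Plot ON Tree.PlotID = Plot.ID
    INNER JOIN Species ON Tree.SpeciesID = Species.ID
    ORDER BY Tree.ID
""").fetchall()

for row in joined:
    print(row)
```

Qualifying each field with its table name (Tree.PlotID, Plot.Lat) is optional here but avoids ambiguity once two tables share a field name such as ID.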

27 Selecting Tables (cont’d)
FROM Table1 LEFT OUTER JOIN Table2 ON Table2.ForeignKey = Table1.PrimaryKey
  – Returns all matches between Table1 and Table2, plus any records in Table1 that don’t match records in Table2
  – Missing values are NULL
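A small sketch of the difference between an inner and an outer join, using Python's built-in sqlite3 module. The in-memory database is an assumption; the data is the Species table plus two Tree rows from the earlier slides, so Ponderosa Pine has no trees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Species (ID INTEGER PRIMARY KEY, Common_Name TEXT);
CREATE TABLE Tree (ID INTEGER PRIMARY KEY, PlotID INTEGER,
                   SpeciesID INTEGER, Height INTEGER);
INSERT INTO Species VALUES (1, 'Douglas-fir'), (2, 'Ponderosa Pine');
INSERT INTO Tree VALUES (1, 1, 1, 49), (2, 1, 1, 27);
""")

# INNER JOIN: species without any tree are dropped.
inner = conn.execute("""
    SELECT Species.Common_Name, Tree.Height
    FROM Species INNER JOIN Tree ON Tree.SpeciesID = Species.ID
    ORDER BY Species.ID, Tree.ID
""").fetchall()

# LEFT OUTER JOIN: every species is kept; missing tree values come back as NULL.
outer = conn.execute("""
    SELECT Species.Common_Name, Tree.Height
    FROM Species LEFT OUTER JOIN Tree ON Tree.SpeciesID = Species.ID
    ORDER BY Species.ID, Tree.ID
""").fetchall()

print(inner)
print(outer)
```

In Python the SQL NULL arrives as None, so the unmatched species appears as ('Ponderosa Pine', None).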

28 Filters or “WHERE” clauses
SELECT * FROM Table1
WHERE (Field1 Operator Value1) BooleanOperator (Field1 Operator Field2)

29 Filter Examples
WHERE:
  – ID = 1
  – Area < 10000
  – Area <= 10000
  – Name = 'Crater Lake' (case dependent)
  – Name LIKE 'Crater Lake' (ignores case in many databases, but not in PostgreSQL!)
Notice:
  – String values must be quoted: standard SQL uses single quotes, and some databases (e.g. MS-Access) also accept double quotes
  – Syntax for strings varies some between databases

30 SQL Comparisons
Equals: =
Greater than: >
Less than: <
Greater than or equal: >=
Less than or equal: <=
Not equal: <>
Like: case independent (except in PostgreSQL), string comparison with wild cards (%)
  – In PostgreSQL use “upper(..)” or “lower(..)”
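The case behavior of = and LIKE can be checked quickly with Python's built-in sqlite3 module. This sketch assumes SQLite, where LIKE ignores case for ASCII letters by default; the table name Place is hypothetical, and other databases may behave differently, as noted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "Place" is a hypothetical table used only for this illustration.
conn.execute("CREATE TABLE Place (ID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("INSERT INTO Place (Name) VALUES ('Crater Lake')")

# "=" compares exactly, so different capitalization does not match.
exact_match = conn.execute(
    "SELECT Name FROM Place WHERE Name = 'crater lake'"
).fetchall()

# LIKE with the % wildcard; SQLite ignores ASCII case here by default.
like_match = conn.execute(
    "SELECT Name FROM Place WHERE Name LIKE 'crater%'"
).fetchall()

# Lower-casing both sides gives the same answer on case-sensitive databases too.
portable_match = conn.execute(
    "SELECT Name FROM Place WHERE lower(Name) LIKE 'crater%'"
).fetchall()

print(exact_match, like_match, portable_match)
```

Wrapping the field in lower(..) or upper(..) is the more portable habit, at the cost of making an ordinary index on that field less useful.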

31 Boolean Operators
A  B  A AND B  A OR B  NOT A  NOT B
T  T  T        T       F      F
T  F  F        T       F      T
F  T  F        T       T      F
F  F  F        F       T      T

32 More Complex Filter Examples
WHERE:
  – Name LIKE 'Hawaii' AND Area < 10000
  – Species LIKE 'Ponderosa' AND DBH > 1
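AND, OR, and NOT filters can be sketched with Python's built-in sqlite3 module. The in-memory database and the helper function ids are assumptions for illustration; the rows are the Tree records shown on the earlier slides, used in place of the Name/Area and Species/DBH fields whose data is not given.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Tree (ID INTEGER PRIMARY KEY, PlotID INTEGER, "
    "SpeciesID INTEGER, Height INTEGER)"
)
conn.executemany(
    "INSERT INTO Tree VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 49), (2, 1, 1, 27), (3, 1, 1, 95), (4, 1, 1, 66),
     (5, 1, 1, 118), (12, 2, 1, 90), (13, 2, 1, 95)],
)

def ids(where_clause):
    """Return the IDs of the trees that pass a fixed, trusted WHERE clause."""
    sql = "SELECT ID FROM Tree WHERE " + where_clause + " ORDER BY ID"
    return [row[0] for row in conn.execute(sql)]

both = ids("PlotID = 1 AND Height > 50")      # both conditions must hold
either = ids("PlotID = 2 OR Height > 100")    # at least one condition holds
negated = ids("NOT (Height > 50)")            # condition must be false

print(both, either, negated)
```

Parentheses make the grouping explicit when AND and OR are mixed in one filter.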

33 ORDER BY
SELECT * FROM Table1 ORDER BY LastName DESC, FirstName DESC
Careful with performance on large datasets and string fields
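A short sketch of a two-field descending sort, using Python's built-in sqlite3 module. The in-memory database is an assumption, and the Tree rows from the earlier slides stand in for the LastName/FirstName table, whose data is not given.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Tree (ID INTEGER PRIMARY KEY, PlotID INTEGER, "
    "SpeciesID INTEGER, Height INTEGER)"
)
conn.executemany(
    "INSERT INTO Tree VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 49), (2, 1, 1, 27), (3, 1, 1, 95), (4, 1, 1, 66),
     (5, 1, 1, 118), (12, 2, 1, 90), (13, 2, 1, 95)],
)

# Sort by plot first (highest PlotID first), then by height within each plot.
ordered = conn.execute(
    "SELECT PlotID, Height FROM Tree ORDER BY PlotID DESC, Height DESC"
).fetchall()

for row in ordered:
    print(row)
```

The second field only breaks ties left by the first, so the heights are sorted within each plot rather than across the whole table.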

34 GROUP BY
Aggregates data
SELECT SpeciesID, AVG(Height) FROM Tree GROUP BY SpeciesID
Only grouped or aggregated fields can appear in the SELECT list
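Aggregation can be sketched with Python's built-in sqlite3 module. The in-memory database is an assumption, and the example groups by PlotID rather than species because the Tree rows shown on the slides all share one species.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Tree (ID INTEGER PRIMARY KEY, PlotID INTEGER, "
    "SpeciesID INTEGER, Height INTEGER)"
)
conn.executemany(
    "INSERT INTO Tree VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 49), (2, 1, 1, 27), (3, 1, 1, 95), (4, 1, 1, 66),
     (5, 1, 1, 118), (12, 2, 1, 90), (13, 2, 1, 95)],
)

# One output row per plot: the grouped field plus two aggregates.
per_plot = conn.execute("""
    SELECT PlotID, COUNT(*), AVG(Height)
    FROM Tree
    GROUP BY PlotID
    ORDER BY PlotID
""").fetchall()

for plot_id, tree_count, mean_height in per_plot:
    print(plot_id, tree_count, mean_height)
```

Plot 1 has five of the listed trees with a mean height of 71.0, and plot 2 has two with a mean of 92.5.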

35 SQL INSERT
INSERT INTO TableName (Field1, Field2) VALUES (Value1, 'Value2')
String values must be in quotes
  – Other values can also be in quotes
If the table has an “auto numbered” ID field, the ID is assigned automatically
Otherwise, it is very difficult to set the ID field
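A minimal sketch of inserting into a table with an auto numbered ID, using Python's built-in sqlite3 module. It assumes SQLite, where an INTEGER PRIMARY KEY column is filled in automatically when omitted; other databases use their own auto-number syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Species (ID INTEGER PRIMARY KEY, Common_Name TEXT)")

# The ID field is left out of the field list, so the database assigns it.
first = conn.execute(
    "INSERT INTO Species (Common_Name) VALUES (?)", ("Douglas-fir",)
)
first_id = first.lastrowid

second = conn.execute(
    "INSERT INTO Species (Common_Name) VALUES (?)", ("Ponderosa Pine",)
)
second_id = second.lastrowid

rows = conn.execute("SELECT ID, Common_Name FROM Species ORDER BY ID").fetchall()
print(first_id, second_id, rows)
```

The cursor's lastrowid attribute reports the ID that was just assigned, which is what a program needs to fill in foreign keys in related tables.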

36 SQL DELETE
DELETE FROM TableName WHERE ID=Value
  – Deletes one row
DELETE FROM Tree WHERE PlotID=12
  – Deletes all rows with PlotID=12
DELETE FROM TableName
  – Deletes everything in TableName!
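The effect of the WHERE clause on DELETE can be sketched with Python's built-in sqlite3 module. The in-memory database is an assumption, and the rows are the Tree records shown on the earlier slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Tree (ID INTEGER PRIMARY KEY, PlotID INTEGER, "
    "SpeciesID INTEGER, Height INTEGER)"
)
conn.executemany(
    "INSERT INTO Tree VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 49), (2, 1, 1, 27), (3, 1, 1, 95), (4, 1, 1, 66),
     (5, 1, 1, 118), (12, 2, 1, 90), (13, 2, 1, 95)],
)

# With a WHERE clause, only the matching rows are removed.
cursor = conn.execute("DELETE FROM Tree WHERE PlotID = ?", (2,))
deleted_in_plot = cursor.rowcount
remaining = conn.execute("SELECT COUNT(*) FROM Tree").fetchone()[0]

# Without a WHERE clause, every row in the table is removed.
conn.execute("DELETE FROM Tree")
after_delete_all = conn.execute("SELECT COUNT(*) FROM Tree").fetchone()[0]

print(deleted_in_plot, remaining, after_delete_all)
```

Running the same filter as a SELECT first is a cheap way to see which rows a DELETE would remove.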

37 Database Performance
[Diagram: Default Search, Indexed Search, Primary Key Search]

38 Indexes
Added to a table
  – Typically for one field
Add overhead to INSERTs and DELETEs
Important for:
  – Large tables
  – Complex queries
  – Especially text searches!
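Adding an index and checking whether a query uses it can be sketched with Python's built-in sqlite3 module. This assumes SQLite and its EXPLAIN QUERY PLAN statement; the index name idx_tree_species and the helper function plan are hypothetical, and other databases have their own ways of showing a query plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Tree (ID INTEGER PRIMARY KEY, PlotID INTEGER, "
    "SpeciesID INTEGER, Height INTEGER)"
)

def plan(sql):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[3] for row in rows)  # the 4th column holds the description

query = "SELECT * FROM Tree WHERE SpeciesID = 1"

# Without an index, the filter has to look at every row of the table.
before = plan(query)

# Add an index on the single field used in the filter.
conn.execute("CREATE INDEX idx_tree_species ON Tree (SpeciesID)")
after = plan(query)

print(before)
print(after)
```

The exact wording of the plan differs between SQLite versions, but after the index is created the plan should name idx_tree_species instead of describing a full scan.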

39 Maintaining Performance
Always use integer, auto numbered primary keys
Avoid iterative or hierarchical queries
Sometimes code is faster:
  – Do simple query, load into RAM and sort
With REALLY big data, don’t use SQL
  – NoSQL: accessing data directly, without the use of a relational database package
  – There are “NoSQL” products in the works
Avoid text searches and sorts

40 Rasters and Databases
Don’t put rasters into a database!
  – Makes it impossible to back up and restore the database
  – Put a file path to the rasters in the database

