Types of Data
Points: Occurrences, Surveys
Polygons: Census, Soils, Refuges
Polylines: ?
Rasters: Remotely Sensed, Models
Volumes:
–Marine data
2D + Time:
–Climate (PRISM)
4D (3D + Time):
–Climate, Currents
marineemlab.ucsd.edu

Problems
Software/methods do not all support large datasets
Performance (i.e. time to develop methods and get final results)
Need to “reduce” the size of the data while maintaining the important information
Or, get a lot of computers
–(more on this later)

File Formats
CSV, Txt: Points
Shapefiles
GeoDatabases
“Las” for LiDAR
HDF and NetCDF:
–General hierarchical data formats
–“CF” standard for NetCDF data
–ArcGIS supports NetCDF

Data Reduction
Point Methods:
–Clusters: group related data (spatially, temporally, categorically)
–Gridding: find density, mean values
–Windowing: moving a “window” over the data (does not reduce processing)

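A minimal sketch of the gridding method, assuming points arrive as (x, y, value) tuples and that square cells of a fixed size are acceptable. The function name `grid_points` and the sample coordinates are hypothetical, chosen only for illustration. Each point is assigned to a cell, and the cell keeps a count (density) and a mean value, so many points reduce to one record per occupied cell.

```python
from collections import defaultdict
import math


def grid_points(points, cell_size):
    """Bin (x, y, value) points into square cells.

    Returns {(col, row): (count, mean_value)} for occupied cells only.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, value in points:
        # floor() keeps negative coordinates in the correct cell
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        sums[cell] += value
        counts[cell] += 1
    return {cell: (counts[cell], sums[cell] / counts[cell]) for cell in counts}


# Hypothetical sample data: four points, three occupied cells
points = [(0.2, 0.3, 10.0), (0.8, 0.1, 20.0), (1.5, 0.5, 5.0), (-0.5, 0.5, 7.0)]
grid = grid_points(points, cell_size=1.0)
```

Because the points are processed one at a time, the same loop could read from a file that is too large to hold in memory; only the occupied cells are stored.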
Polygons
Generalization/Simplification
–Reduce resolution
Remove less critical polygons
Soil Data for Czech Republic, eusoils.jrc.ec.europa.eu

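One common way to reduce the resolution of a polygon boundary is the Douglas-Peucker algorithm. The slides do not name a specific method, so this is an assumed choice for illustration only. The sketch below works on a chain of (x, y) vertices and drops any vertex that lies within `tolerance` of the line joining its retained neighbors. The function names and sample coordinates are hypothetical.

```python
import math


def _point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate segment: fall back to point-to-point distance
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Douglas-Peucker simplification of a vertex chain."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        # Every interior vertex is close enough to the chord: drop them all
        return [first, last]
    # Keep the farthest vertex and simplify each side recursively
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


# Hypothetical boundary fragment: the small bump at x=1 is removed
line = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
simplified = simplify(line, tolerance=0.1)
```

A larger tolerance removes more vertices. Simplifying neighboring polygons independently can open gaps or overlaps along shared borders, which this sketch makes no attempt to handle.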
Temporal
Group by:
–Month, Season, Decade
Model “trends”

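The grouping options above can be sketched as a single helper that averages values under any key derived from the date. This assumes records are (date, value) pairs. The season mapping uses Northern Hemisphere meteorological seasons, which is an assumption, and the names `group_mean` and `season_of` and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Assumption: Northern Hemisphere meteorological seasons
_SEASONS = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def season_of(d):
    """Map a date to its season name."""
    return _SEASONS[d.month]


def group_mean(records, key):
    """Average the values of (date, value) records grouped by key(date)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for d, value in records:
        k = key(d)
        sums[k] += value
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in counts}


# Hypothetical observations
records = [
    (date(1995, 1, 15), 2.0),
    (date(1995, 2, 15), 4.0),
    (date(1995, 7, 1), 20.0),
    (date(2003, 7, 1), 22.0),
]
by_month = group_mean(records, key=lambda d: d.month)
by_season = group_mean(records, key=season_of)
by_decade = group_mean(records, key=lambda d: d.year // 10 * 10)
```

Grouping by month or season here pools all years together. To keep years separate, the key could return a (year, month) tuple instead.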
Software
ArcGIS will work up to a point
Then, we have to program
–Python: TXT and CSV files; maybe for rasters, ND data
–Java: Effectively no limits; high performance

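As a rough illustration of the Python option for CSV files, the sketch below reads rows one at a time instead of loading the whole file, so memory use stays flat regardless of file size. It assumes the file has `x` and `y` columns, which is a hypothetical layout, and `summarize_points` is a name invented for this example. An in-memory string stands in for a real file.

```python
import csv
import io


def summarize_points(rows):
    """Single pass over dict rows with 'x' and 'y' columns (assumed names).

    Returns (count, (min_x, min_y, max_x, max_y)).
    """
    count = 0
    min_x = min_y = float("inf")
    max_x = max_y = float("-inf")
    for row in rows:
        x, y = float(row["x"]), float(row["y"])
        count += 1
        min_x, max_x = min(min_x, x), max(max_x, x)
        min_y, max_y = min(min_y, y), max(max_y, y)
    return count, (min_x, min_y, max_x, max_y)


# Stand-in for a large file; with a real file, pass csv.DictReader(open_file)
sample = io.StringIO("x,y\n1.0,2.0\n-3.5,4.0\n2.5,-1.0\n")
count, bbox = summarize_points(csv.DictReader(sample))
```

`csv.DictReader` yields one row per iteration, so the same function works unchanged on an open file handle.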
Databases
The simpler the data is, the faster it is to access:
–Small, simple: Text files
–Small to Medium, complicated: SQL Databases
–Large: Text and binary files
–Avoid large, complicated data

BlueSpray
Java-based GIS application
–Requires Java 7
Built to be:
–High-performance
–Extensible
–Portable
–Takes advantage of RAM, processors
–Easy to install and use
Owned by SchoonerTurtles, Inc.
Available at
–In early beta

Graphics