Spatial Data Analysis: An Elective Course for Advanced Undergraduates
  Laura Boehm Vock
  Gustavus Adolphus College
  August 1, 2016
  lboehmvo@gustavus.edu
Background
  Liberal arts college with ~2300 students
  15-20 math majors per year
  2016 is the first year students can graduate with a statistics major
  Course satisfies a “capstone” criterion for the major
  Focus on presentation skills and team projects
  5 team projects; individual final project
Alignment with ASA guidelines
  Real applications (about half “textbook” and half “real” data)
  Diverse models and approaches
  Focus on problem solving
  Skills: data manipulation, simulation, reproducibility, use of discipline-specific knowledge, visualizations, teamwork, communication
Using class time
  Recapping readings
  Filling in background stats knowledge
  Putting knowledge to work in R
  Meetings with instructor & project teams
Project 3: Ordinary Kriging
  Team 1: Lili, John, Russ --- Particulate Matter --- 24-hour mean concentration of fine particulate matter (PM2.5) on March 15, 2015. (Yes, 2015.) Info about PM2.5 here: http://www3.epa.gov/pm/
  Team 2: Lauren, Liam, Kaitlin --- Snow --- Snow totals from the snowstorm of Feb 2-3, 2016. Cumulative total (inches) as reported by 12:34 pm Feb 3. Most were reported between 7 and 9 am on Feb 3.
  1. Plot the observed locations on a map using the map function (from the maps package) in R.
  2. Plot the observed locations on a Google Maps backdrop using the RgoogleMaps package.
  Hint: For 1 & 2 check out AdvancedMapping.Rmd
  3. Use ordinary kriging to predict PM/Snow at a grid of spatial locations. Be sure to consider: Which variogram model to use? How to fit the parameters of the variogram model? (WLS, REML, choosing various starting values)
  4. Create images that show the predicted values and the kriging variance.
  5. Generate 4 random surfaces from your model (you can use the krige or predict.gstat function; you don’t have to do this by hand). Compare them to each other, to your images from #4, and to the original data.
Teams will produce:
  1. A 10-15 minute presentation of the results of both problems 1 and 2.
  2. A written document including relevant visualizations, tables, etc., with proper captions and explanations (for both problems 1 and 2).
  3. An R script or R Markdown document that includes all the code needed to produce their visualizations.
Teams will be graded on:
  1. Content of the written document, including quality of visualizations and explanations. (50%)
  2. Quality of their presentation, including oral communication and visual quality of slides. (30%)
  3. Quality and clarity of R code (if I enter appropriate file paths, can I run it? Are there sufficient comments or notes that I can understand what is going on?) (10%)
  4. Contribution to the team (assessed by teammates and by my own observations, e.g. does each member speak roughly equally in the presentation?) (10% - will vary per individual)
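Steps 3 and 4 of Project 3 (ordinary kriging predictions and the kriging variance) can be sketched for a single prediction location as follows. The course does this in R; the Python below is only a minimal, self-contained illustration. The exponential variogram, its parameter values, the four locations, and the data values are all assumptions chosen for the example, not course data, and the helper names are hypothetical.

```python
import math

def exp_variogram(h, nugget=0.0, psill=1.0, rng=1.0):
    """Exponential semivariogram; gamma(0) is 0 by definition."""
    if h == 0.0:
        return 0.0
    return nugget + psill * (1.0 - math.exp(-h / rng))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ordinary_krige(coords, values, target, gamma):
    """Return (prediction, kriging variance, weights) at one target location."""
    n = len(coords)
    # Semivariances between observations, bordered by the unbiasedness
    # constraint (weights sum to 1) with a Lagrange multiplier.
    a = [[gamma(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    g0 = [gamma(math.dist(p, target)) for p in coords]
    sol = solve(a, g0 + [1.0])
    weights, lagrange = sol[:n], sol[n]
    pred = sum(w * z for w, z in zip(weights, values))
    var = sum(w * g for w, g in zip(weights, g0)) + lagrange
    return pred, var, weights

# Toy example: four made-up observations at the corners of a unit square.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [2.0, 4.0, 6.0, 8.0]
gamma = lambda h: exp_variogram(h, nugget=0.1, psill=1.0, rng=2.0)
pred, var, weights = ordinary_krige(coords, values, (0.5, 0.5), gamma)
print(pred, var, weights)
```

Calling ordinary_krige over every cell of a grid would give the prediction and variance surfaces asked for in step 4; in practice an R package such as gstat handles this, along with variogram fitting.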
Course topics
  Intro/definitions of geostatistical, point process, and areal data
  (Gaussian) spatial processes
  Multivariate normal distribution; defining mean, variance, and covariance through expectations
  Stationarity, isotropy
  Variograms
  Ordinary Kriging
  Universal Kriging
  Extension projects: Indicator kriging, bootstrap interval in OK & UK, anisotropy
  What are point process data?
  How do we describe patterns? (aggregating, regular, CSR)
  Checking for CSR: K, L, F, G
  Monte Carlo testing
  Intensity estimation
  Modeling intensity as a Poisson process with covariates
  Visualizations for marked processes
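The "Checking for CSR" and "Monte Carlo testing" topics can be illustrated with a small simulation test. This is a hedged sketch in Python rather than the R used in the course. It uses the mean nearest-neighbour distance as the test statistic, a simpler stand-in for the K, L, F, and G functions named above. The unit-square window, the clustered toy pattern, and the function names are assumptions made for the example.

```python
import math
import random

def mean_nn_distance(points):
    """Mean distance from each point to its nearest neighbour."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)

def csr_pattern(n, gen):
    """n independent uniform points on the unit square (CSR given n)."""
    return [(gen.random(), gen.random()) for _ in range(n)]

def monte_carlo_csr_test(points, n_sim=99, seed=1):
    """One-sided Monte Carlo test of CSR against aggregation.

    A small mean nearest-neighbour distance suggests clustering, so the
    p-value is the rank of the observed statistic among the simulated
    ones, counted from the small end.
    """
    gen = random.Random(seed)
    observed = mean_nn_distance(points)
    sims = [mean_nn_distance(csr_pattern(len(points), gen))
            for _ in range(n_sim)]
    n_as_small = sum(s <= observed for s in sims)
    p_value = (n_as_small + 1) / (n_sim + 1)
    return observed, p_value

# Toy clustered pattern: 10 points scattered tightly around each of 3 centres.
gen = random.Random(7)
centers = [(0.25, 0.25), (0.75, 0.30), (0.50, 0.80)]
clustered = [(cx + gen.gauss(0, 0.02), cy + gen.gauss(0, 0.02))
             for cx, cy in centers for _ in range(10)]
observed, p_value = monte_carlo_csr_test(clustered)
print(observed, p_value)
```

Because the simulated patterns live in the same window as the data, edge effects affect the observed and simulated statistics alike, which is one reason simulation-based tests are convenient for point patterns.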
Primary Skills/Ideas
  What it means for observations to be correlated
  Visualization
  Multiple estimators exist for the same parameter (e.g. WLS and REML for variogram parameters)
  Variance estimation can be tricky
  Likelihood estimation
  Monte Carlo simulation/testing
  The same statistical idea can work in many areas of application
  Development of statistical methods is ongoing
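The point that variogram parameters can be estimated in more than one way can be made concrete with a weighted least squares (WLS) fit. This is a rough sketch in Python, not the R workflow used in the course, and it does not show REML. It computes a binned empirical semivariogram (the classical method-of-moments estimator) and fits a nugget-free exponential model by a crude grid search, using weights N(h)/γ(h)² of the kind proposed by Cressie. The toy surface, bin settings, grids, and function names are assumptions for illustration; software defaults for the weights may differ.

```python
import math
import random

def empirical_variogram(coords, values, bin_width, n_bins):
    """Binned semivariogram: half the mean squared difference per distance bin.

    Returns a list of (mean distance, semivariance, number of pairs).
    """
    sq_sums = [0.0] * n_bins
    dist_sums = [0.0] * n_bins
    counts = [0] * n_bins
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(coords[i], coords[j])
            b = int(h // bin_width)
            if b < n_bins:
                sq_sums[b] += (values[i] - values[j]) ** 2
                dist_sums[b] += h
                counts[b] += 1
    return [(dist_sums[b] / counts[b], sq_sums[b] / (2 * counts[b]), counts[b])
            for b in range(n_bins) if counts[b] > 0]

def exp_model(h, psill, rng):
    """Exponential semivariogram without a nugget (a simplification)."""
    return psill * (1.0 - math.exp(-h / rng))

def wls_fit(bins, psill_grid, range_grid):
    """Grid search minimising sum of N * (gamma_hat / gamma_model - 1)^2,
    which equals weighting squared errors by N / gamma_model^2."""
    best = None
    for psill in psill_grid:
        for rng in range_grid:
            loss = sum(n * (g_hat / exp_model(h, psill, rng) - 1.0) ** 2
                       for h, g_hat, n in bins)
            if best is None or loss < best[0]:
                best = (loss, psill, rng)
    return best

# Toy data: a smooth made-up surface plus noise at 60 random locations.
gen = random.Random(42)
coords = [(gen.random(), gen.random()) for _ in range(60)]
values = [math.sin(3 * x) + math.cos(2 * y) + gen.gauss(0, 0.1)
          for x, y in coords]
bins = empirical_variogram(coords, values, bin_width=0.1, n_bins=7)
psill_grid = [0.1 * k for k in range(1, 21)]
range_grid = [0.1 * k for k in range(1, 21)]
loss, psill_hat, range_hat = wls_fit(bins, psill_grid, range_grid)
print(psill_hat, range_hat)
```

A different choice of weights, bins, or starting grid can give noticeably different estimates from the same data, which is the practical content of the "multiple estimators" point.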
Arsenic (As) in MN Groundwater
  Ordinary kriging, universal kriging with geologic soil type predictor
Bee Populations in Finland
  Tests for spatial heterogeneity
Arsons in Minneapolis
  Point process model with neighborhood income and 2nd order polynomial trend
Challenges
  Computing, and different levels of computing skill
  Finding an appropriate text
  Connection/balance of theory and application
  Balancing needs of our majors with interest from students with little math background
  Balance of team/individual work during semester
What worked well
  Team-based work, team member evaluations
  Exciting spatial data sources for projects
  Lots of creativity in final projects
  Final poster presentation
  Sense of accomplishment for students
Books
  Diggle (2013) Statistical Analysis of Spatial and Spatio-Temporal Point Patterns [3rd ed]
  Banerjee, Carlin, Gelfand (2014) Hierarchical Modeling and Analysis for Spatial Data [2nd ed]
  Brunsdon, Comber (2015) An Introduction to R for Spatial Analysis and Mapping
  Cressie (1993) Statistics for Spatial Data
Data sources
  gisdata.mn.gov: State datasets from all the agencies. Environmental, agricultural, climate, demographic… Some data are point-located, some are county-level reports (especially health data).
  www.opentwincities.org/data/: also includes links to a spreadsheet of all kinds of sources in MN
  opendata.Minneapolismn.gov/, nycopendata.socrata.com/, ….
  EPA also has a lot of data if you search around