Geostatistics: Principles of Spatial Analysis
Anna M. Michalak
Department of Civil and Environmental Engineering
Department of Atmospheric, Oceanic and Space Sciences
The University of Michigan
A.M. Michalak
Key Points
If the parameter(s) you are modeling exhibit spatial (and/or temporal) autocorrelation, this feature must be taken into account to avoid biased solutions.
Spatial (and/or temporal) autocorrelation can be used as a source of information to help constrain parameter distributions.
The field of geostatistics provides a framework for addressing both of these issues.
Outline
Motivation for geostatistical tools
What is geostatistics?
Traditional applications
Application to OCO sampling design
Introduction to inverse modeling
Application to groundwater contamination
Application to CO2 flux estimation
What Is Geostatistics?
A short answer: an interpolation and extrapolation toolkit.
A more sophisticated answer: all of the above, plus tools for modeling the spatial relationships in the available data and building on such a model (e.g., kriging, stochastic simulation, …).
Formal definition: the analysis and prediction of spatial or temporal phenomena (e.g., pollutant concentrations, soil porosities, elevations).
Spatial Correlation
Measurements in close proximity to each other generally exhibit less variability than measurements taken farther apart.
Treating spatially correlated data as independent may lead to:
1. Biased estimates of model parameters
2. Biased statistical testing of model parameters
Spatial correlation can be accounted for by using geostatistical techniques.
Parameter Bias Example
[Figure: map of an alpine basin with snow depth measurements, comparing the mean of the snow depth measurements (assumes spatial independence) with the kriging estimate of mean snow depth (assumes spatial correlation)]
Q: What is the mean snow depth in the watershed?
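A minimal sketch of why the two estimates differ, assuming an exponential covariance model $C(h) = \sigma^2 e^{-h/L}$ (the slide does not state which model was used). The kriging estimate of a constant mean, equivalently its generalized least squares estimate, is $\hat m = \mathbf{1}^T\mathbf{C}^{-1}\mathbf{z} / \mathbf{1}^T\mathbf{C}^{-1}\mathbf{1}$, so clustered, redundant measurements share weight instead of each counting fully. The coordinates, depths, and correlation length below are hypothetical.

```python
import numpy as np

def exponential_cov(coords, sill=1.0, corr_len=10.0):
    """Covariance matrix C(h) = sill * exp(-h / corr_len) for 1D coordinates."""
    h = np.abs(coords[:, None] - coords[None, :])
    return sill * np.exp(-h / corr_len)

def kriged_mean(coords, values, sill=1.0, corr_len=10.0):
    """GLS (kriging) estimate of a constant mean: (1' C^-1 z) / (1' C^-1 1)."""
    C = exponential_cov(coords, sill, corr_len)
    w = np.linalg.solve(C, np.ones(len(values)))   # C^-1 1 (C is symmetric)
    weights = w / w.sum()                          # weights sum to one
    return weights @ values, weights

# Hypothetical data: five clustered samples in a deep-snow area, one isolated sample.
coords = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 100.0])
depths = np.array([10.0, 10.0, 10.0, 10.0, 10.0, 0.0])

naive = depths.mean()                     # assumes spatial independence
kriged, weights = kriged_mean(coords, depths)
print(f"arithmetic mean: {naive:.2f}, kriged mean: {kriged:.2f}")
```

Here the five clustered samples together receive roughly half of the total weight, about as much as the single isolated sample, so the kriged mean falls well below the arithmetic mean.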
Example (continued)
[Figure: test of H0 at the 5% significance level when H0 is true; one outcome is labeled "H0 Rejected!" and the other "H0 Not Rejected"]
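The biased statistical testing described above can be reproduced with a small Monte Carlo experiment. This is a sketch under assumed settings (a 1D transect of 50 equally spaced points, exponential covariance with correlation length 10, unit variance, true mean zero), not the test used for the snow-depth example. A z-test that treats the samples as independent uses the standard error $\sigma/\sqrt{n}$, whereas the correct standard error of the sample mean is $\sqrt{\mathbf{1}^T\mathbf{C}\mathbf{1}}/n$.

```python
import numpy as np

def rejection_rates(n=50, corr_len=10.0, n_trials=2000, seed=0):
    """Fraction of trials rejecting H0 (mean = 0) at the 5% level, for a test
    assuming independence and for one using the true covariance."""
    rng = np.random.default_rng(seed)
    x = np.arange(n, dtype=float)
    C = np.exp(-np.abs(x[:, None] - x[None, :]) / corr_len)   # unit sill
    L = np.linalg.cholesky(C)
    z_crit = 1.96                        # two-sided 5% critical value
    se_naive = 1.0 / np.sqrt(n)          # standard error if samples were independent
    se_true = np.sqrt(C.sum()) / n       # sqrt(1' C 1) / n
    reject_naive = reject_true = 0
    for _ in range(n_trials):
        sample = L @ rng.standard_normal(n)   # correlated field with true mean 0 (H0 true)
        m = sample.mean()
        reject_naive += abs(m / se_naive) > z_crit
        reject_true += abs(m / se_true) > z_crit
    return reject_naive / n_trials, reject_true / n_trials

naive_rate, true_rate = rejection_rates()
print(f"independence-based test rejects H0 in {naive_rate:.0%} of trials")
print(f"correlation-aware test rejects H0 in {true_rate:.0%} of trials")
```

With these settings the independence-based test rejects a true H0 far more often than the nominal 5%, while the correlation-aware test stays close to 5%.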
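Variogram Model
The variogram is used to describe spatial correlation. The variable of interest is modeled as
$$z(\mathbf{x}) = m(\mathbf{x}) + \epsilon(\mathbf{x})$$
where $m(\mathbf{x})$ is a deterministic trend and $\epsilon(\mathbf{x})$ is a stochastic fluctuation whose spatial correlation is described by the semivariogram $\gamma(h) = \tfrac{1}{2}E\{[\epsilon(\mathbf{x}+\mathbf{h}) - \epsilon(\mathbf{x})]^2\}$, for separation vectors $\mathbf{h}$ of length $h$ (assuming isotropy).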
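The variogram is usually estimated from data with the classical (Matheron) estimator, which averages half the squared differences of all pairs whose separation falls in a distance bin: $\hat\gamma(h) = \frac{1}{2N(h)}\sum_{(i,j)}[z(\mathbf{x}_i) - z(\mathbf{x}_j)]^2$. A minimal sketch, assuming point coordinates in any number of dimensions and user-chosen bin edges; the function name and binning scheme are illustrative.

```python
import numpy as np

def experimental_variogram(coords, values, bin_edges):
    """Classical (Matheron) semivariogram estimate.

    coords: (n, d) locations; values: (n,) data; bin_edges: increasing lag bin edges.
    Returns (mean lag per bin, semivariance per bin, pair count per bin);
    empty bins get NaN.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)              # each pair counted once
    lags = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2
    n_bins = len(bin_edges) - 1
    lag_mean = np.full(n_bins, np.nan)
    gamma = np.full(n_bins, np.nan)
    counts = np.zeros(n_bins, dtype=int)
    which = np.digitize(lags, bin_edges) - 1              # bin index of each pair
    for b in range(n_bins):
        in_bin = which == b
        counts[b] = in_bin.sum()
        if counts[b] > 0:
            lag_mean[b] = lags[in_bin].mean()
            gamma[b] = 0.5 * sq_diff[in_bin].mean()
    return lag_mean, gamma, counts

# Example: a pure linear trend z = x along a transect gives gamma(h) = h**2 / 2.
xy = np.column_stack([np.arange(10.0), np.zeros(10)])
lag, gam, n_pairs = experimental_variogram(xy, np.arange(10.0), [0.5, 1.5, 2.5, 3.5])
```

The example also shows why the trend $m(\mathbf{x})$ is modeled separately: a trend left in the data makes the experimental variogram keep growing with distance instead of leveling off.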
Geostatistics in Practice
Main uses:
Data integration
Numerical models for prediction
Numerical assessment (model) of uncertainty
Caveats
Geostatistics DOES: provide practical solutions to real problems; honor data; expand from data; integrate data.
Geostatistics DOESN'T: fully automate the estimation process; replace good or additional data; create data; provide causal/physical relationships; save time and effort.
Geostatistics is a set of decision-making tools.
Steps in a Geostatistical Study
1. Exploratory Data Analysis (EDA): data cleaning; consistency of data; identification of populations
2. Spatial Continuity Analysis: experimental analysis; interpretation
3. Quantitative Estimation: uncertainty assessment; account for spatial correlation; integrate hard and soft information
4. Simulation: alternative images of the field; reproduce field heterogeneity; honor all available information
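The simulation step produces alternative, equally probable images of the field that reproduce its spatial covariance. One simple way to do this on small grids, shown here as a swapped-in illustration rather than the method used in this course, is Cholesky (LU) simulation: if $\mathbf{C} = \mathbf{L}\mathbf{L}^T$ and $\mathbf{w}$ is a vector of independent standard normals, then $m + \mathbf{L}\mathbf{w}$ has covariance $\mathbf{C}$. The exponential covariance, the grid, and the parameter values are assumptions.

```python
import numpy as np

def simulate_fields(x, mean=0.0, sill=1.0, corr_len=5.0, n_real=1, seed=None):
    """Unconditional Gaussian realizations at 1D points x via Cholesky factorization."""
    x = np.asarray(x, dtype=float)
    C = sill * np.exp(-np.abs(x[:, None] - x[None, :]) / corr_len)
    L = np.linalg.cholesky(C + 1e-10 * np.eye(len(x)))  # tiny jitter for numerical stability
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((len(x), n_real))
    return mean + L @ w                                 # each column is one realization

grid = np.arange(30.0)
fields = simulate_fields(grid, n_real=5000, seed=1)
# Across realizations, the variance at each node should be close to the sill (1.0)
# and the correlation between neighboring nodes close to exp(-1/5).
```

Cholesky simulation costs $O(n^3)$ and is only practical for a few thousand nodes; larger grids typically use sequential or spectral methods.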
Go to Matlab…
OCO Satellite
Planned launch in September 2008
Will provide global column-integrated CO2 measurements
1 ppm measurement accuracy at a 1000 km scale
OCO Measurements
1 ppm measurement accuracy at a 1000 km scale
Processing all spectral radiances to XCO2 is computationally prohibitive.
Limit sampling to optimal locations.
OCO Subsampling Strategy
Objective: determine optimal sampling locations, as a function of time and space, that allow XCO2 to be interpolated at unsampled locations with an estimation error within a set threshold.
Recent work:
Characterization of modeled XCO2 spatial variability using CASA-MATCH data (Olsen and Randerson 2004) subsampled at 1 pm local time
Preliminary approach for identifying optimal sampling locations
Sample Modeled XCO2 Data
[Figure panels: April; July; August; October]
Optimal Sampling Locations
Optimal sampling locations = potential sampling locations that will achieve a set estimation error threshold at unsampled locations
Estimation error = estimation standard deviation at unsampled locations
Geostatistical interpolation tools:
Use spatial correlation as a basis for estimation
Provide best linear unbiased estimates
Quantify the associated estimation error
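These properties can be seen in the ordinary kriging system. A minimal sketch, assuming a known exponential covariance model and 1D coordinates (the OCO study worked on a global grid with locally estimated variograms). The weights $\lambda_j$ and Lagrange multiplier $\mu$ solve $\sum_j \lambda_j C(x_i - x_j) + \mu = C(x_i - x_0)$ subject to $\sum_j \lambda_j = 1$ (the unbiasedness constraint), and the estimation variance is $\sigma^2_{OK} = C(0) - \sum_i \lambda_i C(x_i - x_0) - \mu$.

```python
import numpy as np

def cov_exp(h, sill=1.0, corr_len=1.0):
    """Exponential covariance model (assumed)."""
    return sill * np.exp(-np.abs(h) / corr_len)

def ordinary_kriging(x_obs, z_obs, x0, sill=1.0, corr_len=1.0):
    """Ordinary kriging estimate and estimation standard deviation at x0."""
    x_obs = np.asarray(x_obs, dtype=float)
    n = len(x_obs)
    A = np.ones((n + 1, n + 1))                       # bordered kriging matrix
    A[:n, :n] = cov_exp(x_obs[:, None] - x_obs[None, :], sill, corr_len)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = cov_exp(x_obs - x0, sill, corr_len)
    sol = np.linalg.solve(A, b)
    lam, mu = sol[:n], sol[n]
    estimate = lam @ np.asarray(z_obs, dtype=float)   # best linear unbiased estimate
    variance = cov_exp(0.0, sill, corr_len) - lam @ b[:n] - mu
    return estimate, np.sqrt(max(variance, 0.0))      # clip tiny negative round-off

x_obs = [0.0, 1.0, 3.0]
z_obs = [1.0, 2.0, 0.5]
est_at_sample, sd_at_sample = ordinary_kriging(x_obs, z_obs, 1.0)  # at a data point
est_mid, sd_mid = ordinary_kriging(x_obs, z_obs, 2.0)              # between data points
est_far, sd_far = ordinary_kriging(x_obs, z_obs, 10.0)             # far from the data
```

At a sampled location the estimate reproduces the datum with zero error (kriging honors the data), and the estimation standard deviation grows with distance from the samples. That growth is what an error threshold at unsampled locations constrains.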
Spatial Correlation (Variogram Model)
[Figure: sample pairs separated by distances h1 through h6, and the resulting plot of semivariance γ(h) versus separation distance h]
Global Spatial Variability
[Figure annotations: ½ variance; correlation length]
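The variance (sill) and correlation length are obtained by fitting a variogram model to the experimental variogram. A sketch assuming an exponential model $\gamma(h) = \sigma^2(1 - e^{-h/L})$ and an unweighted least-squares fit. For a fixed $L$ the model is linear in $\sigma^2$, so a 1D search over $L$ suffices. The model choice and search grid are assumptions; fits weighted by pair counts are also common.

```python
import numpy as np

def fit_exponential_variogram(lags, gamma, corr_len_grid):
    """Least-squares fit of gamma(h) = sill * (1 - exp(-h / L)).

    For each trial L the optimal sill has a closed form; the (sill, L) pair
    with the smallest residual sum of squares is returned.
    """
    lags = np.asarray(lags, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    best = (np.inf, None, None)
    for L in corr_len_grid:
        f = 1.0 - np.exp(-lags / L)
        sill = (f @ gamma) / (f @ f)        # linear least squares in the sill
        rss = np.sum((gamma - sill * f) ** 2)
        if rss < best[0]:
            best = (rss, sill, L)
    return best[1], best[2]

lags = np.linspace(100.0, 3000.0, 30)                   # hypothetical lag centers (km)
gamma_obs = 2.0 * (1.0 - np.exp(-lags / 500.0))         # synthetic, noise-free (ppm^2)
sill, corr_len = fit_exponential_variogram(lags, gamma_obs, np.linspace(100.0, 1000.0, 901))
```

For the exponential model the semivariance reaches about 95% of the sill at roughly $3L$, so the reported correlation length and the practical range differ by that factor.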
Global Spatial Variability (continued)
Local Variability (2000 km radius)
[Figure annotations: 2000 km; 5.5 degrees]
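Estimating the variogram locally means using only data within a given radius (here 2000 km) of each grid point. On the globe this calls for great-circle rather than planar distances. A minimal sketch using the haversine formula with an assumed mean Earth radius of 6371 km; the function names are illustrative, and the selected points would then be passed to a variogram estimator.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0   # assumed mean Earth radius

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km; inputs in degrees (broadcastable)."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi = p2 - p1
    dlam = np.radians(lon2) - np.radians(lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def local_neighborhood(center_lat, center_lon, lats, lons, radius_km=2000.0):
    """Boolean mask of data points within radius_km of the center point."""
    return great_circle_km(center_lat, center_lon, lats, lons) <= radius_km
```

One degree of latitude is about 111 km, so a 2000 km radius spans roughly 18 degrees of latitude in each direction.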
XCO2 Variance and Correlation Length: April
[Figure panels: correlation length (km); variance (ppm²)]
Distance to Achieve 1 ppm Uncertainty (h0)
h0 = maximum distance from the interpolation point to a sample that still yields a 1 ppm estimation error
h0 depends on the spatial variability near the interpolation point
Interpolation is performed at each grid point of a 5.5° × 5.5° global grid
[Figure annotations: h0 = ?; Vmax = 1 ppm]
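Under strong simplifying assumptions, $h_0$ has a closed form. Suppose, as an illustration only and not necessarily the study's procedure, that the estimate at a grid point comes from a single sample at distance $h$ by ordinary kriging. The estimation variance is then $2\gamma(h)$. With an exponential variogram $\gamma(h) = \sigma^2(1 - e^{-h/L})$ and a 1 ppm standard-deviation threshold, setting $2\gamma(h_0) = 1\ \text{ppm}^2$ gives $h_0 = -L\ln\left(1 - \frac{1}{2\sigma^2}\right)$ when $\sigma^2 > 0.5\ \text{ppm}^2$; otherwise the threshold is met at any distance.

```python
import math

def max_sampling_distance(sill, corr_len, max_std=1.0):
    """Distance h0 at which single-sample ordinary kriging reaches max_std.

    Assumes an exponential variogram gamma(h) = sill * (1 - exp(-h / corr_len));
    the one-sample OK variance is 2 * gamma(h).
    Units (assumed): sill in ppm^2, corr_len in km, max_std in ppm.
    """
    target_gamma = 0.5 * max_std ** 2
    if sill <= target_gamma:
        return math.inf            # threshold met at any separation distance
    return -corr_len * math.log(1.0 - target_gamma / sill)

h0 = max_sampling_distance(sill=2.0, corr_len=1000.0)   # hypothetical local variogram
```

A larger local variance or a shorter correlation length both shrink $h_0$, which is why the maximum sampling interval varies across the globe.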
Maximum Sampling Interval h0: April
[Figure: maximum sampling interval (km)]
Regular Grid Sampling Uncertainty
[Figure panels: April; July]
Optimal Sampling Locations and Associated Uncertainties
[Figure panels: April; July]
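One simple way to turn an uncertainty threshold into sampling locations is greedy selection: repeatedly add the candidate location that most reduces the largest estimation standard deviation, until every grid point meets the threshold. This is a hypothetical sketch, not the study's algorithm. It assumes a 1D transect, an exponential covariance, and simple kriging (known mean), whose variance $\sigma^2 - \mathbf{c}_0^T\mathbf{C}^{-1}\mathbf{c}_0$ depends only on where samples are taken, not on the measured values.

```python
import numpy as np

def sk_std(grid, sampled, sill, corr_len):
    """Simple kriging standard deviation at every grid point for a given sample set."""
    if not sampled:
        return np.full(len(grid), np.sqrt(sill))
    xs = grid[sampled]
    C = sill * np.exp(-np.abs(xs[:, None] - xs[None, :]) / corr_len)
    c0 = sill * np.exp(-np.abs(xs[:, None] - grid[None, :]) / corr_len)  # (k, n_grid)
    var = sill - np.sum(c0 * np.linalg.solve(C, c0), axis=0)           # c0' C^-1 c0 per point
    return np.sqrt(np.clip(var, 0.0, None))

def greedy_sampling(grid, sill, corr_len, max_std):
    """Add grid locations one at a time until the maximum SK std is <= max_std."""
    sampled = []
    std = sk_std(grid, sampled, sill, corr_len)
    while std.max() > max_std:
        candidates = [i for i in range(len(grid)) if i not in sampled]
        scores = [sk_std(grid, sampled + [i], sill, corr_len).max() for i in candidates]
        sampled.append(candidates[int(np.argmin(scores))])
        std = sk_std(grid, sampled, sill, corr_len)
    return sorted(sampled), std

grid = np.arange(0.0, 5000.0, 100.0)     # hypothetical 1D transect (km), 50 points
sites, std = greedy_sampling(grid, sill=2.0, corr_len=1000.0, max_std=1.0)
```

Greedy selection is not guaranteed to find the smallest possible set of locations, but it is simple and always terminates because each step adds a new location.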
Sampling Constraints
Aerosols
Clouds
Satellite track
Maximum (sub)sampling rate
Albedo
Measurement error
Temporal aggregation
Others?
Conclusions from OCO Study
XCO2 exhibits strong spatial correlation.
The XCO2 covariance structure varies in space and time.
Uniform sampling will not achieve uniform or acceptable interpolation uncertainty.
Geostatistical tools can be used to incorporate the variability in the XCO2 covariance structure into a subsampling protocol.
Inverse Modeling
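Inverse Models
Geostatistical inverse modeling objective function:
$$L_{\mathbf{s},\boldsymbol{\beta}} = \frac{1}{2}(\mathbf{y}-\mathbf{H}\mathbf{s})^T\mathbf{R}^{-1}(\mathbf{y}-\mathbf{H}\mathbf{s}) + \frac{1}{2}(\mathbf{s}-\mathbf{X}\boldsymbol{\beta})^T\mathbf{Q}^{-1}(\mathbf{s}-\mathbf{X}\boldsymbol{\beta})$$
H = transport information
s = unknown fluxes
y = CO2 measurements
R = model-data mismatch covariance
Q = spatial/temporal covariance of flux deviations from the trend
X and β = model of the trend
The fluxes are thus split into a deterministic component (the trend Xβ) and a stochastic component (deviations from the trend, with covariance Q).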
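Minimizing the geostatistical objective function jointly over $\mathbf{s}$ and $\boldsymbol{\beta}$ leads to a linear system. A minimal sketch of one standard way to write and solve it, with $\boldsymbol{\Psi} = \mathbf{H}\mathbf{Q}\mathbf{H}^T + \mathbf{R}$:
$$\begin{bmatrix}\boldsymbol{\Psi} & \mathbf{H}\mathbf{X}\\ (\mathbf{H}\mathbf{X})^T & \mathbf{0}\end{bmatrix}\begin{bmatrix}\boldsymbol{\xi}\\ \hat{\boldsymbol{\beta}}\end{bmatrix} = \begin{bmatrix}\mathbf{y}\\ \mathbf{0}\end{bmatrix}, \qquad \hat{\mathbf{s}} = \mathbf{X}\hat{\boldsymbol{\beta}} + \mathbf{Q}\mathbf{H}^T\boldsymbol{\xi}$$
The problem sizes, the random stand-in for $\mathbf{H}$, and the covariance parameters below are made up; a real CO2 inversion uses transport-model sensitivities and far larger matrices.

```python
import numpy as np

def gim_solve(H, y, X, Q, R):
    """Geostatistical inverse estimate: returns (s_hat, beta_hat, xi)."""
    n, p = H.shape[0], X.shape[1]
    HX = H @ X
    psi = H @ Q @ H.T + R
    A = np.zeros((n + p, n + p))
    A[:n, :n] = psi
    A[:n, n:] = HX
    A[n:, :n] = HX.T
    rhs = np.concatenate([y, np.zeros(p)])
    sol = np.linalg.solve(A, rhs)
    xi, beta = sol[:n], sol[n:]
    s_hat = X @ beta + Q @ H.T @ xi     # trend plus stochastic component
    return s_hat, beta, xi

# Hypothetical small problem: 20 fluxes on a 1D grid, 8 measurements.
rng = np.random.default_rng(42)
m, n = 20, 8
grid = np.arange(m, dtype=float)
Q = 0.5 * np.exp(-np.abs(grid[:, None] - grid[None, :]) / 3.0)   # flux covariance
R = 0.1 * np.eye(n)                                               # model-data mismatch
X = np.ones((m, 1))                                               # constant trend
H = rng.random((n, m)) / m                                        # stand-in "transport"
y = H @ (1.0 + rng.standard_normal(m)) + 0.3 * rng.standard_normal(n)
s_hat, beta_hat, xi = gim_solve(H, y, X, Q, R)
```

At the solution, the gradient of the objective with respect to both $\mathbf{s}$ and $\boldsymbol{\beta}$ vanishes, which is a convenient numerical check.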
Bayesian Inference Applied to Inverse Modeling for Inferring Historical Forcing
$$p(\mathbf{s}\mid\mathbf{y}) = \frac{p(\mathbf{y}\mid\mathbf{s})\,p(\mathbf{s})}{p(\mathbf{y})}$$
p(s|y): posterior probability of the historical forcing
p(y|s): likelihood of the forcing given the available measurements
p(s): prior information about the forcing
p(y): probability of the measurements
y: available observations (n×1)
s: discretized historical forcing (m×1)
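For a linear model with a Gaussian prior and Gaussian measurement errors, the posterior is also Gaussian, and Bayes' rule can be checked numerically. A minimal scalar sketch with made-up numbers: prior $s \sim N(\mu_0, q)$ and one observation $y = hs + v$ with $v \sim N(0, r)$. Evaluating likelihood × prior on a grid and dividing by its integral (which plays the role of $p(y)$) reproduces the closed-form posterior mean $\hat s = \frac{\mu_0/q + hy/r}{1/q + h^2/r}$ and variance $(1/q + h^2/r)^{-1}$.

```python
import numpy as np

mu0, q = 1.0, 4.0      # hypothetical prior mean and variance
h, r = 2.0, 0.5        # hypothetical forward operator and error variance
y = 5.0                # hypothetical observation

s = np.linspace(-20.0, 20.0, 400001)
ds = s[1] - s[0]
prior = np.exp(-0.5 * (s - mu0) ** 2 / q) / np.sqrt(2 * np.pi * q)
likelihood = np.exp(-0.5 * (y - h * s) ** 2 / r) / np.sqrt(2 * np.pi * r)
p_y = np.sum(likelihood * prior) * ds           # p(y) by numerical integration
posterior = likelihood * prior / p_y             # Bayes' rule
post_mean_grid = np.sum(s * posterior) * ds
post_var_grid = np.sum((s - post_mean_grid) ** 2 * posterior) * ds

post_var = 1.0 / (1.0 / q + h ** 2 / r)          # closed-form Gaussian posterior
post_mean = post_var * (mu0 / q + h * y / r)
print(post_mean_grid, post_mean, post_var_grid, post_var)
```

In the multivariate case the same algebra yields matrix expressions, and no grid is needed.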
Dover Air Force Base Case Study
Dover Air Force Base is located in Delaware, U.S.A.
An unconfined aquifer is underlain by a two-layer aquitard.
Aquitard cores are used to infer the PCE and TCE contamination history in the aquifer.
Solute transport is controlled by a diffusive process.
TCE at Location PPC11
[Figure panels: time variation of the boundary condition; measured TCE concentration as a function of depth]
TCE at Location PPC13
[Figure panels: time variation of the boundary condition; measured TCE concentration as a function of depth]
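Because diffusion is linear, the concentration profile measured in a core is a linear function of the boundary-condition history, which is what lets the problem be written as $\mathbf{y} = \mathbf{H}\mathbf{s}$. This sketch rests on simplifying assumptions that are not taken from the study: a single homogeneous, semi-infinite layer obeying $\partial C/\partial t = D\,\partial^2 C/\partial z^2$, zero initial concentration, and a boundary concentration that is piecewise constant over time intervals. The response to a unit step at $z = 0$ is $\operatorname{erfc}\big(z/(2\sqrt{D\tau})\big)$, and superposing steps gives the columns of $\mathbf{H}$. The depths, intervals, and $D$ below are hypothetical.

```python
import math
import numpy as np

def step_response(z, tau, D):
    """Concentration at depth z, time tau after a unit step at z = 0 (semi-infinite domain)."""
    if tau <= 0.0:
        return 0.0
    return math.erfc(z / (2.0 * math.sqrt(D * tau)))

def diffusion_H(depths, t_edges, t_sample, D):
    """H[i, k]: concentration at depths[i] and time t_sample caused by a unit boundary
    value held between t_edges[k] and t_edges[k + 1] (requires t_edges[-1] <= t_sample)."""
    H = np.zeros((len(depths), len(t_edges) - 1))
    for i, z in enumerate(depths):
        for k in range(len(t_edges) - 1):
            # a step switched on at t_edges[k] minus a step switched on at t_edges[k + 1]
            H[i, k] = (step_response(z, t_sample - t_edges[k], D)
                       - step_response(z, t_sample - t_edges[k + 1], D))
    return H

depths = np.linspace(0.0, 2.0, 21)     # hypothetical core depths (m)
t_edges = np.arange(0.0, 45.0, 5.0)    # hypothetical history intervals (years): 0, 5, ..., 40
H = diffusion_H(depths, t_edges, t_sample=40.0, D=0.01)   # D in m^2/yr, assumed
```

A constant boundary history collapses to the single-step solution, and the row at $z = 0$ simply picks out the most recent boundary value.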
Sources of Atmospheric CO2 Information
North American Carbon Program
What Surface Fluxes Do Atmospheric Samples See?
[Figure: particle trajectories for 24 June 2000 at -24, -48, -72, -96, and -120 hours; axes: longitude, latitude, height above ground level (km). Source: Arlyn Andrews, NOAA-GMD]
Large-Region Inversions
TransCom (Gurney et al. 2003)
[Figure: TransCom 3 sites and basis regions]
Study Goals
1. Estimate carbon fluxes at fine spatial resolution (3.75° × 5.0°)
2. Avoid the use of prior flux estimates
3. Incorporate and quantify the effect of available auxiliary data
Questions:
What will be the effect on the estimated fluxes and their uncertainties?
Is there sufficient information in the atmospheric measurements to “see” the relationship between auxiliary data and fluxes?
Auxiliary Data and Carbon Flux Processes
[Image source: NCAR]
Terrestrial flux: photosynthesis (FPAR, LAI, NDVI); respiration (temperature)
Oceanic flux: gas transfer (sea surface temperature, air temperature)
Anthropogenic flux: fossil fuel combustion (GDP density, population)
Other: spatial trends (sine of latitude, absolute value of latitude)
Environmental parameters: precipitation, % land use, Palmer drought index
Sample Auxiliary Data
Global Inversion Setup
Monthly fluxes for 1997 to 2001 at 3.75° × 5.0° resolution (s)
Atmospheric data from the NOAA/ESRL cooperative air sampling network (y)
TM3 gridscale basis functions (H)
Select a subset of auxiliary variables (X)
Quantify the spatial covariance (Q)
Perform the inversion to obtain:
Influence of auxiliary variables on fluxes (β)
Flux best estimates (ŝ)
Estimates of uncertainty for ŝ and β̂
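The uncertainty estimates follow from the same geostatistical inverse equations. A sketch, with $\boldsymbol{\Psi} = \mathbf{H}\mathbf{Q}\mathbf{H}^T + \mathbf{R}$: the posterior covariance of the trend coefficients is $\mathbf{V}_{\hat\beta} = \left[(\mathbf{H}\mathbf{X})^T\boldsymbol{\Psi}^{-1}\mathbf{H}\mathbf{X}\right]^{-1}$, and the posterior covariance of the fluxes is
$$\mathbf{V}_{\hat s} = \mathbf{Q} - \begin{bmatrix}\mathbf{Q}\mathbf{H}^T & \mathbf{X}\end{bmatrix}\begin{bmatrix}\boldsymbol{\Psi} & \mathbf{H}\mathbf{X}\\ (\mathbf{H}\mathbf{X})^T & \mathbf{0}\end{bmatrix}^{-1}\begin{bmatrix}\mathbf{H}\mathbf{Q}\\ \mathbf{X}^T\end{bmatrix}$$
The inputs below are hypothetical, and dense inverses like these are only practical for small problems.

```python
import numpy as np

def gim_uncertainty(H, X, Q, R):
    """Posterior covariances (V_s, V_beta) for the geostatistical inverse problem."""
    n, p = H.shape[0], X.shape[1]
    HX = H @ X
    psi = H @ Q @ H.T + R
    A = np.block([[psi, HX], [HX.T, np.zeros((p, p))]])
    A_inv = np.linalg.inv(A)
    left = np.hstack([Q @ H.T, X])     # [Q H', X]
    right = np.vstack([H @ Q, X.T])    # [H Q; X']
    V_s = Q - left @ A_inv @ right
    V_beta = -A_inv[n:, n:]            # lower-right block of inv(A) equals -V_beta
    return V_s, V_beta

# Hypothetical small problem: 20 fluxes, 8 measurements, constant + linear trend.
rng = np.random.default_rng(0)
m, n = 20, 8
grid = np.arange(m, dtype=float)
Q = 0.5 * np.exp(-np.abs(grid[:, None] - grid[None, :]) / 3.0)
R = 0.1 * np.eye(n)
X = np.column_stack([np.ones(m), grid / m])
H = rng.random((n, m)) / m
V_s, V_beta = gim_uncertainty(H, X, Q, R)
flux_std = np.sqrt(np.diag(V_s))       # posterior standard deviation of each flux
```

$\mathbf{V}_{\hat s}$ includes the uncertainty in the trend coefficients, so posterior flux variances can exceed the corresponding diagonal entries of $\mathbf{Q}$ where measurements provide little constraint.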
Final Set of Auxiliary Variables
Physical understanding was combined with the results of VRT to choose the final set of auxiliary variables:
GDP density
Leaf area index (LAI)
Fraction of absorbed photosynthetically active radiation (FPAR)
Percent forest/shrub
Precipitation
Building up the Best Estimate
Location of 22 TransCom Regions
Conclusions - Methodology
Geostatistical inverse modeling avoids the use of prior flux estimates.
The covariance structures of the flux residuals and of the model-data mismatch can be quantified using atmospheric data.
The benefit of auxiliary data can be quantified.
Fluxes and the influence of auxiliary data are estimated concurrently (with uncertainties).
The approach maximizes the use of information while minimizing assumptions.
Because geostatistical inverse modeling is not constrained by prior estimates, it:
Provides independent validation of bottom-up estimates in well-constrained regions
Is well suited to showing inter-annual variability
Provides an accurate measure of uncertainty
Acknowledgments
Collaborators: Pieter Tans, Adam Hirsch, Lori Bruhwiler, Kevin Schaefer, Wouter Peters, Andy Jacobson (NOAA/CMDL); Alanood Alkhaled, Sharon Gourdji, Charles Humphriss, Meng Ying Li, Miranda Malkin, Kim Mueller, and Shahar Shlomi (UM); Bhaswar Sen and Charles Miller (JPL); Kevin Gurney (Purdue U.); Peter Kitanidis (Stanford U.)
Funding sources: Elizabeth C. Crosby Research Award; University Corporation for Atmospheric Research (UCAR); National Oceanic and Atmospheric Administration (NOAA); National Aeronautics and Space Administration (NASA) and Jet Propulsion Laboratory (JPL); National Science Foundation (NSF); Michigan Space Grant Consortium (MSGC)
Data providers: NOAA/CMDL cooperative air sampling network; Seth Olsen (LANL) and Jim Randerson (UCI); Christian Rödenbeck (MPIB); Kevin Schaefer (NOAA/ESRL); NOAA CDC; NASA; EROS USGS; CIESIN; Global Precipitation Climatology Centre; UCAR
QUESTIONS?