Space-Time Data Modeling: A Review of Some Prospects. Upmanu Lall, Columbia University.

Presentation transcript:

Space-Time Data Modeling: A Review of Some Prospects
Upmanu Lall, Columbia University

Irregularly recorded water quality data form an Attribute Series
A point feature class defines the spatial framework
Many variables defined at each point
Time of measurement is irregular
May be derived from a Laboratory Information Management System
Diagram labels: Field samples, Laboratory, Database

Fecal Coliform in Galveston Bay (Irregularly measured data, ) Coliform Units per 100 ml Tracking Analyst Demo

NEXRAD over South Florida
Real-time radar rainfall data calibrated to rain gages
Received every 15 minutes
2 km grid
Stored by SFWMD in Arc Hydro time series format

Time series from gages in Kissimmee Flood Plain
21 gages measuring water surface elevation
Data telemetered to a central site using a SCADA system
Edited and compiled daily stage data stored in a corporate time series database called dbHydro
Each time series for each gage in dbHydro has a unique dbkey (e.g. ahrty, tyghj, ecdfw, ….)

Domain of Applications
Given space-time data on one or more variables:
–Forecast or conditionally simulate the process at unobserved space-time locations
  One variable conditional on others or on the s-t index
  Multivariate field
  Arbitrary process model, or one related to physics
–Smooth or filter data to recover the process
  Interpret residuals as “noise” or as a high- or low-frequency space-time correlated random field that may relate to covariates
  Aggregate-disaggregate the process in space and/or time
–Clustering, classification, fusion, mining, risk assessment, insurance, data assimilation
  Build on the same basic ideas, but ……lives to fight another day

Key Concepts and Building Blocks
Linear Model
Generalized Linear Model
Generalized Least Squares
Generalized Additive Model
–Nonlinear, Nonparametric
Mixture Models
Multi-Resolution/Frequency Domain Models
Random Fields
State Space Models
Bayesian Models
Hierarchical Bayesian Models → Recommended Framework

Linear Model
$y = X\beta + e$, $e \sim N(0, \sigma^2)$; $X = \{1, t, \sin(\omega t), \log(t), \mathrm{loess}(t)\}$
$y_i \sim N(\mu_i, \sigma^2)$, $\mu_i = x_i \beta$
e.g., TREND MODEL with uncorrelated errors

Generalized Linear Model
Two major changes:
1. $y_i$ is assumed to come from any member of the exponential family, e.g., Binomial, Gamma, Poisson, Gumbel….
2. Link Function (the transformed mean is linear in the predictors)
Example: y is rain or no rain – Binomial; link function: logit (logistic regression)

Generalized Least Squares
Allow the noise process to be spatially or temporally correlated, e.g., $e \sim N(0, \Sigma)$, where $\Sigma$ is a covariance matrix
Then: Recursive Max. Likelihood Solution
Example: Serially Correlated Errors (AR1) and X = {1}

Likelihood Models
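The step from the linear model to generalized least squares can be illustrated numerically. The sketch below is not from the slides: it assumes a known AR(1) correlation `rho`, a linear trend design `X = {1, t}`, and the standard closed-form GLS estimator $\hat\beta = (X^\top \Sigma^{-1} X)^{-1} X^\top \Sigma^{-1} y$ in place of the recursive maximum likelihood solution the slide mentions. The function names `ar1_covariance` and `gls_fit` are hypothetical.

```python
import numpy as np

def ar1_covariance(n, rho, innovation_var=1.0):
    """Covariance matrix of a stationary AR(1) series of length n."""
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return innovation_var * rho ** lags / (1.0 - rho ** 2)

def gls_fit(X, y, Sigma):
    """Closed-form GLS: beta = (X' S^-1 X)^-1 X' S^-1 y, plus cov(beta)."""
    Sinv_X = np.linalg.solve(Sigma, X)   # S^-1 X without forming S^-1
    Sinv_y = np.linalg.solve(Sigma, y)   # S^-1 y
    cov_beta = np.linalg.inv(X.T @ Sinv_X)
    beta = cov_beta @ (X.T @ Sinv_y)
    return beta, cov_beta

# Simulated trend model with serially correlated errors (assumed values).
rng = np.random.default_rng(0)
n = 200
t = np.arange(1, n + 1, dtype=float)
X = np.column_stack([np.ones(n), t])     # design: intercept and linear trend
beta_true = np.array([2.0, 0.05])
rho = 0.7
Sigma = ar1_covariance(n, rho)
e = np.linalg.cholesky(Sigma) @ rng.standard_normal(n)
y = X @ beta_true + e

# OLS ignores the correlation; GLS weights by the error covariance.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_gls, cov_gls = gls_fit(X, y, Sigma)
print("OLS:", beta_ols)
print("GLS:", beta_gls, "std errors:", np.sqrt(np.diag(cov_gls)))
```

In practice `rho` is unknown and would be estimated jointly with the trend coefficients, which is where a likelihood-based or iterative scheme comes in; with `Sigma` set to the identity, `gls_fit` reduces to ordinary least squares.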

Summary
For space-time, spatial or time series models, we can consider a common general framework:
$\text{Data}_i = \text{Trend}(\text{mean}_i) + \text{correlated noise}_i$
Data may correspond to a non-normal model, including mixtures of exponential family members (GLM)
$\text{mean}_i$ can depend nonlinearly on the space or time index or on covariates
Correlated noise can be modeled as a time series process, or using variograms; space and time correlations are possible (GLS)
Typically stationary noise processes are considered, but mixtures can be used to build nonstationary models
Spatial correlation functions can be parametric or nonparametric
If $\text{mean}_i$ is a constant, and the noise process is weakly stationary (correlation depends only on lag or separation), then
–for time data, we have a traditional time series model,
–for spatial processes, the ordinary kriging model, and
–for space-time data, we could have a Markov random field model
GLS + GLM = Likelihood Models → a nice and obvious segue to Bayesian Estimation
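The special case named above (constant mean plus weakly stationary spatial noise, i.e. ordinary kriging) can be sketched in a few lines. This is an illustration under assumptions not in the slides: an exponential covariance with a known sill and range, no nugget, and the hypothetical function names `exp_cov` and `ordinary_kriging`.

```python
import numpy as np

def exp_cov(d, sill=1.0, corr_range=1.0):
    """Exponential covariance as a function of separation distance d."""
    return sill * np.exp(-d / corr_range)

def ordinary_kriging(coords, values, target, cov=exp_cov):
    """Predict at `target` with an unknown constant mean (ordinary kriging).

    Solves [Sigma 1; 1' 0] [w; m] = [c0; 1], so the weights sum to one.
    Returns the prediction, the kriging variance, and the weights.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(values)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d0 = np.linalg.norm(coords - target, axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = cov(d)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = cov(d0)
    sol = np.linalg.solve(A, b)
    w, lagrange = sol[:n], sol[n]
    prediction = w @ values
    variance = cov(0.0) - w @ cov(d0) - lagrange
    return prediction, variance, w

# Synthetic sites and values (assumed, for illustration only).
rng = np.random.default_rng(2)
coords = rng.uniform(0.0, 10.0, size=(30, 2))
values = np.sin(coords[:, 0] / 2.0) + 0.5 * np.cos(coords[:, 1] / 3.0)
pred, var, w = ordinary_kriging(coords, values, target=[5.0, 5.0])
print("prediction:", pred, "kriging variance:", var)
```

Without a nugget term the predictor interpolates the data exactly, so predicting at an observed site returns the observed value with (numerically) zero variance.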

The setting
Data Z(s,t) at multiple, irregularly sampled spatial locations s at certain times t
–Sampling in space is usually irregular and sparse, but could be on a grid, and data may represent changing support
–Z(.,.) may be multivariate: represent a vector of variables at each sampling point, or the same variable at a point and/or an areal value (multi-resolution)
–Time is ordered, but space is not. Space may be continuous; time may be continuous or discrete
Cases to consider:
–Fixed spatial locations, all sampled at the same time: forecast or conditionally simulate the process at other times
  Can estimate space and time covariance matrices for this set
–Space and time sampling locations vary

GLM
Mean function can depend on s, t and covariates as before
For data irregularly sampled in time, a variogram-like idea can be used to compute correlations: define the functional form, then estimate its parameters
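One way to read the variogram-like idea for irregular sampling is to bin all pairs of observations by their time separation and average half the squared differences in each bin. The sketch below assumes that reading; the function name `empirical_variogram`, the bin edges, and the simulated series are all hypothetical.

```python
import numpy as np

def empirical_variogram(times, values, bin_edges):
    """Binned semivariogram for irregularly sampled data.

    For every pair (i, j), take 0.5 * (v_i - v_j)**2 and average it
    within bins of the time lag |t_i - t_j|.
    Returns bin centers, semivariances (NaN for empty bins), pair counts.
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    bin_edges = np.asarray(bin_edges, dtype=float)
    i, j = np.triu_indices(len(times), k=1)       # each pair once
    lag = np.abs(times[i] - times[j])
    half_sq_diff = 0.5 * (values[i] - values[j]) ** 2
    which = np.digitize(lag, bin_edges) - 1       # bin index per pair
    n_bins = len(bin_edges) - 1
    gamma = np.full(n_bins, np.nan)
    counts = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):
        mask = which == b
        counts[b] = mask.sum()
        if counts[b] > 0:
            gamma[b] = half_sq_diff[mask].mean()
    centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    return centers, gamma, counts

# Irregular sampling times with exponentially correlated values (assumed).
rng = np.random.default_rng(1)
times = np.sort(rng.uniform(0.0, 100.0, size=150))
C = np.exp(-np.abs(np.subtract.outer(times, times)) / 5.0)
values = np.linalg.cholesky(C + 1e-8 * np.eye(150)) @ rng.standard_normal(150)

centers, gamma, counts = empirical_variogram(times, values, np.linspace(0, 30, 16))
for c, g, k in zip(centers, gamma, counts):
    print(f"lag {c:5.1f}: semivariance {g:.3f} from {k} pairs")
```

A parametric form (exponential, spherical, and so on) would then be fitted to the binned values, which corresponds to the "define and evaluate form and parameters" step on the slide.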

Example – Spatial Time Series
We have N locations and T times at which we wish to model the process: all locations have data at fixed times (missing values allowed)
State Space Model/Dynamic Linear Model
Data: Z(s,t)   Process: Y(s,t)
Measurement Equation: $Z_t = F_t Y_t + e_t$, $e_t \sim N(0, \Sigma_t)$, $\Sigma_t$ = N×N spatial covariance
State Equation: $Y_t = G_t Y_{t-1} + n_t$, $n_t \sim N(0, \Sigma_{n_t})$, $\Sigma_{n_t}$ = P×P spatial covariance
Z = N×T data matrix (columns $Z_t$)
Y = P×T state matrix (columns $Y_t$)
$F_t$ = N×P matrix mapping the state to the observations
$G_t$ = P×P matrix
These two matrices are allowed to change with time → nonstationary model
$G_t$ could hold the lag-1 auto- and cross-correlations → lot of parameters!!
$F_t$ could be the identity if P = N; otherwise it can be used to map obs sites to grid averages
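For the Gaussian case, the measurement and state equations above can be filtered with the standard Kalman recursions. The sketch below is a simplification rather than the model on the slide: it holds F, G and both covariances fixed in time, handles missing values by dropping the corresponding rows of F, and uses hypothetical names (`kalman_filter`, `Sigma_e`, `Sigma_n`) and made-up parameter values.

```python
import numpy as np

def kalman_filter(Z, F, G, Sigma_e, Sigma_n, m0, C0):
    """Kalman filter for Z_t = F Y_t + e_t, Y_t = G Y_{t-1} + n_t.

    Z is (T, N) with np.nan marking missing observations.
    Returns filtered state means (T, P) and covariances (T, P, P).
    """
    T, P = Z.shape[0], G.shape[0]
    means = np.zeros((T, P))
    covs = np.zeros((T, P, P))
    m, C = m0, C0
    for t in range(T):
        # Predict step: push the previous state through the state equation.
        a = G @ m
        R = G @ C @ G.T + Sigma_n
        obs = ~np.isnan(Z[t])
        if obs.any():
            # Update step using only the sites observed at time t.
            Ft = F[obs]
            S = Ft @ R @ Ft.T + Sigma_e[np.ix_(obs, obs)]
            K = np.linalg.solve(S, Ft @ R).T      # gain K = R Ft' S^-1
            m = a + K @ (Z[t, obs] - Ft @ a)
            C = R - K @ Ft @ R
        else:
            m, C = a, R                            # nothing observed
        means[t], covs[t] = m, C
    return means, covs

# Three sites on a line, P = N, so F is the identity (assumed setup).
rng = np.random.default_rng(3)
sites = np.array([0.0, 1.0, 2.0])
dist = np.abs(np.subtract.outer(sites, sites))
Sigma_n = 0.5 * np.exp(-dist / 1.5)   # spatially correlated state noise
Sigma_e = 0.3 * np.eye(3)             # independent measurement noise
F = np.eye(3)
G = 0.8 * np.eye(3)                   # lag-1 autocorrelation only

T = 100
Y = np.zeros((T, 3))
Z = np.zeros((T, 3))
Ln = np.linalg.cholesky(Sigma_n)
y_prev = np.zeros(3)
for t in range(T):
    y_prev = G @ y_prev + Ln @ rng.standard_normal(3)
    Y[t] = y_prev
    Z[t] = F @ y_prev + np.sqrt(0.3) * rng.standard_normal(3)
Z[rng.random((T, 3)) < 0.1] = np.nan  # knock out about 10% of values

means, covs = kalman_filter(Z, F, G, Sigma_e, Sigma_n, np.zeros(3), np.eye(3))
print("RMSE of filtered state:", np.sqrt(np.mean((means - Y) ** 2)))
```

Letting F and G vary with t, as the slide allows, only requires indexing them inside the loop; estimating them is the hard part, which is what motivates the structured and hierarchical models that follow.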

An example of a space-time model that follows from this formulation is:
The data Z(s,t) is modeled as a space-time mean field + measurement or high-frequency noise
The mean process is related to a set of covariates or predictors (can be s, t)
The “regression” coefficient of this mean function is modeled as a spatially averaged random walk process with correlation across predictors (  )
spatially varying random walk perturbation in the coefficients with correlation across perturbation in  and scale control

How can we estimate a reliable model that has so many parameters and so much structure?
Need a lot of data
Recognize that there is a lot of shared information in space-time data sets
Model structure allows exploration of spatial and temporal means and effects – the mean can also be separable
Bayesian and Hierarchical Bayesian Models for Inference


From C. Wikle w/o permission

The H matrix is now a tridiagonal matrix → dramatic reduction in the number of parameters from assuming a spatial neighborhood model
State Equation
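The parameter saving from a neighborhood structure is easy to count. The sketch below assumes a one-dimensional chain of sites in which each site depends only on itself and its two immediate neighbors; the original model may use a different lattice, and the function name `tridiagonal_propagator` and the coefficient values are hypothetical.

```python
import numpy as np

def tridiagonal_propagator(left, center, right):
    """Build H with H[i, i-1] = left, H[i, i] = center, H[i, i+1] = right."""
    return (np.diag(np.asarray(center, dtype=float))
            + np.diag(np.asarray(left, dtype=float), k=-1)
            + np.diag(np.asarray(right, dtype=float), k=1))

n = 50                                   # number of sites on the chain
H = tridiagonal_propagator(np.full(n - 1, 0.2),
                           np.full(n, 0.5),
                           np.full(n - 1, 0.2))

n_free_full = n * n                      # unrestricted propagator
n_free_tri = 3 * n - 2                   # nearest-neighbor propagator
print("full matrix parameters:", n_free_full)
print("tridiagonal parameters:", n_free_tri)

# One step of the state equation without noise: each site mixes only
# with its immediate neighbors.
y_prev = np.zeros(n)
y_prev[n // 2] = 1.0                     # unit pulse at the middle site
y_next = H @ y_prev
print("nonzero sites after one step:", np.flatnonzero(y_next))
```

If the three coefficients are shared across sites, as in this example, the count drops further to three; letting them depend on covariates gives spatially varying dynamics at modest cost.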

From C. Wikle w/o permission

X = space-time data matrix on habitat covariates, e.g., human population, temperature, precipitation, land use
So, the diffusion coefficient can be spatially and temporally variable, conditional on predictors
Can build similar relations for growth, etc.

From C. Wikle w/o permission


References
Banerjee, S., B. P. Carlin, A. E. Gelfand, Hierarchical Modeling and Analysis for Spatial Data, Chapman and Hall, 2004
Gelman, A., et al, Bayesian Data Analysis, Chapman and Hall, 2004
Anselin, L., Space-Time Models,
Gelfand, A. E., On the change of support problem for space-time data, Biostatistics, 2001, 2(1), P
Kyriakidis, P.C., and A.G. Journel, Geostatistical Space–Time Models: A Review, Mathematical Geology, 1999, 31(6),
De Cesare, L., D. Myers, D. Posa, Estimating and modeling space-time correlation structures, Statistics & Probability Letters, 2001, 51,
Wikle, C.K., L. M. Berliner, N. Cressie, Hierarchical Bayesian space-time models, Environmental and Ecological Statistics, 1998, 5,
Wikle, C.K., Ralph F. Milliff, Doug Nychka, and L. Mark Berliner, Spatiotemporal Hierarchical Bayesian Modeling: Tropical Ocean Surface Winds, JASA, 2001, 96(454),
Huang, H-C, G. Johannesson and N. Cressie, Multi-Resolution Spatio-Temporal Modeling,