"Did your model account for earthworms?" Rick Paik Schoenberg, UCLA


"Did your model account for earthworms?" Rick Paik Schoenberg, UCLA 1. Missing covariates, confounding, and misspecification. 2. Missing covariate results involving point process models. 3. Examples involving eq weather and alternatives to BI.

1. Missing covariates and confounding. Some models for wildfire occurrence use a function of vegetation type, relative greenness, fuel moisture, precipitation, temperature, windspeed, and perhaps other variables. Omitting some confounding variables is inevitable. Even if your model incorporated many other variables, you would likely have a major misspecification problem. Also, data on some of the variables would be missing or impractical to obtain.

Burning Index (BI). NFDRS: Spread Component (SC) and Energy Release Component (ERC), each based on dozens of equations. $BI = [10.96 \times SC \times ERC]^{0.46}$. Uses daily weather variables, drought index, and vegetation info. Human interactions excluded.
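As a quick numerical illustration of the BI formula, here is a minimal Python sketch. The function name and the SC and ERC values are assumptions made up for illustration; this is not NFDRS code.

```python
def burning_index(sc: float, erc: float) -> float:
    """BI = (10.96 * SC * ERC) ** 0.46, as stated on the slide."""
    return (10.96 * sc * erc) ** 0.46


# Hypothetical component values, chosen only for illustration.
bi = burning_index(sc=20.0, erc=50.0)
print(round(bi, 1))
```

Because of the exponent 0.46, doubling either component raises BI by a factor of 2 ** 0.46, or roughly 38%, rather than doubling it.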

Some BI equations (from Pyne et al., 1996):
Rate of spread: $R = I_R\, \xi\, (1 + \phi_w + \phi_s) / (\rho_b\, \varepsilon\, Q_{ig})$.
Oven-dry bulk density: $\rho_b = w_0 / \delta$.
Reaction intensity: $I_R = \Gamma'\, w_n\, h\, \eta_M\, \eta_s$.
Effective heating number: $\varepsilon = \exp(-138/\sigma)$.
Optimum reaction velocity: $\Gamma' = \Gamma'_{max}\, (\beta/\beta_{op})^{A} \exp[A(1 - \beta/\beta_{op})]$.
Maximum reaction velocity: $\Gamma'_{max} = \sigma^{1.5}\, (495 + 0.0594\, \sigma^{1.5})^{-1}$.
Optimum packing ratio: $\beta_{op} = 3.348\, \sigma^{-0.8189}$. $A = 133\, \sigma^{-0.7913}$.
Moisture damping coef.: $\eta_M = 1 - 2.59\, (M_f/M_x) + 5.11\, (M_f/M_x)^2 - 3.52\, (M_f/M_x)^3$.
Mineral damping coef.: $\eta_s = 0.174\, S_e^{-0.19}$ (max = 1.0).
Propagating flux ratio: $\xi = (192 + 0.2595\, \sigma)^{-1} \exp[(0.792 + 0.681\, \sigma^{0.5})(\beta + 0.1)]$.
Wind factor: $\phi_w = C\, U^{B}\, (\beta/\beta_{op})^{-E}$, with $C = 7.47 \exp(-0.133\, \sigma^{0.55})$, $B = 0.02526\, \sigma^{0.54}$, $E = 0.715 \exp(-3.59 \times 10^{-4}\, \sigma)$.
Net fuel loading: $w_n = w_0\, (1 - S_T)$.
Heat of preignition: $Q_{ig} = 250 + 1116\, M_f$.
Slope factor: $\phi_s = 5.275\, \beta^{-0.3}\, (\tan \phi)^2$.
Packing ratio: $\beta = \rho_b / \rho_p$.
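To show how these equations chain together, here is a minimal Python sketch that evaluates the rate of spread from the listed formulas. Several things are assumptions rather than statements from the talk: the function and argument names, the imperial units, and the example fuel values, which are loosely grass-like. The linear coefficient of the moisture damping polynomial is taken as 2.59 and the wind factor is read as φ_w; with 2.59, the damping coefficient vanishes when M_f = M_x.

```python
import math


def rate_of_spread(w0, delta, sigma, h, Mf, Mx, S_T, S_e, rho_p, U, tan_phi):
    """Rate of spread R computed from the equations listed on the slide."""
    rho_b = w0 / delta                                  # oven-dry bulk density
    beta = rho_b / rho_p                                # packing ratio
    beta_op = 3.348 * sigma ** -0.8189                  # optimum packing ratio
    A = 133.0 * sigma ** -0.7913
    gamma_max = sigma ** 1.5 / (495.0 + 0.0594 * sigma ** 1.5)
    gamma = gamma_max * (beta / beta_op) ** A * math.exp(A * (1.0 - beta / beta_op))
    m = Mf / Mx
    eta_M = 1.0 - 2.59 * m + 5.11 * m ** 2 - 3.52 * m ** 3   # moisture damping
    eta_s = min(0.174 * S_e ** -0.19, 1.0)                   # mineral damping
    w_n = w0 * (1.0 - S_T)                                   # net fuel loading
    I_R = gamma * w_n * h * eta_M * eta_s                    # reaction intensity
    xi = math.exp((0.792 + 0.681 * sigma ** 0.5) * (beta + 0.1)) / (192.0 + 0.2595 * sigma)
    C = 7.47 * math.exp(-0.133 * sigma ** 0.55)
    B = 0.02526 * sigma ** 0.54
    E = 0.715 * math.exp(-3.59e-4 * sigma)
    phi_w = C * U ** B * (beta / beta_op) ** -E              # wind factor
    phi_s = 5.275 * beta ** -0.3 * tan_phi ** 2              # slope factor
    eps = math.exp(-138.0 / sigma)                           # effective heating number
    Q_ig = 250.0 + 1116.0 * Mf                               # heat of preignition
    return I_R * xi * (1.0 + phi_w + phi_s) / (rho_b * eps * Q_ig)


# Hypothetical fuel and weather inputs, for illustration only.
fuel = dict(w0=0.034, delta=1.0, sigma=3500.0, h=8000.0, Mx=0.12,
            S_T=0.0555, S_e=0.010, rho_p=32.0)
R = rate_of_spread(Mf=0.06, U=440.0, tan_phi=0.0, **fuel)
print(R)
```

Even this reduced version needs eleven inputs, which shows how many quantities would have to be recorded to evaluate the full system.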

In practice, most of these variables are not recorded, and the BI for instance is estimated using just a few variables like RH, windspeed, temp, veg. type, and precipitation. For a given BI range, area burned varies dramatically based on month. This may be due to missing covariates or to mis-specification.

In general, a model is not necessarily invalidated just because a variable is missing. Under some circumstances, the parameters in the model can be estimated consistently even though relevant variables are missing. This is well known, but a citation for it has been difficult to find.

Point process modeling. Consider a conditional intensity $\lambda(t, x_1, \dots, x_k; \theta)$. [For fires, $x_1$ = location, $x_2$ = area.]

Separable Estimation for Point Processes
Say $\lambda$ is multiplicative in mark $x_j$ if
$$\lambda(t, x_1, \dots, x_k; \theta) = \theta_0\, \lambda_j(t, x_j; \theta_j)\, \lambda_{-j}(t, x_{-j}; \theta_{-j}),$$
where $x_{-j} = (x_1, \dots, x_{j-1}, x_{j+1}, \dots, x_k)$, and $\theta_{-j}$ and $\lambda_{-j}$ are defined similarly.
If $\lambda$ is multiplicative in $x_j$ and if one of these holds, then $\tilde\theta_j$, the partial MLE, is consistent (Ogata 1978, Schoenberg 2016):
$\int \lambda_{-j}(t, x_{-j}; \theta_{-j})\, d\mu_{-j} = \gamma$, for all $\theta_{-j}$.
$\int \lambda_j(t, x_j; \theta_j)\, d\mu_j = \gamma$, for all $\theta_j$.
$\int \lambda_j(t, x; \theta)\, d\mu = \int \lambda_j(t, x_j; \theta_j)\, d\mu_j = \gamma$, for all $\theta$.

Individual Covariates:
If $\lambda$ is multiplicative, $\lambda_j(t, x_j; \theta_j) = f_1[X(t, x_j); \beta_1]\, f_2[Y(t, x_j); \beta_2]$, $X$ and $Y$ are independent, and the log-likelihood is differentiable w.r.t. $\beta_1$, then the partial MLE of $\beta_1$ is consistent.
If the missing component $j$ is additive, $\lambda_j(t, x_j; \theta_j) = f_1[X(t, x_j); \beta_1] + f_2[Y(t, x_j); \beta_2]$, and $f_2$ is small, in the sense that $\frac{1}{T} \int f_2(Y; \beta_2)^2 / f_1(X; \tilde\beta_1)\, d\mu \to_p 0$, then the partial MLE $\tilde\beta_1$ is consistent.
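The multiplicative case can be illustrated with a small simulation. The sketch below is not the point process setting of the cited results but a discretized stand-in chosen for simplicity: Poisson counts in many bins, with rate exp(b0 + b1 X + b2 Y) and X independent of Y. The fitted model omits Y entirely. All names, parameter values and the Newton-Raphson fitter are illustrative assumptions.

```python
import numpy as np


def fit_poisson_loglinear(x, counts, n_iter=25):
    """Newton-Raphson MLE for counts ~ Poisson(exp(b0 + b1 * x))."""
    X = np.column_stack([np.ones_like(x), x])
    b = np.array([np.log(counts.mean()), 0.0])    # start at the constant-rate fit
    for _ in range(n_iter):
        mu = np.exp(X @ b)
        grad = X.T @ (counts - mu)                # score vector
        hess = X.T @ (X * mu[:, None])            # observed information
        b = b + np.linalg.solve(hess, grad)
    return b


rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)      # covariate kept in the fitted model
y = rng.normal(size=n)      # "missing" covariate, independent of x
true_b0, true_b1, true_b2 = -1.0, 0.5, 0.8
counts = rng.poisson(np.exp(true_b0 + true_b1 * x + true_b2 * y))

# Fit using x only; y is ignored entirely.
b0_hat, b1_hat = fit_poisson_loglinear(x, counts)
print(b0_hat, b1_hat)
```

In this setup the slope estimate should land near the true value 0.5, while the intercept absorbs the omitted factor: E[exp(0.8 Y)] = exp(0.32) for standard normal Y, so the fitted intercept should be near -0.68 rather than -1.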

Impact 1. If a variable is missing but it has a small additive effect on the rate, then you can still have consistent parameter estimates. An example is earthquake weather.

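Earthquake weather example.
$$\lambda(t, x, y, m) = \mu\, \rho(x, y) \exp\{\nu\, \mathrm{Temp}(t)\} + K \sum_{i:\, t_i < t} g(t - t_i, x - x_i, y - y_i; m_i),$$
with $g(u, x, y; m_i) = (u + c)^{-p} \exp\{a (m_i - M_0)\}\, (\|x + y\|^2 + d)^{-q}$.
[Figure: fitted model with temp; fitted model without temp.]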
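A minimal Python sketch of evaluating an intensity of this form is given below. It assumes the triggering term is summed over all earlier events, that the spatial term uses squared epicentral distance x² + y², and that `rho` and `temp` are user-supplied functions. The parameter values and the two-event catalog are made up for illustration.

```python
import numpy as np


def trigger(u, x, y, m_i, c, p, a, M0, d, q):
    """g(u, x, y; m_i) = (u + c)^-p * exp{a (m_i - M0)} * (r^2 + d)^-q."""
    r2 = x ** 2 + y ** 2    # assumption: squared epicentral distance
    return (u + c) ** (-p) * np.exp(a * (m_i - M0)) * (r2 + d) ** (-q)


def intensity(t, x, y, events, temp, rho, mu, nu, K, **g_params):
    """Background scaled by exp(nu * Temp(t)) plus triggering from earlier events.

    `events` is an (n, 4) array with columns (t_i, x_i, y_i, m_i).
    """
    background = mu * rho(x, y) * np.exp(nu * temp(t))
    past = events[events[:, 0] < t]               # only events before time t
    t_i, x_i, y_i, m_i = past.T
    return background + K * np.sum(trigger(t - t_i, x - x_i, y - y_i, m_i, **g_params))


# Hypothetical parameters and catalog, for illustration only.
params = dict(mu=0.1, nu=0.02, K=0.05, c=0.01, p=1.2, a=1.0, M0=3.0, d=0.5, q=1.5)
events = np.array([[1.0, 0.0, 0.0, 4.0],
                   [2.5, 0.3, -0.2, 3.5]])
flat_rho = lambda x, y: 1.0      # assumed uniform spatial background
const_temp = lambda t: 20.0      # assumed constant temperature series
lam = intensity(3.0, 0.1, 0.1, events, const_temp, flat_rho, **params)
print(lam)
```

Setting `nu=0` gives the model without temperature, so the two fits can be compared by maximizing the log-likelihood with and without that constraint.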

2. Model building. When building a model with many variables, such as BI, misspecification is a major problem. One way to help avoid it is to examine the effect of one explanatory variable at a time on the response variable, using e.g. kernel smoothing. For instance, one might model the rate as exponential in RH and windspeed, linear in temp, and linear in precip. with a threshold. A simple model built this way, using the same variables as BI, fits much better than BI.
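Looking at one covariate at a time with kernel smoothing can be sketched as follows. This is a generic Nadaraya-Watson smoother with a Gaussian kernel, not necessarily the smoother used in the talk. The synthetic data, with an exponential decay in RH, and the bandwidth are illustrative assumptions.

```python
import numpy as np


def kernel_smooth(x, y, grid, bandwidth):
    """Nadaraya-Watson estimate of E[y | x] on `grid`, Gaussian kernel."""
    z = (grid[:, None] - x[None, :]) / bandwidth
    w = np.exp(-0.5 * z ** 2)                 # kernel weights, one row per grid point
    return (w @ y) / w.sum(axis=1)


# Synthetic data standing in for (RH, response); not real fire data.
rng = np.random.default_rng(1)
rh = rng.uniform(10.0, 90.0, size=5000)
response = np.exp(-0.05 * rh) + rng.normal(scale=0.05, size=rh.size)

grid = np.linspace(20.0, 80.0, 13)
smooth = kernel_smooth(rh, response, grid, bandwidth=5.0)

# A roughly straight line in log(smooth) against RH suggests an exponential form.
print(np.round(smooth, 3))
```

Repeating this for each covariate in turn suggests a functional form for each factor (exponential, linear, thresholded) before the factors are combined into a single model.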

[Figures: two plots; the first is annotated r = 0.16 with an axis in sq m, the second has axes in sq m and F.]

Model Construction
Wildfire incidence seems roughly multiplicative (only marginally significant in separability test).
Covariates: windspeed (WS), RH, temp, precip.
Tapered Pareto size distribution $f$, smooth spatial background $\mu$.
[*] $\lambda(t, x, a) = f(a)\, \mu(x)\, \beta_1 \exp(\beta_2 RH + \beta_3 WS)\, (\beta_4 + \beta_5 Temp)\, \max\{\beta_6 - \beta_7 Prec,\ \beta_8\}$
Relative AICs (Poisson - Model, so higher is better):
Poisson: baseline
RH: 262.9
BI: 302.7
Model [*]: 601.1
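The structure of model [*] can be sketched in Python as a product of a size density, a spatial background value and a weather factor. The tapered Pareto parameterization used here, with survival function (a0/a)^shape · exp((a0 − a)/taper), is a common one but is an assumption about what the talk used. All function names and coefficient values are hypothetical.

```python
import math


def tapered_pareto_pdf(a, a0, shape, taper):
    """Density for survival function (a0/a)**shape * exp((a0 - a)/taper), a >= a0."""
    if a < a0:
        return 0.0
    return (shape / a + 1.0 / taper) * (a0 / a) ** shape * math.exp((a0 - a) / taper)


def weather_factor(rh, ws, temp, prec, b):
    """b1 exp(b2 RH + b3 WS) (b4 + b5 Temp) max{b6 - b7 Prec, b8}, with b = (b1, ..., b8)."""
    b1, b2, b3, b4, b5, b6, b7, b8 = b
    return b1 * math.exp(b2 * rh + b3 * ws) * (b4 + b5 * temp) * max(b6 - b7 * prec, b8)


def rate(a, mu_x, rh, ws, temp, prec, b, a0, shape, taper):
    """lambda(t, x, a) = f(a) * mu(x) * weather factor; mu_x is mu evaluated at x."""
    return tapered_pareto_pdf(a, a0, shape, taper) * mu_x * weather_factor(rh, ws, temp, prec, b)


# Hypothetical coefficients, for illustration only.
b = (1.0, -0.02, 0.05, 1.0, 0.01, 2.0, 10.0, 0.5)
lam = rate(a=5.0, mu_x=0.3, rh=30.0, ws=10.0, temp=80.0, prec=0.0,
           b=b, a0=1.0, shape=0.7, taper=50.0)
print(lam)
```

The `max` term is the precipitation threshold: once b6 − b7·Prec drops below b8, further rain no longer lowers the rate.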

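Comparison of Predictive Efficacy
BI 150: 32 false alarms per year, 22.3% of fires correctly alarmed.
Model [*] at the same false alarm rate: 34.1% of fires correctly alarmed.
BI 200: 13 false alarms per year, 8.2% of fires correctly alarmed.
Model [*] at the same false alarm rate: 15.1% of fires correctly alarmed.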
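One way such a comparison could be computed is sketched below. The definitions are assumptions, since the talk does not spell them out: an alarm is issued on every day the index meets or exceeds a threshold, a false alarm is an alarm day with no fires, and a fire is correctly alarmed if it occurs on an alarm day. The daily values are made up.

```python
import numpy as np


def alarm_performance(index, fires, threshold, n_years):
    """Return (% of fires on alarm days, false alarm days per year)."""
    index = np.asarray(index, dtype=float)
    fires = np.asarray(fires)
    alarm = index >= threshold                           # days with an alarm
    pct_alarmed = 100.0 * fires[alarm].sum() / fires.sum()
    false_alarms_per_year = np.sum(alarm & (fires == 0)) / n_years
    return pct_alarmed, false_alarms_per_year


# Hypothetical daily index values and fire counts over 2 years.
index = [100, 160, 210, 90, 155, 205]
fires = [0, 2, 1, 1, 0, 0]
pct, fa = alarm_performance(index, fires, threshold=150, n_years=2)
print(pct, fa)
```

To compare two predictors fairly, the threshold of one can be tuned until its false alarm rate matches the other's, and the percentages of fires alarmed can then be compared at that common rate.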

Conclusions: Build initial attempts at point process models by examining each covariate individually. Important variables will surely be missing from the model. If the missing variables have a small additive effect on the response, or if the missing variables have a purely multiplicative effect, then parameter estimates will be consistent despite the missing variables.

References:
Ogata Y (1978). The asymptotic behaviour of maximum likelihood estimators for stationary point processes. Annals of the Institute of Statistical Mathematics 30, 243-261.
Schoenberg FP (2016). A note on the consistent estimation of spatial-temporal point process parameters. Statistica Sinica 26, 861-879.