Task 6: Statistical Approaches
Bob Youngs
NGA Workshop #6
July 19, 2004
Peer-NGA Project
Truncated Data
- An unknown number of recordings have $y_i < Z_{\mathrm{trunc}}$; for these recordings the value of $x_i$ is also unknown (Toro, 1981)
Truncated Data Statistical Model
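As a hedged sketch (a common textbook form, not necessarily the exact expression used by Toro, 1981), assume normally distributed residuals with standard deviation $\sigma$ about a median model $f(x_i;\theta)$. The likelihood of the recorded data then renormalizes each density by the probability that the recording exceeded the truncation level:

```latex
\mathcal{L}(\theta, \sigma)
  = \prod_{i=1}^{n}
    \frac{\dfrac{1}{\sigma}\,\phi\!\left(\dfrac{y_i - f(x_i;\theta)}{\sigma}\right)}
         {1 - \Phi\!\left(\dfrac{Z_{\mathrm{trunc}} - f(x_i;\theta)}{\sigma}\right)}
```

Here $\phi$ and $\Phi$ are the standard normal density and distribution functions. Dropping the denominator turns this into the ordinary least-squares fit to the retained data, which ignores the truncation.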
Fit to Truncated Data, Ignoring the Truncation Effect
Fit Using Truncated Data Model
Fit to Simulated Data
Fit to Truncated Simulated Data
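The comparison on the preceding slides can be reproduced in miniature. The sketch below uses only the Python standard library. Everything in it is an assumption for illustration: the linear model y = a + b·x, the parameter values, the truncation level, and the small hand-written Nelder-Mead optimizer. None of it is the workshop's data or code. It simulates data, discards every recording below the truncation level (so neither y nor x is observed for those recordings), and then fits the retained data twice: once by ordinary least squares and once by maximizing the truncated-normal likelihood.

```python
import math
import random


def norm_sf(z):
    """Standard normal survival function, 1 - Phi(z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def ols(xs, ys):
    """Least-squares intercept and slope, ignoring the truncation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def truncated_nll(params, xs, ys, z_trunc):
    """Negative log-likelihood (up to a constant) of y = a + b*x + N(0, sigma^2)
    when only recordings with y >= z_trunc are observed."""
    a, b, log_sigma = params
    sigma = math.exp(log_sigma)
    total = 0.0
    for x, y in zip(xs, ys):
        mu = a + b * x
        r = (y - mu) / sigma
        # Probability that a recording at this x is observed at all;
        # floored to avoid log(0) if the optimizer tries extreme parameters.
        p_obs = max(norm_sf((z_trunc - mu) / sigma), 1e-300)
        total += log_sigma + 0.5 * r * r + math.log(p_obs)
    return total


def nelder_mead(f, x0, step=0.2, iters=400):
    """Minimal Nelder-Mead simplex minimizer (reflect, expand, inside-contract, shrink)."""
    n = len(x0)
    pts = [list(x0)] + [[v + (step if j == k else 0.0) for j, v in enumerate(x0)]
                        for k in range(n)]
    vals = [f(p) for p in pts]
    for _ in range(iters):
        order = sorted(range(n + 1), key=lambda i: vals[i])
        pts, vals = [pts[i] for i in order], [vals[i] for i in order]
        cen = [sum(p[j] for p in pts[:-1]) / n for j in range(n)]
        refl = [2 * c - w for c, w in zip(cen, pts[-1])]
        fr = f(refl)
        if fr < vals[0]:
            expd = [3 * c - 2 * w for c, w in zip(cen, pts[-1])]
            fe = f(expd)
            pts[-1], vals[-1] = (expd, fe) if fe < fr else (refl, fr)
        elif fr < vals[-2]:
            pts[-1], vals[-1] = refl, fr
        else:
            con = [0.5 * (c + w) for c, w in zip(cen, pts[-1])]
            fc = f(con)
            if fc < vals[-1]:
                pts[-1], vals[-1] = con, fc
            else:  # shrink every vertex toward the best one
                pts = [pts[0]] + [[0.5 * (b + v) for b, v in zip(pts[0], p)] for p in pts[1:]]
                vals = [vals[0]] + [f(p) for p in pts[1:]]
    return min(zip(vals, pts))[1]


random.seed(1)
TRUE_A, TRUE_B, TRUE_SIGMA, Z_TRUNC = 2.0, -1.0, 0.5, 0.0  # hypothetical values
x_all = [random.uniform(0.0, 4.0) for _ in range(3000)]
y_all = [TRUE_A + TRUE_B * x + random.gauss(0.0, TRUE_SIGMA) for x in x_all]
# Truncation: recordings with y < Z_TRUNC are never seen, so neither y nor x is known.
xs = [x for x, y in zip(x_all, y_all) if y >= Z_TRUNC]
ys = [y for y in y_all if y >= Z_TRUNC]

a_ols, b_ols = ols(xs, ys)
sd0 = math.sqrt(sum((y - a_ols - b_ols * x) ** 2 for x, y in zip(xs, ys)) / len(xs))
a_mle, b_mle, log_s = nelder_mead(lambda p: truncated_nll(p, xs, ys, Z_TRUNC),
                                  [a_ols, b_ols, math.log(sd0)])
print(f"ignoring truncation: a={a_ols:.2f}, b={b_ols:.2f}")
print(f"truncated model:     a={a_mle:.2f}, b={b_mle:.2f}, sigma={math.exp(log_s):.2f}")
```

With strong truncation, the least-squares slope comes out flatter than the true slope. At large x, only recordings with positive residuals survive, so the retained points sit above the true mean line. The truncated-likelihood fit compensates by dividing each density by its probability of being recorded.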
Uncertain and Missing Predictor Variables
- Uncertain predictors
  - Magnitude
  - Distance/rupture geometry
  - Site parameters (discrete and continuous)
- Missing predictors
  - Rupture geometry (for smaller events)
Predictor Variable Uncertainty
- General model: $Y = f(X) + \varepsilon$
- We observe $W$, which is imprecisely related to $X$
- Two types of error processes:
  - Error model, $W = f(X) + U$: applies when one wants $X$ but cannot measure it precisely ("classical" measurement error)
  - Regression calibration model, $X = f(W) + U$: one can measure $W$ precisely, but the quantity of interest $X$ is variable; often applies to laboratory studies
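The two error processes behave quite differently in a regression. The simulation below is a minimal sketch with f taken as the identity and all parameter values assumed for illustration. Under classical error (W = X + U), the naive least-squares slope of Y on W is attenuated by var(X)/(var(X) + var(U)). Under the second form with X = W + U, which in the linear case is usually called Berkson error, the naive slope stays unbiased.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


random.seed(0)
N, B_TRUE, S_U = 20000, 2.0, 0.5  # hypothetical sample size, slope, error SD

# Classical error: the true X varies, and we observe W = X + U.
x_c = [random.gauss(0.0, 1.0) for _ in range(N)]
w_c = [x + random.gauss(0.0, S_U) for x in x_c]
y_c = [B_TRUE * x + random.gauss(0.0, 0.3) for x in x_c]

# Berkson-type error: W is known exactly, and the true X = W + U.
w_b = [random.gauss(0.0, 1.0) for _ in range(N)]
x_b = [w + random.gauss(0.0, S_U) for w in w_b]
y_b = [B_TRUE * x + random.gauss(0.0, 0.3) for x in x_b]

slope_classical = ols_slope(w_c, y_c)
slope_berkson = ols_slope(w_b, y_b)
attenuation = 1.0 / (1.0 + S_U ** 2)  # var(X) / (var(X) + var(U)) with var(X) = 1
print(f"classical: {slope_classical:.3f} (theory {B_TRUE * attenuation:.3f})")
print(f"Berkson:   {slope_berkson:.3f} (true {B_TRUE})")
```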
Magnitude Uncertainty (Rhoades, BSSA, 1997)
- Start with a random (mixed) effects model
- The reported magnitude, $\hat{M}_i$, contains error $\delta_i \sim N(0, s_i^2)$
- Revised mixed effects model
- Solution obtained using "standard" approaches, including analytical inversion of the variance matrix
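The following sketch shows why the linear case can be inverted analytically. The notation is assumed here and is not taken from Rhoades (1997): the model is linear in magnitude with coefficient b, the event term has variance τ², the within-event residual has variance σ², and the reported-magnitude error is δ_i ~ N(0, s_i²). Under these assumptions, the n recordings of event i share the covariance block σ²I + cJ, where c = τ² + b²s_i² and J is a matrix of ones. Because every off-diagonal entry is equal, the block has the closed-form inverse (I − c/(σ² + nc)·J)/σ². The code builds one block for hypothetical values, applies the formula, and checks that the product is the identity.

```python
def event_block(n, sigma2, tau2, b, s2):
    """Covariance block sigma^2*I + c*J for the n recordings of one event."""
    c = tau2 + b * b * s2
    return [[sigma2 * (j == k) + c for k in range(n)] for j in range(n)]


def event_block_inverse(n, sigma2, tau2, b, s2):
    """Closed-form inverse (I - c / (sigma^2 + n*c) * J) / sigma^2."""
    c = tau2 + b * b * s2
    g = c / (sigma2 + n * c)
    return [[((j == k) - g) / sigma2 for k in range(n)] for j in range(n)]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


# Hypothetical values: 5 recordings, sigma = 0.5, tau = 0.3, b = 1.2, s_i = 0.2.
n, sigma2, tau2, b, s2 = 5, 0.25, 0.09, 1.2, 0.04
cov = event_block(n, sigma2, tau2, b, s2)
inv = event_block_inverse(n, sigma2, tau2, b, s2)
prod = matmul(cov, inv)
max_err = max(abs(prod[i][j] - (i == j)) for i in range(n) for j in range(n))
print(f"max |C * C^-1 - I| = {max_err:.1e}")
```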
Magnitude Uncertainty for NGA
- Models are likely to be nonlinear in magnitude
- The reported magnitude, $\hat{M}_i$, contains error $\delta_i \sim N(0, s_i^2)$
- Revised mixed effects model
- The variance matrix terms due to the error in the magnitude of event $i$ now vary over the recordings $j$; as a result, the matrix is not analytically invertible
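A first-order expansion of the median model about the reported magnitude shows where the variation over $j$ comes from. This is a sketch under assumed notation: $\eta_i \sim N(0, \tau^2)$ is the event term, $\varepsilon_{ij} \sim N(0, \sigma^2)$ is the within-event residual, and $R_{ij}$ stands for the record-level predictors.

```latex
\begin{aligned}
y_{ij} &= f(M_i, R_{ij}; \theta) + \eta_i + \varepsilon_{ij},
  \qquad \hat{M}_i = M_i + \delta_i,\ \ \delta_i \sim N(0, s_i^2) \\
y_{ij} &\approx f(\hat{M}_i, R_{ij}; \theta) - g_{ij}\,\delta_i + \eta_i + \varepsilon_{ij},
  \qquad g_{ij} = \left.\frac{\partial f}{\partial M}\right|_{\hat{M}_i,\,R_{ij}} \\
\operatorname{Cov}(y_{ij}, y_{ik}) &= \tau^2 + g_{ij}\,g_{ik}\,s_i^2 + \sigma^2\,\mathbf{1}[j = k]
\end{aligned}
```

When $f$ is linear in $M$ with coefficient $b$, $g_{ij} = b$ for every recording, and the block reduces to $\sigma^2 I + (\tau^2 + b^2 s_i^2) J$. Otherwise $g_{ij}$ changes with distance and the other record-level variables, so the magnitude-error contribution differs from entry to entry within the event block.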
Simulation Extrapolation Approach
- Applied in cases where $W = X + U$, with $U \sim N(0, s^2)$
- Simulate a series of data sets with increasingly large measurement error: $W_{b,i}(\lambda) = W_i + \lambda^{1/2} U_{b,i}$, where the $U_{b,i}$ are simulated error terms with zero mean and variance $s^2$
- For each value of $\lambda$, average the model parameters $\Theta$ over many simulations to obtain an average value
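Simulation Extrapolation (continued)
- Define a functional relationship for the averaged parameters as a function of $\lambda$, $\Theta(\lambda)$
- Extrapolate back to $\lambda = -1$, where the total measurement-error variance $(1 + \lambda)s^2$ is zero
[Figure: coefficients versus λ, extrapolated to λ = -1]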
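The SIMEX steps can be sketched for a single regression slope using only the standard library. The following choices are all assumptions, because the slides do not specify them: the true slope (2), the error SD s = 0.5, the λ grid {0, 0.5, 1, 1.5, 2}, 50 simulations per λ, and a quadratic extrapolant. Since each remeasured data set carries total error variance (1 + λ)s², evaluating the fitted curve at λ = -1 estimates the error-free coefficient.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


def quad_fit(ts, vs):
    """Least-squares coefficients (c0, c1, c2) of v = c0 + c1*t + c2*t**2."""
    # Augmented normal equations: A[i][j] = sum t^(i+j), right-hand side sum v*t^i.
    m = [[sum(t ** (i + j) for t in ts) for j in range(3)]
         + [sum(v * t ** i for t, v in zip(ts, vs))] for i in range(3)]
    for col in range(3):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, 3), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for k in range(3):
            if k != col:
                f = m[k][col] / m[col][col]
                m[k] = [u - f * w for u, w in zip(m[k], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


random.seed(2)
N, B_TRUE, S_U = 1000, 2.0, 0.5
x = [random.gauss(0.0, 1.0) for _ in range(N)]
y = [1.0 + B_TRUE * xi + random.gauss(0.0, 0.3) for xi in x]
w = [xi + random.gauss(0.0, S_U) for xi in x]  # observed W = X + U, U ~ N(0, S_U^2)

lambdas = [0.0, 0.5, 1.0, 1.5, 2.0]
n_sims = 50
theta = []  # average slope for each lambda
for lam in lambdas:
    est = []
    for _ in range(n_sims):
        # W_b(lambda) = W + sqrt(lambda) * U_b, with U_b ~ N(0, S_U^2)
        wb = [wi + lam ** 0.5 * random.gauss(0.0, S_U) for wi in w]
        est.append(ols_slope(wb, y))
    theta.append(sum(est) / n_sims)

c0, c1, c2 = quad_fit(lambdas, theta)
naive = theta[0]
simex = c0 - c1 + c2  # quadratic extrapolant evaluated at lambda = -1
print(f"naive slope {naive:.3f}, SIMEX slope {simex:.3f}, true slope {B_TRUE}")
```

A quadratic extrapolant is a common default. For this linear model, however, the averaged slope actually follows b/(1 + (1 + λ)s²). The quadratic therefore removes most of the attenuation, but typically not all of it.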
Example Application of Simulation Extrapolation Approach
Assess the Effect of Magnitude Uncertainty
1. Start with a "true" model
   - Simulate PGA values from the "true" model using the NGA M-R distribution
   - Calculate the mean of the model parameters from the simulated data sets (parametric bootstrap)
2. Obtain the simulated data set whose fitted parameters are closest to the "true" model
3. Using the data set from step 2, increase the sigma in M using the NGA M values
   - Obtain the mean parameters from 500 simulations of uncertain M
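Step 3 can be sketched as follows. This is illustrative only, and several simplifications are assumptions rather than the procedure actually used:
- The "true" model is a hypothetical y = c0 + c1·M + c2·ln(R + 10).
- Magnitudes and distances are drawn uniformly, not from the NGA M-R distribution.
- There is a single residual term, with no separate event term.
- The fit is ordinary least squares rather than a mixed-effects regression.

The essential detail the sketch does capture is that the magnitude error is drawn once per event and shared by every recording of that event.

```python
import math
import random


def solve(a, r):
    """Solve the small linear system a @ c = r by Gauss-Jordan elimination."""
    n = len(r)
    m = [list(row) + [rv] for row, rv in zip(a, r)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for k in range(n):
            if k != col:
                f = m[k][col] / m[col][col]
                m[k] = [u - f * v for u, v in zip(m[k], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(rows, ys):
    """Ordinary least squares via the normal equations (no event terms, for brevity)."""
    p = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(a, b)


random.seed(3)
TRUE = (-1.0, 0.8, -1.2)  # hypothetical "true" coefficients for c0 + c1*M + c2*ln(R + 10)
mags = [random.uniform(5.0, 7.5) for _ in range(40)]  # hypothetical event magnitudes
records = [(i, random.uniform(5.0, 150.0)) for i in range(len(mags)) for _ in range(10)]
ys = [TRUE[0] + TRUE[1] * mags[i] + TRUE[2] * math.log(r + 10.0) + random.gauss(0.0, 0.5)
      for i, r in records]


def mean_coeffs(sigma_m, n_sims=500):
    """Average refitted coefficients when each event magnitude gets N(0, sigma_m^2) error."""
    totals = [0.0, 0.0, 0.0]
    for _ in range(n_sims):
        # One error draw per event, shared by every recording of that event.
        m_obs = [m + random.gauss(0.0, sigma_m) for m in mags]
        rows = [(1.0, m_obs[i], math.log(r + 10.0)) for i, r in records]
        totals = [t + c for t, c in zip(totals, fit(rows, ys))]
    return [t / n_sims for t in totals]


base = mean_coeffs(0.0, n_sims=1)  # the fixed data set with its reported magnitudes
noisy = mean_coeffs(0.3)           # 500 simulations with sigma_M = 0.3
print(f"magnitude coefficient: {base[1]:.3f} (no added error) vs {noisy[1]:.3f} (sigma_M = 0.3)")
```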
Simulated Data
Missing Predictor Variables
- Site classification variables
  - VS30, NEHRP categories, other site categories, depth to VS of 1.5 km/sec
- Rupture geometry variables
  - Directivity variables
  - Hanging wall/footwall determinations
  - Confined to smaller events/distant recordings, where the effect is believed to be minimal?
Reason for Missing Predictors
- Independent of all data (missing completely at random)
- Dependent on the value of the missing predictor (not missing at random)
- Dependent on the values of other predictors (missing at random)
Pattern of Missing Predictors
- Univariate
- Monotone
- Special
- Random
Missing Data Methods
- Complete-case analysis
  - Easily implemented
  - Valid inferences when the missingness of the predictors depends only on the predictor data, not on the response $Y$
  - May lead to the elimination of a lot of useful information
  - Useful starting result
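A small simulation illustrates when complete-case regression remains valid. It assumes a linear model with normal errors and two hypothetical missingness rules. When x goes missing based only on x itself, dropping incomplete cases leaves the least-squares slope unbiased. When x goes missing based on y, the complete-case slope is biased toward zero.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


random.seed(4)
N, B_TRUE = 20000, 1.0
x = [random.gauss(0.0, 1.0) for _ in range(N)]
y = [B_TRUE * xi + random.gauss(0.0, 0.5) for xi in x]

# Rule A (hypothetical): x is missing whenever x > 0, i.e. it depends only on the predictor.
cc_a = [(xi, yi) for xi, yi in zip(x, y) if xi <= 0.0]
# Rule B (hypothetical): x is missing whenever y > 0, i.e. it depends on the response.
cc_b = [(xi, yi) for xi, yi in zip(x, y) if yi <= 0.0]

slope_a = ols_slope(*zip(*cc_a))  # complete-case fit under rule A
slope_b = ols_slope(*zip(*cc_b))  # complete-case fit under rule B
print(f"true {B_TRUE}, rule A {slope_a:.3f}, rule B {slope_b:.3f}")
```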
Missing Data Methods
- Imputation
  - Missing X's are estimated from correlations with other X's, or with other X's and Y's
  - Typically down-weight imputed observations
- Multiple imputation
  - Simulate multiple data sets that incorporate the uncertainty in the estimated missing X's
  - Provides a method for incorporating the effect of imputation uncertainty on the estimation
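The following is a minimal multiple-imputation sketch for a single predictor. All values are hypothetical, and the missingness rule depends on the observed Y. The missing X's are imputed from their regression on Y. Each of the M imputation models is refit to a bootstrap resample of the complete cases, so that the imputations carry parameter uncertainty as well as residual scatter; this bootstrap variant is one common choice among several. The M slope estimates are then combined with Rubin's rules: the total variance equals the mean within-imputation variance plus (1 + 1/M) times the between-imputation variance.

```python
import math
import random


def simple_ols(xs, ys):
    """Intercept, slope, and slope variance for ys = a + b*xs + e."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    s2 = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    return a, b, s2 / sxx


random.seed(5)
N, B_TRUE = 2000, 1.5
x = [random.gauss(0.0, 1.0) for _ in range(N)]
y = [0.5 + B_TRUE * xi + random.gauss(0.0, 1.0) for xi in x]
# Hypothetical rule: x is missing with probability 0.6 when y > 1, else 0.1.
missing = [random.random() < (0.6 if yi > 1.0 else 0.1) for yi in y]
obs = [(xi, yi) for xi, yi, m in zip(x, y, missing) if not m]

M = 20  # number of imputed data sets
slopes, variances = [], []
for _ in range(M):
    # Refit the imputation model (x on y) to a bootstrap resample of the complete cases.
    boot = random.choices(obs, k=len(obs))
    a_i, c_i, _ = simple_ols([yy for _, yy in boot], [xx for xx, _ in boot])
    resid_sd = math.sqrt(sum((xx - a_i - c_i * yy) ** 2 for xx, yy in boot) / (len(boot) - 2))
    x_imp = [a_i + c_i * yi + random.gauss(0.0, resid_sd) if m else xi
             for xi, yi, m in zip(x, y, missing)]
    _, b, var_b = simple_ols(x_imp, y)  # analysis model: y on (partly imputed) x
    slopes.append(b)
    variances.append(var_b)

# Rubin's rules for combining the M analyses.
b_mi = sum(slopes) / M
within = sum(variances) / M
between = sum((s - b_mi) ** 2 for s in slopes) / (M - 1)
total_var = within + (1 + 1 / M) * between
print(f"MI slope {b_mi:.3f} +/- {math.sqrt(total_var):.3f} (true {B_TRUE})")
```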
Missing Data Methods
- Maximum likelihood
  - Needs a model for the joint distribution of Y and X, including the missing X's
  - Random missing patterns will need iterative approaches
- Bayesian simulation methods
  - e.g., the Gibbs sampler
  - Computer intensive (multiple thousands of simulations)
Missing/Uncertain Data
- If missing X's are estimated from an external model (e.g., VS30), the problem becomes an uncertain predictor problem
- Simulation methods appear to be useful for both problems
- Implement these methods at a later stage of model development to obtain the final coefficients and their uncertainty
- Develop an implementation of each developer's final model to quantify the effects of missing/uncertain data and provide parameter uncertainty