Variable Selection for Gaussian Process Models in Computer Experiments


Crystal Linkletter and Derek Bingham, Department of Statistics and Actuarial Science, Simon Fraser University
David Higdon and Nick Hengartner, Statistical Sciences, Discrete Event Simulations, Los Alamos National Laboratory
Kenny Q. Ye, Department of Epidemiology and Population Health, Albert Einstein College of Medicine

Introduction

Computer simulators often require a large number of inputs and are computationally demanding. A main goal of computer experimentation may therefore be screening: identifying which inputs have a significant impact on the process being studied. Gaussian spatial process (GASP) models are commonly used to model computer simulators. These models are flexible, but they make variable selection challenging. We present reference distribution variable selection (RDVS), a new approach to screening for GASP models.

Gaussian Spatial Process Model

To model the response from a computer experiment, we use a Bayesian version of the GASP model originally used by Sacks et al. (1989):

    y(X) = z(X) + epsilon

where
- y(X) is the simulator response, an (n x 1) vector;
- X is the input to the computer code, an (n x p) design matrix;
- epsilon is a white-noise process, independent of z(X).

The Gaussian spatial process z(X) is specified to have mean zero and covariance function

    Cov(z(x_i), z(x_j)) = (1 / lambda_z) * prod_{k=1..p} rho_k^( 4 (x_ik - x_jk)^2 )

Under this parameterization, if rho_k is close to one, the kth input is not active. RDVS is a method for gauging the relative magnitudes of the correlation parameters rho_k.

Results

Simulated example: we used a 54-run space-filling Latin hypercube design with p = 10 factors, with the response generated by a test function that depends only on the first four inputs. A GASP model is fit to the generated response, and the RDVS algorithm correctly identifies the first four factors as active.

[Figure: posterior distributions for the correlation parameters of the 10 factors. The horizontal line marks the 10th percentile of the reference distribution; correlation parameters with posterior medians below this line indicate active factors.]
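The product-correlation form above can be illustrated numerically. The sketch below is not the authors' implementation; it simply assumes inputs scaled to [0, 1] and the rho_k^(4 (x_ik - x_jk)^2) parameterization, and shows why rho_k = 1 makes the kth input inert.

```python
import numpy as np

def gasp_corr(X, rho):
    """Correlation matrix R with R_ij = prod_k rho_k ** (4 * (x_ik - x_jk)**2).

    A rho_k near 1 makes factor k inert: its term in the product is
    ~1 regardless of the distance between runs in coordinate k.
    """
    X = np.asarray(X, dtype=float)
    rho = np.asarray(rho, dtype=float)
    # squared coordinate-wise distances, shape (n, n, p)
    d2 = (X[:, None, :] - X[None, :, :]) ** 2
    # product over the p inputs
    return np.prod(rho[None, None, :] ** (4.0 * d2), axis=2)

rng = np.random.default_rng(0)
X = rng.random((5, 3))                       # 5 runs, 3 inputs on [0, 1]
R_active = gasp_corr(X, [0.2, 0.2, 0.2])     # all three inputs active
R_inert3 = gasp_corr(X, [0.2, 0.2, 1.0])     # third input inert (rho_3 = 1)
```

Setting rho_3 = 1 yields exactly the same correlation matrix as dropping the third column of X, which is the sense in which a correlation parameter near one signals an inactive input.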
Computer Experiment Example

Taylor Cylinder Experiment (Los Alamos National Laboratory). This finite element code simulates the high-velocity impact of a cylinder. In the experiment, copper cylinders (length 5.08 cm, radius 1 cm) are fired into a fixed barrier at a velocity of 177 m/s, and the cylinder length after impact is the outcome. The process is governed by 14 parameters which control the behaviour of the cylinder after impact. Over the limited range that the computer experiment exercises the simulator, the response is expected to be dominated by only a few of the 14 parameters.

A 118-run, 5-level nearly-orthogonal design was used. Exploratory analysis suggests factor 6 is important, but the other significant factors are difficult to identify. RDVS identifies factor 6 and six other factors as having a significant impact on cylinder deformation.

Discussion

RDVS is able to correctly identify when none of the true factors are active. This variable selection technique complements methods in sensitivity analysis and can be used as a precursor to alternative visualization and ANOVA approaches to screening. The method is robust to the specification of the prior distributions: since the inert variable is assigned the same prior as the true factors, the method self-calibrates.

Conclusions and Future Research

RDVS is a new method for variable selection in Bayesian Gaussian spatial process models. The methodology is motivated by asking: what would the posterior distribution of the correlation parameter for an inert factor look like, given the data? The approach is Bayesian and requires only the generation of an inert factor, but the screening has a frequentist flavour, using the distribution of the inert factor's correlation parameter as a reference distribution.

Future research includes:
- using a linear regression model for the mean of the GASP model;
- using RDVS for variable selection in other models.
RDVS Algorithm

To implement RDVS, a factor which is known to be inert is appended to the design matrix X. This provides a benchmark against which the other input factors can be compared.

Algorithm:
1. Augment the design matrix by adding a new design column corresponding to an inert (dummy) factor.
2. Find the posterior median of the correlation parameter corresponding to the dummy factor.
3. Repeat steps 1 and 2 many times to obtain the distribution of the posterior median for an inert factor; use this as a reference distribution.
4. Compare the posterior medians of the correlation parameters of the true factors to the reference distribution.

The percentile of the reference distribution used for comparison reflects the rate of falsely identifying an inert factor as active.

Acknowledgements

This research was initiated while Linkletter, Bingham and Ye were visiting the Statistical Sciences group at Los Alamos National Laboratory. This work was supported by a grant from the Natural Sciences and Engineering Research Council of Canada. Ye's research was supported by NSF DMS-0306306.
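The steps above can be sketched in code. A full implementation would obtain each posterior median by MCMC on the Bayesian GASP model; here `rho_median` is a deliberately crude stand-in (a monotone map of the absolute input-response sample correlation onto (0, 1]) used only so the reference-distribution logic runs end to end. The function names and the stand-in are illustrative assumptions, not the authors' code.

```python
import numpy as np

def rho_median(x_col, y):
    """Stand-in for the posterior median of rho_k: a monotone map of the
    absolute sample correlation between column x_col and response y onto
    (0, 1]. In RDVS proper this comes from MCMC on the GASP model."""
    r = abs(np.corrcoef(x_col, y)[0, 1])
    return float(np.exp(-3.0 * r))   # inert columns (r ~ 0) map near 1

def rdvs(X, y, n_ref=200, alpha=0.10, rng=None):
    """Reference distribution variable selection (sketch).

    Steps 1-3: repeatedly append a random inert column and record its
    rho median, building the reference distribution.
    Step 4: flag factor k as active if its rho median falls below the
    alpha percentile of that reference distribution."""
    rng = np.random.default_rng(rng)
    ref = [rho_median(rng.random(len(y)), y) for _ in range(n_ref)]
    cutoff = np.percentile(ref, 100 * alpha)
    medians = np.array([rho_median(X[:, k], y) for k in range(X.shape[1])])
    return medians < cutoff          # boolean mask of active factors

# Toy setup mirroring the simulated example: 54 runs, 10 factors,
# response driven only by the first four inputs.
rng = np.random.default_rng(1)
X = rng.random((54, 10))
y = X[:, 0] + X[:, 1] + X[:, 2] + X[:, 3] + 0.01 * rng.standard_normal(54)
active = rdvs(X, y, rng=2)
```

The choice of alpha is exactly the trade-off noted above: it is the rate at which an inert factor is falsely declared active.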