1/40 Designing Factorial Experiments with Binary Response
Hovav A. Dror & David M. Steinberg
Tel-Aviv University, Faculty of Exact Sciences, Department of Statistics and Operations Research
International Conference on DOE – Nankai University, July
2/40 Overview
Introduction – Designs for GLMs
Local D-optimal Designs
Robust Designs
Sequential Designs
Conclusions
Robust Experimental Design for multivariate GLM
Technical reports and MATLAB macros available at
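3/40 D-optimal GLM designs
The theory is like that for the linear model, but with one crucial difference.
Fisher’s information matrix changes: $F^T F \rightarrow F^T W F$, where $W$ is a diagonal matrix of GLM weights that depend on the unknown coefficients.
D-optimality: maximize $|F^T W F|$ (hence Local D-optimal designs and Local D-Efficiency)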
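As a minimal sketch of the criterion on this slide, the code below builds $F^T W F$ for a logistic model at an assumed coefficient vector and evaluates its determinant. The helper names `info_matrix` and `det` are hypothetical and are not taken from the talk or its MATLAB macros. The logistic weight $\mu(1-\mu)$ is one choice of GLM weight.

```python
import math

def info_matrix(design, beta):
    """Fisher information F^T W F for a logistic model.

    design: rows of regression functions (first entry 1 for the intercept).
    beta:   assumed ("local") coefficient vector.
    """
    p = len(beta)
    m = [[0.0] * p for _ in range(p)]
    for f in design:
        eta = sum(fj * bj for fj, bj in zip(f, beta))
        mu = 1.0 / (1.0 + math.exp(-eta))
        w = mu * (1.0 - mu)  # logistic GLM weight; depends on beta
        for i in range(p):
            for j in range(p):
                m[i][j] += w * f[i] * f[j]
    return m

def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n = len(a)
    d = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[piv][c]) < 1e-300:
            return 0.0
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            k = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= k * a[c][j]
    return d

# Two candidate two-point designs for one factor, compared at the same guess.
beta_guess = [0.0, 1.0]
narrow = [[1.0, -1.0], [1.0, 1.0]]
wide = [[1.0, -2.5], [1.0, 2.5]]
print(det(info_matrix(narrow, beta_guess)), det(info_matrix(wide, beta_guess)))
```

Because the weights depend on `beta`, the ranking of two designs can change when the coefficient guess changes, which is why the optimum is only local.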
4/40 Introduction – Visualization
5/40 Introduction – Main Objectives
Construct an algorithm to find Local D-optimal Designs
Generalization: from locally optimal designs to robust designs (which take account of the uncertainty in the model parameters)
Further robustness: to different link functions, linear predictors, etc.
Sequential design: use data to estimate the model and improve the design as the experiment runs.
7/40 Local D-optimal designs – Algorithm
Mimics algorithms for linear models.
Main element: a row exchange procedure.
Rows are added or deleted, weighting the regression functions in accord with the mean value.
Timing: 1 second for a 16-point Poisson regression with 5 variables + interactions (accuracy: 2 decimal places)
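The row exchange idea can be illustrated with a simplified sketch. It is not the authors' algorithm or their MATLAB macro: it only swaps one design row at a time for a candidate row whenever the swap raises $\log|F^T W F|$. All function names are hypothetical, and the candidate grid and starting design are left to the caller.

```python
import math

def logistic_weight(eta):
    mu = 1.0 / (1.0 + math.exp(-eta))
    return mu * (1.0 - mu)

def poisson_weight(eta):
    return math.exp(eta)  # log link: the weight equals the mean

def log_det_info(design, beta, weight):
    """log|F^T W F|, or -inf when the matrix is (numerically) singular."""
    p = len(beta)
    a = [[0.0] * p for _ in range(p)]
    for f in design:
        w = weight(sum(fj * bj for fj, bj in zip(f, beta)))
        for i in range(p):
            for j in range(p):
                a[i][j] += w * f[i] * f[j]
    # The matrix is symmetric positive semi-definite, so elimination
    # without pivoting yields non-negative pivots.
    log_det = 0.0
    for c in range(p):
        if a[c][c] <= 1e-12:  # absolute threshold, adequate for a sketch
            return float("-inf")
        log_det += math.log(a[c][c])
        for r in range(c + 1, p):
            k = a[r][c] / a[c][c]
            for j in range(c, p):
                a[r][j] -= k * a[c][j]
    return log_det

def row_exchange(design, candidates, beta, weight, max_passes=50):
    """Swap design rows for candidate rows while the criterion improves."""
    design = [list(row) for row in design]
    best = log_det_info(design, beta, weight)
    for _ in range(max_passes):
        improved = False
        for i in range(len(design)):
            for cand in candidates:
                old = design[i]
                design[i] = list(cand)
                value = log_det_info(design, beta, weight)
                if value > best + 1e-10:
                    best = value
                    improved = True
                else:
                    design[i] = old  # reject the swap
        if not improved:
            break
    return design, best

# One-factor logistic example on a grid of candidate points in [-3, 3].
candidates = [[1.0, i / 10.0] for i in range(-30, 31)]
start = [[1.0, -1.0], [1.0, 1.0]]
design, value = row_exchange(start, candidates, [0.0, 1.0], logistic_weight)
print(sorted(row[1] for row in design), value)
```

Passing `poisson_weight` instead of `logistic_weight` changes the model without touching the exchange loop, which mirrors the slide's point that only the weighting of the regression functions differs from the linear case.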
8/40 Overview
Introduction
Local D-optimal Designs
Robust Designs
–Clustering: Motivating Example
–Clustering vs. Bayesian Designs
–Clustering vs. Compromise Designs
–Linear Predictor and Link function Robustness
–Ink Production Example
Sequential Designs
Conclusions
9/40 Clustering – Motivating Example Proximity of 25 local D-optimal designs for a logistic model with intercept value uncertainty
11/40 CLUSTERING vs. BAYESIAN DESIGNS (1)
Chaloner & Larntz (1989)
Design criterion: maximize the mean (over a prior distribution) of the log determinant of the information matrix
Their optimal Bayesian design:
–Uses 7 support points
–Has a reported value for the criterion
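Written out, the criterion described on this slide takes roughly the following form. The symbols are assumptions of this note rather than the talk's notation: $\xi$ is a design, $F_\xi$ its matrix of regression functions, $W(\beta)$ the GLM weight matrix at coefficients $\beta$, $\pi$ the prior, and $\beta_1,\dots,\beta_N$ a sample of coefficient vectors used to approximate the integral.

```latex
\Phi(\xi)
  \;=\; \int \log\left| F_\xi^{T}\, W(\beta)\, F_\xi \right| \,\pi(\beta)\, d\beta
  \;\approx\; \frac{1}{N}\sum_{k=1}^{N} \log\left| F_\xi^{T}\, W(\beta_k)\, F_\xi \right|
```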
12/40 CLUSTERING vs. BAYESIAN DESIGNS (2)
K-means clustering over 100 local designs
Local designs’ coefficients: low-discrepancy sequence (Niederreiter’s)
Evaluated over 10,000 coefficient vectors
[Figure: average log determinant of the information matrix vs. number of support points, with the Chaloner and Larntz (1989) reported value]
Both designs (almost) meet the sufficient requirements for an optimality proof
13/40 CLUSTERING vs. BAYESIAN DESIGNS (3)
Expect the Bayesian design to be generally better.
But if clustering does not fall far behind, it offers:
–Simplicity of creation
–Considerably lower computational needs
–Extension to multivariate problems that is almost trivial
15/40 Clustering vs. Multivariate Compromise Designs (1)
Woods, Lewis, Eccleston and Russell (Technometrics, May 2006):
–A method for finding exact designs for experiments in which there are several explanatory variables
–Use Simulated Annealing to find a design with the same criterion as Chaloner & Larntz
–They note that evaluating the integral is too computationally intensive for incorporation within a search algorithm, and therefore average over a partial set
16/40 Clustering vs. Multivariate Compromise Designs (2)
Crystallography experiment
–4 variables (rate of agitation during mixing, volume of composition, temperature and evaporation rate)
–These affect the probability that a new product is formed
–First-order logistic model (with no interactions)
–16 (/48) observations
–Parameter space: (demonstrating algorithm’s superiority)
Performance evaluated using median and minimum Local D-Efficiencies relative to 10,000 random parameter vectors
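For reference, the usual definition of local D-efficiency is assumed here; the slide itself does not spell it out. For a design $\xi$, a coefficient vector $\beta$ with $p$ entries, and the local D-optimal design $\xi^{*}_{\beta}$ for that $\beta$:

```latex
\mathrm{eff}(\xi;\beta)
  \;=\; \left( \frac{\left| F_{\xi}^{T}\, W(\beta)\, F_{\xi} \right|}
                    {\left| F_{\xi^{*}_{\beta}}^{T}\, W(\beta)\, F_{\xi^{*}_{\beta}} \right|} \right)^{1/p}
```

Under this definition, the median and minimum on the slide would be taken over the efficiencies at the 10,000 random parameter vectors.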
17/40 Clustering vs. Multivariate Compromise Designs (3)
Table columns: Design | Median Efficiency | Minimum Efficiency
Table rows: Standard 2^4 factorial; Woods’ Compromise design
18/40 Clustering vs. Multivariate Compromise Designs (4)
Clustering procedure (1):
–First, create Local Designs for 100 parameter vectors (Niederreiter sequence): 1,600 points
–Then K-means clustering (K=16)
Timing annotations: 30 seconds; 0.25 seconds
Table columns: Design | Median Efficiency | Minimum Efficiency | Minutes
Table rows: Standard 2^4 factorial; Woods’ Compromise; Clustering (1)
Values shown for Clustering (1): minimum efficiency [0.06,0.12], median efficiency [0.38,0.42]
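The clustering step can be sketched as follows: pool the support points of all local designs and let the K-means centroids serve as the robust design. This is a plain Lloyd's-algorithm illustration with hypothetical names (`kmeans`, `cluster_design`). The deterministic farthest-point initialisation is an assumption made to keep the example reproducible, not something stated in the talk.

```python
def kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm; returns k centroids as lists of floats."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Deterministic farthest-point initialisation (assumption of this sketch).
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda q: min(dist2(q, c) for c in centers))
        centers.append(list(far))

    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centre.
        groups = [[] for _ in range(k)]
        for q in points:
            nearest = min(range(k), key=lambda j: dist2(q, centers[j]))
            groups[nearest].append(q)
        # Update step: move each centre to the mean of its group.
        new_centers = []
        for j, g in enumerate(groups):
            if g:
                dim = len(g[0])
                new_centers.append(
                    [sum(q[d] for q in g) / len(g) for d in range(dim)]
                )
            else:
                new_centers.append(centers[j])  # keep an empty cluster's centre
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def cluster_design(local_designs, k):
    """Pool the points of all local designs and cluster them into k points."""
    pooled = [point for design in local_designs for point in design]
    return kmeans(pooled, k)

# Toy illustration: three two-point local designs in two factors.
local_designs = [
    [(-1.0, -1.0), (1.0, 1.0)],
    [(-1.1, -0.9), (1.1, 0.9)],
    [(-0.9, -1.1), (0.9, 1.1)],
]
print(cluster_design(local_designs, 2))
```

In the setting of the slide, `local_designs` would hold 100 designs of 16 points each (1,600 pooled points) and `k` would be 16.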
19/40 Clustering vs. Multivariate Compromise Designs (5)
Clustering procedure (2):
–Over N clustering repetitions, choose the clustering with the highest average log determinant of the information matrix
Table columns: Design | Median Efficiency | Minimum Efficiency | Minutes
Table rows: Standard 2^4 factorial; Woods’ Compromise; Clustering (1); Clustering (2)
Values shown: [0.06,0.13]; 0.42; [0.416,0.430]
20/40 Clustering vs. Multivariate Compromise Designs (6)
Fast procedure (20 seconds)
Examine the effect of the number of support points
[Figure: approximate efficiency (median and minimum) vs. number of support points]
21/40 Clustering vs. Multivariate Compromise Designs (7)
Crystallography experiment - summary
Table columns: Design | Median Efficiency | Minimum Efficiency | Minutes
Table rows: Standard 2^4 factorial; Woods’ Compromise; Clustering (1); Clustering (2); Clustering (3)
Values shown: [0.141, ]; [0.415, 0.432]
22/40 Clustering vs. Multivariate Compromise Designs (6)
Advantageous byproduct of clustering (20 seconds):
[Figure: approximate efficiency (median and minimum) vs. number of support points]
24/40 Robustness for Linear Predictors and Link functions (again from Woods et al.)
2 variables
2 linear predictors: with / without interactions
2 link functions: Probit / complementary log-log (CLL)
Given (known) coefficient values
26/40 Ink Production Example (1)
A Poisson model
5 variables
Normally distributed uncertainty in the coefficient values
Uncertainty about interaction effects
Centroid design reasonably efficient
27/40 Ink Production Example (2)
5 tubes, each with a different chemical
Each tube: chosen concentration (fixed volume)
Ink quality classification: # of imperfect marks (on a standard printed test page)
Low concentrations: low quality, unusable
High concentrations: expensive
Model building based on experts’ opinions
28/40 Ink Production Example (3) Model building based on experts’ opinions
29/40 Ink Production Example (4) Full Factorial D-Efficiency:
30/40 Ink Production Example (5) Cluster Design D-Efficiency:
31/40 Ink Production Example (6) Centroid Design D-Efficiency:
32/40 Ink Production Example (6) Centroid Design D-Efficiency Cluster Design D-Efficiency
33/40 Ink Production Example (7) Efficiency Equivalent Sample Size
35/40 Sequential Designs
Good design requires knowledge of the coefficients.
Use the data thus far to assess the model and the coefficients.
Augment the design accordingly.
A Bayesian framework is natural.
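One way to picture the loop described on this slide is the hedged sketch below, for a one-factor logistic model. It is not the authors' procedure. Coefficient uncertainty is represented by weights on a small grid of (b0, b1) values, each binary response updates those weights by Bayes' rule, and the next point is the candidate that maximises the weighted average of $\log|F^T W F|$ for the augmented design. All names, the grid prior, and the floor used for singular matrices are assumptions of this example.

```python
import math

def mu(x, b0, b1):
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def update(grid, weights, x, y):
    """Bayes update of the grid weights after observing y (0 or 1) at x."""
    post = []
    for (b0, b1), w in zip(grid, weights):
        m = mu(x, b0, b1)
        post.append(w * (m if y == 1 else 1.0 - m))
    total = sum(post)
    return [w / total for w in post]

def log_det(xs, b0, b1):
    """log|F^T W F| for the two-parameter logistic model at points xs."""
    s0 = s1 = s2 = 0.0
    for x in xs:
        m = mu(x, b0, b1)
        w = m * (1.0 - m)
        s0 += w
        s1 += w * x
        s2 += w * x * x
    d = s0 * s2 - s1 * s1
    # Arbitrary floor for (near-)singular matrices: an assumption of this sketch.
    return math.log(d) if d > 1e-12 else -30.0

def next_point(xs, grid, weights, candidates):
    """Candidate maximising the posterior-averaged log determinant."""
    def score(c):
        return sum(
            w * log_det(xs + [c], b0, b1)
            for (b0, b1), w in zip(grid, weights)
        )
    return max(candidates, key=score)

# Illustration: uniform grid prior, two initial runs, one observed response.
grid = [(b0, b1) for b0 in (-1.0, 0.0, 1.0) for b1 in (0.5, 1.0, 2.0)]
weights = [1.0 / len(grid)] * len(grid)
xs = [-1.0, 1.0]
weights = update(grid, weights, 1.0, 1)  # a success was observed at x = 1
candidates = [i / 2.0 for i in range(-6, 7)]
print(next_point(xs, grid, weights, candidates))
```

Choosing several points before the next update, rather than one, would give a group-sequential variant of the same loop.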
36/40 Sequential Designs
Current methods:
–Bruceton (Dixon and Mood 1948)
–Langlie (1965)
–Neyer (1994)
–Wang, Smith & Ye (2006)
37/40 Sequential Designs
Current methods are limited to:
–One-factor experiments.
–Fully sequential experiments.
Our method can be applied with many factors and in both fully sequential and group-sequential settings.
38/40 Efficiency Comparison
Efficiency: One-stage ROBUST vs. SEQUENTIAL
[Figure: a median and a quantile of efficiency are annotated for each design; values shown include 0.30]
40/40 Summary & Conclusions
Local D-optimal designs for GLMs can be found easily
Clustering a database of local D-optimal designs creates a robust design
Clustering is robust to many types of uncertainty:
–parameter space, linear predictors, link functions, …
Simple procedure, minimal computational resources
Speed allows exploration of various designs and investigation of different numbers of support points
Outperforms more sophisticated and complex design optimization methods
Efficient sequential designs by combining these ideas with a Bayesian updating approach.