Space-Filling DOEs

Presentation transcript:

Space-Filling DOEs
Designs of experiments (DOEs) for noisy data tend to place points on the boundary of the domain. When the error in the surrogate is instead due to an unknown functional form, space-filling designs are more popular. These designs use values of the variables inside their ranges instead of only at the boundaries. Latin hypercube designs use as many levels for each variable as there are points. The term "space-filling" is appropriate only for low-dimensional spaces: in a 10-dimensional space, 2^10 = 1024 points are needed just to have one point per orthant.

Monte Carlo sampling
A regular, grid-like DOE runs the risk of a deceptively accurate fit, so randomness is appealing. Given a region in design space, we can assign a uniform distribution to the region and sample points from it to generate a DOE. It is likely, though, that some regions will be poorly sampled. In 5-dimensional space with 32 sample points, what is the chance that all 32 orthants will be occupied? It is (31/32)(30/32)…(1/32) = 32!/32^32 ≈ 1.8e-13.
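The orthant probability quoted above can be checked numerically. The following is a minimal sketch, not part of the original slides; the function names `prob_all_orthants_occupied` and `simulate` are hypothetical, and the simulation is only practical for small m because the probability drops so quickly.

```python
import math
import random

def prob_all_orthants_occupied(m):
    """Exact chance that 2**m uniform random points occupy all 2**m orthants."""
    n = 2 ** m  # number of orthants, and also the number of points
    # Each point lands in any orthant with probability 1/n, so all n points
    # must fall in distinct orthants: n! favourable outcomes out of n**n.
    return math.factorial(n) / n ** n

def simulate(m, trials, seed=0):
    """Monte Carlo estimate of the same probability (assumed helper, small m only)."""
    rng = random.Random(seed)
    n = 2 ** m
    hits = 0
    for _ in range(trials):
        # An orthant is identified by which half of [0, 1] each coordinate falls in.
        orthants = {tuple(rng.random() < 0.5 for _ in range(m)) for _ in range(n)}
        hits += len(orthants) == n
    return hits / trials

print(f"m = 5: exact probability {prob_all_orthants_occupied(5):.1e}")
print(f"m = 2: exact {prob_all_orthants_occupied(2):.4f}, "
      f"simulated {simulate(2, 20000):.4f}")
```

For m = 5 the exact value agrees with the 1.8e-13 quoted on the slide; for m = 2 the exact value is 4!/4^4 = 0.09375, which the simulation should approximate.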

Example of MC sampling
With 20 points there is evidence of both clumping and holes. The histograms of x1 (left) and x2 (above) are not very uniform either.

Latin Hypercube sampling
The range of each variable is divided into n_y equal-probability intervals, with one point placed in each interval.

Latin Hypercube definition matrix
For n points with m variables, the design is defined by an n by m matrix (one row per point, one column per variable) in which each column is a permutation of 1,…,n. Points are better distributed for each variable, but the design can still have holes in m-dimensional space.
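The construction can be sketched in a few lines. This is an illustrative sketch assuming a unit hypercube [0, 1]^m and a uniform distribution for every variable; the function name `latin_hypercube` and its arguments are hypothetical, not taken from the slides.

```python
import random

def latin_hypercube(n, m, rng=None):
    """Return n points in [0, 1]^m forming a Latin hypercube design."""
    rng = rng or random.Random()
    # Definition matrix: every variable gets its own permutation of 1..n.
    columns = []
    for _ in range(m):
        perm = list(range(1, n + 1))
        rng.shuffle(perm)
        columns.append(perm)
    points = []
    for i in range(n):
        point = []
        for j in range(m):
            level = columns[j][i]  # interval index (1..n) of point i, variable j
            # Place the point at a random position inside its interval.
            point.append((level - 1 + rng.random()) / n)
        points.append(point)
    return points

design = latin_hypercube(10, 2, random.Random(1))
for x1, x2 in design:
    print(f"{x1:.3f}  {x2:.3f}")
```

Projecting the design onto any single variable puts exactly one point in each of the n intervals, which is the per-variable uniformity described above; nothing in the construction prevents holes in the full m-dimensional space.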

Improved LHS
Since some LHS designs are better than others, it is possible to try many permutations and keep the best. What criterion should be used for the choice? One popular criterion is the minimum distance between points (to be maximized). Another is the correlation between variables (to be minimized). Matlab's lhsdesign uses 5 iterations by default to look for the "best" design. The blue circles were obtained with the minimum-distance criterion; the correlation coefficient is [value missing in transcript]. The red crosses were obtained with the correlation criterion; the coefficient is [value missing in transcript].
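One simple way to apply either criterion is random restart: generate several random LHS designs and keep the one with the best score. The sketch below assumes that approach; it is not a reproduction of what Matlab's lhsdesign does internally, and the names `random_lhs`, `min_distance`, `max_abs_correlation` and `best_lhs` are hypothetical.

```python
import math
import random

def random_lhs(n, m, rng):
    """Random LHS in [0, 1]^m: one point per equal-probability interval."""
    cols = [rng.sample(range(n), n) for _ in range(m)]  # one permutation per variable
    return [[(cols[j][i] + rng.random()) / n for j in range(m)] for i in range(n)]

def min_distance(points):
    """Smallest Euclidean distance between any two points (to be maximized)."""
    return min(math.dist(p, q) for i, p in enumerate(points) for q in points[:i])

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def max_abs_correlation(points):
    """Largest absolute correlation between two variables (to be minimized)."""
    m = len(points[0])
    cols = [[p[j] for p in points] for j in range(m)]
    return max(abs(pearson(cols[a], cols[b])) for a in range(m) for b in range(a))

def best_lhs(n, m, iterations, criterion, seed=0):
    """Keep the best of `iterations` random designs under the chosen criterion."""
    rng = random.Random(seed)
    if criterion == "maximin":
        score = min_distance
    else:
        # Negate so that a smaller correlation gives a larger score.
        score = lambda pts: -max_abs_correlation(pts)
    return max((random_lhs(n, m, rng) for _ in range(iterations)), key=score)

for criterion in ("maximin", "correlation"):
    design = best_lhs(10, 2, 5, criterion)
    print(criterion, round(min_distance(design), 3),
          round(max_abs_correlation(design), 3))
```

Because both calls use the same seed, they pick from the same five candidate designs, so the printed numbers show directly how the two criteria trade minimum distance against correlation.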

More iterations
With 5,000 iterations both sets of designs improve. The blue circles, maximizing the minimum distance, still have a correlation coefficient of [value missing in transcript], compared to [value missing in transcript] for the red crosses. With more iterations, maximizing the minimum distance also does better at reducing the size of the holes. Note the large holes for the crosses around (0.45, 0.75) and around the two left corners.

Reducing randomness further
We can reduce randomness further by putting each point at the center of its box. Typical results are shown in the figure. With 10 points, all coordinates will be at 0.05, 0.15, 0.25, and so on.
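A centered design needs only the permutations; the position inside each interval is fixed at its midpoint. A minimal sketch on the unit hypercube, with the hypothetical function name `centered_lhs`:

```python
import random

def centered_lhs(n, m, seed=None):
    """LHS in [0, 1]^m with every point at the center of its interval."""
    rng = random.Random(seed)
    # One permutation of the interval indices 0..n-1 per variable.
    cols = [rng.sample(range(n), n) for _ in range(m)]
    # Interval k covers [k/n, (k+1)/n); its midpoint is (k + 0.5)/n.
    return [[(cols[j][i] + 0.5) / n for j in range(m)] for i in range(n)]

design = centered_lhs(10, 2, seed=0)
print(sorted(round(p[0], 2) for p in design))
```

With 10 points the sorted coordinates of each variable are the interval midpoints 0.05, 0.15, ..., 0.95, so the only remaining randomness is in how the permutations pair them up.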

Empty space
In higher dimensions, the danger of large holes is greater. The figure is taken from a paper by Goel et al. (details in notes). It compares an LHS design (right) with a D-optimal design (optimal for noisy data). Instead of maximizing the minimum distance, it seems that it would be better to minimize the volume of the largest void. Why don't we do that?
Figure 2. Illustration of the largest spherical empty space inside the three-dimensional design space (20 points): (a) D-optimal design and (b) LHS design.
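For intuition about what measuring the largest void involves, here is a crude sampling-based sketch. It is an assumption of this note, not the method used by Goel et al.: it draws random candidate centers in the unit cube and keeps the largest sphere that stays inside the domain and contains no design point. The name `largest_void_radius` and its parameters are hypothetical.

```python
import math
import random

def largest_void_radius(points, samples=20000, seed=0):
    """Crude Monte Carlo estimate of the largest empty sphere in [0, 1]^m."""
    rng = random.Random(seed)
    m = len(points[0])
    best = 0.0
    for _ in range(samples):
        center = [rng.random() for _ in range(m)]
        # The sphere must stay inside the domain ...
        to_boundary = min(min(x, 1.0 - x) for x in center)
        # ... and must not contain any design point.
        to_nearest = min(math.dist(center, p) for p in points)
        best = max(best, min(to_boundary, to_nearest))
    return best

rng = random.Random(1)
design = [[rng.random() for _ in range(3)] for _ in range(20)]  # 20 random points in 3-D
print(round(largest_void_radius(design), 3))
```

Each evaluation costs samples × points distance computations and returns only a lower bound on the true radius, whereas the minimum distance between points is computed exactly from the pairwise distances alone.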

Mixed designs
D-optimal designs may leave much of the interior empty. LHS designs may leave out the boundary and lead to large extrapolation errors. It may be desirable to combine the two. In low-dimensional spaces you can add the vertices to an LHS design. In higher-dimensional spaces you can generate a larger LHS design and choose a D-optimal subset.
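The low-dimensional option, adding the vertices to an LHS design, can be sketched as follows on the unit hypercube. The function name `lhs_with_vertices` is hypothetical, and the D-optimal subset selection for higher dimensions is not attempted here.

```python
import itertools
import random

def lhs_with_vertices(n, m, seed=None):
    """n random LHS points in [0, 1]^m followed by the 2**m vertices of the box."""
    rng = random.Random(seed)
    cols = [rng.sample(range(n), n) for _ in range(m)]  # one permutation per variable
    lhs = [[(cols[j][i] + rng.random()) / n for j in range(m)] for i in range(n)]
    # Every combination of lower and upper bounds is a vertex of the domain.
    vertices = [[float(v) for v in corner]
                for corner in itertools.product((0, 1), repeat=m)]
    return lhs + vertices

design = lhs_with_vertices(10, 2, seed=0)
print(len(design))  # 10 LHS points plus 4 vertices
```

The number of vertices is 2^m, which is why this is practical only in low-dimensional spaces: for m = 10 it would add 1024 points.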

Problems
1. Write a routine to generate LHS designs and iterate using the two criteria (minimum distance and correlation), then compare how well you do against lhsdesign for 10 points in 2 dimensions.
2. Compare the maximum minimum distance obtained with 1,000 iterations of lhsdesign when you generate (n+1)(n+2) points in n dimensions (a typical number used to fit a quadratic polynomial), for n = 2, 4, 6.