Minimum Energy Designs – from Nanostructure Synthesis to Sequential Optimization. C. F. Jeff Wu+ (joint with Roshan Joseph+ & Tirthankar Dasgupta*). +Georgia Institute of Technology; *Harvard University.

What are Nanostructures? Functional structures designed at the atomic or molecular scale, with at least one characteristic dimension measured in nanometers (1 nm = 10^-9 meter). They exhibit novel and significantly improved physical, chemical and biological properties, phenomena and processes. They are building blocks for nano-devices, and are likely to impact many fields, ranging from electronics, photonics and optoelectronics to life sciences and healthcare.

Statistical modeling and analysis for robust synthesis of nanostructures. Dasgupta, Ma, Joseph, Wang and Wu (2008), J. Amer. Stat. Assoc. Robust conditions for the synthesis of Cadmium Selenide (CdSe) nanostructures are derived. A new sequential algorithm for fitting multinomial logit models is developed. Internal noise factors are considered.

Fitted quadratic response surfaces & optimal conditions

The need for more efficient experimentation. A 9x5 full factorial experiment was too expensive and time-consuming, and the quadratic response surface did not capture nanowire growth satisfactorily (generalized R^2 was 50% for the CdSe nanowire sub-model).

What makes exploration of the optimum difficult? Complete disappearance of morphology in certain regions, leading to large, disconnected, non-convex yield regions. Multiple optima. Expensive and time-consuming experimentation: 36 hours for each run, and a gold catalyst required.

“Actual” contour plot of CdSe nanowire yield, obtained by averaging yields over different substrates. A large no-yield region (deep green). Small no-yield regions embedded within yield regions. Scattered regions of highest yield.

How many trials are needed to hit the point of maximum yield? (Contour plot over temperature and pressure.)

A 5x9 full-factorial experiment: 17 out of 45 trials wasted (no morphology)! (Plot of yield = f(temperature, pressure).)

Why are traditional methods inappropriate? A sequential approach is needed to keep the run size to a minimum. Fractional factorials / orthogonal arrays: the number of runs grows large as the number of levels increases; several no-morphology scenarios are possible; they do not facilitate sequential experimentation. Response surface methods: the response surface is complex; the response may be categorical (binary in the extreme case); there is no clever search algorithm.

The Objective: to find a design strategy that is model-independent, can “carve out” regions of no-morphology quickly, allows for exploration of complex response surfaces, and facilitates sequential experimentation.

Pros and cons of space-filling designs. LHDs (McKay et al. 1979) and uniform designs (Fang 2002) are primarily used for computer experiments. They can explore complex surfaces with a small number of runs and are model-free. However, they are not designed for sequential experimentation and have no provision to carve out regions of no-morphology quickly. (A minimal LHD construction is sketched below.)
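For concreteness (not on the slide), a Latin hypercube design can be generated in a few lines. The sketch below is a minimal NumPy implementation of the standard construction, one sample per stratum, independently permuted across factors; the function and parameter names are mine.

```python
import numpy as np

def latin_hypercube(n, dim, seed=0):
    """n-run Latin hypercube design on [0, 1]^dim: each factor's range is
    cut into n equal strata, each stratum is sampled once, and the strata
    are permuted independently across factors (McKay et al. 1979)."""
    rng = np.random.default_rng(seed)
    # one uniform draw inside each of the n strata, for every factor
    samples = (np.arange(n) + rng.uniform(size=(dim, n))) / n
    for row in samples:
        rng.shuffle(row)          # decouple the factors
    return samples.T              # shape (n, dim): one row per design point

design = latin_hypercube(n=9, dim=2)   # e.g., 9 runs over temperature x pressure
```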

Sequential Minimum Energy Designs (SMED). Physical connection: treat design points as positively charged particles, with charge inversely proportional to yield, e.g., q = 1 - yield. The potential energy between two particles with charges q1 and q2 at distance d is E = K q1 q2 / d. (Illustration on the temperature-pressure plane: a point with yield Y = 0 carries charge q2 = 1.0; a point with yield Y = 40% carries charge q1 = 0.6.)

What position will a newly introduced particle occupy? The position at which the total potential energy is minimized! (Illustration: the new particle settles away from the existing charges q1 = 0.6 and q2 = 1.0.)

Key idea. Pick a point x. Conduct an experiment at x and observe the yield p(x). Assign a charge q(x) inversely proportional to p(x). Use the observed yields to update your knowledge about the yields at various points in the design space. Pick the next point as the one that minimizes the total potential energy in the design space.

The next design point

How the algorithm works

Inverse distance weighting as interpolator. This is not yet an algorithm: q(x) needs to be "predicted". Use inverse distance weighting to assign charges to each candidate (yellow) point based on the yields observed at the sampled (red) points: $\hat{p}(x) = \sum_i p(x_i)\, d(x, x_i)^{-k} \big/ \sum_i d(x, x_i)^{-k}$ (for some power $k$). The yellow point that minimizes the potential energy with the four red points is the next choice.

The SMED algorithm
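The original slide presents the algorithm as a flowchart. Below is a minimal Python sketch of one plausible implementation, combining the pieces described above: IDW prediction of yields, charges decreasing in yield, and greedy minimization of the total potential energy over a candidate grid. The specific charge function q(x) = max(0, 1 - a*p(x))^gamma is an assumption: it reduces to the slide's q = 1 - yield when a = gamma = 1 and vanishes at the global optimum when a = 1/p_g. All names are mine.

```python
import numpy as np

def idw_yield(x, X_obs, p_obs, k=2.0):
    """Predict the yield at x by inverse distance weighting
    of the yields observed at the sampled points X_obs."""
    d = np.linalg.norm(X_obs - x, axis=1)
    if np.any(d == 0):
        return p_obs[np.argmin(d)]        # x was already sampled
    w = d**(-k)
    return np.sum(w * p_obs) / np.sum(w)

def charge(p, a=1.0, gamma=1.0):
    """Assumed charge function: inversely related to yield,
    zero at the global optimum when a = 1/p_g."""
    return np.maximum(0.0, 1.0 - a * p)**gamma

def next_point(candidates, X_obs, p_obs, a=1.0, gamma=1.0):
    """Greedy SMED step: return the candidate that minimizes the total
    potential energy added by the new particle (the constant K is 1)."""
    q_obs = charge(p_obs, a, gamma)
    best_x, best_E = None, np.inf
    for x in candidates:
        d = np.linalg.norm(X_obs - x, axis=1)
        if np.any(d == 0):
            continue                      # skip points already sampled
        q_new = charge(idw_yield(x, X_obs, p_obs), a, gamma)
        E = np.sum(q_new * q_obs / d)     # pairwise energies q_i * q_new / d_i
        if E < best_E:
            best_x, best_E = x, E
    return best_x
```

Each iteration runs the experiment at next_point(...), appends the observed yield, and repeats; points with low observed yield acquire large charges and repel the sequence, which is how no-morphology regions are carved out.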

Choice of a. The charge function involves the tuning constants a and γ. Lemma 1: For a = 1/p_g, if x_n = x_g for some n = n_0, then x_n = x_g for all n ≥ n_0. Once it reaches x_g, SMED will stick to the global optimum (the total energy is no longer increased by the new points). It is undesirable to choose a < 1/p_g; see Theorem 2 later.

Choice of tuning constants. In practice, p_g will not be known, so a must be estimated iteratively. First, let's examine the performance for deterministic yield functions with fixed a (a = 1/p_g) and γ.

Performance with known a

Performance with known a (with different starting points and γ = 1)

Convergence of SMED

Proof (continued). For any such point, the convergence of the design sequence forces a contradiction. □

Divergence of SMED with wrong a. Theorem 2: Under the same assumptions, if a < 1/p_g, then the design sequence is a dense subset of the design region. The proof is based on similar ideas. Implications: the SMED sequence will visit every part of the design region, an erratic behavior like the Peano curve. The proofs reveal how a and γ work together to move the sequence toward the optima.

Accelerated SMED. For a convergent subsequence, the inter-point distances d → 0, so the corresponding charges q must also go to 0; this explains why a = 1/p_g. By flipping this argument, we can move the SMED sequence quickly out of a region with low q values (i.e., out of a peak already identified) by redefining the q values for that subsequence to a much higher value, forcing SMED to leave the region quickly (see the sketch below).
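The slide does not give the exact redefinition rule, so the following is only a hedged sketch of the flipped argument: once recent points have clustered (a peak has been identified), their charges are inflated so that the next energy minimization is pushed elsewhere. The clustering test, window size, and boost factor are all assumptions.

```python
import numpy as np

def accelerate_charges(X_obs, q_obs, window=3, tol=1e-2, boost=10.0):
    """If the last `window` points have collapsed into a ball of radius
    `tol`, multiply their charges by `boost` so that future points are
    repelled from the peak that has already been located."""
    recent = X_obs[-window:]
    radius = np.max(np.linalg.norm(recent - recent.mean(axis=0), axis=1))
    if radius < tol:
        q_obs = q_obs.copy()
        q_obs[-window:] *= boost
    return q_obs
```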

Performance comparison: SMED vs. accelerated SMED (side-by-side plots).

Criteria for estimator of a

Iterative estimation of a. Fit a logistic model to the observed yields; the asymptotic value of the fitted logistic curve serves as an estimate of p_g. Use its reciprocal as the running estimate of a (a sketch follows).
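The slide's exact logistic parameterization and fitting data are not recoverable from the transcript; the sketch below assumes a generic three-parameter logistic fitted to the running best yields, with the fitted asymptote L standing in for p_g and â = 1/L. The illustrative data are made up.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(n, L, k, n0):
    """Three-parameter logistic curve; L is the asymptote."""
    return L / (1.0 + np.exp(-k * (n - n0)))

# illustrative data: best yield observed up to each run
runs   = np.arange(1, 11)
p_best = np.array([0.05, 0.12, 0.22, 0.30, 0.38, 0.44, 0.47, 0.49, 0.50, 0.51])

(L_hat, k_hat, n0_hat), _ = curve_fit(logistic, runs, p_best, p0=[0.6, 0.5, 5.0])
a_hat = 1.0 / L_hat     # current estimate of a = 1/p_g
```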

Some performance measures for n_0-run designs

Performance evaluation with nanowire yield data

Modified Branin function. The Branin function, a standard test function in global optimization, $f(x_1, x_2) = \left(x_2 - \frac{5.1}{4\pi^2}x_1^2 + \frac{5}{\pi}x_1 - 6\right)^2 + 10\left(1 - \frac{1}{8\pi}\right)\cos x_1 + 10$, has three global minima. To create a large nonconvex and disconnected no-yield region, a modified Branin function is used (the base function is sketched below).
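The standard Branin function is well documented; a direct implementation follows. The deck's modification, which carves a large nonconvex no-yield region out of the response, is not recoverable from the transcript, so only the base function is sketched.

```python
import numpy as np

def branin(x1, x2):
    """Standard Branin function on [-5, 10] x [0, 15]; its three global
    minima are at (-pi, 12.275), (pi, 2.275) and (9.42478, 2.475)."""
    return ((x2 - 5.1 / (4 * np.pi**2) * x1**2 + 5 / np.pi * x1 - 6)**2
            + 10 * (1 - 1 / (8 * np.pi)) * np.cos(x1) + 10)
```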

Performance with modified Branin function

Performance with modified Branin function (contd.)

Random functions. In actual practice the yield function is random: we observe a noisy realization of p(x) rather than p(x) itself.

Performance of the usual algorithm with random functions. Results of 100 simulations, starting point = (0,0). Concern: as r decreases, the number of cases in which the global optimum is identified drops.

Improved SMED for random response. Instead of an interpolating function, use a smoothing function to predict yields (and charges) at unobserved points, and update the charges of the already-selected points as well, using the smoothing function. Local polynomial smoothing is used, with two parameters: n_T (threshold number of iterations after which smoothing starts) and λ (smoothing constant; small λ means local fitting). A sketch of such a smoother follows.
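The deck does not show the smoother itself; the sketch below is a generic local linear (degree-1 local polynomial) smoother with a Gaussian kernel, where λ plays the role of the smoothing constant (small λ gives a more local fit). The names and the kernel choice are assumptions.

```python
import numpy as np

def local_linear(x0, X, y, lam=0.1):
    """Local linear smoother: weighted least squares fit of a plane
    around x0, with Gaussian kernel weights of bandwidth lam.
    The fitted value at x0 is the intercept of the local fit."""
    w = np.exp(-np.sum((X - x0)**2, axis=1) / (2.0 * lam**2))
    A = np.hstack([np.ones((len(X), 1)), X - x0])   # local design matrix
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(sw[:, None] * A, sw * y, rcond=None)
    return beta[0]
```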

Improved performance with smoothing algorithm, r = 10

Summary. A new sequential space-filling design, SMED, is proposed. SMED is model-independent, can quickly “carve out” no-morphology regions, and allows for exploration of complex surfaces. It originates from the laws of electrostatics and has some desirable convergence properties. A modified algorithm handles random functions. Performance was studied using nanowire data and the modified Branin (2-dimensional) and Levy-Montalvo (4-dimensional) functions.

Predicting the future. (Cartoon: the statistician insists "Use my SMED!" while the nano researcher resists. Image courtesy: www.cartoonstock.com)

Thank you

How many trials? Let's try one-factor-at-a-time! It could not find the optimum, almost 50% of the trials were wasted (no yield), and too few data remained for statistical modeling. (Plot over temperature and pressure.)

Sequential experimentation strategies for global optimization. SDO, a grid-search algorithm by Cox and John (1997): initial space-filling design; prediction using Gaussian process modeling; lower bounds on predicted values used for sequential selection of evaluation points. Jones, Schonlau and Welch (1998): similar to SDO, but uses the Expected Improvement (EI) criterion, which balances the need to exploit the approximating surface with the need to improve the approximation (the criterion is recalled below).
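For reference (not on the slide), the EI criterion at a candidate x, given a Gaussian process posterior with mean μ(x) and standard deviation σ(x) and current best observed value f_min for a minimization problem, is

```latex
\mathrm{EI}(x) = \mathbb{E}\big[\max(f_{\min} - Y(x),\, 0)\big]
             = \big(f_{\min} - \mu(x)\big)\,\Phi(z) + \sigma(x)\,\phi(z),
\qquad z = \frac{f_{\min} - \mu(x)}{\sigma(x)},
```

where Φ and φ are the standard normal CDF and density; the first term rewards exploiting a low predicted mean, the second rewards sampling where the approximation is uncertain.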

Why they are not appropriate. Most of them handle multiple optima well, but they do not shrink the experimental region fast. Algorithms that reduce the design space (Henkenjohann et al. 2005) assume connected and convex failure regions. The initial design may contain several points of no-morphology. The current scenario focuses more on quickly shrinking the design space.

Performance in higher dimensions (Levy-Montalvo function)