1 C. F. Jeff Wu School of Industrial and Systems Engineering Georgia Institute of Technology Statistical design and modeling of experiments with high-tech.

Presentation transcript:

1 C. F. Jeff Wu, School of Industrial and Systems Engineering, Georgia Institute of Technology. Statistical design and modeling of experiments with high-tech applications. A statistical trilogy: data collection, analysis, decision making. Examples in high-tech applications: nanotechnology, cell biology, complex system simulations.

2 A Statistical Trilogy. I. Data collection. II. Data modeling (incl. inference). III. Optimization and decision making.

3 A Statistical Trilogy. I. Data collection: experimental design, sample surveys. II. Data modeling (incl. inference): regression, analysis of variance, time series analysis, survival data analysis. III. Optimization and decision making: decision analysis, Bayesian methods.

4 What’s Next? The High-Tech Revolution. Availability of massive data: cannot do design of experiments, but can do data mining and data experimentation. “The sexy job in the next 10 years will be statisticians,” Google chief economist (NY Times, 2009/8/5). Physical experiments replaced by computer experiments (savings in cost and time, more feasible): a definite opportunity. Other opportunities abound (nanotechnology, molecular medicine, biotech devices, alternative fuel): unknown territory, tremendous promise.

5 Statistical Work in Nanotechnology. The nano part is based on two papers: –A Statistical Approach to Quantifying the Elastic Deformation of Nanomaterials (X. Deng, V. R. Joseph, W. Mai*, Z. L. Wang*, C. F. J. Wu). Proc. Nat. Acad. Sciences, 106. –Robust optimization of the output voltage of nanogenerators by statistical design of experiments (J. Song*, H. Xie, W. Wu*, V. R. Joseph, C. F. J. Wu, Z. L. Wang*). Nano Research, 3(9), 613-9. *School of Materials Science and Engineering, Georgia Tech.

6 A Statistical Approach to Quantifying the Elastic Deformation of Nanomaterials. Existing method and drawbacks. A new method: Sequential Profile Adjustment by Regression (SPAR). Demonstration on nanobelt data.

7 Introduction. One-dimensional (1D) nanomaterials are fundamental building blocks for constructing nanodevices and nanosystems. It is important to quantify the mechanical properties of 1D nanomaterials, such as the elastic modulus, because these properties dictate their applications in nanotechnology. A common strategy is to deform a 1D nanostructure using the tip of an AFM (atomic force microscope). [Figure: schematic diagram of AFM.]

8 Method of Experimentation and Modeling. Mai and Wang (2006, Appl. Phys. Lett.) proposed a new approach to measure the elastic modulus of a ZnO nanobelt (NB). The AFM tip scans along the length of the NB under a constant applied force. A series of bending profiles of the same NB is obtained by sequentially changing the magnitude of the contact force. [Figure: AFM images of a suspended ZnO nanobelt.]

9 Free-Free Beam Model. Mai and Wang (2006) suggested a free-free beam model (FFBM), with free boundary conditions, to quantify the elastic deflection. The deflection v of the NB at position x is determined by [equation missing], where E is the elastic modulus, L is the width of the trench, and I is the moment of inertia. The FFBM gives a better fit than the clamped-clamped beam model. [Figure: beam schematics, panels A and B, labeled with L, x, h, and F.]
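The equation on this slide did not survive in the transcript. As a hedged reconstruction only: if the FFBM is taken to be the standard simply supported beam with a point load F applied at the scan position x (an assumption, as is the downward sign convention), elementary beam theory gives the deflection under the load as follows. The clamped-clamped counterpart is shown for comparison.

```latex
% Assumed FFBM form: simply supported beam, point load F at position x
v_{\mathrm{FF}}(x) = -\frac{F\,x^{2}(L-x)^{2}}{3\,E\,I\,L}, \qquad 0 \le x \le L
% Clamped-clamped beam under the same load, for comparison
v_{\mathrm{CC}}(x) = -\frac{F\,x^{3}(L-x)^{3}}{3\,E\,I\,L^{3}}
```

At midspan these reduce to the familiar values F L^3 / (48 E I) and F L^3 / (192 E I), so the clamped-clamped beam is four times stiffer at its center.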

10 FFBM Profiles Example. The profiles are calculated from the FFBM. The force F ranges from 78 nN (low) to 261 nN (high).
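A minimal sketch of how such profiles could be computed. It assumes the FFBM takes the standard simply supported beam form v = -F x^2 (L - x)^2 / (3 E I L); the geometry and material constants, and the choice of 15 evenly spaced forces, are hypothetical values chosen for illustration, not the values used in the talk.

```python
import numpy as np

def ffbm_deflection(x, force, modulus, inertia, length):
    """Deflection under a point load at x for a simply supported beam (assumed FFBM form)."""
    return -force * x**2 * (length - x)**2 / (3.0 * modulus * inertia * length)

# Hypothetical geometry and material values, for illustration only (SI units).
L_TRENCH = 2.0e-6    # trench width L in m (assumed)
E_MODULUS = 1.0e11   # elastic modulus E in Pa (assumed)
INERTIA = 1.0e-29    # moment of inertia I in m^4 (assumed)

x = np.linspace(0.0, L_TRENCH, 101)
# 15 evenly spaced forces from 78 nN to 261 nN (the even spacing is an assumption).
forces = np.linspace(78e-9, 261e-9, 15)
profiles = np.array(
    [ffbm_deflection(x, f, E_MODULUS, INERTIA, L_TRENCH) for f in forces]
)

print(profiles.shape)          # one row per force, one column per scan position
print(profiles[:, 50] * 1e9)   # midspan deflections in nm, deeper for larger forces
```

Under this model every profile is the same shape scaled by F, so the curves never cross.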

11 Profiles of the Nanobelt Experiment. AFM image profiles of the NB under load forces from 78 nN (low) to 261 nN (high). Initial bias of the nanobelt: –The NB is not perfectly straight: initial bending is introduced during sample manipulation. –The profile curves in the figure are not smooth: this is caused by a small surface roughness (around 1 nm) of the NB.

12 MW Method. To eliminate the initial bias, normalize the profiles by subtracting the first profile (acquired at 78 nN) from the profiles in panel (a). The elastic modulus is then estimated by fitting the normalized AFM image profiles using the FFBM. This is the MW method.
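The two steps of the MW method (subtract the first profile, then fit the FFBM) can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes the simply supported beam form for the FFBM, fits 1/E by least squares through the origin, and runs on synthetic data with hypothetical constants and a made-up initial bending shape.

```python
import numpy as np

def ffbm_shape(x, length):
    """x^2 (L - x)^2 / (3 L): assumed FFBM deflection per unit F / (E I)."""
    return x**2 * (length - x)**2 / (3.0 * length)

def mw_estimate_modulus(x, forces, profiles, inertia, length):
    """Subtract the first profile, then fit 1/E by least squares through the origin."""
    normalized = profiles[1:] - profiles[0]   # removes any bias shared by all profiles
    delta_f = forces[1:] - forces[0]
    # Regressor for each data point: -(F_k - F_1) * shape(x) / I; its coefficient is 1/E.
    regressor = -(delta_f[:, None] * ffbm_shape(x, length)[None, :]) / inertia
    inv_e = np.sum(regressor * normalized) / np.sum(regressor**2)
    return 1.0 / inv_e

# Synthetic check with hypothetical values (not the experimental data).
rng = np.random.default_rng(0)
length, inertia, true_e = 2.0e-6, 1.0e-29, 1.0e11
x = np.linspace(0.0, length, 101)
forces = np.linspace(78e-9, 261e-9, 15)
initial_bias = 5e-9 * np.sin(np.pi * x / length)   # shared initial bending (assumed shape)
profiles = (
    -forces[:, None] * ffbm_shape(x, length)[None, :] / (true_e * inertia)
    + initial_bias
    + rng.normal(0.0, 1e-9, size=(forces.size, x.size))   # about 1 nm of roughness
)

e_hat = mw_estimate_modulus(x, forces, profiles, inertia, length)
print(f"estimated E = {e_hat:.3e} Pa (true value {true_e:.3e} Pa)")
```

The subtraction cancels a bias common to all profiles, but the noise in the first profile enters every normalized profile, so a poorly behaved first profile affects the whole fit.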

13 Problem with MW Method. Subtracting the first profile to normalize the data can result in poor estimation if the first profile behaves poorly. Systematic biases can occur during the measurement. Inconsistent (order reversal) pattern: profiles at applied forces of 235, 248 and 261 nN lie above those obtained at the lower forces F = 209 and 222 nN. This pattern persists in the normalized profiles.

14 Problem with MW Method (continued). [Figure labels: 235 nN, 248 nN, 261 nN; 209 nN, 222 nN.]

15 Problem with MW Method (continued). [Figure labels: 235 nN, 248 nN, 261 nN; 209 nN, 222 nN; 157 nN, 170 nN, 183 nN; 131 nN, 144 nN.]

16 Countermeasures. Experimenters: drop the data (i.e., five belts) that exhibit inconsistency. –Loss of data and waste of information. Statisticians: keep the data and use statistical modeling to remove the inconsistency. –The remaining information in the data can be utilized.

17 SPAR: A New Method The FFBM itself cannot explain the inconsistency. –Requires a more general model to include other factors besides the initial bias. Propose a general model to incorporate the initial bias and other potential systematic biases. Use model selection to choose an appropriate model. The method is called sequential profile adjustment by regression (SPAR).

19 Causes of Systematic Biases. Changes in boundary conditions: –Can be nonlinear and irreversible during the measurement. –Can cause occasional stick-slip events. Wear and tear of the AFM tip and the nanobelt surface. Lateral shifting and sliding, and other artifacts. Because of the nano scale, such causes are more acute in nano experiments and can occur at any stage of the experiment.

20 Model Selected from Deflection Data

21 [Figure labels: F_11 = 209 nN, F_12 = 222 nN, F_13 = 235 nN, F_14 = 248 nN, F_15 = 261 nN.]

22 [Figure labels: F_11 = 209 nN, F_12 = 222 nN, F_13 = 235 nN, F_14 = 248 nN, F_15 = 261 nN.] Matching the FFBM better, but the inconsistent pattern persists.

23 [Figure labels: F_11 = 209 nN, F_12 = 222 nN, F_13 = 235 nN, F_14 = 248 nN, F_15 = 261 nN.] Inconsistent pattern removed.

24 The δ_12 term over-corrects and moves the curves down; this is rectified by adding δ_10: the curves are moved up and the middle part is smoothed, giving a better match with the FFBM.

25 Standard deviation reduced by 50%.

26 Mechanistic vs. Statistical Modeling. The error and noise of the experiment are stochastic in nature. It is difficult to develop a catch-all mechanistic model. –The mechanistic model is deterministic and predictive. A purely statistical model lacks predictive power. The proposed mechanistic-empirical modeling strategy can be a useful approach. –Make the statistical corrections physically meaningful. –Improve the estimation of physical parameters.

27 Understanding Cell Adhesion State Using Hidden Markov Model. C. F. Jeff Wu+ (joint with Y. Hung*, V. Zarnitsyna§, Yijie Wang+, and C. Zhu§). +Georgia Tech, Industrial & Systems Engineering; *Rutgers, the State University of New Jersey; §Georgia Tech, Biomedical Engineering. Based on NIH-GMS Grant

28 Cell adhesion. Motivated by the statistical analysis of biomechanical experiments at Georgia Tech. Cell adhesion: binding of a cell to another cell or surface. It is mediated by the interaction between cell adhesion proteins (receptors) and the molecules that they bind to (ligands). Biologists describe receptor-ligand binding as a key-to-lock type of relation. What makes cells sticky? When, how, and to what do cells adhere? Why is it important? It plays an important role in many physiological and pathological processes and in tumor metastasis in cancer studies.

29 Thermal fluctuation experiment. It uses reduced thermal fluctuations to indicate the presence of receptor-ligand bonds. Objective: identify the association and dissociation points for receptor-ligand bonds. Accurate estimation of these points is essential because it is required for precise measurement of bond lifetimes and waiting times, and it forms the basis for subsequent estimation of the kinetic parameters.

30 Experimental setting. A micropipette-aspirated red blood cell with a bead (probe) glued to its apex (left) was aligned against another bead (target) aspirated by another pipette (right). (Developed at Georgia Tech.) Driven by a piezoelectric translator, a computer-programmed test cycle consisted of approach, push, retract, hold, and return phases. During the holding period, the left pipette was held stationary to allow the probe and the target to come into contact via thermal fluctuations, thereby providing an opportunity for the receptors and ligands to interact. The position of the probe was tracked by image analysis software to produce the data.

31 The quantity of interest is the thermal fluctuation during the holding period. Bond formation is equivalent to adding a molecular spring in parallel to the force transducer spring, which stiffens the system: the fluctuation decreases when a receptor-ligand bond forms and resumes when the bond dissociates. [Figure: data trace annotated with “Bond forms” and “Bond dissociates”.]
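To illustrate why a drop in fluctuation can mark a bond, here is a small sketch on simulated data. The trace, its noise levels (standard deviation 3 without a bond, 1 with a bond), the segment lengths, and the window size are all hypothetical choices made for illustration.

```python
import numpy as np

def rolling_std(signal, window):
    """Standard deviation over a sliding window; output length is len(signal) - window + 1."""
    windows = np.lib.stride_tricks.sliding_window_view(signal, window)
    return windows.std(axis=1)

# Simulated probe position: no bond, then a bond, then no bond again (hypothetical values).
rng = np.random.default_rng(1)
signal = np.concatenate([
    rng.normal(0.0, 3.0, 300),   # no bond: large thermal fluctuation
    rng.normal(0.0, 1.0, 200),   # bond present: stiffer system, smaller fluctuation
    rng.normal(0.0, 3.0, 300),   # bond dissociated: fluctuation resumes
])

roll = rolling_std(signal, window=50)
print("mean rolling std without bond:", roll[:251].mean())
print("mean rolling std with bond:   ", roll[300:451].mean())
```

A sliding-window summary like this makes the change visible but blurs the exact association and dissociation times, which is one reason a model-based approach is attractive.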

32 Challenges in identifying the bond association/dissociation points: Points are not directly observable. Observations are not independent. In practice, the data contain an unknown number of bond types, and each bond type is associated with a different fluctuation decrease because of differences in their string strength.

33 Challenges in identifying the bond association/dissociation points: Points are not directly observable; they can only be detected through variance changes. Observations are not independent. In practice, the data contain an unknown number of bond types, and each bond type is associated with a different fluctuation decrease because of differences in their string strength.

34 Challenges in identifying the bond association/dissociation points: Points are not directly observable; they can only be detected through variance changes. Observations are not independent: the cell memory effect must be taken into account, since the binding probability increases if there was a binding in the immediate past. In practice, the data contain an unknown number of bond types, and each bond type is associated with a different fluctuation decrease because of differences in their string strength.

36 Hidden Markov Models (HMM) Framework. Assume the probe fluctuates with different variances that correspond to different underlying binding states. These states, which include no bond and a number of distinct bond types, are not observable, but the way the binding state changes over time can be captured by a Markov chain model. Such a Markov chain can also capture the cell memory effect.

37 Hidden Markov Model with two states

42 Transition Probability in HMM. [Symbol missing] denotes the probability of going from state i to state j. A large [symbol missing] indicates a memory effect. The model is called “hidden” because the Markov chain transitions operate underneath the observed normal distribution N(μ_i, σ_i²) for state i.
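As a hedged illustration of the two-state setup, the sketch below decodes the most likely hidden state path with the Viterbi algorithm for an HMM with normal emissions N(μ_i, σ_i²). It is not the authors' estimation procedure: the parameters are treated as known rather than estimated, and every numerical value (equal means, standard deviations 3 and 1, a "sticky" transition matrix `trans` whose entry `trans[i, j]` is the probability of going from state i to state j) is an assumption.

```python
import numpy as np

def viterbi_gaussian(obs, means, stds, trans, init):
    """Most likely hidden state path for an HMM with Gaussian emissions."""
    obs = np.asarray(obs, dtype=float)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    n_obs, n_states = obs.size, means.size
    # Log-density of every observation under each state's normal distribution.
    log_emit = (-0.5 * np.log(2.0 * np.pi * stds**2)
                - 0.5 * ((obs[:, None] - means) / stds) ** 2)
    log_trans = np.log(trans)
    score = np.log(init) + log_emit[0]
    back = np.zeros((n_obs, n_states), dtype=int)
    for t in range(1, n_obs):
        cand = score[:, None] + log_trans   # cand[i, j]: best path ending in i, then i -> j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    path = np.empty(n_obs, dtype=int)
    path[-1] = score.argmax()
    for t in range(n_obs - 1, 0, -1):       # trace the best path backwards
        path[t - 1] = back[t, path[t]]
    return path

# Simulated trace: state 0 = no bond (std 3), state 1 = bond (std 1). Hypothetical values.
rng = np.random.default_rng(2)
true_states = np.concatenate([np.zeros(300, int), np.ones(200, int), np.zeros(300, int)])
stds = np.array([3.0, 1.0])
obs = rng.normal(0.0, stds[true_states])

trans = np.array([[0.99, 0.01],             # large diagonal entries: states tend to persist
                  [0.01, 0.99]])
decoded = viterbi_gaussian(obs, means=[0.0, 0.0], stds=stds, trans=trans, init=[0.5, 0.5])
accuracy = (decoded == true_states).mean()
print(f"fraction of time points decoded correctly: {accuracy:.3f}")
```

Because the two states share a mean, a single observation says little about the state; it is the persistence encoded in the transition matrix that lets the decoder locate the change points.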

43 Analysis Results for Two States

44 HMM with three states. No bond, P-selectin bond, L-selectin bond: P-selectin and L-selectin are different proteins on the cell surface. They play an important role in the transient rolling of cells. It is known that L-selectin has a stiffer string than P-selectin, so σ_L² < σ_P². This physical knowledge allows us to focus the HMM on the variance change as an indication of a change of bond type.

45 Thermal fluctuation data: Three states

46 Estimation for HMM. [Symbols missing]: no bond (state 0) is more likely to transition to P-bond (state 1) than to L-bond (state 2). [Symbols missing]: P-bond is more likely to transition to L-bond than to no bond. [Symbols missing]: not much difference. The estimates are reported with their statistical significance.
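One simple way to see how such transition comparisons arise is to count transitions along a state path and normalize each row. This is only an illustration: the talk does not say how its estimates were computed (an HMM is usually fitted by likelihood methods on the observations, not by counting on a known path), and the short path below is a made-up toy example.

```python
import numpy as np

def estimate_transitions(states, n_states):
    """Row-normalized transition counts from a sequence of state labels."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows for states that are never left stay all zero instead of dividing by zero.
    return counts / np.where(row_sums == 0, 1, row_sums)

# Toy path: 0 = no bond, 1 = P-bond, 2 = L-bond (hypothetical sequence).
path = [0, 0, 1, 1, 2, 0, 0, 1, 2, 2, 0]
p_hat = estimate_transitions(path, n_states=3)
print(p_hat)
```

In this toy path the estimated probability of going from state 0 to state 1 exceeds that of going from 0 to 2, and from state 1 the move to 2 is more likely than the move to 0, which mirrors the kind of comparison listed on the slide.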

47 Analysis for three states

48 Why computer experiments?

49 Some examples

50 Uncertainty Quantification Statistical Meta-Modeling of Computer Experiments

51 GP with quanti/quali factors: Data Center Thermal Distribution

52 Configuration Variables for Data Center Example. Five quantitative factors: rack temperature rise, rack power, diffuser angle, diffuser flow rate, ceiling height. Three qualitative factors: diffuser location, hot-air return-vent location, power allocation.

53 Gaussian Process Models with Quantitative and Qualitative Factors

54 Summary. Statistics is not used in some high-tech applications; e.g., experimental effects in Nobel-winning work (or in Science and Nature papers) are expected to be “obvious”. Statistics has made an impact in industrial work, where “incremental” improvement needs statistical tools, and it is increasingly popular in high-tech work, where “subtle” effects need to be ascertained. Massive online data is the biggest opportunity for statistics, e.g., webpage design and optimization using statistical design of experiments. Statistics also has a major role in the study of complex stochastic systems.
