Bootstrap Confidence Intervals in Variants of Component Analysis. Marieke E. Timmerman, Henk A.L. Kiers, Age K. Smilde & Cajo J.F. ter Braak


1 Bootstrap Confidence Intervals in Variants of Component Analysis
Marieke E. Timmerman (1), Henk A.L. Kiers (1), Age K. Smilde (2) & Cajo J.F. ter Braak (3)
(1) Heymans Institute of Psychology, University of Groningen; (2) Biosystems Data Analysis, University of Amsterdam; (3) Biometris, Wageningen University; The Netherlands

2 Some background of this work
Validation (Harshman, 1984):
– Theoretical appropriateness
– Computational correctness
– Explanatory validity
– Statistical reliability

3 Some background of this work
Statistical reliability (Smilde, Bro & Geladi (2004), Multi-way Analysis, p. 146) 'is related to ... the stability of solutions to resampling, choice of dimensionality and confidence intervals of the model parameters. The statistical reliability is often difficult to quantify in practical data analysis, e.g., because of small sample sets or poor distributional knowledge of the system.'

4 Statistical reliability
Model choice:
– choice of dimensionality
– stability of solutions to resampling
Inference:
– stability of solutions to resampling
– confidence intervals (CIs) of the model parameters
How to estimate CIs in component analysis? And what about the quality?

5 Confidence intervals of model parameters
Population distribution function F → parameters θ
Observed random sample x → estimated parameters θ̂ = s(x)
Confidence intervals (CIs) are derived from the sampling distribution of θ̂.

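6 Bootstrap confidence intervals
Population distribution function F → parameters θ
Observed random sample x → estimated parameters θ̂ = s(x)
Empirical distribution function (EDF) F̂ → bootstrap sample x* → bootstrap estimates θ̂* = s(x*)
Bootstrap CIs are derived from the distribution of θ̂* over many bootstrap samples.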

7 Example: CI for the population mean μ (θ = μ)

8 θ = μ
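For the mean example (θ = μ), a minimal percentile-bootstrap sketch follows. It draws bootstrap samples x* from the EDF (resampling with replacement), computes θ̂* = s(x*) as the sample mean, and takes the α and 1 − α quantiles of θ̂*. The function name, the number of bootstrap samples and the simulated data are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def bootstrap_mean_ci(x, n_boot=2000, alpha=0.025, seed=0):
    """Percentile bootstrap central 1 - 2*alpha CI for the population mean.

    Hypothetical helper: resample x with replacement (i.e. sample from the
    EDF), recompute the mean on every bootstrap sample, and return the
    alpha and 1 - alpha quantiles of the bootstrap distribution.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    # Each row of idx indexes one bootstrap sample x* of size n
    idx = rng.integers(0, n, size=(n_boot, n))
    theta_star = x[idx].mean(axis=1)  # theta* = s(x*) per bootstrap sample
    lo, hi = np.quantile(theta_star, [alpha, 1 - alpha])
    return x.mean(), (lo, hi)

rng = np.random.default_rng(1)
sample = rng.normal(loc=10.0, scale=2.0, size=50)  # simulated data, illustration only
theta_hat, (lo, hi) = bootstrap_mean_ci(sample)
print(f"mean = {theta_hat:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

With alpha = 0.025 this gives a central 95% interval, matching the 1 − 2α notation used later in the talk.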

9 Key questions for the bootstrap procedure
1. Sample drawn from which population(s)?
2. What is s(x) exactly?
3. If s(x) is non-unique, how to make s(x*) comparable?
4. How to define the EDF?
5. How to estimate CIs from the distribution of θ̂*?

10 What's next…
Principal Component Analysis
– Various answers to the key questions
– Simulation study: what is the quality of the various resulting CIs?
Real multi-way/multi-block methods
– Tucker3/PARAFAC
– Multilevel Component Analysis
– Principal Response Curve model

11 Principal Component Analysis (Z ≈ F A′)
X (I × J): observed scores of I subjects on J variables
Z: standardized scores of X
F (I × Q): principal component scores
A (J × Q): principal loadings
Q: number of selected principal components
T (Q × Q): rotation matrix
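A small sketch of these quantities via the singular value decomposition of Z. The scaling is an assumption (one common convention, not fixed by the slide): F has unit-variance columns and A holds variable-component correlations, so that Z ≈ F A′. It also checks that an orthonormal rotation T leaves the fit unchanged, which is why rotated loadings A T need extra care in the bootstrap.

```python
import numpy as np

def pca(X, Q):
    """PCA of the I x J data matrix X keeping Q components (illustrative sketch).

    Assumed convention: Z standardizes each column (population SD),
    F (I x Q) has unit-variance columns, A (J x Q) contains the
    correlations between variables and components, and Z ~ F A'.
    """
    X = np.asarray(X, dtype=float)
    I = X.shape[0]
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    F = np.sqrt(I) * U[:, :Q]                  # standardized component scores
    A = Vt[:Q].T * s[:Q] / np.sqrt(I)          # principal loadings A_Q
    vaf = (s[:Q] ** 2).sum() / (s ** 2).sum()  # proportion of variance accounted for
    return Z, F, A, vaf

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))  # simulated data, illustration only
Z, F, A, vaf = pca(X, Q=2)

# Any orthonormal T (Q x Q) gives the same fit: (F T)(A T)' = F A'
angle = 0.3
T = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])
print(np.allclose(F @ A.T, (F @ T) @ (A @ T).T))  # True
```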

12 1. Sample drawn from which Population(s)? ‘observed scores of I subjects on J variables’

13 2. What is s(x) exactly?
Loadings:
1. Principal loadings (A_Q)
2. Rotated loadings (A_Q T)
 a. Procrustes rotation towards an external structure
 b. use one fixed criterion (e.g., Varimax)
 c. search for 'the optimal simple solution'
Oblique case: correlations between components
Variance accounted for

14 3. If s(x) is non-unique, how to make s(x*) comparable?
Loadings:
1. Principal loadings (A_Q)
The sign of the principal loadings (A_Q) is arbitrary: reflect the columns of A_Q* so that they point in the same direction as the sample loadings.

15 1. Principal loadings (A_Q)
The sign of the principal loadings (A_Q) is arbitrary: reflect the columns of A_Q* so that they point in the same direction as the sample loadings.
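A minimal sketch of the reflection step for principal loadings. The criterion used here (flip column q when its inner product with the sample loadings' column q is negative) is an assumption; the slides only say that columns are reflected to the same direction. The function name and example matrices are hypothetical.

```python
import numpy as np

def reflect_to_reference(A_star, A_ref):
    """Reflect columns of bootstrap loadings A_star towards the sample loadings A_ref.

    Hypothetical helper: column q is multiplied by -1 when its inner
    product with column q of A_ref is negative.
    """
    signs = np.sign(np.sum(A_star * A_ref, axis=0))
    signs[signs == 0] = 1.0  # leave exactly orthogonal columns unchanged
    return A_star * signs

A_ref = np.array([[0.8, 0.1], [0.7, -0.2], [0.1, 0.9]])         # sample loadings
A_star = np.array([[-0.75, 0.15], [-0.72, -0.1], [-0.05, 0.85]])  # bootstrap loadings
A_aligned = reflect_to_reference(A_star, A_ref)
print(A_aligned)  # first column flipped, second column unchanged
```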

16 2. Rotated loadings (A_Q T)
a. Procrustes rotation towards an external structure: no further alignment needed (A_Q T* is unique)
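Orthogonal Procrustes rotation, which makes A_Q T* unique, has a closed-form solution: with the SVD A*′ B = P D R′, the orthonormal T = P R′ minimizes ‖A* T − B‖ for target B. The sketch below assumes an orthogonal (not oblique) Procrustes rotation; the target is a random matrix used purely for illustration.

```python
import numpy as np

def procrustes_rotate(A_star, target):
    """Orthogonal Procrustes: orthonormal T minimizing ||A_star T - target||_F.

    With the SVD A_star' target = P D R', the solution is T = P R'.
    """
    P, _, Rt = np.linalg.svd(A_star.T @ target)  # Rt holds R'
    T = P @ Rt
    return A_star @ T, T

rng = np.random.default_rng(0)
target = rng.normal(size=(6, 2))  # e.g. an external structure (illustrative)
angle = 0.7
T0 = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])
A_star = target @ T0.T            # a rotated copy of the target
A_rot, T = procrustes_rotate(A_star, target)
print(np.allclose(A_rot, target))  # True: the rotation is undone
```

The same function covers the Procrustes option of slide 18, with the 'optimally simple' sample loadings as the target.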

17 2. Rotated loadings (A_Q T)
b. Use one fixed criterion (e.g., Varimax)
The sign and order of Varimax-rotated loadings are arbitrary: reflect and reorder the columns of A_Q T*.
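For a fixed criterion such as Varimax, each bootstrap solution must be both reordered and reflected. The sketch below assumes the Varimax rotation has already been applied and matches columns by trying every permutation (feasible for small Q), maximizing the summed absolute Tucker congruence with the sample loadings; this matching criterion is an assumption, not necessarily the one used in the study.

```python
import itertools
import numpy as np

def reorder_and_reflect(A_star, A_ref):
    """Permute and reflect columns of A_star to best match A_ref (hypothetical helper)."""
    def congruence(a, b):
        # Tucker's congruence coefficient between two loading vectors
        return a @ b / np.sqrt((a @ a) * (b @ b))

    Q = A_ref.shape[1]
    best_perm, best_score = None, -np.inf
    for perm in itertools.permutations(range(Q)):
        score = sum(abs(congruence(A_star[:, p], A_ref[:, q])) for q, p in enumerate(perm))
        if score > best_score:
            best_perm, best_score = perm, score
    A_out = A_star[:, list(best_perm)]
    signs = np.sign(np.sum(A_out * A_ref, axis=0))
    signs[signs == 0] = 1.0
    return A_out * signs

A_ref = np.array([[0.8, 0.0], [0.7, 0.1], [0.0, 0.9], [0.1, 0.8]])
# Same loadings with columns swapped and one column reflected
A_star = np.array([[0.0, -0.8], [0.1, -0.7], [0.9, 0.0], [0.8, -0.1]])
aligned = reorder_and_reflect(A_star, A_ref)
print(aligned)
```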

18 2. Rotated loadings (A_Q T)
c. Search for 'the optimal simple solution'
How are the bootstrap solutions A_Q T* found?
– For each bootstrap solution, search for 'optimally simple' loadings (infeasible); then reflect and reorder the columns of A_Q T*.
– Procrustes rotation towards the 'optimally simple' sample loadings: no further alignment needed (A_Q T* is unique).

19 'Fixed criterion' versus 'Procrustes towards (simple) sample loadings'
Unstable Varimax-rotated solutions over samples?
[Figure panels: Varimax-rotated bootstrap solutions; Procrustes-rotated bootstrap solutions]

20 4. How to define the EDF?
– non-parametric: X_b obtained by row-wise resampling of Z
– semi-parametric:
– parametric: elements of X_b drawn from a particular p.d.f.
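A sketch of the non-parametric option: rows of Z are resampled with replacement, PCA is redone on each bootstrap sample, and the bootstrap principal loadings are reflected towards the sample loadings. Re-standardizing each resampled matrix, the loading scaling, and the simulated two-component data are assumptions made for illustration.

```python
import numpy as np

def standardize(X):
    return (X - X.mean(axis=0)) / X.std(axis=0)

def principal_loadings(Z, Q):
    # Loadings as variable-component correlations (J x Q), from the SVD of Z
    I = Z.shape[0]
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return Vt[:Q].T * s[:Q] / np.sqrt(I)

def bootstrap_loadings(X, Q, n_boot=200, seed=0):
    """Non-parametric bootstrap of principal loadings (hypothetical helper).

    Each bootstrap sample draws I rows of Z with replacement; its loadings
    are reflected column-wise towards the sample loadings.
    """
    rng = np.random.default_rng(seed)
    Z = standardize(np.asarray(X, dtype=float))
    I = Z.shape[0]
    A = principal_loadings(Z, Q)
    A_boot = np.empty((n_boot,) + A.shape)
    for b in range(n_boot):
        Zb = standardize(Z[rng.integers(0, I, size=I)])  # assumed: re-standardize
        Ab = principal_loadings(Zb, Q)
        signs = np.sign(np.sum(Ab * A, axis=0))
        signs[signs == 0] = 1.0
        A_boot[b] = Ab * signs
    return A, A_boot

rng = np.random.default_rng(1)
L0 = np.array([[.8, 0], [.7, 0], [.6, 0], [.7, 0], [0, .6], [0, .5], [0, .6], [0, .5]])
X = rng.normal(size=(150, 2)) @ L0.T + 0.5 * rng.normal(size=(150, 8))
A, A_boot = bootstrap_loadings(X, Q=2)
lo, hi = np.percentile(A_boot, [2.5, 97.5], axis=0)  # percentile CIs per loading
```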

21 5. How to estimate CIs from the distribution of θ̂*?

22 Based on the bootstrap standard error (se*):
– Wald (θ̂ ± z_{1−α} se*)
– ...
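A minimal sketch of the Wald-type bootstrap interval θ̂ ± z_{1−α} se*, with se* taken as the standard deviation of the bootstrap estimates. The helper name and the toy replicates are illustrative assumptions.

```python
from statistics import NormalDist
import numpy as np

def wald_bootstrap_ci(theta_hat, theta_star, alpha=0.025):
    """Wald interval theta_hat +/- z_(1-alpha) * se*, se* = SD of bootstrap estimates."""
    se_star = np.std(theta_star, ddof=1)
    z = NormalDist().inv_cdf(1 - alpha)  # 1.96 for alpha = 0.025
    return theta_hat - z * se_star, theta_hat + z * se_star

theta_star = np.array([0.4, 0.5, 0.6])  # toy bootstrap replicates
lo, hi = wald_bootstrap_ci(0.5, theta_star)
```

The interval is symmetric around θ̂ by construction, which is one reason it can behave poorly when the bootstrap distribution is skewed.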

23 Percentile-based methods:
– BCa method (bias-corrected and accelerated; corrects for potential bias and skewness of the bootstrap distribution)
– …
– percentile method
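The percentile and BCa intervals can be sketched for a scalar statistic as follows. This follows the standard textbook BCa construction: bias correction z0 from the proportion of θ̂* below θ̂, acceleration a from jackknife estimates. It is not necessarily the exact implementation used in the study; the clipping of the proportion and the skewed example data are assumptions.

```python
from statistics import NormalDist
import numpy as np

def bca_interval(data, stat, n_boot=2000, alpha=0.025, seed=0):
    """Percentile and BCa bootstrap intervals for a scalar statistic (sketch).

    data: array whose rows (axis 0) are resampled; stat: function data -> float.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    nd = NormalDist()
    theta_hat = stat(data)
    theta_star = np.array([stat(data[rng.integers(0, n, size=n)]) for _ in range(n_boot)])

    # Bias correction z0 (proportion clipped away from 0 and 1 to keep inv_cdf finite)
    prop_below = np.mean(theta_star < theta_hat)
    prop_below = min(max(prop_below, 0.5 / n_boot), 1 - 0.5 / n_boot)
    z0 = nd.inv_cdf(prop_below)

    # Acceleration a from leave-one-row-out jackknife estimates
    jack = np.array([stat(np.delete(data, i, axis=0)) for i in range(n)])
    d = jack.mean() - jack
    denom = 6.0 * np.sum(d ** 2) ** 1.5
    a = np.sum(d ** 3) / denom if denom > 0 else 0.0

    def adjusted(level):
        z = nd.inv_cdf(level)
        return nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    percentile = tuple(np.quantile(theta_star, [alpha, 1 - alpha]))
    bca = tuple(np.quantile(theta_star, [adjusted(alpha), adjusted(1 - alpha)]))
    return theta_hat, percentile, bca

rng = np.random.default_rng(2)
skewed = rng.exponential(scale=1.0, size=40)  # skewed data, illustration only
theta_hat, perc, bca = bca_interval(skewed, np.mean)
```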

24 Quality of a CI?
Coverage of the central 1 − 2α CI [CI_left; CI_right):
P(θ < CI_left) = P(θ ≥ CI_right) = α, with θ the population parameter.

25 But what is the population parameter θ?
– Results from PCA on the population data
– The orientation of the population loadings should match that of the bootstrap loadings:
1. Principal loadings (A_Q*)
2. Rotated loadings (A_Q T*)
 a. Procrustes rotation towards an external structure
 b. use one fixed criterion (e.g., Varimax)
 c. search for 'the optimal simple solution'
  – each bootstrap sample searches for optimal simple loadings (Bootstrap Varimax)
  – Procrustes rotation towards 'optimally simple' sample loadings (Bootstrap Procrustes)

26 Simulation study: CIs for Varimax-rotated sample loadings
Data properties varied:
– VAF (variance accounted for) in population (0.8, 0.6, 0.4)
– number of variables (8, 16)
– sample size (50, 100, 500)
– distribution of component scores (normal, leptokurtic, skew)
– simplicity of loading matrix (simple, half-simple, complex)
Design completely crossed, 1000 replicates per cell

27 Simplicity of loading matrix and stability of the Varimax solution across samples

28 Quality criteria for 95% CIs
Target: P(θ < CI_left) = P(θ ≥ CI_right) = α
– 95% coverage: (1 − prop(θ < CI_left) − prop(θ ≥ CI_right)) × 100%
– Exceeding Percentage (EP) ratio: prop(θ < CI_left) / prop(θ ≥ CI_right)
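Given the CIs from all simulation replicates of a design cell, both criteria reduce to counting misses on each side. The sketch assumes the EP ratio is left misses divided by right misses, so a value near 1 indicates symmetric non-coverage; the direction of the ratio, the helper name and the toy intervals are assumptions.

```python
import numpy as np

def coverage_and_ep_ratio(lower, upper, theta):
    """Coverage (%) and Exceeding Percentage ratio over simulated CIs (sketch).

    lower, upper: CI bounds from each replicate; theta: population value.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    p_left = np.mean(theta < lower)    # theta lies left of the interval
    p_right = np.mean(theta >= upper)  # theta lies right of the half-open interval
    coverage = (1.0 - p_left - p_right) * 100.0
    if p_right > 0:
        ep_ratio = p_left / p_right
    else:
        ep_ratio = np.nan if p_left == 0 else np.inf
    return coverage, ep_ratio

lower = [0.1, 0.2, 0.6, 0.3, 0.0]
upper = [0.7, 0.8, 0.9, 0.45, 0.6]
coverage, ep = coverage_and_ep_ratio(lower, upper, theta=0.5)
```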

29

30 EP ratio (symmetry of coverage)
Bootstrap CIs: Wald, percentile, BCa
In case of skewed statistic distributions (i.e., high loadings, small sample size):
– BCa performs by far the best
– Wald performs poorly (both bootstrap and asymptotic)
Other conditions: hardly any differences

31

32 Empirical example
Item   Sample   BCa Varimax    BCa Procrustes
1      .43      [.16, .61]     [.21, .57]
2      -.08     [-.26, .10]    [-.27, .08]
…      …        …              …

33 Key questions for the bootstrap procedure
1. Sample drawn from which population(s)?
2. What is s(x) exactly?
3. If s(x) is non-unique, how to make s(x*) comparable?
4. How to define the EDF?
5. How to estimate CIs from the distribution of θ̂*?

34 Real multi-way methods: Tucker3/PARAFAC
1. Sample drawn from which population(s)? Which mode(s) are considered fixed, and which random? Examples:
– subjects, measurement occasions, variables
– measurement occasions (of one subject), variables, situations
– judges, food types, variables
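Once a mode is declared random (e.g. subjects), a non-parametric bootstrap sample of a three-way array can be drawn by resampling slices along that mode only, leaving the fixed modes intact. A minimal sketch; the helper name and array sizes are hypothetical.

```python
import numpy as np

def resample_random_mode(X, mode, rng):
    """Bootstrap sample of a multi-way array along its random mode (sketch).

    Slices along `mode` are drawn with replacement; fixed modes stay intact.
    """
    n = X.shape[mode]
    idx = rng.integers(0, n, size=n)
    return np.take(X, idx, axis=mode), idx

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4, 3))  # subjects x variables x occasions (illustrative)
Xb, idx = resample_random_mode(X, mode=0, rng=rng)
```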

35 Tucker3/PARAFAC
2. What is s(x) exactly?
T3: component matrices, for fixed modes only; core matrix; possibly after rotation…
PF: component matrices, for fixed modes only.
3. If s(x) is non-unique, how to make s(x*) comparable?
T3: depends on one's view on rotation…
PF: reflect and reorder
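For PARAFAC, reordering must be applied to all modes at once, and a reflection in one mode has to be compensated in another so that each triple product a_r ∘ b_r ∘ c_r is unchanged. The sketch assumes mode A is random (not interpreted) and modes B and C are fixed and compared with the sample solution; pushing the sign changes into A and the congruence-based matching are illustrative choices, not necessarily the authors' procedure.

```python
import itertools
import numpy as np

def align_parafac(A, B, C, B_ref, C_ref):
    """Reorder and reflect PARAFAC components of a bootstrap solution (sketch).

    A: random mode; B, C: fixed modes matched to the sample solution.
    """
    R = B.shape[1]

    def cong(x, y):
        return x @ y / np.sqrt((x @ x) * (y @ y))

    best = max(itertools.permutations(range(R)),
               key=lambda p: sum(abs(cong(B[:, p[r]], B_ref[:, r]))
                                 + abs(cong(C[:, p[r]], C_ref[:, r])) for r in range(R)))
    A, B, C = A[:, list(best)], B[:, list(best)], C[:, list(best)]
    sb = np.where(np.sum(B * B_ref, axis=0) < 0, -1.0, 1.0)
    sc = np.where(np.sum(C * C_ref, axis=0) < 0, -1.0, 1.0)
    # Sign flips in B and C are compensated in A: each a_r b_r c_r is unchanged
    return A * (sb * sc), B * sb, C * sc

def tensor(A, B, C):
    return np.einsum("ir,jr,kr->ijk", A, B, C)

rng = np.random.default_rng(5)
A0 = rng.normal(size=(8, 2))
B_ref, C_ref = rng.normal(size=(6, 2)), rng.normal(size=(5, 2))
B = B_ref[:, [1, 0]] * np.array([1.0, -1.0])   # swapped and partly reflected
C = C_ref[:, [1, 0]] * np.array([-1.0, 1.0])
A_al, B_al, C_al = align_parafac(A0, B, C, B_ref, C_ref)
```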

36 Multi-block methods
Multilevel Component Analysis, for hierarchically ordered multivariate data. Examples:
– inhabitants within different countries
– measurement occasions within different subjects
– ...

37

38 National character: weighted PCA
(Dis)similarities between inhabitants within each country: Simultaneous Component Analysis
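A sketch of the split underlying this slide: after removing the grand mean, each observation is divided into its country mean (between part, 'national character') and its deviation from that mean (within part). Weighting the rows of the country means by the square root of the country size is an assumption; it makes the weighted PCA equivalent to a PCA of the between part with each mean repeated once per inhabitant. The SCA of the within part is not shown. Country labels and data are hypothetical.

```python
import numpy as np

def mca_split(X, groups):
    """Split hierarchical data into between- and within-group parts (sketch)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # remove the grand mean (offset)
    labels, inv, counts = np.unique(np.asarray(groups), return_inverse=True,
                                    return_counts=True)
    means = np.zeros((labels.size, X.shape[1]))
    np.add.at(means, inv, Xc)  # sum rows per group
    means /= counts[:, None]
    between = means[inv]       # each row replaced by its group mean
    within = Xc - between      # deviations from the group mean
    return means, counts, between, within

def weighted_pca_loadings(means, counts, Q):
    # PCA of the group means with rows weighted by sqrt(group size)
    W = np.sqrt(counts)[:, None] * means
    _, _, Vt = np.linalg.svd(W, full_matrices=False)
    return Vt[:Q].T

rng = np.random.default_rng(3)
groups = np.repeat(["NL", "BE", "DE"], [5, 7, 6])  # hypothetical countries
X = rng.normal(size=(18, 4))
means, counts, between, within = mca_split(X, groups)
V = weighted_pca_loadings(means, counts, Q=2)
```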

39 1. Sample drawn from which population(s)? Which mode(s) are considered fixed, and which random?
– inhabitants within different countries
– measurement occasions within different subjects
– pupils within classes

40 Another multi-block method
Principal Response Curve (PRC) model for longitudinal multivariate data obtained from objects within experimental conditions: 'How is the development over time influenced by the experimental conditions?'

41 First PRCs of invertebrate data

42 Experimental design:
– dose groups d = 0, …, D (d = 0: control)
– replicates i_d = 1, …, I_d within dose group d
– time points t = 1, …, T (T = 11)
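The PRC idea is to express, per time point, the difference between each treated dose group and the control as b_k c_dt (response weight times curve value). The sketch below is a simplified, unweighted rank-one least-squares version computed from group means by SVD; it is an assumption for illustration only and not the redundancy-analysis-based estimation of the original PRC method. Data are simulated to be exactly rank one.

```python
import numpy as np

def prc_rank_one(Y, dose, time):
    """Rank-one sketch of principal response curves (simplified, hypothetical).

    Y: (n_obs x K) responses; dose, time: labels per row; dose 0 = control.
    """
    doses, times = np.unique(dose), np.unique(time)
    K = Y.shape[1]
    M = np.zeros((doses.size, times.size, K))
    for i, d in enumerate(doses):
        for j, t in enumerate(times):
            M[i, j] = Y[(dose == d) & (time == t)].mean(axis=0)
    dev = (M[1:] - M[0]).reshape(-1, K)  # treated minus control, per time point
    U, s, Vt = np.linalg.svd(dev, full_matrices=False)
    b = Vt[0]                                                  # response weights b_k
    c = (U[:, 0] * s[0]).reshape(doses.size - 1, times.size)  # curves c_dt
    return c, b

rng = np.random.default_rng(4)
K, n_times, reps = 5, 4, 2
b_true = rng.normal(size=K)
c_true = np.array([[0.0, -0.5, -1.0, -0.6], [0.0, -1.0, -2.0, -1.2]])  # doses 1, 2
base = rng.normal(size=(n_times, K))  # control development over time
rows, dose_lab, time_lab = [], [], []
for d in range(3):
    for t in range(n_times):
        for _ in range(reps):
            effect = 0.0 if d == 0 else c_true[d - 1, t]
            rows.append(base[t] + effect * b_true)
            dose_lab.append(d)
            time_lab.append(t)
Y, dose_lab, time_lab = np.array(rows), np.array(dose_lab), np.array(time_lab)
c, b = prc_rank_one(Y, dose_lab, time_lab)
```

Only the product c_dt b_k is identified; the sign and scale split between c and b is arbitrary, which is again a uniqueness issue for the bootstrap.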

43 Results from a simulation experiment:
– The quality of the BCa confidence bands improves with decreasing replicate variation, with simpler error structures, and with increasing sample size...
– ...but even a sample size of 20 replicates per condition generally yields satisfactory results.

44 To conclude
How to estimate CIs in component analysis?
– Use the bootstrap!
– 5 key questions for the bootstrap procedure: uniqueness of the sample solution? which modes are random/fixed? ...
And what is the quality?
– Generally reasonable