Theoretical and empiral rationale for using unrestricted PCA solutions

Slides:



Advertisements
Similar presentations
Helen Gaeta, David Friedman, & Gregory Hunt Cognitive Electrophysiology Laboratory New York State Psychiatric Institute Differential Effects of Stimulus.
Advertisements

Factor Analysis and Principal Components Removing Redundancies and Finding Hidden Variables.
Principal Component Analysis (PCA) for Clustering Gene Expression Data K. Y. Yeung and W. L. Ruzzo.
1 Multivariate Statistics ESM 206, 5/17/05. 2 WHAT IS MULTIVARIATE STATISTICS? A collection of techniques to help us understand patterns in and make predictions.
Discrim Continued Psy 524 Andrew Ainsworth. Types of Discriminant Function Analysis They are the same as the types of multiple regression Direct Discrim.
CHAPTER 19 Correspondence Analysis From: McCune, B. & J. B. Grace Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon.
A quick introduction to the analysis of questionnaire data John Richardson.
Method of Soil Analysis 1. 5 Geostatistics Introduction 1. 5
Factor Analysis Psy 524 Ainsworth.
Principal Component Analysis (PCA) for Clustering Gene Expression Data K. Y. Yeung and W. L. Ruzzo.
ERP DATA ACQUISITION & PREPROCESSING EEG Acquisition: 256 scalp sites; vertex recording reference (Geodesic Sensor Net)..01 Hz to 100 Hz analogue filter;
Canonical Correlation Analysis and Related Techniques Simon Mason International Research Institute for Climate Prediction The Earth Institute of Columbia.
Applied Quantitative Analysis and Practices LECTURE#11 By Dr. Osman Sadiq Paracha.
Chapter 9 Factor Analysis
Applied Quantitative Analysis and Practices
All slides © S. J. Luck, except as indicated in the notes sections of individual slides Slides may be used for nonprofit educational purposes if this copyright.
Thursday AM  Presentation of yesterday’s results  Factor analysis  A conceptual introduction to: Structural equation models Structural equation models.
Math 5364/66 Notes Principal Components and Factor Analysis in SAS Jesse Crawford Department of Mathematics Tarleton State University.
Marketing Research Aaker, Kumar, Day and Leone Tenth Edition Instructor’s Presentation Slides 1.
Introduction Can you read the following paragraph? Can we derive meaning from words even if they are distorted by intermixing words with numbers? Perea,
Multivariate Analysis and Data Reduction. Multivariate Analysis Multivariate analysis tries to find patterns and relationships among multiple dependent.
Education 795 Class Notes Factor Analysis Note set 6.
Principal components analysis (PCA) as a tool for identifying EEG frequency bands: I. Methodological considerations and preliminary findings Jürgen Kayser,
Advanced Statistics Factor Analysis, I. Introduction Factor analysis is a statistical technique about the relation between: (a)observed variables (X i.
A direct comparison of Geodesic Sensor Net (128-channel) and conventional (30-channel) ERPs in tonal and phonetic oddball tasks Jürgen Kayser, Craig E.
All slides © S. J. Luck, except as indicated in the notes sections of individual slides Slides may be used for nonprofit educational purposes if this copyright.
FACTOR ANALYSIS.  The basic objective of Factor Analysis is data reduction or structure detection.  The purpose of data reduction is to remove redundant.
Chapter 14 EXPLORATORY FACTOR ANALYSIS. Exploratory Factor Analysis  Statistical technique for dealing with multiple variables  Many variables are reduced.
Methods for Dummies M/EEG Analysis: Contrasts, Inferences and Source Localisation Diana Omigie Stjepana Kovac.
Effects of Word Concreteness and Spacing on EFL Vocabulary Acquisition 吴翼飞 (南京工业大学,外国语言文学学院,江苏 南京211816) Introduction Vocabulary acquisition is of great.
Descriptive Statistics The means for all but the C 3 features exhibit a significant difference between both classes. On the other hand, the variances for.
Effect of cognitive-behavioral therapy on brain activity related to stimulus-response conflict processing in Gilles de la Tourette Syndrome Lori Baltazar1,4.
Unsupervised Learning
Audio-spinal reflex response in human limb muscles
Cortical evoked potentials to an auditory illusion: Binaural beats
Thomas Andrillon, Sid Kouider, Trevor Agus, Daniel Pressnitzer 
Attention Components and Creative Potential: An ERP Exploration
Portfolio Risk Lecture 14.
Neurofeedback of beta frequencies:
A High-Density EEG investigation of the Misinformation Effect: Differentiating between True and False Memories John E. Kiat & Robert F. Belli Department.
Brain Electrophysiological Signal Processing: Preprocessing
Making Use of Associations Tests
Factor analysis Advanced Quantitative Research Methods
Elementary Statistics
Findings for Healthy Adults and Depressed Patients
Interacting Roles of Attention and Visual Salience in V4
Interpreting Principal Components
Measuring Change in Two-Wave Studies
Signal and Noise in fMRI
Measuring latent variables
Adrian G. Fischer, Markus Ullsperger  Neuron 
Thomas Andrillon, Sid Kouider, Trevor Agus, Daniel Pressnitzer 
Volume 66, Issue 6, Pages (June 2010)
6.1 Introduction to Chi-Square Space
Dynamic Causal Modelling for M/EEG
Principal Component Analysis
Volume 49, Issue 3, Pages (February 2006)
Volume 28, Issue 5, Pages e3 (March 2018)
Machine Learning for Visual Scene Classification with EEG Data
Binocular Rivalry Requires Visual Attention
Athanassios G Siapas, Matthew A Wilson  Neuron 
Making Use of Associations Tests
Michael J. Frank, Brion S. Woroch, Tim Curran  Neuron 
M. Kezunovic (P.I.) S. S. Luo D. Ristanovic Texas A&M University
Canonical Correlation Analysis and Related Techniques
Exploratory Factor Analysis. Factor Analysis: The Measurement Model D1D1 D8D8 D7D7 D6D6 D5D5 D4D4 D3D3 D2D2 F1F1 F2F2.
Measuring latent variables
Theoretical rationale and empirical evaluation
Sequential Effects on the Oddball P300 in Young and Older Adults
Unsupervised Learning
Presentation transcript:

Theoretical and empiral rationale for using unrestricted PCA solutions to identify and measure ERP components Jürgen Kayser and Craig E. Tenke Department of Biopsychology, New York State Psychiatric Institute, New York http://psychophysiology.cpmc.columbia.edu Although principal components analysis (PCA) is widely used to determine “data-driven” ERP components, it is unclear if and how specific methodological choices may affect factor extraction. The effects of three variations, i.e., 1) Type of association matrix (correlation / covariance), 2) Form of Varimax rotation (scaled / unscaled), and 3) Number of components extracted and rotated, were considered and systematically investigated when applying temporal PCA (tPCA) to ERP data. Introduction Real ERP data, collected from healthy, right-handed adults during a visual half-field study (see Figure 4), were repeatedly submitted to tPCA (BMDP-4M; Dixon 1992 BMDP Statistical Software Manual (Vol. 2). Berkeley, CA: University of California Press). Columns of the data matrix represented time (110 sample points from –100 to 1,000 ms), and rows consisted of subjects (16), conditions (4), and electrode sites (30). tPCAs were performed for three extraction / rotation criteria: 1) Covariance matrix / Varimax rotation on raw data 2) Correlation matrix / Varimax rotation 3) Covariance matrix / Varimax rotation on standardized variables 110 tPCAs were computed for each extraction / rotation condition, by systematically increasing the number of components to be extracted from 1 to 110 (= number of variables) Methods This is the default in SPSS 6.1 for the covariance matrix! The usefulness of the extracted factors can be evaluated by specific know-ledge about the variance distribution of ERPs, which are characterized by the removal of baseline activity. The variance should be small for sample points before and shortly after stimulus onset (across and within cases), but large near the end of the recording epoch and at ERP component peaks. Whereas a covariance matrix preserves this information, it is lost by a cor-relation matrix that assigns equal weights to each sample point, yielding the possibility that small but systematic variations may form a factor. These considerations were evaluated and confirmed with simulated ERP data (see Figures 1–3). Theoretical Rationale VARIABLES = 128. CASES = 1920. / VARIABLE USE = 11 to 120. / FACTOR METHOD = PCA. NUMB = {Factors to be extracted}. {Extraction Method} / ROTATE METHOD = VMAX. / Figure 4. Grand average ERPs for 16 healthy adults for neutral and negative visual stimuli at 30 recording sites, averaged across hemifield of presentation (250 ms exposure in a visual half-field paradigm). Data from Kayser et al 2000 Int J Psychophysiol 36(3):211-236. A) B) Extraction Method: FORM = COVA. Extraction Method: FORM = CORR. or FORM = COVA. LOAD = CORR. A) B) A) B) 1 3 .. 12 3 .. 12 2 3 4 20 .. 109 20 .. 109 Figure 1. A) Invariant waveform template (128 sample points, 100 samples/sec, 200 ms baseline) used to generate two pseudo ERP data sets for 30 electrode ‘sites’ and 20 ‘subjects.’ A ‘topography’ was introduced by scaling the template for selected sites with a factor of 0.5 (Fp1/2), 0.8 (F7/8, F3/4, Fz), or 1.2 (C3/4, Cz). For the second data set, random noise (range ±0.25 µV, uniform distribution) was added to each sample point. B) ERP ‘group’ average of noise data set. 5 Figure 5. Sequences of factor loadings of the covariance–based solutions (A) and overlaid loadings of Factor 3 for restricted (12) and liberal (20) extraction criteria (B). Figure 6. Sequences of factor loadings of the correlation–based solutions (A) and overlaid loadings of Factor 3 for restricted (12) and liberal (20) extraction criteria (B). 6 7 Factors to be extracted Pseudo ERP Data Pseudo ERP Data + Noise 8 9 10 A) B) 20 30 Figure 2. A) Pseudo ERPs at four electrode sites (Fp1, Fz, Cz, Pz). A constant, low-level voltage offset (-0.01 µV) was systematically applied to the pre-stimulus baseline (-200 .. -50 ms) at every other electrode (e.g., see Pz in inset). B) Pseudo ERPs as in A), but with random noise added. Note that the low-level offset at Pz is lost (see inset). 109 820 440 260 170 640 330 130 560 50 90 870 430 250 70 - 170 10 50 90 120 630 Factor Loadings Limiting the number of components changed the morphology of some components considerably (see Figures 5B and 6B). However, more liberal or unlimited extraction criteria did not degrade or change high-variance components. Instead, their inter-pretability was improved by more distinctive time courses with narrow and unambiguous peaks (i.e., low secondary loadings; see Figures 5A and 6A). Some physiologically meaningful ERP components that are small in amplitude and/or topographically localized (e.g., P1) were found to have a PCA counterpart (e.g., Factor 130; see Figure 8A), that were lost with restricted solutions due to their low overall variance contributions. Covariance-based factors had more distinct time courses (i.e., lower secondary loadings) than the corresponding correlation-based factors (Figures 5B and 6B), thereby allowing a better interpretation of their electrophysiological relevance. Correlation-based solutions were likely to produce artificial factors that merely reflected small but systematic variations when the ERP waveform intersected the baseline (i.e., zero; cf. Factors –70, 10, and 50 in Figures 6A and 8B). Scaling covariance-based PCA factors before rotation approxi-mated correlation-based solutions, and ultimately yielded the same coefficients (factor loadings) when all components were rotated (see Figure 6A). The same systematic approach using auditory ‘oddball’ ERP data yielded comparable results for a different set of task-specific PCA factors. Results A) B) 820 440 260 170 640 330 130 560 50 90 10 120 630 870 430 250 70 - P1 factor 2 41 78 Explained variance [%] Number of factors extracted 100.0 48.1 0.0 14.4 94.5 45.8 0.1 1.1 0.1 1.1 1.0 C) D) A) B) Pseudo ERP Data Pseudo ERP Data + Noise Figure 3. Time course of factor loadings for the first PCA factors extracted from the covariance or correlation matrix for pseudo ERP data with (B) and without noise (A). The covariance-based PCA extracted a component (factor 1), that accu-rately reflected the introduced variance shape for both data sets. The correlation-based PCA only produced a component (factor 1) that indicated the direction, but not the size of variations from zero (i.e., from baseline). Similarly, the constant low-level offset was disproportionally reflected in another component (factor 2) for the noise-free data. Factor extractions of the unscaled covariance matrix are preferable to correlation- / scaled covariance-based PCA solutions. For ERP data, there is no reason to restrict the number of factors to be extracted. Conclusions Figure 7. Overlaid factor loadings and factor score topographies of the first 10 covariance- (A, C) or correlation-based (B, D) PCA components extracted from the unrestricted (109) solution, identified by peak latencies of factor loadings.