Principal Component Analysis (PCA)

Principal component analysis (PCA) creates new variables (components) that are uncorrelated linear combinations of the original variables. PCA is used to simplify the data structure while still accounting for as much of the total variation in the original data as possible.

Simple Case: Stock Market Data
Can the data be reduced to a single linear combination of the original variables without losing much information?

3 Steps for PCA
1) Calculate the correlation matrix.
2) Calculate the eigenvectors of the correlation matrix.
3) Multiply the standardized original data by the eigenvectors.
The first principal component (PC1) is a linear combination of the standardized data in which the first eigenvector supplies the weights.
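The three steps above can be sketched in Python with NumPy. This is only a minimal illustration, not the code used in the course: the function name `pca_three_steps` and the random data are assumptions made for the example.

```python
import numpy as np

def pca_three_steps(data):
    """PCA via the correlation matrix; `data` has one row per observation."""
    data = np.asarray(data, dtype=float)
    # Standardize each column: subtract its mean, divide by its standard deviation.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # Step 1: correlation matrix of the original variables.
    corr = np.corrcoef(data, rowvar=False)
    # Step 2: eigenvalues and eigenvectors (eigh is meant for symmetric matrices).
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    # eigh returns ascending order; reorder so that PC1 comes first.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # Step 3: multiply the standardized data by the eigenvectors.
    scores = z @ eigenvectors
    return eigenvalues, eigenvectors, scores

rng = np.random.default_rng(0)
# Made-up data: three variables sharing a common signal plus noise.
signal = rng.normal(size=(200, 1))
data = signal + 0.5 * rng.normal(size=(200, 3))

eigenvalues, eigenvectors, scores = pca_three_steps(data)
print(eigenvalues / eigenvalues.sum())  # share of total variation per component
```

The first column of `scores` is PC1. The columns of `scores` are uncorrelated with each other, and the variance of each column equals the matching eigenvalue.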

Figure: Standardized closing values of the 2006 Dow index vs. the 2006 S&P 500.

Figure: Direction of the first principal component (the first eigenvector).

Figure: Rotating the data to the first principal component. PC1 is a linear combination of the standardized data in which the first eigenvector supplies the weights.
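For the two-variable case, the question of how much information one component keeps has a simple answer. The correlation matrix of two standardized variables with correlation r has eigenvalues 1 + r and 1 - r, so the larger component accounts for (1 + |r|)/2 of the total variation. The sketch below checks this numerically. The two series are made-up random walks standing in for the indices, not real Dow or S&P 500 values.

```python
import numpy as np

rng = np.random.default_rng(1)
# Made-up stand-ins for two index series (NOT real market data):
# two random walks driven by a shared shock plus small independent shocks.
shared = rng.normal(size=250)
x = np.cumsum(shared + 0.3 * rng.normal(size=250))
y = np.cumsum(shared + 0.3 * rng.normal(size=250))

data = np.column_stack([x, y])
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
r = np.corrcoef(z, rowvar=False)[0, 1]

eigenvalues, eigenvectors = np.linalg.eigh(np.corrcoef(z, rowvar=False))
first = eigenvectors[:, -1]   # eigh sorts ascending, so the last column is PC1
pc1 = z @ first               # the data projected onto the PC1 direction
explained = eigenvalues[-1] / eigenvalues.sum()

print(f"correlation r = {r:.3f}")
print(f"PC1 weights = {first}")
print(f"PC1 share of variation = {explained:.3f}")
```

Both PC1 weights have magnitude 1/sqrt(2), so with two standardized variables PC1 weights them equally. The closer the correlation is to 1 in absolute value, the less is lost by keeping only PC1.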

LAB: Principal Component Analysis in Environmental Studies
The Debate Over Statistical Techniques Used in the Derivation of the Global Warming Hockey Stick Graph
Figure 1: The instrumental record of global average temperatures.

The Hockey Stick Graph
Figure 2: Mann’s 1998 Hockey Stick Graph

In 1998, Mann, Bradley, and Hughes (MBH) used a modified PCA to reduce 70 series of proxy data to one principal component (PC1). MBH’s graph was widely used as evidence of global warming. In 2003, McIntyre and McKitrick (MM) claimed that the graph was not correct, but they had significant trouble getting published. In 2005, MM published a simulation study showing that MBH’s modified PCA technique would consistently produce a hockey stick shape. In 2006, Ed Wegman provided an ad hoc committee report to Congress on the “Hockey Stick Global Climate Reconstruction”.

MBH used data from […], with 581 observations for each of the 70 proxy variables (tree ring data). Each variable would typically be standardized by subtracting its mean and dividing by its standard deviation. MBH used a ‘decentered’ standardization instead. What are the mean and standard deviation of a ‘decentered’ variable? How will this affect principal component analysis?
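The two questions above can be explored numerically. The formulas did not survive in this transcript, so the sketch below rests on an assumption: that ‘decentering’ means centering and scaling each series with the mean and standard deviation of only a late sub-period rather than of the whole series. The function names, the made-up series, and the window length of 80 are all hypothetical.

```python
import numpy as np

def standardize(x):
    """Usual z-score: center on the full-series mean, scale by the full-series sd."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

def decentered_standardize(x, window):
    """ASSUMED form of 'decentering': center and scale using only the last
    `window` observations (a hypothetical parameter), not the whole series."""
    x = np.asarray(x, dtype=float)
    recent = x[-window:]
    return (x - recent.mean()) / recent.std(ddof=1)

rng = np.random.default_rng(2)
x = np.cumsum(rng.normal(size=581))  # made-up wandering series, 581 observations

z = standardize(x)
d = decentered_standardize(x, window=80)

print("standardized, full series:", z.mean(), z.std(ddof=1))
print("decentered, full series:  ", d.mean(), d.std(ddof=1))
print("decentered, last 80 only: ", d[-80:].mean(), d[-80:].std(ddof=1))
```

Under this assumption, a decentered variable has mean 0 and standard deviation 1 only inside the window. Over the full series its mean is generally not 0 and its standard deviation is generally not 1, so a PCA on such data no longer works from the ordinary correlation matrix.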

Simulation Study of the Hockey Stick Graph
Questions 1 and 2: Generate a matrix of random AR(1) data. AR(1) data follow the general pattern of tree ring growth in many trees.
Question 3: Standardize the data matrix.
Question 4: Perform PCA on a random AR(1) matrix with 70 series.
Question 5: Write a function that repeats question 4 ten times.
Question 6: Write a function that repeats question 5, but uses a ‘decentered’ standardization.
Does it look like ‘hockey stick’ shaped graphs occur more often with decentered data? Can we conduct a more thorough simulation study?
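One possible way to set up questions 1 through 6 is sketched below. It is not the lab's own solution. Several pieces are assumptions: the AR(1) coefficient `phi=0.9`, the window of 80 observations, the form of the decentered standardization (centering and scaling on the last `window` observations), and the `blade_size` measure, which is a hypothetical stand-in for judging a hockey stick shape by eye. PC1 is taken from the singular value decomposition of the standardized matrix, which for ordinary standardized data gives the same PC1 as the correlation-matrix eigenvector, up to sign.

```python
import numpy as np

def ar1_matrix(n_obs, n_series, phi, rng):
    """Each column is an independent AR(1) series: x[t] = phi * x[t-1] + noise."""
    noise = rng.normal(size=(n_obs, n_series))
    x = np.zeros((n_obs, n_series))
    x[0] = noise[0]
    for t in range(1, n_obs):
        x[t] = phi * x[t - 1] + noise[t]
    return x

def first_pc(z):
    """PC1 scores from the SVD of an already standardized matrix
    (no re-centering, so a decentered matrix stays decentered)."""
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[0]

def blade_size(pc1, window):
    """Hypothetical 'hockey stick' measure: distance between the mean of the
    last `window` values and the overall mean, in overall sd units."""
    return abs(pc1[-window:].mean() - pc1.mean()) / pc1.std(ddof=1)

def simulate(n_runs=10, n_obs=581, n_series=70, phi=0.9, window=80, seed=0):
    rng = np.random.default_rng(seed)
    centered, decentered = [], []
    for _ in range(n_runs):
        x = ar1_matrix(n_obs, n_series, phi, rng)
        # Ordinary standardization (question 3).
        z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
        # ASSUMED decentered standardization (question 6).
        recent = x[-window:]
        d = (x - recent.mean(axis=0)) / recent.std(axis=0, ddof=1)
        centered.append(blade_size(first_pc(z), window))
        decentered.append(blade_size(first_pc(d), window))
    return np.array(centered), np.array(decentered)

centered, decentered = simulate()
print("centered:  ", centered.round(2))
print("decentered:", decentered.round(2))
```

Comparing the two printed arrays, or plotting each PC1 against time, gives a first impression. Raising `n_runs` turns this into the more thorough simulation study the slide asks about.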

The Hockey Stick Graph
1) Why do you think that the IPCC and supporters of the Kyoto accord prominently featured Mann’s (i.e., MBH’s) graph?
2) This paper shows reasons to believe that MBH’s graph was developed inappropriately. Does this mean that there is no global warming?
3) State specifically how you would expect proponents and opponents to respond to MM’s and MBH’s work for their own political or personal benefit.
4) In 2006, the Chairman of the Committee on Energy and Commerce and the Chairman of the Subcommittee on Oversight and Investigations requested an ad hoc committee, chaired by Edward Wegman, to review the controversy between MM and MBH. This committee claimed there was improper use of principal component analysis in MBH’s work. Wegman’s report hasn’t been widely publicized. In addition, according to Wegman[i], he has been personally slandered and called a patsy for the Republican Party, even though he has stated publicly that he voted for Al Gore in […]. Why do you believe this material hasn’t been made more public? Should inaccurate mathematical details remain hidden if doing so results in a better environment?
5) Other scientists have essentially stated that while Mann’s statistical analysis was incorrect, Mann’s conclusion (global warming) is correct, and the focus should be on global warming and not the technical details[ii]. Do you agree with this assessment?
6) Wegman’s report and MM [ p. 8] describe the difficulty of obtaining the original data (and algorithm) from MBH and Nature (where MBH’s article was published). Under a court subpoena, MBH have shared the raw data; however, to date, they have refused to share the code used in conducting Mann’s analysis, and no one has been able to perfectly replicate his results. Do you feel that researchers and journals should be required to share data after an article has been published? Does your opinion change if the data collection was paid for by the US government?
7) Do you believe that research involving new or advanced statistical techniques should be reviewed by statisticians before it is published?
8) What can be done to ensure proper information is appropriately communicated to the public? What are the consequences of inaccurate data being highly publicized?

Proposed Course
Week 1: Review of Statistics 101
  Lab: Making connections between the two sample t-test, ANOVA, and regression
Weeks 2-3: Randomization Tests/Nonparametric Tests
  Activity: Westvaco discrimination case
Weeks 4-6: Multiple Regression
  Intro Lab: How much is your car worth?
  Lab: Population control and economic growth
Weeks 7-9: Designing an Experiment
  Intro Lab: Weight gain in pigs
  Lab: Perfection (reaction time tests)
Weeks 10-12: Principal Component Analysis
  Intro Lab: Stock market values
  Lab: Global warming and the hockey stick graph
Weeks 13-14: Final Projects