
Dimension Reduction Methods

statistical methods that provide information about point scatters in multivariate space ("factor analytic methods")
–simplify complex relationships between cases and/or variables
–make it easier to recognize patterns

identify and describe 'dimensions' that underlie the input data
–may be more fundamental than those directly measured, and yet hidden from view
reduce the dimensionality of the research problem
–benefit = simplification; reduce the number of variables you have to worry about by identifying sets of variables with similar "behaviour"
How?

Basic Ideas
imagine a point scatter in multivariate space:
–the specific values of the numbers used to describe the variables don't matter
–we can do anything we want to the numbers, provided they don't distort the spatial relationships that exist among cases
some kinds of manipulations help us think about the shape of the scatter in more productive ways, as in the sketch below
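Not from the original slides — a minimal Python sketch of that idea, assuming numpy and scipy are available (the data and the 30-degree rotation are purely illustrative): centring and rotating change every coordinate, but leave the inter-case distances, and hence the shape of the scatter, intact.

import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))             # ten cases, two variables

# centre the scatter, then rotate it by 30 degrees
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Y = (X - X.mean(axis=0)) @ R.T

# every number changed, but no spatial relationship did
print(np.allclose(pdist(X), pdist(Y)))   # True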

imagine a two-dimensional scatter of points that shows a high degree of correlation…
[Figure: correlated point scatter on x and y axes, with the means x̄ and ȳ marked and an orthogonal regression line fitted through the cloud.]

Why bother?
more "efficient" description
–1st var. captures max. variance
–2nd var. captures the max. amount of residual variance, at right angles (orthogonal) to the first
the 1st var. may capture so much of the information content in the original data set that we can ignore the remaining axis
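Again a hedged sketch (synthetic correlated data, not from the slides): the orthogonal axes are the eigenvectors of the covariance matrix, and the eigenvalues show how lopsided the split of variance is.

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 0.9 * x + rng.normal(scale=0.3, size=500)   # highly correlated pair
X = np.column_stack([x, y])

# orthogonal axes of the scatter = eigenvectors of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
eigvals = eigvals[::-1]                          # descending

# e.g. ~[0.95, 0.05]: the 1st axis carries nearly all the variance,
# so the 2nd can often be ignored
print(eigvals / eigvals.sum())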

other advantages… you can score original cases (and variables) in new space, and plot them… spatial arrangements may reveal relationships that were hidden in higher dimension space may reveal subsets of variables based on correlations with new axes…

[Figure: cases plotted on the original axes, length vs. width.]

[Figure: the same cases plotted on the new axes, "size" vs. "shape".]

[Figure: plot of vessel categories (Storage/Cooking, Cooking, Ritual, candelero, Service?) arranged along axes contrasting PUBLIC vs. PRIVATE and DOMESTIC vs. RITUAL.]

Principal Components Analysis (PCA)
why:
–clarify relationships among variables
–clarify relationships among cases
when:
–significant correlations exist among variables
how:
–define new axes (components)
–examine correlation between axes and variables
–find scores of cases on new axes
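The same recipe as a minimal scikit-learn sketch (the random data and variable count are illustrative assumptions, not the slides' example):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))
X[:, 1] += X[:, 0]                       # build in a significant correlation

Z = StandardScaler().fit_transform(X)    # standardize -> correlation-matrix PCA
pca = PCA().fit(Z)

scores = pca.transform(Z)                # scores of cases on the new axes
# loadings: (approximate) correlations between original variables and components
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
print(loadings.round(2))
print(pca.explained_variance_ratio_.round(2))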

[Figure: correlations between variables x1–x4 and components pc1, pc2, ranging from r = 1 through r = 0 to r = −1; each such correlation is a component loading.]
eigenvalue: sum of all squared loadings on one component

eigenvalues
the sum of all eigenvalues = 100% of variance in original data
proportion accounted for by each eigenvalue = ev/n (n = # of vars.)
correlation matrix: variance in each variable = 1
–if an eigenvalue < 1, it explains less variance than one of the original variables
–but .7 may be a better threshold…
'scree plots' show the trade-off between loss of information and simplification
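A small self-contained Python sketch of those rules (synthetic data; the 1.0 and .7 cut-offs are the ones named above), assuming numpy and matplotlib:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 6))
X[:, 1] += X[:, 0]
X[:, 2] += X[:, 0]                       # one correlated block of variables

R = np.corrcoef(X, rowvar=False)         # correlation matrix: each variance = 1
ev = np.linalg.eigvalsh(R)[::-1]         # eigenvalues, descending; they sum to n

print(ev / ev.sum())                     # proportion per eigenvalue = ev / n
print(ev > 1.0)                          # eigenvalue-1 rule
print(ev > 0.7)                          # the looser .7 threshold

plt.plot(np.arange(1, len(ev) + 1), ev, "o-")   # scree plot
plt.xlabel("component"); plt.ylabel("eigenvalue")
plt.show()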

Mandara Region knife morphology

J. Yellen – San ethnoarchaeology (1977)
CAMP: the camp identification number (1–16)
LENGTH: the total number of days the camp was occupied
INDIVID: the number of individuals in the principal period of occupation of the camp (note that not all individuals were at the camp for the entire LENGTH of occupation)
FAMILY: the number of families occupying the site
ALS: the absolute limit of scatter; the total area (m²) over which debris was scattered
BONE: the number of animal bone fragments recovered from the site
PERS_DAY: the actual number of person-days of occupation (not the product of INDIVID*LENGTH, since not all individuals were at the camp for the entire time)

Correspondence Analysis (CA)
like a special case of PCA — transforms a table of numerical data into a graphic summary
hopefully a simplified, more interpretable display → deeper understanding of the fundamental relationships/structure inherent in the data
a map of basic relationships, with much of the "noise" eliminated
usually reduces the dimensionality of the data…

CA – basic ideas
derived from methods of contingency table analysis → most suited for analysis of categorical data: counts, presence-absence data
possibly better to use PCA for continuous (i.e., ratio) data
but CA makes no assumptions about the distribution of the input variables…

simultaneously R- and Q-mode analysis
derives two sets of eigenvalues and eigenvectors (→ CA axes; analogous to PCA components)
input data is scaled so that both sets of eigenvectors occupy very comparable spaces
can reasonably compare both variables and cases in the same plots

CA output
CA (factor) scores
–for both cases and variables
percentage of total inertia per axis
–like variance in PCA; relates to dispersal of points around an average value
–inertia accounted for by each axis → distortion in a graphic display
loadings
–correlations between rows/columns and axes
–which of the original entities are best accounted for by what axis?

"mass"
as in PCA, new axes maximize the spread of observations in rows / columns
–spread is measured in inertia, not variance
–based on a "chi-squared" distance, and is assessed separately for cases and variables (rows and columns)
contributions to the definition of CA axes are weighted on the basis of row/column totals
–ex: pottery counts from different assemblages; larger collections will have more influence than smaller ones
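All of the above fits in a short numpy sketch (the pottery-count table is invented for illustration; this is one standard way to compute CA, via an SVD of the chi-squared-standardized residuals):

import numpy as np

def correspondence_analysis(N):
    """Minimal CA of a contingency table N (rows = assemblages, cols = categories)."""
    P = N / N.sum()                        # correspondence matrix
    r = P.sum(axis=1)                      # row masses
    c = P.sum(axis=0)                      # column masses
    E = np.outer(r, c)                     # expected proportions under independence
    S = (P - E) / np.sqrt(E)               # chi-squared-scaled residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    F = (U * sv) / np.sqrt(r)[:, None]     # row (case) scores
    G = (Vt.T * sv) / np.sqrt(c)[:, None]  # column (variable) scores
    inertia = sv ** 2                      # inertia per CA axis
    return F, G, inertia / inertia.sum()

# three assemblages x four ceramic categories (illustrative counts)
N = np.array([[30., 10.,  5., 55.],
              [10., 40., 30., 20.],
              [ 5., 30., 50., 15.]])
F, G, share = correspondence_analysis(N)
print(share.round(3))                      # proportion of total inertia per axis

Note how larger rows (bigger collections) get larger masses r, so they pull harder on the axes — exactly the weighting described above.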

"Israeli political economic concerns"
residential codes:
As/Af (Asia or Africa)
Eu/Am (Europe or America)
Is/AA (Israel, dad lives in Asia or Africa)
Is/EA (Israel, dad lives in Europe or America)
Is/Is (Israel, dad lives in Israel)

"Israeli political economic concerns"
"worry" codes:
ENR: Enlisted relative
SAB: Sabotage
MIL: Military situation
POL: Political situation
ECO: Economic situation
OTH: Other
MTO: More than one worry
PER: Personal economics

Ksar Akil – Upper Palaeolithic, Lebanon

Data > Frequency > COUNT
Statistics > Data Reduction > CA

Multidimensional Scaling (MDS)
aim: define a low-dimension space that preserves the distances between cases in the original high-dimension space…
closely related to CA/PCA, but with an iterative location-shifting procedure…
–may produce a lower-dimension solution than CA/PCA
–not simultaneously Q and R mode…
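A hedged scikit-learn sketch of both flavours (random high-dimensional cases, purely illustrative): metric MDS tries to preserve the distances themselves, non-metric MDS only their rank order; the fitted stress_ value measures how badly either fails.

import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 8))             # 30 cases in 8 dimensions
D = squareform(pdist(X))                 # original inter-case distances

metric = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
nonmetric = MDS(n_components=2, metric=False,
                dissimilarity="precomputed", random_state=0)

Y_metric = metric.fit_transform(D)       # iteratively shifted 2-D configuration
Y_nonmetric = nonmetric.fit_transform(D)
print(metric.stress_, nonmetric.stress_) # lower stress = less distortion

# a Shepard diagram plots pdist(X) against pdist(Y_metric)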

[Figure: the same four points A–D as configured by 'non-metric' MDS and by 'metric' MDS.]

“Shepard Diagram”

Discriminant Analysis (DFA)
aims:
–calculate a function that maximizes the ability to discriminate among 2 or more groups, based on a set of descriptive variables
–assess variables in terms of their relative importance and relevance to discrimination
–classify new cases not included in the original analysis

[Figure: two groups of cases plotted on var A vs. var B, separated by a discriminant function.]

DFA
number of DFs = number of groups − 1
–each subsequent function is orthogonal to the last
–associated with eigenvalues that reflect how much 'work' each function does in discriminating between groups
stepwise vs. complete DFA
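A minimal sketch with scikit-learn (three invented groups measured on four variables; everything here is an illustrative assumption): with k = 3 groups you get k − 1 = 2 discriminant functions, a variance share per function, and a classifier for new cases.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(loc=m, size=(40, 4)) for m in (0.0, 1.5, 3.0)])
y = np.repeat([0, 1, 2], 40)             # three groups of 40 cases

lda = LinearDiscriminantAnalysis()
scores = lda.fit_transform(X, y)         # case scores on the discriminant functions
print(scores.shape)                      # (120, 2): groups - 1 functions
print(lda.explained_variance_ratio_)     # the 'work' each function does

# classify a new case not included in the original analysis
print(lda.predict(rng.normal(loc=1.5, size=(1, 4))))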

Figure 6.5: Factor structure coefficients: These values show the correlation between Miccaotli ceramic categories and the first two discriminant functions. Categories exhibiting high positive or negative values are the most important for discriminating among A-clusters.

Figure 6.4: Case scores calculated for the first two functions generated by discriminant analysis, using Miccaotli A-cluster membership as the grouping variable and posterior estimates of ceramic category proportions as discriminating variables.

Figure 6.6: Factor structure coefficients generated by four separate DFA analyses using binary grouping variables derived from Miccaotli A-cluster memberships. A single discriminant function is associated with each A-cluster.