An Introduction to Classification
Classification vs. Prediction
Classification & ANOVA
Classification Cutoffs, Errors, etc.
Multivariate Classification & Linear Discriminant Function

An Introduction to Classification

Let's start by reviewing what "prediction" is…
Using a person's scores on one or more variables to make a "best guess" of that person's score on another variable (the value of which isn't known).

Classification is very similar…
Using a person's scores on one or more variables to make a "best guess" of the category to which that person belongs (when the category membership isn't known).

The difference is a language "convention":
-- if the "unknown variable" is quantitative, it's called prediction
-- if the "unknown variable" is qualitative, it's called classification

How does classification work???

Let's start with an "old friend" -- ANOVA. In its usual form…
-- there are two qualitatively different IV groups, naturally occurring or "created" by manipulation
-- a quantitative DV
-- H0: Mean G1 = Mean G2

Rejecting H0: tells us…
-- there is a relationship between the grouping variable and the DV
-- the groups represent populations with different means on the DV
-- knowing what group a person is in allows us to guess their DV score -- the mean of that group

Let's review in a little more detail… Remember the formula for the ANOVA F-test:

        variation between groups      size of the mean difference
  F  =  ------------------------  =  ----------------------------
        variation within groups       variation within groups

In words -- F compares the mean difference to the variability around each of those means.

Which of the following will produce the larger F-test? The two data sets have the same means, mean difference & N, but the difference is…

Data #1 (n = 50):  group 1 mean = 30, std dev = 5    group 2 mean = 50, std dev = 5
Data #2 (n = 50):  group 1 mean = 30, std dev = 15   group 2 mean = 50, std dev = 15
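
To make the comparison concrete, here is a minimal sketch (plain Python; the helper name is ours) that computes F from the summary statistics above:

```python
# Two-group one-way ANOVA F computed from summary statistics.
# With equal n and df_between = k - 1 = 1, MS_between is n times the
# squared deviations of the group means from the grand mean, and
# MS_within is the pooled variance.

def f_two_groups(n, m1, sd1, m2, sd2):
    grand = (m1 + m2) / 2                                      # equal n
    ms_between = n * ((m1 - grand) ** 2 + (m2 - grand) ** 2)   # df = 1
    ms_within = (sd1 ** 2 + sd2 ** 2) / 2                      # pooled
    return ms_between / ms_within

print(f_two_groups(50, 30, 5, 50, 5))    # Data #1 -> F = 400.0
print(f_two_groups(50, 30, 15, 50, 15))  # Data #2 -> F ~ 44.4
```

Same mean difference, but the larger within-group variability in Data #2 cuts F by roughly a factor of nine.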

Graphical depictions of these data show that the size of F relates to the amount of overlap between the groups.

Notice: since all the distributions have n = 50, those with more variability are not as tall -- all 4 distributions have the same area.

[Figure: overlapping group distributions for Data #1 (larger F = more consistent group difference) and Data #2 (smaller F = less consistent group difference)]

Let's consider that last one "in reverse"… Could knowing the person's score help tell us what qualitative group they are in? …that is, could it "classify" them into the proper group?

An example… Research has revealed a statistical relationship between the number of times a person laughs out loud each day (quant variable) and whether they are depressed or schizophrenic (qual grouping variable).

Mean laughs Depressed = 4.0    Mean laughs Schizophrenic = 7.0    F(1,34) = 7.00, p < .05

A new (as yet undiagnosed) patient laughs 11 times the first day -- what's your "classification", depressed or schizophrenic?
Another patient laughs 1 time -- your "classification"?
A third new patient laughs 5 times -- your "classification"?

Why were the first two "gimmes" and the last one not?

When the groups have a mean difference, a score beyond one of the group means is more likely to belong to that group than to the other group (unless the stds are huge):
-- someone who laughs more than the mean for the schizophrenic group is more likely to be schizophrenic than to be depressed
-- someone who laughs less than the mean of the depressed group is more likely to be depressed than to be schizophrenic

Even when the groups have a mean difference, a score between the group means is harder to correctly classify (unless the stds are minuscule):
-- someone with 5-6 laughs is hardest to classify, because several depressed and several schizophrenic folks have this score

Here's a graphical depiction of the clinical data...

[Figure: dot plot of daily laughs for 18 depressed patients (x's, mean laughs = 4.0) and 18 schizophrenic patients (o's, mean laughs = 7.0), plotted along the "laughs" axis]

Looking at this, it's easy to see why we would be...
-- confident in an assignment based on 11 laughs: no depressed patients had a score that high
-- confident in an assignment based on 1 laugh: no schizophrenic patients had a score that low
-- lacking confidence in an assignment based on 5 or 6 laughs: several depressed & schizophrenic patients had 5 or 6

The process of prediction required two things…
-- that there be a linear relationship between the predictor and the criterion (reject H0: r = 0)
-- a formula (y' = bx + a) to "translate" a predictor score into an estimate of the criterion variable score

Similarly, the process of classification requires two things…
-- a statistical relationship between the predictor (DV) & the criterion (reject H0: M1 = M2)
-- a cutoff to "translate" a person's score on the predictor (DV) into an assignment to one group or the other

Where should we place the cutoff??? Wherever gives us the most accurate classification!!

[Figure repeated: dot plot of laughs for the 18 depressed patients (mean = 4.0) and 18 schizophrenic patients (mean = 7.0)]

When your groups are the same size and your group score distributions are symmetrical, things are pretty easy… place the cutoff at a position equidistant from the group means:
-- here, the cutoff would be equidistant between 4.0 and 7.0, i.e., at 5.5
-- anyone who laughs more than 5.5 times would be "assigned" as schizophrenic
-- anyone who laughs fewer than 5.5 times would be "assigned" as depressed
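
A minimal sketch of this midpoint rule (the function name is ours; the means and the example scores are the slide's):

```python
# Midpoint-cutoff classification for two equal-sized, symmetric groups:
# the cutoff sits halfway between the group means.

def classify(score, mean_dep=4.0, mean_schiz=7.0):
    cutoff = (mean_dep + mean_schiz) / 2   # equidistant -> 5.5
    return "schizophrenic" if score > cutoff else "depressed"

for laughs in (11, 1, 5):
    print(laughs, "->", classify(laughs))
# 11 -> schizophrenic, 1 -> depressed, 5 -> depressed (but near the cutoff)
```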

[Figure repeated: dot plot of laughs for the depressed and schizophrenic patients]

We can assess the accuracy of the assignments by building a "reclassification table":

                          Actual Diagnosis
  Assignment         Depressed    Schizophrenic
  Depressed             14              4
  Schizophrenic          4             14

Reclassification accuracy would be 28/36 = 77.78%.
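
Here is a sketch of building such a table in Python; since the slide reports only the counts, the demo scores below are made up purely to illustrate the mechanics:

```python
# Apply a cutoff to each group's scores, tally (assigned, actual) pairs,
# and report reclassification accuracy.

def reclassification_accuracy(scores_by_group, cutoff=5.5):
    """scores_by_group: dict mapping actual group -> list of scores."""
    table = {}
    correct = total = 0
    for actual, scores in scores_by_group.items():
        for s in scores:
            assigned = "schizophrenic" if s > cutoff else "depressed"
            table[(assigned, actual)] = table.get((assigned, actual), 0) + 1
            correct += assigned == actual
            total += 1
    return table, correct / total

demo = {"depressed": [4, 5, 6], "schizophrenic": [7, 5, 8]}  # invented
table, acc = reclassification_accuracy(demo)
print(table, acc)
```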

When considering simple regression/prediction, we wanted to be able to compare two potential predictors to determine if one would be better -- we used Steiger's Z-test of H0: r(y,x1) = r(y,x2)

How do we compare two potential classification variables to determine if one is a better basis for accurate classification? We do it the same way (with one intermediate step).

As you might remember from ANOVA, we can express the "effect size" associated with any F as r (or η -- same thing):

  r = √[ F / (F + df_error) ]

So, to compare two potential classification variables:
-- compute the ANOVA for each variable (on the same sample)
-- convert each F to r
-- compare the r values using Steiger's Z-test
-- remember that you'll need the correlation between the two classification variables, r(x1,x2)
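
The F-to-r conversion is a one-liner; a sketch using the values from the example on the next slide:

```python
import math

# Convert an ANOVA F (with its error df) to the effect-size r:
# r = sqrt(F / (F + df_error)).

def f_to_r(F, df_error):
    return math.sqrt(F / (F + df_error))

print(round(f_to_r(7.0, 34), 3))   # 0.413 -- # of laughs
print(round(f_to_r(4.0, 34), 3))   # 0.324 -- depression scale
```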

An example: which provides better classification between schizophrenia vs. depression -- # times laughing out loud, or score on a "depression scale"?

For laughing out loud, F(1,34) = 7.00 translates to
  r = √[ F / (F + df_error) ] = √[ 7.0 / (7.0 + 34) ] = .413

For the depression scale, F(1,34) = 4.00 translates to
  r = √[ F / (F + df_error) ] = √[ 4.0 / (4.0 + 34) ] = .324

# laughs and depression scale scores are correlated r = .35

So, using Steiger's Z-test: Z = .495, p > .05 -- there is no advantage of using one of these predictors over the other; the apparent difference in r is not greater than chance.
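
For reference, here is a sketch of one common version of Steiger's (1980) Z for dependent correlations (the slide doesn't show the formula, so check this variant against your own reference); it reproduces the slide's result up to rounding:

```python
import math

# Steiger's Z-test for two dependent correlations sharing one variable
# (here, the group variable y): H0: r(y,x1) = r(y,x2).
# Hotelling-Williams/Steiger variant.

def steiger_z(r_y_x1, r_y_x2, r_x1_x2, n):
    z1, z2 = math.atanh(r_y_x1), math.atanh(r_y_x2)
    rbar = (r_y_x1 + r_y_x2) / 2
    # covariance term for the two z-transformed correlations
    psi = (r_x1_x2 * (1 - 2 * rbar**2)
           - 0.5 * rbar**2 * (1 - 2 * rbar**2 - r_x1_x2**2))
    s = psi / (1 - rbar**2) ** 2
    return (z1 - z2) * math.sqrt((n - 3) / (2 - 2 * s))

print(round(steiger_z(.413, .324, .35, 36), 3))
# ~0.496 here; the slide reports Z = .495 from rounded inputs
```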

When considering simple regression/prediction, we wanted to be able to compare the same correlation in two different populations -- we used Fisher's Z-test of H0: r(y,x1) pop1 = r(y,x1) pop2

How do we compare the same classification variable in two populations to determine if one population would have more accurate assignments?

As you might remember from ANOVA, we can express the "effect size" associated with any F as r (or η -- same thing):

  r = √[ F / (F + df_error) ]   … for each population

So, to compare classification across the two populations:
-- compute the ANOVA for each sample
-- convert each F to r
-- compare the r values using Fisher's Z-test

An example: does # times laughing out loud discriminate between schizophrenia vs. depression better for in-patients or for out-patients?

For in-patients, F(1,48) = 12.00 translates to
  r = √[ F / (F + df_error) ] = √[ 12.0 / (12.0 + 48) ] = .447
Fisher's Z transformation of r ≈ .45 is .485

For out-patients, F(1,88) = 2.00 translates to
  r = √[ F / (F + df_error) ] = √[ 2.0 / (2.0 + 88) ] = .149

Fisher's Z-test gives Z = 1.86; Z(p=.05) = 1.96, so retain H0: -- same discriminability in both populations.
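
A sketch of the Fisher's Z-test computation (the sample sizes n1 = 50 and n2 = 90 are inferred from the error dfs of the two-group ANOVAs):

```python
import math

# Fisher's Z-test for correlations from two independent samples.

def fisher_z_test(r1, n1, r2, n2):
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / se

# Slide's example: in-patients r=.447 (n=50), out-patients r=.149 (n=90)
print(round(fisher_z_test(.447, 50, .149, 90), 2))
# ~1.83 here; the slide reports 1.86 after rounding r to .45
```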

Getting ready for ldf…

Multiple regression works better than simple regression because a y' based on multiple predictors is a better estimate of y than a y' based on a single predictor. Similarly, classification based on multiple predictors will do better than classification based on a single predictor.

But how do we incorporate multiple predictors into a classification?? Like with multiple regression, the variables (Xs) are each given a weighting and a constant is added:

  ldf = b1*X1 + b2*X2 + b3*X3 + a

The composite variable is called a linear discriminant function:
-- function -- constructed from other variables
-- linear -- a linear combination of linearly weighted variables
-- discriminant -- weights are chosen so that the resulting composite has the maximum possible F-test between the groups
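
Computing an ldf score is just evaluating that linear composite; in this sketch the weights b and constant a are made-up numbers purely for illustration (in practice they come from the discriminant analysis itself, described next):

```python
import numpy as np

b = np.array([0.8, -0.3, 0.5])        # hypothetical weights b1..b3
a = -2.0                              # hypothetical constant
X = np.array([[6.0, 2.0, 1.5],        # one row of X1..X3 per person
              [3.0, 5.0, 0.5]])

ldf_scores = X @ b + a                # ldf = b1*X1 + b2*X2 + b3*X3 + a
print(ldf_scores)
```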

So, how does this all work???

We start with a grouping variable and a set of quantitative (or binary) predictors (what would be DVs if we were doing ANOVAs).

Using an algorithm much like multiple regression, the bivariate relationship of each predictor to the grouping variable & the collinearities among the predictors are all taken into account, and the weights for the ldf formula are derived.
-- remember, this ldf will have the largest possible F value between the groups

A cutoff value for the ldf is chosen.
-- the cutoff is chosen (more fancy computation) to maximize % correct reclassification

To "use" the formula:
-- a person's values on the variables are put into the formula & their ldf score is computed
-- their score is compared to the cutoff, and they are assigned to one group or the other
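
As an illustration (the slide doesn't name any software), scikit-learn's LinearDiscriminantAnalysis carries out these steps -- deriving the weights and assigning cases -- on whatever data you hand it; the data below are random:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two fake groups of 30, three predictors each -- illustration only.
X = np.vstack([rng.normal(0, 1, (30, 3)), rng.normal(1, 1, (30, 3))])
y = np.array([0] * 30 + [1] * 30)

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)
print(lda.coef_, lda.intercept_)          # the derived b weights & constant
accuracy = (lda.predict(X) == y).mean()   # % correct reclassification
print(accuracy)
```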

"ldf Questions" (very parallel to "regression questions")

ANOVA -- can the groups be discriminated using the quant variable?
-- Does one quant variable work better than another to discriminate the groups? (Steiger's Z-test to compare the r for the two quant variables)
-- Does a quant variable better discriminate between groups for one population than another? (Fisher's Z-test to compare the r from the two populations)

ldf -- can the groups be discriminated using a combination of variables?
-- Comparison of nested ldf models (X²-change test & McNemar's X²)
-- Comparison of non-nested ldf models (McNemar's X²)
-- Comparison of models across populations (R² & McNemar's X²)
-- Comparison of models across classification rules (McNemar's X²)
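
For the McNemar comparisons listed above, here is a sketch using statsmodels; the 2x2 counts are invented for illustration:

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# McNemar's test compares two classification rules applied to the SAME
# cases: the table cross-tabulates whether each rule classified a case
# correctly, and the test focuses on the disagreement cells.

table = np.array([[20, 9],    # rows: rule A correct / wrong
                  [3, 4]])    # cols: rule B correct / wrong
result = mcnemar(table, exact=True)   # exact binomial version
print(result.statistic, result.pvalue)
```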