Descriptions

Description Correlation – simply finding the relationship between two scores ○ Both the magnitude (how strong or how big) ○ And direction (positive / negative)

Description  Whereas regression seeks to use one of the variables as the predictor Therefore you have an X variable (IV) - predictor And Y variable (DV) - criterion

Description  Predictor X – variables – more flexible than ANOVA Can be any combination of variables, continuous, Likert, categorical  Dependent Y-variables – usually continuous, but you can predict categorical variables Better with discriminant or log regression

Description  Still not causal design, unless you manipulate the X (IV) variable  However, sometimes very obvious which variable would be predictive Smoking predicts cancer

Research Questions

 Usually you want to know the relationship between the IVs and DV and the importance of each IV  OR Control for some variables' variance and then see if other IVs add any additional prediction Compare sets of IVs on how predictive they are (which set is better)

Research Questions  How good is the equation? Is it better than chance? Or better than using the mean to predict scores?

Research Questions  Importance of IVs Which IVs are the most important? Which contribute the most prediction to the equation?

Research Questions  Adding IVs For example, PTSD scores are predictive of alcohol use After we control for these scores, do meaning in life scores help predict alcohol use?

Research Questions  Non-linear relationships can be assessed and determined So, you can use X 2 to help with curvilinear relationships that you might see when data screening

Research Questions  Controlling for other sets of IVs Using demographics to control for unequal groups or additional variance over being people  Comparing sets of IVs Using several IVs together to be predictive over another set of IVs

Research Questions  Making an equation to predict new people’s scores After you have shown that your IVs are predictive, using those scores to assess new people’s performance Entrance exams for school, military, etc

Equation  Y-hat = A + B1X1 + B2X2 + … Y hat = predicted value for each participant A = constant, value added to each score to predict participants zero (y- intercept)

Equation  Y-hat = A + B1X1 + B2X2 + … B = coefficient ○ Holding all other variables constant for every one unit increase in X there is a B unit increase in Y ○ Slope for that X variable given all others are zero

Equation  Standardized Equation Y-hat = βx1 + βx2 … Beta = standardized B (or z-score B if you like) For each 1 standard deviation increase in X, there is a B standard deviation increase in Y ○ Difficult to interpret ○ BUT! B is standardized to -1 to 1 so you can treat it as if it were r (which means you can tell direction and magnitude)

Equation  Pearson product – moment correlation = R R is the correlation between y and y-hat R 2 = variance accounted for in DV by all the IVs (not just one like r, but ALL of them).

SR  Semipartial correlations = sr = part in SPSS Unique contribution of IV to R2 for those IVs Increase in proportion of explained Y variance when X is added to the equation A/DV variance DV Variance IV 1 IV 2 A

PR  Partial correlation = pr = partial in SPSS Proportion in variance in Y not explained by other predictors but this X only A/B Pr > sr DV Variance IV 1 IV 2 A B

ANOVA = Regression  ANOVA = regression with discrete variables However, you cannot easily create an ANOVA from a regression You must convert continuous variables into discrete variables, which causes you to lose variance More power with regression

Simple (SLR)  SLR involves only one IV and one DV. It’s called simple because there’s only ONE thing predicting. In this case, beta = r.

Multiple (MLR)  MLR uses several IVs and only one DV. You can use a mix of variables – continuous, categorical, Likert, etc. You can use MLR to figure out which IVs are the most important. ○ 3 types of MLR

Simultaneous/Standard  All of the variables are entered “at once”  Each variable assessed as if it were the last variable entered This “controls” for the other IVs, as we talked about the interpretation of B. Evaluates sr > 0?

Simultaneous/Standard  If you have two highly correlated IVs the one with the biggest sr gets all the variance  Therefore the other IV will get very little variance associated with it and look unimportant

Sequential/Hierarchical  IVs enter the regression equation in an order specified by the researcher  First IV is basically tested against r (since there’s nothing else in the equation it gets all the variance)  Next IVs are tested against pr (they only get the left over variance)

Sequential/Hierarchical  What order? Assigned by theoretical importance Or you can control for nuisance variables in the first step

Sequential/Hierarchical  Using SETS of IVs instead of individuals So, say you have a group of IVs that are super highly correlated but you don’t know how to combine them or want to eliminate them.  Instead you will process each step as a SET and you don’t care about each individual predictor

Stepwise/Statistical  Entry into the equation is solely based on statistical relationship and nothing to do with theory or your experiment

Stepwise/Statistical  Forward – biggest IV is added first, then each IV is added as long as it accounts for enough variance  Backward – all are entered in the equation at first, and then each one is removed if it doesn’t account for enough variance  Stepwise – mix between the two (adds them but then may later delete them if they are no longer important).

Number of People  Ratio of cases to IVs If you have less cases than IVs you will get a perfect solution (aka account for all the variance in the DV) But that doesn’t mean anything…

Number of People  Ratio of cases to IVs Gpower = for how many cases given alpha, power, predictors, etc. Rules of thumb = more than (K) (number of IVs) Or K (for testing importance of predictors)

Number of People  How many people? However…you can have too many people. Any correlation or predictor will be significant with very large N ○ Practical versus statistical significance

Missing Data  Continuous data – linear trend at point, mean replace, etc.  Categorical data – best to leave it out because you can’t guess at it.

Outliers  Now, since IVs are continuous, we want to make sure there are not outliers on both the IVs and DVs Mahalanobis

Outliers  Leverage – how much influence over the slope a point has Cut off rule of thumb = (2K+2)/N  Discrepancy – how far away from other data points a point is (no influence)  Cooks – influence – combination of both leverage and discrepancy Cut off rule of thumb = 4/(N-K-1)

Multicollinearity  If IVs are too highly correlated there are several issues SPSS may not run SPSS picks which variable to go first depending on the type of analysis  Check – bivariate correlation table of IVs (you want it to be correlated with DV!)

Normal/Linear  Normality – we want our IVs and DVs to be normally distributed Residual Histogram  Linearity – relationships between IV and DV should be linear or you will do a special X2 Normality PP Plot

Homogeneity/Homoscedasticity  Homogeneity – you want the IVs/DVs to have equal variances Residual Plot (equal spread up and down - raining)  Homoscedasticity – you want the errors to be spread evenly across the values of the other variables Residual Plot (equal spread up and down across the bottom – megaphones)

Theoretical Assumption  Independence of errors You need to know that the scores of the first person tested are not affecting the scores of the last person tested Mud on a scale

SLR  Data set 1  IV Books – number of books people read Attend – attendance for class  DV Grade – final grade in the class

SLR  Research Question: Does the number of books predict final grade in the course? Does attendance predict final grade in the course?

MLR - Simultaneous  Research Question Do books and attendance both predict final course grade? ○ Overall – together? ○ Individual predictors?

MLR – Hierarchical  Research question: What predicts how well people take care of their cars? We want to first control for demographics (age, gender) And then use extroversion to predict how well people take care of their cars.

MLR Hierarchical  So after controlling for demographics, does extroversion predict?

Interactions  Dummy Coding  Types Two categorical One categorical, One continuous Two continuous

Dummy Coding  A way to do ANOVA in regression If you have two levels, simply type them in as 0 and 1 If you have more than two levels, you need to enter each separately

Dummy Coding  More than two levels: You will need Levels – 1 columns F – value tells you the overall main effect B value – compares that group to the group coded as all zeros

Dummy Coding  After you enter each variable separately, then enter them as a set (or one simultaneous) regression  The significance of the overall model will tell if you if the main effect is significant  B gives you differences between groups (two levels)

Dummy Coding  How many friends do people have? This example is from ANOVA. IV: Health condition – excellent, fair or poor. DV: Number of Friends.

Dummy Coding  Since we have three groups or levels, we’ll need to recode this variable into 2 variables. One for excellent One for fair The blanks for poor.

Dummy Coding  Why not three? Because that would be repetitive.

Interactions  Interactions – well we automatically test for interactions in ANOVA, why not in regression? In regression an interaction says that there are differences in the slope of the line predicting Y from one IV depending on the level of the other IV

Interactions  Nominal variable interactions: So we have two categorical predictors. Example – create interaction term ○ Testing environment by Learning Environment.

Interactions - Nominal  Now that we’ve created our interaction terms, we can test them using a hierarchical regression Step one – main effects Step two – main effects and interactions

Interactions - Nominal  Now we examine step 1 for main effects  Step two for interactions You ignore the main effects in Step 2

Interactions - Nominal  What does all that mean?! After a significant ANOVA, you do a post hoc correct? Simple slopes – post hoc analyses for interactions in regression ○ These are “harder to get” than an ANOVA, but there are less “tests” to run so technically more powerful/less type 1 error

Interactions - Nominal  You will write out the equation and figure out the slopes/means/picture for each condition combination.  Equation = (learning) (testing) (learning X testing) 

Interactions - Nominal  Now we’ll fill in the equation for all the combinations. Learning (0 or 1) Testing (0 or 1) Interaction (0 or 1 depending on the combination).

Interaction - Nominal [2 × 2 table: learning environment (Dry = 0, Wet = 1) crossed with testing environment (Dry = 0, Wet = 1); the predicted value for each cell appeared on the slide]

Interactions - Mix  Data Set 4 IVs Events – number of events attended Status – low (0) versus high (1) DVs Stress levels

How to  Create interaction Transform > compute > multiply  Run regression as before Step 1 – main effects Step 2 – main effects and interaction

Interactions - Mix  LOW status, look at events slope. B =.121, β =.52, t(57) =3.94, p<.001, indicating that low status people feel more stress as the number of events they attend increases.  HIGH status, look at events slope.  B =.02, β =.10, t(57) =.55, p.=58, indicating that high status people feel the same amount of stress no matter how many events they attend.

Interaction - Mix [figure: predicted stress at low versus high events, plotted separately for low status and high status; the values appeared on the slide]

Interactions - continuous  Most likely combination since you are running a regression Create interaction term first (multiply them together) Books * Attendance Interaction to predict grades.

Interactions – continuous  Pick ONE variable to examine. Let’s go with attendance. You can get the AVERAGE slope for attendance and books. Since we picked attendance, we will look at the slope for books, β=-.532, t(37) = , p=.24. So at average attendance, readings books do not increase your grade.  Let’s create hi and lo terms for ONE of the variables. AttendanceHI, AttendanceLO AttendanceHI by Books, AttendanceLO by Books.

Interactions - continuous  Now, we can’t just use 1 and 0 for different groups So we have to create “hi” and “lo” groups for one variable This theory is also backwards…for the hi group, you subtract 1 SD, for the lo group you add 1SD Basically you are bringing them up or down to the mean

Interaction

Mediation

 Mediation occurs when the relationship between an X variable and a Y variable is eliminated or lowered when an additional Mediator variable is added to the equation.

Mediation Steps  Baron and Kenny Step 1 – use X to predict Y to get c pathway. Step 2 – use X to predict M to get a pathway. Step 3 – use X and M to predict Y to get b pathway. Step 4 – use the same regression to look at the c’ pathway.  Sobel test

Mediation Steps