Statistical Analysis: Regression & Correlation
Psyc 250, Winter 2008
Review: Types of Variables & Steps in Analysis
Variables & Statistical Tests

Variable type        Example                 Common stat method
Nominal by nominal   Blood type by gender    Chi-square
Scale by nominal     GPA by gender           T-test
                     GPA by major            Analysis of Variance
Scale by scale       Weight by height        Regression
                     GPA by SAT              Correlation
Evaluating a hypothesis
Step 1: What is the relationship in the sample?
Step 2: How confidently can one generalize from the sample to the universe from which it comes? (p < .05)
Evaluating a hypothesis

Variables                     Relationship in sample                  Statistical significance
2 nom. vars.                  Cross-tab / contingency table           “p value” from Chi Square
Scale dep. & 2-cat indep.     Means for each category                 “p value” from t-test
Scale dep. & 3+ cat indep.    Means for each category                 “p value” from ANOVA
2 scale vars.                 Regression line; correlation r & r^2    “p value” from reg or correlation
Relationships between Scale Variables: Regression, Correlation
Regression
Amount that a dependent variable increases (or decreases) for each unit increase in an independent variable.
Expressed as the equation for a line, y = m(x) + b: the “regression line”.
Interpret by the slope of the line: m.
(Or: interpret by “odds ratio” in “logistic regression”.)
Correlation
Strength of association of scale measures.
r ranges from -1 to 0 to +1:
+1 perfect positive correlation
-1 perfect negative correlation
0 no correlation
Interpret r in terms of variance.
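The three reference values of r can be checked with a short calculation. The sketch below uses the standard Pearson formula; the function name `pearson_r` and the small data lists are hypothetical illustrations, not class data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from each mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]                      # hypothetical values
r_pos = pearson_r(x, [2, 4, 6, 8, 10])   # points on a rising line
r_neg = pearson_r(x, [10, 8, 6, 4, 2])   # points on a falling line
r_zero = pearson_r(x, [2, 1, 3, 1, 2])   # no linear trend

print(r_pos, r_neg, r_zero)
```

Points that fall exactly on a rising or falling line give r of +1 or -1; the third list was chosen so that its deviations cancel and r comes out at zero.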
Mean & Variance
Survey of Class (n = 42)
Height, Mother’s height, Mother’s education, SAT, Estimate IQ, Well-being (7 pt. Likert), Weight, Father’s education, Family income, G.P.A., Health (7 pt. Likert)
Frequency Table for: HEIGHT (table values not preserved)
Valid cases 42; missing cases 0
Descriptive Statistics for: HEIGHT (values not preserved)
Columns: Mean, Std Dev, Variance, Range, Minimum, Maximum, Valid N
Highlighted: mean
Variance
$$ s^2 = \frac{\sum_i (x_i - \text{Mean})^2}{N} $$
Standard Deviation
$$ s = \sqrt{\text{variance}} = \sqrt{s^2} $$
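As a minimal sketch of the two formulas, the calculation below divides by N, as on the slide. Many statistics packages divide by N - 1 when estimating from a sample, so their printed variance can differ slightly. The `heights` list is hypothetical, not the class survey data.

```python
import math

def population_variance(values):
    """Average squared distance from the mean (divides by N)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

heights = [62, 64, 66, 68, 70]  # hypothetical heights, not the survey data

variance = population_variance(heights)
std_dev = math.sqrt(variance)   # standard deviation is the square root of the variance

print("variance:", variance)
print("standard deviation:", round(std_dev, 2))
```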
Frequency Table for: WEIGHT (table values not preserved)
Valid cases 42; missing cases 0
Descriptive Statistics for: WEIGHT (values not preserved)
Columns: Mean, Std Dev, Variance, Range, Minimum, Maximum, Valid N
Highlighted: mean
Relationship of weight & height: Regression Analysis
“Least Squares” Regression Line
Dependent = (B)(Independent) + constant
weight = (B)(height) + constant
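For a single independent variable, B and the constant have a closed-form least-squares solution. The sketch below applies the textbook formulas; the function name `least_squares_line` and the data are hypothetical and are not meant to reproduce the class results.

```python
def least_squares_line(xs, ys):
    """Return (B, constant) for the line that minimizes squared vertical distances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # B = sum of cross-products of deviations / sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    constant = mean_y - slope * mean_x
    return slope, constant

heights = [60, 62, 64, 66, 68]       # hypothetical independent variable
weights = [110, 120, 125, 135, 140]  # hypothetical dependent variable

slope, constant = least_squares_line(heights, weights)
print("weight =", slope, "* height +", constant)
```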
Regression line
Regression: WEIGHT on HEIGHT (output values not preserved)
Summary statistics: Multiple R, R Square, Adjusted R Square, Standard Error
Analysis of Variance: DF, Sum of Squares, Mean Square for Regression and Residual; F and Signif F
Variables in the Equation (HEIGHT, Constant): B, SE B, Beta, T, Sig T
[ Equation: Weight = 3.3 (height) - 73 ]
Regression line W = 3.3 H - 73
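To see what the slope means in practice, the fitted equation can be evaluated at a few heights. This is only an illustration: the slides do not state the units of height and weight, and the function name `predicted_weight` and the heights chosen are assumptions.

```python
def predicted_weight(height):
    """Weight predicted by the fitted line W = 3.3 H - 73."""
    return 3.3 * height - 73

# Hypothetical heights, in whatever unit the survey used
for h in (60, 65, 70):
    print(h, round(predicted_weight(h), 1))

# Each one-unit increase in height raises the predicted weight by the slope, 3.3
step = predicted_weight(66) - predicted_weight(65)
print("change per unit of height:", round(step, 1))
```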
Strength of Relationship (“Goodness of Fit”): Correlation
How well does the regression line “fit” the data?
Variance of WEIGHT = 454
[ Figure: regression line and mean ]
Correlation: “Goodness of Fit”
Variance (average of squared distances from the mean) = 454
“Least squares” (average of squared distances from the regression line) = 295
454 - 295 = 159
159 / 454 = .35
Variance is reduced 35% by calculating from the regression line.
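The arithmetic behind the 35% figure can be written out step by step. The two inputs are the values given on the slide; the variable names are illustrative only.

```python
variance_about_mean = 454  # average squared distance from the mean of WEIGHT
variance_about_line = 295  # average squared distance from the regression line

# How much of the original variance the regression line removes
reduction = variance_about_mean - variance_about_line
proportion_explained = reduction / variance_about_mean

print("reduction:", reduction)
print("proportion explained:", round(proportion_explained, 2))
```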
r^2 = % of variance in WEIGHT “explained” by HEIGHT
Correlation coefficient = r
Correlation: HEIGHT with WEIGHT (coefficient values not preserved)
n = 42 for each pair; HEIGHT with WEIGHT: P = .000
r = .59
r^2 = .35
HEIGHT “explains” 35% of variance in WEIGHT
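One standard way to connect r to a p value is to convert it to a t statistic with n - 2 degrees of freedom. The sketch below does this for r = .59 and n = 42. It is not necessarily the computation the course software performed, and the function name `t_from_r` is hypothetical.

```python
import math

def t_from_r(r, n):
    """t statistic for testing whether a correlation differs from zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r = 0.59
n = 42
t = t_from_r(r, n)

# Two-tailed critical value for p < .05 with 40 degrees of freedom,
# taken from a standard t table
critical_t = 2.021

print("t =", round(t, 2))
print("significant at .05:", abs(t) > critical_t)
```

A t value this far beyond the critical value is consistent with the very small p value reported for the HEIGHT and WEIGHT correlation.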
Sentence & G.P.A.
Regression: form of relationship
Correlation: strength of relationship
p value: statistical significance
G. P. A.
Length of Sentence (simulated data)
Scatterplot: Sentence on G.P.A.
Regression Coefficients Sentence = -3.5 G.P.A. + 18
“Least Squares” Regression Line: Sent = -3.5 GPA + 18
Correlation: Sentence & G.P.A.
Interpreting Correlations
r = -.22
p = .31
r^2 = .05
G.P.A. “explains” 5% of the variance in length of sentence
Write Results
“A regression analysis found that each one-unit increase in GPA was associated with a 3.5-month decrease in sentence length, but the correlation was weak (r = -.22) and not statistically significant (p = .31).”
Multiple Regression
Problem: relationship of weight and calorie consumption
Both weight and calorie consumption are related to height
Need to “control for” height
Multiple Regression
[ Figure: regression line and mean ]
Regress weight residuals (dependent variable) on height (independent variable) Statistically “controls” for height: removes effect or “confound” of height.
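One common way to control for height with residuals is sketched below, though it may not be the exact procedure used in class. Height is first regressed out of both weight and calorie consumption, and the weight residuals are then regressed on the calorie residuals. All names and data are hypothetical; the weights are constructed without noise so that the true calorie effect (0.02) is known in advance.

```python
def fit_line(xs, ys):
    """Least-squares slope and constant for ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def residuals(xs, ys):
    """Part of ys that the regression line on xs does not account for."""
    slope, constant = fit_line(xs, ys)
    return [y - (slope * x + constant) for x, y in zip(xs, ys)]

# Hypothetical, noise-free data: weight depends on height AND calories
heights = [60, 62, 64, 66, 68, 70]
calories = [1800, 2200, 2000, 2600, 2400, 2900]
weights = [3 * h + 0.02 * c - 100 for h, c in zip(heights, calories)]

# Naive slope: weight on calories, with the height confound still present
naive_slope, _ = fit_line(calories, weights)

# Controlled slope: remove height from both variables, then regress the residuals
weight_resid = residuals(heights, weights)
calorie_resid = residuals(heights, calories)
controlled_slope, _ = fit_line(calorie_resid, weight_resid)

print("naive slope:", round(naive_slope, 4))
print("slope controlling for height:", round(controlled_slope, 4))
```

In this constructed example the naive slope overstates the calorie effect because taller people both weigh more and eat more. The residual-based slope recovers the 0.02 built into the data.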