Regression & Correlation
Review: Types of Variables & Steps in Analysis
Types of Variables
Nominal / Categorical: each value is a distinct category [gender, blood type, city]
Scale / Interval: linear measure, same interval between each value [age, weight, IQ, GPA, SAT, income]
Ordinal: ranking, unequal intervals between values [Likert scale, preference ranking]
Variables & Statistical Tests
Variable Type | Example | Common Stat Method
Nominal by nominal | Blood type by gender | Chi-square
Scale by nominal | GPA by gender; GPA by major | T-test; Analysis of Variance
Scale by scale | Weight by height; GPA by SAT | Regression; Correlation
Evaluating a hypothesis
Step 1: What is the relationship in the sample?
Step 2: How confidently can one generalize from the sample to the universe from which it comes?
Evaluating a hypothesis
Variables | Relationship in Sample | Statistical Significance
2 ordinal vars. | Cross-tab / contingency table | “p value” from Chi Square
Scale dep. & 2-cat indep. | Means for each category | “p value” from t-test
Scale dep. & 3+ cat indep. | Means for each category | “p value” from ANOVA
2 scale vars. | Regression line; Correlation | “p value” from reg or correlation
Relationships between Scale Variables
Regression
Correlation
Regression
Amount that a dependent variable increases (or decreases) for each unit increase in an independent variable.
Expressed as the equation for a line, y = mx + b: the “regression line”
Interpret by the slope of the line: m
(Or: interpret by the “odds ratio” in “logistic regression”)
Correlation
Strength of association of scale measures
r ranges from -1 through 0 to +1
+1: perfect positive correlation
-1: perfect negative correlation
0: no correlation
Interpret r in terms of variance
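The range of r can be illustrated with a minimal sketch that computes the Pearson correlation coefficient by hand. The function name `pearson_r` and the three small data sets are made up for illustration; they are not the class survey data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from each mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values chosen to hit the three reference points
x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])  # y rises in step with x
r_negative = pearson_r(x, [10, 8, 6, 4, 2])  # y falls in step with x
r_zero = pearson_r(x, [4, 1, 0, 1, 4])       # no linear trend at all

print(r_positive, r_negative, r_zero)
```

The third data set is deliberately curved: y depends on x, yet r is zero, because r only measures how well a straight line describes the relationship.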
Mean & Variance
Survey of Class (n = 42)
Height; Weight
Mother’s height
Mother’s education; Father’s education
Family income
SAT; G.P.A.
Estimate IQ
Well-being (7 pt. Likert); Health (7 pt. Likert)
Frequency Table for: HEIGHT (table values not recoverable here)
Columns: Value Label, Value, Frequency, Percent, Valid Percent, Cum Percent
Valid cases: 42; Missing cases: 0
Descriptive Statistics for: HEIGHT (table values not recoverable here)
Columns: Mean, Std Dev, Variance, Range, Minimum, Maximum, Valid N
Label on slide: mean
Variance
$s^2 = \frac{\sum_i (x_i - \text{Mean})^2}{N}$
Standard Deviation
$s = \sqrt{\text{variance}}$
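As a rough sketch of the two formulas, the code below computes the variance and standard deviation by dividing by N, as on the slide. The `heights` list is made up for illustration and is not the class data. Note that many statistics packages divide by N - 1 when reporting a sample variance, so their output can differ slightly.

```python
import math

def variance(values):
    """Average squared distance from the mean (divides by N)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

def std_dev(values):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(values))

# Hypothetical heights, for illustration only
heights = [62, 64, 66, 68, 70]

print(variance(heights))  # mean is 66; squared distances sum to 40; 40 / 5
print(std_dev(heights))
```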
Frequency Table and Descriptive Statistics for: WEIGHT (table values not recoverable here)
Frequency columns: Value Label, Value, Frequency, Percent, Valid Percent, Cum Percent
Descriptive columns: Mean, Std Dev, Variance, Range, Minimum, Maximum, Valid N
Valid cases: 42; Missing cases: 0
Label on slide: mean
Relationship of weight & height: Regression Analysis
“Least Squares” Regression Line
Dependent = ( B ) ( Independent ) + constant
weight = ( B ) ( height ) + constant
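One way to see where B and the constant come from is the standard closed-form least-squares solution for a single independent variable. This is a minimal sketch: the function name `least_squares` and the sample points are assumptions for illustration, not the output of the statistics package used for the slides.

```python
def least_squares(xs, ys):
    """Return (B, constant) for the line that minimizes squared vertical distances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by variation in x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    constant = mean_y - b * mean_x
    return b, constant

# Hypothetical points that lie exactly on y = 2x + 1
b, constant = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(b, constant)
```

With real data the points scatter around the line, and B is the slope that makes the total squared distance from the line as small as possible.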
Regression line
Regression: WEIGHT on HEIGHT (output values not recoverable here)
Summary: Multiple R, R Square, Adjusted R Square, Standard Error
Analysis of Variance: DF, Sum of Squares, Mean Square (Regression, Residual); F, Signif F
Variables in the Equation: B, SE B, Beta, T, Sig T (HEIGHT, Constant)
Equation: Weight = 3.3 ( height ) - 73
Regression line W = 3.3 H - 73
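To make the slope concrete, this small sketch plugs heights into the fitted equation W = 3.3 H - 73. The function name `predicted_weight` and the example heights of 70 and 71 are assumptions for illustration; the slides do not state the units of height or weight.

```python
def predicted_weight(height):
    """Predicted weight from the fitted regression line W = 3.3 H - 73."""
    return 3.3 * height - 73

# Hypothetical heights, one unit apart
w_70 = predicted_weight(70)
w_71 = predicted_weight(71)

print(round(w_70, 1))         # predicted weight at height 70
print(round(w_71 - w_70, 1))  # change in predicted weight per unit of height
```

The difference between the two predictions equals the slope, which is the sense in which each additional unit of height goes with 3.3 additional units of predicted weight.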
Strength of Relationship “Goodness of Fit”: Correlation How well does the regression line “fit” the data?
Descriptive Statistics for: WEIGHT (table values not recoverable here)
Variance of WEIGHT = 454
Figure labels: regression line, mean
Correlation: “Goodness of Fit”
Variance (average of squared distances from the mean) = 454
“Least squares” (average of squared distances from the regression line) = 295
454 - 295 = 159
159 / 454 = .35
Variance is reduced 35% by calculating from the regression line
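The arithmetic on this slide can be checked in a few lines. The values 454 and 295 come from the slide; the variable names are illustrative.

```python
import math

total_variance = 454     # average squared distance from the mean
residual_variance = 295  # average squared distance from the regression line

# Share of the variance removed by measuring from the line instead of the mean
explained = total_variance - residual_variance
r_squared = explained / total_variance

# r is the square root of r squared; its sign follows the slope (positive here)
r = math.sqrt(r_squared)

print(explained, round(r_squared, 2), round(r, 2))
```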
r² = % of variance in WEIGHT “explained” by HEIGHT
Correlation coefficient = r
Correlation: HEIGHT with WEIGHT (coefficients not recoverable here)
n = 42 for each pair
HEIGHT with WEIGHT: P = .000
r = .59
r² = .35
HEIGHT “explains” 35% of variance in WEIGHT
Sentence & G.P.A.
Regression: form of relationship
Correlation: strength of relationship
p value: statistical significance
G. P. A.
Length of Sentence
Scatterplot: Sentence on G.P.A.
Regression Coefficients
Sentence = -3.5 ( G.P.A. ) + 18
Sent = -3.5 GPA + 18 “Least Squares” Regression Line
Correlation: Sentence & G.P.A.
Interpreting Correlations
r = -.22
p = .31
r² = .05
G.P.A. “explains” 5% of the variance in length of sentence