Download presentation
Presentation is loading. Please wait.
Published byBlaise Taylor Modified over 8 years ago
1
Measurement Math DeShon - 2006
2
Univariate Descriptives Mean Mean Variance, standard deviation Variance, standard deviation Skew & Kurtosis Skew & Kurtosis If normal distribution, mean and SD are sufficient statistics If normal distribution, mean and SD are sufficient statistics
3
Normal Distribution
4
Univariate Probability Functions
5
Bivariate Descriptives Mean and SD of each variable and the correlation (ρ) between them are sufficient statistics for a bivariate normal distribution Mean and SD of each variable and the correlation (ρ) between them are sufficient statistics for a bivariate normal distribution Distributions are abstractions or models Distributions are abstractions or models Used to simplify Used to simplify Useful to the extent the assumptions of the model are met Useful to the extent the assumptions of the model are met
6
2D – Ellipse or Scatterplot Galton’s Original Graph
7
3D Probability Density
8
Covariance Covariance is the extent to which two variables co-vary from their respective means Covariance is the extent to which two variables co-vary from their respective means CaseXYx=X-3y =Y-4xy 112-2 4 223 1 3330 0 4683412 Sum 17 Cov(X,Y) = 17/(4-1) = 5.667
9
Covariance Covariance ranges from negative to positive infinity Covariance ranges from negative to positive infinity Variance - Covariance matrix Variance - Covariance matrix Variance is the covariance of a variable with itself Variance is the covariance of a variable with itself
10
Correlation Covariance is an unbounded statistic Covariance is an unbounded statistic Standardize the covariance with the standard deviations Standardize the covariance with the standard deviations -1 ≤ r ≤ 1 -1 ≤ r ≤ 1
11
Correlation Matrix Table 1. Descriptive Statistics for the Variables Correlations VariablesMeans.d12345678910 Self-rated cog ability4.89.86.81 Self-enhancement4.03.85.34.79 Individualism4.92.89.40.41.78 Horiz individualism5.191.05.41.25.82.80 Vert individualism4.651.11.25.42.84.37.72 Collectivism5.05.74.21.11.08.06.72 Age21.001.70.12.01.17.13.16.01-- Gender1.63.49-.16-.06-.11.07-.11-.02-.01-- Academic seniority2.171.01.17.07.22.23.14.06.45.12-- Actual cog ability10.711.60.17-.02.08.11.03.07-.02-.07.12-- Notes: N = 608; gender was coded 1 for male and 2 for female. Reliabilities (Coefficient alpha) are on the diagonal.
12
Coefficient of Determination r = percentage of variance in Y accounted for by X r 2 = percentage of variance in Y accounted for by X Ranges from 0 to 1 (positive only) Ranges from 0 to 1 (positive only) This number is a meaningful proportion This number is a meaningful proportion
13
Other measures of association Point Biserial Correlation Point Biserial Correlation Biserial Correlation Biserial Correlation Tetrachoric Correlation Tetrachoric Correlation binary variables binary variables Polychoric Correlation Polychoric Correlation ordinal variables ordinal variables Odds Ratio Odds Ratio binary variables binary variables
14
Point Biserial Correlation Used when one variable is a natural (real) dichotomy (two categories) and the other variable is interval or continuous Used when one variable is a natural (real) dichotomy (two categories) and the other variable is interval or continuous Just a normal correlation between a continuous and a dichotomous variable Just a normal correlation between a continuous and a dichotomous variable
15
Biserial Correlation Biserial Correlation When one variable is an artificial dichotomy (two categories) and the criterion variable is interval or continuous When one variable is an artificial dichotomy (two categories) and the criterion variable is interval or continuous
16
Tetrachoric Correlation Estimates what the correlation between two binary variables would be if you could measure variables on a continuous scale. Estimates what the correlation between two binary variables would be if you could measure variables on a continuous scale. Example: difficulty walking up 10 steps and difficulty lifting 10 lbs. Example: difficulty walking up 10 steps and difficulty lifting 10 lbs. Difficulty Walking Up 10 Steps
17
Tetrachoric Correlation Assumes that both “traits” are normally distributed Assumes that both “traits” are normally distributed Correlation, r, measures how narrow the ellipse is. Correlation, r, measures how narrow the ellipse is. a, b, c, d are the proportions in each quadrant a, b, c, d are the proportions in each quadrant
18
Tetrachoric Correlation For α = ad/bc, Approximation 1: Approximation 2 (Digby):
19
Tetrachoric Correlation Example: Example: Tetrachoric correlation = 0.61 Pearson correlation = 0.41 o Assumes threshold is the same across people o Strong assumption that underlying quantity of interest is truly continuous
20
Odds Ratio Measure of association between two binary variables Measure of association between two binary variables Risk associated with x given y. Risk associated with x given y. Example: Example: odds of difficulty walking up 10 steps to the odds of difficulty lifting 10 lb: odds of difficulty walking up 10 steps to the odds of difficulty lifting 10 lb:
21
Pros and Cons Tetrachoric correlation Tetrachoric correlation same interpretation as Spearman and Pearson correlations same interpretation as Spearman and Pearson correlations “difficult” to calculate exactly “difficult” to calculate exactly Makes assumptions Makes assumptions Odds Ratio Odds Ratio easy to understand, but no “perfect” association that is manageable (i.e. { ∞, - ∞ } ) easy to understand, but no “perfect” association that is manageable (i.e. { ∞, - ∞ } ) easy to calculate easy to calculate not comparable to correlations not comparable to correlations May give you different results/inference! May give you different results/inference!
22
Dichotomized Data: A Bad Habit of Psychologists Sometimes perfectly good quantitative data is made binary because it seems easier to talk about "High" vs. "Low" Sometimes perfectly good quantitative data is made binary because it seems easier to talk about "High" vs. "Low" The worst habit is median split The worst habit is median split Usually the High and Low groups are mixtures of the continua Usually the High and Low groups are mixtures of the continua Rarely is the median interpreted rationally Rarely is the median interpreted rationally See references See references Cohen, J. (1983) The cost of dichotomization. Applied Psychological Measurement, 7, 249-253. Cohen, J. (1983) The cost of dichotomization. Applied Psychological Measurement, 7, 249-253. McCallum, R.C., Zhang, S., Preacher, K.J., Rucker, D.D. (2002) On the practice of dichotomization of quantitative variables. Psychological Methods, 7, 19-40. McCallum, R.C., Zhang, S., Preacher, K.J., Rucker, D.D. (2002) On the practice of dichotomization of quantitative variables. Psychological Methods, 7, 19-40.
23
Simple Regression The simple linear regression MODEL is: The simple linear regression MODEL is: y = 0 + 1 x + describes how y is related to x describes how y is related to x 0 and 1 are called parameters of the model. 0 and 1 are called parameters of the model. is a random variable called the error term. is a random variable called the error term. xy e
24
Simple Regression Graph of the regression equation is a straight line. Graph of the regression equation is a straight line. β is the population y-intercept of the regression line. β 0 is the population y-intercept of the regression line. β 1 is the population slope of the regression line. β 1 is the population slope of the regression line. E(y) is the expected value of y for a given x value E(y) is the expected value of y for a given x value
25
Simple Regression E(y)E(y)E(y)E(y) x Slope 1 is positive Regression line Intercept 0
26
Simple Regression E(y)E(y)E(y)E(y) x Slope 1 is 0 Regression line Intercept 0
27
Estimated Simple Regression The estimated simple linear regression equation is: The estimated simple linear regression equation is: The graph is called the estimated regression line. The graph is called the estimated regression line. b0 is the y intercept of the line. b0 is the y intercept of the line. b1 is the slope of the line. b1 is the slope of the line. is the estimated/predicted value of y for a given x value. is the estimated/predicted value of y for a given x value.
28
Estimation process Regression Model y = 0 + 1 x + Regression Equation E ( y ) = 0 + 1 x Unknown Parameters 0, 1 Sample Data: x y x 1 y 1...... x n y n Estimated Regression Equation Sample Statistics b 0, b 1 b 0 and b 1 provide estimates of 0 and 1
29
Least Squares Estimation Least Squares Criterion Least Squares Criterionwhere: y i = observed value of the dependent variable for the i th observation y i = predicted/estimated value of the dependent variable for the i th observation ^
30
Least Squares Estimation Estimated Slope Estimated Slope Estimated y-Intercept Estimated y-Intercept
31
Model Assumptions 1.X is measured without error. 2.X and are independent 3.The error is a random variable with mean of zero. 4.The variance of , denoted by 2, is the same for all values of the independent variable (homogeneity of error variance). 5.The values of are independent. 6.The error is a normally distributed random variable.
32
Example: Consumer Warfare Number of Ads (X) Purchases (Y) 114 324 218 117 327
33
Example Slope for the Estimated Regression Equation Slope for the Estimated Regression Equation b 1 = 220 - (10)(100)/5 = 5 b 1 = 220 - (10)(100)/5 = 5 24 - (10) 2 /5 24 - (10) 2 /5 y-Intercept for the Estimated Regression Equation y-Intercept for the Estimated Regression Equation b 0 = 20 - 5(2) = 10 b 0 = 20 - 5(2) = 10 Estimated Regression Equation Estimated Regression Equation y = 10 + 5x ^
34
Example Scatter plot with regression line Scatter plot with regression line ^
35
Evaluating Fit Coefficient of Determination Coefficient of Determinationwhere: SST = total sum of squares SST = total sum of squares SSR = sum of squares due to regression SSR = sum of squares due to regression SSE = sum of squares due to error SSE = sum of squares due to error SST = SSR + SSE ^^ r 2 = SSR/SST
36
Evaluating Fit Coefficient of Determination Coefficient of Determination r 2 = SSR/SST = 100/114 =.8772 The regression relationship is very strong because 88% of the variation in number of purchases can be explained by the linear relationship with the between the number of TV ads
37
Mean Square Error An Estimate of 2 An Estimate of 2 The mean square error (MSE) provides the estimate of 2, S 2 = MSE = SSE/(n-2) S 2 = MSE = SSE/(n-2)where:
38
Standard Error of Estimate An Estimate of S An Estimate of S To estimate we take the square root of 2. To estimate we take the square root of 2. The resulting S is called the standard error of the estimate. The resulting S is called the standard error of the estimate. Also called “Root Mean Squared Error” Also called “Root Mean Squared Error”
39
Linear Composites Linear composites are fundamental to behavioral measurement Linear composites are fundamental to behavioral measurement Prediction & Multiple Regression Prediction & Multiple Regression Principle Component Analysis Principle Component Analysis Factor Analysis Factor Analysis Confirmatory Factor Analysis Confirmatory Factor Analysis Scale Development Scale Development Ex: Unit-weighting of items in a test Ex: Unit-weighting of items in a test Test = 1*X1 + 1*X2 + 1*X3 + … + 1*Xn Test = 1*X1 + 1*X2 + 1*X3 + … + 1*Xn
40
Linear Composites Sum Scale Sum Scale Scale A = X 1 + X 2 + X 3 + … + X n Scale A = X 1 + X 2 + X 3 + … + X n Unit-weighted linear composite Unit-weighted linear composite Scale A = 1*X 1 + 1*X 2 + 1*X 3 + … + 1*X n Scale A = 1*X 1 + 1*X 2 + 1*X 3 + … + 1*X n Weighted linear composite Weighted linear composite Scale A = b 1 X 1 + b 2 X 2 + b 3 X 3 + … + b n X n Scale A = b 1 X 1 + b 2 X 2 + b 3 X 3 + … + b n X n
41
Variance of a weighted Composite XY YVar(X)Cov(XY) YCov(XY)Var(Y)
42
Effective vs. Nominal Weights Nominal weights Nominal weights The desired weight assigned to each component The desired weight assigned to each component Effective weights Effective weights the actual contribution of each component to the composite the actual contribution of each component to the composite function of the desired weights, standard deviations, and covariances of the components function of the desired weights, standard deviations, and covariances of the components
43
Principles of Composite Formation Standardize before combining!!!!! Standardize before combining!!!!! Weighting doesn’t matter much when the correlations among the components are moderate to large Weighting doesn’t matter much when the correlations among the components are moderate to large As the number of components increases, the importance of weighting decreases As the number of components increases, the importance of weighting decreases Differential weights are difficult to replicate/cross-validate Differential weights are difficult to replicate/cross-validate
44
Decision Accuracy Truth Yes No Decision Fail Pass True Positive False Positive False Negative True Negative
45
Signal Detection Theory
46
Polygraph Example Sensitivity, etc… Sensitivity, etc…
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.