1 Advanced Statistical Methods: Continuous Variables http://statisticalmethods.wordpress.com REVIEW Dr. Irina Tomescu-Dubrow tomescu.1@sociology.osu.edu

2 Descriptive & Inferential Statistics - Descriptive statistics describe samples of subjects in terms of variables or combinations of variables; - Inferential statistics test hypotheses (about differences between populations, or about relationships) on the basis of measurements made on samples of subjects.

3 Basic Characteristics of a Distribution (continuous variable): Mean = (Σ Xᵢ) / N; Variance = s² = Σ(Xᵢ – mean of X)² / N; Standard deviation s = √Var = √s²
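As a quick illustration of these formulas, here is a minimal Python sketch (the data values are made up) that computes the mean, the variance, and the standard deviation exactly as defined above, dividing by N.

```python
from math import sqrt

x = [2, 4, 4, 4, 5, 5, 7, 9]                  # illustrative raw scores

n = len(x)
mean = sum(x) / n                              # Mean = (Σ Xi) / N
var = sum((xi - mean) ** 2 for xi in x) / n    # s² = Σ(Xi – mean)² / N
sd = sqrt(var)                                 # s = √s²

print(mean, var, sd)                           # 5.0 4.0 2.0
```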

4 Shape Three properties of the distribution’s shape (metric variables): MODALITY KURTOSIS SKEWNESS

5 Skewness - Mean > Median > Mode → positively skewed distribution. - Mean < Median < Mode → negatively skewed distribution. - Mean = Median = Mode → symmetric (e.g., normal) distribution.
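A tiny numerical illustration of the first rule, with made-up data: a large value in the right tail pulls the mean above the median, and the median above the mode.

```python
from statistics import mean, median, mode

data = [1, 1, 2, 3, 4, 10]                    # illustrative, positively skewed sample
print(mean(data), median(data), mode(data))   # 3.5 > 2.5 > 1
```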

6 Normal Distribution – Unimodal – Mesokurtic (moderate peakedness) – Symmetric (zero skewness) Carl Friedrich Gauss, 1794

7 Normal distribution Every normal curve (regardless of its mean or standard deviation) conforms to the following rule: ~ 68% of the area under the curve falls within 1 standard deviation of the mean. ~ 95% of the area under the curve falls within 2 standard deviations of the mean. ~ 99.7% of the area under the curve falls within 3 standard deviations of the mean.
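The rule can be checked directly against the standard normal CDF; the sketch below assumes SciPy is available.

```python
from scipy.stats import norm

for k in (1, 2, 3):
    area = norm.cdf(k) - norm.cdf(-k)   # area within k standard deviations of the mean
    print(f"within {k} SD: {area:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```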

8 Normal (probability) distribution

9 Z-scores For finding points in the normal distribution, the mean and the standard deviation are crucial. Z score = (Value – Mean) / Standard deviation. Definition: a z-score measures the difference between a raw value (a variable value Xᵢ) and the mean, using the standard deviation of the distribution as the unit of measure: Z(Xᵢ) = (Xᵢ – mean of X) / √s². The numerical value of the z-score specifies the distance from the mean expressed in standard-deviation units.
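A minimal sketch of the definition (the raw value, mean, and standard deviation below are invented for illustration):

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, expressed in standard-deviation units."""
    return (x - mean) / sd

print(z_score(130, 100, 15))   # 2.0 -> two standard deviations above the mean
```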

10 Raw values & Z-scores 1. Shape: the shape of the z-score distribution (i.e., the standardized distribution) is the same as the shape of the raw-score distribution. 2. The mean: when normally distributed raw scores are transformed into z-scores, the resulting z-score distribution always has a mean of zero. This makes the mean a convenient reference point. 3. The standard deviation: when normally distributed raw scores are transformed into z-scores, the resulting z-score distribution always has a standard deviation of one.
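These two properties are easy to verify on any small raw-score list (the one below is made up): after standardizing, the mean is 0 and the standard deviation is 1.

```python
from math import sqrt

raw = [2, 4, 4, 4, 5, 5, 7, 9]                 # illustrative raw scores
n = len(raw)
m = sum(raw) / n
s = sqrt(sum((x - m) ** 2 for x in raw) / n)

z = [(x - m) / s for x in raw]                 # standardized scores
z_mean = sum(z) / n
z_sd = sqrt(sum((zi - z_mean) ** 2 for zi in z) / n)
print(round(z_mean, 10), round(z_sd, 10))      # 0.0 1.0
```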

11 Raw scores for education in 2003, POLPAN data; Z-scores for education (i.e., standardized variable), POLPAN data

12 Bivariate Correlation and Regression Assumptions: -Random sampling -Both variables = continuous (or can be treated as such) -Linear relationship between the 2 variables; -Normally distributed characteristics of X and Y in the population

13 Linear Relationship The line = a mathematical function that can be expressed through the formula Y = a + b*X, where Y & X are our variables. Y, the dependent variable, is expressed as a linear function of the independent (explanatory) variable X.

14 In reality data are scattered

15 For z scores

16 Correlation - measures the size & the direction of the linear relation btw. 2 variables (i.e., a measure of association) - unitless statistic (it is standardized); we can directly compare the strength of correlations for various pairs of variables. The stronger the relationship between X & Y, the closer the data points will be to the line; the weaker the relationship, the farther the data points will drift away from the line. Pearson's r = the sum of the products of the deviations from each mean, divided by the square root of the product of the sums of squares for each variable. If X and Y are expressed in standard scores (i.e., z-scores), we have Z(y) = β*Z(x) and r = Σ(Zy*Zx)/N = beta
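The two formulations agree, as this sketch with invented data shows: r computed from deviation products equals the average product of z-scores (population formulas, dividing by N).

```python
from math import sqrt

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# deviation-product formula
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)
r = sxy / sqrt(sxx * syy)

# z-score formula: r = Σ(Zx*Zy) / N
sx, sy = sqrt(sxx / n), sqrt(syy / n)
r_z = sum(((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y)) / n

print(round(r, 6), round(r_z, 6))   # 0.8 0.8 -> identical
```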

17 Interpretation of Pearson's r: 1. Strength of relationship: correlations are on a continuum; any value of r btw. –1 & 1 we need to describe: weak ≈ |0.10|; moderate ≈ |0.30|; strong ≈ |0.60|; 0 = no relationship. 2. Direction of relationship: a) Positive correlation: whenever the value of r is positive. Interpretation: the two variables move in the same direction: as X goes up, Y increases as well; as X decreases, Y decreases as well. b) Negative correlation: whenever the value of r is negative. Interpretation: the two variables move in opposite directions: as X goes up, Y decreases; as X decreases, Y increases. Test of significance: H₀: ρ = 0 (no linear relationship in the population).
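For the significance test, scipy.stats.pearsonr returns both the coefficient and a two-sided p-value under the null hypothesis of zero correlation; the data below are illustrative only.

```python
from scipy.stats import pearsonr

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]

r, p_value = pearsonr(x, y)
print(f"r = {r:.3f}, p = {p_value:.4f}")
# a small p-value -> reject H0 of no linear relationship in the population
```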

18 Table 1. Correlations of Assessment of Socialism in 1988, 1993, 1998, 2003 and 2008

Assessment of Socialism (a)
Year      1988      1993      1998      2003      2008
1988      1.00      0.150**   0.139**   0.062     0.064
1993      0.150**   1.00      0.301**   0.310**   0.313**
1998      0.139**   0.301**   1.00      0.316**   0.318**
2003      0.062     0.310**   0.316**   1.00      0.381**
2008      0.064     0.313**   0.318**   0.381**   1.00

(a) Correlation coefficients are calculated on panel sample, N = 938; assessment of socialism = 5 categories. **p < 0.01.

19 Bivariate Regression = used to predict the dependent variable (DV) from the independent variable (IV); - finds the best-fitting straight line between X & Y: the line that goes through the means of X and of Y, and minimizes the sum of squared errors (distances) btw. the data points and the line. (Prediction model) Ŷ = a + b*X, where Ŷ = the predicted value of the dependent variable Y; a = the intercept (the value of Y when X = 0); b = the regression coefficient (the slope), indicating the amount of change in Y given a unit change in X; X = the independent variable. We call a & b unstandardized coefficients: both are interpreted in the original units (the measurements) of the variables.

20 b = Σ(Xᵢ – mean of X)(Yᵢ – mean of Y) / Σ(Xᵢ – mean of X)², i.e., the sum of products of deviations from the means, divided by the sum of squares for X.
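A minimal sketch of this formula on toy data, with the intercept recovered as a = mean(Y) – b*mean(X) so that the line passes through the point of means:

```python
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# b = Σ(Xi – X̄)(Yi – Ȳ) / Σ(Xi – X̄)²
b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
     / sum((xi - mx) ** 2 for xi in x))
a = my - b * mx                          # line goes through the means of X and Y

print(f"Y_hat = {a:.2f} + {b:.2f}*X")    # Y_hat = 0.60 + 0.80*X
print(a + b * 6)                         # predicted Y for X = 6 -> 5.4
```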

21 Method of Least Squares The prediction errors, called residuals, are defined as the differences between observed and predicted values of Y: Y = a + b*X + e, so e = Y – (a + b*X). The regression line minimizes the sum of squared error terms: Σe² = Σ[Y – (a + b*X)]².
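Continuing the toy example above, the residuals and their sum of squares follow directly from the definition:

```python
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
a, b = 0.6, 0.8                          # coefficients from the sketch above

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]   # e = Y – (a + b*X)
sse = sum(e ** 2 for e in residuals)
print(residuals)   # approximately [0.6, -1.2, 1.0, -0.8, 0.4]
print(sse)         # approximately 3.6 -> the least-squares line makes this sum minimal
```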

22 Interpretation of b b > 0: positive relationship (X has a positive effect on Y); b < 0: negative relationship (X has a negative effect on Y); b = 0: no relationship (X has no effect on Y). Hypothesis Testing 1) ANOVA and regression 2) T-test for the effect of the independent variable (H₀: b = 0)
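The t-test for the slope is available directly from scipy.stats.linregress, which reports the slope, the intercept, and a two-sided p-value for H₀: b = 0 (the data are again illustrative):

```python
from scipy.stats import linregress

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]

result = linregress(x, y)
print(f"b = {result.slope:.3f}, a = {result.intercept:.3f}, p = {result.pvalue:.4f}")
# a small p-value -> reject H0: b = 0, i.e. X has an effect on Y
```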

