Chapter 8 A Statistics Primer

Winston Jackson and Norine Verberg Methods: Doing Social Research, 4e

© 2007 Pearson Education Canada
Level of Measurement
- Measures can be designed to have a higher, more complex level or a more basic, rudimentary level
- Influenced by how the variable is conceptualized
  - Gender: can have only two categories (males and females)
  - Age: can be age (year of birth) or age groups (age categories, e.g., adolescent, young adult, etc.)
- Influences choice of statistical analysis
  - Shown in Table 8.11 on page 222
- Related to measurement error (Chapter 13)

Three Levels of Measurement
- Nominal: involves no underlying continuum; assignment of numeric values is arbitrary
  - Examples: religious affiliation, gender, etc.
- Ordinal: implies an underlying continuum; values are ordered but intervals are not equal
  - Examples: community size, Likert items, etc.
- Ratio: involves an underlying continuum; numeric values assigned reflect equal intervals; zero point aligned with true zero
  - Examples: weight, age in years, % minority

Examples of Nominal Level Measures
- Do you have a valid driver’s licence? [ ] Yes [ ] No
- Your sex (circle the number of your answer): 1 Male 2 Female

Example of Ordinal Level Measure
The population of the place I considered my hometown when growing up was:
  Rural area            1
  Town under 5,000      2
  5,000 to 19,999       3
  20,000 to 99,999      4
  100,000 to 999,999    5
  1,000,000 or over     6

Examples of Ordinal Level Measures (Likert Items)
In the following items, circle a number to indicate the extent to which you agree or disagree with each statement.
- I would quit my present job if I won $1,000,000 through a lottery.
  (response scale running from Strongly Disagree to Strongly Agree)
- I would be satisfied if my child followed the same type of career as I have.
  Strongly Disagree / Somewhat Disagree / Neither Agree nor Disagree / Somewhat Agree / Strongly Agree

Describing an Individual Variable
- Statistics provide ways to describe and compare sets of observations (e.g., income levels, infant mortality, morbidity, crime, etc.)
- Two common ways of describing a distribution (a set of scores in a data set):
  - Measures of central tendency
  - Measures of dispersion

Measures of Central Tendency
A number that typifies the central scores of a set of values
- Mean
- Median
- Mode

Median
- The midpoint
- Used to describe the central tendency of ordinal level data
- Calculated by ordering a set of values and then using the middlemost value (when there are two middle values, calculate the mean of the two)
- Often used when a data set has extreme cases

Table 8.7 Median for Extreme Values
CASE #    $ INCOME
  1.         5,400
  2.         6,600
  3.         7,700
  4.        10,200
  5.        13,400
  6.        16,400
  7.        16,700
  8.        18,300  ← $18,300 median value
  9.        19,000
 10.        20,000
 11.        20,500
 12.        22,900
 13.        24,600
 14.        31,500
 15.       580,000
Mean value: $54,213
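The effect of the extreme case in Table 8.7 can be checked with a short sketch. It assumes Python's standard `statistics` module; the variable names are illustrative only.

```python
from statistics import mean, median

# Incomes from Table 8.7 (15 cases, one extreme value)
incomes = [5400, 6600, 7700, 10200, 13400, 16400, 16700, 18300,
           19000, 20000, 20500, 22900, 24600, 31500, 580000]

median_income = median(incomes)  # middle (8th) value of the 15 ordered cases
mean_income = mean(incomes)      # sum of values divided by number of cases

print(f"median = ${median_income:,}")
print(f"mean   = ${mean_income:,.0f}")
```

The single income of $580,000 pulls the mean well above every other case, while the median stays at the middle case.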

Mode
- The most frequently occurring value
- Used to describe the central tendency of nominal level data (gender, religion, nationality)

TABLE 8.8 DISTRIBUTION OF RESPONDENTS BY COUNTRY
COUNTRY        NUMBER    PERCENT
Canada                              ← mode
New Zealand
Australia
TOTAL

Measures of Dispersion
- Indicate the dispersion or variability of values: are scores close together or spread out?
- Three common measures of dispersion:
  - Range
  - Standard deviation
  - Variance

Table 8.9 Two Grade Distributions
SUBJECT                 MARY      BETH
Sociology                 78        66
Psychology                80        72
Political Science         82        88
Anthropology                        90
Philosophy                          94
Mean¹
Range²                    10        28
Standard Deviation³     3.74     12.25
Variance⁴               14.0    150.00
¹ Mean = sum of values divided by number of cases
² Range = highest value – lowest value
³ See computation in Table 8.10 on p. 221.
⁴ Variance = sd²

Standard Deviation and Variance
- The standard deviation measures the average amount of deviation from the mean value of the variable
- The variance is the standard deviation squared

Table 8.10 Computation of Standard Deviation, Beth’s Grades
SUBJECT              GRADE    GRADE – MEAN       (GRADE – MEAN)²
Sociology               66    66 – 82 = –16                  256
Psychology              72    72 – 82 = –10                  100
Political science       88    88 – 82 =   6                   36
Anthropology            90    90 – 82 =   8                   64
Philosophy              94    94 – 82 =  12                  144
MEAN                  82.0    TOTAL                          600
Note: The “N – 1” term is used when sampling procedures have been used. When population values are used, the denominator is “N.” SPSS uses “N – 1” in calculating the standard deviation in the DESCRIPTIVES procedure.
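The computation for Beth's grades can be reproduced step by step. This is a minimal sketch using only Python's standard library; the variable names are assumptions for illustration, and both the "N – 1" and "N" denominators are shown.

```python
from math import sqrt
from statistics import stdev

grades = [66, 72, 88, 90, 94]        # Beth's grades, Table 8.10
n = len(grades)
mean_grade = sum(grades) / n         # 82.0

# Squared deviation of each grade from the mean
squared_devs = [(g - mean_grade) ** 2 for g in grades]
total = sum(squared_devs)

sample_var = total / (n - 1)         # "N - 1" denominator (sample)
sample_sd = sqrt(sample_var)
population_sd = sqrt(total / n)      # "N" denominator (population values)

print(f"total of squared deviations = {total}")
print(f"variance (N - 1) = {sample_var}, sd (N - 1) = {sample_sd:.2f}")
print(f"sd (N) = {population_sd:.2f}")
```

`statistics.stdev` also uses the "N – 1" denominator, so it can serve as a cross-check on the hand computation.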

Standardizing Data
- Standardizing data facilitates making comparisons between units of different size
- Data can also be standardized to create variables that have similar variability (Z scores)
- Standardization of data is commonly done
- Several methods of standardizing data: proportions, percentages, percentage change, rates, ratios

Proportions
- A proportion represents the part of 1 that some element represents
  Proportion female = Number female / Total persons
  Proportion female = 31,216 / 58,520
  Proportion female = .53
- Females represent .53 of the population

Percentage
- A percentage represents how often something happens per 100 times
- A proportion may be converted to a percentage by multiplying by 100
- Females constitute 53% of the population
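The proportion and percentage for the slide's population figures can be sketched in a few lines of Python (variable names are illustrative):

```python
females = 31_216
total_persons = 58_520

proportion_female = females / total_persons   # part of 1
percent_female = proportion_female * 100      # per 100

print(f"proportion = {proportion_female:.2f}")
print(f"percentage = {percent_female:.0f}%")
```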

Percentage Change
- Percentage change is a measure of how much something has changed over a given time period
- Percentage change = ((Time 2 – Time 1) / Time 1) × 100
- Example: percentage change in number of women in selected occupations (Table 8.13, p. 223)
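A small sketch of the percentage change calculation follows. The function name and the counts are hypothetical; they are not taken from Table 8.13.

```python
def percentage_change(time1, time2):
    """Change from Time 1 to Time 2, expressed as a percentage of Time 1."""
    return (time2 - time1) / time1 * 100

# Hypothetical counts: 200 at Time 1, 250 at Time 2
increase = percentage_change(200, 250)
# Hypothetical counts: 250 at Time 1, 200 at Time 2
decrease = percentage_change(250, 200)

print(f"{increase:.1f}%  {decrease:.1f}%")
```

Because the change is always divided by the Time 1 value, a rise from 200 to 250 and a fall from 250 to 200 do not give the same magnitude.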

Rates
- Rates represent the frequency of an event for a standard-sized unit
- Divorce rates, suicide rates, and crime rates are examples
- With 104 suicides in a population of 757,465, the suicide rate (SR) per 100,000 would be calculated as follows:
  SR = (104 / 757,465) × 100,000 = 13.7
- There are 13.7 suicides per 100,000
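The rate calculation for the slide's figures can be sketched as below. The function name and the `base` parameter are assumptions for illustration.

```python
def rate_per(events, population, base=100_000):
    """Frequency of an event per standard-sized unit of population."""
    return events / population * base

suicide_rate = rate_per(104, 757_465)
print(f"{suicide_rate:.2f} per 100,000")
```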

Ratios
- A ratio represents a comparison of one thing to another
- If there are 200 burglaries per 100,000 in the U.S. and 57 per 100,000 in Canada, the U.S./Canadian burglary ratio is:
  U.S. burglary rate / Canadian burglary rate = 200 / 57 = 3.51
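The burglary ratio is a single division of one rate by the other, sketched here with illustrative variable names:

```python
us_burglary_rate = 200        # per 100,000
canadian_burglary_rate = 57   # per 100,000

burglary_ratio = us_burglary_rate / canadian_burglary_rate
print(f"U.S./Canadian burglary ratio = {burglary_ratio:.2f}")
```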

Normal Distribution
- Much data in the social and physical world are “normally distributed”: there will be a few low values, many more clustered toward the middle, and a few high values
- Normal distributions:
  - symmetrical, bell-shaped curve
  - mean, mode, and median will be similar
  - about 68.27% of cases fall within ± 1 standard deviation of the mean
  - about 95.45% of cases fall within ± 2 standard deviations of the mean
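The share of cases within one and two standard deviations of the mean (roughly 68% and 95%) can be checked numerically. This sketch assumes Python 3.8 or later, where `statistics.NormalDist` is available; the helper name is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of cases within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2):
    print(f"within ± {k} SD: {share_within(k) * 100:.2f}%")
```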

Figure 8.2 Normal Distribution Curve

Areas Under the Normal Curve
- Can determine what proportion of cases fall between two values or above/below a value
- Steps:
  1. Draw a normal curve, marking the mean and SD, and including lines to represent the problem
  2. Calculate the Z score(s) for the problem
  3. Look up the value in Table 8.17, page 230
  4. Solve the problem. Recall that .5 of cases fall above the mean and .5 below the mean
  5. Convert the proportion to a percentage, if needed
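The steps above can be sketched for a hypothetical problem: scores with a mean of 70 and a standard deviation of 10, asking what proportion of cases falls above 85. The numbers are invented for illustration, and `statistics.NormalDist` stands in for the printed table lookup.

```python
from statistics import NormalDist

mean_score, sd = 70.0, 10.0   # hypothetical mean and standard deviation
value = 85.0                  # hypothetical cut-off

# Step 2: Z score = distance from the mean in standard deviation units
z = (value - mean_score) / sd

# Step 3: area between the mean and Z (what a printed table would give)
area_mean_to_z = NormalDist().cdf(z) - 0.5

# Step 4: .5 of cases fall above the mean, so subtract the area up to Z
proportion_above = 0.5 - area_mean_to_z

# Step 5: convert the proportion to a percentage
print(f"Z = {z}, area mean-to-Z = {area_mean_to_z:.4f}")
print(f"{proportion_above * 100:.2f}% of cases fall above {value}")
```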

Other Distributions
- Not all variables are normally distributed
- Bimodal: two overlapping normally distributed plots
  - Example: weight (females will have lower average weights)
- Leptokurtic: little variability; distribution appears tall and peaked
- Platykurtic: great deal of variability; distribution appears flat and wide
- Having a normal distribution is important for doing tests of statistical significance (Table 8.18)

Figure 8.4 Other Distributions

Describing Relationships Among Variables
Involves three important steps:
1. Decide which variable is to be treated as the dependent variable and which as the independent variable
2. Decide on the appropriate procedure for examining the relationship
3. Perform the analysis

Methods
- Selection of statistical method depends upon the level of measurement of the dependent and independent variables
  - Contingency tables: Crosstabs
  - Comparing means: Means analysis
  - Correlational analysis: Correlation

Contingency Tables: Crosstabs
- A contingency table cross-classifies cases on two or more variables to show the relation between an independent and a dependent variable
- Uses a nominal dependent variable and an ordinal or nominal independent variable
- A standard table looks like the one on the following slide

Table 8.19 Plans to Attend University by Size of Home Community
                        RURAL       TOWN UP TO 5,000    TOWN OVER 5,000       TOTAL
UNIVERSITY PLANS?      N      %        N      %            N      %          N      %
Plans                 69   52.3       44   48.9          102   73.9        215   59.7
No plans              63   47.7       46   51.1           36   26.1        145   40.3
TOTAL                132  100.0       90  100.0          138  100.0        360  100.0
If a test of significance is appropriate for the table, the raw Chi-Square value (which will be introduced in Chapter 9), the degrees of freedom, whether the test is one- or two-tailed, and the probability level should be indicated.
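The percentages in Table 8.19 are computed within each category of the independent variable (size of home community). A minimal sketch, with an illustrative dictionary layout and function name, recomputes them from the counts:

```python
# Counts from Table 8.19, keyed by category of the independent variable
counts = {
    "Rural":            {"Plans": 69,  "No plans": 63},
    "Town up to 5,000": {"Plans": 44,  "No plans": 46},
    "Town over 5,000":  {"Plans": 102, "No plans": 36},
}

def column_percentages(table):
    """Percentage of each column's cases falling in each row, to one decimal."""
    result = {}
    for category, cells in table.items():
        column_total = sum(cells.values())
        result[category] = {
            row: round(n / column_total * 100, 1) for row, n in cells.items()
        }
    return result

percentages = column_percentages(counts)
for category, cells in percentages.items():
    print(category, cells)
```

Comparing the "Plans" percentages across the three columns is what the interpretation rule (compare categories of the independent variable) refers to.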

Rules for Constructing a Contingency Table
- In table titles, name the dependent variable first
- Place the dependent variable on the vertical plane
- Place the independent variable on the horizontal plane
- Use variable labels that are clear
- Run percentages toward the independent variable
- Report percentages to one decimal place
- Report statistical test results below the table
- Interpret the table by comparing categories of the independent variable
- Minimize categories in control tables

Comparing Means: Means
- Used when the dependent variable is ratio level
- Means are compared across categories of the independent variable (nominal or ordinal)
- Both the t-test and ANOVA may be used (Chapter 9)
- Presentation may be as shown on the following slide

Table 8.22 Mean Income by Gender
GENDER           MEAN INCOME    STANDARD DEVIATION    NUMBER OF CASES
Male                 $37,052                12,061                142
Female               $34,706                10,474                 37
COMBINED MEAN        $36,567                11,642                179
If a test of significance is appropriate for the table, the t-test or F-test value, the degrees of freedom, whether the test is one- or two-tailed, and the probability level should be indicated.
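The combined mean in Table 8.22 is a weighted average of the group means, weighted by the number of cases in each group. A short sketch with illustrative names:

```python
# (mean income, number of cases) for each group in Table 8.22
groups = {"Male": (37_052, 142), "Female": (34_706, 37)}

total_cases = sum(n for _, n in groups.values())
combined_mean = sum(mean * n for mean, n in groups.values()) / total_cases

print(f"cases = {total_cases}, combined mean = ${combined_mean:,.0f}")
```

Because there are far more males than females in the sample, the combined mean sits closer to the male mean.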

Correlational Analysis: Correlation
- Correlational analysis is a procedure for measuring how closely two ratio level variables co-vary
- Basis for more advanced procedures: partial correlations, multiple correlations, regression, factor analysis, path analysis, and canonical analysis
- Advantage: can analyze many variables simultaneously (multivariate analysis)
- Relies on having ratio level measures

Two Basic Concerns
- What is the equation that describes the relation between two variables?
- What is the strength of the relation between the two?
- Two visual estimation procedures:
  - The linear equation: Y = a + bX
  - Correlation coefficient: r

The Linear Equation
- The linear equation, Y = a + bX, describes the relation between the two variables
- Components:
  - Y: dependent variable (e.g., starting salary)
  - X: independent variable (e.g., years of post-secondary education)
  - a: the constant, which indicates where the regression line intersects the Y-axis
  - b: the slope of the regression line

A. The Linear Equation: A Visual Estimation Procedure
Step 1: Plot the relation on a graph
Table 8.24: Sample data set
X   Y
2   3
3   4
5   4
7   6
8   8

A. The Linear Equation (cont’d)
Step 2: Insert a straight regression line
- From the regression line one can estimate how much one has to change the independent variable in order to produce a unit change in the dependent variable

Step 3: Observe where the regression line crosses the Y axis; this represents the constant in the equation (a = 1.33 in Figure 8.6)
Step 4: Draw a line parallel to the X axis and one parallel to the Y axis to form a right-angled triangle
- Measure the lines; divide the vertical distance by the horizontal distance to compute the b value (72/91 = 0.79)

Step 5: If the regression line is lower on the right-hand side, the b coefficient is negative, meaning the more X, the less Y
- If the slope is negative, use a minus sign in your equation: Y = a – bX
Step 6: Write the equation: Y = 1.33 + 0.79(X)
- The above formula is our visually estimated equation between the two variables
- The equation is used to predict the value of the Y variable given a value of the X variable
- Done in regression analysis
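For comparison with the visual estimate, the constant and slope can also be computed from the Table 8.24 data. The sketch below uses ordinary least squares, which is not the slide's visual procedure but the calculation regression software would typically perform; the function and variable names are illustrative.

```python
# Sample data set from Table 8.24
xs = [2, 3, 5, 7, 8]
ys = [3, 4, 4, 6, 8]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope: co-variation of X and Y divided by variation in X
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
b = sxy / sxx

# Constant: where the line crosses the Y axis
a = mean_y - b * mean_x

def predict(x):
    """Predicted Y for a given X using Y = a + bX."""
    return a + b * x

print(f"Y = {a:.2f} + {b:.2f}(X)")
print(f"predicted Y at X = 6: {predict(6):.2f}")
```

The least-squares values are close to, but not identical with, the constant and slope read off the graph, which is expected when a line is fitted by eye.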

B. Correlation Coefficient: A Visual Estimation Procedure
- Goal: to develop a sense of what correlations of different magnitudes look like
- The correlation coefficient (r) is a measure of the strength of the association between two variables
  - Varies from +1 to –1
  - Perfect correlations are rare
  - Usually presented as a decimal, as in .98, .56, –.32
  - Negative correlations ~ negative slope

Figure 8.9 Eight Linear Correlations

Figure 8.9 Eight Linear Correlations (cont’d)

B. Correlation Coefficient (cont’d)
- Figure 8.9 graphs 8 relationships
- Graphing allows you to visually estimate the strength of the association
- The closer the plotted points are to the regression line (e.g., Plots 1 and 2), the higher the correlation (.99 and .85)
- Greater spread (e.g., Plots 3 and 4) ~ lower correlation (.53 and .36)
- It would be difficult to draw a regression line if r < .36

B. Correlation Coefficient (cont’d)
- Plots 5 and 6: curvilinear, not linear, hence r = 0
  - Procedure not appropriate for curvilinear relations
- Plots 7 and 8: problem plots with deviant cases
  - This is one of the reasons it is important to plot relationships; extreme values indicate a non-linear relationship, so linear regression procedures are not appropriate for studying these relationships

Calculating the Correlation Coefficient
- The estimation of the correlation coefficient takes two kinds of variability into account:
  - Variations around the regression line
  - Variations around the mean of Y
- r² = 1 – (variations around regression / variations around mean of Y)
- Can be calculated by hand (see p. 245); most researchers today use computer programs
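The two kinds of variability can be computed directly for the Table 8.24 sample data. This sketch assumes a least-squares regression line and measures each "variation" as a sum of squared deviations; the names are illustrative.

```python
from math import sqrt

# Sample data set from Table 8.24
xs = [2, 3, 5, 7, 8]
ys = [3, 4, 4, 6, 8]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares regression line Y = a + bX
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
a = mean_y - b * mean_x

# Variations around the regression line (squared residuals)
around_regression = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
# Variations around the mean of Y
around_mean = sum((y - mean_y) ** 2 for y in ys)

r_squared = 1 - around_regression / around_mean
# r takes the sign of the slope
r = sqrt(r_squared) if b >= 0 else -sqrt(r_squared)

print(f"r squared = {r_squared:.3f}, r = {r:.3f}")
```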

Other Correlation Procedures
- Spearman Correlation
  - Appropriate measure of association for ordinal level variables
- Partial Correlation
  - Measures the strength of association between two variables while simultaneously controlling for the effects of one or more additional variables
  - Also varies from +1 to –1
  - Commonly used by social researchers
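One common way to compute a Spearman correlation is to rank each variable (giving tied values their average rank) and then correlate the ranks. The sketch below follows that approach in plain Python; the function names and the two ordinal variables are hypothetical, not data from the text.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical ordinal data: community-size code and a Likert response
community_size = [1, 2, 2, 4, 5, 6]
agreement = [2, 3, 5, 6, 8, 9]
print(f"Spearman correlation = {spearman(community_size, agreement):.2f}")
```

Like r, the result ranges from +1 to –1, but it depends only on the ordering of the values, which is why it suits ordinal measures.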
