Chapter Eleven: A Primer for Descriptive Statistics
Descriptive Statistics: a variety of tools, conventions, and procedures for describing variables and the relationships between them.
Measurement is the process of assigning numbers to phenomena according to a set of rules. Levels of Measurement. Nominal: involves no underlying continuum; the assignment of numeric values is arbitrary. Examples: religious affiliation, gender, etc.
Levels of Measurement. Ordinal: implies an underlying continuum; values are ordered, but the intervals between them are not necessarily equal. Examples: community size, Likert items, etc.
Levels of Measurement Cont. Ratio: involves an underlying continuum; the numeric values assigned reflect equal intervals, and the zero point corresponds to a true zero. Examples: weight, age in years, % minority.
Data Distributions: a listing of all the values for any one variable. The most basic technique for presenting a large data set is to create a frequency distribution table: a systematic listing of all the values of a variable, from the lowest to the highest, with the number of times (frequency) each value was observed.
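A frequency distribution table can be sketched in a few lines of Python. The ages below are hypothetical illustration data, and `frequency_table` is a helper name introduced here for the example, not something from the text.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, frequency) pairs ordered from the lowest to the highest value."""
    counts = Counter(values)      # number of times each value was observed
    return sorted(counts.items())  # sort by value, lowest first

# Hypothetical ages for ten respondents (illustrative data only)
ages = [23, 25, 23, 30, 25, 23, 41, 30, 25, 23]
for value, freq in frequency_table(ages):
    print(value, freq)
```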
Normal Distribution: a normal distribution roughly follows a bell-shaped curve. Other shapes include the bimodal distribution (two peaks, e.g., the combined body weights of males and females), the platykurtic distribution (flat and wide, with a great deal of variability), and the leptokurtic distribution (peaked, with little variability).
Measures of Central Tendency: a single numeric value that summarizes the data set in terms of its “average” value. E.g., the nurse researcher uses the value of 98.6 °F (37 °C) to describe the average adult body temperature.
Measures of Central Tendency. Mean: calculated by summing the values and dividing by the number of cases. Median: calculated by ordering a set of values and then using the middlemost value (when there are two middle values, calculate the mean of the two). Mode: the most frequently occurring value.
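The three procedures can be sketched as small Python functions that follow the definitions above step by step. The data are hypothetical; Python's standard `statistics` module also provides ready-made `mean`, `median`, and `mode` functions.

```python
from collections import Counter

def mean(values):
    # Sum the values and divide by the number of cases
    return sum(values) / len(values)

def median(values):
    # Order the values and take the middlemost one
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    # Two middle values: use the mean of the two
    return (ordered[mid - 1] + ordered[mid]) / 2

def mode(values):
    # Most frequently occurring value; among ties, the one encountered first
    return Counter(values).most_common(1)[0][0]

values = [2, 3, 3, 5, 7, 8, 8, 8, 9, 11]  # hypothetical data
print(mean(values), median(values), mode(values))
```

With ten values there are two middle values (7 and 8), so the median is their mean, 7.5.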
Measures of Dispersion. Range: calculated by subtracting the lowest value from the highest value in a set of values. Standard Deviation: a measure reflecting the average amount by which the values in a set deviate from their mean.
Dispersion Cont. Variance: this measure is simply the standard deviation squared.
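The range, variance, and standard deviation can be sketched as follows. The slides do not say whether to divide by n or n - 1; this sketch assumes n - 1 (a sample estimate) by default and n when the data are a whole population. The data set is hypothetical.

```python
import math

def value_range(values):
    # Highest value minus lowest value
    return max(values) - min(values)

def variance(values, sample=True):
    m = sum(values) / len(values)
    squared_devs = sum((x - m) ** 2 for x in values)
    # Assumption: divide by n - 1 for a sample, by n for a whole population
    denom = len(values) - 1 if sample else len(values)
    return squared_devs / denom

def std_dev(values, sample=True):
    # The standard deviation is the square root of the variance
    return math.sqrt(variance(values, sample))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5
print(value_range(data), variance(data, sample=False), std_dev(data, sample=False))
```

Squaring the population standard deviation (2.0) gives back the population variance (4.0), matching the slide's definition of variance.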
Standardizing Data: to standardize data is to report them in a way that allows comparisons between units of different sizes.
Standardizing Data. Proportion: the part of 1 (the whole) that some element represents. A so-called batting average is actually a proportion because it is: \( BA = \frac{\text{Number of hits}}{\text{Number of at bats}} \)
Percentage: a proportion may be converted to a percentage by multiplying by 100. If a player's batting “average” is .359, we could convert it to a percentage by multiplying by 100; in this case, the player gets a hit 35.9% of the time. In short, a percentage represents how often something happens per 100 times.
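Proportions and percentages can be sketched as two one-line functions. The season totals are hypothetical, chosen so that the batting average rounds to the slide's .359.

```python
def proportion(part, whole):
    # Part of 1 that the element represents
    return part / whole

def percentage(part, whole):
    # A proportion multiplied by 100: how often something happens per 100 times
    return proportion(part, whole) * 100

hits, at_bats = 140, 390  # hypothetical season totals
print(round(proportion(hits, at_bats), 3), round(percentage(hits, at_bats), 1))
```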
Percentage Change: a measure of how much something has changed over a given time period. Percentage change is: \( \frac{\text{Time 2} - \text{Time 1}}{\text{Time 1}} \times 100 \) Thus, if there are 25 nurses now compared to 17 five years earlier, the percentage change over the five-year period would be: \( \frac{25 - 17}{17} \times 100 = 47.1\% \)
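The percentage change formula translates directly into code; the nurse counts are the ones from the example above.

```python
def percentage_change(time1, time2):
    # (Time 2 - Time 1) / Time 1, expressed per 100
    return (time2 - time1) / time1 * 100

print(round(percentage_change(17, 25), 1))  # nurses five years ago vs. now
```

A decrease yields a negative value, e.g., going from 25 to 17 gives about -32.0%.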
Rates: represent the frequency of something for a standard-sized unit. Divorce rates, suicide rates, and crime rates are examples. So if there were 104 suicides in a population of 757,465, the suicide rate per 100,000 would be calculated as follows: \( SR = \frac{104}{757{,}465} \times 100{,}000 = 13.7 \) That is, there are about 13.7 suicides per 100,000.
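A rate per standard-sized unit can be sketched as below; `rate_per` and its `base` parameter are names introduced for this example, with the base defaulting to 100,000 as in the suicide-rate calculation.

```python
def rate_per(events, population, base=100_000):
    # Frequency of events scaled to a standard-sized unit (default: per 100,000)
    return events / population * base

print(round(rate_per(104, 757_465), 1))  # suicides per 100,000
```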
Ratio: represents a comparison of one thing to another. So if the suicide rate were 200 per 100,000 in the U.S. and 57 per 100,000 in Canada, the U.S./Canadian suicide ratio would be: \( \frac{\text{U.S. suicide rate}}{\text{Canadian suicide rate}} = \frac{200}{57} = 3.51 \)
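The ratio is a single division; the sketch below assumes, as in the example, that both rates are expressed on the same base (per 100,000) so the comparison is meaningful.

```python
def ratio(first, second):
    # How many times larger the first quantity is than the second
    return first / second

us_rate, canada_rate = 200, 57  # rates per 100,000 from the example
print(round(ratio(us_rate, canada_rate), 2))
```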
Normal Distribution: much data in the social and physical world is “normally distributed.” If so, there will be a few low values, many more clustered toward the middle, and a few high values. Normal distributions are symmetrical and bell-shaped; the mean, mode, and median will be similar (identical in a perfectly normal distribution); about 68% (roughly two-thirds) of cases fall within ±1 standard deviation of the mean; and about 95% (95.4%) of cases fall within ±2 standard deviations of the mean.
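The shares of cases within ±1 and ±2 standard deviations can be checked numerically. For a normal distribution, the proportion within ±k standard deviations of the mean equals erf(k/√2), which Python's standard `math.erf` computes.

```python
import math

def share_within(k):
    """Proportion of a normal distribution lying within ±k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2):
    print(k, round(share_within(k) * 100, 1))  # percentage of cases
```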
Z Scores: a Z score represents the distance, in standard deviation units, of any value from the mean of its distribution. The Z score formula is as follows: \( Z = \frac{X - \bar{X}}{sd} \)
Exercise: Suppose:
Income: Mean = $72,000; SD = $18,000
Education: Mean = 11 years; SD = 4 years

Subject   Income    Education (years)
Case 1    $80,000   14
Case 2    $70,000   10
Case 3    $91,000   19
Case 4    $56,000    8

Calculation for Case 1:
Case 1 Z (income) = \( \frac{80{,}000 - 72{,}000}{18{,}000} = .44 \)
Case 1 Z (education) = \( \frac{14 - 11}{4} = .75 \)
SES score, Case 1 = \( .44 + .75 = 1.19 \)
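The exercise can be worked for all four cases with a short sketch. Following the Case 1 calculation, the SES score is taken to be the sum of the income and education Z scores; the constant and variable names are introduced here for illustration.

```python
def z_score(x, mean, sd):
    # Distance of x from the mean, in standard deviation units
    return (x - mean) / sd

INCOME_MEAN, INCOME_SD = 72_000, 18_000
EDUC_MEAN, EDUC_SD = 11, 4

cases = {"Case 1": (80_000, 14), "Case 2": (70_000, 10),
         "Case 3": (91_000, 19), "Case 4": (56_000, 8)}

ses_scores = {}
for name, (income, education) in cases.items():
    z_inc = z_score(income, INCOME_MEAN, INCOME_SD)
    z_edu = z_score(education, EDUC_MEAN, EDUC_SD)
    # SES score = income Z + education Z, as in the Case 1 calculation
    ses_scores[name] = z_inc + z_edu
    print(name, round(z_inc, 2), round(z_edu, 2), round(ses_scores[name], 2))
```

Because both variables are expressed in standard deviation units, dollars and years can be added on a common scale.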
Areas Under the Normal Curve
1. Draw a normal curve, including lines to represent the problem.
2. Calculate the Z score(s) for the problem.
3. Look up the value(s) in the table.
4. Solve the problem, recalling that .5 of cases fall above the mean and .5 below.
5. Convert the proportion to a percentage, if needed.
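Exercise: Suppose you wished to know what percentage of cases will fall above $100,000 in a sample whose mean is $65,000 and whose SD is $22,000 (see p. 370 of the text).
\( Z = \frac{100{,}000 - 65{,}000}{22{,}000} = 1.59 \)
Look up Z = 1.59 in Table 11.14, p. 368: the area between the mean and Z is .4441.
\( .5 - .4441 = .0559 \) (proportion); \( .0559 \times 100 = 5.6\% \) (percentage)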
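In place of the printed table, the area can be sketched with Python's standard `math.erf`, which gives the cumulative normal proportion below a Z score. This is a computational stand-in for the table lookup, not the text's method.

```python
import math

def normal_cdf(z):
    """Proportion of a standard normal distribution falling below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def proportion_above(x, mean, sd):
    z = (x - mean) / sd
    # Everything not below z lies above it
    return 1 - normal_cdf(z)

p = proportion_above(100_000, 65_000, 22_000)
print(round(p, 4), f"{p * 100:.1f}%")
```

The unrounded Z (about 1.591) gives a proportion of about .0558, slightly different from the table's .0559, which uses Z rounded to 1.59; both convert to 5.6%.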
Describing Relationships Between Variables. 1. Crosstabular Analysis: used with a nominal dependent variable. We cross-classify the information to show the relationship between an independent and a dependent variable. A standard table looks like the following:
Rules for Crosstabular Tables: in the table title, name the dependent variable first; place the dependent variable on the vertical axis (down the side) and the independent variable on the horizontal axis (across the top); use clear variable labels; run percentages toward the independent variable, so that they total 100 within each category of the independent variable; report percentages to one decimal place; report statistical data below the table; and interpret the table by comparing percentages across the categories of the independent variable.
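A minimal sketch of these rules: the independent variable runs across the top, the dependent variable down the side, and percentages are computed within each independent-variable category and reported to one decimal place. The gender-by-smoking data are hypothetical, and `column_percentages` is a helper name introduced for this example.

```python
from collections import Counter

def column_percentages(pairs):
    """Percentage each dependent-variable category makes up within each
    independent-variable category, so each column totals 100%."""
    cell_counts = Counter(pairs)                    # (independent, dependent) -> count
    column_totals = Counter(iv for iv, _ in pairs)  # independent category -> count
    return {
        (iv, dv): round(count / column_totals[iv] * 100, 1)
        for (iv, dv), count in cell_counts.items()
    }

# Hypothetical data: (gender, smoker) for ten respondents
data = [("Female", "Yes"), ("Female", "Yes"), ("Female", "No"), ("Female", "Yes"),
        ("Male", "No"), ("Male", "Yes"), ("Male", "No"), ("Male", "No"),
        ("Male", "Yes"), ("Female", "Yes")]

table = column_percentages(data)
columns = sorted({iv for iv, _ in data})  # independent variable across the top
rows = sorted({dv for _, dv in data})     # dependent variable down the side
print("Smoker".ljust(8) + "".join(c.rjust(9) for c in columns))
for dv in rows:
    print(dv.ljust(8) + "".join(f"{table.get((iv, dv), 0.0):9.1f}" for iv in columns))
```

Interpretation then compares across a row, e.g., the percentage of smokers among females versus among males.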
2. Comparing Means: used when the dependent variable is ratio level and its means are compared across the categories of the independent variable. Both the t-test and ANOVA may be used. Presentation may be as follows:
t Test: the t-test is used to determine whether the difference between the means of two groups is statistically significant. It is used with small samples (under 30) when comparing two groups on a ratio-level dependent variable.
Analysis of Variance (ANOVA): ANOVA is used when the means of three or more groups are compared, or when the means of two or more groups are compared at two or more points in time in a single analysis (e.g., a pre-post experimental design). It computes a ratio (F) that compares two kinds of variability: between-groups and within-group variability.
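Before any test is run, the comparison itself is simply the mean of the dependent variable within each category of the independent variable. The sketch below uses hypothetical incomes (in $1,000s) grouped by community size; `means_by_group` is a name introduced for illustration.

```python
from collections import defaultdict
from statistics import mean

def means_by_group(records):
    """Mean of a ratio-level dependent variable for each independent-variable category."""
    groups = defaultdict(list)
    for category, value in records:
        groups[category].append(value)
    return {category: mean(values) for category, values in groups.items()}

# Hypothetical (community size, income in $1,000s) records
records = [("Rural", 48), ("Rural", 52), ("Urban", 61), ("Urban", 67), ("Urban", 64)]
print(means_by_group(records))
```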
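A minimal sketch of an independent-samples t statistic, assuming the pooled-variance (Student's) form; the slides do not specify which variant. It returns t and its degrees of freedom, which would then be compared with a t table to judge significance. The scores are hypothetical.

```python
import math
from statistics import mean, variance

def pooled_t(group1, group2):
    """Independent-samples t statistic with a pooled variance estimate.
    Returns (t, degrees of freedom)."""
    n1, n2 = len(group1), len(group2)
    # Pool the two sample variances, weighting each by its degrees of freedom
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    # Standard error of the difference between the two means
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / se_diff, n1 + n2 - 2

# Hypothetical scores for two small groups
treatment = [5, 7, 6, 8, 9]
control = [3, 4, 5, 4, 4]
t, df = pooled_t(treatment, control)
print(round(t, 3), df)
```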
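The F ratio for a one-way ANOVA can be sketched as the between-groups mean square divided by the within-group mean square. This covers only the single-time-point case, not the repeated-measures design also mentioned above; the three groups are hypothetical.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """Return the F ratio and its (between, within) degrees of freedom."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-groups variability: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variability: how far each value is from its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)

f_ratio, dfs = one_way_anova_f([2, 3, 4], [5, 6, 7], [8, 9, 10])  # hypothetical groups
print(f_ratio, dfs)
```

A large F means the group means differ by much more than the scatter within groups would suggest.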
3. Correlation: used with ratio-level variables; the interest lies in both the equation and the strength of the correlation. \( Y = a + bX \) is the general equation, where a is the intercept and b is the slope. The symbol r is used to report the strength of the correlation; it can vary from -1.0 to +1.0.
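[Figures: a sample data set plotted as Y against X; the regression line fitted through the points; the a value (intercept) read where the line crosses the Y axis and the b value (slope) read as h/b; a predicted value of Y read from the regression line.]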
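The intercept a, slope b, correlation r, and a predicted value can be sketched with the ordinary least-squares formulas. The five (X, Y) points are hypothetical, since the sample data set shown in the slides is not recoverable.

```python
import math

def least_squares(xs, ys):
    """Return intercept a, slope b, and Pearson r for the line Y = a + bX."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    b = sxy / sxx                    # slope: change in Y per one-unit change in X
    a = mean_y - b * mean_x          # intercept: value of Y where X = 0
    r = sxy / math.sqrt(sxx * syy)   # strength of correlation, from -1.0 to +1.0
    return a, b, r

xs = [1, 2, 3, 4, 5]  # hypothetical X values
ys = [2, 4, 5, 4, 5]  # hypothetical Y values
a, b, r = least_squares(xs, ys)
predicted = a + b * 6  # predicted Y for X = 6, read off the regression line
print(round(a, 2), round(b, 2), round(r, 3), round(predicted, 2))
```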