Educational Research: Data analysis and interpretation – 1 Descriptive statistics EDU 8603 Educational Research Richard M. Jacobs, OSA, Ph.D.
Statistics... A set of mathematical procedures for describing, synthesizing, analyzing, and interpreting quantitative data …the selection of an appropriate statistical technique is determined by the research design, hypothesis, and the data collected
Preparing data for analysis... Data must be accurately scored and systematically organized to facilitate data analysis: tabulating: organizing the data in a systematic manner; coding: assigning numerals (e.g., ID) to data; scoring: assigning a total to each participant’s instrument
descriptive statistics... …permit the researcher to describe many pieces of data with a few indices
statistics... …indices calculated by the researcher for a sample drawn from a population
parameters... …indices calculated by the researcher for an entire population
Types of descriptive statistics… 1. graphs 2. measures of central tendency 3. measures of variability 4. measures of relative position 5. measures of relationship
graphs... …representations of data enabling the researcher to see what the distribution of scores looks like
1. Graphs… frequency polygon, pie chart, boxplot, stem-and-leaf chart
measures of central tendency... …indices enabling the researcher to determine the typical or average score of a group of scores
2. Measures of central tendency… mode, median, mean
mode... …the score attained by more participants than any other score
median... …the point in a distribution above and below which are 50% of the scores
mean... …the arithmetic average of the scores
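The three indices defined above can be computed as in the following minimal sketch. It uses Python's standard `statistics` module; the list `scores` is a hypothetical set of test scores made up for illustration, not data from this module.

```python
import statistics

# Hypothetical test scores for seven participants (illustrative only).
scores = [70, 75, 75, 80, 85, 90, 100]

# Mode: the score attained by more participants than any other score.
mode_score = statistics.mode(scores)

# Median: the point above and below which 50% of the scores fall.
median_score = statistics.median(scores)

# Mean: the arithmetic average of the scores.
mean_score = statistics.mean(scores)

print(f"mode={mode_score} median={median_score} mean={mean_score:.2f}")
```

Note that the single extreme score of 100 pulls the mean above the median, which is one reason the median is often preferred when extreme scores are present.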
measures of variability... …indices enabling the researcher to indicate how spread out a group of scores is
3. Measures of variability… range, quartile deviation, variance, standard deviation
range... …the difference between the highest and lowest score in a distribution
quartile deviation... …one half of the difference between the upper quartile (the 75th percentile) and the lower quartile (the 25th percentile) in a distribution
variance... …a summary statistic indicating the degree of variability among participants for a given variable
standard deviation... …the square root of the variance, providing an index of variability in the distribution of scores
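A minimal sketch of the four measures of variability follows, again using the standard `statistics` module and a hypothetical `scores` list. Two choices here are assumptions, not something the slides specify: the population formulas (`pvariance`, `pstdev`) are used because the scores are treated as the whole group being described, and the quartiles follow the default convention of `statistics.quantiles`. Other quartile conventions can give slightly different values.

```python
import statistics

# Hypothetical test scores (illustrative only).
scores = [70, 75, 75, 80, 85, 90, 100]

# Range: highest score minus lowest score.
score_range = max(scores) - min(scores)

# Quartile deviation: half the distance between the 75th and 25th percentiles.
q1, _median, q3 = statistics.quantiles(scores, n=4)
quartile_deviation = (q3 - q1) / 2

# Variance and standard deviation (population formulas, dividing by N).
# statistics.variance / statistics.stdev divide by N - 1 instead and are the
# usual choice when estimating a population value from a sample.
variance = statistics.pvariance(scores)
std_dev = statistics.pstdev(scores)

print(f"range={score_range} quartile deviation={quartile_deviation}")
print(f"variance={variance:.2f} standard deviation={std_dev:.2f}")
```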
Normal distributions of data (the normal curve)... A bell-shaped distribution of scores having four identifiable properties… …50% of the scores fall above the mean and 50% of the scores fall below the mean …the mean, median, and mode are the same value
…most scores are near the mean and, the farther from the mean a score is, the fewer the number of participants who attained that score …the same number, or percentage, of scores is between the mean and plus one standard deviation as is between the mean and minus one standard deviation
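The symmetry property above, and the familiar percentages of scores within one, two, and three standard deviations of the mean, can be checked numerically. The sketch below uses `statistics.NormalDist`; the mean of 100 and standard deviation of 15 are arbitrary assumptions chosen for illustration, and `share_within` is a hypothetical helper name.

```python
from statistics import NormalDist

# A hypothetical normally distributed scale (mean 100, SD 15), illustrative only.
dist = NormalDist(mu=100, sigma=15)

# Share of scores between the mean and +1 SD, and between the mean and -1 SD.
above = dist.cdf(dist.mean + dist.stdev) - dist.cdf(dist.mean)
below = dist.cdf(dist.mean) - dist.cdf(dist.mean - dist.stdev)

def share_within(k):
    """Proportion of scores within k standard deviations of the mean."""
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)

print(f"mean to +1 SD: {above:.4f}, mean to -1 SD: {below:.4f}")
for k in (1, 2, 3):
    print(f"within +/- {k} SD: {share_within(k):.1%}")
```

The two one-sided shares are equal (about 34% each), and the two-sided shares come out to roughly 68%, 95%, and 99.7%.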
Non-normal distributions of data (skewed distributions)... A non-bell-shaped distribution of scores where… …mean < median < mode (a “negatively skewed distribution”) …mean > median > mode (a “positively skewed distribution”)
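The ordering of mean, median, and mode in skewed distributions can be illustrated with two small made-up data sets. This is a sketch under stated assumptions: `positive` and `negative` are hypothetical score lists, `describe` is a hypothetical helper, and the ordering shown is a common rule of thumb that typical skewed data follow rather than a guarantee for every data set.

```python
import statistics

def describe(label, scores):
    """Print and return (mean, median, mode) for a list of scores."""
    mean = statistics.mean(scores)
    median = statistics.median(scores)
    mode = statistics.mode(scores)
    print(f"{label}: mean={mean:.1f} median={median:.1f} mode={mode}")
    return mean, median, mode

# Extreme scores at the upper end: positively skewed (hypothetical data).
positive = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

# Mirror image of the same data: extreme scores at the lower end.
negative = [16 - s for s in positive]

pos_stats = describe("positively skewed", positive)
neg_stats = describe("negatively skewed", negative)
```

In the first list the few high scores pull the mean above the median and mode; in the mirrored list the few low scores pull the mean below them.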
measures of relative position... …indices enabling the researcher to describe a participant’s performance compared to the performance of all other participants
4. Measures of relative position… percentile ranks, standard scores
percentile rank... …indicates the percentage of scores that fall at or below a given score
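The "at or below" definition of percentile rank translates directly into a short function. This is a minimal sketch: `percentile_rank` is a hypothetical helper name and `scores` is made-up data. Some textbooks use a slightly different formula that counts only half of the tied scores, so results from other sources may differ.

```python
def percentile_rank(scores, score):
    """Percentage of scores in `scores` that fall at or below `score`."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)

# Hypothetical test scores (illustrative only).
scores = [70, 75, 75, 80, 85, 90, 100]

for s in (75, 80, 100):
    print(f"score {s}: percentile rank {percentile_rank(scores, s):.1f}")
```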
standard score... …a measure of relative position
Types of standard scores... …z score …T score …stanines
z score... …a statistic expressing how far a score is from the mean in terms of standard deviation units
T score... …a transformed z score that avoids negative numbers and decimals by multiplying the z score by 10 and adding 50
stanines... …a standard score that divides a distribution into nine parts
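The three standard scores can be sketched as follows. The z and T formulas come straight from the definitions above. The stanine conversion is an assumption: it uses the common approximation of rounding 2z + 5 and limiting the result to the range 1 to 9, which is one convention among several. The names `z_score`, `t_score`, `stanine`, and the `scores` list are hypothetical.

```python
import math
import statistics

# Hypothetical test scores (illustrative only).
scores = [70, 75, 75, 80, 85, 90, 100]
mean = statistics.mean(scores)
sd = statistics.pstdev(scores)  # population SD, an assumption for this sketch

def z_score(x):
    """Distance of x from the mean in standard deviation units."""
    return (x - mean) / sd

def t_score(z):
    """Transformed z score: multiply by 10 and add 50."""
    return 10 * z + 50

def stanine(z):
    """Approximate stanine: round 2z + 5 half up, then limit to 1..9."""
    return max(1, min(9, math.floor(2 * z + 5.5)))

z_scores = [z_score(x) for x in scores]
for x, z in zip(scores, z_scores):
    print(f"score={x} z={z:+.2f} T={round(t_score(z))} stanine={stanine(z)}")

# A set of z scores has a mean of zero and a standard deviation of one.
print(f"{statistics.mean(z_scores):.2f} {statistics.pstdev(z_scores):.2f}")
```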
measures of relationship... …indices enabling the researcher to indicate the degree to which two sets of scores are related
5. Measures of relationship… Spearman Rho, Pearson r
correlations... …determine whether and to what degree a relationship exists between two or more quantifiable variables …the degree of the relationship is expressed as a coefficient of correlation
…the presence of a correlation does not indicate a cause-effect relationship, primarily because of the possibility of multiple confounding factors
Correlation coefficient… -1.00 strong negative; 0.00 no relationship; +1.00 strong positive
Spearman Rho... …a measure of correlation used for rank and ordinal data
Pearson r... …a measure of correlation used for data of interval or ratio scales …assumes that the relationship between the variables being correlated is linear
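The difference between the two coefficients can be seen by computing both on the same data. The sketch below implements Pearson r from its standard formula and Spearman Rho as Pearson r applied to ranks (tied values share their average rank). The function names and the `hours` and `results` lists are hypothetical; the data are made up so that the relationship is perfectly monotonic but not linear.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

def average_ranks(values):
    """Rank values from 1 (lowest); tied values share their average rank."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def spearman_rho(x, y):
    """Spearman Rho: Pearson r computed on the ranks of the data."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: results rise with hours, but not along a straight line.
hours = [1, 2, 3, 4, 5]
results = [2, 4, 8, 16, 32]

print(f"Pearson r    = {pearson_r(hours, results):.3f}")
print(f"Spearman Rho = {spearman_rho(hours, results):.3f}")
```

Because the ranks agree perfectly, Spearman Rho is 1, while Pearson r falls below 1 because its linearity assumption is not met.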
Mini-Quiz… True and false… …the analysis of the data is as important as any other component of the research process True
True and false… …descriptive statistics are normally computed separately for each group in a research study True
True and false… …every instrument administered must always be scored accurately and consistently, using the same procedures and criteria True
True and false… …tentative scoring procedures must always be tried out beforehand by administering the instrument to the study participants False
True and false… …a computer should not be used to perform an analysis that a researcher has never completed by hand or, at least, studied extensively True
True and false… …the first step in data analysis is to describe, or summarize, the data using descriptive statistics True
True and false… …the number resulting from the computation of a measure of central tendency represents the typical score attained by a group of participants True
True and false… …the mean is the most precise, stable index of typical performance that is especially useful in situations in which there are extreme scores False
True and false… …unless a correlation coefficient is used to compute the reliability of an instrument in a causal-comparative or experimental study, a correlation coefficient is only computed in a correlational study True
True and false… …plus and/or minus two standard deviations includes more than 99% of the scores False
True and false… …standard scores are rarely used in research studies True
True and false… …to test a hypothesis adequately, more than descriptive statistics are normally needed True
True and false… …if the extreme scores are at the upper, or higher, end of the distribution, it is said to be positively skewed True
True and false… …the median of a set of scores corresponds to the 50th percentile True
True and false… …a standard score is a measure of relative position that is appropriate when the data represent a nominal scale False
True and false… …a z score expresses how far a score is from the mean in terms of standard deviation units True
True and false… …the Spearman Rho is the appropriate measure of correlation when the variables are expressed as ranks instead of scores True
True and false… …the assumption associated with the application of Pearson r is that the relationship between the variables being correlated is linear True
Fill in the blank… …statistics which permit the researcher to describe many scores with a small number of indices descriptive statistics
Fill in the blank… …the values calculated for a sample drawn from a population statistics
Fill in the blank… …the values calculated for an entire population parameters
Fill in the blank… …a convenient way to describe a set of data with a single number measures of central tendency
Fill in the blank… …the index of central tendency appropriate for nominal data mode
Fill in the blank… …the index of central tendency appropriate for ordinal data median
Fill in the blank… …the index of central tendency appropriate for interval or ratio data mean
Fill in the blank… …the score attained by more participants than any other score mode
Fill in the blank… …the point in a distribution above and below which are 50% of the scores median
Fill in the blank… …the arithmetic average of the scores mean
Fill in the blank… …the difference between the highest and lowest score in a distribution range
Fill in the blank… …the measure of variability identifying one half of the difference between the 75th percentile and the 25th percentile quartile deviation
Fill in the blank… …the measure of variability used for interval and ratio data standard deviation
Fill in the blank… …the only appropriate measure of variability for nominal data range
Fill in the blank… …in a normal distribution, +/- 1 standard deviation from the mean constitutes approximately ____ % of the sample 68%
Fill in the blank… …extreme scores at the lower end of the distribution indicate a ______ skewed distribution negatively
Fill in the blank… …indices describing where a score is in relation to all other scores measures of relative position
Fill in the blank… …indicates the percentage of scores that fall at or below a given score percentile ranks
Fill in the blank… …if a set of scores is transformed into a set of z scores, the new distribution has a mean of ____ and a standard deviation of ____ zero; one
Fill in the blank… …a set of standard scores that divide a distribution into nine parts stanines
Fill in the blank… …the most appropriate measure of correlation when the sets of data to be correlated represent either interval or ratio scales Pearson r
This module has focused on... descriptive statistics...the statistical procedures for describing, synthesizing, analyzing, and interpreting quantitative data
The next module will focus on... inferential statistics...the statistical procedures for generalizing to a population of individuals based on information obtained from a limited number of research participants