Different Types of Data
Qualitative and quantitative
Nominal scale: numbers act only as labels for categories, so that the cases in each category can be counted (e.g. girls = 1, boys = 2)
Ordinal scale: values are ranked (1st, 2nd, 3rd, etc.)
Interval scale: equal intervals between consecutive values on the number scale, but no true zero point (e.g. Fahrenheit and Celsius temperature, IQ scale)
Ratio scale: equal intervals and a true zero point, so ratios between numbers are meaningful (e.g. a person running 10 miles is running twice as far as someone running 5 miles)
Agenda ** Please turn in your notes. Warm Up: Please do the sheet on your desk 1. Go over completed parts of packet 2. Correlation Coefficients 3. Finish Research Methods Packet 4. Start Statistics 5. Standard Deviation Practice (if time) HW: Read and take notes on pages 46-55 Study for Vocabulary Quiz next class.
Interpreting Data: Measures of Central Tendency
Statistics allow us to summarize and describe samples of data meaningfully and accurately. The mean, median and mode each give an index of the average, or typical, value of a distribution of scores.
Mean: the arithmetic average (the sum of the scores divided by the number of scores)
Median: the middle number when the scores are placed in order. If there is an even number of scores, take the average of the two middle numbers.
Mode: the score that occurs most often. Bimodal: two modes. Multimodal: three or more modes.
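The three measures can be checked with a short Python sketch. The scores below are hypothetical, chosen only so that there is an even number of values, and the functions come from Python's standard `statistics` module (`multimode` assumes Python 3.8 or later).

```python
from statistics import mean, median, multimode

# Hypothetical scores, already in order; an even count of six values.
scores = [2, 3, 3, 5, 7, 10]

average = mean(scores)      # sum of the scores divided by the number of scores
middle = median(scores)     # even count: average of the two middle numbers (3 and 5)
modes = multimode(scores)   # every value tied for most frequent, returned as a list

# A bimodal example: 1 and 3 both occur twice.
bimodal_modes = multimode([1, 1, 2, 3, 3])

print("mean:", average, "median:", middle, "mode(s):", modes)
print("bimodal example:", bimodal_modes)
```

Returning the modes as a list makes the bimodal and multimodal cases visible, which a single-value mode function would hide.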
Graphical representations
Data shown in curves can be normal or skewed.
Normal curve (bell-shaped curve, normal distribution): generally the result. It is symmetrical, and the mean, median and mode all fall at the highest point on the curve.
Skewed distributions: asymmetrical, with most scores grouped at one end.
Negatively skewed: the tail points to the left (the negative direction), so most scores are grouped at the high end.
Positively skewed: the tail points to the right (the positive direction), so most scores are grouped at the low end. The mode is smaller than the median, which is smaller than the mean.
Example: if a test was very difficult and almost everyone in the class did very poorly on it, the resulting distribution would most likely be positively skewed.
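As a rough illustration of the ordering mode < median < mean in a positively skewed distribution, here is a small Python sketch. The test scores are invented for the example (a hard test where most students score low and a few score high); they are not data from the class.

```python
from statistics import mean, median, mode

# Hypothetical scores (out of 100) on a very difficult test:
# most scores sit at the low end, with a long tail toward high scores.
hard_test = [20, 20, 20, 25, 25, 30, 35, 40, 60, 95]

test_mode = mode(hard_test)      # most frequent score
test_median = median(hard_test)  # average of the two middle scores (25 and 30)
test_mean = mean(hard_test)      # pulled upward by the few high scores

print("mode:", test_mode, "median:", test_median, "mean:", test_mean)
print("positively skewed ordering holds:", test_mode < test_median < test_mean)
```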
Normal Distribution
Negatively Skewed Distribution
Positively Skewed Distribution
Measures of variation
Measures of variation tell us more than an average alone: how much the scores differ from one another and from the mean.
Measures include: range, variance, standard deviation
Range
The spread of scores in a distribution: the largest score minus the smallest score. It is a quick and dirty measure of variability, although when a test is given back to students they very often wish to know the range of scores. Because the range is greatly affected by extreme scores, it may give a distorted picture of the scores. For example, the following two distributions have the same range, 13, yet appear to differ greatly in the amount of variability.
Distribution 1: 32 35 36 36 37 38 40 42 42 43 43 45
Distribution 2: 32 32 33 33 33 34 34 34 34 34 35 45
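The two distributions on this slide can be checked in a few lines of Python. The helper name `score_range` is made up for this sketch. Dropping the single extreme score of 45 is an added illustration of how strongly one score can affect the range.

```python
def score_range(scores):
    """Range: largest score minus smallest score."""
    return max(scores) - min(scores)

dist1 = [32, 35, 36, 36, 37, 38, 40, 42, 42, 43, 43, 45]
dist2 = [32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 45]

range1 = score_range(dist1)  # 45 - 32
range2 = score_range(dist2)  # 45 - 32

# Remove the one extreme score (45) from each distribution and recompute.
range1_trimmed = score_range([s for s in dist1 if s != 45])  # 43 - 32
range2_trimmed = score_range([s for s in dist2 if s != 45])  # 35 - 32

print("ranges:", range1, range2)
print("ranges without the score of 45:", range1_trimmed, range2_trimmed)
```

Without the 45, Distribution 2 has a range of only 3 while Distribution 1 still has a range of 11, which is the difference in variability that the shared range of 13 hides.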
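Variance
In other words: how spread out a distribution is. It is computed as the average squared deviation of each number from its mean. For example, for the numbers 1, 2, and 3, the mean is 2 and the variance is:
$\sigma^2 = \frac{(1-2)^2 + (2-2)^2 + (3-2)^2}{3} = \frac{2}{3} \approx 0.667$
This version divides by N, the number of scores. The step-by-step sample calculation divides by N-1 instead, which would give 2/2 = 1 for these numbers.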
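A minimal Python sketch of the "average squared deviation" for the numbers 1, 2, and 3. The variable names are illustrative. `pvariance` (divide by N) and `variance` (divide by N-1) are the two versions offered by Python's standard `statistics` module, shown side by side for comparison.

```python
from statistics import pvariance, variance

numbers = [1, 2, 3]

m = sum(numbers) / len(numbers)                      # mean = 2.0
squared_deviations = [(x - m) ** 2 for x in numbers] # [1.0, 0.0, 1.0]
average_squared_deviation = sum(squared_deviations) / len(numbers)

print("average squared deviation:", average_squared_deviation)
print("statistics.pvariance (divide by N):", pvariance(numbers))
print("statistics.variance (divide by N-1):", variance(numbers))
```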
Variance…
Step One - Find the mean of the scores.
Step Two - Subtract the mean from every score.
Step Three - Square the results of step two.
Step Four - Sum the results of step three.
Step Five - Divide the results of step four by N-1.
Step Six - Take the square root of step five.
Step 5 is the variance. Step 6 is the standard deviation.
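The six steps can be sketched directly in Python. The function name and the scores are hypothetical, chosen so that the mean comes out to a whole number; each line is labelled with the step it carries out.

```python
import math

def sample_variance_and_sd(scores):
    """Follow the six steps; returns (variance, standard deviation)."""
    n = len(scores)
    mean = sum(scores) / n                    # Step One: find the mean
    deviations = [x - mean for x in scores]   # Step Two: subtract the mean from every score
    squared = [d ** 2 for d in deviations]    # Step Three: square the results
    total = sum(squared)                      # Step Four: sum the squares
    variance = total / (n - 1)                # Step Five: divide by N-1 (variance)
    sd = math.sqrt(variance)                  # Step Six: square root (standard deviation)
    return variance, sd

# Hypothetical scores with a mean of 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
variance, sd = sample_variance_and_sd(scores)
print("variance:", round(variance, 3), "standard deviation:", round(sd, 3))
```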
Standard Deviation Square root of the variance Most commonly used measure of spread
Standard Deviation
The standard deviation measures the spread of the data about the mean value. It is useful in comparing sets of data which may have the same mean but a different range. For example, the mean of the following two sets is the same: 15, 15, 15, 14, 16 and 2, 7, 14, 22, 30. However, the second is clearly more spread out. If a set has a low standard deviation, the values are not spread out too much.
Example: Find the standard deviation of 4, 9, 11, 12, 17, 5, 8, 12, 14
First work out the mean: 10.222
Now subtract the mean from each of the numbers in the question and square the result. This is the (x - x̄)² step, where x refers to the values in the question.
x           4     9     11    12    17    5     8     12    14
(x - x̄)²    38.7  1.49  0.60  3.16  45.9  27.3  4.94  3.16  14.3
Now add up these results (this is the 'sigma' in the formula): 139.55
Divide by n-1. n is the number of values, 9, so n-1 is 8: 17.44
And finally, take the square root: 4.18
The standard deviation can usually be calculated much more easily with a calculator, and this is usually acceptable in exams. With some calculators, you go into the standard deviation mode (often mode '.'). Then type in the first value, press 'data', type in the second value, press 'data'. Do this until you have typed in all the values, then press the standard deviation button (it will probably have a lower case sigma on it). Check your calculator's manual to see how to calculate it on yours.
Lower case sigma (σ) means 'standard deviation'. Capital sigma (Σ) means 'the sum of'. x̄ (x bar) means 'the mean'.
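The numbers on this slide can be checked with Python's standard `statistics` module, whose `stdev` function divides by n-1 as the worked example does. The variable names are illustrative only.

```python
from statistics import mean, stdev

# Two sets with the same mean but very different spread.
set_a = [15, 15, 15, 14, 16]
set_b = [2, 7, 14, 22, 30]
print("means:", mean(set_a), mean(set_b))
print("standard deviations:", round(stdev(set_a), 2), round(stdev(set_b), 2))

# The worked example.
values = [4, 9, 11, 12, 17, 5, 8, 12, 14]
example_mean = mean(values)
sum_of_squares = sum((x - example_mean) ** 2 for x in values)
example_sd = stdev(values)  # sample standard deviation (divides by n-1 = 8)

print("mean:", round(example_mean, 3))
print("sum of squared deviations:", round(sum_of_squares, 2))
print("standard deviation:", round(example_sd, 2))
```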
68, 95, 99.7 rule
In a normal distribution:
About 68% of scores fall within 1 standard deviation of the mean (above and below)
About 95% of scores fall within 2 standard deviations of the mean
About 99.7% of scores fall within 3 standard deviations of the mean
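These percentages can be recovered from the normal curve itself. The sketch below assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the helper name `share_within` is made up for the example.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of scores within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k) * 100:.1f}%")
```

The exact values are slightly different from the rounded figures in the rule (for example, a little over 95% for 2 standard deviations), which is why the rule is best read as an approximation.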
How To Organize Data Frequency distribution Histogram Frequency polygon
Histogram
Frequency Polygon
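A frequency distribution simply counts how often each score occurs, and a histogram draws those counts as bars. As a rough text-only sketch, the code below reuses the scores of Distribution 2 from the Range slide; the row layout with asterisks is just one way to display the counts.

```python
from collections import Counter

scores = [32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 45]

# Frequency distribution: score -> number of times it occurs.
frequency = Counter(scores)

# Text histogram: one row per possible score, one asterisk per occurrence.
# Counter returns 0 for scores that never occur, so those rows are empty.
for score in range(min(scores), max(scores) + 1):
    print(f"{score:>3} | {'*' * frequency[score]}")
```

Joining the tops of the bars with straight lines instead of drawing the bars would give the frequency polygon.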
Scatterplots Illustrate the strength and direction of correlations graphically Paired X and Y scores for each subject are plotted as single points on a graph Slope of a line that best fits the pattern of points suggests the degree and direction of the relationship between the two variables
Different Scatterplots
Fig. 1: r = 1
Fig. 2: r = -1
Fig. 3: r = 0
Fig. 4: r ≈ 0.65
See the handout about computing correlation coefficients.
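The handout's method is not reproduced here. As an assumption, the sketch below uses the standard Pearson formula for r, with invented X and Y scores chosen to give the three clear-cut cases (r = 1, r = -1, r = 0). The function name `pearson_r` is illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired X and Y scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])  # points rise along a straight line
r_negative = pearson_r(x, [10, 8, 6, 4, 2])  # points fall along a straight line
r_zero = pearson_r(x, [4, 1, 0, 1, 4])       # no overall upward or downward trend

print(r_positive, r_negative, r_zero)
```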
Inferential Statistics
Evaluates the possibility that a correlation or a difference between groups reflects a real relationship, not just chance.
Statistical significance (p) is a measure of how likely it is that a difference between groups this large would occur by chance alone.
A real difference is more likely with:
large differences between the means of the frequency distributions
small standard deviations
large samples
The p value and significance
The lower the p value, the less likely it is that the results were due to chance.
For a difference to be considered significant, p usually needs to be less than 0.05.
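One simple way to see what "due to chance" means is a permutation test: pool the scores, regroup them in every possible way, and ask how often the regrouped difference between means is at least as large as the one observed. This is a swapped-in illustration, not necessarily the test used in this course, and the function name and both data sets are hypothetical.

```python
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """Share of all possible regroupings whose difference between
    means is at least as large as the observed difference."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def mean_gap(sum_a):
        # Absolute difference between the two group means,
        # given the sum of the scores assigned to the first group.
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = mean_gap(sum(group_a))
    gaps = [mean_gap(sum(split)) for split in combinations(pooled, n_a)]
    # Small tolerance so that ties with the observed gap are counted.
    at_least_as_large = sum(1 for gap in gaps if gap >= observed - 1e-12)
    return at_least_as_large / len(gaps)

# Hypothetical scores: large difference between means, little overlap.
p_separated = permutation_p_value([85, 88, 90, 92, 95], [70, 72, 75, 78, 80])

# Hypothetical scores: nearly identical means, heavy overlap.
p_overlapping = permutation_p_value([80, 82, 85, 88, 90], [79, 83, 84, 87, 91])

print("well separated groups: p =", round(p_separated, 4))
print("overlapping groups:    p =", round(p_overlapping, 4))
```

With the well separated groups only 2 of the 252 possible regroupings match the observed difference, so p falls below 0.05; with the overlapping groups the observed difference is easily matched by chance regroupings.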