Published by Amice Banks. Modified over 9 years ago.
1
Initial Data Analysis Central Tendency
2
Notation When we describe a set of data corresponding to the values of some variable, we will refer to that set using an uppercase letter such as $X$ or $Y$. When we want to talk about specific data points within that set, we specify those points by adding a subscript to the uppercase letter, like $X_1$.
3
Example The data 5, 8, 12, 3, 6, 8, 7 are labeled $X_1, X_2, X_3, X_4, X_5, X_6, X_7$.
4
Summation The Greek letter sigma, which looks like $\Sigma$, means “add up” or “sum” whatever follows it. Thus, $\sum X_i$ means “add up all the $X_i$s.” If we use the $X_i$s from the previous example, $\sum X_i = 49$ (or just $\sum X$).
5
Example
6
Example (cont.)
$\sum X = 82 + 66 + 70 + 81 + 61 = 360$
$\sum Y = 84 + 51 + 72 + 56 + 73 = 336$
$\sum (X - Y) = (82 - 84) + (66 - 51) + (70 - 72) + (81 - 56) + (61 - 73) = -2 + 15 + (-2) + 25 + (-12) = 24$
$\sum X^2 = 82^2 + 66^2 + 70^2 + 81^2 + 61^2 = 6724 + 4356 + 4900 + 6561 + 3721 = 26262$
One can also see it as $\sum (X^2)$. By contrast, $(\sum X)^2 = 360^2 = 129600$.
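The sums in this example can be checked with a short Python sketch. The variable names are illustrative only, and the two lists simply restate the scores used in the arithmetic above. The last two lines highlight the difference between summing the squares and squaring the sum.

```python
# Scores taken from the worked example above
X = [82, 66, 70, 81, 61]
Y = [84, 51, 72, 56, 73]

sum_x = sum(X)                                # sum of X
sum_y = sum(Y)                                # sum of Y
sum_diff = sum(x - y for x, y in zip(X, Y))   # sum of (X - Y), pair by pair
sum_of_squares = sum(x ** 2 for x in X)       # square each score, then add
square_of_sum = sum(X) ** 2                   # add the scores, then square

print(sum_x, sum_y, sum_diff, sum_of_squares, square_of_sum)
# prints: 360 336 24 26262 129600
```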
7
Your turn
$\sum (XY) =$
$(\sum (X - Y))^2 =$
[formula not recoverable] =
X    Y
3    5
6    7
2    3
8
Your turn
$\sum (XY) = 15 + 42 + 6 = 63$
$(\sum (X - Y))^2 = [(-2) + (-1) + (-1)]^2 = 16$
[formula not recoverable] = 2.08
X    Y
3    5
6    7
2    3
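A quick way to check these answers, reading the small table as X = 3, 6, 2 and Y = 5, 7, 3. The formula behind the third answer did not survive in the transcript. As an assumption only, the sketch shows that the sample standard deviation of X happens to reproduce 2.08; the slide may have intended something else.

```python
import statistics

# Table values as read from the slide: X = 3, 6, 2 and Y = 5, 7, 3
X = [3, 6, 2]
Y = [5, 7, 3]

sum_xy = sum(x * y for x, y in zip(X, Y))             # sum of the products XY
sq_sum_diff = sum(x - y for x, y in zip(X, Y)) ** 2   # sum the differences, then square

# Assumption: the third answer is the sample standard deviation of X
# (n - 1 in the denominator). The original formula is not in the transcript.
sd_x = statistics.stdev(X)

print(sum_xy, sq_sum_diff, round(sd_x, 2))
# prints: 63 16 2.08
```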
9
Measures of Central Tendency While distributions provide an overall picture of some data set, it is sometimes desirable to represent the entire data set using descriptive statistics. The first descriptive statistics we will discuss are those used to indicate where the center of the distribution lies.
11
The Mode There are different measures of central tendency, each with its own advantages and disadvantages. The first of these is called the mode. The mode is simply the value of the relevant variable that occurs most often (i.e., has the highest frequency) in the sample.
12
The Mode (cont.) Note that if you have done a frequency histogram, you can often identify the mode simply by finding the value with the highest bar. However, that will not work when grouping was performed prior to plotting the histogram (although you can still use the histogram to identify the modal group, just not the modal value).
13
Finding the mode Create a non-grouped frequency table as described previously, then identify the value with the greatest frequency. Example: Class height. n=48
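The procedure above (build a non-grouped frequency table, then pick the value with the greatest frequency) might be sketched in Python as follows. The function name `modes` and the `heights` list are hypothetical; the class height data from the slide is not reproduced in this transcript. The function returns a list because more than one value can tie for the highest frequency.

```python
from collections import Counter

def modes(values):
    """Return every value that attains the highest frequency, in sorted order."""
    counts = Counter(values)        # the non-grouped frequency table
    top = max(counts.values())      # the greatest frequency
    return sorted(v for v, c in counts.items() if c == top)

# Hypothetical heights in inches (NOT the class data from the slide)
heights = [64, 66, 66, 67, 68, 68, 68, 70, 71]

print(modes(heights))           # prints: [68]
print(modes([1, 2, 2, 3, 3]))   # prints: [2, 3]  (two values tie)
```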
14
Mode Advantages: very quick and easy to determine; is an actual value of the data; not affected by extreme scores. Disadvantages: sometimes not very informative (e.g., cigarettes smoked in a day); can change dramatically from sample to sample; there might be more than one (which is more representative?).
15
The Median A second measure of central tendency is called the median. The median is the point corresponding to the score that lies in the middle of the distribution (i.e., there are as many data points above the median as there are below the median).
16
The Median (cont.) To find the median, the data points must first be sorted into either ascending or descending numerical order. The position of the median value can then be calculated using the following formula: $\text{position} = \frac{n + 1}{2}$, where $n$ is the number of data points.
17
Examples If there is an odd number of data points: (1, 2, 2, 3, 3, 4, 4, 5, 6) The median is the item in the fifth position of the ordered data set; therefore the median is 3.
18
If there is an even number of data points: (1, 2, 2, 3, 3, 4, 4, 5, 6, 793) The formula would tell us to look in the 5.5th place, which we can’t really do. However, we can take the average of the 5th and 6th values to give us the median. In the above scenario 3 is in the fifth place and 4 is in the sixth place, so we can use 3.5 as our median.
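Both cases can be sketched in a few lines of Python. The sketch assumes the position formula is (n + 1) / 2, which is consistent with the 5th and 5.5th positions in the two examples. The function name `median_by_position` is illustrative.

```python
def median_by_position(values):
    """Median via the 1-based position (n + 1) / 2 in the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2                 # e.g. 5.0 for n = 9, 5.5 for n = 10
    lower = ordered[int(position) - 1]     # value at the position, rounded down
    if position.is_integer():
        return lower                       # odd n: a single middle value
    upper = ordered[int(position)]         # even n: the next value up
    return (lower + upper) / 2             # average the two middle values

print(median_by_position([1, 2, 2, 3, 3, 4, 4, 5, 6]))        # prints: 3
print(median_by_position([1, 2, 2, 3, 3, 4, 4, 5, 6, 793]))   # prints: 3.5
```

Note that the extreme value 793 only lengthens the list by one; it does not pull the median toward it.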
19
Median (Advantage/Disadvantage) Advantage: Resistant to outliers. Disadvantage: May not be so informative: (1, 1, 2, 2, 2, 2, 5, 6, 9, 9, 10). Does the value of 2 really represent this sample as a whole very well?
20
The Mean Finally, the most commonly used measure of central tendency is called the mean (denoted $\bar{X}$ for a sample, and $\mu$ for a population). The mean is the same as what most of us call the average, and it is calculated in the following manner: $\bar{X} = \frac{\sum X_i}{n}$
21
The Mean For example, given the data set that we used to calculate the median (odd number example), the corresponding mean would be: $\bar{X} = \frac{30}{9} \approx 3.33$. Similarly, the mean height of a statistics class, as indicated by the previous sample, would be:
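As a minimal sketch, the mean of the odd-number median example can be computed by dividing the sum of the values by the number of values. The second list adds the value 793 from the even-number example to show how strongly a single extreme score moves the mean. The variable names are illustrative.

```python
# Data from the odd-number median example
data = [1, 2, 2, 3, 3, 4, 4, 5, 6]
mean = sum(data) / len(data)          # sum of the values divided by n
print(round(mean, 2))                 # prints: 3.33

# Same data with the extreme value from the even-number example appended
with_outlier = data + [793]
mean_with_outlier = sum(with_outlier) / len(with_outlier)
print(mean_with_outlier)              # prints: 82.3
```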
22
Mode vs. Median vs. Mean In our height example, the mode and median were the same, and the mean was fairly close to the mode and median. This was the case because the height distribution was fairly symmetrical. However, when the underlying distribution is not symmetrical, the three measures of central tendency can be quite different.
23
This raises the issue of which measure is best. Example: Slices of Pizza Eaten Last Week
Value  Freq    Value  Freq
0      4       8      5
1      2       10     2
2      8       15     1
3      6       16     1
4      6       20     1
5      6       40     1
6      5
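To see how far apart the three measures can be for this example, the frequency table can be expanded into raw scores and summarized. This sketch assumes the table reads as value/frequency pairs (0: 4, 1: 2, 2: 8, 3: 6, 4: 6, 5: 6, 6: 5, 8: 5, 10: 2, 15: 1, 16: 1, 20: 1, 40: 1), which gives n = 48. The names `freq` and `slices` are illustrative.

```python
import statistics

# Assumed reading of the table: {slices eaten: number of students}
freq = {0: 4, 1: 2, 2: 8, 3: 6, 4: 6, 5: 6, 6: 5,
        8: 5, 10: 2, 15: 1, 16: 1, 20: 1, 40: 1}

# Expand the table into one entry per student
slices = [value for value, count in freq.items() for _ in range(count)]

n = len(slices)
mode = statistics.mode(slices)       # most frequent value
median = statistics.median(slices)   # middle of the sorted scores
mean = statistics.mean(slices)       # sum divided by n

print(n, mode, median, round(mean, 2))
# prints: 48 2 4.0 5.65
```

Under that reading, a few large values pull the mean well above the median, which in turn sits above the mode.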
24
Some Visual Demos Here is a demonstration that allows you to change a frequency histogram while simultaneously noting the effects of those changes on the mean versus the median. As you use the demo, you should easily be able to think about how these changes are also affecting the mode, right? Note that in a skewed distribution the order goes mode, median, mean in the direction the tail is pointing.
25
Your turn Find the mean, median and mode of the following dataset: 7 3 4 3 5 2 4 6 1 7 3 6 3 3 4 Mean = Median = Mode =
26
Mean = 4.07 Median = 4 Mode = 3
27
Other measures of central tendency (preview) Trimmed mean: created by “trimming” some percentage of the high and low ends of the data. M-estimators: extreme values are given less weight than those closer to the center of the distribution. These may be more robust than the mean or median for certain types of “funky” data.
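A trimmed mean might be sketched as follows. The function name `trimmed_mean` and the convention of dropping the same proportion from each end are assumptions; the slide does not specify how the percentage is split. The data are the even-number median example, where the value 793 dominates the ordinary mean.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the points from EACH end (assumed convention)."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)      # number of points to drop per tail
    kept = ordered[k:len(ordered) - k]      # middle portion that remains
    return sum(kept) / len(kept)

data = [1, 2, 2, 3, 3, 4, 4, 5, 6, 793]

print(sum(data) / len(data))       # ordinary mean, prints: 82.3
print(trimmed_mean(data, 0.10))    # drops 1 and 793, prints: 3.625
```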