Central tendency is the center of the distribution of a data set. You can think of it as where the middle of the distribution lies.
Measures of central tendency: numbers that describe what is average or typical of the distribution: mean, median, and mode.
Mean: The sum of all the data values divided by the number of values.
Median: The middle number when the data are arranged in order.
Mode: The value that occurs most frequently in the data.
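The three definitions can be checked on a small data set. The sketch below uses Python's standard `statistics` module; the variable names are illustrative choices, not part of the lesson.

```python
from statistics import mean, median, mode

# Small illustrative data set with 12 values
data = [4, 17, 7, 14, 18, 12, 3, 16, 10, 4, 4, 11]

data_mean = mean(data)      # sum of the values divided by the number of values
data_median = median(data)  # middle of the ordered values (average of the two middle values when the count is even)
data_mode = mode(data)      # value that occurs most often

print(data_mean, data_median, data_mode)
```

For this data set the mean is 10, the median is 10.5, and the mode is 4.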
Data: 4, 17, 7, 14, 18, 12, 3, 16, 10, 4, 4, 11
1. Order your data (put the values in numerical order): 3, 4, 4, 4, 7, 10, 11, 12, 14, 16, 17, 18
2. Find the median of your data. The median divides the data into two halves. Median: 10.5
3. To divide the data into quarters, find the medians of these two halves.
Lower half: 3, 4, 4, 4, 7, 10 Median: 4
Upper half: 11, 12, 14, 16, 17, 18 Median: 15
4. Now you have three points. These three points divide the entire data set into quarters, called "quartiles":
◦ Quartile 1 (Q1) = (4+4)/2 = 4
◦ Quartile 2 (Q2) = (10+11)/2 = 10.5
◦ Quartile 3 (Q3) = (14+16)/2 = 15
Once you have these three points, Q1, Q2, and Q3, together with the minimum (3) and maximum (18), you have all you need to draw a simple box-and-whisker plot.
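The "median of each half" procedure above can be sketched in a few lines of Python. The function name `quartiles` is a hypothetical helper, and leaving the middle value out of both halves when the count is odd is an assumption (the example here has an even count, so the question does not arise). Other quartile conventions exist and may give slightly different values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)          # step 1: order the data
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # lower half of the ordered data
    upper = ordered[half + n % 2:]    # upper half (assumption: skip the middle value when n is odd)
    # steps 2 and 3: median of the whole set, then of each half
    return median(lower), median(ordered), median(upper)

data = [4, 17, 7, 14, 18, 12, 3, 16, 10, 4, 4, 11]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)
```

For the data above this gives Q1 = 4, Q2 = 10.5, and Q3 = 15, matching the hand calculation.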
Percentile rank is calculated by counting the data points with values less than the value of interest, and dividing that count by the total number of data points. The result is usually expressed as a percentage.
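As a minimal sketch of this calculation, the function below counts the values strictly below a chosen value and divides by the total. The function name `percentile_rank` and the sample data are illustrative assumptions, not part of the lesson.

```python
def percentile_rank(values, x):
    """Percentage of data points whose value is strictly less than x."""
    below = sum(1 for v in values if v < x)  # number of values below x
    return 100 * below / len(values)

data = [4, 17, 7, 14, 18, 12, 3, 16, 10, 4, 4, 11]
rank_of_14 = percentile_rank(data, 14)
print(round(rank_of_14, 1))
```

Here 8 of the 12 values are less than 14, so 14 sits at about the 67th percentile of this data set.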
Notice that all the data values in the bins up to 60 are less than the value in question. The frequencies up to 60 add up to 37 out of 40 (total), which is 92.5%. So the value is at approximately the 93rd percentile.
Deviations measure the signed differences between the data values and the mean.
The variance is another measure of variability. It is equal to the sum of the squares of the deviations divided by one less than the number of values.
Example: Semester assignment scores
Connie’s mean: 84
Oscar’s mean: 84
These are Connie’s and Oscar’s scores and their deviations from the mean score for each student. How can we combine the deviations into a single value that reflects the spread in a data set? Should we find the sum of the deviations? Let’s try that…. Of course, the positive and negative deviations cancel out, so the sum is zero!! So we need to eliminate the effect of the different signs! Any ideas?
When you sum the squares of the deviations, the sum is no longer zero!! The sum of the squares of the deviations, divided by one less than the number of values, is called the variance of the data. The square root of the variance is called the standard deviation of the data. The standard deviation provides one way to judge the “average difference” between data values and the mean. It is a measure of how the data are spread around the mean.
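The steps from deviations to standard deviation can be traced numerically. Connie’s and Oscar’s scores are not reproduced here, so this sketch uses a small illustrative data set instead; the variable names are assumptions made for the example.

```python
from math import sqrt

# Illustrative data (not the students' scores)
data = [4, 17, 7, 14, 18, 12, 3, 16, 10, 4, 4, 11]

mean = sum(data) / len(data)                     # 120 / 12
deviations = [x - mean for x in data]            # signed differences from the mean
sum_of_deviations = sum(deviations)              # positives and negatives cancel
sum_of_squares = sum(d * d for d in deviations)  # squaring removes the signs
variance = sum_of_squares / (len(data) - 1)      # divide by one less than the number of values
std_dev = sqrt(variance)                         # square root of the variance

print(sum_of_deviations, sum_of_squares, variance, std_dev)
```

For this data the deviations sum to 0, the squared deviations sum to 336, the variance is 336/11 (about 30.5), and the standard deviation is about 5.53.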
A histogram is a graphical representation of a data set, with columns to show how the data are distributed across different intervals of values. The columns of a histogram are called bins and should not be confused with the bars of a bar graph. The bars of a bar graph indicate categories— how many data items either have the same value or share a characteristic (eye color). The bins of a histogram indicate how many numerical data values fall within a certain interval.
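To make the idea of bins concrete, the sketch below counts how many values fall in each interval of a chosen width. The helper name `bin_frequencies`, the sample data, and the convention that a value on a boundary goes into the higher bin are all assumptions made for illustration. Empty bins are simply omitted from the result.

```python
from collections import Counter

def bin_frequencies(values, bin_width, start=0):
    """Map each bin's lower edge to the number of values in [edge, edge + bin_width)."""
    counts = Counter((v - start) // bin_width for v in values)
    return {start + k * bin_width: counts[k] for k in sorted(counts)}

data = [4, 17, 7, 14, 18, 12, 3, 16, 10, 4, 4, 11]
freqs = bin_frequencies(data, bin_width=5)
print(freqs)
print(sum(freqs.values()))  # the bin frequencies account for every data value
```

With a bin width of 5, the bins starting at 0, 5, 10, and 15 hold 4, 1, 4, and 3 values, and those frequencies add up to the 12 values in the data set.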
Symmetric: The median (Q2) lies in the middle of its first and third quartiles. The minimum and maximum do not have to be equally far away from the median.
Skewed right: The median (Q2) is closer to the first quartile. The mean is typically greater than the median.
Skewed left: The median (Q2) is closer to the third quartile. The mean is typically less than the median.
Shatevia took a random sample of 50 students who own MP3 players at her high school and asked how many songs they have stored. The two graphs were constructed from the data in the table.
a. What is the range of the data? The number of songs goes from a low of 765 songs to a high of 1013 songs. The range is 1013 - 765 = 248 songs.
b. What is the bin width of each graph? The bin width of Graph A is 50 songs, and the bin width of Graph B is 10 songs.
c. How can you know if the graph accounts for all 50 values? The sum of all the bin frequencies is 50 for each of the graphs.
d. Why are the columns shorter in Graph B? Each bin in Graph A holds the values of up to five bins from Graph B. With smaller bin widths you will usually have shorter columns.
e. Which graph is better at showing the overall shape of the distribution? What is that shape? Graph A shows that the distribution is skewed left. This fact is harder to see with all the ups and downs in Graph B.
f. Which graph is better at showing the gaps and clusters in the data? With more bins you can see gaps and clusters in the data. A dot plot is like a histogram with a very small bin width. Graph B is the better graph for seeing gaps and clusters.
g. What percentage of the players have fewer than 850 songs stored? Add the bin frequencies for the bins below (to the left of) 850 songs. There are 10 data values, so 10 out of 50, or 20% of the sample, had fewer than 850 songs.