1
Lecture 5,6: Measures in Statistics
Statistics for IT Lecture 5,6: Measures in Statistics 9/17/2018
2
Course Objectives
After completing this module, students should be able to:
Summarize data using measures of central tendency, such as the mean, median, mode, and midrange.
Describe data using measures of variation, such as the range, variance, and standard deviation.
Identify the position of a data value in a data set using measures of position, such as percentiles, deciles, and quartiles.
3
AVERAGES, OR MEASURES OF CENTRAL TENDENCY
An average is a value that is typical, or representative, of a set of data. Since such typical values tend to lie centrally within a set of data arranged according to magnitude, averages are also called measures of central tendency. Several types of averages can be defined, the most common being the arithmetic mean, the median, the mode, the geometric mean, and the harmonic mean. Each has advantages and disadvantages, depending on the data and the intended purpose.
4
Frequency Tables
When raw data is organized, it can be helpful to display it in a table showing the frequency (f) with which each data item (x) occurs. Such a table is called a frequency table.
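A frequency table can be built by counting how often each value occurs. The sketch below is only an illustration: the `scores` list is hypothetical data (the slide's own example table was not recovered), and `frequency_table` is a helper name chosen here.

```python
from collections import Counter


def frequency_table(data):
    """Return (x, f) pairs sorted by data value x, where f is its frequency."""
    counts = Counter(data)
    return sorted(counts.items())


# Hypothetical raw data, not taken from the slides.
scores = [3, 5, 4, 3, 5, 5, 2, 4, 3, 5]
table = frequency_table(scores)

print("x  f")
for x, f in table:
    print(f"{x}  {f}")
```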
5
Grouped frequency table
However, when a larger range of data is involved, it may be beneficial to first break the data down into small groups, in which case the resulting table is referred to as a grouped frequency table.
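One way to form a grouped frequency table is to assign each value to a class of fixed width and count the values per class. This is a minimal sketch under assumptions made here: classes are half-open intervals [lower, upper), and the `ages` data, the starting point, and the class width are all hypothetical.

```python
def grouped_frequency_table(data, start, width, num_classes):
    """Count values falling in classes [start + k*width, start + (k+1)*width)."""
    counts = [0] * num_classes
    for x in data:
        k = int((x - start) // width)  # index of the class containing x
        if 0 <= k < num_classes:
            counts[k] += 1
    return [
        (start + k * width, start + (k + 1) * width, counts[k])
        for k in range(num_classes)
    ]


# Hypothetical raw data, not taken from the slides.
ages = [21, 25, 33, 38, 41, 22, 47, 35, 29, 44, 31, 26]
grouped = grouped_frequency_table(ages, start=20, width=10, num_classes=3)

for lower, upper, f in grouped:
    print(f"{lower}-{upper}: {f}")
```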
6
Arithmetic Mean-ungrouped
The arithmetic mean, or briefly the mean, of a set of N numbers $X_1, X_2, X_3, \ldots, X_N$ is denoted by $\bar{X}$ (read "X bar") and is defined as
$$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N}$$
7
Arithmetic Mean Ungrouped …
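If the numbers $X_1, X_2, \ldots, X_K$ occur $f_1, f_2, \ldots, f_K$ times, respectively (i.e., occur with frequencies $f_1, f_2, \ldots, f_K$), the arithmetic mean is
$$\bar{X} = \frac{f_1 X_1 + f_2 X_2 + \cdots + f_K X_K}{f_1 + f_2 + \cdots + f_K} = \frac{\sum_{i=1}^{K} f_i X_i}{\sum_{i=1}^{K} f_i}$$
where $N = \sum_{i=1}^{K} f_i$ is the total frequency.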
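The two forms of the mean give the same result: summing every raw value and dividing by N, or summing each distinct value times its frequency and dividing by the total frequency. The sketch below checks this on hypothetical values and frequencies; the function names are chosen here for illustration.

```python
def mean(values):
    """Arithmetic mean of raw values: sum of X_i divided by N."""
    return sum(values) / len(values)


def mean_from_frequencies(values, freqs):
    """Arithmetic mean when values[i] occurs freqs[i] times."""
    total = sum(freqs)
    return sum(f * x for x, f in zip(values, freqs)) / total


# Hypothetical data: 5 occurs 3 times, 8 twice, 6 four times, 2 once.
values = [5, 8, 6, 2]
freqs = [3, 2, 4, 1]

# Expanding the frequencies back into raw data gives the same mean.
expanded = [x for x, f in zip(values, freqs) for _ in range(f)]

print(mean(expanded))
print(mean_from_frequencies(values, freqs))
```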
8
Arithmetic Mean…
9
Weighted mean
10
Example- Grade Point Average (GPA)
A student received an A in English Composition (3 credits), a C in Introduction to Psychology (3 credits), a B in Biology (4 credits), and a D in Physical Education (2 credits). Assuming A = 4 grade points, B = 3 grade points, C = 2 grade points, D = 1 grade point, and F = 0 grade points, find the student’s grade point average.
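The GPA is a weighted mean in which each grade's points are weighted by the course credits. The sketch below works the example with the grades and credits stated in the problem; `weighted_mean` and the variable names are chosen here for illustration.

```python
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}


def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# (course, grade, credits) as given in the example.
courses = [
    ("English Composition", "A", 3),
    ("Introduction to Psychology", "C", 3),
    ("Biology", "B", 4),
    ("Physical Education", "D", 2),
]

points = [GRADE_POINTS[grade] for _, grade, _ in courses]
credit_hours = [hours for _, _, hours in courses]

# (4*3 + 2*3 + 3*4 + 1*2) / (3 + 3 + 4 + 2) = 32 / 12
gpa = weighted_mean(points, credit_hours)
print(round(gpa, 2))
```

Rounded to two decimals, the result is 2.67.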
12
THE ARITHMETIC MEAN COMPUTED FROM GROUPED DATA
The procedure for finding the mean for grouped data assumes that the mean of all the raw data values in each class is equal to the midpoint of the class. In reality, this is not true, since the average of the raw data values in each class usually will not be exactly equal to the midpoint. However, using this procedure will give an acceptable approximation of the mean, since some values fall above the midpoint and other values fall below the midpoint for each class, and the midpoint represents an estimate of all values in the class.
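The midpoint procedure can be sketched as follows: replace every value in a class by the class midpoint, then take the frequency-weighted mean of the midpoints. The class boundaries and frequencies below are hypothetical and are not the slide's table.

```python
def grouped_mean(classes):
    """Approximate mean from (lower, upper, frequency) classes.

    Each class is represented by its midpoint (lower + upper) / 2.
    """
    n = sum(f for _, _, f in classes)
    total = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total / n


# Hypothetical grouped data: (lower boundary, upper boundary, frequency).
classes = [(0, 10, 2), (10, 20, 5), (20, 30, 3)]

# Midpoints 5, 15, 25 give (2*5 + 5*15 + 3*25) / 10 = 160 / 10.
approx_mean = grouped_mean(classes)
print(approx_mean)
```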
14
Worked Example Miles Run per Week
Using the given frequency distribution, find the mean. The data represent the number of miles run during one week for a sample of 20 runners.
16
The Median
The median is the halfway point in a data set. Before you can find this point, the data must be arranged in order. When the data set is ordered, it is called a data array. The median either will be a specific value in the data set or will fall between two values.
17
Worked Example
The numbers of rooms in the seven hotels in town X are 713, 300, 618, 595, 311, 401, and 292. Find the median.
18
Worked Example
The number of tornadoes that have occurred in the United States over an 8-year period follows. Find the median.
684, 764, 656, 702, 856, 1133, 1132, 1303
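Both worked examples can be checked with Python's standard `statistics.median`, which sorts the data and returns the middle value for an odd count or the average of the two middle values for an even count. This is a quick check, not the slides' own worked solution.

```python
import statistics

# Seven hotels (odd count): the median is the 4th value of the sorted array.
hotel_rooms = [713, 300, 618, 595, 311, 401, 292]
print(sorted(hotel_rooms))
hotel_median = statistics.median(hotel_rooms)
print(hotel_median)

# Eight years (even count): the median is the average of the 4th and 5th values.
tornadoes = [684, 764, 656, 702, 856, 1133, 1132, 1303]
print(sorted(tornadoes))
tornado_median = statistics.median(tornadoes)
print(tornado_median)
```

The sorted hotel data are 292, 300, 311, 401, 595, 618, 713, so the median is 401. The two middle tornado counts are 764 and 856, so the median is (764 + 856) / 2 = 810.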
19
Median - Grouped data
The median of a sample of data organized in a frequency distribution is computed by the following formula:
$$\text{Median} = L + \frac{\dfrac{n}{2} - CF}{f} \cdot i$$
where L is the lower limit of the median class, CF is the cumulative frequency preceding the median class, f is the frequency of the median class, i is the width of the median class interval, and n is the total number of data values.
20
Finding the Median Class
To determine the median class for grouped data:
1. Construct a cumulative frequency distribution.
2. Divide the total number of data values by 2.
3. Determine which class will contain this value.
For example, if n = 50, then 50/2 = 25, so determine which class contains the 25th value. That class is the median class.
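The steps above, combined with the usual interpolation formula Median = L + ((n/2 - CF) / f) * i, can be sketched as below. The classes are hypothetical data, and the function assumes classes are listed in increasing order as (lower limit, upper limit, frequency).

```python
def grouped_median(classes):
    """Estimate the median from ordered (lower, upper, frequency) classes."""
    n = sum(f for _, _, f in classes)
    half = n / 2  # position of the median value
    cf = 0        # cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        # The median class is the first class whose cumulative
        # frequency reaches n / 2.
        if f > 0 and cf + f >= half:
            width = upper - lower
            return lower + (half - cf) * width / f
        cf += f
    raise ValueError("classes must contain at least one observation")


# Hypothetical grouped data, not taken from the slides.
classes = [(0, 10, 2), (10, 20, 5), (20, 30, 3)]

# n = 10, n/2 = 5; cumulative frequencies are 2, 7, 10, so the median
# class is 10-20 with CF = 2, f = 5, i = 10.
median_estimate = grouped_median(classes)
print(median_estimate)
```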
21
The Mode
The third measure of average is called the mode. The mode is the value that occurs most often in the data set.
Unimodal: A data set that has only one value that occurs with the greatest frequency is said to be unimodal.
Bimodal: If a data set has two values that occur with the same greatest frequency, both values are considered to be the mode and the data set is said to be bimodal.
Multimodal: If a data set has more than two values that occur with the same greatest frequency, each value is used as the mode, and the data set is said to be multimodal.
No mode: When no data value occurs more than once, the data set is said to have no mode.
22
Worked Example
Find the mode of the signing bonuses of eight NFL players for a specific year. The bonuses in millions of rupees are 18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10.
Answer: Arranged in order, the data are 10, 10, 10, 11.3, 12.4, 14.0, 18.0, 34.5. Since Rs. 10 million occurred 3 times, a frequency larger than that of any other number, the mode is Rs. 10 million.
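The worked example can be checked with `statistics.multimode`, which returns every value that ties for the greatest frequency, so it also distinguishes unimodal from bimodal or multimodal data. Note one difference from the slide's definitions: when no value repeats, `multimode` returns all values rather than reporting "no mode".

```python
import statistics
from collections import Counter

# Signing bonuses in millions, as given in the example.
bonuses = [18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10]

modes = statistics.multimode(bonuses)  # all values with the top frequency
top_count = Counter(bonuses)[modes[0]]

print(modes, top_count)
```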
24
Mode - Grouped data
The mode for grouped data is approximated by the midpoint of the class with the largest class frequency. The class with the largest frequency is called the modal class. The following formula gives a refined estimate of the mode of grouped data:
$$\text{Mode} = L_{mo} + \frac{D_a}{D_a + D_b} \cdot C$$
where $L_{mo}$ is the lower boundary of the modal class, $D_a$ is the difference between the frequency of the modal class and that of the class preceding it, $D_b$ is the difference between the frequency of the modal class and that of the class after it, and C is the class interval of the modal class.
25
1-2 Measure of Dispersion
A measure of dispersion describes the variation in a set of data. It indicates to what degree the individual observations are dispersed, or spread out, around their mean.
26
Range – Ungrouped data
The range is the difference between the highest observation and the lowest observation.
Advantage: it is easy to calculate and gives at least some impression of the makeup of the data set.
Disadvantage: it uses only two of the observations in the data set.
27
Population Variance and Standard deviation – Ungrouped data
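The population variance is the mean of the squared deviations of the observations from their population mean:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
where $\mu$ is the population mean and N is the number of observations in the population.
The population standard deviation is the square root of the population variance:
$$\sigma = \sqrt{\sigma^2}$$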
28
Sample Variance and Standard deviation – Ungrouped data
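The sample variance is
$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
where $\bar{X}$ is the sample mean and n is the number of observations in the sample.
The sample standard deviation is
$$s = \sqrt{s^2}$$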
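The difference between the population and sample formulas is only the divisor: N for a population, n - 1 for a sample. The sketch below computes both by hand on hypothetical data and compares them with the standard library's `statistics.pvariance` and `statistics.variance`.

```python
import math
import statistics


def population_variance(data):
    """Mean of squared deviations from the mean (divide by N)."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    xbar = sum(data) / len(data)
    return sum((x - xbar) ** 2 for x in data) / (len(data) - 1)


# Hypothetical data: mean 5, sum of squared deviations 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = population_variance(data)   # 32 / 8
samp_var = sample_variance(data)      # 32 / 7

print(pop_var, math.sqrt(pop_var))
print(samp_var, math.sqrt(samp_var))
print(statistics.pvariance(data), statistics.variance(data))
```

The same data give a larger value when treated as a sample, because the divisor is smaller.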
29
Sample Variance and Standard deviation
Why do we use n - 1 rather than N for a sample?
Because we have n - 1 degrees of freedom, or df = n - 1. (The number of degrees of freedom in any statistical operation is equal to the number of observations minus any constraints placed on those observations.)
Because a sample is generally a little less dispersed than the population from which it was taken. Here we are trying to use the value of s as an estimate of σ.
30
Sample Variance and Standard deviation – Grouped data
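The sample variance is
$$s^2 = \frac{\sum f (M - \bar{X})^2}{n - 1}$$
where M is the midpoint of each class, f is the frequency of that class, $\bar{X}$ is the mean computed from the grouped data, and $n = \sum f$ is the sample size.
The sample standard deviation is
$$s = \sqrt{s^2}$$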
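For grouped data, a common approach is to treat every observation in a class as if it sat at the class midpoint M, giving s² = Σ f (M - X̄)² / (n - 1) with n = Σ f. The sketch below assumes that definitional form, which may be written differently on the original slide, and uses hypothetical classes.

```python
import math


def grouped_sample_variance(classes):
    """Sample variance from (lower, upper, frequency) classes via midpoints."""
    n = sum(f for _, _, f in classes)
    midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
    freqs = [f for _, _, f in classes]

    # Grouped mean: frequency-weighted mean of the class midpoints.
    xbar = sum(f * m for f, m in zip(freqs, midpoints)) / n

    squared = sum(f * (m - xbar) ** 2 for f, m in zip(freqs, midpoints))
    return squared / (n - 1)


# Hypothetical grouped data, not taken from the slides.
classes = [(0, 10, 2), (10, 20, 5), (20, 30, 3)]

# Midpoints 5, 15, 25 and grouped mean 16 give
# (2*121 + 5*1 + 3*81) / 9 = 490 / 9.
s2 = grouped_sample_variance(classes)
s = math.sqrt(s2)
print(s2, s)
```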
31
Calculating Quartiles
Every data set has three quartiles, which divide it into four equal parts. If the horizontal line is thought of as a data set arranged in an ordered array, three quartiles can be identified, which together produce four separate parts, or subsets, of equal size in the data set.
Subset 1 | Q1 | Subset 2 | Q2 | Subset 3 | Q3 | Subset 4
32
First Quartile – Grouped data
The first quartile is the value below which, at most, 25% of the observations fall, and above which the remaining 75% can be found:
$$Q_1 = L + \frac{\dfrac{n}{4} - CF}{f} \cdot i$$
where L = lower limit of the class containing Q1, CF = cumulative frequency preceding the class containing Q1, f = frequency of the class containing Q1, i = size of the class containing Q1, and n = total number of observations.
33
Second & Third Quartile – Grouped data
The second quartile is right in the middle. It is the same as the median.
The third quartile is the value below which, at most, 75% of the observations fall, and above which the remaining 25% can be found:
$$Q_3 = L + \frac{\dfrac{3n}{4} - CF}{f} \cdot i$$
where L = lower limit of the class containing Q3, CF = cumulative frequency preceding the class containing Q3, f = frequency of the class containing Q3, i = size of the class containing Q3, and n = total number of observations.
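The grouped quartiles follow the same interpolation pattern as the grouped median, with the target position n/4, n/2, or 3n/4. The sketch below assumes the form Q = L + ((p*n - CF) / f) * i for p = 0.25, 0.5, 0.75, and uses hypothetical classes given in increasing order.

```python
def grouped_quantile(classes, p):
    """Estimate the p-th quantile from ordered (lower, upper, frequency) classes."""
    n = sum(f for _, _, f in classes)
    target = p * n  # n/4 for Q1, n/2 for Q2, 3n/4 for Q3
    cf = 0          # cumulative frequency before the current class
    for lower, upper, f in classes:
        # The quartile class is the first class whose cumulative
        # frequency reaches the target position.
        if f > 0 and cf + f >= target:
            return lower + (target - cf) * (upper - lower) / f
        cf += f
    raise ValueError("classes must contain at least one observation")


# Hypothetical grouped data, not taken from the slides.
classes = [(0, 10, 2), (10, 20, 5), (20, 30, 3)]

q1 = grouped_quantile(classes, 0.25)  # class 10-20, CF = 2
q2 = grouped_quantile(classes, 0.50)  # same value as the grouped median
q3 = grouped_quantile(classes, 0.75)  # class 20-30, CF = 7
print(q1, q2, q3)
```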
34
Inter-quartile Range
The interquartile range is the distance between the third quartile Q3 and the first quartile Q1.
Interquartile range = third quartile - first quartile = Q3 - Q1
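For ungrouped data, the interquartile range can be sketched with `statistics.quantiles`, which returns the three quartile cut points when `n=4`. The data below are hypothetical. Quartile conventions differ between textbooks and software, so values from another method may not match exactly; this example relies on Python's default "exclusive" method.

```python
import statistics

# Hypothetical ungrouped data, not taken from the slides.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# n=4 splits the ordered data into four parts and returns Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)

iqr = q3 - q1  # interquartile range
print(q1, q2, q3, iqr)
```

With this data and method, Q1 = 3, Q2 = 6, Q3 = 9, so the interquartile range is 6.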
35
Thank You!