Published by Sharyl Edwards. Modified over 8 years ago.
1
CHAPTER 3 Data Description
2
OUTLINE: Introduction. 3-1 Measures of Central Tendency. 3-2 Measures of Variation. 3-3 Measures of Position. 3-4 Exploratory Data Analysis.
3
3-1 MEASURES OF CENTRAL TENDENCY: Mean, Median, Mode, Midrange, Weighted Mean.
4
Statistic: a characteristic or measure obtained by using the data values from a sample. Parameter: a characteristic or measure obtained by using all the data values from a specific population.
5
THE MEAN: The mean is the sum of the values, divided by the total number of values. The symbol for the sample mean is $\bar{X}$: $\bar{X} = \frac{\sum X}{n}$, where $n$ is the number of values in the sample. The symbol for the population mean is $\mu$: $\mu = \frac{\sum X}{N}$, where $N$ is the number of values in the population.
6
EXAMPLE 3-1: The data represent the number of days off per year for a sample of individuals selected from nine different countries. Find the mean. 20, 26, 40, 36, 23, 42, 35, 24, 30
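As a quick check of Example 3-1, the sample mean can be sketched in a few lines of Python. The variable names are illustrative and not part of the slides.

```python
# Data from Example 3-1: days off per year for nine countries.
days_off = [20, 26, 40, 36, 23, 42, 35, 24, 30]

# Sample mean: the sum of the values divided by the number of values n.
n = len(days_off)
sample_mean = sum(days_off) / n

print(sum(days_off), n)       # 276 9
print(round(sample_mean, 1))  # 30.7
```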
7
PROPERTIES OF THE MEAN: Uses all data values. Varies less than the median or mode. Used in computing other statistics, such as the variance. Unique, and usually not one of the data values. Cannot be used with open-ended classes. Affected by extremely high or low values, called outliers.
8
THE MEDIAN: The median (MD) is the midpoint of the data array. The median is found by arranging the data in order and selecting the middle number. Odd number of values: the median is the data value in the middle. Even number of values: the median is the average of the two middle data values.
9
EXAMPLE 3-4: The numbers of rooms in the seven hotels in downtown Pittsburgh are 713, 300, 618, 595, 311, 401, and 292. Find the median.
10
EXAMPLE 3-6: The number of tornadoes that have occurred in the United States over an 8-year period follows. 684, 764, 656, 702, 856, 1133, 1132, 1303. Find the median.
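The odd and even cases of the median rule can be sketched as follows, using the data from Examples 3-4 and 3-6. The function name `median` is just an illustrative helper.

```python
def median(values):
    """Return the median (MD) of a list of numbers."""
    data = sorted(values)   # arrange the data in order
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: the middle data value.
        return data[mid]
    # Even number of values: average of the two middle data values.
    return (data[mid - 1] + data[mid]) / 2

hotel_rooms = [713, 300, 618, 595, 311, 401, 292]        # Example 3-4
tornadoes = [684, 764, 656, 702, 856, 1133, 1132, 1303]  # Example 3-6

print(median(hotel_rooms))  # 401
print(median(tornadoes))    # 810.0
```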
11
PROPERTIES OF THE MEDIAN: Gives the midpoint. Used when it is necessary to find out whether the data values fall into the upper half or lower half of the distribution. Can be used for an open-ended distribution. Affected less than the mean by extremely high or extremely low values.
12
The costs of four toys in a certain toy shop are $15, $40, $30, and $1350. Which measure of central tendency should be used? a) mean b) median c) mode d) range
13
THE MODE: The value that occurs most often in a data set is called the mode. Unimodal: a data set that has only one mode. Bimodal: a data set that has two modes. Multimodal: a data set that has more than two modes. No mode: when no data value occurs more than once.
14
EXAMPLE 3-9: Find the mode of the signing bonuses of eight NFL players for a specific year. The bonuses in millions of dollars are 18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10
15
EXAMPLE 3-10: Find the mode for the number of coal employees per county for 10 selected counties in southwestern Pennsylvania. 110, 731, 1031, 84, 20, 118, 1162, 1977, 103, 752
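A minimal sketch of finding the mode for Examples 3-9 and 3-10 follows. The helper `modes` is an illustrative name; it returns an empty list for the "no mode" case, when no value occurs more than once.

```python
from collections import Counter

def modes(values):
    """Return the list of modes; an empty list means the data set has no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # No data value occurs more than once.
        return []
    return [value for value, count in counts.items() if count == highest]

bonuses = [18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10]                     # Example 3-9
coal_employees = [110, 731, 1031, 84, 20, 118, 1162, 1977, 103, 752]     # Example 3-10

print(modes(bonuses))         # [10]
print(modes(coal_employees))  # []
```

The length of the returned list distinguishes the cases named on the slide: one entry is unimodal, two is bimodal, and more than two is multimodal.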
16
PROPERTIES OF THE MODE: Used when the most typical case is desired. Easiest average to compute. Can be used with nominal data (qualitative data). Not always unique, and may not exist.
17
* The ………. is the measure of central tendency that should be used when the data are qualitative. a) median b) mean c) mode d) range ----------------------------------------------------------------------- * When the data are categorical (red, blue, black, white), the measure of central tendency that can be used is: a) median b) mean c) mode d) range
18
If the numbers of books in a sample of 5 boxes are as follows: 11, 18, 2, 2, 7, 7, 2, 5, the set is said to be ………. a) Multimodal. b) Unimodal. c) Zero. d) Bimodal.
19
THE MIDRANGE: The midrange is defined as the sum of the lowest and highest values in the data set, divided by 2. The symbol for the midrange is MR: $MR = \frac{\text{lowest value} + \text{highest value}}{2}$.
20
EXAMPLE 3-15: In the last two winter seasons, the city of Brownsville, Minnesota, reported these numbers of water-line breaks per month. 2, 3, 6, 8, 4, 1. Find the midrange.
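The midrange for Example 3-15 takes one line to compute. This is a small sketch with illustrative variable names.

```python
# Data from Example 3-15: water-line breaks per month.
water_line_breaks = [2, 3, 6, 8, 4, 1]

# Midrange: (lowest value + highest value) / 2.
midrange = (min(water_line_breaks) + max(water_line_breaks)) / 2

print(midrange)  # 4.5
```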
21
PROPERTIES OF THE MIDRANGE: Easy to compute. Gives the midpoint. Affected by extremely high or low values in a data set.
22
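THE WEIGHTED MEAN: The weighted mean is used when the values in a data set are not equally represented. $\bar{X} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum wX}{\sum w}$, where $w_1, w_2, \ldots, w_n$ are the weights and $x_1, x_2, \ldots, x_n$ are the values.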
23
EXAMPLE 3-17: A student received the following grades. Find the corresponding GPA.
Course                       Credits   Grade
English Composition          3         A (4 points)
Introduction to Psychology   3         C (2 points)
Biology                      4         B (3 points)
Physical Education           2         D (1 point)
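For Example 3-17, the credits act as the weights and the grade points as the values. A minimal sketch, with illustrative variable names:

```python
# Example 3-17: credits are the weights, grade points are the values.
credits = [3, 3, 4, 2]        # English, Psychology, Biology, Physical Education
grade_points = [4, 2, 3, 1]   # A, C, B, D

# Weighted mean: sum of (weight * value) divided by the sum of the weights.
weighted_sum = sum(w * x for w, x in zip(credits, grade_points))
gpa = weighted_sum / sum(credits)

print(weighted_sum, sum(credits))  # 32 12
print(round(gpa, 1))               # 2.7
```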
24
SUMMARY OF MEASURES OF CENTRAL TENDENCY
Measure    Definition                                        Symbol
Mean       Sum of values, divided by total no. of values     $\bar{X}$, $\mu$
Median     Middle point in data array                        MD
Mode       Most frequent data value                          -
Midrange   Lowest value plus highest value, divided by 2     MR
25
[Figure: two distribution curves. Positively skewed: Mode, Median, Mean from left to right (Mean > MD > Mode). Negatively skewed: Mean, Median, Mode from left to right (Mean < MD < Mode).]
26
[Figure: symmetric distribution, where Mean = MD = Mode.]
27
3-2 MEASURES OF VARIATION: How can we measure variability? o Range o Variance o Standard Deviation o Coefficient of Variation
28
RANGE: The range is the difference between the highest and lowest values in a data set.
29
EXAMPLE 3-18/19: Two experimental brands of outdoor paint are tested to see how long each will last before fading. Six cans of each brand constitute a small population. The results (in months) are shown. Find the mean and range of each group.
Brand A   Brand B
10        35
60        45
50        30
30        35
40        40
20        25
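The mean and range for the two paint brands can be sketched as below. The Brand A values match the list repeated in Example 3-21; the Brand B values are an assumed reading of the flattened table on this slide, so treat them as illustrative.

```python
brand_a = [10, 60, 50, 30, 40, 20]  # months, as listed again in Example 3-21
brand_b = [35, 45, 30, 35, 40, 25]  # months, ASSUMED reading of the flattened table

def mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

def value_range(values):
    """Highest value minus lowest value."""
    return max(values) - min(values)

for name, data in (("Brand A", brand_a), ("Brand B", brand_b)):
    print(name, "mean =", mean(data), "range =", value_range(data))
# Brand A mean = 35.0 range = 50
# Brand B mean = 35.0 range = 20
```

Under this reading both brands have the same mean, but Brand A is far more spread out, which is why a measure of variation is needed in addition to the mean.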
30
VARIANCE & STANDARD DEVIATION: The variance is the average of the squares of the distance each value is from the mean. The standard deviation is the square root of the variance. The standard deviation is a measure of how spread out the data are.
31
The population variance is $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$. The population standard deviation is $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$. The sample variance is $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$. The sample standard deviation is $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$.
32
USES OF THE VARIANCE AND STANDARD DEVIATION: To determine the spread of the data. To determine the consistency of a variable. To determine the number of data values that fall within a specified interval in a distribution (Chebyshev's Theorem). Used in inferential statistics.
33
EXAMPLE 3-21: Find the variance and standard deviation for the data set for Brand A paint. 10, 60, 50, 30, 40, 20. Table columns: Months $X$; $X - \mu$; $(X - \mu)^2$. Values of $X$: 10, 60, 50, 30, 40, 20. $\sum (X - \mu)^2 = 1750$.
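The table for Example 3-21 can be reproduced step by step. This sketch treats the six cans as a population (as the paint example states), so it divides by N; the variable names are illustrative.

```python
import math

brand_a = [10, 60, 50, 30, 40, 20]  # months, Brand A paint

N = len(brand_a)
mu = sum(brand_a) / N                              # population mean
squared_deviations = [(x - mu) ** 2 for x in brand_a]
sum_sq = sum(squared_deviations)                   # sum of (X - mu)^2

population_variance = sum_sq / N                   # divide by N for a population
population_sd = math.sqrt(population_variance)

print(mu)                             # 35.0
print(sum_sq)                         # 1750.0
print(round(population_variance, 1))  # 291.7
print(round(population_sd, 1))        # 17.1
```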
34
MEASURES OF VARIATION: VARIANCE & STANDARD DEVIATION (SAMPLE COMPUTATIONAL MODEL): $s^2 = \frac{n \sum X^2 - (\sum X)^2}{n(n - 1)}$ and $s = \sqrt{s^2}$. This formula is mathematically equivalent to the theoretical formula. It saves time when calculating by hand. It does not use the mean. It is more accurate when the mean has been rounded.
35
EXAMPLE 3-23: Find the variance and standard deviation for the amount of European auto sales for a sample of 6 years. The data are in millions of dollars. 11.2, 11.9, 12.0, 12.8, 13.4, 14.3. Table columns: $X$; $X^2$. Values of $X$: 11.2, 11.9, 12.0, 12.8, 13.4, 14.3. $\sum X = 75.6$, $\sum X^2 = 958.94$.
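A sketch of the sample computational formula applied to Example 3-23 follows. It assumes the usual shortcut form, s² = (n ΣX² − (ΣX)²) / (n(n − 1)), which needs only the two column totals; the variable names are illustrative.

```python
import math

sales = [11.2, 11.9, 12.0, 12.8, 13.4, 14.3]  # millions of dollars, sample of 6 years

n = len(sales)
sum_x = sum(sales)                   # column total of X
sum_x2 = sum(x ** 2 for x in sales)  # column total of X^2

# Shortcut formula: uses the totals only, never the mean itself.
sample_variance = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))
sample_sd = math.sqrt(sample_variance)

print(round(sum_x, 1), round(sum_x2, 2))  # 75.6 958.94
print(round(sample_variance, 2))          # 1.28
print(round(sample_sd, 2))                # 1.13
```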
36
COEFFICIENT OF VARIATION: The coefficient of variation (CVar) is the standard deviation divided by the mean, expressed as a percentage: $CVar = \frac{s}{\bar{X}} \cdot 100\%$. Use CVar to compare standard deviations when the units are different.
37
EXAMPLE 3-25: The mean of the number of sales of cars over a 3-month period is 87, and the standard deviation is 5. The mean of the commissions is $5225, and the standard deviation is $773. Compare the variations of the two.
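The comparison in Example 3-25 can be sketched with a small helper; `cvar` is an illustrative name for the coefficient of variation as defined on the previous slide.

```python
def cvar(std_dev, mean):
    """Coefficient of variation: standard deviation over mean, as a percentage."""
    return std_dev / mean * 100

sales_cvar = cvar(5, 87)            # number of cars sold
commission_cvar = cvar(773, 5225)   # commissions in dollars

print(round(sales_cvar, 1))       # 5.7
print(round(commission_cvar, 1))  # 14.8

# The larger percentage marks the more variable quantity.
more_variable = "commissions" if commission_cvar > sales_cvar else "sales"
print(more_variable)              # commissions
```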
38
If the CVar for the English final exam was 6.9% and the CVar for the History final exam was 4.9%, compare the variations: a) The English class was more variable. b) The History class was more variable. c) Cannot be determined. d) Both classes have the same variation.
39
For any bell-shaped distribution: Approximately 68% of the data values will fall within one standard deviation of the mean. Approximately 95% of the data values will fall within two standard deviations of the mean. Approximately 99.7% of the data values will fall within three standard deviations of the mean.
41
$\bar{X}$ = 480, s = 90: approximately 68% of the data fall between 390 and 570. $\bar{X}$ = 480, s = 90: approximately 95% of the data fall between 300 and 660. $\bar{X}$ = 480, s = 90: approximately 99.7% of the data fall between 210 and 750.
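The three intervals above follow directly from mean ± k standard deviations. A minimal sketch, with `empirical_interval` as an illustrative helper name:

```python
def empirical_interval(mean, sd, k):
    """Interval within k standard deviations of the mean."""
    return (mean - k * sd, mean + k * sd)

# Percentages from the empirical rule for bell-shaped distributions.
for k, percent in ((1, 68), (2, 95), (3, 99.7)):
    low, high = empirical_interval(480, 90, k)
    print(f"about {percent}% of the data fall between {low} and {high}")
# about 68% of the data fall between 390 and 570
# about 95% of the data fall between 300 and 660
# about 99.7% of the data fall between 210 and 750
```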
42
When a distribution is bell-shaped, approximately what percentage of data values will fall within 1 standard deviation of the mean? a) 90% b) 68% c) 86% d) 99% ---------------------------------------------------------------------------------------- Math exam scores have a bell-shaped distribution with a mean of 100 and a standard deviation of 15. Use the empirical rule to find the percentage of students with scores between 70 and 130.
43
MEASURES OF POSITION: Standard score or z score. Quartile. Outlier.
45
z score or standard score (relative position). For a sample: $z = \frac{X - \bar{X}}{s}$. For a population: $z = \frac{X - \mu}{\sigma}$. If the z score is negative, the score is below the mean. If the z score is positive, the score is above the mean.
47
Test A: X = 38, $\bar{X}$ = 40, s = 5. Test B: X = 94, $\bar{X}$ = 100, s = 10.
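The two test scores can be compared by converting each to a z score. This sketch reads the slide as Test A: X = 38, mean 40, s = 5 and Test B: X = 94, mean 100, s = 10; `z_score` is an illustrative helper name.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_a = z_score(38, 40, 5)     # Test A
z_b = z_score(94, 100, 10)   # Test B

print(z_a)  # -0.4
print(z_b)  # -0.6

# Both scores are below their means; the higher z score has the better relative position.
better = "Test A" if z_a > z_b else "Test B"
print(better)  # Test A
```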
48
Quartiles
49
Quartiles divide the data set into 4 equal groups. Quartiles are denoted Q1, Q2, and Q3, with the corresponding percentiles being P25, P50, and P75. The median is the same as P50 or Q2. [Figure: number line from the smallest data value to the largest data value, with Q1, Q2, and Q3 marking the 25%, 50%, and 75% points.]
50
PROCEDURE TABLE: Finding Data Values Corresponding to Q1, Q2, and Q3.
Step 1: Arrange the data in order from lowest to highest.
Step 2: Find the median of the data values. This is the value for Q2.
Step 3: Find the median of the data values that fall below Q2. This is the value for Q1.
Step 4: Find the median of the data values that fall above Q2. This is the value for Q3.
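The four steps can be sketched as a short function. It is applied here to the hotel-room data from Example 3-4 purely as an illustration; the function name `quartiles` and the choice of data are not from this slide.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves procedure."""
    data = sorted(values)                 # Step 1: arrange the data in order
    n = len(data)
    half = n // 2
    q2 = median(data)                     # Step 2: median of all values
    lower = data[:half]                   # values that fall below Q2
    # For an odd count the middle value itself is excluded from both halves.
    upper = data[half + 1:] if n % 2 else data[half:]
    q1 = median(lower)                    # Step 3: median of the lower half
    q3 = median(upper)                    # Step 4: median of the upper half
    return q1, q2, q3

hotel_rooms = [713, 300, 618, 595, 311, 401, 292]
print(quartiles(hotel_rooms))  # (300, 401, 618)
```

Statistical software often uses interpolation-based quartile definitions, so library results may differ slightly from this hand procedure on small data sets.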
52
Outliers
53
An outlier is an extremely high or an extremely low data value when compared with the rest of the data values. Step 1: Arrange the data in order and find Q1 and Q3. Step 2: Find the interquartile range: IQR = Q3 - Q1. Step 3: Multiply the IQR by 1.5. Step 4: Subtract the value obtained in step 3 from Q1 and add the value to Q3. Step 5: Any value smaller than Q1 - 1.5(IQR) or larger than Q3 + 1.5(IQR) is an outlier.
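The outlier steps can be sketched as follows, using the eight-value data set from the exercise that follows. The function name `find_outliers` is illustrative, and the extra value 50 in the second call is hypothetical, added only to show a case where an outlier is flagged.

```python
from statistics import median

def find_outliers(values):
    """Return the values outside Q1 - 1.5(IQR) and Q3 + 1.5(IQR)."""
    data = sorted(values)                                   # Step 1: order the data
    n = len(data)
    half = n // 2
    q1 = median(data[:half])                                # median of the lower half
    q3 = median(data[half + 1:] if n % 2 else data[half:])  # median of the upper half
    iqr = q3 - q1                                           # Step 2: interquartile range
    step = 1.5 * iqr                                        # Step 3: multiply by 1.5
    low_fence = q1 - step                                   # Step 4: fences
    high_fence = q3 + step
    return [x for x in data if x < low_fence or x > high_fence]  # Step 5

exercise_data = [8, 4, 0, 2, 10, 15, 12, 22]
print(find_outliers(exercise_data))         # []
print(find_outliers(exercise_data + [50]))  # [50]  (50 is a hypothetical extra value)
```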
55
Use the data below to find the outliers: 8, 4, 0, 2, 10, 15, 12, 22 a) 15 b) 22 c) No outlier d) 0 ---------------------------------------------------------------------------------------- If Q1 = 4 and IQR = 40, find Q3: a) 10 b) 44 c) 40 d) -4
56
* All the values in a data set are between 60 and 88, except for one value of 97. That value, 97, is likely to be ………. a) An outlier b) The mean c) The box plot d) The range
57
EXPLORATORY DATA ANALYSIS (EDA)
58
A box plot can be used to graphically represent the data set. The box plot is built from the five-number summary: 1. The lowest value of the data set (min). 2. Q1. 3. The median (MD), which is Q2. 4. Q3. 5. The highest value of the data set (max).
59
PROCEDURE FOR CONSTRUCTING A BOX PLOT
1. Find the five-number summary.
2. Draw a horizontal axis with a scale such that it includes the maximum and minimum data values.
3. Draw a box whose vertical sides go through Q1 and Q3, and draw a vertical line through the median.
4. Draw a line from the minimum data value to the left side of the box and a line from the maximum data value to the right side of the box.
[Figure: box plot labeled minimum, Q1, Q2, Q3, maximum.]
60
Compare the distributions using box plots.
Cheese substitute: 290, 250, 180, 270, 340, 260, 130, 310
Real cheese: 40, 45, 420, 310, 90, 180, 240, 220
Step 1: Find Q1, MD, and Q3 for the real cheese data. 40, 45, 90, 180, 220, 240, 310, 420. Q1 = 67.5, MD = 200, Q3 = 275.
61
Step 2: Find Q1, MD, and Q3 for the cheese substitute data. 130, 180, 250, 260, 270, 290, 310, 340. Q1 = 215, MD = 265, Q3 = 300.
62
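[Figure: two box plots on a common axis from 0 to 500. Real cheese: min 40, Q1 67.5, MD 200, Q3 275, max 420. Cheese substitute: min 130, Q1 215, MD 265, Q3 300, max 340.]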
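The values plotted in the two box plots can be reproduced with a small five-number-summary helper. This is a sketch using the median-of-halves procedure; `five_number_summary` is an illustrative name.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, MD, Q3, max) using the median-of-halves procedure."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])
    # For an odd count the middle value is excluded from the upper half.
    q3 = median(data[half + 1:] if n % 2 else data[half:])
    return data[0], q1, median(data), q3, data[-1]

real_cheese = [40, 45, 90, 180, 220, 240, 310, 420]
cheese_substitute = [130, 180, 250, 260, 270, 290, 310, 340]

print(five_number_summary(real_cheese))        # (40, 67.5, 200.0, 275.0, 420)
print(five_number_summary(cheese_substitute))  # (130, 215.0, 265.0, 300.0, 340)
```

The real cheese data span a much wider range (40 to 420) than the cheese substitute data (130 to 340), which is what the side-by-side box plots make visible.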
63
[Figure: box plots of a left-skewed (negatively skewed) distribution and a right-skewed (positively skewed) distribution.]
64
INFORMATION OBTAINED FROM A BOX PLOT. From the box: If the median (MD) is near the center, the distribution is symmetric. If the median falls to the left of the center, the distribution is positively skewed (right skewed). If the median falls to the right of the center, the distribution is negatively skewed (left skewed). From the lines: If the lines are the same length, the distribution is symmetric. If the right line is longer than the left line, the distribution is positively skewed (right skewed). If the left line is longer than the right line, the distribution is negatively skewed (left skewed).
65
The following stem and leaf plot represents the scores for students on a statistics exam: 2 0 1 3 3 3 4 5 7 7 7 7 8 4 2 3 3 5 5 5 2 4 6 6 1 2 * Find the mode: a) 37 b) 23 c) 40 d) 62 * The range: a) 40 b) 42 c) 30 d) 33 * The mean: a) 41.19 b) 43 c) 42.5 d) 20 * The skewness of this graph: a) Symmetric b) Positively skewed c) Negatively skewed
66
The following stem and leaf plot represents the scores for students on a statistics exam:
2 | 0
3 | 1 4
4 | 0 1 2 2
5 | 2
* Find the mode: a) 42 b) 23 c) 40 d) 62 * The range: a) 40 b) 42 c) 30 d) 32 * The mean: a) 41.19 b) 43 c) 42.5 d) 37.75 * The skewness of this graph: a) Symmetric b) Positively skewed c) Negatively skewed
67
[Figure: box plot drawn on an axis with ticks at 0, 2, 4, 6, 8, 10.] * The IQR is …….. a) 4 b) 5 c) 10 d) 0 * The distribution shape is …….. a) Positively skewed b) Negatively skewed
68
* A characteristic or measure obtained by using the data values from a population is called a ………. a) Statistic b) Quartile c) Percent d) Parameter ---------------------------------------------------------------------------------------- * ……….. is the symbol for the mean of a sample. ---------------------------------------------------------------------------------------- * Find the mean of the following data: 10, 15, 12, 9, 2, 6. a) 12 b) 9 c) 54 d) 10.9 ---------------------------------------------------------------------------------------- * What is the median of the following data: 5, 7, 10, 3, 8?
69
* If the numbers of books in a sample of five boxes are as follows: 11, 8, 2, 2, 7, 7, 2, 5, then the set is said to be ………. a) Multimodal b) Unimodal c) Zero d) No mode ---------------------------------------------------------------------------------------- * Find the midrange (MR) for the following data: 7, -5, 2, 10, 15 ---------------------------------------------------------------------------------------- * If the mean of 5 values equals 64, then ………. ---------------------------------------------------------------------------------------- * The measures of central tendency for the following data 1, 3, 9, 11, 2 are: Mean = …………… Median = …………… Mode = ……………