Descriptive Statistics

Slides:



Advertisements
Similar presentations
Describing Quantitative Variables
Advertisements

Descriptive Statistics
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Lecture Slides Elementary Statistics Eleventh Edition and the Triola.
Slide Slide 1 Copyright © 2007 Pearson Education, Inc Publishing as Pearson Addison-Wesley. Lecture Slides Elementary Statistics Tenth Edition and the.
Descriptive Statistics
Descriptive Statistics
Measures of Variation Section 2.4 Statistics Mrs. Spitz Fall 2008.
Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chap 3-1 Business Statistics: A Decision-Making Approach 7 th Edition Chapter.
Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chap 3-1 Business Statistics: A Decision-Making Approach 7 th Edition Chapter.
Slide 1 Copyright © 2004 Pearson Education, Inc..
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 2-1 Statistics for Business and Economics 7 th Edition Chapter 2 Describing Data:
Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chap 3-1 Business Statistics: A Decision-Making Approach 7 th Edition Chapter.
Copyright (c) 2004 Brooks/Cole, a division of Thomson Learning, Inc. Chapter Two Treatment of Data.
B a c kn e x t h o m e Classification of Variables Discrete Numerical Variable A variable that produces a response that comes from a counting process.
Descriptive Statistics
12.3 – Measures of Dispersion
Measures of Central Tendency Section 2.3 Statistics Mrs. Spitz Fall 2008.
Descriptive Statistics
Descriptive Statistics  Summarizing, Simplifying  Useful for comprehending data, and thus making meaningful interpretations, particularly in medium to.
More Graphs and Displays
2.1: Frequency Distributions and Their Graphs. Is a table that shows classes or intervals of data entries with a count of the number of entries in each.
Frequency Distributions and Their Graphs
Descriptive Statistics
Descriptive Statistics  Summarizing, Simplifying  Useful for comprehending data, and thus making meaningful interpretations, particularly in medium to.
Section 2.4 Measures of Variation.
Copyright © 2004 Pearson Education, Inc.
Descriptive Statistics
Chapter 2 Summarizing and Graphing Data
Census A survey to collect data on the entire population.   Data The facts and figures collected, analyzed, and summarized for presentation and.
MM207-Statistics Unit 2 Seminar-Descriptive Statistics Dr Bridgette Stevens AIM:BStevensKaplan (add me to your Buddy list) 1.
Descriptive Statistics Measures of Variation. Essentials: Measures of Variation (Variation – a must for statistical analysis.) Know the types of measures.
Review Measures of central tendency
Copyright © 2004 Pearson Education, Inc.. Chapter 2 Descriptive Statistics Describe, Explore, and Compare Data 2-1 Overview 2-2 Frequency Distributions.
Statistics Chapter 9. Day 1 Unusual Episode MS133 Final Exam Scores
STAT 280: Elementary Applied Statistics Describing Data Using Numerical Measures.
Business Statistics: A Decision-Making Approach, 6e © 2005 Prentice-Hall, Inc. Chap 3-1 Business Statistics: A Decision-Making Approach 6 th Edition Chapter.
What is variability in data? Measuring how much the group as a whole deviates from the center. Gives you an indication of what is the spread of the data.
Chapter 2 Describing Data.
Lecture 3 Describing Data Using Numerical Measures.
Descriptive Statistics
1 CHAPTER 3 NUMERICAL DESCRIPTIVE MEASURES. 2 MEASURES OF CENTRAL TENDENCY FOR UNGROUPED DATA  In Chapter 2, we used tables and graphs to summarize a.
1 Elementary Statistics Larson Farber Descriptive Statistics Chapter 2.
Larson/Farber Ch 2 1 Elementary Statistics Larson Farber 2 Descriptive Statistics.
Chap 3-1 A Course In Business Statistics, 4th © 2006 Prentice-Hall, Inc. A Course In Business Statistics 4 th Edition Chapter 3 Describing Data Using Numerical.
Descriptive Statistics. Frequency Distributions and Their Graphs What you should learn: How to construct a frequency distribution including midpoints,
1 Descriptive Statistics 2-1 Overview 2-2 Summarizing Data with Frequency Tables 2-3 Pictures of Data 2-4 Measures of Center 2-5 Measures of Variation.
Section 2.1 Frequency Distributions and Their Graphs.
Symbol Description It would be a good idea now to start looking at the symbols which will be part of your study of statistics.  The uppercase Greek letter.
+ CHAPTER 2 Descriptive Statistics SECTION 2.1 FREQUENCY DISTRIBUTIONS.
Larson/Farber Ch 2 1 Elementary Statistics Larson Farber 2 Descriptive Statistics.
Sect.2.4 Measures of variation Objective: SWBAT find the range of a data set Find the variance and standard deviation of a population and of a sample How.
Statistics Unit Test Review Chapters 11 & /11-2 Mean(average): the sum of the data divided by the number of pieces of data Median: the value appearing.
Describing Data Week 1 The W’s (Where do the Numbers come from?) Who: Who was measured? By Whom: Who did the measuring What: What was measured? Where:
Chapter 4 Histograms Stem-and-Leaf Dot Plots Measures of Central Tendency Measures of Variation Measures of Position.
Slide 1 Copyright © 2004 Pearson Education, Inc.  Descriptive Statistics summarize or describe the important characteristics of a known set of population.
Descriptive Statistics Measures of Variation
Chapter 2 Descriptive Statistics.
Chapter 3 Describing Data Using Numerical Measures
7. Displaying and interpreting single data sets
Descriptive Statistics
NUMERICAL DESCRIPTIVE MEASURES
Chapter 3 Describing Data Using Numerical Measures
Numerical Descriptive Measures
Descriptive Statistics
Chapter 2: Descriptive Statistics
Section 2.4 Measures of Variation.
Descriptive Statistics
Two Data Sets Stock A Stock B
Chapter 2 Describing, Exploring, and Comparing Data
Presentation transcript:

Descriptive Statistics 2 Descriptive Statistics Elementary Statistics Larson Farber

Frequency Distributions and Their Graphs Section 2.1 Frequency Distributions and Their Graphs

Frequency Distributions Minutes Spent on the Phone 102 124 108 86 103 82 71 104 112 118 87 95 103 116 85 122 87 100 105 97 107 67 78 125 109 99 105 99 101 92 This data will be used for several examples. You might want to duplicate it for use with slides that follow. Make a frequency distribution table with five classes.

Frequency Distributions Classes - the intervals used in the distribution Class width - the range divided by the number of classes, round up to next number greatest # - smallest # ALWAYS ROUND UP # of classes Lower class limit - the smallest # that can be in the class Upper class limit - the greatest # that can be in the class Frequency - the number of items in the class This data will be used for several examples. You might want to duplicate it for use with slides that follow.

Frequency Distributions Midpoint - the sum of the limits divided by 2 lower class limit + upper class limit 2 Relative frequency - the portion (%) of data in that class class frequency (f) sample size (n) Cumulative frequency – the sum of the frequencies for that class and all previous classes This data will be used for several examples. You might want to duplicate it for use with slides that follow.

Construct a Frequency Distribution Minimum = 67, Maximum = 125 Number of classes = 5 Class width = 12 3 5 8 9 Class Limits Tally 78 90 102 114 126 67 79 91 103 115 Do all lower class limits first.

Other Information Class Midpoint Relative Frequency Cumulative 67 - 78 79 - 90 91 - 102 103 - 114 115 - 126 3 5 8 9 72.5 84.5 96.5 108.5 120.5 0.10 0.17 0.27 0.30 3 8 16 25 30 The first two columns reflect the work done in previous slides. Once the first midpoint is calculated, the others can be found by adding the class width to the previous midpoint. Notice the last entry in the cumulative frequency column is equal to the total frequency.

Frequency Histogram A bar graph that represents the frequency distribution of the data set horizontal scale uses class boundaries or midpoints vertical scale measures frequencies consecutive bars must touch The first two columns reflect the work done in previous slides. Once the first midpoint is calculated, the others can be found by adding the class width to the previous midpoint. Notice the last entry in the cumulative frequency column is equal to the total frequency. Class boundaries - numbers that separate classes without forming gaps between them

Frequency Histogram Boundaries Class 3 66.5 - 78.5 67 - 78 78.5 - 90.5 90.5 - 102.5 102.5 -114.5 114.5 -126.5 Class 67 - 78 79 - 90 91 - 102 103 -114 115 -126 3 5 8 9 Time on Phone 9 9 8 8 7 6 5 5 5 A histogram is a bar graph for which the bars touch. To form boundaries, find the distance between consecutive classes. Add half that distance to the lower limits and half to the upper limits. In this case the distance is 1 unit so add .5 to all upper limits and subtract .5 from all lower ones. The data must be quantitative. This histogram is labeled at the class boundaries. Explain that midpoints could have been labeled instead. 4 3 3 2 1 6 6 . 5 7 8 . 5 9 . 5 1 2 . 5 1 1 4 . 5 1 2 6 . 5 minutes

Relative Frequency Histogram A bar graph that represents the relative frequency distribution of the data set Same shape as frequency histogram horizontal scale uses class boundaries or midpoints vertical scale measures relative frequencies The first two columns reflect the work done in previous slides. Once the first midpoint is calculated, the others can be found by adding the class width to the previous midpoint. Notice the last entry in the cumulative frequency column is equal to the total frequency.

Relative Frequency Histogram Time on Phone Relative frequency A relative frequency histogram will have the same shape as a frequency histogram. minutes Relative frequency on vertical scale

Frequency Polygon A line graph that emphasizes the continuous change in frequencies horizontal scale uses class midpoints vertical scale measures frequencies The first two columns reflect the work done in previous slides. Once the first midpoint is calculated, the others can be found by adding the class width to the previous midpoint. Notice the last entry in the cumulative frequency column is equal to the total frequency.

Frequency Polygon Time on Phone minutes Class 3 67 - 78 79 - 90 5 91 - 102 103 -114 115 -126 3 5 8 9 Time on Phone 9 9 8 8 7 6 5 5 5 4 3 3 2 1 The frequency polygon is labeled at midpoints. Explain that midpoints could have been labeled instead. 72.5 84.5 96.5 108.5 120.5 minutes Mark the midpoint at the top of each bar. Connect consecutive midpoints. Extend the frequency polygon to the axis.

Ogive Also called a cumulative frequency graph A line graph that displays the cumulative frequency of each class horizontal scale uses upper boundaries vertical scale measures cumulative frequencies The first two columns reflect the work done in previous slides. Once the first midpoint is calculated, the others can be found by adding the class width to the previous midpoint. Notice the last entry in the cumulative frequency column is equal to the total frequency.

Ogive An ogive reports the number of values in the data set that are less than or equal to the given value, x. Minutes on Phone 3 8 16 25 30 66.5 78.5 90.5 102.5 114.5 126.5 10 20 30 Cumulative Frequency Label each boundary on the horizontal axis. Start with 0 for the lower boundary of the first class. Then mark points corresponding to cumulative frequencies at the upper boundaries. Connect with line segments. minutes

More Graphs and Displays Section 2.2 More Graphs and Displays

Stem-and-Leaf Plot -contains all original data -easy way to sort data & identify outliers Minutes Spent on the Phone 102 124 108 86 103 82 71 104 112 118 87 95 103 116 85 122 87 100 105 97 107 67 78 125 109 99 105 99 101 92 This data will be used for several examples. You might want to duplicate it for use with slides that follow. Minimum value = Maximum value = 67 Key values: 125

Stem-and-Leaf Plot Lowest value is 67 and highest value is 125, so list stems from 6 to 12. Never skip stems. You can have a stem with NO leaves. Stem Leaf Stem Leaf 6 | 7 | 8 | 9 | 10 | 11 | 12 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | Divide each data value into a stem and a leaf. The leaf is the rightmost significant digit. The stem consists of the digits to the left. The data shown represent the first line of the ‘minutes on phone’ data used earlier. The complete stem and leaf will be shown on the next slide.

Stem-and-Leaf Plot 6 | 7 7 | 1 8 8 | 2 5 6 7 7 9 | 2 5 7 9 9 6 | 7 7 | 1 8 8 | 2 5 6 7 7 9 | 2 5 7 9 9 10 | 0 1 2 3 3 4 5 5 7 8 9 11 | 2 6 8 12 | 2 4 5 Stress the importance of using a key to explain the plot. 6|7 could mean 6700 or .067 for a different problem. A stem and leaf should not be used with data when values are very different such as 3, 34,900, 24 etc. The stem-and leaf has the advantage over a histogram of retaining the original values. Key: 6 | 7 means 67

Stem-and-Leaf with two lines per stem Key: 6 | 7 means 67 6 | 7 7 | 1 7 | 8 8 | 2 8 | 5 6 7 7 9 | 2 9 | 5 7 9 9 10 | 0 1 2 3 3 4 10 | 5 5 7 8 9 11 | 2 11 | 6 8 12 | 2 4 12 | 5 1st line digits 0 1 2 3 4 2nd line digits 5 6 7 8 9 With two lines per stem the data is more finely “chopped”. Class width is 5 times the leaf unit. All stems except possibly the first and last must have two lines even if one is blank. For this data set, the first line for the stem 6 can be blank because there are no data values from 60 to 64. 1st line digits 0 1 2 3 4 2nd line digits 5 6 7 8 9

Dot Plot -contains all original data -easy way to sort data & identify outliers Minutes Spent on the Phone 66 76 86 96 106 116 126 minutes Dot plots also allow you to retain original values.

Pie Chart / Circle Graph Used to describe parts of a whole Central Angle for each segment NASA budget (billions of $) divided among 3 categories. Pie charts help visualize the relative proportion of each category. Find the relative frequency for each category and multiply it by 360 degrees to find the central angle. Billions of $ Human Space Flight 5.7 Technology 5.9 Mission Support 2.7 Construct a pie chart for the data.

Pie Chart Human Space Flight 5.7 143 Technology 5.9 149 Billions of $ Degrees Human Space Flight 5.7 143 Technology 5.9 149 Mission Support 2.7 68 14.3 360 Total Mission Support 19% Human Space Flight 40% NASA Budget (Billions of $) Technology 41%

Pareto Chart -The bars are in order of decreasing height -A vertical bar graph in which the height of the bar represents frequency or relative frequency -The bars are in order of decreasing height -See example on page 53

Scatter Plot Absences Grade - Used to show the relationship between two quantitative sets of data x 8 2 5 12 15 9 6 y 78 92 90 58 43 74 81 Final grade (y) 40 45 50 55 60 65 70 75 80 85 90 95 2 4 6 8 10 12 14 16 Absences (x)

Time Series Chart / Line Graph - Quantitative entries taken at regular intervals over a period of time - See example on page 55

Measures of Central Tendency Section 2.3 Measures of Central Tendency

Measures of Central Tendency Mean: The sum of all data values divided by the number of values For a population: For a sample: Median: The point at which an equal number of values fall above and fall below There are other measures of central tendency (ex. midrange), but these three are the most commonly used. Be sure to discuss the advantages and disadvantages for each. Discuss the sigma notation. Mode: The value with the highest frequency

Calculate the mean, the median, and the mode An instructor recorded the average number of absences for his students in one semester. For a random sample the data are: 2 4 2 0 40 2 4 3 6 Calculate the mean, the median, and the mode As students which measure is the most representative of the center of the data.

Calculate the mean, the median, and the mode An instructor recorded the average number of absences for his students in one semester. For a random sample the data are: 2 4 2 0 40 2 4 3 6 Calculate the mean, the median, and the mode Mean: Median: Sort data in order As students which measure is the most representative of the center of the data. 0 2 2 2 3 4 4 6 40 The middle value is 3, so the median is 3. Mode: The mode is 2 since it occurs the most times.

Calculate the mean, the median, and the mode. Suppose the student with 40 absences is dropped from the course. Calculate the mean, median and mode of the remaining values. Compare the effect of the change to each type of average. 2 4 2 0 2 4 3 6 Calculate the mean, the median, and the mode. Ask students which measure was changed the most by eliminating the extreme value (outlier). Mode: The mode is 2 since it occurs the most times.

Calculate the mean, the median, and the mode. Suppose the student with 40 absences is dropped from the course. Calculate the mean, median and mode of the remaining values. Compare the effect of the change to each type of average. 2 4 2 0 2 4 3 6 Calculate the mean, the median, and the mode. Mean: Median: Sort data in order. Ask students which measure was changed the most by eliminating the extreme value (outlier). 0 2 2 2 3 4 4 6 The middle values are 2 and 3, so the median is 2.5. Mode: The mode is 2 since it occurs the most times.

Shapes of Distributions Symmetric Uniform Mean = Median Skewed right positive Skewed left negative Both curves at the top are symmetric. Note that the fulcrum is placed at the mean of each distribution. The direction of skew is with the tail. The mean is always in the direction of the skew. Mean > Median Mean < Median

Weighted Mean A weighted mean is the mean of a data set whose entries have varying weights X = where w is the weight of each entry As students which measure is the most representative of the center of the data.

Weighted Mean A student receives the following grades, A worth 4 points, B worth 3 points, C worth 2 points and D worth 1 point. If the student has a B in 2 three-credit classes, A in 1 four-credit class, D in 1 two-credit class and C in 1 three-credit class, what is the student’s mean grade point average? As students which measure is the most representative of the center of the data.

Mean of Grouped Data The mean of a frequency distribution for a sample is approximated by X = where x are the midpoints, f are the frequencies and n is As students which measure is the most representative of the center of the data.

Mean of Grouped Data The heights of 16 students in a physical ed. class: Height Frequency 60-62 3 63-65 4 66-68 7 69-71 2 Approximate the mean of the grouped data As students which measure is the most representative of the center of the data.

Section 2.4 Measures of Variation

Two Data Sets Stock A Stock B Closing prices for two stocks were recorded on ten successive Fridays. Calculate the mean, median and mode for each. 56 33 56 42 57 48 58 52 61 57 63 67 67 77 67 82 67 90 Stock A Stock B Use this example to review the measures of central tendency. Both sets of data have the same mean, the same median and the same mode. Students clearly see that the data sets are vastly different though. A good lead in for measures of variation.

Two Data Sets Stock A Stock B Closing prices for two stocks were recorded on ten successive Fridays. Calculate the mean, median and mode for each. 56 33 56 42 57 48 58 52 61 57 63 67 67 77 67 82 67 90 Stock A Stock B Use this example to review the measures of central tendency. Both sets of data have the same mean, the same median and the same mode. Students clearly see that the data sets are vastly different though. A good lead in for measures of variation. Mean = 61.5 Median = 62 Mode = 67 Mean = 61.5 Median = 62 Mode = 67

Measures of Variation Range = Maximum value – Minimum value Range for A = 67 – 56 = $11 Range for B = 90 – 33 = $57 The range is easy to compute but only uses two numbers from a data set. Explain that if only one number changes, the range can be vastly changed. If a $56 stock dropped to $12 the range would change. Emphasize the need for different symbols for population parameters and sample statistics.

Measures of Variation To calculate measures of variation that use every value in the data set, you need to know about deviations. The deviation for each value x is the difference between the value of x and the mean of the data set. Explain that if only one number changes, the range can be vastly changed. If a $56 stock dropped to $12 the range would change. Emphasize the need for different symbols for population parameters and sample statistics. In a population, the deviation for each value x is: In a sample, the deviation for each value x is:

Deviations Stock A Deviation 56 57 58 61 63 67 – 5.5 – 4.5 – 3.5 – 0.5 1.5 5.5 56 – 61.5 56 – 61.5 57 – 61.5 58 – 61.5 Make sure students know that the sum of the deviations from the mean is always 0. The sum of the deviations is always zero.

Population Variance Population Variance: The sum of the squares of the deviations, divided by N. x 56 57 58 61 63 67 ( )2 – 5.5 – 4.5 – 3.5 – 0.5 1.5 5.5 30.25 20.25 12.25 0.25 2.25 188.50 Each deviation is squared to eliminate the negative sign. Students should know how to calculate these formulas for small data sets. For larger data sets they will use calculators or computer software programs. Sum of squares

Population Standard Deviation Population Standard Deviation: The square root of the population variance. The variance is expressed in “square units” which are meaningless. Using a standard deviation returns the data to its original units. The population standard deviation is $4.34.

Sample Variance and Standard Deviation To calculate a sample variance divide the sum of squares by n – 1. The sample standard deviation, s, is found by taking the square root of the sample variance. Samples only contain a portion of the population. This portion is likely to not contain extreme values. Dividing by n-1 gives a higher value than dividing by n and so increases the standard deviation.

Interpreting Standard Deviation Standard deviation is a measure of the typical amount an entry deviates (is away) from the mean. The more the entries are spread out, the greater the standard deviation. The closer the entries are together, the smaller the standard deviation. When all data values are equal, the standard deviation is 0. A summary of the formulas

Summary Range = Maximum value – Minimum value Population Variance Population Standard Deviation Sample Variance A summary of the formulas Sample Standard Deviation

Empirical Rule (68-95-99.7%) Data with symmetric bell-shaped distribution have the following characteristics. 13.5% 13.5% 2.35% 2.35% –4 –3 –2 –1 1 2 3 4 This type of distribution will be studied in much more detail in later chapters. About 68% of the data lies within 1 standard deviation of the mean About 95% of the data lies within 2 standard deviations of the mean About 99.7% of the data lies within 3 standard deviations of the mean

Using the Empirical Rule The mean value of homes on a certain street is $125,000 with a standard deviation of $5,000. The data set has a bell shaped distribution. Estimate the percent of homes between $120,000 and $135,000.

Using the Empirical Rule The mean value of homes on a certain street is $125,000 with a standard deviation of $5,000. The data set has a bell shaped distribution. Estimate the percent of homes between $120,000 and $135,000. 125 130 135 120 140 145 115 110 105 $120,000 is 1 standard deviation below the mean and $135,000 is 2 standard deviations above the mean. 68% + 13.5% = 81.5% So, 81.5% have a value between $120 and $135 thousand.

Chebychev’s Theorem For any distribution regardless of shape the portion of data lying within k standard deviations (k > 1) of the mean is at least 1 – 1/k2. For k = 2, at least 1 – 1/4 = 3/4 or 75% of the data lie within 2 standard deviation of the mean. At least 75% of the data is between -1.68 and 13.68. The empirical rule is valid for symmetric, mound -shaped distributions. Chebychev’s theorem applies to all distributions. Emphasize the phrase “at least” in the statement of the theorem. For k = 3, at least 1 – 1/9 = 8/9 = 88.9% of the data lie within 3 standard deviation of the mean. At least 89% of the data is between -5.52 and 17.52.

Chebychev’s Theorem The mean time in a women’s 400-meter dash is 52.4 seconds with a standard deviation of 2.2 sec. Apply Chebychev’s theorem for k = 2. Apply Chebychev’s theorem for k = 3. At least 1-19 = 8/9 or the data values will fall within 3 standard deviation of the mean no matter what the distribution. At least 8/9 of the times in this race will fall between 45.8 and 59 sec.

Chebychev’s Theorem The mean time in a women’s 400-meter dash is 52.4 seconds with a standard deviation of 2.2 sec. Apply Chebychev’s theorem for k = 2. Mark a number line in standard deviation units. 2 standard deviations A Apply Chebychev’s theorem for k = 3. At least 1-19 = 8/9 or the data values will fall within 3 standard deviation of the mean no matter what the distribution. At least 8/9 of the times in this race will fall between 45.8 and 59 sec. 45.8 48 50.2 52.4 54.6 56.8 59 At least 75% of the women’s 400-meter dash times will fall between 48 and 56.8 seconds.

Standard Deviation of Grouped Data Sample standard deviation = f is the frequency, n is total frequency, theorem for k = 3. At least 1-19 = 8/9 or the data values will fall within 3 standard deviation of the mean no matter what the distribution. At least 8/9 of the times in this race will fall between 45.8 and 59 sec. Apply Chebychev’s See example on pg 82

Estimates with Classes When a frequency distribution has classes, you can estimate the sample mean and standard deviation by using the midpoints of each class. Apply Chebychev’s theorem for k = 3. At least 1-19 = 8/9 or the data values will fall within 3 standard deviation of the mean no matter what the distribution. At least 8/9 of the times in this race will fall between 45.8 and 59 sec. x is the midpoint, f is the frequency, n is total frequency See example on pg 83

Section 2.5 Measures of Position

Quartiles Fractiles – numbers that divide an ordered data set into equal parts. Quartiles (Q1, Q2 and Q3 ) - divide the data set into 4 equal parts. Q2 is the same as the median. Q1 is the median of the data below Q2. Q3 is the median of the data above Q2.

Quartiles You are managing a store. The average sale for each of 27 randomly selected days in the last year is given. Find Q1, Q2, and Q3. 28 43 48 51 43 30 55 44 48 33 45 37 37 42 27 47 42 23 46 39 20 45 38 19 17 35 45

Finding Quartiles The data in ranked order (n = 27) are: 17 19 20 23 27 28 30 33 35 37 37 38 39 42 42 43 43 44 45 45 45 46 47 48 48 51 55. The median = Q2 = 42. There are 13 values above/below the median. Q1 is 30. Q3 is 45. Calculators and computers might use a slightly different method. Values may differ, but only in small amounts.

Interquartile Range (IQR) Interquartile Range – the difference between the third and first quartiles IQR = Q3 – Q1 The Interquartile Range is Q3 – Q1 = 45 – 30 = 15 Any data value that is more than 1.5 IQRs to the left of Q1 or to the right of Q3 is an outlier Calculators and computers might use a slightly different method. Values may differ, but only in small amounts.

Box and Whisker Plot A box and whisker plot uses 5 key values to describe a set of data. Q1, Q2 and Q3, the minimum value and the maximum value. Q1 Q2 = the median Q3 Minimum value Maximum value 30 42 45 17 55 42 45 30 17 The Interquartile range is the difference between the third quartile and the first. It gives the range of the middle 50% of the data. 55 15 25 35 45 55 Interquartile Range = 45 – 30 = 15

Percentiles Percentiles divide the data into 100 parts. There are 99 percentiles: P1, P2, P3…P99. P50 = Q2 = the median P25 = Q1 P75 = Q3 There are several different ways to calculate percentiles, so the example is only meant to serve as an approximation. Later in the course students will use cumulative areas to calculate probabilities. Percentiles are a good tie in. A 63rd percentile score indicates that score is greater than or equal to 63% of the scores and less than or equal to 37% of the scores.

Percentiles Cumulative distributions can be used to find percentiles. There are several different ways to calculate percentiles, so the example is only meant to serve as an approximation. Later in the course students will use cumulative areas to calculate probabilities. Percentiles are a good tie in. 114.5 falls on or above 25 of the 30 values. 25/30 = 83.33. So you can approximate 114 = P83.

Standard Scores Standard score or z-score - represents the number of standard deviations that a data value, x, falls from the mean. This person scores 1.29 standard deviations above the mean. this person scores .57 standard deviations below the mean. The score for this person is equal to the mean.

Standard Scores The test scores for a civil service exam have a mean of 152 and standard deviation of 7. Find the standard z-score for a person with a score of: (a) 161 (b) 148 (c) 152 This person scores 1.29 standard deviations above the mean. this person scores .57 standard deviations below the mean. The score for this person is equal to the mean.

Calculations of z-Scores A value of x = 161 is 1.29 standard deviations above the mean. (b) A value of x = 148 is 0.57 standard deviations below the mean. (c) A value of x = 152 is equal to the mean.

Standard Scores When a distribution is approximately bell shaped, about 95% of the data lie within 2 standard deviations of the mean. When this is transformed to z-scores, about 95% of the z-scores should fall between -2 and 2. A z-score outside of this range is considered unusual and a z-score less than -3 or greater than 3 would be very unusual. This person scores 1.29 standard deviations above the mean. this person scores .57 standard deviations below the mean. The score for this person is equal to the mean.