Numerical Measures.


Numerical measures fall into four groups: measures of central tendency (location), measures of non-central location, measures of variability (dispersion, spread), and measures of shape.

Measures of Central Tendency (Location): mean, median, mode.

Measures of Non-central Location: quartiles, mid-hinges, percentiles.

Measures of Variability (Dispersion, Spread): variance, standard deviation, range, inter-quartile range.

Measures of Shape: skewness, kurtosis.

Summation Notation

Summation Notation: Let $x_1, x_2, x_3, \ldots, x_n$ denote a set of n numbers. Then the symbol $\sum_{i=1}^{n} x_i$ denotes the sum of these n numbers, $x_1 + x_2 + x_3 + \cdots + x_n$.

Example: Let $x_1, x_2, x_3, x_4, x_5$ denote the set of 5 numbers in the following table.

i      1    2    3    4    5
x_i    10   15   21   7    13

Then the symbol $\sum_{i=1}^{5} x_i$ denotes the sum of these 5 numbers: $x_1 + x_2 + x_3 + x_4 + x_5 = 10 + 15 + 21 + 7 + 13 = 66$.

Meaning of the parts of the summation notation: in $\sum_{i=1}^{n} x_i$, the value below the $\sum$ ($i = 1$) is the starting value for i, the value above it ($n$) is the final value for i, and the expression to the right ($x_i$) is the quantity that changes in each term of the sum.

Example: Again let $x_1, x_2, x_3, x_4, x_5$ denote the set of 5 numbers in the table above (10, 15, 21, 7, 13).

Then the symbol $\sum_{i=2}^{4} x_i^3$ denotes the sum of these 3 numbers: $x_2^3 + x_3^3 + x_4^3 = 15^3 + 21^3 + 7^3 = 3375 + 9261 + 343 = 12979$.
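Both sums can be reproduced in a few lines of Python. This is a minimal sketch; note that Python lists are indexed from 0, so $x_i$ is stored at position i - 1, and `range(a, b)` stops before b.

```python
# Example data: x_1, ..., x_5 from the table
x = [10, 15, 21, 7, 13]

# sum_{i=1}^{5} x_i; Python lists are 0-based, so x_i is x[i - 1]
total = sum(x[i - 1] for i in range(1, 6))

# sum_{i=2}^{4} x_i^3: starting value i = 2, final value i = 4 (range excludes 5)
sum_of_cubes = sum(x[i - 1] ** 3 for i in range(2, 5))

print(total)         # 66
print(sum_of_cubes)  # 12979
```

Changing the two arguments of `range` plays the role of changing the starting and final values for i in the notation.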

Measures of Central Location (Mean)

Mean: Let $x_1, x_2, x_3, \ldots, x_n$ denote a set of n numbers. Then the mean of the n numbers is defined as:

$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}$

Example: Again let $x_1, x_2, x_3, x_4, x_5$ denote the set of 5 numbers in the following table.

i      1    2    3    4    5
x_i    10   15   21   7    13

Then the mean of the 5 numbers is:

$\bar{x} = \frac{10 + 15 + 21 + 7 + 13}{5} = \frac{66}{5} = 13.2$

Interpretation of the Mean: Let $x_1, x_2, x_3, \ldots, x_n$ denote a set of n numbers. Then the mean, $\bar{x}$, is the centre of gravity of the n numbers. That is, if we drew a horizontal line and placed a weight of one at each value $x_i$, the balancing point of that system of masses would be at the point $\bar{x}$.

[Figure: unit weights at x1, x2, ..., xn on a horizontal line, balancing at $\bar{x}$]

[Figure: for the example, unit weights at 7, 10, 13, 15 and 21, balancing at $\bar{x}$ = 13.2]

The mean, $\bar{x}$, is also approximately the center of gravity of a histogram of the data.
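The balancing-point interpretation can be checked numerically: the deviations $x_i - \bar{x}$ on either side of the mean cancel out. A minimal Python sketch using the example data (variable names are illustrative):

```python
import math

x = [10, 15, 21, 7, 13]
n = len(x)

# Mean: (1/n) * sum of x_i
x_bar = sum(x) / n
print(x_bar)  # 13.2

# Centre of gravity: the deviations x_i - x_bar balance out (sum to zero),
# up to floating-point rounding
deviations = [xi - x_bar for xi in x]
print(math.isclose(sum(deviations), 0.0, abs_tol=1e-9))  # True
```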

Measures of Central Location (Median)

The Median: Let $x_1, x_2, x_3, \ldots, x_n$ denote a set of n numbers. The median of the n numbers is the number that splits them into two equal parts. To evaluate the median, we arrange the numbers in increasing order.

If the number of observations is odd, there is exactly one observation in the middle; this number is the median. If the number of observations is even, there are two middle observations, and the median is the average of these two observations.

Example: Again let $x_1, x_2, x_3, x_4, x_5$ denote the set of 5 numbers in the following table.

i      1    2    3    4    5
x_i    10   15   21   7    13

Arranged in increasing order, the numbers are: 7 10 13 15 21. There is a unique “middle” observation, 13, and this is the median.

Example 2: Let $x_1, x_2, x_3, x_4, x_5, x_6$ denote the 6 numbers 23 41 12 19 64 8. Arranged in increasing order, these observations are: 8 12 19 23 41 64. There are two “middle” observations, 19 and 23.

Median = average of the two “middle” observations = (19 + 23)/2 = 21.
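The odd/even rule translates directly into code. A minimal Python sketch (`sample_median` is a hypothetical helper name); the standard library's `statistics.median` follows the same rule and is used here only as a cross-check:

```python
import statistics

def sample_median(values):
    """Median: sort, then take the single middle observation (n odd)
    or the average of the two middle observations (n even)."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2

odd = [10, 15, 21, 7, 13]
even = [23, 41, 12, 19, 64, 8]
print(sample_median(odd))   # 13
print(sample_median(even))  # 21.0

# The standard library applies the same rule
assert sample_median(odd) == statistics.median(odd)
assert sample_median(even) == statistics.median(even)
```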

Example: The data on N = 23 students. Variables: Verbal IQ, Math IQ, Initial Reading Achievement Score, Final Reading Achievement Score.

Data Set #3: The following table gives data on Verbal IQ, Math IQ, Initial Reading Achievement Score (Initial RA), and Final Reading Achievement Score (Final RA) for 23 students who have recently completed a reading improvement program.

Student  Verbal IQ  Math IQ  Initial RA  Final RA
1        86         94       1.1         1.7
2        104        103      1.5         1.7
3        86         92       1.5         1.9
4        105        100      2.0         2.0
5        118        115      1.9         3.5
6        96         102      1.4         2.4
7        90         87       1.5         1.8
8        95         100      1.4         2.0
9        105        96       1.7         1.7
10       84         80       1.6         1.7
11       94         87       1.6         1.7
12       119        116      1.7         3.1
13       82         91       1.2         1.8
14       80         93       1.0         1.7
15       109        124      1.8         2.5
16       111        119      1.4         3.0
17       89         94       1.6         1.8
18       99         117      1.6         2.6
19       94         93       1.4         1.4
20       99         110      1.4         2.0
21       95         97       1.5         1.3
22       102        104      1.7         3.1
23       102        93       1.6         1.9

Computing the Median: [Figure: stem-and-leaf diagrams of the four variables] With n = 23 observations, the median is the middle observation, i.e. the 12th observation in order of size.
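As a check, the medians of all four columns can be computed from the table. A minimal sketch using Python's `statistics.median`; the column lists are transcribed from the table in student order, and the variable names are illustrative:

```python
from statistics import median

# Columns of the 23-student table, students 1..23 in order
verbal_iq = [86, 104, 86, 105, 118, 96, 90, 95, 105, 84, 94, 119,
             82, 80, 109, 111, 89, 99, 94, 99, 95, 102, 102]
math_iq = [94, 103, 92, 100, 115, 102, 87, 100, 96, 80, 87, 116,
           91, 93, 124, 119, 94, 117, 93, 110, 97, 104, 93]
initial_ra = [1.1, 1.5, 1.5, 2.0, 1.9, 1.4, 1.5, 1.4, 1.7, 1.6, 1.6, 1.7,
              1.2, 1.0, 1.8, 1.4, 1.6, 1.6, 1.4, 1.4, 1.5, 1.7, 1.6]
final_ra = [1.7, 1.7, 1.9, 2.0, 3.5, 2.4, 1.8, 2.0, 1.7, 1.7, 1.7, 3.1,
            1.8, 1.7, 2.5, 3.0, 1.8, 2.6, 1.4, 2.0, 1.3, 3.1, 1.9]

for name, column in [("Verbal IQ", verbal_iq), ("Math IQ", math_iq),
                     ("Initial RA", initial_ra), ("Final RA", final_ra)]:
    # n = 23 is odd, so the median is the 12th observation in order of size
    print(name, median(column))
```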

Summary

Some Comments: The mean is the centre of gravity of a set of observations, i.e. the balancing point. The median splits the observations into two equal parts of approximately 50% each.

The median splits the area under a histogram into two parts of 50% each. The mean is the balancing point of a histogram. [Figure: histogram with the median dividing the area into two halves of 50%]

For symmetric distributions the mean and the median will be approximately the same value. [Figure: symmetric histogram with the mean and median at the same point]

For positively skewed distributions the mean exceeds the median. For negatively skewed distributions the median exceeds the mean. [Figure: positively and negatively skewed histograms, with the median splitting the area into two halves of 50%]

An outlier is a “wild” observation in the data. Outliers occur because of errors (typographical and computational) or because of extreme cases in the population.

The mean is altered to a significant degree by the presence of outliers, whereas outliers have little effect on the value of the median. This is a reason for using the median in place of the mean as a measure of central location. On the other hand, the mean is the best measure of central location when the data are Normally distributed (bell-shaped).
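The effect of a single outlier can be seen by altering one value of the earlier 5-number example. A minimal sketch, assuming a hypothetical typographical error in which 21 is recorded as 210:

```python
import statistics

data = [7, 10, 13, 15, 21]
# Hypothetical typographical error: 21 recorded as 210
with_outlier = [7, 10, 13, 15, 210]

for label, values in [("original", data), ("with outlier", with_outlier)]:
    mean = sum(values) / len(values)
    median = statistics.median(values)
    print(label, mean, median)
# original 13.2 13
# with outlier 51.0 13
```

The mean jumps from 13.2 to 51.0 while the median stays at 13. The altered data also have a long right tail, and, as with positively skewed distributions, the mean now exceeds the median.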

Review

Summarizing Data: Graphical Methods: histogram, grouped frequency table, stem-and-leaf diagram.

Numerical Measures: measures of central tendency (location), measures of non-central location, measures of variability (dispersion, spread), and measures of shape. The objective is to reduce the data to a small number of values that summarize the data and describe certain aspects of it.


Measures of Non-central Location: percentiles, quartiles (hinges, mid-hinges).

Definition: The P×100th percentile is a point, $x_P$, underneath a distribution such that a fixed proportion P of the population (or sample) lies below that value. [Figure: distribution with area P×100% to the left of $x_P$]

Definition (Quartiles): The first quartile, Q1, is the 25th percentile, $x_{0.25}$.

The second quartile, Q2, is the 50th percentile, $x_{0.50}$; it is also the median.

The third quartile, Q3, is the 75th percentile, $x_{0.75}$.

The quartiles Q1, Q2, Q3 divide the population into 4 equal parts of 25% each.

Computing Percentiles and Quartiles: There are several methods used to compute percentiles and quartiles, and different computer packages use different methods. For small samples these methods sometimes agree (but not always); for large samples they agree to within a certain level of accuracy.
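Python's standard library illustrates the point: `statistics.quantiles` (Python 3.8+) offers two conventions, and they give different quartiles for the same small sample. A minimal sketch on the 5-number example; the "exclusive" convention uses the position P(n + 1), the same position formula as Method 1 that follows:

```python
from statistics import quantiles  # Python 3.8+

data = [10, 15, 21, 7, 13]

# "exclusive" (the default) puts the P x 100th percentile at position P(n + 1)
exclusive = quantiles(data, n=4, method="exclusive")
# "inclusive" puts it at position 1 + P(n - 1)
inclusive = quantiles(data, n=4, method="inclusive")

print(exclusive)  # [8.5, 13.0, 18.0]
print(inclusive)  # [10.0, 13.0, 15.0]
```

Both conventions agree on the median here but not on Q1 and Q3, which is why it matters to know which method a package uses.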

Computing Percentiles and Quartiles, Method 1: The first step is to arrange the observations in increasing order. We then compute the position, k, of the P×100th percentile:

$k = P(n+1)$

where n is the number of observations.

Example: The data on n = 23 students (Verbal IQ, Math IQ, Initial Reading Achievement Score, Final Reading Achievement Score). We want to compute the 75th percentile and the 90th percentile.

The position of the 75th percentile is $k = 0.75 \times (23+1) = 18$. The position of the 90th percentile is $k = 0.90 \times (23+1) = 21.6$. When the position k is an integer, the percentile is the kth observation (in order of magnitude) in the data set. For example, the 75th percentile is the 18th observation in size.

When the position k is not an integer but an integer m plus a fraction f, i.e. $k = m + f$, the percentile is

$x_P = (1-f)\,x_{(m)} + f\,x_{(m+1)}$

where $x_{(m)}$ denotes the mth observation in size. In the example, the position of the 90th percentile is k = 21.6, so m = 21, f = 0.6, and

$x_{0.90} = 0.4\,x_{(21)} + 0.6\,x_{(22)}$

Thus $x_P$ lies 100f% of the way through the interval between the mth observation and the (m+1)st observation. [Figure: $x_P$ located between the mth and (m+1)st ordered observations]

Example: The Verbal IQ data on n = 23 students, arranged in increasing order, are:

80 82 84 86 86 89 90 94 94 95 95 96 99 99 102 102 104 105 105 109 111 118 119

$x_{0.75}$ = 75th percentile = 18th observation in size = 105 (position k = 18)

$x_{0.90}$ = 90th percentile = 0.4(21st observation in size) + 0.6(22nd observation in size) = 0.4(111) + 0.6(118) = 115.2 (position k = 21.6)
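Method 1 can be sketched as a short Python function (`percentile_method1` is a hypothetical name). The slides do not say what to do when k falls below 1 or above n; this sketch assumes the percentile is then clamped to the smallest or largest observation.

```python
import math

def percentile_method1(data, p):
    """P x 100th percentile by Method 1: position k = P(n + 1) in the
    ordered data, interpolating linearly when k = m + f is fractional."""
    xs = sorted(data)
    n = len(xs)
    k = p * (n + 1)
    # Assumption: positions outside [1, n] are clamped to the min / max
    if k <= 1:
        return float(xs[0])
    if k >= n:
        return float(xs[-1])
    m = math.floor(k)   # integer part of the position
    f = k - m           # fractional part of the position
    # xs is 0-based: the m-th observation in size is xs[m - 1]
    return (1 - f) * xs[m - 1] + f * xs[m]

verbal_iq = [80, 82, 84, 86, 86, 89, 90, 94, 94, 95, 95, 96,
             99, 99, 102, 102, 104, 105, 105, 109, 111, 118, 119]
print(percentile_method1(verbal_iq, 0.75))            # 105.0
print(round(percentile_method1(verbal_iq, 0.90), 1))  # 115.2
```

Because 0.90 × 24 is not exactly representable in floating point, the 90th percentile is rounded before printing.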

An Alternative Method for Computing Quartiles, Method 2: This method sometimes gives the same values for the quartiles as Method 1 and sometimes gives different values. For large samples the two methods give approximately the same answer.

Let $x_1, x_2, x_3, \ldots, x_n$ denote a set of n numbers. The first step in Method 2 is to arrange the numbers in increasing order. From the arranged numbers we compute the median, which is also called the hinge.

Example: Consider the 5 numbers 10 15 21 7 13. Arranged in increasing order: 7 10 13 15 21. The median (or hinge), 13, splits the observations in half.

The lower mid-hinge (the first quartile) is the “median” of the lower half of the observations (excluding the median). The upper mid-hinge (the third quartile) is the “median” of the upper half of the observations (excluding the median).

Consider the five numbers in increasing order: 7 10 13 15 21.
Lower half: 7 10; upper half: 15 21.
Median (hinge) = 13
Lower mid-hinge (first quartile) = (7 + 10)/2 = 8.5
Upper mid-hinge (third quartile) = (15 + 21)/2 = 18

Computing the median and the quartiles using the first method:
Position of the median: k = 0.5(5+1) = 3, so Q2 = 13
Position of the first quartile: k = 0.25(5+1) = 1.5, so Q1 = 0.5(7) + 0.5(10) = 8.5
Position of the third quartile: k = 0.75(5+1) = 4.5, so Q3 = 0.5(15) + 0.5(21) = 18

Here both methods give the same values. This is not always true.

Example: The Verbal IQ data on n = 23 students, arranged in increasing order, are:
80 82 84 86 86 89 90 94 94 95 95 96 99 99 102 102 104 105 105 109 111 118 119
Method 2: median (hinge) = 96, lower mid-hinge (first quartile) = 89, upper mid-hinge (third quartile) = 105.

Computing the median and the quartiles using the first method:
Position of the median: k = 0.5(23+1) = 12, so Q2 = 96
Position of the first quartile: k = 0.25(23+1) = 6, so Q1 = 89
Position of the third quartile: k = 0.75(23+1) = 18, so Q3 = 105
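The two methods can be compared side by side. A minimal Python sketch with hypothetical helper names: Method 2 is coded from the hinge rule, and Method 1 is delegated to `statistics.quantiles(..., method="exclusive")`, which uses the same position k = P(n + 1). The slides only describe Method 2 for odd n; for even n this sketch assumes the halves are simply the first and last n/2 observations.

```python
from statistics import median, quantiles

def quartiles_method2(data):
    """Quartiles by Method 2 (hinges): Q2 is the median; Q1 and Q3 are the
    medians of the lower and upper halves, excluding the median itself
    when n is odd. Assumption: for even n the halves are the first and
    last n/2 observations. Needs at least 2 observations."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 == 1 else xs[half:]
    return median(lower), median(xs), median(upper)

def quartiles_method1(data):
    # method="exclusive" places the P x 100th percentile at k = P(n + 1)
    return tuple(quantiles(data, n=4, method="exclusive"))

verbal_iq = [80, 82, 84, 86, 86, 89, 90, 94, 94, 95, 95, 96,
             99, 99, 102, 102, 104, 105, 105, 109, 111, 118, 119]

for sample in ([10, 15, 21, 7, 13], verbal_iq, [23, 41, 12, 19, 64, 8]):
    print(quartiles_method2(sample), quartiles_method1(sample))
```

For the first two samples both methods agree, as in the worked examples. For the 6-number sample 8 12 19 23 41 64 they differ: Method 2 gives Q1 = 12 and Q3 = 41, while Method 1 gives Q1 = 11 and Q3 = 46.75.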

Many programs compute percentiles, quartiles etc. Each may use different methods. It is important to know which method is being used. The different methods result in answers that are close when the sample size is large.

Box Plots (Box-Whisker Plots): A graphical method of displaying data, and an alternative to the histogram and the stem-leaf diagram.

To Draw a Box Plot: Compute the hinge (median, Q2) and the mid-hinges (first and third quartiles, Q1 and Q3). We also compute the largest and smallest of the observations, the max and the min. Together these give the five-number summary: min, Q1, Q2, Q3, max.

Example: The Verbal IQ data on n = 23 students, arranged in increasing order, are:
80 82 84 86 86 89 90 94 94 95 95 96 99 99 102 102 104 105 105 109 111 118 119
min = 80, Q1 = 89, Q2 = 96, Q3 = 105, max = 119
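The five-number summary is straightforward to compute. A minimal sketch (`five_number_summary` is a hypothetical name) that takes the quartiles from `statistics.quantiles` with `method="exclusive"`, i.e. the Method 1 positions; another quartile method could give slightly different values for other data sets.

```python
from statistics import quantiles  # Python 3.8+

def five_number_summary(data):
    """Return (min, Q1, Q2, Q3, max); quartiles use position k = P(n + 1)."""
    xs = sorted(data)
    q1, q2, q3 = quantiles(xs, n=4, method="exclusive")
    return xs[0], q1, q2, q3, xs[-1]

verbal_iq = [80, 82, 84, 86, 86, 89, 90, 94, 94, 95, 95, 96,
             99, 99, 102, 102, 104, 105, 105, 109, 111, 118, 119]
print(five_number_summary(verbal_iq))  # (80, 89.0, 96.0, 105.0, 119)
```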

The box plot is then drawn by: drawing a “box” above an axis from Q1 to Q3; drawing a vertical line in the box at the median, Q2; and drawing whiskers from the lower and upper ends of the box, going down to the min and up to the max.

[Figure: anatomy of a box plot: lower whisker from min to Q1, box from Q1 to Q3 with a line at Q2, upper whisker from Q3 to max]

[Figure: Box plot of Verbal IQ, on an axis from 70 to 130]

A box plot can also be drawn vertically. [Figure: vertical box plot of Verbal IQ]

[Figure: Box-whisker plots of Verbal IQ and Math IQ]

[Figure: Box-whisker plots of Initial RA and Final RA]

Summary: Information contained in the box plot. The lower whisker, the two parts of the box on either side of the median, and the upper whisker each cover about 25% of the observations, so the box contains the middle 50% of the population.

Advanced Box Plots

An outlier is a “wild” observation in the data. Outliers occur because of errors (typographical and computational) or because of extreme cases in the population. We will now consider drawing box plots in which outliers are identified.

To draw such a box plot we need to compute the hinge (median, Q2) and the mid-hinges (first and third quartiles, Q1 and Q3). The difference Q3 - Q1 is called the inter-quartile range (denoted by IQR). To identify outliers we then compute the inner and outer fences.

The fences are like the fences at a prison. We expect the entire population to be within both sets of fences. If a member of the population is between the inner and outer fences, it is a mild outlier. If a member of the population is outside the outer fences, it is an extreme outlier.

Inner fences

Lower inner fence: $f_1 = Q_1 - 1.5\,\mathrm{IQR}$
Upper inner fence: $f_2 = Q_3 + 1.5\,\mathrm{IQR}$

Outer fences

Lower outer fence: $F_1 = Q_1 - 3\,\mathrm{IQR}$
Upper outer fence: $F_2 = Q_3 + 3\,\mathrm{IQR}$

Observations that are between the lower and upper inner fences are considered to be non-outliers. Observations that are outside the inner fences but not outside the outer fences are considered to be mild outliers. Observations that are outside outer fences are considered to be extreme outliers.

Mild outliers are plotted individually in a box plot using one symbol, and extreme outliers are plotted individually using a second, distinct symbol. Non-outliers are represented with the box and whiskers, with Max = the largest observation within the inner fences and Min = the smallest observation within the inner fences.

[Figure: box-whisker plot with inner and outer fences marked; the box and whiskers represent the data that are not outliers, and mild and extreme outliers are plotted individually]
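The fence rules can be sketched in Python as follows (`box_plot_outliers` is a hypothetical name). Assumptions, since the slides leave them open: quartiles are taken by Method 1 positions via `statistics.quantiles(..., method="exclusive")`, and an observation lying exactly on a fence counts as inside it.

```python
from statistics import quantiles  # Python 3.8+

def box_plot_outliers(data):
    """Classify observations with inner fences (1.5 IQR) and outer
    fences (3 IQR); also return the whisker ends for the non-outliers."""
    xs = sorted(data)
    q1, _, q3 = quantiles(xs, n=4, method="exclusive")
    iqr = q3 - q1
    f1, f2 = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # inner fences
    F1, F2 = q1 - 3 * iqr, q3 + 3 * iqr       # outer fences
    inside = [x for x in xs if f1 <= x <= f2]
    mild = [x for x in xs if (F1 <= x < f1) or (f2 < x <= F2)]
    extreme = [x for x in xs if x < F1 or x > F2]
    return {
        "whisker_min": min(inside),   # Min = smallest observation within the inner fences
        "whisker_max": max(inside),   # Max = largest observation within the inner fences
        "mild": mild,
        "extreme": extreme,
    }

# Hypothetical data, for illustration only
print(box_plot_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 25, 50]))
```

For these made-up data Q1 = 3.25 and Q3 = 9.75, so the inner fences are -6.5 and 19.5 and the outer fences are -16.25 and 29.25: 25 is a mild outlier, 50 is an extreme outlier, and the whiskers run from 1 to 10.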

Example: Data were collected on n = 109 countries in 1995, on k = 25 variables.

The variables:
Population = population size (in 1000s)
Density = number of people per square kilometer
Urban = percentage of population living in cities
Religion
lifeexpf = average female life expectancy
lifeexpm = average male life expectancy
literacy = % of population who can read
pop_inc = % increase in population size (1995)
babymort = infant mortality (deaths per 1000)
gdp_cap = gross domestic product per capita
Region = region or economic group
calories = daily calorie intake
aids = number of AIDS cases
birth_rt = birth rate per 1000 people
death_rt = death rate per 1000 people
aids_rt = number of AIDS cases per 100000 people
log_gdp = log10(gdp_cap)
log_aidsr = log10(aids_rt)
b_to_d = birth to death ratio
fertility = average number of children in family
log_pop = log10(population)
cropgrow = ??
lit_male = % of males who can read
lit_fema = % of females who can read
Climate = predominant climate

[Figure: the data file as it appears in SPSS]

Consider the data on infant mortality (babymort). [Figure: stem-and-leaf diagram of infant mortality; stem = tens digit, leaf = units digit]

Summary Statistics:
median = Q2 = 27
Lower quartile = Q1 = the median of the lower half = 12
Upper quartile = Q3 = the median of the upper half = 66.5
Interquartile range: IQR = Q3 - Q1 = 66.5 - 12 = 54.5

The Outer Fences:
lower = Q1 - 3(IQR) = 12 - 3(54.5) = -151.5
upper = Q3 + 3(IQR) = 66.5 + 3(54.5) = 230.0
No observations are outside of the outer fences.

The Inner Fences:
lower = Q1 - 1.5(IQR) = 12 - 1.5(54.5) = -69.75
upper = Q3 + 1.5(IQR) = 66.5 + 1.5(54.5) = 148.25
Only one observation (168, Afghanistan) is outside of the inner fences, so it is a mild outlier.

[Figure: Box-whisker plot of infant mortality]
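The fence arithmetic for infant mortality can be verified directly from the quartiles above (the full data set is not reproduced here, so only the summary values are used):

```python
q1, q3 = 12, 66.5   # quartiles of infant mortality from the summary above
iqr = q3 - q1       # 54.5

inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # (-69.75, 148.25)
outer = (q1 - 3 * iqr, q3 + 3 * iqr)       # (-151.5, 230.0)

afghanistan = 168
# Mild outlier: beyond an inner fence but not beyond the outer fence
is_mild = inner[1] < afghanistan <= outer[1]
print(iqr, inner, outer, is_mild)
```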

Example 2: In this example we look at the weight gains (grams) for rats under six diets differing in level of protein (High or Low) and source of protein (Beef, Cereal, or Pork), with ten test animals for each diet.

Table: Gains in weight (grams) for rats under six diets differing in level of protein (High or Low) and source of protein (Beef, Cereal, or Pork)

Level      High protein                Low protein
Source     Beef    Cereal  Pork        Beef    Cereal  Pork
Diet       1       2       3           4       5       6
           73      98      94          90      107     49
           102     74      79          76      95      82
           118     56      96          90      97      73
           104     111     98          64      80      86
           81      95      102         86      98      81
           107     88      102         51      74      97
           100     82      108         72      74      106
           87      77      91          90      67      70
           117     86      120         95      89      61
           111     92      105         78      58      82
Median     103.0   87.0    100.0       82.0    84.5    81.5
Mean       100.0   85.9    99.5        79.2    83.9    78.7
IQR        24.0    18.0    11.0        18.0    23.0    16.0
PSD        17.78   13.33   8.15        13.33   17.04   11.85
Variance   229.11  225.66  119.17      192.84  246.77  273.79
Std. Dev.  15.14   15.02   10.92       13.89   15.71   16.55

[Figure: box plots of weight gain for the six diets (High and Low Protein; Beef, Cereal, Pork)]
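The summary rows of the table can be recomputed from the raw weight gains. A minimal sketch using Python's `statistics` module; two assumptions: the IQR is taken from the mid-hinges (Method 2), which reproduces the IQR row, and PSD is taken to be the pseudo-standard deviation IQR/1.35, which reproduces the PSD row. Variance and standard deviation use the sample (n - 1) formulas.

```python
from statistics import mean, median, stdev, variance

# Weight gains (grams) from the table, ten rats per diet
diets = {
    "High Beef":   [73, 102, 118, 104, 81, 107, 100, 87, 117, 111],
    "High Cereal": [98, 74, 56, 111, 95, 88, 82, 77, 86, 92],
    "High Pork":   [94, 79, 96, 98, 102, 102, 108, 91, 120, 105],
    "Low Beef":    [90, 76, 90, 64, 86, 51, 72, 90, 95, 78],
    "Low Cereal":  [107, 95, 97, 80, 98, 74, 74, 67, 89, 58],
    "Low Pork":    [49, 82, 73, 86, 81, 97, 106, 70, 61, 82],
}

def iqr_method2(values):
    """IQR from the mid-hinges: medians of the lower and upper halves.
    Assumes an even number of observations, as here (n = 10)."""
    xs = sorted(values)
    half = len(xs) // 2
    return median(xs[half:]) - median(xs[:half])

summary = {}
for name, gains in diets.items():
    iqr = iqr_method2(gains)
    summary[name] = {
        "median": median(gains),
        "mean": round(mean(gains), 1),
        "iqr": iqr,
        "psd": round(iqr / 1.35, 2),   # assumed definition: PSD = IQR / 1.35
        "variance": round(variance(gains), 2),   # sample variance (n - 1)
        "std_dev": round(stdev(gains), 2),
    }
    print(name, summary[name])
```

Recomputing the rows this way is a quick check against transcription errors in a summary table.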

Conclusions: Weight gain is higher for the high-protein meat diets. Increasing the level of protein increases weight gain, but only if the source of protein is meat.

Next topic: Numerical Measures of Variability