Descriptive Statistics: Numerical Methods, Chapter 4

Copyright © 2014 by McGraw-Hill Higher Education. All rights reserved.


2 Introduction §In this chapter we use numerical measures to describe data sets that represent populations or samples. §Usually, we focus our attention on two types of measures when describing population characteristics: · measures of central location · measures of dispersion.

3 Introduction §Why are both the central location and the variability used to describe a set of numbers? §Observe the following example.

4 §Think of a sample portfolio composed of three stocks: · 100 shares, ARR = 10% · 200 shares, ARR = 15% · 100 shares, ARR = 20% A central measure of this portfolio's ARR is 15%. §Now observe the following portfolio: · 100 shares, ARR = 5% · 100 shares, ARR = 5% · 200 shares, ARR = 15% · 100 shares, ARR = 25% · 100 shares, ARR = 25% A central measure of this portfolio's ARR is 15% too.
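To make the comparison concrete, here is a minimal Python sketch. It assumes each share counts as one observation, so the "central measure" is a share-weighted mean (the slide does not say how it was computed). The helper name `per_share_returns` is hypothetical.

```python
import statistics

def per_share_returns(holdings):
    """Expand (shares, ARR in %) pairs into one ARR value per share (hypothetical helper)."""
    return [arr for shares, arr in holdings for _ in range(shares)]

portfolio_1 = [(100, 10), (200, 15), (100, 20)]
portfolio_2 = [(100, 5), (100, 5), (200, 15), (100, 25), (100, 25)]

for name, holdings in [("Portfolio 1", portfolio_1), ("Portfolio 2", portfolio_2)]:
    values = per_share_returns(holdings)
    center = statistics.mean(values)          # share-weighted mean ARR
    spread = max(values) - min(values)        # range of ARR across shares
    print(f"{name}: mean ARR = {center}%, range = {spread} percentage points")
```

Both portfolios print a mean ARR of 15%, but the range is 10 percentage points for the first and 20 for the second, which is exactly the difference the central measure hides.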

5 Introduction §Considering only the average ARR, the two portfolios are equal. But are they really? §Is the dispersion of ARR the same for the two portfolios? §The dispersion (variability) is an important property when describing a set of numbers, at least as important as the central location. §We'll discuss these two important measures in more detail later.

6 4.1 Measures of Central Location §The central data point reflects the locations of all the actual data points. §How? With one data point, the central location is clearly at the point itself. With two data points, the central location should fall midway between them (in order to reflect the location of both).

7 4.1 Measures of Central Location §The central data point reflects the locations of all the actual data points. §How? If a third data point appears in the center, the measure of central location remains in the center. But if the third data point appears on the left-hand side of the midrange, it should “pull” the central location to the left.

8 4.1 Measures of Central Location As more and more data points are added, the central location moves (left and right) as required in order to reflect the effects of all the points.

9 The Arithmetic Mean $\text{Mean} = \dfrac{\text{Sum of the measurements}}{\text{Number of measurements}}$ §This is the most popular and useful measure of central location.

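10 The Arithmetic Mean · Sample mean: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$, where n is the sample size. · Population mean: $\mu = \dfrac{\sum_{i=1}^{N} x_i}{N}$, where N is the population size.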

11 The Arithmetic Mean Example 1 Find the mean rate of return for a portfolio equally invested in five stocks having the following annual rates of return: 11.2%, 8.07%, 5.55%, 13.7%, 21%. Solution $\bar{x} = \dfrac{11.2 + 8.07 + 5.55 + 13.7 + 21}{5} = \dfrac{59.52}{5} = 11.904\%$
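The same computation as a short Python sketch, applying the definition directly and cross-checking it with the standard library's `statistics.mean`:

```python
import statistics

# Annual rates of return (%) of the five equally weighted stocks in Example 1
returns = [11.2, 8.07, 5.55, 13.7, 21]

# Arithmetic mean: sum of the measurements / number of measurements
mean_by_formula = sum(returns) / len(returns)
mean_by_library = statistics.mean(returns)

print(round(mean_by_formula, 3), round(mean_by_library, 3))
```

Both values round to 11.904, matching the hand calculation.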

12 The Median §The median of a set of measurements is the value that falls in the middle when the measurements are arranged in order of magnitude. §When determining the median, pay attention to the number of observations (k). · If k is odd: Median = the number at the ((k+1)/2)th location of the ordered array. · If k is even: Median = the average of the two numbers in the middle (the numbers at the (k/2)th and the ((k/2)+1)th locations of the ordered array).

13 The Median Example 2 The salaries of seven employees were recorded (in $1000s): 28, 60, 26, 32, 30, 26, 29. Find the median salary. Suppose an additional salary of $31,000 is added to the group of salaries recorded before. Find the median salary. · Odd number of observations: 26, 26, 28, 29, 30, 32, 60. There are seven salaries (k = 7). The median is the number at the ((k+1)/2)th = ((7+1)/2)th = 4th location of the ordered array, so the median is 29. · Even number of observations: 26, 26, 28, 29, 30, 31, 32, 60. There are eight salaries (k = 8). The two salaries in the middle are 29 (at the (k/2)th = 4th location) and 30 (at the ((k/2)+1)th = 5th location). The median is their average, 29.5.
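The odd/even rule can be sketched in Python as follows; `median_by_rule` is a hypothetical helper that follows the slide's 1-based locations, and `statistics.median` serves as a cross-check.

```python
import statistics

def median_by_rule(values):
    """Median via the odd/even rule, using 1-based locations in the ordered array."""
    ordered = sorted(values)
    k = len(ordered)
    if k % 2 == 1:
        # k odd: the number at location (k + 1) / 2
        return ordered[(k + 1) // 2 - 1]
    # k even: average of the numbers at locations k/2 and k/2 + 1
    return (ordered[k // 2 - 1] + ordered[k // 2]) / 2

salaries = [28, 60, 26, 32, 30, 26, 29]        # Example 2, in $1000s
print(median_by_rule(salaries))                # odd case (k = 7)
print(median_by_rule(salaries + [31]))         # even case (k = 8)
print(statistics.median(salaries + [31]))      # library cross-check
```

The three lines print 29, 29.5, and 29.5.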

14 The Mode §The mode of a set of measurements is the value that occurs most frequently. §A set of data may have one mode (or modal class), or two or more modes. §For large data sets the modal class (the class with the highest frequency) is much more relevant than a single-value mode.

15 The Mode § Example 3 The manager of a men’s clothing store observes the waist size (in inches) of trousers sold last week: 31, 34, 36, 33, 28, 34, 30, 34, 32, 40. The mode of this data set is 34 in. This information seems valuable (for example, for designing a new display in the store), much more so than “the median is 33.5 in.”
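A small Python sketch of Example 3: `collections.Counter` gives the frequency of each waist size, and `statistics.multimode` returns every mode, which also covers data sets with two or more modes.

```python
import statistics
from collections import Counter

waist_sizes = [31, 34, 36, 33, 28, 34, 30, 34, 32, 40]   # inches

counts = Counter(waist_sizes)                  # frequency of each size
print(counts.most_common(1))                   # most frequent size and its count
print(statistics.multimode(waist_sizes))       # all modes (handles ties)
print(statistics.median(waist_sizes))          # for comparison
```

This prints `[(34, 3)]`, `[34]`, and `33.5`.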


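17 Relationship among Mean, Median, and Mode §If a distribution is symmetrical, the mean, median, and mode coincide. §If a distribution is nonsymmetrical, skewed to the left or to the right, the three measures differ. [Figure: a positively skewed distribution (“skewed to the right”) and a negatively skewed distribution (“skewed to the left”), each with the mean, median, and mode marked.] In a positively skewed distribution the mean is typically pulled to the right of the median, which lies to the right of the mode; in a negatively skewed distribution the order is typically reversed.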

18 Using the Mean, Median, and Mode §When to use (and not use) each measure of central location: · The mean is very sensitive to extreme values; thus, it should not be used when a few extreme values, lying far from most of the observations, are present. The mean is used in most statistical analyses. · The median is not affected by extreme values; therefore, it can be used in their presence. Yet the median does not reflect all the values in the data set, only the location of the middle observation. · The mode should be used mainly for categorical data.
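A quick Python sketch of the sensitivity point, reusing the salaries of Example 2. The "extreme" version, in which the largest salary becomes 600, is a hypothetical modification for illustration only.

```python
import statistics

salaries = [28, 60, 26, 32, 30, 26, 29]           # Example 2, in $1000s
# Hypothetical: make the largest salary far more extreme
extreme = [600 if s == 60 else s for s in salaries]

for label, data in [("original", salaries), ("extreme", extreme)]:
    print(label, "mean =", round(statistics.mean(data), 2),
          "median =", statistics.median(data))
```

The mean jumps from 33 to about 110.14, while the median stays at 29.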

19 Summary Examples § Example 4 A professor of statistics wants to report the results of a midterm exam taken by 100 students. The mean of the test marks is … The median of the test marks is 81. The mode of the test marks is 84. Describe the information each one provides. · The mean provides information about the overall performance level of the class. It can serve as a tool for making comparisons with other classes and/or other exams. · The median indicates that half of the class received a grade below 81% and half received a grade above 81%. A student can use this statistic to place his or her mark relative to other students in the class. · The mode must be used when the data are qualitative. If marks are classified by letter grade, the frequency of each grade can be calculated; the mode then becomes a logical measure to compute.

20 Summary Examples § Example 5 · The following sample represents the lateness of arriving flights at a certain domestic airport (in minutes): 22, 12, 4, -3, … (the data are found in Lateness.xls). (a) Find the mean, median, and mode of this sample. Do these data form a skewed distribution? If so, is it negatively or positively skewed? (b) Which measure should not be used? Change the largest lateness to 34 minutes (rather than 67). Which central location measures are affected? (c) A person is waiting for the arrival of a certain flight. He is told the flight will probably be no more than 10 minutes late. Should he believe this is a reliable estimate? Use the modified data from part (b).

21 Summary Examples §Example 5 - solution · We run the data in Excel using the ‘Descriptive Statistics’ tool. §The distribution of these data shows positive skewness. §Do not use the mean, because an outlier of 67 minutes of lateness inflates the mean to almost 11 minutes.

22 Summary Examples §Example 5 - solution · When the largest observation is changed from 67 to 34, the mean drops to 9.80 minutes, but the median and mode do not change. It is reasonable to believe that the lateness will not exceed 10 minutes: from the ogive we see that about 60% of the flights arrive within 10 minutes of the scheduled arrival time.

23 Problems P4-1: Consider the following sample of measurements: 27, 32, 30, 28, 31, 32, 35, 28, 28, 29. Compute the mean, median, and mode. Does it appear that the mode is a good measure of central location for this set of numbers? P4-2: The manager of a local supermarket (facing tough competition) tries to improve service to customers waiting to pay by adding a second cashier. The goal is to have customers wait at most 4.5 minutes before leaving the cashier area. Using the data in P4-02.xls, was the manager successful in achieving this goal? Use Excel and numerical descriptive measures.

24 4.2 Measures of Variability §Measures of central location fail to tell the whole story about the distribution. §A question of interest still remains unanswered: How much are the values of a given set spread out around the mean value?


26 Why do we need measures of variability? Observe two hypothetical data sets: · Set 1 (small variability): the mean provides a good representation of the values in the data set. · Set 2 (larger variability): the mean is the same as before, but it no longer represents the values as well as before.

27 The Range · The range of a set of measurements is the difference between the largest and smallest measurements. · Its major advantage is the ease with which it can be computed. · Its major shortcoming is its failure to provide information on the dispersion of the values between the two endpoints. [Figure: the range spans from the smallest measurement to the largest measurement.] But how are all the measurements spread out between these endpoints? The range cannot help answer this question.

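28 The Variance · This measure reflects the dispersion of all the measurement values. · The variance of a population of N measurements $x_1, x_2, \dots, x_N$ having a mean $\mu$ is defined as $\sigma^2 = \dfrac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$ · The variance of a sample of n measurements $x_1, x_2, \dots, x_n$ having a mean $\bar{x}$ is defined as $s^2 = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$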

29 The Variance Consider two small populations, A and B. [Table: the deviation of each measurement from its population mean; in each population the deviations sum to 0.] The mean of both populations is … but the measurements in B are more dispersed than those in A. A measure of dispersion should agree with this observation. Can the sum of deviations from the mean be a good measure of dispersion?

30 The Variance The sum of deviations is zero for both populations (in fact, it is zero for every data set), so it is not a good measure of dispersion: clearly, the two populations do not have equal dispersion.
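A one-line derivation shows why the sum of deviations cannot measure dispersion: it is zero for every population, because the population mean is $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$, so $\sum_{i=1}^{N} x_i = N\mu$.

$$\sum_{i=1}^{N}(x_i - \mu) = \sum_{i=1}^{N} x_i - N\mu = N\mu - N\mu = 0$$

The same steps with $\bar{x}$ and n show that the deviations of a sample from its own mean also sum to zero.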

31 The Variance Let us calculate the variance of the two populations. Why is the variance defined as the average squared deviation? Why not use the sum of squared deviations as a measure of dispersion instead? After all, the sum of squared deviations increases in magnitude when the dispersion of a data set increases.

32 The Variance Which data set has a larger dispersion? [Figure: data sets A and B.] Data set B is more dispersed around the mean. Let us calculate the sum of squared deviations for both data sets.

33 The Variance $\text{Sum}_A = (1-2)^2 + \dots + (1-2)^2 + (3-2)^2 + \dots + (3-2)^2 = 10$ and $\text{Sum}_B = (1-3)^2 + (5-3)^2 = 8$. Since $\text{Sum}_A > \text{Sum}_B$, this measure is inconsistent with the observation that set B is more dispersed.

34 The Variance However, when calculated on a “per observation” basis (the variance), the dispersions are properly ordered: $\sigma_A^2 = \text{Sum}_A / N_A = 10/10 = 1$ and $\sigma_B^2 = \text{Sum}_B / N_B = 8/2 = 4$.
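The comparison can be reproduced in Python. The sketch assumes population A consists of five 1s and five 3s, which is consistent with the slide's N = 10, mean 2, and Sum_A = 10; population B is {1, 5}. The helper `sum_sq_dev` is hypothetical.

```python
import statistics

# Assumed reconstruction: five 1s and five 3s (N = 10, mean 2, Sum_A = 10)
population_a = [1] * 5 + [3] * 5
population_b = [1, 5]                      # N = 2, mean 3, Sum_B = 8

def sum_sq_dev(values):
    """Sum of squared deviations from the population mean."""
    mu = statistics.mean(values)
    return sum((x - mu) ** 2 for x in values)

for name, pop in [("A", population_a), ("B", population_b)]:
    # pvariance divides the sum of squared deviations by N
    print(name, "sum =", sum_sq_dev(pop), "variance =", statistics.pvariance(pop))
```

The sums are 10 and 8 (wrong order), while the variances are 1 and 4 (right order).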

35 The Variance § Example 6 · Find the variance of the following set of numbers, representing annual rates of return for a group of mutual funds. Assume the set is (i) a sample, (ii) a population: -2, 4, 5, 6.9, 10. § Solution Assuming a sample: $\bar{x} = 23.9/5 = 4.78$ and $s^2 = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} = \dfrac{78.368}{4} = 19.592$.

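36 The Variance § Example 6 - solution continued Assuming a population: $\mu = 4.78$ and $\sigma^2 = \dfrac{\sum_{i=1}^{N}(x_i - \mu)^2}{N} = \dfrac{78.368}{5} = 15.6736$.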
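Python's `statistics` module offers both versions, which makes the sample/population distinction in Example 6 easy to check:

```python
import statistics

returns = [-2, 4, 5, 6.9, 10]      # annual rates of return (%), Example 6

s2 = statistics.variance(returns)        # sample variance: divides by n - 1
sigma2 = statistics.pvariance(returns)   # population variance: divides by N
print(round(s2, 4), round(sigma2, 4))
```

This prints 19.592 and 15.6736, matching the two solutions.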

37 Standard Deviation The standard deviation of a set of measurements is the square root of the set's variance: $s = \sqrt{s^2}$ for a sample and $\sigma = \sqrt{\sigma^2}$ for a population.

38 Standard Deviation · Example 7 The daily percentages of defective items over two weeks of production (10 working days) were calculated for two production lines. Which line provides good items more consistently? Line 1: 8.3, 6.2, 20.9, 2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.05 Line 2: 12.1, 2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, 1.3, 11.4

39 Standard Deviation Example 7, Solution Let us use the Excel printout obtained from the “Descriptive Statistics” sub-menu. Line 1 should be considered less consistent because the standard deviation of its defective percentage is larger. Therefore the standard deviation of its good-item percentage is larger as well, since the good-item percentage is 100% minus the defective percentage and subtracting from 100% does not change the spread.
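In place of the Excel printout, here is a Python sketch that compares the sample standard deviations. It assumes Line 1's tenth value is 30.05, taken from the sorted list in Example 11.

```python
import statistics

# Daily % defective over 10 working days (Example 7).
# Assumption: Line 1's tenth value, 30.05, comes from the sorted list in Example 11.
line_1 = [8.3, 6.2, 20.9, 2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.05]
line_2 = [12.1, 2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, 1.3, 11.4]

for name, data in [("Line 1", line_1), ("Line 2", line_2)]:
    # Sample standard deviation (n - 1 in the denominator), as Excel's tool reports
    print(name, "mean =", round(statistics.mean(data), 3),
          "s =", round(statistics.stdev(data), 2))
```

Line 1's standard deviation comes out noticeably larger than Line 2's, which agrees with the conclusion above.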

40 Interpreting the Standard Deviation §The standard deviation can be used to · compare the variability of several distributions · make a statement about the general shape of a distribution. §When describing the shape of a distribution, we distinguish between · a mound-shaped distribution (described by the Empirical Rule) · a distribution of any shape (described by the Chebyshev Theorem)

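41 The Empirical Rule – Describing a Mound-Shaped Data Set If a sample of measurements has a mound-shaped distribution, · the interval $(\bar{x} - s, \bar{x} + s)$ contains approximately 68% of the measurements · the interval $(\bar{x} - 2s, \bar{x} + 2s)$ contains approximately 95% of the measurements · the interval $(\bar{x} - 3s, \bar{x} + 3s)$ contains approximately 99.7% of the measurements.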

42 The Empirical Rule § Example 10 Describe the set of data provided in Data 10 using numerical descriptive measures. § Solution · From the histogram, the distribution appears to be approximately mound-shaped, so we’ll use the empirical rule to describe the data.

43 The Empirical Rule – Interpreting the Standard Deviation Example 10 – solution continued §Running the Descriptive Statistics tool in Excel, we have: Mean = …, standard deviation (sample) = .556. §From the empirical rule we get: · Approximately 68% of the data lie between $\bar{x} - 1(.556)$ and $\bar{x} + 1(.556)$. Actual count: 19 (73%). · Approximately 95% of the data lie between $\bar{x} - 2(.556)$ and $\bar{x} + 2(.556)$. Actual count: 25 (96%). · Approximately 99.7% of the data lie between $\bar{x} - 3(.556)$ and $\bar{x} + 3(.556)$. Actual count: 26 (100%).
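The "actual count" check can be sketched in Python. The Data 10 values are not reproduced here, so the demo runs on Line 2 of Example 7 purely as an illustration; `share_within` and `compare_with_empirical_rule` are hypothetical helper names.

```python
import statistics

def share_within(data, k):
    """Fraction of measurements inside [mean - k*s, mean + k*s], s = sample std dev."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    low, high = mean - k * s, mean + k * s
    return sum(low <= x <= high for x in data) / len(data)

def compare_with_empirical_rule(data):
    """Print the actual share next to the Empirical Rule's approximate share."""
    for k, rule in [(1, 0.68), (2, 0.95), (3, 0.997)]:
        print(f"k={k}: rule ~{rule:.1%}, actual {share_within(data, k):.1%}")

# Illustration on Line 2 of Example 7 (not the Data 10 set)
compare_with_empirical_rule([12.1, 2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, 1.3, 11.4])
```

The rule is only an approximation for mound-shaped data, so the actual shares will rarely match it exactly, as the 73% versus 68% in Example 10 shows.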

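44 The Chebyshev Theorem – Describing Any Data Set §The proportion of observations in any sample that lie within k standard deviations of the mean is at least $1 - 1/k^2$ for any k > 1. §This theorem is valid for any set of measurements (sample or population) of any shape. · k = 2: the interval $\bar{x} \pm 2s$ contains at least $1 - 1/2^2 = 75\%$ of the measurements · k = 3: the interval $\bar{x} \pm 3s$ contains at least $1 - 1/3^2 \approx 89\%$ of the measurements · k = 4: the interval $\bar{x} \pm 4s$ contains at least $1 - 1/4^2 \approx 94\%$ of the measurements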
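A Python sketch of the bound, checked against Line 1 of Example 7 (tenth value 30.05 assumed from Example 11). It uses the sample standard deviation; the bound stays valid because s is never smaller than the population-style standard deviation of the same numbers, so the sample interval is at least as wide. Helper names are hypothetical.

```python
import statistics

def chebyshev_bound(k):
    """Minimum share of measurements within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

def share_within(data, k):
    """Actual share of measurements within k sample standard deviations of the mean."""
    mean, s = statistics.mean(data), statistics.stdev(data)
    return sum(abs(x - mean) <= k * s for x in data) / len(data)

# Line 1 of Example 7 (tenth value 30.05 assumed from Example 11)
line_1 = [8.3, 6.2, 20.9, 2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.05]
for k in (2, 3, 4):
    print(f"k={k}: at least {chebyshev_bound(k):.1%}, actual {share_within(line_1, k):.0%}")
```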

45 The Chebyshev Theorem § Example 9 · Employee salaries were recorded and a histogram was created. Describe these data using the appropriate numerical measures. § Solution · The histogram shows that the distribution is positively skewed rather than mound-shaped, so the Chebyshev Theorem should be used to describe the data.

46 The Chebyshev Theorem § Example 9 – solution continued · From Excel we have: Mean = …, standard deviation = 58.354. · Applying the Chebyshev Theorem: At least 75% of the salaries lie within $[\bar{x} - 2(58.354), \bar{x} + 2(58.354)]$ = [… , …]; actual count: 39 (97.5%). At least 88.9% of the salaries lie within $[\bar{x} - 3(58.354), \bar{x} + 3(58.354)]$ = [68.138, …]; actual count: all (100%).

47 The Coefficient of Variation §The coefficient of variation of a set of measurements is the standard deviation divided by the mean: $cv = s/\bar{x}$ for a sample ($\sigma/\mu$ for a population). §This coefficient provides a proportionate measure of variation. A standard deviation of 10 may be perceived as large when the mean value is 100 (cv = 0.10), but only moderately large when the mean value is 500 (cv = 0.02).
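A minimal Python sketch of the coefficient of variation. The two small data sets are hypothetical, chosen so that both have a sample standard deviation of exactly 10 but different means.

```python
import statistics

def coefficient_of_variation(data):
    """Sample coefficient of variation s / x-bar (meaningful only for a positive mean)."""
    return statistics.stdev(data) / statistics.mean(data)

# Hypothetical data sets: same standard deviation (10), different means
around_100 = [90, 100, 110]
around_500 = [490, 500, 510]
print(coefficient_of_variation(around_100), coefficient_of_variation(around_500))
```

This prints 0.1 and 0.02, mirroring the slide's comparison.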

48 4.3 Measures of Relative Location and Box Plots § Additional information on the general shape of a data set can be obtained by describing the relative location of five values within the data set. § We use percentiles to describe these five relative locations. What is a percentile?

49 4.3 Measures of Relative Location and Box Plots § Percentile · The pth percentile of a set of measurements is the value for which · at most p% of the measurements are less than that value, and · at most (100 - p)% of the measurements are greater than that value. § Example · Suppose your score is the 60th percentile of an SAT test. Then 60% of all the scores lie below your score and 40% lie above it.

50 4.3 Measures of Relative Location and Box Plots § Here are two approaches commonly used to describe a set of values. § The five-number summary: · Smallest value · First quartile (Q1) · Median (Q2) · Third quartile (Q3) · Largest value - OR - · The first decile (10th percentile) · First quartile (Q1) · Median (Q2) · Third quartile (Q3) · The ninth decile (90th percentile)

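51 A Demonstration of Commonly Used Percentiles § Commonly used percentiles: · First (lower) decile = 10th percentile · First (lower) quartile, Q1 = 25th percentile · Median = 50th percentile · Third quartile, Q3 = 75th percentile · Ninth (upper) decile = 90th percentile [Figure: 10% of the measurements lie below the lower decile and 90% above it; 25% lie below the lower quartile and 75% above it; 50% lie below the median and 50% above it.]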



54 Determining Percentiles and their Location §There are two general cases to consider: · The percentile is a member of the data set. · The percentile is not a member of the data set; it falls between two values of the data set. §Let us demonstrate the two cases with two examples.

55 Determining Percentiles and their Location § Example 11 Find the quartiles for the data set of flight lateness presented in example 4.5. Data: 8.3, 6.2, 20.9, 2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.05

56 Determining Percentiles and their Location §Example 11 - Solution Sort the 10 measurements: 2.7, 3.1, 5.2, 6.2, 8.3, 20.9, 24.4, 30.05, 33.6, 42.9 · The first quartile: At most (.25)(10) = 2.5 measurements should appear below the first quartile, so check the smallest 2 measurements on the left-hand side. At most (.75)(10) = 7.5 measurements should appear above the first quartile, so check the largest 7 measurements on the right-hand side. The value 5.2 has 2 measurements below it and 7 above it, so Q1 = 5.2.

57 Determining Percentiles and their Location §Example 11 – solution continued · The second quartile (median): at most (.5)(10) = 5 numbers lie below Q2 and at most 5 lie above it. 2.7, 3.1, 5.2, 6.2, 8.3, 20.9, 24.4, 30.05, 33.6, 42.9 $Q_2 = (8.3 + 20.9)/2 = 14.6$

58 Determining Percentiles and their Location §Example 11 – solution continued · The third quartile: at most (.75)(10) = 7.5 numbers lie below Q3 and at most (.25)(10) = 2.5 numbers lie above Q3. 2.7, 3.1, 5.2, 6.2, 8.3, 20.9, 24.4, 30.05, 33.6, 42.9 The value 30.05 has 7 measurements below it and 2 above it, so Q3 = 30.05.

59 Determining Percentiles and their Location § Example 12 Find the 20th percentile for the data set of flight lateness presented in Example 11. § Solution · Following the procedure applied in the previous example, at most (.20)(10) = 2 numbers should fall below the 20th percentile, and at most (.80)(10) = 8 numbers should fall above it. The sorted data set is: 2.7, 3.1, 5.2, 6.2, 8.3, 20.9, 24.4, 30.05, 33.6, 42.9. From the sorted data set we see that every number greater than 3.1 and smaller than 5.2 meets these two conditions. We show next how to determine the location and value of a percentile whose value is not one of the data points.

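60 Determining Percentiles and their Location §Find the location of any percentile using the formula $L_P = (n+1)\dfrac{P}{100}$, where $L_P$ is the location of the Pth percentile in the ordered array of n measurements.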

61 Determining Percentiles and their Location §Example 12 – solution continued · Finding the location of the 20th percentile: $L_{20} = (10+1)\dfrac{20}{100} = 2.2$ in the ordered array 2.7, 3.1, 5.2, 6.2, 8.3, 20.9, 24.4, 30.05, 33.6, 42.9 · Finding the value of the 20th percentile: the 20th percentile lies at location 2.2, that is, .2 of the distance from 3.1 (the 2nd value) to 5.2 (the 3rd value). Therefore, $P_{20} = 3.1 + .2(5.2 - 3.1) = 3.52$.
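A Python sketch of the location formula with linear interpolation. It assumes the rule $L_P = (n+1)P/100$ and clamps locations outside 1..n to the smallest or largest value; `percentile` is a hypothetical helper. For interior locations, `statistics.quantiles(..., method="exclusive")` uses the same (n + 1) convention, so it serves as a cross-check.

```python
import statistics

def percentile(data, p):
    """Pth percentile via location L_P = (n + 1) * P / 100 and linear interpolation (assumed rule)."""
    xs = sorted(data)
    n = len(xs)
    loc = (n + 1) * p / 100           # 1-based location in the ordered array
    if loc <= 1:
        return xs[0]                  # clamp below the smallest value
    if loc >= n:
        return xs[-1]                 # clamp above the largest value
    whole = int(loc)                  # data point just below the location
    frac = loc - whole                # fraction of the way to the next data point
    return xs[whole - 1] + frac * (xs[whole] - xs[whole - 1])

data = [2.7, 3.1, 5.2, 6.2, 8.3, 20.9, 24.4, 30.05, 33.6, 42.9]
print(round(percentile(data, 20), 4))                                    # location 2.2
print(round(statistics.quantiles(data, n=100, method="exclusive")[19], 4))
```

Both lines print 3.52. Note that this interpolation rule does not always agree with the "at most" counting used in Example 11: it gives a first quartile of 4.675 (location 2.75) rather than 5.2, while the median is 14.6 either way.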

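62 Quartiles and Variability §Quartiles can provide an idea about the shape of a histogram. [Figure: Q1, Q2, and Q3 marked on a positively skewed histogram and on a negatively skewed histogram.] In a positively skewed histogram, Q3 typically lies farther from the median than Q1 does; in a negatively skewed histogram, Q1 typically lies farther from the median than Q3 does.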

63 Interquartile Range §Interquartile range = $Q_3 - Q_1$ §This is a measure of the spread of the middle 50% of the observations. §A large value indicates a large spread of the observations.

64 Box Plot · A box plot is a pictorial display that provides the main descriptive measures of the measurement set: · L: the largest measurement · Q3: the upper quartile · Q2: the median · Q1: the lower quartile · S: the smallest measurement [Figure: a box from Q1 to Q3 with the median Q2 marked inside, and whiskers extending toward S and L.] · An outlier is defined as any value that is more than $1.5(Q_3 - Q_1)$ away from the box.
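The numbers behind a box plot can be sketched in Python: the five-number summary, the interquartile range, and any values more than $1.5(Q_3 - Q_1)$ beyond the box. The sketch assumes quartiles are computed with the location rule $L_P = (n+1)P/100$ (other conventions give slightly different quartiles), and the helper names are hypothetical. It reuses the salaries of Example 2.

```python
def percentile(data, p):
    """Pth percentile: location L_P = (n + 1) * P / 100 with linear interpolation (assumed rule)."""
    xs = sorted(data)
    n = len(xs)
    loc = (n + 1) * p / 100
    if loc <= 1:
        return xs[0]
    if loc >= n:
        return xs[-1]
    whole = int(loc)
    return xs[whole - 1] + (loc - whole) * (xs[whole] - xs[whole - 1])

def box_plot_summary(data):
    """Five-number summary, IQR, and values more than 1.5 * IQR beyond the box."""
    xs = sorted(data)
    q1, q2, q3 = (percentile(xs, p) for p in (25, 50, 75))
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in xs if x < lower_fence or x > upper_fence]
    return {"S": xs[0], "Q1": q1, "Q2": q2, "Q3": q3, "L": xs[-1],
            "IQR": iqr, "outliers": outliers}

# Salaries from Example 2 (in $1000s)
print(box_plot_summary([28, 60, 26, 32, 30, 26, 29]))
```

For these salaries the sketch gives Q1 = 26, Q2 = 29, Q3 = 32, and IQR = 6, so the upper fence is 32 + 9 = 41 and the $60,000 salary is flagged as an outlier.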

65 Box Plot · Example 13 Create a box plot for the data regarding the GMAT scores of 200 applicants (see Data13.xls). [Figure: box plot of the GMAT scores.]

66 Box Plot Example 13 - continued · Interpreting the box plot results (Q1 = 512, Q2 = 537, Q3 = 575): The scores range from 449 to 788. About half the scores are smaller than 537, and about half are larger than 537. About half the scores lie between 512 and 575. About a quarter lie below 512 and a quarter above 575.

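67 Box Plot Example 13 - continued [Figure: the box plot with Q1, Q2, and Q3 marked.] The data set is positively skewed: the upper part of the box (575 - 537 = 38) is longer than the lower part (537 - 512 = 25), and the scores extend much farther above Q3 (to 788) than below Q1 (to 449).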

68 Measures of Linear Relationship §The covariance and the coefficient of correlation are used to measure the direction and strength of the linear relationship between two variables. · The covariance answers the question: Is there any pattern to the way two variables move together? · The correlation coefficient answers the question: How strong is the linear relationship between two variables?

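69 Covariance · Population covariance: $\sigma_{xy} = \dfrac{\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y)}{N}$, where $\mu_x$ ($\mu_y$) is the population mean of the variable X (Y) and N is the population size. · Sample covariance: $s_{xy} = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n-1}$, where $\bar{x}$ ($\bar{y}$) is the sample mean of the variable X (Y) and n is the sample size.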

70 Covariance §If the two variables move in the same direction (both increase or both decrease), the covariance is positive. [Scatter plot: Y against X.]

71 Covariance §If the two variables move in opposite directions (one increases when the other decreases), the covariance is negative. [Scatter plot: Y against X.]

72 Covariance §If the two variables are unrelated, the covariance will be close to zero. [Scatter plot: Y against X.]
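The three cases can be illustrated with a small Python sketch of the covariance definitions. The data are hypothetical; `covariance` is an illustrative helper (Python 3.10+ also provides `statistics.covariance` for the sample version).

```python
import statistics

def covariance(x, y, sample=True):
    """Covariance of paired data; divides by n - 1 (sample) or N (population)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mx, my = statistics.mean(x), statistics.mean(y)
    total = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return total / (len(x) - 1 if sample else len(x))

x = [1, 2, 3, 4, 5]
print(covariance(x, [2, 4, 6, 8, 10]))   # move together     -> positive
print(covariance(x, [10, 8, 6, 4, 2]))   # move oppositely   -> negative
print(covariance(x, [2, 4, 3, 4, 2]))    # no linear pattern -> zero here
```

This prints 5.0, -5.0, and 0.0. Because covariance carries the units of X times the units of Y, its size alone says little about strength; that is what the coefficient of correlation adds.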

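73 The coefficient of correlation The coefficient of correlation measures the strength of the linear relationship between two variables. It is the covariance divided by the product of the standard deviations: $\rho = \dfrac{\sigma_{xy}}{\sigma_x \sigma_y}$ for a population and $r = \dfrac{s_{xy}}{s_x s_y}$ for a sample. Its value always lies between -1 and +1.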

74 The coefficient of correlation §If the two variables are very strongly positively related, the coefficient value is close to +1 (strong positive linear relationship).

75 The coefficient of correlation §If the two variables are very strongly negatively related, the coefficient value is close to -1 (strong negative linear relationship).

76 The coefficient of correlation §A weak linear relationship is indicated by a coefficient close to zero. §Also, a non-linear relationship may translate into a weak linear relationship: even a strong curved pattern can produce a coefficient near zero.
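A Python sketch of the sample coefficient of correlation on hypothetical data. It shows the two extremes and a perfect but curved relationship ($y = x^2$ on symmetric x values) whose coefficient is zero. `correlation` is an illustrative helper (Python 3.10+ also provides `statistics.correlation`).

```python
import statistics

def correlation(x, y):
    """Sample coefficient of correlation r = s_xy / (s_x * s_y)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    mx, my = statistics.mean(x), statistics.mean(y)
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return s_xy / (statistics.stdev(x) * statistics.stdev(y))

x = [-2, -1, 0, 1, 2]
print(round(correlation(x, [2 * v + 1 for v in x]), 6))   # exact increasing line
print(round(correlation(x, [-3 * v for v in x]), 6))      # exact decreasing line
print(round(correlation(x, [v ** 2 for v in x]), 6))      # exact curve y = x**2
```

The three lines print 1.0, -1.0, and 0.0: the parabola is a perfect relationship, yet its linear correlation is zero.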

77 The coefficient of correlation and the covariance § Example 14 · Compute the covariance and the coefficient of correlation to measure how car speed (miles per hour) and gas consumption (miles per gallon) are related to one another (see the data next). § Solution · We believe speed affects gas consumption. Thus, speed is labeled X and miles per gallon is labeled Y.

78 The coefficient of correlation and the covariance Example 14 – solution continued [Table: columns Car, x, y, x², y², xy.]
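The table's columns (x, y, x², y², xy) suggest the usual computational shortcut, which needs only the column totals: $s_{xy} = \left[\sum xy - (\sum x)(\sum y)/n\right]/(n-1)$ and $s_x^2 = \left[\sum x^2 - (\sum x)^2/n\right]/(n-1)$. Here is a Python sketch under that assumption. The speed and mileage values are hypothetical placeholders, not the Example 14 data, which are not reproduced in this transcript.

```python
import math

def shortcut_stats(x, y):
    """Sample covariance and correlation from the totals of an x, y, x^2, y^2, xy table."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    s_xy = (sum_xy - sum_x * sum_y / n) / (n - 1)         # sample covariance
    s_x = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))  # sample std dev of x
    s_y = math.sqrt((sum_y2 - sum_y ** 2 / n) / (n - 1))  # sample std dev of y
    return s_xy, s_xy / (s_x * s_y)

# Hypothetical illustration only: these are NOT the Example 14 data
speed = [20, 30, 40, 50]   # x, miles per hour
mpg = [18, 21, 25, 26]     # y, miles per gallon
cov, r = shortcut_stats(speed, mpg)
print(round(cov, 3), round(r, 3))
```

The shortcut is algebraically identical to the definitional formulas; it was convenient for hand calculation because each column only needs to be summed once.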


80 The coefficient of correlation and the covariance Example 14 – solution continued Interpretation: Speed and gas mileage (miles per gallon) are strongly positively linearly related for speeds from 15 to 50 miles per hour.