

1

2 Chapter 3 Descriptive Statistics: Graphical and Numerical Summaries of Data
UNIT OBJECTIVES
At the conclusion of this unit you should be able to:
1) Construct graphs that appropriately describe data.
2) Calculate and interpret numerical summaries of a data set.
3) Combine numerical methods with graphical methods to analyze a data set.
4) Apply graphical methods of summarizing data to choose appropriate numerical summaries.
5) Apply software and/or calculators to automate graphical and numerical summary procedures.

3 Section 3.1 Displaying Categorical Data
“Sometimes you can see a lot just by looking.” (Yogi Berra, Hall of Fame catcher, NY Yankees)

4 The three rules of data analysis won’t be difficult to remember:
1. Make a picture: it reveals aspects not obvious in the raw data and enables you to think clearly about the patterns and relationships that may be hiding in your data.
2. Make a picture: it shows important features of and patterns in the data. You may also see things that you did not expect: extraordinary (possibly wrong) data values or unexpected patterns.
3. Make a picture: the best way to tell others about your data is with a well-chosen picture.

5 Bar Charts: show counts or relative frequency for each category
Example: Titanic passenger/crew distribution

6 Pie Charts: show proportions of the whole in each category
Example: Titanic passenger/crew distribution

7 Example: Top 10 causes of death in the United States

Rank  Cause of death        Count     % of top 10s  % of total deaths
1     Heart disease         700,142   37%           28%
2     Cancer                553,768   29%           22%
3     Cerebrovascular       163,538    9%            6%
4     Chronic respiratory   123,013    6%            5%
5     Accidents             101,537    5%            4%
6     Diabetes mellitus      71,372    4%            3%
7     Flu and pneumonia      62,034    3%            2%
8     Alzheimer’s disease    53,852    3%            2%
9     Kidney disorders       39,480    2%            2%
10    Septicemia             32,238    2%            1%
      All other causes      629,967                 25%

For each individual who died in the United States, we record the cause of death. The table above is a summary of that information.
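
As a quick check on the two percentage columns, here is a minimal Python sketch. It assumes the counts are exactly as listed in the table; the names `top10`, `all_other`, and `percent_table` are illustrative, not from the slides. “% of top 10s” divides each count by the top-10 total, while “% of total deaths” divides by all deaths (top 10 plus all other causes).

```python
# Death counts copied from the table (top 10 causes), plus "All other causes".
top10 = {
    "Heart disease": 700_142,
    "Cancer": 553_768,
    "Cerebrovascular": 163_538,
    "Chronic respiratory": 123_013,
    "Accidents": 101_537,
    "Diabetes mellitus": 71_372,
    "Flu and pneumonia": 62_034,
    "Alzheimer's disease": 53_852,
    "Kidney disorders": 39_480,
    "Septicemia": 32_238,
}
all_other = 629_967

top10_total = sum(top10.values())      # denominator for "% of top 10s"
all_deaths = top10_total + all_other   # denominator for "% of total deaths"


def percent_table(counts, top_total, grand_total):
    """Return {cause: (% of top 10, % of all deaths)}, rounded to whole percents."""
    return {
        cause: (round(100 * n / top_total), round(100 * n / grand_total))
        for cause, n in counts.items()
    }


table = percent_table(top10, top10_total, all_deaths)
for cause, (pct_top, pct_all) in table.items():
    print(f"{cause:22s} {pct_top:3d}% {pct_all:3d}%")
print(f"{'All other causes':22s}      {round(100 * all_other / all_deaths):3d}%")
```

Because the two columns use different denominators, only the “% of total deaths” column (including “All other causes”) adds up to about 100%.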

8 Top 10 causes of death in the United States
Top 10 causes of death: bar graph
Each category is represented by one bar. The bar’s height shows the count (or sometimes the percentage) for that particular category. For example, the number of individuals who died of an accident is approximately 100,000.

9 Top 10 causes of death in the United States
Bar graph sorted by rank: easy to analyze.
Sorted alphabetically: much less useful.

10 Recent Annual Software Sales ($billions) and Recent Annual Computer Hardware Sales ($billions) (NY Times)
First chart: 1. United States $158; 2. China $64.4; 3. Japan $54; 4. Germany $24.4; 5. Britain $23.5; 6. France $19.3; 7. Brazil $14.2; 8. Italy $13.1; 9. Australia $12.8; 10. India $11.9
Second chart: 1. United States $137.9; 2. Japan $23.4; 3. Germany $20; 4. Britain $16.8; 5. France $12.6; 6. Canada $7.3; 7. Italy $6.3; 8. China $5.4; 9. Netherlands $5.4; 10. Australia $4.8

11 Percent of people dying from top 10 causes of death in the United States
Top 10 causes of death: pie chart
Each slice represents a piece of one whole. The size of a slice depends on what percent of the whole this category represents.

12 Percent of deaths from top 10 causes vs. percent of deaths from all causes
Make sure your labels match the data. Make sure all percents add up to 100%.

13 Internships: basic bar chart; side-by-side bar chart

14 Trend, Student Debt by State (grads of public, 4-yr or more)
National average: 2009-10: $21,604; 2012-13: $25,043

15 Student Debt North Carolina Schools

16

17 Unnecessary dimension in a pie chart
The 3rd dimension is unnecessary: the 3D pie chart does not convey any more information than a 2D pie chart.

18 Section 3.1 (cont.) Displaying Quantitative Data: Histograms; Stem-and-Leaf Displays

19 Frequency Histograms

20 Relative Frequency Histogram of Exam Grades (x-axis: Grade, 40 to 100; y-axis: Relative frequency, 0 to .30)

21 Histograms
A histogram shows three general types of information:
- It provides a visual indication of where the approximate center of the data is.
- We can gain an understanding of the degree of spread, or variation, in the data.
- We can observe the shape of the distribution.

22 Histograms Showing Different Centers

23 Histograms - Same Center, Different Spread

24 Histograms: Shape
Symmetric distribution: A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.
Skewed distribution: A distribution is skewed to the right if the right side of the histogram (side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.
Complex, multimodal distribution: Not all distributions have a simple overall shape, especially when there are few observations.

25 Shape (cont.): Female heart attack patients in New York state
Age: left-skewed; Cost: right-skewed

26 Shape (cont.): outliers. All 200 m races, 20.2 secs or less

27 Shape (cont.): Outliers
An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.
The overall pattern is fairly symmetrical except for 2 states (Alaska and Florida) that clearly do not belong to the main trend. Alaska and Florida have unusual representation of the elderly in their population.
A large gap in the distribution is typically a sign of an outlier.

28 Excel Example: 2012-13 NFL Salaries

29 Statcrunch Example: 2012-13 NFL Salaries

30 Heights of Students in Recent Stats Class (Bimodal)

31 Example: Grades on a statistics exam
Data: 75 66 77 66 64 73 91 65 59 86 61 86 61 58 70 77 80 58 94 78 62 79 83 54 52 45 82 48 67 55

32 Example-2: Frequency Distribution of Grades
Class Limits    Frequency
40 up to 50     2
50 up to 60     6
60 up to 70     8
70 up to 80     7
80 up to 90     5
90 up to 100    2
Total           30

33 Example-3: Relative Frequency Distribution of Grades
Class Limits    Relative Frequency
40 up to 50     2/30 = .067
50 up to 60     6/30 = .200
60 up to 70     8/30 = .267
70 up to 80     7/30 = .233
80 up to 90     5/30 = .167
90 up to 100    2/30 = .067
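
Both tables above can be reproduced from the 30 grades with a short Python sketch. It assumes each class “a up to b” includes a but not b (the half-open interval [a, b)); a grade of exactly 100 would need the last class to be closed on the right. The function name `frequency_distribution` is hypothetical.

```python
# The 30 exam grades from the "Grades on a statistics exam" slide.
grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86,
          61, 86, 61, 58, 70, 77, 80, 58, 94, 78,
          62, 79, 83, 54, 52, 45, 82, 48, 67, 55]

# Classes "40 up to 50", ..., "90 up to 100", each treated as [lower, upper).
classes = [(lower, lower + 10) for lower in range(40, 100, 10)]


def frequency_distribution(data, bins):
    """Count the observations falling in each half-open class [lower, upper)."""
    return [sum(lower <= x < upper for x in data) for lower, upper in bins]


freq = frequency_distribution(grades, classes)
rel_freq = [f / len(grades) for f in freq]   # relative frequency = count / n

for (lower, upper), f, rf in zip(classes, freq, rel_freq):
    print(f"{lower} up to {upper}: {f:2d}  {rf:.3f}")
print("Total:", sum(freq))
```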

34 Relative Frequency Histogram of Grades (x-axis: Grade, 40 to 100; y-axis: Relative frequency, 0 to .30)

35 Based on the histogram, about what percent of the values are between 47.5 and 52.5?
1. 50%  2. 5%  3. 17%  4. 30%

36 Stem-and-leaf displays have the following general appearance:
stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4

37 Example: employee ages at a small company
18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39
Stem: 10’s digit; leaf: 1’s digit. For example, 18 has stem = 1 and leaf = 8, written 1 | 8.
stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
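
A minimal Python sketch of the same display, assuming whole-number data with the 10’s digit as the stem and the 1’s digit as the leaf. The helper names `stem_and_leaf` and `show` are hypothetical. Empty stems between the smallest and largest stem are kept so that gaps in the data stay visible, which matters when a 95-year-old is added.

```python
from collections import defaultdict

# Employee ages from the slide.
ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]


def stem_and_leaf(data):
    """Map each stem (10's digit) to its sorted leaves (1's digits).

    Stems with no leaves between the smallest and largest stem are kept
    as empty lists so that gaps in the data remain visible.
    """
    leaves = defaultdict(list)
    for x in sorted(data):              # sorting makes each leaf row ascending
        stem, leaf = divmod(x, 10)
        leaves[stem].append(leaf)
    return {s: leaves.get(s, []) for s in range(min(leaves), max(leaves) + 1)}


def show(display):
    for stem, leaf_list in display.items():
        print(f"{stem} | {' '.join(map(str, leaf_list))}")


show(stem_and_leaf(ages))
show(stem_and_leaf(ages + [95]))   # stems 7 and 8 appear empty
```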

38 Suppose a 95 yr. old is hired:
stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
7 |
8 |
9 | 5

39 Number of TD passes by NFL teams: 2012-2013 season (stems are 10’s digit)
stem | leaf
4 | 0 3
3 | 2 4 7
2 | 6 6 7 7 7 8 9
2 | 0 1 2 2 2 2 3 3 4 4 4
1 | 1 3 4 6 7 8 8 9
0 | 8

40 Pulse Rates n = 138

41 Advantages/Disadvantages of Stem-and-Leaf Displays
Advantages:
1) each measurement is displayed
2) measurements are in ascending order in each stem row
3) relatively simple (when the data set is not too large)
Disadvantages: the display becomes unwieldy for large data sets

42 Population of 185 US cities with populations between 100,000 and 500,000 (multiply stems by 100,000)

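43 Back-to-back stem-and-leaf displays. TD passes by NFL teams: 1999-2000 and 2012-13 (multiply stems by 10)
     1999-2000 leaves | stem | 2012-13 leaves
                    2 | 4 | 0 3
                    6 | 3 | 7
                    2 | 3 | 2 4
              6 6 5 5 | 2 | 6 6 7 7 7 8 9
4 3 3 2 2 2 2 1 1 0 0 | 2 | 0 1 2 2 2 2 3 3 4 4 4
  9 9 9 8 8 8 7 6 6 6 | 1 | 6 7 8 8 9
                4 2 1 | 1 | 1 3 4
                      | 0 | 8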

44 Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic (stems are 10’s digits). How many pulses are between 67 and 77?
1. 4  2. 6  3. 8  4. 10  5. 12

45 Other Graphical Methods for Data
- Time plots: plot observations in time order, with time on the horizontal axis and the variable on the vertical axis.
- Time series: measurements taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)
- Heat maps, word walls

46 Unemployment Rate, by Educational Attainment

47 Water Use During Super Bowl XLV (Packers 31, Steelers 25)

48 Heat Maps

49 Word Wall (customer feedback)

50 Section 3.2 Describing the Center of Data: Mean, Median

51 2 characteristics of a data set to measure:
- center: measures where the “middle” of the data is located
- variability (next section): measures how “spread out” the data is

52 Notation for Data Values and Sample Mean

53 Simple Example of Sample Mean
Weekly TV viewing time in hours of 7 randomly selected 4th graders: 19, 40, 16, 12, 10, 6, and 9
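
The sample mean is the sum of the observations divided by n. A minimal Python sketch for the TV viewing data (the variable names are illustrative):

```python
# Weekly TV viewing hours for the 7 fourth graders on this slide.
tv_hours = [19, 40, 16, 12, 10, 6, 9]

n = len(tv_hours)
total = sum(tv_hours)   # y1 + y2 + ... + yn
ybar = total / n        # sample mean
print(f"sum = {total}, n = {n}, mean = {ybar}")
```

The sum is 112 hours, so ȳ = 112/7 = 16 hours.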

54 Population Mean

55 Connection Between Mean and Histogram
A histogram balances when supported at the mean. Mean x̄ = 140.6

56 The median: another measure of center
Given a set of n data values arranged in order of magnitude,
median = middle value, if n is odd
median = mean of the 2 middle values, if n is even
Ex. 2, 4, 6, 8, 10; n = 5; median = 6
Ex. 2, 4, 6, 8; n = 4; median = (4 + 6)/2 = 5
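
The median rule above translates directly into Python. This is a sketch with an illustrative function name; in practice `statistics.median` from the standard library follows the same convention.

```python
def median(values):
    """Middle value if n is odd; mean of the 2 middle values if n is even."""
    data = sorted(values)           # the rule requires the data in order
    n = len(data)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2


print(median([2, 4, 6, 8, 10]))  # n = 5 -> 6
print(median([2, 4, 6, 8]))      # n = 4 -> 5.0
```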

57 Student Pulse Rates (n = 62)
38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68, 70, 70, 70, 70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 75, 75, 75, 75, 76, 77, 77, 77, 77, 78, 78, 79, 79, 80, 80, 80, 84, 84, 85, 85, 87, 90, 90, 91, 92, 93, 94, 94, 95, 96, 96, 96, 98, 98, 103
Median = (75 + 76)/2 = 75.5

58 The median splits the histogram into 2 halves of equal area

59 Mean: balance point. Median: 50% of the area in each half.
Mean 55.26 years, median 57.7 years

60 Medians are used often
- Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)
- Median fan age: MLB 45; NFL 43; NBA 41; NHL 39
- Median existing home sales price: May 2011 $166,500; May 2010 $174,600
- Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

61 Examples
Example, n = 7: 17.5 2.8 3.2 13.9 14.1 25.3 45.8
Ordered: 2.8 3.2 13.9 14.1 17.5 25.3 45.8; m = 14.1
Example, n = 8: 17.5 2.8 3.2 13.9 14.1 25.3 35.7 45.8
Ordered: 2.8 3.2 13.9 14.1 17.5 25.3 35.7 45.8; m = (14.1 + 17.5)/2 = 15.8

62 Below are the annual tuition charges at 7 public universities. What is the median tuition?
4429 4960 4971 5245 5546 7586
1. 5245  2. 4965.5  3. 4960  4. 4971

63 Below are the annual tuition charges at 7 public universities. What is the median tuition?
4429 4960 5245 5546 4971 5587 7586
1. 5245  2. 4965.5  3. 5546  4. 4971

64 Properties of Mean, Median
1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).
2. The mean uses the value of every number in the data set; the median does not.

65 Example: class pulse rates
53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

66 2010, 2014 baseball salaries
2010: n = 845, mean = $3,297,828, median = $1,330,000, max = $33,000,000
2014: n = 848, mean = $3,932,912, median = $1,456,250, max = $28,000,000

67 Disadvantage of the mean: it can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data.

68 Mean, Median, Maximum Baseball Salaries 1985 - 2014

69 Skewness: comparing the mean and median
Skewed to the right (positively skewed): mean > median

70 Skewed to the left (negatively skewed): mean < median
Example: mean = 78; median = 87

71 Symmetric data: mean and median approximately equal

72 Section 3.3 Describing Variability of Data: Standard Deviation; Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

73 Recall: 2 characteristics of a data set to measure:
- center: measures where the “middle” of the data is located
- variability: measures how “spread out” the data is

74 Ways to measure variability
1. Range = largest − smallest. OK sometimes, but in general too crude: it is sensitive to one large or small observation.

75 Example

76 The Sample Standard Deviation, a measure of spread around the mean
Square the deviation of each observation from the mean, find the “average” of these squared deviations (dividing by n − 1), and then take the square root.

77 Calculations (women’s heights, in inches)
Mean = 63.4
Sum of squared deviations from mean = 85.2
(n − 1) = 13; (n − 1) is called the degrees of freedom (df)
s² = variance = 85.2/13 = 6.55 square inches
s = standard deviation = √6.55 = 2.56 inches
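
A minimal Python sketch of the same steps: sum of squared deviations, divide by df = n − 1, take the square root. The 14 individual heights are not listed in this transcript, so the sketch checks only the slide’s arithmetic from its summary values, and then compares the helper (a hypothetical name, `sample_sd`) against the standard library’s `statistics.stdev` on the TV viewing data.

```python
import math
import statistics


def sample_variance(values):
    """s^2 = (sum of squared deviations from the mean) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least 2 observations")
    ybar = sum(values) / n
    ss = sum((y - ybar) ** 2 for y in values)  # sum of squared deviations
    return ss / (n - 1)                        # divide by df = n - 1


def sample_sd(values):
    """s = square root of the sample variance."""
    return math.sqrt(sample_variance(values))


# The slide's arithmetic, from its summary values: SS = 85.2, df = 13.
s2 = 85.2 / 13
print(round(s2, 2), round(math.sqrt(s2), 2))  # 6.55 2.56

# Cross-check the helper against the standard library on the TV viewing data.
tv_hours = [19, 40, 16, 12, 10, 6, 9]
print(sample_sd(tv_hours), statistics.stdev(tv_hours))
```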

78 1. First calculate the variance s².
2. Then take the square root to get the standard deviation s.
Mean ± 1 s.d.
We’ll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.

79 Population Standard Deviation
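
The formula on this slide did not survive in the transcript. For reference, the standard definitions are below, assuming the slides’ notation (y for data values, μ and σ for the population, ȳ and s for a sample of size n) and writing N for the population size, which is an assumed symbol:

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} y_i,
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (y_i - \mu)^2},
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

The population and sample versions differ only in the mean used (μ versus ȳ) and the divisor (N versus n − 1).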

80 Remarks
1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

81 Remarks (cont.)
2. Note that s and σ are always greater than or equal to zero.
3. The larger the value of s (or σ), the greater the spread of the data.
When does s = 0? When does σ = 0? When all data values are the same.

82 Remarks (cont.)
4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).
5. Variance: s² is the sample variance; σ² is the population variance. Units are the squared units of the original data (square $, square gallons??).

83 Review: Properties of s and σ
- s and σ are always greater than or equal to 0. When does s = 0? σ = 0?
- The larger the value of s (or σ), the greater the spread of the data.
- The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

84 Summary of Notation

85 Section 3.3 (cont.) Using the Mean and Standard Deviation Together: 68-95-99.7 rule (also called the Empirical Rule); z-scores

86 68-95-99.7 rule
Mean and standard deviation (numerical) + histogram (graphical) → 68-95-99.7 rule

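87 The 68-95-99.7 rule. If the histogram of the data is approximately bell-shaped, then approximately 68% of the data values lie within 1 standard deviation of the mean, approximately 95% lie within 2 standard deviations, and approximately 99.7% lie within 3 standard deviations.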

88 68-95-99.7 rule: 68% within 1 standard deviation of the mean (between ȳ − s and ȳ + s; 34% on each side of ȳ)

89 68-95-99.7 rule: 95% within 2 standard deviations of the mean (between ȳ − 2s and ȳ + 2s; 47.5% on each side of ȳ)

90 Example: textbook costs
286 291 307 308 315 316 327
328 340 342 346 347 348 348
349 354 355 355 360 361 364
367 369 371 373 377 380 381
382 385 385 387 390 390 397
398 409 409 410 418 422 424
425 426 428 433 434 437 440
480
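
To see the 68-95-99.7 rule at work on these 50 costs, the Python sketch below counts how many values lie within 1, 2, and 3 standard deviations of the mean, using the sample standard deviation from `statistics.stdev`. The variable names are illustrative; the values are copied from the slide.

```python
import statistics

# The 50 textbook costs from the slide.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]

ybar = statistics.mean(costs)
s = statistics.stdev(costs)  # sample standard deviation (divides by n - 1)

# Count the observations inside [ybar - k*s, ybar + k*s] for k = 1, 2, 3.
within = {
    k: sum(ybar - k * s <= y <= ybar + k * s for y in costs)
    for k in (1, 2, 3)
}

print(f"mean = {ybar:.2f}, s = {s:.2f}")
for k, count in within.items():
    print(f"within {k} s.d.: {count}/{len(costs)} = {count / len(costs):.0%}")
```

Here the mean is 375.48 and s is about 42.7; the counts come out to 32, 48, and 50 of 50 (64%, 96%, 100%), reasonably close to 68%, 95%, and 99.7% for a histogram that is only roughly bell-shaped.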

91 Example: textbook costs (cont.)

92 Example: textbook costs (cont.)

93 Example: textbook costs (cont.)

94 The best estimate of the standard deviation of the men’s weights displayed in this dotplot is
1. 10  2. 15  3. 20  4. 40

95 Section 3.3 (cont.) Using the Mean and Standard Deviation Together: 68-95-99.7 rule (also called the Empirical Rule) (preceding slides); z-scores (next)

96 z-scores: Standardized Data Values
A z-score measures the distance of a number from the mean in units of the standard deviation.

97 z-score corresponding to y: z = (y − ȳ)/s

98 Exam 1: ȳ₁ = 88, s₁ = 6; your exam 1 score: 91
Exam 2: ȳ₂ = 88, s₂ = 10; your exam 2 score: 92
Which score is better?

99 Comparing SAT and ACT Scores
- SAT Math: Eleanor’s score 680; SAT mean = 500, sd = 100
- ACT Math: Gerald’s score 27; ACT mean = 18, sd = 6
- Eleanor’s z-score: z = (680 − 500)/100 = 1.8
- Gerald’s z-score: z = (27 − 18)/6 = 1.5
- Eleanor’s score is better.
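
The z-score formula is one line of Python. A sketch (the function name `z_score` is illustrative) applied to the exam and SAT/ACT examples:

```python
def z_score(y, mean, sd):
    """Standardize y: how many standard deviations y lies above (+) or below (-) the mean."""
    return (y - mean) / sd


exam1 = z_score(91, 88, 6)      # your exam 1 score
exam2 = z_score(92, 88, 10)     # your exam 2 score
eleanor = z_score(680, 500, 100)
gerald = z_score(27, 18, 6)

print(f"exam 1: {exam1:.2f}, exam 2: {exam2:.2f}")
print(f"Eleanor (SAT): {eleanor:.2f}, Gerald (ACT): {gerald:.2f}")
```

Exam 1 gives z = 0.5 and exam 2 gives z = 0.4, so the exam 1 score is relatively better even though its raw score is lower.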

100 z-scores add to zero
Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools: 2013 ($ millions)
School      Support  y − ȳ  z-score
Maryland    15.5      6.4    1.79
UVA         13.1      4.0    1.12
Louisville  10.9      1.8    0.50
UNC          9.2      0.1    0.03
VaTech       7.9     -1.2   -0.34
FSU          7.9     -1.2   -0.34
GaTech       7.1     -2.0   -0.56
NCSU         6.5     -2.6   -0.73
Clemson      3.8     -5.3   -1.48
Mean = 9.1000, s = 3.5697
Sum of z-scores = 0 (exactly, before each z-score is rounded to 2 decimals)
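
The table can be recomputed in Python to confirm that the z-scores sum to zero, up to floating-point rounding. This sketch assumes the sample standard deviation (`statistics.stdev`), which reproduces s = 3.5697; the variable names are illustrative.

```python
import statistics

# Support ($ millions, 2013) for the 9 public ACC schools, from the table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}
values = list(support.values())
ybar = statistics.mean(values)
s = statistics.stdev(values)  # sample standard deviation

z = {school: (y - ybar) / s for school, y in support.items()}

print(f"mean = {ybar:.4f}, s = {s:.4f}")
for school, zs in z.items():
    print(f"{school:<10} {zs:+.2f}")
# Deviations from the mean always sum to 0, so the z-scores do too
# (up to tiny floating-point error).
print(f"sum of z-scores = {sum(z.values()):.10f}")
```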

101 Recently the mean tuition at 4-yr public colleges/universities in the U.S. was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC’s z-score?
1. 1.03  2. -1.03  3. 2.39  4. 1865  5. -1865

102 Section 3.4 Measures of Position (also called Measures of Relative Standing): Quartiles; 5-Number Summary; Interquartile Range: Another Measure of Spread; Boxplots

103 Quartiles: Measuring spread by examining the middle
m = median = 3.4; Q1 = first quartile = 2.3; Q3 = third quartile = 4.2
The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).
The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

104 Quartiles and median divide data into 4 pieces: Q1, M, and Q3 split the ordered data into 4 parts, each containing 1/4 of the data.

105 Quartiles are common measures of spread
- http://oirp.ncsu.edu/ir/admit
- http://oirp.ncsu.edu/univ/peer
- University of Southern California
- Economic Value of College Majors

106 Rules for Calculating Quartiles
Step 1: Find the median of all the data (the median divides the data in half).
Step 2a: Find the median of the lower half; this median is Q1.
Step 2b: Find the median of the upper half; this median is Q3.
Important: when n is odd, include the overall median in both halves; when n is even, do not include the overall median in either half.
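
The rules above can be sketched in Python as follows. This follows these slides’ convention (include the overall median in both halves when n is odd); other textbooks and software packages use different quartile conventions, so their results may differ slightly. The function names are hypothetical.

```python
def median_sorted(data):
    """Median of an already-sorted list."""
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the slide's rules.

    n odd:  the overall median is included in both halves.
    n even: the lower and upper halves do not share any value.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2      # n even: n/2; n odd: (n+1)/2, so both halves hold the median
    lower = data[:half]      # smallest `half` values
    upper = data[n - half:]  # largest `half` values
    return median_sorted(lower), median_sorted(data), median_sorted(upper)


print(quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20]))  # (6, 11.0, 16)
```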

107 Example
2 4 6 8 10 12 14 16 18 20; n = 10
Median: m = (10 + 12)/2 = 22/2 = 11
Q1: median of lower half 2 4 6 8 10; Q1 = 6
Q3: median of upper half 12 14 16 18 20; Q3 = 16

108 Pulse Rates, n = 138
Median: mean of the pulses in ordered positions 69 and 70: median = (70 + 70)/2 = 70
Q1: median of the lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63
Q3: median of the upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

109 Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?
Depth  Stem | Leaf   (22 | 5 = 225)
2      22 | 5 5
4      23 | 5 7
6      24 | 2 6
7      25 | 7
10     26 | 2 5 7
12     27 | 5 9
(4)    28 | 1 5 6 7
15     29 | 3 5 5 9 9
10     30 | 3 3 3
7      31 | 4 5
5      32 | 1 5 5
2      33 | 6
1      34 | 0
1. 287  2. 257.5  3. 263.5  4. 262.5

110 Interquartile range, another measure of spread
- lower quartile: Q1
- middle quartile: median
- upper quartile: Q3
- interquartile range (IQR): IQR = Q3 − Q1 measures the spread of the middle 50% of the data

111 Example: beginning pulse rates
Q3 = 78; Q1 = 63
IQR = 78 − 63 = 15

112 Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?
Depth  Stem | Leaf   (22 | 5 = 225)
2      22 | 5 5
4      23 | 5 7
6      24 | 2 6
7      25 | 7
10     26 | 2 5 7
12     27 | 5 9
(4)    28 | 1 5 6 7
15     29 | 3 5 5 9 9
10     30 | 3 3 3
7      31 | 4 5
5      32 | 1 5 5
2      33 | 6
1      34 | 0
1. 23.5  2. 39.5  3. 46  4. 69.5

113 5-number summary of data
Minimum, Q1, median, Q3, maximum
Example: pulse data: 45 63 70 78 111

114 Boxplot: display of 5-number summary
Smallest = min = 0.6; Q1 = first quartile = 2.3; m = median = 3.4; Q3 = third quartile = 4.2; largest = max = 6.1
Five-number summary: min Q1 m Q3 max

115 Boxplot: display of 5-number summary
Example: age of 66 “crush” victims at rock concerts 2001-2010. 5-number summary: 13 17 19 22 47

116 Boxplot: display of 5-number summary
Q1 = first quartile = 2.3; Q3 = third quartile = 4.2; largest = max = 7.9
Interquartile range: Q3 − Q1 = 4.2 − 2.3 = 1.9
1.5 × IQR = 1.5 × 1.9 = 2.85
Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05
Individual #25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the largest number in the data that is less than 7.05.

117 ATM Withdrawals by Day, Month, Holidays

118

119 Beginning-of-class pulses (n = 138)
Q1 = 63, Q3 = 78
IQR = 78 − 63 = 15
1.5(IQR) = 1.5(15) = 22.5
Q1 − 1.5(IQR): 63 − 22.5 = 40.5
Q3 + 1.5(IQR): 78 + 22.5 = 100.5
(Boxplot labels: median 70, Q1 63, Q3 78, fences 40.5 and 100.5, minimum 45)
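
The 1.5 × IQR rule used here can be written as a small Python sketch (function names hypothetical; `k = 1.5` as on these slides):

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, q1, q3, k=1.5):
    """Return the values falling outside the fences."""
    low, high = outlier_fences(q1, q3, k)
    return [y for y in values if y < low or y > high]


print(outlier_fences(63, 78))            # (40.5, 100.5): beginning-of-class pulses
print(flag_outliers([45, 111], 63, 78))  # [111]: the minimum 45 is inside, the maximum 111 is not
```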

120 Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?
(Box plot: Pass Catching Yards by Receivers; axis from 0 to 1369 yards)
1. 450  2. 750  3. 215  4. 545

121 Rock concert deaths: histogram and boxplot

122 Automating Boxplot Construction
- Excel “out of the box” does not draw boxplots.
- Many add-ins available on the internet give Excel the capability to draw boxplots.
- StatCrunch (http://statcrunch.stat.ncsu.edu) draws boxplots.

123 Tuition 4-yr Colleges

124 Section 3.5 Bivariate Descriptive Statistics: Contingency Tables for Bivariate Categorical Data; Scatterplots and Correlation for Bivariate Quantitative Data

125 Basic Terminology
- Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.
- Bivariate data: 2 variables are measured on each sample unit or population unit, e.g. height and GPA of each student in a sample. (Caution: data from 2 separate samples are not bivariate data.)

126 Contingency Tables for Bivariate Categorical Data
Example: Survival and class on the Titanic
Marginal distributions
Marginal distribution of survival: survived 710/2201 = 32.3%; did not survive 1491/2201 = 67.7%
Marginal distribution of class: crew 885/2201 = 40.2%; first 325/2201 = 14.8%; second 285/2201 = 12.9%; third 706/2201 = 32.1%

127 Marginal distribution of class. Bar chart.

128 Marginal distribution of class: Pie chart

129 Contingency Tables for Bivariate Categorical Data (cont.)
Conditional distributions: given the class of a passenger, what is the chance the passenger survived?

130 Conditional distributions: segmented bar chart

131 Contingency Tables for Bivariate Categorical Data (cont.)
Questions:
- What fraction of survivors were in first class? 202/710
- What fraction of passengers were in first class and survivors? 202/2201
- What fraction of the first class passengers survived? 202/325
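
The three questions use the same numerator (202 first-class survivors) and differ only in the denominator. A Python sketch using the counts from these slides (variable names are illustrative):

```python
# Counts from the Titanic slides (2201 people aboard).
total = 2201
survivors = 710
first_class = 325
first_class_survivors = 202

survivors_in_first = first_class_survivors / survivors    # denominator: survivors
first_and_survived = first_class_survivors / total        # denominator: everyone
first_who_survived = first_class_survivors / first_class  # denominator: first class

for label, p in [
    ("Survivors who were in first class", survivors_in_first),
    ("In first class AND survived", first_and_survived),
    ("First-class passengers who survived", first_who_survived),
]:
    print(f"{label}: {p:.1%}")
```

The answers are about 28.5%, 9.2%, and 62.2%.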

132 TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?
1. 8.0%  2. 23.5%  3. 58.2%  4. 27.7%

133 TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?
1. 41.8%  2. 38.8%  3. 51.2%  4. 19.8%

134 TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?
1. 45.2%  2. 48.8%  3. 26.8%  4. 27.7%

135 Section 3.5 Bivariate Descriptive Statistics: Contingency Tables for Bivariate Categorical Data (previous slides); Scatterplots and Correlation for Bivariate Quantitative Data (next)

136 Student  Beers  Blood Alcohol
1        5      0.1
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.1
14       7      0.09
15       1      0.01
16       4      0.05
Here, we have two quantitative variables for each of 16 students: 1) how many beers they drank, and 2) their blood alcohol level (BAC). We are interested in the relationship between the two variables: how is one affected by changes in the other?
Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

137 Scatterplot: Blood Alcohol Content vs Number of Beers (same 16 students as the previous slide)
In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

138 Scatterplot: Fuel Consumption vs Car Weight. x = car weight, y = fuel consumption
(xᵢ, yᵢ): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

139 The correlation coefficient “r”
The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.
Correlation can only be used to describe quantitative variables. Categorical variables don’t have means and standard deviations.

140 Correlation: Fuel Consumption vs Car Weight, r = .9766
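
A minimal Python sketch of the correlation coefficient, computed from sums of products of deviations from the means (equivalently, the sum of products of z-scores divided by n − 1). Applied to the 10 cars listed on slide 138 it gives r ≈ .9766. The function name `pearson_r` is illustrative; Python 3.10+ also provides `statistics.correlation`.

```python
import math


def pearson_r(xs, ys):
    """Correlation r = Sxy / sqrt(Sxx * Syy), using deviations from the means."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Car weight (x) and fuel consumption (y) from the scatterplot slide.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]
print(f"r = {pearson_r(weight, fuel):.4f}")
```

The same function applied to the fouls/points data on slide 144 gives r ≈ .935, and to the beers/BAC data about .89; a large r says nothing about cause and effect.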

141 Properties
- r ranges from −1 to +1.
- r quantifies the strength and direction of a linear relationship between 2 quantitative variables.
- Strength: how closely the points follow a straight line.
- Direction: r is positive when individuals with higher x values tend to have higher values of y, and negative when they tend to have lower values of y.

142 Properties (cont.): High correlation does not imply cause and effect
CARROTS: Hidden terror in the produce department at your neighborhood grocery
- Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin!!!
- Everyone who ate carrots in 1865 is now dead!!!
- 45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest!!!

143 Properties: Cause and Effect
- There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)
- Improper training? Will no firemen present result in the least amount of damage?

144 Properties: Cause and Effect
- r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.
x = fouls committed by player; y = points scored by same player
(1,2) (24,75) (1,0) (18,59) (9,9) (3,7) (5,35) (20,46) (1,0) (3,2) (22,57)
Correlation r = .935
The correlation is due to a third “lurking” variable: playing time.

145 End of Chapter 3

