Published by Bennett Payne. Modified over 9 years ago.
2
Chapter 3 Descriptive Statistics: Graphical and Numerical Summaries of Data
UNIT OBJECTIVES
At the conclusion of this unit you should be able to:
1) Construct graphs that appropriately describe data.
2) Calculate and interpret numerical summaries of a data set.
3) Combine numerical methods with graphical methods to analyze a data set.
4) Apply graphical methods of summarizing data to choose appropriate numerical summaries.
5) Apply software and/or calculators to automate graphical and numerical summary procedures.
3
Section 3.1 Displaying Categorical Data “Sometimes you can see a lot just by looking.” Yogi Berra Hall of Fame Catcher, NY Yankees
4
The three rules of data analysis won’t be difficult to remember
1. Make a picture: it reveals aspects not obvious in the raw data and enables you to think clearly about the patterns and relationships that may be hiding in your data.
2. Make a picture: it shows important features of and patterns in the data. You may also see things that you did not expect: extraordinary (possibly wrong) data values or unexpected patterns.
3. Make a picture: the best way to tell others about your data is with a well-chosen picture.
5
Bar Charts: show counts or relative frequency for each category
- Example: Titanic passenger/crew distribution
6
Pie Charts: show proportions of the whole in each category
- Example: Titanic passenger/crew distribution
7
Example: Top 10 causes of death in the United States

Rank | Cause of death | Count | % of top 10 | % of total deaths
1 | Heart disease | 700,142 | 37% | 28%
2 | Cancer | 553,768 | 29% | 22%
3 | Cerebrovascular | 163,538 | 9% | 6%
4 | Chronic respiratory | 123,013 | 6% | 5%
5 | Accidents | 101,537 | 5% | 4%
6 | Diabetes mellitus | 71,372 | 4% | 3%
7 | Flu and pneumonia | 62,034 | 3% | 2%
8 | Alzheimer’s disease | 53,852 | 3% | 2%
9 | Kidney disorders | 39,480 | 2% |
10 | Septicemia | 32,238 | 2% | 1%
 | All other causes | 629,967 | | 25%

For each individual who died in the United States, the cause of death was recorded. The table above is a summary of that information.
8
Top 10 causes of death: bar graph
Each category is represented by one bar. The bar’s height shows the count (or sometimes the percentage) for that particular category. For example, the number of individuals who died of an accident is approximately 100,000.
(Figure: Top 10 causes of deaths in the United States)
9
Top 10 causes of deaths in the United States
- Bar graph sorted by rank: easy to analyze
- Bar graph sorted alphabetically: much less useful
10
1. United States $158 2. China $64.4 3. Japan $54 4. Germany $24.4 5. Britain $23.5 6. France $19.3 7. Brazil $14.2 8. Italy $13.1 9. Australia $12.8 10. India $11.9 1. United States $137.9 2. Japan $23.4 3. Germany $20 4. Britain $16.8 5. France $12.6 6. Canada $7.3 7. Italy $6.3 8. China $5.4 9. Netherlands $5.4 10. Australia $4.8 Recent Annual Software Sales ($billions) Recent Annual Computer Hardware Sales ($billion) NY Times
11
Top 10 causes of death: pie chart
Each slice represents a piece of one whole. The size of a slice depends on what percent of the whole this category represents.
(Figure: Percent of people dying from top 10 causes of death in the United States)
12
Make sure your labels match the data. Make sure all percents add up to 100.
(Figures: Percent of deaths from top 10 causes; Percent of deaths from all causes)
13
Internships: basic bar chart; side-by-side bar chart
14
Trend, Student Debt by State (grads of public, 4 yr or more) National Average: 2009-10: $21,604 2012-13: $25,043
15
Student Debt North Carolina Schools
17
Unnecessary dimension in a pie chart
The 3rd dimension is unnecessary; the 3D pie chart does not convey any more information than a 2D pie chart.
18
Section 3.1 continued
Displaying Quantitative Data
- Histograms
- Stem and Leaf Displays
19
Frequency Histograms
20
Relative Frequency Histogram of Exam Grades
(Figure: x-axis Grade, 40 to 100; y-axis Relative frequency, 0 to .30)
21
Histograms
A histogram shows three general types of information:
- It provides a visual indication of where the approximate center of the data is.
- We can gain an understanding of the degree of spread, or variation, in the data.
- We can observe the shape of the distribution.
22
Histograms Showing Different Centers
23
Histograms - Same Center, Different Spread
24
Histograms: Shape
- Symmetric distribution: a distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.
- Skewed distribution: a distribution is skewed to the right if the right side of the histogram (side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.
- Complex, multimodal distribution: not all distributions have a simple overall shape, especially when there are few observations.
25
Shape (cont.): Female heart attack patients in New York state
- Age: left-skewed
- Cost: right-skewed
26
Shape (cont.): outliers All 200 m Races, 20.2 secs or less
27
Shape (cont.): Outliers
An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.
In the histogram shown, the overall pattern is fairly symmetrical except for 2 states, Alaska and Florida, that clearly do not belong to the main trend. Alaska and Florida have unusual representation of the elderly in their population.
A large gap in the distribution is typically a sign of an outlier.
28
Excel Example: 2012-13 NFL Salaries
29
Statcrunch Example: 2012-13 NFL Salaries
30
Heights of Students in Recent Stats Class (Bimodal)
31
Example: Grades on a statistics exam
Data: 75 66 77 66 64 73 91 65 59 86 61 86 61 58 70 77 80 58 94 78 62 79 83 54 52 45 82 48 67 55
32
Example-2: Frequency Distribution of Grades

Class Limits | Frequency
40 up to 50 | 2
50 up to 60 | 6
60 up to 70 | 8
70 up to 80 | 7
80 up to 90 | 5
90 up to 100 | 2
Total | 30
33
Example-3: Relative Frequency Distribution of Grades

Class Limits | Relative Frequency
40 up to 50 | 2/30 = .067
50 up to 60 | 6/30 = .200
60 up to 70 | 8/30 = .267
70 up to 80 | 7/30 = .233
80 up to 90 | 5/30 = .167
90 up to 100 | 2/30 = .067
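The frequency and relative frequency tables above can be reproduced with a few lines of code. The following is a minimal sketch; the function name `frequency_table` and its parameters are illustrative choices, not part of the original slides.

```python
# Grades from the statistics exam example in the slides.
grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61, 58, 70,
          77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45, 82, 48, 67, 55]

def frequency_table(values, start=40, stop=100, width=10):
    """Count values in the classes 'start up to start+width', ..., ending at stop.

    Each class includes its lower limit and excludes its upper limit.
    (Hypothetical helper written for this illustration.)
    """
    lowers = list(range(start, stop, width))
    counts = [0] * len(lowers)
    for v in values:
        if start <= v < stop:
            counts[(v - start) // width] += 1
    return lowers, counts

lowers, counts = frequency_table(grades)
n = len(grades)
rel_freqs = [c / n for c in counts]

for lo, c, rf in zip(lowers, counts, rel_freqs):
    print(f"{lo} up to {lo + 10}: frequency {c:2d}, relative frequency {rf:.3f}")
```

The relative frequencies are just the counts divided by n = 30, so they sum to 1.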
34
Relative Frequency Histogram of Grades
(Figure: x-axis Grade, 40 to 100; y-axis Relative frequency, 0 to .30)
35
Based on the histogram, about what percent of the values are between 47.5 and 52.5?
1. 50%
2. 5%
3. 17%
4. 30%
36
Stem and leaf displays
- Have the following general appearance

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
37
Example: employee ages at a small company
18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39
stem: 10’s digit; leaf: 1’s digit
- 18: stem = 1; leaf = 8; 18 = 1 | 8

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
38
Suppose a 95 yr. old is hired

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
7 |
8 |
9 | 5
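A stem-and-leaf display like the ones above can be built by splitting each value into its tens digit (stem) and ones digit (leaf). This is a minimal sketch for two-digit data; `stem_and_leaf` and `print_display` are hypothetical helper names chosen for this illustration.

```python
from collections import defaultdict

# Employee ages from the example in the slides.
ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]

def stem_and_leaf(values):
    """Map each stem (10's digit) to its leaves (1's digits) in ascending order."""
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(v % 10)
    return dict(rows)

def print_display(rows):
    """Print every stem from smallest to largest, including empty stems."""
    for stem in range(min(rows), max(rows) + 1):
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        print(f"{stem} | {leaves}")

display = stem_and_leaf(ages)
print_display(display)

# Adding the 95-year-old leaves stems 7 and 8 empty, which makes the gap visible.
display_with_95 = stem_and_leaf(ages + [95])
print_display(display_with_95)
```

Printing the empty stems is a deliberate choice here: it keeps the vertical scale honest, so an unusual value such as 95 stands apart from the rest of the data.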
39
Number of TD passes by NFL teams: 2012-2013 season (stems are 10’s digit)

stem | leaf
4 | 0 3
3 | 2 4 7
2 | 6 6 7 7 7 8 9
2 | 0 1 2 2 2 2 3 3 4 4 4
1 | 1 3 4 6 7 8 8 9
0 | 8
40
Pulse Rates n = 138
41
Advantages/Disadvantages of Stem-and-Leaf Displays
- Advantages
  1) each measurement is displayed
  2) ascending order in each stem row
  3) relatively simple (when the data set is not too large)
- Disadvantages
  the display becomes unwieldy for large data sets
42
Population of 185 US cities with between 100,000 and 500,000
- Multiply stems by 100,000
43
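Back-to-back stem-and-leaf displays.
TD passes by NFL teams: 1999-2000, 2012-13 (multiply stems by 10)

1999-2000 | stem | 2012-13
2 | 4 | 0 3
6 | 3 | 7
2 | 3 | 2 4
6 6 5 5 | 2 | 6 6 7 7 7 8 9
4 3 3 2 2 2 2 1 1 0 0 | 2 | 0 1 2 2 2 2 3 3 4 4 4
9 9 9 8 8 8 7 6 6 6 | 1 | 6 7 8 8 9
4 2 1 | 1 | 1 3 4
 | 0 | 8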
44
Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77? (Stems are 10’s digits.)
1. 4
2. 6
3. 8
4. 10
5. 12
45
Other Graphical Methods for Data
- Time plots: plot observations in time order; time on the horizontal axis, variable on the vertical axis
  ** Time series: measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)
- Heat maps, word walls
46
Unemployment Rate, by Educational Attainment
47
Water Use During Super Bowl XLV (Packers 31, Steelers 25)
48
Heat Maps
49
Word Wall (customer feedback)
50
Section 3.2 Describing the Center of Data
- Mean
- Median
51
2 characteristics of a data set to measure
- center: measures where the “middle” of the data is located
- variability (next section): measures how “spread out” the data is
52
Notation for Data Values and Sample Mean
53
Simple Example of Sample Mean
- Weekly TV viewing time in hours of 7 randomly selected 4th graders: 19, 40, 16, 12, 10, 6, and 9
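The sample mean for this example is the sum of the 7 observations divided by 7, which works out to 112/7 = 16 hours. A minimal sketch (the variable names are illustrative):

```python
# Weekly TV viewing time (hours) for the 7 randomly selected 4th graders.
tv_hours = [19, 40, 16, 12, 10, 6, 9]

# Sample mean: add up the observations and divide by the number of observations.
sample_mean = sum(tv_hours) / len(tv_hours)
print(sample_mean)
```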
54
Population Mean
55
Connection Between Mean and Histogram
- A histogram balances when supported at the mean.
(Figure: histogram balancing at the mean, 140.6)
56
The median: another measure of center
Given a set of n data values arranged in order of magnitude:
- n odd: median = middle value
- n even: median = mean of the 2 middle values
- Ex. 2, 4, 6, 8, 10; n = 5; median = 6
- Ex. 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5
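The odd/even rule above translates directly into code. This is a sketch of that rule, not a reference implementation; Python's standard library offers `statistics.median`, which follows the same definition.

```python
def median(values):
    """Median by the rule above: sort, then take the middle value (n odd)
    or the mean of the 2 middle values (n even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([2, 4, 6, 8, 10]))  # n = 5, middle value
print(median([2, 4, 6, 8]))      # n = 4, mean of 4 and 6
```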
57
Student Pulse Rates (n=62)
38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68, 70, 70, 70, 70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 75, 75, 75, 75, 76, 77, 77, 77, 77, 78, 78, 79, 79, 80, 80, 80, 84, 84, 85, 85, 87, 90, 90, 91, 92, 93, 94, 94, 95, 96, 96, 96, 98, 98, 103
Median = mean of the 31st and 32nd ordered values = (75+76)/2 = 75.5
58
The median splits the histogram into 2 halves of equal area
59
Mean: balance point. Median: 50% of the area in each half.
(Figure: mean 55.26 years, median 57.7 years)
60
Medians are used often
- Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)
- Median fan age: MLB 45; NFL 43; NBA 41; NHL 39
- Median existing home sales price: May 2011 $166,500; May 2010 $174,600
- Median household income (2008 dollars): 2009 $50,221; 2008 $52,029
61
Examples
- Example: n = 7: 17.5 2.8 3.2 13.9 14.1 25.3 45.8
  Ordered: 2.8 3.2 13.9 14.1 17.5 25.3 45.8; m = 14.1
- Example: n = 8: 17.5 2.8 3.2 13.9 14.1 25.3 35.7 45.8
  Ordered: 2.8 3.2 13.9 14.1 17.5 25.3 35.7 45.8; m = (14.1+17.5)/2 = 15.8
62
Below are the annual tuition charges at 7 public universities. What is the median tuition?
4429 4960 4971 5245 5546 5587 7586
1. 5245
2. 4965.5
3. 4960
4. 4971
63
Below are the annual tuition charges at 7 public universities. What is the median tuition?
4429 4960 5245 5546 4971 5587 7586
1. 5245
2. 4965.5
3. 5546
4. 4971
64
Properties of Mean, Median
1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).
2. The mean uses the value of every number in the data set; the median does not.
65
Example: class pulse rates
- 53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140
66
2010, 2014 baseball salaries
- 2010: n = 845; mean = $3,297,828; median = $1,330,000; max = $33,000,000
- 2014: n = 848; mean = $3,932,912; median = $1,456,250; max = $28,000,000
67
Disadvantage of the mean
- It can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data.
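To see this sensitivity concretely, one can recompute the mean and median of the class pulse rates from the earlier example with and without the largest value, 140. This is an illustrative sketch; treating 140 as the unusual observation is an assumption made for the demonstration, not something the slides state.

```python
from statistics import mean, median

# Class pulse rates from the earlier example (n = 23).
pulses = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
          90, 90, 90, 90, 91, 96, 98, 103, 140]

# Assumption for illustration: drop the single largest value, 140.
without_140 = [p for p in pulses if p != 140]

print("mean   with / without 140:", round(mean(pulses), 2), round(mean(without_140), 2))
print("median with / without 140:", median(pulses), median(without_140))
```

Removing one observation moves the mean by a couple of beats per minute, while the median moves by only half a beat, which is why the median is described as resistant to extreme values.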
68
Mean, Median, Maximum Baseball Salaries 1985 - 2014
69
Skewness: comparing the mean and median
- Skewed to the right (positively skewed)
- mean > median
70
Skewed to the left (negatively skewed)
- mean < median
- Example: mean = 78; median = 87
71
Symmetric data
- mean and median approximately equal
72
Section 3.3 Describing Variability of Data
- Standard Deviation
- Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)
73
Recall: 2 characteristics of a data set to measure
- center: measures where the “middle” of the data is located
- variability: measures how “spread out” the data is
74
Ways to measure variability
1. range = largest − smallest: OK sometimes; in general, too crude; sensitive to one large or small observation.
75
Example
76
The Sample Standard Deviation, a measure of spread around the mean
- Square the deviation of each observation from the mean; find the square root of the “average” of these squared deviations.
77
Calculations … Women height (inches)
Mean = 63.4
Sum of squared deviations from mean = 85.2
(n − 1) = 13; (n − 1) is called the degrees of freedom (df)
s² = variance = 85.2/13 = 6.55 square inches
s = standard deviation = √6.55 = 2.56 inches
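The calculation above follows the recipe for the sample standard deviation: sum the squared deviations from the mean, divide by n − 1, then take the square root. The sketch below writes that recipe as two small functions (the names `sample_variance` and `sample_sd` are illustrative) and then reproduces the slide's arithmetic from its summary numbers, since the individual heights are not listed in the text.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1 (degrees of freedom)."""
    n = len(values)
    ybar = sum(values) / n
    return sum((y - ybar) ** 2 for y in values) / (n - 1)

def sample_sd(values):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(values))

# Reproduce the women's height calculation from the summary numbers on the slide.
sum_squared_deviations = 85.2
degrees_of_freedom = 13            # n - 1
variance = sum_squared_deviations / degrees_of_freedom
sd = math.sqrt(variance)
print(round(variance, 2), round(sd, 2))
```

In practice the standard library's `statistics.variance` and `statistics.stdev` compute the same quantities directly from the data.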
78
1. First calculate the variance s².
2. Then take the square root to get the standard deviation s.
We’ll never calculate these by hand, so make sure to know how to get the standard deviation using your calculator, Excel, or other software.
(Figure: Mean ± 1 s.d.)
79
Population Standard Deviation
80
Remarks
1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.
81
Remarks (cont.)
2. Note that s and σ are always greater than or equal to zero.
3. The larger the value of s (or σ), the greater the spread of the data.
When does s = 0? When does σ = 0? When all data values are the same.
82
Remarks (cont.)
4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).
5. Variance: s² is the sample variance; σ² is the population variance. Units are squared units of the original data (square $, square gallons ??).
83
Review: Properties of s and σ
- s and σ are always greater than or equal to 0. When does s = 0? When does σ = 0?
- The larger the value of s (or σ), the greater the spread of the data.
- The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.
84
Summary of Notation
85
Section 3.3 (cont.) Using the Mean and Standard Deviation Together
- 68-95-99.7 rule (also called the Empirical Rule)
- z-scores
86
68-95-99.7 rule
The rule connects the mean and standard deviation (numerical) with the histogram (graphical).
87
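The 68-95-99.7 rule. If the histogram of the data is approximately bell-shaped, then about 68% of the data lie within 1 standard deviation of the mean, about 95% lie within 2 standard deviations, and about 99.7% lie within 3 standard deviations.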
88
68-95-99.7 rule: 68% within 1 stan. dev. of the mean
(Figure: bell-shaped curve with 68% of the area between y − s and y + s, 34% on each side of the mean y)
89
68-95-99.7 rule: 95% within 2 stan. dev. of the mean
(Figure: bell-shaped curve with 95% of the area between y − 2s and y + 2s, 47.5% on each side of the mean y)
90
Example: textbook costs
286 291 307 308 315 316 327
328 340 342 346 347 348 348
349 354 355 355 360 361 364
367 369 371 373 377 380 381
382 385 385 387 390 390 397
398 409 409 410 418 422 424
425 426 428 433 434 437 440
480
91
Example: textbook costs (cont.)
286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480
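The 68-95-99.7 rule can be checked against the 50 textbook costs by computing the mean and sample standard deviation and then counting how many costs fall within 1, 2, and 3 standard deviations of the mean. This is a sketch of that check; `fraction_within` is a hypothetical helper, and how closely the observed fractions match 68%, 95%, and 99.7% depends on how bell-shaped the data are.

```python
from statistics import mean, stdev

# The 50 textbook costs listed in the slides.
costs = [286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
         346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
         364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
         385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
         422, 424, 425, 426, 428, 433, 434, 437, 440, 480]

ybar = mean(costs)   # sample mean
s = stdev(costs)     # sample standard deviation (divides by n - 1)

def fraction_within(values, center, spread, k):
    """Fraction of values in the interval [center - k*spread, center + k*spread]."""
    inside = [v for v in values if center - k * spread <= v <= center + k * spread]
    return len(inside) / len(values)

print(f"mean = {ybar:.2f}, s = {s:.2f}")
for k in (1, 2, 3):
    share = fraction_within(costs, ybar, s, k)
    print(f"within {k} s.d.: {share:.0%}")
```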
94
The best estimate of the standard deviation of the men’s weights displayed in this dotplot is
1. 10
2. 15
3. 20
4. 40
95
Section 3.3 (cont.) Using the Mean and Standard Deviation Together
- 68-95-99.7 rule (also called the Empirical Rule): preceding slides
- z-scores: next
96
Z-scores: Standardized Data Values
A z-score measures the distance of a data value from the mean in units of the standard deviation.
97
z-score corresponding to y: z = (y − ȳ)/s
98
- Exam 1: mean ȳ1 = 88, s1 = 6; your exam 1 score: 91
- Exam 2: mean ȳ2 = 88, s2 = 10; your exam 2 score: 92
Which score is better?
99
Comparing SAT and ACT Scores
- SAT Math: Eleanor’s score 680; SAT mean = 500, sd = 100
- ACT Math: Gerald’s score 27; ACT mean = 18, sd = 6
- Eleanor’s z-score: z = (680 − 500)/100 = 1.8
- Gerald’s z-score: z = (27 − 18)/6 = 1.5
- Eleanor’s score is better.
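Both comparisons above come down to the same computation: subtract the mean and divide by the standard deviation. A minimal sketch (the function name `z_score` is an illustrative choice):

```python
def z_score(y, mean, sd):
    """Distance of y from the mean, measured in standard deviations."""
    return (y - mean) / sd

# SAT vs ACT comparison from the slide.
eleanor = z_score(680, mean=500, sd=100)
gerald = z_score(27, mean=18, sd=6)
print("Eleanor:", eleanor, "Gerald:", gerald)

# The two-exam question: same mean, different spreads.
exam1 = z_score(91, mean=88, sd=6)
exam2 = z_score(92, mean=88, sd=10)
print("Exam 1:", exam1, "Exam 2:", exam2)
```

Because z-scores are unit-free, they allow scores from different scales (SAT points, ACT points, two exams with different spreads) to be compared directly; the larger z-score is the better relative standing.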
100
Z-scores add to zero
Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools: 2013 ($ millions)

School | Support | y − ybar | Z-score
Maryland | 15.5 | 6.4 | 1.79
UVA | 13.1 | 4.0 | 1.12
Louisville | 10.9 | 1.8 | 0.50
UNC | 9.2 | 0.1 | 0.03
VaTech | 7.9 | -1.2 | -0.34
FSU | 7.9 | -1.2 | -0.34
GaTech | 7.1 | -2.0 | -0.56
NCSU | 6.5 | -2.6 | -0.73
Clemson | 3.8 | -5.3 | -1.47

Mean = 9.1000, s = 3.5697; sum of z-scores = 0
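The claim that z-scores add to zero follows from the fact that deviations from the mean always sum to zero. The sketch below recomputes the mean, the sample standard deviation, and the z-scores from the support figures in the table; values recomputed this way may differ from the slide's rounded entries in the last digit.

```python
from statistics import mean, stdev

# Support ($ millions) for the 9 public ACC schools, from the table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = mean(values)
s = stdev(values)  # sample standard deviation (divides by n - 1)

# z-score for each school: deviation from the mean in units of s.
z_scores = {school: (y - ybar) / s for school, y in support.items()}
total = sum(z_scores.values())

print(f"mean = {ybar:.4f}, s = {s:.4f}")
print(f"sum of z-scores = {total:.10f}")
```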
101
Recently the mean tuition at 4-yr public colleges/universities in the U.S. was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC’s z-score?
1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865
102
Section 3.4 Measures of Position (also called Measures of Relative Standing)
- Quartiles
- 5-Number Summary
- Interquartile Range: Another Measure of Spread
- Boxplots
103
Quartiles: Measuring spread by examining the middle
The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).
The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).
Example: m = median = 3.4; Q1 = first quartile = 2.3; Q3 = third quartile = 4.2
104
Quartiles and median divide data into 4 pieces
(Figure: Q1, M, and Q3 mark off four pieces, each containing 1/4 of the data)
105
Quartiles are common measures of spread
- http://oirp.ncsu.edu/ir/admit
- http://oirp.ncsu.edu/univ/peer
- University of Southern California
- Economic Value of College Majors
106
Rules for Calculating Quartiles
Step 1: find the median of all the data (the median divides the data in half).
Step 2a: find the median of the lower half; this median is Q1.
Step 2b: find the median of the upper half; this median is Q3.
Important: when n is odd, include the overall median in both halves; when n is even, do not include the overall median in either half.
107
Example
- Data: 2 4 6 8 10 12 14 16 18 20; n = 10
- Median: m = (10+12)/2 = 22/2 = 11
- Q1: median of lower half 2 4 6 8 10; Q1 = 6
- Q3: median of upper half 12 14 16 18 20; Q3 = 16
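The quartile rules above can be sketched in code as follows. Note that this follows the convention stated in these slides (include the overall median in both halves when n is odd); statistical software often uses other quartile conventions, so results from a calculator or library may differ slightly. The function names are illustrative.

```python
def median_of_sorted(ordered):
    """Median of an already-sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, median, Q3) using the rule from the slides:
    n odd  -> the overall median is included in both halves;
    n even -> the data split cleanly into two halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    if n % 2 == 1:
        lower = ordered[:half + 1]   # includes the middle value
        upper = ordered[half:]       # includes the middle value
    else:
        lower = ordered[:half]
        upper = ordered[half:]
    return median_of_sorted(lower), median_of_sorted(ordered), median_of_sorted(upper)

q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)
```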
108
Pulse Rates, n = 138
- Median: mean of pulses in locations 69 & 70; median = (70+70)/2 = 70
- Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63
- Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78
109
Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

# | stem | leaf
2 | 22 | 5 5
4 | 23 | 5 7
6 | 24 | 2 6
7 | 25 | 7
10 | 26 | 2 5 7
12 | 27 | 5 9
(4) | 28 | 1 5 6 7
15 | 29 | 3 5 5 9 9
10 | 30 | 3 3 3
7 | 31 | 4 5
5 | 32 | 1 5 5
2 | 33 | 6
1 | 34 | 0

1. 287
2. 257.5
3. 263.5
4. 262.5
110
Interquartile range, another measure of spread
- lower quartile: Q1
- middle quartile: median
- upper quartile: Q3
- interquartile range (IQR): IQR = Q3 – Q1; measures the spread of the middle 50% of the data
111
Example: beginning pulse rates
- Q3 = 78; Q1 = 63
- IQR = 78 – 63 = 15
112
Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

# | stem | leaf
2 | 22 | 5 5
4 | 23 | 5 7
6 | 24 | 2 6
7 | 25 | 7
10 | 26 | 2 5 7
12 | 27 | 5 9
(4) | 28 | 1 5 6 7
15 | 29 | 3 5 5 9 9
10 | 30 | 3 3 3
7 | 31 | 4 5
5 | 32 | 1 5 5
2 | 33 | 6
1 | 34 | 0

1. 23.5
2. 39.5
3. 46
4. 69.5
113
5-number summary of data
- Minimum, Q1, median, Q3, maximum
- Example: Pulse data: 45, 63, 70, 78, 111
114
Boxplot: display of 5-number summary
Five-number summary: min, Q1, m, Q3, max
- Largest = max = 6.1
- Q3 = third quartile = 4.2
- m = median = 3.4
- Q1 = first quartile = 2.3
- Smallest = min = 0.6
115
Boxplot: display of 5-number summary
- Example: age of 66 “crush” victims at rock concerts 2001-2010. 5-number summary: 13, 17, 19, 22, 47
116
Boxplot: display of 5-number summary
Q1 = first quartile = 2.3; Q3 = third quartile = 4.2; Largest = max = 7.9
Interquartile range: Q3 – Q1 = 4.2 − 2.3 = 1.9
1.5 * IQR = 1.5 * 1.9 = 2.85
Q3 + 1.5 * IQR = 4.2 + 2.85 = 7.05
Individual #25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
117
ATM Withdrawals by Day, Month, Holidays
119
Beg. of class pulses (n=138)
- Q1 = 63, Q3 = 78
- IQR = 78 − 63 = 15
- 1.5(IQR) = 1.5(15) = 22.5
- Q1 − 1.5(IQR): 63 − 22.5 = 40.5
- Q3 + 1.5(IQR): 78 + 22.5 = 100.5
(Boxplot labels: 40.5, 45, 63, 70, 78, 100.5)
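The fence calculation above is the usual 1.5 × IQR check for outliers in a boxplot. A small sketch, with `outlier_fences` and `is_outlier` as illustrative helper names:

```python
def outlier_fences(q1, q3, k=1.5):
    """Lower and upper fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def is_outlier(value, q1, q3, k=1.5):
    """True if value lies outside the fences."""
    lower, upper = outlier_fences(q1, q3, k)
    return value < lower or value > upper

# Beginning-of-class pulse rates: Q1 = 63, Q3 = 78.
lower_fence, upper_fence = outlier_fences(63, 78)
print(lower_fence, upper_fence)

# From the 5-number summary of the pulse data: min = 45, max = 111.
print(is_outlier(45, 63, 78), is_outlier(111, 63, 78))
```

With these fences the minimum pulse of 45 is inside the lower fence, while the maximum of 111 falls beyond the upper fence and would be plotted as an individual point.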
120
Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?
(Figure: boxplot of Pass Catching Yards by Receivers; axis from 0 to 1369)
1. 450
2. 750
3. 215
4. 545
121
Rock concert deaths: histogram and boxplot
122
Automating Boxplot Construction
- Excel “out of the box” does not draw boxplots.
- Many add-ins are available on the internet that give Excel the capability to draw box plots.
- Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.
123
Tuition 4-yr Colleges
124
Section 3.5 Bivariate Descriptive Statistics
- Contingency Tables for Bivariate Categorical Data
- Scatterplots and Correlation for Bivariate Quantitative Data
125
Basic Terminology
- Univariate data: 1 variable is measured on each sample unit or population unit, e.g. height of each student in a sample
- Bivariate data: 2 variables are measured on each sample unit or population unit, e.g. height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data)
126
Contingency Tables for Bivariate Categorical Data
- Example: Survival and class on the Titanic
Marginal distributions
- Marginal distribution of survival: 710/2201 = 32.3%; 1491/2201 = 67.7%
- Marginal distribution of class: 885/2201 = 40.2%; 325/2201 = 14.8%; 285/2201 = 12.9%; 706/2201 = 32.1%
127
Marginal distribution of class. Bar chart.
128
Marginal distribution of class: Pie chart
129
Contingency Tables for Bivariate Categorical Data - 2
- Conditional distributions. Given the class of a passenger, what is the chance the passenger survived?
130
Conditional distributions: segmented bar chart
131
Contingency Tables for Bivariate Categorical Data - 3
Questions:
- What fraction of survivors were in first class? 202/710
- What fraction of passengers were in first class and survivors? 202/2201
- What fraction of the first class passengers survived? 202/325
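The three questions use the same numerator (202 first-class survivors) but different denominators, which is what distinguishes a conditional fraction from a joint one. The sketch below uses only the counts that appear in the slides; the variable names are illustrative.

```python
# Counts taken from the Titanic slides.
total_people = 2201           # everyone on board in the table
survivors = 710               # marginal count of survivors
first_class = 325             # marginal count of first-class passengers
first_class_survivors = 202   # people who were both first class and survivors

# Conditional on being a survivor: what share were in first class?
first_given_survivor = first_class_survivors / survivors

# Joint: what share of everyone was first class AND a survivor?
first_and_survivor = first_class_survivors / total_people

# Conditional on being in first class: what share survived?
survivor_given_first = first_class_survivors / first_class

print(f"{first_given_survivor:.1%}  {first_and_survivor:.1%}  {survivor_given_first:.1%}")
```

Comparing the last fraction with the marginal survival rate (710/2201, about 32.3%) shows how much survival depended on class.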
132
TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only? 1. 8.0% 2. 23.5% 3. 58.2% 4. 27.7%
133
TV viewers during the Super Bowl in 2013. What percentage watched the game and were female? 1. 41.8% 2. 38.8% 3. 51.2% 4. 19.8%
134
TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male? 1. 45.2% 2. 48.8% 3. 26.8% 4. 27.7%
135
Section 3.5 Bivariate Descriptive Statistics
- Contingency Tables for Bivariate Categorical Data: previous slides
- Scatterplots and Correlation for Bivariate Quantitative Data: next
136
Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Student | Beers | Blood Alcohol
1 | 5 | 0.1
2 | 2 | 0.03
3 | 9 | 0.19
4 | 7 | 0.095
5 | 3 | 0.07
6 | 3 | 0.02
7 | 4 | 0.07
8 | 5 | 0.085
9 | 8 | 0.12
10 | 3 | 0.04
11 | 5 | 0.06
12 | 5 | 0.05
13 | 6 | 0.1
14 | 7 | 0.09
15 | 1 | 0.01
16 | 4 | 0.05

Here, we have two quantitative variables for each of 16 students: 1) how many beers they drank, and 2) their blood alcohol content (BAC). We are interested in the relationship between the two variables: how is one affected by changes in the other one?
137
Scatterplot: Blood Alcohol Content vs Number of Beers
In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
(Figure: scatterplot of BAC against number of beers for the 16 students)
138
Scatterplot: Fuel Consumption vs Car Weight. x = car weight, y = fuel cons.
- (x_i, y_i): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)
139
The correlation coefficient "r"
The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.
Correlation can only be used to describe quantitative variables. Categorical variables don’t have means and standard deviations.
140
Correlation: Fuel Consumption vs Car Weight r =.9766
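The value r = .9766 can be reproduced from the ten (car weight, fuel consumption) pairs listed earlier. The sketch below uses one standard form of the Pearson correlation formula (the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations); the slides do not show which computational form they use, so this is an assumption.

```python
import math

# (x, y) = (car weight, fuel consumption) pairs from the scatterplot slide.
weights = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = correlation(weights, fuel)
print(round(r, 4))
```

A value this close to +1 says the points lie very near an upward-sloping straight line; it does not by itself say why heavier cars use more fuel.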
141
Properties
- r ranges from -1 to +1.
- "r" quantifies the strength and direction of a linear relationship between 2 quantitative variables.
- Strength: how closely the points follow a straight line.
- Direction: r is positive when individuals with higher X values tend to have higher values of Y.
142
Properties (cont.)
High correlation does not imply cause and effect.
CARROTS: Hidden terror in the produce department at your neighborhood grocery
- Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin!!!
- Everyone who ate carrots in 1865 is now dead!!!
- 45 of 50 17 yr olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest!!!
143
Properties: Cause and Effect
- There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)
- Improper training? Will no firemen present result in the least amount of damage?
144
Properties: Cause and Effect
- r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.
- x = fouls committed by player; y = points scored by same player
- (x, y): (1,2) (24,75) (1,0) (18,59) (9,9) (3,7) (5,35) (20,46) (1,0) (3,2) (22,57)
- correlation r = .935
- The correlation is due to a third “lurking” variable: playing time.
145
End of Chapter 3