Download presentation
Presentation is loading. Please wait.
1
Describing data with graphics and numbers
2
Types of Data Categorical Variables –also known as class variables, nominal variables Quantitative Variables –aka numerical nariables –either continuous or discrete.
3
Graphing categorical variables
4
Ten most common causes of death in Americans between 15 and 19 years old in 1999.
5
Bar graphs
6
Graphing numerical variables
7
Heights of BIOL 300 students (cm) 165 168 163 173 170 163 170 155152 190 170 168 142 160 154 165156 177 173 165 165 175 155 166 168 165 180 165
8
Stem-and-leaf plot
9
19 18 17 16 15 14 0 0 0 0 3 3 5 7 0 3 3 5 5 5 5 5 5 6 8 8 8 2 4 5 5 6 2
10
Frequency table Height GroupFrequency 141-150 151-160 161-170 171-180 181-190
11
Frequency table Height GroupFrequency 141-1501 151-1606 161-17015 171-1805 181-1901
12
Histogram
14
Frequency distribution
15
Histogram with more data
18
90th percentile 50th percentile (median)
19
Associations between two categorical variables
20
Association between reproductive effort and avian malaria
22
Mosaic plot
23
Grouped Bar Graph
24
Associations between categorical and numerical variables
25
Multiple histograms
26
Associations between two numerical variables
27
Scatterplots
29
Evaluating Graphics Lie factor Chartjunk Efficiency
30
Don’t mislead with graphics
31
Better representation of truth
32
Lie Factor Lie factor = size of effect shown in graphic size of effect in data
33
Lie Factor Example Effect in graphic: 2.33/0.08 = 29.1 Effect in data: 6748/5844 = 1.15 Lie factor = 29.1 / 1.15 = 25.3
34
Chartjunk
36
Needless 3D Graphics
38
Summary: Graphical methods for frequency distributions
39
Summary: Associations between variables
40
Great book on graphics
41
Describing data
42
Two common descriptions of data Location (or central tendency) Width (or spread)
43
Measures of location Mean Median Mode
44
Mean n is the size of the sample
45
Mean Y 1 =56, Y 2 =72, Y 3 =18, Y 4 =42
46
Mean Y 1 =56, Y 2 =72, Y 3 =18, Y 4 =42 = (56+72+18+42) / 4 = 47
47
Median The median is the middle measurement in a set of ordered data.
48
The data: 18 28 24 25 36 14 34
49
The data: 18 28 24 25 36 14 34 can be put in order: 14 18 24 25 28 34 36 Median is 25.
51
Mean vs. median in politics 2004 U.S. Economy Republicans: times are good –Mean income increasing ~ 4% per year Democrats: times are bad –Median family income fell Why?
54
Measures of width Range Standard deviation Variance Coefficient of variation
55
Range 14 17 18 20 22 22 24 25 26 28 28 28 30 34 36
56
Range 14 17 18 20 22 22 24 25 26 28 28 28 30 34 36 The range is 36-14 = 22
58
Population Variance
59
Sample variance n is the sample size
60
Shortcut for calculating sample variance
61
Standard deviation (SD) Positive square root of the variance is the true standard deviation s is the sample standard deviation
62
In class exercise Calculate the variance and standard deviation of a sample with the following data: 6, 1, 2
63
Answer Variance=7 Standard deviation =
64
Coefficient of variance (CV) CV = 100 s /.
65
Equal means, different variances
66
Manipulating means The mean of the sum of two variables: E[X + Y] = E[X]+ E[Y] The mean of the sum of a variable and a constant: E[X + c] = E[X]+ c The mean of a product of a variable and a constant: E[c X] = c E[X] The mean of a product of two variables: E[X Y] = E[X] E[Y] if and only if X and Y are independent.
67
Manipulating variance The variance of the sum of two variables: Var[X + Y] = Var[X]+ Var[Y] if and only if X and Y are independent. The variance of the sum of a variable and a constant: Var[X + c] = Var[X] The variance of a product of a variable and a constant: Var[c X] = c 2 Var[X]
68
Parents’ heights MeanVariance Father Height174.371.7 Mother Height160.458.3 Father Height +Mother Height 334.7184.9
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.