Describing data with graphics and numbers
Types of data: Categorical variables (also known as class variables or nominal variables); Quantitative variables (also known as numerical variables), either continuous or discrete.
Graphing categorical variables
Ten most common causes of death in Americans between 15 and 19 years old in 1999.
Bar graphs
Graphing numerical variables
Heights of BIOL 300 students (cm)
Stem-and-leaf plot
Frequency table (columns: Height Group, Frequency)
Histogram
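A histogram is drawn from a frequency table, which counts how many measurements fall into each group. The sketch below shows one way to do that counting. The 10 cm group width, the function name `frequency_table`, and the heights themselves are assumptions made up for illustration; they are not the BIOL 300 class data.

```python
from collections import Counter

def frequency_table(values, width, start):
    """Count values in half-open groups [start + k*width, start + (k+1)*width)."""
    counts = Counter((v - start) // width for v in values)
    table = []
    for k in sorted(counts):
        lower = start + k * width
        table.append(((lower, lower + width), counts[k]))
    return table

# Hypothetical heights in cm (illustration only, not the class data)
heights = [152, 158, 161, 163, 165, 167, 168, 171, 174, 176, 183, 190]
table = frequency_table(heights, width=10, start=150)

# Print a crude text histogram: one '#' per measurement in the group
for (lower, upper), count in table:
    print(f"{lower}-{upper - 1} cm: {'#' * count} ({count})")
```

Groups with no measurements are simply omitted from the table in this sketch. Changing `width` changes the shape of the histogram, which is why the choice of group width matters.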
Frequency distribution
Histogram with more data
Percentiles: 90th percentile; 50th percentile (median)
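A percentile is the value below which a given percentage of the measurements fall. Several conventions exist for computing it from a sample; the sketch below uses the simple "nearest rank" convention, which is an assumption here and not necessarily the one used in class. The heights are made up for illustration.

```python
def percentile_nearest_rank(values, p):
    """Return the p-th percentile (p an integer from 1 to 100) by nearest rank."""
    ordered = sorted(values)
    n = len(ordered)
    # ceil(p * n / 100) using integer arithmetic, never below rank 1
    rank = max(1, -(-p * n // 100))
    return ordered[rank - 1]

# Hypothetical heights in cm (illustration only)
heights = [152, 158, 161, 163, 165, 167, 168, 171, 174, 176, 183]

p50 = percentile_nearest_rank(heights, 50)  # the middle (6th) of 11 ordered values
p90 = percentile_nearest_rank(heights, 90)  # the 10th of 11 ordered values
print(p50, p90)
```

With an odd number of measurements the 50th percentile from this convention coincides with the median; with an even number it returns the lower of the two middle values, so other conventions can give slightly different answers.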
Associations between two categorical variables
Association between reproductive effort and avian malaria
Mosaic plot
Grouped Bar Graph
Associations between categorical and numerical variables
Multiple histograms
Associations between two numerical variables
Scatterplots
Evaluating graphics: lie factor, chartjunk, efficiency
Don’t mislead with graphics
Better representation of truth
Lie factor: $\text{Lie factor} = \dfrac{\text{size of effect shown in graphic}}{\text{size of effect in data}}$
Lie factor example. Effect in graphic: 2.33 / 0.08 = 29.1. Effect in data: 6748 / 5844 = 1.15. Lie factor = 29.1 / 1.15 = 25.3.
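The lie factor arithmetic can be checked in a few lines. The sketch below uses the four numbers from the example; the function name `lie_factor` is only an illustrative choice. Note that carrying full precision through the division gives about 25.2, while rounding the two intermediate ratios first (as in the worked example) gives 25.3.

```python
def lie_factor(effect_in_graphic, effect_in_data):
    """Ratio of the effect size shown in the graphic to the effect size in the data."""
    return effect_in_graphic / effect_in_data

effect_graphic = 2.33 / 0.08   # ratio of sizes as drawn, about 29.1
effect_data = 6748 / 5844      # ratio of the underlying values, about 1.15

lf_exact = lie_factor(effect_graphic, effect_data)
lf_rounded = lie_factor(round(effect_graphic, 1), round(effect_data, 2))

print(round(lf_exact, 1), round(lf_rounded, 1))
```

Either way the graphic exaggerates the effect by a factor of roughly 25; a lie factor close to 1 would indicate a faithful representation.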
Chartjunk
Needless 3D Graphics
Summary: Graphical methods for frequency distributions
Summary: Associations between variables
Great book on graphics
Describing data
Two common descriptions of data: location (or central tendency) and width (or spread)
Measures of location: mean, median, mode
Mean: $\bar{Y} = \dfrac{\sum_{i=1}^{n} Y_i}{n}$, where n is the size of the sample.
Mean example: $Y_1 = 56$, $Y_2 = 72$, $Y_3 = 18$, $Y_4 = 42$, so $\bar{Y} = (56 + 72 + 18 + 42) / 4 = 47$.
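The same calculation as a minimal Python sketch, using the four values from the example (the function name `mean` is just an illustrative choice):

```python
def mean(values):
    """Sample mean: the sum of the measurements divided by the sample size n."""
    return sum(values) / len(values)

y = [56, 72, 18, 42]
y_bar = mean(y)  # (56 + 72 + 18 + 42) / 4
print(y_bar)
```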
Median: the median is the middle measurement in a set of ordered data.
The data: can be put in order: Median is 25.
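Finding a median means sorting first and then taking the middle value. The sketch below is one way to write that, applied for illustration to the four values from the mean example. When the number of measurements is even there is no single middle value; averaging the two middle values, as done here, is a common convention and an assumption beyond the definition given above.

```python
def median(values):
    """Middle measurement of the ordered data (mean of the two middle ones if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

even_case = median([56, 72, 18, 42])  # ordered: 18, 42, 56, 72
odd_case = median([56, 72, 18])       # ordered: 18, 56, 72
print(even_case, odd_case)
```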
Mean vs. median in politics: the 2004 U.S. economy. Republicans: times are good (mean income increasing ~4% per year). Democrats: times are bad (median family income fell). Why?
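One way both statements can be true at once is that the mean is pulled by extreme values while the median is not. The sketch below illustrates this with entirely made-up incomes (in thousands of dollars) in which most values drop slightly and the largest value grows a lot; these numbers are hypothetical and are not 2004 U.S. data.

```python
from statistics import mean, median

# Hypothetical incomes in thousands (illustration only, not real data)
incomes_before = [20, 30, 40, 50, 60]
incomes_after = [19, 29, 39, 50, 75]  # small losses for most, large gain at the top

mean_change = mean(incomes_after) / mean(incomes_before) - 1      # relative change in mean
median_change = median(incomes_after) - median(incomes_before)    # change in median

print(f"mean changed by {mean_change:.0%}, median changed by {median_change}")
```

In this toy example the mean rises while the median falls, so which summary is quoted determines whether times look good or bad.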
Measures of width: range, standard deviation, variance, coefficient of variation
Range
Range: the range is maximum − minimum = 22.
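Population variance: $\sigma^2 = \dfrac{\sum_{i=1}^{N} (Y_i - \mu)^2}{N}$, where N is the number of individuals in the population and $\mu$ is the population mean.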
Sample variance: $s^2 = \dfrac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}$, where n is the sample size and $\bar{Y}$ is the sample mean.
Shortcut for calculating sample variance
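One standard form of the shortcut (which may be laid out differently from the version shown in class) follows from expanding the square in the sample variance definition $s^2 = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 / (n - 1)$ and using $\sum_{i=1}^{n} Y_i = n\bar{Y}$:

```latex
\begin{align*}
\sum_{i=1}^{n} (Y_i - \bar{Y})^2
  &= \sum_{i=1}^{n} Y_i^2 - 2\bar{Y}\sum_{i=1}^{n} Y_i + n\bar{Y}^2 \\
  &= \sum_{i=1}^{n} Y_i^2 - 2n\bar{Y}^2 + n\bar{Y}^2 \\
  &= \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2, \\[4pt]
s^2 &= \frac{\sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2}{n - 1}.
\end{align*}
```

As a check with the data 6, 1, 2: $\sum Y_i^2 = 36 + 1 + 4 = 41$, $n\bar{Y}^2 = 3 \times 3^2 = 27$, so $s^2 = (41 - 27) / 2 = 7$. The shortcut avoids computing each deviation separately, which is convenient by hand.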
Standard deviation (SD): the positive square root of the variance. $\sigma$ is the true standard deviation; $s$ is the sample standard deviation.
In-class exercise: Calculate the variance and standard deviation of a sample with the following data: 6, 1, 2
Answer: Variance = 7. Standard deviation = $\sqrt{7} \approx 2.65$.
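The exercise can be verified step by step. The sketch below computes the sample variance with the n − 1 denominator and then takes its positive square root; the function name `sample_variance` is an illustrative choice, and the result is cross-checked against Python's standard `statistics.variance`.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    y_bar = sum(values) / n
    return sum((y - y_bar) ** 2 for y in values) / (n - 1)

data = [6, 1, 2]
s2 = sample_variance(data)  # mean is 3; squared deviations are 9, 4, 1
s = math.sqrt(s2)           # sample standard deviation

print(s2, round(s, 2), statistics.variance(data))
```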
Coefficient of variation (CV): $\mathrm{CV} = 100\, s / \bar{Y}$ (expressed as a percentage).
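The coefficient of variation expresses the standard deviation as a percentage of the mean, which makes spread comparable across variables measured on different scales. A minimal sketch, reusing the in-class exercise data 6, 1, 2 purely as an illustration (the function name is hypothetical):

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the sample mean."""
    return 100 * statistics.stdev(values) / statistics.mean(values)

data = [6, 1, 2]
cv = coefficient_of_variation(data)  # 100 * sqrt(7) / 3
print(f"CV = {cv:.1f}%")
```

The CV is only meaningful for measurements with a true zero and a positive mean; the sketch does not guard against a mean of zero.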
Equal means, different variances
Manipulating means. The mean of the sum of two variables: $E[X + Y] = E[X] + E[Y]$. The mean of the sum of a variable and a constant: $E[X + c] = E[X] + c$. The mean of the product of a variable and a constant: $E[cX] = c\,E[X]$. The mean of the product of two variables: $E[XY] = E[X]\,E[Y]$ if X and Y are independent (independence is sufficient; the equality holds exactly when X and Y are uncorrelated).
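These rules can be checked exactly on a small example. The sketch below assumes two made-up discrete variables, X uniform on {1, 2, 3} and Y uniform on {0, 10}, and makes them independent by treating every (x, y) pair as equally likely. The helper `expectation` and the constant c = 5 are illustrative choices. Exact fractions are used so the equalities are not blurred by rounding.

```python
from fractions import Fraction
from itertools import product

def expectation(outcomes, f):
    """E[f] when every outcome in the list is equally likely."""
    return sum(Fraction(f(o)) for o in outcomes) / len(outcomes)

xs = [1, 2, 3]   # hypothetical values of X
ys = [0, 10]     # hypothetical values of Y
c = 5

# Independent X and Y: every (x, y) combination is equally likely
pairs = list(product(xs, ys))
e_x = expectation(pairs, lambda p: p[0])
e_y = expectation(pairs, lambda p: p[1])
e_sum = expectation(pairs, lambda p: p[0] + p[1])
e_shift = expectation(pairs, lambda p: p[0] + c)
e_scale = expectation(pairs, lambda p: c * p[0])
e_prod = expectation(pairs, lambda p: p[0] * p[1])

# Dependent case: Y is always equal to X, so the product rule need not hold
linked = [(x, x) for x in xs]
e_prod_dep = expectation(linked, lambda p: p[0] * p[1])
e_x_dep = expectation(linked, lambda p: p[0])

print(e_sum, e_shift, e_scale, e_prod, e_prod_dep, e_x_dep * e_x_dep)
```

The sum rule holds whether or not the variables are independent; only the product rule relies on the extra condition, as the dependent case shows.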
Manipulating variance. The variance of the sum of two variables: $\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y]$ if X and Y are independent (more generally, whenever they are uncorrelated). The variance of the sum of a variable and a constant: $\mathrm{Var}[X + c] = \mathrm{Var}[X]$. The variance of the product of a variable and a constant: $\mathrm{Var}[cX] = c^2\,\mathrm{Var}[X]$.
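The variance rules can be checked the same way. The sketch below again assumes made-up variables, X uniform on {1, 2, 3} and Y uniform on {0, 10}, with all (x, y) pairs equally likely so that they are independent. The helpers and the constant c = 5 are illustrative. These are variances of probability distributions, so they divide by the number of equally likely outcomes, not by n − 1.

```python
from fractions import Fraction
from itertools import product

def expectation(outcomes, f):
    """E[f] when every outcome in the list is equally likely."""
    return sum(Fraction(f(o)) for o in outcomes) / len(outcomes)

def variance(outcomes, f):
    """Var[f] = E[(f - E[f])^2] over equally likely outcomes."""
    m = expectation(outcomes, f)
    return expectation(outcomes, lambda o: (f(o) - m) ** 2)

xs = [1, 2, 3]   # hypothetical values of X
ys = [0, 10]     # hypothetical values of Y
c = 5

pairs = list(product(xs, ys))  # independent X and Y
var_x = variance(pairs, lambda p: p[0])
var_y = variance(pairs, lambda p: p[1])
var_sum = variance(pairs, lambda p: p[0] + p[1])
var_shift = variance(pairs, lambda p: p[0] + c)
var_scale = variance(pairs, lambda p: c * p[0])

# Dependent case: Y always equals X, so X + Y = 2X and the variances do not simply add
linked = [(x, x) for x in xs]
var_sum_dep = variance(linked, lambda p: p[0] + p[1])
var_x_dep = variance(linked, lambda p: p[0])

print(var_x, var_y, var_sum, var_shift, var_scale, var_sum_dep)
```

Adding a constant shifts every value equally and leaves the spread unchanged, while multiplying by c stretches deviations by c and squared deviations by c squared.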
Parents’ heights (table with columns Mean and Variance; rows Father Height, Mother Height, Father Height + Mother Height)