Unit 1: Representing Data & Analysing 2D Data
1.2 Understanding Variability in Data
Variability in Data?
How spread out the data is
E.g.: if you are measuring shoe size, the data will be more variable for 40-year-olds than for 5-year-olds
Important to determine how variable the data is
Can affect how we see our data
Need to be sure of our samples
Variability
Makes statistics interesting
Allows us to interpret, model, make predictions from data
Exists everywhere
Some data vary a lot, some a little
Can vary within a sample
Can vary from sample to sample
Causes of Variation
Instrument measurements are not perfectly precise
–E.g. in the physical sciences we may assume that the quantity being measured is unchanging & stable
–Variation is then due to observational error
Causes of Variation
Variation may be due to natural phenomena (biology, sociology, manufacturing, etc.)
–Different members of the population may vary greatly
–Hair colour & length
–Number and placement of spots on dalmatian dogs
–Size of widgets might be ±1 mm off
We want to understand why things vary. By thinking about and examining the variables, we can try to understand the different reasons for, and sources of, variability.
Describing and Representing Variability
Graphs of data show how things vary and reveal patterns
Different graphs may reveal different aspects of variation
We can also use numerical summaries (like mean, median, mode, range, IQR, σ, r, r², etc.) to tell us about the variation of the data
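As a rough illustration of the numerical summaries listed above, the sketch below computes a few of them with Python's standard `statistics` module. The helper name `summarize` and both data sets are made up for illustration; they are not taken from the class activity. σ is computed here as the population standard deviation (`pstdev`), which is an assumption about which version the slide means.

```python
import statistics

def summarize(values):
    """Return a few common summaries of centre and spread (hypothetical helper)."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "sigma": statistics.pstdev(values),  # population standard deviation
    }

# Made-up shoe sizes, only to show a more variable and a less variable group
adults = [6, 7, 8, 9, 9, 10, 11, 12, 13]
children = [10, 10, 11, 11, 11, 11, 12, 12, 12]

for name, data in (("adults", adults), ("children", children)):
    print(name, summarize(data))
```

The adult group should show a larger range, IQR and σ than the children's group, which matches the shoe-size example from earlier in the lesson.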
To the activity!
Group A: compare the four samples of 50 students
Group B: compare one of the samples of 50 with the samples of 100 and 150 and one of the samples of 200
Group C: compare the four samples of 200
Write conclusions on whiteboard
Finish the activity by
Samples of 50
Means and medians are far apart
Means and medians differ from sample to sample
Histograms generally have different shapes
Samples of 50, 100, 150, 200
As the sample size grows, the data within a sample gets more spread out
As the sample size grows, means and medians get closer together
Variability from sample to sample decreases as the sample size grows
Samples of 200
Data are pretty spread out
Means and medians are about the same for each set
Histograms have more or less the same shape
Conclusions
Larger samples tend to show a wider spread of values than smaller samples from the same population
BUT the larger the sample, the lower the variation from sample to sample
Larger sample sizes have lower sampling variance
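The two conclusions can be checked with a small simulation. This is only a sketch under stated assumptions: the population is a made-up set of 10,000 heights drawn from a normal distribution (not the class data set), the function name `compare_sample_sizes` is hypothetical, and "spread within a sample" is measured by the range.

```python
import random
import statistics

def compare_sample_sizes(population, sample_size, n_samples, rng):
    """Draw n_samples samples and return (average range within a sample,
    standard deviation of the sample means). Hypothetical helper."""
    ranges, means = [], []
    for _ in range(n_samples):
        sample = rng.sample(population, sample_size)  # sampling without replacement
        ranges.append(max(sample) - min(sample))
        means.append(statistics.mean(sample))
    return statistics.mean(ranges), statistics.stdev(means)

rng = random.Random(0)  # fixed seed so the run is repeatable
# Assumed population: 10,000 heights in cm, roughly normal (illustrative only)
population = [rng.gauss(170, 10) for _ in range(10_000)]

results = {n: compare_sample_sizes(population, n, 200, rng) for n in (50, 100, 150, 200)}
for n, (avg_range, sd_of_means) in results.items():
    print(f"n={n}: average range={avg_range:.1f}, spread of sample means={sd_of_means:.2f}")
```

In a run like this the average range should grow with the sample size while the spread of the sample means shrinks, roughly in proportion to 1/√n for a population of this kind.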