3. Data analysis SIS
Exercise 3.1
Trend: a consistent change (increase or decrease), e.g. atmospheric temperatures
Pattern: a repeated change (increase and decrease), e.g. seasonal variations
Exercise 3.2
Which is the more useful graph: join the dots or line of best fit?
  both have uses: join the dots shows the variation, the line of best fit (LOBF) shows the trend
Can you see any trend?
  a decrease
What effect would it have on your certainty if you only had data from 1996, 2000, 2004 and 2008?
  still a decrease, but less obvious from 2000
What value do you think might occur in 2009? In 2030?
Time series
a line graph where a measured variable is plotted against time
can help identify patterns, e.g.:
  a trend
  a repeating cycle
  random fluctuation (not actually a pattern)
  a combination of all three
natural (random) variation in the measurements may cause the pattern to be blurred by “noise”
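To see how a trend, a repeating cycle and noise combine in one time series, the sketch below builds an artificial series from the three parts. The function name, the parameter values and the use of a sine wave for the cycle are all assumptions made for illustration; none of the numbers come from the course data.

```python
import math
import random

def synthetic_series(n, slope, amplitude, period, noise_sd, seed=0):
    """Build n values as trend + repeating cycle + random noise (illustrative only)."""
    rng = random.Random(seed)  # fixed seed so the "noise" is repeatable
    series = []
    for t in range(n):
        trend = slope * t                                       # consistent change
        cycle = amplitude * math.sin(2 * math.pi * t / period)  # repeated change
        noise = rng.gauss(0, noise_sd)                          # random fluctuation
        series.append(trend + cycle + noise)
    return series

# Hypothetical settings: 48 time steps, a slow rise, a 12-step cycle
noisy = synthetic_series(n=48, slope=0.05, amplitude=1.0, period=12, noise_sd=0.3)
noise_free = synthetic_series(n=48, slope=0.05, amplitude=1.0, period=12, noise_sd=0.0)
```

Plotting `noisy` against `noise_free` shows how the noise blurs the underlying pattern.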
Figure 3.1
is there a trend in this data?
  possibly a small rise
  a line of best fit through the data is very dubious
smooth the data
  clears some of the noise
  the real pattern becomes clearer
smoothing means the loss of the raw data
  smoothed data should be clearly labelled as smoothed
The running means smoothing method
requires that
  there are no gaps in the data
  each time interval between the data points is the same
calculate and plot the mean of a number of successive data points
  three and five are common
Example 3.1
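The running means calculation can be sketched as follows. This is a minimal version that assumes a centred window (each mean is plotted at the middle point of its group), which is one common convention; the slides do not say which convention is used. The readings are made up for illustration.

```python
def running_mean(values, window=3):
    """Centred running mean over `window` successive points (window must be odd)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    # The first and last `half` points have no complete window, so they are dropped
    for i in range(half, len(values) - half):
        chunk = values[i - half : i + half + 1]
        smoothed.append(sum(chunk) / window)
    return smoothed

raw = [12, 15, 11, 14, 18, 13, 16]   # made-up, evenly spaced readings
three_point = running_mean(raw, 3)   # 5 smoothed values
five_point = running_mean(raw, 5)    # 3 smoothed values
print(three_point)
print(five_point)
```

A wider window removes more noise but also loses more points at each end of the series.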
Exercise 3.2 data smoothed
Linear regression
a fancy name for line of best fit
Class Exercise 3.4
The line of best fit for the data in Exercise 3.2 has a slope of –0.1 and a y-intercept of 211.
(a) Use this to calculate the value for the fallout in (i) 2000, (ii) 2009 and (iii) 2030
  2000: 211 + (–0.1 × 2000) = 11.0
  2009: 10.1
  2030: 8.0
(b) Which do you believe is (i) the most accurate and (ii) the least accurate?
  Most: 2000
  Least: 2030
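The arithmetic in Class Exercise 3.4 can be checked with a short sketch. The slope and intercept are the ones given in the exercise; the `fit_line` helper and its tiny data set are assumptions added only to show how a least-squares line of best fit is usually calculated.

```python
def fit_line(xs, ys):
    """Least-squares line of best fit: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(x, slope, intercept):
    """Value on the line y = slope * x + intercept."""
    return slope * x + intercept

# Values from Class Exercise 3.4
for year in (2000, 2009, 2030):
    print(year, round(predict(year, -0.1, 211), 1))

# Made-up points lying exactly on y = 2x + 1, to show the fit itself
demo_slope, demo_intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```
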
Exercise 3.3
What is the difference between interpolation and extrapolation?
  interpolation – determining a value within the data range
  extrapolation – determining a value outside the data range
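The distinction can be expressed as a simple range check. The sketch below assumes the data cover only the four years mentioned in Exercise 3.2 (1996 to 2008); the function name is hypothetical.

```python
def kind_of_estimate(x, data_x):
    """Say whether estimating at x is interpolation or extrapolation."""
    if min(data_x) <= x <= max(data_x):
        return "interpolation"   # inside the measured range
    return "extrapolation"       # outside the measured range

years = [1996, 2000, 2004, 2008]  # assumed data range, taken from Exercise 3.2
for year in (2002, 2009, 2030):
    print(year, kind_of_estimate(year, years))
```

The further an extrapolated point lies from the data range, the less the fitted line can be trusted there.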
Extrapolation error
3.1 Outliers
data points in a set that seem to be so different from the rest that they don’t belong (??) and should be deleted (??)
leaving them in changes the mean and standard deviation
unless the measurement process for the suspect point is known to have a problem, you should not simply remove it without testing
Example 3.2
9.21  9.13  9.05  9.25  8.95  9.10  8.99  4.28  9.22

        With outlier   Without outlier
Mean    8.62           9.11
SD      1.53           0.11
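The effect of the outlier on the mean and standard deviation can be reproduced with Python's `statistics` module. This sketch uses only the nine values listed above, and it assumes the sample standard deviation (`statistics.stdev`) is the one intended. Example 3.3 quotes a table value for 10 points, so a value may be missing from the list; expect the "without outlier" results to agree with the table and the "with outlier" results to differ slightly.

```python
import statistics

# The nine DO values as listed in Example 3.2
do_values = [9.21, 9.13, 9.05, 9.25, 8.95, 9.10, 8.99, 4.28, 9.22]
without_outlier = [v for v in do_values if v != 4.28]

for label, data in (("with outlier", do_values),
                    ("without outlier", without_outlier)):
    mean = statistics.mean(data)
    sd = statistics.stdev(data)   # sample standard deviation (n - 1)
    print(f"{label}: mean = {mean:.2f}, SD = {sd:.2f}")
```
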
The Q-test for outliers
calculate a test value (Q in this case) from an equation based on the data
compare this value to a table of values
make a judgement on the basis of the comparison
Q = |vo – vn| ÷ r
  vo is the value of the outlier
  vn is the value of the nearest data point
  r is the range (always positive)
compare Q to the table value
  if Q > table value, the outlier can be deleted
Example 3.3
Can the 4.28 point from the DO data in Example 3.2 be eliminated (using medium limits)?
Q = |4.28 – 8.95| ÷ (9.25 – 4.28) = 4.67 ÷ 4.97 = 0.94
table value is 0.48 (10 pts)
0.94 (Q) > 0.48 (table), so the 4.28 data point can safely be deleted
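The Q-test steps can be sketched as a pair of small functions. The function names are hypothetical, and the table value is passed in by hand because the course's table of limits is not reproduced here. The data are the DO values as listed in Example 3.2.

```python
def q_value(values, suspect):
    """Q = |outlier - nearest value| / range."""
    others = list(values)
    others.remove(suspect)                        # drop one copy of the suspect point
    nearest = min(others, key=lambda v: abs(v - suspect))
    data_range = max(values) - min(values)        # always positive
    return abs(suspect - nearest) / data_range

def can_discard(values, suspect, table_value):
    """True when Q exceeds the table value, so the point may be deleted."""
    return q_value(values, suspect) > table_value

do_values = [9.21, 9.13, 9.05, 9.25, 8.95, 9.10, 8.99, 4.28, 9.22]
q = q_value(do_values, 4.28)
print(round(q, 2), can_discard(do_values, 4.28, 0.48))   # 0.94 True
```
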
Exercise 3.4
(a) 15, 22, 18, 6, 25, 19
  doubtful value is 6
  Q = |15 – 6| ÷ (25 – 6) = 0.47
  table value = 0.64 > Q
  can’t discard
(b) 0.75, 0.83, 0.53, 0.82, 0.76, 0.81, 0.69, 1.03
  doubtful values are 0.53 and 1.03
  Q = 0.32 (for 0.53) and 0.40 (for 1.03)
  table value = 0.54
  can’t discard either
(c) 41.5, 46.2, 41.6, 42.0, 41.1, 42.1
  doubtful value is 46.2
  Q = 0.80
  table value = 0.64 (six points, as in (a))
  can discard
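When it is not obvious which end of a data set holds the doubtful value, Q can be calculated for both the lowest and the highest point. The sketch below does this for the three data sets in this exercise; the function name is hypothetical, and the comparison against the table value is left to the course table.

```python
def q_both_ends(values):
    """Return (Q for the lowest value, Q for the highest value)."""
    ordered = sorted(values)
    data_range = ordered[-1] - ordered[0]
    q_low = (ordered[1] - ordered[0]) / data_range     # gap to nearest neighbour above
    q_high = (ordered[-1] - ordered[-2]) / data_range  # gap to nearest neighbour below
    return q_low, q_high

exercise_sets = {
    "a": [15, 22, 18, 6, 25, 19],
    "b": [0.75, 0.83, 0.53, 0.82, 0.76, 0.81, 0.69, 1.03],
    "c": [41.5, 46.2, 41.6, 42.0, 41.1, 42.1],
}

results = {name: q_both_ends(data) for name, data in exercise_sets.items()}
for name, (q_low, q_high) in results.items():
    print(f"({name}) Q low = {q_low:.2f}, Q high = {q_high:.2f}")
```
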