1 Statistical trends and time series a recap July 2012 Marian Scott and Adrian Bowman.

Presentation transcript:

1 Statistical trends and time series a recap July 2012 Marian Scott and Adrian Bowman

2 Measurement and assessment of change
Two topics to consider:
– Regression modelling, in general
– Time series
Leading to:
– Trends (combining time series and regression ideas)
Meet some examples, cover some of the ideas, apply them.

3 An example of some typical environmental time series data

4

5 Trends and change
In time (SNIFFER, 2006):
– A linear regression equation was calculated for each dataset, and the trend was then calculated as the gradient parameter (i.e. the rate of change) multiplied by the length of the data period, giving a clear change value since the start of the period.
– The significance of trends was tested using the non-parametric Mann-Kendall tau test (Sneyers, 1990).
– Linear trends with the Mann-Kendall significance test are widely used in the analysis of climate trends.
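The two steps described on this slide can be sketched in a few lines. This is only an illustration, not the SNIFFER code: the function names and the data are invented, and the Mann-Kendall variance used here assumes there are no tied values.

```python
import math

def linear_trend_change(times, values):
    """Least-squares slope, and slope multiplied by the length of the record."""
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(values) / n
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    sxx = sum((t - t_bar) ** 2 for t in times)
    slope = sxy / sxx
    return slope, slope * (times[-1] - times[0])

def mann_kendall(values):
    """Mann-Kendall S statistic, Kendall's tau and a two-sided p-value.

    Simplification (assumption): the variance formula ignores ties.
    """
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of each pairwise difference
    tau = s / (n * (n - 1) / 2)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Normal approximation with continuity correction.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return s, tau, p_value

# Invented annual values, for illustration only.
years = list(range(2000, 2010))
temps = [9.1, 9.3, 9.0, 9.4, 9.6, 9.5, 9.8, 9.7, 10.0, 10.1]
slope, change = linear_trend_change(years, temps)
s, tau, p_value = mann_kendall(temps)
```

The test uses only the ordering of the values, so it does not rely on the normality assumption behind the usual regression t-test.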

6 Joint Nature Conservation Committee definition of trend
A trend is a measurement of change derived from a comparison of the results of two or more statistics. A trend relates to a range of dates spanning the statistics from which it is derived, e.g. … A trend will generally be expressed as a percentage change (+ for an increase, - for a decrease) or as an index.
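As a small illustration of the two ways of expressing such a trend, the sketch below computes a percentage change and an index. The function names, the base value of 100 and the counts are assumptions made for the example.

```python
def percentage_change(start_value, end_value):
    """Change from start to end as a percentage of the start value."""
    return 100.0 * (end_value - start_value) / start_value

def index_series(values, base=100.0):
    """Rescale a series so that its first value equals `base`."""
    return [base * v / values[0] for v in values]

# Invented counts, for illustration only.
counts = [200, 180, 150]
change = percentage_change(counts[0], counts[-1])  # negative means a decrease
index = index_series(counts)
```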

7 Statistical definition of trend
What is a statistical trend?
– A long-term change in the mean level (Chatfield, 1996)
– Long-term movement (Kendall and Ord, 1990)
– The non-random function $\mu(t) = E(Y(t))$ (Diggle, 1990)
Trend is a long-term behaviour of the process; trends in mean, variance and extremes may be of interest (Chandler, 2002).
Environmental change often, but not always, means a statistical trend.
Not restricted to linear (or even monotonic) trends.

8 Statistical tools for exploring and quantifying trend
Exploratory tools:
– Scatterplots, time series plots, smoothed trends over time (are the series equally spaced, with no missing data?)
More formal tools:
– Can you assume monotonicity? Is the trend linear?
– Non-parametric estimation and testing (classic tests)
– Semi-parametric and non-parametric additive models (for irregularly spaced data)
What is monotonic? Steadily increasing or decreasing.

Simple Regression Model
The basic regression model assumes:
– The average value of the response y is linearly related to the explanatory variable x.
– The spread of the response y about the average is the SAME for all values of x.
– The VARIABILITY of the response y about the average follows a NORMAL distribution for each value of x.

Simple Regression Model
– The model is typically fitted using least squares.
– Goodness of fit is assessed using the residual sum of squares and $R^2$.
– Assumptions are checked using residual plots.
– Inference is then made about the model parameters.
For water quality data, the response would be TOC and the explanatory variable would be year.
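A minimal sketch of the least-squares fit and the goodness-of-fit summaries listed above. The function name and dictionary keys are invented for this example; in practice a statistics package would be used.

```python
import math

def fit_line(x, y):
    """Least-squares straight line with residual sum of squares, R-squared and S."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    tss = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
    return {
        "intercept": intercept,
        "slope": slope,
        "rss": rss,
        "r_squared": 1 - rss / tss,
        "s": math.sqrt(rss / (n - 2)),  # residual standard deviation
    }

# Invented data, for illustration only.
fit = fit_line([0, 1, 2, 3], [1, 0, 3, 2])
```

Here `r_squared` is the proportion of the variation in the response explained by the line, and `s` measures the scatter of the points about it.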

Chlorophyll and nitrogen relationship

Regression Output
The regression equation is chloro = … + … N

Predictor   Coef   StDev   T   P
Constant    …
N           …

S = …   R-Sq = 67.5%   R-Sq(adj) = 66.1%

Conclusions
– The equation for the best-fit straight line has an intercept of -1.7 and a slope of … . Thus for every unit increase in N, the chloro measure increases by … .
– The $R^2$(adj) value is 66.1%, so we have explained 66% of the variation in chloro by its relationship to N.
– The S value is 15.19, which describes the variation in the points around this fitted line.

Checking assumptions
– Usually based on residuals.
– Residuals are the differences between each observation and the corresponding fitted value from the model.
– They can be positive or negative but should, on average, be zero.
– Residual plots are common model assessment tools (e.g. a scatterplot of residuals vs fitted values).
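The sketch below computes the pairs that such a residual plot would display, using a straight-line fit. The function name and the data are invented for illustration.

```python
def residuals_vs_fitted(x, y):
    """Return (fitted, residual) pairs from a least-squares straight line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    return [(f, yi - f) for f, yi in zip(fitted, y)]

# Invented data, for illustration only.
pairs = residuals_vs_fitted([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
mean_residual = sum(r for _, r in pairs) / len(pairs)  # zero up to rounding error
```

Because the model includes an intercept, the residuals average to zero by construction; what the plot is used for is spotting curvature or a changing spread against the fitted values.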

Confidence and prediction intervals

16 A straight line model for the Nile
Annual river flow from ~1870. A straight line is a relatively poor fit, with lots of variation.

17 A straight line model for the Nile
Relatively poor fit, lots of variation. Any pattern in the residuals?

18 A quadratic model for the Nile
Better fit, but still lots of variation. Gives a smooth change, not an abrupt one. Any pattern in the residuals?

19 A non-parametric model for the Nile
A smooth function (LOESS) or non-parametric regression model. OK?
In later sessions, you will see some more flexible modelling tools.
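To give a feel for what a LOESS-type smoother does, here is a simplified sketch: a weighted straight line is fitted around each point, with tricube weights that fall to zero at the edge of the neighbourhood. It is an assumption-laden illustration (local linear fits, no robustness iterations, invented function name and `span` default), not the routine used to produce the slide.

```python
def loess(x, y, span=0.5):
    """Local linear regression with tricube weights, evaluated at each x."""
    n = len(x)
    k = min(n, max(3, int(round(span * n))))  # points in each neighbourhood
    smoothed = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1.0  # bandwidth: distance to the k-th nearest point
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        xw = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted means
        yw = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        smoothed.append(yw + slope * (x0 - xw))
    return smoothed

# Invented data, for illustration only.
xs = list(range(10))
ys = [3.0, 3.4, 2.9, 3.8, 4.1, 3.9, 4.6, 4.4, 5.0, 5.2]
smooth = loess(xs, ys, span=0.6)
```

A larger `span` uses more neighbours and gives a smoother curve; a smaller one follows the data more closely.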

20 Regression examples?
In practical3final.txt, there are some R commands to complete some analyses.
Example 1: Loch Lomond, plots and simple regression.

21 What is a time series?
– A time series is a sequence of measurements made over time.
– Notationally, this would commonly be written as $y_1, y_2, \ldots, y_i, \ldots, y_T$.
– The index $i$ denotes the position in the sequence of observations.
– Often we will assume that the data are equally spaced, so that $i$ is truly an index, but for many environmental time series the observations are not equally spaced.

22 How to plot the data
A time series plot: choice of the x-axis scale
– Occasionally, each observation is indexed by its position in the sequence (OK if equally spaced).
– Alternatively, we may use the actual timescale (e.g. years for an annual series, or days for a daily series).
– Or we may regard time on a continuous scale (time might be recorded in decimal form, e.g. as a decimal year corresponding to June 1986). This last option is often the preferred form for statistical modelling (time is then a continuous variable).
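One possible way to put dates on a continuous scale is to convert each to a decimal year. The convention below (fraction of the calendar year elapsed) and the function name are assumptions for this sketch; other conventions exist.

```python
from datetime import date

def decimal_year(d):
    """Year plus the fraction of that calendar year elapsed at date d."""
    start = date(d.year, 1, 1)
    end = date(d.year + 1, 1, 1)
    return d.year + (d - start).days / (end - start).days

# A sampling date in mid-June 1986, chosen for illustration.
t = decimal_year(date(1986, 6, 15))
```

With time in this form, irregularly spaced observations can be used directly as the explanatory variable in a regression.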

23 How is biodiversity changing? (EEA CSI 009)
Populations of common and widespread farmland bird species in 2003 are only 71% of their 1980 levels. An annual indicator.

24 How is biodiversity changing? (kittiwakes) (JNCC, DEFRA)
The UK index of kittiwake abundance has declined rapidly since the early 1990s, such that by 2009 the index was just 50% of that in 1986, the lowest value in the 24 years of monitoring. Notice the uncertainty bands.

25 Water quality: freshwater
– Concentrations of P generally decreased.
– Nitrate concentrations decreasing.
What are the rates of change, and are they significant?

26 Another example: monthly mean CO₂ levels

27 Example: a time series plot (daily values)
The x-axis shows the actual date.

28 Loch Leven (NERC- CEH)

29 Example: air quality, monitored through time (from the EMEP programme)
Note the gaps and the rather extreme values; one strategy is to take logs. These are daily data.

30 Data

31 Observed temperature anomalies in Europe
Change in different periods of the year may have different effects:
– the start of the growing season is determined by spring and autumn temperatures;
– changes in winter are important for species survival;
– note that the presentation shows winter and summer separately.

32 Nitrate in the Clyde sea area in different seasons River Clyde

33 Loch Leven

34 Environmental time series data features
– Patterns over time (both short and long term).
– Often missing data, which may cause problems for statistical analysis.
– Variation, which may not be constant over time, so we may need to consider transformations (log).

35 Seasonal patterns (cycles)
– In many environmental time series, we could imagine some periodicity (e.g. a monthly pattern in temperature), so it is common to produce a seasonality plot.
– The index (x-axis scale) depends on the period over which the cycle repeats itself (monthly, daily).
– We will need to include a term in any model to describe these features.
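A seasonality plot for monthly data places month of the year on the x-axis. As a rough sketch, the summary below gives the mean value for each calendar month, which is one simple line to draw through such a plot. The function name and the data are invented.

```python
def monthly_means(months, values):
    """Mean value for each calendar month (1 to 12) present in the data."""
    totals, counts = {}, {}
    for m, v in zip(months, values):
        totals[m] = totals.get(m, 0.0) + v
        counts[m] = counts.get(m, 0) + 1
    return {m: totals[m] / counts[m] for m in sorted(totals)}

# Invented observations: (month, value) over two years, illustration only.
months = [1, 4, 7, 10, 1, 4, 7, 10]
values = [3.0, 8.0, 15.0, 9.0, 4.0, 9.0, 16.0, 10.0]
profile = monthly_means(months, values)
```

Missing months are simply absent from the result, so the summary still works when the series has gaps.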

36 Example: Loch Leven, monthly data
Data are plotted over the months of the year (Lowess smooth included).

37 What are the questions of interest?
– We want to know about trends, where a trend is defined to be the long-term sweep of the data.
– We want to know about possible seasonality (or cycles). The seasonal component of a time series describes a regular fluctuation which has a period. (The period is the time interval between consecutive peaks or troughs.)

38 Regression examples?
In practical3final.txt, there are some R commands to complete some analyses.
Qn 1b: Loch Lomond, plots and simple regression, with an investigation of seasonality.
Qn 2: dissolved oxygen in the Clyde, simple and multiple regression; year, temperature and salinity are explanatory variables.

39 A descriptive model
A useful descriptive model for a time series consists of 3 components:
X = Trend + Seasonal Component + Irregular Component, or X = T + S + I
I is the irregular component, which is left over when the trend and seasonal components are accounted for. It is an irregular or random fluctuation (like residuals in regression).
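One classical way to split a regularly spaced series into these three parts is sketched below: a centred moving average estimates T, the average detrended value at each position in the cycle estimates S, and I is what remains. This is an illustrative sketch only; it assumes an even period, no missing values and at least two full cycles, and the function names are invented.

```python
def centred_moving_average(x, period):
    """Trend estimate; None where the window does not fit. Assumes an even period."""
    half = period // 2
    trend = [None] * len(x)
    for t in range(half, len(x) - half):
        window = x[t - half:t + half + 1]
        # Half weight on the two end points so the window spans exactly one period.
        total = sum(window) - 0.5 * (window[0] + window[-1])
        trend[t] = total / period
    return trend

def decompose(x, period):
    """Additive decomposition: returns (trend, seasonal, irregular)."""
    trend = centred_moving_average(x, period)
    sums = [0.0] * period
    counts = [0] * period
    for t, (xt, tt) in enumerate(zip(x, trend)):
        if tt is not None:
            sums[t % period] += xt - tt
            counts[t % period] += 1
    effects = [s / c for s, c in zip(sums, counts)]
    centre = sum(effects) / period
    effects = [e - centre for e in effects]  # seasonal effects sum to zero
    seasonal = [effects[t % period] for t in range(len(x))]
    irregular = [None if tt is None else xt - tt - st
                 for xt, tt, st in zip(x, trend, seasonal)]
    return trend, seasonal, irregular

# Invented quarterly series: straight-line trend plus a repeating pattern.
pattern = [1.0, -1.0, 2.0, -2.0]
series = [10 + 0.5 * t + pattern[t % 4] for t in range(12)]
trend, seasonal, irregular = decompose(series, period=4)
```

For this artificial series the irregular component is zero wherever the trend is defined, because nothing random was added to the data.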

40 Smoothing a time series
In many time series, the seasonal variation can be so strong that it obscures any trend or cyclical component. However, for understanding the process being observed (and forecasting future values of the series), trends and cycles are of prime importance. Smoothing is a process designed to remove seasonality so that the long-term movements in a time series can be seen more clearly.
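As one concrete example of such a smoother (offered as an illustration, not necessarily the method used for the plots in these slides), a centred moving average for monthly data $X_t$ can be written as:

```latex
\hat{T}_t = \frac{1}{12}\left( \tfrac{1}{2} X_{t-6} + \sum_{j=-5}^{5} X_{t+j} + \tfrac{1}{2} X_{t+6} \right)
```

Each calendar month carries a total weight of 1/12 in every window, since $X_{t-6}$ and $X_{t+6}$ fall in the same month and share one weight between them. A seasonal pattern that repeats exactly every 12 months therefore contributes the same constant to every $\hat{T}_t$, and contributes nothing if its effects sum to zero, which is why the long-term movement shows through.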

41 Example: different smoothing techniques applied to air quality data (that have been logged)

42 Example: water quality in the River Clyde
A very complex regression model is of the form
$y_i = \beta_0(x_i) + \beta_1(x_i)\cos(2\pi x_i - \theta(x_i)) + \varepsilon_i, \quad i = 1, \ldots, n$
– It includes a mean trend term and seasonal variation; $x_i$ is year in decimal form.
– It includes smooth terms $\beta_0$ and $\beta_1$ and a varying-coefficient seasonal term (modelled parametrically) using cosines.
– It can be simplified by setting some parameters to be constant.
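A heavily simplified version of this kind of model can be fitted by ordinary least squares. The sketch below assumes a straight-line mean trend and a constant seasonal amplitude and phase, and uses the identity A cos(2πx − θ) = (A cos θ) cos(2πx) + (A sin θ) sin(2πx) to make the model linear in its coefficients. These simplifications, the function names and the data are all choices made for this illustration, not the model fitted to the Clyde data.

```python
import math

def solve(a, b):
    """Solve the linear system a * coef = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef

def fit_harmonic(x, y):
    """Fit y = level + slope*(x - mean(x)) + A*cos(2*pi*x - phase) by least squares."""
    x0 = sum(x) / len(x)  # centre time to keep the equations well conditioned
    rows = [[1.0, xi - x0, math.cos(2 * math.pi * xi), math.sin(2 * math.pi * xi)]
            for xi in x]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(4)]
    level, slope, a, b = solve(xtx, xty)
    return {"level": level, "slope": slope,
            "amplitude": math.hypot(a, b), "phase": math.atan2(b, a)}

# Invented monthly series with decimal-year times, for illustration only.
times = [2000 + k / 12 for k in range(48)]
obs = [5 + 0.2 * (t - 2000) + 1.5 * math.cos(2 * math.pi * t - 1.0) for t in times]
fit = fit_harmonic(times, obs)
```

Because time is in decimal years, one full cycle of the cosine corresponds to one year, and the fit does not require equally spaced observations.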

43 Seasonality: River Clyde

44

45 Example: Loch Leven, trends correcting for covariates
– Loch Leven is a key loch for the Water Framework Directive.
– The environmental effect of interest is eutrophication.
– The measurement series covers 30 years, including a variety of biological, chemical and hydrological indicators, but is irregular in time.
– Substantial improvement in the loch water quality.

46 Loch Leven

47 Other examples to try
Qn 3 in practical3final.txt asks whether DO is different before and after an upgrade to the Shieldhall sewage works. To do this in a regression framework we need to introduce a FACTOR (a variable that takes only two values, to identify before and after 1985).
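To show what a two-level factor does in a regression, the sketch below codes it as a 0/1 indicator. With that single explanatory variable, the least-squares intercept is the mean of the "before" group and the coefficient of the factor is the difference between the two group means. The function name, the choice to count the change year itself as "after", and the data are assumptions for this illustration.

```python
def fit_two_level_factor(years, values, change_year):
    """Regress values on a 0/1 factor: 0 before change_year, 1 from it onwards."""
    factor = [1 if yr >= change_year else 0 for yr in years]
    before = [v for f, v in zip(factor, values) if f == 0]
    after = [v for f, v in zip(factor, values) if f == 1]
    intercept = sum(before) / len(before)         # mean level before the change
    effect = sum(after) / len(after) - intercept  # coefficient of the factor
    return factor, intercept, effect

# Invented dissolved oxygen values, for illustration only.
years = [1982, 1983, 1984, 1985, 1986, 1987]
do_values = [4.1, 3.8, 4.4, 6.0, 6.3, 5.9]
factor, intercept, effect = fit_two_level_factor(years, do_values, 1985)
```

Other explanatory variables, such as temperature or salinity, can be added alongside the factor in a multiple regression, in which case the factor's coefficient becomes an adjusted difference.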

48 When time is the explanatory variable
– In many situations, we expect successive observations to show correlation at adjacent time points (most likely stronger the closer the time points are).
– The strength of dependence usually depends on the time separation, or lag.
– For regularly spaced data, we typically make use of the autocorrelation function (ACF) to assess how strong this correlation is.
We have not considered this in the earlier examples but.....
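For a regularly spaced series, the sample ACF at lag k compares each observation with the one k steps later. A minimal sketch follows; the function name and data are invented, and the series is assumed to have no missing values.

```python
def acf(x, max_lag):
    """Sample autocorrelation at lags 0, 1, ..., max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n  # lag-0 autocovariance
    result = []
    for k in range(max_lag + 1):
        ck = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n
        result.append(ck / c0)
    return result

# Invented series, for illustration only.
series = [2.0, 2.4, 2.9, 2.7, 3.1, 3.6, 3.4, 3.9, 4.2, 4.0]
correlations = acf(series, max_lag=3)
```

In practice this is usually applied to the residuals from a fitted trend or seasonal model, since a trend left in the data itself produces large autocorrelations at many lags.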