NY Times 25 November 2008.


1 NY Times 25 November 2008

2 Stat 153 - 24 Nov 2008 D. R. Brillinger
Chapter 14 - Examples continued Question Data Analyses Conclusions

3 Why? Chatfield's example 14.1
Monthly mean air temperature at Recife
Table doesn't indicate source ????
Chatfield's objective - "to describe and understand the data"
Objectives here
- to extend Chatfield's series
- to lay out analyses
- to explore the data for surprises
- predicted values
- signal + noise?
- ...

4

5 Finding the data.
Google with various key words: temperature, Recife, ...
"Eventually led" to: cdiac.ornl.gov/ftp/ndp041
Carbon Dioxide Information Analysis Center!
Had to discover Recife Curado station id Years
Searched an inappropriate site for a long time
(Looked at Brasil sites too, but that didn't turn up the data)

6 The web data
Notice -9999, replace by NA
File: recifecurado

7 How to handle missing values? Interpolate? Model? ...?
junk<-scan("recifecurado")
junk1<-matrix(junk,ncol=48)
junk2<-junk1[2:13,] # years in first row
series<-c(junk2)/10 # for degrees centigrade (divisor as on slide 35)
length(series[is.na(series)]) #17 - need to understand missingness
Interpolation
series1<-series
for(i in 2:(length(series)-1)){if(is.na(series[i]))series1[i]<-.5*series[i-1]+.5*series[i+1]}
Some values did not agree with Chatfield's
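The midpoint rule in the loop fills an isolated gap, but it gives NA again whenever two neighbouring months are both missing. A minimal sketch of one alternative, assuming linear interpolation in time is acceptable, uses `approx()` from base R. The toy vector `x` is made up for illustration and is not the Recife file.

```r
# Toy series using the -9999 missing-value code (made-up values, not the Recife data)
x <- c(265, 270, -9999, 268, -9999, -9999, 250, 255)
x[x == -9999] <- NA            # recode the sentinel as NA
x <- x / 10                    # tenths of a degree to degrees C

n_missing <- sum(is.na(x))     # how many months are missing
n_missing

# Linear interpolation across gaps of any length
tt <- seq_along(x)
ok <- !is.na(x)
x_filled <- approx(tt[ok], x[ok], xout = tt)$y
x_filled
```

For a single missing month this reproduces the average of the two neighbours; for a run of missing months it draws a straight line between the nearest observed values.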

8 plot(xaxis,series1,type="l",xlab="year",ylab="mean temp (degrees C)",las=1)
title("Mean monthly temperatures Recife Curado")
abline(h=mean(series1))

9 There is seasonality and variability
Restricted range in mid-sixties - nonconstant mean level?
ylim<-range(series1)
par(mfrow=c(2,1))
plot(lowess(xaxis,series1),type="l",ylim=ylim,xlab="year",ylab="degrees C",main="Smoothed Recife series")
abline(h=mean(series1))
junk20<-lowess(xaxis,series1)
plot(xaxis,series1-junk20$y,type="l",xlab="year",ylab="degrees C",main="Residuals")
abline(h=mean(series1-junk20$y))

10

11 par(mfrow=c(1,1))
acf(series1,las=1,xlab="lag(mo)",ylab="",main="autocorrelation recife temperatures",lag.max=50,ylim=c(-1,1))

12 More confirmation of period 12
Remember the interpretation of the error lines
Note that nearby values are highly correlated
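As a reminder of what the error lines mean: by default `acf()` draws approximate 95% limits for the sample autocorrelations of white noise, at plus or minus 1.96/sqrt(n). A small sketch for a series of 576 months, with a simulated white-noise series (not the Recife data) for comparison:

```r
n <- 576                               # 48 years of monthly values
bound <- qnorm(0.975) / sqrt(n)        # half-width of the white-noise band
round(bound, 4)

# Simulated white noise, for comparison only
set.seed(1)
w <- rnorm(n)
r <- acf(w, lag.max = 50, plot = FALSE)$acf[-1]   # drop lag 0
mean(abs(r) > bound)                   # fraction of lags falling outside the band
```

For white noise roughly one lag in twenty is expected outside the band by chance, so many large excursions, as in the Recife plot, point away from white noise.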

13 spectrum(series1,xlab="frequency (cycles/month)",las=1)

14 Note peaks at frequency 1/12 and harmonics
Further confirmation of period 12
Note log scale for y-axis
Note vertical line in upper right: gives uncertainty
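To see why a 12-month cycle shows up at frequency 1/12 and its harmonics, here is a sketch on a simulated monthly series. The amplitudes and noise level are arbitrary choices for illustration, not estimates from the Recife data.

```r
set.seed(2)
n <- 576
tt <- 1:n
# Simulated series: annual cycle, a weaker 6-month harmonic, and noise
x <- 2 * cos(2 * pi * tt / 12) + 0.5 * cos(2 * pi * tt / 6) + rnorm(n, sd = 0.5)

sp <- spec.pgram(x, taper = 0, plot = FALSE)   # raw periodogram
peak_freq <- sp$freq[which.max(sp$spec)]
peak_freq        # cycles per month
1 / peak_freq    # implied period in months
```

Because 576 is a multiple of 12, the frequency 1/12 falls exactly on a Fourier frequency, which keeps the peak sharp.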

15 What is the shape of the seasonal?
junk4<-matrix(series1,nrow=12)
junk5<-apply(junk4,1,mean)
plot(junk5,type="l",las=1)
abline(h=mean(junk5))

16 Cooler in July-Aug Southern Hemisphere Uncertainty?

17 Cooler in July-August. Southern hemisphere
Part of a longer cycle?
El Niño explanatory?
After "removing" trend middle has been pulled up
Need uncertainties
Back to original data

18 Remove seasonal
series2<-series1
for(i in 1:48){ for(j in 1:12){ series2[(i-1)*12+j]<-series1[(i-1)*12+j]-junk5[j] }}
par(mfrow=c(2,1))
plot(xaxis,series2,type="l",xlab="year",ylab="residual",main="Series after removing seasonal",las=1)
abline(h=0)
ylim<-range(series2)
plot(xaxis,series1-mean(series1),type="l",xlab="year",ylab="degrees C",main="Mean removed series",las=1,ylim=ylim)
abline(h=mean(series1-mean(series1)))
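The double loop can also be written without explicit indexing. A sketch on simulated data (the series `x` is made up), assuming the series starts in January and covers whole years so that the 12 monthly means line up with each year:

```r
set.seed(3)
nyears <- 48
x <- 25 + 2 * cos(2 * pi * (1:(12 * nyears)) / 12) + rnorm(12 * nyears, sd = 0.5)

monthly_mean <- rowMeans(matrix(x, nrow = 12))     # one mean per calendar month
adjusted <- x - rep(monthly_mean, times = nyears)  # subtract the matching month's mean

# Same result as the explicit double loop
adjusted_loop <- x
for (i in 1:nyears) for (j in 1:12) {
  k <- (i - 1) * 12 + j
  adjusted_loop[k] <- x[k] - monthly_mean[j]
}
all.equal(adjusted, adjusted_loop)
c(var(x), var(adjusted))                           # variance before and after
```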

19 Original variance 1.342, adjusted .248
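One way to read these two numbers is as a share of variance accounted for by the monthly means. A short worked calculation using only the values on the slide:

```r
v_original <- 1.342   # variance of the series (from the slide)
v_adjusted <- 0.248   # variance after removing monthly means (from the slide)
share <- 1 - v_adjusted / v_original
round(share, 3)       # share of variance attributed to the seasonal means
```

So about 82% of the variance goes with the seasonal means, leaving a residual standard deviation of about half a degree C.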

20 par(mfrow=c(2,1))
acf(series2,lag.max=50,las=1,xlab="lag (mo)",main="Adjusted by removing monthly means")
acf(diff(series1,lag=12),lag.max=50,xlab="lag (mo)",main="Order 12 differenced series")

21 Work remains
Frequency domain analysis.
par(mfrow=c(2,1))
junk9<-spec.pgram(series1,taper=0,detrend=F,demean=F,spans=5,plot=F)
ylim<-range(junk9$spec)
junk9<-spec.pgram(series1,taper=0,detrend=F,demean=F,spans=5,xlab="frequency (cycles/mo)",las=1,main="Original series")
junk10<-spec.pgram(series2,taper=0,detrend=F,demean=F,spans=5,ylim=ylim,main="Monthly means removed",las=1)

22 Work remains on seasonal
Residual "not" white noise
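One way to put a number on "not white noise" is a portmanteau test. A minimal sketch using `Box.test()` from the stats package on simulated series; the AR(1) coefficient 0.7 and the lag 24 are arbitrary illustrative choices, and neither series is the Recife residual.

```r
set.seed(4)
e  <- arima.sim(model = list(ar = 0.7), n = 500)   # simulated, autocorrelated
wn <- rnorm(500)                                   # simulated white noise

bt_ar <- Box.test(e,  lag = 24, type = "Ljung-Box")
bt_wn <- Box.test(wn, lag = 24, type = "Ljung-Box")
bt_ar$p.value   # small: evidence against white noise
bt_wn$p.value
```

When the test is applied to residuals from a fitted ARMA model, the `fitdf` argument of `Box.test()` should be set to the number of fitted ARMA coefficients.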

23 Time domain distributions

24 par(mfrow=c(2,2))
boxplot(series2,main="Seasonally adjusted temps")
hist(series2,breaks=15,xlab="temperature",las=1,ylab="count",main="Seasonally adjusted temps")
plot(density(series2),las=1,xlab="temperature",ylab="density",main="")
library(MASS)
Junk1<-kde2d(series2[1:575],series2[2:576])
image(Junk1)
points(series2[1:575],series2[2:576])

25 Parametric model. SARIMA ?
Thinking about prediction, consider
Y_t = αY_{t-1} + βY_{t-12} + N_t
with some ARMA for N_t
Check seasonal residuals for normality
Hope to end up with white noise

26 Junk<-arima(series1,order=c(1,0,1),seasonal=list(order=c(1,0,1),period=12))
Call: arima(x = series1, order = c(1, 0, 1), seasonal = list(order = c(1, 0, 1), period = 12))
Coefficients: ar ma1 sar1 sma1 intercept
s.e
sigma^2 estimated as : log likelihood = , aic =

27

28 tsdiag(Junk,gof.lag=25)

29

30 Junk<-arima(series1,order=c(1,0,1),seasonal=list(order=c(1,0,1),period=12))
postscript(file="recifeplots1a.ps",paper="letter",hor=T)
Junk2<-predict(Junk,n.ahead=24)
Junk3<-c(series1,Junk2$pred)
Junk3a<-c(rep(0,576),2*Junk2$se)
Junk3b<-c(rep(0,576),-2*Junk2$se)
Junk4a<-Junk3+Junk3a;Junk4b<-Junk3+Junk3b
ylim<-range(Junk4a,Junk4b)
par(mfrow=c(1,1))
xaxis1<-1941+(1:length(Junk3)/12)
plot(xaxis1[xaxis1>1983],Junk4a[xaxis1>1983],type="l",las=1,ylim=ylim,col="red",xlab="year",ylab="degrees C",main="Data + predictions")
lines(xaxis1[xaxis1>1983],Junk4b[xaxis1>1983],col="red")
lines(xaxis1[xaxis1>1983],Junk3[xaxis1>1983],col="blue")
lines(xaxis[xaxis>1983],series1[xaxis>1983])

31 Chatfield. "... we have found little of interest apart from what is evident in the time plot." He worked with years

32 Two series
Bivariate case {X_t, Y_t} - jointly distributed
Linear time invariant / transfer function model
nonparametric/parametric approaches

33 Southern Oscillation Index
El Niño: global coupled ocean-atmosphere phenomenon.
The Pacific Ocean signatures, El Niño and La Niña, are important temperature fluctuations in surface waters of the tropical eastern Pacific Ocean.

34 Southern Oscillation reflects monthly or seasonal fluctuations in the air pressure difference between Tahiti and Darwin

35 junk<-scan("recifecurado")
junk1<-matrix(junk,ncol=48)
junk6<-junk1[1,] # years in first row
junk1<-junk1[,junk6>1950]
junk2<-junk1[2:13,]
series<-c(junk2)/10
length(series[is.na(series)]) #13
xaxis<-1951+(1:length(series)/12)
series1<-series
for(i in 2:(length(series)-1)){if(is.na(series[i]))series1[i]<-.5*series[i-1]+.5*series[i+1]}
# monthly means taken after interpolation, so that NAs do not propagate into them
junk4<-matrix(series1,nrow=12)
junk5<-apply(junk4,1,mean)
series2<-series1
for(i in 1:38){ for(j in 1:12){ series2[(i-1)*12+j]<-series1[(i-1)*12+j]-junk5[j]}}

36 kunk<-scan("SOIa.dat")
kunk1<-matrix(kunk,ncol=58); kunk6<-kunk1[1,]
kunk1<-kunk1[,kunk6<1989]
kunk2<-kunk1[2:13,]
teries<-c(kunk2)
length(teries[is.na(teries)]) #0
teries1<-teries; teries2<-teries1
postscript(file="recifeplots3.ps",paper="letter",hor=T)
par(mfrow=c(2,1))
plot(xaxis,series2,type="l",las=1,xlab="year",ylab="",main="Seasonally adjusted Recife temps")
plot(xaxis,teries2,type="l",las=1,xlab="year",ylab="",main="Southern Oscillation Index")

37 postscript(file="recifeplots2.ps",paper="letter",hor=F)
par(mfrow=c(1,1))
acf(cbind(series2,teries2))

38

39 junk10<-cbind(series2,teries2)
junk11<-spec.pgram(junk10,plot=F,taper=0,detrend=F,demean=F,spans=11)
par(mfcol=c(2,2))
plot(junk11$freq,10**(.1*junk11$spec[,2]),log="y",main="SOI spectrum",xlab="frequency",ylab="",las=1,type="l")
plot(junk11$freq,junk11$coh,main="Coherence",xlab="frequency",ylab="",las=1,ylim=c(0,1),type="l")
junkh<-1-(1-.95)**(1/(.5*junk11$df-1))
abline(h=junkh)
plot(junk11$freq,10**(.1*junk11$spec[,1]),log="y",main="Seasonally corrected Recife spectrum",xlab="frequency",ylab="",las=1,type="l")
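The horizontal line `junkh` is the 95% point of the estimated squared coherence when the true coherence is zero: with L = df/2 periodogram values effectively averaged, the level is 1 - (1 - 0.95)^(1/(L - 1)). A sketch of how that level changes with the degrees of freedom; the helper name `coh_level` and the df values are illustrative assumptions, not taken from the Recife/SOI fit.

```r
# Null 95% level for squared coherence, given the equivalent degrees of freedom
coh_level <- function(df, p = 0.95) 1 - (1 - p)^(1 / (0.5 * df - 1))

dfs <- c(6, 10, 22, 40)          # illustrative values only
levels <- coh_level(dfs)
data.frame(df = dfs, level = levels)
```

More smoothing (larger `spans`, hence larger df) lowers the line, so smaller coherences become distinguishable from zero, at the cost of frequency resolution.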

40 SARIMAX
Y_t = αY_{t-1} + βY_{t-12} + γX_t + N_t
Junk1<-arima(series1,order=c(1,0,1),seasonal=list(order=c(1,0,1),period=12),xreg=teries1)
Coefficients: ar ma1 sar1 sma1 intercept teries1
s.e
sigma^2 estimated as : log likelihood = , aic =

41 The Box-Jenkins Airline Data
International airline passengers
Monthly totals (thousands)
January 1949-December 1960
n = 144
Brown, R. G. (1962) Smoothing, Forecasting and Prediction of Discrete Time Series. Prentice-Hall.

42

43

44

45 postscript(file="airline.ps",paper="letter")
data(AirPassengers)
y<-c(AirPassengers)
xaxis<-1949+c(1:length(y))/12
plot(xaxis,y,type="l",xlab="year",ylab="Passengers in thousands",main="BJ airline data",las=1)
plot(xaxis,y,type="l",xlab="year",ylab="Passengers in thousands",main="BJ airline data",las=1,log="y")
Y<-log(y)
plot(xaxis,Y,type="l",xlab="year",ylab="log(Passengers)",main="BJ airline data",las=1)

46

47

48 Y1<-diff(Y)
plot(xaxis[1:length(Y1)],Y1,type="l",xlab="year",ylab="diff(log(Passengers))",main="BJ airline data",las=1)
abline(h=0,lty=3)
Y2<-diff(Y1,12)
plot(xaxis[1:length(Y2)],Y2,type="l",xlab="year",ylab="diff12(diff(log(Passengers)))",main="BJ airline data",las=1)
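A quick check on what the two differencing steps do to the series length, using the `AirPassengers` data that ships with R. This is a sketch for orientation, not part of the original analysis.

```r
Y  <- log(c(AirPassengers))      # 144 monthly values on the log scale
Y1 <- diff(Y)                    # lag-1 difference
Y2 <- diff(Y1, lag = 12)         # then lag-12 difference
c(length(Y), length(Y1), length(Y2))

# The order of the two differences does not matter
same <- all.equal(Y2, diff(diff(Y, lag = 12)))
same
```

Thirteen values are lost in all, which is why the plots index `xaxis` by the length of the differenced series.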

49 fit <- arima(Y, order=c(0,1,1),seasonal = list(order=c(0,1,1),period=12))
Call: arima(x = Y, order = c(0, 1, 1), seasonal = list(order = c(0, 1, 1), period = 12))
Coefficients: ma1 sma1
s.e
sigma^2 estimated as : log likelihood = 244.7, aic =
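The printed fit can also be taken apart programmatically. A sketch, using the `AirPassengers` data that ships with R, of pulling the estimates, approximate standard errors and log likelihood out of the fitted object:

```r
Y <- log(c(AirPassengers))
fit <- arima(Y, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

est <- coef(fit)                   # named vector: ma1, sma1
se  <- sqrt(diag(fit$var.coef))    # approximate standard errors
cbind(estimate = est, s.e. = se, z = est / se)
fit$loglik                         # the slide reports 244.7
fit$sigma2                         # innovations variance estimate
```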

50

51

52 junk<-spectrum(fit$resid)
junkh<-10*log10(mean(10^(junk$spec/10)))
abline(h=junkh,col="red")
acf(fit$resid)

53 tsdiag(fit)

54

55 fit1<-arima(Y[1:108],order=c(0,1,1),seasonal = list(order=c(0,1,1),period=12))
pred<-predict(fit1,n.ahead=36)
Y3<-Y
Y3[109:144]<-pred$pred
Y4<-Y
Y4[109:144]<-pred$pred+2*pred$se
Y5<-Y
Y5[109:144]<-pred$pred-2*pred$se
plot(xaxis,Y4,type="n",xlab="year",ylab="log(passengers in thousands)",main="BJ airline data",las=1)
lines(xaxis,Y5,col="red")
lines(xaxis,Y3,col="blue")
lines(xaxis,Y)
lines(xaxis,Y4,col="red")
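Since the last 36 months were held out, the plot can be backed by two simple numbers: how many of the held-out values fall inside the plus or minus 2 s.e. band, and the forecast error on the log scale. A sketch under the same model and the same 108/36 split; the summary measures are an illustrative choice, not from the slides.

```r
Y <- log(c(AirPassengers))
fit1 <- arima(Y[1:108], order = c(0, 1, 1),
              seasonal = list(order = c(0, 1, 1), period = 12))
pred <- predict(fit1, n.ahead = 36)

p  <- as.numeric(pred$pred)        # point forecasts for months 109..144
se <- as.numeric(pred$se)          # forecast standard errors
lower <- p - 2 * se
upper <- p + 2 * se
held_out <- Y[109:144]

inside <- held_out >= lower & held_out <= upper
mean(inside)                       # share of held-out months inside the band
sqrt(mean((held_out - p)^2))       # root mean squared error, log scale
```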

