
Testing for equal variance. Scale family: $Y = sX$ with $s > 0$, so $G(x) = P(sX \le x) = F(x/s)$. To compute the inverse, let $y = G(x) = F(x/s)$, so $x/s = F^{-1}(y)$ and $x = G^{-1}(y) = sF^{-1}(y)$. The shift function is then $\Delta(x) = G^{-1}(F(x)) - x = sF^{-1}(F(x)) - x = (s-1)x$.
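To see the relation $\Delta(x) = (s-1)x$ numerically, here is a minimal R sketch on simulated exponential data (not the slides' samples). It evaluates the empirical shift function at the deciles and fits a line through the origin; the function names `shift_function` and `shift_slope` and the choice of deciles are illustrative assumptions.

```r
# Empirical shift function: Delta(q) = G^{-1}(q) - F^{-1}(q),
# evaluated at a grid of probabilities q (here the deciles).
shift_function <- function(x, y, probs = seq(0.1, 0.9, by = 0.1)) {
  qx <- quantile(x, probs, names = FALSE)   # estimates F^{-1}(q)
  qy <- quantile(y, probs, names = FALSE)   # estimates G^{-1}(q)
  data.frame(qx = qx, delta = qy - qx)
}

# Hypothetical helper: under the scale model Delta(x) = (s - 1) x, so the
# least-squares slope through the origin estimates s - 1.
shift_slope <- function(x, y, probs = seq(0.1, 0.9, by = 0.1)) {
  sf <- shift_function(x, y, probs)
  unname(coef(lm(delta ~ 0 + qx, data = sf)))
}

set.seed(1)
x <- rexp(200)           # X ~ F, simulated exponential sample
y <- 0.5 * rexp(200)     # independent sample from the scale family with s = 0.5
slope <- shift_slope(x, y)
c(slope = slope, scale_ratio = slope + 1)
```

Under a pure scale model the slope estimates s - 1, so slope + 1 estimates the scale ratio s; the fit goes through the origin because $\Delta(x) = (s-1)x$ has no intercept.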

Shift plot. The blue lines have slopes between -0.65 and -0.20; since the slope of the shift function is s - 1, the CI for the scale ratio s is (0.35, 0.80).
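The slides do not say how the blue band was obtained. As a swapped-in technique (an assumption, not the slides' method), the sketch below forms a percentile-bootstrap interval for the scale ratio from the shift-function slope, and also applies the conversion s = 1 + slope to the slope interval quoted above. `scale_ratio_hat` and `boot_scale_ci` are hypothetical names, and the data are simulated.

```r
# Scale-ratio estimate: least-squares slope of the empirical shift function
# through the origin, plus one (since Delta(x) = (s - 1) x).
scale_ratio_hat <- function(x, y, probs = seq(0.1, 0.9, by = 0.1)) {
  qx <- quantile(x, probs, names = FALSE)
  qy <- quantile(y, probs, names = FALSE)
  slope <- sum(qx * (qy - qx)) / sum(qx^2)
  slope + 1
}

# Percentile bootstrap: resample each sample independently, re-estimate s.
boot_scale_ci <- function(x, y, B = 2000, level = 0.95) {
  est <- replicate(B, scale_ratio_hat(sample(x, replace = TRUE),
                                      sample(y, replace = TRUE)))
  alpha <- (1 - level) / 2
  quantile(est, c(alpha, 1 - alpha), names = FALSE)
}

set.seed(2)
x <- rexp(150)          # simulated, not the slides' data
y <- 0.5 * rexp(150)
ci <- boot_scale_ci(x, y)
ci

# Converting the slide's slope interval into a scale-ratio interval.
c(-0.65, -0.20) + 1
```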

Assumptions: iid samples; a scale family; moderately large samples are needed.

Testing equal variance for distributions with equal locations. When we rank m X-values and n Y-values together, the average rank is (m+n+1)/2. If F (the distribution of X) is more spread out than G and the locations are the same, the X-values would tend to have more very large and very small ranks, i.e., larger deviations from the mean rank. One way to get at this is to assign rank 1 to the smallest and largest values, 2 to the second smallest and second largest, and so on in towards the middle.
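The scoring rule above can be written compactly: the score at position r is min(r, N + 1 - r), which equals (N + 1)/2 minus the absolute deviation of r from the mean rank (N + 1)/2. A small R sketch, with hypothetical helper names and assuming no ties:

```r
# Ansari-Bradley scores for positions 1..N of the combined ordered sample:
# 1 for the smallest and largest, 2 for the next pair inward, and so on.
ab_scores <- function(N) {
  i <- seq_len(N)
  pmin(i, N + 1 - i)
}

ab_scores(6)   # 1 2 3 3 2 1
ab_scores(7)   # 1 2 3 4 3 2 1

# Equivalent form: (N + 1)/2 minus the distance of the rank from the mean rank.
N <- 7
(N + 1) / 2 - abs(seq_len(N) - (N + 1) / 2)   # 1 2 3 4 3 2 1

# Score each observation of a combined sample by its rank (assumes no ties).
score_sample <- function(z) ab_scores(length(z))[rank(z)]
score_sample(c(10, 1, 5, 3))   # 1 1 2 2
```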

The Ansari-Bradley test. Compute the sum of the X-scores as $W = \sum_{i=1}^{p} i\,1_i^X + \sum_{i=p+1}^{m+n} (m+n+1-i)\,1_i^X$, where $p = \lfloor (m+n+1)/2 \rfloor$ and $1_i^X$ is the indicator that the $i$th observation in the combined ordered sample is an X. Small values of W correspond to F being more dispersed. In practice, align the locations first.
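A direct R transcription of W, assuming no ties and using the hypothetical name `ab_statistic`. Base R's `ansari.test()` reports a statistic named `AB`; for untied data it should coincide with W, so the example prints both on simulated normal samples.

```r
# Ansari-Bradley statistic as defined above (assumes no ties):
# positions i <= p score i, positions i > p score (m + n + 1 - i).
ab_statistic <- function(x, y) {
  m <- length(x); n <- length(y); N <- m + n
  p <- floor((N + 1) / 2)
  # Indicator 1_i^X: is the i-th smallest value of the combined sample an X?
  is_x <- c(rep(TRUE, m), rep(FALSE, n))[order(c(x, y))]
  i <- seq_len(N)
  sum(i[i <= p] * is_x[i <= p]) + sum((N + 1 - i[i > p]) * is_x[i > p])
}

set.seed(3)
x <- rnorm(10, sd = 3)   # more dispersed sample (simulated)
y <- rnorm(12, sd = 1)
W <- ab_statistic(x, y)
W
ansari.test(x, y)$statistic   # base R's "AB" statistic, for comparison
```

Placing the X's at both extremes gives the smallest possible W: with X = (-100, 100) and Y = (-1, 0, 1), W = 1 + 1 = 2.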

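Null distribution. Let $f(w,m,n)$ be the number of orderings of m X's (coded 1) and n Y's (coded 0) that yield the statistic value $W = w$. Assume $m+n = 2N$ is even. Now add one more observation so that there are m X's, n+1 Y's and 2N+1 positions: the middle position N+1 has score N+1, and the other positions carry the same scores as in the sample of size 2N. If the middle observation is a Y there are $f(w,m,n)$ ways, while if it is an X it contributes N+1 to W and there are $f(w-N-1,m-1,n+1)$ ways. Thus we get the recursion $f(w,m,n+1) = f(w,m,n) + f(w-N-1,m-1,n+1)$.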
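For small samples the recursion can be checked against brute-force enumeration. The sketch below uses hypothetical helper names; `combn` enumerates all placements of the X's, so it is only practical for small m + n.

```r
# Ansari-Bradley scores for N ordered positions: 1, 2, ..., 2, 1.
ab_scores <- function(N) {
  i <- seq_len(N)
  pmin(i, N + 1 - i)
}

# f(w, m, n): number of orderings of m X's and n Y's with W = w, by brute-force
# enumeration of all choose(m + n, m) placements of the X's.
f_count <- function(w, m, n) {
  if (m < 0 || n < 0) return(0)
  if (m == 0) return(as.numeric(w == 0))    # only W = 0 is possible
  N <- m + n
  pos <- combn(N, m)                        # each column holds the X positions
  w_all <- colSums(matrix(ab_scores(N)[pos], nrow = m))
  sum(w_all == w)
}

# Check f(w, m, n + 1) = f(w, m, n) + f(w - N - 1, m - 1, n + 1) for m + n = 2N.
check_recursion <- function(m, n) {
  stopifnot((m + n) %% 2 == 0)
  half <- (m + n) / 2                       # this is N on the slide
  ws <- 0:(m * (half + 1))                  # covers every attainable value of W
  lhs <- sapply(ws, f_count, m = m, n = n + 1)
  rhs <- sapply(ws, function(w) f_count(w, m, n) + f_count(w - half - 1, m - 1, n + 1))
  all(lhs == rhs)
}

check_recursion(3, 3)
check_recursion(2, 4)
```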

Null distribution, cont. The null mean is $E(W) = m(m+n+2)/4$ (for m+n even). In R: ansari.test(x, y). On the exponential samples, after subtracting each sample's median: $p = 5 \times 10^{-8}$, CI for the ratio of scales (0.40, 0.60), estimate 0.49.
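Because each position is equally likely to hold an X under the null hypothesis, E(W) is m times the average score; the sketch checks this against m(m+n+2)/4 and then shows the median-centering step before `ansari.test()`. The exponential samples are simulated stand-ins (true scale ratio 2, x relative to y), not the slides' data, so the printed numbers will differ from those above; `expected_W` is a hypothetical name.

```r
ab_scores <- function(N) {
  i <- seq_len(N)
  pmin(i, N + 1 - i)
}

# Under H0 each of the m + n positions is equally likely to hold an X, so
# E(W) = m * (mean score); for m + n even this is m (m + n + 2) / 4.
expected_W <- function(m, n) m * mean(ab_scores(m + n))

expected_W(3, 3)   # 6 = 3 * 8 / 4
expected_W(4, 6)   # 12 = 4 * 12 / 4

# Simulated stand-ins for two exponential samples (scale 1 versus scale 1/2).
set.seed(4)
x <- rexp(40, rate = 1)
y <- rexp(40, rate = 2)

# Align locations by subtracting each sample's median, then run the test.
res <- ansari.test(x - median(x), y - median(y), conf.int = TRUE)
res$p.value
res$conf.int   # interval for the ratio of scales
res$estimate
```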

Assumptions: iid samples; known difference between the locations. “No rank test (i.e., a test invariant under strictly increasing transformation of the scale) can hope to be a satisfactory test against dispersion alternatives without some sort of strong restrictions (e.g., equal or known medians) being placed on the class of admissible distribution pairs.” (Moses, 1963)

Another rank test of variability: Siegel-Tukey. The sum of the green ranks is 14; subtracting 4×5/2 = 10 gives 4, which we compare to the Mann-Whitney distribution: P-value = 2 × … = 0.19. For the exponential samples the P-value is …
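The Siegel-Tukey scores alternate between the ends of the ordered sample (1 to the smallest, 2 and 3 to the two largest, 4 and 5 to the next two smallest, and so on), after which the X rank sum minus m(m+1)/2 is referred to the Mann-Whitney distribution. A sketch in R, assuming no ties and equal (or pre-aligned) medians; `siegel_tukey_scores` and `siegel_tukey_test` are hypothetical names, and `wilcox.test()` on the scores supplies the Mann-Whitney reference distribution.

```r
# Siegel-Tukey scores for positions 1..N of the combined ordered sample,
# alternating ends two at a time after giving 1 to the smallest value.
siegel_tukey_scores <- function(N) {
  s <- numeric(N)
  lo <- 1; hi <- N; k <- 1
  s[lo] <- k; lo <- lo + 1; k <- k + 1      # rank 1 goes to the smallest value
  take_high <- TRUE
  while (lo <= hi) {
    for (j in 1:2) {                        # assign two ranks at this end
      if (lo > hi) break
      if (take_high) { s[hi] <- k; hi <- hi - 1 } else { s[lo] <- k; lo <- lo + 1 }
      k <- k + 1
    }
    take_high <- !take_high                 # switch ends
  }
  s
}

# Sum the X-scores, subtract m(m+1)/2 and compare with the Mann-Whitney
# distribution (via wilcox.test on the scores).
siegel_tukey_test <- function(x, y) {
  m <- length(x)
  sc <- siegel_tukey_scores(m + length(y))[rank(c(x, y))]   # assumes no ties
  sx <- sc[seq_len(m)]
  list(rank_sum = sum(sx),
       U = sum(sx) - m * (m + 1) / 2,       # e.g. 14 - 4 * 5 / 2 = 4 on the slide
       wilcox = wilcox.test(sx, sc[-seq_len(m)]))
}

set.seed(5)
x <- rnorm(8, sd = 3)                       # simulated, more dispersed
y <- rnorm(10)
st <- siegel_tukey_test(x - median(x), y - median(y))   # align medians first
st$U
st$wilcox$p.value
```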

NOAA State of the Climate web site

State of the Climate 2008

Shen et al. (2012): 1921, 4th warmest; the range runs from 2nd warmest to 14th warmest.

So we don't really know which is the fourth warmest year. But we have standard errors for each year: can we use them to assess the uncertainty in the ranks?

Simple approach: draw independent normal random numbers with the right mean and sd for each year, rank them, and repeat to get an ensemble of paths. R code: content/uploads/2012/10/Uncertainty-analysis.txt
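The simple approach can be sketched in R as follows. The anomalies and standard errors are made-up illustrative values, not NOAA or Hadley data, and the draws for different years are independent, as on this slide; `simulate_ranks` is a hypothetical name.

```r
# For each simulation draw anomaly ~ N(mean, se) for every year, rank the
# years (1 = warmest), and collect the ranks in an nsim x nyears matrix.
simulate_ranks <- function(anomaly, se, nsim = 10000) {
  n <- length(anomaly)
  ranks <- replicate(nsim, {
    draw <- rnorm(n, mean = anomaly, sd = se)
    rank(-draw)                          # rank 1 = warmest
  })
  t(ranks)                               # rows = simulations, columns = years
}

# Hypothetical illustrative inputs.
years   <- 2001:2006
anomaly <- c(0.40, 0.46, 0.47, 0.44, 0.49, 0.43)
se      <- rep(0.05, 6)

set.seed(6)
r <- simulate_ranks(anomaly, se, nsim = 5000)
colnames(r) <- years
# Rank distribution: rows = rank, columns = year, entries = proportions.
rank_dist <- apply(r, 2, function(col) tabulate(col, nbins = length(years))) / nrow(r)
round(rank_dist, 3)
```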

Rank distribution

But aren't years dependent? Autocorrelation is the correlation of a series with itself shifted over by some lag.
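Autocorrelation at lag k compares the series with a copy of itself shifted by k steps. This sketch computes it by hand with the same normalization that `stats::acf` uses (dividing by the full-sample sum of squares), on a simulated random walk; `autocor` is a hypothetical helper.

```r
# Lag-k sample autocorrelation:
# r_k = sum_{t=1}^{n-k} (x_t - xbar)(x_{t+k} - xbar) / sum_t (x_t - xbar)^2
autocor <- function(x, k) {
  n <- length(x)
  d <- x - mean(x)
  sum(d[1:(n - k)] * d[(1 + k):n]) / sum(d^2)
}

set.seed(7)
x <- cumsum(rnorm(100))      # a strongly autocorrelated (random-walk) series
autocor(x, 1)
acf(x, lag.max = 5, plot = FALSE)
```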

Lagged plots

Autoregression. Idea: predict the current value from previous values. A k'th-order autoregression uses the previous k values: $X_t = \phi_1 X_{t-1} + \dots + \phi_k X_{t-k} + \epsilon_t$. R commands: library(forecast), acf(series), ar(series).
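A sketch of fitting an autoregression with base R's `ar()`, on a simulated AR(2) series with coefficients 0.5 and -0.3 (arbitrary illustrative values). `ar()` and `acf()` come from the stats package, so `library(forecast)` is not needed for this step; the order is fixed at 2 here rather than chosen by AIC.

```r
# Simulate X_t = 0.5 X_{t-1} - 0.3 X_{t-2} + e_t.
set.seed(8)
x <- arima.sim(model = list(ar = c(0.5, -0.3)), n = 5000)

# ar() fits by Yule-Walker by default; fix the order at 2.
fit <- ar(x, order.max = 2, aic = FALSE)
fit$ar          # estimated coefficients, close to 0.5 and -0.3

# Predict the next value from the most recent values of the series.
pred <- predict(fit, newdata = x, n.ahead = 1)
pred$pred
```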

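Moving average. Idea: the current value is a weighted sum of the current and previous errors. A moving average of order k is $X_t = \epsilon_t + \theta_1 \epsilon_{t-1} + \dots + \theta_k \epsilon_{t-k}$. R: auto.arima(series) (forecast package) selects the orders automatically.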
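An MA(1) sketch, assuming θ1 = 0.6 for illustration: the sample ACF should be near θ/(1 + θ²) ≈ 0.44 at lag 1 and near zero beyond, and `stats::arima` should recover θ. The order is specified by hand here instead of being searched by `auto.arima()`.

```r
# Simulate X_t = e_t + 0.6 e_{t-1}.
set.seed(9)
x <- arima.sim(model = list(ma = 0.6), n = 5000)

# The ACF of an MA(q) process cuts off after lag q (here q = 1).
r <- acf(x, lag.max = 4, plot = FALSE)$acf[2:5]
round(r, 2)     # lag 1 near 0.6 / (1 + 0.6^2), lags 2-4 near 0

# Maximum-likelihood fit of the MA(1) model.
fit <- arima(x, order = c(0, 0, 1), include.mean = FALSE)
coef(fit)       # "ma1", close to 0.6
```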

ARIMA models (George Box and Gwilym Jenkins). We have already seen AR and MA models. ARIMA(0,1,0): $X_t = X_{t-1} + \epsilon_t$, or $\epsilon_t = X_t - X_{t-1}$; this is differencing, and it can be iterated. In ARIMA(p,1,q) the differenced series $X_t - X_{t-1}$ follows an ARMA(p,q) model.
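Differencing in R: the sketch shows that differencing a random walk returns its errors, and that fitting ARIMA(1,1,0) to a series amounts to fitting AR(1) to its first differences. The AR coefficient 0.5 is an arbitrary illustrative value.

```r
set.seed(10)
e <- rnorm(500)
x <- cumsum(e)              # ARIMA(0,1,0): X_t = X_{t-1} + e_t (random walk)
all.equal(diff(x), e[-1])   # differencing recovers the errors

# ARIMA(1,1,0): the differenced series follows an AR(1) = ARMA(1,0) model.
y <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.5), n = 5000)
fit <- arima(y, order = c(1, 1, 0))
coef(fit)                   # "ar1", close to 0.5

# Same model viewed as ARMA(1,0) on the first differences.
arima(diff(y), order = c(1, 0, 0), include.mean = FALSE)
```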

Why worry? In climate contexts we are often interested in fitting trends. Here is a sequence of slope fits (°C/y, with sd) to US monthly average temperature: OLS (***), WLS (***), GLS with AR(4) errors (*) and GLS with ARMA(3,1) errors (no stars). The same data, increasingly realistic models: the significance disappears.
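The effect can be reproduced on simulated data (not the US temperature series). The sketch fits a linear trend by OLS and by maximum likelihood with AR(1) errors, using `arima(..., xreg = )` as a stand-in for the GLS fits above; AR(1) replaces AR(4) and ARMA(3,1) only to keep it short, and the trend and noise values are hypothetical. With positively autocorrelated errors the naive OLS standard error is too small.

```r
set.seed(11)
n  <- 600                                  # e.g. 50 years of monthly values
tt <- seq_len(n) / 12                      # time in years
noise <- arima.sim(model = list(ar = 0.8), n = n, sd = 0.5)
y  <- 0.002 * tt + noise                   # small trend plus autocorrelated noise

# OLS: treats the errors as independent.
ols <- lm(y ~ tt)
se_ols <- summary(ols)$coefficients["tt", "Std. Error"]

# Linear trend with AR(1) errors, fitted by maximum likelihood.
arfit <- arima(y, order = c(1, 0, 0), xreg = cbind(trend = tt))
se_ar <- sqrt(diag(arfit$var.coef))["trend"]

c(slope_ols = unname(coef(ols)["tt"]), se_ols = se_ols,
  slope_ar = unname(coef(arfit)["trend"]), se_ar = unname(se_ar))
```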

Does dependence matter? (Figure panels: iid structure; ARMA(3,1) structure.)

Effect of dependence (figure panels: independent; dependent).

Rank sd
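To see how dependence between years affects rank uncertainty, the sketch below modifies the simple simulation so that the standardized errors follow an AR(1) process across years. The value of phi, the anomalies and the standard errors are hypothetical; scaling by sqrt(1 - phi^2) keeps each year's error variance at se^2, so only the correlation changes. `rank_sd` is a hypothetical name.

```r
# Rank sd per year when each year's error is N(0, se^2) and the standardized
# errors are independent (phi = 0) or AR(1)-correlated across years.
rank_sd <- function(anomaly, se, phi = 0, nsim = 2000) {
  n <- length(anomaly)
  ranks <- replicate(nsim, {
    z <- if (phi == 0) {
      rnorm(n)
    } else {
      # AR(1) with unit innovations has variance 1 / (1 - phi^2); rescale to 1.
      as.numeric(arima.sim(list(ar = phi), n)) * sqrt(1 - phi^2)
    }
    rank(-(anomaly + se * z))               # rank 1 = warmest
  })
  apply(ranks, 1, sd)                       # one sd per year
}

# Hypothetical inputs, for illustration only.
anomaly <- c(0.40, 0.46, 0.47, 0.44, 0.49, 0.43)
se <- rep(0.05, 6)

set.seed(13)
rbind(independent = rank_sd(anomaly, se, phi = 0),
      dependent   = rank_sd(anomaly, se, phi = 0.8))
```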

Back to State of the Climate: “[…] was the warmest year in the period of record for the nation.”

We need to extrapolate the standard error: se(2012) ≈ 0.08. With anomaly(2012) = 1.7 and anomaly(1998) = …, the difference divided by 0.08 is ≈ 6!!!

And the uncertainty in the ranking of 2012 is...

NOAA State of the Climate 2014. The probability that 2014 was:
Warmest year on record: 48.0%
One of the five warmest years: 90.4%
One of the 10 warmest years: 99.2%
One of the 20 warmest years: 100.0%
Warmer than the 20th century average: 100.0%
Warmer than the average: 100.0%
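Probabilities like these can be read off a simulated rank ensemble as the fraction of simulations in which the year's rank is at most k. A minimal sketch with a tiny made-up rank matrix; `topk_probs` is a hypothetical name.

```r
# Given an nsim x nyears matrix of simulated ranks (1 = warmest), estimate
# P(rank <= k) for one year and several k.
topk_probs <- function(ranks, year_col, k = c(1, 5, 10, 20)) {
  r <- ranks[, year_col]
  setNames(sapply(k, function(kk) mean(r <= kk)), paste0("top", k))
}

# Tiny hypothetical ensemble: 4 simulations, 3 years.
ranks <- matrix(c(1, 2, 3,
                  2, 1, 3,
                  1, 3, 2,
                  1, 2, 3), ncol = 3, byrow = TRUE)
topk_probs(ranks, 1, k = c(1, 2))   # top1 = 0.75, top2 = 1
```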

IPCC report The latest IPCC report claimed that the last three decades were the warmest on record, based on global decadal averages. Using the Hadley Center series, we investigate this claim.
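One way to examine a claim about decades with the same simulation idea: average each simulated path within decades, rank the decades, and count how often the last three take the top three places. This is a sketch under assumptions (hypothetical helper `last3_warmest_prob`, made-up trend and noise values, independent errors, complete decades), not the analysis behind the slide.

```r
# Probability (over simulations) that the last three decades are the three
# warmest, after averaging each simulated path within decades.
last3_warmest_prob <- function(sims, years) {
  decade <- (years %/% 10) * 10
  dec_means <- t(apply(sims, 1, function(row) tapply(row, decade, mean)))
  nd <- ncol(dec_means)
  top3 <- apply(dec_means, 1, function(d) all(rank(-d)[(nd - 2):nd] <= 3))
  mean(top3)
}

# Made-up illustration: linear warming plus independent noise.
set.seed(12)
years <- 1950:2009
sims <- t(replicate(2000, rnorm(length(years), mean = 0.01 * (years - 1950), sd = 0.1)))
last3_warmest_prob(sims, years)
```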

Last year warmest on record? 2015 was widely reported as the warmest year on record for annual global average temperature. We use the Hadley temperature series to investigate this claim. Based on 100,000 simulations, 2015 is the warmest in all but 724 of them (about 99.3%), but it could be as low as the 6th warmest. Other candidates for warmest year are 2014, 2010, 2004 and […]. No year before 1997 was ranked warmest.
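Summaries such as "warmest in all but 724 of 100,000", "as low as 6th" and the list of other candidate years come straight from the rank ensemble. A sketch, assuming a simulations-by-years rank matrix with years as column names; `warmest_summary` and the small example matrix are hypothetical.

```r
# Summaries from an nsim x nyears matrix of simulated ranks (1 = warmest)
# whose column names are the years.
warmest_summary <- function(ranks, year) {
  r <- ranks[, as.character(year)]
  warmest <- colnames(ranks)[apply(ranks, 1, which.min)]   # warmest year per simulation
  list(times_not_warmest = sum(r != 1),
       lowest_rank = max(r),
       ever_warmest = sort(unique(warmest)))
}

# Tiny hypothetical ensemble: 3 simulations of 3 years.
ranks <- rbind(c(2, 1, 3),
               c(1, 2, 3),
               c(3, 1, 2))
colnames(ranks) <- c("2014", "2015", "2010")
warmest_summary(ranks, 2015)
```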