Hypothesis test in climate analyses Xuebin Zhang Climate Research Division.

Presentation transcript:

Hypothesis test in climate analyses Xuebin Zhang Climate Research Division

Useful books

Outline
Application of hypothesis tests in climate analyses, not a lecture on statistics
The statistical test of a hypothesis
– Concept, two types of errors
– The non-i.i.d. case
Multiple tests and global significance
– Independent tests
– Non-independent tests
An application
– Detection of anthropogenic climate change

Test of a hypothesis
Null hypothesis H0
Alternative hypothesis Ha
Two outcomes of a test
– Reject H0: we have strong evidence that H0 is false (but this does not imply acceptance of Ha)
– Failure to reject H0: the evidence in the sample is not inconsistent with H0 (but this does not imply acceptance of H0)
Only the case without Ha is considered here

The ingredients of a test
A data sample, or observations
Rules for the test:
– Observations x, a realization of a random vector X
– Significance level: the probability of rejecting H0 when it is true
– Test statistic

Type I and type II errors
Type I error
– Reject H0 when it is true
– Its probability is the significance level
Type II error
– Failure to reject H0 when it is false

An example
Linear regression:
Estimation with least squares
Null hypothesis: H0:
Test statistic:
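The formulas on this slide are not reproduced in the transcript. The following is a minimal sketch, assuming the standard simple regression y = a + b*x + e, the null hypothesis H0: b = 0, and the usual t statistic b_hat / se(b_hat) with n - 2 degrees of freedom. The function name and the data values are made up for illustration and are not from the presentation.

```python
import math

def slope_t_statistic(x, y):
    """Least-squares fit of y = a + b*x; returns (b_hat, t) for H0: b = 0."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    a_hat = y_mean - b_hat * x_mean
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    sse = sum((yi - a_hat - b_hat * xi) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)
    return b_hat, b_hat / se_b

# Hypothetical ten-point series (illustrative values only).
years = list(range(10))
temps = [0.1, 0.3, 0.2, 0.5, 0.4, 0.6, 0.9, 0.7, 1.0, 0.9]
b_hat, t_stat = slope_t_statistic(years, temps)
print(f"slope = {b_hat:.4f}, t = {t_stat:.2f}")
```

The t statistic is compared with the critical value of Student's t distribution with n - 2 degrees of freedom. That comparison is only valid if the residuals are i.i.d., which is the issue the following slides address.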

i.i.d.
A sequence or other collection of random variables is independent and identically distributed (i.i.d.) if each has the same probability distribution as the others and all are mutually independent.
i.i.d. is a very common assumption in statistics: observations in a sample are USUALLY assumed to be (more or less) i.i.d. for the purposes of statistical inference.
The requirement that observations be i.i.d. tends to simplify the underlying mathematics of many statistical methods.
However, in practical applications this assumption is most often not realistic.
We need to pay particular attention to this issue in almost all hypothesis tests.

Serial correlation reduces the number of degrees of freedom (von Storch and Navarra 1995)

Solutions for serially correlated data
Pre-whitening: removal of serial correlation
Estimate the proper number of DOF
Block bootstrap
Account for serial correlation explicitly (generalized linear regression)

Prewhitening: y(i+1) - alpha*y(i) (von Storch and Navarra 1995)
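The prewhitening step can be sketched as follows. This is a minimal illustration, assuming alpha is the lag-1 autocorrelation estimated from the series itself (the slide does not say how alpha is obtained). The function names and the synthetic AR(1) series are hypothetical.

```python
import random

def lag1_autocorrelation(y):
    """Sample lag-1 autocorrelation of a series."""
    n = len(y)
    mean = sum(y) / n
    num = sum((y[i] - mean) * (y[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in y)
    return num / den

def prewhiten(y, alpha=None):
    """Return y(i+1) - alpha*y(i); alpha defaults to the lag-1 autocorrelation."""
    if alpha is None:
        alpha = lag1_autocorrelation(y)
    return [y[i + 1] - alpha * y[i] for i in range(len(y) - 1)]

# Synthetic AR(1) series with coefficient 0.6 (illustrative only).
rng = random.Random(0)
series = [0.0]
for _ in range(1999):
    series.append(0.6 * series[-1] + rng.gauss(0, 1))

r_before = lag1_autocorrelation(series)
r_after = lag1_autocorrelation(prewhiten(series))
print(f"lag-1 autocorrelation before: {r_before:.2f}, after: {r_after:.2f}")
```

The prewhitened series is one value shorter than the original. If the series contains a trend, estimating alpha from the raw data mixes the trend into the autocorrelation estimate, so this sketch should be read as the basic operation only.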

Estimate the proper number of degrees of freedom (effective sample size)
The effective time τ between independent samples can be estimated for an autoregressive process
Effective sample size n = NΔt/τ
Use n in place of N to compute the test statistic/critical value
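As a worked case of n = NΔt/τ, the sketch below assumes a first-order autoregressive, AR(1), process with lag-1 autocorrelation rho. For such a process the commonly used large-sample result for tests of the mean is τ/Δt = (1 + rho)/(1 - rho). The slide does not state which form of τ the author used, so this choice is an assumption, and the function name is hypothetical.

```python
def effective_sample_size(n_total, rho):
    """Effective sample size n = N * (1 - rho) / (1 + rho) for an AR(1) process.

    Assumes tau/dt = (1 + rho) / (1 - rho), a large-sample approximation
    that applies to inference about the mean.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return n_total * (1.0 - rho) / (1.0 + rho)

# 100 annual values with lag-1 autocorrelation 0.5 carry roughly the
# information of 33 independent values.
print(round(effective_sample_size(100, 0.5), 1))
```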

Block bootstrap
Produce many series that do not have the property to be tested (e.g. a trend) by resampling the original series
Keep the serial correlation in the resampled data by resampling the data block by block
Compute the statistic in the resampled data to obtain the critical values of the test statistic
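The three steps above can be sketched for a trend test. This is a simplified moving-block bootstrap under stated assumptions: the block length, the number of resamples, and the use of the absolute least-squares slope as the test statistic are illustrative choices, and all function names are hypothetical.

```python
import random

def ols_slope(y):
    """Least-squares slope of y against time index 0, 1, ..., n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    return sxy / sxx

def block_bootstrap_critical_slope(y, block_len=5, n_boot=1000, level=0.05, seed=0):
    """Critical value of |slope| from series rebuilt out of randomly chosen blocks."""
    rng = random.Random(seed)
    n = len(y)
    starts = range(n - block_len + 1)
    slopes = []
    for _ in range(n_boot):
        resampled = []
        # Concatenate random blocks; order within a block is preserved,
        # which keeps short-range serial correlation.
        while len(resampled) < n:
            s = rng.choice(starts)
            resampled.extend(y[s:s + block_len])
        slopes.append(abs(ols_slope(resampled[:n])))
    slopes.sort()
    # Two-sided test: the (1 - level) quantile of |slope| under no trend.
    return slopes[min(n_boot - 1, int((1 - level) * n_boot))]

# Illustrative trend-free series.
data_rng = random.Random(1)
y = [data_rng.gauss(0, 1) for _ in range(60)]
critical = block_bootstrap_critical_slope(y)
print(f"observed |slope| = {abs(ols_slope(y)):.4f}, critical = {critical:.4f}")
```

The observed slope is declared significant when its absolute value exceeds the critical value. The block length should be long enough to span the decorrelation time of the series.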

Multiple tests
At multiple locations
On multiple variables of the same system
False rejection occurs with a predefined probability (the significance level) for each test → more tests mean more tests passed by chance
Local significance and global (field) significance
Livezey and Chen (1983, MWR)

700 hPa height and SOI Chen 1982

Global significance: independent tests
False rejections are expected by chance (with probability p for each test)
The number of falsely passed tests, x out of N, follows a binomial distribution
With a limited number of tests, the false rejection rate can exceed the nominal rate defined by the local significance level
How many rejections are needed to claim global significance?
The significance levels for local and global tests may differ

Probability of exactly M passed tests out of 30
Table columns: M, percentage (M/30*100), p. Surviving entries: M = 0 (0.0%); last row p = 0.012.

Global significance
At p = 0.05, 14.1% or more of 30 tests could pass by chance
In other words, more than 14.1% of the tests must pass to claim global significance at the 5% level
It takes more than 1000 independent tests for the required proportion of passed tests to come close to (but still remain slightly higher than) the nominal level
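The binomial reasoning behind these numbers can be reproduced with a short sketch, assuming N = 30 independent tests, each with a local false-rejection probability of 0.05. The function names are hypothetical. The sketch works with whole counts only; the 14.1% quoted on the slide presumably comes from interpolating between M = 4 and M = 5, which is not reproduced here.

```python
from math import comb

def binom_pmf(m, n, p):
    """Probability of exactly m passed tests out of n independent tests."""
    return comb(n, m) * p ** m * (1 - p) ** (n - m)

def min_passes_for_field_significance(n, p_local=0.05, p_global=0.05):
    """Smallest m such that P(M >= m) <= p_global when every H0 is true."""
    tail = 1.0  # P(M >= 0)
    for m in range(n + 1):
        if tail <= p_global:
            return m
        tail -= binom_pmf(m, n, p_local)  # now P(M >= m + 1)
    return None  # no count within 0..n meets the global level

for m in range(6):
    print(m, f"{100 * m / 30:.1f}%", f"{binom_pmf(m, 30, 0.05):.3f}")

print(min_passes_for_field_significance(30))
```

With these assumptions, at least 5 of 30 tests (16.7%) must pass before the chance of that many false rejections drops to 5% or less.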

700 hPa height and SOI

Multiple tests: non-independent
Multiple tests are very often not independent
Estimate the proper number of degrees of freedom, then use the results for independent tests
Monte Carlo simulation

Estimate DoF

Example, M-C simulation

M-C simulation, more details
Repeatedly generate random variables to mimic the SOI index
– Random noise
– Block bootstrap to account for serial correlation
– AR process
Compute the correlation between 700 hPa height and each generated "SOI" index, and the fraction of grid points at which a locally significant correlation is detected
The fraction corresponding to the predefined global significance level is the threshold against which the result for the real SOI should be compared
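The procedure can be sketched on a toy field. This is a minimal illustration under stated assumptions: the surrogate index is white noise (the first of the three options listed), the field is a small random array standing in for 700 hPa height, and r_crit = 0.361 is the approximate two-sided 5% critical correlation for 30 time steps. All names and data are hypothetical.

```python
import math
import random

def correlation(a, b):
    """Pearson correlation of two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def fraction_locally_significant(field, index, r_crit):
    """Fraction of grid points whose |correlation| with the index exceeds r_crit."""
    hits = sum(1 for series in field if abs(correlation(series, index)) > r_crit)
    return hits / len(field)

def global_threshold(field, make_index, r_crit, n_sim=200, global_level=0.05, seed=0):
    """(1 - global_level) quantile of the passed fraction over surrogate indices."""
    rng = random.Random(seed)
    fractions = sorted(
        fraction_locally_significant(field, make_index(rng), r_crit)
        for _ in range(n_sim)
    )
    return fractions[min(n_sim - 1, int((1 - global_level) * n_sim))]

# Toy field: 20 grid points, 30 time steps (illustrative only).
n_time, n_grid = 30, 20
data_rng = random.Random(1)
field = [[data_rng.gauss(0, 1) for _ in range(n_time)] for _ in range(n_grid)]

def white_noise_index(rng):
    """Surrogate index with no serial correlation."""
    return [rng.gauss(0, 1) for _ in range(n_time)]

threshold = global_threshold(field, white_noise_index, r_crit=0.361)
print(f"fraction needed for global significance: {threshold:.2f}")
```

Because the surrogate indices are correlated with the actual field, the spatial dependence among grid points is carried into the threshold automatically. Replacing white_noise_index with a block-bootstrap or AR-process generator would also account for serial correlation in the index.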

Hypothesis test, summary
Assumptions, assumptions
Two types of errors
– Type I: falsely reject the null hypothesis
– Type II: fail to reject the null hypothesis when it is false
Climate series are usually NOT i.i.d.
Local and field significance
Multiple tests are usually NOT independent

Climate change detection and attribution
Climate change detection
A complicated hypothesis test involving
– Data preprocessing
– Formulation of the test statistic
– Interpretation of the test result

Climate change detection
Statistical methods to detect weak climate change signals against the noisy background of the incompletely observed climate
Physically based computer models to identify patterns of climate change signals
Statistical inference based on the comparison between observations and model-simulated signals (and noise as well)

Weaver and Zwiers, 2000

Detection formalism
Consider uncertainty in the signal
Noise of the ith signal
May be estimated by the total least squares method

Signal estimates
From ensemble runs
– Each run contains signal + noise (natural variability)
– The signal in each run is expected to be the same
– The noise in each run is not expected to be the same, but the variability of the noise is
– Averaging ensemble runs keeps the signal while reducing the noise level → improves the S/N ratio and the chance of detection
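The effect of ensemble averaging can be illustrated with synthetic runs. This is a minimal sketch, assuming the noise is independent between runs and has the same standard deviation in each, so that the noise in the mean of k runs has standard deviation sigma/sqrt(k). The signal shape, run count, and noise level are made up for illustration.

```python
import random
import statistics

def ensemble_mean(runs):
    """Average the runs time step by time step."""
    return [sum(values) / len(values) for values in zip(*runs)]

rng = random.Random(42)
n_time, n_runs, noise_sd = 2000, 16, 1.0

# Every run shares the same signal but has its own noise realization.
signal = [0.01 * t for t in range(n_time)]
runs = [[s + rng.gauss(0, noise_sd) for s in signal] for _ in range(n_runs)]

mean_series = ensemble_mean(runs)
residual = [m - s for m, s in zip(mean_series, signal)]
residual_sd = statistics.pstdev(residual)

# With 16 independent runs the noise level should be close to 1/4 of noise_sd.
print(f"noise sd in ensemble mean: {residual_sd:.3f}")
```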

Noise estimate
One observational sample is not sufficient
Estimate from model control runs
– Requires very long simulations
Estimate from ensemble runs
– Differences between runs reflect natural variability
– Remove the ensemble mean; the residual is noise, but care is needed with the number of degrees of freedom
Detection is conducted at the scales at which the model simulates sufficiently high variability

Anthropogenic Influence on Global Precipitation Trend
Xuebin Zhang (Environment Canada)
Francis Zwiers (Environment Canada)
Gabi Hegerl (Duke University)
Hugo Lambert (UC Berkeley)
Nathan Gillett (CRU, UEA)
Susan Solomon (NOAA, USA)
Peter Stott (Hadley Centre, UK)
Toru Nozawa (Japan National Institute of Environmental Studies)

Observational Data
Global Historical Climate Network (monthly precipitation at stations)
Removal of climatology
Station annual anomaly (more than 6 months)
Gridding to 5x5 lat-long grids
– Average of available station anomalies within each grid box

Model simulated data
– Anthropogenic forcing (ANT, 8 models, 27 runs)
– Natural forcing (NAT, V+S, 4 models, 17 runs)
– ALL forcing (ANT+NAT) (ALL, 10 models, 50 runs)
– HadCM yr control run

Detection of anthropogenic signal
Two periods, 75-yr and 50-yr
10-degree latitudinal trends
Total least squares
Detection:
Attribution:

Precipitation variability
Overestimation from station data
Model variability is smaller than the observation-based estimate
EOF truncation of observed and modeled trends
Conservative detection result

EOF truncation: detection at the scales where simulated variability ~ observed variability

Detection results

Summary: detection for precipitation trend
Anthropogenic signal detectable in zonal precipitation trend
– ALL signal detectable
– ANT signal detectable
– NAT not detectable
First work, with limitations
– Obs data
– Mismatch between obs-based and model-based variability estimates
Extremes may be more promising, but are also limited by obs data
Paper published in Nature July 26 (doi: /nature06025)