Using Synthetic Data to Test Downscaling Methods John Lanzante (GFDL/NOAA)

Presentation transcript:

CONCEPTS Testing Downscaling: Like Product Testing My Product 

CONCEPTS STEP 1: Recruit Test Subjects (Gather Data) STEP 2: Feed Cereal For Several Decades (Apply Downscaling Method)

CONCEPTS STEP 3: How are subjects affected? How well did downscaling do? Not so clear – Need more subjects? Need more data? Real-world data may be limited? Can we generate synthetic data to fill the void?

CONCEPTS STEP 4a: Snowmen most affected? Generate a new sample.

CONCEPTS STEP 4b: Snow-women affected differently? Generate a new sample.

REALISTIC EXAMPLES CASE 1 – Linearity: Simplest downscaling – linear regression

REALISTIC EXAMPLES CASE 1 – Strong Nonlinearity: Simplest downscaling – linear regression

REALISTIC EXAMPLES SUMMARY CASE 1 – Nonlinearity:
- Hard to test nonlinearity in real-world data? (if we are just entering the “non-linear regime”)
- Simulate various degrees of nonlinearity
- Compare linear & nonlinear downscaling methods
- Determine amount of degradation
- Determine time in future when degradation becomes “too large”
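The Case 1 experiment can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's code: the quadratic form of the nonlinearity, the `curvature` knob, the noise level, and the use of a mean shift (`future_shift`) as a stand-in for "time in the future" are all hypothetical choices made for the example.

```python
import numpy as np

def simulate_pairs(n, x_mean, curvature, noise_sd, rng):
    """Model values x and 'observed' values y = x + curvature * x**2 + noise.

    The quadratic term is an assumed form of nonlinearity; curvature = 0
    gives a purely linear model/obs relationship.
    """
    x = rng.normal(x_mean, 1.0, size=n)
    y = x + curvature * x**2 + rng.normal(0.0, noise_sd, size=n)
    return x, y

def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

def degradation(curvature, future_shift, n=5000, seed=0):
    """Fit on a historical sample, then score on a shifted 'future' sample."""
    rng = np.random.default_rng(seed)
    xh, yh = simulate_pairs(n, 0.0, curvature, 0.5, rng)           # historical
    xf, yf = simulate_pairs(n, future_shift, curvature, 0.5, rng)  # future
    lin = np.polyfit(xh, yh, 1)   # linear downscaling (simple regression)
    quad = np.polyfit(xh, yh, 2)  # a nonlinear alternative (quadratic fit)
    return {
        "linear_hist": rmse(np.polyval(lin, xh), yh),
        "linear_future": rmse(np.polyval(lin, xf), yf),
        "quadratic_future": rmse(np.polyval(quad, xf), yf),
    }

# "Turn the knob": increase the degree of nonlinearity and compare methods.
for c in (0.0, 0.1, 0.3):
    print(c, degradation(c, future_shift=2.0))
```

Sweeping `future_shift` for a fixed `curvature` gives a rough picture of how far the future sample can move before the linear method's error exceeds a chosen tolerance.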

REALISTIC EXAMPLES CASE 2 – Coastal Error: Downscaling error maximizes along coastline

REALISTIC EXAMPLES CASE 2 – Coastal Error: Obs gridpoint → Entirely land. Model gridpoint → Partly land, partly ocean.

REALISTIC EXAMPLES CASE 2 – Coastal Error: Land has more detail (extremes) than Ocean (damped). Missing peaks & troughs are unrecoverable.

REALISTIC EXAMPLES SUMMARY CASE 2 – Coastal Error:
- Simulate land & ocean points
- Downscale land from mixture (land + sea)
- Vary the proportions of the mixture
- Is coastal effect due to mixture/mismatch?
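A minimal Python sketch of the Case 2 mixture experiment follows. It rests on several assumptions that are not in the slides: the land series is white Gaussian noise, the "damped" ocean series is a moving average of the land series (so peaks and troughs are smoothed away), and downscaling is a simple linear regression of the land observations on the mixed model gridpoint. The function name and parameters are hypothetical.

```python
import numpy as np

def coastal_rmse(land_fraction, n=20000, window=5, noise_sd=0.2, seed=0):
    """RMSE of regressing land obs on a land/ocean mixture.

    land_fraction = 1.0 means the model gridpoint is entirely land;
    0.0 means it is entirely ocean.
    """
    rng = np.random.default_rng(seed)
    land = rng.normal(0.0, 1.0, size=n)  # observed land point
    # Assumed ocean behaviour: temporally smoothed (damped) version of land.
    ocean = np.convolve(land, np.ones(window) / window, mode="same")
    # Model gridpoint: mixture of land and ocean plus independent model noise.
    mixture = (land_fraction * land
               + (1.0 - land_fraction) * ocean
               + rng.normal(0.0, noise_sd, size=n))
    # Downscale: linear regression of land obs on the mixed model value.
    slope, intercept = np.polyfit(mixture, land, 1)
    pred = slope * mixture + intercept
    return float(np.sqrt(np.mean((pred - land) ** 2)))

# Vary the proportions of the mixture.
for w in (1.0, 0.75, 0.5, 0.25, 0.0):
    print(f"land fraction {w:.2f}: RMSE {coastal_rmse(w):.3f}")
```

In this toy setup the error grows as the ocean share of the gridpoint increases, which is the kind of unambiguous signal the mixture test is meant to expose.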

SYNTHETIC DATA MODEL One Particular Synthetic Data Model:
O = Observations, M = Model, y = year, d = day. Red on the slide = free parameter (user selects the value).
$O_{y,d} = \bar{O}_y + O'_{y,d}$ (yearly mean + AR1)
$O'_{y,d} = \mathrm{rlag1} \cdot O'_{y,d-1} + a_{y,d}$ (AR1)
$\mathrm{fvar} = \mathrm{var}(\bar{O}) / \mathrm{var}(O)$, where $\mathrm{var}(O) = \mathrm{var}(\bar{O}) + \mathrm{var}(O')$
$M_{y,d} = O_{y,d} + b_{y,d}$
$\mathrm{corr} = \mathrm{correlation}(O, M)$
$a \sim N(0, \mathrm{var}_a)$, $b \sim N(0, \mathrm{var}_b)$
Proper choice of a & b yields desired rlag1 & corr.
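One way to turn the model above into code is sketched here. The slide does not give the noise variances, so they are derived under assumptions made for this example only: total var(O) is fixed at 1, the yearly means are drawn from a Gaussian, each year's AR1 series starts from its stationary distribution, and a and b are independent of everything else. Under those assumptions, var_a = (1 - fvar)(1 - rlag1^2) and var_b = 1/corr^2 - 1. The function name `generate_synthetic` is hypothetical.

```python
import numpy as np

def generate_synthetic(n_years, n_days, rlag1, fvar, corr, seed=0):
    """Generate obs O[y, d] and model M[y, d] from rlag1, fvar, corr.

    Assumes var(O) = 1, so var(Obar) = fvar and var(O') = 1 - fvar.
    """
    rng = np.random.default_rng(seed)
    var_oprime = 1.0 - fvar
    # Stationary AR1: var(O') = var_a / (1 - rlag1**2)
    var_a = var_oprime * (1.0 - rlag1**2)
    # With M = O + b and b independent of O: corr**2 = var(O) / (var(O) + var_b)
    var_b = 1.0 / corr**2 - 1.0

    obar = rng.normal(0.0, np.sqrt(fvar), size=n_years)  # yearly means
    a = rng.normal(0.0, np.sqrt(var_a), size=(n_years, n_days))
    oprime = np.empty((n_years, n_days))
    # Assumption: each year starts from the stationary distribution of O'.
    oprime[:, 0] = rng.normal(0.0, np.sqrt(var_oprime), size=n_years)
    for d in range(1, n_days):
        oprime[:, d] = rlag1 * oprime[:, d - 1] + a[:, d]

    obs = obar[:, None] + oprime
    model = obs + rng.normal(0.0, np.sqrt(var_b), size=obs.shape)
    return obs, model, oprime

obs, model, oprime = generate_synthetic(2000, 90, rlag1=0.7, fvar=0.3, corr=0.8)
lag1 = np.corrcoef(oprime[:, :-1].ravel(), oprime[:, 1:].ravel())[0, 1]
print("sample rlag1:", round(float(lag1), 3))
print("sample corr :", round(float(np.corrcoef(obs.ravel(), model.ravel())[0, 1]), 3))
```

With a large sample, the printed sample statistics should sit close to the requested rlag1 and corr, which is a quick check that the variance choices are consistent.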

SYNTHETIC DATA MODEL
STEP 1: Generate Base Time Series
- rlag1: day-to-day persistence
- fvar: interannual vs. day-to-day variability
- corr: strength of relation: model vs. obs
STEP 2: Historical Adjustment (characteristics of the distribution)
- mean OBS, mean MODEL
- var OBS, var MODEL
STEP 3: Future Adjustment (characteristics of the distribution)
- mean OBS, mean MODEL
- var OBS, var MODEL
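Steps 2 and 3 could be implemented as a linear rescaling of the base series to a prescribed mean and variance. The slides do not say how the adjustment is done, so the sketch below is an assumption: the helper `adjust`, the toy base series, and the target values are all hypothetical. A useful property of a linear rescaling with positive scale is that it leaves the correlation between the two series (and the lag-1 autocorrelation of each) unchanged.

```python
import numpy as np

def adjust(series, target_mean, target_var):
    """Linearly rescale a series to a prescribed mean and variance."""
    z = (series - series.mean()) / series.std()
    return target_mean + np.sqrt(target_var) * z

rng = np.random.default_rng(0)
base_obs = rng.normal(size=10000)                      # stand-in base series
base_model = base_obs + rng.normal(0.0, 0.75, 10000)   # stand-in model series

# Hypothetical targets: historical period, then a warmer, more variable future.
hist_obs = adjust(base_obs, target_mean=15.0, target_var=9.0)
hist_model = adjust(base_model, target_mean=16.0, target_var=7.0)
fut_obs = adjust(base_obs, target_mean=18.0, target_var=12.0)
fut_model = adjust(base_model, target_mean=20.0, target_var=8.0)

print("hist corr:", round(float(np.corrcoef(hist_obs, hist_model)[0, 1]), 3))
print("fut  corr:", round(float(np.corrcoef(fut_obs, fut_model)[0, 1]), 3))
```

In practice the historical and future samples would come from separate base series; the same series is reused here only to keep the example short.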

SYNTHETIC DATA MODEL OUR APPLICATIONS OF THIS MODEL:
- Downscaling (just getting started): no results yet
- Applied successfully to several related issues (cross-validation, exceedance statistics, testing two distributions)

SUMMARY
REAL-WORLD COMPLICATIONS: Results may not be clear-cut:
- Sample size too small?
- Multiple factors may contribute?
- Some conditions more interesting?
SOLUTION – GENERATE SYNTHETIC DATA: Advantages of Synthetic Data:
- Unlimited sample size (enhance signal/noise)
- Change one factor at a time
- Prescribe exact conditions
- Vary factor over a wide range (“turn the knob”)
- Can extend outside the range of historical data
- Turn knob “all the way” for unambiguous results

A CAUTIONARY NOTE
No “One Size Fits All”:
- No single “best” synthetic data model
- Must possess appropriate real-world characteristics
- Ability to vary the relevant factors
Possible Models For Future Development:
- Skewed data (transform Gaussian data nonlinearly?)
- Precipitation (discrete Markov + bounded distribution?)
- Model occurrence & amount separately?
- Multivariate model?
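The precipitation idea is listed only as a possible future model, so the following is a speculative sketch of what "occurrence and amount modeled separately" might look like. The assumptions are mine, not the author's: a first-order two-state Markov chain for wet/dry occurrence, a gamma distribution (bounded below at zero) for wet-day amounts, and all parameter names and values.

```python
import numpy as np

def simulate_precip(n_days, p_wet_after_dry, p_wet_after_wet,
                    shape, scale, seed=0):
    """Daily precipitation: Markov-chain occurrence, gamma amounts."""
    rng = np.random.default_rng(seed)
    u = rng.random(n_days)
    wet = np.zeros(n_days, dtype=bool)  # day 0 is assumed dry
    for d in range(1, n_days):
        # Probability of a wet day depends only on the previous day's state.
        p = p_wet_after_wet if wet[d - 1] else p_wet_after_dry
        wet[d] = u[d] < p
    # Amounts are drawn independently of occurrence and applied on wet days.
    amounts = rng.gamma(shape, scale, size=n_days)
    return np.where(wet, amounts, 0.0), wet

precip, wet = simulate_precip(50000, p_wet_after_dry=0.2, p_wet_after_wet=0.6,
                              shape=0.8, scale=5.0)
print("wet-day fraction:", round(float(wet.mean()), 3))
print("mean wet-day amount:", round(float(precip[wet].mean()), 2))
```

The two transition probabilities act as separate knobs for wet-day frequency and wet-spell persistence, while the gamma parameters control the amount distribution.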

THE END

REALISTIC EXAMPLES CASE 1 – Weak Nonlinearity: Simplest downscaling – linear regression

SUPPLEMENTAL Causes of Nonlinearity? At highest T – model soil becomes excessively dry – T becomes excessive. Other possibilities: Water Vapor, Clouds, Sea-Ice, etc.

REALISTIC EXAMPLES CASE 2 – Coastal Error: Land → More extremes. Ocean → Damped.

REALISTIC EXAMPLES CASE 2 – Coastal Error: X/Y Plot: Land (model) vs. Land (obs)

REALISTIC EXAMPLES CASE 2 – Coastal Error: X/Y Plot: Ocean (model) vs. Land (obs)

SYNTHETIC DATA MODEL
STEP 4: Fit downscaling model to historical sample
STEP 5: Test downscaling in historical & future samples
OUR APPLICATIONS OF THIS MODEL: No results to show today
- Downscaling (just getting started)
- Guidance in the use of cross-validation
- Biases in exceedance statistics
- Testing difference between 2 distributions