Oceanography 569: Oceanographic Data Analysis Laboratory. Kathie Kelly, Applied Physics Laboratory, 515 Ben Hall IR Bldg. Class web site: faculty.washington.edu/kellyapl/classes/ocean569_2014/


Oceanography 569 Oceanographic Data Analysis Laboratory
Kathie Kelly, Applied Physics Laboratory, 515 Ben Hall IR Bldg
Class web site: faculty.washington.edu/kellyapl/classes/ocean569_2014/

Propagation of Errors
Example 1: a linear function. Example 2: the mean. Here x_t is the true value of x; if there is no bias in the errors, the sample mean averages the errors in the data.
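The slide's equations are not in the transcript; a minimal reconstruction of the sample-mean case, with notation assumed, is:

```latex
x_i = x_t + \epsilon_i, \qquad
\bar{x} \;=\; \frac{1}{N}\sum_{i=1}^{N} x_i
        \;=\; x_t + \frac{1}{N}\sum_{i=1}^{N}\epsilon_i
```

so the sample mean equals the true value plus the average of the individual errors.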

Mean Squared Error
How much is the error reduced by averaging? Examine the mean squared error, or error variance. The angle brackets indicate an ensemble average over N realizations. If the errors are random and of similar size, with no bias, the error variance is reduced by a factor of N, provided the errors are uncorrelated.
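A quick Monte Carlo check of this factor-of-N reduction (an illustrative sketch, not from the slides; the numbers are arbitrary):

```python
# Averaging N uncorrelated, zero-mean errors reduces the error variance
# by a factor of N. Verify with an ensemble of M realizations.
import numpy as np

rng = np.random.default_rng(0)
N = 100            # points per average
M = 20000          # number of realizations (the "ensemble")
sigma2 = 4.0       # variance of a single error

errors = rng.normal(0.0, np.sqrt(sigma2), size=(M, N))
mean_err = errors.mean(axis=1)       # one averaged error per realization
var_of_mean = mean_err.var()         # should be close to sigma2 / N

print(var_of_mean, sigma2 / N)
```

With uncorrelated errors the printed values agree closely; correlated errors would break this scaling, which is the point of the serial-correlation discussion later in the deck.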

Errors for Differences
Example 3: a difference in time. If the errors are random (uncorrelated), the difference increases the squared error by a factor of 2. If the errors are correlated (a bias), the difference reduces the errors. Most errors are a combination of random error and bias.
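In symbols (a sketch, with sigma_1^2 and sigma_2^2 the error variances of the two measurements):

```latex
\left\langle (\epsilon_1 - \epsilon_2)^2 \right\rangle
  \;=\; \sigma_1^2 + \sigma_2^2 - 2\,\operatorname{cov}(\epsilon_1,\epsilon_2)
```

For uncorrelated errors of equal size the covariance term vanishes and the squared error doubles to 2*sigma^2; for a common bias (epsilon_1 = epsilon_2) the difference cancels the error entirely.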

General Error Estimates
Given a quantity that is a function of several variables, F(x, y, z), a variation (or error) in F is related to variations in the variables by

delta_F = (dF/dx) delta_x + (dF/dy) delta_y + (dF/dz) delta_z

or, in terms of the error variance,

sigma_F^2 = (dF/dx)^2 sigma_x^2 + (dF/dy)^2 sigma_y^2 + (dF/dz)^2 sigma_z^2

assuming the errors in x, y, and z are uncorrelated.

Error Estimate Example
Consider a quantity F in which rho and c_p are constant. Steps: write the squared error; factor out F^2; take the ensemble average and define the relative error (sigma_F / F)^2.

Another Example: Wind Stress
Wind stress tau, where c_D is constant. The error is given as a fraction r of the wind speed, so the relative wind-speed error is r. What is the relative error of the wind stress (magnitude)? What is the stress error if the wind speed error is 10%?

Another Example: Solution
Wind stress tau, where c_D is constant. From the general formula, a 10% error in wind speed s gives a 20% error in stress.
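Assuming the bulk formula implied by the slide, tau = rho_a c_D s^2 with c_D constant, the general propagation formula gives:

```latex
\tau = \rho_a\, c_D\, s^2
\quad\Longrightarrow\quad
\left(\frac{\sigma_\tau}{\tau}\right)^2
  = \left(2\,\frac{\sigma_s}{s}\right)^2,
\qquad
\frac{\sigma_\tau}{\tau} = 2\,\frac{\sigma_s}{s} = 2r
```

so a 10% wind-speed error (r = 0.1) gives a 20% stress error, as stated. The factor of 2 comes from the power of s in the formula.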

Exercise 3: Error Estimates
The errors for Q, T, and H are known. Error estimates are needed for Q/(rho c_p H) and dT/dt.

Exercise 3: Are other terms significant?
1. Compute the LHS (left-hand side).
2. Estimate the total errors for the LHS.
Is the LHS difference larger than the estimated errors?
Notes: convert the relative error variance of x to an error variance using var(x); check that all units match.

Hypothesis Testing
To determine whether a relationship is significant, we formulate a null hypothesis: that the proposed relationship is NOT true. We then test whether the null hypothesis can be rejected at a given probability, say alpha = 0.05 (5%); the corresponding level of confidence is 95%. A significance test consists of finding the probability of a given result (a p-value) and comparing it with alpha. If the p-value is less than alpha, the null hypothesis is rejected.

Test Example
Is the mean of a subsample of X over N points significantly different from the known mean value mu? The answer depends on the standard deviation (error) of the mean estimate, sigma_m. A measure of how large the difference is (how likely it is to be significant) is found from the Z-transform. The probability of that Z score (or lower) from a normal distribution N(0, 1) is p = normcdf(Z, 0, 1) [Matlab function]. Or let Matlab do the work: p = normcdf(xbar, mu, sigma_m), where xbar is the sample mean.
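The same test in Python, where scipy.stats.norm.cdf plays the role of Matlab's normcdf (the data values here are made up for illustration):

```python
# Z-test for a sample mean against a known population mean mu.
import numpy as np
from scipy.stats import norm

mu, sigma = 10.0, 2.0            # known population mean and std dev (assumed)
x = np.array([11.2, 12.1, 10.8, 11.5, 12.4, 11.0, 11.9, 12.2])
N = len(x)

sigma_m = sigma / np.sqrt(N)     # std dev (error) of the mean estimate
Z = (x.mean() - mu) / sigma_m    # Z-transform of the sample mean

p = norm.cdf(Z)                  # probability of this Z score or lower
# equivalently, let the library do the work on the mean itself:
p_alt = norm.cdf(x.mean(), mu, sigma_m)
print(Z, p, p_alt)
```

If 1 - p is smaller than alpha (one-sided), the null hypothesis that the subsample mean equals mu is rejected.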

Analysis of Variance (ANOVA)
To test how well a dynamical or statistical model fits observations d(t), we estimate the fraction of variance described by the model z. Two common types of models are (1) a known function, z = f(x, y), and (2) a linear estimator with coefficients found by regression, z = a x + b y + c. The ratio of the mean squared residual (or error), <(d - z)^2>, to the variance of the observations, sigma_d^2, is the fraction of variance NOT explained by the model.
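A minimal sketch of case (2), with synthetic data standing in for the observations (the signal and noise here are invented for illustration):

```python
# Fit a linear estimator z = a*x + b*y + c by least squares, then report the
# fraction of the variance of the observations d that the model explains.
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(200)
x = np.sin(2 * np.pi * t / 50)
y = np.cos(2 * np.pi * t / 30)
d = 2.0 * x - 1.0 * y + 0.5 + 0.3 * rng.standard_normal(t.size)  # "observations"

G = np.column_stack([x, y, np.ones_like(t, dtype=float)])  # regressors + constant
coef, *_ = np.linalg.lstsq(G, d, rcond=None)               # a, b, c by regression
z = G @ coef                                               # model estimate

resid_frac = np.mean((d - z) ** 2) / d.var()   # fraction NOT explained
skill = 1.0 - resid_frac                       # fraction explained
print(skill)
```

With the noise level chosen here the model explains most of the variance; the residual fraction is what the slide's ratio measures.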

Time Series Analysis
The analysis of time series differs from that of independent objects (tossed dice, medical patient studies, etc.) in that the measurements generally have serial correlation. A time series with N points therefore does not contain N independent measurements. The effective number of independent measurements (degrees of freedom, N*) depends on the degree of correlation of successive measurements, that is, on the autocorrelation of the time series.

Covariance and Correlation
For two time series x(t) and y(t), the covariance is defined as

C_xy(dt) = < (x(t) - <x>) (y(t + dt) - <y>) >

where < > is the expected value and dt is a time lag. The correlation is the covariance normalized by the standard deviations, r_xy(dt) = C_xy(dt) / (sigma_x sigma_y), giving values between -1 and 1.
Notes:
1) this terminology differs from that in Matlab, but is common;
2) when applied to a single variable x, these become the autocovariance and autocorrelation;
3) these are time-lagged quantities, but we often use only the zero-lag value;
4) we generally remove the mean values (as shown).
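The definitions above can be sketched directly (a toy example with an assumed pure-sine signal; note the normalization differs from Matlab's xcov/xcorr defaults, as the slide warns):

```python
# Lagged correlation of x(t) with y(t + lag): remove means, form the lagged
# covariance, and normalize by the standard deviations.
import numpy as np

def lagged_corr(x, y, lag):
    """Correlation of x(t) with y(t + lag); means removed, lag >= 0."""
    x = np.asarray(x, float) - np.mean(x)
    y = np.asarray(y, float) - np.mean(y)
    n = x.size
    cov = np.mean(x[: n - lag] * y[lag:])   # lagged covariance
    return cov / (x.std() * y.std())        # normalize -> correlation

t = np.arange(500)
x = np.sin(2 * np.pi * t / 100)
y = np.sin(2 * np.pi * (t - 10) / 100)      # y is x delayed by 10 time steps

print(lagged_corr(x, y, 0))    # reduced: the lag hides some covariance
print(lagged_corr(x, y, 10))   # near 1 at the correct lag
```

Scanning over lags and finding the maximum is how the lead/lag relationship in the "Correlations" slide is diagnosed.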

Correlations
Some common types of correlations:
1) autocorrelation (to get a time scale for the data);
2) correlations between two variables;
3) lagged correlations, to determine whether one variable leads or lags another;
4) vector correlations (as opposed to scalar correlations).
To evaluate a correlation, we need an objective measure of its significance.

Autocorrelation and Periodic Signals
The autocorrelation of a variable containing a periodic signal mostly shows the periodicity. Remove harmonics before computing (auto)correlations for better interpretation and statistics.
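One way to sketch the harmonic removal (assumed toy data with an annual cycle; the least-squares fit of sine/cosine terms is a standard approach, not necessarily the one used in the course):

```python
# Remove a harmonic by least-squares fit of sin/cos terms before computing
# the autocorrelation, so the result reflects the non-periodic signal.
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(720.0)                      # e.g. daily data, ~2 years
period = 360.0                            # assumed annual harmonic
x = 2.0 * np.sin(2 * np.pi * t / period) + rng.standard_normal(t.size)

# fit the harmonic (plus a mean) and subtract it
G = np.column_stack([np.sin(2 * np.pi * t / period),
                     np.cos(2 * np.pi * t / period),
                     np.ones_like(t)])
coef, *_ = np.linalg.lstsq(G, x, rcond=None)
resid = x - G @ coef

def acf(y, lag):
    y = y - y.mean()
    return np.mean(y[: y.size - lag] * y[lag:]) / np.mean(y * y)

# at half the period the harmonic alone gives correlation near -1;
# after removal the residual autocorrelation is near zero
print(acf(x, 180), acf(resid, 180))
```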

Characteristic Time Scale
Is there a characteristic time scale for each variable? Is it the first zero crossing of the autocorrelation, or something more robust?

Integral Time Scale
A more robust method takes into account the shape of the autocorrelation function. Integral time scale: integrate the correlation (out to its first zero crossing) to get the equivalent time tau for perfect correlation. Example integral time scales: 1 month for Qnet, 4 months for SSH. Integral time scales are shorter than the first zero crossing.

Caution: Covariance from Observations
The autocovariance (or autocorrelation) estimated from a single time series is an overestimate of the actual function, because the measurement error is correlated with itself. It should instead be estimated from two different measurements of the same quantity at the same location. If the errors have shorter time scales than the variable, the error variance can be estimated from the autocovariance at non-zero lags.

Autocorrelation: Estimating a Correction for Zero Lag
[Figure: autocorrelations of SSH and Qnet.] Extrapolate the autocorrelation to zero lag; the difference in correlation reflects unresolved signal variance plus the actual errors (an upper bound on the error).

Significance of a Correlation (Degrees of Freedom)
The integral time scale tau is used to define the number of degrees of freedom N* of a time series, N* = N/tau, where N is the length of the series. N* is needed to determine the statistical significance of a correlation. A Z-test for the significance of the correlation r is based on a random parent distribution with correlation rho. Create a new variable

w = (1/2) ln[(1 + r) / (1 - r)]

(the Fisher z-transform). The mean and standard deviation of w are

mu_w = (1/2) ln[(1 + rho) / (1 - rho)],   sigma_w = 1 / sqrt(N* - 3)

Derivation of Significance Test (cont'd)
For the null hypothesis, rho = 0, so mu_w = 0. Normalize w using the Z-transform, Z = (w - mu_w)/sigma_w. If Z falls within the region containing the fraction (1 - alpha) of the distribution, the correlation is NOT significant. Alternatively, one can solve for the critical value of the correlation, r_c. See Bendat & Piersol (2000) for the derivation.
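Solving for the critical correlation can be sketched as follows (this assumes the standard Fisher z-transform construction, w = atanh(r) with sigma_w = 1/sqrt(N* - 3); N* = 30 is an arbitrary example value):

```python
# Critical correlation r_c at significance level alpha, given N* effective
# degrees of freedom: invert the Fisher z-transform at the critical Z.
import numpy as np
from scipy.stats import norm

alpha = 0.05
n_star = 30                          # effective degrees of freedom, N* = N/tau

z_crit = norm.ppf(1 - alpha / 2)     # two-sided critical Z (about 1.96)
w_crit = z_crit / np.sqrt(n_star - 3)
r_crit = np.tanh(w_crit)             # invert w = atanh(r)

print(r_crit)   # correlations larger than this are significant at 95%
```

Note how strongly r_c depends on N*: with serial correlation shrinking N*, a much larger sample correlation is needed for significance.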

Exercise 4: Lagged Correlations
[Figures: SSH longitude-time plot; SSH at two locations versus lag.] Can you estimate the speed of the Rossby wave from the SSH?

Exercise 4: Vectors
[Figure: mean wind vectors at the KEO mooring from ECMWF, QuikSCAT, and NCEP2.] Note: vector correlations do not include the means.

Vector Correlations
A complex correlation (treating each vector as u + iv) gives both magnitude errors and a persistent direction offset between the two vector series.
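A minimal sketch of the complex correlation (the normalization here retains the means, per the slide's note that vector correlations do not include means; the rotated test series is invented for illustration):

```python
# Complex correlation of two vector series: the magnitude measures agreement,
# the phase gives the persistent direction offset between the series.
import numpy as np

def complex_corr(u1, v1, u2, v2):
    """Complex correlation coefficient; means retained, per the slides."""
    w1 = u1 + 1j * v1
    w2 = u2 + 1j * v2
    num = np.mean(np.conj(w1) * w2)
    den = np.sqrt(np.mean(np.abs(w1) ** 2) * np.mean(np.abs(w2) ** 2))
    return num / den

t = np.linspace(0, 10, 500)
u1, v1 = np.cos(t), np.sin(t)            # rotating unit wind vector
theta = np.deg2rad(30)                   # rotate the second series by 30 deg
u2 = np.cos(theta) * u1 - np.sin(theta) * v1
v2 = np.sin(theta) * u1 + np.cos(theta) * v1

rho = complex_corr(u1, v1, u2, v2)
print(abs(rho), np.rad2deg(np.angle(rho)))   # magnitude 1, angle 30 deg
```

For two perfectly agreeing but rotated series the magnitude is 1 and the phase recovers the rotation angle, separating direction error from magnitude error.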