QMDA Review Session: Things You Should Remember


QMDA Review Session

Things you should remember

1. Probability & Statistics

The Gaussian or normal distribution:
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{(x-\bar{x})^2}{2\sigma^2}\right\}$$
with expected value $\bar{x}$ and variance $\sigma^2$.

Properties of the normal distribution: Expectation = Median = Mode = $\bar{x}$, and 95% of the probability lies within $2\sigma$ of the expected value.

Multivariate Distributions. The covariance matrix, $C$, is very important. Its diagonal elements give the variance of each $x_i$: $\sigma_{x_i}^2 = C_{ii}$.

The off-diagonal elements of $C$ indicate whether pairs of $x$'s are correlated, e.g. $C_{12}$: $C_{12} < 0$ indicates negative correlation between $x_1$ and $x_2$, and $C_{12} > 0$ indicates positive correlation.

The multivariate normal distribution
$$p(\mathbf{x}) = (2\pi)^{-N/2}\,|C_x|^{-1/2}\exp\left\{-\tfrac{1}{2}(\mathbf{x}-\bar{\mathbf{x}})^T C_x^{-1}(\mathbf{x}-\bar{\mathbf{x}})\right\}$$
has expectation $\bar{\mathbf{x}}$, covariance $C_x$, and is normalized to unit area.

If $\mathbf{y}$ is linearly related to $\mathbf{x}$ by $\mathbf{y} = M\mathbf{x}$, then $\bar{\mathbf{y}} = M\bar{\mathbf{x}}$ (rule for means) and $C_y = M\,C_x\,M^T$ (rule for propagating error). These rules work regardless of the distribution of $\mathbf{x}$.
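
To make the two rules concrete, here is a minimal NumPy sketch (not part of the original slides); the matrix M, the mean x_bar, and the covariance C_x are made-up illustrative values.

```python
import numpy as np

# Hypothetical linear relationship y = M x between three x's and two y's
M = np.array([[1.0, 2.0, 0.5],
              [0.0, 1.0, 3.0]])
x_bar = np.array([1.0, 2.0, 3.0])       # expectation of x
C_x = np.array([[1.0, 0.3, 0.0],
                [0.3, 2.0, 0.1],
                [0.0, 0.1, 0.5]])       # covariance of x

y_bar = M @ x_bar                       # rule for means
C_y = M @ C_x @ M.T                     # rule for propagating error

print("mean of y:", y_bar)
print("covariance of y:\n", C_y)
```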

2. Least Squares

Simple Least Squares. Linear relationship between data, $\mathbf{d}$, and model, $\mathbf{m}$: $\mathbf{d} = G\mathbf{m}$. Minimize the prediction error $E = \mathbf{e}^T\mathbf{e}$ with $\mathbf{e} = \mathbf{d}^{obs} - G\mathbf{m}$:
$$\mathbf{m}^{est} = [G^T G]^{-1} G^T \mathbf{d}$$
If the data are uncorrelated with variance $\sigma_d^2$, then $C_m = \sigma_d^2\,[G^T G]^{-1}$.
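
A minimal NumPy sketch of these formulas (not from the slides), fitting a straight line to synthetic data; the true model, the noise level sigma_d, and the time axis are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50
t = np.linspace(0.0, 10.0, N)
sigma_d = 0.5                                    # assumed data standard deviation
d_obs = 2.0 + 0.7 * t + sigma_d * rng.standard_normal(N)   # synthetic data

# Linear model d = G m with m = [intercept, slope]
G = np.column_stack([np.ones(N), t])

# m_est = [G^T G]^(-1) G^T d, computed with a linear solve for stability
m_est = np.linalg.solve(G.T @ G, G.T @ d_obs)

# C_m = sigma_d^2 [G^T G]^(-1) for uncorrelated data
C_m = sigma_d**2 * np.linalg.inv(G.T @ G)

print("m_est =", m_est)
print("one-sigma errors =", np.sqrt(np.diag(C_m)))
```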

Least Squares with prior constraints. Given uncorrelated data with variance $\sigma_d^2$ that satisfy a linear relationship $\mathbf{d} = G\mathbf{m}$, and prior information with variance $\sigma_m^2$ that satisfies a linear relationship $\mathbf{h} = D\mathbf{m}$, the best estimate for the model parameters, $\mathbf{m}^{est}$, solves the combined system
$$\begin{bmatrix} G \\ \varepsilon D \end{bmatrix}\mathbf{m} = \begin{bmatrix} \mathbf{d} \\ \varepsilon\mathbf{h} \end{bmatrix}, \qquad \varepsilon = \sigma_d/\sigma_m$$
Previously, we discussed only the special case $\mathbf{h}=0$.
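
A minimal NumPy sketch of the stacked system (not from the slides); G, the noise levels, and the prior D = I, h = 0 are made-up assumptions, and the weighting eps = sigma_d/sigma_m follows the form written above.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 30, 3
G = rng.random((N, M))                         # hypothetical data kernel
m_true = np.array([1.0, -2.0, 0.5])
sigma_d, sigma_m = 0.1, 0.5                    # assumed data / prior standard deviations
d = G @ m_true + sigma_d * rng.standard_normal(N)

D = np.eye(M)                                  # prior information h = D m
h = np.zeros(M)                                # the special case h = 0

eps = sigma_d / sigma_m                        # relative weight of the prior equations
F = np.vstack([G, eps * D])                    # stacked system [G; eps D] m = [d; eps h]
f = np.concatenate([d, eps * h])

m_est = np.linalg.solve(F.T @ F, F.T @ f)
print("m_est:", m_est)
```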

Newton's Method for Non-Linear Least-Squares Problems. Given data that satisfy a non-linear relationship $\mathbf{d} = g(\mathbf{m})$, guess a solution $\mathbf{m}^{(k)}$ with $k=0$ and linearize around it: $\Delta\mathbf{m} = \mathbf{m} - \mathbf{m}^{(k)}$, $\Delta\mathbf{d} = \mathbf{d} - g(\mathbf{m}^{(k)})$, and $\Delta\mathbf{d} = G\,\Delta\mathbf{m}$, with $G_{ij} = \partial g_i/\partial m_j$ evaluated at $\mathbf{m}^{(k)}$. Then iterate, $\mathbf{m}^{(k+1)} = \mathbf{m}^{(k)} + \Delta\mathbf{m}$ with $\Delta\mathbf{m} = [G^T G]^{-1} G^T \Delta\mathbf{d}$, hoping for convergence.
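
A minimal NumPy sketch of the iteration (not from the slides) for the hypothetical non-linear model g(m) = m1 exp(-m2 t); the data, starting guess, and convergence tolerance are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0.0, 4.0, 40)
m_true = np.array([3.0, 0.8])
d_obs = m_true[0] * np.exp(-m_true[1] * t) + 0.05 * rng.standard_normal(t.size)

def g(m):
    return m[0] * np.exp(-m[1] * t)

def jacobian(m):
    # G_ij = dg_i/dm_j evaluated at the current guess
    return np.column_stack([np.exp(-m[1] * t),
                            -m[0] * t * np.exp(-m[1] * t)])

m = np.array([1.0, 0.1])                      # starting guess m^(0)
for k in range(20):
    G = jacobian(m)
    dd = d_obs - g(m)                         # Delta d = d - g(m^(k))
    dm = np.linalg.solve(G.T @ G, G.T @ dd)   # Delta m = [G^T G]^(-1) G^T Delta d
    m = m + dm                                # m^(k+1) = m^(k) + Delta m
    if np.linalg.norm(dm) < 1e-8:             # hoping for convergence
        break

print("estimated m:", m)
```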

3. Bootstraps

Investigate the statistics of $y$ by creating many datasets $y'$ and examining their statistics. Each $y'$ is created through random sampling, with replacement, of the original dataset $y$.

Example: statistics of the mean of $y$, given $N$ data. Start from the original data $y_1, y_2, y_3, \ldots, y_N$. Draw $N$ random integers in the range $1$ to $N$ (repeats allowed) and use them to pick the resampled data $y'_1, y'_2, y'_3, \ldots, y'_N$. Compute the estimate $N^{-1}\sum_i y'_i$. Now repeat a gazillion times and examine the resulting distribution of estimates.
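
A minimal NumPy sketch of this resampling recipe (not from the slides), bootstrapping the mean of a made-up dataset y; the dataset and the number of resamplings are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(loc=5.0, scale=2.0, size=100)   # hypothetical original dataset
N = y.size
n_boot = 10000                                  # "a gazillion" repeats

estimates = np.empty(n_boot)
for k in range(n_boot):
    idx = rng.integers(0, N, size=N)            # N random integers (0-based here)
    y_resampled = y[idx]                        # sampling with replacement
    estimates[k] = y_resampled.mean()           # the estimate for this resampling

print("bootstrap mean of the estimates:", estimates.mean())
print("bootstrap std of the mean      :", estimates.std())
print("95% interval:", np.percentile(estimates, [2.5, 97.5]))
```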

4. Interpolation and Splines

Linear splines: in the interval $[x_i, x_{i+1}]$, $y(x) = y_i + (y_{i+1}-y_i)\,(x-x_i)/(x_{i+1}-x_i)$. The 1st derivative is discontinuous at the knots.

Cubic splines: a cubic $a + bx + cx^2 + dx^3$ in each interval $[x_i, x_{i+1}]$, with a different cubic in the next interval, and the 1st and 2nd derivatives continuous at the knots.
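
A minimal SciPy sketch (not from the slides) contrasting the two kinds of spline on made-up data points: interp1d with kind='linear' gives the piecewise-linear interpolant, while CubicSpline keeps the 1st and 2nd derivatives continuous.

```python
import numpy as np
from scipy.interpolate import interp1d, CubicSpline

# Hypothetical data points (x_i, y_i)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 0.8, 0.9, 0.1, -0.7])

linear = interp1d(x, y, kind="linear")   # 1st derivative jumps at the knots
cubic = CubicSpline(x, y)                # 1st and 2nd derivatives continuous at the knots

x_fine = np.linspace(0.0, 4.0, 9)
print("linear spline:", linear(x_fine))
print("cubic spline :", cubic(x_fine))
```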

5. Hypothesis Testing

The Null Hypothesis is always a variant of this theme: the results of an experiment differ from the expected value only because of random variation.

Test of Significance of Results, say to 95% significance: the Null Hypothesis would generate the observed result less than 5% of the time.

Four important distributions:
Normal distribution: the distribution of $x_i$.
Chi-squared distribution: the distribution of $\chi^2 = \sum_{i=1}^{N} x_i^2$.
Student's t-distribution: the distribution of $t = x_0 / \sqrt{N^{-1}\sum_{i=1}^{N} x_i^2}$.
F-distribution: the distribution of $F = \{N^{-1}\sum_{i=1}^{N} x_i^2\} / \{M^{-1}\sum_{i=1}^{M} x_{N+i}^2\}$.

5 tests:
$m^{obs} = m^{prior}$ when $m^{prior}$ and $\sigma^{prior}$ are known: normal distribution.
$\sigma^{obs} = \sigma^{prior}$ when $m^{prior}$ and $\sigma^{prior}$ are known: chi-squared distribution.
$m^{obs} = m^{prior}$ when $m^{prior}$ is known but $\sigma^{prior}$ is unknown: t distribution.
$\sigma_1^{obs} = \sigma_2^{obs}$ when $m_1^{prior}$ and $m_2^{prior}$ are known: F distribution.
$m_1^{obs} = m_2^{obs}$ when $\sigma_1^{prior}$ and $\sigma_2^{prior}$ are unknown: modified t distribution.
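
As one illustration (not from the slides), the third test above can be run with scipy.stats.ttest_1samp; the measurements and the prior mean m_prior are made up.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
d = rng.normal(loc=10.3, scale=1.0, size=25)   # hypothetical measurements
m_prior = 10.0                                  # prior mean; sigma_prior is unknown

# Null Hypothesis: the observed mean differs from m_prior only because of random variation
t_stat, p_value = stats.ttest_1samp(d, m_prior)

print("t =", t_stat, " p =", p_value)
if p_value < 0.05:
    print("reject the Null Hypothesis at 95% significance")
else:
    print("cannot reject the Null Hypothesis")
```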

6. Filters

Filtering operation $g(t) = f(t)*h(t)$ ("convolution"):
$$g(t) = \int_{-\infty}^{t} f(t-\tau)\,h(\tau)\,d\tau \quad\leftrightarrow\quad g_k = \Delta t \sum_{p=-\infty}^{k} f_{k-p}\,h_p$$
or alternatively
$$g(t) = \int_{0}^{\infty} f(\tau)\,h(t-\tau)\,d\tau \quad\leftrightarrow\quad g_k = \Delta t \sum_{p=0}^{\infty} f_p\,h_{k-p}$$

How to do convolution by hand, with $\mathbf{x} = [x_0, x_1, x_2, x_3, x_4, \ldots]^T$ and $\mathbf{y} = [y_0, y_1, y_2, y_3, y_4, \ldots]^T$: reverse one time-series and line the two up,

x_0, x_1, x_2, x_3, x_4, …
…, y_4, y_3, y_2, y_1, y_0

then multiply rows to get the first element of $x*y$: $[x*y]_1 = x_0 y_0$. Then slide by one, multiply rows, and add to get the second element: $[x*y]_2 = x_0 y_1 + x_1 y_0$. And so on.
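
A small NumPy check of the hand recipe (not from the slides), using two short made-up sequences; np.convolve performs exactly the reverse, slide, multiply, and add steps described above.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

print(np.convolve(x, y))          # [ 4. 13. 28. 27. 18.]
# first element : x0*y0           = 4
# second element: x0*y1 + x1*y0   = 5 + 8 = 13
```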

Matrix formulations of $g(t) = f(t)*h(t)$: $\mathbf{g} = F\mathbf{h}$ and, equivalently, $\mathbf{g} = H\mathbf{f}$, where $F$ and $H$ are lower-triangular Toeplitz matrices built from the samples of $f$ and of $h$:
$$\begin{bmatrix} g_0 \\ g_1 \\ \vdots \\ g_N \end{bmatrix} = \Delta t\begin{bmatrix} f_0 & 0 & \cdots & 0 \\ f_1 & f_0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ f_N & f_{N-1} & \cdots & f_0 \end{bmatrix}\begin{bmatrix} h_0 \\ h_1 \\ \vdots \\ h_N \end{bmatrix} = \Delta t\begin{bmatrix} h_0 & 0 & \cdots & 0 \\ h_1 & h_0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ h_N & h_{N-1} & \cdots & h_0 \end{bmatrix}\begin{bmatrix} f_0 \\ f_1 \\ \vdots \\ f_N \end{bmatrix}$$

To estimate the filter $\mathbf{f}$ in $\mathbf{g} = H\mathbf{f}$, solve the least-squares equation $[H^T H]\,\mathbf{f} = H^T\mathbf{g}$. The matrix $[H^T H]$ is built from the autocorrelation of $h$, $A(k)$, and the right-hand side $H^T\mathbf{g}$ from the cross-correlation of $h$ and $g$, $X(k)$:
$$\begin{bmatrix} A(0) & A(1) & A(2) & \cdots \\ A(1) & A(0) & A(1) & \cdots \\ A(2) & A(1) & A(0) & \cdots \\ \vdots & \vdots & \vdots & \ddots \\ A(N) & A(N-1) & A(N-2) & \cdots \end{bmatrix}\begin{bmatrix} f_0 \\ f_1 \\ \vdots \\ f_N \end{bmatrix} = \begin{bmatrix} X(0) \\ X(1) \\ X(2) \\ \vdots \\ X(N) \end{bmatrix}$$

$A_i$ and $X_i$. Auto-correlation of a time-series $T(t)$:
$$A(\tau) = \int_{-\infty}^{+\infty} T(t)\,T(t-\tau)\,dt, \qquad A_i = \sum_j T_j\,T_{j-i}$$
Cross-correlation of two time-series $T^{(1)}(t)$ and $T^{(2)}(t)$:
$$X(\tau) = \int_{-\infty}^{+\infty} T^{(1)}(t)\,T^{(2)}(t-\tau)\,dt, \qquad X_i = \sum_j T^{(1)}_j\,T^{(2)}_{j-i}$$
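
A small NumPy sketch of the discrete sums $A_i$ and $X_i$ (not from the slides) using np.correlate on short made-up series; mode='full' returns every lag, with zero lag in the middle of the output.

```python
import numpy as np

T1 = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
T2 = np.array([0.0, 1.0, 2.0, 1.0, 0.0])

A = np.correlate(T1, T1, mode="full")   # auto-correlation  A_i = sum_j T1_j T1_{j-i}
X = np.correlate(T1, T2, mode="full")   # cross-correlation X_i = sum_j T1_j T2_{j-i}

lags = np.arange(-(T1.size - 1), T1.size)
print("zero-lag autocorrelation:", A[lags == 0])   # sum of T1**2 = 19
print("cross-correlation by lag:", dict(zip(lags.tolist(), X.tolist())))
```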

7. Fourier Transforms and Spectra

Integral transforms:
$$C(\omega) = \int_{-\infty}^{+\infty} T(t)\,e^{-i\omega t}\,dt, \qquad T(t) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} C(\omega)\,e^{i\omega t}\,d\omega$$
Discrete transforms (DFT):
$$C_k = \sum_{n=0}^{N-1} T_n\,e^{-2\pi i k n/N}, \;\; k=0,\ldots,N-1, \qquad T_n = N^{-1}\sum_{k=0}^{N-1} C_k\,e^{+2\pi i k n/N}, \;\; n=0,\ldots,N-1$$
Frequency step: $\Delta\omega = 2\pi/(N\,\Delta t)$. Maximum (Nyquist) frequency: $f_{max} = 1/(2\,\Delta t)$.
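
A minimal NumPy sketch (not from the slides) connecting these formulas to np.fft, which uses the same DFT convention; the sampling interval and the 10 Hz test signal are arbitrary.

```python
import numpy as np

dt = 0.01                          # sampling interval in seconds (assumed)
N = 256
t = dt * np.arange(N)
T = np.sin(2 * np.pi * 10.0 * t)   # test signal: 10 Hz sine

C = np.fft.fft(T)                  # C_k = sum_n T_n exp(-2 pi i k n / N)
T_back = np.fft.ifft(C)            # T_n = (1/N) sum_k C_k exp(+2 pi i k n / N)

df = 1.0 / (N * dt)                # frequency step in Hz (delta_omega = 2 pi / (N dt))
f_nyq = 1.0 / (2.0 * dt)           # maximum (Nyquist) frequency
f = np.fft.fftfreq(N, d=dt)        # frequency associated with each C_k

print("frequency step:", df, "Hz  Nyquist:", f_nyq, "Hz")
print("round-trip error:", np.max(np.abs(T - T_back.real)))
print("spectral peak near", abs(f[np.argmax(np.abs(C[:N // 2]))]), "Hz")
```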

Aliasing and cyclicity in a digital world: $\omega_{n+N} = \omega_n$, and since time and frequency play symmetrical roles in $\exp(-i\omega t)$, also $t_{k+N} = t_k$.

One FFT that you should know: the FFT of a spike at $t=0$ is a constant:
$$C(\omega) = \int_{-\infty}^{+\infty}\delta(t)\,e^{-i\omega t}\,dt = e^{0} = 1$$

Error Estimates for the DFT. Assume uncorrelated, normally-distributed data, $d_n = T_n$, with variance $\sigma_d^2$. The matrix $G$ in $G\mathbf{m}=\mathbf{d}$ is $G_{nk} = N^{-1}\exp(+2\pi i k n/N)$. The problem $G\mathbf{m}=\mathbf{d}$ is linear, so the unknowns, $m_k = C_k$ (the coefficients of the complex exponentials), are also normally distributed. Since the exponentials are orthogonal, $G^H G = N^{-1} I$ is diagonal, and $C_m = \sigma_d^2\,[G^H G]^{-1} = N\sigma_d^2\,I$ is diagonal, too. Apportioning variance equally between the real and imaginary parts of each coefficient, each part has variance $\sigma^2 = N\sigma_d^2/2$. The spectrum $s_m^2 = (C^{r}_m)^2 + (C^{i}_m)^2$ is the sum of the squares of two uncorrelated, normally distributed random variables and is thus $\chi_2^2$-distributed. The 95% value of $\chi_2^2$ is about 5.9, so to be significant a peak must exceed $5.9\,N\sigma_d^2/2$.
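
A small sketch (not from the slides) of where the "about 5.9" comes from and how the threshold might be formed, using scipy.stats for the 95% point of a chi-squared distribution with 2 degrees of freedom; N and sigma_d are assumed values, and the variance per part follows the expression given above.

```python
import numpy as np
from scipy import stats

N = 1024           # number of samples (assumed)
sigma_d = 0.3      # data standard deviation (assumed)

# 95% point of chi-squared with 2 degrees of freedom (about 5.99)
chi2_95 = stats.chi2.ppf(0.95, df=2)

# Each real/imaginary part of C_k has variance N*sigma_d**2/2 for pure noise,
# so a spectral peak |C_k|**2 is significant at 95% only if it exceeds:
threshold = chi2_95 * N * sigma_d**2 / 2.0

print("chi-squared(2) 95% value:", chi2_95)
print("significance threshold for |C_k|^2:", threshold)
```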

Convolution Theorem: transform[ f(t)*g(t) ] = transform[ f(t) ] × transform[ g(t) ].

Power spectrum of a stationary time-series. With $T(t)$ a stationary time series and
$$C(\omega) = \int_{-T/2}^{+T/2} T(t)\,e^{-i\omega t}\,dt, \qquad S(\omega) = \lim_{T\to\infty} T^{-1}\,|C(\omega)|^2$$
$S(\omega)$ is called the power spectral density, the spectrum normalized by the length of the time series.

Relationship of power spectral density to the DFT. To compute the Fourier transform, $C(\omega)$, you multiply the DFT coefficients, $C_k$, by $\Delta t$. So to get power spectral density:
$$T^{-1}|C(\omega)|^2 = (N\Delta t)^{-1}\,|\Delta t\,C_k|^2 = (\Delta t/N)\,|C_k|^2$$
i.e. you multiply the DFT spectrum, $|C_k|^2$, by $\Delta t/N$.
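
A minimal NumPy sketch of this scaling (not from the slides); the sampling interval and the 2 Hz test signal are arbitrary assumptions.

```python
import numpy as np

dt = 0.05                                    # sampling interval (assumed)
N = 512
t = dt * np.arange(N)
rng = np.random.default_rng(5)
T = np.cos(2 * np.pi * 2.0 * t) + 0.1 * rng.standard_normal(N)

C_k = np.fft.rfft(T)                         # one-sided DFT coefficients
psd = (dt / N) * np.abs(C_k) ** 2            # multiply the DFT spectrum |C_k|^2 by dt/N
f = np.fft.rfftfreq(N, d=dt)

print("frequency of the largest PSD value:", f[np.argmax(psd)], "Hz")
```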

Windowed time-series: the Fourier transform of the long time-series, convolved with the Fourier transform of the windowing function, is the Fourier transform of the windowed time-series.

Window Functions. Boxcar: its Fourier transform is a sinc function, which has a narrow central peak but large side lobes. Hanning (cosine) taper: its Fourier transform has a somewhat wider central peak but much smaller side lobes.

8. EOFs and Factor Analysis

Representation of samples as a linear mixing of factors: $S = C\,F$. Here $S$ is the $N\times M$ matrix of samples (row $i$ lists the amount of A, B, C, ... in sample $s_i$), $C$ is the $N\times M$ matrix of coefficients (row $i$ lists the amount of factors $f_1, f_2, f_3, \ldots$ in sample $s_i$), and $F$ is the $M\times M$ matrix of factors (row $k$ lists the amount of A, B, C, ... in factor $f_k$).

The data can be approximated with only the most important factors (e.g. ignore $f_3$): $S \approx C'\,F'$, with selected coefficients $C'$ ($N\times p$) and selected factors $F'$ ($p\times M$). The $p$ most important factors = those with the biggest coefficients.

Singular Value Decomposition (SVD). Any $N\times M$ matrix $S$ can be written as the product of three matrices, $S = U\,\Sigma\,V^T$, where $U$ is $N\times N$ and satisfies $U^T U = U U^T = I$, $V$ is $M\times M$ and satisfies $V^T V = V V^T = I$, and $\Sigma$ is an $N\times M$ diagonal matrix of singular values, $\lambda_i$.

SVD decomposition of $S$: write $S = U\,\Sigma\,V^T = [U\,\Sigma]\,[V^T] = C\,F$. So the coefficients are $C = U\,\Sigma$ and the factors are $F = V^T$. The factors with the biggest $\lambda_i$'s are the most important.
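
A minimal NumPy sketch (not from the slides) of using the SVD to form C and F and truncating to the p most important factors; the random sample matrix and the choice p = 2 are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
N, M = 20, 3
S = rng.random((N, M))                       # hypothetical samples matrix (N samples, M species)

U, lam, VT = np.linalg.svd(S, full_matrices=False)   # S = U Sigma V^T
C = U * lam                                  # coefficients C = U Sigma
F = VT                                       # factors F = V^T
print("reconstruction error:", np.max(np.abs(S - C @ F)))

p = 2                                        # keep only the p most important factors
S_approx = C[:, :p] @ F[:p, :]               # S ~ C' F'
print("truncation error with p = 2:", np.max(np.abs(S - S_approx)))
```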

Transformations of Factors. If you choose the $p$ most important factors, they define both a subspace in which the samples must lie and a set of coordinate axes for that subspace. The choice of axes is not unique and could be changed through a transformation, $T$: $F^{new} = T\,F^{old}$. A requirement is that $T^{-1}$ exists, else $F^{new}$ will not span the same subspace as $F^{old}$. Then $S = C\,F = C\,I\,F = (C\,T^{-1})\,(T\,F) = C^{new}\,F^{new}$. So you could try to implement the desirable factors by designing an appropriate transformation matrix, $T$.

9. Metropolis Algorithm and Simulated Annealing

Metropolis Algorithm: a method to generate a vector $\mathbf{x}$ of realizations of the distribution $p(x)$.

The process is iterative: start with an $x$, say $x^{(i)}$; then randomly generate another $x$ in its neighborhood, say $x^{(i+1)}$, using a distribution $Q(x^{(i+1)}|x^{(i)})$; then test whether you will accept the new $x^{(i+1)}$. If it passes, you append $x^{(i+1)}$ to the vector $\mathbf{x}$ that you are accumulating; if it fails, you append $x^{(i)}$ again.

A reasonable choice for $Q(x^{(i+1)}|x^{(i)})$: a normal distribution with mean $x^{(i)}$ and a variance $\sigma_x^2$ that quantifies the sense of neighborhood. The acceptance test is as follows. First compute the quantity
$$a = \frac{p(x^{(i+1)})\,Q(x^{(i)}|x^{(i+1)})}{p(x^{(i)})\,Q(x^{(i+1)}|x^{(i)})}$$
If $a > 1$, always accept $x^{(i+1)}$; if $a < 1$, accept $x^{(i+1)}$ with a probability of $a$ and accept $x^{(i)}$ with a probability of $1-a$.
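
A minimal Python sketch of this recipe (not from the slides), sampling a made-up target density p(x) with a symmetric Gaussian proposal, for which the Q ratio cancels from a; the step size and chain length are arbitrary.

```python
import numpy as np

def p(x):
    # Hypothetical target density (an unnormalized two-bump mixture is fine for Metropolis)
    return np.exp(-0.5 * (x - 1.0) ** 2) + 0.5 * np.exp(-0.5 * (x + 2.0) ** 2)

rng = np.random.default_rng(7)
sigma_x = 1.0                                # quantifies the sense of neighborhood
n_steps = 50000

x = np.empty(n_steps)
x[0] = 0.0
for i in range(n_steps - 1):
    x_new = rng.normal(x[i], sigma_x)        # draw from Q(x_new | x_i): normal, mean x_i
    a = p(x_new) / p(x[i])                   # symmetric Q, so its ratio cancels
    if a > 1 or rng.random() < a:            # accept with probability min(1, a)
        x[i + 1] = x_new                     # append the new value
    else:
        x[i + 1] = x[i]                      # otherwise append the old value again

print("sample mean:", x.mean(), " sample std:", x.std())
```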

Simulated Annealing: application of Metropolis to non-linear optimization. Find $\mathbf{m}$ that minimizes $E(\mathbf{m}) = \mathbf{e}^T\mathbf{e}$, where $\mathbf{e} = \mathbf{d}^{obs} - g(\mathbf{m})$.

Based on using the Boltzmann distribution for $p$ in the Metropolis Algorithm: $p(\mathbf{m}) = \exp\{-E(\mathbf{m})/T\}$, where the temperature, $T$, is slowly decreased during the iterations.

10. Some final words

Start simple! Examine a small subset of your data and look it over carefully. Build processing scripts incrementally, checking intermediate results at each stage. Make lots of plots and look them over carefully. Do reality checks.