Searching for Periodic Signals in Time Series Data

Slides:



Advertisements
Similar presentations
Time-Series Analysis of Astronomical Data
Advertisements

Data Analysis Do it yourself!. What to do with your data? Report it to professionals (e.g., AAVSO) –Excellent! A real service to science; dont neglect.
Change-Point Detection Techniques for Piecewise Locally Stationary Time Series Michael Last National Institute of Statistical Sciences Talk for Midyear.
Islamic university of Gaza Faculty of engineering Electrical engineering dept. Submitted to: Dr.Hatem Alaidy Submitted by: Ola Hajjaj Tahleel.
Lecture 15 Orthogonal Functions Fourier Series. LGA mean daily temperature time series is there a global warming signal?
FTP Biostatistics II Model parameter estimations: Confronting models with measurements.
NASSP Masters 5003F - Computational Astronomy Lecture 5: source detection. Test the null hypothesis (NH). –The NH says: let’s suppose there is no.
GG 313 Lecture 25 Transform Pairs and Convolution 11/22/05.
DFT/FFT and Wavelets ● Additive Synthesis demonstration (wave addition) ● Standard Definitions ● Computing the DFT and FFT ● Sine and cosine wave multiplication.
1.Our Solar System: What does it tell us? 2. Fourier Analysis i. Finding periods in your data ii. Fitting your data.
Filtering Filtering is one of the most widely used complex signal processing operations The system implementing this operation is called a filter A filter.
Propagation of Error Ch En 475 Unit Operations. Quantifying variables (i.e. answering a question with a number) 1. Directly measure the variable. - referred.
Resampling techniques Why resampling? Jacknife Cross-validation Bootstrap Examples of application of bootstrap.
Error Propagation. Uncertainty Uncertainty reflects the knowledge that a measured value is related to the mean. Probable error is the range from the mean.
CSCE 641 Computer Graphics: Image Sampling and Reconstruction Jinxiang Chai.
CSCE 641 Computer Graphics: Image Sampling and Reconstruction Jinxiang Chai.
Environmental Data Analysis with MatLab Lecture 24: Confidence Limits of Spectra; Bootstraps.
Transforms What does the word transform mean?. Transforms What does the word transform mean? –Changing something into another thing.
William Stallings Data and Computer Communications 7th Edition (Selected slides used for lectures at Bina Nusantara University) Data, Signal.
Variable Phenomena Nyquist Sampling Harry Nyquist (1889 – 1976)
The World’s Largest Ferris Wheel
Doppler Radar From Josh Wurman Radar Meteorology M. D. Eastin.
ElectroScience Lab IGARSS 2011 Vancouver Jul 26th, 2011 Chun-Sik Chae and Joel T. Johnson ElectroScience Laboratory Department of Electrical and Computer.
Physics 114: Lecture 15 Probability Tests & Linear Fitting Dale E. Gary NJIT Physics Department.
Lecture 1 Signals in the Time and Frequency Domains
Sociology 5811: Lecture 7: Samples, Populations, The Sampling Distribution Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.
Motivation Music as a combination of sounds at different frequencies
CSC361/661 Digital Media Spring 2002
Transforms. 5*sin (2  4t) Amplitude = 5 Frequency = 4 Hz seconds A sine wave.
Time Series Spectral Representation Z(t) = {Z 1, Z 2, Z 3, … Z n } Any mathematical function has a representation in terms of sin and cos functions.
Section 8.1 Estimating  When  is Known In this section, we develop techniques for estimating the population mean μ using sample data. We assume that.
INTEGRALS Areas and Distances INTEGRALS In this section, we will learn that: We get the same special type of limit in trying to find the area under.
1 LES of Turbulent Flows: Lecture 2 Supplement (ME EN ) Prof. Rob Stoll Department of Mechanical Engineering University of Utah Fall 2014.
Filtering Robert Lin April 29, Outline Why filter? Filtering for Graphics Sampling and Reconstruction Convolution The Fourier Transform Overview.
Signals CY2G2/SE2A2 Information Theory and Signals Aims: To discuss further concepts in information theory and to introduce signal theory. Outcomes:
Time Series Analysis. A number of fields in astronomy require finding periodic signals in a time series of data: 1.Periods in pulsating stars (e.g. Cepheids)
Monte Carlo Methods So far we have discussed Monte Carlo methods based on a uniform distribution of random numbers on the interval [0,1] p(x) = 1 0  x.
Looking for Climate Signals in Ice Cores Santa Fe, 2011 Gerald R. North Thanks to Petr Chylek for the data and encouragement.
GG 313 Geological Data Analysis Lecture 13 Solution of Simultaneous Equations October 4, 2005.
Random Sampling Approximations of E(X), p.m.f, and p.d.f.
Propagation of Error Ch En 475 Unit Operations. Quantifying variables (i.e. answering a question with a number) 1. Directly measure the variable. - referred.
Statistics Presentation Ch En 475 Unit Operations.
VI. Regression Analysis A. Simple Linear Regression 1. Scatter Plots Regression analysis is best taught via an example. Pencil lead is a ceramic material.
Estimation of a Population Mean
Basic Time Series Analyzing variable star data for the amateur astronomer.
Confidence Interval Estimation For statistical inference in decision making:
1 OUTPUT ANALYSIS FOR SIMULATIONS. 2 Introduction Analysis of One System Terminating vs. Steady-State Simulations Analysis of Terminating Simulations.
What can we learn about dynamic triggering in the the lab? Lockner and Beeler, 1999.
Z bigniew Leonowicz, Wroclaw University of Technology Z bigniew Leonowicz, Wroclaw University of Technology, Poland XXIX  IC-SPETO.
Computer simulation Sep. 9, QUIZ 2 Determine whether the following experiments have discrete or continuous out comes A fair die is tossed and the.
1 Introduction to Statistics − Day 4 Glen Cowan Lecture 1 Probability Random variables, probability densities, etc. Lecture 2 Brief catalogue of probability.
Lecture 8 Source detection NASSP Masters 5003S - Computational Astronomy
1 Introduction to Statistics − Day 3 Glen Cowan Lecture 1 Probability Random variables, probability densities, etc. Brief catalogue of probability densities.
Machine Learning 5. Parametric Methods.
ES 07 These slides can be found at optimized for Windows)
© 2008 McGraw-Hill Higher Education The Statistical Imagination Chapter 5. Measuring Dispersion or Spread in a Distribution of Scores.
Statistics Presentation Ch En 475 Unit Operations.
Periodic signals To search a time series of data for a sinusoidal oscillation of unknown frequency  : “Fold” data on trial period P  Fit a function.
R. Kass/W03 P416 Lecture 5 l Suppose we are trying to measure the true value of some quantity (x T ). u We make repeated measurements of this quantity.
Basic Business Statistics, 11e © 2009 Prentice-Hall, Inc. Chap 8-1 Chapter 8 Confidence Interval Estimation Business Statistics: A First Course 5 th Edition.
Computacion Inteligente Least-Square Methods for System Identification.
Trigonometric Functions of Real Numbers 5. Trigonometric Graphs 5.3.
A 2 veto for Continuous Wave Searches
Data Mining: Concepts and Techniques
MECH 373 Instrumentation and Measurements
Signals and Systems Networks and Communication Department Chapter (1)
Statistics Review ChE 477 Winter 2018 Dr. Harding.
CSCE 643 Computer Vision: Thinking in Frequency
4. Image Enhancement in Frequency Domain
RV Fitting Sirinrat Sithajan 25th April 2019 NARIT School on Exoplanet and Astrobiology Chiang Mai, Thailand.
Presentation transcript:

Searching for Periodic Signals in Time Series Data Least Squares Sine Fitting Discrete Fourier Transform Lomb-Scargle Periodogram Pre-whitening of Data Other techniques Phase Dispersion Minimization String Length Wavelets

Period Analysis How do you know if you have a periodic signal in your data? What is the period?

Try 16.3 minutes:

1. Least-squares Sine fitting Fit a sine wave of the form: V(t) = A·sin(wt + f) + Constant Where w = 2p/P, f = phase shift Best fit minimizes the c2: c2 = S (di –gi)2/N di = data, gi = fit

Advantages of Least Sqaures sine fitting: Good for finding periods in relatively sparse data Disadvantages of Least Sqaures sine fitting: Signal may not always be a sine wave (e.g. eccentric orbits) No assessement of false alarm probability (more later) Don‘t always trust your results

This is fake data of pure random noise with a s = 30 m/s This is fake data of pure random noise with a s = 30 m/s. Lesson: poorly sampled noise almost always can give you a period, but it it not signficant

[( ( ] ) ) S S 2. The Discrete Fourier Transform Any function can be fit as a sum of sine and cosines N0 FT(w) =  Xj (t) e–iwt Recall eiwt = cos wt + i sinwt j=1 X(t) is the time series 1 N0 Power: Px(w) = | FTX(w)|2 N0 = number of points 1 [( 2 ( 2 S ) S ) ] Px(w) = Xj cos wtj + Xj sin wtj N0 A DFT gives you as a function of frequency the amplitude (power) of each sine wave that is in the data

eixs = cos(xs) + i sin (xs) The continous form of the Fourier transform: F(s) =  f(x) e–ixs dx f(x) = 1/2p  F(s) eixs ds eixs = cos(xs) + i sin (xs) This is only done on paper, in the real world (computers) you always use a discrete Fourier transform (DFT)

The Fourier transform tells you the amplitude of sine (cosine) components to a data (time, pixel, x,y, etc) string Goal: Find what structure (peaks) are real, and what are artifacts of sampling, or due to the presence of noise

FT Ao P Ao 1/P A pure sine wave is a delta function in Fourier space x n A constant value is a delta function with zero frequency in Fourier space: Always subtract off „dc“ level.

Useful concept: Convolution  f(u)f(x–u)du = f * f f(x): f(x):

Useful concept: Convolution f(x-u) a1 a2 a3 g(x) a3 a2 a1 Convolution is a smoothing function

Convolution In Fourier space the convolution is just the product of the two transforms: Normal Space Fourier Space f*g F  G

DFT of a pure sine wave: width Side lobes So why isn‘t it a d-function?

It would be if we measured the blue line out to infinity: But we measure the red points. Our sampling degrades the delta function and introduces sidelobes

In time space In Fourier Space × Dt 1/P * 1/Dt × * = = 1/P 1/Dt

Window Data 16 min period sampled regularly for 3 hours

The longer the data window, the narrower is the width of the sinc function window:

3ps dn = Error in the period (frequency) of a peak in DFT: 2 N1/2 T A s = error of measurement T = time span of your observations A = amplitude of your signal N = number of data points

A more realistic window And data

Alias periods: Undersampled periods appearing as another period

Alias periods: P –1 P –1 P –1 P –1 (1 day)–1 P –1 P –1 (29.53 d)–1 false P –1 alias P –1 true = + Common Alias Periods: P –1 false (1 day)–1 + P –1 true day = P –1 false (29.53 d)–1 + P –1 true = month P –1 false = (365.25 d)–1 + P –1 true year

Nyquist Frequency N < fs/2 If T is your sampling rate which corresponds to a frequency of fs, then signals with frequencies up to fs/2 can be unambiguously reconstructed. This is the Nyquist frequency, N: N < fs/2 e.g. Suppose you observe a variable star once per night. Then the highest frequency you can determine in your data is 0.5 c/d = 2 days

When you do a DFT on a sine wave with a period = 10, sampling = 1: n0=0.1, 1/Dt = 1 Nyquist frequency -n0 +n0 positive n negative n

The effects of noise: s =10 m/s s =50 m/s s =100 m/s s =200 m/s Real Peaks s =10 m/s s =50 m/s s =100 m/s s =200 m/s 2 sine waves ampltudes of 100 and 50 m/s. Noise added at different levels

3. Lomb-Scargle Periodograms [ S Xj cos w(tj–t) ] 2 j Xj cos2 w(tj–t) [ S Xj sin w(tj–t) ] 2 j Xj sin2 w(tj–t) 1 2 1 2 Px(w) = + tan(2wt) = (Ssin 2wtj)/ (Scos 2wtj) j j Power is a measure of the statistical significance of that frequency (period): Scargle, Astrophysical Journal, 263, 835, 1982

Power: DFT versus Scargle N=15 Scargle Power Amplitude (m/s) N=40 N=100 Frequency (c/d) DFTs give you the amplitude of a periodic signal in the data. This does not change with more data. The Lomb-Scargle power gives you the statistical significance of a period. The more data you have the more significant the detection is, thus the higher power with more data

False Alarm Probability (FAP) The FAP is the probability that random noise will produce a peak with Lomb-Scargle Power the same as your observed peak in a certain frequency range Unknown period: Where P = Scargle Power N = number of independent frequencies in the frequency range of interest FAP ≈ 1 – (1–e–P)N Known period: In this case you have only one independent frequency FAP ≈ e–P Scargle Power (significance) is increased by lower level of noise and/or more data points

False Alarm Probability (FAP) The probability that noise can produce the highest peak over a range ≈ 1 – (1–e–P)N FAP ≈ e–P The probability that noise can produce the this peak exactly at this frequency = e–P

Example: A transit candidate from BEST Why is the FAP Impotant? Example: A transit candidate from BEST Depth [%] 1.2 Duration [h] ? Orbital period [d] Semi mayor axis[AU] Number of detections 1 Target field No. 8 Host star K...(?) Magnitude(B.E.S.T.) 11.25 Radius[Rsun] 0.65-0.85 Radius of planet ? [RJup] 0.71-0.89? To confirm you need radial velocity measurements, but you do not have a period… DSS1/POSSI

16 one-hour observations made with the 2m coude echelle

Least Square sine fit yields of 2.69 days

Published (Dreizler et al Published (Dreizler et al. 2003) Radial Velocity Curve of the transiting planet OGLE 3 Wrong Phase (by 180 degrees) for a transiting planet!

Discovery of a rapidly oscillating Ap star with 16.3 min period FAP ≈ 10–5

Lesson: Do not believe any FAP < 0.01 My limit: < 0.001 b CrB Small FAP does not always mean a real signal Period = 11.5 min FAP = 0.015 Period = 16.3 min FAP = 10–5 Lesson: Do not believe any FAP < 0.01 My limit: < 0.001 Better to miss a real period than to declare a false one

How do you get the number of independent Frequencies? Determining FAP: To use the Scargle formula you need the number of independent frequencies. How do you get the number of independent Frequencies? First Approximation: Use the number of data points N0 Horne & Baliunas (1986, Astrophysical Journal, 302, 757): Ni = –6.362 + 1.193 N0 + 0.00098N02 = number of independent frequencies

Use Scargle FAP only as an estimate Use Scargle FAP only as an estimate. A more valid determination of the FAP requires Monte Carlo Simulations: Method 1: Create random noise at the same level as your data Sample the random noise in the same manner as your data Calculate Scargle periodogram of noise and determine highest peak in frequency range of interest Repeat 1.000-100.000 times = Ntotal Add the number of noise periodograms with power greater than your data = Nnoise FAP = Nnoise/Ntotal Assumes Gaussian noise. What if your noise is not Gaussian, or has some unknown characteristics?

Use Scargle FAP only as an estimate Use Scargle FAP only as an estimate. A more valid determination of the FAP requires Monte Carlo Simulations: Method 2: Randomly shuffle the measured values (velocity, light, etc) keeping the times of your observations fixed Calculate Scargle periodogram of random data and determine highest peak in frequency range of interest Reshuffle your data 1.000-100.000 times = Ntotal Add the number of „random“ periodograms with power greater than your data = Nnoise FAP = Nnoise/Ntotal Advantage: Uses the actual noise characteristics of your data

FAP comparisons Scargle formula using N as number of data points Monte Carlo Simulation Using formula and number of data points as your independent frequencies may overestimate FAP, but each case is different.

Amplitude = level of noise 1 2 4 6 8 10 Amplitude/error N = 20 Number of measurements needed to detect a signal of a certain amplitude. The FAP of the detection is 0.001. The noise level is s = 5 m/s. Basically, the larger the measurement error the more measurements you need to detect a signal.

Lomb-Scargle Periodogram of 6 data points of a sine wave: Lots of alias periods and false alarm probability (chance that it is due to noise) is 40%! For small number of data points do not use Scargle, sine fitting is best. But be cautious!!

Comparison of the 3 Period finding techniques

Least squares sine fitting: The best fit period (frequency) has the lowest c2 Discrete Fourier Transform: Gives the power of each frequency that is present in the data. Power is in (m/s)2 or (m/s) for amplitude Amplitude (m/s) Lomb-Scargle Periodogram: Gives the power of each frequency that is present in the data. Power is a measure of statistical signficance

4. Finding Multiple Periods in Data: Pre-whitening What if you have multiple periods in your data? How do you find these and make sure that these are not due to alias effects of your sampling window. Standard procedure: Pre-whitening. Sequentially remove periods from the data until you reach the level of the noise

Prewhitening flow diagram: Find highest peak in DFT/Scargle (Pi) Save Pi Fit sine wave to data and subtract fit Calculate new DFT Are more peaks above the level of the noise? no Stop, publish all Pi yes

False alarm probability ≈ 10–14 Alias Peaks Noise level

False alarm probability ≈ 0.24 Raw data After removal of dominant period False alarm probability ≈ 0.24

Useful program for pre-whitening of time series data: http://www.univie.ac.at/tops/Period04/ Program picks highest peak, but this may be an alias Peaks may be due to noise. A FAP analysis will tell you this

5. Other Techniques: Phase Dispersion Minimization Amplitude Wrong Period Correct Period Choose a period and phase the data. Divide phased data into M bins and compute the standard deviation in each bin. If s2 is the variance of the time series data and s2 the total variance of the M bin samples, the correct period has a minimum value of Q : Q = s2/s2 See Stellingwerf, Astromomical Journal, 224, 953, 1978

PDM is suited to cases in which only a few observations are made of a limited period of time, especially if the light curves are highly non-sinsuisoidal

PDM Scargle In most cases PDM gives the same answer as DFT, Scargle periodograms. With enough data it should give the same answer

5. Other Techniques: String Length Method Wrong period longer string length Phase the data to a test period and minimize the distance between adjacent points Lafler & Kinman, Astrophysical Journal Supplement, 11, 216, 1965 Dworetsky, Monthly Notices Astronomical Society, 203, 917, 1983

5. Other Techniques: Wavelet Analysis This technique is ideal for finding signals that are aperiodic, noisy, intermittent, or transient. Recent interest has been in transit detection in light curves

„You have to be careful that you do not fool yourself, and unfortunately, you are the easiest person in the world to fool“ Richard Feynmann