Frequency Lines Identification: cmt package for on-line (by Fd) and off-line Virgo data quality
Sabrina D’Antonio, Roma2 Tor Vergata
Roma 1 Pulsar group: Pia Astone, Sergio Frasca, Cristiano Palomba, Federica Antonucci
Data quality tool: frequency-domain line identification with the PSS library (Rome 1 group)
SUMMARY:
Time-domain disturbances: event identification and removal (to obtain clean data)
Estimation of the average AR spectrum
Frequency-domain line identification
The log file and the spectra files
Time-domain disturbances: event identification and cleaning
We want to identify and remove short high-frequency events, which increase the noise level.
Event identification: performed after bilateral high-pass filtering (no phase shift) with f_hp = 100 Hz.
Once the high-frequency events have been found and their parameters registered, we “subtract” them from the original time series. In this way we produce the “cleaned data sets”.
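The bilateral (forward-backward) high-pass filtering can be sketched as follows. This is only a minimal illustration, assuming a first-order high-pass section; the filter order and the function names `highpass_once` and `highpass_bilateral` are hypothetical and are not taken from the PSS library.

```python
import math

def highpass_once(x, f_hp, fs):
    """One pass of a first-order high-pass filter (this pass alone shifts the phase)."""
    rc = 1.0 / (2.0 * math.pi * f_hp)   # time constant for the cut-off f_hp
    dt = 1.0 / fs                       # sampling step
    a = rc / (rc + dt)
    y = [0.0] * len(x)
    for i in range(1, len(x)):
        y[i] = a * (y[i - 1] + x[i] - x[i - 1])
    return y

def highpass_bilateral(x, f_hp, fs):
    """Filter forward, then backward, so that the two phase shifts cancel."""
    forward = highpass_once(x, f_hp, fs)
    backward = highpass_once(forward[::-1], f_hp, fs)
    return backward[::-1]

# Example: a single spike in 4000 Hz data, high-passed at 100 Hz.
data = [0.0] * 401
data[200] = 1.0
hp = highpass_bilateral(data, f_hp=100.0, fs=4000.0)
```

Because the same filter is applied in both directions, the response to the spike is symmetric around the spike itself, so the event time is not displaced.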
Time-domain disturbances: event identification
We evaluate the AR mean and standard deviation.
The threshold is set on the critical ratio CR, defined as the deviation of the data from the AR mean in units of the AR standard deviation.
The memory time depends on the apparatus. We set it to 600 s and the CR threshold (CR_thr) to 6.
The “dead time”, the minimum time between two events, depends on the noise and on the expected signal. We are using 1 s.
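A minimal sketch of this event search is given here, under several assumptions that are not in the slides: the AR mean and standard deviation are approximated by exponentially weighted running estimates of the absolute value of the high-passed data, samples above threshold do not update the estimates, and threshold crossings closer than the dead time are merged into one event. The function name `find_events` is hypothetical.

```python
import math

def find_events(hp, fs, tau=600.0, cr_thr=6.0, dead_time=1.0):
    """Return a list of (first_index, last_index, max_CR), one entry per event."""
    w = math.exp(-1.0 / (tau * fs))          # per-sample weight for memory time tau
    # Assumption: seed the running statistics with the first second of data.
    seed = [abs(v) for v in hp[:max(2, int(fs))]]
    mean = sum(seed) / len(seed)
    var = sum((v - mean) ** 2 for v in seed) / len(seed)
    dead = int(dead_time * fs)               # dead time in samples
    events = []
    for i, v in enumerate(hp):
        a = abs(v)
        std = math.sqrt(var) if var > 0 else float("inf")
        cr = (a - mean) / std                # critical ratio of this sample
        if cr > cr_thr:
            if events and i - events[-1][1] <= dead:
                first, _, cr_max = events[-1]        # still the same event
                events[-1] = (first, i, max(cr_max, cr))
            else:
                events.append((i, i, cr))            # a new event starts
        else:
            # Update the statistics with non-event samples only.
            mean = w * mean + (1.0 - w) * a
            var = w * var + (1.0 - w) * (a - mean) ** 2
    return events
```

With the values quoted above (memory time 600 s, CR threshold 6, dead time 1 s) as defaults, `find_events(hp, fs)` returns the sample ranges that a cleaning step would then act on.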
Time-domain disturbances: cleaning
The cleaning requires one more parameter, the “edge width”: it indicates how many seconds before and after the event are used in the cleaning of the data. We have used 0.15 s.
From the “beginning time” up to the “beginning time + duration” we subtract the high-frequency component from the data.
Data from “beginning time - edge width” to “beginning time”, and data up to “beginning time + duration + edge width”, are linearly interpolated.
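The cleaning step can be illustrated with the sketch below. It rests on one interpretation that the slides do not spell out: inside the two edges, the fraction of the high-frequency component that is subtracted ramps linearly between 0 and 1. The function name `clean_event` and the use of sample indices instead of times are choices made for this example only.

```python
def clean_event(data, hp, start, stop, edge):
    """Return a cleaned copy of `data`.

    data  : original time series
    hp    : its high-frequency component (same length)
    start : first sample of the event ("beginning time")
    stop  : last sample of the event ("beginning time + duration")
    edge  : edge width in samples (e.g. int(0.15 * fs))
    """
    out = list(data)
    n = len(data)
    for i in range(max(0, start - edge), min(n, stop + edge + 1)):
        if i < start:
            weight = (i - (start - edge)) / edge   # leading edge: 0 -> 1
        elif i > stop:
            weight = ((stop + edge) - i) / edge    # trailing edge: 1 -> 0
        else:
            weight = 1.0                           # inside the event
        out[i] = data[i] - weight * hp[i]
    return out
```

Outside the interval from `start - edge` to `stop + edge` the data are returned unchanged, so only a short stretch around each event is modified.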
The procedure to estimate the average spectrum
A good estimator should have the following properties:
If peaks are present in the frequency domain, the estimator should not be affected by them, and this should be as independent as possible of the SNR of the peak.
If the noise level varies, either slowly or rapidly, the estimator should be able to follow the noise variations.
We refined the procedure by using an autoregressive (AR) estimation of the average of the spectrum, built on the basic idea of a “clean estimator”.
The procedure to estimate the average spectrum
AR estimation: as already described, but applied from higher frequencies toward lower frequencies, to deal with the increase of the noise level toward lower frequencies.
FFT length: T = 1048.6 s
FFT mode: overlapped by half (or not), flat-top cosine window
Frequency line identification: peak map
The frequency line search starts from the ratio R of the spectrum to its AR estimation.
On this function we set a threshold at the level SNR_thr = (2.5)^0.5.
In the log files we record frequency lines with SNR ≥ GEN_FAC*SNR_thr (GEN_FAC used = 2).
All the data that cross the threshold and are local maxima are then registered in the log file.
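The line search on the ratio R can be sketched as follows. This is a simplified illustration and not the PSS code: the AR average is approximated by an exponentially weighted running mean computed from the highest frequency bin downward, bins that look like peaks are excluded from the update (the “clean estimator” idea), and the max-age re-evaluation is left out. The function name `find_lines` and the default memory of 0.02 Hz are assumptions of this example.

```python
import math

def find_lines(amp, df, thr=math.sqrt(2.5), fac=2.0, tau=0.02):
    """Return (bin_index, R) for local maxima of R = amp / average with R >= fac * thr.

    amp : amplitude spectrum (strictly positive values), one entry per bin
    df  : frequency resolution [Hz]
    tau : memory of the running average [Hz]
    """
    w = math.exp(-df / tau)
    n = len(amp)
    avg = [0.0] * n
    est = amp[-1]                        # assumption: seed from the highest bin
    for i in range(n - 1, -1, -1):       # from higher toward lower frequencies
        if amp[i] <= thr * est:          # peaks do not enter the average
            est = w * est + (1.0 - w) * amp[i]
        avg[i] = est
    ratio = [a / m for a, m in zip(amp, avg)]
    lines = []
    for i in range(1, n - 1):
        is_local_max = ratio[i] > ratio[i - 1] and ratio[i] >= ratio[i + 1]
        if is_local_max and ratio[i] >= fac * thr:
            lines.append((i, ratio[i]))
    return lines
```

On a flat spectrum with two strong narrow peaks and one weak one, only the two bins whose ratio exceeds `fac * thr` are returned; the weak peak crosses `thr` but is not written out.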
Output files
Log file Date_of_creation.log:
* information about time-domain and frequency-domain events
* one file for all the processed time period (?)
* 24 MB (10 days)
Spectra files:
PS with high frequency resolution, df = 9.5367e-004 Hz. A new file every 100 FFTs (?). Dimension ~ 850 MB
PS with lower frequency resolution, df*128 Hz. Dimension ~ 6.3 MB
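The quoted frequency resolution can be cross-checked against the FFT length with a little arithmetic. The sketch below assumes a sampling rate of 4000 Hz (suggested by the input file name VIR_h_4000Hz) and an FFT of 2^22 samples; both are inferences made for this example and not figures stated in the slides. The function name `fft_resolution` is hypothetical.

```python
def fft_resolution(fs, n_samples, reduction=128):
    """Return (FFT duration [s], resolution df [Hz], reduced resolution [Hz])."""
    duration = n_samples / fs      # length of one FFT in seconds
    df = 1.0 / duration            # high-resolution bin width
    df_low = df * reduction        # bin width of the lower-resolution PS
    return duration, df, df_low

duration, df, df_low = fft_resolution(fs=4000.0, n_samples=2 ** 22)
print(f"T = {duration:.3f} s, df = {df:.4e} Hz, df*128 = {df_low:.5f} Hz")
```

Under these assumptions the duration rounds to 1048.6 s and df to 9.5367e-004 Hz, matching the values quoted above, while the lower-resolution spectrum has bins of about 0.122 Hz.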
Log file info
PSS crea_sfdb job log file
started at Wed Jan 23 09:51:
INPUT : VIR_h_4000Hz_ GWF First data time in the first file of the run
OUTPUT : VIR_h_4000Hz_ SBL The first SBL file opened
! even NEW: a new FFT has started
! PAR1: Beginning time of the new FFT
! PAR2: FFT number in the run
! even EVT: time domain events
! PAR1: Beginning time, in mjd
! PAR2: Duration [s]
! PAR3: Max hp data amplitude*EINSTEIN
! PAR4: Max CR
! PAR5: Energy (sum of squared amp)
! even EVF: frequency domain events, with high threshold
! PAR1: Beginning frequency of EVF
! PAR2: Duration [Hz]
! PAR3: Ratio, in amplitude, max/average
! PAR4: Power*EINSTEIN**2 or average*EINSTEIN (average if duration=0, when age>maxage)
! par GEN: parameters of the AR spectrum estimation
(PAR) GEN_THR =
(PAR) GEN_TAU =
(PAR) GEN_MAXAGE =
(PAR) GEN_FAC =
! GEN_THR is the threshold in amplitude
! GEN_TAU the memory frequency of the AR estimation
! GEN_MAXAGE [Hz] the max age of the process. If age>maxage the AR is re-evaluated
! GEN_FAC is the factor for which the threshold is multiplied, to write less EVF in the log file
Log file info
! par GEN: general parameters of the run
(PAR) GEN_BEG =
(PAR) GEN_NSAM =
(PAR) GEN_DELTANU =
(PAR) GEN_FRINIT = 0
! GEN_BEG is the beginning time (mjd)
! GEN_NSAM the number of samples in 1/2 FFT
! GEN_DELTANU the frequency resolution
! GEN_FRINIT the beginning frequency of the FFT
(PAR) EVT_CR = 6
(PAR) EVT_TAU = 600
(PAR) EVT_DEADT = 1
(PAR) EVT_EDGE = 0.15
! EVT_CR is the threshold
! EVT_TAU the memory time of the AR estimation
! EVT_DEADT the dead time [s]
! EVT_EDGE seconds purged around the event
Log file info
(PAR) EVF_THR = 2.5
(PAR) EVF_TAU = 0.02
(PAR) EVF_MAXAGE = 0.02
(PAR) EVF_FAC = 2
! EVF_THR is the threshold in amplitude
! EVF_TAU the memory frequency of the AR estimation
! EVF_MAXAGE [Hz] the max age of the process. If age>maxage the AR is re-evaluated
! EVF_FAC is the factor for which the threshold is multiplied, to write less EVF in the
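The parameter records of the log file can be read back with a few lines of code. The sketch below assumes that the real log file holds one `(PAR) NAME = value` record per line, as the listing suggests; the function name `read_parameters` is hypothetical and is not part of the PSS library.

```python
import re

# One "(PAR) NAME = value" record per line (assumed layout).
PAR_RE = re.compile(r"^\(PAR\)\s+(\w+)\s*=\s*(\S+)\s*$")

def read_parameters(lines):
    """Collect the (PAR) records into a dictionary; comment lines ('!') are skipped."""
    params = {}
    for line in lines:
        match = PAR_RE.match(line.strip())
        if not match:
            continue                      # comments, events, empty values
        name, value = match.group(1), match.group(2)
        try:
            params[name] = float(value)   # numeric parameters
        except ValueError:
            params[name] = value          # anything else is kept as text
    return params

example = [
    "(PAR) EVT_CR = 6",
    "(PAR) EVT_TAU = 600",
    "(PAR) EVT_DEADT = 1",
    "(PAR) EVT_EDGE = 0.15",
    "! EVT_CR is the threshold",
]
print(read_parameters(example))
```

Records with no value after the equals sign are skipped instead of being stored as empty entries.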
Log file info
--> NEW >
--> EVT > e+07
--> EVT > e+07
…….
--> EVF >
--> EVF >
--> EVF >
>>> TOT >
--> NEW >
--> EVT > e+08
Virgo data from T0= ( :59:45) up to T=
818 FFT ( days)
EVT veto=21303/2 (total time vetoed /2 s)
EVF=245996
From the log files: Time-events veto
From the log files: Time-events veto
4 event families, one with double the duration of the other.
From the log files: Time-events veto
From the log files: Time-frequency plot
Frequency lines detected from: ( from Gabriele LineMonitor) (mag) to (green) to
From the log files: Time-frequency plot Frequency lines detected from: ( from Gabriele LineMonitor) (mag) to (green) to
From the log files: Time-frequency plot
From the log files: frequency lines hist.
From the log files: Amplitude-frequency.
From the log files: CR-frequency.
From the log files: CR
From the log files: Duration vs frequency
Red dots: Duration > EVF_MAXAGE = 0.02 Hz
From the short PS Time-frequency plot
TO BE DONE:
Write the documentation
Define with the interested people:
file dimension (open a new file after N FFTs)
optional writing of the output files
Lower the SNR_thr
….
Suggestions…