ETA Data Processing
Steve Ellingson
Low Frequency Software Workshop – Chicago – Aug 10, 2008
RFI Environment: Bad But Manageable
[Figure: spectrum with TV channels 2 through 6 and the search range (29-47 MHz) marked]
• TV (Ch 2, Ch 3, Ch 4, Ch 5, Ch 6): primary threat to linearity; receiver design challenge
• KEY POINT: Can observe here, but need good linearity and narrow channelization
• ~ 100 s of noise-limited sensitivity using > 95% of contiguous 5 MHz band around 38 MHz: often, but not always, possible
In-Band RFI Challenges
[Figure: in-band spectra with RFI sources labeled]
• Labeled sources: wideband junk, self-generated (PC), 6-m amateur radio, ionospheric enhancement, Citizen’s Band, other HF, NC State Police
• Impulsive noise starts to become a problem at resolutions ~100 s
• Galactic background clearly visible underneath sparse RFI
• Self-RFI is a relatively minor problem
Offline Processing
Input: up to 200 x 1 GB (17 s) files; 7+7 bit, 7.5 MSPS
• Data integrity check
 – Data transfer errors (rare but significant)
 – Sample value histograms / clipping (checking for intermittent RFI swamping)
• Create raw spectrograms
 – 1K FFT (yields freq-time resolution kHz x s)
 – Integrate to ms (for Crab GP search; also suppresses impulsive RFI)
• Create baseline spectrograms
 – Updated every ~7.5 minutes (timed to track Galactic background variation) using spectrograms hand-picked for low RFI
• Calibrate spectrograms
 – Remove frequency response
 – Linear interpolation between baseline spectrograms to track Galactic background
• RFI mitigation
 – Three passes of “plinking” (replacing extreme values with median values):
   (1) Time-frequency pixels one at a time [th1]
   (2) All freq pixels for a given time, triggered on total power thresholding [th2]
   (3) All time pixels for a given freq, triggered on integrated spectrum thresholding [th3]
• Incoherent dedispersion
 – Operates on kHz x ms spectrograms w/o interpolation
• Integrate time series
 – In effect, smoothing to expected resolution of scatter-broadened pulse (we use 498 ms for Crab)
• Manual inspection for pulses
 – Difficult to automate due to RFI and time-domain baseline fluctuations
• Possible: incoherent combining of polarizations / dipole signals
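The calibration step can be illustrated with a minimal Python sketch. It assumes that "removing the frequency response" means dividing each spectrum by a baseline spectrum, and that the baseline at any instant is a linear blend of the two nearest hand-picked baselines. The function names, the division-based normalization, and the toy numbers are illustrative assumptions, not the ETA toolchain (which is written in C).

```python
def interpolate_baseline(t, t0, base0, t1, base1):
    """Linearly interpolate, channel by channel, between two baseline spectra."""
    w = (t - t0) / (t1 - t0)
    return [(1.0 - w) * b0 + w * b1 for b0, b1 in zip(base0, base1)]


def calibrate(spectrogram, times, t0, base0, t1, base1):
    """Divide each spectrum by the baseline interpolated to its time stamp.

    Assumption: calibration is a simple ratio, so data that exactly follow
    the drifting baseline come out as 1.0 in every pixel.
    """
    out = []
    for t, spectrum in zip(times, spectrogram):
        base = interpolate_baseline(t, t0, base0, t1, base1)
        out.append([p / b for p, b in zip(spectrum, base)])
    return out


# Toy data: two baselines 450 s (7.5 minutes) apart, three frequency channels.
base_a = [10.0, 20.0, 30.0]
base_b = [12.0, 22.0, 36.0]
times = [0.0, 225.0, 450.0]
# The middle spectrum carries an RFI spike in its second channel.
raw = [[10.0, 20.0, 30.0], [11.0, 42.0, 33.0], [12.0, 22.0, 36.0]]
cal = calibrate(raw, times, 0.0, base_a, 450.0, base_b)
```

After this step the slow Galactic background drift is gone and only the RFI spike stands out from unity, which is the form the later thresholding steps work on.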
Example of RFI Mitigation
[Figure: spectrograms before and after RFI mitigation; 3.75 MHz span at 38.0 MHz, 3600 s duration; pixel size kHz x 498 ms]
• Thresholds: th1 = 0.40 (time-freq), th2 = 0.03 (time), th3 = 0.02 (freq)
• < 1% pixels plinked
• Plotting power; extreme values in this plot are typically within a few % of mean
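The three plinking passes can be sketched as follows. The slides do not define how th1, th2, and th3 are applied, so this sketch assumes each is a fractional deviation from the global median of the calibrated spectrogram. The function name `plink` and the toy data are hypothetical.

```python
from statistics import median


def plink(spec, th1, th2, th3):
    """Three-pass median replacement on spec[time][freq].

    Returns (cleaned copy, number of distinct pixels replaced).
    Assumption: thresholds are fractional deviations from the global median.
    """
    nt, nf = len(spec), len(spec[0])
    out = [row[:] for row in spec]
    med = median(v for row in spec for v in row)
    replaced = set()

    # Pass 1: individual time-frequency pixels [th1].
    for i in range(nt):
        for j in range(nf):
            if abs(out[i][j] - med) > th1 * med:
                out[i][j] = med
                replaced.add((i, j))

    # Pass 2: all frequency pixels at a time whose mean power is extreme [th2].
    for i in range(nt):
        if abs(sum(out[i]) / nf - med) > th2 * med:
            out[i] = [med] * nf
            replaced.update((i, j) for j in range(nf))

    # Pass 3: all time pixels at a frequency whose mean power is extreme [th3].
    for j in range(nf):
        if abs(sum(out[i][j] for i in range(nt)) / nt - med) > th3 * med:
            for i in range(nt):
                out[i][j] = med
            replaced.update((i, j) for i in range(nt))

    return out, len(replaced)


# Toy spectrogram: 6 time steps x 5 channels of unit power.
spec = [[1.0] * 5 for _ in range(6)]
spec[0][1] = 2.0          # isolated spike, caught by pass 1
spec[3] = [1.1] * 5       # weak broadband burst, caught by pass 2
for row in spec:
    row[4] = 1.05         # weak narrowband carrier, caught by pass 3
clean, n_plinked = plink(spec, th1=0.40, th2=0.03, th3=0.02)
```

The toy data are deliberately dense in RFI so that every pass fires; in the real example fewer than 1% of pixels were plinked. Running pass 1 first matters: a single strong pixel would otherwise inflate the row and column averages and cause whole rows or columns to be discarded.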
Example Simple Pulse Detection (old toolchain – sorry!)
[Figure panels: No RFI Mitigation, No Dedispersion; RFI Mitigation, No Dedispersion; RFI Mitigation, DM = pc/cm³]
• Annotations: DM sweep; duration ~ 2 s; peak DM = pc/cm³; est. flux ~ 876 Jy
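The DM sweep and its removal by incoherent dedispersion can be sketched as follows. The sketch uses the standard cold-plasma dispersion delay, t = 4.148808e3 s x DM x f^-2 with f in MHz and DM in pc/cm³, and shifts each channel by the nearest whole sample (no interpolation). The channel frequencies, sample time, and DM = 50 pc/cm³ are illustrative values, not those of the observation shown.

```python
K_DM = 4.148808e3  # dispersion constant, s MHz^2 per (pc/cm^3)


def dispersion_delay(dm, freq_mhz, ref_mhz):
    """Arrival delay (s) at freq_mhz relative to ref_mhz for a given DM."""
    return K_DM * dm * (freq_mhz ** -2 - ref_mhz ** -2)


def dedisperse(spec, freqs_mhz, dt, dm):
    """Sum spec[time][freq] over frequency after undoing the DM sweep.

    Each channel is shifted by the nearest whole number of samples,
    relative to the highest-frequency channel.
    """
    ref = max(freqs_mhz)
    shifts = [round(dispersion_delay(dm, f, ref) / dt) for f in freqs_mhz]
    n_out = len(spec) - max(shifts)
    return [sum(spec[i + s][j] for j, s in enumerate(shifts)) for i in range(n_out)]


# Synthetic test: a unit pulse that arrives later at lower frequencies.
freqs = [36.5, 37.5, 38.5, 39.5]   # MHz, hypothetical channel centers
dt, dm, t0, nt = 0.5, 50.0, 3, 60  # sample time (s), DM, pulse index, samples
spec = [[0.0] * len(freqs) for _ in range(nt)]
for j, f in enumerate(freqs):
    spec[t0 + round(dispersion_delay(dm, f, max(freqs)) / dt)][j] = 1.0

aligned = dedisperse(spec, freqs, dt, dm)     # pulse energy stacks up at t0
smeared = dedisperse(spec, freqs, dt, 0.0)    # no dedispersion: pulse stays spread out
```

At these frequencies the sweep across even a few MHz is tens of seconds for a DM of this order, which is why the pulse is invisible in a frequency-summed time series until the correct DM is applied.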
Example of Relatively Good RFI Conditions
[Figure panels: No RFI Mit, No Dedispersion; RFI Mit, No Dedispersion; RFI Mit, DM = pc/cm³]
Off-Line Processing Summary
• Data processing
 – Operates on coherently-sampled voltage data (dipoles or beams)
 – 1 hour of observation is typically about 1 TB raw (data constipation!)
 – 100% new C-language source code / tool chains
 – Nothing special for computing (tend to use existing PC cluster to minimize amount of data transfer)
• Lessons Learned (from the perspective of a dispersed pulse hunter)
 – Value of extensive diagnostic “pre-analysis” to identify problematic data: smallest fraction of FLOPS, but greatest fraction of person-hours
   • Weak RFI (histograms over many domains & resolutions)
   • Spurious ionospheric conditions
   • Consistency with sky model (“error” in time-varying continuum small?)
   • Repeatability (is today within a few tenths of a percent of yesterday?)
 – Seems to be more productive to reobserve than to try to salvage “subtly problematic” data, even if only portions look bad
   • By our standards, we end up throwing out about ½ of data that initially looks good
 – Extent of site multipath (self-inflicted), impact
 – Antenna & cable dispersion, impact
 – Value in keeping coherent dipole voltage data, despite logistics, to maximally facilitate reprocessing
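The repeatability check in the pre-analysis list can be sketched as a simple day-to-day comparison. It assumes two baseline spectra taken at the same sidereal time on consecutive days and a tolerance of 0.3% ("a few tenths of a percent"). The function names, the tolerance, and the sample values are illustrative assumptions.

```python
def max_fractional_change(today, yesterday):
    """Largest per-channel fractional difference between two baseline spectra."""
    return max(abs(a - b) / b for a, b in zip(today, yesterday))


def is_repeatable(today, yesterday, tol=0.003):
    """True if every channel agrees with yesterday to within tol (assumed 0.3%)."""
    return max_fractional_change(today, yesterday) <= tol


yesterday = [100.0, 120.0, 90.0]
good_day = [100.1, 120.2, 90.1]   # all channels within about 0.2%
bad_day = [100.1, 121.5, 90.1]    # one channel off by about 1.25%

good_ok = is_repeatable(good_day, yesterday)
bad_ok = is_repeatable(bad_day, yesterday)
```

A failure here says nothing about the cause (weak RFI, ionosphere, instrument drift), only that the data deserve a closer look before any pulse search is attempted.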
ETA A/D-RX Board
• Analog signal from ARX
• 12-bit, 120 MSPS digitization
• 120 MHz system clock
• Altera Stratix EP1S25: 25,560 LEs; 80 9-bit DSP blocks; 1,944,576 memory bits
• Parallel (4b + CLK) LVDS to RCC: 7.5 MSPS, 7-bit I + 7-bit Q, plus in-band data (240 Mb/s)
• LVDS direct-connects via Mictor connector
Reconfigurable Computing Cluster (RCC)
• 16-node “Virtual FPGA”
• Each node is a development board (Xilinx ML310) with Xilinx XC2VP30 FPGA
• Edge nodes (“E”) catch streaming LVDS from digital receivers
• Center nodes (“C”) route between RCC nodes & push results to PC cluster
• Gb/s Infiniband-like interconnects
• PPCs internal to FPGAs run Linux, perform GPP-type functions
RCC “All Dipoles” Mode
• Coherent time series, 3.75 MHz BW
• 240 MB/s aggregate (60 MB/s per PC)
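The data rates on this slide can be cross-checked against the file sizes and hourly volume quoted elsewhere in the talk. The arithmetic below assumes 1 GB = 1000 MB and infers the number of recording PCs from the two quoted rates; the function and variable names are illustrative.

```python
def seconds_per_file(file_mb, rate_mb_per_s):
    """How long one recording PC takes to fill a file of the given size."""
    return file_mb / rate_mb_per_s


def terabytes_per_hour(aggregate_mb_per_s):
    """Raw data volume for one hour of observation, in TB (1 TB = 1e6 MB)."""
    return aggregate_mb_per_s * 3600.0 / 1.0e6


per_pc_rate = 60.0       # MB/s, from the slide
aggregate_rate = 240.0   # MB/s, from the slide

n_pcs = aggregate_rate / per_pc_rate              # inferred, not stated on the slide
file_s = seconds_per_file(1000.0, per_pc_rate)    # assumes 1 GB = 1000 MB
tb_hour = terabytes_per_hour(aggregate_rate)

print(f"{n_pcs:.0f} PCs, {file_s:.1f} s per 1 GB file, {tb_hour:.3f} TB per hour")
```

The result, roughly 17 s per 1 GB file and a little under 1 TB per hour, agrees with the "1 GB (17 s)" files and the "about 1 TB raw" per hour quoted in the processing slides.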
Acknowledgements:
• John Simonetti (Phys)
• Cameron Patterson (CpE)
• Zack Boor (Phys)
• Sean Cutchins (Phys)
• Kshitija Deshpande (EE)
• Mahmud Harun (EE)
• Mike Kavic (Phys)
• Anthony Lee (EE)
• Brian Martin (CpE)
• Wyatt Taylor (EE)
• Vivek Venugopal (CpE)
• Pisgah Astronomical Research Institute
AST Supported by: