Arecibo and ALFA Telescope upgrade ALFA = Arecibo L-band Feed Array


1 Large-scale Surveys Using the Arecibo Telescope
Jim Cordes, Professor of Astronomy
- Arecibo and ALFA
  - Telescope upgrade
  - ALFA = Arecibo L-band Feed Array
- Scientific consortia: pulsars, Galactic science, extragalactic science
- Broad science goals for massive surveys
- Data management and data mining
  - Current algorithms
  - Exploiting infrastructure with new algorithms for discovery
  - Astronomy/Computer Science collaboration
- Why data storage and management at Cornell?
- The plan for the RI project
4/26/2019

2 The Arecibo Telescope
Largest single aperture on the planet
- 305 m diameter
- 0.04–10 GHz operation
- Used for radio astronomy, radar astronomy and atmospheric science
- A national facility open to all users: ~150 scientific programs/year
- Arecibo Observatory Visitor and Education Facility: 120,000 visitors/year
- Two upgrades (mid 1970s, late 1990s)
- Now used for massive surveys with a multiple-feed array (ALFA, installed April 2004)



5 ALFA; Parkes multibeam (MB) feeds

6 Parkes MB Feeds

7 Parkes MB Feeds

8 ALFA Scientific Consortia
- P-ALFA, G-ALFA, E-ALFA: 35 to 50 members per consortium
- Open participation, self-organized
- Astrophysics goals are consortium driven
- Fraction of telescope time for ALFA surveys: TBD, but expected to be ~25%
- Baseline processing using consortium software
- Data-mining advances by individual research groups; at Cornell, enabled by new algorithms used with data storage and processing capability
- Surveys will be long-term legacies for the astrophysical community, given appropriate infrastructure
- Challenges: excision of radio-frequency interference (RFI), data rates and volumes, distinguishing astrophysical signals from noise and RFI

9 Why do more pulsar surveys?
- Astrophysics of neutron stars
- A recent important discovery (a NS-NS binary)
- What can Arecibo/ALFA do better?
- International research groups
- Student involvement
  - Undergraduate students at Cornell
  - Graduate students at Cornell
  - A620: Large-scale Surveys in Radio Astronomy (Spring 2004)
- Multi-wavelength astrophysical community

10 Pulsar Science
- Extreme matter physics
  - 10× nuclear density
  - High-temperature superfluid & superconductor
  - $B \sim B_q = 4.4 \times 10^{13}$ G
  - Voltage drops $\sim 10^{12}$ V
  - $F_{\rm EM} = 10^{9} F_g = 10^{9} \times 10^{11} F_{g,\rm Earth}$
- Relativistic plasma physics
  - Magnetospheres
  - Radiation mechanisms
- Tests of theories of gravity
- Gravitational wave detectors
- Probes of turbulent and magnetized ISM (& IGM)
- End states of stellar evolution
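As a check on the critical field in the list above: $B_q$ is the field at which the electron cyclotron energy $\hbar e B/(m_e c)$ equals the rest-mass energy $m_e c^2$, so $B_q = m_e^2 c^3/(e\hbar)$ in Gaussian units. The short Python sketch below evaluates this with CODATA constants that are hard-coded here (they are not given in the slides).

```python
# Quantum critical field B_q = m_e^2 c^3 / (e * hbar) in Gaussian (cgs) units.
# Constants are CODATA 2018 values, hard-coded here (not taken from the slides).
M_E = 9.1093837015e-28            # electron mass [g]
C = 2.99792458e10                 # speed of light [cm/s]
E_CHARGE = 4.803204712570263e-10  # elementary charge [esu]
HBAR = 1.054571817e-27            # reduced Planck constant [erg s]


def critical_field_gauss() -> float:
    """Field at which the electron cyclotron energy hbar*e*B/(m_e*c) equals m_e*c^2."""
    return M_E**2 * C**3 / (E_CHARGE * HBAR)


b_q = critical_field_gauss()
print(f"B_q = {b_q:.2e} G")  # prints B_q = 4.41e+13 G
```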


12 Double Neutron Star Binary: J0737-3039A,B

13 The First ALFA Pulsar

14 Algorithm: matched filtering in the DM-t plane
- A pulsar found through its single-pulse emission, not its periodicity (cf. Crab giant pulses)
- ALFA's 7 beams provide powerful discrimination between celestial and RFI transients
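The single-pulse search above can be sketched as follows, assuming each trial DM has already been dedispersed into a time series. Matched filtering is approximated by boxcars of a few widths; the widths, the 6σ threshold, the function names and the toy data are illustrative choices, not the survey's actual parameters or code.

```python
import math
import random
import statistics


def boxcar_search(series, widths=(1, 2, 4, 8), threshold=6.0):
    """Matched-filter one dedispersed time series with boxcars of several widths.

    Returns (snr, start_index, width) for every window whose S/N reaches the
    threshold. Noise mean/std come from the whole series (a simplification).
    """
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    hits = []
    for w in widths:
        if w > len(series):
            continue
        window = sum(series[:w])  # running sum of w consecutive samples
        for start in range(len(series) - w + 1):
            if start > 0:
                window += series[start + w - 1] - series[start - 1]
            # a sum of w independent noise samples has standard deviation sqrt(w)*std
            snr = (window - w * mean) / (std * math.sqrt(w))
            if snr >= threshold:
                hits.append((snr, start, w))
    return hits


def search_dm_t_plane(dedispersed, **kwargs):
    """dedispersed maps trial DM -> time series; keep the strongest hit per DM."""
    best = {}
    for dm, series in dedispersed.items():
        hits = boxcar_search(series, **kwargs)
        if hits:
            best[dm] = max(hits)  # tuples compare by S/N first
    return best


# Hypothetical toy data: pure noise at DM 0, a 4-sample pulse injected at DM 50.
rng = random.Random(42)
noise_only = [rng.gauss(0.0, 1.0) for _ in range(1000)]
with_pulse = [rng.gauss(0.0, 1.0) for _ in range(1000)]
for i in range(500, 504):
    with_pulse[i] += 10.0
result = search_dm_t_plane({0.0: noise_only, 50.0: with_pulse})
print(result)
```

With ALFA, a search like this would run on each beam separately. Because a celestial source appears in only one beam (or adjacent ones) while terrestrial RFI tends to appear in many beams at once, cross-checking the per-beam hits is a natural way to apply the 7-beam discrimination mentioned on the slide.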

15 Surveys with Parkes, Arecibo & GBT
Simulated & actual yield: ~1000 pulsars

16 Data Management
- Sampling the radio sky
- Real-time and average data rates
- Survey algorithms
  - Approximations to matched filtering
  - Dedispersion + FFT + harmonic summing + threshold tests
  - Single-pulse searching
  - Binary orbital effects
- Processing times
- Intermediate data products
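Dedispersion, the first step of the survey algorithm listed above, removes the cold-plasma delay $\Delta t \approx 4.15\,\mathrm{ms} \times \mathrm{DM} \times (\nu/\mathrm{GHz})^{-2}$, with DM in pc cm$^{-3}$. A minimal sketch of incoherent (channel shift-and-sum) dedispersion in Python follows; the channel frequencies, sampling time and DM in the test are hypothetical values chosen for illustration.

```python
K_DM_MS = 4.148808  # cold-plasma dispersion constant [ms GHz^2 cm^3 pc^-1]


def dispersion_delay_ms(dm, freq_ghz, ref_freq_ghz):
    """Extra arrival delay at freq_ghz relative to ref_freq_ghz for a given DM [pc cm^-3]."""
    return K_DM_MS * dm * (freq_ghz**-2 - ref_freq_ghz**-2)


def dedisperse(dynspec, freqs_ghz, dm, tsamp_ms):
    """Incoherent dedispersion of a dynamic spectrum at one trial DM.

    dynspec[c][i] is the power in channel c at time sample i. Each channel is
    shifted earlier by its delay relative to the highest frequency, then the
    channels are summed; samples shifted past the end contribute nothing.
    """
    ref = max(freqs_ghz)
    nsamp = len(dynspec[0])
    out = [0.0] * nsamp
    for chan, f in zip(dynspec, freqs_ghz):
        shift = round(dispersion_delay_ms(dm, f, ref) / tsamp_ms)
        for i in range(nsamp - shift):
            out[i] += chan[i + shift]
    return out


# Hypothetical test: 6 channels over 1.30-1.55 GHz, 1 ms samples, one pulse at DM 100.
freqs = [1.30 + 0.05 * k for k in range(6)]
tsamp, nsamp, true_dm = 1.0, 200, 100.0
dynspec = []
for f in freqs:
    chan = [0.0] * nsamp
    chan[50 + round(dispersion_delay_ms(true_dm, f, max(freqs)) / tsamp)] = 1.0
    dynspec.append(chan)

aligned = dedisperse(dynspec, freqs, true_dm, tsamp)
smeared = dedisperse(dynspec, freqs, 0.0, tsamp)
print(aligned[50], max(smeared))  # prints 6.0 1.0
```

At the correct DM all six channels add up in one sample; at DM 0 the pulse stays spread over six different samples, which is why the search has to step through many trial DMs.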

17 Pulsar Surveys: the most demanding of the ALFA surveys
- ~100 MB/s to disk
- ~1 PB for the entire survey ( % duty cycle)
- Requires coarsely parallel processing of raw data in discrete, local data chunks
  - processing time ~ x data acquisition time on a single processor (Intel 2.5 GHz, 512 kB cache, 1 GB RAM); depends on data-set details, algorithms, code
  - distributed initial processing (Cornell + 5 sites)
- Requires meta-analysis of the data products of the initial analysis, enabled by the Cornell database and algorithms for data mining
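The rate and volume figures above can be related with a quick calculation. Assuming decimal units (1 MB = 10⁶ bytes, 1 PB = 10¹⁵ bytes), this gives the recording time needed to fill 1 PB at 100 MB/s:

```python
# Back-of-envelope check of the slide's numbers (decimal units assumed).
RATE_BYTES_PER_S = 100e6   # ~100 MB/s written to disk
SURVEY_BYTES = 1e15        # ~1 PB for the entire survey

recording_s = SURVEY_BYTES / RATE_BYTES_PER_S
recording_h = recording_s / 3600.0
print(f"{recording_s:.1e} s = {recording_h:.0f} h of recording at the full rate")
# prints 1.0e+07 s = 2778 h of recording at the full rate
```

That is about 116 days of continuous recording, which the survey spreads over years of partial telescope time; hence the duty-cycle qualifier on the slide.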

18 Basic data unit = a dynamic spectrum
- $10^6$–$10^8$ samples × 64 μs
- Fast-dump spectrometers: analog filter banks, correlators, FFT (hardware), FFT (software), polyphase filter bank; e.g. WAPP, AOFTM, GBT correlator + Spigot card
[Figure: dynamic spectrum, frequency (64 to 1024 channels) vs. time]

19 Pulsar Periodicity Search
[Figure: the dynamic spectrum (frequency vs. time) is dedispersed into a DM vs. time plane; each DM's time series is FFT'd, and |FFT(f)| shows peaks at 1/P, 2/P, 3/P]
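The periodicity search in the figure relies on harmonic summing: narrow pulses spread their power over many harmonics at 1/P, 2/P, 3/P, so adding the power at bins k, 2k, up to n·k boosts the fundamental. A small Python sketch follows, using a direct DFT for self-containment (a real pipeline would use an FFT library) and a hypothetical pulse train; the number of harmonics summed is an illustrative choice.

```python
import cmath
import math


def power_spectrum(series):
    """|DFT|^2 for bins 0..N/2-1 (direct O(N^2) DFT; a real search would use an FFT library)."""
    n = len(series)
    mean = sum(series) / n
    x = [v - mean for v in series]  # remove the DC level
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2)
    ]


def harmonic_sum(power, nharm):
    """For each fundamental bin k, add the power at k, 2k, ..., nharm*k."""
    kmax = len(power) // nharm  # only keep k whose harmonics all lie below Nyquist
    return [sum(power[h * k] for h in range(1, nharm + 1)) for k in range(kmax)]


# Hypothetical time series: 512 samples, 3-sample-wide pulses every 32 samples.
n, period = 512, 32
series = [1.0 if t % period < 3 else 0.0 for t in range(n)]
summed = harmonic_sum(power_spectrum(series), nharm=4)
best_k = max(range(1, len(summed)), key=summed.__getitem__)
print(best_k, n / best_k)  # prints 16 32.0
```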



22 Data increase/reduction: ×2 (dedispersed time series), ×1, ×$10^{-3}$

23 Data Flow
- Raw data obtained at Arecibo
- Local processing at Arecibo
  - Quality control
  - Targeted scientific processing
- Transport of raw data to processing centers (Cornell + 5 other institutions) via disk packs
- CS/CTC role:
  - Initial search analysis (by Cornell researchers)
  - Incorporation of data and products from all processing sites into a database, plus access tools (joint Astronomy/CS effort)
  - Capabilities for meta-analysis of search output for candidate identification; used by the Pulsar Consortium institutions
  - Long-term archival of raw data and data products for future processing and cross-correlation with future instruments (ground- and space-based); used by the general astronomical community (National Virtual Observatory)

24 Data residence times on CTC Facility

Data type                    Size              Residence time on disk
Raw data                     14 TB/chunk       1 week
Dedispersed time series                        1 month
Candidate lists and plots    0.014 TB/chunk    Forever

25 Why at Cornell?
- National Astronomy and Ionosphere Center (NAIC)
  - Cornell operates the Arecibo Observatory under a cooperative agreement with the NSF
  - ALFA surveys are legacy activities whose success NAIC wishes to oversee, including data acquisition and processing; explicit catalogs of sources disseminated to the astrophysical community; and long-term archiving for synergistic activities with future surveys
- Department of Astronomy
  - Cornell faculty are directly involved with the management, operations, and usage of Arecibo, including the ALFA surveys
- Department of Computer Science
  - Provides expertise in database design and data-mining tools
  - Will make use of Arecibo data for data-mining algorithm research
- Cornell Theory Center (CTC)
  - CTC's mission is to support research groups and centers at Cornell
  - Can provide the stable infrastructure and expertise needed for long-term archiving and for providing analysis tools in a high-performance computing environment

26 Mining Astronomy Data
Challenges:
- Detection of pulses (matched filtering, thresholding, classification)
- Classification of pulses (classification)
- Detection of pulse sequences (sequence mining)
- Grouping of similar pulses (clustering)
- Detection of changes over time
Example: scalable classification tree construction
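For the "grouping of similar pulses" item: a single bright pulse is typically detected at several neighbouring trial DMs and adjacent samples, so candidates are often grouped before classification. One simple way to do this, assumed here rather than taken from the slides, is friends-of-friends linking in the time-DM plane; the link thresholds and the event list below are hypothetical.

```python
def group_events(events, max_dt, max_ddm):
    """Friends-of-friends grouping of single-pulse events.

    events: list of (time_s, dm). Two events are linked when they are within
    max_dt seconds and max_ddm DM units of each other; groups are the connected
    components of those links. O(n^2) pair checks, fine for a sketch.
    """
    parent = list(range(len(events)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(events)):
        for j in range(i + 1, len(events)):
            (ti, di), (tj, dj) = events[i], events[j]
            if abs(ti - tj) <= max_dt and abs(di - dj) <= max_ddm:
                parent[find(i)] = find(j)  # union the two components

    groups = {}
    for i in range(len(events)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical candidates: one pulse seen at three neighbouring DMs, one at two, one isolated.
events = [(10.00, 50.0), (10.01, 52.0), (10.02, 54.0),
          (30.000, 80.0), (30.005, 81.0), (55.0, 10.0)]
print(group_events(events, max_dt=0.05, max_ddm=3.0))  # prints [[0, 1, 2], [3, 4], [5]]
```

Events 0 and 2 are more than 3 DM units apart but still end up in one group through event 1, which is the chaining behaviour that distinguishes friends-of-friends from a fixed-radius cut.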

27 Example: Classification Trees
[Figure: example classification tree with branches <30 / >=30 and M / S, T, and YES / NO leaves]

28 Top-Down Tree Construction
BuildTree(Node t, Training database D, Split Selection Method S)
(1) Apply S to D to find splitting criterion
(2) if (t is not a leaf node)
(3)     Create children nodes of t
(4)     Partition D into children partitions
(5)     Recurse on each partition
(6) endif
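The BuildTree pseudocode above can be made runnable in Python as follows. The slide leaves the split selection method S abstract; this sketch assumes binary splits on numerical attributes chosen by the Gini index, and treats a node as a leaf when S finds no split that lowers impurity. Comments map back to steps (1)-(5); the training data are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Row = Tuple[Tuple[float, ...], str]  # (numerical attribute values, class label)


@dataclass
class Node:
    split: Optional[Tuple[int, float]] = None  # (attribute index, threshold); None = leaf
    label: Optional[str] = None                # majority class of the node's partition
    children: List["Node"] = field(default_factory=list)


def gini(labels) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def gini_split(data: List[Row]) -> Optional[Tuple[int, float]]:
    """Split selection method S (assumed here): best binary split x[a] < thr by
    weighted Gini; returns None when no split lowers the impurity."""
    best, best_score = None, gini([lbl for _, lbl in data])
    for attr in range(len(data[0][0])):
        values = sorted({x[attr] for x, _ in data})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [lbl for x, lbl in data if x[attr] < thr]
            right = [lbl for x, lbl in data if x[attr] >= thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(data)
            if score < best_score:
                best, best_score = (attr, thr), score
    return best


def build_tree(t: Node, data: List[Row], select_split: Callable) -> Node:
    t.label = Counter(lbl for _, lbl in data).most_common(1)[0][0]
    t.split = select_split(data)              # (1) apply S to D
    if t.split is not None:                   # (2) t is not a leaf node
        attr, thr = t.split
        t.children = [Node(), Node()]         # (3) create children nodes of t
        parts = ([r for r in data if r[0][attr] < thr],   # (4) partition D
                 [r for r in data if r[0][attr] >= thr])
        for child, part in zip(t.children, parts):
            build_tree(child, part, select_split)         # (5) recurse on each partition
    return t


# Hypothetical training data with one attribute (e.g. age) and a YES/NO label.
data = [((20.0,), "YES"), ((25.0,), "YES"), ((35.0,), "NO"), ((45.0,), "NO")]
tree = build_tree(Node(), data, gini_split)
print(tree.split, [c.label for c in tree.children])  # prints (0, 30.0) ['YES', 'NO']
```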

29 Split Selection Method
Numerical attribute: find a split point that separates the (two) classes (Yes vs. No).
[Figure: Yes and No records plotted against attributes x1 and x2, with a split point X]
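For a numerical attribute, the best split point can be found with one sort and one sweep instead of rescanning the data for every candidate. The sketch below assumes the Gini index as the impurity measure (the slide does not name one) and evaluates a single attribute on hypothetical data; with two attributes such as x1 and x2 in the figure, one would run it per attribute and keep the better result.

```python
from collections import Counter


def gini_from_counts(counts, n):
    return 1.0 - sum((c / n) ** 2 for c in counts.values()) if n else 0.0


def best_numeric_split(values, labels):
    """Best binary split `x < threshold` on one numerical attribute.

    Sorts once, then sweeps left to right moving one record at a time from the
    right-hand class counts to the left-hand ones, so each candidate split costs
    O(number of classes) rather than a rescan of the data.
    Returns (threshold, weighted_gini); threshold is None if no split helps.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    left, right = Counter(), Counter(labels)
    best = (None, gini_from_counts(right, n))
    for i in range(n - 1):
        x, lbl = pairs[i]
        left[lbl] += 1
        right[lbl] -= 1
        if x == pairs[i + 1][0]:
            continue  # a threshold can only fall between distinct values
        nl, nr = i + 1, n - i - 1
        score = (nl * gini_from_counts(left, nl) + nr * gini_from_counts(right, nr)) / n
        if score < best[1]:
            best = ((x + pairs[i + 1][0]) / 2, score)
    return best


# Hypothetical single attribute x1 with Yes/No labels.
x1 = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
cls = ["No", "No", "No", "Yes", "Yes", "Yes"]
print(best_numeric_split(x1, cls))  # prints (4.5, 0.0)
```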

30 Split Selection Method
Categorical attributes: how to group the values?
[Figure: class distributions for values y1, y2, y3; candidate groupings (y1, y2) | (y3), (y1) | (y2, y3), (y1, y2) | (y3)]
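For a categorical attribute with $k$ values there are $2^{k-1}-1$ ways to split the values into two non-empty groups, which is what the candidate groupings on the slide enumerate for y1, y2, y3. An exhaustive Python sketch follows, again assuming the Gini index, with hypothetical data.

```python
from collections import Counter
from itertools import combinations


def gini(counts):
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values()) if n else 0.0


def best_categorical_split(values, labels):
    """Try every grouping of the categories into two non-empty sets.

    Returns (left_group, right_group, weighted_gini), or None if there is only
    one category. Cost grows as 2^(k-1) in the number of categories k.
    """
    table = {}  # category -> Counter of class labels (the attribute/class cross-tab)
    for v, lbl in zip(values, labels):
        table.setdefault(v, Counter())[lbl] += 1
    cats = sorted(table)
    n = len(values)
    best = None
    first, rest = cats[0], cats[1:]
    # Fix the first category on the left so each partition is generated once;
    # r < len(rest) keeps at least one category on the right.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            left_cats = {first, *extra}
            left, right = Counter(), Counter()
            for c in cats:
                (left if c in left_cats else right).update(table[c])
            nl, nr = sum(left.values()), sum(right.values())
            score = (nl * gini(left) + nr * gini(right)) / n
            if best is None or score < best[2]:
                best = (tuple(sorted(left_cats)),
                        tuple(c for c in cats if c not in left_cats), score)
    return best


# Hypothetical data: y1 and y2 are always "Yes", y3 is always "No".
vals = ["y1", "y1", "y2", "y2", "y3", "y3"]
labs = ["Yes", "Yes", "Yes", "Yes", "No", "No"]
print(best_categorical_split(vals, labs))  # prints (('y1', 'y2'), ('y3',), 0.0)
```

Exhaustive search is only practical for small $k$; for two classes, a well-known shortcut sorts the values by their fraction of one class and tries only the $k-1$ splits along that order.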

31 Scalable Classification Tree Construction
Assume the data partition at a tree node is D. Simple algorithm:
1. Build a cross-tab of attribute/class label for all attributes in memory
2. Choose the splitting attribute and splitting predicate
3. Partition D
4. Recurse
Problems:
- Scans the training database once for each level of the tree
- Cross-tabs can become very large
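Step 1 of the simple algorithm, the in-memory cross-tab, amounts to one pass over D that counts (attribute, value, class) triples; every split predicate at the node can then be scored from these counts alone. A Python sketch with hypothetical attributes and class labels:

```python
from collections import defaultdict


def build_cross_tabs(rows):
    """One scan over partition D: for every attribute, count (value, class) pairs.

    rows: iterable of (attribute_dict, class_label).
    Returns {attribute: {value: {class_label: count}}} as nested defaultdicts.
    """
    tabs = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for attrs, label in rows:
        for name, value in attrs.items():
            tabs[name][value][label] += 1
    return tabs


def cross_tab_cells(tabs):
    """Number of non-empty (attribute, value, class) cells held in memory."""
    return sum(len(per_class) for per_value in tabs.values() for per_class in per_value.values())


# Hypothetical single-pulse candidates: attribute names and "astro"/"rfi" labels are made up.
rows = [
    ({"n_beams": 1, "width_ms": 2}, "astro"),
    ({"n_beams": 1, "width_ms": 4}, "astro"),
    ({"n_beams": 7, "width_ms": 2}, "rfi"),
    ({"n_beams": 7, "width_ms": 8}, "rfi"),
]
tabs = build_cross_tabs(rows)
print(cross_tab_cells(tabs))  # prints 6
```

The sketch also shows why the two problems arise: a continuous attribute has nearly as many distinct values as records, so its cross-tab approaches the size of D, and the pass has to be repeated for every level of the tree.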

32 Bootstrapped Tree Construction
[Figure: the training database is split on attribute X (<30 / >=30) into a left partition and a right partition]

33 Algorithm Overview
- Sampling phase: in-memory sample → approximate tree with bounds
- Cleanup phase: all the data + approximate tree, bounds → final tree
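The two phases above can be illustrated for a single numerical split at the root. In this Python sketch, which makes several simplifying assumptions (Gini index, bounds taken as the minimum and maximum of the bootstrap split points, hypothetical data), the sampling phase bootstraps an in-memory sample to bound the split point, and the cleanup phase scans all the data once, keeping in memory only the records that fall inside the bounds. A practical version would also have to verify that the best split really lies within the bounds and fall back otherwise; that check is omitted here.

```python
import random
from collections import Counter


def gini(counts):
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values()) if n else 0.0


def split_score(thr, below, inside, above):
    """Weighted Gini of the split x < thr. `below`/`above` are class counts of records
    already known to fall on the left/right; `inside` holds raw (x, label) records."""
    left = below + Counter(lbl for x, lbl in inside if x < thr)
    right = above + Counter(lbl for x, lbl in inside if x >= thr)
    nl, nr = sum(left.values()), sum(right.values())
    return (nl * gini(left) + nr * gini(right)) / (nl + nr)


def best_threshold(records, below=None, above=None, lo=None, hi=None):
    """Exhaustive best threshold over midpoints of distinct x values, plus lo/hi if given.
    Returns (score, threshold), or (None, None) if there is no candidate."""
    below, above = below or Counter(), above or Counter()
    xs = sorted({x for x, _ in records})
    cands = [t for t in (lo, *[(a + b) / 2 for a, b in zip(xs, xs[1:])], hi) if t is not None]
    return min(((split_score(t, below, records, above), t) for t in cands), default=(None, None))


def sampling_phase(sample, n_boot=25, seed=0):
    """Bootstrap the in-memory sample; the spread of the chosen split points
    serves as bounds [lo, hi] on the split point for the full data."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_boot):
        boot = [rng.choice(sample) for _ in sample]
        _, thr = best_threshold(boot)
        if thr is not None:
            points.append(thr)
    return min(points), max(points)


def cleanup_phase(all_data, lo, hi):
    """One scan over all the data: records outside [lo, hi] are reduced to class
    counts, and only the records inside the bounds are kept in memory."""
    below, above, inside = Counter(), Counter(), []
    for x, lbl in all_data:
        if x < lo:
            below[lbl] += 1
        elif x > hi:
            above[lbl] += 1
        else:
            inside.append((x, lbl))
    score, thr = best_threshold(inside, below, above, lo, hi)
    return thr, score, len(inside)


# Hypothetical data: 1000 records, class "Yes" below x = 600; the in-memory sample
# keeps the records with x % 20 in (0, 19), i.e. 100 of them.
all_data = [(float(x), "Yes" if x < 600 else "No") for x in range(1000)]
sample = [r for r in all_data if r[0] % 20 in (0, 19)]
lo, hi = sampling_phase(sample)
thr, score, kept = cleanup_phase(all_data, lo, hi)
print(lo, hi, thr, score, kept)
```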

34 Sample Research Problems
- Scalability
- Tree-structured models
- Association rules and sequences
- Clustering
- Highly correlated variables and auto-correlation
- Mining with very high noise levels
- Detecting and quantifying change (presentation this afternoon)

35 Arecibo and the RI Project
- Galactic Plane Survey: MB s⁻¹ = 806 TB
- Telescope time: ~$2k/hr
- 3-5 years to acquire the data (modulo telescope demand, interference)
- Reobservations of candidate signals
- Iterative process of acquiring raw data, data reduction, reobservations, cross-correlations with other databases, and reprocessing with newer algorithms
RI project:
- establishment of a Cornell processing center
- development of database and meta-analysis tools
- collaboration between the Astronomy and CS Departments + CTC
- provision for access to data and tools by the Pulsar Consortium and, longer term, the wider community of researchers

