Detecting the Unexpected: Discovery in the Era of Astronomically Big Data
Ray Norris, Western Sydney University & CSIRO Astronomy & Space Science
Detecting the Unexpected (Discovery in the Era of Astronomically Big Data)
Conference at Space Telescope Science Institute, Baltimore, MD, 27 Feb to 3 March 2017
See https://webcast.stsci.edu/webcast/searchresults.xhtml?searchtype=20&eventid=251&sortmode=2 for most talks
Breakdown of talks:
Discovering the unexpected: 4
Transients/time series outlier detection: 3
Citizen science: 5
Other (algorithms, spectroscopy, visualisation, etc.): 12
Also: Hackathon, Unconferences
The four “Discovering the Unexpected” talks
Dalya Baron: Finding the weirdest objects in large astronomical surveys
Ray Norris: WTF: Discovering the Unexpected in Next-generation Radio Surveys
Arjun Dey: Maximizing Serendipity: The Unexpected in the era of Spectroscopic Surveys
Kiri Wagstaff: Discovery via Eigenbasis Modeling of Uninteresting Data
Maybe also include: Chris Lintott: Citizen Science in the era of Big Data
Dalya Baron: Finding the weirdest objects in large astronomical surveys
Reviewed by Ray Norris and Rosalind Wang in ML Projects meeting on 8 December 2016
See http://mlprojects.pbworks.com/20161208
2 million spectra
Construct training set of unusual objects
Now use a random forest (RF) to find objects with unusual spectra
Ray Norris, CSIRO & WSU
Results: 18, of which 15 already known
Ray Norris: WTF: Discovering the Unexpected in Next-generation Radio Surveys
See Crawford+2016 arXiv:1611.02829 and Norris 2017 arXiv:1611.05570
Most major astronomical discoveries are unexpected and result from advances in technology
Discoveries with HST (table from Norris arXiv:1611.05570)
Columns: Project | Key project | Planned? | Nat. Geo. top ten? | Highly cited? | Nobel prize?
Projects listed (the tick marks did not survive extraction; see the paper for the full table):
Use Cepheids to improve value of H0
Study intergalactic medium with UV spectroscopy
Medium-deep survey
Image quasar host galaxies
Measure SMBH masses
Exoplanet atmospheres
Planetary Nebulae
Discover Dark Energy
Comet Shoemaker-Levy
Deep fields (HDF, HDFS, UDF, FF, etc.)
Proplyds in Orion
GRB Hosts
Size of radio continuum surveys over time (plot, time axis 1940 to 2020)
NVSS = 1.8 million
2016 total = 2.5 million
ASKAP radio continuum survey (EMU) = 70 million
20cm radio continuum surveys (plot of 5σ sensitivity in mJy; points include SKA2?, SKA1?, Meerkat MIGHTEE (approx), ASKAP-EMU and VLA-NVSS; one region is labelled "Uncharted observational phase space")
ASKAP-EMU: 75% of sky, rms = 10 μJy, res ~ 10 arcsec, ~70 million galaxies (c.f. 2.5 million detected over the entire history of radio astronomy)
VLA-NVSS: 75% of sky, rms = 450 μJy, res ~ 45 arcsec, ~1.8 million galaxies
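The survey numbers above can be turned into the 5σ limits that the plot axis uses, plus the two ratios that motivate the talk. This is only arithmetic on the quoted figures; the dictionary layout and function name are illustrative choices, not anything from the talk.

```python
# Figures quoted on the slide: rms noise in microjansky, source counts in millions.
surveys = {
    "VLA-NVSS": {"rms_ujy": 450.0, "sources_million": 1.8},
    "ASKAP-EMU": {"rms_ujy": 10.0, "sources_million": 70.0},
}
KNOWN_IN_2016_MILLION = 2.5  # all radio sources detected up to 2016, per the slide


def five_sigma_mjy(rms_ujy):
    """Convert an rms noise in microjansky to a 5-sigma limit in millijansky."""
    return 5.0 * rms_ujy / 1000.0


limits_mjy = {name: five_sigma_mjy(s["rms_ujy"]) for name, s in surveys.items()}

# How much deeper EMU goes than NVSS, and how much the known sample grows.
depth_ratio = surveys["VLA-NVSS"]["rms_ujy"] / surveys["ASKAP-EMU"]["rms_ujy"]
sample_growth = surveys["ASKAP-EMU"]["sources_million"] / KNOWN_IN_2016_MILLION

for name, limit in limits_mjy.items():
    print(f"{name}: 5-sigma limit = {limit:.2f} mJy")
print(f"EMU is {depth_ratio:.0f}x deeper; sample grows {sample_growth:.0f}x")
```

So EMU reaches a 5σ limit of 0.05 mJy against 2.25 mJy for NVSS, over the same sky fraction.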
Can ASKAP Discover the Unexpected?
Data volumes are huge – cannot sift by eye
Instrument is complex – no single individual will be familiar with all possible artifacts
ASKAP will be superb at answering well-defined questions (the “known unknowns”)
Humans won’t be able to find the “unknown unknowns”
Can we mine data for the unexpected, by rejecting the expected?
If not, ASKAP will not reach its full potential, i.e. it will not deliver value for money
Type 1 discoveries: Unexpected objects (e.g. pulsars, quasars)
Simple outlier detection – right?
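As a concrete picture of what "simple outlier detection" could mean for Type 1 discoveries, here is a minimal sketch that scores each object by the distance to its k-th nearest neighbour in feature space. The two-feature catalogue is made up for illustration, and this particular scoring is an assumption, not a method presented in any of the talks.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbour.

    Isolated points (few close neighbours) get large scores.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Hypothetical catalogue: five similar objects and one far from the rest.
catalogue = [
    (0.00, 0.00),
    (0.10, 0.00),
    (0.00, 0.10),
    (0.10, 0.10),
    (0.05, 0.05),
    (5.00, 5.00),
]

scores = knn_outlier_scores(catalogue, k=2)
weirdest = max(range(len(catalogue)), key=scores.__getitem__)
print("most isolated object:", weirdest, catalogue[weirdest])
```

A score like this only says that an object is isolated. It cannot say whether the object is a new class of source or an instrumental artefact, which is why the question mark on the slide matters.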
Type 2 discoveries: unexpected phenomena (e.g. Hubble expansion, dark energy, dark matter)
Arjun Dey: Maximizing Serendipity: The Unexpected in the era of Spectroscopic Surveys
DESI is starting photometry now: 1/3 of the sky, from dec -20 to dec +34, all in the public domain
Asked whether there really are new objects in extreme areas of parameter space (but did not cover new correlations or phenomena)
Spectroscopy starts in a few years with a 5000-fibre spectrometer
Kiri Wagstaff: Discovery via Eigenbasis Modeling of Uninteresting Data
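The slide gives only the talk title, so the following is a generic sketch of the idea the title names, not the speaker's implementation: build an eigenbasis from data already considered uninteresting, then flag items that the basis reconstructs poorly. To stay dependency-free it keeps a single component found by power iteration; the toy data and all function names are assumptions.

```python
import math
import random


def centre(rows):
    """Return the column means and the mean-subtracted rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return mean, [[r[j] - mean[j] for j in range(d)] for r in rows]


def top_component(centred, iters=100, seed=0):
    """Leading eigenvector of the scatter matrix X^T X, by power iteration."""
    rng = random.Random(seed)
    d = len(centred[0])
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        # Apply X^T X to v without forming the matrix explicitly.
        proj = [sum(x[j] * v[j] for j in range(d)) for x in centred]
        w = [sum(p * x[j] for p, x in zip(proj, centred)) for j in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


def residual(row, mean, v):
    """Reconstruction error of one item against the one-component basis."""
    x = [a - m for a, m in zip(row, mean)]
    coeff = sum(a * b for a, b in zip(x, v))
    return math.sqrt(sum((a - coeff * b) ** 2 for a, b in zip(x, v)))


# Hypothetical "uninteresting" items, all lying along the direction (1, 2, 3).
boring = [[t, 2 * t, 3 * t] for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
mean, centred = centre(boring)
basis = top_component(centred)

familiar = residual([0.5, 1.0, 1.5], mean, basis)  # same pattern as the boring set
novel = residual([0.5, 1.0, -4.0], mean, basis)    # breaks the pattern
print(f"familiar residual = {familiar:.3g}, novel residual = {novel:.3g}")
```

Items with a large residual are the ones the model of "boring" data cannot explain, so they are the candidates to show a human first.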
Chris Lintott: Citizen Science in the era of Big Data
Disagreement of real data with simulations: recognise the value, since simulations are an encapsulation of our knowledge
Cox et al.: no correlation between age, education or gender and participation in citizen science, or how long people continue in it
Hocking et al. arXiv:1507.01589: looks at how people classify things
Can build in ML so that the algorithm interacts with user results and retires objects once they are understood
Zorilla problem: use ML as a first-pass filter and give the hard ones to citizen scientists
When they used ML to remove null images, the number of classifications dropped: people like to have some null images
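The two ML ideas in these notes (a first-pass filter that hands hard cases to volunteers, and retiring objects once they are understood) can be sketched as two small rules. The thresholds, function names and labels below are hypothetical; the talk notes do not give the actual criteria used.

```python
from collections import Counter


def route(ml_confidence, threshold=0.9):
    """First-pass filter: accept confident machine labels, send the rest to volunteers."""
    return "machine" if ml_confidence >= threshold else "volunteers"


def is_retired(votes, min_votes=5, agreement=0.8):
    """Retire a subject once enough volunteers agree on a single label."""
    if len(votes) < min_votes:
        return False
    top_count = Counter(votes).most_common(1)[0][1]
    return top_count / len(votes) >= agreement


# Hypothetical subjects: (machine confidence, volunteer votes so far).
subjects = {
    "easy": (0.97, []),
    "hard_but_agreed": (0.55, ["spiral"] * 5),
    "hard_and_split": (0.55, ["spiral", "spiral", "spiral", "elliptical", "elliptical"]),
}

for name, (confidence, votes) in subjects.items():
    print(name, route(confidence), "retired" if is_retired(votes) else "still open")
```

Given the observation that classifications dropped when null images were removed, a real pipeline would probably let some easy or empty subjects through to volunteers rather than filtering all of them out.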
Take-home messages
Lots of techniques for outlier detection
Finding new objects in the right parameter space shouldn’t be too hard
EXCEPT we need to be able to distinguish between new objects and artefacts
Very little thought on finding phenomena (Type 2 problem)
Citizen science can be very important for discovering the unexpected if we figure out how to use it correctly
Other interesting things
Hackathon
Unconferences
Techniques bazaar
We acknowledge the Wajarri Yamaji people as the traditional owners of the ASKAP site, Western Australia
See arXiv:1611.05570