DM Plans for Crowded Stellar Fields

Slides:

Advertisements

Similar presentations

(and especially Robert Lupton and Dustin Lang)

Advertisements

Unsupervised Learning Clustering K-Means. Recall: Key Components of Intelligent Agents Representation Language: Graph, Bayes Nets, Linear functions Inference.

Inpainting Assigment – Tips and Hints Outline how to design a good test plan selection of dimensions to test along selection of values for each dimension.

Data Mining Methodology 1. Why have a Methodology  Don’t want to learn things that aren’t true May not represent any underlying reality ○ Spurious correlation.

Surveys and Questionnaires. How Many People Should I Ask? Ask a lot of people many short questions: Yes/No Likert Scale Ask a smaller number.

Human-Computer Interaction Human-Computer Interaction Segmentation Hanyang University Jong-Il Park.

Cosmos Data Products Version Peter Capak (Caltech) May, 23, 2005 Kyoto Cosmos Meeting.

GIANT TO DWARF RATIO OF RED-SEQUENCE GALAXY CLUSTERS Abhishesh N Adhikari Mentor-Jim Annis Fermilab IPM / SDSS August 8, 2007.

Model-fitting E. BertinDES Munich meeting 05/ Automated model fitting in DESDM E.Bertin (IAP)

CS 376b Introduction to Computer Vision 04 / 29 / 2008 Instructor: Michael Eckmann.

Background Estimation and Matching for Strip82 Coadds Yusra AlSayyad University of Washington DM Review September 6, 2012.

Quality Assurance Benchmark Datasets and Processing David Nidever SQuaRE Science Lead.

1 CORE DATA PROCESSING SOFTWARE PLAN REVIEW | SEATTLE, WA | SEPTEMBER 19-20, 2013 Name of Meeting Location Date - Change in Slide Master Data Release Processing.

Conversion of Stackfit to LSST software stack Status as of Feb 20, 2012.

April 2001 OPTICON workshop in Nice 1 PSF-fitting with SExtractor Emmanuel BERTIN (TERAPIX)

MOS Data Reduction Michael Balogh University of Durham.

HLA WFPC2 Source List Photometric Quality Checks Version: August 25, 2008 Brad Whitmore 1.Introduction 2.Comparison with Ground-based Stetson Photometry.

02/12/03© 2003 University of Wisconsin Last Time Intro to Monte-Carlo methods Probability.

SDSS-II Photometric Calibration:

Algorithms For Solving History Sensitive Cascade in Diffusion Networks Research Proposal Georgi Smilyanov, Maksim Tsikhanovich Advisor Dr Yu Zhang Trinity.

CSC321: Introduction to Neural Networks and Machine Learning Lecture 23: Linear Support Vector Machines Geoffrey Hinton.

In conclusion the intensity level of the CCD is linear up to the saturation limit, but there is a spilling of charges well before the saturation if.

All-sky source search Aim: Look for a fast method to find sources over the whole sky Selection criteria Algorithms Iteration Simulations Energy information.

Instructional Technique #2 Use Explicit Instruction to Convey Critical Content.

Validation of HLA Source Lists Feb. 4, 2008 Brad Whitmore 1.Overview 2.Plots 3.Summary.

Source catalog generation Aim: Build the LAT source catalog (1, 3, 5 years) Jean Ballet, CEA SaclaySLAC, 23 May 2005 Four main functions: Find unknown.

Helping you succeed in promoting your club

Algorithms and Problem Solving

Debugging Intermittent Issues

BackTracking CS255.

What bandwidth do I need for my image?

C++ coding standard suggestion… Separate reasoning from action, in every block. Hi, this talk is to suggest a rule (or guideline) to simplify C++ code.

Project Controls: As-Built S-Curves

Part III – Gathering Data

Simple Sorting Algorithms

1. Welcome to Software Construction

CPSC 221: Algorithms and Data Structures Lecture #6 Balancing Act

Jean Ballet, CEA Saclay GSFC, 31 May 2006 All-sky source search

Debugging Intermittent Issues

UNIT 3 – LESSON 5 Creating Functions.

Crowded Field Photometry Stars, Milky Way, Local Volume

Reading: Pedro Domingos: A Few Useful Things to Know about Machine Learning source: /cacm12.pdf reading.

Cosmology and Galaxies Level 3 Requirements

Announcements HW4 due today (11:59pm) HW5 out today (due 11/17 11:59pm)

Recognizing Humans: Action Recognition

Prepared by Kimberly Sayre and Jinbo Bi

OOP Paradigms There are four main aspects of Object-Orientated Programming Inheritance Polymorphism Abstraction Encapsulation We’ve seen Encapsulation.

Computer Vision - A Modern Approach

CPSC 221: Algorithms and Data Structures Lecture #6 Balancing Act

Screen Writing Brylee Huber.

INF 5860 Machine learning for image classification

Basics of Photometry.

Closure Representations in Higher-Order Programming Languages

Sprint Planning April 2018.

Announcements more panorama slots available now

Big-O, Ω, ϴ Trevor Brown CSC263 Tutorial 1 Big-O, Ω, ϴ Trevor Brown

Chapter 12 Power Analysis.

Presentation to - Management Team Javier Garza, HRM B-02

Algorithms and Problem Solving

Dtk-tools Benoit Raybaud, Research Software Manager.

Consensus Algorithms.

Announcements more panorama slots available now

Software Development Techniques

Standard Normal Table Area Under the Curve

Standard Normal Table Area Under the Curve

Information system analysis and design

Presentation transcript:

DM Plans for Crowded Stellar Fields Jim Bosch, Data Release Production Scientist

What does "crowded" mean?

sources/arcsec2 * psf effective area ~ 0.044 DECam data from Fred Moolekamp. More stars than high-latitude, but not crowded, and not a problem. Current pipeline could *probably* process this data without any trouble. LSST PST & SCC - 2017/04/18

sources/arcsec2 * psf effective area ~ 0.075 DECam data from Fred Moolekamp. Still not very crowded, and will not be a problem in the future. We might have trouble with astrometry here, but even that will be fixed soon. Regular detect/deblend/measure works fine. LSST PST & SCC - 2017/04/18

sources/arcsec2 * psf effective area ~ 0.187 DECam data from Fred Moolekamp. Crowded enough that I start to worry, and the current pipeline doesn't work at all (had to do a lot of manual intervention to get detection/backgrounds to work well enough to estimate density). LSST PST & SCC - 2017/04/18

This is, of course, a field of galaxies (from Hyper Suprime-Cam COSMOS). But it has a lot of crowding too, and I actually think any deblender that can handle this image should be able to handle at least the middle one. LSST PST & SCC - 2017/04/18

Overview Requirements and Plans: the Big Picture. Testing, Goals, and Metrics: how well are we actually doing? Algorithms: what does our pipeline do now? how will we better support crowded fields in the future? LSST PST & SCC - 2017/04/18

Requirements

Requirements All of DM's work on crowded fields is on a best effort basis. A major replan of the remaining DM construction work is close to being completed; one of open issues is the formulation of quantitative performance metrics and goals for crowded field processing. Science Collaborations can greatly help with these metrics and goals. LSST PST & SCC - 2017/04/18

Plans

Plans Our baseline design is to use essentially the same algorithms to process high-latitude fields and crowded stellar fields. It's at least plausible that this will let us process very crowded fields; the design combines concepts from existing pipelines for high-latitude images with concepts from existing crowded field codes. This is a novel approach, and hence it's impossible to guarantee it will work. With an infinite compute budget, I'm pretty sure it would work. With our finite one, it's much less clear. I'll provide more details later. LSST PST & SCC - 2017/04/18

Plans We are also evaluating processing crowded stellar fields with specialized code. There is plenty of prior art if we don't have to worry about galaxies at all. Can we really say we don't need to worry about galaxies at all? They don't get less common just because there are more stars, and LSST will be a lot deeper than previous surveys. Specialized code means processing intermediate regions multiple times, and that means bigger compute costs and more complicated databases. LSST PST & SCC - 2017/04/18

Plans We do not plan to use an existing third-party code even if we do specialized processing of crowded fields. The LSST pipelines already have many of the algorithmic components we need: PSF models, background subtraction, detection algorithms, model-fitting. Retrofitting a third-party code to use these components may be more difficult than using them to reimplement the same algorithm. We do not believe existing third-party codes can run effectively at this scale without human intervention. We do not know if existing third-party codes will be fast enough. LSST PST & SCC - 2017/04/18

Testing, Goals, and Metrics Since we can't promise anything about how well we will do, help us learn, track, and report how well we are doing.

Test Data Part of the reason DM's stellar field plans are vague is that we don't regularly run our pipelines on stellar field data. We're aware of some datasets (e.g. Schlafly et al DECam plane survey, HSC satellite galaxy searches). We have not done the work to identify and package "bite size" subsets of the data for regular testing. We need test data that spans (and samples well) the levels of crowding LSST will see. We need test data that goes as deep as LSST. We need data we can process: DECam, HSC, or CFHT. LSST PST & SCC - 2017/04/18

Goals DM will be trying to better define its goals and priorities for crowded fields over the next ~3 months, and we'd appreciate input from science collaborations. For example: What level of crowding should we focus our effort on? How do we weigh better processing of moderately crowded fields against minimal processing of extremely dense fields? How important are variability and astrometry compared to static-sky photometry? What metrics (and goals) should we have for completeness when it can't be described by a magnitude limit? LSST PST & SCC - 2017/04/18

Metrics It's easiest for us if these goals are defined as metrics we can use to test our processing. Ideally, a metric includes: A test dataset. One or more numbers that can be measured from the output catalog that relate to the quality of the processing (e.g. width of the main sequence in some color-color space). A sequence of goals for those sets of numbers. It would be very interesting to define these goals from the outputs of existing crowded field codes. There's no guarantee that we will meet any goals we define, but we will make incremental improvements and report our progress. LSST PST & SCC - 2017/04/18

Contributing, Part 1 Help with any of these steps would be great. Identify and package up a test dataset. Try running the DM stack on a test dataset. If it's more than a little crowded, it'll probably at least require some configuration-tweaks to get it running right now. If it's very crowded, it may not run at all right now, or the results may be complete garbage. But that too is good for us to know. Run an existing crowded field code on a test dataset. Try to define some specific metrics to test the quality of the processing. LSST PST & SCC - 2017/04/18

Communicating DM developers will be doing this kind of work, too. We can help (via community.lsst.org and Slack) with running the DM stack. We'd like to hear what you're planning to work on before you spend too much time on it, so we don't duplicate work. We'll try to find a way to make our testing plans public as well. We'll be identifying a point person or group to receive more structured input on metrics. LSST PST & SCC - 2017/04/18

Algorithms

The LSST Pipeline Today Very roughly, the steps of the pipeline are: We process individual CCD images to detrend, measure PSF models, fit initial WCS and photometric calibration. We do a joint fit to all of the per-CCD catalogs in a region of sky to improve the WCSs and photometric calibrations. We build coadds. We detect, deblend, and perform direct and forced measurements on the coadds. LSST PST & SCC - 2017/04/18

sources/arcsec2 * psf effective area ~ 0.044 DECam data from Fred Moolekamp. Nothing to complain about. Definitely no harder than high-latitude fields. LSST PST & SCC - 2017/04/18

sources/arcsec2 * psf effective area ~ 0.075 DECam data from Fred Moolekamp. Starting to miss some fainter sources, probably due to bad background estimation. Missing some neighbors, too. LSST PST & SCC - 2017/04/18

sources/arcsec2 * psf effective area ~ 0.187 DECam data from Fred Moolekamp. This is not a fair comparison - I had to hack the background level by hand to detect this many sources, and I had to change a lot of the default configs. But that was about an hours work, and I think we'd be able to get that automated without too much work. LSST PST & SCC - 2017/04/18

The LSST Pipeline Today Very roughly, the steps of the pipeline are: We process individual CCD images to detrend, measure PSF models, fit initial WCS and photometric calibration. We do a joint fit to all of the per-CCD catalogs in a region of sky to improve the WCSs and photometric calibrations. We build coadds. We detect, deblend, and perform direct and forced measurements on the coadds. LSST PST & SCC - 2017/04/18

The LSST Pipeline in the Future Very roughly, the steps of the pipeline are: We process individual CCD images to detrend, measure PSF models, fit initial WCS and photometric calibration. We do a joint fit to all of the per-CCD catalogs in a region of sky to improve the WCSs and photometric calibrations. We build coadds. We detect, deblend, and perform direct and forced measurements on the coadd. We simultaneously fit one model for each object to all of the CCD images it appears on. LSST PST & SCC - 2017/04/18

Coadd Processing Today Detection: find above-threshold regions and peaks within them on the coadd. Deblend: separate blended objects via a simultaneous fit with purely empirical models (which can just be the PSF). Measure: replace neighbors with noise using Deblend results, then measure each object separately. LSST PST & SCC - 2017/04/18

Coadd Processing in the Future Detection: find above-threshold regions and peaks within them on the coadd. Deblend: separate blended objects via a simultaneous fit with purely empirical models (which can be encouraged to just be the PSF). Subtract the brightest stars from coadds, and go back to Detection until we converge. Measure: replace neighbors with noise using Deblend results, then measure each object separately. Fit blended objects simultaneously with PSF and galaxy models (maybe). LSST PST & SCC - 2017/04/18

Big Algorithmic Questions Can we do PSF modeling and astrometric registration in severe crowding? We can probably use image differencing techniques to iteratively improve the PSF. We can probably select just the brighter (maybe even saturated) stars for astrometric registration. How does the deblender scale with the number of objects and pixels in the blend? We already handle galaxy clusters with 50+ objects (not well, but galaxies are much harder than PSFs). Prototype deblender in development already uses sparse matrices. LSST PST & SCC - 2017/04/18

Big Algorithmic Questions Can we do multifit in crowded fields? We probably don't want to fit galaxy models at all, but we need to decide when to turn them off. We're already considering fitting moving point-source models to all objects in small blends simultaneously; does that scale to large blends? LSST PST & SCC - 2017/04/18

Contributing, Part 2 Help us build a crowded field code from our algorithmic components. Starting with our PSF modeling and detection code, it'd be pretty easy to write Python scripts to: Detect objects. Simultaneously fit all detected peaks as point sources (with e.g. a SciPy sparse matrix solver). Iterate. The learning curve for developing in the DM stack is steep; it might be worthwhile to spend a week at Princeton or UW. A DM developer could write a first version of this in a few weeks. The hard part (and what we don't have the effort for) is really getting it working. Anything developed this way can easily be integrated into our main pipeline (as an option; the fact that it doesn't handle galaxies means we can't include it by default). LSST PST & SCC - 2017/04/18

In Conclusion There's a lot of uncertainty right now, but we will start putting some bounds on how well we'll do very soon. Predicting how well we'll handle the hardest cases may always be impossible. There are ways to contribute both priorities and algorithms right now, especially if you're willing to spend some time learning to use the DM stack. LSST PST & SCC - 2017/04/18