Classification of GAIA data


Classification of GAIA data
Coryn A.L. Bailer-Jones
Max-Planck-Institut für Astronomie, Heidelberg
calj@mpia.de

Overview
- GAIA classification objectives and available data
- Approaches to classification: principles and problems
- Example classification using RVS-like data
- Some specific issues
- Summary

GAIA classification objectives
- discrete classification of objects as star, galaxy, quasar, solar system object, supernova, etc.
- determination of astrophysical parameters (APs) for stars: Teff, logg, [Fe/H], [α/Fe], CNO, A(λ), Vrot, Vrad, activity
- combination with parallax to determine stellar luminosity, radius, (mass, age)
- identification of unresolved binaries (and parametrization of components where possible)
- efficient identification of new types of objects
Goal: catalogue of object classifications and astrophysical parameters
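As a hedged illustration of the "combination with parallax" objective, the textbook relations below show how a parallax and an estimated Teff lead to luminosity and radius. They are not taken from the slides. The symbols are assumptions introduced here: m is the apparent magnitude, ϖ the parallax in arcseconds, A the extinction in the observed band, BC the bolometric correction, and σ the Stefan-Boltzmann constant.

```latex
% Absolute magnitude from apparent magnitude, parallax and extinction
M = m + 5\log_{10}\!\left(\varpi / \mathrm{arcsec}\right) + 5 - A

% Luminosity from the bolometric magnitude M_bol = M + BC
\log_{10}\!\left(\frac{L}{L_\odot}\right) = -0.4\,\left(M + BC - M_{\mathrm{bol},\odot}\right)

% Radius from luminosity and effective temperature (Stefan-Boltzmann law)
R = \sqrt{\frac{L}{4\pi\sigma T_{\mathrm{eff}}^{4}}}
```

Mass and age are in parentheses on the slide, presumably because they additionally require stellar evolution models.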

GAIA data
- BBP: 4+ broad band filters, all objects
- MBP: 10-20 medium band filters, all objects
  → object classification; stellar Teff, logg, [Fe/H], A(λ)
- RVS: 849-874 nm spectrum, ~0.04 nm/pixel, G<17
  → stellar Vrad, Vrot, specific element abundances
- Astrometry → parallax, kinematics, unresolved binaries
- Time domain → ~50 epochs over 5 years (photometric variability)
→ Inhomogeneous data
"Redshift" problem: to get the radial velocity (RV), need the correct spectral type (SpT) template, but to determine SpT (may) need to know the wavelength shift → use MBP data to give SpT and iterate
Generally: use MBP data to give initial classification of RVS data
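The radial-velocity step of the "redshift" problem can be sketched as a cross-correlation of an observed spectrum against a template. This is a minimal toy sketch and not the GAIA pipeline. It assumes the template has already been chosen (for example from the MBP classification), uses a single synthetic Gaussian absorption line, and takes a reference wavelength of 861.5 nm (the mid-point of 849-874 nm). All function names are hypothetical.

```python
import math

C_KM_S = 299_792.458    # speed of light in km/s
DISPERSION_NM = 0.04    # nm per pixel, the approximate RVS value quoted above
LAMBDA0_NM = 861.5      # assumed reference wavelength (mid-point of 849-874 nm)

def line_profile(n_pix, centre, depth=0.5, width=3.0):
    """Continuum-normalised toy spectrum with one Gaussian absorption line."""
    return [1.0 - depth * math.exp(-0.5 * ((i - centre) / width) ** 2)
            for i in range(n_pix)]

def best_lag(observed, template, max_lag):
    """Integer pixel lag that maximises the cross-correlation of line depths."""
    def cross_correlation(lag):
        total = 0.0
        for i, t in enumerate(template):
            j = i + lag
            if 0 <= j < len(observed):
                total += (1.0 - t) * (1.0 - observed[j])
        return total
    return max(range(-max_lag, max_lag + 1), key=cross_correlation)

template = line_profile(200, centre=100)
observed = line_profile(200, centre=104)    # same line, shifted by 4 pixels
lag = best_lag(observed, template, max_lag=20)

# Non-relativistic Doppler formula: v = c * (delta lambda) / lambda
v_rad = C_KM_S * lag * DISPERSION_NM / LAMBDA0_NM
print(f"shift = {lag} pixels, v_rad = {v_rad:.1f} km/s")
```

In the iteration described on the slide, the shift found this way would be removed from the spectrum, the spectral type re-estimated, and the cross-correlation repeated with the updated template.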

Classification principles
"Supervised" approach:
- use pre-classified data (templates) to infer the desired mapping
- apply mapping to any new data to give APs or classes
But the desired mapping is generally degenerate...

Minimum Distance Methods (MDMs)
- Search for nearest neighbours (templates) in data space
- Assign parameters according to these
- Generally interpolate: either in data space, APs = f(d; w), or in parameter space, D = g(APs; w)
- Need to scale data dimensions
- e.g. k-nn, χ² minimization, cross-correlation
- a local classification method
Figure legend: APs = astrophysical parameter(s); d1, d2 = data; D = distance to a template
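A minimal sketch of one minimum distance method, k-nn, including the scaling of data dimensions mentioned above. The template values are invented toy numbers and the function names are hypothetical. Averaging the APs of the k nearest templates is only one simple way of assigning parameters "according to these".

```python
import math

def column_stats(rows):
    """Mean and standard deviation of each data dimension."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    # Fall back to 1.0 for a constant dimension to avoid division by zero.
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(dims)]
    return means, stds

def knn_estimate(template_data, template_aps, query, k=3):
    """Average the APs of the k templates nearest to `query` in scaled data space."""
    means, stds = column_stats(template_data)

    def scaled(vector):
        return [(x - m) / s for x, m, s in zip(vector, means, stds)]

    q = scaled(query)
    by_distance = sorted(
        (math.dist(scaled(d), q), ap)
        for d, ap in zip(template_data, template_aps)
    )
    nearest = by_distance[:k]
    return sum(ap for _, ap in nearest) / len(nearest)

# Toy templates: two data dimensions (d1, d2) on very different scales,
# each with one known astrophysical parameter (here a Teff-like value).
template_data = [(1000.0, 0.1), (2000.0, 0.2), (3000.0, 0.3),
                 (4000.0, 0.4), (5000.0, 0.5)]
template_aps = [4000.0, 5000.0, 6000.0, 7000.0, 8000.0]

estimate = knn_estimate(template_data, template_aps, (3000.0, 0.3), k=3)
print(estimate)
```

Without the scaling step, d1 would dominate the Euclidean distance simply because its numerical range is larger.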

Classification principles
- selecting just local neighbours in data space can lead to systematic errors or missed solutions
- need to find global (forward) mapping and identify degenerate regions
- more complex in higher dimensional spaces (data or parameters)
- severity of degeneracy depends upon the density of the template grid and the noise in the data
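The degeneracy problem can be illustrated with a toy forward mapping in which two different parameter values predict the same datum. The quadratic model below is purely an assumption for illustration. Scanning the whole parameter grid recovers both solutions, whereas a purely local search started near one of them would miss the other.

```python
def forward(ap):
    """Toy forward model: datum predicted for a given astrophysical parameter."""
    return (ap - 2.0) ** 2

def consistent_aps(datum, grid, tol):
    """All grid values whose predicted datum lies within `tol` of the observed one."""
    return [ap for ap in grid if abs(forward(ap) - datum) <= tol]

grid = [i * 0.1 for i in range(41)]    # AP grid from 0.0 to 4.0
solutions = consistent_aps(1.0, grid, tol=0.05)
print(solutions)    # two well-separated parameter values
```

Here `tol` stands in for the noise in the data: a larger tolerance or a coarser grid changes how many parameter values count as consistent, in line with the last point on the slide.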

Artificial Neural Networks (ANNs)
- As with MDM, degeneracy is a problem
- Functional mapping: astrophysical parameters = f(data; weights)
- Weights determined by training on pre-classified data (templates)
  → least squares minimization of total classification error (numerical methods)
  → global interpolation of data
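The least squares training described above can be written out as follows. The notation is an assumption introduced here: d_n is the data vector of template n, t_{nk} its k-th known astrophysical parameter, f_k the corresponding network output, w the weights and η a step size. Gradient descent is shown as one example of a numerical method. The slides do not say which optimizer was used.

```latex
% Sum-of-squares error over N templates and K astrophysical parameters
E(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\sum_{k=1}^{K}
    \left[ f_k(\mathbf{d}_n;\mathbf{w}) - t_{nk} \right]^{2}

% One gradient-descent update of the weights
\mathbf{w} \leftarrow \mathbf{w} - \eta\,\nabla_{\mathbf{w}} E(\mathbf{w})
```

Because a single set of weights is fitted to all templates at once, the resulting mapping interpolates globally, in contrast to the local interpolation of the minimum distance methods.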

Classification example with high-res spectra
- Database of 611 real stellar spectra from Cenarro et al. (2001)
- variation over Teff, logg, [Fe/H]
- coverage: 849-874 nm (same as GAIA RVS)
- resolution: 0.15 nm @ 0.075 nm/pixel (poorer than GAIA?)
- SNR: median = 70; 90% in range 20-140
Randomly split data set into two sets: train a neural network on one set and test its performance on the other.

[Figure: distribution over APs in Cenarro et al. data; blue = training data (300), red = test data (311)]
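The random split of the 611 spectra into 300 training and 311 test spectra can be sketched as below. The seed and the function name are assumptions made for reproducibility of the example. The actual split used for the slides is not known.

```python
import random

def split_indices(n_total, n_train, seed=0):
    """Randomly partition range(n_total) into training and test index lists."""
    rng = random.Random(seed)    # fixed seed so the split is reproducible
    indices = list(range(n_total))
    rng.shuffle(indices)
    return sorted(indices[:n_train]), sorted(indices[n_train:])

train_idx, test_idx = split_indices(611, 300)
print(len(train_idx), len(test_idx))
```

Keeping the two sets disjoint is what makes the test-set errors an honest estimate of performance on spectra the network has not seen.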

Results: Teff and logg

Results: [Fe/H]

Requirements of the classification scheme
- produce both discrete classification and continuous parametrization (e.g. star vs. quasar, APs of stars)
- recognition of degeneracies in presence of noise (i.e. recognise multiple classifications for given data vector)
- robustly handle missing and censored data
- possible RVS lossy compression (as function of magnitude) → handle different amounts/formats of data
- reliable determination of parametrization uncertainties
- accommodate ever-improving stellar models
- all this for a very wide range of types of objects ...

Classification schemes
[Figure: hierarchical and parallel classification schemes; P = probability, APs = astrophysical parameters]

Model training
Real spectra and synthetic spectra are not identical:
- systematic differences (modelling uncertainties, e.g. opacities)
- increased cosmic scatter in real spectra (unaccounted-for APs)
1. Can synthetic spectra be used to reliably parametrize GAIA data?
2. Are performances representative of what can be achieved?
3. Do synthetic spectra give the best optimization of phot/spec systems?
2+3 require accurate synthetic spectra (or a large set of real spectra)
Can overcome mismatch problem for (1):
- use real GAIA data of pre-selected targets to apply corrections to synthetic SEDs
- APs of these targets determined from higher-resolution ground-based spectra

Summary
- classification with GAIA data is a challenging problem
- methods used so far in the (astronomical) classification literature are suboptimal for this purpose → further development of methods is a high priority
- particular problems to overcome are:
  - degeneracy (especially with MBP data and compressed RVS data)
  - inhomogeneous data
- development of classification methods is very dependent on appropriate data (real or synthetic)
  - both of targets of interest
  - and of "contaminating" objects

ICAP: the GAIA classification working group
- WG responsible for addressing classification issues for GAIA
- 14 core members; 17 associate members
GAIA Classification meeting, 2-3 December, Heidelberg, Germany
Anyone interested in classification issues broadly related to GAIA is welcome to attend
http://www.mpia.de/GAIA/