An Automated Classification Algorithm for Multi-wavelength Data
Yanxia Zhang, Ali Luo, Yongheng Zhao
National Astronomical Observatories, China, Lijiang
Astronomy facing “data avalanche”
[Figure panels: IRAS 25 μm, 2MASS 2 μm, DSS optical, IRAS 100 μm, WENSS 92 cm, NVSS 20 cm, GB 6 cm, ROSAT ~keV]
Necessity
Necessity Is the Mother of Invention
[Diagram labels: Virtual Observatories, Data avalanche, DM & KDD]
Data Mining & KDD
DM: core of KDD
[Diagram labels: Data cleaning, Data federation, Database, Data warehouse, Task selection, DM, Pattern evaluation]
One Task of DM: Classification
[Diagram: The scheme of classification of multiwavelength data. Labels: Cross identification, All features, Feature selection, Selected features, Training set, Test set, Validation set, Classification method, Classifier, New data, Predict]
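The scheme on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `split_sample` and `select_features`, the 60/20/20 split, and the toy sources are hypothetical and are not taken from the talk.

```python
import random

def split_sample(rows, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle labelled rows and split them into training, test and validation sets."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_train = round(fractions[0] * len(shuffled))
    n_test = round(fractions[1] * len(shuffled))
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation

def select_features(features, selected):
    """Keep only the selected feature values of one cross-identified source."""
    return [features[name] for name in selected]

# Toy cross-identified sources (made-up values): (all features, class label)
sample = [
    ({"B-R": 0.1 * i, "J-H": 0.05 * i, "HR1": 0.0}, "star" if i % 2 else "AGN")
    for i in range(10)
]

selected = ["B-R", "J-H"]  # hypothetical outcome of feature selection
rows = [(select_features(f, selected), label) for f, label in sample]
train, test, validation = split_sample(rows)
```

The training set would be used to build the classifier, the test and validation sets to check it, and the finished classifier would then predict classes for new data.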
Data sample
Band            Catalogue    Parameters
Near infrared   2MASS        J, H, K
Optical         USNO A2.0    B, R
X-ray           ROSAT        CR, HR1, HR2, ext, extl
Known sample
Class             Source catalogue
Stars             SIMBAD
Normal galaxies   RC3
AGNs              Veron (2000)
Feature Selection
Parameters: B+2.5lgCR, J+2.5lgCR, B-R, J-H, H-K, lgCR, HR1, HR2, ext, extl
Method: ReliefF
Result of feature selection (in the order listed on the slide): B+2.5lgCR, J+2.5lgCR, B-R, HR2, H-K, ext, J-H, lgCR, HR1, extl
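As an illustration, the ten parameters can be derived from the catalogue quantities, and features can be scored with a Relief-style weight. Note the assumptions: `derive_features` and `relief_weights` are hypothetical names, "lg" is read as the base-10 logarithm, and the scoring below is the basic Relief idea (one nearest hit and one nearest miss per source), a simplified relative of ReliefF rather than the full algorithm used in the talk. All numbers are toy values.

```python
import math

def derive_features(src):
    """Build the ten slide parameters from one cross-identified source.

    `src` is a hypothetical dict with magnitudes J, H, K, B, R and the
    ROSAT quantities CR, HR1, HR2, ext, extl. CR must be positive.
    """
    lg_cr = math.log10(src["CR"])
    return {
        "B+2.5lgCR": src["B"] + 2.5 * lg_cr,
        "J+2.5lgCR": src["J"] + 2.5 * lg_cr,
        "B-R": src["B"] - src["R"],
        "J-H": src["J"] - src["H"],
        "H-K": src["H"] - src["K"],
        "lgCR": lg_cr,
        "HR1": src["HR1"],
        "HR2": src["HR2"],
        "ext": src["ext"],
        "extl": src["extl"],
    }

def relief_weights(X, y):
    """Simplified Relief: one nearest hit and one nearest miss per instance.

    X is a list of equal-length feature lists, y the class labels.
    Returns one weight per feature; larger means more discriminating.
    """
    n, d = len(X), len(X[0])
    # Normalise each feature difference by the range of that feature.
    spans = []
    for j in range(d):
        column = [row[j] for row in X]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [X[k] for k in range(n) if k != i and y[k] == y[i]]
        misses = [X[k] for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue  # no same-class or other-class neighbour available
        hit = min(hits, key=lambda row: dist(X[i], row))
        miss = min(misses, key=lambda row: dist(X[i], row))
        for j in range(d):
            # Reward features that differ across classes, penalise
            # features that differ within a class.
            weights[j] += (diff(X[i], miss, j) - diff(X[i], hit, j)) / n
    return weights

# Toy source (made-up values)
features = derive_features({
    "J": 13.0, "H": 12.5, "K": 12.3, "B": 15.0, "R": 14.0,
    "CR": 0.1, "HR1": 0.2, "HR2": 0.1, "ext": 0.0, "extl": 0.0,
})

# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [1.0, 0.2], [0.9, 0.8], [0.8, 0.4]]
y = ["star", "star", "star", "AGN", "AGN", "AGN"]
weights = relief_weights(X, y)
```

Sorting the features by weight gives a ranking; keeping only the top-ranked ones gives a feature subset, while keeping all of them with their weights gives a weighted feature set.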
Classification
Method: Naïve Bayes classifier
Classification results for three situations:
Features used             Accuracy
Full set of features      97.0%
Weighted features         97.6%
Subset of features        97.9%
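A minimal Gaussian Naïve Bayes sketch shows how the three situations could be set up. The slides do not say how continuous features were modelled or how the weights were applied, so two assumptions are made here: each feature is modelled as a Gaussian per class, and a feature weight multiplies that feature's log-likelihood. The class name `GaussianNaiveBayes` and the toy data are hypothetical, and the accuracies in the table are not reproduced by this example.

```python
import math
from collections import defaultdict

class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per feature and class (an assumption)."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.stats = {}
        for label, rows in groups.items():
            d = len(rows[0])
            means = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
            # Small constant keeps the variance strictly positive.
            variances = [
                sum((r[j] - means[j]) ** 2 for r in rows) / len(rows) + 1e-9
                for j in range(d)
            ]
            log_prior = math.log(len(rows) / len(X))
            self.stats[label] = (log_prior, means, variances)
        return self

    def predict(self, row, weights=None):
        """Return the most probable class; `weights` scales each feature's
        log-likelihood (assumed weighting scheme, 1.0 if omitted)."""
        best_label, best_score = None, -math.inf
        for label, (log_prior, means, variances) in self.stats.items():
            score = log_prior
            for j, x in enumerate(row):
                w = 1.0 if weights is None else weights[j]
                log_like = (-0.5 * math.log(2 * math.pi * variances[j])
                            - (x - means[j]) ** 2 / (2 * variances[j]))
                score += w * log_like
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy training data (made-up values), two features per source
X = [[0.0, 1.0], [0.2, 1.2], [0.1, 0.9], [2.0, 3.0], [2.2, 3.1], [1.9, 2.8]]
y = ["star", "star", "star", "AGN", "AGN", "AGN"]
model = GaussianNaiveBayes().fit(X, y)

full = model.predict([0.1, 1.0])                          # full set of features
weighted = model.predict([2.1, 3.0], weights=[0.8, 0.2])  # weighted features
subset = model.predict([2.1, 3.0], weights=[1.0, 0.0])    # subset: feature 1 dropped
```

Setting a weight to zero removes that feature from the decision, so the subset case can be expressed as a special case of the weighted one.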
Summary
1. Feature selection lets us handle high-dimensional data and pick out the important attributes, which improves both the efficiency and the effectiveness of classification.
2. The Naïve Bayes algorithm is a robust method for classifying multiwavelength data with high accuracy. It can be applied not only to multiwavelength data but also to other data, such as photometric data, spectral data, image data, or combinations of these types.
3. The classifier helps to preselect source candidates for large surveys and to classify new data.
4. The methods will be part of VO toolkits.
Thanks a lot!