An Automated Classification Algorithm for Multi-wavelength Data. Yanxia Zhang, Ali Luo, Yongheng Zhao. National Astronomical Observatories, China. 2005.8.16


1 An Automated Classification Algorithm for Multi-wavelength Data. Yanxia Zhang, Ali Luo, Yongheng Zhao. National Astronomical Observatories, China. 2005.8.16, Lijiang

2 Astronomy facing “data avalanche”: IRAS 25 μm, 2MASS 2 μm, DSS optical, IRAS 100 μm, WENSS 92 cm, NVSS 20 cm, GB 6 cm, ROSAT ~keV

3 Necessity is the mother of invention: data avalanche, Virtual Observatories, DM & KDD

4 Data Mining & KDD: DM is the core of KDD. Data cleaning, data federation, database, data warehouse, task selection, DM, pattern evaluation

5 One Task of DM: Classification
The scheme of classification of multiwavelength data (diagram labels): cross identification; all features; feature selection; selected features; training set, test set, validation set; classification method; classifier; new data; predict
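The scheme on this slide separates the labeled sample into training, test, and validation sets before a classifier is built. A minimal sketch of that split follows. The function name split_sample, the split fractions, and the fixed seed are assumptions for illustration; the slides do not say how the sample was divided.

```python
import random


def split_sample(rows, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle rows and split them into training, validation, and test sets.

    The fractions are hypothetical defaults, not values from the slides.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rows = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * fractions[0])
    n_valid = int(len(rows) * fractions[1])
    train = rows[:n_train]
    valid = rows[n_train:n_train + n_valid]
    test = rows[n_train + n_valid:]  # whatever remains goes to the test set
    return train, valid, test
```

The classifier would be fitted on the training set, tuned on the validation set, and scored once on the test set.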

6 Data sample
Near infrared: 2MASS (J, H, K)
Optical: USNO A2.0 (B, R)
X-ray: ROSAT (CR, HR1, HR2, ext, extl)
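The catalog quantities on this slide are combined into the ten parameters listed on the Feature Selection slide (colors such as B-R and indices such as B+2.5lgCR). A minimal sketch of that step is given here. It assumes that "lg" means the base-10 logarithm and that each cross-identified source is a dictionary keyed by the catalog column names; the function name derive_features is hypothetical.

```python
import math


def derive_features(src):
    """Build the ten classification parameters from one cross-matched source.

    src is assumed to hold B, R (USNO A2.0), J, H, K (2MASS) and
    CR, HR1, HR2, ext, extl (ROSAT). CR must be positive.
    """
    lg_cr = math.log10(src["CR"])  # assumption: "lg" is log base 10
    return {
        "B+2.5lgCR": src["B"] + 2.5 * lg_cr,
        "J+2.5lgCR": src["J"] + 2.5 * lg_cr,
        "B-R": src["B"] - src["R"],
        "J-H": src["J"] - src["H"],
        "H-K": src["H"] - src["K"],
        "lgCR": lg_cr,
        "HR1": src["HR1"],
        "HR2": src["HR2"],
        "ext": src["ext"],
        "extl": src["extl"],
    }
```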

7 Known sample
Stars: SIMBAD
Normal galaxies: RC3
AGNs: Veron (2000)

8 Feature Selection
Parameters: B+2.5lgCR, J+2.5lgCR, B-R, J-H, H-K, lgCR, HR1, HR2, ext, extl
Method: ReliefF
Result of feature selection:
B+2.5lgCR  0.04207
J+2.5lgCR  0.03838
B-R        0.03230
HR2        0.01332
H-K        0.01011
ext        0.00716
J-H        0.00317
lgCR       0.00213
HR1        0.00185
extl       0.00096
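To illustrate how a Relief-style method arrives at per-feature weights like those on this slide, here is a minimal sketch of the basic Relief idea. It is not the full ReliefF algorithm used in the talk: it takes a single nearest hit and a single nearest miss per instance, and it visits every instance once instead of sampling. The function name relief_weights is hypothetical.

```python
def relief_weights(X, y):
    """Simplified Relief: a feature gains weight when it differs on the
    nearest instance of another class (miss) and loses weight when it
    differs on the nearest instance of the same class (hit).

    X is a list of numeric feature lists, y the matching class labels.
    """
    n, d = len(X), len(X[0])
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    # Scale each feature by its range so differences are comparable.
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue  # cannot score an instance without both neighbours
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            w[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n
    return w
```

Features can then be ranked by weight, and either the top-ranked subset or the weights themselves passed on to the classifier.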

9 Classification
Method: Naïve Bayes classifier
Classification results for three situations:
With the full set of features: 97.0%
With weighted features: 97.6%
With the subset of features: 97.9%
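For readers unfamiliar with the method, a minimal Naïve Bayes sketch follows. It assumes Gaussian class-conditional densities for each feature, which the slides do not specify. The optional weights argument, which scales each feature's log-likelihood term, is one possible way to express the "weighted features" and "subset of features" cases (a weight of 0 drops a feature). It is an assumption, not the authors' stated scheme, and the class name is hypothetical.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per class and feature (an assumption)."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        n = len(X)
        self.stats = {}
        for label, rows in groups.items():
            d = len(rows[0])
            means = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
            # A small floor on the variance avoids division by zero.
            varis = [
                max(sum((r[j] - means[j]) ** 2 for r in rows) / len(rows), 1e-9)
                for j in range(d)
            ]
            self.stats[label] = (math.log(len(rows) / n), means, varis)
        return self

    def predict_one(self, row, weights=None):
        """Return the label with the highest (weighted) log posterior."""
        weights = weights or [1.0] * len(row)
        best_label, best_score = None, -math.inf
        for label, (log_prior, means, varis) in self.stats.items():
            score = log_prior
            for x, mu, var, w in zip(row, means, varis, weights):
                log_like = -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
                score += w * log_like
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

With all weights equal to 1 this is the full-feature case; passing 0/1 weights restricts the classifier to a selected subset.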

10 Summary
1. Feature selection lets us deal with high-dimensional data and select the important attributes, improving the efficiency and effectiveness of classification.
2. The Naïve Bayes algorithm is a robust method for classifying multiwavelength data with high accuracy. It can be applied not only to multiwavelength data but also to other data, such as photometric data, spectral data, image data, or combinations of these types.
3. The classifier helps preselect source candidates for large surveys and classify new data.
4. The methods will be part of the VO toolkits.

11 Thanks a lot !!!

