Astrostatistics Antonios Karampelas, PhD


1 Astrostatistics Antonios Karampelas, PhD
Observation and Data Processing Techniques in Astrophysics. Section of Astrophysics, Astronomy & Mechanics. Astrostatistics. Antonios Karampelas, PhD

2 Glossary
Astroinformatics: The combination of Astronomy and Information/Communications technologies.
Astrostatistics: A discipline that applies statistical methods to process the vast amount of astronomical data.
Big Data: Large or complex data sets that are difficult to process using traditional data processing applications.
Data Mining: The computational process of discovering patterns in large data sets.
Machine Learning: A scientific discipline that explores the construction and study of algorithms that can learn from data.

3 Big Data: An alternative for Astrophysicists?
McKinsey Global Institute Report [1]: Big data: The next frontier for innovation, competition, and productivity
Harvard Business Review [2]: Data Scientist: The Sexiest Job of the 21st Century
Fortune [3]: Big Data could generate millions of new jobs
[1] http://
[2] https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
[3] http://fortune.com/2013/05/21/big-data-could-generate-millions-of-new-jobs/

4 Data-intensive Astronomy
LSST (expected): 100 Petabytes [1]
SDSS DR Terabytes [2]
Gaia (expected): 100 Terabytes [3]
[1] http://
[2] http://
[3] http://blogs.esa.int/gaia/2013/10/31/some-gaia-numbers/

5 From Data to Meaning (From Sensors to Sense)
Kirk et al. 2009, astro2010, P, 6B
Figure layers: Data Layer, Informatics Layer, Data Mining Layer. [Note: KDD = Knowledge Discovery in Databases]

6 Astrostatistics
Artwork by Sandbox Studio, Chicago with Kimberly Boustead

7 Astrostatistics and Astroinformatics Portal (ASAIP)
Used for public outreach by:
IAA: International Astrostatistics Association
AAS/WGAA: American Astronomical Society / Working Group in Astroinformatics and Astrostatistics
IAU/WGAA: International Astronomical Union / Working Group in Astrostatistics and Astroinformatics
LSST/ISSC: Large Synoptic Survey Telescope / Information and Statistical Sciences
ASA/IGA: American Statistical Association / Interest Group in Astrostatistics

8 Indicative Data Mining/Machine Learning methods
Principal Components Analysis (PCA)
Support Vector Machine (SVM)
Artificial Neural Network (ANN)
Minimum Spanning Trees (MST)
Self-Organizing Map (SOM)
K-means Clustering
Decision Trees and Random Forests
Wavelets
Bayesian Statistics

9 Software

10 Principal Components Analysis (PCA)
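A linear orthogonal transformation to a new basis in which the data variance is highlighted: the first axis points along the direction of largest variance, and each subsequent axis captures the largest remaining variance. New axes = Principal Components (PCs). Data = linear combination of PCs. Very effective for data compression, dimensionality reduction, and noise filtering.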
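As a rough illustration of the transformation described on this slide, the sketch below finds principal components by power iteration on the covariance matrix of mean-subtracted data. The function name `pca`, the toy data, and the choice of power iteration with deflation are assumptions made for this example; the slides do not say how the PCs are computed.

```python
import math

def pca(rows, n_components, iters=500):
    """Return (mean, components, variances) for a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the mean-subtracted data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components, variances = [], []
    for _ in range(n_components):
        # Arbitrary uneven start vector (an assumption of this sketch); it only
        # needs a non-zero overlap with the leading eigenvector.
        v = [1.0 + 0.1 * j for j in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _step in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: the variance of the data along direction v.
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        components.append(v)
        variances.append(lam)
        # Deflation: remove the found direction so the next pass finds the next PC.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return mean, components, variances

# Toy data scattered around the line y = 2x (hypothetical values).
rows = [(-2.0, -4.1), (-1.0, -1.9), (0.0, 0.1), (1.0, 2.0), (2.0, 3.9)]
mean, components, variances = pca(rows, n_components=2)
print("PC1:", components[0], "variance:", variances[0])
print("PC2:", components[1], "variance:", variances[1])
```

For this toy data nearly all of the variance falls on PC1, which points roughly along the direction (1, 2); PC2 holds only the small residual scatter.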

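11 Principal Components Analysis (PCA)
Full reconstruction: $\mathrm{Data} = \alpha_1 \mathrm{PC}_1 + \alpha_2 \mathrm{PC}_2 + \alpha_3 \mathrm{PC}_3 + \dots + \alpha_{k-1} \mathrm{PC}_{k-1} + \alpha_k \mathrm{PC}_k$
The leading terms carry the widespread information; the trailing terms carry noise (no widespread information).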
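The reconstruction formula can be tried out with a small sketch. It assumes an orthonormal set of PCs and data that are already mean-subtracted, so each coefficient is simply the dot product of the data vector with a PC. The basis, the data vector, and the helper names `project` and `reconstruct` are hypothetical.

```python
import math

def project(vector, components):
    """Coefficients alpha_i = PC_i . vector (valid for orthonormal PCs)."""
    return [sum(v * c for v, c in zip(vector, pc)) for pc in components]

def reconstruct(alphas, components, n_keep=None):
    """Sum alpha_i * PC_i over the first n_keep components (all if None)."""
    if n_keep is None:
        n_keep = len(components)
    out = [0.0] * len(components[0])
    for alpha, pc in list(zip(alphas, components))[:n_keep]:
        for j, value in enumerate(pc):
            out[j] += alpha * value
    return out

s = 1.0 / math.sqrt(2.0)
# Hypothetical orthonormal PCs, ordered from most to least variance.
components = [[s, s, 0.0], [s, -s, 0.0], [0.0, 0.0, 1.0]]
data = [3.0, 2.9, 0.05]  # assumed to be mean-subtracted already

alphas = project(data, components)
full = reconstruct(alphas, components)                # all k terms
compressed = reconstruct(alphas, components, n_keep=1)  # leading term only

print("coefficients:", alphas)
print("full reconstruction:", full)
print("one-component reconstruction:", compressed)
```

Keeping every term returns the original vector, while keeping only the first term stores one number instead of three and discards the small trailing coefficients.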

12 Principal Components Analysis (PCA)
Yip et al. 2004, AJ: SDSS galaxy spectra.
Figure labels: PC1, PC2, PC3, PC4; Outliers, Red galaxies, Blue galaxies, Post-starburst galaxies.

13 Support Vector Machines (SVM)
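Supervised learning models for classification and regression. Training data are used to define an optimal hyperplane that separates the members of the two classes with the maximum margin; the training points closest to the hyperplane are the support vectors. New data are then mapped into that same space and classified according to the side of the hyperplane on which they fall.
Figure labels: Optimal Hyperplane, Maximum Margin, Support Vectors.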
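A minimal sketch of the idea, assuming a linear SVM trained by stochastic sub-gradient descent on the regularised hinge loss. This is one simple way to approximate the maximum-margin hyperplane, not necessarily the solver used in the cited work. The function names, toy points, and hyperparameters `lam`, `lr`, and `epochs` are hypothetical.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.01, epochs=1000):
    """Fit w, b so that sign(w . x + b) separates labels +1 and -1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The regularisation term shrinks w, which widens the margin.
            w = [wi - lr * lam * wi for wi in w]
            # Only points inside the margin (or misclassified) push the plane.
            if margin < 1.0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Classify a new point by the side of the hyperplane it falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0.0 else -1

# Two linearly separable toy classes (hypothetical training data).
points = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
print("w =", w, "b =", b)
print("new point (3.0, 2.5):", predict(w, b, (3.0, 2.5)))
print("new point (0.5, 0.2):", predict(w, b, (0.5, 0.2)))
```

In this toy set the points that end up on or near the margin, (2, 2), (1, 0) and (0, 1), play the role of the support vectors.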

14 Support Vector Machines (SVM)
Usually, classes are not linearly separable. Non-linear classification: data are mapped into a higher-dimensional space, using a kernel, where a separating hyperplane can be found.
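To make the kernel idea concrete, the sketch below uses the degree-2 polynomial kernel k(x, z) = (x · z)^2, which is equal to an ordinary dot product after the mapping φ(x) = (x1², √2·x1·x2, x2²). Two concentric rings cannot be separated by a line in the plane, but in the mapped space the plane x1² + x2² = 4 separates them. The ring radii and all names are assumptions chosen for the example.

```python
import math

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel evaluated in the original 2-D space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def poly2_feature_map(x):
    """Explicit 3-D mapping whose dot product reproduces poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def ring(radius, n=8):
    """n points evenly spaced on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

inner, outer = ring(1.0), ring(3.0)  # hypothetical classes: radius 1 and 3

# Hyperplane in the mapped space: w . phi(x) + b = x1^2 + x2^2 - 4.
w, b = (1.0, 0.0, 1.0), -4.0

def decision(x):
    return dot(w, poly2_feature_map(x)) + b

x, z = (1.0, 2.0), (3.0, -1.0)
print("kernel:", poly2_kernel(x, z))
print("mapped dot product:", dot(poly2_feature_map(x), poly2_feature_map(z)))
print("inner ring on negative side:", all(decision(p) < 0 for p in inner))
print("outer ring on positive side:", all(decision(p) > 0 for p in outer))
```

The kernel gives the same number as the mapped dot product without ever building the higher-dimensional vectors, which is why kernels keep the computation cheap.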

15 Support Vector Machines (SVM)
Tsalmantza et al. 2009, A&A: PEGASE synthetic galaxy spectra.
Spectral class prediction (classification); redshift prediction (regression).

16 Artificial Neural Networks (ANN)
Learning algorithms inspired by biological neural networks (in particular the brain). ANNs are generally presented as systems of interconnected "neurons", arranged in an input layer, one or more hidden layers, and an output layer, and linked by adaptive weights. The weights are fitted to training data. The trained network is then used to classify or parameterize new data.
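The layer structure and the adaptive weights can be sketched with a very small network trained by gradient descent (backpropagation). The class name `TinyNetwork`, the sigmoid activation, the logical-OR toy task, and the learning rate are all assumptions made for illustration; the slides do not specify an architecture or a training rule.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyNetwork:
    """Input layer -> one hidden layer -> single sigmoid output."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        # Adaptive weights start at random values and are tuned by training.
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                         for _ in range(n_hidden)]
        self.b_hidden = [0.0] * n_hidden
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b_out = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
                  for ws, b in zip(self.w_hidden, self.b_hidden)]
        output = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)) + self.b_out)
        return hidden, output

    def train_step(self, x, target, lr):
        """One gradient step on the squared error 0.5 * (output - target)^2."""
        hidden, output = self.forward(x)
        delta_out = (output - target) * output * (1.0 - output)
        for j, h in enumerate(hidden):
            # Error signal for hidden neuron j, using the weight before its update.
            delta_h = delta_out * self.w_out[j] * h * (1.0 - h)
            self.w_out[j] -= lr * delta_out * h
            self.b_hidden[j] -= lr * delta_h
            for i, xi in enumerate(x):
                self.w_hidden[j][i] -= lr * delta_h * xi
        self.b_out -= lr * delta_out

def mean_squared_error(net, samples):
    return sum((net.forward(x)[1] - t) ** 2 for x, t in samples) / len(samples)

# Toy training set: logical OR (hypothetical stand-in for real labelled data).
samples = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 1.0)]
net = TinyNetwork(n_inputs=2, n_hidden=3)
error_before = mean_squared_error(net, samples)
for _ in range(2000):
    for x, t in samples:
        net.train_step(x, t, lr=0.5)
error_after = mean_squared_error(net, samples)
print("error before training:", error_before)
print("error after training:", error_after)
```

Once trained, `net.forward(x)[1]` gives the network output for a new input, which corresponds to the "classify or parameterize new data" step.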

17 Artificial Neural Networks (ANN)
Snider et al. 2001, ApJ: Galactic F- and G-type stars.
Figure labels: Testing set, Training set, 1-1 correspondence.
Highly accurate Teff prediction: σ(Teff) = K

18 Voronoi Tessellation
It is the partitioning of a plane with n points into n convex polygons. Each polygon contains exactly one generating point, and every position in a given polygon is closer to its generating point than to any other. The higher the density of points, the smaller the areas of the polygons. It can be used to analyze structure.
Relevant methods: Minimum Spanning Tree (MST), k-means clustering
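The defining property (each position belongs to its nearest generating point) is enough to approximate the cell areas on a grid, without constructing the polygons themselves. The sketch below does that by brute force. The generator coordinates, the bounding box, and the function names are hypothetical, and cells touching the edge of the box are clipped by it.

```python
import math

def nearest_generator(point, generators):
    """Index of the generating point whose Voronoi cell contains `point`."""
    return min(range(len(generators)),
               key=lambda i: math.dist(point, generators[i]))

def approximate_cell_areas(generators, box_size, grid=100):
    """Estimate each cell's area inside [0, box_size]^2 by counting grid cells."""
    step = box_size / grid
    counts = [0] * len(generators)
    for ix in range(grid):
        for iy in range(grid):
            centre = ((ix + 0.5) * step, (iy + 0.5) * step)
            counts[nearest_generator(centre, generators)] += 1
    return [c * step * step for c in counts]

# Three crowded points and one isolated point (hypothetical positions).
generators = [(1.0, 1.0), (1.5, 1.0), (1.0, 1.5), (4.0, 4.0)]
areas = approximate_cell_areas(generators, box_size=5.0)
for g, a in zip(generators, areas):
    print(g, "approximate cell area:", round(a, 2))
```

The isolated point ends up with the largest cell, in line with the statement that higher density means smaller polygons.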

19 Minimum Spanning Tree (MST)
It is the set of straight lines (edges) connecting a given set of points without closed loops, such that the sum of the edge lengths is a minimum; it is unique when all edge lengths are distinct. It can be used to analyze structure.
MST separation: All the edges of the MST whose lengths exceed a certain limit are removed.
Friends-of-friends: Subtrees with edges smaller than a certain limit are built.
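Both steps, building the tree and then cutting the long edges, can be sketched briefly. The example uses Prim's algorithm for the MST and a union-find pass for the separation; these are standard choices assumed here, since the slides do not name an algorithm. The point coordinates, the cut length, and the function names are hypothetical.

```python
import math

def minimum_spanning_tree(points):
    """Prim's algorithm; returns a list of (i, j, length) edges."""
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n   # shortest known link from the tree to each point
    best_parent = [-1] * n
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the point that is currently cheapest to attach to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_parent[u] >= 0:
            edges.append((best_parent[u], u, best_dist[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v], best_parent[v] = d, u
    return edges

def separate(n_points, edges, max_length):
    """Drop edges longer than max_length and return the remaining subtrees."""
    parent = list(range(n_points))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b, length in edges:
        if length <= max_length:
            parent[find(a)] = find(b)
    groups = {}
    for i in range(n_points):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Two well-separated groups of points (hypothetical coordinates).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
edges = minimum_spanning_tree(points)
print("MST edges:", edges)
print("clusters after cutting edges longer than 5:", separate(len(points), edges, 5.0))
```

A tree on n points always has n - 1 edges; here the single long edge bridging the two groups is the only one removed, leaving two subtrees.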

20 Minimum Spanning Tree (MST)
Schmeja & Klessen 2006, A&A: Taurus stellar cluster.
Figure label: Smaller maximum lengths.

21 Minimum Spanning Tree (MST)
Schmeja 2011, AN: Model cluster.
Figure labels: clusters, background, difficult detection.

