Download presentation
Presentation is loading. Please wait.
1
Principal Components Analysis
Τεχνικές Παρατήρησης και Επεξεργασίας Δεδομένων στην Αστροφυσική Τομέας Αστροφυσικής, Αστρονομίας & Μηχανικής Principal Components Analysis Antonios Karampelas, PhD
2
Astrostatistics Big Data Data Mining Machine Learning Glossary
A discipline used to process the vast amount of astronomical data. Big Data Large or complex data sets difficult to process using traditional data processing applications. Data Mining The computational process of discovering patterns in large data sets. Machine Learning A scientific discipline that explores the construction and study of algorithms that can learn from data.
3
Big Data: An alternative for Astrophysicists?
McKinsey Global Institute Report1 Big data: The next frontier for innovation, competition, and productivity Harvard Business Review2 Data Scientist: The Sexiest Job of the 21st Century Fortune3 Big Data could generate millions of new jobs 1http:// 1https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ 1http://fortune.com/2013/05/21/big-data-could-generate-millions-of-new-jobs/
4
Data Mining/Machine Learning categories
1. Descriptive Data Mining/ Unsupervised Machine Learning 2. Predictive Data Mining/ Supervised Machine learning
5
Data Mining/Machine Learning categories
1. Descriptive Data Mining/ Unsupervised Machine Learning Discover patterns, trends, clusters, outliers Principal Components Analysis (PCA) Self-Organizing Map (SOM) K-means Clustering 2. Predictive Data Mining/ Supervised Machine learning Parameterize, Classify, Recognize Patterns Decision Trees (DT) Support Vector Machine (SVM) Artificial Neural Network (ANN)
6
Artwork by Sandbox Studio, Chicago with Kimberly Boustead
Astrostatistics Artwork by Sandbox Studio, Chicago with Kimberly Boustead
7
Principal Components Analysis (PCA)
(Karhounen-Loeve transformation) Linear orthogonal transformation in a new base, in which the data variance is highlighted. New axes = Principal Components (PCs) Data = linear combination of PCs
8
Fingerprint Recognition
Very effective in: Data compression, Dimensionality reduction, Noise extraction Applications Astronomy Biology Graphology Face Recognition Fingerprint Recognition
9
Indicative Astronomy Articles
SDSS Yip et al. 2004, AJ Gaia Karampelas et al. 2012, A&A Spitzer Wang et al. 2011, MNRAS 2dF Folkes et al. 1999, MNRAS
10
PCA procedure Standardize the original data, if necessary.
2. Construct the variance-covariance matrix or the correlation matrix. 3. Determine the eigenvalues (λi) and eigenvectors (PCi) of the matrix. Data covariance has been eliminated. Eigenvalues λ represent the variances of the transformed data. PC1 corresponds to the biggest λ (λ1) and summarizes the majority of the data variance. PC2 corresponds to λ2 and summarizes the majority of the rest of the data variance etc. 4. Determine the admixture coefficients αi (data projection on the new axes). Full data reconstruction: Data = α1PC1 + α2PC2 + … + αkPCk Partial reconstruction is usually sufficient: Data ≈ α1PC1 + α2PC2 + … + α5PC5
11
No widespread Information
PCA procedure Full reconstruction Data = α1PC1 + α2PC2 + α3PC3 + … + αk-1PC(k-1) + αkPCk Widespread information Noise No widespread Information Partial reconstruction Data ≈ α1PC1 + α2PC2 + α3PC3
12
PCA implementation (Data)
Data set Synthetic galaxy spectra1 used for the Gaia Mission. Size 7160 spectra X 801 flux values Waveband 300 – 1,100 nm Redshift No Spectral types 4 (E, S, I, QSFG) 1 Karampelas et al. 2012, Fioc & Rocca-Volmerange 1997, 1999, Le Borgne & Rocca-Volmerange 2002
13
PCA implementation (IDL)
7160 spectra X 801 pixels (admixture coefficients) 7160 spectra X 801 pixels (original data) result=PCOMP(data, COEFFICIENTS = coefficients, EIGENVALUES = eigenvalues, VARIANCES = variances, /COVARIANCE) m=801 & eigenvectors = coefficients/REBIN(eigenvalues, m, m) 801 spectra X 801 pixels (PCs)
14
PCA implementation (PC1, PC2, PC3)
OIII SIII OII Ha Resembles the average spectrum Strong emission lines Dominant emission lines
15
PCA implementation (PC4, PC5, PC6)
Strong emission lines Dominant emission lines
16
PCA implementation (Reconstruction)
Spectrum = α1PC1 + α2PC2 + α3PC3 + α4PC4 + … = = α1 + α2 + α3 + … + α4
17
Admixture coefficients α2 Admixture coefficients α1
PCA implementation (Projection to PC1/PC2) Irregular Admixture coefficients α2 Spiral QSFG Early-type Admixture coefficients α1
18
PCA & Astrostatistics - 1
Yip et al. 2004, AJ ≈ 170, 000 SDSS galaxy spectra PC1 PC2 Outliers Red galaxies PC3 PC4 Blue galaxies Post starburst galaxies
19
PCA & Astrostatistics - 2
Folkes et al. 1999, MNRAS ≈ 6, 000 2dF galaxy spectra
20
Bailer-Jones et al. 1998, MNRAS
PCA & Astrostatistics - 3 Bailer-Jones et al. 1998, MNRAS 5, 000 Michigan Spectral Survey stellar spectra
21
PCA & Astrostatistics - 4
Karampelas et al. 2012, A&A ≈ 30, 000 PEGASE synthetic galaxy spectra
22
Karampelas et al. in preparation
PCA & Astrostatistics - 5 Karampelas et al. in preparation ≈ 7, 000 PEGASE synthetic galaxy spectra Classification with PCA + Decision Trees
23
PCA & Face Recognition Eigenfaces
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.