Principal Components Analysis

Presentation transcript:

Principal Components Analysis
Observation and Data Processing Techniques in Astrophysics
Section of Astrophysics, Astronomy & Mechanics
Antonios Karampelas, PhD

Glossary
Astrostatistics: A discipline used to process the vast amount of astronomical data.
Big Data: Large or complex data sets that are difficult to process using traditional data processing applications.
Data Mining: The computational process of discovering patterns in large data sets.
Machine Learning: A scientific discipline that explores the construction and study of algorithms that can learn from data.

Big Data: An alternative for Astrophysicists?
McKinsey Global Institute Report [1]: Big data: The next frontier for innovation, competition, and productivity
Harvard Business Review [2]: Data Scientist: The Sexiest Job of the 21st Century
Fortune [3]: Big Data could generate millions of new jobs
[1] http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
[2] https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
[3] http://fortune.com/2013/05/21/big-data-could-generate-millions-of-new-jobs/

Data Mining/Machine Learning categories
1. Descriptive Data Mining / Unsupervised Machine Learning
   Discover patterns, trends, clusters, outliers
   Principal Components Analysis (PCA), Self-Organizing Map (SOM), K-means Clustering
2. Predictive Data Mining / Supervised Machine Learning
   Parameterize, Classify, Recognize Patterns
   Decision Trees (DT), Support Vector Machine (SVM), Artificial Neural Network (ANN)

Astrostatistics
Artwork by Sandbox Studio, Chicago with Kimberly Boustead

Principal Components Analysis (PCA) (Karhunen-Loève transform)
A linear orthogonal transformation to a new basis whose axes are ordered by the amount of data variance they capture.
New axes = Principal Components (PCs)
Data = linear combination of PCs

Very effective in: Data compression, Dimensionality reduction, Noise extraction
Applications: Astronomy, Biology, Graphology, Face Recognition, Fingerprint Recognition

Indicative Astronomy Articles
SDSS: Yip et al. 2004, AJ
Gaia: Karampelas et al. 2012, A&A
Spitzer: Wang et al. 2011, MNRAS
2dF: Folkes et al. 1999, MNRAS

PCA procedure
1. Standardize the original data, if necessary.
2. Construct the variance-covariance matrix or the correlation matrix.
3. Determine the eigenvalues (λi) and eigenvectors (PCi) of the matrix.
   In the new basis the data covariances are eliminated (the transformed variables are uncorrelated).
   The eigenvalues λi are the variances of the transformed data.
   PC1 corresponds to the largest eigenvalue (λ1) and captures the largest share of the data variance.
   PC2 corresponds to λ2 and captures the largest share of the remaining variance, and so on.
4. Determine the admixture coefficients αi (the projections of the data onto the new axes).
   Full data reconstruction: Data = α1PC1 + α2PC2 + … + αkPCk
   Partial reconstruction is usually sufficient: Data ≈ α1PC1 + α2PC2 + … + α5PC5
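The four steps of the procedure can be sketched in a few lines of NumPy. This is a minimal illustration on toy data, not the code used in the presentation; the function name `pca`, its `standardize` flag, and the toy data set are assumptions introduced here.

```python
import numpy as np

def pca(data, standardize=False):
    """PCA by eigendecomposition of the covariance (or correlation) matrix.

    data has shape (n_samples, n_features).
    Returns (mean, scale, eigenvalues, pcs, alphas), with one PC per row of pcs.
    """
    data = np.asarray(data, dtype=float)
    # Step 1: centre the data and, if requested, scale to unit variance.
    mean = data.mean(axis=0)
    scale = data.std(axis=0, ddof=1) if standardize else np.ones(data.shape[1])
    z = (data - mean) / scale
    # Step 2: covariance matrix (the correlation matrix when standardized).
    cov = np.cov(z, rowvar=False)
    # Step 3: eigenvalues and eigenvectors; eigh returns them in ascending
    # order, so reorder them from largest to smallest eigenvalue.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    pcs = eigenvectors[:, order].T
    # Step 4: admixture coefficients = projections of the data on the PCs.
    alphas = z @ pcs.T
    return mean, scale, eigenvalues, pcs, alphas

# Toy data (hypothetical): 200 samples of 3 strongly correlated features.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
data = np.hstack([latent, 2.0 * latent, -latent])
data += 0.1 * rng.normal(size=data.shape)

mean, scale, eigenvalues, pcs, alphas = pca(data)

# Full reconstruction: Data = α1 PC1 + α2 PC2 + ... + αk PCk (plus the mean).
reconstructed = (alphas @ pcs) * scale + mean
print("fraction of variance in PC1:", eigenvalues[0] / eigenvalues.sum())
```

`np.linalg.eigh` is used because the covariance matrix is symmetric. The covariance matrix of `alphas` is diagonal with the eigenvalues on its diagonal, which is the "covariance eliminated" statement of step 3.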

PCA procedure
Full reconstruction: Data = α1PC1 + α2PC2 + α3PC3 + … + αk-1PCk-1 + αkPCk
   Leading terms: widespread information
   Trailing terms: no widespread information (noise)
Partial reconstruction: Data ≈ α1PC1 + α2PC2 + α3PC3
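The trade-off between full and partial reconstruction can be checked numerically. The sketch below uses hypothetical toy "spectra", built from two smooth shared components plus noise. It shows that the squared error of a reconstruction from the first k PCs equals (n - 1) times the sum of the discarded eigenvalues. The helper `reconstruct` and all data are assumptions made for illustration.

```python
import numpy as np

def reconstruct(alphas, pcs, mean, k):
    """Rebuild the data from the first k principal components only."""
    return alphas[:, :k] @ pcs[:k] + mean

rng = np.random.default_rng(1)
n_samples, n_features = 500, 20

# Toy "spectra": two smooth components shared by all samples, plus noise.
x = np.linspace(0.0, 1.0, n_features)
signal = (rng.normal(size=(n_samples, 1)) * np.sin(2 * np.pi * x)
          + 0.5 * rng.normal(size=(n_samples, 1)) * np.cos(2 * np.pi * x))
data = signal + 0.05 * rng.normal(size=(n_samples, n_features))

# PCA via the covariance matrix, PCs sorted by decreasing eigenvalue.
mean = data.mean(axis=0)
centered = data - mean
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(centered, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
eigenvalues, pcs = eigenvalues[order], eigenvectors[:, order].T
alphas = centered @ pcs.T

# Cumulative fraction of the variance captured by the first k PCs.
explained = np.cumsum(eigenvalues) / eigenvalues.sum()

for k in (1, 2, 5, n_features):
    squared_error = ((data - reconstruct(alphas, pcs, mean, k)) ** 2).sum()
    discarded = (n_samples - 1) * eigenvalues[k:].sum()
    print(k, explained[k - 1], squared_error, discarded)
```

In this toy case the first two PCs carry nearly all of the variance, and the remaining PCs mostly describe the added noise, so truncating the sum also filters noise.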

PCA implementation (Data)
Data set: Synthetic galaxy spectra [1] used for the Gaia Mission
Size: 7160 spectra × 801 flux values
Waveband: 300 – 1,100 nm
Redshift: none
Spectral types: 4 (E, S, I, QSFG)
[1] Karampelas et al. 2012, Fioc & Rocca-Volmerange 1997, 1999, Le Borgne & Rocca-Volmerange 2002

PCA implementation (IDL)
Input: 7160 spectra × 801 pixels (original data)
result = PCOMP(data, COEFFICIENTS = coefficients, EIGENVALUES = eigenvalues, VARIANCES = variances, /COVARIANCE)
m = 801 & eigenvectors = coefficients/REBIN(eigenvalues, m, m)
Output: 7160 spectra × 801 pixels (admixture coefficients)
Output: 801 spectra × 801 pixels (PCs)
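For readers without IDL, the array shapes involved can be illustrated in NumPy. The sketch below is not a port of PCOMP and does not reproduce the scaling conventions of its COEFFICIENTS keyword. It swaps in a different but equivalent technique, a singular value decomposition (SVD) of the centred data, and cross-checks the result against the covariance-matrix eigenvalues. The small random array stands in for the 7160 × 801 data set; all names and sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n_spectra, n_pixels = 300, 40  # small stand-in for 7160 spectra x 801 pixels

# Hypothetical data: 5 underlying components mixed into every "spectrum".
data = rng.normal(size=(n_spectra, 5)) @ rng.normal(size=(5, n_pixels))
data += 0.01 * rng.normal(size=data.shape)

centered = data - data.mean(axis=0)

# Thin SVD: centered = u @ diag(s) @ vt, singular values in decreasing order.
u, s, vt = np.linalg.svd(centered, full_matrices=False)

pcs = vt                              # (n_pixels, n_pixels), one PC per row
eigenvalues = s ** 2 / (n_spectra - 1)  # variances along each PC
alphas = u * s                        # (n_spectra, n_pixels) coefficients

# Cross-check against the eigenvalues of the covariance matrix.
cov_eigenvalues = np.linalg.eigvalsh(np.cov(centered, rowvar=False))[::-1]
print(np.allclose(eigenvalues, cov_eigenvalues))
print(pcs.shape, alphas.shape)
```

The SVD route avoids forming the covariance matrix explicitly. The individual PCs are defined only up to sign, so they may differ by a factor of -1 from those returned by another routine.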

PCA implementation (PC1, PC2, PC3)
[Figure: the first three PCs. Annotations: resembles the average spectrum; strong emission lines; dominant emission lines. Marked lines: OII, OIII, Hα, SIII]

PCA implementation (PC4, PC5, PC6)
[Figure: PC4, PC5 and PC6. Annotations: strong emission lines; dominant emission lines]

PCA implementation (Reconstruction)
Spectrum = α1PC1 + α2PC2 + α3PC3 + α4PC4 + …
[Figure: a spectrum shown as the sum of the PC spectra weighted by α1, α2, α3, α4, …]

PCA implementation (Projection to PC1/PC2)
[Figure: admixture coefficients α2 plotted against admixture coefficients α1. Labelled groups: Irregular, Spiral, QSFG, Early-type]

PCA & Astrostatistics - 1
Yip et al. 2004, AJ: ≈ 170,000 SDSS galaxy spectra
[Figure: PC1 to PC4. Labels: outliers, red galaxies, blue galaxies, post-starburst galaxies]

PCA & Astrostatistics - 2
Folkes et al. 1999, MNRAS: ≈ 6,000 2dF galaxy spectra

PCA & Astrostatistics - 3
Bailer-Jones et al. 1998, MNRAS: 5,000 Michigan Spectral Survey stellar spectra

PCA & Astrostatistics - 4
Karampelas et al. 2012, A&A: ≈ 30,000 PEGASE synthetic galaxy spectra

PCA & Astrostatistics - 5
Karampelas et al. in preparation: ≈ 7,000 PEGASE synthetic galaxy spectra
Classification with PCA + Decision Trees

PCA & Face Recognition Eigenfaces http://graphics.cs.cmu.edu/courses/15-463/2004_fall/www/handins/brh/final/