Astrostatistics Antonios Karampelas, PhD

Presentation transcript:

Astrostatistics. Antonios Karampelas, PhD. Observation and Data Processing Techniques in Astrophysics. Section of Astrophysics, Astronomy & Mechanics.

Glossary
Astroinformatics: The combination of Astronomy and Information/Communications technologies.
Astrostatistics: A discipline used to process the vast amount of astronomical data.
Big Data: Large or complex data sets that are difficult to process using traditional data processing applications.
Data Mining: The computational process of discovering patterns in large data sets.
Machine Learning: A scientific discipline that explores the construction and study of algorithms that can learn from data.

Big Data: An alternative for Astrophysicists?
McKinsey Global Institute Report [1]: Big data: The next frontier for innovation, competition, and productivity
Harvard Business Review [2]: Data Scientist: The Sexiest Job of the 21st Century
Fortune [3]: Big Data could generate millions of new jobs
[1] http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
[2] https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
[3] http://fortune.com/2013/05/21/big-data-could-generate-millions-of-new-jobs/

Data-intensive Astronomy
LSST (expected): 100 Petabytes [1]
SDSS DR12: 100 Terabytes [2]
Gaia (expected): 100 Terabytes [3]
[1] http://www.lsst.org/News/enews/teragrid-1004.html
[2] http://www.sdss.org/press/the-sloan-digital-sky-survey-opens-a-new-public-view-of-the-sky/
[3] http://blogs.esa.int/gaia/2013/10/31/some-gaia-numbers/

From Data to Meaning (From Sensors to Sense). Kirk et al. 2009, astro2010, P, 6B. [Figure layers: Data Layer, Informatics Layer, Data Mining Layer.] [Note: KDD = Knowledge Discovery in Databases]

Astrostatistics. Artwork by Sandbox Studio, Chicago with Kimberly Boustead.

Astrostatistics and Astroinformatics Portal (ASAIP)
Used for public outreach by:
IAA: International Astrostatistics Association
AAS/WGAA: American Astronomical Society / Working Group in Astroinformatics and Astrostatistics
IAU/WGAA: International Astronomical Union / Working Group in Astrostatistics and Astroinformatics
LSST/ISSC: Large Synoptic Survey Telescope / Information and Statistical Sciences
ASA/IGA: American Statistical Association / Interest Group in Astrostatistics

Indicative Data Mining/Machine Learning methods
Principal Components Analysis (PCA)
Support Vector Machine (SVM)
Artificial Neural Network (ANN)
Minimum Spanning Trees (MST)
Self-Organizing Map (SOM)
K-means Clustering
Decision Trees and Random Forests
Wavelets
Bayesian Statistics

Software

Principal Components Analysis (PCA). A linear orthogonal transformation to a new basis whose axes are ordered by the data variance they capture. New axes = Principal Components (PCs). Data = linear combination of PCs. Very effective for data compression, dimensionality reduction, and noise removal.

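Principal Components Analysis (PCA). Full reconstruction: $\mathrm{Data} = \alpha_1 \mathrm{PC}_1 + \alpha_2 \mathrm{PC}_2 + \alpha_3 \mathrm{PC}_3 + \dots + \alpha_{k-1} \mathrm{PC}_{k-1} + \alpha_k \mathrm{PC}_k$. The first PCs carry the widespread information; the last PCs carry no widespread information (noise).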
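The reconstruction formula can be tried on toy data. The following is a minimal sketch, assuming NumPy and a PCA computed through the singular value decomposition of the mean-centered data; the function names `pca` and `reconstruct` and the synthetic data set are illustrative and do not come from the slides.

```python
import numpy as np

def pca(data):
    """Return the mean, the principal components (rows) and their variances."""
    mean = data.mean(axis=0)
    centered = data - mean
    # Rows of vt are orthonormal axes, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variances = singular_values**2 / (len(data) - 1)
    return mean, vt, variances

def reconstruct(data, mean, components, k):
    """Rebuild the data from the first k PCs: mean + sum_i alpha_i * PC_i."""
    alphas = (data - mean) @ components[:k].T   # coefficients alpha_i
    return mean + alphas @ components[:k]

# Toy data (an assumption): one strong direction plus weak isotropic noise.
rng = np.random.default_rng(0)
signal = rng.normal(size=(200, 1)) * np.array([[3.0, 2.0, 1.0]])
data = signal + 0.05 * rng.normal(size=(200, 3))

mean, components, variances = pca(data)
full = reconstruct(data, mean, components, k=3)        # all PCs: exact
compressed = reconstruct(data, mean, components, k=1)  # first PC only

explained = variances / variances.sum()
rms_error = np.sqrt(np.mean((compressed - data) ** 2))
print("fraction of variance per PC:", np.round(explained, 4))
print("RMS error keeping one PC:", rms_error)
```

In this toy case almost all the variance sits in the first PC, so dropping the remaining PCs discards mostly noise.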

Principal Components Analysis (PCA). Yip et al. 2004, AJ: SDSS galaxy spectra. [Figure: PC1 to PC4; labels: outliers, red galaxies, blue galaxies, post-starburst galaxies.]

Support Vector Machines (SVM). Supervised learning models for classification and regression. Training data are used to define an optimal hyperplane that separates the members of the two classes. New data are then mapped into that same space and classified according to the side of the hyperplane on which they fall. The optimal hyperplane is the one with the maximum margin; the training points closest to it are the support vectors.
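As an illustration of the separating hyperplane, here is a minimal sketch of a linear SVM, assuming NumPy. It minimizes the regularized hinge loss by plain subgradient descent rather than solving the usual quadratic program; `train_linear_svm`, its parameters, and the two synthetic classes are assumptions made for this example.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, learning_rate=0.1, epochs=200):
    """Fit w, b by subgradient descent on lam/2*|w|^2 + mean hinge loss.

    Labels y must be +1 or -1.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1          # points inside or beyond the margin
        grad_w = lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

def classify(X, w, b):
    """Assign each point to a class by the side of the hyperplane it is on."""
    return np.where(X @ w + b >= 0, 1.0, -1.0)

# Two well-separated synthetic classes (an assumption for this sketch).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3.0, 0.5, size=(50, 2)),
               rng.normal(+3.0, 0.5, size=(50, 2))])
y = np.concatenate([-np.ones(50), np.ones(50)])

w, b = train_linear_svm(X, y)
accuracy = np.mean(classify(X, w, b) == y)
print("hyperplane normal:", w, "offset:", b, "training accuracy:", accuracy)
```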

Support Vector Machines (SVM). Usually, classes are not linearly separable. Non-linear classification: the data are mapped into a higher-dimensional space, using a kernel, where a separating hyperplane can be found.
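The role of the kernel can be illustrated with a small sketch, assuming NumPy. It uses the degree-2 polynomial kernel k(a, b) = (a·b)² as an example (the slides do not say which kernel is used) and checks that it equals an ordinary inner product after the explicit map φ(x) = (x1², √2·x1·x2, x2²). The ring-shaped data, `feature_map`, and the hand-picked hyperplane are assumptions for this example.

```python
import numpy as np

def feature_map(x):
    """Explicit 3-D map whose inner product equals the kernel (a.b)^2."""
    x1, x2 = x[..., 0], x[..., 1]
    return np.stack([x1**2, np.sqrt(2.0) * x1 * x2, x2**2], axis=-1)

def poly_kernel(a, b):
    """Degree-2 homogeneous polynomial kernel, computed in the input space."""
    return (a @ b.T) ** 2

# Inner disc (class -1) surrounded by a ring (class +1): not separable
# by a straight line in the original 2-D plane.
rng = np.random.default_rng(1)
angles = rng.uniform(0.0, 2.0 * np.pi, size=100)
radii = np.concatenate([rng.uniform(0.0, 1.0, 50), rng.uniform(2.0, 3.0, 50)])
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
y = np.concatenate([-np.ones(50), np.ones(50)])

Phi = feature_map(X)
kernel_matches = np.allclose(poly_kernel(X, X), Phi @ Phi.T)

# In the mapped space, phi_1 + phi_3 = x1^2 + x2^2 is the squared radius,
# so the plane phi_1 + phi_3 = 2.25 separates the two classes.
w = np.array([1.0, 0.0, 1.0])
b = -2.25
predictions = np.sign(Phi @ w + b)
print("kernel equals mapped inner product:", kernel_matches)
print("all points separated:", np.array_equal(predictions, y))
```

The kernel gives the same inner products as the explicit map without ever building the higher-dimensional vectors, which is why kernel methods stay tractable when the mapped space is very large.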

Support Vector Machines (SVM). Tsalmantza et al. 2009, A&A: PEGASE synthetic galaxy spectra. [Figure panels: spectral class prediction (classification); redshift prediction (regression).]

Artificial Neural Networks (ANN). Learning algorithms inspired by biological neural networks (in particular the brain). ANNs are generally presented as systems of interconnected "neurons", arranged in an input layer, hidden layers, and an output layer, and connected by adaptive weights. The weights are adjusted using training data. The trained network is then used to classify or parameterize new data.
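A minimal sketch of such a network, assuming NumPy: one hidden layer of tanh neurons and a linear output, with the weights adapted by full-batch gradient descent on the mean squared error. The layer size, learning rate, and the sine-curve training set are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data (an assumption): learn y = sin(pi * x) on [-1, 1].
X = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
y = np.sin(np.pi * X)

# Adaptive weights: input -> hidden (W1, b1) and hidden -> output (W2, b2).
hidden = 8
W1 = rng.normal(scale=0.5, size=(1, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.5, size=(hidden, 1))
b2 = np.zeros(1)

def forward(inputs):
    """Propagate inputs through the hidden layer to the output layer."""
    H = np.tanh(inputs @ W1 + b1)
    return H, H @ W2 + b2

learning_rate = 0.05
losses = []
for step in range(2000):
    H, prediction = forward(X)
    error = prediction - y
    losses.append(float(np.mean(error**2)))
    # Backpropagation: gradients of the mean squared error.
    d_pred = 2.0 * error / len(X)
    dW2 = H.T @ d_pred
    db2 = d_pred.sum(axis=0)
    d_hidden = (d_pred @ W2.T) * (1.0 - H**2)   # derivative of tanh
    dW1 = X.T @ d_hidden
    db1 = d_hidden.sum(axis=0)
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

# The trained network can now parameterize new inputs.
_, new_prediction = forward(np.array([[0.5]]))
print("first loss:", losses[0], "last loss:", losses[-1])
print("prediction at x = 0.5:", new_prediction[0, 0])
```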

Artificial Neural Networks (ANN). Snider et al. 2001, ApJ: Galactic F- and G-type stars. [Figure panels: training set, testing set; 1-1 correspondence line.] Highly accurate Teff prediction: σ(Teff) = 135-150 K.

Voronoi Tessellation. The partitioning of a plane with n points into n convex polygons. Each polygon contains exactly one generating point, and every location in a given polygon is closer to its generating point than to any other. The higher the point density, the smaller the areas of the polygons. It can be used to analyze structure. Relevant methods: Minimum Spanning Tree (MST), k-means clustering.
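The nearest-generator rule can be sketched directly, assuming NumPy. The example assigns grid locations in the unit square to their closest generating point, estimates the cell areas by counting grid cells (an approximation, not an exact polygon construction), and reuses the same rule for a basic k-means loop. All function names and test points are assumptions for this example.

```python
import numpy as np

def nearest_generator(points, generators):
    """Index of the closest generating point for every input point."""
    distances = np.linalg.norm(
        points[:, None, :] - generators[None, :, :], axis=-1)
    return distances.argmin(axis=1)

def cell_areas(generators, resolution=100):
    """Approximate Voronoi cell areas inside the unit square on a grid."""
    centers = (np.arange(resolution) + 0.5) / resolution
    gx, gy = np.meshgrid(centers, centers)
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    labels = nearest_generator(grid, generators)
    return np.bincount(labels, minlength=len(generators)) / len(grid)

def kmeans(points, centers, iterations=20):
    """k-means: alternate Voronoi assignment and centroid update."""
    centers = centers.astype(float).copy()
    for _ in range(iterations):
        labels = nearest_generator(points, centers)
        for k in range(len(centers)):
            members = points[labels == k]
            if len(members):              # keep the old center if empty
                centers[k] = members.mean(axis=0)
    return centers, labels

# Two symmetric generators split the unit square in half.
halves = cell_areas(np.array([[0.25, 0.5], [0.75, 0.5]]))

# Three crowded generators and one isolated one: the isolated cell is largest.
crowded = np.array([[0.10, 0.10], [0.15, 0.10], [0.10, 0.15], [0.80, 0.80]])
areas = cell_areas(crowded)

# k-means on two synthetic blobs, started from one point of each blob.
rng = np.random.default_rng(0)
blobs = np.vstack([rng.normal(0.0, 0.1, size=(50, 2)),
                   rng.normal(5.0, 0.1, size=(50, 2))])
centers, labels = kmeans(blobs, blobs[[0, 50]])
print("half-square areas:", halves)
print("areas with a dense group:", areas)
print("k-means centers:", centers)
```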

Minimum Spanning Tree (MST). The set of straight lines (edges) connecting a given set of points without closed loops, such that the sum of the edge lengths is a minimum. It is unique when all edge lengths are distinct. It can be used to analyze structure. MST separation: all the edges of the MST whose lengths exceed a certain limit are removed. Friends-of-friends: subtrees with edges smaller than a certain limit are built.
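The construction and the MST separation step can be sketched as follows, assuming NumPy. The tree is built with Prim's algorithm on the full distance matrix (one of several standard choices; the slides do not name an algorithm), and clusters are read off after removing edges longer than a limit. The function names and the five test points are assumptions for this example.

```python
import numpy as np

def minimum_spanning_tree(points):
    """Prim's algorithm; returns a list of (i, j, length) edges."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()            # shortest known link to the tree
    parent = np.zeros(n, dtype=int)  # tree node providing that link
    edges = []
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best)
        j = int(np.argmin(candidates))
        edges.append((int(parent[j]), j, float(candidates[j])))
        in_tree[j] = True
        closer = dist[j] < best
        best = np.where(closer, dist[j], best)
        parent = np.where(closer, j, parent)
    return edges

def mst_clusters(n, edges, max_length):
    """MST separation: drop edges longer than max_length, label the pieces."""
    root = list(range(n))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for a, b, length in edges:
        if length <= max_length:
            root[find(a)] = find(b)
    roots = [find(i) for i in range(n)]
    relabel = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [relabel[r] for r in roots]

# Two small groups of points far from each other.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                   [10.0, 10.0], [11.0, 10.0]])
edges = minimum_spanning_tree(points)
total_length = sum(length for _, _, length in edges)
cluster_labels = mst_clusters(len(points), edges, max_length=5.0)
print("edges:", edges)
print("total length:", total_length)
print("cluster labels after cutting long edges:", cluster_labels)
```

Lowering `max_length` breaks the tree into more and smaller groups, which is how the separation limit controls the scale of the structures that are detected.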

Minimum Spanning Tree (MST). Schmeja & Klessen 2006, A&A: Taurus stellar cluster. [Figure: MST shown for progressively smaller maximum edge lengths.]

Minimum Spanning Tree (MST). Schmeja 2011, AN: model cluster. [Figure labels: clusters, background, difficult detection.]