CD-HPF: New Habitability Score via Data Analytic Modeling


CD-HPF: New Habitability Score via Data Analytic Modeling. Snehanshu Saha, Professor, Computer Science Dept., PESIT-BSC, Bangalore.

INTRODUCTION At present, the only known habitable planet is Earth. Two important questions need to be answered: Does life exist somewhere else in a form exactly similar to Earth's? Or does life exist somewhere else in a form unknown to us?

Introduction.. Astronomers mainly use two important parameters to answer these questions. The first is the Earth Similarity Index (ESI), which is based on four parameters (radius, surface temperature, density and escape velocity) and is given below.
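For reference, the commonly used definition of the ESI is a weighted geometric mean of similarity terms:
ESI = ∏_{i=1..n} ( 1 − |(x_i − x_i,Earth) / (x_i + x_i,Earth)| )^(w_i / n),
where x_i is the planet's value of the i-th parameter, x_i,Earth is the corresponding Earth value, and w_i is a weight exponent.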

Introduction.. The second is the Planetary Habitability Index (PHI), which essentially addresses the second question and is given below, where S is the substrate, E the available energy, C the chemistry of compounds and L the liquid solvent.
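For reference, the standard definition of the PHI is the geometric mean of its four factors:
PHI = (S · E · C · L)^(1/4).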

Introduction.. We propose an approach that uses the Cobb-Douglas production function to obtain a new habitability score for exoplanets.

Cobb-Douglas Production Function (CDPF) In general, the CDPF is given as below, where Y is the output of the production function, α and β are elasticity constants (positive fractions), k is a positive constant, and x1 and x2 are the input parameters. NOTE: The function can be extended to any number of inputs.
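Written out, the two-input form described above is:
Y = k · x1^α · x2^β,   with k > 0 and 0 < α, β < 1.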

Elasticity Constants in CDPF The sum of α and β lets us deduce important properties of the function. If the sum is 1, the function is homogeneous of degree 1; this is called constant returns to scale (CRS), where increasing the inputs increases the output in the same proportion. If the sum is less than 1, we have decreasing returns to scale (DRS), where diminishing returns set in.

Elasticity Constants in CDPF.. If the sum is more than 1, we have increasing returns to scale (IRS), where output increases more than proportionally with the inputs.
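Why the sum α + β determines the regime can be seen by scaling both inputs by a factor λ > 1:
Y(λ·x1, λ·x2) = k · (λ·x1)^α · (λ·x2)^β = λ^(α+β) · Y(x1, x2),
so output grows in the same proportion when α + β = 1 (CRS), more slowly when α + β < 1 (DRS), and faster when α + β > 1 (IRS).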

Use of CDPF to estimate CDHS We have formulated the CDPF to estimate the Cobb-Douglas Habitability Score (CDHS) for exoplanets. First, we calculate an interior CDHS and a surface CDHS for each exoplanet, and then use a convex combination of the two to compute the final CDHS. In doing so, we found and proved that the function is maximized in the CRS and DRS cases.

Use of CDPF to estimate CDHS We specify the CDPF with radius, density, surface temperature and escape velocity as the input parameters, each with its own elasticity constant.
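With symbols R, D, T_s and V_e assumed here for radius, density, surface temperature and escape velocity, a four-input CDPF over these parameters reads:
Y = R^α · D^β · T_s^γ · V_e^δ.
In the results below this is evaluated as two two-input scores, an interior score over (R, D) and a surface score over (V_e, T_s), which are then combined.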

Use of CDPF to estimate CDHS The criterion is to choose α, β, γ, δ so as to maximize Y. For CRS, where the elasticities of all the cost components are equal, Y reduces to the geometric mean of all the inputs.
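Under CRS with equal elasticities, α = β = γ = δ = 1/4, so the score becomes:
Y = R^(1/4) · D^(1/4) · T_s^(1/4) · V_e^(1/4) = (R · D · T_s · V_e)^(1/4),
i.e. the geometric mean of the four inputs.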

Data set For this work we used the HABCAT database, available at http://phl.upr.edu/projects/habitable-exoplanets-catalog. From this database we selected 664 confirmed exoplanets for which the surface temperature was known.

Our Results The CDPF was applied to obtain a CDHS based on radius and density, which we call CDHSi, and another CDHS based on escape velocity and surface temperature, which we call CDHSs. The values obtained range from 0.8607 to 168.35 for CDHSi and from 1.01521 to 19.9395 for CDHSs.

Graphs for CDHSi & CDHSs: CDHSi, with α = 0.8 and β = 0.1.

Graphs for CDHSi & CDHSs: CDHSs, with α = 0.8 and β = 0.1.

CDHS Calculation Next we calculated the final CDHS as the convex combination CDHS = w'·CDHSi + w''·CDHSs, where w' was taken as 0.99 and w'' as 0.01 (since w' + w'' must equal 1). The values obtained for CDHS range from 0.87225 to 166.87.
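A minimal Python sketch of this computation, under stated assumptions: the function and variable names are my own, the inputs are taken in Earth units (so Earth's score is 1), and the elasticities are the 0.8 / 0.1 values quoted for the graphs above.

# Illustrative sketch only; names, Earth-unit normalization and sample values are assumptions.
def cobb_douglas(x1, x2, alpha, beta, k=1.0):
    """Two-input Cobb-Douglas production function: Y = k * x1^alpha * x2^beta."""
    return k * (x1 ** alpha) * (x2 ** beta)

def cdhs(radius, density, escape_vel, surface_temp,
         alpha=0.8, beta=0.1, gamma=0.8, delta=0.1,
         w_interior=0.99, w_surface=0.01):
    """Final CDHS as the convex combination w'*CDHSi + w''*CDHSs, with w' + w'' = 1."""
    cdhs_i = cobb_douglas(radius, density, alpha, beta)            # interior score over (R, D)
    cdhs_s = cobb_douglas(escape_vel, surface_temp, gamma, delta)  # surface score over (Ve, Ts)
    return w_interior * cdhs_i + w_surface * cdhs_s

# Earth in Earth units: every input is 1, so the final score is 1.
print(cdhs(1.0, 1.0, 1.0, 1.0))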

Classification based on CDHS After calculating the CDHS, we applied the KNN classification algorithm to check the number of planets belonging to Earth's class. Based on the CDHS results, 5 classes were considered with k = 7, where Earth's class was the 4th class. These classes have the following ranges according to the classification (listed after the KNN algorithm below).

KNN Algorithm Let k be the desired number of nearest neighbors and S := {p1, ..., pn} the set of training samples, where each pi = (xi, ci), xi is the d-dimensional feature vector of pi (d = 1 in our case) and ci is the class pi belongs to. Let T := {p1', p2', ..., pr'} be the test samples.
for each p' = (x', c') in T:
    compute the distance d(x', xi) between p' and every pi in S
    sort the pi by d(x', xi)
    select the k training samples closest to p'
    assign a class to p' by majority vote among those k samples
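A minimal runnable Python version of this procedure (a sketch; the function name and the made-up example values are mine, and the distance reduces to an absolute difference because the feature here is the one-dimensional CDHS value):

import numpy as np

def knn_classify(train_x, train_c, test_x, k=7):
    """Assign each test point the majority class among its k nearest training samples."""
    predictions = []
    for x in test_x:
        dist = np.abs(train_x - x)            # distance from x to every training sample
        nearest = np.argsort(dist)[:k]        # indices of the k closest samples
        classes, counts = np.unique(train_c[nearest], return_counts=True)
        predictions.append(classes[np.argmax(counts)])  # majority vote
    return np.array(predictions)

# Tiny illustrative call (values are made up):
print(knn_classify(np.array([1.0, 1.5, 2.0, 5.0, 0.9, 1.3]),
                   np.array([5, 3, 1, 1, 5, 4]),
                   np.array([1.1]), k=3))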

Class Ranges According to CDHS
Class 1 - from 1.98 to 166.87
Class 2 - greater than 1.68 and less than 1.98
Class 3 - greater than 1.443 and up to 1.68
Class 4 - greater than 1.23 and up to 1.443
Class 5 - from 0.87225 up to 1.23
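A direct encoding of these ranges as a hypothetical helper function (boundary handling follows the list above):

def cdhs_class(score):
    """Map a CDHS value to one of the five classes using the ranges above."""
    if score >= 1.98:
        return 1
    elif score > 1.68:
        return 2
    elif score > 1.443:
        return 3
    elif score > 1.23:
        return 4
    else:
        return 5  # 0.87225 <= score <= 1.23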

Result of Classification 80% of the data was used for training and 20% for testing. In Class 5, we obtained 13 exoplanets. Result plot of KNN: the accuracy obtained is 92.5%.
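A sketch of this evaluation with scikit-learn, assuming the per-planet CDHS values and class labels have been computed as above (the arrays below are synthetic placeholders; the 80/20 split and k = 7 follow the slides):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Placeholder data standing in for the 664-planet catalog values.
rng = np.random.default_rng(0)
cdhs_values = rng.uniform(0.87, 167.0, size=(664, 1))
class_labels = np.digitize(cdhs_values.ravel(), [1.23, 1.443, 1.68, 1.98])

X_train, X_test, y_train, y_test = train_test_split(
    cdhs_values, class_labels, test_size=0.2, random_state=0)

knn = KNeighborsClassifier(n_neighbors=7).fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, knn.predict(X_test)))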

High Level Dataflow
Load the dataset → Attribute Filtering → Cobb-Douglas Engine → Computed CDHPF → Employ KNN (preliminary classification) → No. of classes → Reorganized Classes → Final 5 classes.
Class 5 contains Earth & other habitable planets. Planets highly likely to be habitable are: GJ163c, GJ667C c, GJ832c, HD40307g, Kepler-186f, Kepler-62e, Kepler-62f, etc.

Acknowledgement I sincerely thank Ms. Kakoli Bora and Ms. Surbhi Agrawal, my Ph.D. students, for their sincere efforts. I acknowledge Dr. Margarita Safanova, Indian Institute of Astrophysics, who has been an excellent collaborator in this project.

THANK YOU