Modeling the Human Classification of Galaxy Morphology Wednesday, December 5, 2007 Mike Specian.

Presentation transcript:

Modeling the Human Classification of Galaxy Morphology Wednesday, December 5, 2007 Mike Specian

Galaxy Zoo Statistics
- Site announced on July 15, 2007
- Over 50,000 volunteers within first week
- Most galaxies classified 10 times or more
- More classifications = better data
- Probably world’s most robust morphology database with millions of objects classified

Data Preprocessing

1, 11 = Elliptical
2, 12 = Clockwise Spiral
3, 13 = Counterclockwise Spiral
4, 14 = Other (Edge-On Spiral)
5, 15 = Star / Don’t-Know
6, 16 = Galaxy Merger
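The key above can be applied mechanically during preprocessing. The following is a minimal sketch, not the code used in the talk: the function name `class_for_code` and the lookup table are hypothetical, and merging codes 11-16 with 1-6 simply follows the key, since the slides do not say why two code ranges exist.

```python
# Class names follow the slide's key. Treating 11-16 as the same classes as
# 1-6 is taken from that key; the reason for two ranges is not stated.
CLASS_BY_LAST_DIGIT = {
    1: "Elliptical",
    2: "Clockwise Spiral",
    3: "Counterclockwise Spiral",
    4: "Other (Edge-On Spiral)",
    5: "Star / Don't-Know",
    6: "Galaxy Merger",
}


def class_for_code(code: int) -> str:
    """Return the morphology class for a raw vote code (1-6 or 11-16)."""
    if code not in range(1, 7) and code not in range(11, 17):
        raise ValueError(f"unknown vote code: {code}")
    # Codes in the two ranges share a last digit, so the remainder picks the class.
    return CLASS_BY_LAST_DIGIT[code % 10]
```

For example, `class_for_code(2)` and `class_for_code(12)` both give the clockwise spiral class, and any code outside the key raises an error instead of being silently mislabeled.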

How People Voted

Type                 Number Classified
Elliptical           666,679
Spiral               94,429
Other (Edge-On)      112,148
Star / Don’t Know    23,735
Galaxy Merger        11,846

There’s almost too much data! Limiting the sample:
1. Model on 10,000 objects
2. Distinguish only between ‘Elliptical’ and ‘Spiral’
3. Accept objects that received >= 60% of the total vote
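The 60% acceptance rule can be sketched as a small filter over per-object votes. This is an illustration under stated assumptions, not the talk's actual pipeline: the function name `consensus_label` and the input format (one class-name string per volunteer vote) are hypothetical, and counting both spiral directions toward a single 'Spiral' class is an assumption suggested by the combined "Spiral" row in the table.

```python
from collections import Counter


def consensus_label(votes, threshold=0.6):
    """Return 'Elliptical' or 'Spiral' if that class got >= threshold of all votes.

    votes: iterable of class-name strings, one per volunteer classification.
    Returns None when neither class reaches the threshold.
    """
    counts = Counter(votes)
    total = sum(counts.values())
    if total == 0:
        return None
    # Assumption: clockwise and counterclockwise votes both count as 'Spiral'.
    tallies = {
        "Elliptical": counts["Elliptical"],
        "Spiral": counts["Clockwise Spiral"] + counts["Counterclockwise Spiral"],
    }
    for label, n in tallies.items():
        # The fraction is taken over ALL votes, including the other classes.
        if n / total >= threshold:
            return label
    return None
```

Objects for which the function returns `None` would be dropped from the modeling sample. Once enough accepted objects have been collected, drawing 10,000 of them gives the sample size named in step 1.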

Two Data Sets

Set 1 (30 attributes)
Contains only information that human eyes could use to distinguish morphology.
Examples: Petrosian flux, Petrosian radius, radii containing 50% and 90% of the Petrosian flux, adaptive shape measures, de Vaucouleurs fits, exponential fits

Set 2 (71 attributes)
Contains additional information likely correlated with morphology, but to which human eyes on Galaxy Zoo do not have access.
Examples: light polarization (Stokes parameters), de Vaucouleurs magnitude fits, dereddened magnitudes, redshift

For Set 1, all attribute categories are measured in the telescope’s three visible color filters. For Set 2, all categories except redshift are measured in all 5 filters.
Feature data pulled from Sloan Digital Sky Survey Data Release 6.

How many trees in an ideal random forest?
[Figure: accuracies for random forests with varying numbers of trees. Trained on 2179 instances, ~50/50 spiral/elliptical, 66% holdout.]
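One way to build intuition for why accuracy tends to level off as trees are added is an idealized majority-vote calculation. This is a textbook simplification, not the experiment from the talk. It assumes every tree is independently correct with the same probability `p`, which real random forest trees are not, and the function name and the value 0.7 are purely illustrative.

```python
from math import comb


def majority_vote_accuracy(n_trees: int, p: float) -> float:
    """Probability that more than half of n_trees independent voters are correct.

    Each voter is right with probability p. Use an odd n_trees to avoid ties.
    """
    need = n_trees // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_trees, k) * p**k * (1 - p) ** (n_trees - k)
        for k in range(need, n_trees + 1)
    )


# Illustrative sweep with a hypothetical per-tree accuracy of 0.7.
for n in (1, 5, 15, 51):
    print(n, round(majority_vote_accuracy(n, 0.7), 4))
```

Under these assumptions each additional tree helps less than the one before. Correlation between real trees typically makes the gain level off sooner, so in practice the number of trees is chosen by measuring holdout accuracy.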

Probing Learning Rate and Momentum in ANN
[Figure: ANN accuracies as a function of momentum and learning rate, annotated "To 3 Sigma". Trained on 2179 instances, ~50/50 spiral/elliptical, 66% holdout.]
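Probing learning rate and momentum amounts to a grid search over pairs of values. The sketch below shows the pattern only. The scoring function `final_loss` is a toy stand-in (gradient descent with classical momentum on f(w) = w²), not a neural network, and all names and grid values are hypothetical. In a real run the score would be the holdout accuracy of the trained network, maximized rather than minimized.

```python
from itertools import product


def final_loss(learning_rate, momentum, steps=50):
    """Toy stand-in for training: minimize f(w) = w**2 with momentum updates."""
    w, velocity = 5.0, 0.0
    for _ in range(steps):
        grad = 2.0 * w  # derivative of w**2
        # Classical momentum: keep a fraction of the previous step.
        velocity = momentum * velocity - learning_rate * grad
        w += velocity
    return w * w


def grid_search(learning_rates, momenta, score=final_loss):
    """Score every (learning rate, momentum) pair and return the best pair.

    Lower scores are treated as better, matching the toy loss above.
    """
    results = {(lr, m): score(lr, m) for lr, m in product(learning_rates, momenta)}
    best = min(results, key=results.get)
    return best, results


best, results = grid_search([0.01, 0.1, 0.5], [0.0, 0.5, 0.9])
print("best (learning rate, momentum):", best)
```

Keeping the full `results` dictionary, not just the best pair, is what allows the whole grid to be plotted as a surface.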

Quantifying Estimator Error
[Table: Number of Folds vs. Accuracy. Example taken from Random Forest, Data Set 2, 15 Trees.]
Average = 95.6
Standard Deviation =
All errors taken to 3 sigma.
Error = 95.6 ± 0.5
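The summary above can be reproduced from a list of per-run accuracies. This sketch assumes the quoted error is three times the standard deviation of those accuracies. The slide does not say whether the sample or the population standard deviation was used, so the sample version is an assumption here, and the input numbers are made up for illustration, not taken from the talk.

```python
from statistics import mean, stdev


def three_sigma_summary(accuracies):
    """Return (mean, 3 * sample standard deviation) of a list of accuracies."""
    if len(accuracies) < 2:
        raise ValueError("need at least two accuracies to estimate a spread")
    return mean(accuracies), 3 * stdev(accuracies)


# Made-up accuracies (percent), NOT the values from the talk.
accuracies = [95.0, 96.0, 95.5, 95.5]
center, error = three_sigma_summary(accuracies)
print(f"{center:.1f} ± {error:.1f}")
```

Reporting the spread at 3 sigma is deliberately conservative: two classifiers are called different only if their intervals stay apart at that width.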

Conclusions
- Naïve Bayes is not the way to go.
- Random Forests, ANN, and SVM all have small variances, high accuracies
- Spirals harder to identify (need more training instances, or has human bias taken over?)
- Including information beyond what the human eye can see is, remarkably, helpful.