Modeling the Human Classification of Galaxy Morphology Wednesday, December 5, 2007 Mike Specian.


1 Modeling the Human Classification of Galaxy Morphology
Wednesday, December 5, 2007
Mike Specian

2

3

4 Galaxy Zoo Statistics
Site announced on July 15, 2007
Over 50,000 volunteers within the first week
Most galaxies classified 10 times or more
More classifications = better data
Probably the world's most robust morphology database, with millions of objects classified

5 Data Preprocessing

6
1, 11 = Elliptical
2, 12 = Clockwise Spiral
3, 13 = Counterclockwise Spiral
4, 14 = Other (Edge-On Spiral)
5, 15 = Star / Don't-Know
6, 16 = Galaxy Merger
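The code table above assigns two codes to each class (for example 1 and 11). A minimal Python sketch of decoding them follows. It assumes that codes N and 10+N name the same class; the slide does not say what distinguishes the two codes, and the names `CLASS_NAMES` and `decode_class` are hypothetical.

```python
# Class names keyed by the units digit of the vote code, as listed on slide 6.
CLASS_NAMES = {
    1: "Elliptical",
    2: "Clockwise Spiral",
    3: "Counterclockwise Spiral",
    4: "Other (Edge-On Spiral)",
    5: "Star / Don't-Know",
    6: "Galaxy Merger",
}


def decode_class(code: int) -> str:
    """Map a vote code (1-6 or 11-16) to its class name."""
    if code not in range(1, 7) and code not in range(11, 17):
        raise ValueError(f"unknown vote code: {code}")
    # Assumption: codes N and 10+N refer to the same class.
    return CLASS_NAMES[code % 10]
```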

7 How People Voted
Type                 Number Classified
Elliptical           666,679
Spiral               94,429
Other (Edge-On)      112,148
Star / Don't Know    23,735
Galaxy Merger        11,846
There's almost too much data! Limiting the sample:
1. Model on 10,000 objects
2. Distinguish only between 'Elliptical' and 'Spiral'
3. Accept only objects for which one of these classes received >= 60% of the total vote
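The 60% acceptance rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the presenter's code: votes are taken to be class-name strings, clockwise and counterclockwise spiral votes are assumed to be merged into a single 'Spiral' label already (as in the vote table), and the function name `consensus_label` is hypothetical.

```python
from collections import Counter

ELLIPTICAL = "Elliptical"
SPIRAL = "Spiral"


def consensus_label(votes, min_percent=60):
    """Return 'Elliptical' or 'Spiral' if that class received at least
    min_percent of ALL votes cast for the object, otherwise None."""
    if not votes:
        return None
    counts = Counter(votes)
    total = len(votes)
    for label in (ELLIPTICAL, SPIRAL):
        # Integer comparison avoids floating-point edge cases at exactly 60%.
        if counts[label] * 100 >= min_percent * total:
            return label
    return None  # object is dropped from the modeling sample
```

An object with 6 elliptical votes out of 10 is kept as 'Elliptical'; a 5/5 split is rejected.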

8 Two Data Sets
Set 1: Contains only information that human eyes could use to distinguish morphology (30 attributes).
Examples: Petrosian flux, Petrosian radius, radii containing 50% and 90% of the Petrosian flux, adaptive shape measures, de Vaucouleurs fits, exponential fits.
Set 2: Contains additional information likely correlated with morphology, but to which human eyes on Galaxy Zoo do not have access (71 attributes).
Examples: light polarization (Stokes parameters), de Vaucouleurs magnitude fits, dereddened magnitudes, redshift.
For Set 1, all categories are measured in the telescope's three visible color filters. For Set 2, all categories except redshift are measured in all five filters.
Feature data pulled from Sloan Digital Sky Survey Data Release 6.

9 How many trees in an ideal random forest?
Accuracies above trained on 2179 instances, ~50/50 spiral/elliptical, 66% holdout
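A holdout evaluation of this kind can be sketched in Python as below. The slide does not say which portion the 66% refers to; this sketch assumes 66% of the 2179 instances are used for training and the rest for testing. The function name `holdout_split` and the fixed seed are hypothetical.

```python
import random


def holdout_split(instances, train_fraction=0.66, seed=0):
    """Shuffle the instances and split them into (train, test) lists."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Stand-in for the 2179 labeled galaxies: just their indices.
train, test = holdout_split(range(2179))
```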

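10 Probing Learning Rate and Momentum in ANN
Accuracy by learning rate (rows) and momentum (columns: .10, .15, .20, .25, .30)
Learning rate .20: 83.00, 82.86 (three values missing from the transcript)
Learning rate .25: 82.86, 82.19, 82.59, 82.46 (one value missing from the transcript)
Learning rate .30: 82.59, 82.73, 82.59, 81.51, 83.27
Learning rate .35: 82.46, 82.86, 83.00, 82.59, 82.05
Learning rate .40: 83.40, 82.59, 81.92, 81.78, 81.65
Accuracies above trained on 2179 instances, ~50/50 spiral/elliptical, 66% holdout
To 3 Sigma ->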

11

12 Quantifying Estimator Error
Number of Folds   Accuracy
2                 95.20
4                 95.54
6                 95.59
8                 95.46
10                95.46
12                95.70
14                95.75
16                95.61
18                95.54
20                95.68
Example taken from Random Forest, Data Set 2, 15 Trees
Average = 95.6
Standard Deviation = 0.158
All errors taken to 3 sigma.
Error = 95.6 ± 0.5
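The error estimate on this slide can be reproduced with a short Python sketch. The quoted standard deviation of 0.158 matches the sample (n-1) standard deviation of the ten accuracies; the population formula would give about 0.149. Variable names are illustrative.

```python
import statistics

# Cross-validation accuracies for 2, 4, ..., 20 folds
# (Random Forest, Data Set 2, 15 trees).
accuracies = [95.20, 95.54, 95.59, 95.46, 95.46,
              95.70, 95.75, 95.61, 95.54, 95.68]

mean = statistics.mean(accuracies)
sigma = statistics.stdev(accuracies)  # sample standard deviation (n - 1)
error = 3 * sigma                     # errors are quoted to 3 sigma

print(f"Average = {mean:.1f}")             # Average = 95.6
print(f"Standard Deviation = {sigma:.3f}")  # Standard Deviation = 0.158
print(f"Error = {mean:.1f} ± {error:.1f}")  # Error = 95.6 ± 0.5
```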

13

14 Conclusions
Naïve Bayes is not the way to go.
Random Forests, ANN, and SVM all have small variances and high accuracies.
Spirals are harder to identify (need more training instances, or has human bias taken over?).
Including information beyond what the human eye can see is, remarkably, helpful.

