Crystallization Image Analysis on the World Community Grid

Crystallization Image Analysis on the World Community Grid
Christian A. Cumbaa and Igor Jurisica
Jurisica Lab, Division of Signaling Biology, Ontario Cancer Institute, Toronto, Ontario

Why automate classification of protein crystallization trial images?
- Outcome categories shown: clear, phase separation, precipitate, skin, crystal, garbage (X), unsure
- Hauptman-Woodward has 65,000,000 images. They want 65,000,000 outcomes.

Why automate classification of protein crystallization trial images?
- Assist or replace human screening
- Speed the search phase in protein crystallization
- Improve throughput, consistency, objectivity
- Enable data mining and statistical optimization of the crystallization process
(Example images: clear, precipitate, crystal)

Image classification
- Image (100000s of numbers) -> feature extraction -> feature 1, feature 2, …, feature k (10s of numbers) -> classification -> 7 numbers, one per outcome
- Outcomes: clear, phase separation, precipitate, skin, crystal, garbage (X), unsure

Truth data
96-study:
- 96 proteins X 1536 images
- Hand-scored by 3 experts
- Presence/absence of 7 independent outcomes
NESG & SGPP:
- 15,000 images
- Hand-scored by 1 expert, same scoring system
(Chart labels: 50% unanimously-scored images; 10 most interesting compound categories; legend: 96-study, NESG (crystals), SGPP (crystals))

Feature set
12,375 features computed per image:
- A few basic statistics
- 50 microcrystal features
- Euler number features, two variations:
  - 11 blur levels
  - 11 blur levels X 4 thresholds
- Image "energy": 11 blur levels
- 2,925 Grey-Level Co-occurrence Matrix (GLCM) features:
  - 3 different grey-level quantizations
  - 13 basic functions
  - 25 sample distances
  - ~100 directions
  - Computable from every point in the image
  - Distilled to max range, max mean, min mean
- ~9,500 image-blob features: Radon & edge-detection
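The GLCM total is consistent with the listed factors under one reading: every combination of quantization, basic function and sample distance is computed over the ~100 directions and then collapsed to the three distilled statistics. That reading is an assumption, not something the slide states; the short check below only confirms the arithmetic.

```python
# Hypothetical reading of the GLCM bullets: one feature per combination of
# quantization, basic function, sample distance and distilled statistic.
quantizations = 3
basic_functions = 13
sample_distances = 25
distilled_stats = ("max range", "max mean", "min mean")

glcm_features = (quantizations * basic_functions
                 * sample_distances * len(distilled_stats))
print(glcm_features)  # 2925
```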

Our image analysis problem
- Computing all 12,375 features takes >5 hours for a single image
- We have 165,000 images in our training set
- Features must be evaluated for quality
- The best features (10s or low 100s) must be computed for the remaining 65,000,000 images
- Massive computing resources required!
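A rough sense of scale follows from the figures above. The sketch below treats 5 hours per image as a lower bound (the slide says ">5 hours") and uses a 365-day year. The second figure is hypothetical: it shows what the full feature set would cost on the whole archive, which is the case the plan avoids by keeping only the best features.

```python
HOURS_PER_IMAGE = 5          # slide says ">5 hours", so this is a lower bound
TRAINING_IMAGES = 165_000
ARCHIVE_IMAGES = 65_000_000
HOURS_PER_YEAR = 24 * 365

def cpu_years(images, hours_per_image=HOURS_PER_IMAGE):
    """Convert an image count into CPU-years at a fixed cost per image."""
    return images * hours_per_image / HOURS_PER_YEAR

training_cost = cpu_years(TRAINING_IMAGES)
archive_cost = cpu_years(ARCHIVE_IMAGES)   # hypothetical: all features, every image

print(f"training set: at least {training_cost:.0f} CPU-years")
print(f"full archive, all features: at least {archive_cost:.0f} CPU-years")
```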

Image analysis on the World Community Grid
http://www.worldcommunitygrid.org
- A global, distributed-computing platform for solving large scientific computing problems with human impact.
- 377,627 volunteers contribute idle CPU time of 960,346 devices.
- Our project: Help Conquer Cancer* (HCC), launched November 2007.
HCC has two goals:
1. To survey a wide tract of image-feature space and identify image analysis algorithms and parameters (features) that best determine crystallization outcome.
2. To perform the necessary image analysis on Hauptman-Woodward's archive of 65,000,000 crystallization trial images.
* Fundraising slogan of the Ontario Cancer Institute and its parent organization.

Image analysis on the World Community Grid
HCC has two phases:
- Phase I: calculate 12,375 features per image on high-priority images, including 165,441 hand-scored images. November 2007-May 2008; analysis on hand-scored images completed January 2008.
- Phase II: calculate the best features from Phase I on the backlog of HWI images.
Grid members have contributed 8,919 CPU-years so far to HCC, an average of 55 CPU-years per day.

Phase I: feature assessment

Measuring feature quality
- Treat as random variables: image class, feature value
- Measure the mutual information between them (unit: bits):
  mutual information = entropy(class) + entropy(feature) - entropy(class, feature)
(Diagram labels: feature entropy, class entropy)
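The formula above can be sketched directly from empirical counts. This is a minimal illustration, assuming the feature values have already been discretized into bins (the slides do not say how binning was done); the function names and toy data are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def mutual_information(classes, feature_bins):
    """entropy(class) + entropy(feature) - entropy(class, feature), in bits."""
    joint = list(zip(classes, feature_bins))
    return entropy(classes) + entropy(feature_bins) - entropy(joint)

# Toy data (hypothetical): a binned feature value against an image class.
classes = ["clear", "clear", "crystal", "crystal"]
informative = [0, 0, 1, 1]      # the bin determines the class
uninformative = [0, 1, 0, 1]    # the bin is independent of the class

print(mutual_information(classes, informative))    # 1.0
print(mutual_information(classes, uninformative))  # 0.0
```

A feature that fully determines a two-way balanced class carries 1 bit; a feature independent of the class carries 0 bits.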

Measuring feature quality
(Figure panels: clear; precipitate (no crystal); other)

Information density: microcrystal counts
(Figure over the feature's parameter space; panels: Clear, Precipitate, Crystal)

Information density: GLCM maximum range
(Figure over the feature's parameter space; panels: Clear, Precipitate, Crystal)

Information density: Radon-Sobel soft sum
(Figure over the feature's parameter space; panels: Clear, Precipitate, Crystal)

Information density: Radon-Sobel blob metrics (means)
(Figure over the feature's parameter space; panels: Clear, Precipitate, Crystal)

Towards Phase II: image classification

Building classifiers
- Handpicked 74 features from peaks in the clear, precipitate and other mutual information plots
- Two classification schemes:
  - Three-way: clear, non-crystal precipitate, other
  - Ten-way: clear, phase separation, phase + precipitate, skin, phase + crystal, precip, precip + skin, precip + crystal, crystal, garbage
- Naïve Bayes model
- Leave-one-out cross-validation
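The combination of a naïve Bayes model with leave-one-out cross-validation can be sketched as follows. The slides do not say which naïve Bayes variant was used, so this example assumes binned (categorical) features with add-one smoothing; all names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from math import log

def train(rows, labels):
    """Count class frequencies and per-feature value frequencies per class."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, class) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts

def predict(model, row, n_bins):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_counts, value_counts = model
    total = sum(class_counts.values())

    def score(label):
        s = log(class_counts[label] / total)
        for i, value in enumerate(row):
            s += log((value_counts[(i, label)][value] + 1)
                     / (class_counts[label] + n_bins))
        return s

    return max(class_counts, key=score)

def leave_one_out_accuracy(rows, labels, n_bins):
    """Hold out each example once, train on the rest, score the held-out one."""
    hits = 0
    for k in range(len(rows)):
        model = train(rows[:k] + rows[k + 1:], labels[:k] + labels[k + 1:])
        hits += predict(model, rows[k], n_bins) == labels[k]
    return hits / len(rows)

# Toy data (hypothetical): two binary features; the first tracks the class.
rows = [(0, 0), (0, 1), (0, 0), (1, 1), (1, 0), (1, 1)]
labels = ["clear", "clear", "clear", "other", "other", "other"]
print(leave_one_out_accuracy(rows, labels, n_bins=2))
```

Retraining from scratch for every held-out example keeps the sketch short. On a training set of this project's size, one would more likely subtract the held-out example's counts from a single fitted model.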

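Measuring classifier accuracy: precision and recall
(Diagram: the set of true crystals overlaps the set the classifier labels "I think these are crystals". The overlap holds the true positives, crystals outside it are false negatives, and labelled non-crystals are false positives.)
- Recall = true positives / (true positives + false negatives)
- Precision = true positives / (true positives + false positives)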

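Three-class distribution
- Clear: 24.3%
- Precipitate AND NOT crystal: 52.7%
- Other: 23.0%

Confusion matrix (rows: true class; columns: machine says)
The row labels are recovered by matching row totals to the class distribution above. The column order is assumed to match the row order, which puts the largest counts on the diagonal.

                          other   non-crystal precipitate   clear
other                     17095    5258                      5109
non-crystal precipitate   15928   45112                      1819
clear                       617     817                     27615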
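Per-class recall and precision follow directly from a confusion matrix. The sketch below uses the nine counts from this slide, under two assumptions: rows are true classes and columns are the machine's calls, and the class order is other, non-crystal precipitate, clear. That order is inferred from the row totals (23.0%, 52.7%, 24.3%), not stated on the slide.

```python
# Rows: true class; columns: machine's call. The class order is an assumption
# inferred from row totals matching the stated class distribution.
CLASSES = ["other", "non-crystal precipitate", "clear"]
CONFUSION = [
    [17095,  5258,  5109],
    [15928, 45112,  1819],
    [  617,   817, 27615],
]

def recall(matrix, k):
    """Fraction of true class-k images that the machine called class k."""
    return matrix[k][k] / sum(matrix[k])

def precision(matrix, k):
    """Fraction of the machine's class-k calls that are truly class k."""
    return matrix[k][k] / sum(row[k] for row in matrix)

def accuracy(matrix):
    """Fraction of all images on the diagonal."""
    return sum(matrix[k][k] for k in range(len(matrix))) / sum(map(sum, matrix))

for k, name in enumerate(CLASSES):
    print(f"{name}: recall={recall(CONFUSION, k):.3f} "
          f"precision={precision(CONFUSION, k):.3f}")
print(f"overall accuracy={accuracy(CONFUSION):.3f}")
```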

Recall & precision

10-class distribution
- Clear: 33.83%
- Phase separation: 7.00%
- Phase separation + precipitate: 0.50%
- Skin: 0.79%
- Phase separation + crystal: 2.32%
- Precipitate: 34.25%
- Precipitate + skin: 4.95%
- Precipitate + crystal: 7.53%
- Crystal: 8.34%
- Garbage: 0.55%

Confusion matrix (axes: machine says, true class)
Classes: clear, phase separation, phase and precipitate, skin, phase and crystal, precipitate, precipitate and skin, precipitate and crystal, crystal, garbage
Cell counts in extraction order (86 of the 100 cells survive, so grid positions are not recoverable):
313 20 2 52 1 49 4 28 129 3129 1072 90 219 649 586 56 345 888 8 914 2852 611 1063 562 111 85 222 35 29 305 395 2008 692 328 243 33 205 12 385 512 4088 3440 16907 553 617 494 1972 441 10 551 292 88 75 511 37 268 74 105 5 13 6 372 126 3 31 107 81 97 51 32 24 91 503 139 298 668 281 40 2433 1446 1193 92 815 1135 227 25585

Recall & precision

Acknowledgements
- Hauptman-Woodward Medical Research Institute: George DeTitta, Joe Luft, Eddie Snell, Mike Malkowski, Angela Lauricella, Max Thayer, Raymond Nagel, Steve Potter, and the 96-study reviewers
- World Community Grid: Bill Bovermann, Viktors Berstis, Jonathan D. Armstrong, Tedi Hahn, Kevin Reed, Keith J. Uplinger, Nels Wadycki
- IBM Deep Computing: Jerry Heyman
- Jurisica Lab: Richard Lu
All crystallization images were generated at the High-Throughput Screening lab at The Hauptman-Woodward Institute.
Funding from: NIH U54 GM074899, Genome Canada, IBM, NSERC, and (for earlier work) NIH P50 GM62413, CITO