Automated Classification of Crystallization Images
Christian A. Cumbaa and Igor Jurisica
Division of Signaling Biology, Ontario Cancer Institute, Toronto, Ontario

Acknowledgements: All images in these studies were generated at the High-Throughput Screening lab at the Hauptman-Woodward Institute (HWI). Multi-outcome truth data was painstakingly generated by eight heroes at HWI, and carefully organized, cleaned, and curated by Max Thayer and Raymond Nagel at HWI. This work was funded by the following grants and organizations: NIH U54 GM, Genome Canada, IBM, and NSERC RGPIN. Earlier work was supported by NIH P50 GM, NSERC, and CITO.

Goal: We aim to automatically classify all images generated by the HWI robotic imaging system, eliminating the need for a crystallographer to search among hundreds of images for crystal hits or other conditions of interest.

Data source:
- Truth data for images from Hauptman-Woodward's High-Throughput Screening (HTS) Laboratory.
- Each image was evaluated by 3 or more experts.
- Each image was scored for the presence/absence of 7 independent crystallization conditions: clear, phase separation, precipitate, skin, crystal, garbage, unsure.
- Experiment 2 was supplemented with 6456 crystal images (NESG-sourced proteins) and additional crystal images (SGPP-sourced proteins).

Image analysis: Each image was processed by our image-processing algorithms to extract 840 numeric measures of image texture. These features measure the presence of straight edges, grey-tone statistics, etc., each computed at multiple scale and contrast levels.

Feature selection: For each target category of images, we select the subset of the 840 features that most effectively distinguishes positive from negative examples of that category. Each image is thereby reduced to a short vector of numeric values.

Image classification: To train a classifier, we construct statistical models of the probability distribution of feature-vector values, one for each category. For these experiments, we use multivariate Gaussians to estimate probability density. New images are classified by comparing their feature vectors against each category's probability distribution. The result, for each image, is itself a probability distribution across all categories; the classifier outputs the category with the highest probability. To avoid bias in our models, each data point is used in turn for training and testing in a 10-fold cross-validation process.

Measuring performance: Two performance measures are used: precision and recall. Precision measures the fraction of images classified as category C that actually belong to C. Recall measures the fraction of images belonging to C that were classified as C.
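As a concrete illustration of the classification scheme just described, the following is a minimal sketch, not the authors' implementation: the function names, array shapes, and the use of scipy are assumptions, and features are taken to be already extracted and selected. In practice, with hundreds of features, one would work with log densities to avoid numerical underflow.

```python
import numpy as np
from scipy.stats import multivariate_normal

def train_gaussians(X, y):
    """Fit one multivariate Gaussian (plus a class prior) per category."""
    models = {}
    for c in np.unique(y):
        Xc = X[y == c]                            # feature vectors of class c
        models[c] = (Xc.mean(axis=0),             # per-feature means
                     np.cov(Xc, rowvar=False),    # feature covariance matrix
                     len(Xc) / len(X))            # class prior P(c)
    return models

def classify(x, models):
    """Return a probability distribution over categories for one feature vector."""
    scores = {c: prior * multivariate_normal.pdf(x, mean=mu, cov=cov,
                                                 allow_singular=True)
              for c, (mu, cov, prior) in models.items()}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}   # normalize to sum to 1
```

The predicted label is the argmax of the returned distribution; wrapping these two functions in a 10-fold split (for example, with sklearn.model_selection.StratifiedKFold) would reproduce the cross-validation protocol described above.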
Experiment 1: Independent crystallization conditions. Six classifiers were trained to detect clear, phase separation, precipitate, skin, crystal, and garbage. Training/test images were limited to unanimously-scored images (per category). Table 1 summarizes the performance of each.

Table 1: Confusion matrices summarizing the match between actual crystallization outcomes and the labels assigned by the classification system (Experiment 1). Numbers indicate counts of actual images.

Experiment 2: Compound crystallization conditions. One 10-way classifier was trained to distinguish between 10 compound categories: crystal only, crystal + phase separation, crystal + precipitate, precipitate only, precipitate + skin, precipitate + phase separation, phase separation + skin, phase separation only, clear drop, and garbage. Training/test images were limited to unanimously-scored images belonging to one of the 10 categories. Table 2 summarizes the performance of the classifier. Figure 1 illustrates the distribution of true positives and false negatives, Figure 2 the distribution of true positives and false positives, and Figure 3 gives example images of each.

Table 2: Confusion matrix summarizing the match between actual crystallization outcomes and the labels assigned by the image analysis system (Experiment 2). Numbers indicate counts of actual images.

Figure 1: Distributions of classification labels (columns) applied by the image analysis system to observed crystallization outcomes (rows). Elements on the diagonal indicate correct classifications. Numbers indicate recall scores for each outcome.

Figure 2: Distributions of observed crystallization outcomes (rows) grouped by labels (columns) applied by the image analysis system. Elements on the diagonal indicate correct classifications. Numbers indicate precision scores for each class.

Figure 3: Example classifications and misclassifications for each category (Experiment 2): the highest-scoring true positives, lowest-scoring false negatives, and highest-scoring false positives for the Crystal, Crystal + Phase Sep., Crystal + Precip., Precip. + Skin, Precip. + Phase Sep. + Skin, Phase Sep., Clear, and Garbage categories.

Discussion: Experiments 1 and 2 reveal degrees of difficulty in recognizing crystallization outcomes. Most singleton categories in Experiment 2 performed well; clear drops are classified most accurately. Many compound categories demonstrate the classifier's confusion between certain mixtures of outcomes: all crystal-bearing categories are confused with one another to a degree, and precipitates as a whole are easily detected, but compound precipitates are difficult to subdivide.

New directions: A new image analysis system is under development, with a revised and expanded feature set: textural features of local regions of the image, and more precise texture, straight-edge, and discrete-object metrics. The new system will run on the World Community Grid, using 150 CPU-years of compute time per day to compute features for 60 million images. Project launch: Spring/Summer.
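The per-class recall and precision reported in Figures 1 and 2 can be derived directly from confusion matrices like those in Tables 1 and 2. The sketch below is a minimal illustration (the function name is hypothetical); it follows the captions' convention that rows are actual outcomes and columns are assigned labels.

```python
import numpy as np

def precision_recall(conf):
    """conf[i, j]: number of images of actual class i assigned label j."""
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)                  # diagonal: correctly labelled counts
    precision = tp / conf.sum(axis=0)   # column sums: all images given label j
    recall = tp / conf.sum(axis=1)      # row sums: all images of actual class i
    return precision, recall
```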