Computational Biology, Part 28: Automated Interpretation of Subcellular Patterns in Microscope Images III. Robert F. Murphy. Copyright © 1996, 1999, 2000-2006.



Results

Supervised learning of patterns
1. Create sets of images showing the location of many different proteins (each set defines one class of pattern)
2. Reduce each image to a set of numerical values (“features”) that are insensitive to position and rotation of the cell
3. Use statistical classification methods to “learn” how to distinguish each class using the features

2D Images of 10 Patterns (HeLa)
[Image panels: ER, Tubulin, DNA, TfR, Actin, Nucleolin, Mito, LAMP, gpp130, giantin]
Boland & Murphy

Evaluating Classifiers
- Divide ~100 images for each class into training set and test set
- Use the training set to determine rules for the classes
- Use the test set to evaluate performance
- Repeat with different division into training and test
- Evaluate different sets of features chosen as most discriminative by feature selection methods
- Evaluate different classifiers
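The evaluation loop above can be sketched as follows. This is a minimal illustration, not the classifiers or features used in the work cited on these slides: the nearest-centroid classifier, the function names, and the toy two-class data are all assumptions chosen to keep the example short.

```python
import numpy as np

def nearest_centroid_predict(train_X, train_y, test_X):
    """Assign each test sample to the class whose training mean is closest."""
    classes = np.unique(train_y)
    centroids = np.stack([train_X[train_y == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(test_X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

def repeated_holdout(X, y, n_repeats=10, test_fraction=0.2, seed=0):
    """Summed confusion matrix (rows: true class, columns: predicted class)
    over several random divisions into training and test sets."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    confusion = np.zeros((len(classes), len(classes)))
    n_test = int(len(y) * test_fraction)
    for _ in range(n_repeats):
        order = rng.permutation(len(y))
        test_idx, train_idx = order[:n_test], order[n_test:]
        pred = nearest_centroid_predict(X[train_idx], y[train_idx], X[test_idx])
        for true_label, pred_label in zip(y[test_idx], pred):
            row = np.searchsorted(classes, true_label)
            col = np.searchsorted(classes, pred_label)
            confusion[row, col] += 1
    return confusion

# Toy stand-in for image features: two well-separated classes, 100 samples each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 5)), rng.normal(6, 1, (100, 5))])
y = np.array([0] * 100 + [1] * 100)

confusion = repeated_holdout(X, y)
overall_accuracy = np.trace(confusion) / confusion.sum()
```

The overall accuracy is the fraction of test images on the diagonal of the confusion matrix, which is how a single summary figure can be read off a table of true class against classifier output.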

2D Classification Results
Overall accuracy = 92%
[Confusion matrix: true class (rows) vs. output of the classifier (columns) for DNA, ER, Gia, Gpp, Lam, Mit, Nuc, Act, TfR, Tub]
Murphy et al 2000; Boland & Murphy 2001; Huang & Murphy 2004

Human Classification Results
Overall accuracy = 83%
[Confusion matrix: true class (rows) vs. output of the classifier (columns) for DNA, ER, Gia, Gpp, Lam, Mit, Nuc, Act, TfR, Tub]
Murphy et al 2003

Computer vs. Human

3D HeLa cell images
[Image panels: Giantin, Nuclear, ER, Lysosomal, gpp130, Actin, Mitoch., Nucleolar, Tubulin, Endosomal]
Images collected using facilities at the Center for Biologic Imaging courtesy of Simon Watkins
Velliste & Murphy 2002

3D Classification Results
Overall accuracy = 98%
[Confusion matrix: true class (rows) vs. output of the classifier (columns) for DNA, ER, Gia, Gpp, Lam, Mit, Nuc, Act, TfR, Tub]
Velliste & Murphy 2002; Chen & Murphy 2004

Unsupervised Learning to Identify High-Resolution Protein Patterns

Location Proteomics
- Tag many proteins
  - We have used CD-tagging (developed by Jonathan Jarvik and Peter Berget): infect a population of cells with a retrovirus carrying a DNA sequence that will “tag” a random gene in each cell
- Isolate separate clones, each of which expresses one tagged protein
- Use RT-PCR to identify the tagged gene in each clone
- Collect many live cell images for each clone using spinning disk confocal fluorescence microscopy
Jarvik et al 2002

What Now? Group ~90 tagged clones by pattern

Solution: Group them automatically
- How?
- SLF features can be used to measure similarity of protein patterns
- This allows us for the first time to create a systematic, objective framework for describing subcellular locations: a Subcellular Location Tree
- Start by grouping two proteins whose patterns are most similar, keep adding branches for less and less similar patterns
Chen et al 2003; Chen and Murphy 2005
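The tree-building step described above (group the two most similar patterns first, then keep adding branches) is agglomerative clustering. The sketch below is one plain way to do it and rests on assumptions: Euclidean distance between feature vectors, average linkage between groups, and made-up protein names and feature values. The slides do not say which distance or linkage was actually used.

```python
import numpy as np

def build_tree(names, features):
    """Agglomerative clustering with average linkage.
    Returns the merges in the order they were made: (left, right, distance)."""
    features = np.asarray(features, float)
    members = {i: [i] for i in range(len(names))}      # cluster id -> row indices
    labels = {i: names[i] for i in range(len(names))}  # cluster id -> nested label
    merges = []
    next_id = len(names)
    while len(members) > 1:
        best = None
        ids = sorted(members)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # Average pairwise Euclidean distance between the two groups.
                diffs = features[members[a]][:, None, :] - features[members[b]][None, :, :]
                dist = float(np.linalg.norm(diffs, axis=2).mean())
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        merges.append((labels[a], labels[b], dist))
        members[next_id] = members.pop(a) + members.pop(b)
        labels[next_id] = (labels.pop(a), labels.pop(b))
        next_id += 1
    return merges

# Hypothetical proteins with two made-up features each.
names = ["A", "B", "C", "D"]
features = [[0, 0], [0, 1], [10, 10], [10, 12]]
merges = build_tree(names, features)
```

The nested tuples in the final merge give the tree: the most similar pair is joined first, and less similar groups are attached higher up.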

[Figure labels: Protein name; Human description; From databases]

Nucleolar Proteins

Punctate Nuclear Proteins

Predominantly Nuclear Proteins with Some Punctate Cytoplasmic Staining

Nuclear and Cytoplasmic Proteins with Some Punctate Staining

Uniform

Top: Automated Grouping and Assignment
Bottom: Visual Assignment to “known” locations
[Figure label: Protein name]

Refining clusters using temporal textures

Incorporating Temporal Information
- Time series images could be useful for
  - Distinguishing proteins that are not distinguishable in static images
  - Analyzing protein movement in the presence of drugs, or during different stages of the cell cycle
- Need approach that does not require detailed understanding of the objects/organelles in which each protein is located
  - Generic Object Tracking approach?
- Not all proteins in discernible objects
  - Non-tracking approach needed

Texture Features
- Haralick texture features describe the correlation in intensity of pixels that are next to each other in space.
  - These have been valuable for classifying static patterns.
- Temporal texture features describe the correlation in intensity of pixels in the same position in images next to each other over time.

Temporal Textures based on Co-occurrence Matrix
- Temporal co-occurrence matrix P:
  - N_level by N_level matrix (N_level = number of gray levels); element P[i, j] is the probability that a pixel with value i has value j in the next image (time point)
- Thirteen statistics calculated on P are used as features
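A minimal sketch of the temporal co-occurrence matrix, assuming both frames are already quantized to integer gray levels 0..N_level-1. P is normalized here as a joint probability over all pixels, as in Haralick-style co-occurrence matrices; a row-normalized (conditional) reading of the definition is also possible. Only three of the statistics are shown (energy, contrast, entropy), and the function names are illustrative.

```python
import numpy as np

def temporal_cooccurrence(frame_t0, frame_t1, n_levels):
    """P[i, j] = fraction of pixels with gray level i at t0 and level j at t1."""
    a = np.asarray(frame_t0).astype(int).ravel()
    b = np.asarray(frame_t1).astype(int).ravel()
    counts = np.zeros((n_levels, n_levels))
    np.add.at(counts, (a, b), 1)   # accumulate one count per pixel
    return counts / counts.sum()

def energy(P):
    return float((P ** 2).sum())

def contrast(P):
    i, j = np.indices(P.shape)
    return float(((i - j) ** 2 * P).sum())

def entropy(P):
    nonzero = P[P > 0]
    return float(-(nonzero * np.log(nonzero)).sum())

frame_t0 = np.array([[0, 1], [2, 3]])
unchanged = frame_t0.copy()             # image that does not change
changed = np.array([[1, 0], [3, 2]])    # every pixel moves by one gray level

P_same = temporal_cooccurrence(frame_t0, unchanged, n_levels=4)
P_changed = temporal_cooccurrence(frame_t0, changed, n_levels=4)
```

For an image that does not change, all of the mass sits on the diagonal of P and the contrast is zero; any change in intensity moves mass off the diagonal.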

Temporal co-occurrence matrix (for image that does not change)
[Panels: Image at t0; Image at t1]

Temporal co-occurrence matrix (for image that changes)
[Panels: Image at t0; Image at t1]

Implementation of Temporal Texture Features
- Compare image pairs separated by different time intervals; compute 13 temporal texture features for each pair
- Use the average and variance of the features for each kind of time interval; this yields 13*5*2 = 130 features
T = 0s, 45s, 90s, 135s, 180s, 225s, 270s, 315s, 360s, 405s, …
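The bookkeeping behind the 130 features can be sketched as below. Two things are assumptions: the five time intervals are taken to be lags of 1 to 5 frames, and `pair_features` is a placeholder that returns 13 numbers per image pair. It is not the 13 co-occurrence statistics and is only there so that the aggregation can run end to end.

```python
import numpy as np

def pair_features(frame_a, frame_b):
    """Placeholder for the 13 temporal texture statistics (NOT the real formulas):
    13 evenly spaced quantiles of the absolute intensity difference."""
    diff = np.abs(np.asarray(frame_a, float) - np.asarray(frame_b, float))
    return np.quantile(diff, np.linspace(0.0, 1.0, 13))

def temporal_feature_vector(frames, lags=(1, 2, 3, 4, 5), feature_fn=pair_features):
    """For each lag, compute features for every image pair that far apart,
    then keep the mean and variance of each feature across pairs."""
    parts = []
    for lag in lags:
        per_pair = np.array([feature_fn(frames[i], frames[i + lag])
                             for i in range(len(frames) - lag)])
        parts.append(per_pair.mean(axis=0))
        parts.append(per_pair.var(axis=0))
    return np.concatenate(parts)

rng = np.random.default_rng(0)
movie = rng.integers(0, 8, size=(10, 16, 16))  # 10 frames of a toy 16x16 movie
vector = temporal_feature_vector(movie)        # 13 features * 5 lags * 2 = 130
```

Swapping `feature_fn` for a function that computes statistics on the temporal co-occurrence matrix would give a vector of the same length.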

Test: Evaluate ability of temporal textures to improve discrimination of similar protein patterns

Results for temporal texture and static features
[Confusion matrix for Dia1, Sdpr, Atp5a1, Adfp, Timm23]
Average accuracy: 85.1%

Conclusion
- Addition of temporal texture features improves classification accuracy of protein locations

Generative models of subcellular patterns

Decomposing mixture patterns
- Clustering or classifying whole cell patterns will consider each combination of two or more “basic” patterns as a unique new pattern
- Desirable to have a way to decompose mixtures instead
- One approach would be to assume that each basic pattern has a recognizable combination of different types of objects

Object-based subcellular pattern models
- Goals:
  - to be able to recognize “pure” patterns using only objects
  - to be able to recognize and unmix patterns consisting of two or more “pure” patterns
  - to enable building of generative models that can synthesize patterns from objects: needed for systems biology

Object type determination
- Rather than specifying object types, we chose to learn them from the data
- Use subset of SLFs to describe objects
- Perform k-means clustering for k from 2 to 40
- Evaluate goodness of clustering using Akaike Information Criterion
- Choose k that gives lowest AIC
Zhao et al 2005
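The selection loop above can be sketched with a small k-means written in NumPy. The form of the criterion is an assumption: AIC is taken here as the residual sum of squares plus 2*k*d (d = number of features), one common simplification for k-means, and the slides do not give the exact expression used. The farthest-point initialization and the toy object features are likewise illustrative choices.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    X = np.asarray(X, float)
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        # Keep the old center if a cluster ends up empty.
        updated = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
        if np.allclose(updated, centers):
            break
        centers = updated
    labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    rss = float(((X - centers[labels]) ** 2).sum())
    return labels, centers, rss

def choose_k(X, k_values):
    """Return the k with the lowest AIC (assumed form: RSS + 2*k*d)."""
    d = np.asarray(X).shape[1]
    aic_by_k = {k: kmeans(X, k)[2] + 2 * k * d for k in k_values}
    best_k = min(aic_by_k, key=aic_by_k.get)
    return best_k, aic_by_k

# Toy object features: three well-separated groups of 50 objects each.
rng = np.random.default_rng(0)
objects = np.vstack([rng.normal(c, 1.0, (50, 2)) for c in [(0, 0), (20, 0), (0, 20)]])
best_k, aic_by_k = choose_k(objects, range(2, 7))
```

On real object features the same loop would run over k from 2 to 40; the toy range is kept short so the example runs quickly.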

Unmixing: Learning strategy
- Once object types are known, each cell in the training (pure) set can be represented as a vector of the amount of fluorescence for each object type
- Learn probability model for these vectors for each class
- Mixed images can then be represented using mixture fractions times the probability distribution of objects for each class

[Image panels: Pure Golgi Pattern; Pure Lysosomal Pattern; 50% mix of each]

Two-stage Strategy for unmixing unknown image
- Find objects in unknown (test) image, classify each object into one of the object types using learned object type classifier built with all objects from training images
- For each test image, make list of how often each object type is found
- Find the fractions of each class that give “best” match to this list
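The last step, finding the fractions that best match the observed object-type list, can be sketched as a constrained least-squares problem. This is an assumption about what “best” means: the slides do not state the criterion. Here the fractions are required to be non-negative and to sum to 1, and are found by projected gradient descent. The class profiles are made-up numbers.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def estimate_fractions(class_profiles, observed, n_iter=2000):
    """Minimize ||fractions @ class_profiles - observed||^2 over the simplex."""
    A = np.asarray(class_profiles, float).T       # (n_object_types, n_classes)
    b = np.asarray(observed, float)
    step = 1.0 / np.linalg.norm(A, 2) ** 2        # 1 / Lipschitz constant of the gradient
    x = np.full(A.shape[1], 1.0 / A.shape[1])     # start from equal fractions
    for _ in range(n_iter):
        x = project_to_simplex(x - step * A.T @ (A @ x - b))
    return x

# Hypothetical object-type frequencies for three pure classes (rows sum to 1).
class_profiles = np.array([[0.7, 0.2, 0.1, 0.0],
                           [0.1, 0.6, 0.2, 0.1],
                           [0.0, 0.1, 0.3, 0.6]])
true_fractions = np.array([0.5, 0.3, 0.2])
observed = true_fractions @ class_profiles        # noiseless mixed profile
estimated = estimate_fractions(class_profiles, observed)
```

With noiseless input the estimate recovers the true fractions; with real object counts the match would only be approximate.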

Test of unmixing
- Use 2D HeLa data
- Generate random mixture fractions for eight major patterns (summing to 1)
- Use these to synthesize “images” corresponding to these mixtures
- Try to estimate mixture fractions from the synthesized images
- Compare to true mixture fractions

Results
- Given 5 synthesized “cell images” with any mixture of 8 basic patterns
- Average accuracy of estimating the mixture coefficients is 83%
Zhao et al 2005

Overview
[Flowchart labels: Real images; Object Detection; objects; Object type assigning; Object types; Object type modeling; Statistical models]

Generating images
[Flowchart labels: Real images; Object Detection; objects; Object type assigning; Object types; Object type modeling; Statistical models; Sampling; Generated images]

Generating objects and images
[Flowchart labels: Real images; Object Detection; objects; Object type assigning; Object types; Object type modeling; Object morphology modeling; Statistical models; Sampling; Generated images]

LAMP2 pattern
[Figure labels: Nucleus; Cell membrane; Protein]

Nuclear Shape - Medial Axis Model
- Rotate
- Medial axis
- Represented by two curves: the medial axis, and the width along the medial axis

Synthetic Nuclear Shapes

With added nuclear texture

Cell Shape Description: Distance Ratio
[Figure labels: d1, d2]
- Capture variation as a principal components model
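A principal components shape model can be sketched as below. The input format is an assumption: each cell is taken to be described by a vector of distance ratios sampled at fixed angles, and the toy ratios are random numbers. How the ratio is formed from d1 and d2 is not spelled out in this transcript, so the sketch starts from the ratio vectors.

```python
import numpy as np

def fit_pca(shapes, n_components):
    """Mean shape, principal directions, and standard deviation along each."""
    shapes = np.asarray(shapes, float)
    mean = shapes.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(shapes - mean, full_matrices=False)
    stddev = singular_values[:n_components] / np.sqrt(len(shapes) - 1)
    return mean, vt[:n_components], stddev

def sample_shape(mean, components, stddev, rng):
    """Draw a new ratio vector: mean plus Gaussian coefficients on each component."""
    coeffs = rng.normal(0.0, stddev)
    return mean + coeffs @ components

rng = np.random.default_rng(0)
# Toy data: 50 cells, ratio measured at 36 angles (values are made up).
ratios = 1.5 + 0.1 * rng.normal(size=(50, 36))
mean, components, stddev = fit_pca(ratios, n_components=3)
new_shape = sample_shape(mean, components, stddev, rng)
```

Sampling coefficients and mapping them back through the components is what lets a fitted model generate new shapes rather than only describe observed ones.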

Generation

Small Objects
- Approximated by 2D Gaussian distribution
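One simple way to approximate an object by a 2D Gaussian is to match the intensity-weighted mean and covariance of its pixel coordinates. The sketch below does that and also renders a Gaussian back into an image patch; it is an illustration under that moment-matching assumption, and the function names are hypothetical.

```python
import numpy as np

def fit_gaussian(patch):
    """Intensity-weighted mean and covariance of pixel (row, col) coordinates."""
    patch = np.asarray(patch, float)
    rows, cols = np.indices(patch.shape)
    coords = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
    weights = patch.ravel() / patch.sum()
    mean = weights @ coords
    centered = coords - mean
    cov = (centered * weights[:, None]).T @ centered
    return mean, cov

def render_gaussian(shape, mean, cov, total_intensity=1.0):
    """Image of a 2D Gaussian with the given mean and covariance."""
    rows, cols = np.indices(shape)
    centered = np.stack([rows - mean[0], cols - mean[1]], axis=-1)
    inv_cov = np.linalg.inv(cov)
    exponent = np.einsum("...i,ij,...j->...", centered, inv_cov, centered)
    image = np.exp(-0.5 * exponent)
    return total_intensity * image / image.sum()

true_mean = np.array([20.0, 18.0])
true_cov = np.array([[4.0, 1.0], [1.0, 3.0]])
patch = render_gaussian((41, 41), true_mean, true_cov)
fitted_mean, fitted_cov = fit_gaussian(patch)
```

Fitting a rendered Gaussian recovers its parameters, which is a quick sanity check before applying the fit to real objects.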

Object Positions
[Figure labels: d1, d2]

Positions
- Logistic regression
- Generation
  - Each pixel has a weight according to the logistic model
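Generation from a logistic position model can be sketched as weighting each pixel by the logistic function of its features and then sampling pixels in proportion to those weights. Everything specific here is an assumption: the single feature (normalized distance from the image center), the coefficient values, and the function name are hypothetical, since the slides do not list the model's inputs.

```python
import numpy as np

def sample_positions(feature_maps, coefficients, intercept, n_objects, rng):
    """Draw pixel positions with probability proportional to a logistic weight.
    feature_maps has shape (n_features, height, width)."""
    z = intercept + np.tensordot(coefficients, feature_maps, axes=1)
    weights = 1.0 / (1.0 + np.exp(-z))
    probs = (weights / weights.sum()).ravel()
    flat = rng.choice(probs.size, size=n_objects, p=probs)
    positions = np.column_stack(np.unravel_index(flat, weights.shape))
    return positions, weights

# Hypothetical feature: distance from the image center, scaled to [0, 1].
height, width = 64, 64
rows, cols = np.indices((height, width))
radius = np.hypot(rows - 32, cols - 32)
feature_maps = (radius / radius.max())[None, :, :]

rng = np.random.default_rng(0)
positions, weights = sample_positions(feature_maps, np.array([-6.0]), 2.0,
                                      n_objects=200, rng=rng)
```

With a negative coefficient on distance, pixels near the center receive larger weights and are therefore chosen more often as object positions.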

Fully Synthetic Cell Image

[Image panels: Real; Synthetic]

Conclusions and Future Work
- Object-based generative models useful for communicating information about subcellular patterns
- Work continues!

Final word
- Goal of automated image interpretation should not be
  - Quantitating intensity or colocalization
  - Making it easier for biologists to see what’s happening
- Goal should be generalizable, verifiable, mechanistic models of cell organization and behavior automatically derived from images