Computational Biology, Part 28 Automated Interpretation of Subcellular Patterns in Microscope Images III Robert F. Murphy Copyright 1996, 1999, All rights reserved.
Results
Supervised Learning of Patterns
1. Create sets of images showing the location of many different proteins (each set defines one class of pattern)
2. Reduce each image to a set of numerical values (“features”) that are insensitive to the position and rotation of the cell
3. Use statistical classification methods to “learn” how to distinguish each class using the features
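The three steps above can be sketched end to end. The feature set here (mean, variance, fraction of above-mean pixels, all unchanged by translating or rotating a cell) is a toy stand-in for the SLF features, and the two synthetic image classes are invented purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def invariant_features(img):
    # Toy position/rotation-invariant features (not the real SLF set):
    # mean intensity, variance, and fraction of above-mean pixels.
    return np.array([img.mean(), img.var(), (img > img.mean()).mean()])

rng = np.random.default_rng(0)
# Two synthetic "pattern classes": dim-diffuse vs. bright-punctate images.
X, y = [], []
for label, scale in [(0, 1.0), (1, 4.0)]:
    for _ in range(30):
        img = rng.gamma(shape=1.0 + 3 * label, scale=scale, size=(32, 32))
        X.append(invariant_features(img))
        y.append(label)
X, y = np.array(X), np.array(y)

# Step 3: learn to distinguish the classes from the features.
clf = LogisticRegression(max_iter=1000).fit(X, y)
train_acc = clf.score(X, y)
```

Because the two synthetic classes differ strongly in mean intensity, even these crude features separate them.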
2D Images of 10 Patterns (HeLa)
DNA, ER, giantin, gpp130, LAMP, mitochondria, nucleolin, actin, TfR, tubulin
Boland & Murphy
Evaluating Classifiers
- Divide the ~100 images for each class into a training set and a test set
- Use the training set to determine rules for the classes
- Use the test set to evaluate performance
- Repeat with different divisions into training and test sets
- Evaluate different sets of features chosen as most discriminative by feature selection methods
- Evaluate different classifiers
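A minimal sketch of this evaluation loop, on synthetic feature vectors standing in for the per-class images; the feature selector and classifier here (`SelectKBest`, k-nearest neighbors) are illustrative choices, not the methods used in the original work. Putting feature selection inside the pipeline means each train/test split re-selects features from the training data only:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Stand-in for ~100 images per class already reduced to feature vectors.
X, y = make_classification(n_samples=300, n_features=40, n_informative=8,
                           n_classes=3, random_state=0)

# Feature selection happens inside the pipeline, so there is no leakage
# from test data into the selected feature set.
model = make_pipeline(SelectKBest(f_classif, k=8), KNeighborsClassifier())

# Repeat with different divisions into training and test sets.
splits = StratifiedShuffleSplit(n_splits=10, test_size=0.25, random_state=0)
scores = cross_val_score(model, X, y, cv=splits)
mean_acc = scores.mean()
```

Different classifiers or feature subsets can be compared by swapping the pipeline stages and re-running the same splits.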
2D Classification Results
Overall accuracy = 92%
[Confusion matrix: true class vs. classifier output for DNA, ER, Gia, Gpp, Lam, Mit, Nuc, Act, TfR, Tub]
Murphy et al 2000; Boland & Murphy 2001; Huang & Murphy 2004
Human Classification Results
Overall accuracy = 83%
[Confusion matrix: true class vs. human observers’ output for DNA, ER, Gia, Gpp, Lam, Mit, Nuc, Act, TfR, Tub]
Murphy et al 2003
Computer vs. Human
3D HeLa Cell Images
Ten patterns: nuclear (DNA), ER, giantin, gpp130, lysosomal, mitochondrial, nucleolar, actin, endosomal, tubulin
Images collected using facilities at the Center for Biologic Imaging courtesy of Simon Watkins
Velliste & Murphy 2002
3D Classification Results
Overall accuracy = 98%
[Confusion matrix: true class vs. classifier output for DNA, ER, Gia, Gpp, Lam, Mit, Nuc, Act, TfR, Tub]
Velliste & Murphy 2002; Chen & Murphy 2004
Unsupervised Learning to Identify High-Resolution Protein Patterns
Location Proteomics
- Tag many proteins: we have used CD-tagging (developed by Jonathan Jarvik and Peter Berget) to infect a population of cells with a retrovirus carrying a DNA sequence that will “tag” a random gene in each cell
- Isolate separate clones, each of which expresses one tagged protein
- Use RT-PCR to identify the tagged gene in each clone
- Collect many live-cell images for each clone using spinning disk confocal fluorescence microscopy
Jarvik et al 2002
What Now? Group ~90 tagged clones by pattern
Solution: Group Them Automatically
- How? SLF features can be used to measure the similarity of protein patterns
- This allows us for the first time to create a systematic, objective framework for describing subcellular locations: a Subcellular Location Tree
- Start by grouping the two proteins whose patterns are most similar; keep adding branches for less and less similar patterns
Chen et al 2003; Chen and Murphy 2005
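The grouping procedure described above is agglomerative hierarchical clustering. A minimal sketch, using invented two-dimensional feature vectors in place of real SLF features for nine hypothetical tagged clones:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
# Hypothetical mean feature vectors for 9 tagged clones: three underlying
# location patterns with three clones each (stand-ins for SLF features).
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
features = np.vstack([c + 0.2 * rng.standard_normal((3, 2)) for c in centers])

# Agglomerative clustering: join the two most similar patterns first, then
# keep adding branches for less and less similar patterns.
tree = linkage(features, method='average', metric='euclidean')
groups = fcluster(tree, t=3, criterion='maxclust')
```

The `tree` linkage matrix is exactly what a dendrogram plot of a Subcellular Location Tree is drawn from; `fcluster` cuts it into a chosen number of groups.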
[Table: protein name with human description drawn from databases]
Nucleolar Proteins
Punctate Nuclear Proteins
Predominantly Nuclear Proteins with Some Punctate Cytoplasmic Staining
Nuclear and Cytoplasmic Proteins with Some Punctate Staining
Uniform
Top: automated grouping and assignment; bottom: visual assignment to “known” locations
Refining clusters using temporal textures
Incorporating Temporal Information
- Time series images could be useful for:
  - Distinguishing proteins that are not distinguishable in static images
  - Analyzing protein movement in the presence of drugs, or during different stages of the cell cycle
- Need an approach that does not require detailed understanding of the objects/organelles in which each protein is located
- A generic object tracking approach? Not all proteins are in discernible objects
- A non-tracking approach is needed
Texture Features
- Haralick texture features describe the correlation in intensity of pixels that are next to each other in space; these have been valuable for classifying static patterns
- Temporal texture features describe the correlation in intensity of pixels in the same position in images next to each other over time
Temporal Textures Based on the Co-occurrence Matrix
- Temporal co-occurrence matrix P: an N-level × N-level matrix whose element P[i, j] is the probability that a pixel with value i has value j in the next image (time point)
- Thirteen statistics calculated on P are used as features
Temporal co-occurrence matrix for an image that does not change (image at t0 vs. image at t1)
Temporal co-occurrence matrix for an image that changes (image at t0 vs. image at t1)
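The two cases above can be made concrete in a few lines. For an unchanged image all probability mass lies on the diagonal of P; any change moves mass off the diagonal. The tiny 2×2 image is invented for illustration:

```python
import numpy as np

def temporal_cooccurrence(img_t0, img_t1, levels):
    # P[i, j] = probability that a pixel with value i at one time point
    # has value j at the next time point.
    P = np.zeros((levels, levels))
    np.add.at(P, (img_t0.ravel(), img_t1.ravel()), 1)
    return P / P.sum()

img = np.array([[0, 1], [2, 3]])
P_static = temporal_cooccurrence(img, img, levels=4)            # unchanged image
P_moving = temporal_cooccurrence(img, np.rot90(img), levels=4)  # changed image
```

For the unchanged pair, `np.trace(P_static)` is 1 (all mass on the diagonal); for the rotated pair it drops, which is what the 13 statistics computed on P pick up.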
Implementation of Temporal Texture Features
- Compare image pairs at different time intervals; compute 13 temporal texture features for each pair
- Use the average and variance of the features at each time interval, yielding 13 × 5 × 2 = 130 features
- T = 0 s, 45 s, 90 s, 135 s, 180 s, 225 s, 270 s, 315 s, 360 s, 405 s, …
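A sketch of this multi-interval scheme, on a random synthetic movie. Only three co-occurrence statistics are computed instead of the 13 used in the original work, so the count comes out 3 × 5 × 2 = 30 rather than 130; the statistics chosen (energy, entropy, diagonal mass) are standard co-occurrence measures picked for brevity:

```python
import numpy as np

def cooccurrence(a, b, levels=4):
    # Joint probability that a pixel has value i in frame a and j in frame b.
    P = np.zeros((levels, levels))
    np.add.at(P, (a.ravel(), b.ravel()), 1)
    return P / P.sum()

def texture_stats(P):
    # Three co-occurrence statistics (energy, entropy, diagonal mass) as a
    # stand-in for the 13 used in the original work.
    nz = P[P > 0]
    return np.array([(P ** 2).sum(), -(nz * np.log(nz)).sum(), np.trace(P)])

rng = np.random.default_rng(0)
movie = rng.integers(0, 4, size=(10, 16, 16))  # 10 synthetic frames, 45 s apart

features = []
for interval in range(1, 6):  # compare frame pairs 1..5 time steps apart
    stats = np.array([texture_stats(cooccurrence(movie[t], movie[t + interval]))
                      for t in range(len(movie) - interval)])
    features.extend(stats.mean(axis=0))  # average over pairs at this interval
    features.extend(stats.var(axis=0))   # variance over pairs at this interval

n_features = len(features)  # 3 stats x 5 intervals x 2 = 30 (130 with all 13 stats)
```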
Test: Evaluate ability of temporal textures to improve discrimination of similar protein patterns
Results for Temporal Texture and Static Features
[Confusion matrix for Dia1, Sdpr, Atp5a1, Adfp, Timm23]
Average accuracy = 85.1%
Conclusion
Addition of temporal texture features improves classification accuracy of protein locations
Generative models of subcellular patterns
Decomposing Mixture Patterns
- Clustering or classifying whole-cell patterns will consider each combination of two or more “basic” patterns as a unique new pattern
- Desirable to have a way to decompose mixtures instead
- One approach would be to assume that each basic pattern has a recognizable combination of different types of objects
Object-based Subcellular Pattern Models
Goals:
- To be able to recognize “pure” patterns using only objects
- To be able to recognize and unmix patterns consisting of two or more “pure” patterns
- To enable building of generative models that can synthesize patterns from objects: needed for systems biology
Object Type Determination
- Rather than specifying object types, we chose to learn them from the data
- Use a subset of SLF features to describe objects
- Perform k-means clustering for k from 2 to 40
- Evaluate the goodness of clustering using the Akaike Information Criterion (AIC)
- Choose the k that gives the lowest AIC
Zhao et al 2005
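A sketch of choosing the number of object types by minimum AIC. The original sweep used k-means for k = 2 to 40; a Gaussian mixture is substituted here purely because scikit-learn exposes `aic()` directly, and the three-cluster synthetic "object features" are invented for illustration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Hypothetical object feature vectors drawn from 3 underlying object types.
X = np.vstack([np.array(c) + 0.3 * rng.standard_normal((60, 2))
               for c in [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]])

# Sweep candidate numbers of object types and keep the AIC minimum.
ks = list(range(2, 9))
aics = [GaussianMixture(n_components=k, random_state=0).fit(X).aic(X)
        for k in ks]
best_k = ks[int(np.argmin(aics))]
```

AIC trades goodness of fit against model size, so it penalizes splitting a well-fit cluster merely to reduce residual error.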
Unmixing: Learning Strategy
- Once object types are known, each cell in the training (pure) set can be represented as a vector of the amount of fluorescence for each object type
- Learn a probability model for these vectors for each class
- Mixed images can then be represented using mixture fractions times the probability distribution of objects for each class
Pure Golgi Pattern Pure Lysosomal Pattern 50% mix of each
Two-stage Strategy for Unmixing an Unknown Image
- Find objects in the unknown (test) image; classify each object into one of the object types using a learned object-type classifier built with all objects from the training images
- For each test image, make a list of how often each object type is found
- Find the fractions of each class that give the “best” match to this list
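The final step, matching an observed object-type list against class profiles, can be sketched as a nonnegative least-squares fit. The object-type profiles and the true fractions below are invented numbers, and NNLS is one reasonable choice of "best match", not necessarily the criterion used in the original work:

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical mean object-type frequency profiles for 3 pure patterns
# (rows: pattern class, columns: object type).
profiles = np.array([[8.0, 1.0, 1.0, 0.0],
                     [1.0, 7.0, 1.0, 1.0],
                     [0.0, 1.0, 2.0, 7.0]])

true_fracs = np.array([0.5, 0.3, 0.2])
observed = true_fracs @ profiles  # object-type counts seen in a mixed image

# Find nonnegative mixture fractions whose predicted object-type counts
# best match the observed list, then normalize to sum to 1.
fracs, _ = nnls(profiles.T, observed)
fracs /= fracs.sum()
```

Because the observed counts here lie exactly in the span of the profiles, the fit recovers the true fractions; real images add noise on top.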
Test of Unmixing
- Use the 2D HeLa data
- Generate random mixture fractions for eight major patterns (summing to 1)
- Use these to synthesize “images” corresponding to these mixtures
- Try to estimate the mixture fractions from the synthesized images
- Compare to the true mixture fractions
Results
Given 5 synthesized “cell images” with any mixture of the 8 basic patterns, the average accuracy of estimating the mixture coefficients is 83%
Zhao et al 2005
Overview
[Diagram: real images → object detection → objects → object type assignment → object type modeling → statistical models of object types]
Generating Images
[Diagram: real images → object detection → object type assignment → object type modeling → statistical models → sampling → generated images]
Generating Objects and Images
[Diagram: real images → object detection → object type assignment → object type modeling and object morphology modeling → statistical models → sampling → generated images]
LAMP2 Pattern
[Images: nucleus, cell membrane, protein]
Nuclear Shape: Medial Axis Model
Rotate the nucleus to a standard orientation; the shape is then represented by two curves: the medial axis, and the width along the medial axis
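A minimal sketch of the two-curve representation: an outline is recovered from a medial axis curve plus half the width on either side. Both curves below are invented analytic shapes, not fitted to any real nucleus:

```python
import numpy as np

# Medial-axis shape sketch: a nuclear outline represented by two curves,
# the medial axis m(x) and the width w(x) along it.
x = np.linspace(0.0, 1.0, 50)
axis = 0.1 * np.sin(2 * np.pi * x)   # hypothetical gently bent medial axis
width = 0.6 * np.sin(np.pi * x)      # width tapers to zero at both ends

# The outline is the axis plus/minus half the width.
top = axis + width / 2
bottom = axis - width / 2
closed = np.allclose(top[[0, -1]], bottom[[0, -1]])  # outline meets at the tips
```

Because the width goes to zero at both ends, the top and bottom curves meet and the outline closes; sampling new axis and width curves from a learned distribution yields new synthetic nuclear shapes.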
Synthetic Nuclear Shapes
With added nuclear texture
Cell Shape Description: Distance Ratio (d1, d2)
Capture variation as a principal components model
Generation
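A sketch of the principal components model and generation step. It assumes, for illustration only, that the shape descriptor is a ratio of distances d1/d2 sampled at a fixed set of angles; the training "cells" are synthetic:

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical descriptor: at each of 36 angles, a ratio d1/d2 of two
# radial distances (assumed form; the real descriptor may differ).
n_cells, n_angles = 40, 36
base = 0.5 + 0.2 * np.cos(np.linspace(0, 2 * np.pi, n_angles, endpoint=False))
ratios = base + 0.05 * rng.standard_normal((n_cells, n_angles))

# Principal components model of the variation across cells.
mean = ratios.mean(axis=0)
U, S, Vt = np.linalg.svd(ratios - mean, full_matrices=False)
var_explained = (S ** 2) / (S ** 2).sum()

# Generation: sample scores along the top components to make a new shape.
scores = rng.standard_normal(3) * S[:3] / np.sqrt(n_cells - 1)
new_shape = mean + scores @ Vt[:3]
```

Scaling the sampled scores by the singular values keeps generated shapes within the variation seen in training.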
Small Objects
Approximated by a 2D Gaussian distribution
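Fitting that approximation can be sketched with intensity-weighted moments: the object's pixel intensities act as weights for a mean and covariance, which a generative model can later sample from. The synthetic patch below is itself Gaussian so the recovered parameters are known:

```python
import numpy as np

# Approximating a small fluorescent object by a 2D Gaussian: use the
# pixel intensities as weights to estimate a mean and covariance.
yy, xx = np.mgrid[0:15, 0:15]
patch = np.exp(-((xx - 7) ** 2 / 8 + (yy - 5) ** 2 / 4))  # synthetic object

w = patch / patch.sum()
mean = np.array([(w * yy).sum(), (w * xx).sum()])  # (row, col) centroid
dy, dx = yy - mean[0], xx - mean[1]
cov = np.array([[(w * dy * dy).sum(), (w * dy * dx).sum()],
                [(w * dx * dy).sum(), (w * dx * dx).sum()]])
```

Here the recovered centroid is near (5, 7) and the covariance near diag(2, 4), matching the widths built into the synthetic patch.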
Object Positions
[Figure: distances d1 and d2]
Positions
- Model object positions with logistic regression
- Generation: each pixel has a weight according to the logistic model
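A sketch of both halves: fit a logistic model of where objects occur, then sample positions in proportion to the per-pixel weights it assigns. The single normalized-distance predictor `r` and the nucleus-biased training data are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
# Hypothetical training data: each candidate position is described by one
# normalized distance r (0 at the nuclear edge, 1 at the cell membrane);
# objects in this toy model preferentially sit near the nucleus.
r = rng.uniform(0.0, 1.0, 2000)
has_object = (rng.uniform(0.0, 1.0, 2000) < 0.8 * (1.0 - r)).astype(int)

model = LogisticRegression().fit(r.reshape(-1, 1), has_object)

# Generation: every candidate pixel gets a weight from the logistic model,
# and object positions are sampled in proportion to those weights.
pixels = np.linspace(0.0, 1.0, 100).reshape(-1, 1)
weights = model.predict_proba(pixels)[:, 1]
weights /= weights.sum()
sampled = rng.choice(100, size=500, p=weights)
mean_r = float(pixels[sampled].mean())
```

Since the fitted probability falls with distance from the nucleus, sampled positions cluster toward small r, reproducing the training bias.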
Fully Synthetic Cell Image
Real vs. Synthetic
Conclusions and Future Work
- Object-based generative models are useful for communicating information about subcellular patterns
- Work continues!
Final Word
- The goal of automated image interpretation should not be quantitating intensity or colocalization, or simply making it easier for biologists to see what’s happening
- The goal should be generalizable, verifiable, mechanistic models of cell organization and behavior, automatically derived from images