Slide 1: Computational Biology, Part 28: Automated Interpretation of Subcellular Patterns in Microscope Images III. Robert F. Murphy. Copyright 1996, 1999, 2000-2006. All rights reserved.
Slide 2: Results
Slide 3: Supervised Learning of Patterns
1. Create sets of images showing the location of many different proteins (each set defines one class of pattern)
2. Reduce each image to a set of numerical values ("features") that are insensitive to the position and rotation of the cell
3. Use statistical classification methods to "learn" how to distinguish each class using the features
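The three-step recipe above can be sketched end to end. Everything below is illustrative: the features are synthetic random vectors (real SLF feature vectors are much higher-dimensional), and a minimal nearest-centroid rule stands in for the statistical classifiers actually used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1-2 stand-in: 3 pattern classes, 20 cells each, 5 features per cell.
n_classes, n_cells, n_features = 3, 20, 5
centers = rng.normal(0, 5, size=(n_classes, n_features))
X = np.vstack([centers[c] + rng.normal(0, 1, (n_cells, n_features))
               for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), n_cells)

# Step 3: split into training/test sets, "learn" each class (here as its
# training-set centroid), then evaluate on held-out cells.
idx = rng.permutation(len(X))
train, test = idx[:45], idx[45:]
centroids = np.array([X[train][y[train] == c].mean(axis=0)
                      for c in range(n_classes)])

def classify(x):
    # Assign to the class with the nearest centroid in feature space.
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

accuracy = np.mean([classify(X[i]) == y[i] for i in test])
```

With well-separated synthetic classes the toy classifier is near-perfect; the point is only the shape of the pipeline, not the numbers.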
Slide 4: 2D Images of 10 Patterns (HeLa). [Image panel labels: ER, Tubulin, DNA, TfR, Actin, Nucleolin, Mito, LAMP, gpp130, giantin.] Boland & Murphy 2001
Slide 5: Evaluating Classifiers
- Divide ~100 images for each class into a training set and a test set
- Use the training set to determine rules for the classes
- Use the test set to evaluate performance
- Repeat with different divisions into training and test sets
- Evaluate different sets of features chosen as most discriminative by feature selection methods
- Evaluate different classifiers
Slide 6: 2D Classification Results (overall accuracy = 92%)
True class (rows) vs. classifier output (columns):

       DNA  ER  Gia  Gpp  Lam  Mit  Nuc  Act  TfR  Tub
DNA     99   1    0    0    0    0    0    0    0    0
ER       0  97    0    0    0    2    0    0    0    1
Gia      0   0   91    7    0    0    0    0    2    0
Gpp      0   0   14   82    0    0    2    0    1    0
Lam      0   0    1    0   88    1    0    0   10    0
Mit      0   3    0    0    0   92    0    0    3    3
Nuc      0   0    0    0    0    0   99    0    1    0
Act      0   0    0    0    0    0    0  100    0    0
TfR      0   1    0    0   12    2    0    1   81    2
Tub      1   2    0    0    0    1    0    0    1   95

Murphy et al 2000; Boland & Murphy 2001; Huang & Murphy 2004
Slide 7: Human Classification Results (overall accuracy = 83%)
True class (rows) vs. human output (columns):

       DNA  ER  Gia  Gpp  Lam  Mit  Nuc  Act  TfR  Tub
DNA    100   0    0    0    0    0    0    0    0    0
ER       0  90    0    0    3    6    0    0    0    0
Gia      0   0   56   36    3    3    0    0    0    0
Gpp      0   0   54   33    0    0    0    0    3    0
Lam      0   0    6    0   73    0    0    0   20    0
Mit      0   3    0    0    0   96    0    0    0    3
Nuc      0   0    0    0    0    0  100    0    0    0
Act      0   0    0    0    0    0    0  100    0    0
TfR      0  13    0    0    3    0    0    0   83    0
Tub      0   3    0    0    0    0    0    3    0   93

Murphy et al 2003
Slide 8: Computer vs. Human
Slide 9: 3D HeLa Cell Images. [Image panel labels: Giantin, Nuclear, ER, Lysosomal, gpp130, Actin, Mitoch., Nucleolar, Tubulin, Endosomal.] Images collected using facilities at the Center for Biologic Imaging courtesy of Simon Watkins. Velliste & Murphy 2002
Slide 10: 3D Classification Results (overall accuracy = 98%)
True class (rows) vs. classifier output (columns):

       DNA  ER  Gia  Gpp  Lam  Mit  Nuc  Act  TfR  Tub
DNA     98   2    0    0    0    0    0    0    0    0
ER       0 100    0    0    0    0    0    0    0    0
Gia      0   0  100    0    0    0    0    0    0    0
Gpp      0   0    0   96    4    0    0    0    0    0
Lam      0   0    0    4   95    0    0    0    0    2
Mit      0   0    2    0    0   96    0    2    0    0
Nuc      0   0    0    0    0    0  100    0    0    0
Act      0   0    0    0    0    0    0  100    0    0
TfR      0   0    0    0    2    0    0    0   96    2
Tub      0   2    0    0    0    0    0    0    0   98

Velliste & Murphy 2002; Chen & Murphy 2004
Slide 11: Unsupervised Learning to Identify High-Resolution Protein Patterns
Slide 12: Location Proteomics
- Tag many proteins. We have used CD-tagging (developed by Jonathan Jarvik and Peter Berget): infect a population of cells with a retrovirus carrying a DNA sequence that will "tag" a random gene in each cell
- Isolate separate clones, each of which expresses one tagged protein
- Use RT-PCR to identify the tagged gene in each clone
- Collect many live-cell images for each clone using spinning disk confocal fluorescence microscopy
Jarvik et al 2002
Slide 13: What Now? Group ~90 tagged clones by pattern
Slide 14: Solution: Group Them Automatically
- How? SLF features can be used to measure the similarity of protein patterns
- This allows us for the first time to create a systematic, objective framework for describing subcellular locations: a Subcellular Location Tree
- Start by grouping the two proteins whose patterns are most similar, then keep adding branches for less and less similar patterns
Chen et al 2003; Chen and Murphy 2005
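The grouping rule just described (merge the most similar pair, then keep adding branches for less similar patterns) is agglomerative hierarchical clustering. A minimal sketch, with made-up clone names and two-dimensional stand-ins for SLF feature vectors:

```python
import numpy as np

# Hypothetical per-clone feature vectors (real trees use SLF features).
features = {
    "cloneA": np.array([0.10, 0.20]),
    "cloneB": np.array([0.12, 0.19]),
    "cloneC": np.array([0.90, 0.80]),
    "cloneD": np.array([0.85, 0.82]),
}

# Each cluster holds a mean feature vector; merge the closest pair until
# one tree remains, recording structure in nested parentheses.
clusters = {name: vec for name, vec in features.items()}
while len(clusters) > 1:
    names = list(clusters)
    a, b = min(((p, q) for i, p in enumerate(names) for q in names[i + 1:]),
               key=lambda pq: np.linalg.norm(clusters[pq[0]] - clusters[pq[1]]))
    merged = (clusters[a] + clusters[b]) / 2
    del clusters[a], clusters[b]
    clusters[f"({a},{b})"] = merged

tree = next(iter(clusters))
```

Here cloneA/cloneB and cloneC/cloneD pair up first, so `tree` comes out as `((cloneA,cloneB),(cloneC,cloneD))`, a toy Subcellular Location Tree.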
Slide 15: [Subcellular Location Tree figure: protein names with human descriptions from databases.] http://murphylab.web.cmu.edu/services/PSLID/tree.html
Slide 17: Nucleolar Proteins
Slide 18: Punctate Nuclear Proteins
Slide 19: Predominantly Nuclear Proteins with Some Punctate Cytoplasmic Staining
Slide 20: Nuclear and Cytoplasmic Proteins with Some Punctate Staining
Slide 21: Uniform
Slide 22: Top: automated grouping and assignment. Bottom: visual assignment to "known" locations. [Tree figure with protein names.] http://murphylab.web.cmu.edu/services/PSLID/tree.html
Slide 23: Refining Clusters Using Temporal Textures
Slide 24: Incorporating Temporal Information
- Time series images could be useful for:
  - distinguishing proteins that are not distinguishable in static images
  - analyzing protein movement in the presence of drugs, or during different stages of the cell cycle
- We need an approach that does not require a detailed understanding of the objects/organelles in which each protein is located
- A generic object-tracking approach? Not all proteins are in discernible objects
- A non-tracking approach is needed
Slide 25: Texture Features
- Haralick texture features describe the correlation in intensity of pixels that are next to each other in space. These have been valuable for classifying static patterns.
- Temporal texture features describe the correlation in intensity of pixels in the same position in images next to each other over time.
Slide 26: Temporal Textures Based on a Co-occurrence Matrix
- Temporal co-occurrence matrix P: an N-level by N-level matrix whose element P[i, j] is the probability that a pixel with value i has value j in the next image (time point)
- Thirteen statistics calculated on P are used as features
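The temporal co-occurrence matrix defined above is straightforward to compute: count how often each gray level at one time point becomes each gray level at the next, then normalize. A small sketch with synthetic 4-level frames:

```python
import numpy as np

def temporal_cooccurrence(frame_t0, frame_t1, n_levels):
    """P[i, j] = probability that a pixel with value i at t0 has value j at t1."""
    P = np.zeros((n_levels, n_levels))
    for i, j in zip(frame_t0.ravel(), frame_t1.ravel()):
        P[i, j] += 1
    return P / P.sum()

# Synthetic example frames with 4 gray levels (values 0..3).
rng = np.random.default_rng(1)
frame_t0 = rng.integers(0, 4, size=(8, 8))

# An unchanging image puts all probability on the diagonal.
P_static = temporal_cooccurrence(frame_t0, frame_t0, 4)

# A changing image spreads probability off the diagonal.
frame_t1 = rng.integers(0, 4, size=(8, 8))
P_moving = temporal_cooccurrence(frame_t0, frame_t1, 4)

off_diag_static = P_static.sum() - np.trace(P_static)
off_diag_moving = P_moving.sum() - np.trace(P_moving)
```

This reproduces the behavior illustrated on the following two slides: a diagonal matrix for a static image, off-diagonal mass for a changing one.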
Slide 27: Temporal co-occurrence matrix for an image that does not change between t0 and t1 (all counts fall on the diagonal; rows are values at t0, columns are values at t1):

      4  3  2  1
  4   7  0  0  0
  3   0  6  0  0
  2   0  0  9  0
  1   0  0  0  3
Slide 28: Temporal co-occurrence matrix for an image that changes between t0 and t1 (counts spread off the diagonal):

      4  3  2  1
  4   1  3  3  0
  3   1  0  5  0
  2   5  1  1  2
  1   0  2  0  1
Slide 29: Implementation of Temporal Texture Features
- Compare image pairs at different time intervals and compute 13 temporal texture features for each pair
- Using the average and variance of the features for each time interval yields 13 × 5 × 2 = 130 features
- T = 0 s, 45 s, 90 s, 135 s, 180 s, 225 s, 270 s, 315 s, 360 s, 405 s, ...
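The interval-and-aggregate scheme can be sketched as follows. Three illustrative co-occurrence statistics (energy, contrast, entropy) stand in for the full set of thirteen Haralick-style features, and the movie is random noise; with 3 statistics, 5 intervals, and mean plus variance, the sketch produces 3 × 5 × 2 = 30 features, mirroring the 13 × 5 × 2 = 130 on the slide.

```python
import numpy as np

def cooccurrence(f0, f1, n_levels=4):
    # Normalized temporal co-occurrence matrix of two frames.
    P = np.zeros((n_levels, n_levels))
    np.add.at(P, (f0.ravel(), f1.ravel()), 1)
    return P / P.sum()

def matrix_stats(P):
    # Three of the thirteen statistics, as an illustration.
    i, j = np.indices(P.shape)
    energy = (P ** 2).sum()
    contrast = ((i - j) ** 2 * P).sum()
    entropy = -(P[P > 0] * np.log(P[P > 0])).sum()
    return np.array([energy, contrast, entropy])

# A short hypothetical movie: 10 frames, nominally 45 s apart.
rng = np.random.default_rng(2)
movie = rng.integers(0, 4, size=(10, 16, 16))

# Compare frame pairs at several multiples of the base interval, then keep
# the mean and variance of each statistic per interval.
intervals = [1, 2, 3, 4, 5]
features = []
for lag in intervals:
    stats = np.array([matrix_stats(cooccurrence(movie[t], movie[t + lag]))
                      for t in range(len(movie) - lag)])
    features.extend(stats.mean(axis=0))
    features.extend(stats.var(axis=0))
features = np.array(features)
```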
Slide 30: Test: evaluate the ability of temporal textures to improve discrimination of similar protein patterns
Slide 31: Results for Temporal Texture and Static Features (average accuracy = 85.1%)
True class (rows) vs. classifier output (columns):

         Dia1  Sdpr  Atp5a1  Adfp  Timm23
Dia1       50    35       5     0      10
Sdpr        9    87       0     4       0
Atp5a1      0     5      95     0       0
Adfp        2     0       2    92       4
Timm23      0     5       0     8      88
Slide 32: Conclusion
- The addition of temporal texture features improves the classification accuracy of protein location patterns
Slide 33: Generative Models of Subcellular Patterns
Slide 34: Decomposing Mixture Patterns
- Clustering or classifying whole-cell patterns will treat each combination of two or more "basic" patterns as a unique new pattern
- It is desirable to have a way to decompose mixtures instead
- One approach is to assume that each basic pattern has a recognizable combination of different types of objects
Slide 35: Object-Based Subcellular Pattern Models
Goals:
- to recognize "pure" patterns using only objects
- to recognize and unmix patterns consisting of two or more "pure" patterns
- to enable the building of generative models that can synthesize patterns from objects, as needed for systems biology
Slide 36: Object Type Determination
- Rather than specifying object types, we chose to learn them from the data
- Use a subset of SLFs to describe objects
- Perform k-means clustering for k from 2 to 40
- Evaluate the goodness of each clustering using the Akaike Information Criterion (AIC)
- Choose the k that gives the lowest AIC
Zhao et al 2005
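A minimal sketch of the scan-k-and-pick-by-AIC procedure. The object features are synthetic (three well-separated groups), the k-means implementation uses deterministic farthest-point initialization for reproducibility, and the AIC form assumed here (RSS + 2·k·d, i.e. unit-variance spherical Gaussians with k·d center parameters) is an illustration, not necessarily the exact criterion used in the paper; the scan also covers only k = 2..7 rather than 2..40.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    # Farthest-point initialization, then Lloyd iterations.
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min(((X[:, None] - np.array(centers)) ** 2).sum(-1), axis=1)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    rss = ((X - centers[labels]) ** 2).sum()
    return labels, rss

# Hypothetical object features: 3 well-separated groups in 2-D.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(m, 0.1, (30, 2)) for m in ([0, 0], [3, 3], [0, 3])])

n, d = X.shape
aic = {}
for k in range(2, 8):
    _, rss = kmeans(X, k)
    aic[k] = rss + 2 * k * d   # assumed AIC form (see lead-in)

best_k = min(aic, key=aic.get)
```

On this synthetic data the penalty term outweighs the small RSS gain from oversplitting, so the scan recovers the true number of groups.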
Slide 37: Unmixing: Learning Strategy
- Once the object types are known, each cell in the training (pure) set can be represented as a vector of the amount of fluorescence in each object type
- Learn a probability model for these vectors for each class
- Mixed images can then be represented as mixture fractions times the probability distribution of objects for each class
Slide 38: [Example images: pure Golgi pattern, pure lysosomal pattern, and a 50% mix of each.]
Slide 39: Two-Stage Strategy for Unmixing an Unknown Image
- Find objects in the unknown (test) image and classify each object into one of the object types, using an object type classifier built with all objects from the training images
- For each test image, make a list of how often each object type is found
- Find the fractions of each class that give the "best" match to this list
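The "best match" step can be illustrated with a least-squares estimate of the mixture fractions. The per-class object-type distributions below are invented, and plain least squares with clipping and renormalization is a simple stand-in for whatever fitting procedure the paper actually uses.

```python
import numpy as np

# Hypothetical learned object-type distributions for three "pure" classes
# (rows: classes; columns: fraction of fluorescence in each of 5 object types).
pure = np.array([
    [0.70, 0.20, 0.10, 0.00, 0.00],   # e.g. a Golgi-like pattern
    [0.05, 0.10, 0.15, 0.50, 0.20],   # e.g. a lysosome-like pattern
    [0.25, 0.25, 0.25, 0.15, 0.10],
])

# A mixed image's object-type histogram is modeled as the mixture
# fractions times the per-class distributions.
true_fractions = np.array([0.5, 0.3, 0.2])
observed = true_fractions @ pure

# Least-squares estimate of the fractions, clipped to be non-negative
# and renormalized to sum to one.
est, *_ = np.linalg.lstsq(pure.T, observed, rcond=None)
est = np.clip(est, 0, None)
est = est / est.sum()
```

Because the synthetic observation is noise-free and the class distributions are linearly independent, the estimate recovers the true fractions exactly.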
Slide 40: Test of Unmixing
- Use the 2D HeLa data
- Generate random mixture fractions for the eight major patterns (summing to 1)
- Use these to synthesize "images" corresponding to the mixtures
- Try to estimate the mixture fractions from the synthesized images
- Compare to the true mixture fractions
Slide 41: Results
- Given 5 synthesized "cell images" with any mixture of the 8 basic patterns, the average accuracy of estimating the mixture coefficients is 83%
Zhao et al 2005
Slide 42: Overview. [Pipeline diagram: real images → object detection → objects → object type assignment → object type modeling → statistical models of object types.]
Slide 43: Generating Images. [Pipeline diagram: as on the previous slide, plus sampling from the statistical models → generated images.]
Slide 44: Generating Objects and Images. [Pipeline diagram: as on the previous slide, with object morphology modeling added before sampling → generated images.]
Slide 45: LAMP2 Pattern. [Image labels: nucleus, cell membrane, protein.]
Slide 46: Nuclear Shape: Medial Axis Model
- Rotate the nucleus to a standard orientation and find its medial axis
- The shape is represented by two curves: the medial axis itself, and the width along the medial axis
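The two-curve representation can be turned into a shape generator: sample a medial-axis curve and a width curve, then offset the axis by half the width on each side to get an outline. The specific curve families below (sine-shaped axis and width with random amplitudes) are illustrative assumptions, not the model fitted in the paper.

```python
import numpy as np

rng = np.random.default_rng(4)

x = np.linspace(0, 1, 100)
# Medial axis: a gently curved line; width: positive, tapering at the ends.
axis_y = 0.1 * rng.normal() * np.sin(np.pi * x)
width = (0.3 + 0.1 * rng.normal()) * np.sin(np.pi * x) + 0.05

# The outline is the medial axis offset by half the width on each side,
# traversed out along the top and back along the bottom.
top = axis_y + width / 2
bottom = axis_y - width / 2
outline = np.concatenate([np.column_stack([x, top]),
                          np.column_stack([x[::-1], bottom[::-1]])])
```

Drawing new random amplitudes yields new synthetic nuclear shapes, as shown on the next slide.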
Slide 47: Synthetic Nuclear Shapes
Slide 48: With Added Nuclear Texture
Slide 49: Cell Shape Description: Distance Ratio
- Describe the cell shape using the ratio of the distances d1 and d2, sampled around the cell
- Capture the variation across cells with a principal components model
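A sketch of capturing shape variation with principal components: represent each cell as a vector of distance ratios sampled at a fixed set of angles, then run PCA (via SVD of the mean-centered data) on a population of such vectors. The population itself is synthetic here, and a new shape is generated from the leading component, which is how a principal components model supports generation.

```python
import numpy as np

rng = np.random.default_rng(5)
angles = np.linspace(0, 2 * np.pi, 36, endpoint=False)

# Hypothetical population of 50 cells: each described by a smooth
# distance-ratio curve over 36 angles, with per-cell variation and noise.
ratios = np.array([0.4 + 0.1 * rng.normal() * np.cos(angles)
                   + 0.02 * rng.normal(size=angles.size)
                   for _ in range(50)])

# PCA via SVD of the mean-centered data matrix.
centered = ratios - ratios.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
explained = S ** 2 / (S ** 2).sum()

# Generate a new shape descriptor from the first principal component.
new_ratio = ratios.mean(axis=0) + rng.normal() * S[0] / np.sqrt(50) * Vt[0]
```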
Slide 50: Generation
Slide 51: Small Objects: approximated by a 2D Gaussian distribution
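Approximating a small object as a 2D Gaussian means summarizing it by a total intensity, a center, and a covariance, and re-synthesizing it by evaluating the Gaussian on a pixel grid. The parameter values below are made up for illustration.

```python
import numpy as np

# Illustrative object parameters: center, covariance, total fluorescence.
center = np.array([8.0, 8.0])
cov = np.array([[4.0, 1.0],
                [1.0, 2.0]])
total_intensity = 100.0

# Evaluate the (unnormalized) Gaussian on a 17x17 pixel grid, then scale
# so the synthesized object carries the specified total intensity.
yy, xx = np.mgrid[0:17, 0:17]
pos = np.stack([xx, yy], axis=-1) - center
inv = np.linalg.inv(cov)
mahal = np.einsum('...i,ij,...j->...', pos, inv, pos)
obj = np.exp(-0.5 * mahal)
obj *= total_intensity / obj.sum()
```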
Slide 52: Object Positions (distances d1 and d2)
Slide 53: Positions
- Fit a logistic regression model of object positions
- Generation: each pixel has a weight according to the logistic model
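A sketch of generation from a logistic position model: give every pixel a weight from a logistic function of its positional features (here, crude stand-ins for the distances d1 and d2), then sample object positions in proportion to those weights. The coefficients and distance definitions are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
size = 32
yy, xx = np.mgrid[0:size, 0:size]

# Crude positional features: normalized distance from the image center
# (standing in for d1) and its complement (standing in for d2).
d1 = np.hypot(xx - size / 2, yy - size / 2) / size
d2 = 1.0 - d1

# Hypothetical logistic-model coefficients.
beta0, beta1, beta2 = -1.0, 3.0, 0.5
weights = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * d1 + beta2 * d2)))
p = (weights / weights.sum()).ravel()

# Sample 20 object positions according to the per-pixel weights.
idx = rng.choice(size * size, size=20, p=p)
positions = np.column_stack([idx % size, idx // size])
```

Placing Gaussian-approximated objects at the sampled positions inside a synthetic cell and nucleus yields a fully synthetic cell image, as on the next slide.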
Slide 54: Fully Synthetic Cell Image
Slide 55: Real vs. Synthetic. [Side-by-side comparison images.]
Slide 56: Conclusions and Future Work
- Object-based generative models are useful for communicating information about subcellular patterns
- Work continues!
Slide 57: Final Word
- The goal of automated image interpretation should not be quantitating intensity or colocalization, or simply making it easier for biologists to see what is happening
- The goal should be generalizable, verifiable, mechanistic models of cell organization and behavior, automatically derived from images