Slide 1: Computational Biology, Part 28: Automated Interpretation of Subcellular Patterns in Microscope Images III. Robert F. Murphy. Copyright 1996, 1999, 2000-2006. All rights reserved.
Slide 2: Results
Slide 3: Supervised Learning of Patterns
1. Create sets of images showing the location of many different proteins (each set defines one class of pattern)
2. Reduce each image to a set of numerical values ("features") that are insensitive to the position and rotation of the cell
3. Use statistical classification methods to "learn" how to distinguish each class using the features
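The three-step recipe above can be sketched end to end. Everything below is illustrative: the features are synthetic random vectors (real SLF feature vectors are much higher-dimensional), and a minimal nearest-centroid rule stands in for the statistical classifiers actually used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1-2 stand-in: 3 pattern classes, 20 cells each, 5 features per cell.
n_classes, n_cells, n_features = 3, 20, 5
centers = rng.normal(0, 5, size=(n_classes, n_features))
X = np.vstack([centers[c] + rng.normal(0, 1, (n_cells, n_features))
               for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), n_cells)

# Step 3: split into training/test sets, "learn" each class (here as its
# training-set centroid), then evaluate on held-out cells.
idx = rng.permutation(len(X))
train, test = idx[:45], idx[45:]
centroids = np.array([X[train][y[train] == c].mean(axis=0)
                      for c in range(n_classes)])

def classify(x):
    # Assign to the class with the nearest centroid in feature space.
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

accuracy = np.mean([classify(X[i]) == y[i] for i in test])
```

With well-separated synthetic classes the toy classifier is near-perfect; the point is only the shape of the pipeline, not the numbers.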
Slide 4: 2D Images of 10 Patterns (HeLa). [Image panel labels: ER, Tubulin, DNA, TfR, Actin, Nucleolin, Mito, LAMP, gpp130, giantin.] Boland & Murphy 2001
Slide 5: Evaluating Classifiers
- Divide ~100 images for each class into a training set and a test set
- Use the training set to determine rules for the classes
- Use the test set to evaluate performance
- Repeat with different divisions into training and test sets
- Evaluate different sets of features chosen as most discriminative by feature selection methods
- Evaluate different classifiers
Slide 6: 2D Classification Results (overall accuracy = 92%)
True class (rows) vs. classifier output (columns):

       DNA  ER  Gia  Gpp  Lam  Mit  Nuc  Act  TfR  Tub
DNA     99   1    0    0    0    0    0    0    0    0
ER       0  97    0    0    0    2    0    0    0    1
Gia      0   0   91    7    0    0    0    0    2    0
Gpp      0   0   14   82    0    0    2    0    1    0
Lam      0   0    1    0   88    1    0    0   10    0
Mit      0   3    0    0    0   92    0    0    3    3
Nuc      0   0    0    0    0    0   99    0    1    0
Act      0   0    0    0    0    0    0  100    0    0
TfR      0   1    0    0   12    2    0    1   81    2
Tub      1   2    0    0    0    1    0    0    1   95

Murphy et al 2000; Boland & Murphy 2001; Huang & Murphy 2004
Slide 7: Human Classification Results (overall accuracy = 83%)
True class (rows) vs. human output (columns):

       DNA  ER  Gia  Gpp  Lam  Mit  Nuc  Act  TfR  Tub
DNA    100   0    0    0    0    0    0    0    0    0
ER       0  90    0    0    3    6    0    0    0    0
Gia      0   0   56   36    3    3    0    0    0    0
Gpp      0   0   54   33    0    0    0    0    3    0
Lam      0   0    6    0   73    0    0    0   20    0
Mit      0   3    0    0    0   96    0    0    0    3
Nuc      0   0    0    0    0    0  100    0    0    0
Act      0   0    0    0    0    0    0  100    0    0
TfR      0  13    0    0    3    0    0    0   83    0
Tub      0   3    0    0    0    0    0    3    0   93

Murphy et al 2003
Slide 8: Computer vs. Human
Slide 9: 3D HeLa Cell Images. [Image panel labels: Giantin, Nuclear, ER, Lysosomal, gpp130, Actin, Mitoch., Nucleolar, Tubulin, Endosomal.] Images collected using facilities at the Center for Biologic Imaging courtesy of Simon Watkins. Velliste & Murphy 2002
Slide 10: 3D Classification Results (overall accuracy = 98%)
True class (rows) vs. classifier output (columns):

       DNA  ER  Gia  Gpp  Lam  Mit  Nuc  Act  TfR  Tub
DNA     98   2    0    0    0    0    0    0    0    0
ER       0 100    0    0    0    0    0    0    0    0
Gia      0   0  100    0    0    0    0    0    0    0
Gpp      0   0    0   96    4    0    0    0    0    0
Lam      0   0    0    4   95    0    0    0    0    2
Mit      0   0    2    0    0   96    0    2    0    0
Nuc      0   0    0    0    0    0  100    0    0    0
Act      0   0    0    0    0    0    0  100    0    0
TfR      0   0    0    0    2    0    0    0   96    2
Tub      0   2    0    0    0    0    0    0    0   98

Velliste & Murphy 2002; Chen & Murphy 2004
Slide 11: Unsupervised Learning to Identify High-Resolution Protein Patterns
Slide 12: Location Proteomics
- Tag many proteins. We have used CD-tagging (developed by Jonathan Jarvik and Peter Berget): infect a population of cells with a retrovirus carrying a DNA sequence that will "tag" a random gene in each cell
- Isolate separate clones, each of which expresses one tagged protein
- Use RT-PCR to identify the tagged gene in each clone
- Collect many live-cell images for each clone using spinning disk confocal fluorescence microscopy
Jarvik et al 2002
Slide 13: What Now? Group ~90 tagged clones by pattern
Slide 14: Solution: Group Them Automatically
- How? SLF features can be used to measure the similarity of protein patterns
- This allows us for the first time to create a systematic, objective framework for describing subcellular locations: a Subcellular Location Tree
- Start by grouping the two proteins whose patterns are most similar, then keep adding branches for less and less similar patterns
Chen et al 2003; Chen and Murphy 2005
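The grouping rule just described (merge the most similar pair, then keep adding branches for less similar patterns) is agglomerative hierarchical clustering. A minimal sketch, with made-up clone names and two-dimensional stand-ins for SLF feature vectors:

```python
import numpy as np

# Hypothetical per-clone feature vectors (real trees use SLF features).
features = {
    "cloneA": np.array([0.10, 0.20]),
    "cloneB": np.array([0.12, 0.19]),
    "cloneC": np.array([0.90, 0.80]),
    "cloneD": np.array([0.85, 0.82]),
}

# Each cluster holds a mean feature vector; merge the closest pair until
# one tree remains, recording structure in nested parentheses.
clusters = {name: vec for name, vec in features.items()}
while len(clusters) > 1:
    names = list(clusters)
    a, b = min(((p, q) for i, p in enumerate(names) for q in names[i + 1:]),
               key=lambda pq: np.linalg.norm(clusters[pq[0]] - clusters[pq[1]]))
    merged = (clusters[a] + clusters[b]) / 2
    del clusters[a], clusters[b]
    clusters[f"({a},{b})"] = merged

tree = next(iter(clusters))
```

Here cloneA/cloneB and cloneC/cloneD pair up first, so `tree` comes out as `((cloneA,cloneB),(cloneC,cloneD))`, a toy Subcellular Location Tree.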
Slide 15: [Subcellular Location Tree figure: protein names with human descriptions from databases.] http://murphylab.web.cmu.edu/services/PSLID/tree.html
Slide 17: Nucleolar Proteins
Slide 18: Punctate Nuclear Proteins
Slide 19: Predominantly Nuclear Proteins with Some Punctate Cytoplasmic Staining
Slide 20: Nuclear and Cytoplasmic Proteins with Some Punctate Staining
Slide 21: Uniform
Slide 22: Top: automated grouping and assignment. Bottom: visual assignment to "known" locations. [Tree figure with protein names.] http://murphylab.web.cmu.edu/services/PSLID/tree.html
Slide 23: Refining Clusters Using Temporal Textures
Slide 24: Incorporating Temporal Information
- Time series images could be useful for:
  - distinguishing proteins that are not distinguishable in static images
  - analyzing protein movement in the presence of drugs, or during different stages of the cell cycle
- We need an approach that does not require a detailed understanding of the objects/organelles in which each protein is located
- A generic object-tracking approach? Not all proteins are in discernible objects
- A non-tracking approach is needed
Slide 25: Texture Features
- Haralick texture features describe the correlation in intensity of pixels that are next to each other in space. These have been valuable for classifying static patterns.
- Temporal texture features describe the correlation in intensity of pixels in the same position in images next to each other over time.
Slide 26: Temporal Textures Based on a Co-occurrence Matrix
- Temporal co-occurrence matrix P: an N-level by N-level matrix whose element P[i, j] is the probability that a pixel with value i has value j in the next image (time point)
- Thirteen statistics calculated on P are used as features
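The temporal co-occurrence matrix defined above is straightforward to compute: count how often each gray level at one time point becomes each gray level at the next, then normalize. A small sketch with synthetic 4-level frames:

```python
import numpy as np

def temporal_cooccurrence(frame_t0, frame_t1, n_levels):
    """P[i, j] = probability that a pixel with value i at t0 has value j at t1."""
    P = np.zeros((n_levels, n_levels))
    for i, j in zip(frame_t0.ravel(), frame_t1.ravel()):
        P[i, j] += 1
    return P / P.sum()

# Synthetic example frames with 4 gray levels (values 0..3).
rng = np.random.default_rng(1)
frame_t0 = rng.integers(0, 4, size=(8, 8))

# An unchanging image puts all probability on the diagonal.
P_static = temporal_cooccurrence(frame_t0, frame_t0, 4)

# A changing image spreads probability off the diagonal.
frame_t1 = rng.integers(0, 4, size=(8, 8))
P_moving = temporal_cooccurrence(frame_t0, frame_t1, 4)

off_diag_static = P_static.sum() - np.trace(P_static)
off_diag_moving = P_moving.sum() - np.trace(P_moving)
```

This reproduces the behavior illustrated on the following two slides: a diagonal matrix for a static image, off-diagonal mass for a changing one.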
Slide 27: Temporal co-occurrence matrix for an image that does not change between t0 and t1 (all counts fall on the diagonal; rows are values at t0, columns are values at t1):

      4  3  2  1
  4   7  0  0  0
  3   0  6  0  0
  2   0  0  9  0
  1   0  0  0  3
Slide 28: Temporal co-occurrence matrix for an image that changes between t0 and t1 (counts spread off the diagonal):

      4  3  2  1
  4   1  3  3  0
  3   1  0  5  0
  2   5  1  1  2
  1   0  2  0  1
Slide 29: Implementation of Temporal Texture Features
- Compare image pairs at different time intervals and compute 13 temporal texture features for each pair
- Using the average and variance of the features for each time interval yields 13 × 5 × 2 = 130 features
- T = 0 s, 45 s, 90 s, 135 s, 180 s, 225 s, 270 s, 315 s, 360 s, 405 s, ...
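The interval-and-aggregate scheme can be sketched as follows. Three illustrative co-occurrence statistics (energy, contrast, entropy) stand in for the full set of thirteen Haralick-style features, and the movie is random noise; with 3 statistics, 5 intervals, and mean plus variance, the sketch produces 3 × 5 × 2 = 30 features, mirroring the 13 × 5 × 2 = 130 on the slide.

```python
import numpy as np

def cooccurrence(f0, f1, n_levels=4):
    # Normalized temporal co-occurrence matrix of two frames.
    P = np.zeros((n_levels, n_levels))
    np.add.at(P, (f0.ravel(), f1.ravel()), 1)
    return P / P.sum()

def matrix_stats(P):
    # Three of the thirteen statistics, as an illustration.
    i, j = np.indices(P.shape)
    energy = (P ** 2).sum()
    contrast = ((i - j) ** 2 * P).sum()
    entropy = -(P[P > 0] * np.log(P[P > 0])).sum()
    return np.array([energy, contrast, entropy])

# A short hypothetical movie: 10 frames, nominally 45 s apart.
rng = np.random.default_rng(2)
movie = rng.integers(0, 4, size=(10, 16, 16))

# Compare frame pairs at several multiples of the base interval, then keep
# the mean and variance of each statistic per interval.
intervals = [1, 2, 3, 4, 5]
features = []
for lag in intervals:
    stats = np.array([matrix_stats(cooccurrence(movie[t], movie[t + lag]))
                      for t in range(len(movie) - lag)])
    features.extend(stats.mean(axis=0))
    features.extend(stats.var(axis=0))
features = np.array(features)
```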
Slide 30: Test: evaluate the ability of temporal textures to improve discrimination of similar protein patterns
Slide 31: Results for Temporal Texture and Static Features (average accuracy = 85.1%)
True class (rows) vs. classifier output (columns):

         Dia1  Sdpr  Atp5a1  Adfp  Timm23
Dia1       50    35       5     0      10
Sdpr        9    87       0     4       0
Atp5a1      0     5      95     0       0
Adfp        2     0       2    92       4
Timm23      0     5       0     8      88
Slide 32: Conclusion
- The addition of temporal texture features improves the classification accuracy of protein location patterns
Slide 33: Generative Models of Subcellular Patterns
Slide 34: Decomposing Mixture Patterns
- Clustering or classifying whole-cell patterns will treat each combination of two or more "basic" patterns as a unique new pattern
- It is desirable to have a way to decompose mixtures instead
- One approach is to assume that each basic pattern has a recognizable combination of different types of objects
Slide 35: Object-Based Subcellular Pattern Models
Goals:
- to recognize "pure" patterns using only objects
- to recognize and unmix patterns consisting of two or more "pure" patterns
- to enable the building of generative models that can synthesize patterns from objects, as needed for systems biology
Slide 36: Object Type Determination
- Rather than specifying object types, we chose to learn them from the data
- Use a subset of SLFs to describe objects
- Perform k-means clustering for k from 2 to 40
- Evaluate the goodness of each clustering using the Akaike Information Criterion (AIC)
- Choose the k that gives the lowest AIC
Zhao et al 2005
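A minimal sketch of the scan-k-and-pick-by-AIC procedure. The object features are synthetic (three well-separated groups), the k-means implementation uses deterministic farthest-point initialization for reproducibility, and the AIC form assumed here (RSS + 2·k·d, i.e. unit-variance spherical Gaussians with k·d center parameters) is an illustration, not necessarily the exact criterion used in the paper; the scan also covers only k = 2..7 rather than 2..40.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    # Farthest-point initialization, then Lloyd iterations.
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min(((X[:, None] - np.array(centers)) ** 2).sum(-1), axis=1)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    rss = ((X - centers[labels]) ** 2).sum()
    return labels, rss

# Hypothetical object features: 3 well-separated groups in 2-D.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(m, 0.1, (30, 2)) for m in ([0, 0], [3, 3], [0, 3])])

n, d = X.shape
aic = {}
for k in range(2, 8):
    _, rss = kmeans(X, k)
    aic[k] = rss + 2 * k * d   # assumed AIC form (see lead-in)

best_k = min(aic, key=aic.get)
```

On this synthetic data the penalty term outweighs the small RSS gain from oversplitting, so the scan recovers the true number of groups.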
Slide 37: Unmixing: Learning Strategy
- Once the object types are known, each cell in the training (pure) set can be represented as a vector of the amount of fluorescence in each object type
- Learn a probability model for these vectors for each class
- Mixed images can then be represented as mixture fractions times the probability distribution of objects for each class
Slide 38: [Example images: pure Golgi pattern, pure lysosomal pattern, and a 50% mix of each.]
Slide 39: Two-Stage Strategy for Unmixing an Unknown Image
- Find objects in the unknown (test) image and classify each object into one of the object types, using an object type classifier built with all objects from the training images
- For each test image, make a list of how often each object type is found
- Find the fractions of each class that give the "best" match to this list
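The "best match" step can be illustrated with a least-squares estimate of the mixture fractions. The per-class object-type distributions below are invented, and plain least squares with clipping and renormalization is a simple stand-in for whatever fitting procedure the paper actually uses.

```python
import numpy as np

# Hypothetical learned object-type distributions for three "pure" classes
# (rows: classes; columns: fraction of fluorescence in each of 5 object types).
pure = np.array([
    [0.70, 0.20, 0.10, 0.00, 0.00],   # e.g. a Golgi-like pattern
    [0.05, 0.10, 0.15, 0.50, 0.20],   # e.g. a lysosome-like pattern
    [0.25, 0.25, 0.25, 0.15, 0.10],
])

# A mixed image's object-type histogram is modeled as the mixture
# fractions times the per-class distributions.
true_fractions = np.array([0.5, 0.3, 0.2])
observed = true_fractions @ pure

# Least-squares estimate of the fractions, clipped to be non-negative
# and renormalized to sum to one.
est, *_ = np.linalg.lstsq(pure.T, observed, rcond=None)
est = np.clip(est, 0, None)
est = est / est.sum()
```

Because the synthetic observation is noise-free and the class distributions are linearly independent, the estimate recovers the true fractions exactly.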
Slide 40: Test of Unmixing
- Use the 2D HeLa data
- Generate random mixture fractions for the eight major patterns (summing to 1)
- Use these to synthesize "images" corresponding to the mixtures
- Try to estimate the mixture fractions from the synthesized images
- Compare to the true mixture fractions
Slide 41: Results
- Given 5 synthesized "cell images" with any mixture of the 8 basic patterns, the average accuracy of estimating the mixture coefficients is 83%
Zhao et al 2005
Slide 42: Overview. [Pipeline diagram: real images → object detection → objects → object type assignment → object type modeling → statistical models of object types.]
Slide 43: Generating Images. [Pipeline diagram: as on the previous slide, plus sampling from the statistical models → generated images.]
Slide 44: Generating Objects and Images. [Pipeline diagram: as on the previous slide, with object morphology modeling added before sampling → generated images.]
Slide 45: LAMP2 Pattern. [Image labels: nucleus, cell membrane, protein.]
Slide 46: Nuclear Shape: Medial Axis Model
- Rotate the nucleus to a standard orientation and find its medial axis
- The shape is represented by two curves: the medial axis itself, and the width along the medial axis
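The two-curve representation can be turned into a shape generator: sample a medial-axis curve and a width curve, then offset the axis by half the width on each side to get an outline. The specific curve families below (sine-shaped axis and width with random amplitudes) are illustrative assumptions, not the model fitted in the paper.

```python
import numpy as np

rng = np.random.default_rng(4)

x = np.linspace(0, 1, 100)
# Medial axis: a gently curved line; width: positive, tapering at the ends.
axis_y = 0.1 * rng.normal() * np.sin(np.pi * x)
width = (0.3 + 0.1 * rng.normal()) * np.sin(np.pi * x) + 0.05

# The outline is the medial axis offset by half the width on each side,
# traversed out along the top and back along the bottom.
top = axis_y + width / 2
bottom = axis_y - width / 2
outline = np.concatenate([np.column_stack([x, top]),
                          np.column_stack([x[::-1], bottom[::-1]])])
```

Drawing new random amplitudes yields new synthetic nuclear shapes, as shown on the next slide.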
Slide 47: Synthetic Nuclear Shapes
Slide 48: With Added Nuclear Texture
Slide 49: Cell Shape Description: Distance Ratio
- Describe the cell shape using the ratio of the distances d1 and d2, sampled around the cell
- Capture the variation across cells with a principal components model
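A sketch of capturing shape variation with principal components: represent each cell as a vector of distance ratios sampled at a fixed set of angles, then run PCA (via SVD of the mean-centered data) on a population of such vectors. The population itself is synthetic here, and a new shape is generated from the leading component, which is how a principal components model supports generation.

```python
import numpy as np

rng = np.random.default_rng(5)
angles = np.linspace(0, 2 * np.pi, 36, endpoint=False)

# Hypothetical population of 50 cells: each described by a smooth
# distance-ratio curve over 36 angles, with per-cell variation and noise.
ratios = np.array([0.4 + 0.1 * rng.normal() * np.cos(angles)
                   + 0.02 * rng.normal(size=angles.size)
                   for _ in range(50)])

# PCA via SVD of the mean-centered data matrix.
centered = ratios - ratios.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
explained = S ** 2 / (S ** 2).sum()

# Generate a new shape descriptor from the first principal component.
new_ratio = ratios.mean(axis=0) + rng.normal() * S[0] / np.sqrt(50) * Vt[0]
```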
Slide 50: Generation
Slide 51: Small Objects: approximated by a 2D Gaussian distribution
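Approximating a small object as a 2D Gaussian means summarizing it by a total intensity, a center, and a covariance, and re-synthesizing it by evaluating the Gaussian on a pixel grid. The parameter values below are made up for illustration.

```python
import numpy as np

# Illustrative object parameters: center, covariance, total fluorescence.
center = np.array([8.0, 8.0])
cov = np.array([[4.0, 1.0],
                [1.0, 2.0]])
total_intensity = 100.0

# Evaluate the (unnormalized) Gaussian on a 17x17 pixel grid, then scale
# so the synthesized object carries the specified total intensity.
yy, xx = np.mgrid[0:17, 0:17]
pos = np.stack([xx, yy], axis=-1) - center
inv = np.linalg.inv(cov)
mahal = np.einsum('...i,ij,...j->...', pos, inv, pos)
obj = np.exp(-0.5 * mahal)
obj *= total_intensity / obj.sum()
```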
Slide 52: Object Positions (distances d1 and d2)
Slide 53: Positions
- Fit a logistic regression model of object positions
- Generation: each pixel has a weight according to the logistic model
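A sketch of generation from a logistic position model: give every pixel a weight from a logistic function of its positional features (here, crude stand-ins for the distances d1 and d2), then sample object positions in proportion to those weights. The coefficients and distance definitions are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
size = 32
yy, xx = np.mgrid[0:size, 0:size]

# Crude positional features: normalized distance from the image center
# (standing in for d1) and its complement (standing in for d2).
d1 = np.hypot(xx - size / 2, yy - size / 2) / size
d2 = 1.0 - d1

# Hypothetical logistic-model coefficients.
beta0, beta1, beta2 = -1.0, 3.0, 0.5
weights = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * d1 + beta2 * d2)))
p = (weights / weights.sum()).ravel()

# Sample 20 object positions according to the per-pixel weights.
idx = rng.choice(size * size, size=20, p=p)
positions = np.column_stack([idx % size, idx // size])
```

Placing Gaussian-approximated objects at the sampled positions inside a synthetic cell and nucleus yields a fully synthetic cell image, as on the next slide.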
Slide 54: Fully Synthetic Cell Image
Slide 55: Real vs. Synthetic. [Side-by-side comparison images.]
Slide 56: Conclusions and Future Work
- Object-based generative models are useful for communicating information about subcellular patterns
- Work continues!
Slide 57: Final Word
- The goal of automated image interpretation should not be quantitating intensity or colocalization, or simply making it easier for biologists to see what is happening
- The goal should be generalizable, verifiable, mechanistic models of cell organization and behavior, automatically derived from images