
1 Image Interpretation Methods for Protein Location in Cells Meel Velliste Murphy Lab Dept. of Biomedical Engineering Carnegie Mellon University Copyright © 2002

2 Introduction Image source http://www.biologie.uni-hamburg.de/b-online/library/bio201/cellfrlife.html

3 Introduction Sequence databases allow search by similarity Database GSNWLAMQLT yfbI Rv2560 fliR The same is true for protein structure databases

4 Introduction Sequence databases allow search by similarity Database ? ? ? The same is true for protein structure databases How about protein location?

5 Basic Idea in Sequence Comparison M A T N W G S L L Q M D T N P V S L L R 5 -1 3 2 -9 4 2 1 1 -3 Similarity Matrix 25.7
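The idea on this slide can be sketched in a few lines: walk two pre-aligned sequences position by position, look up each residue pair in a similarity matrix, and sum the scores. Splitting the slide's 20 letters into two 10-residue sequences is an assumption, and `TOY_MATRIX`, `match`, and `mismatch` are hypothetical values chosen for illustration, not the matrix behind the slide's numbers.

```python
# Hypothetical scores for a few non-identical residue pairs (illustration only).
TOY_MATRIX = {("Q", "R"): 1, ("G", "V"): 0}


def similarity_score(seq_a, seq_b, matrix, match=2, mismatch=-1):
    """Sum per-position scores for two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    total = 0
    for a, b in zip(seq_a, seq_b):
        if a == b:
            total += match
        else:
            # Look the pair up in either order; fall back to the mismatch score.
            total += matrix.get((a, b), matrix.get((b, a), mismatch))
    return total


score = similarity_score("MATNWGSLLQ", "MDTNPVSLLR", TOY_MATRIX)
```

A real search would use an established substitution matrix and allow gaps; the point here is only that a single number summarizes how similar two entries are.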

6 Location Info in Databases Unstructured text - most databases Standardized keywords - YPD Fluorescence microscope images - TRIPLES, YPL.db Numerical descriptors needed 7.84 -0.097 24 1 2.3 -31.03 -2

7 Subcellular Location Features (SLF) 49 Zernike Moments 13 Haralick Texture Features 22 Morphological Features - derived from morphological image processing: –Object finding –Edge finding –Convex Hulls

8 Morphological Features Area Distance from COF (center of fluorescence) Distance from DNA COF 894 89 102 252 23 12

9 Some Example Features –Number of Objects –Euler Number –Average Object Size –Standard Deviation of Object sizes –Ratio of the Largest to the Smallest Object Size –Average Distance of Objects from COF –Standard Deviation of Object Distances from COF –Ratio of the Largest to Smallest Object Distance
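As a rough illustration of how such features might be computed, the sketch below finds objects as connected groups of above-threshold pixels and derives six of the listed features. The 8-connectivity rule, the threshold, the use of an intensity-weighted centroid as the COF, and all function names are assumptions made for this example; the Euler number and the distance ratio are left out to keep it short.

```python
from collections import deque
from math import hypot
from statistics import mean, pstdev


def find_objects(image, threshold):
    """Return objects as lists of (row, col) pixels, using 8-connectivity."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] <= threshold or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:  # breadth-first flood fill of one object
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx]
                                and image[ny][nx] > threshold):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            objects.append(pixels)
    return objects


def center_of_fluorescence(image, pixels):
    """Intensity-weighted centroid (row, col) of the given pixels."""
    total = sum(image[y][x] for y, x in pixels)
    cy = sum(y * image[y][x] for y, x in pixels) / total
    cx = sum(x * image[y][x] for y, x in pixels) / total
    return cy, cx


def morphological_features(image, threshold=0):
    """Compute a few object-based features; assumes at least one object."""
    objects = find_objects(image, threshold)
    sizes = [len(obj) for obj in objects]
    cof = center_of_fluorescence(image, [p for obj in objects for p in obj])
    centers = [center_of_fluorescence(image, obj) for obj in objects]
    dists = [hypot(cy - cof[0], cx - cof[1]) for cy, cx in centers]
    return {
        "num_objects": len(objects),
        "avg_size": mean(sizes),
        "std_size": pstdev(sizes),
        "size_ratio": max(sizes) / min(sizes),
        "avg_dist": mean(dists),
        "std_dist": pstdev(dists),
    }


image = [
    [5, 5, 0, 0, 0],
    [5, 5, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 3],
    [0, 0, 0, 0, 3],
]
features = morphological_features(image)
```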

10 DNA Features –The average object distance from the COF of the DNA image –The variance of object distances from the DNA COF –The ratio of the largest to the smallest object to DNA COF distance –The distance between the protein COF and the DNA COF –The ratio of the area occupied by protein to that occupied by DNA –The fraction of the protein fluorescence that co-localizes with DNA
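The last three DNA features on this slide can be sketched directly from a protein image and a parallel DNA image of the same cell. The threshold used to decide which pixels count as "occupied", and the function names `cof` and `dna_features`, are assumptions for this example rather than the definitions used in the original work.

```python
from math import hypot


def cof(image):
    """Intensity-weighted center of fluorescence (row, col) of a 2D image."""
    total = sum(sum(row) for row in image)
    cy = sum(y * v for y, row in enumerate(image) for v in row) / total
    cx = sum(x * v for row in image for x, v in enumerate(row)) / total
    return cy, cx


def dna_features(protein, dna, threshold=0):
    """Relate a protein image to the DNA image of the same field."""
    p_cof, d_cof = cof(protein), cof(dna)
    p_area = sum(v > threshold for row in protein for v in row)
    d_area = sum(v > threshold for row in dna for v in row)
    # Protein fluorescence located on pixels where DNA is above threshold.
    overlap = sum(p for p_row, d_row in zip(protein, dna)
                  for p, d in zip(p_row, d_row) if d > threshold)
    total = sum(sum(row) for row in protein)
    return {
        "cof_distance": hypot(p_cof[0] - d_cof[0], p_cof[1] - d_cof[1]),
        "area_ratio": p_area / d_area,
        "coloc_fraction": overlap / total,
    }


protein = [[0, 0, 0], [0, 4, 4], [0, 0, 0]]
dna = [[0, 0, 0], [2, 2, 0], [0, 0, 0]]
result = dna_features(protein, dna)
```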

11 Ten Major Classes of Protein Location

12 Classification Numerical Features computed from each image
          feature1  feature2  ...  featureN
  Image1  0.3489    0.1294    ...  1.9012
  Image2  0.4985    0.4823    ...  1.8390
  ...
  ImageM  1.8245    0.8290    ...  0.9018
Artificial Neural Network classifies the image: "This is a Microtubule pattern" 83% Accuracy achieved

13 Goals Implement new 2D features and improve Haralick texture features Test performance on mixtures of more than one cell type and more than one microscopy source Extend features to 3D Develop Object-level classification

14 New Features (SLF7) Skeleton Features: –Length of skeleton –Number of branch points –Fraction of object area taken up by skeleton Fraction of fluorescence below threshold

15 Haralick Texture Features Based on gray-level co-occurrence If image has G gray-levels: –Compute G x G co-occurrence probability matrix P(i, j) –Compute features by summing and differencing the matrix Features highly dependent on: –Number of gray-levels –Pixel resolution
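A minimal sketch of the co-occurrence computation described on this slide: count how often gray level i sits next to gray level j at a fixed pixel offset, normalize the counts into P(i, j), and summarize the matrix with sums. Only two of the 13 Haralick features are shown (angular second moment and contrast), and the single horizontal offset and the symmetric counting are assumptions made for illustration.

```python
def cooccurrence_matrix(image, levels, offset=(0, 1)):
    """Symmetric, normalized G x G co-occurrence matrix for one pixel offset."""
    dy, dx = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                i, j = image[y][x], image[ny][nx]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]


def angular_second_moment(p):
    """Sum of squared probabilities; high for uniform textures."""
    return sum(v * v for row in p for v in row)


def contrast(p):
    """Probability-weighted squared gray-level difference."""
    return sum((i - j) ** 2 * v
               for i, row in enumerate(p) for j, v in enumerate(row))


image = [[0, 0, 1],
         [0, 1, 1]]
p = cooccurrence_matrix(image, levels=2)
asm = angular_second_moment(p)
con = contrast(p)
```

Because the matrix has G x G entries and the offset is measured in pixels, both the number of gray levels and the pixel size change the resulting values, which is the dependence the slide points out.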

16 Percent Benefit of Texture Features Baseline accuracy = 86.4%

17 Solution Always down-sample and re-quantize to: –1.15 um/pixel –256 gray-levels Resolution-independent robust classification possible

18 Original Image 256 Gray-levels, 0.23 um/pixel

19 Down-sampled 256 Gray-levels, 1.15 um/pixel
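Going from 0.23 um/pixel to 1.15 um/pixel is a 5x reduction in each dimension. The slides do not say how the down-sampling or re-quantization was done, so the sketch below makes two assumptions: non-overlapping block averaging for the down-sampling, and a linear min-to-max stretch for the re-quantization.

```python
def downsample(image, factor):
    """Average non-overlapping factor x factor blocks (edge remainder dropped)."""
    rows, cols = len(image) // factor, len(image[0]) // factor
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            block = [image[r * factor + dy][c * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


def requantize(image, levels=256):
    """Linearly rescale intensities to integers in 0..levels-1."""
    lo = min(map(min, image))
    hi = max(map(max, image))
    if hi == lo:  # flat image: map everything to level 0
        return [[0] * len(row) for row in image]
    scale = (levels - 1) / (hi - lo)
    return [[round((v - lo) * scale) for v in row] for row in image]


factor = round(1.15 / 0.23)  # target pixel size / original pixel size
# Synthetic 10 x 10 test image made of four flat 5 x 5 quadrants.
image = [[10 * (2 * (r // 5) + (c // 5)) for c in range(10)] for r in range(10)]
small = requantize(downsample(image, factor))
```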

20 Classification Results with SLF8 Overall accuracy = 88%

21 Classification of Images from Mixed Sources Overall accuracy = 92%
Confusion matrix (%; axes labeled True Class and Predicted Class):
          Tubul  Lyso  Golgi  DNA
  Tubul   97     1     0      2
  Lyso    2      89    8      1
  Golgi   2      3     95     1
  DNA     7      3     1      88
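The overall accuracy on this slide is consistent with the mean of the diagonal of the confusion matrix. The sketch below assumes the slide's digits read as a 4 x 4 matrix of percentages in the class order Tubul, Lyso, Golgi, DNA, and that the classes are weighted equally; neither assumption is stated on the slide.

```python
labels = ["Tubul", "Lyso", "Golgi", "DNA"]
# Percentages as read from the slide (assumed segmentation of the digits).
confusion = [
    [97, 1, 0, 2],
    [2, 89, 8, 1],
    [2, 3, 95, 1],
    [7, 3, 1, 88],
]

# Per-class accuracy is the diagonal entry for each class.
per_class = {name: confusion[i][i] for i, name in enumerate(labels)}
# Equal-weight mean over classes.
overall = sum(per_class.values()) / len(labels)
```

With these values `overall` is 92.25, which rounds to the 92% quoted on the slide.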

22 Extending to 3D Results for 2-D images can be dependent on the z-position of the slice BOTTOM TOP

23 Extending to 3D Features sensitive to 3D distribution will be needed for polarized cells (e.g. epithelial cells) Proteins may distribute differently to the basolateral and apical surfaces

24 Actin (Microfilament)

25 Tubulin (Microtubule)

26 Mitochondrial

27 Endoplasmic Reticulum (ER)

28 TfR (Endosomal)

29 LAMP2 (Lysosomal)

30 Giantin (Golgi)

31 gpp130 (Golgi)

32 Nucleolin (Nucleolar)

33 DNA (Nuclear)

34 Total-Protein (Cytoplasmic)

35 Features for 3-D Images Used a subset of the same Morphological features as used with 2-D patterns: –Number of Objects –Euler Number –Average Object Size –Standard Deviation of Object sizes –Ratio of the Largest to the Smallest Object Size –Average Distance of Objects from COF –Standard Deviation of Object Distances from COF –Ratio of the Largest to Smallest Object Distance

36 Separating Components of Distance Features Can separate out Horizontal and Vertical components of distance –2D euclidean for x and y –Signed 1D distance for z Some morphological features involve measures of distance –e.g., Average distance of objects from the COF of DNA
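The split described on this slide can be sketched as follows: the horizontal component is the 2D Euclidean distance in x and y, and the vertical component is the signed difference in z. The coordinate order (x, y, z) and the sign convention (positive when the object lies above the reference point) are assumptions for this example.

```python
from math import hypot


def distance_components(obj_cof, ref_cof):
    """Split the offset between two 3D points into horizontal and vertical parts."""
    dx = obj_cof[0] - ref_cof[0]
    dy = obj_cof[1] - ref_cof[1]
    dz = obj_cof[2] - ref_cof[2]
    horizontal = hypot(dx, dy)  # unsigned 2D Euclidean distance in the x-y plane
    vertical = dz               # signed 1D distance along z
    return horizontal, vertical


# One object centered below and to the side of a reference COF at the origin.
h, v = distance_components((3.0, 4.0, -2.0), (0.0, 0.0, 0.0))
```

Keeping the sign on z means a feature can tell apart objects above and below the reference point, which is relevant to the apical versus basolateral distinction raised earlier in the talk.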

37 Classification with 3D-SLF9 Features 10 classes, Overall accuracy = 91%

38 Classification with 3D-SLF9 Features 11 classes, Overall accuracy = 91%

39 …with 9 Selected 3D-SLF9 Features 11 classes, Overall accuracy = 94%

40 2D Classification with 14 SLF2 Features 11 classes, Overall accuracy = 88%

41 Classification of Sets of 3D Images Set size 9, Overall accuracy = 99.7%
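The slide does not say how a set of images is assigned a single class. One common approach, assumed here purely for illustration, is a plurality vote over the per-image predictions; `classify_set` is a hypothetical helper, not a function from the original work.

```python
from collections import Counter


def classify_set(predictions):
    """Return the most frequent per-image label (ties: first label seen wins)."""
    return Counter(predictions).most_common(1)[0][0]


# A set of 9 images in which the single-image classifier made 3 errors.
per_image = ["Golgi"] * 6 + ["Lyso"] * 3
set_label = classify_set(per_image)
```

Under a rule like this, occasional single-image errors are outvoted, which is one way set-level accuracy can exceed single-image accuracy.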

42 Conclusions For accurate determination of subcellular location: –High resolution microscopy is essential –3D images have an advantage over 2D images –SDA can achieve severely sub-optimal results

43 Conclusions Protein Subcellular Location Patterns can be represented as Numerical Vectors Can be computed from either 2D or 3D images Features are robust to different microscopy methods or cell types

44 Feature Extractor 38.1 Quantitative comparison of location is possible 7.84 -0.097 24 1 2.3 -31.03 -2 2.19 +0.271 98 8 0.9 -11.21 0 Roques and Murphy (2002)

45 Conclusions Protein databases can be searched by similarity of location Database crp21 froX CAP-9

46 Conclusions Automated interpretation of location patterns is possible: –Automated classification of location patterns (Boland and Murphy, 2001; Murphy et al. 2001; Velliste and Murphy, 2002) –Automated choice of representative images (Markey et al. 1999) –Rigorous statistical comparison of imaging experiments (Roques and Murphy, 2002) –Building a “family tree” of protein location

47 Acknowledgements Robert F. Murphy - for being a great thesis advisor Michael V. Boland - founding work on 2D Subcellular Location Features Simon Watkins and the staff at the Center for Biologic Imaging at UPitt - providing the facilities for and assisting with microscopy Aaron C. Rising - help with collecting 3D images Gregory Porreca - improving Haralick features and classifying mixed image sets

