Published by Kory Garrett. Modified over 9 years ago.
Predictive clustering relates gene annotations to phenotype properties extracted from images

Dragi Kocev, Bernard Ženko, Petra Paul, Coenraad Kuijl, Jacques Neefjes, and Sašo Džeroski

Jožef Stefan Institute, Dept. of Knowledge Technologies, Ljubljana, Slovenia
{dragi.kocev, bernard.zenko, saso.dzeroski}@ijs.si
Division of Cell Biology and Centre for Biomedical Genetics, NKI, Netherlands
{p.paul, c.kuijl, j.neefjes}@nki.nl

Introduction

Grouping genes with similar phenotypes upon siRNA-mediated downregulation
Phenotypes are described by features extracted from images using free general-purpose or custom-made software (e.g., CellProfiler)
siRNA screen designed to study MHC Class II antigen presentation
  − A major regulatory process in the immune system
  − Controls most aspects of the adaptive immune response
  − Strongly linked to almost all autoimmune diseases

Data description

siRNA screen performed on 269 genes, of which 20 were hypothetical
Each gene is described by:
  − its annotation with terms from the Gene Ontology
  − resulting phenotypes (images from confocal microscopy)
Only the GO terms that annotate at least one of the analysed genes are used (334 in total)
CellProfiler is used to extract features from the images: 700 features in total, of which the 13 most relevant to the study are used

Gene id   GO0000166  GO0000267  ...  GO0001883  Feature_1  Feature_2  ...  Feature_t
9625      1          0          ...  1          2.991      -2.296     ...  0.68
21        1          1          ...  1          -0.013     -1.048     ...  0.767
84519     0          0          ...  0          -0.843     0.155      ...  1.396
...
23621     0          1          ...  0          -0.531     0.843      ...  2.304

Methodology

Predictive clustering trees
Induced by the standard TDIDT algorithm
Able to make a prediction for a given structured output
Heuristic score: minimization of intra-cluster variance
Requires the definition of a distance and a prototype function for the given output
The medoid is taken as the prototype in each leaf
Cosine similarity is used as the distance measure

Ensembles of PCTs and rule ensembles
Ensembles are able to lift the predictive performance of a single predictive clustering tree
Random forests are efficient to learn
Ensembles are not interpretable
  − conversion to fitted rule ensembles
Each tree from the ensemble is converted to a set of rules
Optimization techniques are used to select the rule set with the best performance
The resulting rules are easy to interpret by domain experts

[Figure: training set → n bootstrap replicates → (randomized) decision tree algorithm → n classifiers → vote → ensemble prediction; trees are converted to rules, followed by optimization and regularization, giving a set of rules]

Results

Clusters: C1 C2 C3 C4 C5 C6 C7 C8
Mean Cells Intensity StdIntensity GFP: -1.66, 1.03, -0.77, -0.93, -1.31, -1.66, 0.48, -1.6
Mean Cells Texture AngularSecondMoment GFP 50: -1.41, -1.48, 0.54, 0.22, 0.88, 0.48, -0.75, -0.45
Mean Cytoplasm Intensity IntegratedIntensityE GFP: -1.24, 0.25, -0.23, 0.54, 0.2, 0.07, 2.34
Mean Cytoplasm Texture InfoMeas1 GFP 50: 1.29, 1.86, 1.05, -0.41, 1.47, 1.66, 0.46, 0.44
Mean Means ClassII per Cells AreaShape Eccentricity: 1.73, 0.24, -3.26, 0.22, -3.04, 0.25, 2.07, 0.76
Mean Means ClassII per Cells Texture Entropy GFP 3: -5.48, 3.84, -1.93, -3.06, 0.85, -5.11, -0.89, -7.8
Mean Cells Children EE Count: -1.47, 0.08, -1.38, 1.9, 2.63, 7.77, -1.7, 3.47
Mean Means EE per Cells AreaShape Perimeter: 2.26, -2.24, -2.48, -1.96, -1.32, 0.28, 0.17, -3.37
Mean Nuclei AreaShape Solidity: 1.03, -0.69, -1.63, 1.14, 0.21, -1.18, -0.66, -0.67
Mean Means Golgi per Cells Intensity IntegratedIntensity RFP: -0.5, 0.65, -1.69, -0.93, -1.55, -1.2, -0.42, -0.34
Mean Means Golgi per Cells Intensity IntegratedIntensityE RFP: -1.02, 1.49, -1.61, -0.6, -1.43, -0.8, -0.48, 0.79
Mean Means Golgi per Cells RadialIntensityDist FracAtD RFP 2: 0.87, -0.9, -2.09, -0.34, -2.95, -3.82, 0.83, -1.26
Mean Means Golgi per Cells RadialIntensityDist FracAtD RFP 4: -0.98, 1.67, -0.17, 0.49, 1.12, 1.76, -0.92, 2.08
Size of Cluster: 38333547186

Cluster descriptions:
Genes involved in the defense response (GO0006952) and regulation of metabolic processes (GO0019222)
Genes involved in receptor binding (GO0005102) and present in the cytoplasm (GO0005737)
IF GO0006139 = 1 AND GO0065007 = 1 THEN
Genes involved in regulation (GO0065007) and, in particular, in cellular nucleobase, nucleoside, nucleotide and nucleic acid metabolic processes (GO0006139)

Conclusions

Application of the predictive clustering paradigm to the analysis of phenotype images from an siRNA screen for MHC Class II antigen presentation
The clusters and their descriptions are obtained in a single step
Identified and described groups of genes which yield similar phenotypes upon siRNA-mediated downregulation
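
The predictive clustering tree heuristic named on the poster (minimize intra-cluster variance, with the medoid as prototype and a cosine-based distance) can be illustrated with a small sketch. This is not the authors' implementation. It assumes that variance is the mean squared cosine distance to the medoid, that the distance is 1 minus cosine similarity, and that candidate tests are single binary annotations. The term names GO_A and GO_B and the four toy phenotype vectors are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity (assumed distance); 1.0 if a vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def medoid(targets: List[Vector]) -> Vector:
    """Return the member with the smallest total distance to all members."""
    return min(targets, key=lambda c: sum(cosine_distance(c, t) for t in targets))


def variance(targets: List[Vector]) -> float:
    """Mean squared distance to the medoid (assumed variance); 0.0 for an empty set."""
    if not targets:
        return 0.0
    proto = medoid(targets)
    return sum(cosine_distance(t, proto) ** 2 for t in targets) / len(targets)


def best_split(
    annotations: List[Dict[str, int]], targets: List[Vector]
) -> Tuple[Optional[str], float]:
    """Pick the binary annotation whose split gives the lowest size-weighted variance.

    Returns (None, variance of the unsplit set) if no split improves on it.
    """
    n = len(targets)
    best_term: Optional[str] = None
    best_score = variance(targets)
    for term in sorted(annotations[0]):
        yes = [t for a, t in zip(annotations, targets) if a[term] == 1]
        no = [t for a, t in zip(annotations, targets) if a[term] == 0]
        if not yes or not no:
            continue  # a test that does not separate the genes is useless
        score = (len(yes) * variance(yes) + len(no) * variance(no)) / n
        if score < best_score:
            best_term, best_score = term, score
    return best_term, best_score


# Hypothetical toy data: four genes, two binary annotations, 2-D phenotype vectors.
annotations = [
    {"GO_A": 1, "GO_B": 1},
    {"GO_A": 1, "GO_B": 0},
    {"GO_A": 0, "GO_B": 1},
    {"GO_A": 0, "GO_B": 0},
]
targets = [(1.0, 0.1), (2.0, 0.3), (-0.1, 1.0), (-0.2, 2.0)]

term, score = best_split(annotations, targets)
print(term, score)
```

In this toy data the genes annotated with GO_A have phenotype vectors pointing in nearly the same direction, so a test on GO_A separates them cleanly, while a test on GO_B mixes the two directions. A full tree learner would apply the same selection recursively to each subset and store the medoid of each leaf as its prediction.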