An improved metric for the comparison of RNAi knockout phenotypes XX
Background
  RNAi can effectively ‘knock out’ a gene
  Large-scale studies systematically perform RNAi on many genes and identify the resulting phenotypes
    Embryonic Lethal, Uncoordinated, Thin…
Background
  Phenotypes can be thought of as gene descriptors
  Each gene has a binary vector, with each entry corresponding to a single phenotype
  Classic information theory setup
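As a minimal sketch of the binary-vector representation, the snippet below encodes each gene as a 0/1 list over a fixed phenotype vocabulary. The phenotype abbreviations, gene names, and annotations are hypothetical placeholders chosen for illustration; they are not data from the study.

```python
# Hypothetical phenotype vocabulary; the order fixes the vector positions.
PHENOTYPES = ["Emb", "Unc", "Thin", "Slu"]


def phenotype_vector(observed, vocabulary=PHENOTYPES):
    """Return a 0/1 list with one entry per phenotype in `vocabulary`."""
    observed = set(observed)
    unknown = observed - set(vocabulary)
    if unknown:
        raise ValueError(f"unknown phenotypes: {sorted(unknown)}")
    return [1 if p in observed else 0 for p in vocabulary]


# Hypothetical RNAi annotations: gene -> phenotypes observed on knockdown.
annotations = {
    "geneA": ["Emb", "Thin"],
    "geneB": ["Unc", "Slu"],
    "geneC": ["Emb"],
}

vectors = {gene: phenotype_vector(obs) for gene, obs in annotations.items()}

for gene, vec in vectors.items():
    print(gene, vec)
```

Fixing the vocabulary order up front means every gene's vector has the same length and the same position always refers to the same phenotype, which is what any pairwise metric needs.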
Previous methods
  Classic approach: given a collection of genes, “eye them up” for common phenotypes
  Piano 2002. “Gene Clustering Based on RNAi Phenotypes of Ovary-Enriched Genes in C. elegans.”
  Gunsalus 2004. “RNAiDB and PhenoBlast: web tools for genome-wide phenotypic mapping projects.”
  Gunsalus 2005. “Predictive models of molecular machines involved in Caenorhabditis elegans early embryogenesis.”
Tested metrics
  PREVIOUS METRICS
    Pearson Correlation
    Uncentered Pearson Correlation
    Simple Match (1s)
    Simple Match (1s and 0s)
  NOVEL METRICS
    “Scaled Match”
    “Loss of function agreement score”
  IDF AND RELATED
    Inverse Document Frequency (IDF)
    Frequency Dot Product (FDP)
    Residual IDF
    Scaled IDF
  OTHER
    CanB
    Euclidean Distance
    Hamming Distance
    Jaccard Distance
    Mutual Information
    Rand Index
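To make a few of the listed metrics concrete, here is a hedged sketch using their standard textbook definitions on binary phenotype vectors. The reading of "Simple Match (1s)" as a count of shared 1s, and of the IDF score as a sum of log(N/df) weights over shared phenotypes, are assumptions; the slides do not define them, and the novel metrics ("Scaled Match", "Loss of function agreement score") are not reproduced here because no formula is given.

```python
import math


def simple_match_ones(a, b):
    """Assumed meaning: number of phenotypes present (1) in both genes."""
    return sum(x & y for x, y in zip(a, b))


def simple_match_all(a, b):
    """Assumed meaning: number of positions that agree, 1s and 0s alike."""
    return sum(x == y for x, y in zip(a, b))


def hamming_distance(a, b):
    """Number of positions where the two vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def jaccard_distance(a, b):
    """1 - |intersection| / |union| over the phenotypes that are present."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return 1 - both / either if either else 0.0


def uncentered_pearson(a, b):
    """Cosine of the angle between the vectors (no mean subtraction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pearson(a, b):
    """Pearson correlation: uncentered correlation of mean-centered vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return uncentered_pearson([x - mean_a for x in a], [y - mean_b for y in b])


def idf_weights(all_vectors):
    """Standard IDF per phenotype: log(N / number of genes showing it)."""
    n = len(all_vectors)
    counts = [sum(col) for col in zip(*all_vectors)]
    return [math.log(n / c) if c else 0.0 for c in counts]


def idf_match(a, b, weights):
    """Assumed IDF score: sum of weights over phenotypes shared by both genes."""
    return sum(w for x, y, w in zip(a, b, weights) if x and y)


# Hypothetical vectors, for illustration only.
g1, g2, g3 = [1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]
weights = idf_weights([g1, g2, g3])
print(simple_match_ones(g1, g2), hamming_distance(g1, g2))
print(uncentered_pearson(g1, g2), pearson(g1, g2))
print(idf_match(g1, g2, weights))
```

In this toy example the only phenotype shared by `g1` and `g2` is one that every gene has, so its IDF weight is zero and the IDF score is zero even though the simple match count is one. That is the intuition behind weighting: agreement on a common phenotype says less than agreement on a rare one.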
Precision/Recall
Network Degree Distributions
Shared Phenotypes per linked gene pair
Overview of subnetwork phenotypes
Number of enriched phenotypes per subnetwork
Subnetwork coverage of best GO category
Circularity Issues
  GO is basically built from knockout phenotypes
  This makes it very hard to evaluate predictions on a large scale
  19/35 phenotypes overlap a GO category by at least 50% (several overlap a few categories)
  For example, 71 genes have the ‘Sluggish Movement’ (SLU) phenotype. Of these, 70 are in the ‘positive regulation of locomotion’ category, which itself comprises only 82 genes.
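The SLU example can be worked through numerically. The sketch below builds placeholder gene sets with the counts quoted above (71 SLU genes, 82 genes in the GO category, 70 shared) and reports the overlap in both directions. The gene identifiers are synthetic, and since the slides do not say which denominator the 50% threshold uses, both fractions are shown.

```python
def overlap_fractions(phenotype_genes, go_genes):
    """Return (share of phenotype genes in the category,
    share of the category covered by phenotype genes)."""
    if not phenotype_genes or not go_genes:
        raise ValueError("both gene sets must be non-empty")
    shared = len(phenotype_genes & go_genes)
    return shared / len(phenotype_genes), shared / len(go_genes)


# Synthetic identifiers that reproduce only the counts from the example.
slu_genes = {f"g{i}" for i in range(71)}
go_locomotion = {f"g{i}" for i in range(70)} | {f"h{i}" for i in range(12)}

frac_of_phenotype, frac_of_category = overlap_fractions(slu_genes, go_locomotion)
print(round(frac_of_phenotype, 3))  # 70/71 -> 0.986
print(round(frac_of_category, 3))   # 70/82 -> 0.854
```

Under either denominator the overlap is far above 50%, so scoring SLU-based predictions against this GO category would largely be measuring the phenotype against itself.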
Future Work
  Smaller subnetworks (or clustering)
  How well does the new phenotype data integrate with other functional data (co-expression, p2p, genetic, combination)?
    Metric level
    Network level
    Triangle level
    Subnetwork level
  Look for interesting biology in 9 novel subnetworks