An improved metric for the comparison of RNAi knockout phenotypes

Background
RNAi can effectively ‘knock out’ a gene.
Large-scale studies systematically perform RNAi on many genes and identify the resulting phenotypes: Embryonic Lethal, Uncoordinated, Thin…

Background
Phenotypes can be thought of as gene descriptors.
Each gene has a binary vector, with each entry corresponding to a single phenotype.
Classic information theory setup.
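
As a minimal sketch of the binary-vector representation described on this slide, the snippet below turns per-gene phenotype sets into 0/1 vectors. The gene names, the phenotype codes, and the helper `to_binary_vector` are hypothetical and chosen only for illustration; they are not taken from the study's data.

```python
# Hypothetical phenotype vocabulary and gene annotations (illustration only).
PHENOTYPES = ["Emb", "Unc", "Thin", "Slu"]

observed = {
    "geneA": {"Emb", "Unc"},
    "geneB": {"Emb"},
    "geneC": {"Thin", "Slu"},
}


def to_binary_vector(phenos, vocabulary):
    """Return a 0/1 list with one entry per phenotype, in vocabulary order."""
    return [1 if p in phenos else 0 for p in vocabulary]


# One binary vector per gene; entry j is 1 if the gene shows phenotype j.
vectors = {gene: to_binary_vector(p, PHENOTYPES) for gene, p in observed.items()}

for gene, vec in vectors.items():
    print(gene, vec)
```

Fixing the vocabulary order once keeps every gene's vector aligned, so position j always refers to the same phenotype when two genes are compared.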

Previous methods
Classic approach: given a collection of genes, “eye them up” for common phenotypes.
Piano 2002. “Gene Clustering Based on RNAi Phenotypes of Ovary-Enriched Genes in C. elegans”
Gunsalus 2004. “RNAiD and PhenoBlast: web tools for genome-wide phenotypic mapping projects.”
Gunsalus 2005. “Predictive models of molecular machines involved in Caenorhabditis elegans early embryogenesis”

Tested metrics
PREVIOUS METRICS: Pearson Correlation, Uncentered Pearson Correlation, Simple Match (1s), Simple Match (1s and 0s)
NOVEL METRICS: “Scaled Match”, “Loss of function agreement score”
IDF AND RELATED: Inverse Document Frequency (IDF), Frequency Dot Product (FDP), Residual IDF, Scaled IDF
OTHER: CanB, Euclidean Distance, Hamming Distance, Jaccard Distance, Mutual Information, Rand Index
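
To make a few of the listed metrics concrete, here is a rough sketch of how they might be computed on binary phenotype vectors. The slides do not give formulas, so these are the textbook definitions. The IDF-weighted match in particular (summing log(N/df) over shared phenotypes) is an assumption about what an IDF-style score could look like, not the authors' definition. All function names are hypothetical.

```python
import math


def simple_match_ones(a, b):
    """Count positions where both vectors are 1."""
    return sum(1 for x, y in zip(a, b) if x == 1 and y == 1)


def simple_match_all(a, b):
    """Count positions where the vectors agree (shared 1s and shared 0s)."""
    return sum(1 for x, y in zip(a, b) if x == y)


def hamming_distance(a, b):
    """Count positions where the vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def jaccard_distance(a, b):
    """1 - |intersection| / |union| over the phenotypes that are present."""
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return 1.0 - simple_match_ones(a, b) / either if either else 0.0


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def idf_weights(vectors):
    """Assumed IDF weight per phenotype: log(N / number of genes showing it)."""
    n = len(vectors)
    weights = []
    for j in range(len(vectors[0])):
        df = sum(v[j] for v in vectors)
        weights.append(math.log(n / df) if df else 0.0)
    return weights


def idf_match(a, b, weights):
    """Sum the IDF weights of the phenotypes shared by both genes."""
    return sum(w for x, y, w in zip(a, b, weights) if x == 1 and y == 1)


# Hypothetical vectors for three genes over four phenotypes.
a, b, c = [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1]
w = idf_weights([a, b, c])
print(simple_match_ones(a, b), simple_match_all(a, b), hamming_distance(a, b))
print(jaccard_distance(a, b), pearson(a, c), idf_match(a, b, w))
```

Under this weighting, a shared rare phenotype contributes more to the score than a shared common one, which is the usual motivation for borrowing IDF from information retrieval.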

Precision/Recall
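
The slide gives only the figure title, so the following is a hedged sketch of how precision and recall could be computed for a similarity metric: gene pairs scoring at or above a threshold are treated as predicted links and compared with a reference set of known functional links. The reference set, the threshold, and the function name `precision_recall` are assumptions for illustration.

```python
def precision_recall(scored_pairs, reference_pairs, threshold):
    """Return (precision, recall) for pairs scoring at or above threshold.

    Pairs are treated as unordered, so (g1, g2) and (g2, g1) are the same link.
    """
    predicted = {
        frozenset(pair) for pair, score in scored_pairs.items() if score >= threshold
    }
    reference = {frozenset(pair) for pair in reference_pairs}
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical similarity scores and a hypothetical reference link.
scores = {
    ("geneA", "geneB"): 0.9,
    ("geneA", "geneC"): 0.2,
    ("geneB", "geneC"): 0.6,
}
known_links = [("geneB", "geneA")]

print(precision_recall(scores, known_links, threshold=0.5))
```

Sweeping the threshold from high to low traces out one precision/recall curve per metric, which lets the metrics be compared on the same axes.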

Network Degree Distributions
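
As an illustration of what a degree distribution summarizes, the sketch below counts how many genes have each number of links in an undirected network. The edge list is hypothetical, and the code assumes each link appears once with no self-links.

```python
from collections import Counter


def degree_distribution(edges):
    """Map each degree k to the number of nodes that have exactly k links."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


# Hypothetical phenotype-similarity links between genes.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
distribution = degree_distribution(edges)

for k in sorted(distribution):
    print(f"degree {k}: {distribution[k]} gene(s)")
```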

Shared Phenotypes per linked gene pair

Overview of subnetwork phenotypes

Number of enriched phenotypes per subnetwork

Subnetwork coverage of best GO category

Circularity Issues
GO (Gene Ontology) is basically built from knockout phenotypes, which makes it very hard to evaluate predictions on a large scale.
19/35 phenotypes overlap a GO category by at least 50% (several overlap a few).
For example, 71 genes have the ‘Sluggish Movement’ (SLU) phenotype. Of these, 70 are in the ‘positive regulation of locomotion’ category, which itself comprises only 82 genes.
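
The SLU example can be checked with a little arithmetic. The sketch below computes the overlap in both directions from the counts on the slide (71 SLU genes, 82 genes in the GO category, 70 shared). The slide does not say which denominator the 50% threshold uses, so both fractions are shown. The function name is hypothetical.

```python
def overlap_fractions(n_phenotype, n_category, n_shared):
    """Return (shared / phenotype genes, shared / category genes)."""
    return n_shared / n_phenotype, n_shared / n_category


# Counts taken from the slide: SLU phenotype vs. 'positive regulation of locomotion'.
frac_of_phenotype, frac_of_category = overlap_fractions(71, 82, 70)

print(f"{frac_of_phenotype:.1%} of SLU genes are in the GO category")
print(f"{frac_of_category:.1%} of the GO category's genes have the SLU phenotype")
```

Both fractions are well above 50% (about 98.6% and 85.4%), so the phenotype and the GO category are close to the same gene set, which is the circularity the slide warns about.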

Future Work
Smaller subnetworks (or clustering).
How well does the new phenotype data integrate with other functional data (co-expression, p2p, genetic, combination)? Metric level, network level, triangle level, subnetwork level.
Look for interesting biology in 9 novel subnetworks.