Computing Co-Expression Relationships Wen-Dar Lin.

Presentation transcript:

Computing Co-Expression Relationships Wen-Dar Lin

Contents
Motivation
Basic Idea
Case Studies
–An Example of a Single Experiment
–An Example of a Time-Course Experiment
Potential Applications
Availability
Future Work

Motivation
Given a set of differentially displayed genes reported by an array experiment:
–We would like to know the relationships among these genes.
–These relationships may reveal important modules or motifs relevant to the experiment.

Motivation
Co-expression relationships are among the most biologically meaningful and easily computable relationships.
–Co-expression relationships form modules that may imply important biological information.
–They can be computed from the large amount of publicly available array data.

Basic Idea
Array data can be retrieved from publicly available data repositories
–such as the NASCarrays, NCBI GEO, and EMBL-EBI ArrayExpress
The data should be normalized before computing the co-expression relationships.
–e.g., normalized by the RMA method

Basic Idea
Defining co-expression relationships
–We define a co-expression relationship between two genes to exist if the Pearson correlation coefficient between their normalized expression levels is greater than or equal to a certain threshold.
slide #   1   2   3   4   …
gene X    1   2   10  3   …
gene Y    5   2   12  4   …
[Figure: plot of gene X against gene Y]
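
A minimal sketch of this definition, assuming each gene's normalized expression levels are available as a plain list with one value per slide. The function names `pearson` and `is_coexpressed`, the example values, and the handling of constant profiles are illustrative assumptions and are not taken from the tool kit.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant profile is treated as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def is_coexpressed(x, y, threshold=0.7):
    """True if the correlation reaches the chosen threshold."""
    return pearson(x, y) >= threshold


# Hypothetical normalized expression levels across five slides.
gene_x = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_y = [2.1, 3.9, 6.2, 8.0, 9.8]   # rises with gene_x
gene_z = [5.0, 4.0, 3.0, 2.0, 1.0]   # falls as gene_x rises

print(is_coexpressed(gene_x, gene_y))
print(is_coexpressed(gene_x, gene_z))
```

Because the test is "greater than or equal to" a positive threshold, strongly anti-correlated pairs such as `gene_x` and `gene_z` are not counted as co-expressed.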

Basic Idea
Properties of the Pearson correlation coefficient
–Let Correl(A, B) be the Pearson correlation coefficient between the normalized expression levels of gene A and gene B.
–Correl(A, B) lies between −1 and 1, ranging from negative correlation (−1) through no correlation (0) to positive correlation (1).
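
For reference, the standard textbook form of the Pearson correlation coefficient can be written as follows. The symbols a_i and b_i (normalized expression levels of gene A and gene B on slide i) and n (number of slides) are notation assumed here and do not appear on the slides.

```latex
\mathrm{Correl}(A, B) =
  \frac{\sum_{i=1}^{n} (a_i - \bar{a})(b_i - \bar{b})}
       {\sqrt{\sum_{i=1}^{n} (a_i - \bar{a})^2}\;
        \sqrt{\sum_{i=1}^{n} (b_i - \bar{b})^2}},
\qquad
-1 \le \mathrm{Correl}(A, B) \le 1
```

The bounds follow from the Cauchy-Schwarz inequality applied to the two mean-centered expression vectors.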

Basic Idea
The computational assistance
–Given a set of genes of interest
–Compute co-expression relationships among them
–Identify co-expression clusters
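
The three steps above can be sketched as follows. The slides do not say how clusters are identified, so this sketch assumes a cluster is a connected component of the co-expression graph. All function names and expression values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: constant profiles are uncorrelated
    return cov / math.sqrt(var_x * var_y)


def coexpression_edges(profiles, threshold):
    """All gene pairs whose correlation reaches the threshold."""
    edges = set()
    for a, b in combinations(sorted(profiles), 2):
        if pearson(profiles[a], profiles[b]) >= threshold:
            edges.add((a, b))
    return edges


def find_clusters(genes, edges):
    """Connected components of the graph; genes without edges are skipped."""
    adjacency = {g: set() for g in genes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, result = set(), []
    for start in sorted(adjacency):
        if start in seen or not adjacency[start]:
            continue
        stack, component = [start], set()
        while stack:
            gene = stack.pop()
            if gene in component:
                continue
            component.add(gene)
            stack.extend(adjacency[gene] - component)
        seen |= component
        result.append(sorted(component))
    return result


# Hypothetical normalized profiles for five genes of interest.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 11.0],
    "geneC": [5.0, 3.0, 4.0, 1.0, 2.0],
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],
    "geneE": [3.2, 1.1, 3.9, 1.3, 5.1],
}
edges = coexpression_edges(profiles, threshold=0.7)
clusters = find_clusters(profiles, edges)
print(clusters)
```

Comparing every pair is quadratic in the number of genes, which is manageable for a few hundred genes of interest. Other clustering definitions would change only `find_clusters`.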

Case Studies
We have implemented the aforementioned ideas in a tool kit and applied it to two case studies.
–A single experiment
–A time-course experiment

A Single Experiment
In this example, an array experiment was performed
–178 differentially displayed genes were identified.
–Co-expression relationships were computed from RMA array data of 300 ATH1 slides downloaded from the NASCarrays (the sample of each slide was derived nonexclusively from roots)
Threshold for Pearson correlation coefficient = 0.7

A Single Experiment
[Figure: co-expression graph with two larger clusters and one minor subcluster]

A Single Experiment
We may compute co-expression relationships based on all kinds of array experiment data
–Co-expression relationships were identified based on RMA array data of 1436 ATH1 slides downloaded from the TAIR
Threshold for Pearson correlation coefficient = 0.7

A Single Experiment
[Figure: co-expression graph with two larger clusters]

A Single Experiment
Is there any difference between the graph based on root-array data and the graph based on all-array data?
–This can be checked by distinctly marking the clusters of one graph on the other graph.
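
One way to picture this comparison, as a sketch under stated assumptions: each graph is reduced to a list of clusters (lists of gene names), every gene in the target graph is labeled with the index of the reference cluster that contains it, and target clusters with no labeled gene are reported as candidates for being specific to the target data. The function names and gene names are hypothetical.

```python
def mark_clusters(target_clusters, reference_clusters):
    """Label each gene of the target clusters with its reference cluster index.

    Genes that belong to no reference cluster get the label None.
    """
    reference_label = {}
    for index, cluster in enumerate(reference_clusters):
        for gene in cluster:
            reference_label[gene] = index
    return [
        {gene: reference_label.get(gene) for gene in cluster}
        for cluster in target_clusters
    ]


def unmarked_clusters(target_clusters, reference_clusters):
    """Target clusters in which no gene belongs to any reference cluster."""
    marks = mark_clusters(target_clusters, reference_clusters)
    return [
        cluster
        for cluster, mark in zip(target_clusters, marks)
        if all(label is None for label in mark.values())
    ]


# Hypothetical clusters from a root-array graph and an all-array graph.
root_clusters = [["g1", "g2", "g3"], ["g4", "g5"], ["g6", "g7"]]
all_array_clusters = [["g1", "g2", "g8"], ["g3", "g4", "g5"]]

marks = mark_clusters(root_clusters, all_array_clusters)
root_specific = unmarked_clusters(root_clusters, all_array_clusters)
print(marks)
print(root_specific)
```

In a drawing of the target graph, the labels would typically become node colors, so a cluster that stays uncolored stands out immediately.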

A Single Experiment
[Figure: graph showing two clusters mapped by the other graph and one cluster that should be root-specific]

A Single Experiment
[Figure: graph annotated with cluster sizes 47 and 14, and cluster size 9]

A Single Experiment
Some remarks
–The number of differentially displayed genes reported by the experiment is 178
–The number of clustered genes is 47 + 14 + 9 = 70 (reduced by more than 50%)
–The co-expression relationships are recovered (each cluster may be a module of genes that usually work together)
–Tissue-specific co-expression relationships can be found by mapping the graph based on all-array data onto the graph based on tissue-related-array data.

A Single Experiment
In addition to clustering genes according to co-expression relationships, we also fished for genes that are potentially co-expressed.
–These genes may not have been identified as differentially displayed in the experiment.
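
A minimal sketch of such "fishing", assuming it means scanning all genes on the array for those whose correlation with at least one bait gene reaches the threshold. The slides do not give the exact procedure, so the function `fish`, its return format, and the example profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: constant profiles are uncorrelated
    return cov / math.sqrt(var_x * var_y)


def fish(baits, profiles, threshold=0.7):
    """Map each non-bait gene to the baits it is co-expressed with."""
    hits = {}
    for name, profile in profiles.items():
        if name in baits:
            continue
        partners = sorted(
            bait for bait in baits
            if pearson(profiles[bait], profile) >= threshold
        )
        if partners:
            hits[name] = partners
    return hits


# Hypothetical profiles: one bait and three other genes on the array.
profiles = {
    "bait1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "cand1": [1.1, 2.3, 2.9, 4.2, 5.1],
    "cand2": [5.0, 3.0, 4.0, 1.0, 2.0],
    "cand3": [3.0, 1.0, 4.0, 1.0, 5.0],
}
hits = fish({"bait1"}, profiles, threshold=0.7)
print(hits)
```

Only `cand1` follows the bait closely enough here. Fished genes enter the graph even though the experiment itself did not flag them.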

A Single Experiment
A GO enrichment analysis was also carried out
–using the GOBU software (gobu.iis.sinica.edu.tw)
–which should give a conceptual view of the clustered genes.
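
The slides do not describe how GOBU scores enrichment. As a general illustration only, a commonly used approach is the one-sided hypergeometric test, sketched below. The function name, parameter names, and counts are hypothetical and do not describe GOBU's actual method.

```python
from math import comb


def enrichment_p_value(population, annotated, cluster_size, annotated_in_cluster):
    """Probability of seeing at least `annotated_in_cluster` annotated genes.

    Assumes `cluster_size` genes are drawn without replacement from a
    population of `population` genes, `annotated` of which carry the GO term.
    """
    total = comb(population, cluster_size)
    upper = min(cluster_size, annotated)
    favorable = sum(
        comb(annotated, i) * comb(population - annotated, cluster_size - i)
        for i in range(annotated_in_cluster, upper + 1)
    )
    return favorable / total


# Hypothetical counts: 1000 genes on the array, 50 carry the GO term,
# and a cluster of 47 genes contains 10 of them.
p = enrichment_p_value(1000, 50, 47, 10)
print(p)
```

A small value suggests that the term is over-represented in the cluster. In practice one test is run per GO term, so a multiple-testing correction is usually applied afterwards.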

A time-course experiment
In this example, a time-course array experiment was performed
–Three time points
–About 800 genes were differentially displayed at one or more time points.
–Based on array data of 300 ATH1 slides extracted from RMA array data of about 2600 ATH1 slides downloaded from the NASCarrays
Threshold for Pearson correlation coefficient = 0.8

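A time-course experiment
[Figure: co-expression graph at time point 1, about 100 genes]

A time-course experiment
[Figure: co-expression graph at time point 2, about 100 genes]

A time-course experiment
[Figure: co-expression graph at time point 3, about 100 genes]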

A time-course experiment
Though this clustering and the time-course expression data show some biological meaning,
–the number of clustered genes (more than 200) makes the graph too complex and too large to be interpreted in a short time.

A time-course experiment
Reducing the number of clustered genes may help in
–reducing the complexity of the graph and
–interpreting the revealed co-expression modules
We reduced the graph by removing co-expression relationships that generally exist in the entire plant
–based on RMA array data of about 2600 ATH1 slides downloaded from the NASCarrays
–Threshold for Pearson correlation coefficient = 0.7
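
The reduction can be sketched as a set difference on undirected edges, assuming each graph is available as a list of gene pairs. The function names and gene names are hypothetical.

```python
def undirected(edges):
    """Normalize gene pairs so that (a, b) and (b, a) are the same edge."""
    return {frozenset(pair) for pair in edges}


def split_edges(tissue_edges, whole_plant_edges):
    """Split tissue edges into those retained and those removed.

    An edge is removed when the same gene pair is also co-expressed in the
    whole-plant graph, and retained otherwise.
    """
    tissue = undirected(tissue_edges)
    whole = undirected(whole_plant_edges)
    retained = tissue - whole
    removed = tissue & whole
    return retained, removed


# Hypothetical edge lists from a root-related graph and a whole-plant graph.
root_edges = [("g1", "g2"), ("g2", "g3"), ("g4", "g5")]
whole_plant_edges = [("g2", "g1"), ("g6", "g7")]

retained, removed = split_edges(root_edges, whole_plant_edges)
print(sorted(sorted(edge) for edge in retained))
print(sorted(sorted(edge) for edge in removed))
```

The same two sets answer both questions the comparison raises: `removed` holds the relationships common to both graphs, and `retained` holds those specific to the tissue data.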

A time-course experiment
[Figure: edges (relationships) to be removed; labels: X, Y, root-related, others]

A time-course experiment
[Figure: edges (relationships) to be retained; labels: X, Y, root-related, others]

A time-course experiment
[Figure: reduced graph at time point 1, with clusters of about 20, about 60, and about 50 genes]

A time-course experiment
[Figure: reduced graph at time point 2, with clusters of about 20, about 60, and about 50 genes]

A time-course experiment
[Figure: reduced graph at time point 3, with clusters of about 20, about 60, and about 50 genes]

A time-course experiment
Some remarks
–The number of genes differentially displayed at one or more time points is about 800.
–The number of clustered genes is about 20 + 60 + 50 = 130 (reduced by more than 80%)
–The retained graph contains edges, i.e., gene pairs, that are co-expressed in root but not in the entire plant (the recovered clusters should be root-specific).

Potential Applications
We have created a tool kit that
–computes co-expression relationships based on array data (probe names can be replaced by aliases produced by, for example, orthologous mapping, so the tool kit can be used for studying a non-model organism using array data of a model organism)
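
A sketch of the alias replacement, assuming a simple dictionary from model-organism probe names to aliases (for example, ortholog identifiers in the organism under study). How the tool kit handles probes without an alias, or several probes sharing one alias, is not stated on the slides. Dropping the former and averaging the latter are assumptions made here, and all names are hypothetical.

```python
def rename_probes(profiles, alias_of):
    """Re-key expression profiles by alias.

    Assumptions: probes without an alias are dropped, and probes sharing
    an alias are averaged slide by slide.
    """
    grouped = {}
    for probe, values in profiles.items():
        alias = alias_of.get(probe)
        if alias is None:
            continue
        grouped.setdefault(alias, []).append(values)
    return {
        alias: [sum(column) / len(column) for column in zip(*rows)]
        for alias, rows in grouped.items()
    }


# Hypothetical probe-level profiles and a hypothetical ortholog mapping.
profiles = {
    "probe_1": [1.0, 2.0, 3.0],
    "probe_2": [3.0, 4.0, 5.0],
    "probe_3": [9.0, 9.0, 9.0],
    "probe_4": [0.5, 0.5, 0.5],
}
alias_of = {"probe_1": "orthoA", "probe_2": "orthoA", "probe_3": "orthoB"}

renamed = rename_probes(profiles, alias_of)
print(renamed)
```

After renaming, co-expression relationships are computed on the aliased profiles exactly as before, so the resulting graph is expressed in the names of the organism under study.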

Potential Applications
We have created a tool kit that
–colors graphs according to intensity fold-changes or clusters in another graph

Potential Applications
We have created a tool kit that
–removes/retains co-expression relationships found in another graph
–finds specific or common co-expression relationships
[Figure: graphs of 200 genes and 120 genes]

Potential Applications
We have created a tool kit that
–fishes for genes that are potentially co-expressed with assigned bait genes

Future Work
Incorporate a pathway database
–such as the AraCyc
–for finding relationships between co-expression clusters and known pathways
Provide a user-friendly interface which would
–facilitate using this tool kit and
–help manage output data

Availability
The tool kit is now an open-source project
–Project name: MACCU (Multi-Array Correlation Computation Utility)
–A detailed description of each program module has been created.
–A running script with an example is provided.

Special Thanks I would like to thank –Drs. Chang (Bill), Schmidt & Wu for raising this idea, the initial implementation, and valuable comments.

Thank you!