Gene Correlation Networks

Gene Correlation Networks
Jin Chen, CSE891-001, Fall 2012

Gene expression
Gene expression is the process by which information from a gene is used in the synthesis of a functional gene product. It is used by all known life. Several steps in the gene expression process may be modulated: transcription, RNA splicing, translation, and post-translational modification of a protein. Gene regulation gives the cell control over structure and function, and is the basis for cellular differentiation, morphogenesis, and the versatility and adaptability of any organism. Transcription is carried out by RNA polymerase (RNAP), which uses DNA (black in the figure) as a template and produces RNA (blue). http://en.wikipedia.org/wiki/Gene_expression

Gene expression detection
Single-gene expression detection: Northern blots and RT-qPCR. Genome-wide gene expression detection: DNA microarrays and next-generation sequencing, especially RNA-seq. Recent advances in microarray technology allow the quantification, on a single array, of transcript levels for every known gene in the genomes of several organisms, including humans.

DNA microarray
A microarray consists of an arrayed series of thousands of probes. Probe-target hybridization is usually detected and quantified to determine the relative abundance of nucleic acid sequences in the target. One cDNA sample is labelled with a red fluorophore and the other with a green fluorophore. Selective hybridization of cDNA from either sample to a DNA spot produces a red or green signal; hybridization of cDNA from both RNA samples produces a yellow signal. Because an array can contain tens of thousands of probes, a single microarray experiment can perform many genetic tests in parallel, which has dramatically accelerated many types of investigation. Valerie Reinke, WormBook. http://www.wormbook.org/chapters/www_germlinegenomics/germlinegenomics.html

Normalization
A microarray experiment is performed under the assumption that gene intensities reflect actual mRNA levels. However, raw gene expression intensities are strongly influenced by a number of non-biological sources of variation. To obtain biologically meaningful data, computational preprocessing, including normalization, is therefore essential. C. Steinhoff et al., "Normalization and quantification of differential expression in gene expression microarrays", Briefings in Bioinformatics (2006), Vol. 7, No. 2, 166-177.

RNA-seq
RNA-seq uses next-generation sequencing (NGS) technologies to sequence cDNA in order to obtain information about a sample's RNA content. NGS technologies generate millions of short reads from a library of nucleotide sequences, whether they come from DNA, RNA, or a mixture.

Gene co-expression network
Constructing co-expression networks from gene expression datasets has become a popular alternative to conventional analytic approaches. Large-scale gene co-expression networks have been used, for example, to demonstrate that functionally related genes are frequently co-expressed across multiple datasets and across different organisms. By constructing separate co-expression networks for different conditions, such as normal and cancerous states, it is possible to identify disease-mediated changes in network connectivity patterns. L. Elo et al., Bioinformatics (2007), Vol. 23, Iss. 16, pp. 2096-2103.

http://www.functionalnet.org/

Gene co-expression network
Definition: a gene co-expression network is a graph in which each node corresponds to a gene, and a pair of nodes is connected by an undirected edge if their pairwise expression similarity is above a given threshold. "Standard" methods for network construction use Pearson correlation to compute co-expression, a pre-defined cutoff value as the edge threshold, and Student's t-test to assess statistical significance.
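The "standard" construction above can be sketched in plain Python. This is a minimal illustration, not the exact procedure of the cited papers: the helper names and toy profiles are hypothetical, and an edge is drawn when the signed correlation exceeds the cutoff. For the significance test, a sample correlation r over n samples gives the statistic t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom; turning it into a p-value needs a Student's t distribution function (for example from SciPy), which is not shown.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0; compare with Student's t on n - 2 df (|r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))

def coexpression_edges(profiles, cutoff):
    """Undirected edges (gene_a, gene_b, r) whose correlation r exceeds the cutoff."""
    genes = list(profiles)
    edges = []
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            r = pearson(profiles[genes[i]], profiles[genes[j]])
            if r > cutoff:
                edges.append((genes[i], genes[j], r))
    return edges

# Hypothetical toy data: three genes measured in five samples.
profiles = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10.5],
    "geneC": [5, 3, 4, 1, 2],
}
for a, b, r in coexpression_edges(profiles, cutoff=0.8):
    print(a, b, round(r, 3), round(correlation_t_statistic(r, 5), 2))
```

Only geneA and geneB are joined here; geneC is negatively correlated with both, so it stays isolated under a signed cutoff.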

Pearson correlation
The Pearson correlation measures the correlation (linear dependence) between two variables X and Y, giving a value between -1 and +1 inclusive. Geometrically, once each variable is centered by subtracting its mean, the Pearson correlation equals the cosine of the angle between the two centered data vectors.
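Written out for n paired observations, with centered vectors $\tilde{x} = x - \bar{x}$ and $\tilde{y} = y - \bar{y}$, the coefficient and its cosine reading are:

```latex
r_{XY} = \frac{\sum_{t=1}^{n} (x_t - \bar{x})(y_t - \bar{y})}
              {\sqrt{\sum_{t=1}^{n} (x_t - \bar{x})^2}\,\sqrt{\sum_{t=1}^{n} (y_t - \bar{y})^2}}
       = \frac{\tilde{x} \cdot \tilde{y}}{\lVert \tilde{x} \rVert \, \lVert \tilde{y} \rVert}
       = \cos\theta
```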

Unweighted gene co-expression network
Measure the concordance of gene expression with the Pearson correlation. The Pearson correlation matrix is then dichotomized to arrive at an adjacency matrix, whose binary values correspond to an unweighted network. Bin Zhang and Steve Horvath (2005), Statistical Applications in Genetics and Molecular Biology, Vol. 4, No. 1.
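Dichotomizing a correlation matrix takes only a few lines. In this sketch the function name is hypothetical, the similarity is taken as the absolute correlation |r| by default (an assumption; `use_abs=False` keeps the sign), and the diagonal is set to 0 so that no gene is its own neighbor.

```python
def unweighted_adjacency(cor, tau, use_abs=True):
    """Dichotomize a gene-gene correlation matrix into a 0/1 adjacency matrix.

    s_ij = |r_ij| if use_abs else r_ij; a_ij = 1 if s_ij > tau, else 0.
    The diagonal is set to 0 (no self-loops).
    """
    n = len(cor)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                s = abs(cor[i][j]) if use_abs else cor[i][j]
                adj[i][j] = 1 if s > tau else 0
    return adj

# Hypothetical 3-gene correlation matrix.
cor = [[1.0, 0.9, 0.2],
       [0.9, 1.0, -0.85],
       [0.2, -0.85, 1.0]]
print(unweighted_adjacency(cor, tau=0.8))
```

With |r| as similarity, the strongly anti-correlated pair (genes 1 and 2) is also connected; with the signed correlation it is not.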

Weighted gene co-expression network
Bin Zhang and Steve Horvath (2005), Statistical Applications in Genetics and Molecular Biology, Vol. 4, No. 1.

Weighted vs. unweighted

Weighted network view                     | Unweighted view
All genes are connected                   | Only a subset of genes are connected
Connection widths = connection strengths  | All connections are equal

A hard threshold may lead to information loss: if two genes are correlated with a score of 0.79, they are disconnected under a threshold of 0.8.
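The information-loss argument can be checked numerically. This hypothetical sketch compares the hard threshold tau = 0.8 with a power adjacency s^b; the exponent b = 6 is an illustrative assumption, not a value taken from the slides.

```python
def step_af(s, tau=0.8):
    """Hard threshold: connection is 1 if the similarity exceeds tau, else 0."""
    return 1.0 if s > tau else 0.0

def power_af(s, b=6):
    """Soft threshold: connection strength s**b (b = 6 chosen only for illustration)."""
    return s ** b

# Two gene pairs whose similarities differ by only 0.02.
for s in (0.79, 0.81):
    print(f"s={s}: hard={step_af(s)}, soft={power_af(s):.3f}")
# soft values printed: 0.243 and 0.282
```

A similarity change of 0.02 flips the hard-threshold edge from absent to present, while the soft connection strength changes only from about 0.243 to 0.282.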

Adjacency matrix
A network can be represented by an adjacency matrix, $A = [a_{ij}]$, that encodes whether and how strongly each pair of nodes is connected. A is a symmetric matrix with entries in [0, 1]. For an unweighted network, the entries are 1 or 0 depending on whether or not the two nodes are adjacent (connected). For a weighted network, the entries report the connection strength between gene pairs.
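A minimal validity check for the two kinds of adjacency matrix described above; the function names and the symmetry tolerance are hypothetical choices.

```python
def is_valid_adjacency(a, tol=1e-12):
    """True if a is square, symmetric (within tol), and all entries lie in [0, 1]."""
    n = len(a)
    if any(len(row) != n for row in a):
        return False
    for i in range(n):
        for j in range(n):
            if not 0.0 <= a[i][j] <= 1.0:
                return False
            if abs(a[i][j] - a[j][i]) > tol:
                return False
    return True

def is_unweighted(a):
    """True if every entry is exactly 0 or 1."""
    return all(x in (0, 1) for row in a for x in row)

print(is_valid_adjacency([[0, 1], [1, 0]]), is_unweighted([[0, 1], [1, 0]]))
print(is_valid_adjacency([[1, 0.3], [0.3, 1]]), is_unweighted([[1, 0.3], [0.3, 1]]))
```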

Generalized connectivity
The connectivity of gene i is the corresponding row sum of the adjacency matrix, excluding the diagonal: $k_i = \sum_{u \neq i} a_{iu}$. For unweighted networks, it is the number of direct neighbors; for weighted networks, it is the sum of connection strengths to the other nodes.
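The connectivity formula translates directly into code. This sketch (hypothetical function name) works for both 0/1 and weighted matrices and skips the diagonal, so a diagonal of 1s does not inflate $k_i$.

```python
def connectivity(a):
    """k_i = sum over u != i of a_iu, for every node i."""
    n = len(a)
    return [sum(a[i][u] for u in range(n) if u != i) for i in range(n)]

unweighted = [[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]]
weighted = [[1.0, 0.5, 0.25],
            [0.5, 1.0, 0.1],
            [0.25, 0.1, 1.0]]
print(connectivity(unweighted))  # number of neighbours per gene
print(connectivity(weighted))    # total connection strength per gene
```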

Adjacency matrix
Measure co-expression with a Pearson correlation-based similarity $s_{ij}$ for genes i and j, and define the adjacency matrix through an adjacency function AF: $a_{ij} = AF(s_{ij})$. AF is a monotonic function, and two classes are considered: the step function $AF(s) = I(s > \tau)$ with parameter $\tau$ (unweighted network), and the power function $AF(s) = s^b$ with parameter b (weighted network). The choice of the AF parameters ($\tau$, b) determines the properties of the network.
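Both classes of adjacency function can be written as small closures applied entry-wise to a similarity matrix. This is a sketch under assumptions: the function names are hypothetical, the similarity matrix is assumed to already hold values in [0, 1] (for example |r|), and the diagonal of the result is set to 0 by convention.

```python
def make_step_af(tau):
    """Step adjacency function AF(s) = I(s > tau), giving an unweighted network."""
    return lambda s: 1.0 if s > tau else 0.0

def make_power_af(b):
    """Power adjacency function AF(s) = s**b, giving a weighted network."""
    return lambda s: s ** b

def adjacency_from_similarity(sim, af):
    """a_ij = AF(s_ij) for i != j; the diagonal is set to 0."""
    n = len(sim)
    return [[0.0 if i == j else af(sim[i][j]) for j in range(n)] for i in range(n)]

# Hypothetical similarity matrix for three genes.
sim = [[1.0, 0.9, 0.5],
       [0.9, 1.0, 0.3],
       [0.5, 0.3, 1.0]]
print(adjacency_from_similarity(sim, make_step_af(0.6)))
print(adjacency_from_similarity(sim, make_power_af(2)))
```

Because both functions are monotonic, the ranking of gene pairs by similarity is preserved; the power function keeps weak pairs as weak rather than absent edges.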

[Figure: power adjacency functions $AF(s) = s^b$ compared with the step function; x-axis: gene co-expression similarity s; y-axis: connection strength.]

Choosing parameters for adjacency function AF
A) Consider only those parameter values that result in an approximately scale-free topology. This is motivated by the finding that most biological networks exhibit a scale-free topology.
B) Among those, select the parameters that result in the highest mean number of connections. This leads to high power for detecting modules (clusters of genes) and hub genes.

Trade-off between criterion A and B when varying tau
[Figure: for the step function $I(s > \tau)$, criterion A (scale-free fit $R^2$) and criterion B (mean connectivity) as $\tau$ varies.]
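Criteria A and B can be computed over a sweep of tau. This sketch assumes a scale-free fit index in the spirit of Zhang and Horvath (2005): connectivities are split into equal-width bins, and $R^2$ is taken from a linear regression of log10 p(k) on log10 k, where p(k) is the fraction of genes in a bin and k is the bin's mean connectivity. The bin count, the handling of empty bins, and all function names are assumptions; the WGCNA package's own implementation may differ in detail.

```python
import math

def connectivity(a):
    n = len(a)
    return [sum(a[i][u] for u in range(n) if u != i) for i in range(n)]

def scale_free_fit_r2(k, n_bins=10):
    """R^2 of the linear fit log10 p(k) ~ log10 k over equal-width bins of k.

    Empty bins and bins with mean connectivity 0 are skipped (log undefined).
    Returns NaN when fewer than 3 usable bins remain.
    """
    lo, hi = min(k), max(k)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero if all k are equal
    counts, totals = [0] * n_bins, [0.0] * n_bins
    for ki in k:
        b = min(int((ki - lo) / width), n_bins - 1)
        counts[b] += 1
        totals[b] += ki
    xs, ys = [], []
    for c, t in zip(counts, totals):
        if c > 0 and t > 0:
            xs.append(math.log10(t / c))       # log10 of mean connectivity in bin
            ys.append(math.log10(c / len(k)))  # log10 of fraction of genes in bin
    if len(xs) < 3:
        return float("nan")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy * sxy / (sxx * syy)  # squared correlation = R^2 of simple regression

def sweep_tau(sim, taus):
    """For each tau, build the step-function network; report (tau, fit R^2, mean k)."""
    n = len(sim)
    results = []
    for tau in taus:
        a = [[1 if i != j and sim[i][j] > tau else 0 for j in range(n)] for i in range(n)]
        k = connectivity(a)
        results.append((tau, scale_free_fit_r2(k), sum(k) / n))
    return results
```

On real data, one would plot the fit $R^2$ and the mean connectivity of each tuple against tau to see the trade-off: raising tau removes edges, so mean connectivity can only fall.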

Module identification in gene correlation networks
One important aim of network analysis is to detect subsets of nodes (modules) that are tightly connected to each other. Here, modules are defined as groups of nodes with high topological overlap. Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabasi AL (2002), "Hierarchical organization of modularity in metabolic networks", Science Vol. 297, pp. 1551-1555.

Topological Overlap Matrix (TOM)
The topological overlap matrix $\Omega = [w_{ij}]$ is a similarity measure for biological networks:
$$ w_{ij} = \frac{l_{ij} + a_{ij}}{\min(k_i, k_j) + 1 - a_{ij}}, \qquad l_{ij} = \sum_{u \neq i, j} a_{iu} a_{uj}, $$
where $k_i = \sum_{u \neq i} a_{iu}$ is the connectivity of node i. Note that $w_{ij} = 1$ if the node with fewer connections satisfies two conditions: (a) all of its neighbors are also neighbors of the other node, and (b) it is connected to the other node. In contrast, $w_{ij} = 0$ if i and j are unconnected and share no neighbors. Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabasi AL (2002), "Hierarchical organization of modularity in metabolic networks", Science Vol. 297, pp. 1551-1555.
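The TOM formula can be computed directly from an adjacency matrix. This naive triple loop is fine for a toy example but not for genome-scale data; the function name is hypothetical, and the diagonal of the result is set to 1 by convention.

```python
def topological_overlap(a):
    """w_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), l_ij = sum_{u != i,j} a_iu a_uj."""
    n = len(a)
    k = [sum(a[i][u] for u in range(n) if u != i) for i in range(n)]
    w = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(a[i][u] * a[u][j] for u in range(n) if u != i and u != j)
            # The denominator is >= 1 because min(k_i, k_j) >= a_ij.
            w[i][j] = (shared + a[i][j]) / (min(k[i], k[j]) + 1 - a[i][j])
    return w

# Hypothetical 5-gene network: edges 0-1, 0-2, 1-2, 2-3; gene 4 is isolated.
a = [[0, 1, 1, 0, 0],
     [1, 0, 1, 0, 0],
     [1, 1, 0, 1, 0],
     [0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0]]
tom = topological_overlap(a)
print(tom[0][1], tom[0][3], tom[0][4])
```

Genes 0 and 1 are connected and share their only other neighbor, so their overlap is 1; genes 0 and 3 share one neighbor without being connected, giving 0.5; the isolated gene 4 has overlap 0 with everyone.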

Steps for defining gene modules
(1) Define a dissimilarity measure between two genes, for example dissim(i,j) = 1 - |cor(i,j)|, or the network-based dissimilarity 1 - TOM. (2) Use the dissimilarity in hierarchical clustering. (3) Define modules as branches of the hierarchical clustering tree. (4) Visualize the modules and the clustering results in a heatmap plot.
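Both dissimilarities in step (1) are element-wise transforms. A minimal sketch with hypothetical function names:

```python
def dissimilarity_from_correlation(cor):
    """dissim(i, j) = 1 - |cor(i, j)|, with 0 on the diagonal."""
    n = len(cor)
    return [[0.0 if i == j else 1.0 - abs(cor[i][j]) for j in range(n)] for i in range(n)]

def dissimilarity_from_tom(tom):
    """dissTOM(i, j) = 1 - TOM(i, j); a TOM diagonal of 1 gives 0 on the diagonal."""
    n = len(tom)
    return [[1.0 - tom[i][j] for j in range(n)] for i in range(n)]

print(dissimilarity_from_correlation([[1.0, -0.9], [-0.9, 1.0]]))
print(dissimilarity_from_tom([[1.0, 0.25], [0.25, 1.0]]))
```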

Using the TOM matrix to cluster genes
To group nodes with high topological overlap into modules, use average-linkage hierarchical clustering with the TOM-based dissimilarity (1 - TOM) as the distance measure. Once a dendrogram is obtained, choose a height cutoff to arrive at a clustering; modules correspond to branches of the dendrogram.
[Figure: TOM plot, in which genes correspond to rows and columns of the TOM matrix, shown with the hierarchical clustering dendrogram; modules correspond to branches.]
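Average-linkage clustering with a static height cutoff can be sketched without external libraries. Because average-linkage merge heights never decrease, stopping once the closest pair of clusters is farther apart than the cutoff gives the same clusters as building the full dendrogram and cutting it at that height. The function name, the naive search that recomputes every pairwise cluster distance at each merge, and the static cut are simplifying assumptions; in practice an optimized routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method='average'` would be used.

```python
def average_linkage_modules(dist, cut_height):
    """Agglomerative average-linkage clustering cut at cut_height.

    dist is a symmetric dissimilarity matrix (e.g. 1 - TOM).
    Returns a list of clusters, each a sorted list of gene indices.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None  # (average distance, index a, index b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > cut_height:  # every remaining merge would lie above the cut
            break
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return clusters

# Hypothetical dissimilarities: genes {0, 1, 2} and {3, 4} form two tight groups.
dist = [[0.9] * 5 for _ in range(5)]
for group in ([0, 1, 2], [3, 4]):
    for i in group:
        for j in group:
            dist[i][j] = 0.0 if i == j else 0.1
print(average_linkage_modules(dist, cut_height=0.5))
```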

Module-centric view (intramodular connectivity) vs. whole-network view (whole-network connectivity)
The traditional view is based on whole-network connectivity; the module view is based on within-module connectivity. In many applications, intramodular connectivity is biologically and mathematically more meaningful than whole-network connectivity. Mathematical facts in gene co-expression networks: hub genes are always module genes, and most module genes have high connectivity.
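Intramodular and whole-network connectivity differ only in which genes enter the row sum. A sketch with hypothetical function names, assuming modules are given as lists of gene indices (for example from a hierarchical clustering cut):

```python
def intramodular_connectivity(a, modules):
    """k_in(i): sum of a_iu over the other genes u in the same module as i."""
    kin = [0.0] * len(a)
    for members in modules:
        for i in members:
            kin[i] = sum(a[i][u] for u in members if u != i)
    return kin

def whole_network_connectivity(a):
    """k(i): sum of a_iu over all other genes u."""
    n = len(a)
    return [sum(a[i][u] for u in range(n) if u != i) for i in range(n)]

# Hypothetical chain 0-1-2-3 with modules {0, 1, 2} and {3}.
a = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
modules = [[0, 1, 2], [3]]
print(intramodular_connectivity(a, modules))
print(whole_network_connectivity(a))
```

Gene 2 looks as connected as gene 1 in the whole network, but within its module gene 1 is the hub.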

Cancer modules can be independently validated
Module structure is highly preserved across data sets.
[Figure panels: 55 brain tumors; validation data: 65 brain tumors; normal brain (adult + fetal); normal non-CNS tissues.]
Messages: 1) Cancer modules can be independently validated. 2) Modules in brain cancer tissue can also be found in normal, non-brain tissue, which provides insights into the biology of cancer.
Horvath et al., PNAS 2006, Vol. 103, No. 46, 17402-17407.

http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/Softwares/WGCNA/

Conclusion
Gene co-expression network analysis can be interpreted as the study of the Pearson correlation matrix. Connectivity can be used to single out important genes. Network methods are only weakly related to principal or independent component analysis; they focus on "local" properties.
Open questions: What is the mathematical meaning of the scale-free topology criterion? What are good alternative connectivity measures and network distance measures? Which genes, and how many, should be targeted to disrupt a disease module?