
Identifying Causal Genes and Dysregulated Pathways in Complex Diseases Discussion leader: Nafisah Islam Scribe: Matthew Computational Network Biology BMI.


1 Identifying Causal Genes and Dysregulated Pathways in Complex Diseases
Discussion leader: Nafisah Islam. Scribe: Matthew.
Computational Network Biology, BMI 826/Computer Sciences 838, https://compnetbiocourse.discovery.wisc.edu
Paper by Yoo-Ah Kim, Stefan Wuchty, Teresa M. Przytycka*, PLoS Computational Biology, 2011

2 Problem overview
Different patients might have different combinations of molecular perturbations. These combinations of perturbations dysregulate the same cellular pathways. For complex diseases, the information flow from potential causal genes to affected genes has not been investigated.
Given:
- A set of differentially expressed genes and
- Genotype alterations for each disease case.
Goal:
- Find a set of potential causal genes and
- The dysregulated pathways through which they affect molecular entities.

3 Approach
Use associations between gene expression and genomic alterations. Four main steps:
- Select differentially expressed target genes that cover the disease cases
- Find associations between altered genomic loci and changed expression levels of the target genes (eQTL mapping)
- Identify potential causal genes
- Determine the subset of causal genes that best explains the disease cases.

4 Figure 1. Outline of the method.
(A) Selection of target genes that were differentially expressed in disease cases, using a multi-set cover approach.
(B) Detection of genome-wide associations between gene expression changes of target genes and genomic alterations, which allows finding potential causal genomic areas.
(C) Determination of causal paths from genomic alterations (i.e. causal genes) to target genes by modeling and solving a current flow problem through a circuit of molecular interactions.
(D) To select a final set of causal genes, a weighted multi-set cover algorithm is designed. A bipartite graph between candidate causal genes and disease cases is constructed, where each edge is labeled with the set of target genes that were affected by the causal gene and were differentially expressed in the corresponding disease case. In the final set cover, causal genes in boxes covered each disease case with at least two target genes, allowing one exception.

5 Step 1: Selecting Target Genes
Formulated as a minimum multi-set cover problem and solved using a greedy algorithm.
Determine a set of genes that were differentially expressed in 158 glioblastoma (GBM) cases compared to 32 non-tumor control cases.
A gene is differentially expressed in a given case if its normalized expression value has a p-value less than 0.01 under a Z-test.
Build a bipartite graph B(T, S) between genes T and disease cases S by adding an edge between gene g and case s if and only if g was differentially expressed in s.
Multi-set cover instance SC = {B(T, S), α, β}, where α is the number of times each case needs to be covered and β is the maximum number of outliers.
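The differential-expression test and the bipartite graph on this slide could be sketched roughly as follows. This is only an illustration: the slide does not say how expression values are normalized, so the sketch assumes each case value is compared against the mean and standard deviation of the control samples, and the names `z_test_p_value` and `build_bipartite` are made up here.

```python
import math
from statistics import mean, stdev


def z_test_p_value(value, controls):
    """Two-sided p-value of `value` under a normal fit to `controls`.

    Assumption: normalization is done against the control samples, and the
    controls are not all identical (otherwise the standard deviation is 0).
    """
    z = (value - mean(controls)) / stdev(controls)
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def build_bipartite(expr_cases, expr_controls, p_cut=0.01):
    """Return gene -> set of cases in which the gene is differentially expressed.

    expr_cases:    gene -> {case id: expression value}
    expr_controls: gene -> list of expression values in control samples
    """
    covers = {}
    for gene, case_values in expr_cases.items():
        controls = expr_controls[gene]
        covers[gene] = {
            case
            for case, value in case_values.items()
            if z_test_p_value(value, controls) < p_cut
        }
    return covers
```

The returned dictionary is the edge set of B(T, S): gene g is linked to case s exactly when s appears in `covers[g]`.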

6 Algorithm 1: Pseudocode for selecting target genes.
1. Construct a bipartite graph B(T, S) of genes T and disease cases S by adding an edge between gene g and case s if gene g is differentially expressed in case s. Let S(g) denote the set of cases to which gene g has edges.
2. Create a multi-set cover instance SC = {B(T, S), α, β}
3. U = the set of cases covered fewer than α times
4. TG = the set of selected genes
5. Repeat the following until |U| ≤ β:
   a. Select a gene with maximum |U ∩ S(g)|
   b. Include the selected gene in TG
   c. Update U
Result: they obtain 74 target genes.
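A minimal Python sketch of the greedy loop in Algorithm 1 is given here for illustration. The function name, the alphabetical tie-breaking, and the early exit when no remaining gene helps are assumptions of this sketch, not details taken from the slides.

```python
def greedy_multiset_cover(covers, cases, alpha, beta):
    """Greedy multi-set cover.

    covers: gene -> set of cases S(g) in which the gene is differentially expressed
    cases:  iterable of all disease cases
    alpha:  number of times each case must be covered
    beta:   number of cases allowed to stay under-covered (outliers)
    Returns the list of selected genes TG in selection order.
    """
    count = {s: 0 for s in cases}
    uncovered = set(count) if alpha > 0 else set()  # U
    remaining = dict(covers)
    selected = []  # TG
    while len(uncovered) > beta:
        # Gene with maximum |U ∩ S(g)|; ties broken alphabetically (assumption).
        best = max(sorted(remaining),
                   key=lambda g: len(uncovered & remaining[g]),
                   default=None)
        if best is None or not (uncovered & remaining[best]):
            break  # no remaining gene covers an under-covered case
        selected.append(best)
        for s in remaining.pop(best):
            if s in count:
                count[s] += 1
                if count[s] >= alpha:
                    uncovered.discard(s)  # update U
    return selected
```

Each gene is used at most once, so a case needs α distinct selected genes before it leaves U.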

7 Step 2: Finding Associations
A set of loci L = {l_1, l_2, …, l_m}, where each locus l_i is characterized by its copy number cn_{i,j} in each case j: CN_i = {cn_{i,1}, cn_{i,2}, …, cn_{i,n}}.
Identify tag loci: a tag locus tl_k represents a run of neighboring loci l_i whose copy-number profiles CN_i are highly correlated (Pearson's correlation) with CN_k.
Given a set of tag loci TL = {tl_1, tl_2, …, tl_m} and a set of target genes TG = {tg_1, tg_2, …, tg_n}, find candidate causal loci using eQTL association analysis.
For each tg_i, select the tag loci with p-value less than 0.01.

8 Algorithm 2: Pseudocode of eQTL mapping.
1. For each chromosome chr, let L_chr be the set of loci on the chromosome, sorted in increasing order of their genomic locations.
2. tl = L_chr[0]  \\ the first locus
3. Add tl to TL  \\ TL: the set of tag loci
4. Consider loci i in sorted order:
   a. If corr(tl, L_chr[i]) < θ_TL:  (corr(x, y): Pearson's correlation coefficient)
      i. right(tl) = L_chr[i-1]  \\ set the right boundary of the old tag locus
      ii. tl = L_chr[i] and add tl to TL  \\ select a new tag locus
      iii. Consider loci in reverse sorted order starting from j = i - 1
      iv. If corr(tl, L_chr[j]) < θ_TL: left(tl) = L_chr[j+1]  \\ set the left boundary of the new tag locus
      Go to 4.a
5. For each target gene dg_i:
   a. TL(i) = []  \\ set of tag loci associated with disease gene i
   b. For each tag locus tl_j: run linear regression between E(dg_i) and CN(tl_j) and compute the p-value. If p < θ_eqtl: add tl_j to TL(i)
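Steps 1 to 4 of Algorithm 2 (tag-locus selection on one chromosome) might look like the sketch below. The threshold value 0.9 is a placeholder, since the slides do not state the correlation cutoff, and the function names are hypothetical. The regression step 5 is not covered here.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: treat as uncorrelated (assumption)
    return sxy / math.sqrt(sxx * syy)


def select_tag_loci(copy_numbers, theta=0.9):
    """Pick tag loci along one chromosome.

    copy_numbers: list of per-locus copy-number profiles, sorted by position.
    Returns a list of (tag index, left boundary index, right boundary index).
    """
    if not copy_numbers:
        return []
    tags = []
    tag, left = 0, 0  # the first locus is the first tag locus
    for i in range(1, len(copy_numbers)):
        if pearson(copy_numbers[tag], copy_numbers[i]) < theta:
            tags.append((tag, left, i - 1))  # right boundary of the old tag
            tag = i                          # select a new tag locus
            j = i - 1                        # scan backwards for its left boundary
            while j >= 0 and pearson(copy_numbers[tag], copy_numbers[j]) >= theta:
                j -= 1
            left = j + 1
    tags.append((tag, left, len(copy_numbers) - 1))
    return tags
```

Because the backward scan can reach into the previous block, the regions of neighboring tag loci are allowed to overlap, as in the pseudocode.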

9 Step 3: Identification of Candidate Causal Genes
Let G = (N, E) be a gene network, where N is a set of genes and E is a set of molecular interactions.
I = [I(e) for e ∈ E] holds the current passing through the edges and V = [V(n) for n ∈ N] holds the voltage at the nodes.
For an edge e = (u, v) connecting genes u and v, the conductance w(e) is defined as the mean of corr(u, tg) and corr(v, tg), where tg is the target gene.
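To make the circuit model concrete, here is a small pure-Python sketch that solves Kirchhoff's current law on an undirected network. It assumes the target gene is held at voltage 1 and the candidate causal genes are grounded at voltage 0, which is a choice made for this illustration, and it assumes the network is connected. Edge directions and the edge-removal heuristic of the method are not modeled.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def current_flow(nodes, conductance, source, sinks):
    """Return (voltages V, edge currents I) for a resistor network.

    conductance: {(u, v): w(e)} for each undirected edge
    source:      node held at voltage 1 (assumed here: the target gene)
    sinks:       nodes held at voltage 0 (assumed here: candidate causal genes)
    """
    fixed = {source: 1.0, **{s: 0.0 for s in sinks}}
    free = [n for n in nodes if n not in fixed]
    idx = {n: i for i, n in enumerate(free)}
    a = [[0.0] * len(free) for _ in free]
    b = [0.0] * len(free)
    # Kirchhoff's current law at each free node x: sum_y w(x, y) (V(x) - V(y)) = 0
    for (u, v), w in conductance.items():
        for x, y in ((u, v), (v, u)):
            if x in idx:
                a[idx[x]][idx[x]] += w
                if y in idx:
                    a[idx[x]][idx[y]] -= w
                else:
                    b[idx[x]] += w * fixed[y]
    volt = dict(fixed)
    if free:
        sol = solve_linear(a, b)
        volt.update({n: sol[idx[n]] for n in free})
    # Ohm's law: I(e) = w(e) * (V(u) - V(v)), positive when flowing u -> v
    current = {(u, v): w * (volt[u] - volt[v]) for (u, v), w in conductance.items()}
    return volt, current
```

For a chain t, a, c with conductances 1 and 3, the intermediate node settles at voltage 0.25 and both edges carry a current of 0.75.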

10 Assumption: direct regulation of the expression of target genes is mediated by transcription factors.
A heuristic approach is used to solve the problem: edges are removed iteratively until only a small number of directed edges carry current in the reverse direction.
Candidate causal genes are selected using an empirical p-value less than 0.05.

11 Algorithm 3: Pseudocode for selecting candidate causal genes.
1. For each disease gene dg_i:
   a. CG(dg_i) ← ∅
   b. For each tag locus tl_j ∈ TL(i) and associated region R(tl_j):
      i. Compute C(tl_j), the set of genes located in R(tl_j)
      ii. Repeat the following:
          Construct an electric circuit G = {N, E}
          Compute the current I(g) to each gene in C(tl_j)
          If |{e in E | e in reverse direction}| < θ_r: go to 1.b.iii
          Else: remove the edges and repeat 1.b.ii
      iii. Compute current in random networks and p-values
      iv. …

Algorithm 4: Finding a dysregulated pathway from a causal gene c to a target gene d.
1. r_max(c, d) ← the region in r(c) for which c has the most significant p-value, where r(c) is the set of regions that contain causal gene c
2. tl_max(c, d) ← the corresponding tag locus
3. G' ← subgraph of G consisting of nodes with p-value > 0.05 in Sol(d, tl_max(c, d))
4. For each gene g_i in G': I(g_i) ← the total current passing through gene g_i in Sol(d, tl_max(c, d))
5. P_max(d, c) ← paths from d to c attaining max over p ∈ P(d, c) of (min over g_i in p of I(g_i))
6. Choose the shortest path in P_max(d, c)
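Steps 5 and 6 of Algorithm 4 amount to a maximum-bottleneck ("widest") path search followed by a shortest-path search among the optimal paths. One way to sketch this, assuming the per-gene currents I(g_i) have already been computed and the subgraph G' is given as an adjacency dictionary (both inputs and the function name are hypothetical):

```python
import heapq
from collections import deque


def max_bottleneck_shortest_path(adj, node_current, src, dst):
    """Shortest path among those maximizing the minimum node current.

    adj:          node -> iterable of neighbor nodes
    node_current: node -> total current I(g) through the node
    Returns the path as a list of nodes, or None if dst is unreachable.
    """
    # 1) Widest-path variant of Dijkstra: best[n] is the largest achievable
    #    minimum current over all paths from src to n.
    best = {src: node_current[src]}
    heap = [(-node_current[src], src)]
    while heap:
        neg_width, u = heapq.heappop(heap)
        if -neg_width < best.get(u, float("-inf")):
            continue  # stale heap entry
        for v in adj.get(u, ()):
            width = min(-neg_width, node_current[v])
            if width > best.get(v, float("-inf")):
                best[v] = width
                heapq.heappush(heap, (-width, v))
    if dst not in best:
        return None
    limit = best[dst]

    # 2) Breadth-first search restricted to nodes with current >= limit
    #    yields the shortest of the bottleneck-optimal paths.
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in adj.get(u, ()):
            if v not in parent and node_current[v] >= limit:
                parent[v] = u
                queue.append(v)
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]
```

Restricting the second search to nodes whose current is at least the optimal bottleneck guarantees that every path it finds belongs to P_max(d, c).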

12 Step 4: Finalizing Causal Genes
Formulated as a minimum weighted multi-set cover problem.
Build a weighted bipartite graph WB(C, S) with an edge between gene cg_k and case s_i if and only if gene cg_k explains case s_i.
Let W(C_0, s) be the total number of target genes through which the genes in C_0 cover case s; a case counts as covered if this total weight exceeds a certain threshold.

13 Algorithm 5: Pseudocode for the selection of final causal genes.
1. Create a weighted multi-set cover instance WSC = {B, α, β}
2. U = the set of cases covered less than α
3. MCG = the set of selected causal genes
4. Repeat the following until |U| ≤ β:
   a. Select a gene with maximum …
   b. Include the selected gene in MCG
   c. Update U
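A rough Python sketch of the weighted greedy loop follows. The selection criterion in step 4a is not legible in this transcript, so the `gain` function below (weight added to under-covered cases, capped at what each case still needs) is purely an assumption of this sketch, as are the function name and the alphabetical tie-breaking.

```python
def greedy_weighted_multiset_cover(weights, cases, alpha, beta):
    """Greedy weighted multi-set cover.

    weights: causal gene -> {case: number of target genes through which
             the gene explains that case}
    alpha:   total weight a case needs in order to count as covered
    beta:    number of cases allowed to stay under-covered
    Returns the list of selected causal genes MCG in selection order.
    """
    covered = {s: 0 for s in cases}
    under = {s for s in covered if covered[s] < alpha}  # U
    remaining = dict(weights)
    chosen = []  # MCG

    def gain(g):
        # ASSUMED criterion: useful weight added to still under-covered cases.
        return sum(min(w, alpha - covered[s])
                   for s, w in remaining[g].items() if s in under)

    while len(under) > beta and remaining:
        best = max(sorted(remaining), key=gain)
        if gain(best) == 0:
            break  # no remaining gene helps an under-covered case
        chosen.append(best)
        for s, w in remaining.pop(best).items():
            if s in covered:
                covered[s] += w
        under = {s for s in covered if covered[s] < alpha}  # update U
    return chosen
```

With α = 2 and β = 1 this matches the setting described in the Figure 1 caption: each case covered by at least two target genes, allowing one exception.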

14 Validation/Evaluation
In the early steps of the algorithm, they determined associations between copy number variations and expression of target genes, yielding 16,056 associated genes.
The next step reduced this set to 701 candidate causal genes, with a significant enrichment of 10 GBM-specific and 25 glioma-related genes. They also obtain 1,763 pairs.
The weighted set cover approach, evaluated with information from the DAVID database, gives consistent results: 128 causal genes, including 6 GBM-relevant genes (CDKN2A, EGFR, ERBB4, PTEN, RB1 and TP53).
According to AceView, 280 causal genes were obtained, including only 4 GBM-related genes.

15 Chromosomal Analysis of Causal Genes
Genomic alterations in GBM include:
- A genomic amplification on chromosome 7
- A deletion on chromosome 10
These alterations occur at the genomic locations of EGFR and PTEN.
Figure: the final 128 causal genes and their corresponding target genes. Causal genes on chromosomes 7 and 10 were connected to target genes.

17 Dysregulated Pathways and Subnetworks Figure: The network of causal paths from PTEN.

19 Literature-Based Validation
RHOBTB2: a recently discovered gene, with small genomic alterations.
GBAS and CEBPA were not included by AceView or DAVID.

20 (A) Dysregulated pathways from causal gene CDC2. Genes in this larger network were significantly enriched in the cell cycle, pathways in cancer, chronic myeloid leukemia, prostate, pancreatic, bladder, colorectal and lung cancers (P < 0.01).

21 (B) Dysregulated pathways from causal gene GBAS. Genes that appear in this network are enriched in bladder cancer and cancer pathways in general (P < 0.01). (C) Pooling all genes that appear in dys-regulated pathways as presented in the main text, we found that a small number of genes appeared in many causal paths. The hub genes were significantly enriched in cancer pathways, chronic and acute myeloid leukemia, prostate and pancreatic cancers, cell cycle, neurotrophin signaling pathway, renal cell carcinoma, TGF signaling and T-cell receptor signaling pathways (P < 0.001).

22 Discussion
Each individual step can be used separately, depending on the specific application. Thus, the method can be applied to any disease system where genetic variations play a fundamental, causal role.

