IDENTIFYING CAUSAL GENES AND DYSREGULATED PATHWAYS IN COMPLEX DISEASES. Nov. 6th, 2010. YOO-AH KIM, NIH / NLM / NCBI.


1 IDENTIFYING CAUSAL GENES AND DYSREGULATED PATHWAYS IN COMPLEX DISEASES
Nov. 6th, 2010
YOO-AH KIM, NIH / NLM / NCBI

2 Complex Diseases
Associated with the effects of multiple genes, as opposed to single-gene diseases.
The combination of genomic alterations may vary strongly among patients while dysregulating the same components, thus often leading to the same disease phenotype.
Difficult to study and treat.
Examples: cancer, heart disease, diabetes, etc.

3 Copy Number Variations
Two copies of each gene are generally assumed to be present in a genome.
Genomic regions may be deleted or duplicated, causing copy number variations (CNVs).
Some CNVs are associated with susceptibility or resistance to diseases such as cancer.
[Figure: copy number variations in 158 glioblastoma patients]

4 Identifying Genomic Causes in Complex Diseases
Identify genotypic causes in individual patients as well as dysregulated pathways.
Systems biology approach.
Genome-wide search.
Graph-theoretic algorithms: circuit flow and set cover.
Data: 158 glioblastoma multiforme patients.

5 Glioblastoma multiforme (GBM)
The most common and most aggressive type of primary brain tumor in humans.

6 Expression as Quantitative Trait
Genotype: copy number variations.
Phenotype: gene expression.

7 eQTL (expression Quantitative Trait Loci) Analysis
While we assume that the genetic variation is the cause and the expression change is the effect, we don't know the molecular pathways behind the relation.
[Figure labels: putative causal gene/locus; putative target gene]

8 Method Outline
A. Target gene selection: gene expression
B. eQTL: find associations between expression and copy number
C. Circuit flow algorithm: molecular interactions, candidate causal genes
D. Causal gene selection: weighted multiset cover
[Figure, panels A to D: matrices of cases by target genes (g1, g2, g3, ..., gm), tag loci (s1, s2, s3, s4, ..., sn) by cases, and causal genes by cases; a circuit (+/-) between target gene gm and tag SNP sn passing through causal genes over TF-DNA, phosphorylation-event, and protein-protein edges]

9 Target Gene Selection
Select a representative set of disease genes.
Filter differentially expressed genes for each case.
Multi-set cover.
[Figure: gene expression matrix; rows Gene 1, Gene 2, Gene 3, ...; columns grouped into controls and disease cases]
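The per-case filtering step can be sketched as follows. The slides do not say which criterion was used to call a gene differentially expressed, so this sketch assumes a z-score against the control samples; the function name differentially_expressed, the threshold of 2.0, and the toy values are all hypothetical.

```python
from statistics import mean, stdev

def differentially_expressed(case_expr, control_expr, z_threshold=2.0):
    """Return genes whose expression in one disease case lies more than
    z_threshold control standard deviations from the control mean.
    The z-score criterion is an assumption, not taken from the slides."""
    selected = set()
    for gene, value in case_expr.items():
        controls = control_expr[gene]
        mu = mean(controls)
        sigma = stdev(controls)
        if sigma == 0:
            continue  # z-score undefined when controls do not vary; skip
        if abs(value - mu) / sigma > z_threshold:
            selected.add(gene)
    return selected

# Toy data: expression of two genes in four controls and one disease case.
controls = {"gene1": [1.0, 1.2, 0.8, 1.0], "gene2": [2.0, 2.1, 1.9, 2.0]}
case = {"gene1": 3.0, "gene2": 2.05}
de_genes = differentially_expressed(case, controls)
```

Running the filter once per disease case yields, for every gene, the set of cases it could cover in the subsequent multi-set cover step.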

10 Target Gene Selection (Continued)
Minimum multi-set cover:
A gene covers a particular disease case if the gene is differentially expressed in that case.
Find a smallest set of genes that covers (almost) all cases at least k times.
Result: 74 target genes selected.
[Figure: genes A, B, C, D, E linked to disease cases case1 to case7]
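A minimal sketch of the covering step, assuming a greedy heuristic. Minimum multi-set cover is NP-hard, and the slides do not say how it was solved, so the greedy rule here is a stand-in that approximates a smallest set rather than guaranteeing one. The function name and the coverage data are hypothetical.

```python
def greedy_multiset_cover(covers, cases, k):
    """Greedy approximation to minimum multi-set cover.

    covers: dict mapping gene -> set of cases in which it is differentially expressed.
    Returns (genes in the order picked, remaining demand per case).
    """
    need = {case: k for case in cases}  # times each case still has to be covered
    unused = set(covers)                # each gene may be picked at most once
    chosen = []

    def gain(gene):
        # Number of still-unsatisfied cases this gene would help cover.
        return sum(1 for case in covers[gene] if need.get(case, 0) > 0)

    while unused and any(n > 0 for n in need.values()):
        best = max(sorted(unused), key=gain)  # sorted() makes ties deterministic
        if gain(best) == 0:
            break  # no remaining gene helps, so some cases stay under-covered
        chosen.append(best)
        unused.remove(best)
        for case in covers[best]:
            if need.get(case, 0) > 0:
                need[case] -= 1
    return chosen, need

# Hypothetical coverage: which cases each gene is differentially expressed in.
covers = {
    "A": {"case1", "case2", "case3"},
    "B": {"case3", "case4"},
    "C": {"case1", "case2", "case4"},
    "D": {"case4"},
    "E": {"case1"},
}
cases = ["case1", "case2", "case3", "case4"]
chosen, uncovered = greedy_multiset_cover(covers, cases, k=2)
```

The early exit when no gene adds coverage mirrors the "(almost) all cases" wording: cases that cannot be covered k times are reported in the returned demand dictionary instead of blocking the result.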

11 eQTL
Associations between the expression of target genes and copy number variations of genomic loci.
Linear regression for every pair of tag locus and target gene.
[Figure: matrices of cases by target genes and tag loci by cases]
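The pairwise regression can be sketched in plain Python as below. The slides do not give the significance criterion, so the cutoff on the absolute Pearson correlation (min_abs_r) is a hypothetical placeholder, as are the function names and the toy data.

```python
from statistics import mean

def simple_regression(x, y):
    """Least-squares slope of y on x, plus the Pearson correlation r."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0, 0.0  # a constant vector carries no association signal
    return sxy / sxx, sxy / (sxx * syy) ** 0.5

def eqtl_scan(copy_number, expression, min_abs_r=0.8):
    """Regress every target gene's expression on every tag locus's copy number.
    min_abs_r is an assumed cutoff, not a value from the slides."""
    hits = []
    for locus, cn in copy_number.items():
        for gene, expr in expression.items():
            slope, r = simple_regression(cn, expr)
            if abs(r) >= min_abs_r:
                hits.append((locus, gene, slope, r))
    return hits

# Toy data: copy number and expression across the same five cases.
copy_number = {"s1": [2, 3, 2, 1, 4], "s2": [2, 2, 3, 2, 1]}
expression = {"g1": [1.0, 1.5, 1.0, 0.5, 2.0], "g2": [0.3, 0.9, 0.1, 0.8, 0.5]}
hits = eqtl_scan(copy_number, expression)
```

In this toy data only the pair (s1, g1) passes the cutoff, because the expression of g1 is exactly half the copy number of s1.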

12 Method Outline
A. Target gene selection: gene expression
B. eQTL: find associations between expression and copy number
C. Circuit flow algorithm: molecular interactions, candidate causal genes
D. Causal gene selection: weighted multiset cover
[Figure, panels A to D: matrices of cases by target genes (g1, g2, g3, ..., gm), tag loci (s1, s2, s3, s4, ..., sn) by cases, and causal genes by cases; a circuit (+/-) between target gene gm and tag SNP sn passing through causal genes over TF-DNA, phosphorylation-event, and protein-protein edges]

13 Finding Candidate Causal Genes
[Figure: genotypic variations and target genes]

14 Finding Candidate Causal Genes
[Figure: genotypic variations, candidate genes C1 to C5, and target genes; the connection between them is marked "?"]

15 Finding Candidate Causal Genes
Interaction network: protein-protein interactions, phosphorylation events, transcription factor interactions.
[Figure: genotypic variations, candidate genes C1 to C5, and target genes connected through the interaction network]

16 Finding Candidate Causal Genes
Current flow (+ to -) through the interaction network.
The resistance of an edge (u, v) is set to be inversely proportional to (|corr(expr(u), expr(D))| + |corr(expr(v), expr(D))|) / 2.
[Figure: genotypic variations, candidate genes C1 to C5, and target genes; edge (u, v) highlighted in the interaction network]
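The resistance rule can be written out as a short function. This is a sketch under assumptions: the proportionality constant is taken to be 1, a small floor eps (hypothetical) avoids division by zero when neither endpoint correlates with D, and the expression vectors are toy values.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def edge_resistance(expr_u, expr_v, expr_d, eps=1e-6):
    """Resistance of edge (u, v): inverse of the mean absolute correlation
    of the two endpoints with the expression of D.
    The constant 1 and the floor eps are assumptions of this sketch."""
    strength = (abs(pearson(expr_u, expr_d)) + abs(pearson(expr_v, expr_d))) / 2
    return 1.0 / max(strength, eps)

expr_d = [1.0, 2.0, 3.0, 4.0]    # expression of D across four cases
expr_u = [2.0, 4.0, 6.0, 8.0]    # perfectly correlated with D
expr_v = [4.0, 3.0, 2.0, 1.0]    # perfectly anti-correlated with D
expr_w = [1.0, -1.0, -1.0, 1.0]  # uncorrelated with D

r_uv = edge_resistance(expr_u, expr_v, expr_d)
r_uw = edge_resistance(expr_u, expr_w, expr_d)
```

Because absolute correlations are used, an edge between strongly correlated and strongly anti-correlated genes gets the same low resistance; an edge with one uninformative endpoint gets twice that resistance here.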

17 Finding Candidate Causal Genes
Current flow (+ to -) through the interaction network.
Compute the amount of current entering each causal gene by solving a system of linear equations.
[Figure: genotypic variations, candidate genes C1 to C5, and target genes]
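One way to set up and solve such a system is sketched below, using Kirchhoff's current law at every interior node. The boundary conditions are assumptions of this sketch, not details from the slides: the target gene is held at voltage 1, every candidate causal gene is grounded at voltage 0, and conductance is 1/resistance. The function names and the toy network are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system (node disconnected from source/sinks?)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def current_into_causal_genes(conductance, source, sinks):
    """conductance: dict {(u, v): g} of undirected edges.
    Returns the current entering each sink (candidate causal gene)."""
    nodes = sorted({n for edge in conductance for n in edge})
    fixed = {source: 1.0}
    fixed.update({s: 0.0 for s in sinks})  # assumed boundary conditions
    free = [n for n in nodes if n not in fixed]
    idx = {n: i for i, n in enumerate(free)}
    adj = {n: {} for n in nodes}
    for (u, v), g in conductance.items():
        adj[u][v] = g
        adj[v][u] = g
    # Kirchhoff: for each free node n, sum_m g_nm * (V_n - V_m) = 0.
    A = [[0.0] * len(free) for _ in free]
    b = [0.0] * len(free)
    for n in free:
        for m, g in adj[n].items():
            A[idx[n]][idx[n]] += g
            if m in fixed:
                b[idx[n]] += g * fixed[m]
            else:
                A[idx[n]][idx[m]] -= g
    volt = dict(fixed)
    if free:
        solution = solve_linear(A, b)
        volt.update({n: solution[idx[n]] for n in free})
    return {s: sum(g * (volt[m] - volt[s]) for m, g in adj[s].items()) for s in sinks}

# Toy network: target T feeds intermediate X, which feeds candidates C1 and C2.
conductance = {("T", "X"): 1.0, ("X", "C1"): 2.0, ("X", "C2"): 1.0}
currents = current_into_causal_genes(conductance, source="T", sinks=["C1", "C2"])
```

In the toy network, C1 sits behind the lower-resistance edge and therefore receives twice the current of C2.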

18 Method Outline
A. Target gene selection: gene expression
B. eQTL: find associations between expression and copy number
C. Circuit flow algorithm: molecular interactions, candidate causal genes
D. Causal gene selection: weighted multiset cover
[Figure, panels A to D: matrices of cases by target genes (g1, g2, g3, ..., gm), tag loci (s1, s2, s3, s4, ..., sn) by cases, and causal genes by cases; a circuit (+/-) between target gene gm and tag SNP sn passing through causal genes over TF-DNA, phosphorylation-event, and protein-protein edges]

19 Final Causal Gene Selection
A putative causal gene explains a disease case if:
its corresponding tag locus has a copy number alteration, and
its affected target genes (i.e., genes sending a significant amount of current to the causal gene) are differentially expressed in the disease case.
[Figure: matrix of causal genes by cases]

20 Final Causal Gene Selection
A putative causal gene explains a disease case if:
its corresponding tag locus has a copy number alteration, and
its affected target genes (i.e., genes sending a significant amount of current to the causal gene) are differentially expressed in the disease case.
[Figure: matrix of causal genes by cases]

21 Final Causal Gene Selection
A putative causal gene explains a disease case if:
its corresponding tag locus has a copy number alteration, and
its affected target genes (i.e., genes sending a significant amount of current to the causal gene) are differentially expressed in the disease case.
[Figure: matrix of causal genes by cases, with a label "WEIGHT"]

22 Final Causal Gene Selection
Find a smallest set of genes covering (almost) all cases at least k times (minimum weighted multi-set cover).
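A sketch of the weighted variant, again assuming a greedy heuristic: at each step it picks the gene with the lowest weight per newly covered case. The slides do not define how the weights are computed, so the weights below are hypothetical inputs in which a lower weight means a more favoured gene; the function name and data are likewise illustrative.

```python
def greedy_weighted_multiset_cover(explains, weight, cases, k):
    """Greedy approximation to minimum weighted multi-set cover.

    explains: dict gene -> set of cases the gene explains.
    weight:   dict gene -> positive cost of selecting the gene (assumed given).
    Returns (genes in the order picked, remaining demand per case).
    """
    need = {case: k for case in cases}
    unused = set(explains)
    chosen = []
    while any(n > 0 for n in need.values()):
        best, best_ratio = None, None
        for gene in sorted(unused):  # sorted() makes ties deterministic
            gain = sum(1 for c in explains[gene] if need.get(c, 0) > 0)
            if gain == 0:
                continue
            ratio = weight[gene] / gain  # cost per newly covered case
            if best_ratio is None or ratio < best_ratio:
                best, best_ratio = gene, ratio
        if best is None:
            break  # remaining cases cannot be covered any further
        chosen.append(best)
        unused.remove(best)
        for c in explains[best]:
            if need.get(c, 0) > 0:
                need[c] -= 1
    return chosen, need

# Hypothetical inputs: which cases each candidate explains, and its weight.
explains = {
    "G1": {"c1", "c2", "c3"},
    "G2": {"c1", "c2"},
    "G3": {"c3"},
    "G4": {"c2", "c3"},
}
weight = {"G1": 3.0, "G2": 1.0, "G3": 0.5, "G4": 4.0}
selected, unexplained = greedy_weighted_multiset_cover(
    explains, weight, ["c1", "c2", "c3"], k=1
)
```

Here two cheap genes (total weight 1.5) are preferred over the single gene G1 that explains every case at weight 3.0, which is the trade-off the weighting introduces relative to the unweighted cover.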

23 Dysregulated Pathways
Causal paths between a target and a causal gene: a maximum current path.
[Figure: network with candidate genes C1 to C5]
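The slides do not spell out how a maximum current path is extracted. One possible reading, sketched here as an assumption, is to walk backward from the causal gene, always stepping to the neighbor that delivers the most current, until the target gene is reached. The function name and the edge currents are hypothetical.

```python
def max_current_path(edge_current, target, causal):
    """edge_current: dict {(u, v): amount} with current flowing from u to v.
    Walk backward from the causal gene along the largest incoming current.
    Returns the path from target to causal gene, or None if none is found."""
    incoming = {}
    for (u, v), amount in edge_current.items():
        if amount > 0:
            incoming.setdefault(v, []).append((amount, u))
    path = [causal]
    seen = {causal}
    node = causal
    while node != target:
        if node not in incoming:
            return None  # dead end: no current enters this node
        _, node = max(incoming[node])  # predecessor sending the most current
        if node in seen:
            return None  # guard against cycles in malformed input
        seen.add(node)
        path.append(node)
    path.reverse()
    return path

# Toy currents that satisfy Kirchhoff's law at the interior nodes A and B.
edge_current = {
    ("T", "A"): 0.6, ("T", "B"): 0.4,
    ("A", "C1"): 0.5, ("A", "B"): 0.1,
    ("B", "C1"): 0.2, ("B", "C2"): 0.3,
}
path_c1 = max_current_path(edge_current, target="T", causal="C1")
path_c2 = max_current_path(edge_current, target="T", causal="C2")
```

An alternative reading would maximize the bottleneck current along the whole path; the backward greedy walk is used here only because it is the simplest variant to state.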

24 Results
158 GBM patient samples and 32 non-tumor control samples.
74 target genes.
128 causal genes.
Disease hubs: genes frequently appearing on causal paths.

25 Selected Causal Genes
Step                    Number of genes   Overlap with GBM genes
Step B: eQTL            16056             0.56 (75)
Step C: Circuit flow    701               0.045 (10)
Step D: Set cover       128               4.7 x 10^-4 (6)
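The slides do not state which test produced the overlap values. A hypergeometric tail probability is one common way to score such an overlap, and the sketch below assumes that test. The function name and the numbers in the example are made up for illustration; they are not the study's gene counts.

```python
from math import comb

def hypergeom_enrichment_p(population, known, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn without replacement
    from `population` genes, of which `known` are annotated disease genes."""
    upper = min(known, selected)
    total = comb(population, selected)
    tail = sum(
        comb(known, i) * comb(population - known, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical example: 20 genes in total, 5 known disease genes,
# 4 genes selected, 2 of which are known disease genes.
p_value = hypergeom_enrichment_p(population=20, known=5, selected=4, overlap=2)
```

A smaller value indicates that the overlap is less likely to arise from a random gene set of the same size.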

26 Results
128 causal genes from set cover (Step D).
701 candidate causal genes from the circuit flow algorithm (Step C).

27 Causal Genes
Functional analysis using DAVID:
Pathway                  P-value   Genes
Glioma                   0.008     PRKCA, EGFR, AKT1, CDKN2A, CAMK2G, TP53, RB1, PTEN
Cell cycle               0.028     MCM7, CDKN2A, CDC2, TP53, ORC5L, RB1, ATR, BUB3, CUL1
p53 signaling pathway    0.030     CDKN2A, CDC2, TP53, ATR, FAS, THBS1, PTEN
Proteasome               0.026     PSMA1, PSMC6, PSMB1, PSMC3, PSMA5, PSMA4
The selected causal gene set includes many known cancer-implicated genes.
(Slide footer: BSOSC Review, November 2008)

28 PTEN as causal gene
[Figure legend: fold change (-, 0, +); TF-DNA, protein-protein; kinase, TF, causal genes]

29 EGFR as causal and target gene
[Figure legend: fold change (-, 0, +); kinase, TF, causal genes; TF-DNA, protein-protein, phosphorylation; labels: causal EGFR, target EGFR]

30 Conclusion
A novel computational method to simultaneously identify causal genes and dysregulated pathways, combining a circuit flow algorithm with multi-set cover.
Augmenting eQTL evidence with interaction information resulted in a very powerful approach to uncover potential causal genes as well as intermediate nodes on molecular pathways.
Our method can be applied to any disease system where genetic variations play a fundamental causal role.

31 Acknowledgements
Teresa M. Przytycka, Stefan Wuchty.
Other group members: Dong Yeon Cho, Yang Huang, Damian Wojtowicz, Jie Zheng.

32 Method Outline
A. Target gene selection: gene expression
B. eQTL: find associations between expression and copy number
C. Circuit flow algorithm: molecular interactions, candidate causal genes
D. Causal gene selection: weighted multiset cover
[Figure, panels A to D: matrices of cases by target genes (g1, g2, g3, ..., gm), tag loci (s1, s2, s3, s4, ..., sn) by cases, and causal genes by cases; a circuit (+/-) between target gene gm and tag SNP sn passing through causal genes over TF-DNA, phosphorylation-event, and protein-protein edges]

33

34 EGFR as causal and target gene
Causal Paths
[Figure legend: fold change (-, 0, +); kinase, TF, causal genes; TF-DNA, protein-protein, phosphorylation; labels: causal EGFR, target EGFR]

35 PTEN as causal gene
Causal Paths
[Figure legend: fold change (-, 0, +); TF-DNA, protein-protein; kinase, TF, causal genes]

36 Our Method
Integrate several types of data: gene expression, copy number variations, molecular interactions.

37 Methods and Results
Method:
Model the expression change of disease genes as a function of genomic alterations.
Translate the propagation of information from a potential causal gene to a disease gene into the flow of electric current through a network of molecular interactions.
Multi-set cover: select the most prominent genes.
Results:
Validated our approach by testing the enrichment of selected causal genes with known GBM/glioma-related genes.
[Figure: circuit (+/-) between disease gene gm and tag SNP sn passing through causal genes]

