Published by Jacob Whitty. Modified over 10 years ago.
1
IDENTIFYING CAUSAL GENES AND DYSREGULATED PATHWAYS IN COMPLEX DISEASES
Nov. 6th, 2010
YOO-AH KIM
NIH / NLM / NCBI
2
Complex Diseases
- Associated with the effects of multiple genes, as opposed to single-gene diseases
- The combination of genomic alterations may vary strongly among patients
- Different alterations dysregulate the same components, thus often leading to the same disease phenotype
- Difficult to study and treat
- Examples: cancer, heart disease, diabetes
3
Copy Number Variations
- Two copies of each gene are generally assumed to be present in a genome
- Genomic regions may be deleted or duplicated, causing copy number variation (CNV)
- Some CNVs are associated with susceptibility or resistance to diseases such as cancer
- Figure: copy number variations in 158 glioblastoma patients
4
Identifying Genomic Causes in Complex Diseases
- Identify genotypic causes in individual patients as well as dysregulated pathways
- Systems biology approach
- Genome-wide search
- Graph-theoretic algorithms: circuit flow, set cover
- Data: 158 glioblastoma multiforme patients
5
Glioblastoma multiforme (GBM): the most common and most aggressive type of primary brain tumor in humans
6
Expression as Quantitative Trait
- Genotype: copy number variations
- Phenotype: gene expression
7
eQTL (expression Quantitative Trait Loci) Analysis
- We assume that the genetic variation is the cause and the expression change is the effect, but we don't know the molecular pathways behind the relation
- Figure labels: putative causal gene/locus, putative target gene
8
Method Outline
A. Target gene selection: gene expression
B. eQTL: find associations between expression and copy number
C. Circuit flow algorithm: molecular interactions, candidate causal genes
D. Causal gene selection: weighted multiset cover
Figure: panels A to D showing matrices of cases × target genes (g1 ... gm), cases × tag loci (s1 ... sn) and cases × causal genes, plus a circuit with + and - terminals linking target gene gm, tag SNP sn and causal genes over TF-DNA, phosphorylation-event and protein-protein edges
9
Target Gene Selection
- Select a representative set of disease genes
- Filter differentially expressed genes for each case
- Multi-set cover
- Figure: gene expression matrix (Gene 1, Gene 2, Gene 3, ...) for controls and disease cases
10
Target Gene Selection (Continued)
- Minimum multi-set cover: a gene covers a particular disease case if the gene is differentially expressed in that case
- Find a smallest set of genes that covers (almost) all cases at least k times
- Result: 74 target genes selected
- Figure: genes A to E covering disease cases case1 to case7
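The multi-set cover step can be illustrated with a greedy heuristic. This is only a sketch: the slides do not say how the cover is computed, greedy selection gives an approximation rather than a guaranteed minimum, and the function name `greedy_multiset_cover` and the toy coverage data are hypothetical.

```python
def greedy_multiset_cover(covers, cases, k=1):
    """Greedy sketch: pick genes until every case is covered k times.

    covers: dict gene -> set of cases in which the gene is differentially expressed
    Returns the chosen genes and the coverage still missing per case.
    """
    need = {case: k for case in cases}   # remaining coverage required per case
    remaining = dict(covers)             # genes not yet selected
    chosen = []

    def gain(gene):
        # Number of still-uncovered cases this gene would help with.
        return sum(1 for case in remaining[gene] if need.get(case, 0) > 0)

    while remaining and any(need.values()):
        best = max(remaining, key=gain)  # ties go to the first gene in dict order
        if gain(best) == 0:
            break                        # no gene helps any more ("almost all" cases)
        chosen.append(best)
        for case in remaining.pop(best):
            if need.get(case, 0) > 0:
                need[case] -= 1
    return chosen, need


# Hypothetical toy data: five genes, seven disease cases.
toy_covers = {
    "A": {1, 2, 3},
    "B": {3, 4, 5},
    "C": {5, 6, 7},
    "D": {1, 4, 7},
    "E": {2, 6},
}
chosen, unmet = greedy_multiset_cover(toy_covers, range(1, 8), k=1)
print(chosen, unmet)
```

Raising `k` asks for redundant coverage, which makes the selected set less sensitive to a single noisy expression call.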
11
eQTL
- Associations between the expression of target genes and copy number variations of genomic loci
- Linear regression for every pair of tag locus and target gene
- Figure: cases × target genes and cases × tag loci matrices
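A minimal sketch of the pairwise regression, assuming ordinary least squares of expression on copy number across cases. The slides do not give the significance test or threshold, so the sketch only reports slope, intercept and R²; `linear_fit`, `eqtl_scan` and the toy values are hypothetical.

```python
def linear_fit(x, y):
    """Least-squares fit y = intercept + slope * x; returns (slope, intercept, r2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return None                      # no variation, nothing to regress
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy * sxy / (sxx * syy)


def eqtl_scan(copy_number, expression):
    """Fit every (tag locus, target gene) pair; values are listed per case."""
    return {
        (locus, gene): linear_fit(cn, expr)
        for locus, cn in copy_number.items()
        for gene, expr in expression.items()
    }


# Hypothetical toy data for four cases.
copy_number = {"s1": [2, 3, 4, 1], "s2": [2, 2, 3, 2]}
expression = {"g1": [1.0, 2.0, 3.0, 0.0]}
fits = eqtl_scan(copy_number, expression)
for pair, fit in fits.items():
    print(pair, fit)
```

In practice each fit would be followed by a significance test and a correction for the number of pairs tested; those choices are not stated in the slides.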
13
Finding Candidate Causal Genes
Figure: genotypic variations and target genes
14
Finding Candidate Causal Genes
Figure: genotypic variations, candidate genes C1 to C5 and target genes, with the connection between them marked "?"
15
Finding Candidate Causal Genes
- Interaction network: protein-protein interactions, phosphorylation events, transcription factor interactions
- Figure: genotypic variations, candidate genes C1 to C5 and target genes linked through the interaction network
16
Finding Candidate Causal Genes
- Current flow through the interaction network, between + and - terminals
- The resistance of edge (u, v) is set to be inversely proportional to (|corr(expr(u), expr(D))| + |corr(expr(v), expr(D))|)/2
- Figure: genotypic variations, candidate genes C1 to C5, target genes, and an example edge (u, v)
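Written out, the resistance rule reads as follows. This assumes that D denotes the target gene whose expression is being explained, which the slide does not state explicitly; the proportionality constant is also left unspecified there. The second relation is simply Ohm's law for the current on edge (u, v), with V the node potentials.

```latex
R(u,v) \;\propto\;
\frac{2}{\lvert \mathrm{corr}(\mathrm{expr}(u), \mathrm{expr}(D)) \rvert
       + \lvert \mathrm{corr}(\mathrm{expr}(v), \mathrm{expr}(D)) \rvert},
\qquad
I_{uv} = \frac{V_u - V_v}{R(u,v)}
```

Edges whose endpoints are both strongly correlated with D therefore have low resistance and carry more current.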
17
Finding Candidate Causal Genes
- Current flow through the interaction network, between + and - terminals
- Compute the amount of current entering each causal gene by solving a system of linear equations
- Figure: genotypic variations, candidate genes C1 to C5 and target genes
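A small sketch of the current computation under simplifying assumptions: the network is undirected, the target gene is held at potential 1, every candidate causal gene is grounded at potential 0, and conductance is the inverse of resistance. Kirchhoff's current law at the remaining nodes gives the linear system, solved here by plain Gauss-Seidel iteration rather than a direct solver. The function name, node names and conductances are hypothetical.

```python
from collections import defaultdict


def current_into_sinks(edges, source, sinks, sweeps=1000):
    """edges: (u, v, conductance) triples with positive conductance.

    Returns (potential per node, current entering each sink).
    """
    adj = defaultdict(list)              # node -> [(neighbour, conductance)]
    for u, v, c in edges:
        adj[u].append((v, c))
        adj[v].append((u, c))

    potential = {node: 0.0 for node in adj}
    potential[source] = 1.0              # target gene: + terminal
    fixed = {source, *sinks}             # sinks stay at 0: - terminal

    # Kirchhoff: at every free node the potential is the conductance-weighted
    # average of its neighbours. Gauss-Seidel sweeps converge to that solution.
    for _ in range(sweeps):
        for node in adj:
            if node in fixed:
                continue
            total = sum(c for _nbr, c in adj[node])
            potential[node] = sum(c * potential[nbr] for nbr, c in adj[node]) / total

    inflow = {
        s: sum(c * (potential[nbr] - potential[s]) for nbr, c in adj[s])
        for s in sinks
    }
    return potential, inflow


# Hypothetical toy network: target D, intermediates a and b, candidates C1 and C2.
toy_edges = [("D", "a", 1.0), ("D", "b", 1.0), ("a", "C1", 1.0), ("b", "C2", 0.5)]
potential, inflow = current_into_sinks(toy_edges, "D", ["C1", "C2"])
print(inflow)
```

Here C1 receives more current than C2 because the path through b contains a lower-conductance edge; for large networks a sparse direct or conjugate-gradient solver would be the usual replacement for the iteration.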
19
Final Causal Gene Selection
A putative causal gene explains a disease case if:
- its corresponding tag locus has a copy number alteration, and
- its affected target genes (i.e., genes sending a significant amount of current to the causal gene) are differentially expressed in the disease case
Figure: cases × causal genes matrix
21
Final Causal Gene Selection (continued)
Figure: the cases × causal genes matrix from the previous slide with a WEIGHT label added; the criteria for a putative causal gene to explain a disease case are unchanged
22
Final Causal Gene Selection
- Find a smallest set of genes covering (almost) all cases at least k times
- Minimum weighted multi-set cover
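One way to approach the weighted variant is the standard greedy rule for weighted set cover: repeatedly take the gene with the lowest cost per newly covered case. The sketch below assumes that each gene's weight acts as a cost to be minimized, which the slides do not spell out, and greedy selection only approximates the minimum. The function name, gene labels, costs and case sets are hypothetical.

```python
def greedy_weighted_multiset_cover(explains, cost, cases, k=1):
    """explains: dict gene -> set of cases it explains; cost: dict gene -> weight."""
    need = {case: k for case in cases}   # remaining coverage required per case
    candidates = sorted(explains)        # fixed order so ties break reproducibly
    chosen = []

    def price(gene):
        # Cost paid per case that still needs coverage.
        gain = sum(1 for case in explains[gene] if need.get(case, 0) > 0)
        return cost[gene] / gain if gain else float("inf")

    while candidates and any(need.values()):
        best = min(candidates, key=price)
        if price(best) == float("inf"):
            break                        # the remaining cases cannot be explained
        chosen.append(best)
        candidates.remove(best)
        for case in explains[best]:
            if need.get(case, 0) > 0:
                need[case] -= 1
    return chosen, need


# Hypothetical toy data: four candidate causal genes, five cases.
toy_explains = {"C1": {1, 2, 3, 4}, "C2": {3, 4, 5}, "C3": {1, 5}, "C4": {2}}
toy_cost = {"C1": 2.0, "C2": 1.0, "C3": 1.5, "C4": 0.5}
selected, unexplained = greedy_weighted_multiset_cover(
    toy_explains, toy_cost, range(1, 6), k=1
)
print(selected, unexplained)
```

Note that C1 explains the most cases but is never selected, because cheaper genes cover the same cases at a lower cost per case.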
23
Dysregulated Pathways
- Causal path between a target gene and a causal gene: a maximum current path
- Figure: network with candidate genes C1 to C5
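The slides do not define "maximum current path" precisely. One plausible reading, sketched here, is the path whose weakest edge carries the most current (a maximum-bottleneck path), found with a Dijkstra-style search over directed edge currents. The function name and the toy currents are hypothetical.

```python
import heapq


def max_bottleneck_path(current, start, goal):
    """current: dict (u, v) -> positive current flowing from u to v.

    Returns (path, bottleneck), or (None, 0.0) if goal is unreachable.
    """
    adj = {}
    for (u, v), amount in current.items():
        adj.setdefault(u, []).append((v, amount))

    best = {start: float("inf")}         # best bottleneck found so far per node
    parent = {}
    heap = [(-float("inf"), start)]      # max-heap via negated bottleneck
    while heap:
        neg_width, node = heapq.heappop(heap)
        width = -neg_width
        if node == goal:
            break
        if width < best.get(node, 0.0):
            continue                     # stale heap entry
        for nxt, amount in adj.get(node, []):
            w = min(width, amount)
            if w > best.get(nxt, 0.0):
                best[nxt] = w
                parent[nxt] = node
                heapq.heappush(heap, (-w, nxt))

    if goal not in best:
        return None, 0.0
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1], best[goal]


# Hypothetical edge currents from target D towards candidates C1 and C2.
toy_current = {
    ("D", "a"): 0.6, ("D", "b"): 0.4,
    ("a", "C1"): 0.6, ("b", "C1"): 0.3, ("b", "C2"): 0.1,
}
path, bottleneck = max_bottleneck_path(toy_current, "D", "C1")
print(path, bottleneck)
```

Other readings, such as greedily following the highest-current edge at each node, would give different paths in some networks.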
24
Results
- 158 GBM patient samples
- 32 non-tumor control samples
- 74 target genes
- 128 causal genes
- Disease hubs: genes frequently appearing on causal paths
25
Selected Causal Genes
- Step B (eQTL): number of genes 16056; overlap with GBM genes 0.56 (75)
- Step C (circuit flow): number of genes 701; overlap with GBM genes 0.045 (10)
- Step D (set cover): number of genes 128; overlap with GBM genes 4.7 × 10^-4 (6)
26
Results
- 128 causal genes from set cover (Step D)
- 701 candidate causal genes from the circuit flow algorithm (Step C)
27
Causal Genes
Functional analysis using DAVID:
- Glioma (P-value 0.008): PRKCA, EGFR, AKT1, CDKN2A, CAMK2G, TP53, RB1, PTEN
- Cell cycle (P-value 0.028): MCM7, CDKN2A, CDC2, TP53, ORC5L, RB1, ATR, BUB3, CUL1
- p53 signaling pathway (P-value 0.030): CDKN2A, CDC2, TP53, ATR, FAS, THBS1, PTEN
- Proteasome (P-value 0.026): PSMA1, PSMC6, PSMB1, PSMC3, PSMA5, PSMA4
The selected causal gene set includes many genes known to be implicated in cancer.
BSOSC Review, November 2008
28
PTEN as causal gene
Figure legend: fold change (-, 0, +); edge types: TF-DNA, protein-protein; node types: kinase, TF, causal genes
29
EGFR as causal and target gene
Figure legend: fold change (-, 0, +); edge types: TF-DNA, protein-protein, phosphorylation; node types: kinase, TF, causal genes; labeled nodes: causal EGFR, target EGFR
30
Conclusion
- A novel computational method to simultaneously identify causal genes and dysregulated pathways, built on the circuit flow algorithm and multi-set cover
- Augmenting eQTL evidence with interaction information resulted in a very powerful approach to uncover potential causal genes as well as intermediate nodes on molecular pathways
- Our method can be applied to any disease system where genetic variations play a fundamental causal role
31
Acknowledgements
- Teresa M. Przytycka
- Stefan Wuchty
- Other group members: Dong Yeon Cho, Yang Huang, Damian Wojtowicz, Jie Zheng
34
EGFR as causal and target gene: Causal Paths
Figure legend: fold change (-, 0, +); edge types: TF-DNA, protein-protein, phosphorylation; node types: kinase, TF, causal genes; labeled nodes: causal EGFR, target EGFR
35
PTEN as causal gene: Causal Paths
Figure legend: fold change (-, 0, +); edge types: TF-DNA, protein-protein; node types: kinase, TF, causal genes
36
Our Method
Integrate several types of data:
- Gene expression
- Copy number variations
- Molecular interactions
37
Methods and Results
Method:
- Model the expression change of disease genes as a function of genomic alterations
- Translate the propagation of information from a potential causal gene to a disease gene as the flow of electric current through a network of molecular interactions
- Multi-set cover: select the most prominent genes
- Validation: we tested the enrichment of the selected causal genes with known GBM/glioma-related genes
Figure: disease gene gm, tag SNP sn and causal genes, with + and - terminals