IDENTIFYING CAUSAL GENES AND DYSREGULATED PATHWAYS IN COMPLEX DISEASES
YOO-AH KIM, NIH / NLM / NCBI
Nov. 6th, 2010



Complex Diseases
Associated with the effects of multiple genes, as opposed to single-gene diseases.
The combination of genomic alterations may vary strongly among patients while dysregulating the same components, thus often leading to the same disease phenotype.
Difficult to study and treat.
Examples: cancer, heart disease, diabetes.

Copy Number Variations
Two copies of each gene are generally assumed to be present in a genome.
Genomic regions may be deleted or duplicated, causing copy number variations (CNVs).
Some CNVs are associated with susceptibility or resistance to diseases such as cancer.
[Figure: copy number variations in 158 glioblastoma patients]

Identifying Genomic Causes in Complex Diseases
Identify genotypic causes in individual patients as well as dysregulated pathways.
Systems biology approach.
Genome-wide search.
Graph-theoretic algorithms: circuit flow, set cover.
Data: 158 glioblastoma multiforme patients.

Glioblastoma multiforme (GBM)
The most common and most aggressive type of primary brain tumor in humans.

Expression as Quantitative Trait
Genotype: copy number variations.
Phenotype: gene expression.

eQTL (expression Quantitative Trait Loci) Analysis
While we assume that the genetic variation is the cause and the expression change is the effect, we don't know the molecular pathways behind the relation.
[Figure labels: putative causal gene/locus, putative target gene]

Method Outline
A. Target gene selection (gene expression).
B. eQTL: find associations between expression and copy number.
C. Circuit flow algorithm (molecular interactions; candidate causal genes).
D. Causal gene selection (weighted multi-set cover).
[Figure: panels A to D. Labels: cases, target genes g1 ... gm, tag loci s1 ... sn, causal genes, tag SNP sn, target gene gm, + and - terminals; edge types: TF-DNA, phosphorylation event, protein-protein]

Target Gene Selection
Select a representative set of disease genes.
Filter differentially expressed genes for each case.
Multi-set cover.
[Figure: gene expression matrix with rows Gene 1, Gene 2, Gene 3, ... and columns grouped into controls and disease cases]

Target Gene Selection (Continued)
Minimum multi-set cover: a gene covers a particular disease case if the gene is differentially expressed in the case.
Find a smallest set of genes that covers (almost) all cases at least k times.
Selected 74 target genes.
[Figure: genes A to E and disease cases case1 to case7]
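The slide does not say how the minimum multi-set cover is solved. As a minimal sketch, assuming a simple greedy heuristic (which approximates the minimum and does not guarantee it), the selection could look like the following. The function name, the toy data, and the choice to require full coverage rather than "almost all" cases are illustrative assumptions.

```python
from typing import Dict, List, Set


def greedy_multiset_cover(covers: Dict[str, Set[str]], cases: Set[str], k: int) -> List[str]:
    """Greedy heuristic (an assumption, not the talk's stated solver).

    covers[g] is the set of cases in which gene g is differentially expressed.
    Genes are added until every case is covered k times or no gene helps.
    """
    need = {case: k for case in cases}   # remaining coverage demand per case
    unused = set(covers)
    chosen: List[str] = []

    def gain(gene: str) -> int:
        # Number of still-undercovered cases this gene would help with.
        return sum(1 for case in covers[gene] if need.get(case, 0) > 0)

    while unused and any(n > 0 for n in need.values()):
        best = max(sorted(unused), key=gain)   # sorted() makes ties deterministic
        if gain(best) == 0:
            break                              # remaining cases cannot be covered
        chosen.append(best)
        unused.remove(best)
        for case in covers[best]:
            if need.get(case, 0) > 0:
                need[case] -= 1
    return chosen


# Toy data, purely illustrative.
toy_covers = {
    "A": {"case1", "case2", "case3"},
    "B": {"case1", "case2"},
    "C": {"case3"},
    "D": {"case2", "case3"},
}
toy_targets = greedy_multiset_cover(toy_covers, {"case1", "case2", "case3"}, k=2)
```

Relaxing the stopping rule so that a small number of cases may stay undercovered would mirror the "(almost) all cases" wording.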

eQTL
Associations between the expression of target genes and copy number variations of genomic loci.
Linear regression for every pair of tag loci and target genes.
[Figure labels: cases, target genes, tag loci]
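The pairwise regression can be sketched as below. This is only an illustration under stated assumptions: the slide gives no significance criterion, so the `min_r2` threshold, the function names, and the toy values are hypothetical, and each locus is assumed to vary across cases.

```python
from typing import Dict, List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares of y on x; returns (slope, r squared).

    Assumes x and y are not constant, otherwise the division fails.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


def eqtl_scan(copy_number: Dict[str, List[float]],
              expression: Dict[str, List[float]],
              min_r2: float) -> List[Tuple[str, str, float, float]]:
    """Regress every target gene's expression on every tag locus's copy number.

    Both inputs map a name to one value per case, in the same case order.
    min_r2 is a hypothetical cutoff standing in for a significance test.
    """
    hits = []
    for locus, cn in copy_number.items():
        for gene, expr in expression.items():
            slope, r2 = fit_line(cn, expr)
            if r2 >= min_r2:
                hits.append((locus, gene, slope, r2))
    return hits


# Toy values, purely illustrative.
toy_hits = eqtl_scan(
    {"s1": [0.0, 1.0, 2.0, 3.0], "s2": [1.0, 0.0, 1.0, 0.0]},
    {"g1": [1.0, 3.0, 5.0, 7.0]},
    min_r2=0.5,
)
```

A real analysis would replace the r squared cutoff with a p-value and a multiple-testing correction, since every locus-gene pair is tested.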


Finding Candidate Causal Genes
[Figure: genotypic variations, candidate genes C1 to C5, and target genes connected through the interaction network, with + and - terminals for the current flow]
Interaction network: protein-protein interactions, phosphorylation events, transcription factor interactions.
Current flow: the resistance of edge (u, v) is set to be inversely proportional to (|corr(expr(u), expr(D))| + |corr(expr(v), expr(D))|)/2.
Compute the amount of current entering each causal gene by solving a system of linear equations.
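To make the circuit analogy concrete, here is a small sketch of how current entering each candidate could be computed. It rests on several assumptions not stated on the slides: D is taken to be the target gene, the target is held at potential 1, every candidate causal gene is grounded at potential 0, and the network is connected. The function names and the toy network are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def edge_conductance(corr_u: float, corr_v: float) -> float:
    """Conductance = 1 / resistance, so it is proportional to the mean |corr|."""
    return (abs(corr_u) + abs(corr_v)) / 2


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def current_into_candidates(conductance: Dict[Edge, float], target: str,
                            candidates: List[str]) -> Dict[str, float]:
    """Current entering each candidate; each undirected edge is listed once."""
    nodes = sorted({n for edge in conductance for n in edge})
    fixed = {target: 1.0}
    fixed.update({c: 0.0 for c in candidates})
    free = [n for n in nodes if n not in fixed]
    idx = {n: i for i, n in enumerate(free)}
    a = [[0.0] * len(free) for _ in free]
    b = [0.0] * len(free)
    # Kirchhoff's current law at every free node: net current is zero.
    for (u, v), g in conductance.items():
        for p, q in ((u, v), (v, u)):
            if p in idx:
                a[idx[p]][idx[p]] += g
                if q in idx:
                    a[idx[p]][idx[q]] -= g
                else:
                    b[idx[p]] += g * fixed[q]
    potential = dict(fixed)
    potential.update(zip(free, solve_linear(a, b)))
    inflow = {c: 0.0 for c in candidates}
    for (u, v), g in conductance.items():
        for p, q in ((u, v), (v, u)):
            if q in inflow:
                inflow[q] += g * (potential[p] - potential[q])
    return inflow


# Toy network: target D, one intermediate node x, two candidates.
toy_inflow = current_into_candidates(
    {("D", "x"): 1.0, ("x", "C1"): 1.0, ("x", "C2"): 3.0}, "D", ["C1", "C2"])
```

In the toy network the candidate reached through the higher-conductance edge receives more current, which is the behaviour the correlation-based resistances are meant to produce.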


Final Causal Gene Selection
A putative causal gene explains a disease case if:
its corresponding tag locus has a copy number alteration, and
its affected target genes (i.e., genes sending a significant amount of current to the causal gene) are differentially expressed in the disease case.
[Figure labels: cases, causal genes, weight]

Final Causal Gene Selection
Find a smallest set of genes covering (almost) all cases at least k times: minimum weighted multi-set cover.
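The slides do not define the weights or the solver, so the following is only a sketch under loud assumptions: each gene carries a hypothetical cost (lower is better), the relation "gene explains case" is precomputed, and a greedy cost-per-gain rule stands in for whatever optimization was actually used.

```python
from typing import Dict, List, Set


def weighted_greedy_multiset_cover(explains: Dict[str, Set[str]],
                                   cost: Dict[str, float],
                                   cases: Set[str], k: int) -> List[str]:
    """Greedy heuristic for weighted multi-set cover (an assumed stand-in).

    explains[g] is the set of cases gene g explains; cost[g] is a
    hypothetical per-gene weight. Each step picks the gene with the lowest
    cost per newly satisfied unit of coverage.
    """
    need = {case: k for case in cases}
    unused = set(explains)
    chosen: List[str] = []

    def gain(gene: str) -> int:
        return sum(1 for case in explains[gene] if need.get(case, 0) > 0)

    while unused and any(n > 0 for n in need.values()):
        useful = [g for g in sorted(unused) if gain(g) > 0]
        if not useful:
            break                      # remaining cases cannot be explained
        best = min(useful, key=lambda g: cost[g] / gain(g))
        chosen.append(best)
        unused.remove(best)
        for case in explains[best]:
            if need.get(case, 0) > 0:
                need[case] -= 1
    return chosen


# Toy data, purely illustrative.
toy_causal = weighted_greedy_multiset_cover(
    {"G1": {"p1", "p2", "p3"}, "G2": {"p1", "p2"}, "G3": {"p3"}},
    {"G1": 3.0, "G2": 1.0, "G3": 0.5},
    {"p1", "p2", "p3"},
    k=1,
)
```

With these toy costs the two cheap genes are preferred over the single expensive gene that explains every case, which is the trade-off a weighted cover is meant to capture.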

Dysregulated Pathways
Causal paths between a target and a causal gene: a maximum current path.
[Figure: network with candidate genes C1 to C5]
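The slide does not define "maximum current path" precisely. One possible reading, sketched here as an assumption, is to start at the causal gene and repeatedly step back along the incoming edge that carries the most current until the target gene is reached. The function name and the toy currents are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def max_current_path(current: Dict[Tuple[str, str], float],
                     target: str, causal: str) -> Optional[List[str]]:
    """Trace a path from target to causal gene along high-current edges.

    current[(u, v)] is the positive amount of current flowing from u to v.
    Returns None if the trace cannot reach the target gene.
    """
    incoming: Dict[str, List[Tuple[float, str]]] = {}
    for (u, v), amount in current.items():
        if amount > 0:
            incoming.setdefault(v, []).append((amount, u))

    path = [causal]
    seen = {causal}
    while path[-1] != target:
        options = incoming.get(path[-1])
        if not options:
            return None                 # dead end before reaching the target
        _, previous = max(options)      # edge delivering the most current
        if previous in seen:
            return None                 # guard against malformed cyclic input
        seen.add(previous)
        path.append(previous)
    return path[::-1]                   # report in flow direction


# Toy currents, purely illustrative.
toy_current = {
    ("T", "a"): 0.7, ("T", "b"): 0.3,
    ("a", "C1"): 0.5, ("b", "C1"): 0.3, ("a", "C2"): 0.2,
}
toy_path = max_current_path(toy_current, "T", "C1")
```

Other definitions, such as the path maximizing the smallest edge current, are equally plausible and would need a different search.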

Results
158 GBM patient samples.
32 non-tumor control samples.
74 target genes.
128 causal genes.
Disease hubs: genes frequently appearing on causal paths.

Selected Causal Genes
Columns: Number of Genes, Overlap with GBM genes.
Step B: eQTL (75)
Step C: Circuit flow (10)
Step D: Set cover (6)

Results
128 causal genes from set cover (Step D).
701 candidate causal genes from the circuit flow algorithm (Step C).

Causal Genes
Functional analysis using DAVID. The selected causal gene set includes many known cancer-implicated genes.

Pathway                  P-value   Genes
Glioma                   0.008     PRKCA, EGFR, AKT1, CDKN2A, CAMK2G, TP53, RB1, PTEN
Cell cycle               0.028     MCM7, CDKN2A, CDC2, TP53, ORC5L, RB1, ATR, BUB3, CUL1
p53 signaling pathway    0.030     CDKN2A, CDC2, TP53, ATR, FAS, THBS1, PTEN
Proteasome               0.026     PSMA1, PSMC6, PSMB1, PSMC3, PSMA5, PSMA4

[Slide footer: BSOSC Review, November 2008]

PTEN as causal gene
[Figure legend: fold change, TF-DNA, protein-protein, kinase, TF, causal genes]

EGFR as causal and target gene
[Figure legend: fold change, kinase, TF, causal genes, TF-DNA, protein-protein, phosphorylation; node labels: causal EGFR, target EGFR]

Conclusion
A novel computational method to simultaneously identify causal genes and dysregulated pathways: circuit flow algorithm and multi-set cover.
Augmenting eQTL evidence with interaction information resulted in a very powerful approach to uncover potential causal genes as well as intermediate nodes on molecular pathways.
Our method can be applied to any disease system where genetic variations play a fundamental causal role.

Acknowledgements
Teresa M. Przytycka
Stefan Wuchty
Other group members: Dong Yeon Cho, Yang Huang, Damian Wojtowicz, Jie Zheng


Causal Paths: EGFR as causal and target gene
[Figure legend: fold change, kinase, TF, causal genes, TF-DNA, protein-protein, phosphorylation; node labels: causal EGFR, target EGFR]

Causal Paths: PTEN as causal gene
[Figure legend: fold change, TF-DNA, protein-protein, kinase, TF, causal genes]

Our Method
Integrate several types of data: gene expression, copy number variations, molecular interactions.

Methods and Results
Method:
Model the expression change of disease genes as a function of genomic alterations.
Translate the propagation of information from a potential causal gene to a disease gene as the flow of electric current through a network of molecular interactions.
Multi-set cover: select the most prominent genes.
Validated our approach by testing the enrichment of selected causal genes with known GBM/glioma-related genes.
[Figure labels: disease gene gm, tag SNP sn, causal genes, + and - terminals]
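The slides do not name the statistical test behind the enrichment check. A common choice, shown here purely as an assumption, is the one-sided hypergeometric test: the probability of seeing at least the observed overlap with known disease genes if the selected genes were drawn at random. The function name and the numbers in the example are illustrative and are not results from the talk.

```python
from math import comb


def hypergeom_enrichment(population: int, known: int,
                         selected: int, overlap: int) -> float:
    """P(X >= overlap) under the hypergeometric distribution.

    population: number of genes that could have been selected
    known:      number of those annotated as disease-related
    selected:   number of genes the method selected
    overlap:    number of selected genes that are disease-related
    """
    upper = min(known, selected)
    favourable = sum(
        comb(known, i) * comb(population - known, selected - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(population, selected)


# Illustrative numbers only: 10 genes, 3 known, 4 selected, 2 overlapping.
toy_p = hypergeom_enrichment(population=10, known=3, selected=4, overlap=2)
```

A small p-value would indicate that the overlap with known disease genes is unlikely to arise by chance alone.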