Gene-set analysis Danielle Posthuma & Christiaan de Leeuw


Gene-set analysis Danielle Posthuma & Christiaan de Leeuw Dept. Complex Trait Genetics, VU University Amsterdam //danielle/2017/PW_dp.ppt Boulder, TC31, March 8 2017

SNP associations
[Diagram: SNPs map to genes, and genes map to functions]

SNP associations
Are all associated SNPs randomly distributed or do they cluster in genes?
[Diagram: associated SNPs mapped to genes]

SNP associations
Do all implicated genes have different functions or are they functionally related?
[Diagram: associated SNPs mapped to genes, and genes mapped to functions]

Testing for functional clustering of SNP associations
Single SNP analysis
- GWAS
- single (candidate) SNPs
Gene-based analysis: SNP-set analysis with the gene as unit of analysis
- whole genome
- candidate gene
Gene-set analysis: SNP-set analysis with sets of genes as unit of analysis
- targeted gene-sets/pathways
- all known gene-sets/pathways

Testing for functional clustering of SNP associations
Single SNP analysis
Gene-based analysis
Gene-set analysis
Gene-property analysis: uses quantitative characteristics of genes, e.g. expression levels or the probability of being a member of a gene-set

Gene-based analysis
Instead of testing single SNPs and annotating GWAS-significant ones to genes, we test for the joint association effect of all SNPs in a gene, taking into account LD (correlation between SNPs).
No single SNP needs to reach genome-wide significance, yet if multiple SNPs in the same gene have lower P-values than expected under the null, the gene-based test can result in a low P-value.
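The idea of a joint test that accounts for LD can be sketched as follows. This is only an illustration under stated assumptions, not necessarily the statistic the authors use: it takes the sum of squared SNP Z-scores as the gene statistic and simulates its null distribution from the LD matrix. The function names, the Z-scores and the LD matrix are all hypothetical.

```python
import math
import random

def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def gene_based_p(z_scores, ld_matrix, n_sim=20000, seed=1):
    """Monte Carlo P-value for the sum of squared SNP Z-scores in one gene.

    Assumption: under the null, the Z-scores are multivariate normal with
    mean 0 and covariance equal to the LD (correlation) matrix.
    """
    rng = random.Random(seed)
    lower = cholesky(ld_matrix)
    n = len(z_scores)
    observed = sum(z * z for z in z_scores)
    hits = 0
    for _ in range(n_sim):
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Correlated null Z-scores: z = L e, so that cov(z) = L L^T = LD matrix.
        z = [sum(lower[i][k] * e[k] for k in range(i + 1)) for i in range(n)]
        if sum(v * v for v in z) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_sim + 1)

# Hypothetical gene: 4 SNPs in moderate LD (r = 0.5), none genome-wide significant.
ld = [[1.0 if i == j else 0.5 for j in range(4)] for i in range(4)]
z = [2.8, 3.0, 2.6, 2.9]
p_gene = gene_based_p(z, ld)
print(f"gene-based P-value: {p_gene:.4f}")
```

Ignoring LD here (treating the four SNPs as independent) would overstate the evidence, because correlated SNPs partly repeat the same signal.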

[Figure: SNP Manhattan plot and gene Manhattan plot]

Gene-based analysis
Unit of analysis is the gene
Pros:
- reduces multiple testing (from 2.5M SNPs to 23k genes)
- accounts for heterogeneity within a gene
- immediate gene-level interpretation
Cons:
- disregards regulatory (often non-genic) information when based on location-based annotation
- still a lot of tests
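To make the reduction in multiple testing concrete, the short calculation below compares significance thresholds for the two test counts quoted on the slide. It assumes a Bonferroni correction at a family-wise alpha of 0.05; both the correction method and the alpha level are assumptions for illustration, not something the slide specifies.

```python
ALPHA = 0.05          # assumed family-wise error rate
N_SNPS = 2_500_000    # number of SNP tests, from the slide
N_GENES = 23_000      # number of gene tests, from the slide

# Bonferroni: divide alpha by the number of tests.
snp_threshold = ALPHA / N_SNPS
gene_threshold = ALPHA / N_GENES

print(f"SNP-level threshold:  {snp_threshold:.1e}")
print(f"Gene-level threshold: {gene_threshold:.1e}")
print(f"Gene-level threshold is {gene_threshold / snp_threshold:.0f}x less stringent")
```

Even at the gene level the threshold is on the order of 1e-6, which is why the slide still lists "a lot of tests" as a drawback.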

Gene-set analysis
Unit of analysis is a set of functionally related genes
Pros:
- reduces multiple testing by prioritizing genes in biological pathways or in groups of (functionally) related genes
- increases statistical power
- deals with genic heterogeneity
- provides immediate biological insight

Gene-set analysis
Cons:
- crucial to select reliable sets of genes!
- different levels of information
- different quality of information

Choosing gene-sets
Gene-sets can be based on e.g.
- protein-protein interaction
- co-expression
- transcription regulatory network
- biological pathway
Use public or commercial databases, e.g. KEGG, Gene Ontology, Ingenuity, BioCarta, STRING database, Human Protein Interaction database
Or: create manually, expert-curated lists

Online databases vs. manual
Information in online databases tends to be
- somewhat biased: not all genes are included, disease genes tend to be investigated more often, and genes that are investigated more often will have more interactions
- not always reliable: interactions are often not validated, sometimes only predicted. If experimentally seen, it is unknown how reliable that experiment was

Statistical issues in gene-set analyses
- Self-contained vs. competitive tests
- Different statistical algorithms test different alternative hypotheses
- Different statistical algorithms have different sensitivity to LD, number of genes, number of SNPs, and background heritability (h2)

Self-contained vs. competitive tests
Null hypothesis:
- Self-contained: H0: The genes in the gene-set are not associated with the trait
- Competitive: H0: The genes in the gene-set are not more strongly associated with the trait than the genes not in the gene-set

Why use competitive tests
- Polygenic traits are influenced by thousands of SNPs in hundreds of genes
- It is therefore very likely that many combinations (i.e. gene-sets) of causal genes are significantly associated with the trait
- Competitive tests identify which combinations are biologically most interpretable
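The difference between the two null hypotheses can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions: gene-level Z-scores are independent, both tests use a normal approximation, and every gene in a hypothetical polygenic trait carries the same small signal. The function names and all parameter values are made up for illustration.

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def upper_tail(z):
    """One-sided P-value for a standard normal statistic."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def self_contained_p(inside):
    """H0: the genes in the set are not associated (their mean Z is 0)."""
    return upper_tail(sum(inside) / math.sqrt(len(inside)))

def competitive_p(inside, outside):
    """H0: genes in the set are not more strongly associated than the rest."""
    se = math.sqrt(variance(inside) / len(inside) + variance(outside) / len(outside))
    return upper_tail((mean(inside) - mean(outside)) / se)

rng = random.Random(7)
N_GENES, SET_SIZE, N_SETS = 5000, 200, 200

# Hypothetical polygenic trait: every gene carries a small signal (mean Z = 0.5).
gene_z = [rng.gauss(0.5, 1.0) for _ in range(N_GENES)]

self_hits = comp_hits = 0
for _ in range(N_SETS):
    # A random gene-set with no special biology.
    members = set(rng.sample(range(N_GENES), SET_SIZE))
    inside = [z for i, z in enumerate(gene_z) if i in members]
    outside = [z for i, z in enumerate(gene_z) if i not in members]
    self_hits += self_contained_p(inside) < 0.05
    comp_hits += competitive_p(inside, outside) < 0.05

self_rate = self_hits / N_SETS
comp_rate = comp_hits / N_SETS
print(f"self-contained: {self_rate:.2f} of random sets significant")
print(f"competitive:    {comp_rate:.2f} of random sets significant")
```

Because the sets are drawn at random, any set flagged by the self-contained test reflects only the trait's overall polygenic signal, whereas the competitive test asks whether a set stands out against that background.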

Polygenicity and number of significant gene-sets in self-contained versus competitive testing
De Leeuw, Neale, Heskes, Posthuma. Nat Rev Genet, 2016
For self-contained methods, rates increase with heritability, whereas they are constant for competitive methods. Rates are deflated for the binomial and hypergeometric methods because of their discrete test statistic.

Different statistical algorithms test different alternative hypotheses
Strategy | Alternative hypothesis
Minimal P-value | At least one SNP in the gene or gene-set is associated with the trait
Combined P-value | The combined pattern of individual P-values provides evidence for association with the trait
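The two strategies respond to different signal patterns, which the sketch below tries to show. It uses a Šidák-corrected minimum P-value and Fisher's method as stand-ins for the "minimal" and "combined" strategies; both choices are assumptions for illustration. Both also assume independent P-values (no LD), which real SNP data would violate. The example P-values are made up.

```python
import math

def min_p_test(p_values):
    """Sidak-corrected minimum P: is at least one SNP associated?

    Assumes the P-values are independent.
    """
    k = len(p_values)
    return 1.0 - (1.0 - min(p_values)) ** k

def fisher_combined_test(p_values):
    """Fisher's method: does the combined pattern of P-values show association?

    X = -2 * sum(ln p) follows a chi-square distribution with 2k degrees of
    freedom under the null, assuming independent P-values. For even degrees
    of freedom the survival function has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X/2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total

# Hypothetical gene A: one strong SNP, the rest null-like.
one_strong = [1e-6, 0.5, 0.6, 0.7, 0.8]
# Hypothetical gene B: several moderate SNPs, none striking on its own.
many_moderate = [0.02, 0.02, 0.02, 0.02, 0.02]

for name, ps in [("one strong SNP", one_strong), ("many moderate SNPs", many_moderate)]:
    print(f"{name}: min-P = {min_p_test(ps):.2e}, Fisher = {fisher_combined_test(ps):.2e}")
```

In this toy example the minimal P-value strategy gives the smaller P-value for the gene with one strong SNP, while the combined strategy gives the smaller P-value for the gene with many moderate SNPs.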

Different algorithms: LD & Ngenes
De Leeuw, Neale, Heskes, Posthuma. Nat Rev Genet, 2016

Gene-set analysis: Practical