Genetics of gene expression

Genetics of gene expression Stephen Montgomery montgomerylab.stanford.edu Stanford University School of Medicine.

Genetics of gene expression Stephen Montgomery smontgom@stanford.edu montgomerylab.stanford.edu @sbmontgom

Challenges in human genetics. Identifying the genes and variants responsible for disease: monogenic to polygenic diseases; rare to common variants. Unifying variation of different types: SNPs, indels, SVs; regulatory and coding variation. Translating impact to health: individual risk factors; population risk factors.

Disease starts at a cellular level. From individual, to organs and tissues, to cells, to DNA. Understanding the influence of genetics on cells will improve our ability to predict disease risk.

Genetic studies of gene expression. Explore the impact of genetic variation on transcriptome diversity. [Diagram: SNP A linked to expression of nearby genes, expression of distant genes, gene splicing/isoforms, cellular processes, and disease risk.]

Canonical model

Genetic association can pinpoint regulatory haplotypes. [Figure: expression in a population sample plotted by genotype (CC, CG, GG) at a C/G variant.] We can identify genetic variants impacting gene expression (eQTLs).
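
The association idea on this slide can be sketched as a simple regression of expression on allele count. This is a minimal illustration on invented toy data, not the analysis pipeline used in the talk; the function names and the choice of G as the counted allele are assumptions.

```python
from statistics import mean

def allele_dosage(genotype, alt="G"):
    """Count copies of the counted allele in a genotype string such as 'CG'."""
    return genotype.count(alt)

def eqtl_slope(genotypes, expression, alt="G"):
    """Least-squares slope and r^2 of expression regressed on allele dosage."""
    x = [allele_dosage(g, alt) for g in genotypes]
    mx, my = mean(x), mean(expression)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, expression))
    syy = sum((yi - my) ** 2 for yi in expression)
    return sxy / sxx, (sxy * sxy) / (sxx * syy)

# Toy data, invented for illustration only.
genotypes = ["CC", "CC", "CG", "CG", "GG", "GG"]
expression = [10.0, 11.0, 8.0, 9.0, 6.0, 7.0]
slope, r2 = eqtl_slope(genotypes, expression)
print(f"per-allele effect = {slope:.2f}, r^2 = {r2:.3f}")
```

A real eQTL scan would repeat this for many variant-gene pairs, include covariates, and correct for multiple testing.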

The landscape of regulatory variation. [Figure: location of genetic variants (Chr1, Chr2, Chr3, ...) plotted against the location of the genes whose expression they impact (Chr1, Chr2, Chr3, ...), showing cis-effects and trans-effects, the latter illustrated with a transcription factor.]
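
The cis/trans distinction is usually made operational with a distance rule. The sketch below assumes a 1 Mb window around the transcription start site, a common convention that the slide itself does not state; the function and parameter names are hypothetical.

```python
def classify_eqtl(variant_chrom, variant_pos, gene_chrom, gene_tss,
                  cis_window=1_000_000):
    """Label a variant-gene pair 'cis' if the variant is on the same
    chromosome and within cis_window bp of the gene's TSS, else 'trans'.
    The 1 Mb default is an assumed convention, not taken from the slides."""
    same_chrom = variant_chrom == gene_chrom
    if same_chrom and abs(variant_pos - gene_tss) <= cis_window:
        return "cis"
    return "trans"

print(classify_eqtl("chr1", 1_500_000, "chr1", 1_000_000))
print(classify_eqtl("chr2", 1_500_000, "chr1", 1_000_000))
```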

Advantages to studying the genetics of gene expression. Can rapidly evaluate 1000s of quantitative traits. Can identify genetic regulatory networks. Can easily transform or perturb the system. Variants are directly connected to cellular mechanism.

eQTLs can aid in identifying candidate genes for GWAS variants. Sharing of association implicates genes and type of effect. (Montgomery et al., Nat Rev Genetics, 2011)

Discovery of eQTLs depends on: biological factors; technological factors.

Biological factors influencing eQTL discovery: trait biology; ancestry; environment. (Dimas et al., Science, 2009; Stranger et al., PLoS Genetics, 2012)

Cell or tissue type: how ubiquitous are eQTLs and potential disease mechanisms in different tissues?

Sharing depends on the tissues compared and on sample sizes. 69-80% of cis associations are cell type-specific (Dimas et al., Science, 2009); 50% are specific (adipose and blood; Emilsson et al., Nature, 2008); >50% are specific (cortical tissue and peripheral blood; Heinzen et al., PLoS Biology, 2008). See also GTEx Consortium, Science, 2015. However, all estimates depend on the eQTL discovery FDR and on the method for assessing sharing.

Tissue-shared and tissue-specific eQTLs. GTEx Project (v6): 44 tissues; ~9000 samples. (Aguet et al., bioRxiv, 2016)

What are migraine variants doing in different tissues? "We identified the minor allele of rs1835740 on chromosome 8q22.1 to be associated with migraine (P = 5.38 × 10^-9, odds ratio = 1.23, 95% CI 1.150–1.324) in a genome-wide association study of 2,731 migraine cases ascertained from three European headache clinics and 10,747 population-matched controls. In an expression quantitative trait study in lymphoblastoid cell lines, transcript levels of the MTDH were found to have a significant correlation to rs1835740 (P = 3.96 × 10^-5, permuted threshold for genome-wide significance 7.7 × 10^-5)." (Anttila et al., Nature Genetics, 2011)

Many existing hypotheses may rest on eQTLs found in tissues that are potentially unrelated to the disease.

With so much eQTL data available, finding a significant eQTL association for any given variant is the norm.
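
A back-of-the-envelope calculation shows why chance overlaps become likely as datasets accumulate. The sketch assumes independent tests at a fixed per-test threshold, which real eQTL data (correlated tissues, linked variants) do not satisfy, so it is a rough guide only; the function name is hypothetical.

```python
def prob_at_least_one_hit(alpha, n_tests):
    """Chance of one or more nominally significant results among n_tests
    independent null tests, each run at per-test threshold alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

# Example: one variant looked up in 44 tissues at alpha = 0.05.
print(round(prob_at_least_one_hit(0.05, 44), 3))
```

This is one reason a bare eQTL lookup for a GWAS variant is weak evidence without a formal comparison of the two association signals.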

eQTL/GWAS interpretation needs to be reviewed cautiously. New approaches based on colocalization of eQTL and GWAS signals are an active area of methods development.

Studied population: how ubiquitous are eQTLs and potential disease mechanisms in different populations?

Not all eQTLs are shared across populations. “We have reported that many genes showing cis associations at the 0.001 permutation threshold are shared (about 37%) in at least two populations … In 95–97% of the shared associations, the direction of the allelic effect was the same across populations, and the discordant 3–5% was of the same order as the FDR.” (Stranger et al., Nat Genetics, 2007) If we know the genetic basis of a disease, can we predict its population frequency from cellular models of that disease? (Stranger et al., PLoS Genetics, 2012)

What are BMI variants doing in different populations? rs713586 explained 0.06% of BMI variance. (Speliotes et al., Nature Genetics, 2010)
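
To put a variance-explained figure in context: under an additive model with Hardy-Weinberg genotype frequencies, the fraction of trait variance attributable to one biallelic variant can be written as below. The symbols are notation assumed here (f for allele frequency, β for the per-allele effect, σ_y² for total trait variance) and are not taken from the slide.

```latex
\[
  \frac{\operatorname{Var}(\beta g)}{\sigma_y^2}
  = \frac{2 f (1 - f)\, \beta^2}{\sigma_y^2},
  \qquad g \in \{0, 1, 2\}
\]
```

Because the numerator depends on f, the same per-allele effect can explain different amounts of variance in populations where the allele frequency differs.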

Combining populations has advantages. Multiple populations do well at fine-mapping causal variants; however, their design results in a reduction of power. (1000 Genomes, Nature, 2015; Zaitlen et al., AJHG, 2010; 86(1): 23–33)

Environment studies: determining how eQTLs behave under stimulus.

Answer: GxE discoveries have been study dependent. “We carried out large-scale induction experiments using primary human bone cells derived from unrelated donors of Swedish origin treated with 18 different stimuli (7 treatments and 2 controls, each assessed at 2 time points). … We found that 93% of cis-eQTLs at 1% FDR were observed in at least one additional treatment, and in fact, on average, only 1.4% of the cis-eQTLs were considered as treatment-specific at high confidence.” (Grundberg et al., PLoS Genetics 7(1), 2011)

LPS response eQTLs. (Orozco et al., Cell, 2012)

LPS, influenza, and interferon-β (IFN-β) response eQTLs in dendritic cells (DCs) from 534 people. (Lee et al., Science, 2014) The approach reveals common alleles that explain inter-individual variation in pathogen sensing and provides functional annotation for genetic variants that alter susceptibility to inflammatory diseases.

Detecting GxE in a population sample. Depression Genes and Networks study: 922 individuals; detailed records of behavior, environment, and medical history. Genotypes: 720K autosomal SNPs (Illumina Omni1-Quad). RNA-sequencing from primary tissue: RNA from whole blood; 70,000,000 reads on average, single-ended, 50 bp reads. (Battle et al., Genome Research, 2014; Mostafavi et al., Mol. Psychiatry, 2014)

Gene-by-environment effects with RNA-seq. Allele-specific expression is quantified from RNA-seq. Compare environmental impact on different genetic backgrounds within each individual; a single sample provides a highly controlled comparison. [Figure: under an environmental exposure, the copy from parent 1 (DNA allele T) is over-expressed while the copy from parent 2 (DNA allele C) shows normal expression, so RNA-seq reads carry more T than C.] This suggests the environment affects the two alleles differently. (David Knowles, bioRxiv)

Gene-by-environment effects with RNA-seq. Model of allele-specific read counts: accounts for (structured) over-dispersion; incorporates confounders; optimized in a large cohort.
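
The slide does not give the form of the read-count model. One common way to represent over-dispersed allele-specific counts is a beta-binomial distribution, sketched below as an assumption rather than as the model used in the talk. The parameterization (mean reference fraction plus an over-dispersion value rho between 0 and 1) and the function names are illustrative choices.

```python
from math import exp, lgamma

def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(k, n, mean, rho):
    """P(k reference reads out of n) under a beta-binomial model.

    mean: expected reference-allele fraction, strictly between 0 and 1.
    rho:  over-dispersion in (0, 1); rho = 1 / (a + b + 1), so values
          near 0 approach a plain binomial.
    """
    total = (1.0 - rho) / rho          # a + b
    a, b = mean * total, (1.0 - mean) * total
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))

# With mean 0.5 and rho = 1/3 (a = b = 1) every count is equally likely.
print(round(beta_binomial_pmf(3, 10, 0.5, 1 / 3), 4))
```

Compared with a plain binomial, the extra parameter absorbs read-to-read correlation, so strongly skewed counts are judged less surprising.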

Improved power over the standard test. 23 genes show significant evidence of GxE effects on cis-regulation of gene expression, against only 3 from the standard linear model for QTL interaction testing. Sources of improved power: integrating over the entire cis-regulatory landscape (potentially including rare variants); a controlled within-individual test; directly modeling read counts.

Example associations. Exercise and DYSF (dysferlin): a skeletal muscle protein involved in contraction and muscle repair. Blood pressure medication and NPRL3: related to genes involved in homeostasis of fluid volume. Also listed: vanin 1, interleukin receptor.

Sex as an environment influences eQTL discovery. (Kukurba et al., Genome Res, 2016)

Discovery of eQTLs depends on technological factors. Gene expression technology: PCR-based, array-based, sequencing-based. Genotyping technology: array-based, sequencing-based. Sample size: more individuals and/or families yield more power to detect association at particular effect sizes (and lower the FDR). Early studies used 18-30 families or 45-60 unrelated individuals.

The biases we don’t know about: hidden factors can cause false associations. Hidden technical and biological variables include, e.g., population, sex, and date of processing. However, correcting for these factors can remove true signals (e.g. master regulators).

Methods to correct hidden factors. Factor analysis on 40 global factors has tripled eQTL discovery (Stegle et al., PLoS Computational Biology, 2010). Surrogate variable analysis has increased eQTL discovery by 20% (Leek and Storey, PLoS Genetics, 2007).
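
As a simplified stand-in for the cited factor-analysis and surrogate-variable methods (it implements neither), the sketch below estimates one dominant expression factor by power iteration and regresses it out of every gene. The matrix layout (genes as rows, samples as columns), the function names, and the toy data are assumptions made for illustration.

```python
import random

def center_rows(matrix):
    """Subtract each gene's mean across samples."""
    return [[v - sum(row) / len(row) for v in row] for row in matrix]

def leading_factor(matrix, n_iter=100, seed=0):
    """Unit-length leading sample-space factor (first right singular
    vector of the genes x samples matrix), found by power iteration."""
    rng = random.Random(seed)
    n_samples = len(matrix[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    for _ in range(n_iter):
        u = [sum(r * x for r, x in zip(row, v)) for row in matrix]
        v = [sum(matrix[i][j] * u[i] for i in range(len(matrix)))
             for j in range(n_samples)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return v

def remove_factor(matrix, factor):
    """Project a unit-length factor out of every gene (row)."""
    cleaned = []
    for row in matrix:
        loading = sum(r * f for r, f in zip(row, factor))
        cleaned.append([r - loading * f for r, f in zip(row, factor)])
    return cleaned

# Toy data: three genes driven entirely by a two-batch effect.
batch = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
expr = [[2.0 * b for b in batch], [-1.0 * b for b in batch],
        [0.5 * b for b in batch]]
centered = center_rows(expr)
factor = leading_factor(centered)
cleaned = remove_factor(centered, factor)
```

The same projection would also strip a genuine broad biological signal, which is the risk the previous slide raises for master regulators.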

Next-generation sequencing has increased our ability to survey the transcriptome, giving many more quantitative traits. RNA-Seq: Montgomery et al., Nature, 2010; Pickrell et al., Nature, 2010. ChIP-Seq: McDaniell et al., Science, 2010.

Increased resolution of the transcriptome through RNA-sequencing. [Figure: sequencing reads aligned to annotated and unannotated structure, illustrating quantification of exons, transcripts, and genes; alternative splicing; transcript termination; hybrid transcripts and fusion genes; allele-specific expression and escape from X-inactivation; and RNA editing.]

Splicing eQTLs. Can investigate relative transcript ratios or reads across junctions. Splicing is also affected for many genes. [Figure: number of cis eQTLs and sQTLs discovered against number of individuals (200 to 1000); values shown include 10914, 6738, 2851, and 1158.] (Battle et al., Genome Research, 2014; Katz et al., Nature Methods, 2010)
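
One simple junction-based splicing phenotype is the fraction of reads supporting exon inclusion. The sketch below uses a plain read ratio, which is cruder than the probabilistic estimates produced by dedicated tools; the function names and toy counts are hypothetical.

```python
def percent_spliced_in(inclusion_reads, exclusion_reads):
    """Fraction of junction reads supporting inclusion, or None if no reads."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None
    return inclusion_reads / total

def mean_psi_by_genotype(samples):
    """samples: iterable of (genotype, inclusion_reads, exclusion_reads).
    Returns the mean inclusion fraction per genotype, skipping empty sites."""
    grouped = {}
    for genotype, inc, exc in samples:
        psi = percent_spliced_in(inc, exc)
        if psi is not None:
            grouped.setdefault(genotype, []).append(psi)
    return {g: sum(vals) / len(vals) for g, vals in grouped.items()}

# Toy counts, invented for illustration only.
toy = [("CC", 30, 10), ("CC", 20, 20), ("GG", 5, 15), ("GG", 0, 0)]
print(mean_psi_by_genotype(toy))
```

The per-sample ratios could then be treated like any other quantitative trait in an association test.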

Advantages of ASE: allelic imbalance can be tested within a single individual, provided there are sufficient reads.
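
A within-individual test of allelic imbalance can be sketched as an exact two-sided binomial test of reference versus alternate read counts at a heterozygous site. The 0.5 null fraction ignores reference mapping bias and over-dispersion, so treat this as a teaching example under those assumptions; the function names are hypothetical.

```python
from math import exp, lgamma, log

def binomial_pmf(k, n, p):
    """Probability of k successes in n trials (0 < p < 1), in log space."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + k * log(p) + (n - k) * log(1 - p))

def ase_binomial_p(ref_reads, alt_reads, expected_ref_fraction=0.5):
    """Two-sided exact p-value: total probability of all outcomes that are
    no more likely than the observed reference read count."""
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover this site")
    pmf = [binomial_pmf(k, n, expected_ref_fraction) for k in range(n + 1)]
    observed = pmf[ref_reads]
    tail = sum(prob for prob in pmf if prob <= observed * (1 + 1e-9))
    return min(1.0, tail)

print(ase_binomial_p(30, 10))   # skewed counts give a small p-value
print(ase_binomial_p(5, 5))     # balanced counts give a p-value of about 1
```

The dependence on read depth is visible directly: 3 versus 1 reads has the same ratio as 30 versus 10 but gives no evidence of imbalance.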

Using ASE to detect GWAS signals driven by multiple causal variants. [Figure: ASE plotted by GWAS variant genotype, with a lack of ASE for homozygotes and abundant ASE for heterozygotes.] Tests functional differences between alleles in the population. (Lucia Conde et al., AJHG, 2013)

prSNPs that are also eQTLs are enriched in functional annotations. The intersection of ASE-QTL and eQTL is more likely to localize a causal variant. (Tuuli Lappalainen et al., Nature, 2013)

ASE allows finer interpretation of coding alleles. 18.2% (1502 of 8233) (Dimas, 2008). 46.2% of nonsynonymous sites where ASE can be detected are significant in 1 individual. (Montgomery et al., PLoS Genetics, 2011; Lappalainen et al., AJHG, 2011)

Compound inheritance of regulatory and coding polymorphism causes disease. "The exon-junction complex (EJC) performs essential RNA processing tasks. Here, we describe the first human disorder, thrombocytopenia with absent radii (TAR), caused by deficiency in one of the four EJC subunits." The thrombocytopenia with absent radii (TAR) syndrome is characterized by a reduction in the number of platelets (the cells that make blood clot). (Albers et al., Nature Genetics, 2012)

eQTL discovery doesn’t capture rare variant effects. 202 genes sequenced in 14,002 people. (Nelson et al., Science, 2012)

Challenges with studying the impact of rare non-coding variants. No code for non-coding regions. Lack of cohorts with both high-quality genomes and transcriptomes. Power challenges with unrelated individuals (N=1). Few cohorts with large numbers of individuals (N>1000, 10000). Which rare non-coding variants are meaningful?

Outlier gene expression can be used to identify rare non-coding variants: in large families; in many small families; across tissues. (Li et al., AJHG, 2014) In SardiNIA.

Sardinia as a unique study population

Identifying outliers in Sardinia trios using LRT
FDR    Under   Over   Total
0.05   466     249    715
0.10   529     280    809
[Figure: child Z-score against parent Z-score for over-expression and under-expression outliers; *** binomial exact p-value < 1e-16.]
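
The slide reports outliers called with a likelihood-ratio test, whose details are not given. As a simpler illustration of the outlier idea, the sketch below standardizes one gene's expression across individuals and flags extreme Z-scores. The threshold of 2, the function names, and the toy values are assumptions, and this is not the LRT used in the study.

```python
from statistics import mean, stdev

def expression_z_scores(values):
    """Z-score of each individual's expression for one gene."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def find_outliers(expression_by_sample, threshold=2.0):
    """Return {sample: 'over' | 'under'} for samples with |Z| >= threshold."""
    samples = list(expression_by_sample)
    z = expression_z_scores([expression_by_sample[s] for s in samples])
    return {s: ("over" if zi > 0 else "under")
            for s, zi in zip(samples, z) if abs(zi) >= threshold}

# Toy expression values for one gene, invented for illustration only.
toy_gene = {"s1": 10, "s2": 11, "s3": 9, "s4": 10, "s5": 11,
            "s6": 9, "s7": 10, "s8": 11, "s9": 9, "s10": 25}
print(find_outliers(toy_gene))
```

In a family design, a flagged child can then be compared with the parents to ask whether the outlier direction is shared.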

Allelic expression in gene expression outliers implicates heterozygous regulatory effects

Using family outliers and genome sequencing to identify candidate rare variants

Outlier gene expression can be used to identify rare non-coding variants: in large families; in many small families; across tissues. (Li et al., AJHG, 2014) In GTEx.

Rare variants in cross-tissue gene expression outliers

Can we use outlier based approaches to help diagnose rare, unsolved genetic diseases?

Identify extreme expression outliers. [Diagram: exomes/genomes + transcriptomes = diagnoses.] Goals: identify extreme expression outliers; measure the impact of potential LoF mutations; identify aberrant splicing outliers.

Case study: unexplained hypomyelination. Child presented with attention and motor skills deficits at age 5. MRI gave no specific diagnosis but showed delayed myelination.

Advantages to genetic studies of gene expression: cost-effective; build cellular models of disease; survey diagnostic responses to treatments; identify diverse disease mechanisms, moving us beyond protein-coding mutations alone; identify pathological tissues; allow us to identify effects (or transferability) in different populations; classify undiagnosed conditions.

montgomerylab.stanford.edu