Regulatory variation and its functional consequences Chris Cotsapas

Slides:



Advertisements
Similar presentations
Methods to read out regulatory functions
Advertisements

Genetic Analysis of Genome-wide Variation in Human Gene Expression Morley M. et al. Nature 2004,430: Yen-Yi Ho.
Regulomics II: Epigenetics and the histone code Jim Noonan GENE760.
Geuvadis RNAseq UNIGE Genetic regulatory variants
SHI Meng. Abstract The genetic basis of gene expression variation has long been studied with the aim to understand the landscape of regulatory variants,
1 Harvard Medical School Mapping Transcription Mechanisms from Multimodal Genomic Data Hsun-Hsien Chang, Michael McGeachie, and Marco F. Ramoni Children.
Combined analysis of ChIP- chip data and sequence data Harbison et al. CS 466 Saurabh Sinha.
Global Mapping of the Yeast Genetic Interaction Network Tong et. al, Science, Feb 2004 Presented by Bowen Cui.
Gene Expression Levels Are a Target of Recent Natural Selection in the Human Genome Mol. Biol. Evol. 26(3):649– Journal Club
Causes of regulatory variation in the human genome
Regulatory variation and eQTLs Chris Cotsapas
Speaker: HU Xue-Jia Supervisor: WU Yun-Dong Date: 19/12/2013.
Identifying promoters and regulatory elements for DNA variation
[Bejerano Fall10/11] 1 Thank you for the midterm feedback! Projects will be assigned shortly.
MSc GBE Course: Genes: from sequence to function Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue.
ChIP-seq QC Xiaole Shirley Liu STAT115, STAT215. Initial QC FASTQC Mappability Uniquely mapped reads Uniquely mapped locations Uniquely mapped locations.
CS 374: Relating the Genetic Code to Gene Expression Sandeep Chinchali.
Give me your DNA and I tell you where you come from - and maybe more! Lausanne, Genopode 21 April 2010 Sven Bergmann University of Lausanne & Swiss Institute.
“An integrated encyclopedia of DNA elements in the human genome” ENCODE Project Consortium. Nature 2012 Sep 6; 489: Michael M. Hoffman University.
Identification of obesity-associated intergenic long noncoding RNAs
Presented by Karen Xu. Introduction Cancer is commonly referred to as the “disease of the genes” Cancer may be favored by genetic predisposition, but.
Computational Molecular Biology Biochem 218 – BioMedical Informatics Gene Regulatory.
Comparative Genomics II: Functional comparisons Caterino and Hayes, 2007.
Special Topics in Genomics Lecture 1: Introduction Instructor: Hongkai Ji Department of Biostatistics
Department of Biomedical Informatics Bioinformatics and Genetics Kun Huang Department of Biomedical Informatics OSUCCC Biomedical Informatics Shared Resource.
ENCODE enhancers 12/13/2013 Yao Fu Gerstein lab. ‘Supervised’ enhancer prediction Yip et al., Genome Biology (2012) Get enhancer list away to genes DNase.
1 1 - Lectures.GersteinLab.org Overview of ENCODE Elements Mark Gerstein for the "ENCODE TEAM"
Nuevas perspectivas en análisis genomico: implicaciones del proyecto ENCODE 1 Rory Johnson Bioinformatics and Genomics Centre for Genomic Regulation AEEH.
Geuvadis RNAseq analysis at UNIGE Analysis plans
Characterizing the role of miRNAs within gene regulatory networks using integrative genomics techniques Min Wenwen
Epigenome 1. 2 Background: GWAS Genome-Wide Association Studies 3.
Comments on Rare Variants Analyses Ryo Yamada Kyoto University 2012/08/27 Japan.
Current Topics in Genomics and Epigenomics – Lecture 2.
An Introduction to ENCODE Mark Reimers, VIPBG (borrowing heavily from John Stamatoyannopoulos and the ENCODE papers)
* only 17% of SNPs implicated in freshwater adaptation map to coding sequences Many, many mapping studies find prevalent noncoding QTLs.
Biology 101 DNA: elegant simplicity A molecule consisting of two strands that wrap around each other to form a “twisted ladder” shape, with the.
Small RNAs and their regulatory roles. Presented by: Chirag Nepal.
Epigenetic Analysis BIOS Statistics for Systems Biology Spring 2008.
TGCAAACTCAAACTCTTTTGTTGTTCTTACTGTATCATTGCCCAGAATAT TCTGCCTGTCTTTAGAGGCTAATACATTGATTAGTGAATTCCAATGGGCA GAATCGTGATGCATTAAAGAGATGCTAATATTTTCACTGCTCCTCAATTT.
Supplemental Figure 1A. A small fraction of genes were mapped to >=20 SNPs. Supplemental Figure 1B. The density of distance from the position of an associated.
MEME homework: probability of finding GAGTCA at a given position in the yeast genome, based on a background model of A = 0.3, T = 0.3, G = 0.2, C = 0.2.
Recombination breakpoints Family Inheritance Me vs. my brother My dad (my Y)Mom’s dad (uncle’s Y) Human ancestry Disease risk Genomics: Regions  mechanisms.
Thoughts on ENCODE Annotations Mark Gerstein. Simplified Comprehensive (published annotation, mostly in '12 & '14 rollouts)
Lecture 15 Regulatory variation and eQTLs Chris Cotsapas 6.047/6.878/HST.507 Computational Biology: Genomes, Networks, Evolution.
1 Before considering selection, it’s important to characterize how gene expression varies within and between species. What evolutionary forces act on gene.
Overview of ENCODE Elements
1 Paper Outline Specific Aim Background & Significance Research Description Potential Pitfalls and Alternate Approaches Class Paper: 5-7 pages (with figures)
Supplemental Figure 1. False trans association due to probe cross-hybridization and genetic polymorphism at single base extension site. (A) The Infinium.
1 What forces constrain/drive protein evolution? Looking at all coding sequences across multiple genomes can shed considerable light on which forces contribute.
Genomics 2015/16 Silvia del Burgo. + Same genome for all cells that arise from single fertilized egg, Identity?  Epigenomic signatures + Epigenomics:
Genetics of Gene Expression BIOS Statistics for Systems Biology Spring 2008.
Using public resources to understand associations Dr Luke Jostins Wellcome Trust Advanced Courses; Genomic Epidemiology in Africa, 21 st – 26 th June 2015.
Transcriptional Enhancers Looking out for the genes and each other Sridhar Hannenhalli Department of Cell Biology and Molecular Genetics Center for Bioinformatics.
Different microarray applications Rita Holdhus Introduction to microarrays September 2010 microarray.no Aim of lecture: To get some basic knowledge about.
Enhancers and 3D genomics Noam Bar RESEARCH METHODS IN COMPUTATIONAL BIOLOGY.
Understanding GWAS SNPs Xiaole Shirley Liu Stat 115/215.
Integrative Genomics. Double-helix DNA strands are separated in the gene coding region Which enzyme detects the beginning of a gene ? RNA Polymerase (multi-subunit.
EQTLs.
Functional Elements in the Human Genome
Functional Mapping and Annotation of GWAS: FUMA
Gene Hunting: Design and statistics
Interpreting noncoding variants
Linking Genetic Variation to Important Phenotypes
In these studies, expression levels are viewed as quantitative traits, and gene expression phenotypes are mapped to particular genomic loci by combining.
Xin Li, Alexis Battle, Konrad J. Karczewski, Zach Zappala, David A
Are Interactions between cis-Regulatory Variants Evidence for Biological Epistasis or Statistical Artifacts?  Alexandra E. Fish, John A. Capra, William.
GWAS-eQTL signal colocalisation methods
Genetic and Epigenetic Regulation of Human lincRNA Gene Expression
Large-Scale trans-eQTLs Affect Hundreds of Transcripts and Mediate Patterns of Transcriptional Co-regulation  Boel Brynedal, JinMyung Choi, Towfique Raj,
Presentation transcript:

Regulatory variation and its functional consequences Chris Cotsapas

Motivating questions How do phenotypes vary across individuals? – Regulatory changes drive cellular and organismal traits – Likely also drive evolutionary differences How are genes (co)regulated? – Pathways, processes, contexts

Regulatory variation What do “interesting” variants do? Genetic changes to: – Coding sequence ** – Gene expression levels – Splice isomer levels – Methylation patterns – Chromatin accessibility – Transcription factor binding kinetics – Cell signaling – Protein-protein interactions ~88% of GWAS hits are regulatory

Genetic variation alters regulation Protein levels – Maize (Damerval 94) Expression levels – Yeast, maize, mouse, humans (Brem 02, Schadt 03, Stranger 05, Stranger 07) RNA splicing – Humans (Pickrell 12, Lappalainen 13) Methylation and Dnase I peak strength – Humans (Degner 12; Gibbs 12)

cis-eQTL –The position of the eQTL maps near the physical position of the gene. –Promoter polymorphism? –Insertion/Deletion? –Methylation, chromatin conformation? trans-eQTL –The position of the eQTL does not map near the physical position of the gene. –Regulator? –Direct or indirect? Modified from Cheung and Spielman 2009 Nat Gen Genetics of gene expression (eQTL)

Cis- eQTL analysis: Test SNPs within a pre-defined distance of gene 1Mb SNPs gene probe 1Mb window

QT association Analysis of the relationship between a dependent or outcome variable (phenotype) with one or more independent or predictor variables (SNP genotype) Y i =   +   X i +  i Number of A1 Alleles 012 Continuous Trait Value  Slope:   Linear Regression Equation Logistic Regression Equation =  + Xi + i=  + Xi + i ln ( ) pipi (1-p i )

gene 3 eQTL analysis: a GWAS for every gene gene 2gene N gene 5 gene 4 gene 1

cis-eQTLs are rather common Nica et al PLoS Genet 2011

Cis-eQTLs cluster around TSS Stranger et al PLoS Genet 2012

trans hotspots (yeast) Brem et al Science 2002

Yvert et al Nat Genet 2003

DOES REGULATORY VARIATION ALTER PHENOTYPE? APPLICATION TO GWAS Candidate genes, perturbations underlying organismal phenotypes

Rationale How do disease/trait variants actually alter biology? If they change regulation, then: – Change in gene expression/isoform use – Phenotypic consequence*

Compare patterns of association GWAS peak eQTL for gene 1 eQTL for gene 2

Pearson’s covariance for windows of 51 SNPs between –log(p) in 2 traits CD GWAS p eQTL p Detect a peak when effect is the same No peak when there are independent hits near each other

Crohn’s/eQTL analysis CD meta analysis (GWAS only) CEU Hapmap LCL eQTL data Overlapping SNPs only (eQTL data has 610K SNPs, most in CD meta-analysis) Test 133 associations (total 1054 tests) GWAS peak eQTL for gene 1 eQTL for gene 2

Crohn’s/eQTL analysis SNPCHRGene rs PTGER4 rs ATG16L1 rs SPNS1 rs INPP5E rs C22orf29 A peak implies that the same effect drives GWAS and eQTL

MS/eQTL analysis SNPCHRGene rs PTGER4 rs CDK2AP rs CISD2 rs GOLGB1 & EAF2 rs METTL1 & TSFM rs ORMDL3, STARD3 & ZPBP2 rs PPM1F rs SLC30A7 rs SLC44A2 A peak implies that the same effect drives GWAS and eQTL

DOES REGVAR REVEAL CO-REGULATION? A.K.A. WHERE ARE THE TRANS eQTLS? Open question

gene 3 Whole-genome eQTL analysis is an independent GWAS for expression of each gene gene 2gene N gene 5 gene 4 gene 1

Issues with trans mapping Power – Genome-wide significance is 5e -8 – Multiple testing on ~20K genes – Sample sizes clearly inadequate Data structure – Bias corrections deflate variance – Non-normal distributions Sample sizes – Far too small

But… Assume that trans eQTLs affect many genes… …and you can use cross-trait methods!

Association data Z 1,1 Z 1,2 ……Z 1,p Z 2,1 : : Z s,1 Z s,p

Cross-phenotype meta-analysis S CPMA ~ L(data | λ≠1) L(data | λ=1) Cotsapas et al, PLoS Genetics

CPMA for correlated traits Empirical assessment to account for correlation Simulate Z scores under covariance, recalculate CPMA Construct distribution of CPMA for dataset, call significance with Ben Voight, U Penn

Experimental design 610,180 SNPs MAF >0.15 CEU and YRI LD pruned (r 2 < 0.2) 8368 transcripts Detectable on Illumina arrays 108 CEU individuals* 109 YRI individuals* * Stranger et al Nat Genet 2007 (LCL data; publicly available) CEU p-values Transcript ~ SNP, sex YRI p-values Transcript ~ SNP, sex plink CPMA CEU CPMA scores YRI CPMA scores >95%ile sim CPMA

Target sets of genes trans-acting variant: SNP with CPMA evidence Target genes: genes affected by trans-acting variant (i.e. regulon)

Prediction 1 Allelic effects should be conserved between two populations – Binomial test on paired observations for all genes P < 0.05 in at least one population True for 1124/1311 SNPs (binomial p < 0.05) Genes p CEU < 0.05 Genes p YRI < 0.05 CEU++--+ YRI

Prediction 2 Target genes should overlap – Identify by mixture of gaussians classification – Empirical p from distribution of overlaps between N CEU and N YRI genes across SNPs. True for 600/1311 SNPs (empirical p < 0.05) Genes p CEU < 0.05 Genes p YRI < 0.05

What about the target genes? Regulons: – Encode proteins more connected than expected by chance Rossin et al 2011 PLoS Genetics

What about the target genes? Regulons: – Encode proteins enriched for TF targets (ENCODE LCL data) – 24/67 filtered TFs significant – Binomial overlap test TFp-value CEBPB3.7 x HDAC87.8 x FOS2.5 x JUND3.7 x NFYB3.3 x ETS13.8 x FAM48A2.1 x FOXA11.4 x GATA14.6 x HEY17.8 x trans target genes CHiPseq LCL target genes

Summary Regulatory variation is common It affects gene expression levels Likely many other types: – DNA accessibility, chromatin states – Transcript splicing, processing, turnover Has phenotypic consequences – GWAS – Some cellular assays (not discussed here)

Open questions Discover regulatory elements (cis) – Promoters, enhancers etc Gene regulatory circuits (trans) Dynamics of regulation – Splicing variation, processing, degradation Phenotypic consequences – Cellular assays required Tie in to organismal phenotype

NEXT-GEN SEQUENCING DATA RNAseq, GTEx

GTEx – Genotype-Tissue EXpression An NIH common fund project Current: 35 tissues from 50 donors Scale up: 20K tissues from 900 donors. Novel methods groups: 5 current + RFA

How can we make RNAseq useful? Standard eQTLs – Montgomery et al, Pickrell et al Nature 2010 Isoform eQTLs – Depth of sequence! Long genes are preferentially sequenced Abundant genes/isoforms ditto Power!? Mapping biases due to SNPs

RNAseq combined with other techs Regulons: TF gene sets via CHiP/seq – Look for trans effects Open chromatin states (Dnase I; methylation) – Find active genes – Changes in epigenetic marks correlated to RNA – Genetic effects RNA/DNA comparisons – Simultaneous SNP detection/genotyping – RNA editing ???