Lecture 15 Regulatory variation and eQTLs Chris Cotsapas 6.047/6.878/HST.507 Computational Biology: Genomes, Networks, Evolution.

Slides:



Advertisements
Similar presentations
Methods to read out regulatory functions
Advertisements

Genetic Analysis of Genome-wide Variation in Human Gene Expression Morley M. et al. Nature 2004,430: Yen-Yi Ho.
Geuvadis RNAseq UNIGE Genetic regulatory variants
SHI Meng. Abstract The genetic basis of gene expression variation has long been studied with the aim to understand the landscape of regulatory variants,
1 Harvard Medical School Mapping Transcription Mechanisms from Multimodal Genomic Data Hsun-Hsien Chang, Michael McGeachie, and Marco F. Ramoni Children.
CSE Fall. Summary Goal: infer models of transcriptional regulation with annotated molecular interaction graphs The attributes in the model.
Causes of regulatory variation in the human genome
Regulatory variation and its functional consequences Chris Cotsapas
Genetics of gene expression Stephen Montgomery montgomerylab.stanford.edu Stanford University School of Medicine.
An integrative genomics approach to infer causal associations between gene expression and disease Schadt, E. E., Lamb, J., Yang, X., Zhu, J., Edwards,
Understanding GWAS Chip Design – Linkage Disequilibrium and HapMap Peter Castaldi January 29, 2013.
Regulatory variation and eQTLs Chris Cotsapas
Speaker: HU Xue-Jia Supervisor: WU Yun-Dong Date: 19/12/2013.
Teresa Przytycka NIH / NLM / NCBI RECOMB 2010 Bridging the genotype and phenotype.
Identifying promoters and regulatory elements for DNA variation
Biology and Bioinformatics Gabor T. Marth Department of Biology, Boston College BI820 – Seminar in Quantitative and Computational Problems.
MSc GBE Course: Genes: from sequence to function Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue.
Something related to genetics? Dr. Lars Eijssen. Bioinformatics to understand studies in genomics – São Paulo – June Image:
Introduction to molecular networks Sushmita Roy BMI/CS 576 Nov 6 th, 2014.
CS 374: Relating the Genetic Code to Gene Expression Sandeep Chinchali.
Genome-Wide Association Studies Xiaole Shirley Liu Stat 115/215.
Genomewide Association Studies.  1. History –Linkage vs. Association –Power/Sample Size  2. Human Genetic Variation: SNPs  3. Direct vs. Indirect Association.
Polymorphisms – SNP, InDel, Transposon BMI/IBGP 730 Victor Jin, Ph.D. (Slides from Dr. Kun Huang) Department of Biomedical Informatics Ohio State University.
Give me your DNA and I tell you where you come from - and maybe more! Lausanne, Genopode 21 April 2010 Sven Bergmann University of Lausanne & Swiss Institute.
“An integrated encyclopedia of DNA elements in the human genome” ENCODE Project Consortium. Nature 2012 Sep 6; 489: Michael M. Hoffman University.
Computational Molecular Biology Biochem 218 – BioMedical Informatics Gene Regulatory.
Comparative Genomics II: Functional comparisons Caterino and Hayes, 2007.
Special Topics in Genomics Lecture 1: Introduction Instructor: Hongkai Ji Department of Biostatistics
Manolis Kellis Broad Institute of MIT and Harvard
Department of Biomedical Informatics Bioinformatics and Genetics Kun Huang Department of Biomedical Informatics OSUCCC Biomedical Informatics Shared Resource.
Genetic Analysis in Human Disease. Learning Objectives Describe the differences between a linkage analysis and an association analysis Identify potentially.
Geuvadis RNAseq analysis at UNIGE Analysis plans
Characterizing the role of miRNAs within gene regulatory networks using integrative genomics techniques Min Wenwen
Epigenome 1. 2 Background: GWAS Genome-Wide Association Studies 3.
Comments on Rare Variants Analyses Ryo Yamada Kyoto University 2012/08/27 Japan.
The medical relevance of genome variability Gabor T. Marth, D.Sc. Department of Biology, Boston College
An Introduction to ENCODE Mark Reimers, VIPBG (borrowing heavily from John Stamatoyannopoulos and the ENCODE papers)
Computational research for medical discovery at Boston College Biology Gabor T. Marth Boston College Department of Biology
* only 17% of SNPs implicated in freshwater adaptation map to coding sequences Many, many mapping studies find prevalent noncoding QTLs.
Experimental validation. Integration of transcriptome and genome sequencing uncovers functional variation in human populations Tuuli Lappalainen et al.
Biology 101 DNA: elegant simplicity A molecule consisting of two strands that wrap around each other to form a “twisted ladder” shape, with the.
TGCAAACTCAAACTCTTTTGTTGTTCTTACTGTATCATTGCCCAGAATAT TCTGCCTGTCTTTAGAGGCTAATACATTGATTAGTGAATTCCAATGGGCA GAATCGTGATGCATTAAAGAGATGCTAATATTTTCACTGCTCCTCAATTT.
Anatomy of a Genome Project A.Sequencing 1. De novo vs. ‘resequencing’ 2.Sanger WGS versus ‘next generation’ sequencing 3.High versus low sequence coverage.
The generalized transcription of the genome Víctor Gámez Visairas Genomics Course 2014/15.
Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520
MEME homework: probability of finding GAGTCA at a given position in the yeast genome, based on a background model of A = 0.3, T = 0.3, G = 0.2, C = 0.2.
Recombination breakpoints Family Inheritance Me vs. my brother My dad (my Y)Mom’s dad (uncle’s Y) Human ancestry Disease risk Genomics: Regions  mechanisms.
1 Before considering selection, it’s important to characterize how gene expression varies within and between species. What evolutionary forces act on gene.
No reference available
Lectures 7 – Oct 19, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 12:00-1:20 Johnson Hall.
1 Paper Outline Specific Aim Background & Significance Research Description Potential Pitfalls and Alternate Approaches Class Paper: 5-7 pages (with figures)
Supplemental Figure 1. False trans association due to probe cross-hybridization and genetic polymorphism at single base extension site. (A) The Infinium.
1 What forces constrain/drive protein evolution? Looking at all coding sequences across multiple genomes can shed considerable light on which forces contribute.
Affymetrix User’s Group Meeting Boston, MA May 2005 Keynote Topics: 1. Human genome annotations: emergence of non-coding transcripts -tiling arrays: study.
Genetics of Gene Expression BIOS Statistics for Systems Biology Spring 2008.
(1) Genotype-Tissue Expression (GTEx) Largest systematic study of genetic regulation in multiple tissues to date 53 tissues, 500+ donors, 9K samples, 180M.
Transcriptional Enhancers Looking out for the genes and each other Sridhar Hannenhalli Department of Cell Biology and Molecular Genetics Center for Bioinformatics.
Different microarray applications Rita Holdhus Introduction to microarrays September 2010 microarray.no Aim of lecture: To get some basic knowledge about.
Enhancers and 3D genomics Noam Bar RESEARCH METHODS IN COMPUTATIONAL BIOLOGY.
Understanding GWAS SNPs Xiaole Shirley Liu Stat 115/215.
EQTLs.
Gil McVean Department of Statistics
Functional Mapping and Annotation of GWAS: FUMA
Gene Hunting: Design and statistics
Inferring Genetic Architecture of Complex Biological Processes BioPharmaceutical Technology Center Institute (BTCI) Brian S. Yandell University of Wisconsin-Madison.
Linking Genetic Variation to Important Phenotypes
In these studies, expression levels are viewed as quantitative traits, and gene expression phenotypes are mapped to particular genomic loci by combining.
A systems view of genetics in chronic kidney disease
GWAS-eQTL signal colocalisation methods
Genetic and Epigenetic Regulation of Human lincRNA Gene Expression
Presentation transcript:

Lecture 15 Regulatory variation and eQTLs Chris Cotsapas 6.047/6.878/HST.507 Computational Biology: Genomes, Networks, Evolution

Module 4: Population / Evolution / Phylogeny L15/16: Association mapping for disease and molecular traits –Statistical genetics: disease mapping in populations (Mark Daly) –Quantitative traits and molecular variation: eQTLs, cQTLs L17/18: Phylogenetics / Phylogenomics –Phylogenetics: Evolutionary models, Tree building, Phylo inference –Phylogenomics: gene/species trees, coalescent models, populations L19/20: Human history, Missing heritability –Measuring natural selection in human populations –The missing heritability in genome-wide associations And done! Last pset Nov 11 (no lab), In-class quiz on Nov 20 –No lab 4! Then entire focus shifts to projects, Thanksgiving, Frontiers

Today: Regulatory variation and eQTLs 1.Quantitative Trait Loci (QTLs), Regulatory Variation – Molecular phenotypes as QTs: expression, chromatin… – Discretization: a GWAS for each gene. Cis-/Trans-eQTLs – Underlying regulatory variation: eQTLs, GWAS, cis-eQTL 2.Finding trans-eQTLs (distal from gene that varies) – Challenges: Power, structure, sample size – Cross-phenotype analysis: trans QTLs affect many genes 3.Identifying underlying regulatory mechanisms – Cis-eQTLs: TSS-distance, cell type specificity – eQTLs vs. GWAS: Expression as intermediate trait 4.Population differences, emerging efforts – Shared associations, SNP-gene pairs, allelic direction – Confound: environment, preparation, batch, ancestry

Quantitative traits - weight, height - anything measurable - today: gene expression QTLs (QT Loci) - The loci that control quantitative traits

Regulatory variation What do trait-associated variants do? Genetic changes to: – Coding sequence ** – Gene expression levels – Splice isomer levels – Methylation patterns – Chromatin accessibility – Transcription factor binding kinetics – Cell signaling – Protein-protein interactions Regulatory

BASIC CONCEPTS History, eQTL, mQTL, others

Within a population Damerval et al /72 protein levels differ in maize 2D electrophoresis, eyeball spot quantitation Problems: – genome coverage – quantitation – post-translational modifications Solution: use expression levels instead!

Usual mapping tools available Discretization approach

gene 3 Whole-genome eQTL analysis is an independent GWAS for expression of each gene gene 2gene N gene 5 gene 4 gene 1

cis-eQTL –The position of the eQTL maps near the physical position of the gene. –Promoter polymorphism? –Insertion/Deletion? –Methylation, chromatin conformation? trans-eQTL –The position of the eQTL does not map near the physical position of the gene. –Regulator? –Direct or indirect? Modified from Cheung and Spielman 2009 Nat Gen Genetics of gene expression (eQTL)

eQTL – THE ARRAY ERA yeast, mouse, maize, human

Yeast Brem et al Science 2002 Linkage in 40 offspring of lab x wild strain cross 1528/6215 DE between parents 570 map in cross – multiple QTLs – 32% of 570 have cis linkage 262 not DE in parents also map

trans hotspots Brem et al Science 2002

Yvert et al Nat Genet 2003

Mammals I F2 mice on atherogenic diet Expression arrays; WG linkage Schadt et al Nature 2003

Mammals II Chesler et al Nat Genet % !!

Mammals III No major trans loci in humans – Cheung et al Nature 2003 – Monks et al AJHG 2004 – Stranger et al PLoS Genet 2005, Science 2007

Today: Regulatory variation and eQTLs 1.Quantitative Trait Loci (QTLs), Regulatory Variation – Molecular phenotypes as QTs: expression, chromatin… – Discretization: a GWAS for each gene. Cis-/Trans-eQTLs – Underlying regulatory variation: eQTLs, GWAS, cis-eQTL 2.Finding trans-eQTLs (distal from gene that varies) – Challenges: Power, structure, sample size – Cross-phenotype analysis: trans QTLs affect many genes 3.Identifying underlying regulatory mechanisms – Cis-eQTLs: TSS-distance, cell type specificity – eQTLs vs. GWAS: Expression as intermediate trait 4.Population differences, emerging efforts – Shared associations, SNP-gene pairs, allelic direction – Confound: environment, preparation, batch, ancestry

WHERE ARE THE TRANS eQTLS? Open question

gene 3 Whole-genome eQTL analysis is an independent GWAS for expression of each gene gene 2gene N gene 5 gene 4 gene 1

Issues with trans mapping Power – Genome-wide significance is 5e -8 – Multiple testing on ~20K genes – Sample sizes clearly inadequate Data structure – Bias corrections deflate variance – Non-normal distributions Sample sizes – Far too small

But… Assume that trans eQTLs affect many genes… …and you can use cross-trait methods!

Association data Z 1,1 Z 1,2 ……Z 1,p Z 2,1 : : Z s,1 Z s,p

Cross-phenotype meta-analysis S CPMA ~ L(data | λ≠1) L(data | λ=1) Cotsapas et al, PLoS Genetics

CPMA detects trans mixtures

Open research questions Do trans effects exist? – Yes – heritability estimates suggest so. – Can we detect them? Larger cohorts? – Most eQTL studies ~ individuals – See later, GTEx Project Better methods? – Collapsing data? – PCA, summary statistics, modeling?

Today: Regulatory variation and eQTLs 1.Quantitative Trait Loci (QTLs), Regulatory Variation – Molecular phenotypes as QTs: expression, chromatin… – Discretization: a GWAS for each gene. Cis-/Trans-eQTLs – Underlying regulatory variation: eQTLs, GWAS, cis-eQTL 2.Finding trans-eQTLs (distal from gene that varies) – Challenges: Power, structure, sample size – Cross-phenotype analysis: trans QTLs affect many genes 3.Identifying underlying regulatory mechanisms – Cis-eQTLs: TSS-distance, cell type specificity – eQTLs vs. GWAS: Expression as intermediate trait 4.Population differences, emerging efforts – Shared associations, SNP-gene pairs, allelic direction – Confound: environment, preparation, batch, ancestry

CAN WE LEARN REGULATORY VARIATION FROM eQTL?

First, let’s define the question Can we use genetic perturbations as a way to understand how genes are regulated? In what groups, in which tissues? To what stimuli/signaling events? Do cis eQTLs perturb promoter elements? Do trans perturb TFs? Signaling cascades?

Most significant SNP per gene permutation threshold Significant associations are symmetrically distributed around TSS Stranger et al., PLoS Gen 2012

Cell type-specific and cell type-shared gene associations (0.001 permutation threshold) cell type No. of cell types with gene association 69-80% of cis associations are cell type-specific cis association sharing increases slightly when significance thresholds are relaxed Cell type specificity verified experimentally for subset of eQTLs Dimas et al Science 2009 Slide courtesy Antigone Dimas

Open research questions Do cis eQTLs perturb functional elements? – Given each is independent, how can we know? Do tissue-specific effects correlate with the expression of a gene across tissues? Or a regulator? – Perhaps a gene is expressed, but in response to different regulators across tissues? If we ever find trans eQTLs… – Common regulators of coregulated genes? – Tissue specificity? – Mechanisms?

APPLICATION TO GWAS Candidate genes, perturbations underlying organismal phenotypes

eQTLs as intermediate traits Schadt et al Nat Genet 2005

Modified from Nica and Dermitzakis Hum Mol Genet 2008 Exploring eQTLs in the relevant cell type is important for disease association studies cell type not relevant for disease relevant cell type for disease Importance of cataloguing regulatory variation in multiple cell types Slide courtesy Antigone Dimas

Barrett et al 2008 de Jager et al 2007

Franke et al 2010 Anderson et al 2011

Today: Regulatory variation and eQTLs 1.Quantitative Trait Loci (QTLs), Regulatory Variation – Molecular phenotypes as QTs: expression, chromatin… – Discretization: a GWAS for each gene. Cis-/Trans-eQTLs – Underlying regulatory variation: eQTLs, GWAS, cis-eQTL 2.Finding trans-eQTLs (distal from gene that varies) – Challenges: Power, structure, sample size – Cross-phenotype analysis: trans QTLs affect many genes 3.Identifying underlying regulatory mechanisms – Cis-eQTLs: TSS-distance, cell type specificity – eQTLs vs. GWAS: Expression as intermediate trait 4.Population differences, emerging efforts – Shared associations, SNP-gene pairs, allelic direction – Confound: environment, preparation, batch, ancestry

POPULATION DIFFERENCES

Shared association in 8 HapMap populations APOH: apolipoprotein H Stranger et al., PLoS Gen 2012

Number of genes with cis-eQTL associations 8 extended HapMap populations SRC: permutation threshold Stranger et al., PLoS Gen 2012

Direction of allelic effect same SNP-gene combination across populations AGREEMENT OPPOSITE Population 1Population 2 log 2 expression Stranger et al., PLoS Gen 2012

Slide courtesy Alkes Price

Population differences could have non-genetic basis Differences due to environment? (Idaghdour et al. 2008)‏ Differences in cell line preparation? (Stranger et al. 2007)‏ Differences due to batch effects? (Akey et al. 2007)‏ (Reviewed in Gilad et al. 2008)‏ Slide courtesy Alkes Price

Gene expression experiment Does gene expression in 60 CEU + 60 YRI vary with ancestry? Does gene expression in 89 AA vary with % Eur ancestry? 60 CEU + 60 YRI from HapMap, 89 AA from Coriell HD100AA Gene expression measurements at 4,197 genes obtained using Affymetrix Focus array c Slide courtesy Alkes Price

Gene expression differences in African Americans validate CEU-YRI differences c = 0.43 (± 0.02)‏ (P-value < )‏ 12% ± 3% in cis Slide courtesy Alkes Price

EMERGING EFFORTS RNAseq, GTEx

RNAseq questions Standard eQTLs – Montgomery et al, Pickrell et al Nature 2010 Isoform eQTLs – Depth of sequence! Long genes are preferentially sequenced Abundant genes/isoforms ditto Power!? Mapping biases due to SNPs

Strategies for transcript assembly Garber et al. Nat Methods 8:469 (2011)

GTEx – Genotype-Tissue EXpression An NIH common fund project Current: 35 tissues from 50 donors Scale up: 20K tissues from 900 donors. Novel methods groups: 5 current + RFA

RNAseq combined with other techs Regulons: TF gene sets via CHiP/seq – Look for trans effects Open chromatin states (Dnase I; methylation) – Find active genes – Changes in epigenetic marks correlated to RNA – Genetic effects RNA/DNA comparisons – Simultaneous SNP detection/genotyping – RNA editing ???

Today: Regulatory variation and eQTLs 1.Quantitative Trait Loci (QTLs), Regulatory Variation – Molecular phenotypes as QTs: expression, chromatin… – Discretization: a GWAS for each gene. Cis-/Trans-eQTLs – Underlying regulatory variation: eQTLs, GWAS, cis-eQTL 2.Finding trans-eQTLs (distal from gene that varies) – Challenges: Power, structure, sample size – Cross-phenotype analysis: trans QTLs affect many genes 3.Identifying underlying regulatory mechanisms – Cis-eQTLs: TSS-distance, cell type specificity – eQTLs vs. GWAS: Expression as intermediate trait 4.Population differences, emerging efforts – Shared associations, SNP-gene pairs, allelic direction – Confound: environment, preparation, batch, ancestry