Causes of regulatory variation in the human genome

Presentation transcript:

Causes of regulatory variation in the human genome
Manolis Dermitzakis
The Wellcome Trust Sanger Institute
Wellcome Trust Genome Campus, Cambridge, UK
md4@sanger.ac.uk

Human Genome: ~25,000 genes. 1-1.5% of human DNA is coding. Is the remaining 98.5% “junk”?

Gene expression as a phenotype
Altered patterns of gene expression → disease, e.g., Type 1 diabetes, Burkitt’s lymphomas.
Widespread intraspecific variation.
Heritable genetic variation for transcript levels.
Familial aggregation of expression profiles (Cheung et al. 2003).
In humans, ~30% of surveyed loci exhibited a genetic component for expression differences (Monks et al. 2004; Schadt et al. 2003).
Much of the influential variation is located cis- to the coding locus.
In humans, mouse, and maize, 35%-50% of the genetic basis for intraspecific differences in transcription level is cis- to the coding locus (e.g. Morley et al. 2004; Schadt et al. 2003; Stranger et al. 2005; Cheung et al. 2005, etc.).
Stranger and Dermitzakis 2006

As an introduction, I’d like to give you a couple of quick facts about gene expression. In general, normal cell function and many aspects of development are highly dependent on having the right genes transcribed at the right time and place, and certainly some diseases are associated with altered patterns of gene expression, whether through extreme effects or subtle changes in expression. However, in many species there is also quite a lot of variation among individuals with respect to gene expression patterns. Much of this variation has a genetic component; for example, in humans, nearly 30% of surveyed loci exhibited a genetic component for expression differences. More studies are showing that much of the genetic component influencing expression is located cis- to the coding locus; for example, a survey of humans, mouse, and maize estimates that approximately 30-50% is attributable to cis-located variants.

Why study gene expression
Describe and dissect regulatory variation
Annotate regulatory elements in the human genome
Support disease studies to interpret statistical signals
Distribution of molecular effects in the genome
Natural selection

Outline
Gene expression variation: recent studies
Analysis of gene expression with HapMap phase II SNPs
Update on CNV-expression associations
Natural selection and cis regulatory effects

Nature of regulatory variation
[Figure labels: DNA, REG, GENE; i) Pre-mRNA, ii) mRNA, iii) Protein, iv) DNA; Expression]
Stranger and Dermitzakis, Human Genomics 2005

Effects of Copy Number Variation on gene expression
Copy number variation:
kb to Mb size
variable DNA copy number
contributes to disease
heritable
common in humans/recently appreciated

Gene expression association mapping
[Figure: quantitative phenotype by genotype class (AA, AG, GG)]
Stranger et al. PLoS Genet 2005

Whole-genome gene expression
~48,000 transcripts: 24,000 RefSeq, 24,000 other transcripts
270 HapMap individuals:
CEU: 30 trios, 90 total
CHB: 45 unrelated
JPT: 45 unrelated
YRI: 30 trios, 90 total
2 IVTs per person
2 replicate hybridizations per IVT
Quantile normalization of all replicates of each individual.
Median normalization across all individuals of a population.
illumina Human 6 x 2 gene GEX arrays
[Figure labels: Cell line, RNA, IVT1, IVT2, rep1, rep2, rep3, rep4]
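The two normalization steps named on this slide can be sketched roughly as follows. This is a minimal illustration, not the pipeline actually used: the function names are hypothetical, ties are broken by position rather than averaged, and "median normalization" is assumed here to mean shifting each sample so that all samples share a common median.

```python
from statistics import mean, median

def quantile_normalize(replicates):
    """Give equal-length replicate vectors one shared distribution.

    The value at each rank is replaced by the mean of the values
    holding that rank across all replicates.
    """
    n = len(replicates[0])
    sorted_reps = [sorted(r) for r in replicates]
    rank_means = [mean(s[i] for s in sorted_reps) for i in range(n)]
    normalized = []
    for rep in replicates:
        # Indices of this replicate ordered from smallest to largest value.
        order = sorted(range(n), key=lambda i: rep[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized

def median_center(samples):
    """Shift each sample so its median equals the median of all sample medians."""
    medians = [median(s) for s in samples]
    target = median(medians)
    return [[v - m + target for v in s] for s, m in zip(samples, medians)]

# Made-up signal values for two replicate hybridizations of one individual.
replicate_signals = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized_replicates = quantile_normalize(replicate_signals)
```

After quantile normalization every replicate contains the same set of values, only assigned to different probes according to each replicate's own ranking.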

HapMap SNPs
60 CEU, 45 CHB, 44 JPT, 60 YRI
14,072 genes
Phase I HapMap; MAF > 0.05
CEU: 762,447 SNPs
CHB: 695,601
JPT: 689,295
YRI: 799,242
~1 SNP per 5 kb
The number of expression phenotypes does not correspond directly to the number of genes in these regions, because there were 2 probes per gene.
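The MAF > 0.05 filter can be illustrated with a short sketch. The genotype coding (0, 1 or 2 copies of one allele per individual), the SNP names and the function names are assumptions made for the example, not details from the slides.

```python
def minor_allele_frequency(dosages):
    """Frequency of the rarer allele, given per-individual allele counts (0, 1, 2)."""
    freq = sum(dosages) / (2 * len(dosages))
    return min(freq, 1 - freq)

def filter_by_maf(snp_dosages, threshold=0.05):
    """Keep only SNPs whose minor allele frequency exceeds the threshold."""
    return {
        snp: dosages
        for snp, dosages in snp_dosages.items()
        if minor_allele_frequency(dosages) > threshold
    }

# Hypothetical genotypes: snp_a is common, snp_b has one minor allele in 20 people.
example_snps = {
    "snp_a": [0, 1, 2, 1, 0, 1],
    "snp_b": [0] * 19 + [1],
}
kept_snps = filter_by_maf(example_snps)
```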

Copy Number Variation dataset
Genome Structural Variation Consortium
Redon et al. Nature Nov 22, 2006
Array-CGH using a whole genome tile path array
26,563 clones covering 93.7% of the euchromatic genome
Median clone size ~170 kb
All 270 HapMap individuals
Quantitative values (log2 ratios) representing diploid genome copy number, not genotypes.
1117 CNVs called from log2 ratios
Calls based on standard deviation of log2 ratios
Many CNVs experimentally verified

Linear regression for SNPs, CNVs and expression
[Figure axis: clone signal (log2 ratio)]
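As a rough sketch of the regression named on this slide, expression can be fitted against a numeric predictor with ordinary least squares. The additive genotype coding (AA=0, AG=1, GG=2), the data values and the function name are assumptions for illustration. For CNVs the predictor would be the clone log2 ratio instead of the genotype code.

```python
from statistics import mean

def linear_fit(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical SNP example: genotype coded as the number of G alleles.
genotype_code = {"AA": 0, "AG": 1, "GG": 2}
genotypes = ["AA", "AG", "GG", "AG", "AA", "GG"]
expression = [7.9, 8.6, 9.1, 8.4, 8.1, 9.3]  # made-up log2 expression values
dosage = [genotype_code[g] for g in genotypes]
slope, intercept, r2 = linear_fit(dosage, expression)
```

A positive slope here means expression increases with each additional G allele. Turning the fit into a p-value would need a t distribution or a permutation scheme, which this sketch leaves out.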

SNP cis-analysis: SNPs within 1 Mb of probe midpoint
[Figure labels: 2 Mb window, probe, gene, SNPs]

CNV cis-analysis: clone midpoint within 2 Mb of probe midpoint
[Figure labels: 4 Mb window, probe, gene, clones]
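The two cis windows can be expressed with one helper, sketched below. The tuple layout, names and coordinates are hypothetical, and treating the window boundary as inclusive is an assumption.

```python
SNP_CIS_DISTANCE = 1_000_000  # SNPs within 1 Mb of the probe midpoint
CNV_CIS_DISTANCE = 2_000_000  # clone midpoints within 2 Mb of the probe midpoint

def cis_candidates(probe, variants, max_distance):
    """Return names of variants on the probe's chromosome within max_distance.

    probe: (chromosome, midpoint)
    variants: list of (name, chromosome, position); for clones the position
    is the clone midpoint.
    """
    chrom, midpoint = probe
    return [
        name
        for name, v_chrom, position in variants
        if v_chrom == chrom and abs(position - midpoint) <= max_distance
    ]

# Hypothetical probe and variants.
probe = ("chr1", 5_000_000)
snps = [
    ("snp_near", "chr1", 5_400_000),
    ("snp_far", "chr1", 6_500_000),
    ("snp_other_chrom", "chr2", 5_000_100),
]
cis_snps = cis_candidates(probe, snps, SNP_CIS_DISTANCE)
```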

Permutation
[Figure: GENOTYPES matrix (g11 g12 g13 g14 … g1n through gi1 gi2 gi3 gi4 … gin) against GENE EXPRESSION values (Exp1 Exp2 Exp3 … Expi); expression values are permuted]
10,000 permutations; each time keep the lowest p-value
Null distribution of 10,000 extreme p-values
Compare observed p-values to the tails of the null
Doerge and Churchill 1996
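A minimal sketch of this scheme for one gene is given below. It is an illustration under stated assumptions, not the implementation used in the study. Instead of the lowest p-value it tracks the highest r² across SNPs, which picks the same SNP when every SNP is tested by simple linear regression on the same individuals with no missing data. All function names and data are hypothetical.

```python
import random
from statistics import mean

def r_squared(x, y):
    """Squared Pearson correlation; 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)

def best_statistic(genotypes_by_snp, expression):
    """Strongest association across all SNPs tested against one gene."""
    return max(r_squared(g, expression) for g in genotypes_by_snp)

def permutation_p_value(genotypes_by_snp, expression, n_perm=10_000, seed=0):
    """Empirical p-value of the best observed association.

    Expression values are shuffled across individuals, the best statistic
    over all SNPs is kept each time, and the observed best statistic is
    compared with that null distribution.
    """
    rng = random.Random(seed)
    observed = best_statistic(genotypes_by_snp, expression)
    shuffled = list(expression)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if best_statistic(genotypes_by_snp, shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

# Hypothetical data: two SNPs, eight individuals.
snp_genotypes = [
    [0, 0, 1, 1, 2, 2, 0, 2],
    [1, 0, 2, 0, 1, 1, 2, 0],
]
expression = [1.0, 1.1, 2.0, 2.1, 3.0, 3.1, 0.9, 2.9]
p_value = permutation_p_value(snp_genotypes, expression, n_perm=200)
```

Keeping only the extreme statistic per permutation is what makes the resulting threshold account for all SNPs tested against the gene at once.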

CNV vs. SNP associations Stranger et al. Science 2007

CNVs and SNPs mostly capture different effects
Relative impact on gene expression: 82% SNPs, 18% CNVs
Only 13% of genes with a CNV association also had a SNP association in the same population.
Biased toward large effect size.
CNV and SNP variation are highly correlated (p-value 0.001).

Custom vs. Genome-wide [Stranger et al. 2005 PLoS Genet and Stranger et al. 2007 Science]
2 batches of 60 CEU individuals grown independently at two different labs
RNA extraction and labelling by different labs and people
Run on custom and genome-wide (gw) illumina arrays
97% of associations at the 0.05 permutation threshold from the custom array analysis were also detected in the gw analysis

HapMap phase II analysis
~4 million SNP genotypes made publicly available for the 270 HapMap individuals.
Density: 1 SNP per 700 bp
Includes ~50% of expected common SNPs in these populations.
2.2 million SNPs analyzed (MAF > 0.05)

Phase I vs. Phase II: cis- significant genes (0.001)
Population | phase I HapMap | both | phase II HapMap
CEU | 286 | 258 | 299
CHB | 317 | 269 | 318
JPT | 337 | 297 | 341
YRI | 356 | 310 | 394
Percentages shown on the slide, in order: 90%, 86%, 85%, 85%, 87%, 87%, 87%, 79%

Phase I vs. Phase II

Population sharing of cis- associations

Associated SNP position relative to TSS

Distribution of regulatory elements around the TSS ENCODE Nature 2007

Direction of allelic effect: same SNP-gene combination across populations
[Figure panels: AGREEMENT and OPPOSITE; axes: log2 expression]

Direction of allelic effect

Pooling populations: spurious associations
[Figure labels: Pop1, Pop2]

Conditional permutations
Permute data within each population separately, then perform the test (x 4)
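A within-population shuffle of this kind might look like the sketch below. The function name and data are hypothetical. The point is that values never cross population boundaries, so mean expression differences between populations are preserved in every permuted data set and cannot by themselves produce a significant pooled result.

```python
import random

def permute_within_populations(values, populations, rng):
    """Shuffle values only among individuals that share a population label."""
    indices_by_pop = {}
    for i, pop in enumerate(populations):
        indices_by_pop.setdefault(pop, []).append(i)
    permuted = list(values)
    for indices in indices_by_pop.values():
        shuffled = [values[i] for i in indices]
        rng.shuffle(shuffled)
        for i, v in zip(indices, shuffled):
            permuted[i] = v
    return permuted

# Hypothetical pooled data from two populations with different mean expression.
populations = ["CEU", "CEU", "CEU", "YRI", "YRI", "YRI"]
expression = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
permuted_expression = permute_within_populations(
    expression, populations, random.Random(0)
)
```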

Multi-population analysis

[Figure 2A: proportion of single-population cis-associated genes detected in the multi-population analysis, by number of populations sharing the association in cis (single-population analysis)]

SGPP2

Trans- phase II HapMap association
Biological hypotheses: functional categories
Regulatory SNPs identified from cis- analysis (52%)
Non-synonymous SNPs (39%)
Splice site SNPs (7%)
miRNA SNPs (1%)
~25,000 SNPs per population x 14,072 genes
[Figure labels: DNA, REG, GENE; rSNPs, nsSNPs, spliceSNPs, miRNA SNPs; Genome-wide associations; Network analysis]

Trans- associations
10^-3 threshold
Correction at 0.001: 15 genes estimated false positives, FDR = 33%-39%
Correction at 0.01: 150 genes estimated false positives, FDR = 60%-75%
14,072 genes tested
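The false-positive estimates on this slide are consistent with simple arithmetic: the expected number of false positives is the per-gene threshold multiplied by the number of genes tested, and the FDR is that number divided by the count of significant genes. The sketch below assumes this is how the figures were derived. The count of 40 significant genes is a hypothetical input, since the slide does not state the observed counts.

```python
GENES_TESTED = 14_072  # from the slide

def expected_false_positives(threshold, n_tests=GENES_TESTED):
    """Expected number of genes passing the threshold by chance alone."""
    return threshold * n_tests

def estimated_fdr(threshold, n_significant, n_tests=GENES_TESTED):
    """Expected false positives as a fraction of the significant genes, capped at 1."""
    return min(1.0, expected_false_positives(threshold, n_tests) / n_significant)

for threshold in (0.001, 0.01):
    print(threshold, round(expected_false_positives(threshold), 1))

# Hypothetical: if 40 genes were significant at 0.001.
example_fdr = estimated_fdr(0.001, 40)
```

With 14,072 genes this gives about 14 and 141 expected false positives, close to the rounded 15 and 150 quoted on the slide.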

Enrichment of regulatory SNPs and deficit of nsSNPs in trans- associations
Population | regulatory SNPs (cis 0.001): ratio, p-value | nsSNPs: ratio, p-value | splice SNPs: ratio, p-value | miRNA SNPs: ratio
CEU | 6.05, 3.23E-24 | 0.15, 1.22E-21 | 0.49, 0.07 | 1
CHB | 3.69, 7.90E-10 | 0.24, 1.91E-09 | 0.76, 0.71 |
JPT | 3.15, 2.06E-07 | 0.31, 8.82E-07 | 0.55 |
3-6x more likely that a cis regulatory effect explains a trans regulatory effect

Multi-pop CNV analysis
Combined 4 populations: 193 genes at 0.001 (48 overlap with the 99 from single population analysis)
Combined 3 populations: 173 genes at 0.001 (42 overlap with the 99 from single population analysis)

CNV trans effects
[Figure labels: Copy number variation, Biological pathway, Variable expression]

Trans-position
[Figure label: Copy number variation]

Trans effects - CEU

Trans effects - YRI

Gene expression and natural selection
[Figure labels: -log p-value; TSS]
With Sridhar Kudaravalli and Jonathan Pritchard (unpublished)

Gene expression and natural selection With Sridhar Kudaravalli and Jonathan Pritchard (unpublished)

Co-segregating regulatory variants can drive differential isoform expression

SUMMARY
Cis- and trans- acting genetic variation influences mRNA levels.
CNV effects detected are largely not captured by SNPs.
Structural variation (copy number polymorphism) influences transcript level variation.
Many detected associations are shared across human populations: replication of effects.
Signal concentrated symmetrically within 100 kb of the promoter.
Trans-acting effects of CNVs: interpretation.
Primary effects of trans associations are largely cis regulatory effects.
Cis regulatory effects under positive selection.

Acknowledgements
Barbara Stranger, Alexandra Nica, Antigone Dimas, Christine Bird, Matthew Forrest, Catherine Ingle, Claude Beazley, Panos Deloukas, Matt Hurles
Cambridge University: Mark Dunning, Natalie Thorne, Simon Tavaré
Stanford: Daphne Koller
illumina: Jill Orwick, Mark Gibbs
Genome Structural Variation Consortium: Richard Redon, Nigel Carter, Charles Lee, Chris Tyler-Smith, Stephen Scherer
The HapMap Consortium
Wellcome Trust for funding