Published by Donald Snow. Modified over 9 years ago.
1
Causes of regulatory variation in the human genome
Manolis Dermitzakis The Wellcome Trust Sanger Institute Wellcome Trust Genome Campus Cambridge, UK
2
Human Genome: ~25,000 genes
1-1.5% of the human DNA is coding
Is the remaining 98.5% “junk”?
3
Gene expression as a phenotype
Altered patterns of gene expression are associated with disease, e.g., Type 1 diabetes, Burkitt’s lymphomas.
Widespread intraspecific variation.
Heritable genetic variation for transcript levels.
Familial aggregation of expression profiles (Cheung et al. 2003).
In humans, ~30% of surveyed loci exhibited a genetic component for expression differences (Monks et al. 2004; Schadt et al. 2003).
Much of the influential variation is located cis- to the coding locus.
In humans, mouse, and maize, 35%-50% of the genetic basis for intraspecific differences in transcription level is cis- to the coding locus (e.g. Morley et al. 2004; Schadt et al. 2003; Stranger et al. 2005; Cheung et al. 2005, etc.).

As an introduction, I’d like to give you a couple of quick facts about gene expression. In general, normal cell function and many aspects of development are highly dependent on having the right genes transcribed at the right time and place, and certainly some diseases are associated with altered patterns of gene expression. These can be extreme effects or subtle changes in expression. However, in many species there is also quite a lot of variation among individuals with respect to gene expression patterns. Much of this variation has a genetic component; for example, in humans, nearly 30% of surveyed loci exhibited a genetic component for expression differences. More studies are showing that much of the genetic component influencing expression is located cis- to the coding locus; for example, surveys of humans, mouse, and maize estimate that approximately 30-50% is attributable to cis-located variants.
Stranger and Dermitzakis 2006
4
Why study gene expression
Describe and dissect regulatory variation
Annotate regulatory elements in the human genome
Support disease studies to interpret statistical signals
Distribution of molecular effects in the genome
Natural selection
5
Outline
Gene expression variation – recent studies
Analysis of gene expression with HapMap phase II SNPs
Update on CNV-expression associations
Natural selection and cis regulatory effects
6
Nature of regulatory variation
[Figure labels: DNA, REG, GENE, Expression; i) Pre-mRNA, ii) mRNA, iii) Protein, iv) DNA]
Stranger and Dermitzakis, Human Genomics 2005
7
Effects of Copy Number Variation on gene expression
Copy number variation:
kb to Mb size
variable DNA copy number
contribute to disease
heritable
common in humans/recently appreciated
8
Gene expression association mapping
[Figure: quantitative phenotype plotted by genotype class (AA, AG, GG)]
Stranger et al. PLoS Genet 2005
9
Whole-genome gene expression
Illumina Human 6 x 2 gene GEX arrays
~48,000 transcripts: 24,000 RefSeq, 24,000 other transcripts
270 HapMap individuals:
CEU: 30 trios, 90 total
CHB: 45 unrelated
JPT: 45 unrelated
YRI: 30 trios, 90 total
Cell line RNA: 2 IVTs each person (IVT1, IVT2)
2 replicate hybridizations each IVT (rep1, rep2, rep3, rep4)
Quantile normalization of all replicates of each individual.
Median normalization across all individuals of a population.
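The two normalization steps named on this slide can be sketched roughly as follows. This is a minimal illustration and not the pipeline used in the study: the function names are hypothetical, ties are broken by sort order, and "median normalization" is assumed here to mean shifting every sample to a common median.

```python
from statistics import mean, median

def quantile_normalize(replicates):
    """Give every replicate (a list of probe intensities) the same distribution.

    The reference distribution is the mean of the k-th smallest value
    across replicates; each value is replaced by the reference value of its rank.
    """
    n_probes = len(replicates[0])
    sorted_reps = [sorted(rep) for rep in replicates]
    reference = [mean(rep[k] for rep in sorted_reps) for k in range(n_probes)]
    normalized = []
    for rep in replicates:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: rep[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized

def median_center(samples):
    """Shift each sample so all samples share one median (assumed meaning)."""
    medians = [median(s) for s in samples]
    target = median(medians)
    return [[v - m + target for v in s] for s, m in zip(samples, medians)]

# Hypothetical toy data: 2 replicates of 3 probes for one individual.
replicates = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized_replicates = quantile_normalize(replicates)
```

After quantile normalization the replicates differ only in which probe holds which rank, so averaging them per probe gives one profile per individual that can then be median-centered across the population.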
10
HapMap SNPs
60 CEU, 45 CHB, 44 JPT, 60 YRI
14,072 genes
Phase I HapMap; MAF > 0.05
CEU: 762,447 SNPs
CHB: 695,601
JPT: 689,295
YRI: 799,242
~1 SNP per 5 kb
The number of expression phenotypes does not correspond directly to the number of genes in these regions because there were 2 probes per gene.
11
Copy Number Variation dataset
Genome Structural Variation Consortium
Redon et al. Nature Nov 22, 2006
Array-CGH using a whole genome tile path array
26,563 clones
93.7% euchromatic genome
Median clone size ~170 kb
All 270 HapMap individuals
Quantitative values (log2 ratios) representing diploid genome copy number, not genotypes.
1117 CNVs called from log2 ratios
Calls based on standard deviation of log2 ratios
Many CNVs experimentally verified
12
Linear regression for SNPs, CNVs and expression
Clone signal (log2 ratio)
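A minimal sketch of the regression named on this slide, assuming an additive genotype coding (AA = 0, AG = 1, GG = 2) for SNPs and the raw clone log2 ratio for CNVs. The coding, the function names and the toy data are assumptions made for illustration, not details taken from the slides.

```python
from math import sqrt

def linear_regression(x, y):
    """Ordinary least squares of y on x; returns (slope, intercept, r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

def t_statistic(r, n):
    """t statistic for the slope (n - 2 degrees of freedom); requires |r| < 1."""
    return r * sqrt((n - 2) / (1 - r * r))

# Hypothetical toy data: one SNP coded 0/1/2 and log2 expression of one probe.
genotypes = [0, 0, 1, 1, 2, 2]
expression = [7.9, 8.1, 8.4, 8.6, 9.1, 8.9]
slope, intercept, r = linear_regression(genotypes, expression)
t = t_statistic(r, len(genotypes))
```

For a CNV, the same function would take the clone log2 ratios as `x` in place of the genotype codes.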
13
SNP cis-analysis: SNPs within 1Mb of probe midpoint
[Figure: 2 Mb window around the probe, showing the gene and SNPs]
14
CNV cis-analysis: clone midpoint within 2Mb of probe midpoint
[Figure: 4 Mb window around the probe, showing the gene and clones]
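The two cis windows can be expressed as a simple distance filter. The sketch below assumes that coordinates are base-pair positions on the same chromosome and that "within" includes the boundary; the tuple layout and all names are hypothetical.

```python
SNP_CIS_DISTANCE = 1_000_000  # SNPs within 1 Mb of the probe midpoint (2 Mb window)
CNV_CIS_DISTANCE = 2_000_000  # clone midpoint within 2 Mb of the probe midpoint (4 Mb window)

def midpoint(start, end):
    """Integer midpoint of a genomic interval."""
    return (start + end) // 2

def in_cis(position, probe_mid, max_distance):
    """True if the position lies within max_distance of the probe midpoint."""
    return abs(position - probe_mid) <= max_distance

def cis_snps(snps, chrom, probe_start, probe_end):
    """Names of SNPs, given as (name, chrom, position), in cis to the probe."""
    probe_mid = midpoint(probe_start, probe_end)
    return [name for name, snp_chrom, pos in snps
            if snp_chrom == chrom and in_cis(pos, probe_mid, SNP_CIS_DISTANCE)]

def clone_in_cis(clone_start, clone_end, probe_start, probe_end):
    """True if the clone midpoint is in cis to the probe (same chromosome assumed)."""
    return in_cis(midpoint(clone_start, clone_end),
                  midpoint(probe_start, probe_end),
                  CNV_CIS_DISTANCE)

# Hypothetical toy data.
snps = [("rs_a", "1", 1_500_000), ("rs_b", "1", 3_200_000), ("rs_c", "2", 1_500_000)]
selected = cis_snps(snps, "1", 2_000_000, 2_000_050)
```

The wider CNV window is consistent with the clone size reported earlier (median ~170 kb), since a clone midpoint localizes the variant less precisely than a SNP position.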
15
Permutation
GENOTYPES: g11 g12 g13 g14 … g1n through gi1 gi2 gi3 gi4 … gin
GENE EXPRESSION: Exp1 Exp2 Exp3 … Expi (permuted)
10,000 permutations; each time keep the lowest p-value
Null distribution of 10,000 extreme p-values
Compare observed p-values to the tails of the null
Doerge and Churchill 1996
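The permutation scheme above can be sketched as below. One substitution is made for brevity: instead of the lowest p-value, each permutation keeps the largest absolute correlation across SNPs, which gives the same ordering for simple linear regression at a fixed sample size. Function names, the default seed and the toy data are assumptions.

```python
import random
from math import sqrt

def abs_correlation(x, y):
    """Absolute Pearson correlation; 0.0 if either variable is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return abs(sxy) / sqrt(sxx * syy)

def permutation_threshold(genotypes, expression, n_permutations=10_000,
                          alpha=0.05, seed=0):
    """Significance threshold on |r| from the null of per-permutation extremes.

    genotypes: list of SNPs, each a list of per-individual codes.
    expression: per-individual values of one expression phenotype.
    """
    rng = random.Random(seed)
    shuffled = list(expression)
    null_extremes = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the genotype-expression pairing
        null_extremes.append(max(abs_correlation(snp, shuffled) for snp in genotypes))
    null_extremes.sort()
    index = min(int((1 - alpha) * n_permutations), n_permutations - 1)
    return null_extremes[index]

# Hypothetical toy data: 3 cis SNPs, 8 individuals, one probe.
genotypes = [[0, 1, 2, 1, 0, 2, 1, 0],
             [2, 2, 1, 0, 0, 1, 2, 1],
             [0, 0, 0, 1, 1, 1, 2, 2]]
expression = [7.9, 8.4, 9.0, 8.5, 8.1, 9.2, 8.3, 7.8]
threshold = permutation_threshold(genotypes, expression, n_permutations=1000)
observed = max(abs_correlation(snp, expression) for snp in genotypes)
```

Because whole expression vectors are shuffled while the genotype matrix stays intact, the correlation structure among SNPs is preserved in the null, so the threshold accounts for testing many linked SNPs per gene.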
16
CNV vs. SNP associations
Stranger et al. Science 2007
18
CNVs and SNPs mostly capture different effects
Relative impact on gene expression: 82% SNPs, 18% CNVs
Only 13% of genes with CNV association also had a SNP association in the same population
biased toward large effect size.
CNV and SNP variation are highly correlated (p-value 0.001).
19
Custom vs. Genome-wide [Stranger et al. PLoS Genet and Stranger et al. Science]
2 batches of 60 CEU individuals grown independently at two different labs
RNA extraction and labelling by different labs and people
Run on custom and genome-wide Illumina arrays
97% of associations at the 0.05 permutation threshold from the custom array analysis were also detected in the genome-wide analysis
20
HapMap phase II analysis
~4 million SNP genotypes made publicly available for the 270 HapMap individuals.
Density: 1 SNP per 700 bp
Includes ~50% of expected common SNPs in these populations.
2.2 million SNPs analyzed (MAF > 0.05)
21
Phase I vs. Phase II
cis- significant genes (0.001)
Population   Phase I HapMap   Both   Phase II HapMap   Both/Phase I   Both/Phase II
CEU          286              258    299               90%            86%
CHB          317              269    318               85%            85%
JPT          337              297    341               87%            87%
YRI          356              310    394               87%            79%
22
Phase I vs. Phase II
23
Population sharing of cis- associations
24
Associated SNP position relative to TSS
25
Distribution of regulatory elements around the TSS
ENCODE Nature 2007
26
Direction of allelic effect: same SNP-gene combination across populations
[Figure panels: AGREEMENT and OPPOSITE; axes: log2 expression]
27
Direction of allelic effect
28
Pooling populations
Spurious associations
[Figure labels: Pop1, Pop2]
29
Conditional permutations
Permute data within each population separately, then perform the test (x 4)
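A minimal sketch of a within-population shuffle, assuming each individual carries a population label. Restricting the shuffle to individuals of the same population keeps between-population differences in the null distribution, so they cannot masquerade as associations when the populations are pooled. The function name and toy data are hypothetical.

```python
import random

def permute_within_populations(values, populations, rng):
    """Shuffle values only among individuals that share a population label."""
    permuted = list(values)
    indices_by_pop = {}
    for index, pop in enumerate(populations):
        indices_by_pop.setdefault(pop, []).append(index)
    for indices in indices_by_pop.values():
        shuffled = [values[i] for i in indices]
        rng.shuffle(shuffled)
        for i, value in zip(indices, shuffled):
            permuted[i] = value
    return permuted

# Hypothetical toy data: expression values for two populations with different means.
expression = [8.0, 8.2, 8.1, 9.5, 9.7, 9.6]
populations = ["CEU", "CEU", "CEU", "YRI", "YRI", "YRI"]
rng = random.Random(0)
permuted_expression = permute_within_populations(expression, populations, rng)
```

Each permuted vector would then be passed to the same association test as the observed data, pooled over all populations.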
30
Multi-population analysis
31
[Figure 2A. y-axis: proportion of single-population cis-associated genes detected in multi-population analysis; x-axis: number of populations sharing the association in cis (single-population analysis)]
32
SGPP2
33
Trans- phase II HapMap association
Biological hypotheses: functional categories
Regulatory SNPs identified from cis- analysis (52%)
Non-synonymous SNPs (39%)
Splice site SNPs (7%)
miRNA SNPs (1%)
~25,000 SNPs per population x 14,072 genes
Genome-wide associations
Network analysis
[Figure labels: DNA, REG, GENE; rSNPs, nsSNPs, spliceSNPs, miRNA SNPs]
34
Trans- associations, 10^-3 threshold
14,072 genes tested
correction at genes estimated false positives FDR = 33%-39%
correction at genes estimated false positives FDR = 60%-75%
35
Enrichment of regulatory SNPs and deficit of nsSNPs in trans- associations
Ratio (p-value) by SNP class:
CEU: regulatory SNPs (cis 0.001) 6.05 (3.23E-24); nsSNPs 0.15 (1.22E-21); splice SNPs 0.49 (0.07); miRNA SNPs 1
CHB: regulatory SNPs (cis 0.001) 3.69 (7.90E-10); nsSNPs 0.24 (1.91E-09); splice SNPs 0.76 (0.71)
JPT: regulatory SNPs (cis 0.001) 3.15 (2.06E-07); nsSNPs 0.31 (8.82E-07); splice SNPs 0.55
3-6x more likely that a cis regulatory effect explains a trans regulatory effect
36
Multi-pop CNV analysis
Combined 4 populations: 193 genes at (48 overlap with the 99 from single population analysis)
Combined 3 populations: 173 genes at (42 overlap with the 99 from single population analysis)
37
CNV trans effects
[Figure labels: Copy number variation, Variable expression, Biological pathway]
38
Trans-position
[Figure label: Copy number variation]
39
Trans effects - CEU
40
Trans effects - YRI
41
Gene expression and natural selection
[Figure axes: -log p-value; position relative to the TSS]
With Sridhar Kudaravalli and Jonathan Pritchard (unpublished)
42
Gene expression and natural selection
With Sridhar Kudaravalli and Jonathan Pritchard (unpublished)
43
Co-segregating regulatory variants can drive differential isoform expression
44
SUMMARY
Cis- and trans- acting genetic variation influencing mRNA levels.
CNV effects detected are largely not captured by SNPs
Structural variation (copy number polymorphism) influences transcript level variation.
Many detected associations are shared across human populations – replication of effects
Signal concentrated within 100 kb from the promoter, symmetrically
Trans-acting effects of CNVs - interpretation
Primary effects of trans associations are largely cis regulatory effects
Cis regulatory effects under positive selection
45
Acknowledgements
Cambridge University: Mark Dunning, Natalie Thorne, Simon Tavaré
Barbara Stranger, Alexandra Nica, Antigone Dimas, Christine Bird, Matthew Forrest, Catherine Ingle, Claude Beazley, Panos Deloukas, Matt Hurles
Stanford: Daphne Koller
Illumina: Jill Orwick, Mark Gibbs
Genome Structural Variation Consortium: Richard Redon, Nigel Carter, Charles Lee, Chris Tyler-Smith, Stephen Scherer
The HapMap Consortium
Wellcome Trust for funding