The tangled genome Gil McVean. The real heroes.


1 The tangled genome Gil McVean

2

3 The real heroes

4 PanMap: Genome sequencing of 10 Western Chimpanzees. Patterns of small insertion and deletion are quite different and reveal details of DNA repair pathways. Patterns of recombination in humans and chimpanzees are highly diverged at the fine scale, but largely conserved at broad scales. There are a surprising number (6+ now 'confirmed') of trans-specific polymorphisms, probably maintained through host-pathogen interactions.

5 A tangle of sequence

6

7 Difficulties of working with an incomplete reference

8 Using de novo assembly to find variants

9 Entire population

10 Sample 1

11 Sample 2

12 Chromosome 1

13 Using Cortex leads to a high quality set of variants

14 Diversity in Western Chimpanzees. Similar diversity to humans of European origin (0.06%-0.08%). Excess of common variants. 1% of variants shared with humans.
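As a rough check on the "1% shared" figure, the SNP counts reported later in the talk (slide 42) can be turned into fractions. This is only one possible reading of how the 1% was derived; the variable names are illustrative and do not come from the talk.

```python
# Counts as reported on slide 42 (autosomal and X chromosome SNPs).
shared_autosomal, shared_x = 33_906, 527
chimp_autosomal, chimp_x = 3_800_000, 102_000   # PanMap, 10 individuals
human_autosomal, human_x = 9_400_000, 261_000   # 1000 Genomes Pilot 1 YRI

# Fraction of each species' SNPs that are also polymorphic in the other.
frac_chimp_auto = shared_autosomal / chimp_autosomal
frac_human_auto = shared_autosomal / human_autosomal
frac_chimp_x = shared_x / chimp_x
frac_human_x = shared_x / human_x

print(f"chimp autosomal SNPs shared with humans: {frac_chimp_auto:.2%}")
print(f"human autosomal SNPs shared with chimps: {frac_human_auto:.2%}")
print(f"chimp X SNPs shared with humans:         {frac_chimp_x:.2%}")
print(f"human X SNPs shared with chimps:         {frac_human_x:.2%}")
```

On these counts, a little under 1% of chimpanzee autosomal SNPs are shared, which is in line with the figure on the slide.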

15 Non-slippage indels are strongly biased to deletions: 13:1 bias toward deletions. Unexpected peak at 4 bp.

16 Indels as indicators of DNA repair processes [Figure: panels for insertions and deletions; axes: indel size (5 to 25) and longest word agreement (5 to 25)]

17 [Diagram: a 4 bp deletion. Before: TGACGAACTTAT / ACTGCTTGAATA. Intermediate fragments as shown: TGACGA, AC, AT, TGAATA; TGAC--AT. After: TGACTTAT / ACTGAATA.] Losing GAAC
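The sequences on this slide can be checked with a few lines of code. The sketch below only verifies that the two strands are complementary and that removing GAAC from the top strand gives the final sequence shown; it does not model the repair intermediates, which cannot be recovered from the transcript. The helper names are illustrative.

```python
# Complement table for DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def complement(seq):
    """Base-by-base complement (not reversed), as the strands are drawn on the slide."""
    return "".join(COMPLEMENT[base] for base in seq)

def delete_at(seq, start, length):
    """Remove `length` bases from `seq`, beginning at 0-based index `start`."""
    return seq[:start] + seq[start + length:]

top_before = "TGACGAACTTAT"
bottom_before = "ACTGCTTGAATA"

# Locate the first occurrence of the lost 4 bp word and delete it.
start = top_before.find("GAAC")
top_after = delete_at(top_before, start, 4)
bottom_after = complement(top_after)

print(top_after, bottom_after)
```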

18 A tangle of trees

19 Myers et al. 2005

20 The zinc-finger protein PRDM9 determines hotspot location (Myers et al. 2010)

21 PRDM9. Zinc fingers are radically different between humans and chimps. Perhaps the most diverged gene between humans and chimpanzees. Repeatedly hit by adaptive evolution across mammals. Only known 'speciation gene' in mammals. Polymorphic in humans, which leads to variation in hotspots and genome instability.

22 Questions. We know from previous work in a few regions that hotspot locations tend not to be shared between humans and chimpanzees. Calculations suggested that only 40% of human hotspots were driven by PRDM9 binding. But: – Is there any hotspot sharing? – Do we see conservation of recombination rates at any scale? – What features determine hotspot location in chimpanzees?

23 The first genome-wide fine-scale map of recombination for a non-reference organism (Auton et al. 2012)

24

25 Chimpanzee recombination is dominated by hotspots in a manner similar to humans

26 But the hotspots are not in the same locations

27 Fine-scale profiles around genes are similar

28 As is rate variation around CpG islands

29 Substantial PRDM9 diversity, but overlap in predicted binding sequences

30 No signal for predicted binding sequences

31 Similarities at 1Mb scale

32 Human and chimp recombination rates are correlated at the chromosomal scale

33 Human and chimp recombination rates are only correlated at broad scales
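One way to see how rates can disagree at fine scales yet agree at broad scales is to correlate two rate profiles after averaging them into progressively larger bins. The sketch below is a toy illustration under that assumption: the two rate vectors are invented so that hotspots sit in different fine-scale bins but the regional totals match. It is not the analysis or the data behind these slides.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rebin(rates, k):
    """Average every k consecutive fine-scale bins; an incomplete tail is dropped."""
    return [sum(rates[i:i + k]) / k for i in range(0, len(rates) - k + 1, k)]

# Hypothetical fine-scale rates: hotspots never coincide between the two profiles,
# but the first half of the region is "hotter" than the second in both.
human = [4, 0, 0, 4, 1, 0, 0, 1]
chimp = [0, 4, 4, 0, 0, 1, 1, 0]

r_fine = pearson(human, chimp)                         # bin-by-bin comparison
r_broad = pearson(rebin(human, 2), rebin(chimp, 2))    # after merging adjacent bins

print(f"fine scale:  r = {r_fine:.2f}")
print(f"broad scale: r = {r_broad:.2f}")
```

With these toy inputs the fine-scale correlation is negative while the rebinned profiles are perfectly correlated, which mirrors the qualitative pattern described on the slides.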

34 Lower correlation in structural rearrangements. All, bar one, of the inverted regions are pericentric, so change in position with respect to the centromere does not contribute. Change in proximity to the telomere is important.

35 A natural experiment: chromosomal fusion [Diagram: chimp chromosomes 2a and 2b, C.A. chromosomes 2a and 2b, human chromosome 2; axis: t]

36 Fusion region shows 3-fold decrease in recombination rate

37

38 A tangle of histories

39 Distribution of the sickle allele; distribution of malaria

40

41 How many variants are shared through descent?

42 SNPs shared by humans and chimpanzees (33,906 autosomal and 527 X chromosome). Human polymorphism: 9.4 million autosomal and 261,000 X chromosome SNPs from 1000 Genomes Pilot 1 YRI (59 individuals). Chimpanzee polymorphism: 3.8 million autosomal and 102,000 X chromosome SNPs from PanMap Pan troglodytes verus (10 individuals). Human-chimpanzee shared haplotypes: at least two shared SNPs in 4 kb with the same LD (reduces recurrent mutation). Human-chimpanzee shared coding SNPs: identify potentially functional coding variants; reduce artifactual sharing due to known or cryptic paralogs by filtering out SNPs with low 50 bp mappability, with high read depth, or not found in 1000 Genomes Phase 1. Results: 130 regions with shared haplotypes outside the MHC; 135 shared non-synonymous SNPs, 1 shared premature stop SNP and 200 shared synonymous SNPs outside the MHC; 7 resequenced using Sanger sequencing; 8 with more than two pairs in LD.
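The shared-haplotype criterion (at least two shared SNPs in 4 kb with the same LD) can be sketched as a two-step filter. The code below is an assumption-laden illustration, not the pipeline used in the study: it takes "same LD" to mean that the LD coefficient D between the two SNPs has the same sign in both species, with alleles coded consistently across species, and all function names and toy inputs are hypothetical.

```python
def candidate_pairs(positions, window=4000):
    """Pairs of shared-SNP positions (bp, one chromosome) at most `window` bp apart."""
    pos = sorted(positions)
    pairs = []
    for i, p in enumerate(pos):
        for q in pos[i + 1:]:
            if q - p > window:
                break  # positions are sorted, so later SNPs are even farther away
            pairs.append((p, q))
    return pairs

def ld_d(haplotypes):
    """LD coefficient D = P(1,1) - P(1,.) * P(.,1) for (a, b) haplotypes coded 0/1."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    return p_ab - p_a * p_b

def same_ld_direction(human_haps, chimp_haps):
    """True when D has the same non-zero sign in both samples (assumed meaning of 'same LD')."""
    return ld_d(human_haps) * ld_d(chimp_haps) > 0

# Toy inputs, invented for illustration only.
toy_positions = [1_000, 3_500, 9_000, 20_000, 23_900]
toy_human = [(1, 1), (1, 1), (0, 0), (0, 0)]
toy_chimp = [(1, 1), (0, 0), (0, 0), (1, 0)]
toy_chimp_flipped = [(1, 0), (0, 1), (1, 0), (0, 1)]

print(candidate_pairs(toy_positions))
print(same_ld_direction(toy_human, toy_chimp))
print(same_ld_direction(toy_human, toy_chimp_flipped))
```

Requiring the same allelic association in both species is what makes independent recurrent mutation a less likely explanation than shared ancestry, since recurrent mutation would have to recreate both variants on matching backgrounds.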

43 Outside of the MHC, six clear-cut cases of trans-species polymorphisms. All non-coding and putatively regulatory. FREM3/GYPE, MTRR, IGFBP7

44 In intron of IGFBP7 [Figure: genome-browser view with 4 kb and 20 kb scale bars. Tracks: IGFBP7 gene structure; human-chimpanzee shared SNPs; average pairwise differences; primate phastCons score; TFBS conserved in human/mouse/rat (RelA, CUTL1); TFBS identified by ChIP-seq (SRF, Bach1, STAT3, GATA-2, ISGF-3); chromatin state segmentation by HMM (weak enhancer, strong enhancer); DNaseI hypersensitive sites; open chromatin by FAIRE. Annotations: regulatory region in HUVEC; regulatory region in NHEK and HMEC]

45 In total, 130 regions with shared human-chimpanzee haplotypes. Six clear-cut cases of ancient balanced polymorphisms. None are protein-coding. Eleven occur in non-coding genes (e.g., 7 in lincRNAs). Eleven compelling cases of regulatory regions. What do these regions have in common?

46 SNPs shared by humans and chimpanzees. Shared haplotypes: closest gene within 20 kb of a human-chimp shared haplotype (n=26, p=2×10⁻⁵, FDR=0.03). Shared coding SNPs: genes with a human-chimp shared coding SNP (n=99, p=0.017, FDR=0.20). Enrichment of membrane glycoproteins -> host-pathogen interactions. Glycoproteins

47 Project Participants. University of Oxford: Adam Auton, Rory Bowden, Peter Humburg, Zam Iqbal, Gerton Lunter, Julian Maller, Simon Myers, Susanne Pfeifer, Isaac Turner, Oliver Venn, Peter Donnelly (PI), Gil McVean (PI). Biomedical Primate Research Centre: Ronald Bontrop. University of Chicago: Adi Fledel-Alon, Ryan Hernandez (UCSF), Ellen Leffler, Cord Melton, Laure Segurel, Molly Przeworski (PI). Funders: Howard Hughes Medical Institute, National Institutes of Health, Royal Society, Wellcome Trust.

48 Where next?

49 Remarkable structural and sequence diversity in chimp PRDM9

50 Variation greater than in human populations

51 Little correlation in fine-scale structure around DNA repeat elements

52 No activating motif discovered in chimp [motif shown on slide: CCTCCCT]

