Published by Christiana Matilda Lester. Modified over 8 years ago.
1
PanMap: Mapping Genomic Variation in Western Chimpanzees
Susanne Pfeifer
SMBE, Walter Fitch Student Award, 7 July 2010
I would like to thank the organising committee for giving me the opportunity to talk to you about an exciting collaborative project looking at diversity in Western chimpanzees, which are one of three or four chimpanzee subspecies.
2
Project Participants
University of Oxford: Adam Auton, Rory Bowden, Peter Humburg, Zam Iqbal, Gerton Lunter, Julian Maller, Simon Myers, Susanne Pfeifer, Oliver Venn, Peter Donnelly (PI), Gil McVean (PI)
Biomedical Primate Research Centre: Ronald Bontrop
University of Chicago: Adi Fledel-Alon, Ryan Hernandez (now UCSF), Ellen Leffler, Cord Melton, Laure Segurel, Molly Przeworski (PI)
Funders: Howard Hughes Medical Institute, National Institutes of Health, Royal Society, Wellcome Trust
3
Why are we interested in Chimpanzee Diversity?
Our closest living evolutionary relative
Role of selection in shaping diversity
Chimp-specific adaptations
Effects of chromosomal changes on evolution
Structural variation
Mutation mechanisms
Recombination
I will try to explain why we are interested in this and, more importantly, why you might be interested in it, at least for the next few minutes. We have two reasons for studying variation and diversity in chimpanzees. First, chimps are a natural population that is in some sense the closest to humans, so we may learn things that matter from a population genetics point of view. Population genetics is a fundamentally interesting endeavour: it is an attempt to understand the relative importance of the forces shaping diversity in natural populations. That matters when we try to understand and interpret diversity data itself, of which there is more and more around, particularly for humans. It also gives us insight into fundamental evolutionary processes, in particular selection and recombination, and how they might be working. The second reason is that direct comparisons of genomic patterns of diversity between chimps and humans might, so we hope, shed light on aspects of human evolution and potentially on the function of human genes.
4
PanMap Project
10 Western Chimpanzees (Pan troglodytes verus)
Sequenced on an Illumina GAII
8-10X coverage
50 bp paired-end sequencing
Aligned to PanTro2 reference using Stampy
Data to be made freely available
In the PanMap project, we resequenced 10 Western chimpanzees at 8-10X coverage using 50 bp paired-end Illumina sequencing. The sequences were aligned to the PanTro2 reference genome, and in the next few minutes I will describe a number of analyses that followed from that. Before I do, I just want to mention briefly that we intend to make all the data freely available. There is a tradeoff between coverage and sample size: we selected 10 individuals at 10X coverage because this gives accurate genotype calls on enough samples to obtain useful LD information, from which we can learn about recombination rates and recombination hotspots.
5
Quality of the Reference Chromosome 21 – bin size: 100kb
6
Experimental Status
Chimpanzee  Coverage
Annaclara  9.60
Frits  9.52
Gina  9.96
Lady  8.48
Liesbeth  9.37
Pearl  8.87
Regina  10.88
Renee  5.37
Susie  9.76
Yvonne  9.28
How far have we got? We have finished sequencing 9 of the 10 chimpanzees; only one of them, Renee, is lagging a bit behind. The analyses I will talk about therefore focus on the other 9 chimps.
7
SNP Calling
Genome Analysis Toolkit (GATK)
Chr  #dbSNP  #SNPs  % dbSNP SNPs in call set  Ts/Tv
1  104,212  517,068  0.57  2.07
2a  54,729  260,182  0.56  1.97
2b  64,006  298,290  2.01
3  103,478  476,177  1.96
4  96,353  468,879  1.92
5  90,328  423,112
6  87,834  416,916  2.03
7  76,140  394,687  2.02
8  73,558  352,525  1.94
9  56,640  277,826
10  64,726  309,777
11
12
13
Based on the sequence data, we have called SNPs genome-wide using the GATK. As you can see, we found a lot of them: considerably more than there are in the chimp version of dbSNP, and in particular more than in the subset specifically ascertained in Western chimpanzees. About 80% of the SNPs that we find are not in the chimp dbSNP. On the other hand, we do find most of the SNPs that are in dbSNP, although not all of them. We do not fully understand why some are missing. It might be due to problems with our data (we do not think that is the case, and I will say something encouraging about that in a minute), to false positives in dbSNP, or to demographic effects in Western chimpanzees.
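The two summary statistics in the table can be sketched as follows. This is a minimal illustration, not the GATK pipeline used in the project. It assumes that the #dbSNP column counts called SNPs already present in dbSNP (an interpretation, not stated on the slide), and the function names and toy SNP list are hypothetical.

```python
# Illustrative sketch only; not the project's GATK-based calling pipeline.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """A transition swaps a purine for a purine or a pyrimidine for a pyrimidine."""
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES


def ts_tv_ratio(snps):
    """Ratio of transitions to transversions over (ref, alt) pairs with ref != alt."""
    ts = sum(1 for ref, alt in snps if is_transition(ref, alt))
    tv = len(snps) - ts
    return ts / tv


def novel_fraction(n_in_dbsnp, n_called):
    """Fraction of called SNPs that are not already in dbSNP."""
    return 1 - n_in_dbsnp / n_called


# Toy SNPs (made up): four transitions and two transversions.
toy_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
print(ts_tv_ratio(toy_snps))

# Chromosome 1 counts from the table, under the assumption stated above.
print(round(novel_fraction(104212, 517068), 3))
```

Under that reading of the table, the chromosome 1 counts give a novel fraction of roughly 0.8, in line with the "about 80%" quoted in the notes.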
8
Frequency Spectra
dbSNP vs novel SNPs
Just to give you an idea: on the left, we see the frequency spectrum for the chimp dbSNP SNPs, coloured in red. You can see that it is relatively uniform, which is exactly what we would expect for SNPs ascertained in a sample of only two chromosomes, in other words a single individual. This is not really surprising given that much of the chimpanzee genome project was based on one individual chimp called Clint. In contrast, resequencing a larger sample of individuals (shown here in blue) allows us to find more of the low-frequency variants.
Thanks to Adam Auton
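A frequency spectrum of the kind plotted here can be tabulated from genotype calls roughly as follows. This is a minimal sketch assuming diploid genotypes coded as non-reference allele counts (0, 1 or 2) per individual; the function name and the toy genotypes are hypothetical and are not project data.

```python
from collections import Counter


def site_frequency_spectrum(genotypes_per_site, n_chromosomes):
    """Count segregating sites by the number of non-reference alleles they carry.

    genotypes_per_site: one list per site holding each individual's
    non-reference allele count (0, 1 or 2).
    Returns a list whose entry k-1 is the number of sites with k
    non-reference alleles, for k = 1 .. n_chromosomes - 1.
    """
    counts = Counter(sum(site) for site in genotypes_per_site)
    return [counts.get(k, 0) for k in range(1, n_chromosomes)]


# Toy example: 3 diploid individuals, so 6 chromosomes.
toy_sites = [
    [0, 1, 0],  # 1 non-reference allele (singleton)
    [1, 1, 0],  # 2
    [2, 2, 1],  # 5
    [0, 0, 1],  # 1
    [1, 0, 0],  # 1
]
print(site_frequency_spectrum(toy_sites, n_chromosomes=6))
```

With 9 diploid individuals, as in the talk, the spectrum would have 17 classes; a sample ascertained in a single individual cannot favour rare variants in the same way, which is the contrast the slide draws.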
9
Frequency Spectra
Chimpanzee vs Human
Here we compare the chimp frequency spectrum with one obtained from the 9 CEU and 9 YRI individuals in the 1000 Genomes Project who had the highest coverage. Their coverage typically lies around 5, 6 or 7X, which is not exactly the same as ours but is a reasonable comparison for our project. We can see that we detect fewer rare variants in chimps, which is consistent with the lower effective population size of chimpanzees.
Thanks to Adam Auton
10
Quality of the Data
Comparison of sequencing data to genotype data (Myers et al 2009): 81 SNPs from chromosomes 2a and 2b are segregating in the sample and were discovered in sequencing with high-quality genotype calls
Chimpanzee  Coverage  Concordance
Annaclara  9.60  0.975
Frits  9.52  0.963
Gina  9.96  1.000
Lady  8.48  0.988
Liesbeth  9.37
Pearl  8.87
Regina  10.88
Susie  9.76
Yvonne  9.28
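Concordance here is the fraction of compared genotypes on which the sequencing calls and the independent genotype data agree. A minimal sketch follows; the function name is hypothetical, and it assumes all 81 SNPs were compared in each individual, which the slide does not state.

```python
def concordance(seq_calls, array_calls):
    """Fraction of sites where two sets of genotype calls agree.

    Sites where either call is missing (None) are skipped.
    """
    compared = [
        (s, a) for s, a in zip(seq_calls, array_calls)
        if s is not None and a is not None
    ]
    matches = sum(1 for s, a in compared if s == a)
    return matches / len(compared)


# Toy calls (made up): 4 comparable sites, 3 of which agree.
print(concordance(["AA", "AG", "GG", None, "AG"], ["AA", "AG", "AG", "GG", "AG"]))

# Under the 81-SNP assumption, the listed values are consistent with
# 79, 78, 81 and 80 matching genotypes respectively.
for n_match in (79, 78, 81, 80):
    print(n_match, round(n_match / 81, 3))
```

Read this way, each individual with a listed value differs from the genotype data at no more than three of the 81 sites.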
11
Diversity
Chromosome 21
I would now like to talk about two broad features of the data. These are based on preliminary analyses, as we have only had the data for a few weeks. First, I want to talk about large-scale patterns of diversity, in this case at chromosomal scales. The picture shows the SNP density, which is just the number of SNPs as you move along the chromosome, along chromosome 21 for CEU in blue and for chimp in red. What you notice is that patterns of diversity in humans and chimps broadly track each other along the chromosome, with rather more SNPs in the chimp data. We discover more SNPs in chimps as a consequence of the slightly higher coverage in chimps compared to the 9 selected CEU individuals, as I mentioned earlier.
Thanks to Adam Auton
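SNP density as described in the notes is a count of SNPs per fixed-size window. A minimal sketch, assuming 0-based SNP positions and non-overlapping bins; the function name, bin size and positions are illustrative and not taken from the project.

```python
def snp_density(positions, bin_size, chrom_length):
    """Count SNPs in consecutive non-overlapping bins along a chromosome.

    positions: 0-based SNP coordinates, each smaller than chrom_length.
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in positions:
        counts[pos // bin_size] += 1
    return counts


# Toy example: four SNPs on a 300 kb region with 100 kb bins.
toy_positions = [5, 99_999, 100_000, 250_000]
print(snp_density(toy_positions, bin_size=100_000, chrom_length=300_000))
```

Plotting such counts for two species on a shared coordinate system gives the kind of comparison shown on the slide.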
12
Diversity
Chromosome 21 – bin size: 100 kb
SNP density is one way of looking at diversity. Another is a quantity called diversity, which is just the average pairwise difference that you obtain by counting the differences between every pair of chromosomes. It is thus a way of summarising polymorphism within a population. In the plot, you see three lines: black for diversity in chimps, red for CEU and green for YRI. YRI show more diversity than CEU, as you would expect: higher diversity is a well-known feature of African populations, and CEU diversity was reduced by the bottleneck when our ancestors left Africa. The chimp patterns broadly track those of both human populations, as we saw with the SNP density. One question is what causes these large-scale variations across the chromosome, and the most obvious answer is changes in mutation rate. A natural way of assessing that is not only to compare diversity but to correct for an estimate of mutation rate, most naturally obtained by comparing chimp and human sequences and looking for fixed differences.
Thanks to Cord Melton
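The average pairwise difference described in the notes can be sketched as below. This assumes phased, equal-length haplotype strings, which is a simplification: the project data are unphased genotype calls, and the function name and toy sequences are hypothetical.

```python
from itertools import combinations


def pairwise_diversity(haplotypes):
    """Average number of differences per site over all pairs of haplotypes."""
    n_sites = len(haplotypes[0])
    total_diffs = 0
    n_pairs = 0
    for a, b in combinations(haplotypes, 2):
        total_diffs += sum(1 for x, y in zip(a, b) if x != y)
        n_pairs += 1
    return total_diffs / n_pairs / n_sites


# Toy example: 4 haplotypes of 8 sites, with two singleton variants.
toy_haplotypes = ["ACGTACGT", "ACGTACGA", "ACCTACGT", "ACGTACGT"]
print(pairwise_diversity(toy_haplotypes))
```

In the toy example the six pairs differ at 6 sites in total, so the average is one difference per pair, or 0.125 per site over 8 sites.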
13
Diversity vs Divergence
Chromosome 21 – bin size: 100kb
This plot shows diversity (within-population polymorphism levels) on the y-axis and divergence (between-species variation, a surrogate for mutation rate) on the x-axis. What you see is that there is more polymorphism in regions with higher mutation rates. However, the relationship is not deterministic: when you fit lines, you see a lot of variation around them. The YRI have the highest slope, which is consistent with having a larger effective population size and more divergence. The chimp line lies between YRI and CEU. So, changes in mutation rate along the chromosome are one reason for the differences in diversity that we see in humans and chimpanzees.
Thanks to Cord Melton
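Fitting a line of diversity against divergence across bins can be sketched with ordinary least squares. The notes do not say which fitting method was used, so this is an assumption; the function name and the toy per-bin values are made up for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy per-bin values (not project data): diversity is exactly 10% of divergence.
toy_divergence = [0.010, 0.012, 0.014, 0.016]
toy_diversity = [0.0010, 0.0012, 0.0014, 0.0016]
slope, intercept = fit_line(toy_divergence, toy_diversity)
print(slope, intercept)
```

Comparing the fitted slopes between populations is what the notes refer to when ranking YRI, chimp and CEU.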
14
Diversity vs Recombination Rate
Chromosome 21 – bin size: 100kb
Another, less obvious, reason is recombination rate. Recombination rates are important for at least two reasons. One is that recombination is itself a mutagenic process, which introduces mutations through biased gene conversion, as we heard on Monday. The other is that selection has indirect effects which depend on recombination: because of selective sweeps and background selection, we expect less diversity in regions of low recombination. In regions with low recombination, background selection is slightly more important, which explains the dip here (at around 0.0) that we have seen before in human populations.
Thanks to Cord Melton
15
Recombination
Chromosome 21
As you can see, in both humans and chimpanzees, 80% of the recombination happens in small regions, or hotspots, that make up about 10% of the sequence.
Thanks to Adam Auton and Oliver Venn
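The "80% of recombination in 10% of the sequence" statement is a concentration measure that could be computed from a binned rate map as follows. This is a minimal sketch assuming equal-sized bins; the function name and the toy rates are hypothetical.

```python
def fraction_in_hottest(rates, seq_fraction):
    """Share of total recombination found in the hottest bins.

    rates: recombination rate per equal-sized bin.
    seq_fraction: fraction of bins (and so of sequence) to treat as hottest.
    """
    ranked = sorted(rates, reverse=True)
    n_hot = round(len(ranked) * seq_fraction)
    return sum(ranked[:n_hot]) / sum(ranked)


# Toy map (made up): one hot bin and nine cold bins.
toy_rates = [36.0] + [1.0] * 9
print(fraction_in_hottest(toy_rates, seq_fraction=0.1))
```

Evaluating this over a range of sequence fractions gives a cumulative curve, and curves for the two species can then be compared directly.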
16
Fine-scale Recombination Landscape
Chromosome 21
If you look at the fine-scale recombination rate in humans and chimpanzees, there is extreme variation, with lots of sharp peaks. In fact, the fine-scale landscapes of chimpanzee and human look rather different. This might be because humans and chimpanzees use different hotspots, and because a motif that causes hotspots in humans is not active in chimps.
Thanks to Adam Auton and Oliver Venn
17
Chimpanzee rates relative to human hotspots Chromosome 21
18
Recombination Rates around Genes
Chromosome 2/2a/2b
Panels: Human, Chimp. Human lines: HapMap, CEU, YRI
The first thing I want to mention is that recombination rates change around genes. In the pictures, the x-axis shows, for each position on the chromosome, whether you are in a gene and, if not, how far you are from the nearest one. For all positions that are a given distance away, you average the local recombination rates. For humans, there are three lines: black for HapMap, green for YRI and red for CEU. The lines are quite similar, suggesting that we can obtain reasonable estimates of recombination rates from only 9 individuals. What we see is that recombination rates are low within genes and increase as you move away from genes, until they peak some 10, 20 or 30 kb away. In chimps, the recombination rate is also much lower within genes. If you look downstream of a gene, you see the same broad picture as in humans: recombination rates increase and peak some distance away from the gene. Upstream, however, based on chimp chromosomes 2a and 2b, which are jointly homologous to human chromosome 2, you see a rather different picture: whilst the human peak is about 20 kb upstream, the peak in chimps is within a kb or so immediately upstream of a gene, right where the promoters are. At the moment we are a bit cautious about this result, and we need to investigate the whole genome first, but if it turns out to be real, that would be a huge change.
Thanks to Simon Myers
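The averaging described in the notes, grouping positions by distance to the nearest gene and taking the mean local rate, can be sketched as follows. Unlike the slide, this simplified version does not distinguish upstream from downstream, since that needs gene strand; the function names, coordinates and rates are hypothetical.

```python
from collections import defaultdict


def distance_to_nearest_gene(pos, genes):
    """Return 0 inside a gene, otherwise the distance to the closest gene boundary.

    genes: list of (start, end) intervals, inclusive.
    """
    best = None
    for start, end in genes:
        if start <= pos <= end:
            return 0
        dist = start - pos if pos < start else pos - end
        best = dist if best is None else min(best, dist)
    return best


def mean_rate_by_distance(positions, rates, genes, bin_size):
    """Average local recombination rate in bins of distance to the nearest gene."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pos, rate in zip(positions, rates):
        dist_bin = distance_to_nearest_gene(pos, genes) // bin_size
        sums[dist_bin] += rate
        counts[dist_bin] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


# Toy example (made up): one gene, with rates rising away from it.
toy_genes = [(100, 200)]
toy_positions = [150, 90, 210, 50, 250]
toy_rates = [0.5, 1.0, 2.0, 3.0, 5.0]
print(mean_rate_by_distance(toy_positions, toy_rates, toy_genes, bin_size=10))
```

To reproduce the upstream versus downstream contrast discussed in the notes, the distance would need a sign determined by each gene's orientation.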
19
Chromosomal Fusion
Chimp Chromosome 2a
Chimp Chromosome 2b
Human Chromosome 2
20
Genetic map
Chromosome 2/2a/2b - bin size: 2.5Mb
As chromosomes 2a and 2b have fused in humans, they can be used to assess how chromosomal position affects recombination rates. Human chromosome 2 has a centromere, but it also has a position that used to be a centromere when it was a separate chromosome, as it still is in chimps. So, we can ask how this large-scale organisation affects things. This plot shows the genetic map around the fusion point. What you see is that local to the fusion point there is quite a change, and in fact it is exactly where the two chimp centromeres were.
Thanks to Adam Auton and Oliver Venn
21
Recombination Landscape
Chromosome 2/2a/2b – bin size: 2.5Mb
Here is the same picture in terms of recombination rates. You see that the rates are high on the short arms and increase towards the telomeres, as we would expect. What has not been well understood so far is whether this effect is due to sequence composition or to some kind of chromosomal organisation. However, if we look at humans, where the two short arms were fused and are now part of the long arm, we see that recombination rates are much lower. This makes clear that it is not only the sequence itself that determines the increase in rates at the ends of the arms. The change in recombination rates is local: if you look away from where the chimp centromeres are, the rates are broadly similar, apart from the regions just outside the centromeres.
Thanks to Adam Auton and Oliver Venn
22
Thank you!