PanMap Mapping Genomic Variation in Western Chimpanzees

PanMap Mapping Genomic Variation in Western Chimpanzees
02.03.10
Susanne Pfeifer
SMBE, Walter Fitch Student Award, 07 July 2010
I would like to thank the organising committee for giving me the opportunity to talk to you about an exciting collaborative project looking at diversity in Western chimpanzees, which are one of 3 or 4 chimpanzee subspecies.

Project Participants
University of Oxford: Adam Auton, Rory Bowden, Peter Humburg, Zam Iqbal, Gerton Lunter, Julian Maller, Simon Myers, Susanne Pfeifer, Oliver Venn, Peter Donnelly (PI), Gil McVean (PI)
Biomedical Primate Research Centre: Ronald Bontrop
University of Chicago: Adi Fledel-Alon, Ryan Hernandez (now UCSF), Ellen Leffler, Cord Melton, Laure Segurel, Molly Przeworski (PI)
Funders: Howard Hughes Medical Institute, National Institutes of Health, Royal Society, Wellcome Trust

Why are we interested in Chimpanzee Diversity?
Our closest living evolutionary relative; role of selection in shaping diversity; chimp-specific adaptations; effects of chromosomal changes on evolution; structural variation; mutation mechanisms; recombination
I will try to explain why we are interested in this and, more importantly, why you might be interested in it, at least for the next 10-15 minutes. Our reasons for studying variation and diversity in chimpanzees are twofold. First, chimps are a natural population that is in some sense closest to humans, so we may learn things that are important from a population genetics point of view, and population genetics is a fundamentally interesting endeavour. It is an attempt to understand the relative importance of the forces shaping diversity in natural populations. That matters when we try to understand and interpret diversity data itself, of which there is more and more around, particularly for humans. It is also important in giving us insights into fundamental evolutionary processes, in particular selection and recombination, and how they might be working. The second reason is that direct comparisons of genomic patterns of diversity between chimps and humans might, so we hope, shed light on aspects of human evolution and potentially on the function of human genes.

PanMap Project
10 Western Chimpanzees (Pan troglodytes verus); sequenced on an Illumina GAII; 8-10X coverage; 50 bp paired-end sequencing; aligned to PanTro2 reference using Stampy; data to be made freely available
In the PanMap project, we resequenced 10 Western chimpanzees at 8-10X coverage using 50 bp paired-end Illumina sequencing. The sequences were aligned to the PanTro2 reference genome and, in the next few minutes, I will describe a number of analyses that followed from that. But before this, I just want to briefly mention that it is our intention to make all the data freely available. There is a tradeoff between coverage and sample size: we selected 10 individuals at 10X coverage because this gives accurate genotype calls on enough samples to get useful LD information, from which we can learn about recombination rates and recombination hotspots.

Quality of the Reference
Chromosome 21 – bin size: 100kb

Experimental Status
Chimpanzee   Coverage
Annaclara    9.60
Frits        9.52
Gina         9.96
Lady         8.48
Liesbeth     9.37
Pearl        8.87
Regina       10.88
Renee        5.37
Susie        9.76
Yvonne       9.28
How far have we got? We have finished sequencing 9 of the 10 chimpanzees and only one of them, Renee, is lagging a bit behind. Thus, the analyses I will talk about focus on the other 9 chimps.

SNP Calling
Genome Analysis Toolkit (GATK)
Chr   #dbSNP    #SNPs     % dbSNP SNPs in call set   Ts/Tv
1     104,212   517,068   0.57                       2.07
2a    54,729    260,182   0.56                       1.97
2b    64,006    298,290                              2.01
3     103,478   476,177                              1.96
4     96,353    468,879                              1.92
5     90,328    423,112
6     87,834    416,916                              2.03
7     76,140    394,687                              2.02
8     73,558    352,525                              1.94
9     56,640    277,826
10    64,726    309,777
11
12
13
Based on the sequence data, we have called SNPs genome-wide using the GATK. As you can see, we found a lot of them, rather more than there are in the chimp version of dbSNP and in particular in the part of it specifically ascertained in Western chimps. About 80% of the SNPs that we find are not in the chimp dbSNP. On the other hand, we do find most of the SNPs that are in dbSNP, although not all of them. We do not fully understand why some are missing. It might be due to problems with our data (we don't think that is the case, and I will say something encouraging about that in a minute), to false positives in dbSNP, or to demographic effects in Western chimpanzees.
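
The Ts/Tv column is the ratio of transitions (A<->G, C<->T) to transversions among the called SNPs. As a rough illustration of how such a ratio could be computed, here is a minimal Python sketch. The function names and the toy input are hypothetical and are not part of GATK or of the PanMap pipeline.

```python
# Hypothetical helpers for illustration only (not GATK code).
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """Return True if ref -> alt stays within purines or within pyrimidines.

    Assumes ref and alt are different single bases from A, C, G, T.
    """
    pair = {ref.upper(), alt.upper()}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snps):
    """Compute the transition/transversion ratio for (ref, alt) pairs."""
    ts = tv = 0
    for ref, alt in snps:
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    # With no transversions the ratio is undefined; report infinity.
    return ts / tv if tv else float("inf")


# Toy call set: two transitions and one transversion.
toy_snps = [("A", "G"), ("C", "T"), ("A", "C")]
toy_ratio = ts_tv_ratio(toy_snps)
```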

Frequency Spectra
dbSNP vs novel SNPs
Just to give you an idea: on the left side, we see the frequency spectrum for the chimp dbSNP SNPs, coloured in red. You can see that it is relatively uniform, which is exactly what we would expect for SNPs ascertained in a sample of only two chromosomes, in other words only one individual. This is not really surprising, given that much of the chimpanzee genome project was based on one individual chimp called Clint. In contrast, resequencing a larger sample of individuals (shown here in blue) allows us to find more of the low-frequency variants. Thanks to Adam Auton

Frequency Spectra
Chimpanzee vs Human
Here we compare the chimp frequency spectrum with one obtained from the 9 CEU and YRI individuals from the 1000 Genomes project who had the highest coverage. That coverage typically lies around 5, 6 or 7X, which is not exactly the same as ours but makes for a reasonable comparison. We can see that we detect a lower number of rare variants in chimps, which is consistent with the lower effective population size of chimpanzees. Thanks to Adam Auton
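
A frequency spectrum of the kind shown on these slides simply counts how many SNPs carry the non-reference allele on 1, 2, 3, ... chromosomes in the sample. The following Python sketch shows one way to tabulate it, assuming genotypes are coded as the number of non-reference alleles per individual (0, 1 or 2). The function name and the toy data are made up for illustration.

```python
from collections import Counter


def site_frequency_spectrum(genotypes, n_chromosomes):
    """Tabulate the number of sites at each non-reference allele count.

    genotypes: one list per site, holding the non-reference allele
               count (0, 1 or 2) of each diploid individual.
    n_chromosomes: 2 * number of individuals.
    Returns a list whose entry k-1 is the number of sites at which the
    non-reference allele is seen on exactly k chromosomes
    (k = 1 .. n_chromosomes - 1, i.e. segregating sites only).
    """
    counts = Counter(sum(site) for site in genotypes)
    return [counts.get(k, 0) for k in range(1, n_chromosomes)]


# Toy data: 4 sites in 3 diploid individuals (6 chromosomes).
toy_genotypes = [
    [1, 0, 0],  # allele count 1 (singleton)
    [0, 2, 0],  # allele count 2
    [1, 1, 0],  # allele count 2
    [0, 0, 1],  # allele count 1 (singleton)
]
toy_sfs = site_frequency_spectrum(toy_genotypes, n_chromosomes=6)
```

With 9 diploid individuals, as in the analyses described here, `n_chromosomes` would be 18.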

Quality of the Data
Comparison of sequencing data to genotype data (Myers et al 2009): 81 SNPs from Chromosome 2a and 2b are segregating in the sample and discovered in sequencing with high quality genotype calls
Chimpanzee   Coverage   Concordance
Annaclara    9.60       0.975
Frits        9.52       0.963
Gina         9.96       1.000
Lady         8.48       0.988
Liesbeth     9.37
Pearl        8.87
Regina       10.88
Susie        9.76
Yvonne       9.28
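
Concordance here is the fraction of compared genotypes at which the sequencing call agrees with the independent genotype data. A minimal sketch of that calculation follows. The function name, the genotype coding and the use of `None` for missing calls are assumptions made for the example. If all 81 SNPs were compared in each animal, the values 0.975, 0.963, 1.000 and 0.988 in the table would correspond to 79, 78, 81 and 80 matching genotypes.

```python
def genotype_concordance(seq_calls, array_calls):
    """Fraction of sites where the two genotype calls agree.

    seq_calls, array_calls: equal-length sequences of genotype codes
    (for example 0, 1, 2 non-reference alleles); None marks a missing
    call, and sites missing in either data set are skipped.
    """
    compared = matches = 0
    for seq, arr in zip(seq_calls, array_calls):
        if seq is None or arr is None:
            continue
        compared += 1
        if seq == arr:
            matches += 1
    if compared == 0:
        raise ValueError("no sites with calls in both data sets")
    return matches / compared


# Toy example: 5 sites, one missing call, one mismatch.
toy_seq = [0, 1, 2, None, 1]
toy_array = [0, 1, 1, 2, 1]
toy_concordance = genotype_concordance(toy_seq, toy_array)
```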

Diversity
Chromosome 21
I would now like to talk about two broad features of the data. These are based on preliminary analyses, as we have only had the data for a few weeks. First, I want to talk about large-scale patterns of diversity, in this case at chromosomal scales. The picture shows the SNP density, which is just the number of SNPs as you move along the chromosome, along chromosome 21 for CEU in blue and for chimp in red. What you notice is that patterns of diversity in humans and chimps broadly track each other along the chromosome, with rather more SNPs in the chimp data. We discover more SNPs in chimps as a consequence of the slightly higher coverage in chimps compared to the 9 selected CEU individuals, as I mentioned earlier. Thanks to Adam Auton

Diversity
Chromosome 21 – bin size: 100 kb
SNP density is one way of looking at diversity. Another is a quantity called diversity, which is just the average pairwise difference that you obtain by looking at the differences for every pair of chromosomes. Thus, it's a way of summarising polymorphism within a population. In the plot, you see three lines: black for the diversity in chimps, red for CEU and green for YRI. YRI show more diversity than CEU, as you would expect: higher diversity is a well-known feature of African populations, and the lower diversity in CEU is a consequence of the bottleneck when our ancestors left Africa. The chimp patterns broadly track those of both human populations, as we saw with the SNP density. One of the questions is what causes these large-scale variations across the chromosome, and the most obvious answer is changes in mutation rate. A natural way of assessing that is not only to compare diversity but to correct for an estimate of mutation rate, most naturally obtained by comparing chimp and human sequences and looking for fixed differences. Thanks to Cord Melton
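
To make the definition of diversity concrete, here is a small Python sketch that computes the average pairwise difference per site over every pair of chromosomes. The function name and the toy haplotypes are invented for illustration. A real analysis would work per bin (for example 100 kb) from called genotypes rather than from full sequences.

```python
from itertools import combinations


def pairwise_diversity(haplotypes, seq_length):
    """Average number of differences per site over all pairs of sequences.

    haplotypes: equal-length strings, one per sampled chromosome
                (at least two are required).
    seq_length: number of sites the strings represent.
    """
    pairs = list(combinations(haplotypes, 2))
    if not pairs:
        raise ValueError("need at least two chromosomes")
    total_differences = sum(
        sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs
    )
    return total_differences / len(pairs) / seq_length


# Toy example: 3 chromosomes, 4 sites.
# Pairwise differences are 1, 2 and 1, so the mean is 4/3 per pair,
# i.e. 1/3 per site.
toy_haplotypes = ["AAAA", "AAAT", "AATT"]
toy_pi = pairwise_diversity(toy_haplotypes, seq_length=4)
```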

Diversity vs Divergence
Chromosome 21 – bin size: 100kb
This plot shows diversity, that is, within-population polymorphism levels, on the y-axis and divergence, the between-species variation that serves as a surrogate for mutation rate, on the x-axis. What you see is that there is more polymorphism in regions with higher mutation rates. However, this is not deterministic and when you fit lines, you see a lot of variation. The YRI have the highest slope, which is consistent with having a larger effective population size and more divergence. The chimp line lies in between YRI and CEU. So, changes in mutation rate along the chromosomes are one reason for the differences in diversity between humans and chimpanzees. Thanks to Cord Melton
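
The lines on this plot come from regressing per-bin diversity on per-bin divergence. The talk does not say how the lines were fitted, so the sketch below simply assumes ordinary least squares. The function name and the toy numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept.

    x: per-bin divergence values, y: per-bin diversity values.
    Requires at least two bins with differing x values.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy bins lying exactly on y = 2x + 1.
toy_divergence = [1.0, 2.0, 3.0, 4.0]
toy_diversity = [3.0, 5.0, 7.0, 9.0]
toy_slope, toy_intercept = fit_line(toy_divergence, toy_diversity)
```

Fitting one such line per population (chimp, CEU, YRI) and comparing the slopes is the comparison described above.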

Diversity vs Recombination Rate
Chromosome 21 – bin size: 100kb
Another, less obvious, factor is the recombination rate. Recombination rates are important for at least two reasons. One is that recombination itself is a mutagenic process that introduces mutations due to biased gene conversion, as we heard on Monday. The other is that selection has indirect effects on linked sites, and these depend on recombination. Due to selective sweeps and background selection, we expect less diversity in regions of low recombination. In regions with low recombination, background selection is slightly more important, which explains the dip here (~0.0) that we have seen before in human populations. Thanks to Cord Melton

Recombination
Chromosome 21
As you can see, both in humans and in chimpanzees, 80% of the recombination happens in small regions, or hotspots, that make up about 10% of the sequence. Thanks to Adam Auton and Oliver Venn
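
A statement such as "80% of the recombination happens in about 10% of the sequence" can be read off a fine-scale rate map by sorting intervals from the highest to the lowest rate and accumulating genetic distance. The Python sketch below shows that calculation on invented numbers. The function name and the units are assumptions, and this is not necessarily how the plot on the slide was produced.

```python
def sequence_fraction_for_recombination(rates, lengths, target=0.8):
    """Smallest fraction of sequence containing `target` of the recombination.

    rates:   recombination rate of each interval (e.g. cM/Mb).
    lengths: physical length of each interval (e.g. Mb).
    The genetic length of an interval is rate * length.
    """
    # Visit the hottest intervals first.
    intervals = sorted(zip(rates, lengths), key=lambda rl: rl[0], reverse=True)
    total_map = sum(r * l for r, l in intervals)
    total_len = sum(l for _, l in intervals)
    needed = target * total_map
    acc_map = 0.0
    acc_len = 0.0
    for rate, length in intervals:
        contribution = rate * length
        if acc_map + contribution >= needed:
            # Only part of this interval is required to reach the target.
            if contribution > 0:
                acc_len += length * (needed - acc_map) / contribution
            break
        acc_map += contribution
        acc_len += length
    return acc_len / total_len


# Toy map: one hot interval and four cold ones, all of equal length.
# The hot interval carries 16 of the 20 map units, i.e. 80%.
toy_rates = [1.0, 16.0, 1.0, 1.0, 1.0]
toy_lengths = [1.0, 1.0, 1.0, 1.0, 1.0]
toy_fraction = sequence_fraction_for_recombination(toy_rates, toy_lengths)
```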

Fine-scale Recombination Landscape
Chromosome 21
If you look at the fine-scale recombination rate in humans and chimpanzees, there is extreme variation and lots of peakiness. In fact, the fine-scale landscapes of chimpanzees and humans look rather different, which might be because humans and chimpanzees use different hotspots and because a motif causing hotspots in humans is not active in chimps. Thanks to Adam Auton and Oliver Venn

Chimpanzee rates relative to human hotspots
Chromosome 21

Recombination Rates around Genes
Chromosome 2/2a/2b
Panels: Human, Chimp. Lines: HapMap, CEU, YRI
The first thing I want to mention is that recombination rates change around genes. In the pictures, the x-axis shows, for each position on the chromosome, whether you are in a gene and, if not, how far away you are from the nearest one. For all positions that are a certain distance away, you average the local recombination rates. For humans, there are three lines: black for HapMap, green for YRI and red for CEU. The lines are quite similar, suggesting that we get reasonable estimates of recombination rates from only 9 individuals. What we see is that recombination rates are low within genes and increase as you move away from the genes, until they peak some 10, 20 or 30 kb away. In chimps, the recombination rate is also much lower within genes. If you look downstream of a gene, you see the same broad picture as in humans: recombination rates increase and peak some distance away from the gene. Upstream, based on chimp chromosomes 2a and 2b, which are jointly homologous to human chromosome 2, you interestingly see a rather different picture: whilst the human peak is about 20 kb upstream, the peak in chimps is within a kb or so immediately upstream of a gene, right where the promoters are. At the moment, we are a bit cautious about this result and we need to investigate the whole genome first, but if this turns out to be real it would be a huge change. Thanks to Simon Myers
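
The averaging described here (group positions by their distance to the nearest gene, then average the local recombination rate within each group) can be sketched as follows. This is a simplified illustration under stated assumptions: it ignores strand, so it does not separate upstream from downstream as the slide does, and the function names, gene coordinates and rates are all hypothetical.

```python
def distance_to_nearest_gene(pos, genes):
    """Distance from pos to the nearest gene; 0 if pos lies inside a gene.

    genes: non-empty list of (start, end) intervals, inclusive.
    """
    best = None
    for start, end in genes:
        if start <= pos <= end:
            return 0
        dist = start - pos if pos < start else pos - end
        if best is None or dist < best:
            best = dist
    return best


def mean_rate_by_distance(positions, rates, genes, bin_size):
    """Average local recombination rate in bins of distance to nearest gene.

    Returns {bin_index: mean_rate}, where bin_index * bin_size is the
    lower edge of the distance bin.
    """
    sums = {}
    counts = {}
    for pos, rate in zip(positions, rates):
        b = distance_to_nearest_gene(pos, genes) // bin_size
        sums[b] = sums.get(b, 0.0) + rate
        counts[b] = counts.get(b, 0) + 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


# Toy example: one gene at 100-200, distance bins of 50 bp.
toy_genes = [(100, 200)]
toy_positions = [150, 210, 95, 260]   # distances 0, 10, 5, 60
toy_rates = [1.0, 2.0, 4.0, 6.0]
toy_profile = mean_rate_by_distance(toy_positions, toy_rates, toy_genes, 50)
```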

Chromosomal Fusion
Figure labels: Chimp Chromosome 2a, Chimp Chromosome 2b, Human Chromosome 2

Genetic map
Chromosome 2/2a/2b - bin size: 2.5Mb
As chr2a and 2b have fused in humans, they can be used to assess how chromosomal position can affect recombination rates. Human chromosome 2 has a centromere, but it also contains a position that used to be a centromere when it was a separate chromosome, as it still is in chimps. So, we can ask how this large-scale organisation affects things. This plot shows the genetic map around the fusion point. What you see is that there is quite a change local to the fusion point, and in fact it is exactly where the two chimp centromeres were. Thanks to Adam Auton and Oliver Venn
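
A genetic map is the running total of genetic distance along the chromosome, so it can be built from binned recombination rates by cumulative summation. The sketch below assumes rates in cM/Mb and the 2.5 Mb bin size shown on the slide. The function name and the rate values are invented for illustration.

```python
def genetic_map(bin_rates_cm_per_mb, bin_size_mb=2.5):
    """Cumulative genetic position (cM) at each bin boundary.

    bin_rates_cm_per_mb: average recombination rate in each
    consecutive bin. Returns a list one element longer than the
    input, starting at 0.0.
    """
    positions = [0.0]
    for rate in bin_rates_cm_per_mb:
        positions.append(positions[-1] + rate * bin_size_mb)
    return positions


# Toy rates for three consecutive 2.5 Mb bins.
toy_bin_rates = [1.0, 2.0, 0.4]
toy_map = genetic_map(toy_bin_rates)
```

A local drop in rate, such as the one described around the fusion point, would show up as a flatter stretch of this cumulative curve.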

Recombination Landscape
Chromosome 2/2a/2b – bin size: 2.5Mb
Here is the same picture in terms of recombination rates. You see that the rates are high on the short arms and increase towards the telomeres, as we would expect. What has not been well understood so far is whether this effect is due to sequence composition or to some kind of chromosomal organisation. However, if we look at humans, where the two short arms were fused and are now part of the long arm, we see that recombination rates are much lower. This makes clear that it is not only the sequence itself that determines the increase of the rates at the ends of the arms. This change in recombination rates is local, as you can see if you look away from where the chimp centromeres were: the rates are broadly similar apart from the regions just outside the centromeres. Thanks to Adam Auton and Oliver Venn

Thank you!