The tangled genome Gil McVean. The real heroes.

Presentation transcript:

The tangled genome Gil McVean

The real heroes

PanMap: genome sequencing of 10 Western chimpanzees
Patterns of small insertions and deletions are quite different and reveal details of DNA repair pathways.
Patterns of recombination in humans and chimpanzees are highly diverged at the fine scale, but largely conserved at broad scales.
There are a surprising number (6+ now 'confirmed') of trans-species polymorphisms, probably maintained through host-pathogen interactions.

A tangle of sequence

Difficulties of working with an incomplete reference

Using de novo assembly to find variants

Entire population

Sample 1

Sample 2

Chromosome 1

Using Cortex leads to a high quality set of variants

Diversity in Western chimpanzees
Diversity is similar to that of humans of European origin (0.06%-0.08%).
Excess of common variants.
1% of variants are shared with humans.
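A diversity figure such as 0.06%-0.08% is commonly read as the average proportion of sites that differ between two sequences drawn from the sample. The sketch below computes that quantity on a toy alignment. It assumes this is the statistic meant on the slide; the function name `nucleotide_diversity` and the toy sequences are illustrative only and do not come from PanMap.

```python
from itertools import combinations


def nucleotide_diversity(haplotypes):
    """Average pairwise differences per site over all pairs of aligned sequences."""
    if len(haplotypes) < 2:
        raise ValueError("need at least two sequences")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(haplotypes, 2))
    # Count mismatching sites for every pair, then normalise by pairs and length.
    total_diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total_diffs / (len(pairs) * length)


# Toy alignment (hypothetical data): four sequences of ten sites each.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC", "ACGTACGTAA"]
pi = nucleotide_diversity(toy)
print(f"diversity = {pi:.2%}")
```

Real data would need handling of missing calls and unphased diploid genotypes, which this sketch ignores.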

Non-slippage indels are strongly biased to deletions
13:1 bias toward deletions. Unexpected peak at 4 bp.

Indels as indicators of DNA repair processes
[Figure: insertions and deletions, by indel size and longest word agreement]

TGACGAACTTAT ACTGCTTGAATA TGACGA AC AT TGAATA TGAC--AT ACTGAATA TGACTTAT Losing GAAC

A tangle of trees

Myers et al. 2005

The zinc-finger protein PRDM9 determines hotspot location Myers et al. 2010

PRDM9 zinc fingers are radically different between humans and chimps
Perhaps the most diverged gene between humans and chimpanzees.
Repeatedly hit by adaptive evolution across mammals.
Only known 'speciation gene' in mammals.
Polymorphic in humans, which leads to variation in hotspots and genome instability.

Questions
We know from previous work in a few regions that hotspot locations tend not to be shared between humans and chimpanzees.
Calculations suggested that only 40% of human hotspots were driven by PRDM9 binding.
But:
Is there any hotspot sharing?
Do we see conservation of recombination rates at any scale?
What features determine hotspot location in chimpanzees?

The first genome-wide fine-scale map of recombination for a non-reference organism Auton et al. 2012

Chimpanzee recombination is dominated by hotspots in a manner similar to humans

But the hotspots are not in the same locations

Fine-scale profiles around genes are similar

As is rate variation around CpG islands

Substantial PRDM9 diversity, but overlap in predicted binding sequences

No signal for predicted binding sequences

Similarities at 1Mb scale

Human and chimp recombination rates are correlated at the chromosomal scale

Human and chimp recombination rates are only correlated at broad scales
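The contrast between fine-scale and broad-scale correlation can be illustrated with a toy calculation: average each species' rates over larger windows, then correlate the two species at each scale. The sketch below assumes rates are given per fixed-size bin on a shared coordinate system. The helper names (`pearson`, `window_means`) and all numbers are made up for illustration and are not the analysis behind the slides.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def window_means(rates, size):
    """Mean rate in consecutive non-overlapping windows of `size` bins."""
    return [sum(rates[i:i + size]) / size
            for i in range(0, len(rates) - size + 1, size)]


# Toy data (hypothetical): both species share a broad-scale background,
# but their hotspots sit in different bins within each 10-bin window.
background = [1.0] * 10 + [3.0] * 10 + [2.0] * 10 + [4.0] * 10
human = [b + (20.0 if i % 10 == 2 else 0.0) for i, b in enumerate(background)]
chimp = [b + (20.0 if i % 10 == 7 else 0.0) for i, b in enumerate(background)]

fine = pearson(human, chimp)
broad = pearson(window_means(human, 10), window_means(chimp, 10))
print(f"fine-scale r = {fine:.2f}, broad-scale r = {broad:.2f}")
```

In this toy setup the hotspots dominate the fine-scale variance, so the bin-level correlation is near zero, while window averages recover the shared background.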

Lower correlation in structural rearrangements
All but one of the inverted regions are pericentric, so change in position with respect to the centromere does not contribute.
Change in proximity to the telomere is important.

A natural experiment: chromosomal fusion
[Diagram labels: chimp (2a, 2b), human (2), C.A. (2a, 2b), time axis t]

Fusion region shows 3-fold decrease in recombination rate

A tangle of histories

Distribution of the sickle allele and of malaria

How many variants are shared through descent?

SNPs shared by humans and chimpanzees (33,906 autosomal and 527 X chromosome)
Human polymorphism: 9.4 million autosomal and 261,000 X chromosome SNPs from 1000 Genomes Pilot 1 YRI (59 individuals).
Chimpanzee polymorphism: 3.8 million autosomal and 102,000 X chromosome SNPs from PanMap Pan troglodytes verus (10 individuals).
Human-chimpanzee shared haplotypes: at least two shared SNPs within 4 kb with the same LD, to reduce the contribution of recurrent mutation.
Human-chimpanzee shared coding SNPs: identify potentially functional coding variants; reduce artifactual sharing due to known or cryptic paralogs by filtering out SNPs with low 50 bp mappability, with high read depth, or not found in 1000 Genomes Phase
Regions with shared haplotypes outside the MHC
135 shared non-synonymous SNPs
1 shared premature stop SNP
200 shared synonymous SNPs outside the MHC
7 resequenced using Sanger sequencing
8 with more than two pairs in LD
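The first two steps of the shared-haplotype screen (find SNPs segregating for the same alleles in both species, then keep clusters of at least two within 4 kb) can be sketched as follows. This is a simplified illustration under stated assumptions: positions are already mapped to a common coordinate system, the LD-matching, mappability and read-depth filters are omitted, and the function names and toy data are hypothetical.

```python
def shared_snps(human, chimp):
    """Positions where both species segregate for the same pair of alleles.

    `human` and `chimp` map a position to a frozenset of its two alleles.
    """
    return sorted(pos for pos, alleles in human.items()
                  if chimp.get(pos) == alleles)


def candidate_windows(positions, max_span=4000, min_snps=2):
    """Greedily cluster sorted positions; keep clusters with enough SNPs.

    A cluster starts at the first unassigned position and takes every
    following position within `max_span` bp of that start.
    """
    clusters, current = [], []
    for pos in positions:
        if current and pos - current[0] > max_span:
            if len(current) >= min_snps:
                clusters.append(current)
            current = []
        current.append(pos)
    if len(current) >= min_snps:
        clusters.append(current)
    return clusters


# Toy data (hypothetical positions and alleles).
human = {1000: frozenset("AG"), 2500: frozenset("CT"), 9000: frozenset("AC"),
         20000: frozenset("GT"), 50000: frozenset("AT")}
chimp = {1000: frozenset("AG"), 2500: frozenset("CT"), 9000: frozenset("AG"),
         20000: frozenset("GT"), 30000: frozenset("CT")}

shared = shared_snps(human, chimp)
windows = candidate_windows(shared)
print(shared, windows)
```

Position 9000 is polymorphic in both species but for different alleles, so it is not counted as shared; position 20000 is shared but isolated, so it does not form a candidate window.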

Outside of the MHC, six clear-cut cases of trans-species polymorphisms
All non-coding and putatively regulatory.
FREM3/GYPE, MTRR, IGFBP7

In intron of IGFBP7
[Figure tracks: IGFBP7 gene structure; human-chimpanzee shared SNPs; average pairwise differences; primate phastCons score; TFBS conserved in human/mouse/rat; TFBS identified by ChIP-seq; DNaseI hypersensitive sites; open chromatin by FAIRE; chromatin state segmentation by HMM. Labels: RelA, CUTL1, SRF, Bach1, STAT3, GATA-2, ISGF-3; weak enhancer, strong enhancer; regulatory region in HUVEC; regulatory region in NHEK and HMEC. Scale bars: 4 kb, 20 kb]

In total, 130 regions with shared human-chimpanzee haplotypes. Six clear-cut cases of ancient balanced polymorphisms. None are protein-coding. Eleven occur in non-coding genes (e.g., 7 in lincRNAs). Eleven compelling cases of regulatory regions. What do these regions have in common?

SNPs shared by humans and chimpanzees
Shared haplotypes: closest gene within 20 kb of a human-chimp shared haplotype (n=26, p=2x10^-5, FDR=0.03).
Shared coding SNPs: genes with a human-chimp shared coding SNP (n=99, p=0.017, FDR=0.20).
Enrichment of membrane glycoproteins -> host-pathogen interactions
[Figure label: Glycoproteins]

Project Participants
University of Oxford: Adam Auton, Rory Bowden, Peter Humburg, Zam Iqbal, Gerton Lunter, Julian Maller, Simon Myers, Susanne Pfeifer, Isaac Turner, Oliver Venn, Peter Donnelly (PI), Gil McVean (PI)
Biomedical Primate Research Centre: Ronald Bontrop
University of Chicago: Adi Fledel-Alon, Ryan Hernandez (UCSF), Ellen Leffler, Cord Melton, Laure Segurel, Molly Przeworski (PI)
Funders: Howard Hughes Medical Institute, National Institutes of Health, Royal Society, Wellcome Trust

Where next?

Remarkable structural and sequence diversity in chimp PRDM9

Variation greater than in human populations

Little correlation in fine-scale structure around DNA repeat elements

No activating motif discovered in chimp (human motif: CCTCCCT)