What has variation data taught us about the biology of recombination? Rory Bowden, Afidalina Tumian, Ronald Bontrop, Colin Freeman, Tammie MacFie, Gil.

Presentation transcript:

What has variation data taught us about the biology of recombination?
Rory Bowden, Afidalina Tumian, Ronald Bontrop, Colin Freeman, Tammie MacFie, Gil McVean, Peter Donnelly, Simon Myers

Recap: Composite likelihood results
Statistical algorithms to estimate historical recombination rates, and identify hotspots
– Applied genome-wide
– Kilobase scale resolution
Model-based inference from linkage disequilibrium (LD) data
– Coalescent model
Figure labels: Individuals, Loci
(Myers et al. 2005)
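
As a reminder of what "LD data" means in practice, here is a minimal sketch of the standard pairwise r² statistic computed from phased haplotypes. It is not the composite-likelihood method recapped on the slide; the function name and the 0/1 haplotype encoding are assumptions made for illustration.

```python
from typing import Optional, Sequence


def r_squared(haplotypes: Sequence[Sequence[int]], i: int, j: int) -> Optional[float]:
    """Pairwise LD (r^2) between loci i and j.

    haplotypes: one row per sampled chromosome, alleles coded 0/1
    (hypothetical encoding chosen for this sketch).
    Returns None if either locus is monomorphic in the sample.
    """
    n = len(haplotypes)
    p_a = sum(h[i] for h in haplotypes) / n          # allele-1 frequency at locus i
    p_b = sum(h[j] for h in haplotypes) / n          # allele-1 frequency at locus j
    p_ab = sum(h[i] * h[j] for h in haplotypes) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                             # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else None


# Two loci that always travel together are in complete LD;
# historical recombination between them would erode this.
linked = [(0, 0), (0, 0), (1, 1), (1, 1)]
shuffled = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(r_squared(linked, 0, 1), r_squared(shuffled, 0, 1))
```

High r² between nearby sites suggests little historical recombination between them, which is the kind of signal that LD-based rate estimation builds on.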

Recombination questions
Human recombination is poorly understood despite intense work
Recombination clusters into 1-2 kb “hotspots”: why?
Why are hotspots where they are in the genome?
– Primary DNA sequence?
– Epigenetics?
What biological machinery produces hotspots?
How are hotspots evolving?

32,996 “HapMap” hotspots
These hotspots account for 50-70% of all human recombination
Why are they where they are?
We can look at the fraction of bases in a region that are “G” or “C”
We also see a weak correlation with e.g. positions of genes
Are there any stronger predictive features?
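
The GC fraction mentioned above is easy to compute. The sketch below shows one way to do it per region and in sliding windows; the function names and the choice to ignore ambiguous bases (such as N) are assumptions for illustration, not part of the original analysis.

```python
from typing import Iterator, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) in seq that are G or C."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def windowed_gc(seq: str, window: int, step: int) -> Iterator[Tuple[int, float]]:
    """Yield (window start, GC fraction) for sliding windows along seq."""
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        yield start, gc_fraction(seq[start:start + window])


print(gc_fraction("GGCCAATT"))
print(list(windowed_gc("GGGGAAAA", window=4, step=4)))
```

Comparing such window values inside hotspots against matched regions outside them is one simple way to ask whether GC content predicts hotspot location.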

Broad scale sequence features and recombination
Use >20,000 hotspots localized to within 5 kb
For each, create a matched “coldspot”
Compare sequence features
THE1B (LTR of retrotransposon): found in 1196 hotspots versus 606 coldspots (p<< )
AluY: found in 3635 hotspots versus 3262 coldspots (p = 7x10^-5)
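
To give a feel for how a hotspot-versus-coldspot count comparison can be tested, here is a rough sketch using a sign-test style normal approximation. The slide does not say which test produced its p-values, so this sketch should not be expected to reproduce them; the function names and the null hypothesis (each occurrence is equally likely to fall in a hotspot or its matched coldspot) are assumptions.

```python
import math


def fold_enrichment(hot: int, cold: int) -> float:
    """Ratio of hotspot count to coldspot count for one sequence feature."""
    return hot / cold


def two_sided_sign_test_p(hot: int, cold: int) -> float:
    """Two-sided p-value under the assumed null that a feature lands in a
    hotspot or its matched coldspot with equal probability.

    Uses the normal approximation to Binomial(n, 0.5) with a continuity
    correction; adequate for the large counts on the slide.
    """
    n = hot + cold
    if n == 0:
        return 1.0
    sd = math.sqrt(n) / 2
    z = max(abs(hot - n / 2) - 0.5, 0.0) / sd
    return math.erfc(z / math.sqrt(2))


for name, hot, cold in [("THE1B", 1196, 606), ("AluY", 3635, 3262)]:
    print(name, round(fold_enrichment(hot, cold), 2), two_sided_sign_test_p(hot, cold))
```

The ratio is the more robust takeaway: THE1B is roughly twofold enriched in hotspots, while AluY is only slightly enriched.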

A motif for human hotspots
Compared primary DNA sequence at 30,000 human hotspots and matched coldspots
Looked at all “words” of length 5-9 (e.g. possible 9-mers), refined results
Identified a 13-bp motif, CCNCCNTNNCCNC (Myers et al. 2008)

THE1 repeats in hotspots:
...CTTCCGCTATGATTGTGAGGCCTCCCTAGCCATGTGGAACTGTGAGCCCATT...
...CTTCCGCCATGATTGTGAGGCCTCCCTAGCCATGTGGAACTGTGAGTCCATT...
...CTTCCGCCATGATTGTGAGGCCTCCCTAGCCACGTGGAAC-GTGAGTCCATT...
...CATCCGCCATGATTGTGAGGCCTCCCTAGCCACGTGGAACTGAGAGTCCATT...
...CTTCCGCCATGATTGTGAGGCCTCCCCAGCCATGGGGAACTGTGAGTCCATT...

THE1 repeats in coldspots:
...CTTCCGCCATGATTGTGAGGCCTCCCCAGCCATGTGGAACTGTGAGTCCATT...
...CTTCCGTTATGATTGTGAGGCCTCCCCAGCCATGTGGAACTGTGAATCCATT...
...CTTCCGCCATGATTGTGAGGCCTCCCCAGCCATGTGGGACTGTGAGTCCATT...
...CTTCCGCC-TGATTCTGAGGCCTCCCCAGCCATGTGGAACTGTGAGTCCATT...
...CTTTCGCCATGATTGTGAGGCCTCCCCAGCCATGTGGAACTGC-TGTCCATT...
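
Scanning a sequence for the degenerate motif CCNCCNTNNCCNC can be sketched with a regular expression, as below. This is an illustrative search on both strands, not the word-enumeration procedure used to discover the motif (for scale, a 9-letter word over A/C/G/T has 4^9 = 262,144 possibilities). The function names, and the choice to strip alignment gaps ("-") before searching, are assumptions.

```python
import re
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (N stays N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def motif_to_regex(motif: str) -> str:
    """Turn a motif with N wildcards into a regex over A/C/G/T."""
    return "".join("[ACGT]" if base == "N" else base for base in motif)


def find_motif(seq: str, motif: str = "CCNCCNTNNCCNC") -> List[Tuple[int, str]]:
    """Return sorted (0-based start, strand) hits, allowing overlaps.

    Alignment gaps are removed first, so coordinates refer to the
    ungapped sequence. Reverse-strand hits are reported at the
    forward-strand coordinate of their leftmost base.
    """
    seq = seq.upper().replace("-", "")
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        # Lookahead so that overlapping occurrences are all reported.
        regex = re.compile("(?=" + motif_to_regex(pattern) + ")")
        hits.extend((m.start(), strand) for m in regex.finditer(seq))
    return sorted(hits)


# Two of the THE1 sequences shown on the slide (names are illustrative).
the1_with_motif = "CTTCCGCCATGATTGTGAGGCCTCCCTAGCCACGTGGAAC-GTGAGTCCATT"
the1_without_motif = "CTTCCGCCATGATTGTGAGGCCTCCCCAGCCATGTGGAACTGTGAGTCCATT"
print(find_motif(the1_with_motif))
print(find_motif(the1_without_motif))
```

The two example sequences differ at only a few bases, yet only the first contains a full match (CCTCCCTAGCCAC), which shows how a single substitution can create or destroy a motif occurrence.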

Average rates around the motif
Confirmed via sperm studies, which reveal that disrupting the first 7-bp part of the motif disrupts hotspot activity (Neumann and Jeffreys 2002)
Active on multiple backgrounds (e.g. THE1, L2, Alu repeats and unique DNA…)
Plays a role at c. 43% of hotspots identified through LD, or directly through sperm typing
Figure labels: Penetrance >60%, 3-5% of hotspots; Penetrance 7.5%, 5% of hotspots

The motif is actually longer
Based on examining only non-repeat DNA in hotspots
Independent of results on previous slide
Region that matters: >30 bp

Recombination and human disease: X-linked ichthyosis
The breakpoint hotspot contains the greatest concentration of the 13-bp motif, within a segmental duplication, anywhere in the entire genome
Figure label: Deletion breakpoint hotspot (Van Esch et al. 2005)
(Myers et al. 2008)

The motif is associated with NAHR syndromes
Multiple genomic disorders are caused by the same phenomenon: “non-allelic homologous recombination” (NAHR)
Rearrangement endpoints are consistently clustered into narrow hotspots:
– X-linked ichthyosis
– Charcot-Marie-Tooth disease (CMT1A)
– NF1
– Sotos syndrome
– Smith-Magenis syndrome
– Williams-Beuren syndrome
The motif is present, close to breakpoint hotspots, in each case (p= )

A ‘common deletion’ in mitochondria occurs at the motif
(Myers et al. 2008)

What binds the motif?
3-bp periodicity suggests binding by a “zinc finger” (ZF) protein with at least 12 zinc fingers (Myers et al. 2008)
For genes coding for ZF proteins, we can predict their binding target bioinformatically (Persikov et al. 2009)
Searched systematically
– Zinc finger protein database of 691 C2H2 ZF proteins
– Perform in silico binding predictions
Look for matches to the 13-bp motif and its degeneracy (Myers et al. 2009)
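
One way to read the "3-bp periodicity" remark, offered as an illustration rather than as the authors' analysis: in CCNCCNTNNCCNC, most of the unconstrained (N) positions fall on every third base. The helper names below are hypothetical.

```python
from typing import List


def degenerate_positions(motif: str) -> List[int]:
    """1-based positions of unconstrained (N) bases in a motif."""
    return [pos for pos, base in enumerate(motif, start=1) if base == "N"]


def phase_counts(positions: List[int], period: int = 3) -> List[int]:
    """Count how many positions fall in each phase (position mod period)."""
    counts = [0] * period
    for pos in positions:
        counts[pos % period] += 1
    return counts


motif = "CCNCCNTNNCCNC"
n_positions = degenerate_positions(motif)
print(n_positions)                # positions of N in the 13-bp motif
print(phase_counts(n_positions))  # how many fall in each of the three phases
```

Four of the five N positions (3, 6, 9, 12) share one phase. Each C2H2 zinc finger typically contacts about three consecutive bases, which is why a 3-bp repeat in the pattern of constraint points toward a zinc-finger protein.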

PRDM9 is the unique candidate for the motif-binding protein

PRDM9 binding of the motif
Motif identified by hotspot-coldspot comparison
Bioinformatic prediction of PRDM9 binding “target”
ZF part of PRDM9: 13 zinc fingers, one separated (showing the four codons in each zinc finger that determine binding target)
(Myers et al. 2009)

Details of PRDM9
Independent work by two additional groups confirms that PRDM9 is a gene that directly determines hotspot locations in both humans and mice
– A mouse gene that gives different inbred strains different hotspot positions was mapped to PRDM9
– Baudat et al. (2009), Parvanov et al. (2009), Myers et al. (2009)
– Gel shift assays demonstrate PRDM9 really does bind the predicted motif: Baudat et al. (2009)
PRDM9 puts an epigenetic mark on the histone DNA packaging
– H3K4 trimethylation
– The identical mark is used by yeast to mark hotspots (Borde et al. 2009)
– Conservation over >1 billion years of evolution
In mice
– Different PRDM9 types mean different hotspot positions (Buard et al. 2009; Baudat et al. 2009)
– Prdm9 expressed only in meiotic prophase (Hayashi et al. 2005)
– Prdm9 -/- mutants are infertile and fail to repair DSBs (Hayashi et al. 2005)

Figure label: Percent usage of LD hotspots (Baudat et al. 2009)
Considerable variation in PRDM9 in humans, which influences the usage of hotspots as defined from LD data
Different humans have different hotspots

How are hotspots evolving?
Hotspots are radically different between humans and chimps
Figure labels: Human, Chimp; LDhat rate estimates; LDhot hotspots (Winckler et al. 2005)

PRDM9 is radically different in chimpanzees
Sharing between humans and chimps: 1 of 13 zinc fingers
Least shared of all 544 orthologous ZF protein pairs with at least two distinct zinc fingers in each species
Patterns in multiple species indicate positive selection (Oliver et al. 2009)
One of the fastest evolving genes in the human genome

Crossover activity at motif is human-specific (p= )
Figure panels: Human motif sites; Chimp motif sites (THE1 repeats, L2 repeats)
Human data: HapMap data, 210 humans
Chimp data: 694 SNPs, 36 western chimpanzees; 16 THE1 regions, 6 L2 regions
x-axis: Position relative to motif

Conclusions and current directions
Why are hotspots where they are in the genome?
– PRDM9 has sequence-specific binding
– Specifies narrow hotspot sites
– Targets primary DNA sequence but makes an epigenetic “mark”
– Only 40% of hotspots??
– Looking at PRDM9 binding in vivo using ChIP-seq
PRDM9 is evolving like crazy!
– Between species
– Within humans
– Within mice, chimps, …
– Resequencing data for 10 chimpanzees to define their hotspots
PRDM9 is the only mapped speciation gene in any mammal
– Hybrid sterility in mouse (Mihola et al. 2009)
– What is the link between recombination and speciation?
– Does PRDM9 evolution, in general, lead to breeding barriers between species?
Recombination and the motif are implicated in multiple diseases
– PRDM9 variation suggests different people are susceptible to different genomic disorders