Meiotic gene conversion in humans: rate, sex ratio, and GC bias Amy L. Williams June 19, 2013 University of Chicago.

Presentation transcript:

Meiotic gene conversion in humans: rate, sex ratio, and GC bias Amy L. Williams June 19, 2013 University of Chicago

Gene conversion defined
Meiosis: produces haploid germ cells with recombinations
Two types of recombination: crossover and gene conversion
Gene conversion: short segment copied into a given chromosome from the other homolog
[Figure: meiosis, with a crossover and a gene conversion labeled]

Study question 1: gene conversion rate?
Number of gene conversions per meiosis?
–4-15× the number of crossovers? Jeffreys and May (2004)
Length of gene conversion tracts?
–… bp? Jeffreys and May (2004)
Per base-pair rate? Fraction of genome affected
–R = (number × tract length) / genome length
–2.2×10^-6 to 4.4×10^-5? Jeffreys and May (2004)

Study question 2: male vs. female rate?
Gender differences in rate?
–Crossovers: female rate 1.78× male (deCODE)

Study questions 3 & 4: GC bias? Localization?
GC bias observed in allelic transmissions?
Do crossover hotspots influence location? (Myers et al., Science 2005)
Are locations of gene conversions independent within a given meiosis?

Summary: study questions
1. Genome-wide de novo gene conversion rate?
2. Different rate between males and females?
3. Extent of GC bias in tracts?
4. Localization: hotspots? Tracts independent?

Outline
Background / study questions
Study design and methods
Results
–SNP chip data
–Sequence data

Approaches to identify gene conversions
Linkage disequilibrium based
–Can give rate estimate
–Averaged over human history, both genders
Sperm-based
–Many meiotic products: per-individual estimates
–Single molecule: genome-wide assays difficult
Pedigree-based
–De novo, per-gender events observable
–Data for many samples required

Study design: SNP chip data for pedigrees
Primary analysis: pedigree SNP chip data
Challenge: small tracts
–Tracts covered by ≤ 1 SNP
–Not all tracts covered, but still obtain overall rate
Chip data give per base-pair rate
–R = # gene conversions / # informative sites

Datasets for analysis: Mexican American pedigrees
Data source 1: San Antonio Family Studies
–2,490 genotyped samples, 80 pedigrees
–SNP chip genotypes (Illumina 1M, 660k)
–Can estimate de novo gene conversion rate
Data source 2: T2D-GENES Consortium
–607 sequenced samples, 20 pedigrees
–Whole genome sequence (Complete Genomics)
–Can examine tract length, distribution, etc., though this needs deep data on a single family

Study design: SNP chip data for pedigrees
Pedigree-based haplotypes/phase reveal recombinations
–Heterozygous sites: informative for recombination
Phasing method: Hapi
–Phases nuclear families
–Williams et al., Genome Biol. 2010

Family-based phase reveals recombinations
Hapi output: paternal haplotype transmissions
[Figure: transmitted segments of Haplotype 1 and Haplotype 2, with one crossover and one gene conversion marked]

Other pedigree phasing methods
Most pedigree phasing methods are slow
–Runtime complexity for phasing ~O(m·2^(2n)), where n = # non-founders and m = # markers
–Example: nuclear family with 11 children has 4,194,304 states per marker
Can merge exponential class of states
Many states extremely unlikely to be optimal
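The 4,194,304 figure can be reproduced directly. This is a minimal sketch that assumes two inheritance bits per non-founder (one for each parental transmission), which is what a 2^(2n) state count implies; the function name is illustrative and not part of any phasing tool.

```python
def inheritance_states(num_non_founders: int) -> int:
    """Per-marker inheritance states, assuming 2 bits per non-founder."""
    return 2 ** (2 * num_non_founders)


# Nuclear family with 11 children (the 11 non-founders).
states_11_children = inheritance_states(11)
print(f"{states_11_children:,}")  # 4,194,304
```

Each additional child multiplies the count by four, which is why exhaustive per-marker enumeration becomes impractical for large sibships.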

Hapi: efficient phasing of nuclear families
Hapi: state space reduction improves efficiency
–Merges exponential class of states
–Omits states that cannot yield optimal solution
Applied to family with 11 children
–Average per-marker states: 4.2, maximum 48

Program       All families (N=103)     ≤ 3 children (N=86)
              Runtime     Speedup      Runtime    Speedup
Hapi          3.1 s       -            2.2 s      -
Merlin        1,005 s     323×         8.7 s      3.8×
Allegro v2    7,661 s     2,462×       14.5 s     6.4×
Superlink     1,393 s*    448×         38.8 s     17.2×
* Superlink failed to analyze the 11-child family; 8/11 children used

Applying Hapi to multi-generational pedigrees
Hapi currently applies to nuclear families
–For 3-generation pedigrees analyzed for gene conversions, omit sites with phase conflicts
–Will not bias results, but data are reduced
Extension to Hapi possible to efficiently analyze arbitrarily large pedigrees
–Most San Antonio Family Studies pedigrees too large to be phased in practical time

Approach to identifying gene conversions
1. Perform QC, phase 3-generation pedigrees
2. Find gene conversions in 2nd generation: single-SNP double crossovers
3. Confirm:
–Gene-converted allele in 3rd generation
–Other allele in 2nd-generation sibling(s)
False positive only if ≥ 2 genotyping errors
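Step 2 can be illustrated with a small scan over one child's transmitted-haplotype labels. This is a sketch under stated assumptions, not Hapi's implementation: the input is a hypothetical list holding, for each informative site in genomic order, which parental haplotype (1 or 2) was transmitted, and the function name is invented for illustration. The confirmation in step 3 is not modeled here.

```python
def single_snp_switches(transmitted: list[int]) -> list[int]:
    """Indices of sites whose transmitted haplotype differs from both
    neighbors while the two neighbors agree with each other."""
    hits = []
    for i in range(1, len(transmitted) - 1):
        left, here, right = transmitted[i - 1], transmitted[i], transmitted[i + 1]
        if left == right and here != left:
            hits.append(i)
    return hits


# Hypothetical child: haplotype 1 with one isolated site from haplotype 2,
# then an ordinary crossover (1 -> 2) that must not be reported.
child = [1, 1, 1, 2, 1, 1, 1, 2, 2, 2]
candidates = single_snp_switches(child)
print(candidates)  # [3]
```

Only the isolated switch at index 3 is flagged; the sustained change starting at index 7 looks like a crossover and is ignored.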

Current analysis dataset
Analyzed SNP chip data for 16 pedigrees
–Data for both parents, 3+ children, 1+ grandchild
–190 samples
–42 meioses (21 paternal, 21 maternal)
4.15×10^6 informative sites

Result 1: 33 putative gene conversions, rate
Rate: 7.95×10^-6 /bp/generation
–Within range of Jeffreys and May (2004)
–Close to LD-based estimates
[Figure: putative gene conversions, labeled Male and Female]
Are these real gene conversions?
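The reported rate follows from the study-design formula R = # gene conversions / # informative sites, using the 33 putative events and the 4.15×10^6 informative sites of the analysis dataset. A minimal arithmetic check (variable names are illustrative):

```python
num_gene_conversions = 33
num_informative_sites = 4.15e6

# Per base-pair, per-generation rate: events divided by informative sites.
rate_per_bp = num_gene_conversions / num_informative_sites
print(f"{rate_per_bp:.2e}")  # 7.95e-06
```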

T2D-GENES sequence confirms events
19 sites sequenced by T2D-GENES Consortium
–18/19 gene conversion genotypes verified
Differing site looks like sequencing artifact
–2nd-generation recipient has genotype mismatch; 3rd-generation grandchild shows same genotype
–If sequence data are correct, the gene conversion occurred in the grandchild

Result 2: gene conversion rates by gender
More female gene conversions than male
–Females transmit 1.54× as many as males
–Difference not (yet) significant; larger sample coming
Different rates expected based on crossovers
–Female crossover rate 1.78× male (deCODE)
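To see why a 1.54× ratio from 33 events need not be significant, here is a sketch of an exact two-sided binomial test. The split of 20 maternal and 13 paternal events is an assumption chosen to match 33 events and a 1.54× ratio; the slides do not state the counts, and the test used by the author is not given. With 21 meioses of each sex, equal rates would correspond to p = 0.5.

```python
from math import comb


def binom_two_sided_half(k: int, n: int) -> float:
    """Exact two-sided binomial p-value for k successes in n trials at p = 0.5."""
    k_hi = max(k, n - k)
    upper_tail = sum(comb(n, i) for i in range(k_hi, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


female_events, male_events = 20, 13  # assumed split, not from the slides
ratio = female_events / male_events
p_sex = binom_two_sided_half(female_events, female_events + male_events)
print(f"ratio={ratio:.2f}, p={p_sex:.2f}")
```

Under these assumed counts the p-value is well above 0.05, consistent with the slide's statement that the difference is not yet significant.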

Result 3: gene conversions localize in hotspots
2.71% of genome in ≥10 cM/Mb hotspots
10/33 gene conversions at sites with ≥10 cM/Mb: P=1.1×10^…
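The size of this enrichment can be gauged with a binomial tail. This is a sketch under an assumption: each of the 33 events falls in a hotspot independently with probability 0.0271 (the genome fraction in ≥10 cM/Mb hotspots), which ignores any unevenness in where informative SNPs lie. The slides do not say which test produced the reported P-value.

```python
from math import comb


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


hotspot_fraction = 0.0271  # share of the genome in >= 10 cM/Mb hotspots
expected_in_hotspots = 33 * hotspot_fraction
p_hotspot = binom_upper_tail(10, 33, hotspot_fraction)
print(f"expected={expected_in_hotspots:.2f}, observed=10, P={p_hotspot:.1e}")
```

Fewer than one event is expected in hotspots by chance, against 10 observed, so the tail probability under this model is on the order of 10^-8.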

Result 4: observe extreme GC bias
31 GC-informative sites
–A/C, A/G, T/C, T/G
GC transmission in 74% of cases (95% CI 59% – 90%)
–GC bias likely (P=5.3×10^-3)
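These figures can be reproduced under two assumptions that the slides do not state: that 74% of 31 sites means 23 GC transmissions, and that the P-value is a one-sided exact binomial test against 50% with a normal-approximation confidence interval.

```python
from math import comb, sqrt

gc_sites = 31
gc_transmitted = 23  # assumed count: 74% of 31, not given explicitly
frac = gc_transmitted / gc_sites

# One-sided exact binomial test against unbiased (50%) transmission.
p_one_sided = sum(comb(gc_sites, i) for i in range(gc_transmitted, gc_sites + 1)) / 2 ** gc_sites

# Normal-approximation (Wald) 95% confidence interval.
half_width = 1.96 * sqrt(frac * (1 - frac) / gc_sites)
ci = (frac - half_width, frac + half_width)
print(f"{frac:.0%}, P={p_one_sided:.1e}, CI={ci[0]:.0%}-{ci[1]:.0%}")
# 74%, P=5.3e-03, CI=59%-90%
```

That the assumed count and test recover the slide's numbers suggests, but does not prove, that this is how they were obtained.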

Sequence near chip-identified gene conversions
Sequence available for 11/33 putative sites
Shortest resolution for tract length ≤ 143 bp
Clustered gene conversions in 4 sequences
Boxed regions confirmed by Sanger sequencing

Relationship to complex crossover?
[Figure: transmitted segments of Haplotype 1 and Haplotype 2]

Conclusions
Estimate of de novo gene conversion rate
–7.95×10^-6 /bp/generation
–Females: 1.54× gene conversions vs. males
Enriched in hotspots: similar mechanism to crossover
GC vs. AT allele transmitted ~3:1 (GC bias)
Complex/clustered gene conversions observed in sequence data
–Suggests unique correlation within short region

Acknowledgements
The T2D-GENES Consortium (NIDDK)
San Antonio Family Studies (NIDDK, NIMH)
NHGRI NRSA Fellowship
Nick Patterson, David Reich, John Blangero, Giulio Genovese, Tom Dyer, Kati Truax