Filling Missing Genotypes Using Haplotypes
Paul VanRaden (1), Jeff O’Connell (2), George Wiggans (1), Kent Weigel (3)
(1) Animal Improvement Programs Lab, USDA, Beltsville, MD, USA
(2) University of Maryland School of Medicine, Baltimore, MD, USA
(3) University of Wisconsin Dept. Dairy Science, Madison, WI, USA
2010

ADSA / ASAS annual meeting, Denver, July 2010 (2) Paul VanRaden 2010

Genotypes / Haplotypes
  Genotypes indicate how many copies of each allele were inherited
  Haplotypes indicate which alleles are on which chromosome
  Observed genotypes are partitioned into the two unknown haplotypes
  Pedigree haplotyping uses relatives
  Population haplotyping finds matching allele patterns

Filling missing genotypes
  Predict unknown SNP from known SNP
    – Measure 3,000, predict 43,000 SNP
    – Measure 50,000, predict 500,000
  Measure each haplotype at highest density only a few times
  Predict dam from progeny SNP
  Increase reliabilities for less cost

Haplotyping Program findhap.f90
  Begin with population haplotyping
    – Divide chromosomes into segments, ~250 SNP / segment
    – List haplotypes by genotype match
    – Similar to fastPHASE, IMPUTE
  End with pedigree haplotyping
    – Detect crossover, fix noninheritance
    – Impute nongenotyped ancestors
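
The segmentation step can be illustrated with a short Python sketch. The slide only states a target of about 250 SNP per segment; how findhap.f90 actually places segment boundaries is not described, so the near-equal split below and the name `segment_bounds` are assumptions made for illustration.

```python
def segment_bounds(n_snp, target=250):
    """Split n_snp consecutive markers into near-equal segments.

    Hypothetical helper: the equal-split rule is an assumption, only the
    ~250 SNP target comes from the slide.
    Returns a list of (start, end) index pairs, end exclusive.
    """
    n_seg = max(1, round(n_snp / target))   # at least one segment
    base, extra = divmod(n_snp, n_seg)      # spread the remainder evenly
    bounds = []
    start = 0
    for i in range(n_seg):
        size = base + (1 if i < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


if __name__ == "__main__":
    # A chromosome with 1,000 markers gives four segments of 250 SNP.
    print(segment_bounds(1000))
```

Each segment would then be haplotyped on its own, which keeps the list of candidate haplotypes per segment short.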

Computer Requirements
  500,000 markers, 33,414 animals

  Step                                   Gbytes   CPU hours
  Simulate genotypes                         39         1.8
  Pop’n haplotypes                            2         1.2
  Pedigree haplotypes                         3         1.8
  Store genotypes                            13           -
  Store haplotypes                            3           -
  Iterate allele effects (for 5 traits)       8          30

Recent Program Revisions
  Improved imputation and GEBV reliability since 9WCGALP paper
  Changes since January 2010
    – Use known haplotype if second is unknown
    – Use current instead of base frequency
    – Combine parent haplotypes if crossover is detected
    – Begin search with parent or grandparent haplotypes
    – Store 2 most popular progeny haplotypes

Example Bull: O-Style USA , Sire = O-Man
  Read genotypes and pedigrees
  Write haplotype segments found
  List paternal / maternal inheritance
  List crossover locations

O-Style Haplotypes
  Chromosome 15

Pedigree Haplotyping
  AB allele coding
  Genotypes:
    O-Man    BB,AA,AA,AB,AA,AB,AB,AA,AA,AB
    O-Style  BB,AA,AA,AB,AB,AA,AA,AA,AA,AB
  Haplotypes:
    O-Style (pat)  B A A _ A A A A A _
    O-Style (mat)  B A A _ B A A A A _
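
The O-Man / O-Style example can be reproduced with a minimal Python sketch of sire-to-son phasing. It assumes that only the sire is genotyped, that there are no missing calls, and that no noninheritance errors occur; the function name `phase_from_sire` is hypothetical and this is not the findhap.f90 code.

```python
def phase_from_sire(son, sire):
    """Deduce the son's paternal and maternal haplotypes locus by locus.

    son, sire: lists of two-letter genotype calls such as "AB".
    Returns (paternal, maternal) strings, with "_" where phase is unknown.
    Assumption: no missing calls and no parent-progeny conflicts.
    """
    pat, mat = [], []
    for g_son, g_sire in zip(son, sire):
        a, b = g_son[0], g_son[1]
        if a == b:
            # Homozygous son: both haplotypes carry the same allele.
            pat.append(a)
            mat.append(a)
        elif g_sire[0] == g_sire[1]:
            # Heterozygous son, homozygous sire: the sire's allele is the
            # paternal one, so the other allele came from the dam.
            p = g_sire[0]
            pat.append(p)
            mat.append(b if p == a else a)
        else:
            # Both heterozygous: phase cannot be resolved from the sire alone.
            pat.append("_")
            mat.append("_")
    return "".join(pat), "".join(mat)


if __name__ == "__main__":
    o_man = "BB,AA,AA,AB,AA,AB,AB,AA,AA,AB".split(",")
    o_style = "BB,AA,AA,AB,AB,AA,AA,AA,AA,AB".split(",")
    print(phase_from_sire(o_style, o_man))
```

With the genotypes from the slide this yields BAA_AAAAA_ for the paternal and BAA_BAAAA_ for the maternal haplotype, matching the slide. The two unresolved loci are those where sire and son are both heterozygous.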

Allele and Segment Coding
  Genotypes
    – 0 = BB, 1 = AB or BA, 2 = AA
    – 5 = missing
  Haplotypes
    – 0 = B, 1 = not known, 2 = A
  Segment storage (example)
    – O-Style has haplotype numbers 5 and 8
    – O-Man has haplotype numbers 8 and 21
    – O-Style got haplotype number 5 from dam
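
The numeric codes above can be written out as a small Python sketch. The code values come from the slide; the function names and the rule that a genotype built from an unknown allele is reported as missing are assumptions for illustration.

```python
# Codes taken from the slide.
GENO_CODE = {"BB": 0, "AB": 1, "BA": 1, "AA": 2}
MISSING_GENO = 5
HAP_CODE = {"B": 0, "A": 2}
UNKNOWN_ALLELE = 1


def encode_genotype(calls):
    """Map two-letter calls to 0/1/2; anything else becomes 5 (missing)."""
    return [GENO_CODE.get(c, MISSING_GENO) for c in calls]


def encode_haplotype(alleles):
    """Map allele letters to 0/2; anything else becomes 1 (not known)."""
    return [HAP_CODE.get(a, UNKNOWN_ALLELE) for a in alleles]


def genotype_from_haplotypes(h1, h2):
    """Rebuild genotype codes from two coded haplotypes.

    With B = 0 and A = 2, the genotype code is (h1 + h2) / 2.
    Assumption: a locus with an unknown allele is reported as missing.
    """
    out = []
    for x, y in zip(h1, h2):
        if UNKNOWN_ALLELE in (x, y):
            out.append(MISSING_GENO)
        else:
            out.append((x + y) // 2)
    return out


if __name__ == "__main__":
    pat = encode_haplotype("BAA_AAAAA_")
    mat = encode_haplotype("BAA_BAAAA_")
    print(genotype_from_haplotypes(pat, mat))
```

Because B is coded 0 and A is coded 2, a homozygous genotype has the same code as the allele it carries, which makes it simple to compare genotypes with haplotypes directly.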

Most Frequent Haplotypes
  1st segment of chromosome
  [Haplotype listing with frequencies (%) not recoverable from the extracted text]
  Most frequent haplotype in Holsteins had 4,316 copies
  = 0.0516 * 41,822 animals * 2 chromosomes each

Population Haplotyping Steps
  Put first genotype into haplotype list
  Check next genotype against list
  Do any homozygous loci conflict?
    – If haplotype conflicts, continue search
    – If match, fill any unknown SNP with homozygote
    – 2nd haplotype = genotype minus 1st haplotype
    – Search for 2nd haplotype in rest of list
  If no match in list, add to end of list
  Sort list to put frequent haplotypes 1st
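
The steps above can be sketched in Python for a single segment, using the genotype codes 0/1/2/5 and haplotype codes 0/1/2 from the coding slide. This is a simplified reading of the slide, not the findhap.f90 implementation. All function names are hypothetical, a genotype is stored as a haplotype with heterozygous and missing loci marked unknown, and the search for the 2nd haplotype starts at the matched entry itself (an assumption, so that an animal carrying two copies of one haplotype is counted twice).

```python
UNKNOWN = 1  # haplotype code for "not known"; 0 = B, 2 = A


def as_partial_haplotype(geno):
    """Keep homozygous codes (0, 2); heterozygous or missing become unknown."""
    return [g if g in (0, 2) else UNKNOWN for g in geno]


def conflict(h1, h2):
    """True if two haplotypes disagree at a locus where both are known."""
    return any(a != b and UNKNOWN not in (a, b) for a, b in zip(h1, h2))


def fill_unknown(target, source):
    """Fill unknown alleles of target in place from source."""
    for i, s in enumerate(source):
        if target[i] == UNKNOWN:
            target[i] = s


def second_haplotype(geno, first):
    """2nd haplotype = genotype minus 1st haplotype, locus by locus."""
    second = []
    for g, h in zip(geno, first):
        if g in (0, 2):
            second.append(g)          # homozygous: same allele on both
        elif g == 1 and h != UNKNOWN:
            second.append(2 - h)      # heterozygous: the other allele
        else:
            second.append(UNKNOWN)
    return second


def find_match(haps, partial, start=0):
    """Index of the first listed haplotype without conflict, else None."""
    for i in range(start, len(haps)):
        if not conflict(haps[i][0], partial):
            return i
    return None


def build_haplotype_list(genotypes):
    """Return [[haplotype, match count], ...], most frequent first."""
    haps = []
    for geno in genotypes:
        partial = as_partial_haplotype(geno)
        first = find_match(haps, partial)
        if first is None:
            haps.append([partial, 1])             # no match: add to end
        else:
            fill_unknown(haps[first][0], partial)  # fill from homozygotes
            haps[first][1] += 1
            second = second_haplotype(geno, haps[first][0])
            other = find_match(haps, second, start=first)
            if other is None:
                haps.append([second, 1])
            else:
                fill_unknown(haps[other][0], second)
                haps[other][1] += 1
        haps.sort(key=lambda entry: -entry[1])     # frequent haplotypes 1st
    return haps


if __name__ == "__main__":
    for hap, count in build_haplotype_list([[2, 2, 0, 0], [2, 1, 1, 0], [2, 0, 2, 0]]):
        print(hap, count)
```

Sorting after every genotype means later genotypes are tested against the most common haplotypes first, so most searches end after a few comparisons.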

Check New Genotype Against List
  1st segment of chromosome
  [Haplotype listing with frequencies (%) not recoverable from the extracted text]
  Subtract 1st haplotype from genotype to get 2nd:
  Check genotype:

Conclusions
  Missing genotypes can be filled easily
  Population and pedigree haplotyping can both process long segments efficiently
  Imputing 500,000 SNP for 33,414 Holsteins required 3 Gbyte memory, 3 CPU hours
  Program findhap.f90 implemented for April 2010 routine evaluation
  Several recent improvements to accuracy
  Ready to include lower or higher density genotypes in evaluations