1
Genomic Evaluation with Many More Genotypes and Phenotypes
Paul VanRaden (1), Jeff O’Connell (2), George Wiggans (1), Kent Weigel (3)
(1) Animal Improvement Programs Lab, USDA, Beltsville, MD, USA
(2) University of Maryland School of Medicine, Baltimore, MD, USA
(3) University of Wisconsin Dept. Dairy Science, Madison, WI, USA
Paul.VanRaden@ars.usda.gov
2010
2
9WCGALP, Leipzig, Germany, August 2010 (2) Paul VanRaden 2010
Topics
Methods to combine different marker densities and datasets
More markers: 500,000 simulation
More animals: 3,000 marker subset
More breeds: multi-trait markers
More traits: same genotype cost
3
Methods to Trace Inheritance
Few markers
  Pedigree needed
  Prob (paternal or maternal alleles inherited) computed within families
Many markers
  Can find matching DNA segments without pedigree
  Prob (haplotypes are identical) mostly near 0 or 1 if segments contain many markers
4
Haplotype Probabilities with Few Markers (12 SNP / chromosome)
5
Haplotype Probabilities with More Markers (50 SNP / chromosome)
6
Haplotyping Program findhap.f90
Begin with population haplotyping
  Divide chromosomes into segments, ~250 SNP / segment
  List haplotypes by genotype match
  Similar to FastPhase, IMPUTE, or long range phasing
End with pedigree haplotyping
  Detect crossover, fix noninheritance
  Impute nongenotyped ancestors
7
Recent Program Revisions
Improved imputation and reliability
Changes since January 2010
  Use known haplotype if second is unknown
  Use current instead of base frequency
  Combine parent haplotypes if crossover is detected
  Begin search with parent or grandparent haplotypes
  Store 2 most popular progeny haplotypes
  Simulated crossover rate increased
8
Coding of Alleles and Segments
Genotypes
  0 = BB, 1 = AB or BA, 2 = AA
  3 = B_, 4 = A_, 5 = __ (missing)
  Allele frequency used for missing
Haplotypes
  0 = B, 1 = not known, 2 = A
Segment inheritance (example)
  Son has haplotype numbers 5 and 8
  Sire has haplotype numbers 8 and 21
  Son got haplotype number 5 from dam
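The coding above can be illustrated with a minimal Python sketch. It is not taken from findhap.f90; the function names and the "?" symbol for an unknown allele are hypothetical, chosen only for this example.

```python
# Genotype codes from the slide: 0, 1, 2 count the A alleles; 3, 4, 5 mark partial or missing calls.
GENOTYPE_CODES = {"BB": 0, "AB": 1, "BA": 1, "AA": 2, "B_": 3, "A_": 4, "__": 5}

# Haplotype codes from the slide; "?" is a hypothetical symbol for "not known".
HAPLOTYPE_CODES = {"B": 0, "?": 1, "A": 2}


def encode_genotype(calls):
    """Translate allele-pair calls such as 'AB' into the numeric genotype codes."""
    return [GENOTYPE_CODES[call] for call in calls]


def haplotype_from_dam(son, sire):
    """Return the son's haplotype number that did not come from the sire.

    `son` and `sire` are pairs of haplotype numbers for one segment.
    Returns None when the origin cannot be decided from the sire alone.
    """
    a, b = son
    if a in sire and b in sire:
        return a if a == b else None  # both could be paternal: undecided
    if a in sire:
        return b
    if b in sire:
        return a
    return None  # neither matches the sire: possible noninheritance


print(encode_genotype(["BB", "AB", "AA", "__"]))
print(haplotype_from_dam((5, 8), (8, 21)))
```

With the slide's example, the son's haplotype 8 matches the sire, so haplotype 5 is assigned to the dam.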
9
Most Frequent Haplotypes
1st segment of chromosome 15
 1  5.16%  022222222020020022002020200020000200202000022022222202220
 2  4.37%  022020220202200020022022200002200200200000200222200002202
 3  4.36%  022020022202200200022020220000220202200002200222200202220
 4  3.67%  022020222020222002022022202020000202220000200002020002002
 5  3.66%  022222222020222022020200220000020222202000002020220002022
 6  3.65%  022020022202200200022020220000220202200002200222200202222
 7  3.51%  022002222020222022022020220200222002200000002022220002220
 8  3.42%  022002222002220022022020220020200202202000202020020002020
 9  3.24%  022222222020200000022020220020200202202000202020020002020
10  3.22%  022002222002220022002020002220000202200000202022020202220
For efficiency, store haplotypes just once.
Most frequent haplotype in Holsteins had 4,316 copies = .0516 * 41,822 animals * 2 chromosomes each
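The copy count and the "store once" idea can be sketched as follows. This is a hedged illustration, not the storage scheme of findhap.f90: the function `build_haplotype_table` and its toy input are hypothetical, and each animal is assumed to carry exactly two haplotype strings per segment.

```python
from collections import Counter

# Arithmetic from the slide: frequency * animals * 2 chromosomes each.
copies = round(0.0516 * 41_822 * 2)
print(copies)  # 4316


def build_haplotype_table(haplotypes_per_animal):
    """Store each distinct haplotype once and give every animal two small indices."""
    counts = Counter(h for pair in haplotypes_per_animal for h in pair)
    table = [h for h, _ in counts.most_common()]  # most frequent first
    index = {h: i for i, h in enumerate(table)}
    codes = [(index[a], index[b]) for a, b in haplotypes_per_animal]
    return table, codes


# Toy segment with three loci (made-up data).
table, codes = build_haplotype_table([("020", "022"), ("020", "020"), ("022", "200")])
print(table)
print(codes)
```

Each animal then needs only two integers per segment instead of two full allele strings.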
10
Population Haplotyping Steps
Put first genotype into haplotype list
Check next genotype against list
  Do any homozygous loci conflict?
    – If haplotype conflicts, continue search
    – If match, fill any unknown SNP with homozygote
    – 2nd haplotype = genotype minus 1st haplotype
    – Search for 2nd haplotype in rest of list
  If no match in list, add to end of list
Sort list to put frequent haplotypes 1st
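The steps above can be sketched in Python. This is a simplified illustration under stated assumptions, not the findhap.f90 implementation: it handles a single segment, ignores the partial genotype codes 3 and 4, sorts only once at the end, and assumes that a genotype with a heterozygous locus must take its second haplotype from later in the list. All function names are hypothetical.

```python
UNKNOWN = 1  # haplotype code for "not known"; also the heterozygous genotype code


def as_haplotype(genotype):
    """Keep homozygous calls (0 or 2); heterozygous or missing loci become unknown."""
    return [g if g in (0, 2) else UNKNOWN for g in genotype]


def conflicts(alleles, haplotype):
    """True if any locus carries two different known alleles."""
    return any(a != h for a, h in zip(alleles, haplotype)
               if a != UNKNOWN and h != UNKNOWN)


def complement(genotype, haplotype):
    """Second haplotype = genotype minus first haplotype, locus by locus."""
    second = []
    for g, h in zip(genotype, haplotype):
        if g in (0, 2):
            second.append(g)        # homozygous: both haplotypes carry g
        elif g == 1 and h != UNKNOWN:
            second.append(2 - h)    # heterozygous: the other allele
        else:
            second.append(UNKNOWN)  # missing call or unknown first allele
    return second


def find_or_add(alleles, haplotypes, counts, start=0):
    """Index of the first compatible haplotype from `start`; append one if none fits."""
    for i in range(start, len(haplotypes)):
        if not conflicts(alleles, haplotypes[i]):
            # Fill unknown SNPs in the stored haplotype with the known alleles.
            haplotypes[i] = [a if h == UNKNOWN else h
                             for a, h in zip(alleles, haplotypes[i])]
            counts[i] += 1
            return i
    haplotypes.append(list(alleles))
    counts.append(1)
    return len(haplotypes) - 1


def population_haplotypes(genotypes):
    """Build the haplotype list for one segment, most frequent first."""
    haplotypes, counts = [], []
    for genotype in genotypes:
        first = find_or_add(as_haplotype(genotype), haplotypes, counts)
        second = complement(genotype, haplotypes[first])
        # Assumption: a heterozygous animal cannot reuse its first haplotype.
        start = first + 1 if 1 in genotype else first
        find_or_add(second, haplotypes, counts, start)
    return sorted(zip(haplotypes, counts), key=lambda pair: -pair[1])


# Toy genotypes with four loci (made-up data).
print(population_haplotypes([[2, 2, 0, 0], [1, 2, 1, 0], [0, 2, 2, 0], [0, 2, 2, 0]]))
```

In the toy run, the heterozygous second animal matches the first list entry at its homozygous loci, and subtracting that entry yields a new haplotype that is appended and later becomes the most frequent one.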
11
Check New Genotype Against List
1st segment of chromosome 15
5.16%  022222222020020022002020200020000200202000022022222202220
4.37%  022020220202200020022022200002200200200000200222200002202
4.36%  022020022202200200022020220000220202200002200222200202220
3.67%  022020222020222002022022202020000202220000200002020002002
3.66%  022222222020222022020200220000020222202000002020220002022
3.65%  022020022202200200022020220000220202200002200222200202222
3.51%  022002222020222022022020220200222002200000002022220002220
3.42%  022002222002220022022020220020200202202000202020020002020
3.24%  022222222020200000022020220020200202202000202020020002020
3.22%  022002222002220022002020002220000202200000202022020202220
Search for 1st haplotype that matches genotype:
       022112222011221022021110220010110212202000102020120002021
Get 2nd haplotype by removing 1st from genotype:
       022002222002220022022020220020200202202000202020020002020
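The search and subtraction on this slide can be reproduced with the strings shown. The sketch below is an illustration only; the helper names are hypothetical, and it assumes the slide's coding (genotype 0/1/2, haplotype 0 = B and 2 = A), so that "genotype minus haplotype" is 2 × genotype − haplotype at each locus.

```python
# Five most frequent haplotypes and the new genotype, copied from the slide.
TOP5 = [
    "022222222020020022002020200020000200202000022022222202220",
    "022020220202200020022022200002200200200000200222200002202",
    "022020022202200200022020220000220202200002200222200202220",
    "022020222020222002022022202020000202220000200002020002002",
    "022222222020222022020200220000020222202000002020220002022",
]
GENOTYPE = "022112222011221022021110220010110212202000102020120002021"


def compatible(genotype, haplotype):
    """A haplotype fits if it agrees with the genotype at every homozygous locus."""
    return all(g == "1" or g == h for g, h in zip(genotype, haplotype))


def remove_haplotype(genotype, haplotype):
    """Second haplotype: 2 * genotype - first haplotype at each locus."""
    return "".join(str(2 * int(g) - int(h)) for g, h in zip(genotype, haplotype))


first = next(i for i, h in enumerate(TOP5) if compatible(GENOTYPE, h))
second = remove_haplotype(GENOTYPE, TOP5[first])
print(first + 1, TOP5[first])
print(second)
```

The first four haplotypes each conflict with the genotype at a homozygous locus, so the search stops at the fifth (3.66%). Removing it from the genotype leaves the string listed at 3.42%.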
12
Simulated 500K Tests
How many 500K genotypes needed?
Is computation affordable?
Two subsets of mixed 500K and 50K:
  Of 33,414 HO, only 1,406 (young) had 500K
  Also bulls > 99% reliability, total 3,726
Linkage generated in base population
  Efficient and similar to autoregressive
  Linkage affects gain from more markers
13
Holstein Linkage Disequilibrium
14
Simulated Linkage
15
Computer Requirements
500,000 markers, 33,414 animals
Step                                   Gbytes  CPU hours
Simulate genotypes                     391.8 (digits fused in source; column split unclear)
Pop’n haplotypes                       2       1.2
Pedigree haplotypes                    3       1.8
Store genotypes                        13      -
Store haplotypes                       3       -
Iterate allele effects (for 5 traits)  8       30
16
Measures of Haplotyping Success
Does estimated = true genotype?
Does estimated = true linkage for adjacent heterozygous markers?
Does estimated = true paternity?
How many alleles remain missing?
What is the error rate (Druet, 2010)?
What is corr² (estimated, true genotype)?
Are resulting GEBVs reliable?
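Three of these measures can be computed directly when true genotypes are known, as in a simulation. The sketch below is a hedged illustration with hypothetical function names and made-up data; it assumes genotypes coded 0, 1, 2 with 5 for a call still missing after imputation, and it may differ from the exact definitions used in the study.

```python
MISSING = 5  # genotype code for a call that is still missing


def missing_rate(estimated):
    """Share of loci that remain missing after imputation."""
    return sum(e == MISSING for e in estimated) / len(estimated)


def error_rate(estimated, true):
    """Share of non-missing loci where the estimated genotype differs from the true one."""
    pairs = [(e, t) for e, t in zip(estimated, true) if e != MISSING]
    return sum(e != t for e, t in pairs) / len(pairs)


def squared_correlation(estimated, true):
    """Squared Pearson correlation between estimated and true genotypes (non-missing loci)."""
    pairs = [(e, t) for e, t in zip(estimated, true) if e != MISSING]
    n = len(pairs)
    mean_e = sum(e for e, _ in pairs) / n
    mean_t = sum(t for _, t in pairs) / n
    cov = sum((e - mean_e) * (t - mean_t) for e, t in pairs)
    var_e = sum((e - mean_e) ** 2 for e, _ in pairs)
    var_t = sum((t - mean_t) ** 2 for _, t in pairs)
    return cov * cov / (var_e * var_t)


# Made-up genotypes for one animal at six loci.
true_genotype = [0, 1, 2, 2, 0, 1]
estimated = [0, 1, 2, 1, 0, 1]
print(missing_rate(estimated), error_rate(estimated, true_genotype))
print(squared_correlation(estimated, true_genotype))
```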
17
500K Imputation Results
# 500K            0       1,406    3,798    33,414
Percentages:      50K     50K & 500K        500K
Missing before    1       86       80       1
Missing after     .04     7.2      3.3      .05
Errors (young)    .03     1.3      .9       .03
Errors (old)      .04     3.4      1.7      .04
Reliability       82.6    83.4     83.6     84.0
Gain vs. 50K              0.8      1.0      1.4
18
500K Imputation Results
# 500K =          0       1,406    3,798    33,414
% wrong           50K     50K & 500K        500K
Genotype   Yng    .1      2.6      1.7      .1
           Old    .1      7.3      3.4      .1
Linkage    Yng    .3      1.9      1.4      .1
           Old    .4      5.4      2.5      .2
Paternity  Yng    2.0     4.9      5.0      2.5
           Old    4.3     7.6      6.2      4.2
19
Imputation Summary
1,406 young animals genotyped at 500K
  REL gain 0.8% vs. 1.4% with all 500K
  Imputation better if ancestors also genotyped
  Could genotype additional reference bulls instead of re-genotyping bulls already done
32,008 animals imputed from 50K
  10% SNP known before, 93% after
  97-98% of 500K genotypes correct
  .839 squared correlation (estimated, true genotype)
20
Multi-Breed Genomic Evaluation
Treat allele effects as independent, same, or correlated, using data of 5,331 purebred Holsteins, 1,361 purebred Jerseys, and 506 purebred Brown Swiss
21
Protein Yield R²
SNP effects for breeds:   Holstein   Jersey   Brown Swiss
None (PA)                 .3142      .4362    .0933
Independent               .5045      .4874    .1030
Same                      .4742      .4731    .1336
Correlated                .5060      .4916    .1067
Optimum correlation was .3 with 43K markers, and would be larger with more markers
22
Correlation with Single-Breed GEBV
23
Fewer Markers, More Animals
Half of young animals assigned 3K
  Proven bulls, cows all had 43K
  Dams imputed using 43K and 3K
Half of ALL animals assigned 3K
  Could 3K reference animals help?
  10,000 proven bulls yet to genotype
  Should cows with 3K be predictors?
24
Reliability from 3K, 43K Mixture
Chips               3K      3K and 43K         43K
# 43K               N = 0   ½ All    ½ Young   40,351
Missing %: Before   1       47       27        1
Missing %: After    .0531 (digits fused in source; column split unclear)
Reliability %       57      64       66        70
Rel - PA Rel        21      28       30        34
25
Correlations² of 3K and PA with 43K
Genotyped ancestors had 43K
Consistent gains across traits
Reliability gain from progeny with 3K was 79-87% of gain from 43K
  Gain % = [Corr(3K,43K)² - Corr(PA,43K)²] / [1 - Corr(PA,43K)²]
Large benefits for smaller cost
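The gain formula on this slide can be evaluated as below. The function name is hypothetical and the two input correlations are made-up values for illustration, not results from the study; the factor of 100 only converts the ratio to a percentage.

```python
def relative_gain_percent(corr_3k_43k, corr_pa_43k):
    """Gain % = [Corr(3K,43K)^2 - Corr(PA,43K)^2] / [1 - Corr(PA,43K)^2], times 100."""
    r2_3k = corr_3k_43k ** 2
    r2_pa = corr_pa_43k ** 2
    return 100 * (r2_3k - r2_pa) / (1 - r2_pa)


# Hypothetical correlations of 3K and parent-average predictions with 43K predictions.
print(round(relative_gain_percent(0.95, 0.70), 1))  # 80.9
```

The denominator is the largest possible improvement over parent average, so the ratio expresses the 3K gain as a share of what 43K genotypes would deliver.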
26
Conclusions - 1
Missing genotypes can be filled easily
Population and pedigree haplotyping can both process long segments efficiently
Imputing 500,000 SNP for 33,414 Holsteins required 3 Gbyte memory, 3 CPU hours
Haplotyping implemented for April 2010 routine U.S. evaluation
Several recent improvements to accuracy
Ready to include lower or higher density genotypes in evaluations
27
Conclusions - 2
More markers improved reliability < 2%
1,406 high density genotypes sufficient
32,008 other animals imputed from 50K to 500K in simulation
Fewer markers can decrease cost
More animals can greatly increase reliability and selection differential
Multi-breed model improves reliability only slightly (< 1%) at current density
28
Acknowledgments
Katie Olson computed the multi-breed genomic evaluation
Mel Tooker assisted with graphics and computation
Bob Schnabel helped improve marker locations on the map