Efficient Estimation of Breeding Values from Dense Genomic Data
Paul VanRaden and Mel Tooker
Animal Improvement Programs Laboratory, USDA Agricultural Research Service, Beltsville, MD, USA
2007

FASS annual meeting, July 2007 (P.M. VanRaden)

Genomic Calculations
  Genotypes soon available from BFGL
    50,000 SNPs / animal
    3,000 animals, many more possible
    Need efficient computing algorithms
  Traditional PTAs available from AIPL
    PTAs combine phenotypes and pedigree
    SNP effects evaluated in second step using deregressed PTAs weighted by reliability

Genomic Computer Programs
  Simulate SNPs and QTLs
    Compare SNP numbers, size of QTLs
  Calculate genomic EBVs
    Use selection index, G instead of A
    Use iteration on data for SNP effects
  Form haplotypes from genotypes?
    Not tested yet, SNP regression used

Simulation Program
  Save memory by processing each chromosome separately
  3,000 Holstein bulls to genotype
  17,000 ancestors in pedigree file
  1 billion (20,000 x 50,000 SNPs) genotypes simulated per replicate
  Only 150 million (3,000 x 50,000) genotypes stored for evaluation

Linear Estimates using Markers
  Selection index equations for EBV
    $\hat{u} = \mathrm{Cov}(u, y)\,\mathrm{Var}(y)^{-1}\,(y - Xb)$
    $\hat{u} = ZZ'\,[ZZ' + R]^{-1}\,(y - Xb)$
    R has diagonals = (1 / Reliability) - 1
  BLUP equations for marker effects, sum to get EBV
    $\hat{u} = Z\,[Z'R^{-1}Z + Ik]^{-1}\,Z'R^{-1}\,(y - Xb)$
    $k = \mathrm{var}(u) / \mathrm{var}(m)$
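
The two sets of equations on this slide can be checked against each other numerically. The sketch below is illustrative only and rests on assumptions that are not in the slides: genotypes are simulated at random, Z is coded as allele count minus 2p, k is set to the sum of 2p(1 - p), and the selection index is written with G = ZZ'/k (that is, ZZ' on the slide is taken to be scaled by k). The vector `y_adj` is a hypothetical stand-in for (y - Xb).

```python
import numpy as np

rng = np.random.default_rng(0)
n_animals, n_markers = 8, 20

# Hypothetical allele frequencies and genotypes (0, 1 or 2 copies of the 2nd allele).
p = rng.uniform(0.1, 0.9, n_markers)
genotypes = rng.binomial(2, p, size=(n_animals, n_markers))
Z = genotypes - 2 * p                      # centered genotype codes (assumption)

# Assumed variance ratio k = var(u) / var(m), here taken as sum of 2p(1 - p).
k = np.sum(2 * p * (1 - p))

# R has diagonals (1 / reliability) - 1, as on the slide; reliabilities are made up.
rel = rng.uniform(0.5, 0.95, n_animals)
R = np.diag(1 / rel - 1)
y_adj = rng.normal(size=n_animals)         # stand-in for (y - Xb)

# Selection index form: u = G [G + R]^-1 (y - Xb), with G = ZZ'/k.
G = Z @ Z.T / k
u_index = G @ np.linalg.solve(G + R, y_adj)

# Marker BLUP form: solve for marker effects m, then sum to get EBV (u = Z m).
R_inv = np.diag(1 / np.diag(R))
lhs = Z.T @ R_inv @ Z + k * np.eye(n_markers)
m_hat = np.linalg.solve(lhs, Z.T @ R_inv @ y_adj)
u_blup = Z @ m_hat

print(np.max(np.abs(u_index - u_blup)))    # difference should be at rounding-error level
```

Under these assumptions the two forms agree to rounding error; the selection index solves a system of size animals x animals, while the marker form solves one of size markers x markers.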

Non-linear vs Linear Models

Marker Effect Prior Distribution
  Nonlinear Model

Iteration on Data
  Simple trick to reduce time from quadratic to linear with # SNPs
    Sum coefficients x solutions once
    Sum - diagonal = off-diagonals
  Janss and de Jong, 1999 conference
    Rediscovered by Legarra and Misztal
  Elements of Z are -p and (1 - p), where p is frequency of 2nd allele
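
A minimal sketch of the "sum minus diagonal" trick, assuming the marker equations have coefficient matrix Z'R^-1 Z plus a diagonal term. The function names, the random data, and the genotype-level coding (allele count minus 2p, which is what the per-allele codes -p and 1 - p add up to over two alleles) are illustrative assumptions. The first function forms the markers x markers coefficient matrix explicitly; the second never forms it and works from the genotype data directly.

```python
import numpy as np

def offdiag_products_explicit(Z, r_inv, m):
    """Off-diagonal coefficients x solutions, forming Z'R^-1 Z explicitly.

    Cost grows with the square of the number of markers.
    """
    C = Z.T @ (r_inv[:, None] * Z)
    return C @ m - np.diag(C) * m

def offdiag_products_on_data(Z, r_inv, m):
    """Same quantity by iteration on data.

    Sum coefficients x solutions once (Z m), then subtract each marker's
    own diagonal contribution. Cost is linear in the number of markers.
    """
    total = Z.T @ (r_inv * (Z @ m))               # all coefficients x solutions
    diag = np.einsum("ij,i,ij->j", Z, r_inv, Z)   # diagonal of Z'R^-1 Z
    return total - diag * m

rng = np.random.default_rng(42)
n_animals, n_markers = 50, 200
p = rng.uniform(0.1, 0.9, n_markers)              # hypothetical allele frequencies
Z = rng.binomial(2, p, size=(n_animals, n_markers)) - 2 * p
r_inv = 1.0 / rng.uniform(0.1, 1.0, n_animals)    # hypothetical diagonal of R^-1
m = rng.normal(size=n_markers)                    # current marker solutions

explicit = offdiag_products_explicit(Z, r_inv, m)
on_data = offdiag_products_on_data(Z, r_inv, m)
print(np.allclose(explicit, on_data))
```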

Computer Memory
  Inversion including G matrix
    Animals x markers to hold genotypes
    Animals^2 to hold elements of G
    <1 Gbyte for 50,000 SNPs, 3000 bulls
  Iteration on genotype data
    Markers + animals
    <.1 Gbyte for 50,000 SNPs, 3000 bulls
  Little memory required for either
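
The memory bounds above can be reproduced with simple arithmetic. The byte sizes below are assumptions (one byte per stored genotype, eight bytes per real number), and the iteration case assumes genotypes are streamed rather than held in memory, which is how the slide's "Markers + animals" count is read here.

```python
n_animals = 3_000
n_markers = 50_000

BYTES_PER_GENOTYPE = 1   # assumption: one byte per genotype code
BYTES_PER_REAL = 8       # assumption: double-precision reals

# Inversion including G: genotypes (animals x markers) plus G (animals^2).
genotype_bytes = n_animals * n_markers * BYTES_PER_GENOTYPE
g_matrix_bytes = n_animals**2 * BYTES_PER_REAL
inversion_gb = (genotype_bytes + g_matrix_bytes) / 1e9

# Iteration on data: one vector over markers plus one over animals.
iteration_gb = (n_markers + n_animals) * BYTES_PER_REAL / 1e9

print(f"inversion: {inversion_gb:.3f} GB, iteration: {iteration_gb:.6f} GB")
```

With these assumed sizes both totals fall under the bounds quoted on the slide.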

Computing Times
  Inversion including G matrix
    Animals^2 x markers to form G matrix
    Animals^3 to invert selection index
    10 hours for 3000 bulls, 50,000 SNPs
  Iteration on genotype data
    Markers x animals x iterations
    16 hours for 1000 iterations
    .997 correlation with inversion
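
As a rough illustration, the leading-order operation counts named on this slide can be evaluated for the quoted problem size. These are counts only; they ignore constants, disk access, and implementation details, so they are not a prediction of the hours reported above.

```python
n_animals = 3_000
n_markers = 50_000
n_rounds = 1_000

form_g_ops = n_animals**2 * n_markers           # animals^2 x markers
invert_ops = n_animals**3                       # animals^3
iteration_ops = n_markers * n_animals * n_rounds  # markers x animals x iterations

print(f"form G:    {form_g_ops:.2e}")
print(f"invert:    {invert_ops:.2e}")
print(f"iteration: {iteration_ops:.2e}")
```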

Convergence with iteration on data
  Jacobi iteration
    Use previous round coefficients x solutions
  Adaptive under-relaxation
    Increase relax if convergence improving
    Decrease relax (each round) if diverging
  Solution convergence reasonable
    SD of change <.0001 after 350 rounds
    SD of change < after 1700 rounds
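
One way Jacobi iteration with adaptive under-relaxation could look is sketched below. This is not the authors' program: the function name, the growth and shrink factors, the bounds on the relaxation factor, the stopping rule, and the random test data are all assumptions chosen for illustration. The size of the change is measured as a root mean square, used here as a stand-in for the "SD of change" on the slide.

```python
import numpy as np

def jacobi_on_data(Z, r_inv, y_adj, k, max_rounds=10_000, tol=1e-10,
                   relax=0.5, grow=1.05, shrink=0.5,
                   relax_min=0.01, relax_max=1.0):
    """Solve (Z'R^-1 Z + kI) m = Z'R^-1 y_adj by under-relaxed Jacobi."""
    diag = np.einsum("ij,i,ij->j", Z, r_inv, Z) + k   # diagonal coefficients
    m = np.zeros(Z.shape[1])
    prev_size = np.inf
    for round_no in range(1, max_rounds + 1):
        # Residual uses previous-round solutions only (Jacobi, not Gauss-Seidel).
        resid = y_adj - Z @ m
        change = (Z.T @ (r_inv * resid) - k * m) / diag  # full Jacobi step
        size = np.sqrt(np.mean(change**2))
        if size < tol:
            break
        # Adaptive under-relaxation (assumed rule): grow while improving,
        # shrink as soon as the change gets larger.
        if size < prev_size:
            relax = min(relax * grow, relax_max)
        else:
            relax = max(relax * shrink, relax_min)
        prev_size = size
        m = m + relax * change
    return m, round_no

rng = np.random.default_rng(1)
n_animals, n_markers = 30, 60
p = rng.uniform(0.1, 0.9, n_markers)                  # hypothetical frequencies
Z = rng.binomial(2, p, size=(n_animals, n_markers)) - 2 * p
rel = rng.uniform(0.5, 0.95, n_animals)               # hypothetical reliabilities
r_inv = 1.0 / (1.0 / rel - 1.0)
k = np.sum(2 * p * (1 - p))                           # assumed variance ratio
y_adj = rng.normal(size=n_animals)                    # stand-in for (y - Xb)

m_iter, rounds = jacobi_on_data(Z, r_inv, y_adj, k)

# Direct solution for comparison.
lhs = Z.T @ (r_inv[:, None] * Z) + k * np.eye(n_markers)
m_direct = np.linalg.solve(lhs, Z.T @ (r_inv * y_adj))
print(rounds, np.corrcoef(m_iter, m_direct)[0, 1])
```

The comparison against the direct solution mirrors the slide's check of iteration against inversion; on a real problem the direct solve over markers would not be formed.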

Potential Results
  Simulation of 50,000 SNPs, 100 QTLs
  Predict young bulls    Accuracy    Reliability
    Parent Average
    Linear SNP model
    Nonlinear model
  Higher REL if major QTLs exist or >3000 bulls genotyped, lower if more loci (>100) affect trait
  Reliability = accuracy^2

Reliability from Genotyping
  Daughter equivalents
    DE_Total = DE_PA + DE_Prog + DE_YD + DE_G
    DE_G is additional DE from genotype
    REL = DE_Total / (DE_Total + k)
  Gains in reliability
    DE_G could be about 15 for Net Merit
    More for traits with low heritability
    Less for traits with high heritability
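
A short worked example of the daughter-equivalent formula follows. Only the formula and the value of about 15 for DE_G come from the slide; the parent-average daughter equivalents (7) and the constant k (14) are hypothetical numbers picked to make the arithmetic easy to follow.

```python
def reliability(de_total, k):
    """REL = DE_Total / (DE_Total + k)."""
    return de_total / (de_total + k)

K = 14.0        # hypothetical value of k, for illustration only
DE_PA = 7.0     # hypothetical daughter equivalents from parent average
DE_G = 15.0     # approximate genotype contribution for Net Merit (from the slide)

# A young bull with no progeny or own records: DE_Prog = DE_YD = 0.
rel_without = reliability(DE_PA, K)          # 7 / 21
rel_with = reliability(DE_PA + DE_G, K)      # 22 / 36

print(f"REL without genotype: {rel_without:.3f}")
print(f"REL with genotype:    {rel_with:.3f}")
```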

Conclusions
  Predictions from 50,000 SNPs using:
    Selection index equations, or
    Iteration on genotype data
    Predictions correlated by up to .9999
  Linear and nonlinear costs OK
    Convergence within 200 to 2500 rounds
  Nonlinear regression improved reliabilities
  Real data predictions available soon