Jeff O’ConnellInterbull annual meeting, Orlando, FL, July 2015 (1) J. R. O’Connell 1 and P. M. VanRaden 2 1 University of Maryland School of Medicine, Baltimore, MD, USA 2 Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, Beltsville, MD, USA Strategies to choose from millions of imputed sequence variants
Jeff O’ConnellInterbull annual meeting, Orlando, FL, July 2015 (2) Introduction l Whole-genome sequencing costs are rapidly declining l Large numbers of animals will be sequenced in the next few years l How will WGS be used in genomic evaluations?
Jeff O’ConnellInterbull annual meeting, Orlando, FL, July 2015 (3) Applications of WGS data l Develop reference panels for imputation using SNP chip l Discover QTLs l Discover rare and de novo mutations with deleterious effects for surveillance l Build better SNP chips
Jeff O’ConnellInterbull annual meeting, Orlando, FL, July 2015 (4) WGS implementation considerations l Cost of higher density SNP chips to industry l Imputation cost for weekly/monthly genomic evaluations l Redesigning chip incurs cost of generating new data for imputation w Build off the 60K chip
Jeff O’ConnellInterbull annual meeting, Orlando, FL, July 2015 (5) Project goal: Prepare for WGS l Develop programs to simulate genotypes and QTLs l Develop programs to process sequence data l Compare strategies for variant selection l Predict gains in reliability l Determine computational footprint for large populations
Jeff O’ConnellInterbull annual meeting, Orlando, FL, July 2015 (6) Data l 26,984 HOL bulls in U.S. reference population l 112,905 animals in the pedigree l 30 million simulated variants, 10,000 QTLs l 30 equal-length chromosomes (100 Mbases) l 3 different chip densities (HD, MD, LD) l 5 independent traits (same QTL locations)
Jeff O’ConnellInterbull annual meeting, Orlando, FL, July 2015 (7) Genotype simulation and pruning l Simulate 30 million genotypes plus QTLs in 1000 bulls using genosim.f90 l Remove loci with MAF 0.95 l Keep 500K loci near QTLs (genes) + 600K chip + 60K chip l 8,403,858 loci remain
Jeff O’ConnellInterbull annual meeting, Orlando, FL, July 2015 (8) QTL effect distribution (% variance) Top QTLs Trait Avg , ,
Jeff O’ConnellInterbull annual meeting, Orlando, FL, July 2015 (9) QTL correlation to 8M SNP set
Jeff O’ConnellInterbull annual meeting, Orlando, FL, July 2015 (10) Bull genotypes: Imputation findhap.f90 BullsGenotype densityVariants 1,000Sequenced pruned to8,403, High density600,000 24,863Medium density60, Low density12,000 26,984Total bulls, all imputed to8,403,858 17,896Phenotyped (old)DYD 9,088Validation (young)True BV
Jeff O’ConnellInterbull annual meeting, Orlando, FL, July 2015 (11) Computation required Step Proc- essors Time (hours) Memory (Gbyte) Disk (Gbyte) Simulate 30M Prune linkage Impute 8M
Jeff O’ConnellInterbull annual meeting, Orlando, FL, July 2015 (12) Study1: Select 25K from 600K chip l Compare 60K and 600K REL using same data l Choose best markers from 600K, add to 60K w Select largest 5,000 for each of 5 traits w Multiple regression: By effect size, effect variance w GWA: By p-value significance w Merge and remove duplicates, adding 23K
Jeff O’ConnellInterbull annual meeting, Orlando, FL, July 2015 (13) Results: REL from PA, 60K, 85K, 600K 60K + 25K TraitPA60KGWASizeVar600K Avg
Jeff O’ConnellInterbull annual meeting, Orlando, FL, July 2015 (14) Genome-wide association l Single SNP regression within mixed model that accounts for the polygenic effect using pedigree kinship w Y= Xb + SNP + a + e l Multiple testing adjustment leads to stringent cutoffs for significance l Focus on estimation rather than prediction
Jeff O’ConnellInterbull annual meeting, Orlando, FL, July 2015 (15) GWA trait 2 imputed QTLs <5% QTL < 10e-5
Jeff O’ConnellInterbull annual meeting, Orlando, FL, July 2015 (16) Study 2: Select sequence variants l 8M – search everywhere l 1M – search by bioinformatics using 2.5kb window on QTL l QTLs – perfect functional knowledge l Include 60K chip for imputation, but assign all or most prior variance to the selected variants w Selection of variants may affect optimal shape/ choice of prior distribution
Jeff O’ConnellInterbull annual meeting, Orlando, FL, July 2015 (17) REL from 1M, 60K+1M subset, QTLs 1 million near QTLsInclude QTLs Trait600K60K+25KAll 1M60K+10K10K Avg
Jeff O’ConnellInterbull annual meeting, Orlando, FL, July 2015 (18) Computation required Step Proc- essors Time (hours) Memory (Gbyte) Disk (Gbyte) Simulate 30M Prune linkage Impute 8M Predict 1M52220<1 Select 25K from 8M 300.5<1
Jeff O’ConnellInterbull annual meeting, Orlando, FL, July 2015 (19) Conclusions l Selection of variants requires several steps l Potential gains from sequence are large l Still requires very large reference population l Methods scalable to more sequences l Bioinformatics can narrow the search l GWA is scalable to larger data but multiple regression give higher reliability
Jeff O’ConnellInterbull annual meeting, Orlando, FL, July 2015 (20) Current directions l Evaluate several additional SNP selection criteria including priors for GWA l Apply methods to real data from 1000 Bulls Project l Evaluate bovine bioinformatics resources for functional annotation
Jeff O’ConnellInterbull annual meeting, Orlando, FL, July 2015 (21) Acknowledgments l JRO supported by USDA SCA