Big data in support of genetic improvement of dairy cattle
George R. Wiggans
Animal Improvement Programs Laboratory, Agricultural Research Service, USDA, Beltsville, MD, USA
ARS Big Data Computing Workshop, 2013

Mission
• Genetic improvement of dairy cattle for economically important traits
  ◦ Yield (milk, fat, and protein)
  ◦ Conformation (overall and individual traits)
  ◦ Longevity (productive life)
  ◦ Fertility (conception and pregnancy rates)
  ◦ Calving (dystocia and stillbirth)
  ◦ Disease resistance (mastitis)

Data types
• Identification information for animal:
  ◦ Name
  ◦ ID number
  ◦ Birth date
  ◦ Sire
  ◦ Dam
  ◦ Breed
  ◦ Herd
  ◦ Country
• Animal genotypes from marker panels that range from 2,900 to 777,962 markers
[Image courtesy of Illumina, Inc.]

Data types (continued)
• Records for milk yield, fat percentage, protein percentage, and somatic cell count (1/month)
• Appraiser-assigned scores for 16 body and udder characteristics related to conformation (e.g., stature)
• Breeding records that include indicator for conception success
• Calving difficulty scores and stillbirth indication

Data amounts
• 68,270,792 identification records
• 334,402 animal genotypes
• 142,157,859 lactation records (since 1960)
• 558,425,959 daily yield records (since 1990)
• 139,043,355 reproduction event records
• 25,223,471 calving difficulty scores
• 21,971,890 stillbirth scores

Computing environment
• Computation server
  ◦ 2.3–2.7 GHz CPU (32 cores, 64 threads)
  ◦ 256 GB RAM
  ◦ 5 TB local storage
• Database server
  ◦ 3.0 GHz CPU (8 cores)
  ◦ 40 GB RAM
  ◦ 2 TB local storage
• Shared storage
  ◦ 19 TB

Data management
• Variable length segments for database rows to minimize space and overhead in identifying data
• All marker genotypes for an animal stored each as a single byte in a character large object (CLOB)
• All breedings and monthly milk yield and component information for a cow's lactation stored in variable character data types

Programming languages
• C
  ◦ Database interface including data editing
• FORTRAN
  ◦ Calculation of genetic merit estimates
• SAS
  ◦ Data preparation, checking, and delivery

Calculation schedule
• Triannual genetic merit estimates from processed phenotypic data
• Monthly genomic evaluations based on estimates of marker effects using genotypic data and triannual phenotype-based evaluations
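As a rough illustration of how an evaluation can be built from marker effect estimates, the sketch below sums marker effects weighted by centered allele counts. This is a generic additive model and not necessarily the model used in the evaluations described here; the function name, the centering by allele frequency, and the handling of missing calls are all assumptions.

```python
def direct_genomic_value(genotypes, marker_effects, allele_freqs):
    """Sum marker effects weighted by centered allele counts.

    genotypes:      allele counts per marker (0, 1, 2, or None if missing)
    marker_effects: estimated effect of one allele copy at each marker
    allele_freqs:   frequency of the counted allele at each marker
    """
    total = 0.0
    for g, a, p in zip(genotypes, marker_effects, allele_freqs):
        if g is None:
            continue  # assumption: a missing call is treated as the mean
        total += (g - 2.0 * p) * a
    return total


# Illustrative values only, not real estimates.
example = direct_genomic_value(
    genotypes=[2, 1, 0],
    marker_effects=[0.5, -0.2, 0.1],
    allele_freqs=[0.5, 0.5, 0.5],
)
```

A calculation of this shape only needs the stored marker effects and the new genotypes, which is one reason a monthly genomic run can be much lighter than a full phenotype-based evaluation.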

Transition to industry
• Council on Dairy Cattle Breeding
  ◦ Database maintenance
  ◦ Calculation and distribution of genetic merit estimates
• ARS
  ◦ Research and development using data made available by Council
• Adjacent work areas planned

Research resource
• Massive amount of genomic data
  → Location of causal genetic variants
• Investigation of haplotypes never found in a homozygous state
  → Discovery of chromosomal abnormalities resulting in early embryonic death
• Investigation of sons of heterozygous sires
  → Detection of QTL from differences between sons by haplotype
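The search for haplotypes never found in a homozygous state can be sketched as a comparison of observed and expected homozygote counts. The slides do not describe the test actually used, so this example assumes random mating (expected homozygotes = number of animals × haplotype frequency squared); the function name and input layout are hypothetical.

```python
from collections import Counter


def homozygote_deficit(animals):
    """Compare observed and expected homozygote counts per haplotype.

    animals: list of (hap_a, hap_b) tuples for one chromosome segment.
    Returns {haplotype: (observed_homozygotes, expected_homozygotes)},
    where the expectation assumes random mating.
    """
    n = len(animals)
    copies = Counter()    # copies of each haplotype in the population
    observed = Counter()  # animals carrying two copies of a haplotype
    for a, b in animals:
        copies[a] += 1
        copies[b] += 1
        if a == b:
            observed[a] += 1
    result = {}
    for hap, count in copies.items():
        freq = count / (2 * n)
        result[hap] = (observed[hap], n * freq * freq)
    return result


# Toy data: haplotype "B" is carried by 40 of 100 animals, never twice.
animals = [("A", "A")] * 60 + [("A", "B")] * 40
result = homozygote_deficit(animals)
```

In this toy data set, "B" has about 4 expected homozygotes and none observed. With real data, a haplotype becomes a candidate only when the expected count is large enough that observing zero is very unlikely by chance.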

Summary
• Highly successful program leading to annual increases in genetic merit for production efficiency
• Large database of phenotypic and genomic data provided by industry
• Big data supports research to determine mechanism of genetic control of economically important traits
• Data processing techniques developed to meet needs of industry