Investigation of Herd-Years with Abnormal Distributions of Calving Ease Scores
C. P. Van Tassell, G. R. Wiggans, and L. L. M. Thornton
Animal Improvement Programs Laboratory, Agricultural Research Service, USDA, Beltsville, MD
The Problem
- Herds with unusual distributions of data affect evaluations of bulls
- The worst case is when a large share of a bull's records are in one “bad” herd
- Herd reporting changes over time
[Figure: Percentage of Score by Parity in All Herds. Axes: Calving Ease Score; Counts by Herd-Parity (%). Series: Parity 1, Parity 2+.]
Example of a Problem Bull
Frequency of CE Scores by herd for HOUSA0000XXXXXXXX
Herd ... Total
(100)   0 (  0)   0 (  0)   0 (  0)   0 (  0)   2 (  1)
(100)   0 (  0)   0 (  0)   0 (  0)   0 (  0)   1 (  1)
(100)   0 (  0)   0 (  0)   0 (  0)   0 (  0)   1 (  1)
(100)   0 (  0)   0 (  0)   0 (  0)   0 (  0)   1 (  1)
( 18)   0 (  0)   5 (  8)   1 (  2)  48 ( 73)  66 ( 34)
( 57) 467 ( 19) 410 ( 17)  76 (  3)  78 (  3)
( 14)   2 ( 29)   4 ( 57)   0 (  0)   0 (  0)   7 (  4)
(100)   0 (  0)   0 (  0)   0 (  0)   0 (  0)   4 (  2)
( 67)   1 ( 33)   0 (  0)   0 (  0)   0 (  0)   3 (  2)
(100)   0 (  0)   0 (  0)   0 (  0)   0 (  0)   1 (  1)
(  0)   1 ( 50)   0 (  0)   1 ( 50)   0 (  0)   2 (  1)
( 67)   2 ( 33)   0 (  0)   0 (  0)   0 (  0)   6 (  3)
Concept
- Identify ‘outlier’ herds
- Remove that data
- Determine if the evaluation is ‘better’
- Trade-off between edits for bad data and overall loss of data
Test Edits
- Exclude herds with abnormal distributions of scores
- Abnormal herds defined by multinomial likelihood
- Population frequencies for parity groups (1 vs. 2+) used for expected values
- Herd test statistics calculated within parity (1 vs. 2+) and summed
GOF Statistics
- Multinomial distribution likelihood ratio, with the ‘expected’ distribution adjusted for herd size
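The goodness-of-fit statistic described above can be sketched as a multinomial likelihood-ratio (G) statistic, G = 2 * sum(O * ln(O / E)), computed within each parity group against population frequencies and then summed over parity groups. This is only an illustrative sketch, not the authors' implementation: the function names and the example frequencies are hypothetical, and the herd-size adjustment mentioned on the slide is not specified in the source, so it is left out.

```python
import math

def g_statistic(observed, expected_freq):
    """Multinomial likelihood-ratio statistic for one parity group.

    observed      -- counts of calving ease scores 1..5 in the herd
    expected_freq -- population frequencies for the same scores (sum to 1)
    """
    n = sum(observed)
    total = 0.0
    for o, p in zip(observed, expected_freq):
        if o == 0:
            continue  # a zero count contributes nothing to the sum
        if p == 0:
            return math.inf  # observed a score the population never shows
        total += o * math.log(o / (n * p))
    return 2.0 * total

def herd_gof(counts_by_parity, pop_freq_by_parity):
    """Sum the within-parity statistics (parity 1 and parity 2+) for a herd."""
    return sum(
        g_statistic(counts, pop_freq_by_parity[parity])
        for parity, counts in counts_by_parity.items()
    )

# Hypothetical population frequencies, for illustration only.
POP_FREQ = {
    "1": [0.5, 0.3, 0.1, 0.05, 0.05],
    "2+": [0.5, 0.3, 0.1, 0.05, 0.05],
}

# A herd that reports every first-parity calving as score 1.
suspect_herd = {"1": [10, 0, 0, 0, 0], "2+": [50, 30, 10, 5, 5]}
gof = herd_gof(suspect_herd, POP_FREQ)
```

A herd whose counts match the population proportions exactly contributes zero, so in this example only the parity 1 group adds to the statistic. Larger values indicate a distribution further from the population.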
Predictability of Future Evaluations
- Compare evaluations from complete data to evaluations from partial data
- Partial data truncated by:
  - Date of calving
  - Goodness of Fit (GOF) exclusion
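One way to make the comparison above concrete is to correlate bull evaluations from a partial data set with those from the complete data, once for partial data truncated by calving date alone and once with GOF exclusion also applied. The slides do not state which comparison metric was used, so the Pearson correlation here is an assumption, and all names and values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

def predictability(partial_eval, complete_eval):
    """Correlate evaluations for bulls present in both runs.

    Both arguments map a bull identifier to its evaluation.
    Returns (correlation, number of bulls compared).
    """
    common = sorted(set(partial_eval) & set(complete_eval))
    if len(common) < 2:
        raise ValueError("need at least two bulls in common")
    xs = [partial_eval[b] for b in common]
    ys = [complete_eval[b] for b in common]
    return pearson(xs, ys), len(common)

# Hypothetical evaluations; bull "D" lost its evaluation in the complete run.
partial = {"A": 8.0, "B": 9.0, "C": 10.0, "D": 7.0}
complete = {"A": 8.5, "B": 9.5, "C": 10.5}
r, n_bulls = predictability(partial, complete)
```

Reporting the number of bulls compared alongside the correlation matters here, because exclusion edits can remove some bulls' evaluations entirely.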
Strategy for Herd Exclusions
- Adjacent herd-years are also excluded if they exceed a less extreme threshold
- 5-fold difference in likelihood
- A future evaluation could potentially have fewer records than a previous run!
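The exclusion rule above can be sketched as a two-threshold pass over one herd's yearly GOF values: a herd-year is dropped when it exceeds the main threshold, and a neighbouring year is dropped too when it exceeds a less extreme one. This is a sketch under assumptions: the function name and the threshold values are hypothetical, the slides do not give the actual cutoffs, and "adjacent" is taken here to mean the calendar year immediately before or after.

```python
def flag_herd_years(gof_by_year, strict, relaxed):
    """Return the set of years to drop for a single herd.

    gof_by_year -- maps calving year to that herd-year's GOF statistic
    strict      -- main exclusion threshold
    relaxed     -- less extreme threshold applied only to adjacent years
    """
    if relaxed > strict:
        raise ValueError("relaxed threshold must not exceed strict threshold")

    # Years that fail on their own merits.
    primary = {year for year, gof in gof_by_year.items() if gof > strict}

    dropped = set(primary)
    for year in primary:
        for neighbour in (year - 1, year + 1):
            gof = gof_by_year.get(neighbour)
            # Neighbours are held to the less extreme threshold.
            if gof is not None and gof > relaxed:
                dropped.add(neighbour)
    return dropped

# Hypothetical GOF values and thresholds, for illustration only.
herd = {2000: 1.0, 2001: 6.0, 2002: 20.0, 2003: 4.0, 2004: 7.0}
to_drop = flag_herd_years(herd, strict=10.0, relaxed=5.0)
```

In this example 2004 exceeds the relaxed threshold but is kept, because it is not next to a year that failed the main threshold. Since each new run re-tests every herd-year, a herd-year kept in one run can be dropped in the next, which is how a later evaluation can end up with fewer records.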
Example Herd 1
year  c1_1  c1_2  c1_3  c1_4  c1_5  sumh1  c2_1  c2_2  c2_3  c2_4  c2_5  sumh2  gof  drop
Example Herd 2
year  c1_1  c1_2  c1_3  c1_4  c1_5  sumh1  c2_1  c2_2  c2_3  c2_4  c2_5  sumh2  gof  drop
[Figure: Percentage of Score by Parity in All (AN) and GOF4 Excluded (AG) Herds. Axes: Calving Ease Score; Counts by Herd-Parity (%). Series: Parity 1 - AN, Parity 2 - AN, Parity 1 - AG, Parity 2 - AG.]
Conclusions
- The GOF test excludes herds with poor score distributions uniformly across herd size
- Exclusion of herds results in loss of evaluations for some bulls
- Exclusion of data is expected to improve run-to-run stability
Remaining Issues
- Optimum amount of data to exclude
- Evaluate different fractions of data removal
- Recently submitted a test run to Interbull with 1.5% of data excluded
- Will likely move to 7% of data discarded
- Will conduct a sensitivity analysis to assess the optimal data discard
- Current Interbull test run for calving ease
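Evaluating different fractions of data removal requires turning a target fraction (such as the 1.5% or 7% mentioned above) into a GOF cutoff. The sketch below shows one possible way to do that by dropping the worst-fitting herd-years first until the record budget is used up. It is an assumption about procedure, not the authors' method, and the function name and example values are hypothetical.

```python
import math
from collections import defaultdict

def cutoff_for_fraction(herd_years, target_fraction):
    """Find a GOF cutoff that discards at most target_fraction of records.

    herd_years -- iterable of (gof, n_records) pairs, one per herd-year
    Returns (cutoff, fraction_dropped); herd-years with gof >= cutoff are
    dropped. The cutoff is math.inf when nothing fits within the budget.
    """
    # Pool herd-years sharing a GOF value so ties are dropped together.
    records_at = defaultdict(int)
    for gof, n_records in herd_years:
        records_at[gof] += n_records

    total = sum(records_at.values())
    if total == 0:
        raise ValueError("no records supplied")

    budget = target_fraction * total
    dropped = 0
    cutoff = math.inf
    for gof in sorted(records_at, reverse=True):  # worst fit first
        if dropped + records_at[gof] > budget:
            break
        dropped += records_at[gof]
        cutoff = gof
    return cutoff, dropped / total

# Hypothetical herd-years: (GOF statistic, number of records).
data = [(30.0, 10), (12.0, 40), (5.0, 150), (1.0, 800)]
cutoff_small, frac_small = cutoff_for_fraction(data, 0.015)
cutoff_large, frac_large = cutoff_for_fraction(data, 0.07)
```

Because whole herd-years are dropped, the achieved fraction can fall below the target. A sensitivity analysis would repeat the evaluation at each cutoff and compare the resulting stability.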
Frequency of Codes in Combined Interbull File
Code  Source            Official Report  Frequency  Percent  Cumulative Frequency  Cumulative Percent

Sire Calving Ease
C     From correlation  No
D     Domestic          No               15, ,
D     Domestic          Yes              26, ,
I     Interbull         Yes              22, ,
P     Sire MGS Indices  Yes              43, ,

Daughter Calving Ease
C     From correlation  Yes              10, ,
D     Domestic          No               15, ,
D     Domestic          Yes              26, ,
I     Interbull         Yes              17, ,
P     Sire MGS Indices  Yes              43, ,