Published by Eugenia Paul. Modified over 6 years ago.
1
Predicting survivors of Neonatal calf diarrhea (NCD) using Logistic Regression or Gradient Boosting
Stefano Biffani*, Cesare Lubiano1, Davide Pravettoni1 and Antonio Boccardo1
* IBBA-CNR (UOS-Lodi); 1 Clinic for Ruminants, Swine and Management, Large Animal Veterinary Teaching Hospital (UNIMI-LODI)
2
Background
Neonatal calf diarrhea (NCD) is a common disease affecting newborn calves.
3
Background
Affected by neonatal calf diarrhea (NCD) within the first 2 weeks: 25-35%
4
Background
Clinical problems: NCD 60-80%
5
Consequences: over $250 million/year
Veterinary costs, reduced growth, death
6
The idea
LODI - Large Animal Veterinary Teaching Hospital: NCD, clinical data, ancillary tests
7
The idea
Features:
Blood gas analysis (pH, Ht, Na+, K+, Cl-, anion gap, pCO2, bicarbonate, base excess)
Total protein
Dehydration and vitality score
Rectal temperature
Age, hospitalization month, sex
8
The idea
Features
Statistical learning: tools for understanding data
Will it survive?
9
Data
131 Holstein calves (males = 16, females = 115) affected by NCD, hospitalized from January 2006 until August 2014
Proportion deceased = 34% (45/131)
10
Methods
Logistic Regression (LR): parametric method
Models the probability that the response variable (survived YES/NO) belongs to a particular category
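As a rough illustration of how logistic regression turns feature values into a probability, here is a minimal sketch. The function name, the two features and the coefficient values are all hypothetical; they are not the fitted model from this study.

```python
import math

def survival_probability(features, coefficients, intercept):
    """Map a linear score to a probability in (0, 1) with the logistic function."""
    # Linear predictor: intercept + sum of coefficient * feature value
    score = intercept + sum(c * x for c, x in zip(coefficients, features))
    # Numerically stable form of 1 / (1 + exp(-score))
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)

# Hypothetical coefficients for two made-up, standardized features
coefficients = [0.8, -0.5]
intercept = 0.2
p = survival_probability([1.0, 0.5], coefficients, intercept)

# A 0.5 cut-off is assumed here to turn the probability into a class
predicted_class = "survived" if p >= 0.5 else "deceased"
```

The cut-off of 0.5 is only a common default; a different threshold would trade sensitivity against specificity.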
11
Methods
Gradient Boosting Machine (GBM): non-parametric method
The learning procedure consecutively fits new models (trees) to provide a more accurate estimate of the response variable
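The "consecutively fits new models" idea can be sketched in a few lines. This is a simplified stand-in, not the GBM used in the study: it uses one-split trees (stumps), a single feature and squared-error loss, whereas a GBM for a survived/deceased outcome would normally use a classification loss. The data and all names are made up for illustration.

```python
def fit_stump(xs, residuals):
    """Find the single-threshold split on x that minimizes squared error."""
    best = None
    # Excluding the largest x keeps both sides of every split non-empty
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def boost(xs, ys, n_trees=20, learning_rate=0.3):
    """Each new stump is fitted to the residuals left by the previous ones."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps, training_mse = [], []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x) for x, p in zip(xs, predictions)]
        training_mse.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))

    def model(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return model, training_mse

# Toy 1-D data, made up for illustration (1 = survived, 0 = deceased)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 1, 0, 1]
model, training_mse = boost(xs, ys)
```

In this sketch the training error can only go down as trees are added, which is why training performance alone says little about how the model behaves on new calves.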
12
Methods
13
Methods
Building the predictive model: split the data into
Training dataset (75%, n=92, incidence = 34%)
Testing dataset (25%, n=39, incidence = 36%)
Use the training data to build the predictive model with a 5-fold cross-validation scheme
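The split and cross-validation scheme above can be sketched as follows. The helper names and the random seed are assumptions, and the actual partitioning procedure used in the study is not stated on the slides. Since 92 of 131 is about 70%, the sketch uses a test fraction of 39/131 to reproduce the reported counts rather than the nominal 25%.

```python
import random

def stratified_split(labels, test_fraction, seed=0):
    """Split indices into train/test while keeping class proportions similar."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return sorted(train_idx), sorted(test_idx)

def k_fold(indices, k=5, seed=0):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    rng = random.Random(seed)
    shuffled = indices[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield training, validation

# Outcome counts from the Data slide: 45 deceased (1) and 86 survived (0)
labels = [1] * 45 + [0] * 86
train_idx, test_idx = stratified_split(labels, test_fraction=39 / 131)
folds = list(k_fold(train_idx, k=5))

# Proportion deceased in each part of this particular split
train_incidence = sum(labels[i] for i in train_idx) / len(train_idx)
test_incidence = sum(labels[i] for i in test_idx) / len(test_idx)
```

Each of the five folds serves once as the validation set while the model is fitted on the other four; the testing dataset is left untouched until the final evaluation.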
14
Methods
Finally: use the obtained model to predict the probability of the disease outcome (dead/survived) in the testing data
15
Statistics
Accuracy (AC): proportion of all predictions that were correct
Sensitivity or True Positive Rate (TPR): proportion of surviving calves correctly predicted as survived
Specificity or True Negative Rate (TNR): proportion of deceased calves correctly predicted as deceased
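The three statistics can be computed from paired lists of actual and predicted outcomes. This is a minimal sketch with "survived" treated as the positive class, as on the slide; the function name and the five example calves are made up.

```python
def classification_stats(actual, predicted, positive="survived"):
    """Accuracy, true positive rate and true negative rate for a binary outcome."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    return {
        "accuracy": (tp + tn) / len(pairs),  # all correct predictions
        "tpr": tp / (tp + fn),               # survivors predicted as survived
        "tnr": tn / (tn + fp),               # deceased predicted as deceased
    }

# Five hypothetical calves, for illustration only
actual = ["survived", "survived", "survived", "deceased", "deceased"]
predicted = ["survived", "survived", "deceased", "deceased", "survived"]
stats = classification_stats(actual, predicted)
```

Because deceased calves are the minority class here, accuracy alone can look acceptable even when TNR is poor, which is why all three are reported.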
16
Results - Training
Statistic   LR     GBM    Meaning
Accuracy    0.84   1.00   How often it worked well
TPR         0.83   1.00   How many truly surviving calves it predicted
TNR         0.65   1.00   How many truly deceased calves it predicted
17
Results - Testing
Statistic   LR     GBM    Meaning
Accuracy    0.70   0.54   How often it worked well
TPR         0.74   0.63   How many truly surviving calves it predicted
TNR         0.58   0.33   How many truly deceased calves it predicted
18
Results - Consideration
What happened? Overfitting.
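One simple way to see the overfitting is to compare the accuracy reported on the training and testing slides. The sketch below only does that subtraction; the variable names are illustrative.

```python
# Accuracy values as reported on the training and testing results slides
accuracy = {
    "LR": {"train": 0.84, "test": 0.70},
    "GBM": {"train": 1.00, "test": 0.54},
}

# A large drop from training to testing accuracy suggests overfitting
gaps = {name: round(a["train"] - a["test"], 2) for name, a in accuracy.items()}

for name, gap in gaps.items():
    print(f"{name}: train-test accuracy gap = {gap:.2f}")
```

The gap for GBM (0.46) is more than three times the gap for LR (0.14), consistent with a model that memorized the 92 training calves.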
19
Results - Consideration
GBM did a fantastic job on the training data but failed on the testing data. Why?
Size of the training data
Remedy: reduce complexity (number of trees)
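A common way to choose the number of trees is to track the error on held-out data after each boosting round and keep the round where it is lowest. The sketch below shows this on synthetic data with squared-error stumps; it is an assumption-laden stand-in, not the tuning procedure used in the study, and every name and number in it is hypothetical.

```python
import random

def fit_stump(xs, rs):
    """Least-squares split of residuals rs on a single threshold of xs."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, rs) if x <= t]
        right = [r for x, r in zip(xs, rs) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

def validation_curve(train, valid, n_trees=50, learning_rate=0.5):
    """Held-out error after each boosting round (stumps fitted on train only)."""
    (xt, yt), (xv, yv) = train, valid
    base = sum(yt) / len(yt)
    pt, pv = [base] * len(yt), [base] * len(yv)
    errors = []
    for _ in range(n_trees):
        stump = fit_stump(xt, [y - p for y, p in zip(yt, pt)])
        pt = [p + learning_rate * stump(x) for x, p in zip(xt, pt)]
        pv = [p + learning_rate * stump(x) for x, p in zip(xv, pv)]
        errors.append(mse(yv, pv))
    return errors

rng = random.Random(0)

def noisy_sample(n):
    """Synthetic data: a step at x = 5 plus Gaussian noise (made up)."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [(1.0 if x > 5 else 0.0) + rng.gauss(0, 0.4) for x in xs]
    return xs, ys

errors = validation_curve(noisy_sample(40), noisy_sample(40))
best_n_trees = errors.index(min(errors)) + 1
```

With small, noisy samples the held-out error typically stops improving well before the last round, so a smaller number of trees is usually selected than the maximum allowed.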
20
Results - Testing after reducing complexity (number of trees)
Statistic   GBM    GBMn   Meaning
Accuracy    0.54   0.62   How often it worked well
TPR         0.63   0.69   How many truly surviving calves it predicted
TNR         0.33   0.46   How many truly deceased calves it predicted
21
Results - Consideration
Feature importance
22
Conclusions
Learning from data!
It is possible to build a predictive model for NCD.
Accuracy depends on training sample size and method.
Total protein and anion gap are important features.
23
Thanks