Predicting survivors of Neonatal calf diarrhea (NCD) using Logistic Regression or Gradient Boosting Stefano Biffani*, Cesare Lubiano1, Davide Pravettoni1.


1 Predicting survivors of Neonatal calf diarrhea (NCD) using Logistic Regression or Gradient Boosting
Stefano Biffani*, Cesare Lubiano1, Davide Pravettoni1 and Antonio Boccardo1 *IBBA-CNR (UOS-Lodi), 1 Clinic for ruminants, swine and management - Large Animal Veterinary Teaching Hospital (UNIMI-LODI)

2 Background Neonatal calf diarrhea (NCD) is a common disease affecting the newborn calf.

3 Background Affected by neonatal calf diarrhea (NCD) within the first 2 weeks: 25-35%

4 Background Clinical problems NCD 60-80 %

5 Consequences Over $250 million/year: veterinary costs, reduced growth, death

6 The idea LODI - Large Animal Veterinary Teaching Hospital: NCD, clinical data, ancillary tests

7 The idea Features: blood gas analysis (pH, Ht, Na+, K+, Cl-, anion gap, pCO2, bicarbonate, base excess); total protein; dehydration and vitality scores; rectal temperature; age, hospitalization month, sex

8 The idea Features, then statistical learning (tools for understanding data), to answer: will it survive?

9 Data 131 Holstein calves (males = 16, females = 115) affected by NCD, hospitalized from January 2006 until August 2014
Proportion deceased = 34% (45/131)

10 Methods Logistic Regression (LR): Parametric method
Models the probability that the response variable (survive YES/NO) belongs to a particular category
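
As a minimal sketch of what "models the probability" means, the logistic function maps a weighted sum of features to a value between 0 and 1. The feature names, coefficients and intercept below are hypothetical placeholders chosen for illustration; they are not estimates from this study.

```python
import math


def survival_probability(features, coefficients, intercept):
    """Logistic model: P(survive) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(coefficients[name] * value
                             for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients and calf record (assumptions, not fitted values).
coefficients = {"total_protein": 0.5, "anion_gap": -0.1}
calf = {"total_protein": 6.0, "anion_gap": 20.0}

# Linear predictor: -1.0 + 0.5 * 6.0 - 0.1 * 20.0 = 0.0, so the probability is 0.5.
p = survival_probability(calf, coefficients, intercept=-1.0)
```

In a fitted model the coefficients would be estimated from the training data; the parametric form itself stays the same.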

11 Methods Gradient Boosting Machine (GBM): Non-parametric method
The learning procedure consecutively fits new models (trees) to provide a more accurate estimate of the response variable
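
The idea of consecutively fitting new trees can be sketched with one-split trees (stumps) and squared-error loss. This is a simplification for illustration only: the slides do not state the loss, tree depth or learning rate used, and the single feature and labels below are toy values.

```python
def fit_stump(xs, residuals):
    """Find the single threshold on xs that best fits the residuals (least squares)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def fit_boosting(xs, ys, n_trees, learning_rate=0.5):
    """Each new stump is fitted to the residuals left by the previous ones."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return {"base": base, "learning_rate": learning_rate, "stumps": stumps}


def predict(model, x):
    total = model["base"]
    for threshold, left_mean, right_mean in model["stumps"]:
        total += model["learning_rate"] * (left_mean if x <= threshold else right_mean)
    return total


def training_mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Toy data (assumption): one feature, outcome 1 = survived, 0 = deceased.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 1, 0, 1]
one_tree = fit_boosting(xs, ys, n_trees=1)
many_trees = fit_boosting(xs, ys, n_trees=20)
```

Comparing `training_mse(one_tree, xs, ys)` with `training_mse(many_trees, xs, ys)` shows the training error shrinking as trees are added, which is also why a large number of trees can fit the training data almost perfectly.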

12 Methods

13 Methods Building the predictive model: Split the data into
Training dataset (75%, n=92, incidence = 34%)
Testing dataset (25%, n=39, incidence = 36%)
Use the training data to build the predictive model using a 5-fold cross-validation scheme
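
The split and the 5-fold scheme can be sketched as follows. The helper names `stratified_split` and `k_folds`, the seed and the rounding rule are assumptions made for this illustration; the resulting sizes are not meant to reproduce the n=92 and n=39 reported on the slide.

```python
import random


def stratified_split(labels, test_fraction, seed=0):
    """Split indices into train/test, keeping class proportions similar in both."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for value in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == value]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return sorted(train_idx), sorted(test_idx)


def k_folds(indices, k=5, seed=0):
    """Yield (training, validation) index lists; each index is validated exactly once."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        training = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield training, folds[i]


# Toy labels with the class counts from the Data slide: 45 deceased (0), 86 survived (1).
labels = [0] * 45 + [1] * 86
train_idx, test_idx = stratified_split(labels, test_fraction=0.25)

for fold_train, fold_valid in k_folds(train_idx, k=5):
    # A model would be fitted on fold_train and evaluated on fold_valid here.
    pass
```

The testing indices are set aside before cross-validation starts, so they play no part in building the model.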

14 Methods Finally: Use the obtained model to predict the probability of the disease outcome (dead/survived) in the testing data

15 Statistics
Accuracy (AC): proportion of all predictions that were correct
Sensitivity or True Positive Rate (TPR): proportion of surviving calves correctly predicted as survived
Specificity or True Negative Rate (TNR): proportion of deceased calves correctly predicted as deceased
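
A small worked example of the three statistics, with "survived" treated as the positive class as on the slide. The five calves below are invented for illustration.

```python
def classification_stats(actual, predicted, positive="survived"):
    """Compute accuracy, TPR and TNR from paired actual/predicted labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "tpr": tp / (tp + fn),  # survivors correctly predicted as survived
        "tnr": tn / (tn + fp),  # deceased correctly predicted as deceased
    }


# Invented example: 3 calves survived, 2 died.
actual = ["survived", "survived", "survived", "deceased", "deceased"]
predicted = ["survived", "survived", "deceased", "deceased", "survived"]
stats = classification_stats(actual, predicted)
# accuracy = 3/5, tpr = 2/3, tnr = 1/2
```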

16 Results - Training
Statistics   LR     GBM    Meaning
Accuracy     0.84   1.00   How many times it worked well
TPR          0.83   1.00   How many of the calves that really survived it predicted
TNR          0.65   1.00   How many of the calves that really deceased it predicted

17 Results - Testing
Statistics   LR     GBM    Meaning
Accuracy     0.70   0.54   How many times it worked well
TPR          0.74   0.63   How many of the calves that really survived it predicted
TNR          0.58   0.33   How many of the calves that really deceased it predicted

18 Results - Consideration
What happened? Overfitting
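
One simple way to see the overfitting is the drop from training to testing performance. The sketch below only subtracts the values reported on the two results slides; the dictionary layout and the `gap` helper are illustrative choices.

```python
# Values copied from the training and testing results slides.
training = {
    "LR": {"accuracy": 0.84, "tpr": 0.83, "tnr": 0.65},
    "GBM": {"accuracy": 1.00, "tpr": 1.00, "tnr": 1.00},
}
testing = {
    "LR": {"accuracy": 0.70, "tpr": 0.74, "tnr": 0.58},
    "GBM": {"accuracy": 0.54, "tpr": 0.63, "tnr": 0.33},
}


def gap(method):
    """Training minus testing value for each statistic, rounded to 2 decimals."""
    return {stat: round(training[method][stat] - testing[method][stat], 2)
            for stat in training[method]}


lr_gap = gap("LR")    # accuracy drops by 0.14
gbm_gap = gap("GBM")  # accuracy drops by 0.46
```

The drop is much larger for GBM than for LR on every statistic, which is the pattern expected when a model has memorized the training data.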

19 Results - Consideration
GBM did a fantastic job on the training data but failed on the testing data. Why?
Size of the training data
Reduce complexity (# trees)
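
One possible way to choose a smaller number of trees is to compare cross-validation error across candidate values and keep the smallest model that is close to the best. The slides do not say how the reduced number of trees was chosen; the `pick_n_trees` helper and the error values below are made up for illustration.

```python
def pick_n_trees(cv_error_by_n_trees, tolerance=0.0):
    """Return the smallest tree count whose CV error is within tolerance of the best."""
    best_error = min(cv_error_by_n_trees.values())
    candidates = [n for n, err in cv_error_by_n_trees.items()
                  if err <= best_error + tolerance]
    return min(candidates)


# Made-up cross-validation errors per number of trees (not results from the study).
cv_error = {50: 0.34, 100: 0.31, 500: 0.33, 1000: 0.38}

strict_choice = pick_n_trees(cv_error)                   # lowest error only
simpler_choice = pick_n_trees(cv_error, tolerance=0.05)  # prefer fewer trees
```

With a non-zero tolerance the function trades a slightly higher cross-validation error for a simpler model, which can matter when the training set is small.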

20 Results - Testing after reducing complexity (# trees)
Statistics   GBM    GBMn   Meaning
Accuracy     0.54   0.62   How many times it worked well
TPR          0.63   0.69   How many of the calves that really survived it predicted
TNR          0.33   0.46   How many of the calves that really deceased it predicted

21 Results - Consideration
Feature importance

22 Conclusions Learning from data!
It is possible to build a predictive model for NCD.
Accuracy depends on training sample size and method.
Total protein and anion gap are important features.

23 Thanks

