Predicting survivors of Neonatal calf diarrhea (NCD) using Logistic Regression or Gradient Boosting

Stefano Biffani*, Cesare Lubiano1, Davide Pravettoni1 and Antonio Boccardo1
*IBBA-CNR (UOS-Lodi)
1 Clinic for ruminants, swine and management - Large Animal Veterinary Teaching Hospital (UNIMI-LODI)

Background
Neonatal calf diarrhea (NCD) is a common disease affecting the newborn calf.
Calves affected by NCD within the first 2 weeks: 25-35 %
Clinical problems: NCD 60-80 %
Consequences: over $250 million/year
Death, reduced growth, veterinary costs

The idea
LODI - Large Animal Veterinary Teaching Hospital: NCD cases, clinical data, ancillary tests

Features:
Blood gas analysis (pH, Ht, Na+, K+, Cl-, anion gap, pCO2, bicarbonate, base excess)
Total protein
Dehydration and vitality score
Rectal temperature
Age, hospitalization month, sex

Will it survive?
Statistical learning: tools for understanding data

Data
131 Holstein calves (males = 16, females = 115) affected by NCD
Hospitalized from January 2006 until August 2014
Proportion deceased = 34% (45/131)

Methods
Logistic Regression (LR): parametric method
Models the probability that the response variable (survived: yes/no) belongs to a particular category
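As a minimal illustration of what LR computes, the sketch below turns a linear combination of features into a survival probability with the logistic function. The coefficients, intercept and feature values are invented for illustration; they are not estimates from this study.

```python
import math


def survival_probability(features, coefficients, intercept):
    """Logistic model: P(survive) = 1 / (1 + exp(-(intercept + sum(b_i * x_i))))."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients and standardised feature values, for illustration only.
coefficients = [0.8, -0.5]
intercept = 0.3
calf = [1.2, 0.4]

p = survival_probability(calf, coefficients, intercept)
predicted = "survived" if p >= 0.5 else "deceased"
print(f"P(survive) = {p:.2f} -> predicted {predicted}")
```

The 0.5 cut-off used to turn the probability into a class is a common default, assumed here; the slides do not state which threshold was used.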

Methods
Gradient Boosting Machine (GBM): non-parametric method
The learning procedure consecutively fits new models (trees) to provide a more accurate estimate of the response variable
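The idea of fitting trees one after another can be sketched in a few lines. This is a simplified toy version, not the GBM software used in the study: it uses one-split trees (stumps), squared-error loss on a 0/1 outcome, a single invented feature, and an assumed learning rate of 0.5.

```python
def fit_stump(xs, residuals):
    """Least-squares stump: one threshold on x, one constant value per side."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # dropping the largest x keeps both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1], best[2], best[3]


def boost(xs, ys, n_trees, learning_rate=0.5):
    """Start from the mean outcome, then add stumps fitted to the current residuals."""
    predictions = [sum(ys) / len(ys)] * len(ys)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return predictions


def mean_squared_error(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


# Toy data, invented for illustration: one feature, outcome 1 = survived, 0 = deceased.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 1, 0, 1]

error_1_tree = mean_squared_error(ys, boost(xs, ys, n_trees=1))
error_50_trees = mean_squared_error(ys, boost(xs, ys, n_trees=50))
print(f"training error with 1 tree:   {error_1_tree:.3f}")
print(f"training error with 50 trees: {error_50_trees:.3f}")
```

Each added stump corrects part of what the previous ones got wrong, so the error on the training data keeps shrinking as trees are added. That says nothing about how the model behaves on new data.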

Methods
Building the predictive model:
Split the data into
  Training dataset (75%, n=92, incidence = 34%)
  Testing dataset (25%, n=39, incidence = 36%)
Use the training data to build the predictive model with a 5-fold cross-validation scheme
Finally, use the resulting model to predict the probability of the disease outcome (dead/survived) in the testing data
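The 5-fold scheme on the 92 training records can be sketched as follows. How the folds were actually formed is not stated in the slides; the random shuffle, the seed and the function name `k_fold_indices` are assumptions made for this example.

```python
import random


def k_fold_indices(n_records, k, seed=0):
    """Shuffle record indices and cut them into k validation folds of near-equal size."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


folds = k_fold_indices(n_records=92, k=5)

for i, validation in enumerate(folds):
    # Every record outside the current fold is used for fitting.
    training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
    # A model would be fitted on `training` and scored on `validation` here.
    print(f"fold {i + 1}: fit on {len(training)} records, validate on {len(validation)}")
```

Each record is used for validation exactly once, so the five validation scores together give an estimate of performance that does not touch the 39 testing records.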

Statistics
Accuracy (AC): proportion of all predictions that were correct
Sensitivity or True Positive Rate (TPR): proportion of calves that really survived and were correctly predicted as survived
Specificity or True Negative Rate (TNR): proportion of calves that really died and were correctly predicted as deceased
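The three statistics can be computed from actual and predicted outcomes as in the sketch below, with "survived" treated as the positive class. The ten labels are invented for illustration and are not the study data.

```python
def classification_statistics(actual, predicted, positive="survived"):
    """Accuracy, true positive rate and true negative rate from paired labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "tpr": tp / (tp + fn),  # share of real survivors predicted as survived
        "tnr": tn / (tn + fp),  # share of real deaths predicted as deceased
    }


# Invented example: 6 calves survived, 4 died.
actual = ["survived"] * 6 + ["deceased"] * 4
predicted = ["survived"] * 5 + ["deceased"] + ["deceased"] * 3 + ["survived"]

stats = classification_statistics(actual, predicted)
print({name: round(value, 2) for name, value in stats.items()})
```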

Results - Training
Statistics   LR     GBM    Meaning
Accuracy     0.84   1.00   How many times it worked well
TPR          0.83   1.00   How many calves that really survived it predicted
TNR          0.65   1.00   How many calves that really died it predicted

Results - Testing
Statistics   LR     GBM    Meaning
Accuracy     0.70   0.54   How many times it worked well
TPR          0.74   0.63   How many calves that really survived it predicted
TNR          0.58   0.33   How many calves that really died it predicted

Results - Consideration
What happened? Overfitting.
GBM did a fantastic job on the training data but failed on the testing data. Why?
Size of the training data
Reduce complexity (# trees)
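One simple way to see the overfitting is the gap between training and testing accuracy. The short sketch below computes it from the accuracy values reported on the results slides; the dictionary names are chosen for this example.

```python
# Accuracy values as reported on the training and testing results slides.
train_accuracy = {"LR": 0.84, "GBM": 1.00}
test_accuracy = {"LR": 0.70, "GBM": 0.54}

# Gap = how much accuracy is lost when moving from training to testing data.
gaps = {model: train_accuracy[model] - test_accuracy[model] for model in train_accuracy}

for model, gap in gaps.items():
    print(f"{model}: training {train_accuracy[model]:.2f}, "
          f"testing {test_accuracy[model]:.2f}, gap {gap:.2f}")
```

The gap for GBM is roughly three times the gap for LR, which is the pattern expected when a flexible model memorises a small training set.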

Results - Testing after reducing complexity (# trees)
Statistics   GBM    GBMn   Meaning
Accuracy     0.54   0.62   How many times it worked well
TPR          0.63   0.69   How many calves that really survived it predicted
TNR          0.33   0.46   How many calves that really died it predicted

Feature importance

Conclusions
Learning from data!
It is possible to build a predictive model for NCD.
Accuracy depends on the training sample size and the method.
Total protein and anion gap are important features.

Thanks
