1
Ordinal Classification of Heart Disease Severity
Joey Glasser, David Cavender, Vincent Li, Zsofia Voros
2
Problem and its relevance to health:
For years, heart disease has been the leading cause of death in the United States. While factors such as smoking and being overweight have been associated with heart disease, its causes are still not fully understood. Building a predictive model that relates different attributes of a person's health to whether or not they have heart disease should improve understanding of those causes and, in turn, help decrease the mortality rate.
3
The Data
The Cleveland Clinic Foundation collected information on 14 health-related attributes of roughly 300 individuals. Some attributes include:
Age
Sex
Smoking Frequency
Cardiovascular stress from exercise
One attribute indicates whether a given individual has heart disease. It takes the value zero if the individual does not have heart disease; otherwise it is an integer between one and four representing the severity of the heart disease.
4
The Model
In this project we will use logistic regression to predict the degree of heart disease in individuals from explanatory variables such as sex, age, and cholesterol levels. The predicted variable is ordinal, meaning its values are ranked: the degree of heart disease runs from 0 to 4. We will create four binary logistic models to predict whether the degree of heart disease is greater than 0, greater than 1, greater than 2, and greater than 3. To predict the degree of heart disease for a given person, their data will be fed into all four models and the results will be combined to calculate the probability that the person has a degree of heart disease of 0, 1, 2, 3, or 4. The probability of degree k is the probability that the degree is greater than k - 1 minus the probability that it is greater than k. The degree with the highest probability is the one the overall model will predict.
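The combination step can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names and the example probabilities are hypothetical, and the clip-and-renormalise step is an assumption for handling the case where four independently fitted models give probabilities that do not decrease with the threshold.

```python
from typing import List, Sequence


def class_probabilities(p_greater: Sequence[float]) -> List[float]:
    """Turn P(y > 0), ..., P(y > 3) into P(y = 0), ..., P(y = 4)."""
    # Pad with P(y > -1) = 1 and P(y > 4) = 0 so every class is a difference
    # of two neighbouring cumulative probabilities.
    padded = [1.0, *p_greater, 0.0]
    probs = [padded[k] - padded[k + 1] for k in range(len(padded) - 1)]
    # Assumption: separately fitted models may disagree, which can produce
    # negative differences. Clip them to zero and renormalise.
    probs = [max(p, 0.0) for p in probs]
    total = sum(probs)
    return [p / total for p in probs]


def predict_degree(p_greater: Sequence[float]) -> int:
    """Return the degree (0 to 4) with the highest combined probability."""
    probs = class_probabilities(p_greater)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical outputs of the four binary models for one person.
example = [0.90, 0.60, 0.20, 0.05]
print(class_probabilities(example))
print(predict_degree(example))
```

In practice the four input probabilities would come from the fitted logistic models; here they are made-up numbers chosen only to show the arithmetic.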
5
How the model will be evaluated
The model will be evaluated using the micro-averaged F1-score on a holdout test data set. We decided to use the micro-averaged F1-score since the class sizes are imbalanced. Because each individual has exactly one degree of heart disease, this score pools all classes together and equals overall accuracy, so per-class F1-scores are worth reporting alongside it to show performance on the rarer severity levels. To see how our method compares to a simple logistic regression model, we will build both and compare their scores on the test data set.
Anticipated challenges
One challenge that might arise is dependence between predictors. In the case of multicollinearity, we would identify and remove as many highly correlated predictors as necessary, or switch from a logistic model to a classification model that is less sensitive to multicollinearity, such as a random forest.
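For the evaluation metric described on this slide, the sketch below shows how a micro-averaged F1-score could be computed next to per-class F1-scores. It is an illustration under stated assumptions: the function name `f1_scores` and the label lists are hypothetical, not results from the project.

```python
from typing import Dict, Sequence, Tuple


def f1_scores(
    y_true: Sequence[int], y_pred: Sequence[int], labels: Sequence[int]
) -> Tuple[float, Dict[int, float]]:
    """Return (micro-averaged F1, per-class F1) for single-label predictions."""
    per_class: Dict[int, float] = {}
    tp_sum = fp_sum = fn_sum = 0
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class that is never present and never predicted gets a score of 0.
        per_class[label] = 2 * tp / denom if denom else 0.0
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
    # Micro-averaging pools the counts over all classes before computing F1.
    micro_denom = 2 * tp_sum + fp_sum + fn_sum
    micro = 2 * tp_sum / micro_denom if micro_denom else 0.0
    return micro, per_class


# Made-up true and predicted degrees for ten test individuals.
y_true = [0, 0, 0, 0, 1, 1, 2, 3, 4, 0]
y_pred = [0, 0, 0, 1, 1, 0, 2, 2, 4, 0]
micro, per_class = f1_scores(y_true, y_pred, labels=[0, 1, 2, 3, 4])
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(micro, accuracy)
print(per_class)
```

With one label per individual, every wrong prediction counts once as a false positive and once as a false negative, which is why the pooled score matches accuracy while the per-class scores can still differ sharply.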