Published by Emory Blake. Modified over 5 years ago.
1
A Systematic Empirical Evaluation of Machine Learning Algorithms on Parkinson Disease: (A Quantitative Approach) Engr. Muhammad Waqar Khan
2
OBJECTIVES
The aim of this project is to:
- Provide a systematic evaluation of machine learning algorithms for Parkinson's disease prediction.
- Improve prediction so that the disease can be diagnosed at an earlier stage.
- Compare candidate prediction algorithms and find the best one for detecting Parkinson's disease from a patient's voice, recorded with a telemonitoring device.
This can save the doctor's time in reaching a decision and increase the accuracy and yield of diagnosis.
3
Goals
A systematic evaluation of different machine learning algorithms, along with their variants, to predict Parkinson's disease at an early stage. The performance of each algorithm is measured by average testing accuracy (%) and root mean square error (RMSE).
4
What Is Tele-Monitoring System?
A telemonitoring system tracks the motion of patients in their surroundings. The system focuses on interoperability and usability in order to ensure high acceptance. Patients wear inertial sensors and perform standardized motor tasks. The data are recorded, processed, and rated with a score. Medical treatment of patients with Parkinson's disease is difficult because dose-finding is mainly based on selective and subjective impressions by the physician.
5
Phases of the Project
The techniques required for this project are divided into two main phases:
- the training phase
- the testing phase
6
Training Phase
[Diagram: Voice Samples (data in ASCII CSV format) -> Feature Extraction -> Feature Vector Table (numbers) -> Machine Learning (multiple algorithms)]
7
Testing Phase
[Diagram: Voice Samples (data in ASCII CSV format) -> Features -> Feature Input -> Different Trained Algorithms -> Output]
8
Research Work
Datasets found: UCI Machine Learning Repository, Center for Machine Learning and Intelligent Systems
9
Selected Dataset
Parkinson's Telemonitoring Data Set
Download link:
10
About – Parkinson’s Tele-Monitoring Dataset
Parkinson's Tele-Monitoring Dataset: This study investigates which machine learning prediction algorithm is best for the early diagnosis of Parkinson's disease with the help of a telemonitoring device. To this end, Intel collected a range of vocal measurements from 42 people with Parkinson's disease. The dataset was created jointly by the University of Oxford, Intel Corp, and several medical institutes in the US. The attributes and a brief description of each are listed in Table 1. By default the dataset is divided into two files, one for training and one for testing; the testing file contains 1010 rows. Each row corresponds to one voice recording from one individual.
11
Parkinson’s Tele-Monitoring Dataset
Parkinson's Tele-Monitoring Dataset Description: This dataset is composed of a range of biomedical voice measurements from 42 people with early-stage Parkinson's disease recruited to a six-month trial of a telemonitoring device for remote symptom progression monitoring. Columns in the table contain subject number, subject age, subject gender, time interval from baseline recruitment date, motor UPDRS, total UPDRS, and 16 biomedical voice measures. Each row corresponds to one of 5,875 voice recordings from these individuals. The main aim of the data is to predict the motor and total UPDRS scores ('motor_UPDRS' and 'total_UPDRS') from the 16 voice measures. The data is in ASCII CSV format. Each row of the CSV file contains an instance corresponding to one voice recording. There are around 200 recordings per patient; the subject number of the patient is given in the first column.
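The layout described above (subject number, two UPDRS targets, voice measures as inputs) can be read with a few lines of Python. This is only a sketch: the inline rows are invented stand-ins, and every column name except motor_UPDRS and total_UPDRS is an assumption, not taken from the real file.

```python
import csv
import io

# Invented stand-in rows. Only motor_UPDRS and total_UPDRS are names given in
# the dataset description; the other column names are hypothetical.
SAMPLE_CSV = """subject,age,motor_UPDRS,total_UPDRS,voice_1,voice_2
1,72,28.2,34.4,0.0066,0.43
1,72,28.4,34.9,0.0030,0.18
2,58,11.0,14.5,0.0048,0.28
"""

TARGETS = ("motor_UPDRS", "total_UPDRS")
NON_FEATURES = {"subject", "age"} | set(TARGETS)


def load_rows(text):
    """Split CSV text into subject ids, voice-feature vectors, and targets."""
    subjects, features, targets = [], [], []
    for row in csv.DictReader(io.StringIO(text)):
        subjects.append(int(row["subject"]))
        targets.append(tuple(float(row[name]) for name in TARGETS))
        # Everything that is not an id, demographic, or target is a feature.
        features.append(
            [float(value) for key, value in row.items() if key not in NON_FEATURES]
        )
    return subjects, features, targets


subjects, features, targets = load_rows(SAMPLE_CSV)
print(len(subjects), "rows,", len(features[0]), "voice features per row")
```

Keeping the subject number separate from the features makes it possible to group recordings by patient later on.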
12
PD Dataset Description Cont.
13
What is PD (Parkinson's Disease)?
Parkinson's disease (PD) is a degenerative disorder of the central nervous system (CNS) that mainly affects the motor system. The symptoms generally come on slowly over time. Early in the disease, the most obvious are shaking, rigidity, slowness of movement, and difficulty with walking.
14
Step 1: Division of dataset
Total records: 5853
Records used for training: 4843
Records used for testing: 1010
Training : testing ratio: approximately 80:20 (4:1)
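A split with these record counts can be sketched as follows. The shuffle, the seed, and the function name are assumptions for illustration; the slides do not say how rows were assigned to the two files.

```python
import random


def split_records(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and return (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in row indices; the real rows would come from the CSV file.
rows = list(range(5853))
train, test = split_records(rows, test_fraction=1010 / 5853)
print(len(train), len(test))
```

Because each patient contributes around 200 recordings, a row-level split like this one puts recordings from the same patient in both sets. Splitting by subject number instead would give a stricter test.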
15
Step 2: Feature Extraction
After dividing the dataset of 42 people into training and testing folders, we run MATLAB code on each file to extract the parameters on which the predictions are based.
16
Step 3: Feature Vector Table
The features from the previous step are collected into one large table, with all provided parameters in a single file. This table holds all the voice features and is used to train the prediction algorithms in order to find the best one.
18
Step 4: Machine Learning Algorithms (Predictors)
Different machine learning algorithms, along with their multiple variants, have been used for prediction:
- Support Vector Machine
- Regression Trees
- Linear Regression
- Ensembles of Trees
- Gaussian Process Regression
19
Support Vector Machine (SVM)
In supervised machine learning, the SVM is a discriminative classifier and predictor defined by a separating hyperplane. Labelled training data is fed to the model, and the algorithm then generates an optimal hyperplane that classifies test data. SVMs can be used for both linear and non-linear problems by means of the kernel trick, which implicitly maps the inputs into a high-dimensional feature space. Some SVM variants are Linear SVM, Quadratic SVM, Cubic SVM, Fine Gaussian SVM, Medium Gaussian SVM, and Coarse Gaussian SVM.
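The variants named above differ mainly in their kernel function. The sketch below shows the standard textbook forms. It assumes that "quadratic" and "cubic" mean polynomial kernels of degree 2 and 3, and that fine, medium, and coarse Gaussian correspond to small, medium, and large kernel scales. The exact scale values used in the project are not given in the slides.

```python
import math


def dot(x, z):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x, z):
    return dot(x, z)


def polynomial_kernel(x, z, degree):
    """Degree 2 gives a quadratic kernel, degree 3 a cubic one."""
    return (1.0 + dot(x, z)) ** degree


def gaussian_kernel(x, z, scale):
    """Gaussian (RBF) kernel; a smaller scale gives a finer, more local fit."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * scale ** 2))


x, z = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x, z), polynomial_kernel(x, z, 2), gaussian_kernel(x, z, 2.0))
```

Each function returns a similarity between two feature vectors, and the SVM only ever sees the data through these values. That is why swapping the kernel changes the shape of the fitted function without changing the training procedure.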
20
Regression Trees (RT)
In supervised machine learning, regression trees are used for forecasting and prediction. A regression tree moves from observations about an item, represented in the branches, to conclusions about the item's target value, represented in the leaves. Regression trees are one of the predictive modelling approaches used in machine learning, statistics, and data mining. Decision trees whose target variable takes continuous values, usually real numbers, are called regression trees.
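The basic building block of a regression tree is a single split. As a minimal sketch, assuming one input feature and a squared-error criterion, the function below finds the threshold that best separates the targets and predicts the mean of each side. A full tree repeats this step recursively on each side; the function names and toy numbers are illustrative only.

```python
def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) minimising squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1:]


def predict_stump(stump, x):
    """Follow the branch for x and return the value stored in that leaf."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


stump = best_split([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [5.0, 5.0, 5.0, 20.0, 20.0, 20.0])
print(stump, predict_stump(stump, 2.5), predict_stump(stump, 11.5))
```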
21
Gaussian Processes Regression
A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution. A Gaussian process is completely specified by its mean function and its covariance function.
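The covariance function determines how strongly the outputs at two inputs are correlated. As a sketch, here is the standard Matern 5/2 covariance, the kernel behind the Matern 5/2 GPR variant reported later in the deck. The parameter names and default values are assumptions for illustration.

```python
import math


def matern52(x1, x2, length_scale=1.0, signal_std=1.0):
    """Matern 5/2 covariance between two points given as coordinate sequences."""
    r = math.dist(x1, x2)  # Euclidean distance (Python 3.8+)
    s = math.sqrt(5.0) * r / length_scale
    return signal_std ** 2 * (1.0 + s + s * s / 3.0) * math.exp(-s)


def covariance_matrix(points, **kernel_params):
    """Covariance matrix of the Gaussian process at a finite set of points."""
    return [[matern52(p, q, **kernel_params) for q in points] for p in points]


points = [(0.0,), (1.0,), (2.0,)]
K = covariance_matrix(points, length_scale=1.5)
for row in K:
    print([round(value, 3) for value in row])
```

The matrix is symmetric with the signal variance on the diagonal, and its entries shrink as points move apart. Together with a mean function, this matrix is what specifies the joint Gaussian distribution over any finite set of inputs.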
22
Linear Regression (LR)
Linear regression is one of the basic algorithms of supervised machine learning. It models the relationship between two variables with a linear equation: one variable is the dependent variable, and the other is the independent, explanatory variable. Whether a relationship exists should be determined first, because one variable does not always depend on another (for example, higher grades at intermediate level do not cause higher grades at university level). Linear regression finds the best-fitting straight line through the data points. A scatterplot is a useful tool for judging the strength of the relationship between the variables.
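For one explanatory variable, the best-fitting line has a closed-form least-squares solution. The sketch below uses toy numbers chosen so that the points lie exactly on a line; the function name is illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(slope, intercept)
```

With 16 voice measures as inputs, the same idea extends to multiple linear regression, where one coefficient is fitted per feature.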
23
Ensembles of Trees (ET)
In machine learning (ML), an ensemble of regression trees is a supervised prediction model that consists of a weighted combination of multiple regression trees. Put simply, an ensemble combines multiple weak regression trees to increase predictive strength and performance. Ensembles of trees (ET) work by generating multiple, diverse regression models and combining them, which reduces the variance of the resulting strong model. There are currently two main types of ensembles for prediction: bagging and boosting. Bagging helps to reduce or avoid overfitting by reducing variance. It is typically applied to decision tree models and is an example of the model averaging approach. Boosting trains weak learners in sequence, each one concentrating on the errors of its predecessors, and combines them into a strong learner; it mainly reduces bias and can also reduce variance.
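Bagging itself is independent of the base learner, so it can be sketched in a few lines. To keep the example short, the base learner here is a toy nearest-neighbour predictor rather than a regression tree; this substitution, the function names, and the toy data are all assumptions for illustration.

```python
import random


def fit_nearest(xs, ys):
    """Toy high-variance base learner: predict the target of the nearest x."""
    pairs = list(zip(xs, ys))

    def predict(x):
        return min(pairs, key=lambda pair: abs(pair[0] - x))[1]

    return predict


def fit_bagged(fit, xs, ys, n_models=25, seed=0):
    """Train each model on a bootstrap resample and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw n indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        return sum(model(x) for model in models) / len(models)

    return predict


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.1, 1.9, 3.2, 3.9]
bagged = fit_bagged(fit_nearest, xs, ys)
print(bagged(2.0))
```

Averaging over resampled models smooths out the quirks of any single fit, which is the variance reduction described above. Any function that takes training data and returns a predictor, such as a regression tree fitter, could be passed in place of fit_nearest.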
24
Tools Used for Software Implementation
25
PROJECT PHASES
26
Performance Parameters
The main performance parameters are:
- Average Testing Accuracy
- Root Mean Square Error (RMSE)
- R-Squared
- Mean Square Error (MSE)
- Mean Absolute Error (MAE)
- Prediction Speed
- Training Time
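The four error-based parameters have standard definitions, sketched below on toy numbers. Average testing accuracy is left out because the slides do not define how it is computed for a regression task.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R-squared for paired true/predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R-squared is undefined when the true values have no variance.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(metrics)
```

RMSE is the square root of MSE, so the two should always agree. For instance, an RMSE of 0.032 corresponds to an MSE of about 0.001.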
27
Train different Algorithms for Prediction of PD.
After the feature vector table is complete, several supervised machine learning prediction algorithms are trained on the dataset to detect Parkinson's disease through telemonitoring.
28
Project approach
For prediction we use multiple algorithms in order to find the best-performing one. After all processing (training and testing), we found that Gaussian process regression with a Matern 5/2 kernel (Matern 5/2 GPR) is best suited to this task.
29
Matern 5/2 Results
After training the different algorithms, we found Matern 5/2 GPR to be the best candidate for PD detection. The dataset was trained with different numbers of instances in order to achieve the most accurate results and the minimum error.
- Average testing accuracy: 96.23%
- Root mean square error (RMSE): 0.032
- R-squared: 0.88
- Mean square error (MSE): 0.001
- Mean absolute error (MAE): 0.02
- Prediction speed: ~5100
- Training time: Machine cycle
30
TESTING GRAPH
31
Comparison Results Of All Applied ML Algorithms
Supervised machine learning algorithms have been trained and tested, and their results compiled on the basis of the performance parameters stated earlier:
- Detailed comparison table of all machine learning algorithms
- Comparison graph of different prediction algorithms
- Individual testing graphs
32
Comparison table for Different Machine learning algorithms
33
COMPARISON GRAPH FOR DIFFERENT PREDICTION ALGORITHMS
34
Testing Curves
39
Machine Clock Graph
The graph shows the actual time in which the tuning is performed.
40
Conclusion
Artificial intelligence is a rapidly growing field. It can help in many aspects of life and can be used to solve many modern-day problems. The empirical comparison in this project was chosen with today's fast-moving advances in mind, where the aim is to save time and cost. The project applies IT to medical science, showing how IT is becoming involved in many fields of life. It can reduce the cost of consulting a neurologist, and it can improve the speed and accuracy of detecting a severe disease that is difficult to detect with conventional techniques, and so can help save lives.
41
THANK YOU