Published by Dennis Whitehead. Modified over 8 years ago.
1
An Effective Hybridized Classifier for Breast Cancer Diagnosis
Dishant Mittal, Dev Gaurav & Sanjiban Sekhar Roy
VIT University, India
2
Research Topic
Classify breast tumors as benign or malignant using a hybrid algorithm.
Boost the effectiveness of the classifier by interfacing a learning algorithm, Stochastic Gradient Descent (SGD), with an artificial neural network, the Self-Organizing Map (SOM).
Deliver results comparable to state-of-the-art machine learning techniques.
Verify the approach on a large dataset with 10-fold cross-validation.
3
Why use learning algorithms?
Breast cancer can be diagnosed in many ways: medical history and physical exam, and imaging tests such as mammograms, breast ultrasound, and MRI.
What these have in common is that the parameters are checked manually to reach a diagnosis.
A learning algorithm avoids this manual latency, thereby increasing efficiency and decreasing cost.
4
Why use learning algorithms?
A procedure that conducts a structured study can deliver accurate decisions.
Automating the diagnostic system is needed to enhance reliability.
More lives can be saved by relying on technological advancements.
5
Is hybridizing necessary?
A unique combination can lead to a drastic performance improvement.
It can optimize the working efficiency of the individual classifiers.
It is a powerful way to break down intricate classification problems.
6
Literature
Donald et al.: canopy pine plantation, ISODATA, Maximum Likelihood, 2 per cent increase.
Gouda et al.: Naïve Bayes, Sequential Minimal Optimization, K-Nearest Neighbors, permutation.
Cenk Budayan et al.: irregular pattern, infant growth, threat features, SOM.
Richard Olawoyin: water, soil, sediment, petrochemical, SOM, Fuzzy C-means.
7
Why SOM?
Better class identification.
More robust.
Ability to find patterns in complex datasets.
Ability to extract non-linear relationships between input vectors.
Relative information among vectors is conserved.
8
Why SGD?
Linear complexity.
Easy hybridization.
SGD does not perform well on huge datasets, so reducing the dimensions significantly while still conserving information in terms of distances boosts its performance significantly.
9
Dataset Description
Breast Cancer Dataset: 699 vectors and 9 attributes.

Attribute                      Range
Clump thickness                1-10
Uniformity of cell size        1-10
Uniformity of cell shape       1-10
Marginal adhesion              1-10
Single epithelial cell size    1-10
Bare nuclei                    1-10
Bland chromatin                1-10
Normal nucleoli                1-10
Mitoses                        1-10
10
Dataset Description
Internet Advertisement Dataset: 3279 vectors and 1558 attributes.

Attribute                                   Range
Height                                      Continuous
Width                                       Continuous
Aspect ratio                                Continuous
457 features representing URL terms         0, 1
495 features representing origURL terms     0, 1
472 features representing ancURL terms      0, 1
111 features representing alt terms         0, 1
19 features from caption terms              0, 1
11
Process Breakdown
Module 1:
  Read the input vectors.
  Run the SOM implementation.
  Generate the intermediate layer.
Module 2:
  Read the intermediate layer.
  Run the SGD implementation.
  Generate the output class for each instance.
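The two modules can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the slides do not say what the intermediate layer contains, so it is assumed here to be the vector of distances from each input to every SOM node; the SGD stage is assumed to be a logistic model; all function names, grid sizes, and learning rates are hypothetical; and the data is a synthetic stand-in, not the breast cancer dataset.

```python
import numpy as np

def train_som(X, rows=3, cols=3, epochs=20, lr0=0.5, sigma0=1.0, seed=0):
    """Module 1: fit a small SOM; returns node weights, shape (rows*cols, n_features)."""
    rng = np.random.default_rng(seed)
    # Grid coordinates of each node, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    W = rng.random((rows * cols, X.shape[1]))
    total_steps = epochs * len(X)
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3  # linearly shrinking neighbourhood
            bmu = np.argmin(np.linalg.norm(W - X[i], axis=1))  # best-matching unit
            grid_d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            W += lr * h[:, None] * (X[i] - W)     # pull nodes toward the input
            step += 1
    return W

def intermediate_layer(X, W):
    """Assumed intermediate layer: distance from every input to every SOM node."""
    return np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)

def train_sgd(Z, y, epochs=50, lr=0.1, seed=0):
    """Module 2: logistic model fitted with one-sample-at-a-time SGD (assumed)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(Z.shape[1])
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(Z)):
            p = 1.0 / (1.0 + np.exp(-(Z[i] @ w + b)))
            g = p - y[i]                          # gradient of the log-loss w.r.t. the score
            w -= lr * g * Z[i]
            b -= lr * g
    return w, b

def predict(Z, w, b):
    """Output class (0 or 1) for each row of the intermediate layer."""
    return (Z @ w + b >= 0.0).astype(int)

# Synthetic stand-in data: 9 attributes in the 1-10 range, two separated groups.
rng = np.random.default_rng(42)
group0 = rng.normal(2.0, 1.0, size=(60, 9))
group1 = rng.normal(8.0, 1.0, size=(40, 9))
X = np.clip(np.vstack([group0, group1]), 1.0, 10.0) / 10.0
y = np.array([0] * 60 + [1] * 40)

W = train_som(X, rows=2, cols=3)   # Module 1: SOM
Z = intermediate_layer(X, W)       # 9 attributes reduced to 6 distances
w, b = train_sgd(Z, y)             # Module 2: SGD
train_accuracy = float(np.mean(predict(Z, w, b) == y))
```

With a 2 x 3 map, each 9-attribute vector is reduced to 6 distances before SGD sees it, which mirrors the idea of reducing dimensions while conserving distance information.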
12
Module 1
13
Module 1 Output:
14
Module 2
15
Mathematical formulations: SOM
Distance calculation
Weight updating
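The equations on this slide did not survive in the transcript. For orientation only, the standard textbook SOM forms are given here; the symbols (input x, node weight w_j, best-matching unit c, learning rate alpha, neighbourhood function h) are assumptions and may differ from the authors' notation.

```latex
% Euclidean distance between input x and node j, and the best-matching unit c
d_j(\mathbf{x}) = \lVert \mathbf{x} - \mathbf{w}_j \rVert
                = \sqrt{\sum_{i=1}^{n} (x_i - w_{ji})^2},
\qquad
c = \arg\min_j d_j(\mathbf{x})

% Weight update at step t, with learning rate \alpha(t) and neighbourhood h_{c,j}(t)
\mathbf{w}_j(t+1) = \mathbf{w}_j(t)
                  + \alpha(t)\, h_{c,j}(t)\, \bigl(\mathbf{x}(t) - \mathbf{w}_j(t)\bigr)
```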
16
Mathematical formulations: SGD
Classification function
Cost function
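The equations on this slide are also missing from the transcript, and the slides do not name the model or loss used. One common pairing, assumed here purely for illustration, is a logistic classification function with a cross-entropy cost, where z is an intermediate-layer vector, y its class label, and eta the step size.

```latex
% Classification function (logistic model, an assumption)
h_\theta(\mathbf{z}) = \frac{1}{1 + e^{-\theta^{\top}\mathbf{z}}}

% Cost over m training examples (cross-entropy, an assumption)
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m}
  \Bigl[ y^{(i)} \log h_\theta(\mathbf{z}^{(i)})
       + (1 - y^{(i)}) \log\bigl(1 - h_\theta(\mathbf{z}^{(i)})\bigr) \Bigr]

% SGD update from a single example i with step size \eta
\theta \leftarrow \theta
  - \eta\, \bigl(h_\theta(\mathbf{z}^{(i)}) - y^{(i)}\bigr)\, \mathbf{z}^{(i)}
```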
17
Evaluation metrics
Confusion matrix:

                   Predicted Class=1    Predicted Class=0
Actual Class=1     P                    Q
Actual Class=0     R                    S

P and S: samples for which the actual and predicted classes are the same.
Q and R: samples for which the actual and predicted classes are different.
Accuracy = (P + S) / (P + Q + R + S)
Validation: 10-fold cross-validation.
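The accuracy formula and the 10-fold split can be sketched as below. The function names and the toy labels are hypothetical, and the fold construction (shuffle once, then cut into ten nearly equal parts) is one common way to do it, not necessarily the authors' procedure.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (P, Q, R, S) using the slide's layout for classes 1 and 0."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    P = int(np.sum((y_true == 1) & (y_pred == 1)))  # actual 1, predicted 1
    Q = int(np.sum((y_true == 1) & (y_pred == 0)))  # actual 1, predicted 0
    R = int(np.sum((y_true == 0) & (y_pred == 1)))  # actual 0, predicted 1
    S = int(np.sum((y_true == 0) & (y_pred == 0)))  # actual 0, predicted 0
    return P, Q, R, S

def accuracy(P, Q, R, S):
    """Accuracy = (P + S) / (P + Q + R + S)."""
    return (P + S) / (P + Q + R + S)

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]

# Toy labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)   # (3, 1, 1, 3)
toy_accuracy = accuracy(*counts)            # 0.75

# Ten folds over 699 samples, the size of the breast cancer dataset.
folds_699 = list(k_fold_indices(699, k=10))
```

In a full run, the model would be trained on each `train_idx` and scored on the matching `test_idx`, and the ten accuracies averaged.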
18
Results
[Figure panels: Breast cancer train set; Breast cancer test set]
19
Results
[Figure panels: Internet advertisements train set; Internet advertisements test set]
20
Results – Breast Cancer Dataset
21
Results – Internet Advertisements Dataset
22
Conclusion
This fusion of supervised and unsupervised learning techniques significantly boosted the effectiveness of the model.
The technique produced results comparable to existing state-of-the-art machine learning techniques.
The proposed model merits consideration for large numeric data and can be used in real-time classification.
23
Future Scope
Introduce uncertainty into the algorithm so that it can indicate the stage to which the tumor belongs.
Incorporate rough and fuzzy set factors.
Example:

Stage       Benign    Malignant
Stage 0     0.8       0.2
Stage 1     0.7       0.3
Stage 2     0.6       0.4
Stage 3a    0.5
Stage 3b    0.45      0.55
Stage 3c    0.40      0.6
Stage 4     0.3       0.7