1
EMBC2001
Using Artificial Neural Networks to Predict Malignancy of Ovarian Tumors
C. Lu (1), J. De Brabanter (1), S. Van Huffel (1), I. Vergote (2), D. Timmerman (2)
(1) Department of Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium
(2) Department of Obstetrics and Gynecology, University Hospitals Leuven, Leuven, Belgium
2
Overview
- Introduction
- Data exploration
- Input selection
- Model building
- Model evaluation
- Conclusions
3
Introduction
Problem
- Ovarian masses are a common problem in gynecology.
- Goal: develop a reliable diagnostic tool that discriminates preoperatively between benign and malignant tumors, to assist clinicians in choosing the appropriate treatment.
Data
- Patient data collected at University Hospitals Leuven, Belgium, 1994 to 1999.
- 425 records, 25 features.
- 291 benign tumors, 134 (32%) malignant tumors.
4
Introduction
Methods
- Data exploration: data preprocessing, univariate analysis, PCA, factor analysis, discriminant analysis, logistic regression, …
- Modeling: logistic regression (LR) models; artificial neural networks (ANN): MLP, RBF
- Performance measures: receiver operating characteristic (ROC) analysis
  - ROC curves are constructed by plotting the sensitivity against 1 - specificity (the false positive rate) as the probability cutoff level is varied. They visualize the trade-off between the sensitivity and the specificity of a test.
  - The area under the ROC curve (AUC) is the probability that the classifier assigns a higher predicted probability of malignancy to a randomly chosen malignant case (event) than to a randomly chosen benign case (nonevent).
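The two readings of AUC above (area under the plotted curve, and a ranking probability) give the same number. A minimal stdlib sketch, assuming labels coded 1 = malignant and 0 = benign and a model that outputs a probability of malignancy; the function names are illustrative, not from the original work:

```python
def roc_points(labels, scores):
    """(1 - specificity, sensitivity) pairs from sweeping the cutoff high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # a cutoff above every score classifies nothing as malignant
    for c in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= c)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= c)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_pairwise(labels, scores):
    """P(score of a random malignant case > score of a random benign case),
    with ties counted as 1/2 (the Mann-Whitney statistic)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Tied scores produce a single diagonal segment on the trapezoidal curve, which is exactly the half credit that `auc_pairwise` gives to ties.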
5
Data exploration
Univariate analysis
- Preprocessing: descriptive statistics, histograms, …
- Variables: demographic, serum marker, color Doppler imaging and morphologic variables
6
Data exploration
Multivariate analysis: factor analysis, biplots
Fig.: Biplot of the ovarian tumor data. The observations are plotted as points (0 = benign, 1 = malignant); the variables are plotted as vectors from the origin. The biplot shows:
- the correlations between the variables
- the relations between the variables and the clusters
7
Input selection
Stepwise logistic regression analysis
Searching in the feature space:
- Fix several of the most significant variables, then vary the combinations of the other predictive variables.
- Different logistic regression models, each with a different subset of input variables, were built and validated.
- Subsets of variables were selected according to their predictive performance on the training set and the test set.
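The search described above (a fixed core of significant variables plus varying combinations of the remaining candidates) can be sketched as follows. This is a minimal illustration, not the authors' code: `score` is a hypothetical user-supplied function that would, for example, fit a logistic regression on the named variables and return its validation AUC, and the variable names in the test are placeholders.

```python
from itertools import combinations


def search_subsets(fixed, candidates, max_extra, score):
    """Keep every variable in `fixed` and try each combination of up to
    `max_extra` additional candidates; return (score, subset) pairs,
    best first. `score(subset)` is supplied by the caller (hypothetical)."""
    results = []
    for k in range(max_extra + 1):
        for extra in combinations(candidates, k):
            subset = tuple(fixed) + extra
            results.append((score(subset), subset))
    # Rank by validation score only, highest first.
    results.sort(key=lambda r: r[0], reverse=True)
    return results
```

Exhaustive search over the extra variables grows combinatorially, which is why fixing the strongest variables first keeps the number of candidate models manageable.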
8
Model building
- Logistic regression (LR) model
- Artificial neural networks: feed-forward neural networks, which are universal approximators:
  - multi-layer perceptron (MLP)
  - generalized regression neural network (GRNN)
- Generalization capacity is the central issue during network design and training.
9
Model building - LR
Parameter estimation: maximum likelihood, computed with an iterative procedure.
Fig.: Architecture of the LR models for predicting malignancy of ovarian tumors.
Structure (inputs-output): LR1: 8-1; LR2: 7-1
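The slide does not name the iterative procedure. A common choice for maximum-likelihood logistic regression is Newton-Raphson, also called iteratively reweighted least squares (IRLS); the stdlib sketch below assumes that method and a 0/1 coding of the outcome:

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(X, y, iters=25, tol=1e-10):
    """Newton-Raphson (IRLS) maximum likelihood; returns [intercept, w1..wp]."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p                      # X^T (y - mu): score vector
        info = [[0.0] * p for _ in range(p)]  # X^T W X: negative Hessian
        for r, yi in zip(rows, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r))))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * r[j]
                for k in range(p):
                    info[j][k] += w * r[j] * r[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def predict_proba(beta, x):
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-eta))
```

For an 8-1 model such as LR1, each row of `X` would hold the eight selected inputs. If the two classes are perfectly separable, the maximum-likelihood estimate does not exist and the coefficients diverge.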
10
Model building - ANN - MLP
Training: Bayesian regularization combined with Levenberg-Marquardt optimization.
Fig.: Architecture of the MLPs for predicting malignancy of ovarian tumors.
Structure (inputs-hidden-output): MLP1: 8-3-1; MLP2: 7-3-1
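A sketch of what an 8-3-1 or 7-3-1 network computes and of the objective that Bayesian regularization minimizes. The code assumes tanh hidden units, a logistic output read as P(malignant), and a random initialization. None of these are stated on the slide. The regularized objective is F = beta * E_D + alpha * E_W, where E_D is the sum of squared errors and E_W is the sum of squared weights. In Bayesian regularization, alpha and beta are re-estimated from the evidence during training. That update, and the Levenberg-Marquardt steps, are not shown.

```python
import math
import random


def init_mlp(n_in, n_hidden, seed=0):
    """Hypothetical random initialization of an n_in-n_hidden-1 network."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return W1, b1, w2, 0.0


def forward(params, x):
    """Assumed tanh hidden layer and logistic output unit."""
    W1, b1, w2, b2 = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return 1.0 / (1.0 + math.exp(-(sum(w * hi for w, hi in zip(w2, h)) + b2)))


def n_params(n_in, n_hidden):
    """Weights plus biases: (n_in + 1) per hidden unit, (n_hidden + 1) at the output."""
    return (n_in + 1) * n_hidden + (n_hidden + 1)


def regularized_objective(params, X, y, alpha, beta):
    """F = beta * E_D + alpha * E_W (sum-of-squares error plus weight decay)."""
    W1, b1, w2, b2 = params
    E_D = sum((forward(params, x) - t) ** 2 for x, t in zip(X, y))
    E_W = sum(w * w for w in [v for row in W1 for v in row] + b1 + w2 + [b2])
    return beta * E_D + alpha * E_W
```

With these definitions, MLP1 (8-3-1) has 31 adjustable parameters and MLP2 (7-3-1) has 28. The small hidden layer and the weight penalty both limit model complexity, which fits the slide's emphasis on generalization.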
11
Model building - ANN - GRNN
Fig.: Architecture of the GRNNs for predicting malignancy of ovarian tumors.
Training: GRNN is another term for Nadaraya-Watson kernel regression. There is no iterative training; the widths h of the RBF units act as smoothing parameters and are chosen by cross-validation.
Structure: GRN1: 8-N-1; GRN2: 7-N-1 (N pattern units, one per training case)
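Because a GRNN is Nadaraya-Watson regression, its prediction is a kernel-weighted average of the training labels, with one Gaussian unit per training case. The sketch below assumes a Gaussian kernel with a single shared width h. It uses leave-one-out error as the cross-validation criterion for h; the slide says only "cross-validation", so that choice is an assumption. Function names are illustrative.

```python
import math


def grnn_predict(X_train, y_train, x, h):
    """Nadaraya-Watson estimate at x with Gaussian kernels of width h."""
    weights = []
    for xi in X_train:
        d2 = sum((a - b) ** 2 for a, b in zip(xi, x))
        weights.append(math.exp(-d2 / (2.0 * h * h)))
    total = sum(weights)
    if total == 0.0:  # every kernel underflowed: fall back to the mean label
        return sum(y_train) / len(y_train)
    return sum(w * yi for w, yi in zip(weights, y_train)) / total


def loo_error(X, y, h):
    """Mean squared leave-one-out error for width h."""
    err = 0.0
    for i in range(len(X)):
        pred = grnn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], h)
        err += (pred - y[i]) ** 2
    return err / len(X)


def choose_h(X, y, grid):
    """Pick the width with the smallest leave-one-out error."""
    return min(grid, key=lambda h: loo_error(X, y, h))
```

With 0/1 labels the output always lies in [0, 1] and can be read as an estimated probability of malignancy. A small h follows the training data closely, and a large h smooths toward the overall malignancy rate.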
12
Model evaluation - Holdout CV
RMI: risk of malignancy index, $\mathrm{RMI} = \mathrm{score}_{\mathrm{morph}} \times \mathrm{score}_{\mathrm{meno}} \times \mathrm{CA125}$
Training set: data from the first 265 treated patients.
Test set: data from the 160 most recently treated patients.
AUC estimates and standard errors from holdout CV
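A sketch of the RMI reference score, the temporal holdout split, and one way to attach a standard error to a holdout AUC. The slide gives only the RMI product. The morphology scores (0, 1 or 3 for 0, 1 or at least 2 suspicious ultrasound features) and the menopausal scores (1 pre, 3 post) follow the commonly used RMI convention and are assumptions here. The slide also does not say how the standard errors were computed; the Hanley-McNeil (1982) approximation below is one standard option.

```python
import math


def rmi(n_morph_features, postmenopausal, ca125):
    """score_morph * score_meno * CA125 (U/mL); scoring convention assumed."""
    if n_morph_features == 0:
        score_morph = 0
    elif n_morph_features == 1:
        score_morph = 1
    else:
        score_morph = 3
    score_meno = 3 if postmenopausal else 1
    return score_morph * score_meno * ca125


def temporal_holdout(records, n_train):
    """Records must be sorted by treatment date: earliest n_train for training."""
    return records[:n_train], records[n_train:]


def auc_se_hanley_mcneil(auc, n_pos, n_neg):
    """Approximate standard error of an AUC (Hanley-McNeil formula)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc * auc / (1 + auc)
    var = (auc * (1 - auc) + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(var)
```

Splitting by treatment date, rather than at random, tests whether a model trained on earlier patients still performs on later ones. This is closer to how the tool would be used prospectively.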
13
Model evaluation - K-fold CV
Stratified 7-fold CV
- For each run of 7-fold CV: $\mathrm{mAUC} = \frac{1}{7}\sum_{i=1}^{7} \mathrm{AUC}_i$, where $\mathrm{AUC}_i$ is the AUC on the $i$th validation set; the expected ROC curve is obtained by averaging.
- Repeat 7-fold CV 30 times with different partitions to obtain a better statistical estimate.
Figures: box plot of mAUC from 7-fold CV; expected ROC curves from k-fold CV
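The repeated stratified 7-fold procedure can be sketched as follows. `fit_and_auc` is a hypothetical caller-supplied function that trains a model on the training indices and returns its AUC on the validation indices. Averaging the ROC curves themselves is not shown.

```python
import random
from statistics import mean


def stratified_folds(labels, k, rng):
    """Deal shuffled cases of each class round-robin over k folds, so every
    fold keeps roughly the overall benign/malignant ratio."""
    folds = [[] for _ in range(k)]
    pos = 0  # running counter keeps total fold sizes within one of each other
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for i in idx:
            folds[pos % k].append(i)
            pos += 1
    return folds


def repeated_cv_mauc(labels, k, repeats, fit_and_auc, seed=0):
    """One mAUC (mean of the k validation-fold AUCs) per repetition."""
    rng = random.Random(seed)
    maucs = []
    for _ in range(repeats):
        folds = stratified_folds(labels, k, rng)
        aucs = []
        for f in range(k):
            train = [i for g in range(k) if g != f for i in folds[g]]
            aucs.append(fit_and_auc(train, folds[f]))
        maucs.append(mean(aucs))
    return maucs
```

With 291 benign and 134 malignant cases, each of the 7 folds holds 60 or 61 cases, 19 or 20 of them malignant. The 30 mAUC values per model are what the box plot summarizes.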
14
Model evaluation - K-fold CV
Multiple comparison of mAUCs: one-way ANOVA followed by Tukey multiple comparison.
Rank-ordered significant subgroups from multiple comparison on mean AUC
Note: subsets of adjacent means that are not significantly different at the 95% confidence level are indicated by a line drawn under the subsets.
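A stdlib sketch of the two statistics behind this comparison. Each group is assumed to hold one model's 30 mAUC values, and group sizes are assumed equal. The code computes the one-way ANOVA F statistic and Tukey's studentized range statistic q for every pair of models. Turning q into a decision requires the critical value q(0.95; k, df_within) from a studentized-range table, which is not computed here.

```python
import math
from statistics import mean


def one_way_anova(groups):
    """Return (F, ms_within, df_between, df_within) for H0: equal means."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [mean(g) for g in groups]
    grand = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    ms_within = ss_within / df_within
    return (ss_between / df_between) / ms_within, ms_within, df_between, df_within


def tukey_q(groups, ms_within):
    """Studentized range statistic for each pair (i, j), equal group sizes."""
    n = len(groups[0])
    means = [mean(g) for g in groups]
    se = math.sqrt(ms_within / n)
    k = len(groups)
    return {(i, j): abs(means[i] - means[j]) / se
            for i in range(k) for j in range(i + 1, k)}
```

Pairs whose q falls below the critical value are not significantly different; these are the adjacent means joined by a line in the ranking. If SciPy is available, `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd` (SciPy 1.8 or later) also report p-values.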
15
Conclusions
Summary
- AUC is the advocated performance measure.
- Exploratory data analysis helps to analyze the data set.
- MLPs have the potential to give more reliable predictions.
Future work
- Develop models with kernel methods, e.g. LS-SVM.
- ANNs are black-box models; a hybrid methodology (grey-box models) might be more promising.