Diagnosis of Ovarian Cancer Based on Mass Spectrum of Blood Samples Committee: Eugene Fink Lihua Li Dmitry B. Goldgof Hong Tang
Outline: Introduction, Previous work, Feature selection, Experiments
Motivation: Early cancer detection is critical for successful treatment. Five-year survival for ovarian cancer: early stage, 90%; late stage, 35%. 80% of cases are diagnosed at a late stage.
Motivation: Desired features of cancer detection: early detection, high accuracy, low cost.
Mass spectrum: We can detect some early-stage cancers by analyzing the blood mass spectrum. [Figure: mass spectrum; x-axis: ratio of molecular weight to electrical charge; y-axis: intensity.]
Blood → Mass spectrum → Data mining → Results
Initial work: Vlahou et al. (2001): manual diagnosis of bladder cancer based on mass spectra. Petricoin et al. (2002): application of clustering to mass spectra for ovarian-cancer diagnosis.
Later work: Decision trees: Adam et al. (2002), 96% accuracy for prostate cancer; Qu et al. (2002), 98% accuracy for prostate cancer. Neural networks: Poon et al. (2003), 91% accuracy for liver cancer. Clustering: Petricoin et al. (2002), 80% accuracy for prostate cancer.
Feature selection: [Figure: intensity vs. ratio of molecular weight to electrical charge, for cancer and healthy samples.] Statistical difference:
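The formula that followed "Statistical difference:" is not preserved on the slide. As an illustration only, the sketch below assumes a Welch-style t statistic computed separately at each point of the spectrum. The function name `difference_scores` and the choice of statistic are assumptions, not the method from the slides.

```python
from math import sqrt
from statistics import mean, variance

def difference_scores(cancer, healthy):
    """Score each spectrum point by how strongly it separates the two groups.

    cancer, healthy: lists of spectra, where each spectrum is a list of
    intensities of the same length. Each group needs at least two spectra.
    The score is an ASSUMED Welch-style t statistic; the statistic used in
    the slides is not known.
    """
    scores = []
    # zip(*group) turns a list of spectra into per-point tuples of intensities.
    for c_vals, h_vals in zip(zip(*cancer), zip(*healthy)):
        gap = abs(mean(c_vals) - mean(h_vals))
        spread = sqrt(variance(c_vals) / len(c_vals)
                      + variance(h_vals) / len(h_vals))
        if spread > 0:
            scores.append(gap / spread)
        else:
            # Zero variance in both groups: perfectly separated or identical.
            scores.append(float("inf") if gap > 0 else 0.0)
    return scores
```

Points with a high score are those where the cancer and healthy curves differ most relative to their variability.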
Feature selection: Window size: minimal distance between selected points. [Figure: intensity vs. ratio of molecular weight to electrical charge, for cancer and healthy samples.]
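The window constraint can be sketched as a greedy pass over per-point scores, however those scores are computed: take points in order of decreasing score and skip any point closer than the window size to one already taken. The greedy strategy and the name `select_features` are assumptions made for illustration; the slides state only that the window size is the minimal distance between selected points.

```python
def select_features(scores, num_features, window):
    """Pick up to num_features point indices, highest score first,
    keeping every pair of picks at least `window` positions apart.

    This greedy procedure is an assumed illustration, not necessarily
    the selection procedure used in the slides.
    """
    # Indices ordered from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in ranked:
        if len(chosen) >= num_features:
            break
        # Accept the point only if it respects the minimal distance.
        if all(abs(i - j) >= window for j in chosen):
            chosen.append(i)
    return sorted(chosen)
```

With a window size of 1 any distinct points may be selected; larger windows spread the selected points across the spectrum, which avoids picking several neighboring points from the same peak.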
Data sets [Table: number of cancer and healthy cases in each data set.]
Learning algorithms: decision trees (C4.5), support vector machines (SVMFu), neural networks (Cascor 1.2).
Control variables: number of features, 1–64; window size, 1–1024.
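One way to search these two control variables is an exhaustive grid. The sketch below assumes the values are tried at powers of two within the stated ranges and that an `evaluate(num_features, window)` function returns an accuracy; both the power-of-two spacing and the `evaluate` callback are hypothetical, since the slides give only the ranges.

```python
from itertools import product

def powers_of_two(lo, hi):
    """Return lo, 2*lo, 4*lo, ... up to hi (assumed grid spacing)."""
    if lo < 1:
        raise ValueError("lo must be at least 1")
    values, v = [], lo
    while v <= hi:
        values.append(v)
        v *= 2
    return values

def best_control_values(evaluate, feature_counts, window_sizes):
    """Try every (num_features, window) pair and return the best as
    (accuracy, num_features, window). `evaluate` is a hypothetical
    callback that trains and tests a learner for one pair."""
    best = None
    for n, w in product(feature_counts, window_sizes):
        acc = evaluate(n, w)
        if best is None or acc > best[0]:
            best = (acc, n, w)
    return best
```

Under these assumptions the grid has 7 feature counts (1 to 64) and 11 window sizes (1 to 1024), so 77 pairs are evaluated for each learning algorithm and data set.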
Best control values, decision trees [Table: data set, number of features, window size, accuracy (%).]
Best control values, support vector machines [Table: data set, number of features, window size, accuracy (%).]
Best control values, neural networks [Table: data set, number of features, window size, accuracy (%).]
Learning curve, data set 1 [Figure: accuracy (%) vs. training size for decision trees, SVM, and neural networks.]
Learning curve, data set 2 [Figure: accuracy (%) vs. training size for decision trees, SVM, and neural networks.]
Learning curve, data set 3 [Figure: accuracy (%) vs. training size for decision trees, SVM, and neural networks.]
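A learning curve of this kind can be sketched as follows: train on progressively larger subsets of the training data and measure accuracy on a fixed held-out set. The function names and the nearest-neighbor learner are stand-ins chosen to keep the example self-contained; the slides use decision trees, SVM, and neural networks, and their exact protocol is not stated.

```python
import random

def learning_curve(train_fn, examples, test_set, sizes, seed=0):
    """Return [(training size, accuracy %)] for each size in `sizes`.

    examples, test_set: lists of (feature_vector, label) pairs.
    train_fn: takes a list of examples and returns a predict(x) function.
    """
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)  # fixed seed keeps the curve reproducible
    curve = []
    for size in sizes:
        predict = train_fn(shuffled[:size])
        correct = sum(predict(x) == y for x, y in test_set)
        curve.append((size, 100.0 * correct / len(test_set)))
    return curve

def train_nearest_neighbor(training):
    """Stand-in learner (NOT one of the slides' algorithms):
    label a sample with the label of its closest training example."""
    def predict(x):
        nearest = min(
            training,
            key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)),
        )
        return nearest[1]
    return predict
```

Passing a different `train_fn` for each algorithm and plotting the returned pairs gives one curve per algorithm on the same axes.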
Main results: automated detection of ovarian cancer by analyzing the mass spectrum of the blood; experimental comparison of decision trees, SVM, and neural networks; identification of the most informative points of the mass-spectrum curves.
Future work: experiments with other data sets; other methods for feature selection; combination with a genetic algorithm.