Diagnosis of Ovarian Cancer Based on Mass Spectra of Blood Samples. Hong Tang, Yelena Mukomel, Eugene Fink
Motivation Early detection of cancer by analysis of blood samples: a fast, inexpensive test that causes little discomfort.
Outline Mass-spectrum curves; Feature extraction; Experimental results; Conclusions
Mass spectrum [Figure: mass-spectrum curve; x-axis: ratio of molecular weight to net electric charge, 0 to 20,000; y-axis: signal intensity] The curve of a cancer patient usually differs from that of a healthy person.
Patient data Mass-spectrum curves of 685 people, grouped into data sets of cancer and healthy cases. Every curve consists of 15,155 points.
Candidate features [Figure: mass-spectrum curve; x-axis: ratio of molecular weight to net electric charge, 0 to 20,000; y-axis: signal intensity] Every point of the mass-spectrum curve is a candidate feature. Its relevance depends on the mean difference between values for cancer patients and healthy people.
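Feature relevance [Figure: signal intensity at a candidate feature for cancer and healthy cases, with means $\mu_c$, $\mu_h$ and standard deviations $\sigma_c$, $\sigma_h$] Mean difference: $|\mu_c - \mu_h|$. Standard deviation of the difference: $(\sigma_c^2 + \sigma_h^2)^{0.5}$. Relevance measure: $\frac{|\mu_c - \mu_h|}{(\sigma_c^2 + \sigma_h^2)^{0.5}}$.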
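The relevance measure above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `relevance` is hypothetical, and the use of the population standard deviation (rather than the sample one) is an assumption, since the slides do not say which was used.

```python
from statistics import mean, pstdev


def relevance(cancer_values, healthy_values):
    """Relevance of one candidate feature (one point of the curve).

    cancer_values / healthy_values: signal intensities at that point,
    one value per cancer patient / healthy person.
    """
    mu_c = mean(cancer_values)
    mu_h = mean(healthy_values)
    # Assumption: population standard deviation; the slides do not specify.
    sigma_c = pstdev(cancer_values)
    sigma_h = pstdev(healthy_values)
    spread = (sigma_c ** 2 + sigma_h ** 2) ** 0.5
    if spread == 0:
        # Degenerate case (not covered by the slides): no variation in either group.
        return float("inf") if mu_c != mu_h else 0.0
    return abs(mu_c - mu_h) / spread
```

For instance, with intensities `[1.0, 3.0]` for cancer and `[5.0, 7.0]` for healthy, the means differ by 4 and each group has standard deviation 1, so the relevance is 4 divided by the square root of 2.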
Minimal distance Impose a lower bound on the distance between feature points, which prevents the selection of correlated features. After selecting a feature point, discard all points within this distance bound. [Figure: signal-intensity curve showing a selected feature and the discarded points within the min distance]
Feature selection Repeat for a given number of features: (1) select the most relevant remaining feature point; (2) discard all points within the minimal distance from the selected feature.
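The selection loop can be sketched as follows. The function name `select_features` and its parameters are hypothetical. The sketch also assumes that "within the minimal distance" means an index distance strictly less than `min_distance`, so a bound of 1 discards only the selected point itself; the slides do not state whether the bound is inclusive.

```python
def select_features(relevances, num_features, min_distance):
    """Greedily pick indices of the most relevant points, kept spaced apart.

    relevances: one relevance score per point of the mass-spectrum curve.
    Returns the selected point indices in the order they were chosen.
    """
    n = len(relevances)
    available = [True] * n
    # Visiting points from most to least relevant is equivalent to repeatedly
    # taking the most relevant point that has not been discarded.
    order = sorted(range(n), key=lambda i: relevances[i], reverse=True)
    selected = []
    for i in order:
        if len(selected) == num_features:
            break
        if not available[i]:
            continue
        selected.append(i)
        # Assumption: discard every point whose distance to i is < min_distance.
        lo = max(0, i - min_distance + 1)
        hi = min(n, i + min_distance)
        for j in range(lo, hi):
            available[j] = False
    return selected
```

With scores `[0.1, 0.9, 0.8, 0.2, 0.7, 0.3]`, two features, and a minimal distance of 2, the sketch picks point 1 first, skips its neighbor at index 2, and then picks point 4. If too few points remain, it returns fewer features than requested.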
Control variables Number of feature points: 1 to 64. Min distance between features: 1 to 1024. Data mining techniques: decision trees (C4.5), support vector machines (SVMFu), neural networks (Cascor 1.2).
Measurements Sensitivity: probability of the correct diagnosis for a cancer patient. Specificity: probability of the correct diagnosis for a healthy person.
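As a small illustration of these two measurements, the sketch below estimates them from a list of true labels and a list of predicted labels. The function name and the boolean encoding (`True` for cancer) are assumptions made for this example only.

```python
def sensitivity_specificity(actual, predicted):
    """Estimate sensitivity and specificity from paired labels.

    actual, predicted: equal-length sequences of booleans, True meaning cancer.
    """
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a and p)
    false_neg = sum(1 for a, p in pairs if a and not p)
    true_neg = sum(1 for a, p in pairs if not a and not p)
    false_pos = sum(1 for a, p in pairs if not a and p)
    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("need at least one cancer case and one healthy case")
    # Sensitivity: fraction of cancer patients diagnosed correctly.
    sensitivity = true_pos / (true_pos + false_neg)
    # Specificity: fraction of healthy people diagnosed correctly.
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity
```

For example, if three of four cancer patients and one of two healthy people are classified correctly, the function returns a sensitivity of 0.75 and a specificity of 0.5.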
Results Columns: Num. of features, Min. dist., Sensitivity, Specificity. Set 1 (DT, SVM, NN): %, 82%, 80%, 78%, 84%, 84%. Set 2 (DT, SVM, NN): %, 96%, 93%, 96%, 93%, 98%. Set 3 (DT, SVM, NN): %, 100%, 100%, 100%, 99%, 99%.
Summary Performance range Sensitivity: 80%–100% Specificity: 78%–100%
Summary Performance range: Set 1: sensitivity 80%–86%, specificity 78%–84%. Set 2: sensitivity 92%–96%, specificity 93%–96%. Set 3: sensitivity 98%–100%, specificity 99%–100%. Optimal parameters Number of feature points: 4–32. Min distance between features: 1–256. Data mining technique: any.
Conclusions We have developed a technique for detecting ovarian cancer by analyzing mass spectra of blood samples. The accuracy of this technique is still low on some data sets, and results vary across them: sensitivity ranges from 80%–86% on Set 1 to 98%–100% on Set 3.
Future work Use more patient data. Consider other features of mass-spectrum curves. Apply the technique to other cancers.