Published by Lora Caldwell. Modified over 9 years ago.
1
Survival-Time Classification of Breast Cancer Patients
DIMACS Workshop on Data Mining and Scalable Algorithms, August 22-24, 2001, Rutgers University
Y.-J. Lee, O. L. Mangasarian & W. H. Wolberg
Second Annual Review, June 1, 2001
Data Mining Institute, University of Wisconsin - Madison
2
American Cancer Society Year 2001 Breast Cancer Estimates
Breast cancer, the most common cancer among women, is the second leading cause of cancer deaths in women (after lung cancer).
192,200 new cases of breast cancer in women will be diagnosed in the United States.
40,600 deaths will occur from breast cancer in the United States (40,200 among women, 400 among men).
According to the World Health Organization, more than 1.2 million people will be diagnosed with breast cancer this year worldwide.
3
Key Objective
Identify breast cancer patients for whom adjuvant chemotherapy prolongs survival time.
Main Difficulty: Comparative tests cannot be carried out on human subjects; similar patients must be treated similarly.
Our Approach: Classify patients into Good, Intermediate & Poor groups.
Classification based on: 5 cytological features plus tumor size.
Classification criteria: Tumor size & lymph node status.
4
Principal Results for 253 Breast Cancer Patients
All 69 patients in the Good group had the best survival rate and had no chemotherapy.
All 73 patients in the Poor group had the worst survival rate and had chemotherapy.
For the 111 patients in the Intermediate group, the 67 patients who had chemotherapy had a better survival rate than the 44 patients who did not.
This last result reverses the role of chemotherapy observed in the overall population and in the Good & Poor groups.
5
Outline
Tools used:
  Support vector machines (SVMs): feature selection and classification
  Clustering: k-Median (k-Mean fails!)
Cluster chemo patients into chemo-good & chemo-poor
Cluster no-chemo patients into no-chemo-good & no-chemo-poor
Three final classes:
  Good = No-chemo good
  Poor = Chemo poor
  Intermediate = Remaining patients
Generate survival curves for the three classes
Use an SVM to classify new patients into one of the above three classes
6
Support Vector Machines Used in This Work
Feature selection: SVM with 1-norm approach,
$\min_{w,\gamma,y} \ \nu e'y + \|w\|_1$ s.t. $D(Aw - e\gamma) + y \ge e,\ y \ge 0$,
where $D_{ii} = \pm 1$ denotes Lymph node > 0 or Lymph node = 0.
6 out of 31 features selected: 5 out of 30 cytological features that describe nuclear size, shape and texture, plus tumor size.
Classification: Use SSVMs with Gaussian kernel.
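The slide names a Gaussian kernel for the classifier but does not give its form. A minimal sketch, assuming the common form K(x, z) = exp(-mu * ||x - z||^2); the function names and the width parameter `mu` (and its default value) are hypothetical and do not come from the slides.

```python
import math

def gaussian_kernel(x, z, mu=0.1):
    """Gaussian (RBF) kernel value for two equal-length feature vectors.

    mu is a hypothetical width parameter; the slides do not state a value.
    """
    if len(x) != len(z):
        raise ValueError("vectors must have the same length")
    # Squared Euclidean distance between the two vectors.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-mu * sq_dist)

def kernel_matrix(points, mu=0.1):
    """Square matrix of kernel values between every pair of points."""
    return [[gaussian_kernel(p, q, mu) for q in points] for p in points]
```

A kernel SVM would operate on `kernel_matrix(patients)` rather than on the raw 6-dimensional feature vectors, which is what allows a nonlinear separating surface.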
7
Clustering in Data Mining
General Objective
Given: A dataset of m points in n-dimensional real space.
Problem: Extract hidden distinct properties by clustering the dataset.
8
Concave Minimization Formulation of Clustering Problem
Given: Set of m points in $R^n$ represented by the matrix $A \in R^{m \times n}$, and a number $k$ of desired clusters.
Problem: Determine centers $C_\ell$, $\ell = 1, \ldots, k$, in $R^n$ such that the sum of the minima over $\ell \in \{1, \ldots, k\}$ of the 1-norm distance between each point $A_i$, $i = 1, \ldots, m$, and cluster centers $C_\ell$, $\ell = 1, \ldots, k$, is minimized.
Objective: Sum of m minima of linear functions, hence it is piecewise-linear concave.
Difficulty: Minimizing a general piecewise-linear concave function over a polyhedral set is NP-hard.
9
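Clustering via Concave Minimization
Minimize the sum of 1-norm distances between each data point $A_i$ and the closest cluster center $C_\ell$:
$\min_{C,D} \sum_{i=1}^{m} \min_{\ell=1,\ldots,k} \{ e'D_{i\ell} \}$ s.t. $-D_{i\ell} \le A_i' - C_\ell \le D_{i\ell}$, $i = 1, \ldots, m$, $\ell = 1, \ldots, k$
Reformulation:
$\min_{C,D,T} \sum_{i=1}^{m} \sum_{\ell=1}^{k} T_{i\ell}\, e'D_{i\ell}$ s.t. $-D_{i\ell} \le A_i' - C_\ell \le D_{i\ell}$, $\sum_{\ell=1}^{k} T_{i\ell} = 1$, $T_{i\ell} \ge 0$, $i = 1, \ldots, m$, $\ell = 1, \ldots, k$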
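The quantity being minimized, the sum over all data points of the 1-norm distance to the closest cluster center, can be evaluated directly for any candidate set of centers. A minimal sketch; the function names are illustrative and the data in the usage note are made up.

```python
def l1_distance(p, q):
    """1-norm (Manhattan) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))

def clustering_objective(points, centers):
    """Sum over all points of the 1-norm distance to the nearest center."""
    return sum(min(l1_distance(p, c) for c in centers) for p in points)
```

For example, with the toy points `[[0, 0], [1, 0], [10, 10]]` and centers `[[0, 0], [10, 10]]`, only the middle point is off a center, at 1-norm distance 1, so the objective is 1.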
10
Finite K-Median Clustering Algorithm (Minimizing Piecewise-linear Concave Function)
Step 0 (Initialization): Given k initial cluster centers. Different initial centers will lead to different clusters.
Step 1 (Cluster Assignment): Assign points to the cluster with the nearest cluster center in 1-norm.
Step 2 (Center Update): Recompute the location of the center for each cluster as the cluster median (closest point to all cluster points in 1-norm).
Step 3 (Stopping Criterion): Stop if the cluster centers are unchanged, else go to Step 1.
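Steps 0 to 3 above can be sketched in a few lines of Python. This is an illustrative version, not the authors' code. The iteration cap `max_iter` and the choice to keep the old center when a cluster becomes empty are assumptions the slide does not address. The center update uses the coordinate-wise median, which minimizes the sum of 1-norm distances to the cluster's points.

```python
from statistics import median

def l1_distance(p, q):
    """1-norm distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))

def k_median(points, initial_centers, max_iter=100):
    """Return (centers, labels) after running the k-Median iteration."""
    centers = [list(c) for c in initial_centers]      # Step 0
    labels = []
    for _ in range(max_iter):                         # max_iter: assumed safety cap
        # Step 1: assign each point to the nearest center in 1-norm.
        labels = [
            min(range(len(centers)), key=lambda j: l1_distance(p, centers[j]))
            for p in points
        ]
        # Step 2: recompute each center as the coordinate-wise median.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([median(col) for col in zip(*members)])
            else:
                new_centers.append(old)               # assumption: keep empty cluster's center
        # Step 3: stop when the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```

Because the result depends on the starting centers, a caller would pass in centers chosen from domain knowledge rather than at random.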
11
Clustering Process: Feature Selection & Initial Cluster Centers
6 out of 31 features selected by a linear 1-norm SVM separating lymph node positive (Lymph > 0) from lymph node negative (Lymph = 0).
Perform k-Median algorithm in 6-dimensional feature space.
Initial cluster centers used: Medians of Good1 & Poor1.
Good1: Patients with Lymph = 0 AND Tumor < 2.
Poor1: Patients with Lymph > 4 OR Tumor >= 4 (typical indicator for chemotherapy).
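A sketch of how the two initial centers could be computed from patient records. The record layout (dictionaries with `lymph`, `tumor` and `features` keys) is hypothetical. The thresholds follow the slides, with the Poor1 tumor threshold taken as Tumor >= 4 as stated on the clustering-process slide.

```python
from statistics import median

def coordinate_median(vectors):
    """Coordinate-wise median of a non-empty list of equal-length vectors."""
    return [median(col) for col in zip(*vectors)]

def initial_centers(patients):
    """Return (good1_center, poor1_center) for seeding k-Median.

    Each patient is a dict with hypothetical keys:
    'lymph' (positive lymph node count), 'tumor' (tumor size),
    'features' (the selected feature vector).
    """
    good1 = [p["features"] for p in patients
             if p["lymph"] == 0 and p["tumor"] < 2]
    poor1 = [p["features"] for p in patients
             if p["lymph"] > 4 or p["tumor"] >= 4]
    if not good1 or not poor1:
        raise ValueError("need at least one Good1 and one Poor1 patient")
    return coordinate_median(good1), coordinate_median(poor1)
```

Patients who satisfy neither rule do not influence the starting centers; they are only assigned to a cluster once the k-Median iteration runs.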
12
Clustering Process
Compute initial cluster centers:
  Good1: Lymph = 0 AND Tumor < 2; compute median using 6 features.
  Poor1: Lymph >= 5 OR Tumor >= 4; compute median using 6 features.
253 patients (113 NoChemo, 140 Chemo):
  Cluster the 113 NoChemo patients using the k-Median algorithm with initial centers at the medians of Good1 & Poor1: 69 NoChemo Good, 44 NoChemo Poor.
  Cluster the 140 Chemo patients using the k-Median algorithm with initial centers at the medians of Good1 & Poor1: 67 Chemo Good, 73 Chemo Poor.
Final groups:
  Good: 69 NoChemo Good.
  Intermediate: 44 NoChemo Poor and 67 Chemo Good.
  Poor: 73 Chemo Poor.
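The mapping from the four clusters to the three final classes, and the resulting group sizes, can be checked with a short sketch. The function names and the `(had_chemo, cluster)` encoding are illustrative; the counts are the ones given on the slide.

```python
from collections import Counter

def final_class(had_chemo, cluster):
    """Map a (chemo status, k-Median cluster) pair to a final class.

    cluster is 'good' or 'poor', the label from the k-Median step.
    """
    if cluster not in ("good", "poor"):
        raise ValueError("cluster must be 'good' or 'poor'")
    if not had_chemo and cluster == "good":
        return "Good"
    if had_chemo and cluster == "poor":
        return "Poor"
    return "Intermediate"

def class_sizes(cluster_sizes):
    """Total the patients per final class from per-cluster counts."""
    totals = Counter()
    for (had_chemo, cluster), n in cluster_sizes.items():
        totals[final_class(had_chemo, cluster)] += n
    return dict(totals)

# Cluster sizes as reported on the slide.
SLIDE_COUNTS = {
    (False, "good"): 69,
    (False, "poor"): 44,
    (True, "good"): 67,
    (True, "poor"): 73,
}
```

`class_sizes(SLIDE_COUNTS)` gives 69 Good, 111 Intermediate (44 + 67) and 73 Poor, which together account for all 253 patients.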
13
Survival Curves for Good, Intermediate & Poor Groups
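The slides do not say how the survival curves were estimated. A common choice for this kind of plot is the Kaplan-Meier estimator, so the sketch below assumes it; the function name and the input layout (follow-up times plus a flag that is True when death was observed and False when the patient was censored) are illustrative.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each patient
    events: True if death was observed at that time, False if censored
    Returns a list of (time, survival_probability) at each death time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            removed += 1
            i += 1
        if deaths:
            # Multiply by the fraction of at-risk patients surviving time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve
```

Running this once per group (Good, Intermediate, Poor) and plotting the resulting step functions would give one curve per group.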
14
Survival Curves for Intermediate Group: Split by Chemo & NoChemo
15
Survival Curves for All Patients Split by Chemo & NoChemo
16
Survival Curves for Intermediate Group Split by Lymph Node & Chemotherapy
17
Survival Curves for All Patients Split by Lymph Node Positive & Negative
18
Nonlinear SVM Classifier: 82.7% Tenfold Test Correctness
Four groups from the clustering result: Good, Intermediate (ChemoGood), Intermediate (NoChemoPoor), Poor.
Good2: Good & ChemoGood. Poor2: NoChemoPoor & Poor.
First SVM: separates Good2 from Poor2.
Good2 branch: compute LI(x) & CI(x); a second SVM separates Good from Intermediate.
Poor2 branch: compute LI(x) & CI(x); a second SVM separates Poor from Intermediate.
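One reading of this slide is a two-level cascade: a first classifier separates Good2 from Poor2, and a second classifier on each branch separates Good or Poor from Intermediate. The sketch below shows only that routing logic, under that assumption. The three classifier arguments are hypothetical stand-ins (any callables returning True or False) for trained nonlinear SVMs, and the LI(x) and CI(x) computations, which the slide does not define, are left out.

```python
def classify_patient(x, top_svm, good_branch_svm, poor_branch_svm):
    """Route a feature vector x through an assumed two-level cascade.

    top_svm(x)         -> True for Good2, False for Poor2
    good_branch_svm(x) -> True for Good, False for Intermediate
    poor_branch_svm(x) -> True for Poor, False for Intermediate
    All three are hypothetical stand-ins for trained classifiers.
    """
    if top_svm(x):
        return "Good" if good_branch_svm(x) else "Intermediate"
    return "Poor" if poor_branch_svm(x) else "Intermediate"
```

With this layout a new patient reaches exactly two classifiers, and both Intermediate subgroups (ChemoGood and NoChemoPoor) end up with the same final label.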
19
Conclusion
Used five features from a fine needle aspirate & tumor size to cluster breast cancer patients into 3 groups:
  Good: No chemotherapy recommended
  Intermediate: Chemotherapy likely to prolong survival
  Poor: Chemotherapy may or may not enhance survival
The 3 groups have very distinct survival curves.
First categorization of a breast cancer group for which chemotherapy enhances longevity.
SVM-based procedure for classifying new patients into one of the above three survival groups.