Published by Hilary Gardner. Modified over 8 years ago.
1
Survival-Time Classification of Breast Cancer Patients and Chemotherapy
Yuh-Jye Lee, Olvi Mangasarian & W. H. Wolberg
UW Madison & UCSD La Jolla
Computational and Applied Mathematics Seminar, April 19, 2005
2
Breast Cancer Estimates
American Cancer Society & World Health Organization
Breast cancer is the most common cancer among women in the US.
ACS estimate for 2005: 212,930 new cases of breast cancer in the US (211,240 in women and 1,690 in men).
40,870 deaths from breast cancer are estimated to occur in the US in 2005 (40,410 among women and 460 among men).
WHO estimates: more than 1.2 million people worldwide were diagnosed with breast cancer in 2001, and 0.5 million died from breast cancer in 2000.
3
Key Objective
Identify breast cancer patients for whom chemotherapy prolongs survival time.
Main Difficulty: Cannot carry out comparative tests on human subjects
  Similar patients must be treated similarly
Our Approach: Classify patients into Good, Intermediate & Poor groups such that:
  Good group does not need chemotherapy
  Intermediate group benefits from chemotherapy
  Poor group is not likely to benefit from chemotherapy
4
Outline
Tools used
  Support vector machines (linear & nonlinear SVMs): feature selection & classification
  Clustering (k-Median algorithm, not k-Means)
Cluster into good, intermediate & poor classes
  Cluster no-chemo patients into 2 groups: good & poor
  Cluster chemo patients into 2 groups: good & poor
Generate three final classes
  Good class (Good from no-chemo cluster group)
  Poor class (Poor from chemo cluster group)
  Intermediate class: Remaining patients (chemo & no-chemo)
Generate survival curves for three classes
Use SSVM to classify new patients into one of above three classes
Data description
5
Cell Nuclei of a Fine Needle Aspirate
6
Thirty Cytological Features Collected at Diagnosis Time
7
Two Histological Features Collected at Surgery Time
8
Breast Cancer Diagnosis Based on 3 FNA Features
97% Ten-fold Cross Validation Correctness
780 Patients: 494 Benign, 286 Malignant
Research by Mangasarian, Street & Wolberg
9
1-Norm Support Vector Machines
Maximize the Margin between Bounding Planes
[Figure: point sets A+ and A- separated by two bounding planes]
10
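Support Vector Machine
Algebra of 2-Category Linearly Separable Case
Given m points in n-dimensional space, represented by an m-by-n matrix A
Membership of each point $A_i$ in class +1 or -1 is specified by an m-by-m diagonal matrix D with +1 & -1 entries
Separate by two bounding planes, $x'w = \gamma \pm 1$:
$$A_i w \ge \gamma + 1 \text{ for } D_{ii} = +1, \qquad A_i w \le \gamma - 1 \text{ for } D_{ii} = -1$$
More succinctly: $D(Aw - e\gamma) \ge e$, where e is a vector of ones.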
11
Feature Selection Using 1-Norm Linear SVM
Classification Based on Lymph Node Status
Feature selection: 1-norm SVM:
$$\min_{w,\gamma,y} \; \nu e'y + \|w\|_1 \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \; y \ge 0,$$
where $D_{ii} = \pm 1$ denotes Lymph node > 0 or Lymph node = 0.
Features selected: 6 out of 31 by above SVM:
  5 out of 30 cytological features that describe nuclear size, shape and texture from fine needle aspirate
  Tumor size from surgery
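The 1-norm SVM on this slide is a linear program, and the 1-norm penalty on the weight vector tends to drive many weights to exactly zero, which is what makes it usable for feature selection. The following is a minimal sketch, not the authors' implementation: it assumes the standard formulation (minimize nu times the sum of slacks plus the 1-norm of w, subject to D(Aw - e*gamma) + y >= e, y >= 0), splits w into nonnegative parts p and q, and solves with SciPy's `linprog`. The function name `one_norm_svm`, the value of `nu`, and the toy data are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def one_norm_svm(A, d, nu=1.0):
    """Solve min nu*e'y + ||w||_1  s.t.  D(Aw - e*gamma) + y >= e, y >= 0."""
    A = np.asarray(A, dtype=float)
    d = np.asarray(d, dtype=float)      # diagonal of D: +1 / -1 labels
    m, n = A.shape
    # Variable order: p (n), q (n), gamma (1), y (m), with w = p - q.
    c = np.concatenate([np.ones(2 * n), [0.0], nu * np.ones(m)])
    DA = d[:, None] * A
    # Constraint rewritten as: -DA p + DA q + d*gamma - y <= -e
    A_ub = np.hstack([-DA, DA, d[:, None], -np.eye(m)])
    b_ub = -np.ones(m)
    bounds = [(0, None)] * (2 * n) + [(None, None)] + [(0, None)] * m
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    w = res.x[:n] - res.x[n:2 * n]
    gamma = res.x[2 * n]
    return w, gamma

# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
A_toy = [[2.0, 0.3], [3.0, -0.2], [2.5, 0.1],
         [-2.0, 0.2], [-3.0, -0.3], [-2.5, 0.0]]
d_toy = [1, 1, 1, -1, -1, -1]
w_toy, gamma_toy = one_norm_svm(A_toy, d_toy, nu=1.0)
selected = [j for j, wj in enumerate(w_toy) if abs(wj) > 1e-6]
```

Features with a nonzero weight (the indices in `selected`) are the ones the classifier actually uses; on the toy data only the separating feature should survive.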
12
Features Selected by Support Vector Machine
13
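Nonlinear SVM for Classifying New Patients
Linear SVM (linear separating surface: $x'w = \gamma$):
(LP) $$\min_{w,\gamma,y} \; \nu e'y + \|w\|_1 \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \; y \ge 0$$
By QP “duality”: $w = A'Du$. Maximizing the margin in the “dual space” gives:
$$\min_{u,\gamma,y} \; \nu e'y + \|u\|_1 \quad \text{s.t.} \quad D(AA'Du - e\gamma) + y \ge e, \; y \ge 0$$
Replace $AA'$ by a nonlinear kernel $K(A, A')$:
$$\min_{u,\gamma,y} \; \nu e'y + \|u\|_1 \quad \text{s.t.} \quad D(K(A, A')Du - e\gamma) + y \ge e, \; y \ge 0$$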
14
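The Nonlinear Classifier
The nonlinear classifier: $K(x', A')Du = \gamma$
where K is a nonlinear kernel, e.g.:
Gaussian (Radial Basis) Kernel: $K(A, A')_{ij} = \exp(-\mu \|A_i - A_j\|_2^2), \quad i, j = 1, \dots, m$
The $ij$-entry of $K(A, A')$ represents “similarity” between the data points $A_i$ and $A_j$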
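A Gaussian kernel classifier of this kind can be sketched in a few lines of plain Python. This is an illustration under assumptions: the kernel entry is taken to be exp(-mu * squared 2-norm distance), a new point x is labeled by the sign of K(x', A')Du - gamma, and the values of `u`, `gamma` and `mu` below are hypothetical rather than obtained by solving the SVM.

```python
import math

def gaussian_kernel(A, B, mu):
    """Return K with K[i][j] = exp(-mu * ||A[i] - B[j]||_2^2)."""
    return [[math.exp(-mu * sum((a - b) ** 2 for a, b in zip(ai, bj)))
             for bj in B] for ai in A]

def classify(x, A, d, u, gamma, mu):
    """Label a new point x by the sign of K(x', A') D u - gamma."""
    k_row = gaussian_kernel([x], A, mu)[0]
    score = sum(k * di * ui for k, di, ui in zip(k_row, d, u)) - gamma
    return 1 if score >= 0 else -1

# Hypothetical training points, labels and multipliers (not fitted values).
A_pts = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]]
d_lab = [1, 1, -1, -1]
u_mul = [1.0, 1.0, 1.0, 1.0]

K = gaussian_kernel(A_pts, A_pts, mu=0.5)
near_positive = classify([0.5, 0.0], A_pts, d_lab, u_mul, gamma=0.0, mu=0.5)
near_negative = classify([5.5, 5.0], A_pts, d_lab, u_mul, gamma=0.0, mu=0.5)
```

Each kernel entry lies in (0, 1] and equals 1 only for identical points, so a new point is pulled toward the class of the training points it most resembles.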
15
Clustering in Data Mining
General Objective
Given: A dataset of m points in n-dimensional real space
Problem: Extract hidden distinct properties by clustering the dataset into k clusters
16
Concave Minimization Formulation of 1-Norm Clustering Problem (k-Median)
Given: Set of m points in $R^n$ represented by the matrix $A \in R^{m \times n}$, and a number k of desired clusters
Find: Cluster centers $C_1, \dots, C_k \in R^n$ that minimize the sum of 1-norm distances of each point $A_1, \dots, A_m$ to its closest cluster center
Objective Function: Sum of m minima of k linear functions, hence it is piecewise-linear concave
Difficulty: Minimizing a general piecewise-linear concave function over a polyhedral set is NP-hard
17
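Clustering via Finite Concave Minimization
Minimize the sum of 1-norm distances between each data point $A_i$ and the closest cluster center $C_\ell$:
$$\min_{C_\ell, D_{i\ell}} \; \sum_{i=1}^{m} \min_{\ell = 1, \dots, k} \{ e'D_{i\ell} \} \quad \text{s.t.} \quad -D_{i\ell} \le A_i' - C_\ell \le D_{i\ell}, \; i = 1, \dots, m, \; \ell = 1, \dots, k$$
Equivalent bilinear reformulation:
$$\min_{C_\ell, D_{i\ell}, T_{i\ell}} \; \sum_{i=1}^{m} \sum_{\ell=1}^{k} T_{i\ell} \, e'D_{i\ell} \quad \text{s.t.} \quad -D_{i\ell} \le A_i' - C_\ell \le D_{i\ell}, \; \sum_{\ell=1}^{k} T_{i\ell} = 1, \; T_{i\ell} \ge 0$$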
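To make the two objectives concrete, the sketch below evaluates both on a small hypothetical dataset. It assumes that the bounding variables are tight at a solution (each equals the componentwise absolute difference between a point and a center), so the cost of a point-center pair is simply their 1-norm distance, and that the assignment weights for each point are nonnegative and sum to one. All function names and data are illustrative.

```python
def one_norm(a, c):
    """1-norm distance between point a and center c."""
    return sum(abs(ai - ci) for ai, ci in zip(a, c))

def kmedian_objective(A, C):
    """Sum over points of the 1-norm distance to the closest center."""
    return sum(min(one_norm(a, c) for c in C) for a in A)

def bilinear_objective(A, C, T):
    """Sum over i and l of T[i][l] * ||A_i - C_l||_1 for assignment weights T."""
    return sum(T[i][l] * one_norm(A[i], C[l])
               for i in range(len(A)) for l in range(len(C)))

def nearest_assignment(A, C):
    """One-hot T that puts all of each point's weight on its closest center."""
    T = []
    for a in A:
        dists = [one_norm(a, c) for c in C]
        best = dists.index(min(dists))
        T.append([1.0 if l == best else 0.0 for l in range(len(C))])
    return T

# Hypothetical points and centers.
A_pts = [[0, 0], [1, 1], [9, 9], [10, 8]]
C_ctr = [[0, 1], [10, 9]]

concave_value = kmedian_objective(A_pts, C_ctr)
bilinear_value = bilinear_objective(A_pts, C_ctr, nearest_assignment(A_pts, C_ctr))
uniform_value = bilinear_objective(A_pts, C_ctr, [[0.5, 0.5]] * len(A_pts))
```

For fixed centers, the bilinear objective is minimized by putting each point's full weight on its nearest center, where it coincides with the sum-of-minima objective; any other weighting, such as the uniform one, can only give a larger value.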
18
K-Median Clustering Algorithm
Finite Termination at Local Solution
Step 0 (Initialization): Pick 2 initial cluster centers as medians of: (L = 0 & T < 2) and (L ≥ 5 or T ≥ 4)
Step 1 (Cluster Assignment): Assign points to the cluster with the nearest cluster center in 1-norm
Step 2 (Center Update): Recompute location of center for each cluster as the cluster median (closest point to all cluster points in 1-norm)
Step 3 (Stopping Criterion): Stop if the cluster centers are unchanged, else go to Step 1
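The steps above can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' code: initial centers are passed in by the caller (on the slide they are medians of two patient subgroups), the center update uses the coordinate-wise median, which minimizes the sum of 1-norm distances to the cluster's points, and the `max_iter` cap and the handling of empty clusters are assumptions added for safety.

```python
from statistics import median

def one_norm(a, c):
    """1-norm distance between point a and center c."""
    return sum(abs(ai - ci) for ai, ci in zip(a, c))

def k_median(points, centers, max_iter=100):
    """Alternate assignment and median update until the centers stop changing."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Step 1: assign each point to the nearest center in the 1-norm.
        labels = [min(range(len(centers)),
                      key=lambda l: one_norm(p, centers[l]))
                  for p in points]
        # Step 2: move each center to the coordinate-wise median of its cluster.
        new_centers = []
        for l, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == l]
            if members:
                new_centers.append([median(col) for col in zip(*members)])
            else:
                new_centers.append(old)  # assumption: keep an empty cluster's center
        # Step 3: stop when the centers are unchanged.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

# Hypothetical data: two well-separated groups and rough initial centers.
pts = [[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]]
final_centers, final_labels = k_median(pts, [[1, 1], [8, 8]])
```

Because each step cannot increase the sum of 1-norm distances and there are finitely many assignments, the loop stops at a local solution, which is why the choice of initial centers matters.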
19
Feature Selection & Initial Cluster Centers
6 out of 31 features selected by 1-norm SVM
  SVM separating lymph node positive (Lymph > 0) from lymph node negative (Lymph = 0)
Apply k-Median algorithm in 6-dimensional input space
Initial cluster centers used: Medians of Good1 & Poor1
  Good1: Patients with Lymph = 0 AND Tumor < 2
  Poor1: Patients with Lymph > 4 OR Tumor ≥ 4
Typical indicator for chemotherapy
20
Overall Clustering Process
253 Patients (113 NoChemo, 140 Chemo)
Compute Initial Cluster Centers
  Good1: Lymph = 0 AND Tumor < 2; compute median using 6 features
  Poor1: Lymph >= 5 OR Tumor >= 4; compute median using 6 features
Cluster 113 NoChemo Patients: use k-Median algorithm with initial centers: medians of Good1 & Poor1
  69 NoChemo Good
  44 NoChemo Poor
Cluster 140 Chemo Patients: use k-Median algorithm with initial centers: medians of Good1 & Poor1
  67 Chemo Good
  73 Chemo Poor
Final classes
  Good: 69 NoChemo Good
  Intermediate: 44 NoChemo Poor + 67 Chemo Good
  Poor: 73 Chemo Poor
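The final step, combining the two 2-way clusterings into three classes, follows the rule given in the outline: Good comes from the no-chemo good cluster, Poor from the chemo poor cluster, and everyone else is Intermediate. A small sketch of that rule, applied to the cluster sizes on this slide (the function name and the data layout are illustrative assumptions):

```python
def final_class(received_chemo, cluster):
    """Map (chemo status, 2-cluster label) to the final three-way class."""
    if not received_chemo and cluster == "good":
        return "Good"
    if received_chemo and cluster == "poor":
        return "Poor"
    return "Intermediate"

# Cluster sizes reported on the slide, keyed by (received_chemo, cluster).
counts = {
    (False, "good"): 69, (False, "poor"): 44,
    (True, "good"): 67, (True, "poor"): 73,
}

totals = {}
for (chemo, cluster), n in counts.items():
    label = final_class(chemo, cluster)
    totals[label] = totals.get(label, 0) + n
```

With these counts the Intermediate class collects the 44 no-chemo poor and 67 chemo good patients, and the three classes together account for all 253 patients.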
21
Survival Curves for Good, Intermediate & Poor Groups (Nonlinear SSVM for New Patients)
22
Survival Curves for Intermediate Group: Split by Chemo & NoChemo
23
Survival Curves for Overall Patients: With & Without Chemotherapy
24
Survival Curves for Intermediate Group Split by Lymph Node & Chemotherapy
25
Survival Curves for Overall Patients Split by Lymph Node Positive & Negative
26
Conclusion
Used five cytological features & tumor size to cluster breast cancer patients into 3 groups:
  Good: No chemotherapy recommended
  Intermediate: Chemotherapy likely to prolong survival
  Poor: Chemotherapy may or may not enhance survival
3 groups have very distinct survival curves
First categorization of a breast cancer group for which chemotherapy enhances longevity
SVM-based procedure assigns new patients into one of above three survival groups
27
Talk & Paper Available on Web
www.cs.wisc.edu/~olvi
Y.-J. Lee, O. L. Mangasarian & W. H. Wolberg: Computational Optimization and Applications, Volume 25, 2003, pages 151-166