Prediction of Sickle Cell Anemia Patient’s Response to Hydroxyurea Treatment Using ARTMAP Network Hongyu Xu, Faramarz Valafar, Marko Vuskovic Department of Computer Science San Diego State University San Diego, CA METMBS 2003 Las Vegas, June 24, 2003 The full paper and these slides are available at: Control/Neuromuscular.htm
2 Contents Sickle cell anemia Data and data preprocessing Linear dependency of features Feature selection Data labeling MART clustering algorithm MART classification algorithm Results Conclusion
3 Sickle Cell Anemia Sickle cell anemia is a genetic disorder caused by a single point mutation in the beta globin gene that changes the sequence CCTGAGG to CCTGTGG. The molecules of sickle cell hemoglobin adhere to each other and distort red blood cells (RBC) into a sickle shape. The distorted cells stick in narrow blood vessels, blocking the flow of blood. Sickle cell patients experience severe, painful crises. Many sickle cell patients die before the age of 20. In the United States, about 1 in 500 African Americans develops sickle cell anemia [5]. In Africa, about 1 in 100 individuals develops the disease. In 1983, a drug called hydroxyurea (HU) was first used on sickle cell patients. Patients who responded positively to HU treatment experienced less pain and lived longer, but HU can also be quite toxic.
4 Patient Features Note: The data used in this research were obtained from the Structural Genomics Group at the University of Georgia. Dr. Homayoun Valafar was responsible for data collection and preprocessing.
5 Excerpt from patient’s data (values scaled by 1.0e+004)
6 Data Preprocessing Normalization Log transformation Treatment of incomplete features
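The three preprocessing steps listed on this slide could be sketched as follows. The slide does not say how each step was carried out or in what order, so every concrete choice here is an assumption: missing values are marked with `None` and replaced by the column mean in log space, the log is the natural logarithm, and normalization is a z-score per feature. The function name `preprocess` is hypothetical.

```python
import math

def preprocess(rows):
    """Impute missing values, log-transform, then standardize each feature.

    rows: list of equal-length lists, one per patient; None marks a missing
    measurement. All observed values are assumed to be strictly positive.
    """
    n_features = len(rows[0])
    columns = []
    for j in range(n_features):
        observed = [math.log(r[j]) for r in rows if r[j] is not None]
        mean = sum(observed) / len(observed)
        # Assumption: a missing value is replaced by the column mean (log space).
        logged = [math.log(r[j]) if r[j] is not None else mean for r in rows]
        var = sum((v - mean) ** 2 for v in logged) / len(logged)
        std = math.sqrt(var) or 1.0  # guard against constant columns
        columns.append([(v - mean) / std for v in logged])
    # Transpose back to one row per patient.
    return [list(p) for p in zip(*columns)]
```

Mean imputation leaves the column mean unchanged, so the standardized value of an imputed entry is exactly zero. Other treatments of incomplete features, such as dropping the patient or the feature, fit the same outline.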
7 Patient Data (after log transform)
Linear Dependency of Features
Linear Dependency of Features (Cont.)
10 SBAN SBEN SCAM SSEN
11 Before removal: After removal:
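One way to detect and remove linearly dependent features is a greedy Gram-Schmidt pass over the feature columns, sketched below. This is an illustration of the general idea, not necessarily the procedure used in the paper. The function name `independent_columns` and the tolerance `tol` are assumptions.

```python
import math

def independent_columns(rows, tol=1e-8):
    """Return indices of a maximal set of linearly independent feature columns.

    Greedy Gram-Schmidt: a column is kept only if a significant part of it
    remains after projecting out the columns kept so far.
    """
    columns = [list(c) for c in zip(*rows)]
    basis = []  # orthonormal vectors spanning the kept columns
    kept = []
    for j, col in enumerate(columns):
        norm = math.sqrt(sum(v * v for v in col))
        residual = col[:]
        for q in basis:
            dot = sum(a * b for a, b in zip(residual, q))
            residual = [a - dot * b for a, b in zip(residual, q)]
        res_norm = math.sqrt(sum(v * v for v in residual))
        # Relative test, so the decision does not depend on feature scale.
        if norm > 0 and res_norm > tol * norm:
            basis.append([v / res_norm for v in residual])
            kept.append(j)
    return kept
```

A column that is a linear combination of earlier columns leaves a residual close to zero and is dropped. Keeping only the returned columns gives a feature matrix of full column rank, so its covariance matrix can be inverted safely.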
12 Feature Selection
13 Feature Selection (Cont.)
14 Data Labeling
15 Data Labeling (Cont.)
16 Representation of Patient’s Data in Reduced Feature Space Double rule
17 Representation of Patient’s Data in Reduced Feature Space (Cont.) 15% rule
18 Approaches in Pattern Recognition Bayes' classifier Neural networks Single-layer perceptrons Probability density estimation Parzen window Multilayer perceptrons K-nearest neighbor Mixture model Feed forward ART Recurrent Basis functions Maximum likelihood Bayesian inference Pattern recognition MART K-means SOM Mixture model Radial basis function
19 ART Networks (Grossberg, 1976) Unsupervised ART learning: ART1, ART2 (Carpenter & Grossberg, 1987); Fuzzy ART (Carpenter, Grossberg, et al., 1991); Simplified ART (Baraldi and Alpaydin, 1998). Supervised ART learning: ARTMAP (Carpenter, Grossberg, et al., 1991); Fuzzy ARTMAP (Carpenter, Grossberg, et al., 1991); Gaussian ARTMAP (Williamson, 1992); Simplified ARTMAP (Kasuba, 1993); Mahalanobis distance based ARTMAP (Vuskovic & Du, 2001; Vuskovic, Xu & Du, 2002).
20 MART clustering Algorithm
21 MART clustering Algorithm (Cont.)
22 MART Functions
23 MART Classification Algorithm The trained network is a Gaussian mixture model. Each class maps to one or more clusters. The class probability is proportional to the sum of the posterior probabilities of the individual clusters belonging to that class. The prediction is the class that yields the maximum class probability. The quantities involved are the class-conditional pdf of x given cluster j, the prior probability of cluster j, and the posterior probability of cluster j given x.
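The decision rule described on this slide can be sketched in a few lines. The sketch assumes each trained cluster is a full-covariance Gaussian with a stored prior, mean, covariance matrix, and class label. The dictionary layout and the function names `cholesky`, `log_gaussian`, and `classify` are hypothetical and are not taken from the paper. The Gaussian density is evaluated through the squared Mahalanobis distance, computed with a Cholesky factor rather than an explicit matrix inverse.

```python
import math

def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def log_gaussian(x, mean, cov):
    """Log of the multivariate normal density p(x | cluster)."""
    low = cholesky(cov)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    z = []
    for i in range(len(diff)):  # forward substitution: solve L z = diff
        z.append((diff[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    mahalanobis_sq = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(len(z)))
    return -0.5 * (len(z) * math.log(2.0 * math.pi) + log_det + mahalanobis_sq)

def classify(x, clusters):
    """clusters: list of dicts with keys 'prior', 'mean', 'cov', 'label'.

    Returns (predicted_label, {label: class probability}).
    """
    log_joint = [math.log(c["prior"]) + log_gaussian(x, c["mean"], c["cov"])
                 for c in clusters]
    top = max(log_joint)
    weights = [math.exp(v - top) for v in log_joint]  # stable normalization
    total = sum(weights)
    class_prob = {}
    for c, w in zip(clusters, weights):
        # Class probability = sum of posteriors of the clusters of that class.
        class_prob[c["label"]] = class_prob.get(c["label"], 0.0) + w / total
    return max(class_prob, key=class_prob.get), class_prob
```

Working in log space avoids underflow when a pattern lies far from every cluster. Calling `classify(x, clusters)` with two clusters labeled as one class and a third labeled as another returns the class whose summed cluster posteriors are largest, as the slide describes.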
24 Results
25 Conclusion MART has shown superior performance in various benchmarks, which inspired us to apply it to sickle cell anemia patient data. MART achieved 96.82% accuracy in predicting responders to HU treatment and 92.59% global accuracy. Removing linearly dependent features improved the numerical stability of the algorithms. Reducing the feature space from 23 to only 3 features considerably improved performance: it decreased the numerical complexity and even increased the accuracy. In the future we plan to explore other labeling methods. We also plan to investigate more data preprocessing methods, including both linear and nonlinear transformations.