Medical Diagnosis Decision-Support System: Optimizing Pattern Recognition of Medical Data
W. Art Chaovalitwongse, Industrial & Systems Engineering, Rutgers University
Center for Discrete Mathematics & Theoretical Computer Science (DIMACS)
Center for Advanced Infrastructure & Transportation (CAIT)
Center for Supply Chain Management, Rutgers Business School
This work is supported in part by research grants from NSF CAREER CCF-0546574 and the Rutgers Computing Coordination Council (CCC).

Outline
- Introduction
  - Classification: Model-Based versus Pattern-Based
  - Medical Diagnosis
- Pattern-Based Classification Framework
- Application in Epilepsy
  - Seizure (Event) Prediction
  - Identifying epilepsy and non-epilepsy patients
- Application in Other Diagnosis Data
- Conclusion and Envisioned Outcome
Here is the outline of this talk. The focus will be on epilepsy and brain disorders. First I will try to convince the audience why this problem is important and why these patients need our help. Then I will identify the research goals and talk about how to acquire and process data from the brain, specifically to predict seizures. The second research challenge is how to use optimization and data mining techniques to recognize or classify normal and abnormal brain data; this framework can be applied to other medical data or to data in other real-life problems.

Pattern Recognition: Classification
Supervised learning: a class (category) label for each pattern in the training set is provided. (Figure: labeled positive-class and negative-class samples, with an unlabeled query point to be classified.)

Model-Based Classification
- Linear Discriminant Function
- Support Vector Machines
- Neural Networks

Support Vector Machine
A and B are the data matrices of normal and pre-seizure samples, respectively; e is the vector of ones; ω is a vector of real numbers (the normal of the separating hyperplane); γ is a scalar (the plane's offset); u, v are the misclassification errors; m = number of samples for class 1; n = number of samples for class 2. The robust linear-programming formulation is

min over ω, γ, u, v of (1/m) eᵀu + (1/n) eᵀv, subject to Aω − eγ + u ≥ e, −Bω + eγ + v ≥ e, u, v ≥ 0.

Bradley, Fung and Mangasarian revamped this idea using this robust optimization model; it is very fast and scalable. Mangasarian, Operations Research (1965); Bradley et al., INFORMS J. of Computing (1999)
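Assuming the robust LP formulation above, here is a minimal sketch of solving it with SciPy's linear-programming interface; `lp_svm` and its argument names are illustrative, not the authors' code.

```python
# Minimal sketch of the robust LP-SVM, assuming the formulation above.
# Variable vector: x = [w (d entries), gamma (1), u (m), v (n)].
import numpy as np
from scipy.optimize import linprog

def lp_svm(A, B):
    """A: m x d matrix of class-1 samples; B: n x d matrix of class-2 samples."""
    m, d = A.shape
    n, _ = B.shape
    # Objective: (1/m) e'u + (1/n) e'v; w and gamma carry zero cost.
    c = np.concatenate([np.zeros(d + 1), np.full(m, 1.0 / m), np.full(n, 1.0 / n)])
    # A w - e gamma + u >= e   rewritten as   -A w + e gamma - u <= -e
    G1 = np.hstack([-A, np.ones((m, 1)), -np.eye(m), np.zeros((m, n))])
    # -B w + e gamma + v >= e  rewritten as    B w - e gamma - v <= -e
    G2 = np.hstack([B, -np.ones((n, 1)), np.zeros((n, m)), -np.eye(n)])
    G = np.vstack([G1, G2])
    h = -np.ones(m + n)
    # w and gamma are free; the error vectors u, v are nonnegative.
    bounds = [(None, None)] * (d + 1) + [(0, None)] * (m + n)
    res = linprog(c, A_ub=G, b_ub=h, bounds=bounds)
    w, gamma = res.x[:d], res.x[d]
    return w, gamma  # classify a new point x by sign(w @ x - gamma)
```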

Pattern-Based Classification: Nearest Neighbor Classifiers
Basic idea: if it walks like a duck and quacks like a duck, then it's probably a duck. To classify a test record, compute its distance to the training records and choose the k "nearest" records.

Traditional Nearest Neighbor
The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.
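A minimal sketch of this rule; `knn_predict` is an illustrative helper, not code from the talk.

```python
# Basic k-NN: label a record by majority vote among its k nearest training records.
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to every record
    nearest = np.argsort(dists)[:k]               # indices of the k smallest distances
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]              # majority class among the neighbors
```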

Drawbacks
- Feature selection: nearest neighbors are sensitive to noisy features, and optimizing the selection over n features means 2^n combinations, a combinatorial optimization problem.
- Unbalanced data: the classifier is biased toward the class (category) with more samples. Remedies: distance-weighted nearest neighbors, or pick the k nearest neighbors from each class (category) and compare the average distances.

Multidimensional Time Series Classification in Medical Data
Positive versus negative; responsive versus unresponsive. Multisensor medical signals (e.g., EEG, ECG, EMG). Fully multivariate analysis is ideal but computationally impossible here. (Figure: an unlabeled sample to be assigned to Normal or Abnormal.)
It is very common that physicians use baseline data as a reference for diagnosis, and the use of baseline data naturally lends itself to nearest-neighbor classification. For multidimensional time series it would be ideal to do multivariate analysis, but that is computationally impossible in our application. In our work we use univariate analysis, performing classification on each electrode at a time, and then use the idea of ensemble classification to make the final decision.

Ensemble Classification for Multidimensional Time Series Data
Use each electrode as a base classifier; each base classifier makes its own decision. With multiple decision makers, how do we combine them? By voting on the final decision or by averaging the prediction scores. Suppose there are 25 base classifiers, each with error rate ε = 0.35, and assume the classifiers are independent. The ensemble (voting) makes a wrong prediction only when at least 13 of the 25 err:

P(wrong) = Σ_{i=13}^{25} C(25, i) ε^i (1 − ε)^{25−i} ≈ 0.06.

Most ensemble methods deal with how to sample the data (bagging, bootstrapping, boosting); here we use the idea of voting and averaging/accumulating prediction scores. This example shows why we use ensemble classification.
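The binomial calculation on the slide can be checked directly; this snippet just reproduces the arithmetic.

```python
# 25 independent base classifiers, each with error rate eps = 0.35;
# the majority vote errs only when 13 or more of them err.
from math import comb

eps, n = 0.35, 25
p_wrong = sum(comb(n, i) * eps**i * (1 - eps)**(n - i) for i in range(13, n + 1))
print(f"{p_wrong:.3f}")  # about 0.06, far below the individual error rate of 0.35
```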

Modified K-Nearest Neighbor for MDTS
(Figure: with K = 3, the distance D(X, Y) from a test sample to Abnormal and Normal training samples.) Time series distances: (1) Euclidean, (2) T-statistical, (3) Dynamic Time Warping.
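A sketch of this modified k-NN for multichannel time series, combining the per-electrode base classifiers by voting; function and argument names are illustrative, and samples are assumed to be arrays of shape (n_electrodes, n_points).

```python
# Each electrode is a base classifier; the electrodes' k-NN votes are combined.
import numpy as np

def electrode_vote(train, labels, x, j, k, distance):
    """k-NN decision using channel j only."""
    d = np.array([distance(s[j], x[j]) for s in train])
    nearest = np.argsort(d)[:k]
    vals, counts = np.unique(labels[nearest], return_counts=True)
    return vals[np.argmax(counts)]

def modified_knn(train, labels, x, k=3,
                 distance=lambda a, b: np.linalg.norm(a - b)):
    """distance may be Euclidean (default), a T-statistical distance, or DTW."""
    votes = [electrode_vote(train, labels, x, j, k, distance)
             for j in range(x.shape[0])]          # one vote per electrode
    vals, counts = np.unique(votes, return_counts=True)
    return vals[np.argmax(counts)]                # majority over electrodes
```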

Dynamic Time Warping (DTW)
The minimum-distance warp path is the optimal alignment of two time series Li and Lj. The distance of a warp path W = (w_1, …, w_K) is Dist(W) = Σ_{k=1}^{K} Dist(w_k), where Dist(w_k) is the distance between the two data point indices (one from Li, one from Lj) in the kth element of the warp path. Dynamic programming: the optimal warping distance satisfies D(i, j) = Dist(i, j) + min{ D(i−1, j), D(i, j−1), D(i−1, j−1) }. There is an exponential number of warp paths, so we need to put some constraint on the warp path. Figure B) is from Keogh and Pazzani, SDM (2001).
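A worked sketch of the dynamic program above. The Sakoe-Chiba band is one assumed choice of warp-path constraint; the slide only says that the path must be constrained.

```python
# Standard DTW dynamic program with a band constraint on the warp path.
import numpy as np

def dtw(a, b, window=10):
    """window must be at least |len(a) - len(b)| for a path to exist."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        lo, hi = max(1, i - window), min(m, i + window)   # Sakoe-Chiba band
        for j in range(lo, hi + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2             # pointwise distance
            D[i, j] = cost + min(D[i - 1, j],             # insertion
                                 D[i, j - 1],             # deletion
                                 D[i - 1, j - 1])         # match
    return np.sqrt(D[n, m])                               # optimal warping distance
```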

Optimizing Pattern Recognition

Support Feature Machine
Given an unlabeled sample A, we calculate the average statistical distances of A↔Normal and A↔Abnormal samples in the baseline (training) dataset per electrode (channel). Statistical distances: Euclidean, T-statistics, Dynamic Time Warping. Combining all electrodes, A is classified to the group (normal or abnormal) that yields the minimum average statistical distance, or the maximum number of votes. Can we optimize the selection of a subset of electrodes that maximizes the number of correctly classified samples?

SFM: Averaging and Voting
Two distances are calculated for each sample at each electrode:
- Intra-class: the average distance from the sample to all other samples in the same class at Electrode j.
- Inter-class: the average distance from the sample to all other samples in the different class at Electrode j.
Averaging: if, for Sample i, the average intra-class distance over the selected electrodes < the average inter-class distance over the selected electrodes, we claim that Sample i is correctly classified.
Voting: if, for Sample i at Electrode j, intra-class distance < inter-class distance, the electrode casts a good vote. Based on the selected electrodes, if # of good votes > # of bad votes, then Sample i is correctly classified.
Chaovalitwongse et al., KDD (2007) and Chaovalitwongse et al., Operations Research (forthcoming)
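A sketch of how the intra-/inter-class distances and the averaging rule could be computed; the function and argument names are illustrative, not from the cited papers.

```python
# Per electrode, compare a sample's average distance to its own class (intra)
# against its average distance to the other class (inter).
import numpy as np

def intra_inter(train, labels, i, j, distance):
    """Average intra- and inter-class distances of sample i at electrode j."""
    same = [distance(train[i][j], train[t][j])
            for t in range(len(train)) if t != i and labels[t] == labels[i]]
    other = [distance(train[i][j], train[t][j])
             for t in range(len(train)) if labels[t] != labels[i]]
    return np.mean(same), np.mean(other)

def correctly_classified(train, labels, i, electrodes, distance):
    """Averaging rule: mean intra < mean inter over the selected electrodes."""
    pairs = [intra_inter(train, labels, i, j, distance) for j in electrodes]
    intra = np.mean([p[0] for p in pairs])
    inter = np.mean([p[1] for p in pairs])
    return intra < inter
```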

Distance Averaging: Training
(Figure: intra-class and inter-class distances of Sample i at Features 1, 2, …, m.) Select a subset of features such that the average intra-class distance is smaller than the average inter-class distance for as many samples as possible.

Majority Voting: Training
(Figure: positive and negative samples i and i′ compared at Feature j.) A sample is counted as correct if, over the selected features, the good votes outnumber the bad votes; incorrect otherwise.

SFM Optimization Model
(Slide: the optimization model built from the intra-class and inter-class distances.) Chaovalitwongse et al., KDD (2007) and Chaovalitwongse et al., Operations Research (forthcoming)

Averaging SFM (see the sketch below)
- Maximize the number of correctly classified samples.
- Logical constraints on the intra-class and inter-class distances determine whether a sample counts as correctly classified.
- Must select at least one electrode.
Chaovalitwongse et al., KDD (2007) and Chaovalitwongse et al., Operations Research (forthcoming)
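The cited papers solve this as an integer program; purely to make the objective concrete, here is a brute-force sketch that enumerates electrode subsets. It is feasible only for a handful of electrodes, and all names are illustrative.

```python
# Brute-force stand-in for the averaging-SFM integer program.
from itertools import combinations

def averaging_sfm(intra, inter, n_electrodes):
    """intra[i][j], inter[i][j]: distances of sample i at electrode j.
    Returns the electrode subset maximizing correctly classified samples."""
    n_samples = len(intra)
    best, best_set = -1, None
    for r in range(1, n_electrodes + 1):          # must select >= 1 electrode
        for S in combinations(range(n_electrodes), r):
            # Comparing sums over S is equivalent to comparing averages over S.
            correct = sum(
                sum(intra[i][j] for j in S) < sum(inter[i][j] for j in S)
                for i in range(n_samples))
            if correct > best:
                best, best_set = correct, S
    return best_set, best
```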

Voting SFM
The precision matrix A contains elements a_ij indicating whether Sample i receives a good vote at Electrode j.
- Maximize the number of correctly classified samples.
- Logical constraints: a sample must win the voting among the selected electrodes to be counted as correctly classified.
- Must select at least one electrode.
Chaovalitwongse et al., KDD (2007) and Chaovalitwongse et al., Operations Research (forthcoming)

Support Feature Machine

Support Vector Machine
(Figure: Normal and Pre-Seizure samples separated by a plane in a three-dimensional feature space with axes Feature 1, Feature 2, and Feature 3.)

Application in Epilepsy Diagnosis

Facts about Epilepsy
About 3 million Americans and another 60 million people worldwide (about 1% of the population) suffer from epilepsy. Epilepsy is the second most common brain disorder (after stroke); it causes recurrent seizures (not vice versa). Seizures usually occur spontaneously, in the absence of external triggers. Epileptic seizures occur when a massive group of neurons in the cerebral cortex suddenly begins to discharge in a highly organized rhythmic pattern. Seizures cause temporary disturbances of brain functions such as motor control, responsiveness, and recall, which typically last from seconds to a few minutes. Based on 1995 estimates, epilepsy imposes an annual economic burden of $12.5 billion* in the U.S. in associated health care costs and losses in employment, wages, and productivity. Cost per patient ranged from $4,272 for persons** with remission after initial diagnosis and treatment to $138,602 for persons** with intractable and frequent seizures.
Today about 3 million Americans and another 60 million people worldwide have epilepsy, the second most common brain disorder after stroke. It causes recurrent seizures, which appear to occur spontaneously and randomly. When someone has a seizure, a massive group of neurons in the brain becomes hypersynchronized in a highly organized rhythmic pattern, lasting about 20 seconds to a few minutes. This disease costs our country a great deal: by a 1995 estimate it imposes an economic burden of $12.5 billion, not just health care costs but also job loss and lost productivity. Per patient, the cost ranged from about $4,000 to almost $140,000 per year, and these numbers are more than 10 years old. By now I hope I have convinced the audience that we should do something about this disease; next I will discuss the standard diagnosis, treatment, the acquired data, and how we can help these patients.
*Begley et al., Epilepsia (2000); **Begley et al., Epilepsia (1994).

Simplified EEG System and Intracranial Electrode Montage
The electroencephalogram (EEG) is a traditional tool for evaluating the physiological state of the brain by measuring the voltage potentials produced by brain cells as they communicate.

Scalp EEG Acquisition 18 Bipolar Channels

Goals: How can we help?
- Seizure Prediction: recognizing (data-mining) abnormality patterns in EEG signals preceding seizures (Normal versus Pre-Seizure); alert when pre-seizure samples are detected (online classification), e.g., statistical process control in production systems, attack alerts from sensor data, stock market analysis.
- EEG Classification (Routine EEG Check): quickly identify whether patients have epilepsy (Epilepsy versus Non-Epilepsy). Seizures have many causes: convulsive or other seizure-like activity can be non-epileptic in origin and is observed in many other medical conditions. These non-epileptic seizures can be hard to differentiate and may lead to misdiagnosis. E.g., medical check-ups with normal and abnormal samples.
Given a multidimensional time series and a set of events/episodes, how can we predict the events? Classification of medical data (normal and abnormal) can guide future diagnosis, and feature selection can identify the most differentiable initiating events.

Normal versus Pre-Seizure

10-second EEGs: Seizure Evolution
(Figure: 10-second EEG traces from the Normal, Pre-Seizure, Seizure Onset, and Post-Seizure states.) Chaovalitwongse et al., Annals of Operations Research (2006)

Normal versus Pre-Seizure Data Set
EEG Dataset Characteristics:

Patient ID | Seizure types | Duration of EEG (days) | # of seizures
1     | CP, SC      | 3.55  | 7
2     | CP, GTC, SC | 10.93 |
3     | CP          | 8.85  | 22
4     | SC          | 5.93  | 19
5     |             | 13.13 | 17
6     |             | 11.95 |
7     |             | 3.11  | 9
8     |             | 6.09  | 23
9     |             | 11.53 | 20
10    |             | 9.65  | 12
Total |             | 84.71 | 153

CP: Complex Partial; SC: subclinical; GTC: Generalized Tonic/Clonic

Sampling Procedure
Randomly and uniformly sample 3 EEG epochs per seizure from each of the normal and pre-seizure states. For example, Patient 1 has 7 seizures, so there are 21 normal and 21 pre-seizure EEG epochs sampled. Use leave-one(seizure)-out cross validation to perform training and testing. (Figure: recording timeline; pre-seizure epochs are drawn from the 30 minutes preceding a seizure, and normal epochs from segments at least 8 hours away from any seizure.)
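A sketch of this sampling and leave-one-seizure-out split, under the window assumptions shown in the figure (30-minute pre-seizure window, 8-hour separation for normal epochs); names and the window handling are illustrative.

```python
# Draw epoch start times per seizure and generate leave-one-seizure-out splits.
import numpy as np

rng = np.random.default_rng(0)

def sample_epochs(seizure_times, n_per_seizure=3):
    """For each seizure time t (in seconds, recording starting at 0, with
    t > 8 h assumed), draw 3 normal and 3 pre-seizure epoch start times."""
    pairs = []
    for t in seizure_times:
        pre = rng.uniform(t - 30 * 60, t, size=n_per_seizure)      # 30 min before
        normal = rng.uniform(0, t - 8 * 3600, size=n_per_seizure)  # >= 8 h away
        pairs.append((normal, pre))
    return pairs

def leave_one_seizure_out(pairs):
    """Yield (train, test) splits, holding out one seizure's epochs at a time."""
    for k in range(len(pairs)):
        train = [p for i, p in enumerate(pairs) if i != k]
        yield train, pairs[k]
```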

Information/Feature Extraction from EEG Signals
Measure the brain dynamics from EEG signals: apply dynamical measures (based on chaos theory) to non-overlapping EEG epochs of 10.24 seconds = 2048 points. The Maximum Short-Term Lyapunov Exponent (STLmax) measures the stability/chaoticity of EEG signals: the average uncertainty along the local eigenvectors and phase differences of an attractor in the phase space. Pardalos, Chaovalitwongse, et al., Mathematical Programming (2004)

Evaluation
Sensitivity = TP/(TP+FN); Specificity = TN/(TN+FP). Sensitivity measures the fraction of positive cases that are classified as positive; specificity measures the fraction of negative cases classified as negative. Type I error = 1 − Specificity; Type II error = 1 − Sensitivity. Chaovalitwongse et al., Epilepsy Research (2005)
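These measures translate directly into code; a minimal helper (illustrative, not from the cited paper):

```python
# Evaluation measures from raw confusion-matrix counts.
def evaluate(tp, fn, tn, fp):
    sensitivity = tp / (tp + fn)   # fraction of positive cases called positive
    specificity = tn / (tn + fp)   # fraction of negative cases called negative
    return {"sensitivity": sensitivity,
            "specificity": specificity,
            "type I error": 1 - specificity,
            "type II error": 1 - sensitivity}
```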

Leave-One-Seizure-Out Cross Validation
(Figure: assuming there are 5 seizures in the recordings, the sample pairs N1-N5 and P1-P5 are split so that one pair forms the testing set while the rest form the training set, from which SFM selects electrodes among channels 1-26.) N: EEGs from the normal state; P: EEGs from the pre-seizure state.

EEG Classification
- Support Vector Machine [Chaovalitwongse et al., Annals of OR (2006)]: project the time series data into a high-dimensional (feature) space and generate a hyperplane that separates the two groups of data while minimizing the errors.
- Ensemble K-Nearest Neighbor [Chaovalitwongse et al., IEEE SMC: Part A (2007)]: use each electrode as a base classifier, apply the NN rule using statistical time series distances, optimize the value of k in training, and combine by voting and averaging.
- Support Feature Machine [Chaovalitwongse et al., SIGKDD (2007); Chaovalitwongse et al., Operations Research (forthcoming)]: apply the NN rule to the entire baseline data and optimize by selecting the best group of classifiers (electrodes/features). Voting optimizes the ensemble classification; averaging uses the concept of inter-class and intra-class distances (or prediction scores).
First we implement a modified support vector machine, one of the most commonly used classification techniques; the main idea is to project the data into a higher-dimensional space and separate the two groups with a hyperplane.

Performance Characteristics: Upper Bound
SVM: Chaovalitwongse et al., Annals of Operations Research (2006); SFM: Chaovalitwongse et al., SIGKDD (2007) and Chaovalitwongse et al., Operations Research (forthcoming); KNN: Chaovalitwongse et al., IEEE Trans. Systems, Man, and Cybernetics: Part A (2007)

Separation of Normal and Pre-Seizure EEGs
(Figure: separation of normal and pre-seizure EEGs from 3 electrodes selected by SFM versus 3 electrodes not selected by SFM.)

Performance Characteristics: Validation
Discussion points: overfitting the data, sample size, CPU time.
SVM: Chaovalitwongse et al., Annals of Operations Research (2006); SFM: Chaovalitwongse et al., SIGKDD (2007) and Chaovalitwongse et al., Operations Research (forthcoming); KNN: Chaovalitwongse et al., IEEE Trans. Systems, Man, and Cybernetics: Part A (2007)

Epilepsy versus Non-Epilepsy

Epilepsy versus Non-Epilepsy Data Set
Routine EEG check: 25-30 minutes of recordings with scalp electrodes. Each sample is a 5-minute EEG epoch (30 points of STLmax values), so each sample is in the form of 18 electrodes × 30 points.

Leave-One-Patient-Out Cross Validation
(Figure: samples from the epilepsy patients E1-E5 and non-epilepsy patients N1-N5 are split so that one patient's samples form the testing set while the rest form the training set, from which SFM selects electrodes among channels 1-26.) N: Non-Epilepsy; E: Epilepsy.

Voting SFM: Validation

Averaging SFM: Validation

Selected electrodes: 1 Fp1–C3, 16 T6–Oz, 17 Fz–Oz.
The issue is not just getting 100% classification; rather, we focus on why we get that kind of result and on understanding the data. For example, we look at the selected electrodes that help distinguish epilepsy and non-epilepsy patients. We found 3 electrodes that play a major role. When we went back to the neurologist and talked to him, he was very surprised: one would not expect these selected electrodes to be involved in epilepsy mechanisms. Again, it could be a scalp-electrode effect: the focus may be on the left, but electrodes on the right pick up the activity first.

Other Medical Diagnosis

Other Medical Datasets
- Breast Cancer: features of cell nuclei (radius, perimeter, smoothness, etc.); malignant or benign tumors.
- Diabetes: patient records (age, body mass index, blood pressure, etc.); diabetic or not.
- Heart Disease: general patient info, symptoms (e.g., chest pain), blood tests; identify presence of heart disease.
- Liver Disorders: features of blood tests; detect the presence of liver disorders from excessive alcohol consumption.

Performance

Training (%):
Dataset | LP SVM | NLP SVM | V-SFM | A-SFM
WDBC | 98.08 | 96.17 | 97.28 | 97.42
HD   | 85.06 | 84.66 | 86.48 | 86.92
PID  | 77.66 | 77.51 | 75.01 | 77.96
BLD  | 65.71 | 57.97 | 63.46 | 66.43

Testing (%):
Dataset | LP SVM | NLP SVM | V-NN | A-NN | V-SFM | A-SFM
WDBC | 97.00 | 95.38 | 91.60 | 93.18 | 94.99 | 96.01
HD   | 82.96 | 83.94 | 80.87 | 82.77 | 82.49 | 84.92
PID  | 76.93 | 76.09 | 63.14 | 74.94 | 72.75 | 75.83
BLD  | 65.71 | 57.97 | 38.38 | 54.09 | 58.20 | 59.57

Average Number of Selected Features
Dataset | LP SVM | NLP SVM | V-SFM | A-SFM
WDBC | 30 | 11.6 | 8.5 |
HD   | 13 | 7.4  | 8.7 |
PID  | 8  | 4.3  | 4.5 |
BLD  | 6  | 3.3  | 3.7 |

Medical Data Signal Processing Apparatus (MeDSPA)
Quantitative analyses of medical data, e.g., neurophysiological data (EEG, fMRI) acquired during brain diagnosis. MeDSPA is envisioned as an automated decision-support system configured to accept input medical signal data (associated with a spatial position or feature) and provide measurement data to help physicians obtain a more confident diagnosis outcome. It aims to improve current medical diagnosis and prognosis by assisting physicians in:
- recognizing (data-mining) abnormality patterns in medical data;
- recommending the diagnosis outcome (e.g., normal or abnormal);
- identifying a graphical indication (or feature) of abnormality (localization).
We envision the outcome of our research in medical diagnosis as a tool or apparatus to process medical signal data. This is just my vision, and we still have a long way to go. We have started with neurophysiological signals such as electroencephalograms and fMRI, using the tools developed over the course of my research as an automated decision-support system that helps physicians recognize abnormal data or abnormal patterns in medical data, localize the source of abnormality, and recommend the diagnosis outcome, or rather improve confidence in the diagnosis.

Automated Abnormality Detection Paradigm
(Diagram: Data Acquisition of multichannel brain activity → Optimization: Feature Extraction/Clustering → Statistical Analysis: Pattern Recognition → Interface Technology, which initiates a warning or a variety of therapies (e.g., electrical stimulation via a stimulator, drug injection) for the user/patient, with a nurse in the loop.)

Acknowledgement
Collaborators: E. Micheli-Tzanakou, PhD; L.D. Iasemidis, PhD; R.C. Sachdeo, MD; R.M. Lehman, MD; B.Y. Wu, MD, PhD.
Students: Y.J. Fan, MS, and other undergraduate students.

Thank you for your attention! Questions?