Diagnosis of Ovarian Cancer Based on Mass Spectra of Blood Samples Hong Tang Yelena Mukomel Eugene Fink.

Motivation Early detection of cancer by analysis of blood samples. Fast, inexpensive test. Little discomfort.

Outline Mass-spectrum curves. Feature extraction. Experimental results. Conclusions.

Mass spectrum [Figure: mass-spectrum curve. Horizontal axis: ratio of molecular weight to net electric charge, 0 to 20,000. Vertical axis: signal intensity.] The curve of a cancer patient usually differs from that of a healthy person.

Patient data Mass-spectrum curves of 685 people. Every curve consists of 15,155 points. [Table: number of cancer and healthy cases in each data set; the counts are missing from the transcript.]

Candidate features Every point of the mass-spectrum curve is a candidate feature. Its relevance depends on the mean difference between values for cancer patients and healthy people. [Figure: mass-spectrum curve. Horizontal axis: ratio of molecular weight to net electric charge, 0 to 20,000. Vertical axis: signal intensity.]

Feature relevance hh cc standard deviations hh cc means cancer healthy signal intensity candidate feature Mean difference: |  c –  h | Standard deviation of the difference: (  c 2 +  h 2 ) 0.5 Relevance measure: |  c –  h | (  c 2 +  h 2 ) 0.5

Minimal distance Impose a lower bound on the distance between feature points, which prevents the selection of correlated features. After selecting a feature point, discard all points within this distance bound. [Figure: signal-intensity curve showing a selected feature, the minimal distance around it, and the discarded points.]

Feature selection Repeat for a given number of features: select the most relevant feature point, then discard all points within the minimal distance from the selected feature.
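
The selection loop described on this slide can be sketched as a greedy procedure over precomputed relevance scores. This is a minimal sketch under stated assumptions: the name `select_features` is hypothetical, distance is measured in point indices along the curve, and "within the minimal distance" is read as an index gap strictly smaller than the bound, so a bound of 1 discards only the selected point itself.

```python
def select_features(relevance_scores, num_features, min_distance):
    """Greedily pick feature points that are at least min_distance apart.

    relevance_scores: one score per curve point (higher is more relevant).
    Returns the indices of the selected points, most relevant first.
    Assumption: distance is the difference between point indices.
    """
    available = [True] * len(relevance_scores)
    selected = []
    for _ in range(num_features):
        candidates = [i for i, ok in enumerate(available) if ok]
        if not candidates:
            break  # every remaining point has been discarded
        # Select the most relevant point that has not been discarded.
        best = max(candidates, key=lambda i: relevance_scores[i])
        selected.append(best)
        # Discard all points closer than min_distance to the selection.
        low = max(0, best - min_distance + 1)
        high = min(len(relevance_scores), best + min_distance)
        for i in range(low, high):
            available[i] = False
    return selected
```

For example, with scores `[0.1, 0.9, 0.8, 0.2, 0.7, 0.3]`, two features, and a minimal distance of 2, the sketch picks index 1 first, discards indices 0 to 2, and then picks index 4.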

Control variables
Number of feature points: 1 to 64
Min distance between features: 1 to 1024
Data mining techniques:
– Decision trees (C4.5)
– Support vector machines (SVMFu)
– Neural networks (Cascor 1.2)

Measurements Sensitivity: probability of the correct diagnosis for a cancer patient. Specificity: probability of the correct diagnosis for a healthy person.
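
Both measurements can be estimated from a labeled test set as simple ratios. The sketch below assumes boolean labels where `True` means cancer; the function name `sensitivity_specificity` is hypothetical and the slides do not describe how the authors computed these values.

```python
def sensitivity_specificity(actual, predicted):
    """Estimate sensitivity and specificity from paired boolean labels.

    actual, predicted: sequences of booleans, True meaning "cancer".
    Sensitivity = correctly diagnosed cancer cases / all cancer cases.
    Specificity = correctly diagnosed healthy cases / all healthy cases.
    """
    true_pos = false_neg = true_neg = false_pos = 0
    for is_cancer, said_cancer in zip(actual, predicted):
        if is_cancer and said_cancer:
            true_pos += 1
        elif is_cancer and not said_cancer:
            false_neg += 1
        elif not is_cancer and not said_cancer:
            true_neg += 1
        else:
            false_pos += 1
    cancer_cases = true_pos + false_neg
    healthy_cases = true_neg + false_pos
    # A ratio is undefined when a group is empty; report NaN in that case.
    sensitivity = true_pos / cancer_cases if cancer_cases else float("nan")
    specificity = true_neg / healthy_cases if healthy_cases else float("nan")
    return sensitivity, specificity
```

With three cancer cases of which two are detected, and two healthy cases of which one is cleared, the function returns a sensitivity of 2/3 and a specificity of 1/2.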

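Results
Table columns: number of features, minimal distance, sensitivity, specificity. Rows: decision trees (DT), support vector machines (SVM), and neural networks (NN) for each data set. The feature counts, the minimal distances, and the first percentage of each set are missing from the transcript; the surviving percentages are listed in their original order.
Set 1 (DT, SVM, NN): [...]%, 82%, 80%, 78%, 84%, 84%
Set 2 (DT, SVM, NN): [...]%, 96%, 93%, 96%, 93%, 98%
Set 3 (DT, SVM, NN): [...]%, 100%, 100%, 100%, 99%, 99%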

Summary Performance range: sensitivity 80%–100%, specificity 78%–100%.

Summary
Performance range
         Sensitivity   Specificity
Set 1    80%–86%       78%–84%
Set 2    92%–96%       93%–96%
Set 3    98%–100%      99%–100%
Optimal parameters
Number of feature points: 4–32
Min distance between features: 1–256
Data mining technique: any

Conclusions We have developed a technique for the detection of ovarian cancer based on the analysis of blood mass spectra. The accuracy of this technique is still low, and results vary across data sets.

Future work Use more patient data. Consider other features of mass-spectrum curves. Apply to other cancers.