Prognostic Prediction of Breast Cancer Using C5. Sakina Begum, May 1, 2001.



Breast Cancer Diagnosis
Second leading cause of cancer death in women.
Fine Needle Aspirate (FNA):
- extract cells and fluid from the mass using a thin needle
- examine the cells under a microscope
Early detection of breast cancer depends on accurate diagnosis.

Ability to correctly diagnose cancer using FNA and visual interpretation varies from 65% to 98%.

University of Wisconsin hospitals use Xcyt. Xcyt uses cell characteristics measured from the FNA, together with the multisurface method, to determine whether a tumor is benign or malignant. I wanted to do the same thing using C5.

Data Preparation
The file has 569 patients, with 32 attributes for each patient:
- ID
- diagnosis
- 10 average cell characteristics
- 10 standard deviations, one for each cell characteristic
- 10 "worst" cell characteristics
Two files were prepared:
- all 32 attributes
- 12 attributes (ID, diagnosis, and the 10 average cell characteristics)
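The column layout described above can be sketched in Python. This is only an illustration: the ten characteristic names are an assumption taken from the public description of the Wisconsin Diagnostic Breast Cancer data set (the slides do not list them all), and the `mean_`, `sd_`, and `worst_` prefixes are hypothetical labels.

```python
# Assumed names of the ten cell characteristics (from the public WDBC
# data set description; the slides themselves do not list all of them).
CHARACTERISTICS = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]


def column_names():
    """Return the 32 column names: ID, diagnosis, then three groups of ten."""
    names = ["id", "diagnosis"]
    # Hypothetical prefixes for the average, standard deviation, and
    # "worst" value of each characteristic.
    for prefix in ("mean", "sd", "worst"):
        names += [f"{prefix}_{c}" for c in CHARACTERISTICS]
    return names


ALL_COLUMNS = column_names()        # first file: all 32 attributes
REDUCED_COLUMNS = ALL_COLUMNS[:12]  # second file: ID, diagnosis, 10 averages
```

Slicing the first 12 names matches the second file only if the averages come directly after the ID and diagnosis, which is the order the slide lists them in.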

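sed and awk are programmable UNIX utilities that perform actions on lines that match a particular condition.

Keep the first 12 comma-separated fields of each record:
awk -f awkfile -F, data1 > data2

where awkfile contains:
{print($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12)}

awk separates the printed fields with spaces, so sed turns the spaces back into commas:
sed 's/ /,/g' data2 > cancer.data

Resulting record:
,M,17.99,10.38,122.8,1001,0.1184,0.2776,0.3001, ,0.2419,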
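For readers without awk and sed at hand, the same two steps can be approximated in Python. This is a minimal sketch, not the author's method: the function name `keep_first_fields` is hypothetical, and the sample record is made up for illustration.

```python
import csv
import io


def keep_first_fields(source, dest, n_fields=12):
    """Copy the first n_fields comma-separated fields of every record."""
    reader = csv.reader(source)
    # Write commas directly, so no separate space-to-comma pass is needed.
    writer = csv.writer(dest, lineterminator="\n")
    for row in reader:
        if row:  # skip blank lines
            writer.writerow(row[:n_fields])


# Usage with in-memory text; the values below are invented, not real data.
sample = io.StringIO("1,M,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,9.9,8.8\n")
out = io.StringIO()
keep_first_fields(sample, out)
result = out.getvalue()
```

With real files, `source` and `dest` would be the opened `data1` and `cancer.data` files rather than in-memory buffers.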

Data Mining
C5 extracts informative patterns from data. Options used:
- -f identifies the application name (called a filestem).
- -r causes rules to be derived from trees.
- -S x constructs a classifier from a random sample of x% of the cases in the data file. The classifier is then evaluated on a non-overlapping set of test cases.
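The sampling idea behind the -S option can be illustrated with a short Python sketch. It is not C5's implementation: the function name `sample_split`, the rounding of the sample size, and the use of a seed are all assumptions made for the example.

```python
import random


def sample_split(cases, percent, seed=None):
    """Split cases into a random training sample of `percent`% and a
    non-overlapping test set holding the remaining cases."""
    if not 0 < percent < 100:
        raise ValueError("percent must be strictly between 0 and 100")
    rng = random.Random(seed)
    indices = list(range(len(cases)))
    rng.shuffle(indices)
    # Assumption: the sample size is rounded to the nearest whole case.
    n_train = round(len(cases) * percent / 100)
    train = [cases[i] for i in indices[:n_train]]
    test = [cases[i] for i in indices[n_train:]]
    return train, test


cases = list(range(569))  # stand-ins for the 569 patient records
train, test = sample_split(cases, 70, seed=1)
```

Because every case lands in exactly one of the two lists, the error measured on `test` is not inflated by cases the classifier has already seen.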

By default, the random sample changes every time a classifier is constructed, so successive runs of C5 with sampling will usually produce different results. I used sample sizes of 10%, 30%, 50%, 70%, and 90%, and ran C5 three times for each sample size.
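The experiment design (five sample sizes, three runs each) can be sketched as a loop that averages the test error per sample size. Everything here is illustrative: a majority-class predictor stands in for C5, the labels are synthetic, and the names `split`, `majority_error`, and `run_experiment` are hypothetical.

```python
import random
from collections import Counter


def split(cases, percent, rng):
    """Return a random training sample of percent% and the remaining cases."""
    shuffled = cases[:]
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * percent / 100)
    return shuffled[:n_train], shuffled[n_train:]


def majority_error(train, test):
    """Toy stand-in for C5: always predict the most common training label."""
    predicted = Counter(label for _, label in train).most_common(1)[0][0]
    wrong = sum(1 for _, label in test if label != predicted)
    return wrong / len(test)


def run_experiment(cases, sizes=(10, 30, 50, 70, 90), runs=3, seed=0):
    """Average the test error over several random samples per sample size."""
    rng = random.Random(seed)
    results = {}
    for percent in sizes:
        errors = []
        for _ in range(runs):
            # Each run draws a fresh sample, so results differ between runs.
            train, test = split(cases, percent, rng)
            errors.append(majority_error(train, test))
        results[percent] = sum(errors) / runs
    return results


# Synthetic labels for 569 cases, for illustration only.
cases = [(i, "M" if i % 3 == 0 else "B") for i in range(569)]
results = run_experiment(cases)
```

Averaging over runs smooths out the run-to-run variation that random sampling introduces.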

[Figure: decision tree produced by C5. Tests on concave points, area, perimeter, texture, symmetry, and compactness; leaves labeled M (malignant) or B (benign). One threshold value, 0.211, is legible.]

Each rule consists of:
- an arbitrary rule number
- statistics
- one or more conditions that must be satisfied
- the class predicted by the rule
- the confidence with which the prediction is made
Statistics:
- (n/m): the number of training cases covered by the rule / the number of those cases that do not belong to the class predicted by the rule
- lift: the rule's estimated accuracy divided by the relative frequency of the predicted class
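The rule statistics can be worked through in a few lines of Python. This sketch assumes the rule's estimated accuracy is the Laplace ratio (n - m + 1) / (n + 2), where n is the number of covered cases and m the number of those not in the predicted class; that formula is an assumption based on the See5/C5.0 documentation and is not stated in the slides. The function names are hypothetical.

```python
def laplace_accuracy(n, m):
    """Estimated accuracy of a rule covering n training cases, m of which
    do not belong to the predicted class (assumed Laplace ratio)."""
    return (n - m + 1) / (n + 2)


def lift(n, m, class_frequency):
    """Estimated accuracy divided by the relative frequency of the
    predicted class in the training data."""
    return laplace_accuracy(n, m) / class_frequency


# Made-up example: a rule covers 100 cases, 2 of them misclassified,
# and the predicted class makes up half of the training cases.
accuracy = laplace_accuracy(100, 2)
rule_lift = lift(100, 2, 0.5)
```

A lift above 1 means the rule predicts its class more accurately than simply guessing that class for every case.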

Conclusion
The decision tree gives an average error rate of 6% to 7%. The classifier may be overtrained. Better results may be possible by selecting only a few cell features: the developers of Xcyt obtained their best results using three features (worst area, worst smoothness, and average texture).

Lessons Learned
Became familiar with C5. Learned the importance of domain knowledge.
Further work:
- Build classifiers using different subsets of features.
- Use the adaptive boosting option.

References
- W. N. Street, O. L. Mangasarian, W. H. Wolberg. An Inductive Learning Approach to Prognostic Prediction.
- O. L. Mangasarian, W. N. Street, W. H. Wolberg. Breast Cancer Diagnosis and Prognosis via Linear Programming.
- Machine Learning for Cancer Diagnosis and Prognosis: