Feature Selection and Error Tolerance for the Logical Analysis of Data
Craig Bowles (Cornell University) and Kathryn Davidson (University of Pennsylvania)
Mentor: Endre Boros, RUTCOR

Our Goals
- Train a computer to tell us which attributes in a medical data set are important.
- Have the computer suggest possible formulas for distinguishing healthy and sick patients.
- Achieve these goals with as much tolerance for data error as possible.

Training Data Set
Wisconsin Breast Cancer Database from University of Wisconsin Hospitals, Madison (Dr. William H. Wolberg).
Sample patient vector: (ID #), 5, 1, 1, 1, 2, 1, 3, 1, 1, 2 — an ID number, nine test results, and the class label.
There are 699 patients in total (458 benign, class "2"; 241 malignant, class "4").
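
As an illustration of this data format, a minimal Python sketch for loading the database, assuming a local copy in the standard UCI file layout; the file name, the use of the csv module, and the choice to skip records with missing values ("?") are our assumptions, not part of the slides.

import csv

def load_wbcd(path="breast-cancer-wisconsin.data"):
    # Each row: ID, 9 integer test results (1-10), class (2 = benign, 4 = malignant).
    benign, malignant = [], []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if len(row) != 11 or "?" in row:   # skip blank/incomplete records
                continue
            features = [int(x) for x in row[1:10]]    # drop the ID column
            (benign if row[10] == "2" else malignant).append(features)
    return benign, malignant

benign, malignant = load_wbcd()
print(len(benign), "benign,", len(malignant), "malignant")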

Minimal Difference Vectors → Dualization → Maximal Error Vectors
Difference vector: the coordinate-wise absolute difference |a⁺ − a⁻| between a positive patient vector a⁺ and a negative patient vector a⁻.
An error vector ε that is at least as large as a difference vector in every coordinate would not distinguish those two patients.
Δ = {difference vectors} (over 90,000 for the WBCD).
Minimal difference vectors: δ ∈ Δ such that there is no other vector in Δ that is less than or equal to δ in every coordinate.
Next step: input the set of minimal difference vectors into the dualization algorithm.
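
A small Python sketch of this step under the definitions above: it enumerates all positive/negative difference vectors and keeps the coordinate-wise minimal ones with a naive quadratic filter. The slides do not show the project's own implementation, so this is illustrative only, and the toy vectors at the end are made up.

def difference_vectors(pos, neg):
    # Coordinate-wise absolute differences for every (positive, negative) pair.
    return {tuple(abs(p - q) for p, q in zip(a, b)) for a in pos for b in neg}

def minimal_vectors(vectors):
    # Keep vectors not dominated coordinate-wise by any other vector (naive O(n^2)).
    vecs = list(vectors)
    return [
        v for v in vecs
        if not any(w != v and all(wi <= vi for wi, vi in zip(w, v)) for w in vecs)
    ]

pos = [(5, 1, 1), (4, 2, 2)]
neg = [(1, 1, 1), (2, 2, 1)]
print(minimal_vectors(difference_vectors(pos, neg)))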

Minimal Difference Vectors → Dualization → Maximal Error Vectors
A small two-dimensional example. Input: the minimal difference vectors (the red points in the slide's figure). Dualization output: (5,0), (3,2), (2,3).
To find the blue points (what we want), subtract each output from the maximum value in each dimension, here 5: (0,5), (2,3), (3,2).
Dualization code: /DualizationCode.html

Minimal Difference Vectors → Dualization → Maximal Error Vectors
- The output of the dualization algorithm is another set of vectors.
- For each coordinate in these vectors, take the complement (i.e. 10 − coordinate).
- Divide each vector by 2 to find a maximal error vector.
- Sort these error vectors by greatest sum, most 5's, maximal minimum element, etc.
- Choose an epsilon from the sorted lists that looks good.
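
A sketch of this post-processing, assuming attribute values run from 1 to 10 (so the complement is 10 − coordinate) and using the heuristics listed above as the sort keys; the combined ranking function is our simplification, and the toy dualization output is made up.

def error_vectors(dual_output, scale=10):
    # Complement each coordinate, halve it, then rank candidates by
    # largest sum and then largest minimum entry.
    candidates = [tuple((scale - c) / 2 for c in v) for v in dual_output]
    return sorted(candidates, key=lambda e: (sum(e), min(e)), reverse=True)

for eps in error_vectors([(9, 0, 10, 0), (10, 2, 8, 1)]):
    print(eps)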

Binarization
For a good error vector, we binarize the original data.
Example: error ε = (0.5, 5, 0, 5, 0, 0, 5, 5, 5); thresholds (τ): Col 1 = 4, Col 3 = 7, Col 5 = 5, Col 6 = 8.
For a patient vector, we test each threshold to see whether the patient's value is greater than the threshold, less than it, or within the error tolerance of it (marked *):
(5, 1, 1, 1, 2, 1, 3, 1, 1) → 1 0 0 0
(4, 1, 1, 1, 2, 1, 3, 1, 1) → * 0 0 0 (the first coordinate lies within ε of the threshold)
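
A sketch of the binarization rule as we read it from this slide; marking a value as "*" when it lies within the error tolerance of its threshold is our interpretation, and the thresholds and error vector below are taken from the example above.

def binarize(patient, thresholds, error):
    # Compare each selected column with its threshold; values within the
    # error tolerance of the threshold are undetermined ("*").
    bits = []
    for col, tau in thresholds.items():          # col is 1-based
        value, eps = patient[col - 1], error[col - 1]
        if abs(value - tau) <= eps:
            bits.append("*")
        else:
            bits.append("1" if value > tau else "0")
    return " ".join(bits)

error = (0.5, 5, 0, 5, 0, 0, 5, 5, 5)
thresholds = {1: 4, 3: 7, 5: 5, 6: 8}
print(binarize((5, 1, 1, 1, 2, 1, 3, 1, 1), thresholds, error))  # 1 0 0 0
print(binarize((4, 1, 1, 1, 2, 1, 3, 1, 1), thresholds, error))  # * 0 0 0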

RESULTS FOR WISCONSIN BREAST CANCER DATA
Error tolerance: 0.5, 5, 0, 5, 0, 0, 5, 5, 5
Attributes and corresponding thresholds: 1 : 4, 3 : 7, 5 : 5, 6 : 8
Total pos/neg entries per binarized pattern:
: 96, 21 (162, 23)
: 1, 37 (1, 41)
: 0, 6 (0, 6)
: 2, 16 (3, 17)
: 1, 27 (1, 28)
: 1, 12 (1, 14)
: 0, 19 (0, 21)
: 0, 24 (0, 24)
: 2, 52 (2, 52)
: 268, 2 (334, 4)
: 1, 3 (1, 7)
: 5, 2 (6, 3)
: 0, 4 (0, 5)
: 0, 2 (0, 4)
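
For completeness, a sketch of how such a pattern/count table can be tallied once every patient has been reduced to a binarized pattern string (e.g. with the binarize() sketch shown earlier); the pattern strings and counts below are made up for illustration.

from collections import Counter

def pattern_table(pos_patterns, neg_patterns):
    # Print "pattern : positive count, negative count" for each observed pattern.
    pos, neg = Counter(pos_patterns), Counter(neg_patterns)
    for pattern in sorted(set(pos) | set(neg)):
        print(f"{pattern} : {pos[pattern]}, {neg[pattern]}")

pattern_table(["0 0 0 0", "0 0 0 0", "1 0 0 0"], ["1 1 1 0", "1 0 0 0"])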

Formula for WBCD
Let P = (Col 1 ≥ 4), Q = (Col 3 ≥ 7), R = (Col 5 ≥ 5), S = (Col 6 ≥ 8).
Then we can characterize most (432/444) positives with: ¬Q ∧ ¬R ∧ ¬S.
Some example patient vectors:
Negatives: (8,7,5,10,7,9,5,5,4), (7,4,6,4,6,1,4,3,1)
Positives: (4,1,1,1,2,1,2,1,1), (4,1,1,1,2,1,3,1,1)
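
A sketch that evaluates the extracted formula on raw patient vectors, assuming the thresholds correspond to the "≥" tests written above; the example vectors are the ones listed on this slide.

def predicted_positive(patient):
    # Evaluate -Q AND -R AND -S on a 9-attribute patient vector.
    Q = patient[2] >= 7   # column 3
    R = patient[4] >= 5   # column 5
    S = patient[5] >= 8   # column 6
    return (not Q) and (not R) and (not S)

print(predicted_positive((4, 1, 1, 1, 2, 1, 2, 1, 1)))   # True  (positive example)
print(predicted_positive((8, 7, 5, 10, 7, 9, 5, 5, 4)))  # False (negative example)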

More To Do
- Test our procedure on different databases.
- Study heuristic methods for threshold selection.
- In general, explore ways to use more flexible error vectors and/or thresholds.

References
[1] Boros, E., Hammer, P.L., Ibaraki, T., and Kogan, A. "Logical Analysis of Numerical Data," Mathematical Programming, 79 (1997).
[2] Boros, E., Ibaraki, T., and Makino, K. "Variations on Extending Partially Defined Boolean Functions with Missing Bits," June 6, 2000.
[3] Boros, E. July 1, 2005.
[4] Mangasarian, O.L. and Wolberg, W.H. "Cancer Diagnosis via Linear Programming," SIAM News, Volume 23, Number 5, September 1990, pp. 1 & 18.
[5] Rudell, Richard. Espresso Boolean Minimization, July 18, 2005.