COMPUTER AIDED DIAGNOSIS: FEATURE SELECTION
Prof. Yasser Mostafa Kadah – www.k-space.org

Recommended Reference
- Recent Advances in Breast Imaging, Mammography, and Computer-Aided Diagnosis of Breast Cancer, Editors: Jasjit S. Suri and Rangaraj M. Rangayyan, SPIE Press, 2006.

CAD

Feature Extraction and Selection
- Feature extraction and selection in pattern recognition are based on finding mathematical methods for reducing the dimensionality of the pattern representation
- A lower-dimensional representation based on independent pattern descriptors is important
- It plays a crucial role in determining the separating properties of pattern classes
- The choice of features, attributes, or measurements to be included has an important influence on:
  - Accuracy of classification
  - Time needed for classification
  - Number of examples needed for learning
  - Cost of performing classification

Feature Selection
- Features extracted from images do not necessarily carry information that is significant for diagnosis
  - They may describe aspects of no relevance to the pathology of interest
  - They may vary considerably with acquisition settings (pose, processing, etc.)
- Several problems should be mitigated in feature selection:
  - Features that do not correlate with the pathology
  - Features that are not independent
- Building classifiers from poorly selected features will:
  - Cause problems in the training phase
  - Fail to yield the best overall classification accuracy

Statistical Significance of Features
- Idea: if a feature changes consistently with pathology, then its values for normal and abnormal cases should show a statistically significant difference
- Inferential statistical tests such as Student's t-test should be able to detect such a difference

Statistical Significance of Features
- Student t-test steps:
  - Consider a particular feature of interest
  - Divide its values into two sets, one for normal and one for abnormal cases
  - Compute the mean and standard deviation for both sets
  - Use the t-test to compute the p-value under the null hypothesis that the two sets have the same mean (i.e., no real difference)
  - The feature is suitable if the p-value falls below a chosen significance level (e.g., 0.05 or 0.01)
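As a concrete illustration, the following is a minimal sketch of this procedure using SciPy's two-sample t-test. The feature matrix, the labels, and the 0.05 significance level are illustrative placeholders, not values from the lecture.

```python
# Hypothetical sketch: rank extracted features by two-sample t-test p-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))          # placeholder feature matrix (cases x features)
X[:20, 0] += 1.5                      # make feature 0 differ between the two groups
y = np.array([0] * 20 + [1] * 20)     # placeholder labels (0 = normal, 1 = abnormal)

alpha = 0.05                          # chosen significance level
for j in range(X.shape[1]):
    normal, abnormal = X[y == 0, j], X[y == 1, j]
    # Welch's t-test: does not assume equal variances in the two groups
    t_stat, p_value = stats.ttest_ind(normal, abnormal, equal_var=False)
    verdict = "keep" if p_value < alpha else "drop"
    print(f"feature {j}: t = {t_stat:+.2f}, p = {p_value:.4f} -> {verdict}")
```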

Statistical Significance of Features
- Important to keep in mind that a large difference in values does not by itself imply statistical significance
  - Data dispersion is a key factor
- General form: multiple groups (diagnosis, not just detection), which calls for more general inferential statistics
- Nonparametric methods:
  - Kolmogorov-Smirnov test
  - Wilcoxon signed-rank test
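When the normality assumption behind the t-test is doubtful, the nonparametric tests named above can be used instead. The sketch below is illustrative only: it applies SciPy's two-sample Kolmogorov-Smirnov test and the Wilcoxon rank-sum test to two independent groups (the signed-rank variant, scipy.stats.wilcoxon, would instead require paired measurements); the skewed synthetic data are placeholders.

```python
# Hypothetical sketch: nonparametric significance tests for one feature.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
normal = rng.lognormal(mean=0.0, sigma=0.5, size=30)     # placeholder, skewed values
abnormal = rng.lognormal(mean=0.6, sigma=0.5, size=25)   # non-Gaussian "abnormal" group

ks_stat, ks_p = stats.ks_2samp(normal, abnormal)   # Kolmogorov-Smirnov two-sample test
w_stat, w_p = stats.ranksums(normal, abnormal)     # Wilcoxon rank-sum test
print(f"Kolmogorov-Smirnov: D = {ks_stat:.3f}, p = {ks_p:.4f}")
print(f"Wilcoxon rank-sum:  z = {w_stat:.3f}, p = {w_p:.4f}")
```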

Assessment of Feature Independence
- Some features may end up being dependent
  - Example: a feature computed as a constant multiple of another
  - Only one of them should be included in the input to the classification stage
- Several methods can help identify such dependence:
  - Pearson's linear correlation coefficient
  - Principal component analysis

Pearson’s Linear Correlation Coefficient
- Computes the correlation between a given pair of features x and y
- Computing formula:

  $$ r = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2}\,\sqrt{\sum_{i}(y_i - \bar{y})^2}} $$

- The value of r lies between -1 and 1, inclusive
- A value of 1 is called "complete positive correlation": the points fall on a straight line y = c x (c > 0)
- A value of -1 is called "complete negative correlation": the points fall on a straight line y = -c x (c > 0)
- A value of r near zero indicates that the variables x and y are uncorrelated
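A minimal sketch of how this check might be run over a whole feature matrix, assuming a hypothetical array X with one column per feature; the 0.95 cutoff is an illustrative choice, not a value from the lecture.

```python
# Hypothetical sketch: flag highly correlated (nearly dependent) feature pairs.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))           # placeholder feature matrix (cases x features)
X[:, 3] = 2.0 * X[:, 1]                # feature 3 is a constant multiple of feature 1

r = np.corrcoef(X, rowvar=False)       # pairwise Pearson correlation between columns
threshold = 0.95                       # illustrative cutoff for "dependent"
n_features = X.shape[1]
for i in range(n_features):
    for j in range(i + 1, n_features):
        if abs(r[i, j]) > threshold:
            print(f"features {i} and {j}: r = {r[i, j]:+.3f} -> keep only one")
```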

Principal Component Analysis
- Computes the eigenvalue decomposition of the covariance matrix of the features
- The rank of the matrix (the number of nonzero eigenvalues) equals the number of linearly independent features
- Directions of principal components may perform differently in classification
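The sketch below illustrates the idea on a hypothetical feature matrix: the eigenvalues of the feature covariance matrix show how many directions actually carry variance, and a (near-)zero eigenvalue flags a linearly dependent feature. The tolerance used is an arbitrary illustrative value.

```python
# Hypothetical sketch: use PCA eigenvalues to count independent features.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 5))                 # placeholder feature matrix
X[:, 4] = X[:, 0] + X[:, 2]                  # feature 4 depends on features 0 and 2

Xc = X - X.mean(axis=0)                      # center each feature
cov = np.cov(Xc, rowvar=False)               # feature covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)       # eigen-decomposition (ascending order)

tol = 1e-8 * eigvals.max()                   # illustrative tolerance for "zero"
n_independent = int(np.sum(eigvals > tol))
print("eigenvalues:", np.round(eigvals, 4))
print("number of linearly independent features:", n_independent)
```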

Retrospective Assessment of Features
- Retrospective: evaluation after seeing the classification results based on these features
- Basic idea: use the features for classification and then choose those that produce the best results
- Exhaustive search
- Branch and Bound Algorithm
- Max-Min Feature Selection
- Sequential Forward and Sequential Backward Selection (a sketch of forward selection follows below)
- Fisher's Linear Discriminant
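Of the strategies listed above, sequential forward selection is perhaps the simplest to sketch: starting from an empty set, greedily add whichever feature most improves the cross-validated accuracy of a reference classifier. The classifier choice (k-nearest neighbours), the synthetic data, and the target subset size below are illustrative assumptions, not part of the lecture.

```python
# Hypothetical sketch: greedy sequential forward selection wrapped around a classifier.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(80, 6))                   # placeholder feature matrix
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)  # labels depend on features 0 and 2

def cv_accuracy(columns):
    """Mean 5-fold cross-validated accuracy using only the given feature columns."""
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X[:, columns], y, cv=5).mean()

selected, remaining = [], list(range(X.shape[1]))
n_wanted = 2                                   # illustrative target subset size
while len(selected) < n_wanted:
    scores = {j: cv_accuracy(selected + [j]) for j in remaining}
    best = max(scores, key=scores.get)
    selected.append(best)
    remaining.remove(best)
    print(f"added feature {best}, cv accuracy = {scores[best]:.3f}")
print("selected features:", selected)
```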

Exhaustive Search
- Let y = [y_1, y_2, …, y_D] be a pattern vector; exhaustive search selects the d best features out of the D available features so as to minimize the classification error
- The resulting total number of combinations is:

  $$ \binom{D}{d} = \frac{D!}{d!\,(D-d)!} $$

- Main advantage: the best answer is guaranteed
- Main disadvantage: the total number of combinations grows combinatorially (essentially exponentially) with the dimension of the feature vector
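A brute-force sketch of this procedure under the same kind of illustrative setup: enumerate every d-feature subset with itertools.combinations and keep the subset with the best cross-validated accuracy (i.e., the lowest estimated classification error). This is only practical for small D.

```python
# Hypothetical sketch: exhaustive search over all d-feature subsets (small D only).
from itertools import combinations
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(5)
D, d = 6, 2                                     # D available features, choose the best d
X = rng.normal(size=(80, D))                    # placeholder feature matrix
y = (X[:, 1] - X[:, 4] > 0).astype(int)         # labels depend on features 1 and 4

best_subset, best_score = None, -np.inf
for subset in combinations(range(D), d):        # all C(D, d) = D!/(d!(D-d)!) subsets
    clf = KNeighborsClassifier(n_neighbors=5)
    score = cross_val_score(clf, X[:, list(subset)], y, cv=5).mean()
    if score > best_score:
        best_subset, best_score = subset, score
print(f"best subset {best_subset}, cv accuracy = {best_score:.3f}")
```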

Branch and Bound Algorithm
- Solves the problem of the huge computational cost associated with exhaustive search
- A technique to determine the optimal feature set without explicit evaluation of all possible combinations of d features
- The technique is applicable when the separability criterion is monotonic (adding a feature never decreases it):
  - Combinatorial optimization
  - Efficient computation
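A compact sketch of the idea, assuming a Mahalanobis-type class-separability criterion (classically argued to be non-decreasing as features are added, which is the monotonicity property branch and bound needs): starting from the full feature set, features are removed one at a time, and any partial subset whose criterion already falls at or below the best complete d-feature candidate found so far is pruned, since removing further features cannot raise it. All data and names here are illustrative.

```python
# Hypothetical sketch: branch-and-bound feature selection with a monotonic criterion.
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 6))                 # placeholder feature matrix
y = np.array([0] * 50 + [1] * 50)             # placeholder class labels
X[y == 1, 0] += 2.0                           # feature 0 separates the classes well
X[y == 1, 3] += 1.0                           # feature 3 separates them weakly

def separability(cols):
    """Mahalanobis-like distance between class means on the selected columns."""
    Xs = X[:, cols]
    diff = Xs[y == 1].mean(axis=0) - Xs[y == 0].mean(axis=0)
    pooled = (np.cov(Xs[y == 0], rowvar=False) + np.cov(Xs[y == 1], rowvar=False)) / 2
    pooled += 1e-6 * np.eye(len(cols))        # regularize to keep the solve stable
    return float(diff @ np.linalg.solve(pooled, diff))

def branch_and_bound(D, d, criterion):
    """Best d-of-D feature subset; prunes branches the monotonic criterion rules out."""
    best = {"score": -np.inf, "subset": None}

    def shrink(subset, start):
        score = criterion(subset)
        if score <= best["score"]:
            return                            # prune: removing more features cannot help
        if len(subset) == d:
            best["score"], best["subset"] = score, list(subset)
            return
        for i in range(start, len(subset)):   # try removing one more feature
            shrink(subset[:i] + subset[i + 1:], i)

    shrink(list(range(D)), 0)
    return best["subset"], best["score"]

subset, score = branch_and_bound(X.shape[1], 2, separability)
print(f"selected features {subset}, separability = {score:.3f}")
```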

Assignments
- Apply two feature selection methods to the features obtained from the previous tasks and compare their results
- Add a random data vector as a feature to your extracted features and report how your feature selection strategy was able to detect that it is irrelevant
- Add a new feature that is the sum of two existing features and report how your feature selection strategy was able to detect that this new feature is dependent