Focused Reducts Janusz A. Starzyk and Dale Nelson

What Do We Know? Major assumption: the sampled data is ALL we know; we do not have direct access to the underlying real-world model.

Problem Size Dilemma …

Rough Set Tutorial
- Difference between rough sets and fuzzy sets
- Labeling data
- Removing duplicates/ambiguities
- What is a core?
- What is a reduct?

Rough Sets vs. Fuzzy Sets
- Fuzzy sets: how gray is the pixel
- Rough sets: how big is the pixel

Example Sample HRR Data

Example: Label Data
- Label 1 / Label 2 assigned by splitting each attribute's values at a division point (2.45 in the slide's example)
- Labeling can be different for different columns/attributes
- Ranges can be different for different columns/attributes

Remove Ambiguities & Duplicates

Equivalence Classes: E1 = {1, 2, 3}, E2 = {4, 5}, E3 = {6}, E4 = {7}, E5 = {8}
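As a concrete sketch, an equivalence class groups the signals whose labeled attribute values are identical. The table below uses placeholder values, not the slide's actual data, but reproduces the grouping {1, 2, 3}, {4, 5}, {6}, {7}, {8}:

```python
from collections import defaultdict

# Hypothetical labeled table: one tuple of discretized range-bin labels per signal.
labeled_rows = {1: (0, 1, 1), 2: (0, 1, 1), 3: (0, 1, 1),
                4: (1, 0, 1), 5: (1, 0, 1),
                6: (1, 1, 0), 7: (0, 0, 0), 8: (1, 1, 1)}

equivalence_classes = defaultdict(list)
for signal_id, row in labeled_rows.items():
    equivalence_classes[row].append(signal_id)

print(list(equivalence_classes.values()))
# [[1, 2, 3], [4, 5], [6], [7], [8]]
```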

Definitions
- Reduct: a reduction of an information system, obtained by removing attributes (range bins), that results in no loss of information (classification ability). There may be one or many reducts for a given information system.
- Core: the set of attributes (range bins) common to all reducts.

Compute Core: Signals 6 and 8 become ambiguous upon removal of Range Bin 1; therefore, Range Bin 1 is part of the core. Core: the range bins common to ALL reducts, i.e., the most essential range bins, without which the signals cannot be classified.

Compute Core: No signals become ambiguous; therefore, Range Bin 2 is NOT part of the core.

Compute Core: No signals become ambiguous; therefore, Range Bin 3 is NOT part of the core.

Compute Core: No signals become ambiguous; therefore, Range Bin 4 is NOT part of the core.

Compute Reducts: Range Bin 1 + Range Bin 2. Range Bins 1 and 2 classify all signals; therefore, they belong to a reduct.

Compute Reducts: Range Bin 1 + Range Bin 3. Range Bins 1 and 3 do not classify all signals; therefore, they do NOT belong to a reduct.

Compute Reducts: Range Bin 1 + Range Bin 4. Range Bins 1 and 4 classify all signals; therefore, they belong to a reduct.

Reduct Summary
- Range bins 1 and 2 are a reduct: sufficient to classify all signals
- Range bins 1 and 4 are a reduct: sufficient to classify all signals
- Range bins 1 and 3 are NOT a reduct: cannot distinguish target classes 2 and 3
- No need to try range bins {1, 2, 3} or {1, 2, 4}

Did You Notice? Calculating a reduct is time consuming! The number of candidate attribute subsets is 2^n − 1; for n = 29 that is already 536,870,911, and we are interested in n ≥ 50. This is a BIG NUMBER, requiring a lot of time to compute a reduct, and the cost also grows with the number of signals.

Why Haven’t Rough Sets Been Used Before?

The Procedure
- Normalize signal
- Partition signal
  – Block
  – Interleave
- Wavelet transform
- Binary multi-class entropy labeling
- Entropy-based range bin selection
- Determine minimal reducts
- Fuse marginal reducts for classification

Data
- Synthetic, generated by XPATCH
- Six targets
  – 1071 signals per target
  – 128 range bins per signal
  – Azimuth −25° to +25°
  – Elevation −20° to 0°

Normalize the Data: ensures all signals are range normalized. Use the 2-norm N (the square root of the sum of squared bin values) and divide each signal bin value by N.
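A minimal sketch of this normalization step, assuming each signal is a NumPy array of range-bin values:

```python
import numpy as np

def normalize_signal(signal):
    """Divide each range-bin value by the signal's 2-norm N."""
    n = np.linalg.norm(signal)              # N = sqrt(sum of squared bin values)
    return signal / n if n > 0 else signal
```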

Partition the Signal: Block Partitioning

Partition the Signal: Interleave Partitioning (the signal is split into 1, 2, 4, or 8 interleaved pieces)
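A small sketch of the two partitioning schemes as described here: block partitioning takes contiguous runs of range bins, interleave partitioning takes every k-th bin. The helper names are ours:

```python
import numpy as np

def block_partition(signal, pieces):
    """Split the signal into contiguous blocks of range bins."""
    return np.array_split(signal, pieces)

def interleave_partition(signal, pieces):
    """Split the signal by taking every `pieces`-th range bin."""
    return [signal[k::pieces] for k in range(pieces)]
```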

Why Use a Wavelet Transform? Using the original signal, the best single feature classifies 20/60 signals; the best wavelet feature classifies 50/60 signals! Many wavelet features are better than the best feature from the original signal.

HRR Signal and Its Haar Transform
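The Haar transform referenced here is standard; a compact sketch for signals whose length is a power of two (e.g., the 128-bin signals):

```python
import numpy as np

def haar_transform(signal):
    """Haar wavelet transform; signal length must be a power of two."""
    coeffs = np.asarray(signal, dtype=float).copy()
    n = len(coeffs)
    while n > 1:
        half = n // 2
        avg = (coeffs[0:n:2] + coeffs[1:n:2]) / np.sqrt(2.0)    # approximation
        diff = (coeffs[0:n:2] - coeffs[1:n:2]) / np.sqrt(2.0)   # detail
        coeffs[:half], coeffs[half:n] = avg, diff
        n = half
    return coeffs
```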

Multi-Class Information Entropy
- Let x_i be the range bin values across all signals for a target class
- Without assuming any particular distribution, an empirical probability is defined from the x_i
- From this definition, two further probabilities are defined
- The multi-class entropy is then defined from these probabilities

Binary Multi-Class Labeling
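A conventional stand-in for this labeling step, assuming a simple histogram-based class entropy; the exact probability and entropy definitions may differ from the slide's, and the function names and data layout are ours:

```python
import numpy as np

def split_entropy(bin_values, labels, threshold):
    """Weighted class entropy of a candidate binary split of one range bin
    (a conventional stand-in for the slide's multi-class entropy)."""
    bin_values, labels = np.asarray(bin_values), np.asarray(labels)
    total = 0.0
    for side in (bin_values < threshold, bin_values >= threshold):
        if not side.any():
            continue
        _, counts = np.unique(labels[side], return_counts=True)
        p = counts / counts.sum()
        total += side.mean() * -(p * np.log2(p)).sum()
    return total

def binary_label(bin_values, labels):
    """Binarize one range bin at the threshold giving the lowest split entropy."""
    thresholds = np.unique(bin_values)
    best = min(thresholds, key=lambda t: split_entropy(bin_values, labels, t))
    return (np.asarray(bin_values) >= best).astype(int), best
```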

Range Bin Selection
- Total range bins available depends on partition size
- We chose 50 bins per reduct
  – Time considerations
  – Implications
- Selection is based on maximum relative entropy (see the sketch below)
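A one-line sketch of the selection step, assuming per-bin entropy scores have already been computed; the name select_bins is ours:

```python
import numpy as np

def select_bins(entropy_per_bin, k=50):
    """Keep the k range bins with the highest entropy scores."""
    return np.argsort(np.asarray(entropy_per_bin))[::-1][:k]
```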

Compute Core
- Computation of the core is easy and fast
  – Eliminate one range bin at a time and check whether the training set becomes ambiguous; if it does, only that range bin can discriminate between the ambiguous signals
  – Accumulate the bins whose removal produces ambiguous data; that set is the core
- These range bins MUST be in every reduct
- O(n) process (see the sketch below)
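A direct sketch of this O(n) core computation over a discretized (labeled) training table; the row/label layout and function names are assumptions:

```python
def is_ambiguous(rows, labels, bins):
    """True if two signals agree on every kept bin but carry different labels."""
    seen = {}
    for row, label in zip(rows, labels):
        key = tuple(row[b] for b in bins)
        if key in seen and seen[key] != label:
            return True
        seen[key] = label
    return False

def compute_core(rows, labels):
    """Core = bins whose removal makes the labeled training table ambiguous."""
    all_bins = list(range(len(rows[0])))
    return [b for b in all_bins
            if is_ambiguous(rows, labels, [x for x in all_bins if x != b])]
```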

Compute Minimal Reducts
- To the core, add one range bin at a time and compute the number of ambiguities
- Select the range bin(s) with the fewest ambiguities; there may be several, so save them all, since each will be used to compute a reduct
- Add the selected range bin to the core and repeat the previous step until there are no ambiguities; the result is a reduct
- Calculate reducts for all bins with an equivalent number of ambiguities, which yields multiple reducts
- O(n^2) process (see the sketch below)
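A sketch of the greedy growth of a single reduct from the core, under the same assumed data layout as above. The slide additionally keeps every tied bin so that multiple reducts are produced, which this sketch omits:

```python
from collections import defaultdict

def count_ambiguities(rows, labels, bins):
    """Number of signals whose kept-bin pattern is shared across classes."""
    label_sets, counts = defaultdict(set), defaultdict(int)
    for row, label in zip(rows, labels):
        key = tuple(row[b] for b in bins)
        label_sets[key].add(label)
        counts[key] += 1
    return sum(c for k, c in counts.items() if len(label_sets[k]) > 1)

def grow_reduct(rows, labels, core):
    """Greedily add the bin that leaves the fewest ambiguities until none remain."""
    reduct = list(core)
    remaining = [b for b in range(len(rows[0])) if b not in reduct]
    while remaining and count_ambiguities(rows, labels, reduct) > 0:
        best = min(remaining, key=lambda b: count_ambiguities(rows, labels, reduct + [b]))
        reduct.append(best)
        remaining.remove(best)
    return reduct
```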

Time Complexity
- 50 to 400 attributes (range bins)
- Training set size: 1602 signals
- Test set size: 4823 signals
- Need 50

Fuzzy Rough Set Classification
- Test signals may have a range bin value very close to the labeling division point
- If this happens, we define a distance (the fuzz factor) within which the value is treated as a "don't care"
- The classification process proceeds without the "don't care" range bin
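A minimal sketch of the "don't care" band around the labeling division point; the threshold and fuzz parameters and the None return are our assumptions:

```python
def fuzzy_label(value, threshold, fuzz):
    """Binary label for one range bin; None marks the 'don't care' band."""
    if abs(value - threshold) < fuzz:
        return None               # too close to the labeling division point
    return int(value >= threshold)
```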

Weighting Formula Requirements
We desire the following for combining classifications:
– All Pcc(s) = 0 → weight = 0
– All Pcc(s) = 1 → weight = 1
– Several low Pcc(s) → weight higher than any of the Pcc(s)
– One high Pcc and several low Pcc(s) → weight higher than the highest Pcc

Weighting Formula
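The formula itself is not reproduced in this transcript. One simple combining rule that satisfies all four requirements above is a product-style ("noisy-OR") weight, offered purely as an illustrative stand-in rather than the authors' actual formula:

```python
def combined_weight(pccs):
    """Combine per-reduct Pcc values: all zeros give 0, any 1 gives 1, and
    several modest Pccs give a weight above each individual one."""
    w = 1.0
    for p in pccs:
        w *= (1.0 - p)
    return 1.0 - w
```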

Fusing Marginal Reducts
- Each signal is marked with the classification from each reduct, along with the reduct's performance (Pcc) on the training set
- A weight is computed for each target class for each signal
- The signal is assigned the target class with the highest weight
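A sketch of this fusion step, reusing the stand-in combined_weight above; the per-reduct label and Pcc lists are assumptions about the data layout:

```python
def fuse_reducts(reduct_labels, reduct_pccs, classes):
    """Assign the class whose supporting reducts give the highest combined weight."""
    weights = {}
    for c in classes:
        supporting = [pcc for label, pcc in zip(reduct_labels, reduct_pccs) if label == c]
        weights[c] = combined_weight(supporting)   # stand-in weight from the sketch above
    return max(weights, key=weights.get)
```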

Results - Training

Results - Testing

Conjectures
- Robust in the presence of noise
  – Due to binary labeling
  – Due to fuzzification
- Robust to signal registration
  – Due to binary labeling
  – Due to the averaging effect of wavelets on interleaved partitions
  – Due to fuzzification

Rough Set Theoretic HRR ATR - Summary

METHOD
- Normalize signal
- Partition signal (block, interleave)
- Wavelet transform
- Binary multi-class entropy labeling
- Entropy-based range bin selection
- Determine minimal reducts
- Fuse marginal reducts for classification

BREAKTHROUGHS
- Reduct (classifier) generation time reduced from exponential to quadratic!
- Fusion of marginal (poor-performing) reducts
- Wavelet transform aiding
- Multiple partitions to increase the number of range bins considered
- Use of binary multi-class entropy labeling
- Entropy-based range bin selection
- Performance within 1% of theoretic best
- Maximum problem size increased by two orders of magnitude

APPLICATIONS
- 1-D signals: HRR, LADAR vibration, sonar, medical, stock market
- Data mining

Future Directions
- Fuzz factor sensitivity study
- Sensitivity to signal alignment
- Sensitivity to noise
- Iterated wavelet transform performance study
- Effectiveness on air-to-ground targets
- Other application areas