Feature extraction for change detection
Can you detect an abrupt change in this picture? (Answer at the end.)
Ludmila I. Kuncheva, School of Computer Science, Bangor University

Plan
1. Zeno says there is no such thing as change...
2. If change exists, is it a good thing?
3. Context or nothing!
4. Feature extraction for change detection – PCA backwards?

Zeno's Paradox of the Arrow
Zeno of Elea (ca. 490–430 BC): "If everything, when it occupies an equal space, is at rest, and if that which is in locomotion is always occupying such a space at any moment, the flying arrow is therefore motionless." – as recounted by Aristotle, Physics VI:9, 239b5
No motion, no movement, NO CHANGE.

Does change exist? Zeno says ‘no’...

Nonetheless... Change: types and applications

Possible applications:
- fraud detection
- market analysis
- medical condition monitoring
- network traffic control

Univariate detectors (control charts):
- Shewhart's method
- CUSUM (CUmulative SUM)
- SPRT (Wald's Sequential Probability Ratio Test)
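To make the univariate detectors concrete, here is a minimal sketch of a one-sided CUSUM detector in Python, assuming a known pre-change mean mu0; the allowance k and the threshold h are illustrative values, not parameters from the talk.

```python
import numpy as np

def cusum(x, mu0, k=0.5, h=5.0):
    """One-sided CUSUM: flag the first index at which the cumulative
    positive deviation from the target mean mu0 exceeds h.
    k is the allowance (slack) subtracted at every step."""
    s = 0.0
    for i, xi in enumerate(x):
        s = max(0.0, s + (xi - mu0) - k)
        if s > h:
            return i          # change detected at index i
    return None               # no change detected

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(2, 1, 200)])
print(cusum(x, mu0=0.0))      # typically fires shortly after index 200
```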

Two approaches:
1. Use an adaptive algorithm (no need to identify the type of change, or to detect change explicitly).
2. Detect change (and update/re-train the algorithm if necessary).
Both settings arise with labelled and with unlabelled data.

Classification, labels available:
[Pipeline] Data (all features) → Classifier / Distribution modelling → Error rate → Change statistic + threshold → Change / NO change.

[Two pipelines]
Labels are available: Data (all features) → Classifier / Distribution modelling → Error rate → Change statistic + threshold → Change / NO change.
Labels are NOT available: Data (all features) → Feature EXTRACTOR → Features (multidimensional) → Distribution modelling → Change statistic + threshold → Change / NO change.

[The same two pipelines, with typical methods]
Labels are available: Data (all features) → Classifier / distribution modelling (GMM, HMM, Parzen windows, kernel methods, martingales) → Error rate → threshold → Change / NO change.
Labels are NOT available: Data (all features) → Feature EXTRACTOR → Features → distribution modelling (clustering, kernel methods, GMM, kd-trees) → Hotelling-type change statistic → threshold → Change / NO change.
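The unlabelled pipeline ends in a Hotelling-type statistic. Below is a minimal sketch of the standard two-sample Hotelling T² test between windows W1 and W2 (numpy/scipy); this is the textbook formulation, offered as an illustration rather than the exact criterion used in the talk.

```python
import numpy as np
from scipy import stats

def hotelling_t2(W1, W2):
    """Two-sample Hotelling T^2 between windows W1 and W2 (rows = objects).
    Returns the statistic and a p-value via the F distribution."""
    n1, p = W1.shape
    n2, _ = W2.shape
    d = W1.mean(axis=0) - W2.mean(axis=0)
    # Pooled covariance estimate of the two windows
    S = ((n1 - 1) * np.cov(W1, rowvar=False) +
         (n2 - 1) * np.cov(W2, rowvar=False)) / (n1 + n2 - 2)
    t2 = (n1 * n2) / (n1 + n2) * d @ np.linalg.solve(S, d)
    # T^2 rescales to an F(p, n1 + n2 - p - 1) statistic
    f = t2 * (n1 + n2 - p - 1) / (p * (n1 + n2 - 2))
    return t2, stats.f.sf(f, p, n1 + n2 - p - 1)
```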

Classification: A change in the (unconditional) data distribution will:
1. render the classifier useless
2. make no difference to the classification performance
3. improve the classification performance

A change in the (unconditional) data distribution will:
1. render the classifier useless
2. make no difference to the classification performance
3. improve the classification performance
Vote, please!

Classification: No change in the (unconditional) data distribution will:
1. render the classifier useless
2. make no difference to the classification performance
3. improve the classification performance

No change in the (unconditional) data distribution will:
1. render the classifier useless
2. make no difference to the classification performance
3. improve the classification performance
Vote, please!

[Diagram] My scope of interest within the literature: classifier ensembles, brain-computer interface, MathWorks products.

Change may or may not cause trouble...

Is there a change?

Yes! [Plot: Shewhart chart with a 2-sigma threshold; legend: mean (moving average), mean – 2 std, detected changes.]
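For reference, a minimal sketch of the Shewhart-style check behind this plot, assuming the mean and standard deviation are estimated from a sliding reference window; the window length and the 2-sigma threshold are the only parameters, and the window length here is an illustrative choice.

```python
import numpy as np

def shewhart(x, window=50, n_sigma=2.0):
    """Flag points that fall outside mean +/- n_sigma * std of the
    preceding window (a moving-average Shewhart-style control chart)."""
    alarms = []
    for i in range(window, len(x)):
        ref = x[i - window:i]
        if abs(x[i] - ref.mean()) > n_sigma * ref.std():
            alarms.append(i)
    return alarms
```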

Is there a change? No!

Is there a change?

Yes, for the purposes of "Spot the difference". No, if all that matters is that the picture shows a bee with a flower in the sun.

Is there a change? No!

Is there a change? Yes! [Plot: the signal switches from sin(10x) · randn to sin(20x) · randn.]

change detection

Change does not exist out of context!

ENTER Feature Extraction

Context: Amplitude variability Feature: AMPLITUDE

Context: Time series patterns in a fixed window. Feature: A PATTERN IN A FIXED WINDOW

Context: Children's puzzle. Feature: PIXEL B/W VALUE
Context: Frequency variability. Feature: FREQUENCY (sin(10x) · randn vs. sin(20x) · randn)

Suppose that CONTEXT is not available. Principal Component Analysis (PCA) captures data variability. Then why not use PCA here?
[Pipeline, labels NOT available: Data (all features) → Feature EXTRACTOR → Features → Distribution modelling → Change statistic + threshold → Change / NO change.]

PCA intuition: the components corresponding to the largest eigenvalues are the most important.

But is this the case for change detection?
[Scatter plots: along PC1 (largest eigenvalue) the two window distributions are similar (small sensitivity to change); along PC2 (smallest eigenvalue) the distributions are different (large sensitivity to change).]
This holds for "blind" changes: translation, rotation, variance change...
Kuncheva L.I. and W.J. Faithfull, "PCA feature extraction for change detection in multidimensional unlabelled data", IEEE Transactions on Neural Networks and Learning Systems, 25(1), 2014, 69–80.

Some experiments:
1. Take a data set with n features.
2. Sample random "windows" W1 and W2, with K objects in each window.
3. Calculate PCA from W1. Choose a proportion of explained variance and use the remaining (low-variance) components.
4. Transform W2 using the calculated PCs, keep the low-variance components, and calculate the CHANGE DETECTION CRITERION between W1 and this unmodified W2. Store as a NEGATIVE INSTANCE (no change).
5. Generate a random integer k between 1 and n, and induce a change in W2 in one of two ways:
(a) Shuffle VALUES: choose k features at random; for each chosen feature, randomly shuffle its values within window W2.
(b) Shuffle FEATURES: choose k features at random and randomly permute the respective columns of W2.
6. Transform the modified W2, keep the low-variance components, and calculate the CHANGE DETECTION CRITERION between W1 and the modified W2. Store as a POSITIVE INSTANCE (change).
A code sketch of this protocol follows.
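A compact sketch of this protocol, assuming numpy and scikit-learn, and reusing the hotelling_t2 sketch above as the CHANGE DETECTION CRITERION; the window size K, the proportion of explained variance, and the criterion itself are illustrative choices rather than the exact experimental settings.

```python
import numpy as np
from sklearn.decomposition import PCA

def experiment_instance(X, K=50, keep_var=0.85, shuffle="values", rng=None):
    """One NEG/POS pair: fit PCA on W1, keep the LOW-variance components,
    score the unmodified W2 (negative) and a shuffled W2 (positive)."""
    rng = rng or np.random.default_rng()
    n = X.shape[1]
    idx = rng.choice(len(X), size=2 * K, replace=False)
    W1, W2 = X[idx[:K]], X[idx[K:]]

    pca = PCA().fit(W1)
    # Drop the components explaining the first keep_var of the variance
    # (assumes some low-variance components remain after the cut)
    cut = np.searchsorted(np.cumsum(pca.explained_variance_ratio_), keep_var) + 1
    low = lambda W: pca.transform(W)[:, cut:]

    neg = hotelling_t2(low(W1), low(W2))[0]        # no change: NEGATIVE

    W2c = W2.copy()
    k = rng.integers(1, n + 1)                     # random k in 1..n
    cols = rng.choice(n, size=k, replace=False)
    if shuffle == "values":                        # (a) shuffle VALUES
        for c in cols:
            W2c[:, c] = rng.permutation(W2c[:, c])
    else:                                          # (b) shuffle FEATURES
        W2c[:, cols] = W2c[:, rng.permutation(cols)]
    pos = hotelling_t2(low(W1), low(W2c))[0]       # induced change: POSITIVE
    return neg, pos
```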

Run the protocol 100 times for POS and 100 times for NEG to get the ROC curve for a given data set. Run it again without applying PCA to get a baseline ROC curve. Use the Area Under the Curve (AUC), however disputed this measure might have become recently... Larger AUC corresponds to better change detection.
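With 100 NEG and 100 POS criterion values per data set, the AUC can be computed directly. A minimal sketch with scikit-learn, using stand-in scores where the real values would come from the protocol above:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# Stand-ins for the 100 NEG and 100 POS criterion values
neg_scores = rng.normal(1.0, 0.3, size=100)
pos_scores = rng.normal(1.6, 0.5, size=100)

labels = np.r_[np.zeros(100), np.ones(100)]
scores = np.concatenate([neg_scores, pos_scores])
print(roc_auc_score(labels, scores))   # larger AUC = better change detection
```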

VALUE shuffle

FEATURE shuffle

PCA: use the least relevant components!?

Conclusion
1. A change in the data distribution may be harmful, beneficial or indifferent to classification performance.
2. Change does not exist out of context; therefore, GENERIC algorithms for change detection are somewhat pointless...
3. Feature extraction for change detection may not follow conventional intuition.

Remember my little puzzle? Can you detect an abrupt change in this picture?