What is Pattern Recognition? Recognizing the Fish!


WHAT IS A PATTERN? Structures regulated by rules. Goal: represent empirical knowledge in mathematical form, the mathematics of perception. Need: algebra, probability theory, graph theory.

What do you see??? 3


Heraclitus: all flows! It is only the invariance, the permanent facts, that enable us to find meaning in a world of flux. We can only perceive variation; our aim is to find the invariant laws behind our varying observations. This is pattern recognition.

Learning vs. Recognition. Learning: find the mathematical model of a class from data. Recognition: use that model to recognize an unknown object.

Two Schools of Thought. Statistical Pattern Recognition: express the image class by a random variable over class features and model the class by estimating its probability density function. Structural Pattern Recognition: express the image class by a set of picture primitives and model the class by the relationships among the primitives, using grammars or graphs.

1. STATISTICAL PR: ASSUMPTION. Source: hypothesis classes (objects). Channel: noisy. Observation: multiple sensors, variations.

Probability Theory. Apples and oranges in red and blue urns. Questions: 1. What is the probability of selecting an apple from the red urn? 2. Given that you selected the red urn, what is the probability of selecting an orange? 3. Given that you picked an orange, what is the probability that it came from the red urn? 4. What is the overall probability of selecting an apple?

Define random variables and probability measures. Y: urn, with values r (red) and b (blue); X: fruit, with values o (orange) and a (apple). 1. P(X=o, Y=r) 2. P(Y=r | X=a) 3. P(X=o | Y=b) 4. P(Y=b) 5. P(X=o)

Repeat the experiment many times and count: the joint, marginal, and conditional probabilities are obtained as limiting frequencies of the counts.
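A minimal sketch of this counting idea in Python. The urn contents and priors below are made-up numbers (the slide does not specify any); the joint, marginal, and conditional probabilities are then estimated as relative frequencies.

```python
import random

# Hypothetical urn contents (not given on the slide): fruit counts per urn.
urns = {
    "red":  {"apple": 2, "orange": 6},
    "blue": {"apple": 3, "orange": 1},
}
p_urn = {"red": 0.4, "blue": 0.6}   # assumed prior probability of picking each urn

N = 100_000
counts = {(y, x): 0 for y in urns for x in ("apple", "orange")}

random.seed(0)
for _ in range(N):
    y = random.choices(list(p_urn), weights=p_urn.values())[0]    # pick an urn
    fruits = urns[y]
    x = random.choices(list(fruits), weights=fruits.values())[0]  # pick a fruit from it
    counts[(y, x)] += 1

# Joint probability p(X=x, Y=y) ~ count / N
p_joint = {k: c / N for k, c in counts.items()}
# Marginal p(X=orange): sum the joint over urns
p_orange = sum(p_joint[(y, "orange")] for y in urns)
# Conditional p(Y=red | X=apple) = p(red, apple) / p(apple)
p_apple = sum(p_joint[(y, "apple")] for y in urns)
p_red_given_apple = p_joint[("red", "apple")] / p_apple

print(f"p(X=orange)        ~ {p_orange:.3f}")
print(f"p(Y=red | X=apple) ~ {p_red_given_apple:.3f}")
```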

Measuring the probability: the rules of probability. Sum rule: p(X) = Σ_Y p(X, Y). Product rule: p(X, Y) = p(Y | X) p(X).

Marginal and conditional probabilities

Transformed Densities: given p_x(x) and a change of variables x = g(y), what is p(y)? The density of y is p_y(y) = p_x(g(y)) |g'(y)|.

Bayes' Theorem: p(Y | X) = p(X | Y) p(Y) / p(X), i.e. posterior ∝ likelihood × prior.
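A small numeric check of Bayes' theorem, using the same assumed urn contents as in the sketch above (hypothetical numbers, not from the slide).

```python
# p(Y=red | X=orange) = p(X=orange | Y=red) p(Y=red) / p(X=orange)
p_red, p_blue = 0.4, 0.6              # assumed priors
p_orange_given_red  = 6 / 8           # 6 oranges out of 8 fruits in the red urn
p_orange_given_blue = 1 / 4           # 1 orange out of 4 fruits in the blue urn

# Sum rule + product rule give the evidence p(X=orange)
p_orange = p_orange_given_red * p_red + p_orange_given_blue * p_blue
# Bayes' theorem gives the posterior
posterior_red = p_orange_given_red * p_red / p_orange

print(f"p(X=orange)         = {p_orange:.3f}")        # 0.450
print(f"p(Y=red | X=orange) = {posterior_red:.3f}")   # ~0.667
```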

Probability Density and Probability Distribution

Expectations: E[f] = Σ_x p(x) f(x) (discrete) or ∫ p(x) f(x) dx (continuous). Conditional expectation (discrete): E_x[f | y] = Σ_x p(x | y) f(x). Approximate expectation (discrete and continuous): E[f] ≈ (1/N) Σ_{n=1}^{N} f(x_n), for samples x_n drawn from p(x).
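A short sketch of the approximate (sample-based) expectation, assuming a standard normal p(x) and f(x) = x² purely for illustration, so the exact answer is 1.

```python
import random

# Monte Carlo estimate: E[f] ~ (1/N) sum of f(x_n) over samples x_n ~ p(x)
random.seed(0)
N = 100_000
samples = [random.gauss(0.0, 1.0) for _ in range(N)]   # x_n ~ N(0, 1)
approx = sum(x * x for x in samples) / N                # f(x) = x^2
print(f"Monte Carlo estimate of E[x^2]: {approx:.3f}  (exact value: 1)")
```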

Variances and Covariances

1. BASIC CONCEPTS. A class is a set of objects having some important properties in common. A feature extractor is a program that inputs the data (image) and extracts features that can be used in classification. A classifier is a program that inputs the feature vector and assigns it to one of a set of designated classes or to the "reject" class.

Feature Vector Representation. X = [x1, x2, ..., xn], each xj a real number. xj may be a geometrical measurement, a chain code, or a count of object parts. Example object representation: [#holes, #strokes, moments, ...].

Possible features for character recognition.

Modeling the source. Functions f(x, K) perform some computation on the feature vector x, using knowledge K obtained from training or programming; a final stage determines the class.

Two Schools of Thought. Statistical Pattern Recognition: express the image class by a random variable over class features and model the class by estimating its probability density function. Structural Pattern Recognition: express the image class by a set of picture primitives and model the class by the relationships among the primitives, using grammars or graphs.

Classifiers often used in CV: nearest mean classifier, nearest neighbor classifier, Bayesian classifiers, decision tree classifiers, artificial neural net classifiers, support vector machines.

Example: Classify the Fish 26

Design of the classifier: binarize; find connected components; extract features (measure length and width); design a learning method; design a decision strategy.

1. Classification using the nearest class mean. Compute the Euclidean distance between feature vector X and the mean of each class; choose the closest class, if it is close enough (otherwise reject).
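A minimal nearest-class-mean classifier sketched in Python. The (length, width) fish features, class labels, function name, and reject threshold are all illustrative assumptions, not values from the lecture.

```python
import numpy as np

def nearest_mean_classifier(X_train, y_train, x, reject_threshold=None):
    """Assign x to the class whose mean feature vector is closest in Euclidean distance.
    Returns the class label, or None if the closest mean is farther than reject_threshold."""
    labels = np.unique(y_train)
    means = {c: X_train[y_train == c].mean(axis=0) for c in labels}
    dists = {c: np.linalg.norm(x - m) for c, m in means.items()}
    best = min(dists, key=dists.get)
    if reject_threshold is not None and dists[best] > reject_threshold:
        return None  # reject
    return best

# Toy fish example with made-up (length, width) features for two classes.
X = np.array([[30, 5], [32, 6], [31, 5.5],    # class 0 (e.g. sea bass)
              [20, 4], [22, 4.5], [21, 4]])   # class 1 (e.g. salmon)
y = np.array([0, 0, 0, 1, 1, 1])
print(nearest_mean_classifier(X, y, np.array([29, 5])))   # -> 0
```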

The nearest mean may yield poor results for classes with complex structure. If class 2 has two modes, where is its mean? Example, bears: polar bears and brown bears. But if the modes are detected, two subclass mean vectors can be used.

Scaling coordinates by the covariance matrix. Define the Mahalanobis distance between a vector x and a class center x_c: d^2(x, x_c) = (x − x_c)^T C^{-1} (x − x_c), where the covariance matrix is C = (1/N) Σ_i (x_i − x_c)(x_i − x_c)^T.
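A sketch of the Mahalanobis distance as defined above, with a hypothetical class of feature vectors for illustration.

```python
import numpy as np

def mahalanobis_distance(x, X_class):
    """Mahalanobis distance from x to the class whose samples are the rows of X_class."""
    mean = X_class.mean(axis=0)
    # Covariance matrix C = (1/N) sum of outer products (x_i - mean)(x_i - mean)^T
    diffs = X_class - mean
    C = diffs.T @ diffs / len(X_class)
    d2 = (x - mean) @ np.linalg.inv(C) @ (x - mean)   # squared Mahalanobis distance
    return np.sqrt(d2)

X_class = np.array([[30.0, 5.0], [32.0, 6.0], [31.0, 5.5], [33.0, 6.2]])
print(mahalanobis_distance(np.array([31.0, 5.6]), X_class))
```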

2. Nearest Neighbor Classification, the k-nearest-neighbor rule. Goal: classify x by assigning it the label most frequently represented among the k nearest training samples, using a voting scheme.
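A minimal k-nearest-neighbor rule with majority voting, again on toy feature vectors (the data and function name are illustrative).

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x, k=3):
    """k-nearest-neighbor rule: label x by majority vote among its k closest training samples."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to every training sample
    nearest = np.argsort(dists)[:k]               # indices of the k nearest samples
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

X = np.array([[30, 5], [32, 6], [31, 5.5], [20, 4], [22, 4.5], [21, 4]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_classify(X, y, np.array([23, 4.3]), k=3))   # -> 1
```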

(Figure slides; source footer: Pattern Classification, Chapter 4, Part 2.)

Bayes Decision Making (Thomas Bayes). Goal: given the training data, compute argmax_i P(w_i | x).

Bayesian decision-making 35

Normal distribution with 0 mean and unit standard deviation. A table of this distribution enables us to fit histograms and represent them simply; a new observation of variable x can then be translated into a probability.

Parametric models can be used.

The Gaussian Distribution

Gaussian Mean and Variance

The Multivariate Gaussian
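A hedged sketch of the multivariate Gaussian density N(x | mu, Sigma), evaluated directly from the standard formula; the parameter values are chosen only for illustration.

```python
import numpy as np

def multivariate_gaussian_pdf(x, mu, Sigma):
    """Density of N(mu, Sigma) at x:
    (2*pi)^(-d/2) |Sigma|^(-1/2) exp(-0.5 (x-mu)^T Sigma^{-1} (x-mu))."""
    d = len(mu)
    diff = x - mu
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
print(multivariate_gaussian_pdf(np.array([0.5, -0.5]), mu, Sigma))
```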

Cherry with bruise: intensities at about 750 nanometers wavelength; some overlap between the classes is caused by the cherry surface turning away from the sensor.

Information Theory: Claude Shannon. Goal: find the amount of information carried by a specific value of a random variable. Need something intuitive.

Information Theory: C. Shannon. Information: giving form or shape to the mind. Assumptions: there is a source and a receiver; information is a quality of a message, which may be a truth or a lie; if the amount of information in the received message increases, the message is more accurate; a common alphabet is needed to communicate the message.

Quantification of information. Given a random variable X with distribution p(x), what is the amount of information received when we observe an outcome x? Self-information: h(x) = −log p(x). Low probability means surprise, and surprise means high information. Base e gives nats; base 2 gives bits.

Entropy: the average self-information needed to specify the state of a random variable. Given a set of training vectors S with c classes, Entropy(S) = −Σ_{i=1}^{c} p_i log2(p_i), where p_i is the proportion of category-i examples in S. If all examples belong to the same category, the entropy is 0. If the examples are equally mixed (1/c examples of each class), the entropy is at its maximum, log2(c); e.g. for c = 2, −0.5 log2(0.5) − 0.5 log2(0.5) = −0.5(−1) − 0.5(−1) = 1.
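A small helper computing this entropy from a list of class labels (the function name and example labels are illustrative).

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum over classes of p_i * log2(p_i), with p_i the fraction of class i in S."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy(["A", "A", "A", "A"]))   # 0.0   (all one class)
print(entropy(["A", "A", "B", "B"]))   # 1.0   (two classes, equally mixed)
print(entropy(["A", "A", "A", "B"]))   # ~0.811
```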

Entropy: why does entropy measure information? It makes sense intuitively. "Nobody knows what entropy really is, so in any discussion you will always have an advantage." — von Neumann.

Entropy

2. Structural Techniques: Goal: Represent the classes by graphs 49

Training the system, given the training data. Step 1: process the image (remove noise, binarize, find the skeleton, normalize the size). Step 2: extract features (side, lake, bay). Step 3: represent the character by a graph.

3. Represent the character by a graph G(N, A), where N is the set of nodes and A the set of arcs.

Training: represent each class by a graph. Recognition: use graph similarities to assign a label.

Decision Trees. (Figure: an example decision tree for character recognition that tests #holes, #strokes, L/W ratio, and best axis direction against thresholds t to separate characters such as -, /, 1, x, w, 0, A, 8, and B.)

Binary decision tree 54

Entropy-Based Automatic Decision Tree Construction. At each node (starting from Node 1 with the full training set S = {x1 = (f11, f12, ..., f1m), x2 = (f21, f22, ..., f2m), ..., xn = (fn1, fn2, ..., fnm)}), we must decide which feature should be used and at which values to split. Quinlan suggested information gain in his ID3 system, and later the gain ratio, both based on entropy.

Information Gain. The information gain of an attribute A is the expected reduction in entropy caused by partitioning on this attribute: Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v), where S_v is the subset of S for which attribute A has value v. Choose the attribute A that gives the maximum information gain.

Information Gain (cont.). Partition S by the values v1, ..., vk of attribute A into subsets S_v = {s ∈ S | value_A(s) = v}, and repeat recursively on each subset. Information gain has the disadvantage that it prefers attributes with a large number of values, which split the data into small, pure subsets.

Gain Ratio. The gain ratio is an alternative metric from Quinlan's 1986 paper, used in the popular C4.5 package (free!): GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A), where SplitInfo(S, A) = −Σ_{i=1}^{ni} (|S_i| / |S|) log2(|S_i| / |S|) and S_i is the subset of S in which attribute A has its i-th value. SplitInfo measures the amount of information provided by an attribute that is not specific to the category.
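A sketch of information gain and gain ratio computed exactly as in the two formulas above; the toy #holes attribute and character labels are made up for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Gain(S, A) = Entropy(S) - sum over v of |S_v|/|S| * Entropy(S_v)."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == v]
        gain -= len(subset) / n * entropy(subset)
    return gain

def gain_ratio(values, labels):
    """GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A)."""
    split_info = entropy(values)   # SplitInfo is the entropy of the attribute's value distribution
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0

# Toy example: attribute "#holes" vs. character class.
holes  = [0, 0, 1, 1, 2, 2]
labels = ["/", "1", "A", "0", "8", "B"]
print(information_gain(holes, labels), gain_ratio(holes, labels))
```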

Information Content. Note: a related method of decision tree construction using a measure called Information Content is given in the text, with a full numeric example of its use.

Artificial Neural Nets. Artificial Neural Nets (ANNs) are networks of artificial neuron nodes, each of which computes a simple function. An ANN has an input layer, an output layer, and "hidden" layers of nodes.

Node Functions. Neuron i receives inputs x1, ..., xn with weights w(j, i) and computes output = g(Σ_j x_j · w(j, i)), where g is commonly a step function, sign function, or sigmoid function (see text).
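A one-node sketch of this computation; passing the choice of g (step, sign, or sigmoid) as a string is an illustrative design choice, not something from the slides.

```python
import math

def neuron_output(x, w, g="sigmoid"):
    """Single node: output = g( sum over j of x_j * w_j ), for a chosen activation g."""
    s = sum(xj * wj for xj, wj in zip(x, w))
    if g == "step":
        return 1.0 if s >= 0 else 0.0
    if g == "sign":
        return 1.0 if s >= 0 else -1.0
    return 1.0 / (1.0 + math.exp(-s))   # sigmoid

print(neuron_output([1.0, 0.5, -2.0], [0.3, 0.8, 0.1]))   # sigmoid of 0.5, about 0.622
```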



Neural Net Learning 64 That’s beyond the scope of this text; only simple feed-forward learning is covered. The most common method is called back propagation. There are software packages available. What do you use?

How to Measure PERFORMANCE. Classification rate: count the number c_i of correctly classified samples in class w_i and compute the frequency, classification rate = c_i / N, where N is the total number of samples.

2-Class problem (Turkish on the slide: var = present, yok = absent).
                           Estimated Class 1 (present)      Estimated Class 2 (absent)
True Class 1 (present)     true hit / correct detection     false dismissal / false negative
True Class 2 (absent)      false positive / false alarm     true dismissal
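A small sketch computing performance rates from the four cells of the confusion matrix above; the counts are hypothetical.

```python
def two_class_metrics(tp, fn, fp, tn):
    """Rates from the 2-class confusion matrix:
    tp = true hits, fn = false dismissals, fp = false alarms, tn = true dismissals."""
    detection_rate   = tp / (tp + fn)        # fraction of true class-1 objects detected
    false_alarm_rate = fp / (fp + tn)        # fraction of class-2 objects wrongly flagged
    classification_rate = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    recall = detection_rate
    return detection_rate, false_alarm_rate, classification_rate, precision, recall

# Hypothetical counts for illustration.
print(two_class_metrics(tp=40, fn=10, fp=5, tn=45))
```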

Receiver Operating Characteristic (ROC) curve: plots the correct detection rate versus the false alarm rate. Generally, false alarms go up with attempts to detect a higher percentage of the known objects.

Precision and Recall in CBIR (content-based image retrieval) or DR: precision = (relevant items retrieved) / (items retrieved); recall = (relevant items retrieved) / (relevant items in the collection).

Confusion matrix shows empirical performance 69