
1 What is Pattern Recognition? Recognizing the fish!

2 WHAT IS A PATTERN? Structures regulated by rules. Goal: represent empirical knowledge in mathematical form, the mathematics of perception. Need: algebra, probability theory, graph theory.

3 "All flows!" (Heraclitus) It is only the invariances, the permanent facts, that enable us to find meaning in a world of flux. We can only perceive variations; our aim is to find the invariant laws of our varying observations. That is Pattern Recognition.

4 Learning vs Recognition. Learning: find the mathematical model of a class, using data. Recognition: use that model to recognize the unknown object.

5 Two Schools of Thought. Statistical Pattern Recognition: express the image class by a random variable corresponding to class features and model the class by finding its probability density function. Structural Pattern Recognition: express the image class by a set of picture primitives and model the class by finding the relationships among the primitives using grammars or graphs.

6 1. STATISTICAL PR: ASSUMPTION. SOURCE: hypothesis classes, objects. CHANNEL: noisy. OBSERVATION: multiple sensors, variations.

7 Probability Theory Apples and Oranges

8 Probability Theory: marginal probability, conditional probability, joint probability. Example: a red urn (x1) with c1 samples and a blue urn (x2) with c2 samples, each containing a mix of oranges (y1) and apples (y2).

9 Probability Theory Sum Rule Product Rule

10 The Rules of Probability. Sum Rule: p(X) = Σ_Y p(X, Y). Product Rule: p(X, Y) = p(Y | X) p(X).

11 Bayes' Theorem: p(Y | X) = p(X | Y) p(Y) / p(X), i.e. posterior ∝ likelihood × prior.
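A minimal sketch of Bayes' theorem applied to the two-urn fruit example above; the priors, likelihoods, and function name are illustrative assumptions, not the lecture's numbers.

```python
# Bayes' theorem: posterior is proportional to likelihood x prior.
p_box = {"red": 0.4, "blue": 0.6}                      # assumed prior p(box)
p_fruit_given_box = {                                   # assumed likelihood p(fruit | box)
    "red":  {"apple": 0.25, "orange": 0.75},
    "blue": {"apple": 0.75, "orange": 0.25},
}

def posterior(fruit):
    """Return p(box | fruit) for each box via Bayes' theorem."""
    evidence = sum(p_fruit_given_box[b][fruit] * p_box[b] for b in p_box)  # p(fruit)
    return {b: p_fruit_given_box[b][fruit] * p_box[b] / evidence for b in p_box}

print(posterior("orange"))
```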

12 Probability Densities

13 Expectations. E[f] = Σ_x p(x) f(x). Conditional Expectation (discrete): E_x[f | y] = Σ_x p(x | y) f(x). Approximate Expectation (discrete and continuous): E[f] ≈ (1/N) Σ_n f(x_n).

14 Variances and Covariances. var[f] = E[(f(x) - E[f(x)])^2]. cov[x, y] = E_{x,y}[x y] - E[x] E[y].

15 1. BASIC CONCEPTS. A class is a set of objects having some important properties in common. A feature extractor is a program that inputs the data (image) and extracts features that can be used in classification. A classifier is a program that inputs the feature vector and assigns it to one of a set of designated classes or to the "reject" class. With what kinds of classes do you work?

16 Feature Vector Representation. X = [x1, x2, …, xn], each xj a real number. xj may be an object measurement, or a count of object parts. Example object representation: [#holes, #strokes, moments, …].

17 Possible Features for Character Recognition

18 Modeling the source. Functions f(x, K) perform some computation on feature vector x. Knowledge K from training or programming is used. A final stage determines the class.

19 Two Schools of Thought. Statistical Pattern Recognition: express the image class by a random variable corresponding to class features and model the class by finding its probability density function. Structural Pattern Recognition: express the image class by a set of picture primitives and model the class by finding the relationships among the primitives using grammars or graphs.

20 Classifiers often used in CV: Nearest Mean Classifier, Nearest Neighbor Classifier, Bayesian Classifiers, Decision Tree Classifiers, Artificial Neural Net Classifiers, Support Vector Machines.

21 1. Classification using nearest class mean. Compute the Euclidean distance between feature vector X and the mean of each class. Choose the closest class, if it is close enough (reject otherwise).
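A minimal sketch of the nearest-class-mean rule just described; the toy 2-D feature vectors, the reject threshold parameter, and the function names are illustrative assumptions.

```python
import numpy as np

def train_nearest_mean(X, y):
    """Compute one mean feature vector per class from training data."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def classify_nearest_mean(x, class_means, reject_threshold=None):
    """Assign x to the class with the closest mean; optionally reject."""
    dists = {c: np.linalg.norm(x - m) for c, m in class_means.items()}
    best = min(dists, key=dists.get)
    if reject_threshold is not None and dists[best] > reject_threshold:
        return "reject"
    return best

# toy usage with 2-D feature vectors
X = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.1], [4.8, 5.3]])
y = np.array([0, 0, 1, 1])
means = train_nearest_mean(X, y)
print(classify_nearest_mean(np.array([1.1, 1.0]), means))   # closest to class 0 mean
```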

22 Nearest mean might yield poor results with complex structure. Class 2 has two modes; where is its mean? But if the modes are detected, two subclass mean vectors can be used.

23 Scaling coordinates by the Covariance Matrix. Define the Mahalanobis distance between a vector x and the class center x_c: d^2(x, x_c) = (x - x_c)^T C^-1 (x - x_c), where the covariance matrix is C = (1/N) Σ_i (x_i - x_c)(x_i - x_c)^T.
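A small numpy sketch of the Mahalanobis distance above; the toy samples and the helper name are illustrative assumptions.

```python
import numpy as np

def mahalanobis_sq(x, class_samples):
    """Squared Mahalanobis distance from x to the mean of class_samples."""
    x_c = class_samples.mean(axis=0)                 # class center
    C = np.cov(class_samples, rowvar=False)          # covariance matrix
    diff = x - x_c
    return float(diff @ np.linalg.inv(C) @ diff)

# toy usage: a class elongated along the first feature axis
samples = np.array([[0.0, 0.0], [2.0, 0.1], [4.0, -0.1], [6.0, 0.2], [8.0, 0.0]])
print(mahalanobis_sq(np.array([4.0, 1.0]), samples))
```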

24 2. Nearest Neighbor Classification: the k-nearest-neighbor rule. Goal: classify x by assigning it the label most frequently represented among the k nearest training samples, i.e. use a voting scheme.
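A minimal sketch of the k-nearest-neighbor voting rule; the toy data, the choice of Euclidean distance, and the function name are illustrative assumptions.

```python
import numpy as np

def knn_classify(x, X_train, y_train, k=3):
    """Label x by majority vote among its k nearest training samples."""
    dists = np.linalg.norm(X_train - x, axis=1)      # Euclidean distances to all samples
    nearest = np.argsort(dists)[:k]                  # indices of the k closest samples
    votes = y_train[nearest]
    return np.bincount(votes).argmax()               # most frequent label wins

# toy usage
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_classify(np.array([4.5, 5.0]), X_train, y_train, k=3))   # -> 1
```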

25 [Figure from Pattern Classification, Chapter 4 (Part 2)]

26 [Figure from Pattern Classification, Chapter 4 (Part 2)]

27 Bayes Decision Making (Thomas Bayes, 1701-1761). Goal: given the training data, compute max_i P(w_i | x).

28 Bayesian decision-making

29 Bayes

30 Normal distribution with 0 mean and unit standard deviation. A table enables us to fit histograms and represent them simply. A new observation of variable x can then be translated into a probability.

31 Parametric models can be used.

32 Cherry with bruise. Intensities at about 750 nanometers wavelength. Some overlap is caused by the cherry surface turning away.
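A minimal sketch of the parametric idea behind the cherry-bruise example: fit a 1-D Gaussian per class and pick the class maximizing likelihood times prior. The intensity values, priors, and function name below are made-up assumptions, not the lecture's data.

```python
import numpy as np
from scipy.stats import norm

bruised = np.array([0.52, 0.55, 0.60, 0.58, 0.54])    # assumed training intensities, class 1
healthy = np.array([0.78, 0.82, 0.80, 0.76, 0.84])    # assumed training intensities, class 2
models = {
    "bruised": norm(bruised.mean(), bruised.std(ddof=1)),   # fitted Gaussian per class
    "healthy": norm(healthy.mean(), healthy.std(ddof=1)),
}
priors = {"bruised": 0.3, "healthy": 0.7}              # assumed class priors

def bayes_decide(x):
    """Pick the class maximizing p(x | class) * P(class)."""
    scores = {c: models[c].pdf(x) * priors[c] for c in models}
    return max(scores, key=scores.get)

print(bayes_decide(0.62))
```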

33 Information Theory: Claude Shannon (1916-2001). Goal: find the amount of information carried by a specific value of a random variable. Need something intuitive.

34 Information Theory: C. Shannon. Information: giving form or shape to the mind. Assumptions: a source sends a message to a receiver. Information is a quality of the message; it may be a truth or a lie. If the amount of information in the received message increases, the message is more accurate. A common alphabet is needed to communicate the message.

35 Quantification of information. Given a r.v. X and p(x), what is the amount of information when we receive an outcome x? Self-information: h(x) = -log p(x). Low probability means surprise and high information. Base e: nats; base 2: bits.

36 Entropy: the average self-information needed to specify the state of a random variable. Given a set of training vectors S with c classes, Entropy(S) = Σ_{i=1..c} -p_i log2(p_i), where p_i is the proportion of category i examples in S. If all examples belong to the same category, the entropy is 0. If the examples are equally mixed (1/c examples of each class), the entropy is at its maximum, e.g. for c = 2: -0.5 log2(0.5) - 0.5 log2(0.5) = -0.5(-1) - 0.5(-1) = 1.0.
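A short sketch of the entropy formula above; the helper name is illustrative.

```python
import numpy as np

def entropy(labels):
    """Entropy of a label set S, in bits: sum over classes of -p_i log2(p_i)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

print(entropy([0, 0, 1, 1]))          # equally mixed 2-class set -> 1.0
print(entropy([0, 0, 0, 0]))          # pure set -> 0.0
```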

37 Entropy: why does entropy measure information? It makes sense intuitively. "Nobody knows what entropy really is, so in any discussion you will always have an advantage." (von Neumann)

38 Entropy

39

40 2. Structural Techniques. Goal: represent the classes by graphs.

41 Training the system, given the training data. Step 1: Process the image (remove noise, binarize, find the skeleton, normalize the size). Step 2: Extract features (side, lake, bay). Step 3: Represent the character by a graph.
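A minimal sketch of Step 1 of this pipeline, assuming scikit-image is available; noise removal, size normalization, and the structural feature extraction (side, lake, bay) are omitted, and the function name is an assumption.

```python
import numpy as np
from skimage.filters import threshold_otsu
from skimage.morphology import skeletonize

def preprocess(gray_image):
    """Binarize a grayscale character image and reduce it to its skeleton."""
    # binarize with Otsu's threshold (dark ink assumed to be foreground)
    binary = gray_image < threshold_otsu(gray_image)
    # thin the strokes to a one-pixel-wide skeleton
    return skeletonize(binary)

# toy usage: a vertical "stroke" on a white background
img = np.full((5, 5), 255.0)
img[:, 2] = 0.0
print(preprocess(img).astype(int))
```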

42 3. Represent the character by a graph G(N, A).

43 Training: represent each class by a graph. Recognition: use graph similarities to assign a label.

44 Decision Trees: [Figure: a decision tree for character recognition using features such as #holes, #strokes, L-W ratio, and best axis direction to separate characters like -, /, 1, 0, x, w, A, 8, B]

45 Binary decision tree

46 Entropy-Based Automatic Decision Tree Construction. Training set S: x1 = (f11, f12, …, f1m), x2 = (f21, f22, …, f2m), …, xn = (fn1, fn2, …, fnm). At node 1: what feature should be used for the split, and with what values? Quinlan suggested information gain in his ID3 system and later the gain ratio, both based on entropy.

47 Information Gain. The information gain of an attribute A is the expected reduction in entropy caused by partitioning on this attribute: Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v), where S_v is the subset of S for which attribute A has value v. Choose the attribute A that gives the maximum information gain.

48 Information Gain (cont.). [Figure: set S is split on attribute A into subsets S_v = {s ∈ S | value(A) = v} for each value v1, …, vk, and the construction repeats recursively on each subset.] Information gain has the disadvantage that it prefers attributes with a large number of values that split the data into small, pure subsets.

49 Gain Ratio. The gain ratio is an alternative metric from Quinlan's 1986 paper, used in the popular C4.5 package (free!): GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A), with SplitInfo(S, A) = Σ_i -(|S_i| / |S|) log2(|S_i| / |S|), where S_i is the subset of S in which attribute A has its i-th value. SplitInfo measures the amount of information provided by an attribute that is not specific to the category.
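A compact sketch of the two preceding formulas, Gain(S, A) and GainRatio(S, A); the toy #holes attribute, its values, and the function names are illustrative assumptions.

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(attribute_values, labels):
    """Gain(S, A): entropy reduction from partitioning S on attribute A."""
    labels = np.asarray(labels)
    attribute_values = np.asarray(attribute_values)
    values, counts = np.unique(attribute_values, return_counts=True)
    weighted = sum((c / len(labels)) * entropy(labels[attribute_values == v])
                   for v, c in zip(values, counts))
    return entropy(labels) - weighted

def gain_ratio(attribute_values, labels):
    """GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A)."""
    _, counts = np.unique(attribute_values, return_counts=True)
    # SplitInfo is the entropy of the partition sizes themselves
    split_info = entropy(np.repeat(np.arange(len(counts)), counts))
    return information_gain(attribute_values, labels) / split_info

# toy usage: attribute "#holes" against character labels
holes = [0, 0, 1, 1, 2, 2]
labels = ["1", "/", "A", "0", "8", "B"]
print(information_gain(holes, labels), gain_ratio(holes, labels))
```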

50 Information Content. Note: a related method of decision tree construction using a measure called Information Content is given in the text, with a full numeric example of its use.

51 Artificial Neural Nets. Artificial Neural Nets (ANNs) are networks of artificial neuron nodes, each of which computes a simple function. An ANN has an input layer, an output layer, and "hidden" layers of nodes. [Figure: inputs flowing through hidden layers to outputs]

52 Node Functions. Neuron i receives inputs x1, x2, …, xj, …, xn through weights w(1, i), …, w(j, i), …, w(n, i) and computes output = g( Σ_j xj * w(j, i) ). Function g is commonly a step function, sign function, or sigmoid function (see text).
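A minimal sketch of one such node; the input values, weights, and function names are illustrative assumptions.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def neuron_output(x, w, g=sigmoid):
    """One node: apply activation g to the weighted sum of its inputs."""
    return g(np.dot(x, w))

# toy usage: 3 inputs into neuron i with assumed weights w(j, i)
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.8, 0.2, -0.4])
print(neuron_output(x, w))                      # sigmoid node
print(neuron_output(x, w, g=np.sign))           # sign-function node
```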

53

54

55 Neural Net Learning. That is beyond the scope of this text; only simple feed-forward learning is covered. The most common method is called back-propagation. There are software packages available. What do you use?
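An illustrative sketch of using one such software package, assuming scikit-learn: a small feed-forward net whose gradients are computed by back-propagation. The toy XOR-style data and the network size are assumptions, not the lecture's example.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])                       # XOR-style labels

net = MLPClassifier(hidden_layer_sizes=(8,), activation="logistic",
                    solver="lbfgs", max_iter=2000, random_state=0)
net.fit(X, y)                                    # gradients computed by back-propagation
print(net.predict(X))
```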

56 How to Measure PERFORMANCE. Classification rate: count the number of correctly classified samples in w_i, call it c_i, and find the frequency: classification rate = c_i / N.

57 2-Class problem

                            Estimated Class 1 (present)      Estimated Class 2 (absent)
True Class 1 (present)      True hit, correct detection      False dismissal, false negative
True Class 2 (absent)       False positive, false alarm      True dismissal
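A small sketch that computes the quantities in this table plus the classification rate from the previous slide; the toy label vectors and the function name are illustrative assumptions.

```python
import numpy as np

def detection_metrics(y_true, y_pred, positive=1):
    """Classification, correct-detection, and false-alarm rates for a 2-class problem."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == positive) & (y_pred == positive))   # true hits
    fn = np.sum((y_true == positive) & (y_pred != positive))   # false dismissals
    fp = np.sum((y_true != positive) & (y_pred == positive))   # false alarms
    tn = np.sum((y_true != positive) & (y_pred != positive))   # true dismissals
    return {
        "classification_rate": (tp + tn) / len(y_true),
        "detection_rate": tp / (tp + fn),
        "false_alarm_rate": fp / (fp + tn),
    }

# toy usage: 1 = object present, 0 = absent
print(detection_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 1, 0]))
```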

58 Receiver Operating Characteristic (ROC) Curve. Plots the correct detection rate versus the false alarm rate. Generally, false alarms go up with attempts to detect higher percentages of known objects.

59 Precision-Recall in CBIR or DR
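An illustrative sketch of the last two slides, assuming scikit-learn: ROC and precision-recall operating points computed from made-up detection scores, not the lecture's data.

```python
import numpy as np
from sklearn.metrics import roc_curve, precision_recall_curve

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])             # 1 = relevant / object present
y_score = np.array([0.9, 0.8, 0.45, 0.3, 0.6, 0.4, 0.2, 0.1])   # classifier scores

fpr, tpr, _ = roc_curve(y_true, y_score)                 # false-alarm rate vs. detection rate
precision, recall, _ = precision_recall_curve(y_true, y_score)
print("ROC points:", list(zip(fpr, tpr)))
print("PR points:", list(zip(recall, precision)))
```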

60 Confusion matrix shows empirical performance.
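A one-line sketch of building such a confusion matrix, assuming scikit-learn; the character labels below are hypothetical.

```python
from sklearn.metrics import confusion_matrix

y_true = ["A", "A", "B", "B", "8", "8", "8"]   # hypothetical true character classes
y_pred = ["A", "B", "B", "B", "8", "A", "8"]   # hypothetical classifier outputs
print(confusion_matrix(y_true, y_pred, labels=["A", "B", "8"]))
# rows: true class, columns: predicted class
```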

