Concept Map
Pattern Recognition:
- Feature representation x; decision function d(x)
- Geometric interpretation: decision boundary (surface)
Approaches:
- Conventional: statistical formulation (Bayes optimal), syntactic approach
- Neural network: McCulloch–Pitts neuronal model (threshold logic), sigmoid neuronal model (Chap. 3)
Structures: single-layer network, multilayer network
- Digital logic → can represent any region in space (geometric interpretation)
Chapter 1. Pattern Recognition and Neural Networks
Two Objectives:
- Class recognition: image → apple
- Attribute recognition: image → color, shape, taste, etc. Ex. the color (attribute) of an apple is red.
(1) Approaches
a. Template matching
b. Statistical: compare posteriors P(ω1|x), P(ω2|x); the Bayes optimal decision boundary separates class 1 and class 2 along x.
   Ex. x = temperature; ω1: healthy, ω2: sick. x = (height, weight); ω1: female, ω2: male.
c. Syntactic
d. Neural network
(2) Procedure – Train and Generalize
Pipeline: raw x → Preprocessing → Feature Extraction → Discriminant d(x) → Decision Making → Class
- Preprocessing: eliminate bad data (outliers), filter out noise
- Feature extraction: for data reduction and better separation
Training data = labelled input/output data = { x | d(x) is known }
(3) Decision (Discriminant) Function
a. 2-Class Weather Forecast Problem: n = 2 features, M = 2 classes; x1 = temperature, x2 = pressure.
The decision boundary { x | d(x) = 0 } is (n−1)-dimensional: a line (n = 2), a plane (n = 3), a hyperplane in general.
The weight vector (w1, w2, w3) may be unnormalized or normalized.
In general, for a normalized weight vector w (‖w‖ = 1), w is a unit normal to the hyperplane wᵀx = D:
- wᵀx > D on one side of the hyperplane,
- wᵀx < D on the other side,
- wᵀx = D on the boundary itself.
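The half-space test above can be sketched in a few lines. This is a minimal illustration; the particular weight vector and threshold are made up for the example, not taken from the slides.

```python
import numpy as np

# Illustrative (not from the slides): a 2-class linear discriminant
# d(x) = w^T x - D, with w normalized to a unit normal of the hyperplane.
w = np.array([3.0, 4.0])
w = w / np.linalg.norm(w)    # now ||w|| = 1, so w is a unit normal
D = 1.0                       # hyperplane offset: boundary is w^T x = D

def classify(x):
    """+1 if w^T x > D (one side of the hyperplane), else -1."""
    return 1 if w @ x > D else -1

# The signed distance of x from the hyperplane is simply w^T x - D.
x = np.array([2.0, 2.0])
distance = w @ x - D
```
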
b. Case of M = 3 classes, n = 2 features – requires 3 discriminants.
Pairwise separable: one discriminant per pair of classes, with + and − sides.
Linear classifier (linear machine): compute a discriminant dᵢ(x) = wᵢᵀx for each class i and assign x to the class whose discriminant is maximum. The boundary between decision regions IRᵢ and IRⱼ is the set where dᵢ(x) = dⱼ(x), e.g., d2 = d3 and d1 = d2 in the figure.
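The maximum-selector rule can be sketched as follows. The three weight vectors are illustrative placeholders, chosen only to show the argmax mechanism.

```python
import numpy as np

# Illustrative weights (not from the slides) for a 3-class linear machine.
W = np.array([[ 1.0,  0.0],   # w1
              [-1.0,  1.0],   # w2
              [ 0.0, -1.0]])  # w3

def linear_machine(x):
    """Assign x to the class i whose discriminant d_i(x) = w_i^T x is largest."""
    d = W @ x                  # all three discriminants at once
    return int(np.argmax(d))   # 0-based index of the winning class

# The boundary between classes i and j is where d_i(x) = d_j(x).
```
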
2. PR – Neural Network Representation
(1) Models of a Neuron
A. McCulloch–Pitts Neuron – threshold logic with fixed weights.
Inputs x1, …, xp with weights w1, …, wp and bias −θ feed a summing node:
u = Σⱼ wⱼ xⱼ − θ,   y = φ(u).
The weighted sum is the adaptive linear combiner (Adaline); φ is the nonlinear activation function.
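A minimal sketch of this neuron, with the weights and threshold fixed by hand (no learning), as in the model above:

```python
# McCulloch-Pitts neuron: u = sum_j w_j * x_j - theta, y = phi(u),
# where phi is a hard threshold (fires when u > 0).

def mp_neuron(x, w, theta):
    u = sum(wj * xj for wj, xj in zip(w, x)) - theta
    return 1 if u > 0 else 0
```

With weights (1, 1) and threshold 1.5, for example, this neuron computes the AND of two binary inputs, as shown later in the Boolean-function slide.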
B. Generalized Model
A single neuron with weights ±w and bias −θ acts as a half-plane detector. Common activation functions φ:
- Hard limiter (threshold logic, binary)
- Signum (threshold logic, bipolar)
- One-sided ramp
- Piecewise-linear (two-sided) ramp
- Logistic (binary sigmoid)
- tanh (bipolar sigmoid)
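The activation functions listed above can be written out directly (names follow the slides):

```python
import math

def hard_limiter(u):       # binary threshold logic: outputs {0, 1}
    return 1 if u >= 0 else 0

def signum(u):             # bipolar threshold logic: outputs {-1, +1}
    return 1 if u >= 0 else -1

def one_sided_ramp(u):     # linear for u > 0, zero otherwise
    return max(0.0, u)

def piecewise_linear(u):   # two-sided ramp clipped to [-1, 1]
    return max(-1.0, min(1.0, u))

def logistic(u):           # smooth binary sigmoid, range (0, 1)
    return 1.0 / (1.0 + math.exp(-u))

def bipolar_tanh(u):       # smooth bipolar sigmoid, range (-1, 1)
    return math.tanh(u)
```
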
(2) Boolean Function Representation
Examples with binary inputs x1, x2 ∈ {0, 1}; weight +1 is excitatory, weight −1 is inhibitory:
- AND: weights (1, 1), threshold 1.5
- OR: weights (1, 1), threshold 0.5
- NAND: weights (−1, −1), threshold −1.5
- NOR: weights (−1, −1), threshold −0.5
- INVERTER: weight −1, threshold −0.5
- MEMORY: weight 1, threshold 0.5 (self-feedback)
Cf. x1, x2 may be bipolar (−1, 1) → different biases will be needed above.
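The weight–threshold choices above can be checked mechanically with a threshold neuron that fires when the weighted sum exceeds the threshold:

```python
# Verify the gate realizations from the slide: a McCulloch-Pitts neuron
# fires (outputs 1) when sum_j w_j * x_j - theta > 0.

def fires(x, w, theta):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) - theta > 0 else 0

GATES = {                      # (weights, threshold) as listed above
    "AND":  ((1, 1),   1.5),
    "OR":   ((1, 1),   0.5),
    "NAND": ((-1, -1), -1.5),
    "NOR":  ((-1, -1), -0.5),
}

# Truth table over inputs (0,0), (0,1), (1,0), (1,1) for each gate.
tables = {name: [fires((a, b), w, t) for a in (0, 1) for b in (0, 1)]
          for name, (w, t) in GATES.items()}
```

Running this reproduces the expected truth tables, e.g. AND → [0, 0, 0, 1] and NOR → [1, 0, 0, 0].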
(3) Geometrical Interpretation
A. Single layer with single output: fires within a half-plane, for a 2-class case.
B. Single layer with multiple outputs – for a multi-class case: one output per class, giving a one-hot code over outputs (w1, w2, w3):
   class 1 → (1, 0, 0), class 2 → (0, 1, 0), class 3 → (0, 0, 1), with inputs x1, x2.
C. Multilayer with Single Output – XOR
XOR is linearly nonseparable but nonlinearly separable. → With bipolar inputs, other weights are needed for a binary representation.
[Figure: ON/OFF regions of XOR in the (x1, x2) plane and a two-layer network with hidden unit 3.]
a. Successive Transforms
XOR is built in two layers: x1 XOR x2 = (x1 OR x2) AND (x1 NAND x2).
Layer 1 (hidden units): x1′ = φ(x1 + x2 − 0.5) (OR, threshold 0.5); x2′ = φ(−x1 − x2 + 1.5) (NAND, threshold −1.5).
Layer 2 (output): y = φ(x1′ + x2′ − 1.5) (AND, threshold 1.5).
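The two-layer decomposition above can be sketched directly with threshold neurons:

```python
# XOR(x1, x2) = AND( OR(x1, x2), NAND(x1, x2) ), realized with
# hard-threshold neurons and the weights/thresholds from the slide.

def step(u):
    return 1 if u > 0 else 0

def xor_net(x1, x2):
    h_or   = step( x1 + x2 - 0.5)     # hidden unit 1: OR
    h_nand = step(-x1 - x2 + 1.5)     # hidden unit 2: NAND
    return step(h_or + h_nand - 1.5)  # output unit: AND
```
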
b. XOR = OR − (1,1)-AND: the OR region minus the corner (1,1) detected by an AND unit.
Realization: hidden AND unit (weights 1, 1, threshold 1.5) feeds the output with weight −2; the direct inputs have weight 1 and the output threshold is 0.5.
c. Parity
1-bit parity of x1, x2 = x1 XOR x2 (hidden thresholds 0.5 and 1.5, output weights 1 and −1, output threshold 0.5).
n-bit parity: hidden units with thresholds 0.5, 1.5, …, n − 0.5 and alternating output weights +1, −1, …, (−1)ⁿ⁺¹.
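The n-bit parity construction above can be sketched as: hidden unit k fires when the number of active inputs exceeds k − 0.5, and contributes with alternating weight (−1)^(k+1) to the output sum, which is then thresholded.

```python
# n-bit parity with one hidden layer: hidden unit k counts whether
# at least k inputs are on; alternating +1/-1 output weights make the
# sum equal 1 for an odd count and 0 for an even count.

def step(u):
    return 1 if u > 0 else 0

def parity_net(bits):
    n = len(bits)
    s = sum(bits)
    hidden = [step(s - (k - 0.5)) for k in range(1, n + 1)]
    u = sum(((-1) ** (k + 1)) * h for k, h in enumerate(hidden, start=1))
    return step(u - 0.5)   # fires iff an odd number of inputs are on
```

For n = 2 this reduces to the XOR network of the previous slide.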
D. Multilayer with Single Output – Analog Inputs (1/2)
[Figure: half-planes 1–3 combined by an AND unit to form a convex region; two such regions combined by an OR unit.]
E. Multilayer Single Output – Analog Inputs (2/2)
[Figure: half-planes 1–3 ANDed into one convex region and half-planes 4–6 ANDed into another; the two AND outputs are ORed in the output layer.]
F. MLP Decision Boundaries
Example problems: XOR, intertwined classes A/B, general regions.
- 1-layer: half planes
- 2-layer: convex regions
- 3-layer: arbitrary regions
Exercise: transform the network from ① to ② and see how the weights are changed.
[Figure: networks ① and ② with units 1–3.]
Questions from Students – 05
- How do we learn the weights? Is there an analytic and systematic methodology to find them?
- Why do we use polygons to represent the active regions [output = 1]?
- Why should dᵢ(x) be the maximum for class i?