Artificial Neural Networks
Introduction to Artificial Intelligence, Lecture 16: Neural Network Paradigms I
November 6, 2018
Computers vs. Neural Networks

"Standard" Computers               Neural Networks
one CPU / few processing cores     highly parallel processing
fast processing units              slow processing units
reliable units                     unreliable units
static infrastructure              dynamic infrastructure
Why Artificial Neural Networks?
There are two basic reasons why we are interested in building artificial neural networks (ANNs):
- Technical viewpoint: Some problems, such as character recognition or the prediction of future states of a system, require massively parallel and adaptive processing.
- Biological viewpoint: ANNs can be used to replicate and simulate components of the human (or animal) brain, thereby giving us insight into natural information processing.
Why Artificial Neural Networks?
Why do we need a paradigm other than symbolic AI for building "intelligent" machines? Symbolic AI is well suited for representing explicit knowledge that can be appropriately formalized. However, learning in biological systems is mostly implicit: it is an adaptation process based on uncertain information and reasoning. ANNs are inherently parallel and work extremely efficiently if implemented in parallel hardware.
How do NNs and ANNs work?
The "building blocks" of neural networks are the neurons. In technical systems, we also refer to them as units or nodes. Basically, each neuron
- receives input from many other neurons,
- changes its internal state (activation) based on the current input, and
- sends one output signal to many other neurons, possibly including its input neurons (recurrent network).
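As an aside (not from the slides), the behavior of a single artificial unit can be sketched in a few lines of Python: it sums its weighted inputs and passes the result through an activation function. The particular weights, inputs, and logistic activation below are illustrative assumptions, not a prescribed model.

```python
import math

def unit_output(inputs, weights, bias=0.0):
    """Compute one unit's activation: a weighted sum of the inputs
    passed through a logistic (sigmoid) activation function."""
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-net))

# Example: a unit receiving three input signals (hypothetical values).
print(unit_output([0.5, 1.0, -0.2], [0.8, -0.4, 0.3]))
```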
How do NNs and ANNs work?
Information is transmitted as a series of electric impulses, so-called spikes. The frequency and phase of these spikes encode the information. In biological systems, one neuron can be connected to as many as 10,000 other neurons.
Structure of NNs (and some ANNs)
In biological systems, neurons of similar functionality are usually organized in separate areas (or layers). Often, there is a hierarchy of interconnected layers with the lowest layer receiving sensory input and neurons in higher layers computing more complex functions. For example, neurons in macaque visual cortex have been identified that are activated only when there is a face (monkey, human, or drawing) in the macaque's visual field.
"Data Flow Diagram" of Visual Areas in Macaque Brain
Blue: motion perception pathway
Green: object recognition pathway
[Figure: stimuli in the receptive field of a neuron]
Structure of NNs (and some ANNs)
In a hierarchy of neural areas such as the visual system, those at the bottom (near the sensory "input" neurons) only "see" local information. For example, each neuron in primary visual cortex only receives input from a small area (about 1° in diameter) of the visual field, called its receptive field. As we move towards higher areas, the responses of neurons become less and less location dependent. In inferotemporal cortex, some neurons respond to face stimuli shown at any position in the visual field.
Receptive Fields in Hierarchical Neural Networks
[Figure: neuron A and the receptive field of A]
Receptive Fields in Hierarchical Neural Networks
[Figure: neuron B in the top layer and the receptive field of B in the input layer]
How do NNs and ANNs work?
NNs are able to learn by adapting their connectivity patterns so that the organism improves its behavior in terms of reaching certain (evolutionary) goals. The strength of a connection, or whether it is excitatory or inhibitory, depends on the state of a receiving neuron's synapses. The NN achieves learning by appropriately adapting the states of its synapses.
Pattern and Object Recognition
Let us study Artificial Neural Networks and compare them to other approaches in the context of pattern and object recognition. In computer vision, pattern recognition is fundamental for region and object classification. No recognition is possible without knowledge. Both specific knowledge about the objects being processed and hierarchically higher, more general knowledge about object classes are required.
Statistical Pattern Recognition
Object recognition is based on assigning classes to objects. The device that does these assignments is called the classifier. The number of classes is usually known beforehand, and typically can be derived from the problem specification. The classifier does not decide about the class from the object itself; rather, sensed object properties called patterns are used.
Statistical Pattern Recognition
For statistical pattern recognition, quantitative descriptions of objects' characteristics (features or patterns) are used. The set of all possible patterns forms the pattern space or feature space. The classes form clusters in the feature space, which can be separated by discrimination hyper-surfaces.
Statistical Pattern Recognition
Note that successful classification requires two components:
- computation of discriminative feature vectors that are similar within classes and differ between them,
- finding a discrimination function that accurately separates the feature clusters representing the individual classes.
The better the features we define, the simpler the discrimination function can be.
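To make the idea of a discrimination function concrete, here is a minimal Python sketch of a linear discriminant in a two-dimensional feature space. The weight vector, bias, and example feature vectors are hypothetical illustrations; in practice these parameters are learned from training exemplars.

```python
def linear_discriminant(x, w, b):
    """Assign class 1 if the feature vector x lies on the positive side
    of the hyperplane w·x + b = 0, otherwise class 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Two hypothetical 2-D feature vectors separated by the line x1 + x2 = 1.
w, b = [1.0, 1.0], -1.0
print(linear_discriminant([0.2, 0.3], w, b))  # -> 0
print(linear_discriminant([0.9, 0.8], w, b))  # -> 1
```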
Training and Testing
When evaluating the performance of a classifier, we cannot test it on the same data with which we trained it. If we did, even a lookup table could yield perfect performance. Instead, we are interested in the classifier's ability to generalize, i.e., its performance on data that it has never "seen" before. There are three common strategies: holdout testing, k-fold cross-validation, and leave-one-out testing.
Holdout Testing
The available set of exemplars is divided into training and test sets. Subsequently, the classifier is trained with the training set once and evaluated with the test set once. Classification performance is calculated as the proportion of correctly classified exemplars in the test set. It is very efficient because it only requires one round of training and testing. However, it "wastes" a lot of exemplars by only using them for testing.
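A minimal Python sketch of holdout testing, assuming exemplars are (feature vector, label) pairs and that a classifier is a callable mapping a feature vector to a label; the 70/30 split ratio is an arbitrary illustrative choice.

```python
import random

def holdout_split(exemplars, test_fraction=0.3, seed=0):
    """Shuffle the exemplars once, then split them into a training set
    and a test set (here an illustrative 70/30 split)."""
    data = exemplars[:]
    random.Random(seed).shuffle(data)
    n_test = int(len(data) * test_fraction)
    return data[n_test:], data[:n_test]        # (training set, test set)

def test_accuracy(classifier, test_set):
    """Proportion of correctly classified exemplars in the test set."""
    correct = sum(1 for features, label in test_set if classifier(features) == label)
    return correct / len(test_set)
```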
K-Fold Cross Validation
The set of exemplars is divided into k subsets s1, s2, …, sk of approximately equal size. In the first of k training/testing cycles, s1 is chosen as the test set and the union of the remaining subsets is used for training. Similarly, in the second cycle, s2 is the test set and the other subsets form the training set, and so on. The average proportion of correct classifications across the k cycles is taken as the classification performance. The advantage of this method is that we can use a large share of the available exemplars for training. However, we have to run k training/testing cycles.
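The following sketch illustrates k-fold cross-validation under the same assumptions as above: exemplars are (feature vector, label) pairs, and train_fn is a hypothetical training routine that returns a classifier callable.

```python
def k_fold_cross_validation(exemplars, train_fn, k=5):
    """Split the exemplars into k folds of roughly equal size.  In cycle i,
    fold i is the test set and the union of the remaining folds is the
    training set.  Returns the average accuracy over the k cycles."""
    folds = [exemplars[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test_set = folds[i]
        training_set = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        classifier = train_fn(training_set)    # assumed to return a callable
        correct = sum(1 for features, label in test_set
                      if classifier(features) == label)
        scores.append(correct / len(test_set))
    return sum(scores) / k
```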
Leave-One-Out Testing
This is an extreme case of k-fold cross-validation. If we have N exemplars, we perform N training/testing cycles, in each of which we train with all exemplars except one and then test performance on only the one exemplar that was left out. In each cycle, we leave out a different exemplar, so that after the N cycles each exemplar has been chosen exactly once. The average proportion of correct classifications across the N cycles determines the classification performance. This method is computationally expensive but useful if only a small set of exemplars is available.
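A corresponding sketch of leave-one-out testing, again assuming (feature vector, label) exemplars and a hypothetical train_fn; it is simply k-fold cross-validation with k = N.

```python
def leave_one_out(exemplars, train_fn):
    """Run N training/testing cycles, each time leaving out one exemplar
    for testing; return the proportion of correct classifications."""
    correct = 0
    for i, (features, label) in enumerate(exemplars):
        training_set = exemplars[:i] + exemplars[i + 1:]
        classifier = train_fn(training_set)    # assumed to return a callable
        correct += (classifier(features) == label)
    return correct / len(exemplars)
```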
The MNIST Handwritten Digit Dataset
60,000 28×28-pixel images for training
10,000 28×28-pixel images for testing
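For illustration, the raw MNIST files are distributed in a simple binary (IDX) format that can be read directly with NumPy. The sketch below assumes the standard distribution files (e.g. 'train-images-idx3-ubyte'); the paths passed in are placeholders.

```python
import numpy as np

def load_mnist_images(path):
    """Read an MNIST image file in the original IDX format: a 16-byte
    big-endian header (magic 2051, image count, rows, cols) followed by
    one unsigned byte per pixel."""
    with open(path, "rb") as f:
        magic, n, rows, cols = np.frombuffer(f.read(16), dtype=">i4")
        assert magic == 2051
        return np.frombuffer(f.read(), dtype=np.uint8).reshape(n, rows, cols)

def load_mnist_labels(path):
    """Read an MNIST label file: an 8-byte big-endian header
    (magic 2049, label count) followed by one byte per label."""
    with open(path, "rb") as f:
        magic, n = np.frombuffer(f.read(8), dtype=">i4")
        assert magic == 2049
        return np.frombuffer(f.read(), dtype=np.uint8)
```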
Types of Classifier
We will take a look at three common types of classifier: k-Nearest Neighbor (kNN) classifier, Naïve Bayes (NB) classifier, and Artificial Neural Network (ANN) classifier. These are just popular examples of classifiers that greatly differ from each other.
The k-Nearest Neighbor Classifier
The k-Nearest Neighbor (kNN) classifier is arguably the simplest classifier that can still provide good results. It is a "lazy" learner: it simply stores all training exemplars. When classifying a new feature vector, it measures its Euclidean (or other) distance to each of the stored vectors. It finds the k nearest neighbors and determines which class has the most representatives among them. That class is taken as the classification result.
The k-Nearest Neighbor Classifier
Typical values of k range from 3 to 10. Instead of giving one "vote" to each neighbor, the votes can be weighted by the neighbors' proximity to the input vector. This classifier typically performs well if the feature space is low-dimensional and many training exemplars are available. Training is extremely efficient, but classification is not, because often the entire set of exemplars needs to be processed for each new classification.
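A compact Python sketch of such a kNN classifier with both unweighted and distance-weighted voting; the toy 2-D exemplars and the choice k = 3 are illustrative assumptions.

```python
import math
from collections import Counter

def knn_classify(query, training_set, k=3, weighted=False):
    """Classify a feature vector by a majority (or distance-weighted) vote
    among its k nearest training exemplars, using Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    neighbors = sorted(training_set, key=lambda ex: dist(query, ex[0]))[:k]
    votes = Counter()
    for features, label in neighbors:
        weight = 1.0 / (dist(query, features) + 1e-9) if weighted else 1.0
        votes[label] += weight
    return votes.most_common(1)[0][0]

# Toy 2-D exemplars (hypothetical): (feature vector, class label)
train = [([0.0, 0.0], "A"), ([0.1, 0.2], "A"),
         ([1.0, 1.1], "B"), ([0.9, 1.0], "B")]
print(knn_classify([0.2, 0.1], train, k=3))                   # -> "A"
print(knn_classify([0.8, 0.9], train, k=3, weighted=True))    # -> "B"
```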