Now let us talk about…
Neural Network Application Design
NN Application Design

Now that we have gained some insight into the theory of artificial neural networks, how can we design networks for particular applications? Designing NNs is basically an engineering task. As we discussed before, for example, there is no formula that would allow you to determine the optimal number of hidden units in a BPN for a given task.
We need to address the following issues for a successful application design:
- Choosing an appropriate data representation
- Performing an exemplar analysis
- Training the network and evaluating its performance
We are now going to look into each of these topics.
Data Representation

Most networks process information in the form of input pattern vectors. These networks produce output pattern vectors that are interpreted by the embedding application. All networks process one of two types of signal components: analog (continuously variable) signals or discrete (quantized) signals. In both cases, signals have a finite amplitude; their amplitude has a minimum and a maximum value.
The main question is: How can we appropriately capture these signals and represent them as pattern vectors that we can feed into the network? We should aim for a data representation scheme that maximizes the ability of the network to detect (and respond to) relevant features in the input pattern. Relevant features are those that enable the network to generate the desired output pattern.
Similarly, we also need to define a set of desired outputs that the network can actually produce. Often, a “natural” representation of the output data turns out to be impossible for the network to produce. We are going to consider internal representation and external interpretation issues as well as specific methods for creating appropriate representations.
Internal Representation Issues

As we said before, in all network types, the amplitude of input signals and internal signals is limited:
- analog networks: values usually between 0 and 1
- binary networks: only values 0 and 1 allowed
- bipolar networks: only values –1 and 1 allowed
Without this limitation, patterns with large amplitudes would dominate the network’s behavior. A disproportionately large input signal can activate a neuron even if the relevant connection weight is very small.
Creating Data Representations

The patterns that can be represented by an ANN most easily are binary patterns. Even analog networks “like” to receive and produce binary patterns – we can simply round values < 0.5 to 0 and values ≥ 0.5 to 1. To create a binary input vector, we can simply list all features that are relevant to the current task. Each component of our binary vector indicates whether one particular feature is present (1) or absent (0).
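As a small illustrative sketch (not part of the lecture), the rounding rule and the feature-presence encoding could look like this in Python; the feature names are purely hypothetical:

```python
# Minimal sketch; the feature names below are hypothetical examples.

def to_binary(analog_values, threshold=0.5):
    """Round analog activations in [0, 1] to binary: < 0.5 -> 0, >= 0.5 -> 1."""
    return [1 if v >= threshold else 0 for v in analog_values]

# Binary input vector: one position per relevant feature (1 = present, 0 = absent).
FEATURES = ["has_wings", "has_feathers", "can_fly", "lays_eggs"]

def encode(present_features):
    return [1 if f in present_features else 0 for f in FEATURES]

print(to_binary([0.2, 0.7, 0.5]))          # [0, 1, 1]
print(encode({"has_wings", "can_fly"}))    # [1, 0, 1, 0]
```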
With regard to output patterns, most binary-data applications perform classification of their inputs. The output of such a network indicates to which class of patterns the current input belongs. Usually, each output neuron is associated with one class of patterns. For any input, only one output neuron should be active (1) and the others inactive (0), indicating the class of the current input.
In other cases, classes are not mutually exclusive, and more than one output neuron can be active at the same time. Another variant would be the use of binary input patterns and analog output patterns for “classification”. In that case, again, each output neuron corresponds to one particular class, and its activation indicates the probability (between 0 and 1) that the current input belongs to that class.
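A sketch of how the embedding application might interpret such output patterns, assuming analog outputs in [0, 1]; the class names are made up for illustration:

```python
CLASSES = ["circle", "square", "triangle"]   # hypothetical class labels

def exclusive_class(output):
    """Mutually exclusive classes: the most active output neuron wins."""
    return CLASSES[output.index(max(output))]

def active_classes(output, threshold=0.5):
    """Non-exclusive classes: every sufficiently active neuron indicates membership."""
    return [c for c, v in zip(CLASSES, output) if v >= threshold]

print(exclusive_class([0.1, 0.8, 0.3]))   # 'square'
print(active_classes([0.7, 0.6, 0.2]))    # ['circle', 'square']
```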
For non-binary (e.g., ternary) features:
- Use multiple binary inputs to represent non-binary states (e.g., 001 for “red”, 010 for “green”, 100 for “blue” for representing three possible colors).
- Treat each feature in the pattern as an individual subpattern.
- Represent each subpattern with as many positions (units) in the pattern vector as there are possible states for the feature.
- Then concatenate all subpatterns into one long pattern vector, as in the sketch below.
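A sketch of this subpattern scheme, assuming two hypothetical features (a color with three states and a size with two states):

```python
# Hypothetical feature definitions; each feature becomes a one-of-n subpattern.
FEATURE_STATES = {
    "color": ["red", "green", "blue"],
    "size":  ["small", "large"],
}

def encode_sample(sample):
    """Concatenate one subpattern per feature into one long binary pattern vector."""
    vector = []
    for feature, states in FEATURE_STATES.items():
        vector += [1 if sample[feature] == s else 0 for s in states]
    return vector

print(encode_sample({"color": "green", "size": "large"}))  # [0, 1, 0, 0, 1]
```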
Another way of representing n-ary data in a neural network is using one neuron per feature, but scaling the (analog) value to indicate the degree to which a feature is present.
Good examples:
- the brightness of a pixel in an input image
- the output of an edge filter
Poor examples:
- the letter (1 – 26) of a word
- the type (1 – 6) of a chess piece
This can be explained as follows: The way NNs work (both biological and artificial ones) is that each neuron represents the presence/absence of a particular feature. Activations 0 and 1 indicate absence or presence of that feature, respectively, and in analog networks, intermediate values indicate the extent to which a feature is present. Consequently, a small change in one input value leads to only a small change in the network’s activation pattern.
Therefore, it is appropriate to represent a non-binary feature by a single analog input value only if this value is scaled, i.e., it represents the degree to which a feature is present. This is the case for the brightness of a pixel or the output of an edge detector. It is not the case for letters or chess pieces. For example, assigning values to individual letters (a = 0, b = 0.04, c = 0.08, …, z = 1) implies that a and b are in some way more similar to each other than are a and z. Obviously, in most contexts, this is not a reasonable assumption.
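A tiny sketch (an illustration, not part of the lecture) makes the point concrete: with the scalar letter encoding above, “a” and “b” end up much closer to each other than “a” and “z”, whereas a one-of-26 encoding keeps all pairs of distinct letters equally dissimilar:

```python
import string

# Scalar encoding: a = 0.0, b = 0.04, ..., z = 1.0
scalar = {c: i / 25 for i, c in enumerate(string.ascii_lowercase)}

# One-of-26 encoding: each letter gets its own input unit.
def one_hot(c):
    v = [0] * 26
    v[string.ascii_lowercase.index(c)] = 1
    return v

def hamming(u, v):
    """Number of positions in which two binary vectors differ."""
    return sum(a != b for a, b in zip(u, v))

print(abs(scalar["a"] - scalar["b"]), abs(scalar["a"] - scalar["z"]))            # 0.04 vs. 1.0
print(hamming(one_hot("a"), one_hot("b")), hamming(one_hot("a"), one_hot("z")))  # 2 vs. 2
```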
It is also important to notice that, in artificial (not natural!), completely connected networks, the order of features that you specify for your input vectors does not influence the outcome. For the network's performance, it is not necessary to represent, for example, similar features in neighboring input units. All units are treated equally; the neighborhood of two neurons does not imply to the network that they represent similar features. Of course, once you have specified a particular order, you cannot change it anymore during training or testing.
Exemplar Analysis

When building a neural network application, we must make sure that we choose an appropriate set of exemplars (training data):
- The entire problem space must be covered.
- There must be no inconsistencies (contradictions) in the data.
- We must be able to correct such problems without compromising the effectiveness of the network.
Ensuring Coverage

For many applications, we do not just want our network to classify any kind of possible input. Instead, we want our network to recognize whether an input belongs to any of the given classes or whether it is “garbage” that cannot be classified. To achieve this, we train our network with both “classifiable” and “garbage” data (null patterns). For the null patterns, the network is supposed to produce a zero output, or a designated “null neuron” is activated.
In many cases, we use a 1:1 ratio for this training, that is, we use as many null patterns as there are actual data samples. We have to make sure that all of these exemplars taken together cover the entire input space. If it is certain that the network will never be presented with “garbage” data, then we do not need to use null patterns for training.
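One possible way to organize such a training set, assuming binary pattern vectors, a zero output for null patterns, and the 1:1 ratio mentioned above; the exemplars are made up for illustration:

```python
import random

NULL_OUTPUT = [0, 0, 0]  # zero output vector signals "garbage" (alternatively, use a dedicated null neuron)

# Hypothetical classifiable exemplars: (input pattern, target class vector)
real_exemplars = [
    ([1, 0, 1, 1], [1, 0, 0]),
    ([0, 1, 1, 0], [0, 1, 0]),
    ([1, 1, 0, 0], [0, 0, 1]),
]

# Generate as many "garbage" inputs as there are real samples (1:1 ratio).
# In practice, one should ensure the null patterns do not coincide with classifiable inputs.
null_exemplars = [([random.randint(0, 1) for _ in range(4)], NULL_OUTPUT)
                  for _ in real_exemplars]

training_set = real_exemplars + null_exemplars
```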
Ensuring Consistency

Sometimes there may be conflicting exemplars in our training set. A conflict occurs when two or more identical input patterns are associated with different outputs. Why is this problematic?
Assume a BPN with a training set including the exemplars (a, b) and (a, c). Whenever the exemplar (a, b) is chosen, the network adjusts its weights to produce an output for a that is closer to b. Whenever (a, c) is chosen, the network changes its weights for an output closer to c, thereby “unlearning” the adaptation for (a, b). In the end, the network will associate input a with an output that is “between” b and c, but is neither exactly b nor c, so the network error caused by these exemplars will not decrease. For many applications, this is undesirable.
To identify such conflicts, we can apply a (binary) search algorithm to our set of exemplars. How can we resolve an identified conflict? Of course, the easiest way is to eliminate the conflicting exemplars from the training set. However, this reduces the amount of training data that is given to the network. Eliminating exemplars is the best way to go if it is found that these exemplars represent invalid data, for example, inaccurate measurements. In general, however, other methods of conflict resolution are preferable.
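As a sketch of the same idea, instead of an explicit (binary) search one can simply group the exemplars by input pattern and flag every input that is mapped to more than one distinct output; this is an illustration, not the lecture's prescribed algorithm:

```python
from collections import defaultdict

def find_conflicts(exemplars):
    """Return every input pattern that is associated with more than one distinct output."""
    outputs_by_input = defaultdict(set)
    for x, y in exemplars:
        outputs_by_input[tuple(x)].add(tuple(y))
    return {x: ys for x, ys in outputs_by_input.items() if len(ys) > 1}

exemplars = [((0, 0, 1, 1), (0, 1, 0, 1)),
             ((0, 0, 1, 1), (0, 0, 1, 0)),
             ((1, 0, 0, 0), (1, 0, 0, 0))]

# Reports the input (0, 0, 1, 1) together with its two conflicting outputs.
print(find_conflicts(exemplars))
```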
Another method combines the conflicting patterns. For example, if we have the exemplars (0011, 0101) and (0011, 0010), we can replace them with the following single exemplar: (0011, 0111). The way we compute the output vector of the new exemplar based on the two original output vectors depends on the current task. It should be the value that is most “similar” (in terms of the external interpretation) to the original two values.
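In the example above, the combined output 0111 happens to be the component-wise OR of 0101 and 0010; a sketch of that particular combination rule (as stated above, the appropriate rule depends on the task and its external interpretation):

```python
def merge_by_or(outputs):
    """Combine conflicting binary output vectors component-wise with OR."""
    return [int(any(bits)) for bits in zip(*outputs)]

print(merge_by_or([[0, 1, 0, 1], [0, 0, 1, 0]]))  # [0, 1, 1, 1]
```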
Alternatively, we can alter the representation scheme. Let us assume that the conflicting measurements were taken at different times or places. In that case, we can just expand all the input vectors, and the additional values specify the time or place of measurement. For example, the exemplars (0011, 0101), (0011, 0010) could be replaced by the following ones: (100011, 0101), (010011, 0010).
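A sketch of this expansion, assuming a one-of-n code for the measurement context (two hypothetical contexts here) that is prepended to each input vector:

```python
CONTEXTS = ["site_A", "site_B"]   # hypothetical measurement times or places

def expand(context, input_vector):
    """Prepend a one-of-n context code so that formerly identical inputs become distinct."""
    prefix = [1 if context == c else 0 for c in CONTEXTS]
    return prefix + input_vector

print(expand("site_A", [0, 0, 1, 1]))  # [1, 0, 0, 0, 1, 1]
print(expand("site_B", [0, 0, 1, 1]))  # [0, 1, 0, 0, 1, 1]
```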
One advantage of altering the representation scheme is that this method cannot create any new conflicts. Expanding the input vectors cannot make two or more of them identical if they were not identical before.
Training and Performance Evaluation

A more insightful way of performance evaluation is partial-set training. The idea is to split the available data into two sets – the training set and the test set. The network’s performance on the second set indicates how well the network has actually learned the desired mapping. We should expect the network to interpolate, but not extrapolate. Therefore, this test also evaluates our choice of training samples.
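A minimal sketch of such a split; the 80/20 ratio is a common choice for illustration, not a value prescribed by the lecture:

```python
import random

def split_exemplars(exemplars, test_fraction=0.2, seed=0):
    """Randomly split the available exemplars into a training set and a test set."""
    rng = random.Random(seed)
    shuffled = exemplars[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]   # (training set, test set)
```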
If the test set only contains one exemplar, this type of training is called “hold-one-out” training. It is to be performed sequentially for every individual exemplar. This, of course, is a very time-consuming process. A less extreme version of hold-one-out training is cross validation, in which we split the dataset into n subsets. Each subset serves as the test set once, with the other (n – 1) subsets forming the training set. This means that n training processes are performed; this is referred to as n-fold cross validation.
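A sketch of n-fold cross validation as described above; hold-one-out training corresponds to setting n equal to the number of exemplars. The functions train and evaluate are placeholders for whatever network training procedure and error measure are used:

```python
def cross_validate(exemplars, n, train, evaluate):
    """n-fold cross validation: each of the n subsets serves as the test set exactly once."""
    folds = [exemplars[i::n] for i in range(n)]   # n roughly equal subsets
    scores = []
    for i in range(n):
        test_set = folds[i]
        training_set = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        network = train(training_set)             # one of n training processes
        scores.append(evaluate(network, test_set))
    return sum(scores) / n                        # average performance over all folds

# Hold-one-out training: cross_validate(exemplars, n=len(exemplars), train=..., evaluate=...)
```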