Networks and N-dimensions
When to start? As we have seen, there is a continuous pattern of interest in network-style analysis, starting at least as early as McCulloch and Pitts. The major step in the 1950s was made by Frank Rosenblatt, who invented the perceptron, a formal device which learned using reinforcement.
Rosenblatt's 1958 paper: "A relatively small number of theorists...have been concerned with the problems of how an imperfect neural network, containing many random connections, can be made to perform reliably those functions which might be represented by idealized wiring diagrams...
Unfortunately, the language of symbolic logic and Boolean algebra is less well suited for such investigations. The need for a suitable language for the mathematical analysis of events in systems where only the gross organization can be characterized, and the precise structure is unknown, has led the author to formulate the current model in terms of probability theory rather than symbolic logic."
One simple version of the perceptron had three components: S-units (sensory); A-units (association); and a Response unit (R-unit). The S-units form a retina (so to speak). Each A-unit is connected to all S-units, and its input is the sum of the S-units' activations, each weighted by the connection between that S-unit and the A-unit:
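Input to the i-th A-unit = $\sum_j w_{ij} x_j$, where $x_j$ is the activation of the j-th S-unit and $w_{ij}$ is the weight on the connection from S-unit j to A-unit i. If this is over the i-th unit's threshold, then the unit turns on; otherwise, it's off. The same holds for the R-unit, whose input is the weighted sum of the A-units' activity.
(Figure: perceptron learning scheme.)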
Some beautiful graphics from ISIS at the University of Southampton are available on the web.
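The forward computation described above (S-units to A-units to R-unit) can be sketched in Python. This is a minimal sketch, not Rosenblatt's own formulation: the function names (`step`, `weighted_sum`, `perceptron_forward`), the network size, and all weight and threshold values are hypothetical, chosen only for illustration. `W[i][j]` plays the role of w_ij above.

```python
import random


def step(total: float, threshold: float) -> int:
    """On (1) if the summed input is over the threshold, else off (0)."""
    return 1 if total > threshold else 0


def weighted_sum(weights, activations):
    """Sum over j of weights[j] * activations[j]."""
    return sum(w * a for w, a in zip(weights, activations))


def perceptron_forward(x, W, a_thresholds, v, r_threshold):
    """One pass S-units -> A-units -> R-unit.

    x: S-unit activations (the "retina"), 0/1 values
    W: W[i][j] is the weight from S-unit j to A-unit i
    v: v[i] is the weight from A-unit i to the R-unit
    Returns (list of A-unit outputs, R-unit output).
    """
    a = [step(weighted_sum(row, x), th) for row, th in zip(W, a_thresholds)]
    r = step(weighted_sum(v, a), r_threshold)
    return a, r


# Hypothetical network: 4 S-units, 3 A-units, fixed random S-to-A weights in [-1, 1]
rng = random.Random(0)
W = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
a, r = perceptron_forward([1, 0, 1, 1], W, [0.0, 0.0, 0.0], [0.5, -0.2, 0.7], 0.3)
print(a, r)
```

In the narrow, Rosenblatt sense of the perceptron, the S-to-A weights `W` stay fixed and random; only the A-to-R weights `v` would be changed by learning.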
Rosenblatt (1962) showed that any category that is linearly separable can be learned by a perceptron in a finite number of learning steps (the perceptron convergence theorem), but Minsky and Papert (1969) showed severe limitations on what a perceptron could learn. In the years that followed, it was discovered that if the hidden units respond continuously, rather than in an on/off threshold fashion, the weights leading into them can be trained too, by back-propagation of error, and these limitations fell away.
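Exclusive-or (XOR: on when exactly one of two inputs is on) is the standard textbook example of what a single threshold unit cannot compute. The sketch below, in which the function names and all weights and thresholds are hypothetical hand-picked values, shows that one layer of hidden threshold units is enough to represent it. What back-propagation later supplied was a way to learn such hidden-unit weights rather than set them by hand, and that is where continuous responses are needed.

```python
def step(total: float, threshold: float) -> int:
    """On (1) if the summed input is over the threshold, else off (0)."""
    return 1 if total > threshold else 0


def xor_net(x1: int, x2: int) -> int:
    """Two hidden threshold units feeding one output unit (hand-wired, not learned)."""
    h_or = step(1.0 * x1 + 1.0 * x2, 0.5)   # on if at least one input is on
    h_and = step(1.0 * x1 + 1.0 * x2, 1.5)  # on only if both inputs are on
    return step(1.0 * h_or - 1.0 * h_and, 0.5)  # "or, but not and"


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))
# 0 0 0
# 0 1 1
# 1 0 1
# 1 1 0
```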
Linear separability
The term perceptron refers, in the narrow sense, to the device that Rosenblatt studied, with a set of unchanging, random connections between the first two layers. The more general concept, sometimes called a Linear Threshold Unit, is the second half of the perceptron. It can be thought of as a vector, a sequence of n real numbers (between -1 and 1) called jointly the weights, which acts on the n input units: the activity of the i-th input unit is multiplied by the i-th weight; then you add all these products together; if the sum exceeds the response unit's threshold, the response unit starts beeping (so to speak): it responds.
That's what's normally written as the dot product of the input vector $\mathbf{x}$ and the weight vector $\mathbf{w}$: $\mathbf{x} \cdot \mathbf{w} = \sum_{i=1}^{n} x_i w_i$.
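A worked example with hypothetical numbers (n = 3, weights chosen within [-1, 1], response threshold θ = 1):

$$\mathbf{x} = (1, 0, 1), \qquad \mathbf{w} = (0.5, -0.3, 0.8)$$
$$\mathbf{x} \cdot \mathbf{w} = (1)(0.5) + (0)(-0.3) + (1)(0.8) = 1.3 > \theta = 1$$

so the response unit responds. For the input (0, 1, 1) the sum is -0.3 + 0.8 = 0.5, which is under the threshold, so it stays silent.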
If there are two input units, their activity can be represented on a 2-dimensional graph, and the region in which the response unit is on is bounded by the straight line $w_1 x_1 + w_2 x_2 = \theta$ (θ being the response unit's threshold), whose slope is $-w_1 / w_2$. If there are 3 input units, the region is bounded by a plane in 3-space; in general, with n input units the region is bounded by an (n-1)-dimensional hyperplane in n-space.
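To make the slope claim concrete: solving $w_1 x_1 + w_2 x_2 = \theta$ for $x_2$ gives $x_2 = \theta / w_2 - (w_1 / w_2)\,x_1$, a line of slope $-w_1/w_2$ when $x_1$ is on the horizontal axis (assuming $w_2 \neq 0$). A minimal Python sketch, where the function names and the weight and threshold values are hypothetical:

```python
from fractions import Fraction


def boundary(w1, w2, theta):
    """Slope and intercept of w1*x1 + w2*x2 = theta, solved for x2 (assumes w2 != 0)."""
    return -w1 / w2, theta / w2


def unit_is_on(x1, x2, w1, w2, theta):
    """The response unit is on when the weighted sum is over the threshold."""
    return w1 * x1 + w2 * x2 > theta


# Hypothetical weights and threshold; Fraction keeps the arithmetic exact
w1, w2, theta = Fraction(1, 2), Fraction(1, 4), Fraction(1)
slope, intercept = boundary(w1, w2, theta)
print(slope, intercept)  # -2 4

# With w2 > 0, points above the line are on and points below it are off
print(unit_is_on(0, 5, w1, w2, theta))  # True  (5 > -2*0 + 4)
print(unit_is_on(0, 3, w1, w2, theta))  # False (3 < 4)
```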
If the distinction that you care about can be viewed as a distinction between regions that can be separated by a hyperplane (an (n-1)-dimensional flat surface in the n-dimensional space of inputs), then it's called linearly separable, and a perceptron will handle it just fine. Conjecture: all morphological syncretism is composed of linearly separable arrays.
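For two inputs, the definition can be tried out with a brute-force search for a separating line. This is a minimal sketch under stated assumptions: the name `find_separator` and the search grid (weights and threshold from -1 to 1 in steps of 0.1) are hypothetical choices. A grid search can only find separators, not prove that none exists; for XOR its failure agrees with a short argument: off at (0, 0) means θ ≥ 0, on at (1, 0) and (0, 1) means w1 > θ and w2 > θ, so w1 + w2 > 2θ ≥ θ and the unit would also be on at (1, 1).

```python
from itertools import product

# The four points of the two-input Boolean square
POINTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Hypothetical search grid for w1, w2 and theta: -1.0, -0.9, ..., 1.0
GRID = [k / 10 for k in range(-10, 11)]


def find_separator(labels):
    """Return the first (w1, w2, theta) on GRID such that w1*x1 + w2*x2 > theta
    holds exactly at the points labelled 1, or None if the grid has no such triple."""
    for w1, w2, theta in product(GRID, repeat=3):
        if all((w1 * x1 + w2 * x2 > theta) == bool(y)
               for (x1, x2), y in zip(POINTS, labels)):
            return w1, w2, theta
    return None


AND = [0, 0, 0, 1]
OR = [0, 1, 1, 1]
XOR = [0, 1, 1, 0]

for name, labels in [("AND", AND), ("OR", OR), ("XOR", XOR)]:
    print(name, find_separator(labels))
```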
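Perceptron learning: the new weight of the connection from unit j to unit i is $w_{ij}^{\text{new}} = w_{ij}^{\text{old}} + \eta\,(d_i - y_i)\,x_j$, where $d_i$ is the desired output of unit i, $y_i$ its actual output, $x_j$ the activation of unit j, and η marks the "local" speed of learning (the learning rate). If the unit stays off when it should have come on, $d_i - y_i = 1$ and the weights from the active inputs go up; if it comes on when it should have stayed off, they go down.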
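The rule can be sketched in Python for a single response unit. This is a minimal sketch, assuming (as is common, though not stated above) that the threshold is learned as the weight on an extra input that is always 1, so the unit fires when its weighted sum is over 0; the name `train_perceptron`, the learning rate η = 0.5 and the epoch limit are hypothetical. It learns AND, which is linearly separable, but never gets through a full pass without a mistake on XOR, which is not.

```python
def step(total: float, threshold: float = 0.0) -> int:
    """On (1) if the summed input is over the threshold, else off (0)."""
    return 1 if total > threshold else 0


def train_perceptron(samples, eta=0.5, max_epochs=100):
    """Train one threshold unit with the perceptron learning rule.

    samples: list of (inputs, desired_output) pairs, outputs 0 or 1.
    Returns (weights, converged); the last weight belongs to the always-on input.
    """
    n = len(samples[0][0]) + 1  # one extra always-on input stands in for the threshold
    w = [0.0] * n
    for _ in range(max_epochs):
        errors = 0
        for inputs, desired in samples:
            x = list(inputs) + [1.0]
            actual = step(sum(wj * xj for wj, xj in zip(w, x)))
            if actual != desired:
                errors += 1
                # new weight = old weight + eta * (desired - actual) * input activation
                w = [wj + eta * (desired - actual) * xj for wj, xj in zip(w, x)]
        if errors == 0:  # a full pass with no mistakes: every sample is classified correctly
            return w, True
    return w, False


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w_and, ok_and = train_perceptron(AND)
w_xor, ok_xor = train_perceptron(XOR)
print(ok_and, ok_xor)  # True False
print(w_and)           # [1.0, 0.5, -1.0]
```

The AND weights amount to "fire when x1 + 0.5·x2 > 1". Because no weights separate XOR, every pass makes at least one mistake, so that run only stops at `max_epochs`.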