Connectionist Modelling Summer School Lecture Three
Using an Error Signal
Orthogonality Constraint
–Number of patterns limited by dimensionality of network.
–Input patterns must be orthogonal to each other.
–Similarity effects.
Perceptron Convergence Rule
–Learning in a single weight network.
–Assume a teacher signal t_out.
–Adaptation of Connection and Threshold (Rosenblatt 1958).
–Note that threshold always changes if incorrect output.
–Blame is apportioned to a connection in proportion to the activity of the input line.
[Figure: Input Neurons and Output Neurons; labels x, y, z, w, a_in, a_out]
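The adaptation equations on this slide survive only as prose, so the following is a minimal sketch of one common textbook form of the perceptron rule for a single-weight network, not necessarily the exact equations shown in the lecture. The learning rate `eta`, the convention that the unit fires when `w * a_in > theta`, and the function names are all assumptions made for illustration.

```python
def output(w, theta, a_in):
    """Threshold unit: fires (1) when the weighted input exceeds the threshold."""
    return 1 if w * a_in > theta else 0


def perceptron_step(w, theta, a_in, t_out, eta=0.5):
    """One update of connection w and threshold theta (assumed textbook form)."""
    a_out = output(w, theta, a_in)
    delta = t_out - a_out              # error signal: teacher minus actual output
    w_new = w + eta * delta * a_in     # blame proportional to input activity
    theta_new = theta - eta * delta    # threshold moves whenever the output is wrong
    return w_new, theta_new, delta


# The unit should fire for a_in = 1 but starts with w = 0 and theta = 0.5.
w, theta, delta = perceptron_step(0.0, 0.5, a_in=1, t_out=1)
print(w, theta, delta)

# With an inactive input line (a_in = 0) only the threshold is adjusted.
w2, theta2, delta2 = perceptron_step(0.0, -0.5, a_in=0, t_out=0)
print(w2, theta2, delta2)
```

In the second call the weight is left alone because the input line carried no activity, while the threshold still moves: this is the sense in which the threshold always changes on an incorrect output.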
Using an Error Signal
Perceptron Convergence Rule
–“The perceptron convergence rule guarantees to find a solution to a mapping problem, provided a solution exists.” (Minsky & Papert 1969)
An Example of Perceptron Learning
–Boolean OR.
–Training the network.
[Figure: network with Input and Output units; labels a_out, w_20, w_21]
[Table columns: In, Out, w_20, w_21, θ, a_out, δ, Δθ, Δw]
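The training table for the Boolean OR example did not survive, so here is a hedged sketch of how such a run might go. The starting weights and threshold (all zero), the learning rate `eta`, the firing convention (net input strictly greater than `theta`) and the function name `train_or` are assumptions, not values taken from the slide. The list `w` stands in for the two weights labelled w_20 and w_21.

```python
def train_or(eta=0.5, max_epochs=100):
    """Train a two-input threshold unit on Boolean OR with the perceptron rule."""
    patterns = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
    w = [0.0, 0.0]   # stand-ins for w_20 and w_21 (assumed initial values)
    theta = 0.0      # assumed initial threshold
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for (x0, x1), t in patterns:
            a_out = 1 if w[0] * x0 + w[1] * x1 > theta else 0
            delta = t - a_out
            if delta != 0:
                errors += 1
                w[0] += eta * delta * x0   # only active input lines are changed
                w[1] += eta * delta * x1
                theta -= eta * delta       # threshold changes on every error
        if errors == 0:                    # a full error-free pass: solution found
            return w, theta, epoch
    raise RuntimeError("no solution found within max_epochs")


w, theta, epochs = train_or()
for x0, x1 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    a_out = 1 if w[0] * x0 + w[1] * x1 > theta else 0
    print((x0, x1), "->", a_out)
```

Because OR is linearly separable, a solution exists and the loop terminates, which is what the convergence guarantee quoted above promises.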
Gradient Descent
Least Mean Square Error (LMS)
–Define the error measure as the square of the discrepancy between the actual output and the desired output. (Widrow-Hoff 1960)
–Plot an error curve for a single weight network.
–Make weight adjustments by performing gradient descent: always move down the slope.
Calculating the Error Signal
–Note that Perceptron Convergence and LMS use similar learning algorithms: the Delta Rule.
Error Landscapes
–Gradient descent algorithms adapt by moving downhill in a multi-dimensional landscape, the error surface.
–Ball bearing analogy.
–In a smooth landscape, the bottom will always be reached.
–However, bottom may not correspond to zero error.
[Figure: error curve; axes: Weight Value, Error]
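Gradient descent on the squared error can be sketched for a single-weight network as follows. This assumes a linear output unit (`a_out = w * a_in`), an error defined as `(t_out - a_out) ** 2` as in the LMS definition above, and an illustrative learning rate `eta`; the function names are hypothetical.

```python
def error(w, a_in, t_out):
    """Squared discrepancy between desired and actual output."""
    return (t_out - w * a_in) ** 2


def gradient(w, a_in, t_out):
    """Slope of the error curve with respect to the weight."""
    return -2.0 * (t_out - w * a_in) * a_in


def descend(w, a_in, t_out, eta=0.1, steps=50):
    """Repeatedly step down the slope, recording the error after each step."""
    history = [error(w, a_in, t_out)]
    for _ in range(steps):
        w -= eta * gradient(w, a_in, t_out)   # move against the gradient
        history.append(error(w, a_in, t_out))
    return w, history


w_final, errs = descend(w=0.0, a_in=1.0, t_out=0.8)
print(w_final, errs[0], errs[-1])
```

The update `w -= eta * gradient(...)` expands to `w += 2 * eta * (t_out - a_out) * a_in`, which has the same shape as the perceptron update: an error term multiplied by the input activity. That shared form is the Delta Rule; the factor of 2 is often absorbed into the learning rate.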
Past Tense Revisited
Vocabulary Discontinuity
–Up to 10 epochs: 8 irregulars + 2 regulars. Thereafter: 420 verbs, mostly regular.
–Justification: Irregulars are more frequent than regulars.
Lack of Evidence
–Vocabulary spurt at 2 years whereas overregularisations occur at 3 years. Furthermore, vocabulary spurt consists mostly of nouns.
–Pinker and Prince (1988) show that regulars and irregulars are relatively balanced in early productive vocabularies.
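The vocabulary discontinuity described above amounts to a step change in the training set. A small sketch of that schedule, assuming epochs are counted from 1 and that "up to 10 epochs" includes epoch 10, might look like this. The function name `vocabulary_at` is hypothetical, and the regular/irregular split after the switch is left unset because the slide gives only "mostly regular".

```python
def vocabulary_at(epoch):
    """Training vocabulary at a given epoch (1-indexed; boundary is an assumption)."""
    if epoch <= 10:
        return {"total": 10, "irregular": 8, "regular": 2}
    # The slide says only "mostly regular", so the exact split is not filled in.
    return {"total": 420, "irregular": None, "regular": None}


for epoch in (1, 10, 11):
    print(epoch, vocabulary_at(epoch))
```

The jump from 10 to 420 verbs between consecutive epochs is the discontinuity that the "Lack of Evidence" points call into question.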
Longitudinal evidence
Stages or phases in development?
–Initial error-free performance.
–Protracted period of overregularisation but at low rates (typically < 10%).
–Gradual recovery from error.
–Rate of overregularisation is much less than the rate of regularisation of regular verbs.
1992
Longitudinal evidence
Error Characteristics
–High frequency irregulars are robust to overregularisation.
–Some errors seem to be phonologically conditioned.
–Irregularisations.
Single system account
Multi-layered Perceptrons
–Hidden unit representation
–Error correction technique
–Plunkett & Marchman 1991
–Type/Token distinction
–Continuous training set
Single system account
Incremental Vocabularies
–Plunkett & Marchman (1993)
–Initial small training set
–Gradual expansion
Overregularisation
–Initial error-free performance.
–Protracted period of overregularisation but at low rates (typically < 5%).
–High frequency irregulars are robust to overregularisation.