The Appeal of Parallel Distributed Processing
J.L. McClelland, D.E. Rumelhart, and G.E. Hinton
Interdisciplinary Program in Cognitive Science, 강소영
Contents
1. Introduction
2. Parallel Distributed Processing
3. Examples of PDP Models
4. Representation and Learning in PDP Models
5. Origins of Parallel Distributed Processing
1. Introduction
Multiple Simultaneous Constraints
- Reaching and Grasping
- The Mutual Influence of Syntax and Semantics
- Simultaneous Mutual Constraints in Word Recognition
- Understanding the Interplay of Multiple Sources of Knowledge
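The Mutual Influence of Syntax and Semantics
- Syntactic constraint: "The boy the man chased kissed the girl." Syntax alone tells us that the man chased the boy and that the boy kissed the girl.
- Semantic constraint: "I saw the Grand Canyon flying to New York" vs. "I saw the sheep grazing in the field." The two sentences have the same surface structure, but knowledge of the world tells us who was flying (I) and who was grazing (the sheep).
- Mutual constraint within each of these domains: "I like the joke" / "I like the drive" vs. "I like to joke" / "I like to drive." The neighboring word determines whether "joke" and "drive" are read as nouns or as verbs.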
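Simultaneous Mutual Constraints in Word Recognition
- Selfridge's example: in "THE CAT", the same ambiguous character is read as H in the first word and as A in the second.
- Paradox: the letters are identified from the words, yet the words are identified from their letters. How can the process get started?
- Solution: the perceptual system can explore all of these possibilities without committing itself to any one of them until all of the constraints have been taken into account.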
Understanding the Interplay of Multiple Sources of Knowledge
- Knowledge structures: scripts (Schank 1976), frames (Minsky 1975), schemata (Norman and Bobrow 1976; Rumelhart 1975)
- Most everyday situations cannot be rigidly assigned to just a single script.
- Understanding requires interplay among a number of different sources of information, e.g., a birthday party at a restaurant draws on what we know about both birthday parties and restaurants.
- The generative capacity of human understanding in novel situations depends on letting these knowledge structures interact with each other.
2. Parallel Distributed Processing
- Properties of the tasks that people are good at:
  - a number of different pieces of information must be kept in mind at once;
  - each plays a part, constraining the others and being constrained by them.
- Assumption of PDP models: processing takes place through the interactions of a large number of simple processing elements, each sending excitatory and inhibitory signals to other units.
- Elements of the model: units, activations, and interactions among units.
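A minimal sketch of these three elements, assuming a logistic activation function, synchronous updating, and a hypothetical three-unit network (none of these choices come from the chapter): units 0 and 1 excite each other, both inhibit unit 2, and only unit 0 receives outside input. Each unit's new activation depends only on the weighted sum of the excitatory and inhibitory signals it receives.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def step(activations, weights, external, rate=0.5):
    """One synchronous update: every unit sums its weighted inputs at the same time."""
    net = weights @ activations + external
    # Move each activation part of the way toward the logistic of its net input.
    return activations + rate * (logistic(net) - activations)

# Hypothetical weights: positive = excitatory, negative = inhibitory.
weights = np.array([[0.0, 2.0, -2.0],
                    [2.0, 0.0, -2.0],
                    [-2.0, -2.0, 0.0]])
external = np.array([1.0, 0.0, 0.0])  # only unit 0 receives outside input
a = np.full(3, 0.5)                   # all units start at the same activation
for _ in range(50):
    a = step(a, weights, external)
print(np.round(a, 2))
```

Unit 1 ends up strongly active even though it receives no outside input, because unit 0 excites it; unit 2 is driven down by inhibition from both.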
PDP Models: Cognitive Science or Neuroscience?
- The appeal of PDP: computationally sufficient and psychologically accurate mechanistic accounts of the phenomena of human cognition
- PDP models have radically altered the way we think about:
  - the time course of processing
  - the nature of representation
  - the mechanisms of learning
Microstructure of Cognition
- Parallel distributed processing models offer alternatives to serial models of the microstructure of cognition.
- They do not deny that there is a macrostructure.
- Objects referred to in macrostructural models of cognitive processing are seen as approximate descriptions of emergent properties of the microstructure.
3. Examples of PDP Models
- Recent applications of PDP: motor control, perception, memory, and language
- In each domain, PDP mechanisms provide natural accounts of how multiple, simultaneous, mutual constraints are exploited.
3.1 Motor Control
- Hinton's stick person
- Two constraints on the task:
  - the tip of the forearm must touch the object;
  - the center of gravity must remain over the foot.
- Each processor receives two pieces of information:
  - how far the tip of the hand is from the target;
  - where the center of gravity is with respect to the foot.
- Result: a combination of joint angles that satisfies both constraints.
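The stick-person idea can be illustrated with a small relaxation sketch. Assumptions (not from the chapter): a planar chain of four links (leg, torso, upper arm, forearm) rooted at the foot, with hypothetical lengths; mass proportional to link length; and a cost that adds the squared distance from the hand to the target and the squared horizontal offset of the center of gravity from the foot. Each joint estimates how its own angle affects that shared cost, and all joints move together. This is plain gradient descent, offered as an illustration rather than as Hinton's actual procedure.

```python
import numpy as np

# Hypothetical link lengths: leg, torso, upper arm, forearm (the foot is at the origin).
LENGTHS = np.array([1.0, 1.0, 0.7, 0.7])

def joint_positions(angles):
    """Forward kinematics: (x, y) of the foot, every joint, and the hand tip."""
    headings = np.cumsum(angles)  # each joint angle is relative to the previous link
    segments = LENGTHS[:, None] * np.stack([np.cos(headings), np.sin(headings)], axis=1)
    return np.vstack([np.zeros(2), np.cumsum(segments, axis=0)])

def cost(angles, target, balance_weight=1.0):
    """Penalty for violating the two constraints of the stick-person task."""
    points = joint_positions(angles)
    tip = points[-1]
    midpoints = (points[:-1] + points[1:]) / 2
    cog_x = np.average(midpoints[:, 0], weights=LENGTHS)  # mass proportional to length
    reach_error = np.sum((tip - target) ** 2)   # constraint 1: hand touches the target
    balance_error = cog_x ** 2                  # constraint 2: center of gravity over the foot
    return reach_error + balance_weight * balance_error

def relax(angles, target, lr=0.03, steps=4000, h=1e-5):
    """Let every joint adjust its own angle in parallel to reduce the shared cost."""
    angles = np.array(angles, dtype=float)
    for _ in range(steps):
        grad = np.zeros_like(angles)
        for j in range(len(angles)):  # each joint probes its own effect on the cost
            bump = np.zeros_like(angles)
            bump[j] = h
            grad[j] = (cost(angles + bump, target) - cost(angles - bump, target)) / (2 * h)
        angles -= lr * grad  # all joints move at the same time
    return angles

start = np.array([np.pi / 2, 0.0, -np.pi / 2, 0.0])  # upright, arm held out horizontally
target = np.array([1.0, 1.6])                        # hypothetical object position
final = relax(start, target)
print(round(cost(start, target), 3), round(cost(final, target), 5))
```

Neither joint is told what the others are doing; the combination of angles emerges because every joint responds to the same two error signals.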
3.2 Perception
- Stereoscopic vision: random-dot stereogram → depth perception
- Perceptual completion of familiar patterns
- Completion of novel patterns
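Stereoscopic Vision
- Marr and Poggio (1976) explain the perception of depth in random-dot stereograms.
- Their account rests on two general principles about the visual world:
  - each point on a surface has a single depth, so each dot should be matched with only one partner (uniqueness);
  - surfaces are mostly smooth, so neighboring points tend to have similar depths (continuity).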
Perceptual Completion of Familiar Patterns
- Perception is influenced by familiarity: familiar patterns are perceived in less time.
- Familiarity helps to disambiguate ambiguous lower-level information and to fill in missing lower-level information:
  - the phonemic restoration effect
  - the visual perception of words (McClelland and Rumelhart 1981)
- Assumption of the model: there are detectors for visual features, for letters, and for words.
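Two hypotheses (units) and their activations:
- mutually consistent hypotheses support each other;
- mutually inconsistent hypotheses weaken each other.
Two kinds of inconsistency:
- between-level inconsistency → between-level inhibition (e.g., a letter and a word that does not contain that letter in that position)
- mutual exclusion within a level → competitive inhibition (e.g., two different letters in the same position)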
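These interactions can be sketched with the interactive-activation update rule of McClelland and Rumelhart (1981): positive net input drives a unit toward its maximum, negative net input drives it toward its minimum, activation decays toward a resting level, and only units with positive activation send signals. The four units, all weights, inputs, and parameter values below are hypothetical, chosen only to show that top-down support from a consistent word unit (WORK) strengthens the letter it contains (K).

```python
import numpy as np

UNITS = ["K", "D", "WORK", "WORD"]            # hypothetical 4th-letter units and word units
A_MAX, A_MIN, REST, DECAY = 1.0, -0.2, -0.1, 0.1  # illustrative parameter values

def make_weights(top_down=True):
    """w[i, j] is the weight from unit j to unit i (links are symmetric)."""
    idx = {name: k for k, name in enumerate(UNITS)}
    w = np.zeros((len(UNITS), len(UNITS)))
    links = [("K", "WORK", 0.3), ("D", "WORD", 0.3),    # consistent across levels: excite
             ("K", "WORD", -0.1), ("D", "WORK", -0.1),  # inconsistent across levels: inhibit
             ("K", "D", -0.3), ("WORK", "WORD", -0.3)]  # mutual exclusion within a level
    for u, v, value in links:
        w[idx[u], idx[v]] = w[idx[v], idx[u]] = value
    if not top_down:
        w[:2, 2:] = 0.0  # cut word-to-letter feedback (rows 0-1 are letters, columns 2-3 words)
    return w

def run(w, external, steps=100):
    a = np.full(len(UNITS), REST)
    for _ in range(steps):
        net = w @ np.maximum(a, 0.0) + external  # only active units send signals
        push = np.where(net > 0, net * (A_MAX - a), net * (a - A_MIN))
        a = np.clip(a + push - DECAY * (a - REST), A_MIN, A_MAX)
    return a

# Hypothetical inputs: feature evidence slightly favors K; the shared letters W, O, R
# (not modeled as units here) give both words the same bottom-up support.
external = np.array([0.4, 0.1, 0.3, 0.3])
with_feedback = run(make_weights(top_down=True), external)
without_feedback = run(make_weights(top_down=False), external)
for name, x, y in zip(UNITS, with_feedback, without_feedback):
    print(f"{name:5s} with feedback {x:+.2f}   without {y:+.2f}")
```

Comparing the two runs isolates the contribution of between-level excitation: the letter K settles at a higher activation when the word it belongs to can feed back to it.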
Completion of Novel Patterns
- Result: the word perception model shows perceptual facilitation for pronounceable nonwords as well as for words.
- General principles or rules can emerge from the interactions of simple processing elements.
- The model does not implement exactly any of the systems of orthographic rules that have been proposed by linguists or psychologists.
- PDP models may therefore provide more accurate accounts of the details of human performance than models based on a set of rules representing human competence.
3.3 Retrieving Information From Memory
- Content addressability
- Graceful degradation
- Default assignment
- Spontaneous generalization
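Content addressability can be illustrated with a simpler mechanism than the Jets and Sharks network (which is an interactive activation model). The sketch below swaps in a Hebbian autoassociator in the style of Hopfield (1982): each stored memory is a ±1 vector, and the weights are the sum of the memories' outer products. As an assumption of this sketch, the memories are rows of a 64×64 Sylvester Hadamard matrix, chosen because they are mutually orthogonal, which keeps the example deterministic. A cue with half of its entries unknown, or with some entries wrong, retrieves the whole stored pattern.

```python
import numpy as np

def hadamard(order_log2):
    """Sylvester construction: rows are mutually orthogonal +/-1 vectors."""
    h = np.array([[1]])
    for _ in range(order_log2):
        h = np.kron(h, np.array([[1, 1], [1, -1]]))
    return h

n = 64
patterns = hadamard(6)[1:4].astype(float)   # three stored +/-1 "memories" (hypothetical)
weights = patterns.T @ patterns / n         # Hebbian storage: sum of outer products
np.fill_diagonal(weights, 0.0)              # no self-connections

def retrieve(cue, steps=5):
    """Every unit repeatedly takes the sign of its summed input."""
    state = np.array(cue, dtype=float)
    for _ in range(steps):
        state = np.where(weights @ state >= 0, 1.0, -1.0)
    return state

partial = patterns[0].copy()
partial[n // 2:] = 0.0   # content addressability: the second half of the cue is unknown
noisy = patterns[0].copy()
noisy[:8] *= -1          # robustness: the first 8 entries of the cue are wrong
print(np.array_equal(retrieve(partial), patterns[0]),
      np.array_equal(retrieve(noisy), patterns[0]))
```

Retrieval is driven by the content of the cue itself: no address or index is used, only the overlap between the cue and what is stored in the connections.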
Jets and Sharks Model
4. Representation and Learning in PDP Models
- What is the stored knowledge that gives rise to a pattern of activation?
- The difference between PDP models and other models of cognitive processes:
  - other models: knowledge is stored as a static copy of a pattern;
  - PDP models: the patterns themselves are not stored; what is stored is the connection strengths between units that allow these patterns to be re-created.
Local Versus Distributed Representation
- Local representation: units are conceptual primitives (one unit per concept).
- Distributed representation: units have no particular meaning as individuals. The knowledge about any individual pattern is not stored in the connections of a special unit reserved for that pattern, but is distributed over the connections among a large number of processing units.
- Example: the pattern associator, trained with the Hebbian rule
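The pattern associator can be sketched in a few lines, assuming the simplest form of the Hebb rule: when an input pattern a and an output pattern b are presented together, each weight changes by Δw_ij = ε·b_i·a_j, so the whole matrix changes by ε·b aᵀ. The two pattern pairs and the learning rate ε = 1/4 (one over the squared length of the inputs) are hypothetical. Both associations live in the same 4×4 weight matrix; no unit or weight is reserved for either pair.

```python
import numpy as np

def hebbian_train(pairs, lr):
    """Hebb rule for a pattern associator: W += lr * outer(output, input) for each pair."""
    n_out, n_in = len(pairs[0][0]), len(pairs[0][1])
    w = np.zeros((n_out, n_in))
    for output, inp in pairs:
        w += lr * np.outer(output, inp)
    return w

def recall(w, inp):
    """Each output unit sums its weighted inputs."""
    return w @ inp

# Hypothetical +/-1 patterns; the two inputs are orthogonal (dot product 0).
a1, b1 = np.array([1., -1., 1., -1.]), np.array([1., 1., -1., -1.])
a2, b2 = np.array([1., 1., -1., -1.]), np.array([-1., 1., 1., -1.])
w = hebbian_train([(b1, a1), (b2, a2)], lr=0.25)

print(recall(w, a1))                       # reproduces b1
print(recall(w, a2))                       # reproduces b2
a_similar = np.array([1., -1., 1., 1.])    # overlaps a1 by +2 and a2 by -2
print(recall(w, a_similar))                # a blend: 0.5 * (b1 - b2)
```

Because the inputs are orthogonal, the two stored associations do not interfere. A new input produces a blend of the stored outputs, weighted by how much it overlaps each stored input.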
Attractive Properties of Pattern Associator Models
- Uncorrelated patterns do not interact with each other, but more similar ones do.
- If we present the same pair of patterns over and over, each time adding a little random noise to each element of each member of the pair, the system automatically learns to associate the central tendency of the two patterns and learns to ignore the noise.
- What is stored is an average of the similar patterns with the slight variations removed.
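The central-tendency property can be checked numerically with the same Hebbian outer-product rule. The sketch assumes a hypothetical clean pair (a, b), Gaussian noise with standard deviation 0.3 added independently to every element on each of 500 presentations, and a learning rate scaled so that the accumulated weights recall b at unit strength. Because the noise has zero mean and is independent between input and output, its contributions to the weights largely cancel.

```python
import numpy as np

rng = np.random.default_rng(0)

def hebbian_train(pairs, lr):
    """W += lr * outer(output, input) for every (output, input) presentation."""
    n_out, n_in = len(pairs[0][0]), len(pairs[0][1])
    w = np.zeros((n_out, n_in))
    for output, inp in pairs:
        w += lr * np.outer(output, inp)
    return w

a_clean = np.array([1., -1., 1., -1.])   # hypothetical input prototype
b_clean = np.array([1., 1., -1., -1.])   # hypothetical output prototype
n_presentations, noise_sd = 500, 0.3     # hypothetical training regime

# Every presentation is a fresh noisy copy of the same pair; the clean pair is never shown.
noisy_pairs = [(b_clean + rng.normal(0.0, noise_sd, 4), a_clean + rng.normal(0.0, noise_sd, 4))
               for _ in range(n_presentations)]
# Scale so that the accumulated weights map a_clean back to b_clean at unit strength.
lr = 1.0 / (n_presentations * (a_clean @ a_clean))
w = hebbian_train(noisy_pairs, lr)

print(np.round(w @ a_clean, 2))   # close to b_clean: the noise has largely averaged out
```

The network never sees the clean patterns, yet the weights it accumulates approximate the ones the clean pair alone would produce.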
Extracting the Structure of an Ensemble of Patterns
- In a distributed model, if there are regularities in the correspondences between pairs of patterns, the model will naturally extract these regularities.
- Language learning model: learning the past tense
  - creates regular past tenses of new verbs;
  - overregularizes irregular verbs;
  - shows the same phenomena as children's past-tense acquisition.
- We can thus see how performance that conforms to linguistic rules can emerge from a simple, local process of adjusting connection strengths.
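Rule-like generalization from a simple, local weight-adjustment process can be sketched with a one-layer network trained by the delta rule (Δw = ε(t − o)aᵀ). This is not the past-tense model itself, which used a much richer phonological encoding; everything here is a hypothetical toy. A verb stem is six binary features, the last of which stands for "the stem ends in a voiceless sound". The output copies the stem and turns on either a /t/ or a /d/ suffix unit, following the English pattern walk → walked (/t/) and hug → hugged (/d/); the /ɪd/ case is ignored. Two stems are held out of training and act as new verbs.

```python
import itertools
import numpy as np

N_FEATURES = 6        # hypothetical stem features; index 5 = "ends in a voiceless sound"
HELD_OUT = {37, 50}   # indices of stems never seen during training (the "new verbs")

def encode(stem):
    """Input pattern: stem features plus an always-on bias unit."""
    return np.append(stem, 1.0)

def regular_past(stem):
    """Target pattern: copy of the stem, then the /t/ and /d/ suffix units."""
    voiceless = stem[5]
    return np.concatenate([stem, [voiceless, 1.0 - voiceless]])

stems = [np.array(bits, dtype=float) for bits in itertools.product([0, 1], repeat=N_FEATURES)]
train = [s for i, s in enumerate(stems) if i not in HELD_OUT]

w = np.zeros((N_FEATURES + 2, N_FEATURES + 1))   # 8 output units, 7 input units
lr = 0.05
for epoch in range(200):
    for stem in train:
        x, target = encode(stem), regular_past(stem)
        error = target - w @ x
        w += lr * np.outer(error, x)              # delta rule: local, error-driven change

for i in sorted(HELD_OUT):
    print(stems[i], "->", np.round(w @ encode(stems[i]), 2))
```

No rule is written into the network; the correct suffix for the two unseen stems comes entirely from connection strengths shaped by the regular examples.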
5. Origins of Parallel Distributed Processing
- Jackson (1869/1958) and Luria (1966): distributed, multilevel conceptions of processing systems; the "dynamic functional system"
- Hebb (1949) and Lashley (1950): "there are no special cells reserved for special memories"
- Rosenblatt (1959, 1962): the perceptron; Selfridge (1955): Pandemonium and the importance of interactive processing
- Anderson, Grossberg, Longuet-Higgins (1960s and 1970s): concept learning, competitive learning mechanisms, distributed memory models
- Marr and Poggio (1976)
- Morton's logogen model (1969): one of the first models to capture concretely the principle of interaction of different sources of information
- Marslen-Wilson (1978): empirical demonstrations of interaction between different levels of language processing
- Levin's Proteus model (1976): virtues of activation-competition models
- Feldman and Ballard (1982)
- Hofstadter (1979, 1985)
- Sutton and Barto (1981) → the delta rule
- Hopfield (1982)