Organization and Emergence of Semantic Knowledge: A Parallel-Distributed Processing Approach James L. McClelland Department of Psychology and Center for Mind, Brain, and Computation Stanford University
Some Phenomena in Conceptual Development Progressive differentiation of concepts Illusory correlations and U-shaped developmental trajectories Conceptual reorganization Domain- and property-specific constraints on generalization Acquired sensitivity to an object’s causal properties What underlies these phenomena?
Naïve Domain Theories? Mechanisms of learning are thought to be too weak –They learn by contiguity and generalize by similarity… But generalization is domain dependent... so it is proposed instead that development begins with initial constraints, in the form of innately pre-specified proto-theories that guide inference and learning.
An Alternative View: Sensitivity to Coherent Covariation Coherent Covariation: –The tendency of properties of objects to co-occur in clusters. e.g. –Has wings –Can fly –Is light Or –Has roots –Has rigid cell walls –Can grow tall
Our Answer in More Detail Domain general mechanisms sensitive to experience underlie the development and elaboration of conceptual knowledge. These mechanisms exploit the principles of parallel-distributed processing. Models built on these principles are sensitive to coherent covariation. This sensitivity is the main cause of all of the phenomena.
Principles of Parallel Distributed Processing Processing occurs via interactions among neuron-like processing units via weighted connections. A representation is a pattern of activation. The knowledge is in the connections. Learning occurs through gradual connection adjustment, driven by experience. Both representation and processing are affected. H I N T /h/ /i/ /n/ /t/
Differentiation in Development and in a simple PDP network
The Rumelhart Model
Quillian’s Hierarchical Propositional Model
The Rumelhart Model: Target output for ‘robin can’ input
The Training Data: All propositions true of items at the bottom level of the tree, e.g.: Robin can {fly, move, grow}
Forward Propagation of Activation: $net_i = \sum_j a_j w_{ij}$ [Figure: units $j$, $i$, $k$ with activations $a_j$, $a_i$ and connection weights $w_{ij}$, $w_{ki}$]
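Back Propagation of Error ($\delta$): At the output layer: $\delta_k \sim (t_k - a_k)$ At the prior layer: $\delta_i \sim \sum_k \delta_k w_{ki}$ Error-correcting learning: At the output layer: $\Delta w_{ki} = \epsilon \delta_k a_i$ At the prior layer: $\Delta w_{ij} = \epsilon \delta_i a_j$ …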
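The forward and backward passes on these two slides can be sketched in a few lines of NumPy. This is a minimal illustration, not the simulation code behind the talk: the layer sizes, the logistic activation function, the learning rate `eps`, and the single training pattern are all assumptions chosen for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(a_j, W_ij, W_ki):
    """Propagate activation from units j to units i to output units k."""
    a_i = sigmoid(W_ij @ a_j)  # net_i = sum_j a_j * w_ij
    a_k = sigmoid(W_ki @ a_i)  # net_k = sum_i a_i * w_ki
    return a_i, a_k

def backprop_step(a_j, t_k, W_ij, W_ki, eps=0.5):
    """One error-correcting update of both weight matrices."""
    a_i, a_k = forward(a_j, W_ij, W_ki)
    # Output layer: delta_k ~ (t_k - a_k), scaled by the logistic slope.
    delta_k = (t_k - a_k) * a_k * (1.0 - a_k)
    # Prior layer: delta_i ~ sum_k delta_k * w_ki, scaled by the logistic slope.
    delta_i = (W_ki.T @ delta_k) * a_i * (1.0 - a_i)
    W_ki = W_ki + eps * np.outer(delta_k, a_i)  # delta w_ki = eps * delta_k * a_i
    W_ij = W_ij + eps * np.outer(delta_i, a_j)  # delta w_ij = eps * delta_i * a_j
    return W_ij, W_ki

# Assumed toy setup: 4 input units, 3 hidden units, 2 output units.
rng = np.random.default_rng(0)
W_ij = rng.normal(scale=0.1, size=(3, 4))
W_ki = rng.normal(scale=0.1, size=(2, 3))
a_j = np.array([1.0, 0.0, 0.0, 0.0])  # one localist input pattern
t_k = np.array([1.0, 0.0])            # its target output

def sse(W_ij, W_ki):
    return float(np.sum((t_k - forward(a_j, W_ij, W_ki)[1]) ** 2))

err_before = sse(W_ij, W_ki)
for _ in range(200):
    W_ij, W_ki = backprop_step(a_j, t_k, W_ij, W_ki)
err_after = sse(W_ij, W_ki)
print(err_before, err_after)
```

Because each update is small, the error falls gradually, which is the property the talk relies on when it describes learning as gradual connection adjustment.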
The Rumelhart Model
What Drives Progressive Differentiation? Waves of differentiation reflect coherent covariation of properties across items. Patterns of coherent covariation are reflected in the principal components of the property covariance matrix. Figure shows attribute loadings on the first three principal components: –1. Plants vs. animals –2. Birds vs. fish –3. Trees vs. flowers Same color = features covary in component Different color = anti-covarying features
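The link between coherent covariation and principal components can be illustrated with a toy item-by-property matrix. The eight items and eight properties below are a hypothetical miniature, not the training set used in the talk; the sketch only shows how the leading eigenvectors of the property covariance matrix pick out groups of properties that vary together.

```python
import numpy as np

# Hypothetical miniature data set (rows: items, columns: properties).
items = ["pine", "oak", "rose", "daisy", "robin", "canary", "sunfish", "salmon"]
properties = ["has roots", "has bark", "has petals", "can move",
              "has wings", "can fly", "has gills", "can swim"]
X = np.array([
    [1, 1, 0, 0, 0, 0, 0, 0],  # pine
    [1, 1, 0, 0, 0, 0, 0, 0],  # oak
    [1, 0, 1, 0, 0, 0, 0, 0],  # rose
    [1, 0, 1, 0, 0, 0, 0, 0],  # daisy
    [0, 0, 0, 1, 1, 1, 0, 0],  # robin
    [0, 0, 0, 1, 1, 1, 0, 0],  # canary
    [0, 0, 0, 1, 0, 0, 1, 1],  # sunfish
    [0, 0, 0, 1, 0, 0, 1, 1],  # salmon
], dtype=float)

# Property covariance matrix (properties are the variables).
cov = np.cov(X, rowvar=False)

# eigh returns eigenvalues in ascending order; reorder to descending.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

for c in range(3):
    loadings = {p: float(v) for p, v in zip(properties, np.round(eigvecs[:, c], 2))}
    print(f"component {c + 1} (variance {eigvals[c]:.2f}): {loadings}")
```

In this toy set the first component contrasts plant properties with animal properties, the second contrasts bird properties with fish properties, and the third contrasts tree properties with flower properties. The sign of each eigenvector is arbitrary; only the opposition between groups of loadings is meaningful.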
“Now wait just a minute…” Didn’t you tell the network the taxonomic organization directly? –Pine ISA Tree, Plant –Robin ISA Bird, Animal Yes we did. –We do think names kids hear for things affect their conceptual representations. But labels aren’t necessary as long as an item’s properties exhibit coherent covariation.
Coherence Training Patterns No labels are provided. Each item and each property occurs with equal frequency. [Figure: items × properties matrix with coherent and incoherent properties]
Contexts: Note: coherence is present between, not within, training experiences! [Figure: items × properties matrix with coherent and incoherent properties, divided among the IS, CAN, and HAS contexts]
Effects of Coherence on Learning Coherent Properties Incoherent Properties
Effect of Coherence on Representation
Effects of Coherent Variation on Learning in Connectionist Models Attributes that vary together create the acquired concepts that populate the taxonomic hierarchy, and determine which properties are central and which are incidental to a given concept. –Labeling of these concepts or their properties is in no way necessary. –But it is easy to learn names for such concepts. Arbitrary properties (those that do not co-vary with others) are very difficult to learn. –And it is harder to learn names for concepts that are only differentiated by such arbitrary properties.
Where are we on that list of Phenomena? Progressive differentiation of concepts Illusory correlations and U-shaped developmental trajectories Conceptual reorganization Domain- and property-specific constraints on generalization Acquired sensitivity to an object’s causal properties
Illusory Correlations Rochel Gelman found that children think that all animals have feet. –Even animals that look like small furry balls and don’t seem to have any feet at all.
A typical property that a particular object lacks e.g., pine has leaves An infrequent, atypical property
Conceptual Reorganization (Carey, 1985) Carey demonstrates that young children ‘discover’ the unity of plants and animals as living things with many shared properties only around the age of 10. She suggests that the coalescence of the concept of living thing depends on learning about diverse aspects of plants and animals including –Nature of life sustaining processes –What it means to be dead vs. alive –Reproductive properties Can reorganization occur in a connectionist net?
Conceptual Reorganization in the Model Suppose superficial appearance information, which is not coherent with much else, is always available… And there is a pattern of coherent covariation across information that is contingently available in different contexts. The model forms initial representations based on superficial appearances. Later, it discovers the shared structure that cuts across the different contexts, reorganizing its representations.
Organization of Conceptual Knowledge Early and Late in Development
Inference and Generalization in the PDP Model A semantic representation for a new item can be derived by error propagation from given information, using knowledge already stored in the weights. Crucially: –The similarity structure, and hence the pattern of generalization, depends on the knowledge already stored in the weights.
Start with a neutral representation on the representation units. Use backprop to adjust the representation to minimize the error.
The result is a representation similar to that of the average bird…
Use the representation to infer what this new thing can do.
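The three steps above (start neutral, adjust the representation by back-propagation, then read out inferences) can be sketched as follows. The weight matrix `W` is hand-set and hypothetical, standing in for weights a trained network would already hold; the two representation units, the clipping of activations to [0, 1], and the learning rate are likewise assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

properties = ["has wings", "can fly", "has gills", "can swim"]

# Hypothetical fixed weights from 2 representation units to 4 property units.
# They stand in for knowledge already stored by prior training and are
# never changed below.
W = np.array([[ 6.0, -6.0],   # has wings
              [ 6.0, -6.0],   # can fly
              [-6.0,  6.0],   # has gills
              [-6.0,  6.0]])  # can swim

def infer_representation(observed, target, steps=50, eps=0.5):
    """Adjust the representation, not the weights, to reduce output error."""
    rep = np.full(W.shape[1], 0.5)  # neutral starting representation
    for _ in range(steps):
        a = sigmoid(W[observed] @ rep)
        delta = (target - a) * a * (1.0 - a)  # error signal on the observed unit
        # Propagate the error back to the representation units only.
        rep = np.clip(rep + eps * delta * W[observed], 0.0, 1.0)
    return rep

# The new item is observed to have wings; nothing else is given.
rep = infer_representation(properties.index("has wings"), target=1.0)

# Use the derived representation to infer the unobserved properties.
inferred = sigmoid(W @ rep)
for name, value in zip(properties, inferred):
    print(f"{name}: {value:.2f}")
```

Because "has wings" and "can fly" depend on the representation in the same way under these assumed weights, accounting for the observed wings also activates "can fly" and suppresses the fish properties. The generalization comes entirely from the stored weights.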
Domain Specificity What constraints are required for development and elaboration of domain-specific knowledge? –Are domain specific constraints required? –Or are there general principles that allow for acquisition of conceptual knowledge of all different types?
Differential Importance (Macario, 1991) 3-4 yr old children see a puppet and are told he likes to eat, or play with, a certain object (e.g., top object at right). –Children then must choose another one that will “be the same kind of thing to eat” or that will be “the same kind of thing to play with”. –In the first case they tend to choose the object with the same color. –In the second case they tend to choose the object with the same shape.
–Can the knowledge that one kind of property is important for one type of thing while another is important for a different type of thing be learned?
Adjustments to Training Environment Among the plants: –All trees are large –All flowers are small –Either can be bright or dull Among the animals: –All birds are bright –All fish are dull –Either can be small or large In other words: –Size covaries with properties that differentiate different types of plants –Brightness covaries with properties that differentiate different types of animals
Testing Feature Importance After partial learning, model is shown eight test objects: –Four “Animals”: All have skin All combinations of bright/dull and large/small –Four “Plants”: All have roots All combinations of bright/dull and large/small Representations are generated by using back-propagation to representation.
Similarities of Obtained Representations Size is relevant for Plants Brightness is relevant for Animals
In Rogers and McClelland (2004) we also address: –Conceptual differentiation in prelinguistic infants. –Many of the phenomena addressed by classic work on semantic knowledge from the 1970s: Basic level, Typicality, Frequency, Expertise. –Disintegration of conceptual knowledge in semantic dementia. –How the model can be extended to capture causal properties of objects and explanations. –What properties a network must have to be sensitive to coherent covariation.
Coherence Requires Convergence
Semantic Representation in the Brain Damage to temporal pole is associated with semantic dementia, a domain-general loss of semantic information. Imaging and lesion studies suggest that other brain areas are associated with more specific types of information. We suggest that the temporal pole serves as the convergent semantic representation in the brain. –With bi-directional connections to regions containing modality-specific information. The interface with language occurs via connections between language areas and temporal pole.
In summary –Sensitivity to coherent co-variation in experience can explain many aspects of conceptual development. –PDP networks subject to a domain-general architectural constraint provide the necessary mechanisms. –Our simulations do not prove domain general learning methods will turn out to be fully sufficient. –There is still room for domain- or content-specific constraints And the framework is fully compatible with their integration. –But our findings suggest it may be worth exploring how far we can go without them.
Thanks for your attention!
Proposed Architecture for the Organization of Semantic Memory [Figure labels: color, form, motion, action, valence, name, Temporal pole, Medial Temporal Lobe]
Generalization of different property types At different points in training, the network is taught one of: –Maple can queem –Maple is queem –Maple has queem Only weights from hidden to output are allowed to change. The network is then tested to see how strongly ‘queem’ is activated when the same relation is paired with other items.
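The procedure on this slide can be sketched as follows: a new output unit for ‘queem’ is trained on a single item while everything below it stays frozen, and its activation is then measured for other items. The hidden patterns, the fixed bias, and the learning rate are hypothetical values chosen for illustration, and the sketch leaves out the relation context (can / is / has) that the full model also uses.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical frozen hidden-layer patterns for four items. In the model
# these would come from the trained network; here they are hand-set so
# that maple and oak are similar and robin is dissimilar.
hidden = {
    "maple": np.array([0.9, 0.1, 0.8]),
    "oak":   np.array([0.9, 0.1, 0.7]),
    "rose":  np.array([0.8, 0.2, 0.1]),
    "robin": np.array([0.1, 0.9, 0.5]),
}

bias = -2.0             # assumed fixed bias so untrained activation starts low
w_queem = np.zeros(3)   # only these hidden-to-output weights may change
eps = 1.0               # assumed learning rate

# Teach the network that maple has the new property.
for _ in range(100):
    a = sigmoid(w_queem @ hidden["maple"] + bias)
    delta = (1.0 - a) * a * (1.0 - a)      # target is 1 for maple
    w_queem += eps * delta * hidden["maple"]

# Test how strongly 'queem' is activated for every item.
activation = {item: float(sigmoid(w_queem @ h + bias)) for item, h in hidden.items()}
for item, value in activation.items():
    print(f"{item}: {value:.2f}")
```

Since the new weights grow in the direction of maple's hidden pattern, the property generalizes most strongly to items whose patterns resemble maple's. Under the assumed patterns, that ordering is maple, oak, rose, robin.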
Generalization to other concepts after training with can, has, or is queem
Overview The PDP Framework for Processing, Representation and Learning Complementary Learning Systems in Hippocampus and Neocortex Differentiation and Reorganization of Conceptual Knowledge Inference and Generalization How the Complementary Learning Systems Cooperate What kinds of innate constraints are necessary?
Modeling Inductive Inference (Osherson et al, 1990) General: –If a dolphin, a whale, and a zebra have biotin in their blood, how strong is the implication that all mammals have biotin in their blood? Specific: –If a seal and a cow have biotin in their blood, how strong is the implication that a horse has biotin in its blood?
PDP (as in Rogers & McClelland, 2004) Train a network on the item-feature matrix (50 animals have a 0 or 1 for each of 85 features) Animals Hidden Layers Features
PDP Add a new feature, and train the net using the given examples. Only allow the weights to the new feature node to change, and train to a threshold of Animals Hidden Layers Features
Results Using Osherson et al.’s similarity-based model with the network’s hidden representations, which emphasize coherent covariation, results in improved performance. Kemp, Perfors and Tenenbaum’s Bayes Tree model seems to do even better, but we suspect possible over-fitting. [Figure: performance on Specific and General arguments for the Sim, Sim-NN, and Bayes Tree models]
sparrow Hippocampus context Use the hippocampal memory system to store a memory for the learning episode. If the pattern can be reinstated at a later time, it can be used to support further inferences.
Relation-specific representations IS representations (top) reflect idiosyncratic appearance properties. HAS representations are similar to the context-general representations (middle). CAN representations collapse differences between plants, since there is little that plants can do. The fish are all the same, because there’s no difference in what they can do.
What About Causal Knowledge? Young children can attribute causal powers to objects based on single observations of scenarios in which the objects participate. –Gopnik’s ‘blicket’ experiments Causal powers are central to children’s generalization of category membership: –They assign the same name to objects with different appearance properties but similar causal powers. Do we need an innate mechanism for causal inference, as Gopnik suggests, to address these findings and other aspects of children’s causal reasoning abilities?
My Perspective Causal relations are not that different from other kinds of relations. Domain general mechanisms that are sensitive to experience underlie the development and elaboration of causal as well as other forms of conceptual knowledge. These mechanisms acquire sensitivity to causal structure through gradual learning.
Extension of the Model to Causal Inference (In my dreams?) Networks can learn to form internal representations that capture “causal powers” of objects, based on the consequences of their participation in events. Appearance properties don’t covary that well with the causal powers of objects. Furthermore, the names of (man-made) objects co-vary with their causal powers, not with their appearance. –Radio –Telephone –Razor –Switch Thus, it would be natural for networks to learn to generalize names for objects based on their causal powers, rather than their appearance. [Figure labels: Item, Context (External and Internal), Sequelae]
“But haven’t you still left something under the rug?” No, not really –everything is right out in the open. Here’s the situation: –Each example of an item always activates the same input unit. –Each context always activates the same context unit. –Each property always activates the same property unit. Each item, context, and property unit is like one of Fodor’s atomic concept representations: –“A representation R expresses the property P in virtue of its being a law that things that are P cause tokenings of R.” Such stipulations are by no means unproblematic. But everyone has this problem, including Jerry.
This Problem is Solved by Distributed Representations The localist input, context and output units can be replaced with distributed patterns of activation –(Rogers & McClelland, Chapter 5; Rogers et al, 2005; Dilkina and McClelland). The units correspond to atomic ‘microconcepts’ –Each item, context, and property is represented by a (possibly somewhat variable) ensemble of them. –The number of possible concepts that can be distinguished is now far greater ($2^N$ vs. $N$). Networks can learn: –Which microconcepts are important (and which combinations are important) –Which microconcepts should be treated as equivalent –Which microconcepts should be ignored All of this depends on patterns of covariation. This is a very good thing for everyone (even Jerry!): –it makes it possible for a system with finite resources to cover the space of possible concepts that might turn out to be needed.
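The capacity comparison can be made concrete with a small worked example. It assumes binary (on/off) units, which is a simplification introduced here; the value $N = 20$ is arbitrary.

```latex
% Localist coding: one unit per concept, so N units distinguish N concepts.
% Distributed coding over binary units: each on/off pattern is a distinct
% code, so N units distinguish up to 2^N patterns.
\[
  N = 20:\qquad
  \underbrace{N = 20}_{\text{localist}}
  \quad\text{vs.}\quad
  \underbrace{2^{N} = 2^{20} = 1{,}048{,}576}_{\text{distributed}}
\]
```

With graded rather than binary activations the space of distinguishable patterns is larger still, so $2^N$ is best read as a lower bound under this assumption.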