Published by Pierce Dixon. Modified over 9 years ago.
1
Bayesian and Connectionist Approaches to Learning
Tom Griffiths, Jay McClelland, Alison Gopnik, Mark Seidenberg
2
Who Are We and What Do We Study?
- We are cognitive and developmental psychologists who use mathematical and computational models together with experimental studies of children and adults.
- We study human cognitive processes ranging from object recognition, language processing, and reading to semantic cognition, naïve physics, and causal reasoning.
3
Our Question How do probabilistic/Bayesian and connectionist/neural network models relate?
4
Brains all round…
5
Schedule
- Tom Griffiths: Probabilistic/Bayesian Approaches
- Jay McClelland: Connectionist/Neural Network Approaches
- Alison Gopnik: Causal Reasoning
- Mark Seidenberg: Language Acquisition
- Open Discussion: Robotics, Machine Learning, Other Applications…
6
Emergent Functions of Simple Systems J. L. McClelland Stanford University
7
Topics
- Emergent probabilistic optimization in neural networks
- Relationship between competence/rational approaches and mechanistic (including connectionist) approaches
- Some models that bring connectionist and probabilistic approaches into proximal contact
8
Connectionist Units Calculate Posteriors Based on Priors and Evidence
Given:
- A unit representing hypothesis $h_i$, with binary inputs $a_j$ representing the state of various elements of evidence $e$, where, for all $j$, the $e_j$ are assumed conditionally independent given $h_i$
- A bias on the unit equal to $\log(\mathrm{prior}_i / (1 - \mathrm{prior}_i))$
- Weights to the unit from each input equal to $w_{ij} = \log(p(e_j \mid h_i) / p(e_j \mid \neg h_i))$
If:
- The output of the unit is computed by taking the logistic function of the net input:
  $net_i = bias_i + \sum_j a_j w_{ij}$
  $a_i = 1 / [1 + \exp(-net_i)]$
Then:
- $a_i = p(h_i \mid e)$
- A set of units for mutually exclusive alternatives can assign the posterior probability to each in a similar way, using the softmax activation function:
  $a_i = \exp(\gamma\, net_i) / \sum_{i'} \exp(\gamma\, net_{i'})$
- If $\gamma = 1$, this constitutes probability matching.
- As $\gamma$ increases, more and more of the activation goes to the most likely alternative(s).
[Figure: unit $i$ receiving input from unit $j$ through weight $w_{ij}$]
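The claim on this slide can be checked numerically. The following is a minimal sketch, not the presenter's code: the prior and likelihood values are made up, and it assumes that an input with $a_j = 0$ contributes nothing to the net input (that element of evidence is treated as unobserved), which is what the net input formula implies. The symbol `gamma` stands for the softmax gain parameter.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def unit_posterior(prior, p_e_h, p_e_not_h, a):
    """Activation of a logistic unit whose bias is the prior log-odds and
    whose weights are log likelihood ratios."""
    bias = math.log(prior / (1.0 - prior))
    weights = [math.log(ph / pn) for ph, pn in zip(p_e_h, p_e_not_h)]
    net = bias + sum(aj * wj for aj, wj in zip(a, weights))
    return logistic(net)

def bayes_posterior(prior, p_e_h, p_e_not_h, a):
    """Bayes' rule applied directly to the evidence elements with a_j = 1."""
    joint_h, joint_not_h = prior, 1.0 - prior
    for aj, ph, pn in zip(a, p_e_h, p_e_not_h):
        if aj == 1:
            joint_h *= ph
            joint_not_h *= pn
    return joint_h / (joint_h + joint_not_h)

def softmax(nets, gamma=1.0):
    """Softmax with gain gamma; subtracting the max keeps exp() stable."""
    m = max(nets)
    exps = [math.exp(gamma * (n - m)) for n in nets]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical numbers, chosen only for illustration.
prior = 0.2
p_e_h = [0.9, 0.7, 0.4]       # p(e_j | h)
p_e_not_h = [0.3, 0.5, 0.6]   # p(e_j | not h)
a = [1, 0, 1]                 # which evidence elements are present

unit = unit_posterior(prior, p_e_h, p_e_not_h, a)
exact = bayes_posterior(prior, p_e_h, p_e_not_h, a)

# Net inputs equal to log posteriors of three mutually exclusive alternatives.
nets = [math.log(0.5), math.log(0.3), math.log(0.2)]
matched = softmax(nets, gamma=1.0)    # reproduces the probabilities
sharpened = softmax(nets, gamma=5.0)  # concentrates on the most likely one
```

With `gamma=1.0` the activations reproduce the probabilities themselves; raising `gamma` shifts activation toward the most likely alternative.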
9
Emergent Outcomes from Local Computations (Hopfield, ’82; Hinton & Sejnowski, ’83)
- If $w_{ij} = w_{ji}$ and units are updated asynchronously, setting $a_i = 1$ if $net_i > 0$ and $a_i = 0$ otherwise, a network will settle to a state $s$ that is a local maximum of a measure Rumelhart et al. (1986) called $G$:
  $G(s) = \sum_{i<j} w_{ij} a_i a_j + \sum_i a_i (bias_i + ext_i)$
- If each unit sets its activation to 1 with probability $\mathrm{logistic}(\gamma\, net_i)$, then
  $p(s) = \exp(\gamma G(s)) / \sum_{s'} \exp(\gamma G(s'))$
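The deterministic half of this slide can be illustrated with a small simulation. This is a sketch under stated assumptions: the network size, the random weights, and the function names are all hypothetical, and the net input is taken to be $net_i = \sum_{j \ne i} w_{ij} a_j + bias_i + ext_i$ with no self-connections. It records $G$ after every change so that the "never decreases" property can be inspected.

```python
import random

def net_input(w, bias, ext, a, i):
    """Summed input to unit i from the other units plus bias and external input."""
    return sum(w[i][j] * a[j] for j in range(len(a)) if j != i) + bias[i] + ext[i]

def goodness(w, bias, ext, a):
    """G(s): pairwise agreement term plus bias/external term."""
    n = len(a)
    pair = sum(w[i][j] * a[i] * a[j] for i in range(n) for j in range(i + 1, n))
    return pair + sum(a[i] * (bias[i] + ext[i]) for i in range(n))

def settle(w, bias, ext, a, rng):
    """Asynchronous threshold updates in random order until no unit changes."""
    a = list(a)
    history = [goodness(w, bias, ext, a)]
    changed = True
    while changed:
        changed = False
        order = list(range(len(a)))
        rng.shuffle(order)
        for i in order:
            new = 1 if net_input(w, bias, ext, a, i) > 0 else 0
            if new != a[i]:
                a[i] = new
                changed = True
                history.append(goodness(w, bias, ext, a))
    return a, history

# Hypothetical 6-unit network with symmetric random weights.
rng = random.Random(0)
n = 6
w = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        w[i][j] = w[j][i] = rng.uniform(-1.0, 1.0)
bias = [rng.uniform(-0.5, 0.5) for _ in range(n)]
ext = [0.0] * n
start = [rng.randint(0, 1) for _ in range(n)]

final, history = settle(w, bias, ext, start, rng)
```

Each accepted change alters $G$ by $+net_i$ (unit switched on) or $-net_i$ (unit switched off), which is why symmetric weights guarantee that `history` is non-decreasing.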
10
A Tweaked Connectionist Model (McClelland & Rumelhart, 1981) That Is Also a Graphical Model
- Each pool of units in the IA model is equivalent to a Dirichlet variable (cf. Dean, 2005).
- This is enforced by using softmax to set one of the $a_j$ in each pool to 1 with probability
  $p_j = e^{\gamma\, net_j} / \sum_{j'} e^{\gamma\, net_{j'}}$
- Weight arrays linking the variables are the equivalent of the ‘edges’ encoding conditional relationships between states of these different variables.
- Biases at the word level encode the prior $p(w)$.
- Weights are bidirectional, but encode generative constraints ($p(l \mid w)$, $p(f \mid l)$).
- At equilibrium with $\gamma = 1$, the network’s probability of being in state $s$ equals $p(s \mid I)$.
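A toy version of the word pool shows how biases and weights of this kind yield a posterior. Everything below is hypothetical: the three-word lexicon, the priors, the noise level, and the simplification that letters are observed directly (the feature level is omitted). It is a sketch of the idea with $\gamma = 1$, not the model described on the slide.

```python
import math
import random

# Hypothetical toy lexicon and generative model, invented for illustration.
ALPHABET = "acotu"
P_WORD = {"cat": 0.5, "cot": 0.3, "cut": 0.2}   # prior p(w)
NOISE = 0.1                                      # chance a position shows a wrong letter

def p_letter_given_word(letter, word, pos):
    """p(l | w) at one position: mostly the correct letter, else uniform noise."""
    if letter == word[pos]:
        return 1.0 - NOISE
    return NOISE / (len(ALPHABET) - 1)

def word_net_inputs(observed):
    """Bias = log p(w); each active letter unit adds weight log p(l | w)."""
    nets = {}
    for word, prior in P_WORD.items():
        support = sum(math.log(p_letter_given_word(l, word, pos))
                      for pos, l in enumerate(observed))
        nets[word] = math.log(prior) + support
    return nets

def softmax(nets):
    m = max(nets.values())
    exps = {k: math.exp(v - m) for k, v in nets.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def sample_pool(probs, rng):
    """Set exactly one unit in the pool to 1, chosen with the softmax probabilities."""
    winner = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]
    return {k: int(k == winner) for k in probs}

def exact_posterior(observed):
    """p(w | letters) by direct application of Bayes' rule."""
    joint = {w: p * math.prod(p_letter_given_word(l, w, i)
                              for i, l in enumerate(observed))
             for w, p in P_WORD.items()}
    z = sum(joint.values())
    return {w: v / z for w, v in joint.items()}

observed = "cot"
probs = softmax(word_net_inputs(observed))
exact = exact_posterior(observed)
state = sample_pool(probs, random.Random(0))
```

In this sketch the softmax over net inputs and the direct Bayesian computation agree because the net input is the log of the unnormalized joint probability.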
11
But That’s Not the True PDP Approach to Perception/Cognition/etc…
- We want to learn how to represent the world and constraints among its constituents from experience, using (to the fullest extent possible) a domain-general approach.
- In this context, the prototypical connectionist learning rules correspond to probability maximization or matching.
- Back Propagation Algorithm:
  $\Delta w_{ij} = \epsilon\, \delta_i a_j$
  Maximizes $p(o_i \mid I)$ for each output unit.
- Boltzmann Machine Learning Algorithm:
  $\Delta w_{ij} = \epsilon\, (a_i^{+} a_j^{+} - a_i^{-} a_j^{-})$
  Learns to match probabilities of entire output states $o$ given the current input $I$. That is, it minimizes
  $\sum_I \int p(o \mid I) \log\big(p(o \mid I) / q(o \mid I)\big)\, do$
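The first rule can be illustrated in its simplest setting. The sketch below assumes a single logistic output unit with no hidden layer, for which the error signal under a log-likelihood objective is $\delta = t - a$ (target minus activation); full back propagation, which passes $\delta$ back through hidden layers, is not shown. The training patterns, learning rate, and function names are hypothetical.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def output(w, b, x):
    """Activation of one logistic output unit."""
    return logistic(b + sum(wj * xj for wj, xj in zip(w, x)))

def log_likelihood(w, b, data):
    """Sum over patterns of log p(target | input) under the unit's output."""
    total = 0.0
    for x, t in data:
        a = output(w, b, x)
        total += t * math.log(a) + (1 - t) * math.log(1.0 - a)
    return total

def train(data, epsilon=0.5, epochs=200):
    """On-line updates: delta_w_j = epsilon * delta * a_j, with delta = t - a."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, t in data:
            delta = t - output(w, b, x)
            w = [wj + epsilon * delta * xj for wj, xj in zip(w, x)]
            b += epsilon * delta
    return w, b

# Hypothetical training set: the target is 1 only when both inputs are on.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

before = log_likelihood([0.0, 0.0], 0.0, data)
w, b = train(data)
after = log_likelihood(w, b, data)
```

Comparing `before` and `after` shows the sense in which the weight change raises the probability the unit assigns to the correct output.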
12
Recent Developments
- Hinton’s deep belief networks are fully distributed learned connectionist models that use a restricted form of the Boltzmann machine (no intra-layer connections).
- They are fast and beat other machine learning methods.
- Adding generic constraints (sparsity, locality) allows such networks to learn efficiently and generalize very well in demanding task contexts.
- Hinton, Osindero, and Teh (2006). A fast learning algorithm for deep belief networks. Neural Computation, 18, 1527-54.
13
Topics
- Emergent probabilistic optimization in neural networks
- Relationship between competence/rational approaches and mechanistic (including connectionist) approaches
- Some models that bring connectionist and probabilistic approaches into proximal contact
14
Two Perspectives
First perspective:
- People are rational; their behavior is optimal.
- They seek explicit internal models of the structure of the world, within which to reason.
  - Optimal structure type for each domain
  - Optimal structure instance within type
Second perspective:
- People evolved through an optimization process, and are likely to approximate optimality/rationality within limits.
- Fundamental aspects of natural/intuitive cognition may depend largely on implicit knowledge.
- Natural structure (e.g. language) does not exactly correspond to any specific structure type.
- Culture/school encourages us to think and reason explicitly, and gives us tools for this; we do so under some circumstances.
- Many connectionist models do not directly address this kind of thinking; eventually they should be elaborated to do so.
15
Two Perspectives, Cont’d
First perspective:
- Resource limits and implementation constraints are unknown, and should be ignored in determining what is rational/optimal.
- Inference is still hard, and prior domain-specific constraints are therefore essential.
Second perspective:
- Human behavior won’t be understood without considering the constraints it operates under.
- Determining what is optimal sans constraints is always useful, even so.
- Such an effort should not presuppose that individual humans intend to derive an explicit model.
- Inference is hard, and domain-specific priors can help, but domain-general mechanisms subject to generic constraints deserve full exploration.
- In some cases such models may closely approximate what might be the optimal explicit model.
- But that model might only be an approximation, and the domain-specific constraints might not be necessary.
16
Perspectives on Development
- A competence-level approach can ask, what is the best representation a child could have given the data gathered to date?
  - The entire data sample is retained, and the optimal model is re-estimated.
- The developing child is an on-line learning system; the parameters of the mind are adjusted as each new experience comes in, and the experiences themselves are rapidly lost.
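The contrast between the two views can be made concrete with a deliberately simple estimation problem. The sketch below is only an illustration of batch versus on-line estimation in general: the slide does not specify any particular model, and the data, the single parameter `theta`, and the function names are hypothetical.

```python
def batch_estimate(samples):
    """Competence-style estimate: keep every sample and re-estimate from all of them."""
    return sum(samples) / len(samples)

def online_estimates(samples, rate=None):
    """On-line estimate: nudge one parameter per experience, then discard the sample.

    With rate=None the step size is 1/n, which reproduces the batch mean exactly;
    a fixed rate weights recent experience more heavily.
    """
    theta = 0.0
    trace = []
    for n, x in enumerate(samples, start=1):
        step = 1.0 / n if rate is None else rate
        theta += step * (x - theta)
        trace.append(theta)
    return trace

# Hypothetical stream of binary experiences (e.g. whether an event occurred).
samples = [1, 0, 1, 1, 0, 1, 1, 1]

batch = batch_estimate(samples)
running = online_estimates(samples)[-1]
recency = online_estimates(samples, rate=0.3)[-1]
```

The on-line learner needs only the current parameter value, whereas the batch estimate needs the full sample; with a fixed step size the two generally differ.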
17
Is a Convergence Possible?
- Yes!
- It is possible to ask what is optimal/rational within any set of constraints:
  - Time
  - Architecture
  - Algorithm
  - Reliability and dynamics of the hardware
- It is then possible to ask how close some mechanism actually comes to achieving optimality, within the specified constraints.
- It is also possible to ask how close it comes to explaining actual human performance, including performance in learning and response to experience during development.
18
Topics
- Emergent probabilistic optimization in neural networks
- Relationship between competence/rational approaches and mechanistic (including connectionist) approaches
- Some models that bring connectionist and probabilistic approaches into proximal contact
19
Models that Bring Connectionist and Probabilistic Approaches into Proximal Contact
- Graphical IA model of Context Effects in Perception
  - In progress; see Movellan & McClelland, 2001.
- Leaky Competing Accumulator Model of Decision Dynamics
  - Usher and McClelland, 2001, and the large family of related decision-making models
- Models of Unsupervised Category Learning
  - Competitive Learning, OME, TOME (Lake et al., ICDL08).
- Subjective Likelihood Model of Recognition Memory
  - McClelland and Chappell, 1998 (cf. REM, Steyvers and Shiffrin, 1997), and a forthcoming variant using distributed item representations.