Emergent Functions of Simple Systems


Emergent Functions of Simple Systems J. L. McClelland Stanford University

Topics
- Emergent probabilistic optimization in neural networks
- Relationship between competence/rational approaches and mechanistic (including connectionist) approaches
- Some models that bring connectionist and probabilistic approaches into proximal contact

Connectionist Units Calculate Posteriors based on Priors and Evidence
Given:
- A unit representing hypothesis h_i, with binary inputs j representing the state of various elements of evidence e, where each p(e_j) is assumed conditionally independent given h_i
- A bias on the unit equal to log(prior_i/(1 − prior_i))
- Weights to the unit from each input j equal to log(p(e_j|h_i)/p(e_j|not h_i))
If the output of the unit is computed from the logistic function
  a_i = 1/[1 + exp(−(bias_i + Σ_j a_j w_ij))]
then a_i = p(h_i|e).
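The correspondence above can be checked numerically. The sketch below (with hypothetical priors and likelihoods) computes the posterior for a hypothesis h two ways: directly by Bayes' rule, and via a logistic unit whose bias is the log prior odds and whose weights are log likelihood ratios, with all evidence-carrying inputs active:

```python
import math

# Hypothetical setup: hypothesis h vs. not-h, with three observed evidence
# elements e_1..e_3 assumed conditionally independent given h.
prior = 0.3
p_e_given_h  = [0.8, 0.6, 0.7]   # p(e_j | h)
p_e_given_nh = [0.2, 0.5, 0.4]   # p(e_j | not h)

# Direct Bayes: p(h|e) = p(h) prod_j p(e_j|h) / normalizer.
num = prior
den = 1.0 - prior
for ph, pn in zip(p_e_given_h, p_e_given_nh):
    num *= ph
    den *= pn
post_bayes = num / (num + den)

# Connectionist unit: bias = log prior odds, w_j = log likelihood ratio,
# logistic output, with activation a_j = 1 for every observed element.
bias = math.log(prior / (1.0 - prior))
weights = [math.log(ph / pn) for ph, pn in zip(p_e_given_h, p_e_given_nh)]
net = bias + sum(weights)
post_unit = 1.0 / (1.0 + math.exp(-net))

# post_unit and post_bayes agree: the logistic unit computes the posterior.
```

The agreement is exact because the logistic of the summed log odds is algebraically identical to the normalized product of prior and likelihoods.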

Choosing one of N alternatives
A collection of connectionist units representing mutually exclusive alternative hypotheses can assign the posterior probability to each in a similar way, using the softmax activation function:
  net_i = bias_i + Σ_j a_j w_ij
  a_i = exp(g·net_i) / Σ_i' exp(g·net_i')
If g = 1, this constitutes probability matching. As g increases, more and more of the activation goes to the most likely alternative(s).
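A minimal sketch of the gain-modulated softmax, with hypothetical net inputs for three mutually exclusive hypotheses, showing how raising the gain g concentrates activation on the most likely alternative:

```python
import math

def softmax(nets, g=1.0):
    # a_i = exp(g * net_i) / sum_i' exp(g * net_i')
    exps = [math.exp(g * n) for n in nets]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical net inputs (bias plus weighted evidence) for 3 hypotheses.
nets = [1.0, 2.0, 0.5]

p_match = softmax(nets, g=1.0)    # g = 1: probability matching
p_sharp = softmax(nets, g=10.0)   # high gain: winner takes nearly all
```

With g = 1 the middle hypothesis gets roughly 63% of the activation; with g = 10 it gets essentially all of it.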

Emergent Outcomes from Local Computations (Hopfield, ’82; Hinton & Sejnowski, ’83)
If w_ij = w_ji, and if units are updated asynchronously, setting a_i = 1 if net_i > 0 and a_i = 0 otherwise, the network will settle to a state s which is a local maximum in a measure Rumelhart et al. (1986) called G:
  G(s) = Σ_{i<j} w_ij a_i a_j + Σ_i a_i (bias_i + ext_i)
If instead each unit sets its activation to 1 with probability logistic(g·net_i), then
  p(s) = exp(g·G(s)) / Σ_s' exp(g·G(s'))
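The deterministic settling process can be illustrated on a tiny network. The sketch below (hypothetical weights and biases, 3 units) runs asynchronous binary updates and then checks the defining property of the settled state: it is a local maximum of G, so no single-unit flip can increase goodness:

```python
import random

# Hypothetical symmetric weights, stored upper-triangular (w_ij = w_ji).
W = {(0, 1): 1.0, (0, 2): -1.5, (1, 2): 0.5}
bias = [0.1, 0.2, -0.1]
ext = [0.0, 0.0, 0.0]
n = 3

def w(i, j):
    return W.get((min(i, j), max(i, j)), 0.0)

def goodness(a):
    # G(s) = sum_{i<j} w_ij a_i a_j + sum_i a_i (bias_i + ext_i)
    g = sum(wv * a[i] * a[j] for (i, j), wv in W.items())
    return g + sum(a[i] * (bias[i] + ext[i]) for i in range(n))

a = [1, 1, 1]                      # arbitrary starting state
for sweep in range(10):            # asynchronous updates until stable
    for i in random.sample(range(n), n):
        net = bias[i] + ext[i] + sum(w(i, j) * a[j] for j in range(n) if j != i)
        a[i] = 1 if net > 0 else 0

# The settled state is a local maximum of G: flipping any one unit
# cannot increase goodness.
for i in range(n):
    flipped = a.copy()
    flipped[i] = 1 - flipped[i]
    assert goodness(flipped) <= goodness(a)
```

Depending on update order this net settles into one of two stable states; either way the local-maximum property holds, which is Hopfield's guarantee for symmetric weights.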

A Tweaked Connectionist Model (McClelland & Rumelhart, 1981) that is Also a Graphical Model
Each pool of units in the IA model is equivalent to a Dirichlet variable (cf. Dean, 2005). This is enforced by using softmax to set one of the a_j in each pool to 1 with probability
  p_j = exp(g·net_j) / Σ_j' exp(g·net_j')
The weight arrays linking the variables are the equivalent of the ‘edges’ encoding conditional relationships between the states of these different variables. Biases at the word level encode the prior p(w). Weights are bi-directional, but encode generative constraints (p(l|w), p(f|l)). At equilibrium with g = 1, the network’s probability of being in state s equals p(s|I).

But that’s not the true PDP approach to Perception/Cognition/etc.
We want to learn how to represent the world and the constraints among its constituents from experience, using (to the fullest extent possible) a domain-general approach. In this context, the prototypical connectionist learning rules correspond to probability maximization or matching.
Back-propagation algorithm:
- Treats output units (or n-way pools) as conditionally independent given the input I
- Maximizes p(o_i|I) for each output unit
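The conditional-independence assumption can be made concrete. In the sketch below (hypothetical weights, one layer for simplicity), each sigmoid output is the network's estimate of p(o_i = 1 | I), and the probability the network assigns to a whole output pattern is just the product of these per-unit probabilities:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical weights mapping a 2-element input to 3 output units.
W = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]
b = [0.0, -0.5, 0.1]
inp = [1.0, 0.0]

# Each output activation is the unit's estimate of p(o_i = 1 | I).
outs = [sigmoid(sum(wi * x for wi, x in zip(row, inp)) + bi)
        for row, bi in zip(W, b)]

# Conditional independence given I: the probability of an entire output
# pattern is the product of the per-unit probabilities.
target = [1, 0, 1]
p_pattern = 1.0
for o, t in zip(outs, target):
    p_pattern *= o if t == 1 else (1.0 - o)
```

Back-propagation with cross-entropy loss maximizes each p(o_i|I) separately; it cannot capture dependencies among the o_i beyond what the input predicts, which is the limitation the next slide addresses.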

Overcoming the Independence Assumption
The Boltzmann machine learning algorithm learns to match the probabilities of entire output states o given the current input I. That is, it minimizes
  ∫ p(o|I) log(p(o|I)/q(o|I)) do
where:
- p(o|I) is sampled from the environment (plus phase)
- q(o|I) is the net’s estimate of p(o|I), obtained by settling with the input only (minus phase)
The algorithm is beautifully simple and local:
  Δw_ij = ε (a_i⁺ a_j⁺ − a_i⁻ a_j⁻)
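A minimal sketch of the update rule, using hypothetical co-activation samples for one pair of units gathered in the plus (clamped) and minus (free-running) phases:

```python
# Boltzmann machine weight update: delta w_ij = eps * (<a_i a_j>+ - <a_i a_j>-),
# where the angle brackets are averages over sampled states in each phase.

def boltzmann_update(w, plus_states, minus_states, eps=0.1):
    # Each list holds sampled (a_i, a_j) activation pairs for this unit pair.
    plus_corr = sum(ai * aj for ai, aj in plus_states) / len(plus_states)
    minus_corr = sum(ai * aj for ai, aj in minus_states) / len(minus_states)
    return w + eps * (plus_corr - minus_corr)

# Hypothetical samples: the units co-fire on 3/4 of plus-phase samples
# but only 1/4 of minus-phase samples, so the weight between them grows.
w_new = boltzmann_update(0.0,
                         plus_states=[(1, 1), (1, 1), (0, 1), (1, 1)],
                         minus_states=[(1, 0), (0, 0), (1, 1), (0, 1)])
```

The rule is local (it uses only the two units' own activities) yet drives the free-running distribution q(o|I) toward the environmental distribution p(o|I).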

Recent Developments
Hinton’s deep belief networks are fully distributed, learned connectionist models that use a restricted form of the Boltzmann machine (no intra-layer connections) and learn state-of-the-art models very fast. Generic constraints (sparsity, locality) allow such networks to learn efficiently and generalize very well in demanding task contexts.
Hinton, Osindero, and Teh (2006). A fast learning algorithm for deep belief networks. Neural Computation, 18, 1527–54.

Topics
- Emergent probabilistic optimization in neural networks
- Relationship between competence/rational approaches and mechanistic (including connectionist) approaches
- Some models that bring connectionist and probabilistic approaches into proximal contact

One take on the relationship between rational analysis and human behavior
- Characterizing what’s optimal is always a great thing to do.
- Optimality is always relative to some framework; what that framework should be isn’t always obvious. It is possible to construct a way of seeing virtually anything as optimal post hoc (cf. Voltaire’s Candide).
- Optimization is also relative to a set of constraints: time, memory, processing speed, available mechanisms, simplifying assumptions, …
- The question of whether people do behave optimally (according to some framework and constraints) in any particular situation is an empirical question.
- The question of why and how people can/do so behave in some situations and not in others is worth understanding more thoroughly.

Two perspectives
One perspective:
- People evolved through an optimization process, and are likely to approximate optimality/rationality within limits.
- Many aspects of natural/intuitive cognition may depend largely on implicit knowledge.
- Natural structure (e.g. language) does not exactly correspond to any specific structure type.
- Culture/school encourages us to think and reason explicitly, and gives us tools for this; we do so under some circumstances.
The other perspective:
- People are rational; their behavior is optimal. They seek explicit internal models of the structure of the world, within which to reason: the optimal structure type for each domain, and the optimal structure instance within that type.
- Resource limits and implementation constraints are unknown, and should be ignored in determining what is rational/optimal.
- Inference is still hard, and prior domain-specific constraints are therefore essential.

Same experienced structure leads to different outcomes under different performance conditions (Sternberg & McClelland, in prep)
- A box appears… then one or two objects appear… then a dot may or may not appear.
- RT condition: respond as fast as possible when the dot appears.
- Prediction condition: predict whether a dot will appear; get feedback after the prediction.
- Each event in the box occurs several times, interleaved, with reversal of the outcome on 10% of trials.
- Half of the participants are instructed in the Causal Powers model, half are not.
- Design: AB+, A+; CD+, C−; EF+; GH−, G−; plus fillers.
- All participants learn the explicit relations. Only instructed Prediction participants show Blocking and Screening.

Two perspectives
One perspective:
- People evolved through an optimization process, and are likely to approximate optimality/rationality within limits.
- Many aspects of natural/intuitive cognition may depend largely on implicit knowledge.
- Natural structure (e.g. language) does not exactly correspond to any specific structure type.
- Culture/school encourages us to think and reason explicitly, and gives us tools for this; we do so under some circumstances. Many connectionist models do not directly address this kind of thinking; eventually they should be elaborated to do so.
- Human behavior won’t be understood without considering the constraints it operates under. Determining what is optimal sans constraints is always useful, even so; such an effort should not presuppose that individual humans intend to derive an explicit model.
- Inference is hard, and domain-specific priors can help, but domain-general mechanisms subject to generic constraints deserve full exploration. In some cases such models may closely approximate what might be the optimal explicit model. But that model might only be an approximation, and the domain-specific constraints might not be necessary.
The other perspective:
- People are rational; their behavior is optimal. They seek explicit internal models of the structure of the world, within which to reason: the optimal structure type for each domain, and the optimal structure instance within that type.
- Resource limits and implementation constraints are unknown, and should be ignored in determining what is rational/optimal.
- Inference is still hard, and prior domain-specific constraints are therefore essential.

What is happening here? Prediction participants have both a causal framework and the time to reason explicitly about which objects have the power to make the dot appear and which do not. Recall of (e.g.) C− during a CD prediction trial, in conjunction with the causal powers story, licenses the inference to D+. This inference does not occur without both the time to think and the appropriate cover story.

The Rumelhart Semantic Attribution Model is Approximated by a Gradually Changing Mixture of Increasingly Specific Naïve Bayes Classifiers (Roger Grosse, 2007)
[Figure: correlation of the network’s attributions with the indicated classifier at three stages of training: very young, still young, older.]

Topics
- Emergent probabilistic optimization in neural networks
- Relationship between competence/rational approaches and mechanistic (including connectionist) approaches
- Some models that bring connectionist and probabilistic approaches into proximal contact

Some models that bring connectionist and probabilistic approaches into proximal contact
- Graphical IA model of context effects in perception: in progress; see Movellan & McClelland, 2001.
- Leaky competing accumulator model of decision dynamics: Usher and McClelland, 2001, and the large family of related decision-making models.
- Models of unsupervised category learning: competitive learning, OME, TOME (Lake et al., ICDL 2008).
- Subjective likelihood model of recognition memory: McClelland and Chappell, 1998 (cf. REM, Shiffrin and Steyvers, 1997), and a forthcoming variant using distributed item representations.