Information Processing CLPS0020: Introduction to Cognitive Science Professor Dave Sobel Fall 2016
What is information processing? Recall Skinner’s Behaviorism: we act based on responses to stimuli, and cognition is a set of propensities to behave. Information Processing is a reaction to that theory: we process and respond to information, rather than reacting to it like a reflex. Automatic (reflexive) vs. mindful (controlled) processes.
Mind:Brain :: Software:Hardware. The metaphor is of human beings as computers. Information = data. Input through perception. Output through action. Stored in memory (ROM). Processed by a CPU. Working memory (RAM). Decision/inference processes. Other systems (attention/imagery) and methods (cognitive neuroscience) are kludged onto this account.
The Magic Number 7. Example: How much information can you hold in short term memory? Whoa! What’s information? Let’s define it as “a piece of meaningful knowledge that can be independent of other pieces of knowledge.” Notice: this depends on context. So, in a simple experiment like digit span, a piece of information is a number. What’s short term memory? See the Modal Model.
The Modal Model. Developed by Atkinson & Shiffrin (1968). Long Term Memory (LTM) is a warehouse for information. Short Term Memory (STM) is a “loading dock” through which information goes into LTM (encoding) or comes out of LTM (retrieval). Have you already noticed that this metaphor is about processing information?
The Magic Number 7. Example: How much information can you hold in short term memory? 6 4 8 2 4 9 4 2 9 8 6 1 7 5 3 8 6 7 4 6 2 9. Answer: 7 +/- 2.
Miller (1956). Miller observed that in any task in which information was manipulated (usually categories in a discrimination task), human participants could track 7 +/- 2 categories. Examples: digit span, words in free recall, associations of stimuli with responses, artificial category labels, and many others.
Mind:Brain :: Software:Hardware. The metaphor is of human beings as computers. Information = data. Input through perception. Output through action. Stored in memory (ROM). Processed by a CPU. Working memory (RAM). Decision/inference processes. Can we build software that runs on hardware that isn’t our brain? Recall: Turing Machines. Contemporary idea: computational models.
Back to Turing Machines. Remember: symbol processors take input, look up rules, and generate output. Modern computers are built on this principle (Symbolic AI). This is a “computational-level” explanation of the mind. Example: How do you learn the phonetics of plurals (Berko, 1958)? A wug, two ? A dax, two ? A blicket, two ? Rules: S1 -> /s/, S2 -> /z/, S3 -> /es/.
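The "take input, look up rules, generate output" idea can be sketched for the plural example. This is a minimal illustration, not Berko's procedure: the function name `plural_ending` and the two ending lists are hypothetical, and the spelling of the word's last letters stands in for its final sound, which is only a rough approximation.

```python
# Symbolic rule lookup for English plural endings (illustrative sketch).
# Assumption: the last letters of the spelling approximate the final sound.
SIBILANT_ENDINGS = ("s", "x", "z", "ch", "sh")   # rule S3 -> /es/
VOICELESS_ENDINGS = ("p", "t", "k", "f", "th")   # rule S1 -> /s/


def plural_ending(word: str) -> str:
    """Return the plural ending chosen by a fixed, ordered set of rules."""
    w = word.lower()
    if w.endswith(SIBILANT_ENDINGS):
        return "/es/"
    if w.endswith(VOICELESS_ENDINGS):
        return "/s/"
    # Everything else (voiced sounds, vowels) falls under rule S2 -> /z/.
    return "/z/"


for novel_word in ("wug", "dax", "blicket"):
    print(novel_word, "->", plural_ending(novel_word))
```

The point of the sketch is that the rules apply to words the system has never seen, which is what the wug test probes in children.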
Why are Turing Machines appealing? Turing Machines manipulate symbols and have clear, analyzable rules. Serial processing: one thing after another. Discrete representation: we know what is represented (i.e., semantics). Those representations combine in only particular ways (i.e., syntax). Syntax can relate to semantics in some cases, but can also be independent. “Mary kissed John” vs. “John kissed Mary” mean different things. “Colorless green ideas sleep furiously”: proper syntax, but not proper semantics.
What if this metaphor is completely wrong? Where do the mappings between symbol and meaning come from? Perhaps they do not exist. Alternative view: cognition is an emergent process of many dumb processors working together. This view uses the brain as a metaphor: many neurons working continuously and in parallel. Redundant. No syntax or semantics (they’re illusions based on the algorithms evolving into a stable state).
Initial Idea: Perceptrons (Minsky & Papert, 1969). Perceptrons were a model of categorization. How do you get a child to learn what a chair is? Well, you could show it stuff. Some of the stuff are chairs, some of the stuff are not chairs. You label the chairs “chairs” and you label the non-chairs “not a chair”. [Figure: two input nodes (Input 1, Input 2) feed one output node, “Chair?”, which returns 1 if a chair, 0 if not. The two inputs are two ways in which the input is represented (could be n).]
How does it work? Each node has an activation (a) that propagates through the system according to its connections. Simplest version: a = 1 or 0. Each connection has a weight (w). Activation is combined with weight (e.g., a*w, with w in [0, 1]). The output node has a threshold level (t). If the sum of activation given connection weights is above threshold, it fires (i.e., returns 1); otherwise it does not (i.e., returns 0). [Figure: Input 1 and Input 2, with activations a1 and a2, connect through weights w1 and w2 to the output node “Chair?”, which has threshold t and returns 1 or 0.]
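The firing rule described here can be written out in a few lines. This is a minimal sketch: the function name `perceptron_output` and the example weights and thresholds are made up for illustration, and "above threshold" is read as strictly greater than t.

```python
def perceptron_output(activations, weights, threshold):
    """Return 1 if the weighted sum of activations exceeds the threshold, else 0."""
    total = sum(a * w for a, w in zip(activations, weights))
    return 1 if total > threshold else 0


# Hypothetical weights: each input contributes 0.6 when it is active.
weights = (0.6, 0.6)

# With a low threshold, one active input is enough to make the node fire.
print(perceptron_output((1, 0), weights, threshold=0.5))
# With a higher threshold, both inputs must be active.
print(perceptron_output((1, 0), weights, threshold=1.0))
print(perceptron_output((1, 1), weights, threshold=1.0))
```

Changing only the threshold changes which category the same weights pick out, which is why both the weights and the threshold matter to what the node computes.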
How does it learn? Standard training: error correction. Chicken-sexing. Change weights to try to minimize the error. This is the Delta Rule (generalized to multi-layer networks as Backpropagation). If a solution exists, this algorithm is guaranteed to find it. [Figure: Input 1 and Input 2 feed the output node “Chair?”.]
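Error-correction training can be sketched as a loop that nudges the weights and threshold after every wrong answer. This is one simple variant, under assumptions not in the slides: the names `predict` and `train`, the learning rate of 0.1, the zero starting values, and the choice of OR as the category to learn are all illustrative, and the weights are not restricted to [0, 1].

```python
def predict(inputs, weights, threshold):
    """Fire (1) when the weighted sum is above threshold, else 0."""
    total = sum(a * w for a, w in zip(inputs, weights))
    return 1 if total > threshold else 0


def train(examples, learning_rate=0.1, max_epochs=100):
    """Adjust weights and threshold after each error until none remain."""
    weights = [0.0, 0.0]
    threshold = 0.0
    for _ in range(max_epochs):
        errors = 0
        for inputs, target in examples:
            error = target - predict(inputs, weights, threshold)
            if error != 0:
                errors += 1
                # Move each weight in the direction that reduces the error.
                for i, a in enumerate(inputs):
                    weights[i] += learning_rate * error * a
                # Lower the threshold if the node should have fired, raise it otherwise.
                threshold -= learning_rate * error
        if errors == 0:
            break
    return weights, threshold


# Hypothetical labeled examples: the category is "Input 1 OR Input 2".
OR_EXAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, threshold = train(OR_EXAMPLES)
print([predict(x, weights, threshold) for x, _ in OR_EXAMPLES])
```

OR is linearly separable, so a solution exists and the loop stops once every example is classified correctly.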
The Trouble with Perceptrons. Consider blickets, which are white or square, but not white and square. There is no setting of the weights and threshold level such that the model accepts white things and square things as blickets but also learns that white squares aren’t blickets. Perceptrons can’t model XOR, or any category that is not linearly separable. [Figure: inputs “White?” (color: white or not) and “Square?” (shape: square or not) feed the output node “Blicket?”.]
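The XOR problem can be checked by brute force. The sketch below searches a coarse grid of weights and thresholds for a single threshold unit that classifies every case correctly; the grid range, the step size, and the names `fires` and `find_solution` are arbitrary choices for illustration. The search finds a setting for OR but none for XOR. The grid result is only suggestive, but a short argument covers all values: XOR needs t >= 0 (neither input active must not fire), w1 > t and w2 > t (each input alone must fire), so w1 + w2 > 2t >= t, which means both inputs together must also fire.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]   # (white?, square?)
OR_TARGETS = [0, 1, 1, 1]                   # white or square
XOR_TARGETS = [0, 1, 1, 0]                  # blicket: white or square, not both


def fires(inputs, w1, w2, t):
    """Single threshold unit with two weighted inputs."""
    return 1 if inputs[0] * w1 + inputs[1] * w2 > t else 0


def find_solution(targets, grid):
    """Return the first (w1, w2, t) on the grid that fits every case, or None."""
    for w1, w2, t in product(grid, repeat=3):
        if all(fires(x, w1, w2, t) == y for x, y in zip(INPUTS, targets)):
            return (w1, w2, t)
    return None


GRID = [i / 4 for i in range(-8, 9)]        # -2.0 to 2.0 in steps of 0.25

print("OR :", find_solution(OR_TARGETS, GRID))
print("XOR:", find_solution(XOR_TARGETS, GRID))
```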
Historically, people put this idea on a shelf for about 15 years. Perceptrons were considered nonstarters, and people focused on symbolic models, with moderate but limited success. That lasted until McClelland & Rumelhart (1986): multi-layer perceptrons (neural networks).
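Example. [Figure: network diagrams with inputs “White?” and “Square?” and output “Blicket?”; the legend distinguishes positive-weight (Positive w) from negative-weight (Negative w) connections.]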
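A multi-layer network can represent the blicket category once a middle layer and a negative weight are allowed. The sketch below is hand-wired rather than learned, and it is not necessarily the network drawn on the slide: the hidden units `either` and `both`, the function names, and all weights and thresholds are hypothetical values chosen to make the example work.

```python
def unit(activations, weights, threshold):
    """Threshold unit: fire (1) when the weighted sum is above threshold."""
    total = sum(a * w for a, w in zip(activations, weights))
    return 1 if total > threshold else 0


def is_blicket(white, square):
    """Two-layer network for 'white or square, but not white and square'."""
    # Hidden layer: one unit detects "at least one", the other detects "both".
    either = unit((white, square), (1.0, 1.0), 0.5)
    both = unit((white, square), (1.0, 1.0), 1.5)
    # Output layer: positive weight from `either`, negative weight from `both`.
    return unit((either, both), (1.0, -1.0), 0.5)


for white, square in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(white, square, "->", is_blicket(white, square))
```

The negative connection lets the "both" detector veto the output, which is exactly what a single-layer perceptron cannot do.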
And here’s the funny thing… Neural networks provide good models of memory, attention, categorization (and language processing, particularly things like pluralization and past tense): pretty much all of human cognition. No symbol manipulation or processing. Brain-like in architecture: lots of dumb processes talking to each other, with graceful degradation. Why do we hold on to the mind-as-computer metaphor? There are limitations on what neural networks can learn and do. The explanation is not satisfactory (perhaps algorithmic?). But how do you build a neural network? On a computer, which is a symbolic processor. Do we really know what the brain is doing?