Slide 1
Exploiting Cognitive Constraints To Improve Machine-Learning Memory Models
Michael C. Mozer, Department of Computer Science, University of Colorado, Boulder
Slide 2
Why Care About Human Memory?
- The neural architecture of human vision has inspired computer vision. Perhaps the cognitive architecture of memory can inspire the design of RAM systems.
- Understanding human memory is essential for ML systems that predict what information will be accessible or interesting to people at any moment.
- E.g., selecting material for students to review to maximize long-term retention (Lindsey et al., 2014)
Slide 3
The World's Most Boring Task
Stimulus X -> Response a
Stimulus Y -> Response b
[Figure: distribution of response latencies (frequency vs. response latency)]
Slide 4
Sequential Dependencies
Dual Priming Model (Wilder, Jones, & Mozer, 2009; Jones, Curran, Mozer, & Wilder, 2013)
- Recent trial history leads to an expectation of the next stimulus
- Response latencies are fast when reality matches expectation
- Expectation is based on exponentially decaying traces of two different stimulus properties (see the sketch below)
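The published model is more elaborate (it combines two such traces, one per stimulus property), but purely as illustration, here is a minimal Python sketch of a single exponentially decaying trace turning recent trial history into an expectation. The decay constant and the match score are hypothetical placeholders, not fitted values from the papers.

```python
import numpy as np

DECAY = 0.6  # hypothetical decay constant; the real model fits this to data

def step(trace, stimulus, n=2):
    """Update one exponentially decaying trace and score the new stimulus.

    trace[i] accumulates decayed evidence that stimulus i occurs; a stimulus
    matching the expectation should yield a fast response latency."""
    if trace.sum() > 0:
        expectation = trace / trace.sum()
    else:
        expectation = np.full(n, 1.0 / n)  # no history yet: uniform expectation
    match = expectation[stimulus]          # high match -> fast response
    trace = DECAY * trace + np.eye(n)[stimulus]
    return trace, match

trace = np.zeros(2)
for s in [0, 0, 0, 1]:                     # three repetitions, then a switch
    trace, match = step(trace, s)
    print(f"stimulus {s}: expectation match = {match:.2f}")
```

On this toy sequence the match score rises across the repetitions and collapses at the switch, mirroring fast responses to expected stimuli and slow responses to surprises.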
Slide 5
Examining Longer-Term Dependencies (Wilder, Jones, Ahmed, Curran, & Mozer, 2013)
Slide 6
Declarative Memory
Cepeda, Vul, Rohrer, Wixted, & Pashler (2008)
[Figure: timeline of the study and test phases]
Slide 7
Forgetting Is Influenced By The Temporal Distribution Of Study
Spaced study produces more robust & durable learning than massed study.
Slide 8
Experimental Paradigm To Study Spacing Effect
Slide 9
Cepeda, Vul, Rohrer, Wixted, & Pashler (2008)
[Figure: % recall as a function of intersession interval (days)]
Slide 10
Optimal Spacing Between Study Sessions as a Function of Retention Interval
Slide 11
Predicting The Spacing Curve
[Diagram: characterization of student and domain, forgetting after one session, and intersession interval feed into the Multiscale Context Model, which outputs predicted recall. Plot axes: Intersession Interval (Days) vs. % Recall]
Slide 12
- Multiscale Context Model (Mozer et al., 2009): neural network; explains spacing effects
- Multiple Time Scale Model (Staddon, Chelaru, & Higa, 2002): cascade of leaky integrators; explains rate-sensitive habituation
- Kording, Tenenbaum, & Shadmehr (2007): Kalman filter; explains motor adaptation
Slide 13
Key Features Of Models
Each time an event occurs in the environment...
- A memory of this event is stored via multiple traces
- Traces decay exponentially at different rates
- Memory strength is a weighted sum of traces
- Slower scales are downweighted relative to faster scales
- Slower scales store memory (learn) only when faster scales fail to predict the event (sketched in code below)
[Diagram: fast, medium, and slow traces summed into overall trace strength]
Slide 14
[Figure: trace strength over time; each event occurrence bumps the traces, which then decay at their scale-specific rates]
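The following is a minimal sketch, in Python, of the key features listed on slide 13: multiple traces with different exponential decay rates, a downweighted sum as memory strength, and slower scales learning only the portion of an event that faster scales failed to predict. The decay rates, weights, learning rate, and cascade rule are all illustrative assumptions, not the published model's parameters.

```python
import numpy as np

decays  = np.array([0.5, 0.9, 0.99])   # fast, medium, slow retention per step
weights = np.array([1.0, 0.5, 0.25])   # slower scales downweighted in the sum

def present_event(traces, lr=0.5):
    """Store an event of strength 1 across scales. Each scale learns in
    proportion to the part of the event that the faster scales (pre-update)
    failed to predict -- the cascade property on slide 13."""
    predicted = 0.0
    new = traces.copy()
    for i in range(len(traces)):
        residual = max(1.0 - predicted, 0.0)           # unpredicted portion
        new[i] = traces[i] + lr * residual * (1.0 - traces[i])
        predicted += traces[i]                         # faster scales' prediction
    return new

def run(schedule, horizon=30):
    traces = np.zeros(3)
    for t in range(horizon):
        if t in schedule:
            traces = present_event(traces)
        traces = traces * decays       # exponential decay at each scale's rate
    return float(weights @ traces)     # memory strength = weighted sum of traces

# Same number of presentations; spacing changes what the slow scales learn.
print(f"massed (t=0,1,2)  : {run({0, 1, 2}):.3f}")
print(f"spaced (t=0,10,20): {run({0, 10, 20}):.3f}")
```

With spaced presentations the fast trace has decayed by the time the item recurs, so more of the prediction error reaches the slow, durable scales; the spaced schedule ends with higher strength, echoing slide 7.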
Slide 15
Exponential Mixtures ➜ Scale Invariance
- An infinite mixture of exponentials gives exactly a power function
- A finite mixture of exponentials gives a good approximation to a power function
- With appropriate mixture weights, can fit arbitrary power functions
[Diagram: sum of several exponential decay curves approximating a power-law curve]
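The infinite-mixture claim is a standard Gamma-function identity: weighting decay rates λ by density λ^(α−1) gives ∫₀^∞ λ^(α−1) e^(−λt) dλ = Γ(α) t^(−α), an exact power function. The sketch below (hypothetical time constants; weights fit by least squares on relative error) checks how well just three exponentials track t^(−1/2) over two decades:

```python
import numpy as np

t = np.logspace(0, 2, 400)            # t from 1 to 100
target = t ** -0.5                    # a power-law forgetting curve

# Three exponentials with hypothetical fast/medium/slow time constants
taus = np.array([2.0, 10.0, 50.0])
basis = np.exp(-t[:, None] / taus[None, :])

# Fit mixture weights by least squares on *relative* error,
# so small late-time values count as much as large early ones
w, *_ = np.linalg.lstsq(basis / target[:, None], np.ones_like(t), rcond=None)
approx = basis @ w

rel_err = np.max(np.abs(approx - target) / target)
print("mixture weights:", np.round(w, 3))
print(f"max relative error on t in [1, 100]: {rel_err:.1%}")
```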
Slide 16
Relationship To Memory Models In Ancient NN Literature
- Focused backprop (Mozer, 1989), LSTM (Hochreiter & Schmidhuber, 1997): little/no decay
- Multiscale backprop (Mozer, 1992), Tau net (Nguyen & Cottrell, 1997): learned decay constants; no enforced dominance of fast scales over slow scales
- Hierarchical recurrent net (El Hihi & Bengio, 1995): fixed decay constants
- History compression (Schmidhuber, 1992; Schmidhuber, Mozer, & Prelinger, 1993): event based, not time based
Slide 17
Sketch of Multiscale Memory Module
- x_t: activation of 'event' in input to be remembered, in [0,1]
- m_t: memory trace strength at time t
- Activation rule (memory update) based on the error between input x_t and memory m_t (one possible reading is sketched below)
- Activation rule consistent with the 3 models (for the Kording model, ignore KF uncertainty)
- This update is differentiable ➜ can backprop through the memory module
- Redistributes activation across time scales in a manner that depends on the temporal distribution of input events
- Could add an output gate as well to make it even more LSTM-like
[Diagram: module with fixed decay constants and learned weights mapping x_t to m_t]
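The slide only sketches the module, so the following is a hedged Python reading rather than the published architecture: fixed decay constants, mixing weights that a full model would learn, and a correction driven by the error x_t − m_t (the gradient of the squared error with respect to the traces), gated by the input so storage happens when the event is present. The class name, gating choice, and all constants are assumptions.

```python
import numpy as np

class MultiscaleMemory:
    """One possible reading of slide 17 (a sketch, not Mozer's actual code).

    A pool of self-recurrent traces with fixed decay constants; the read-out
    m_t is a weighted sum of traces. Every operation is differentiable, so
    in a full model the weights could be trained by backprop."""

    def __init__(self, decays=(0.5, 0.9, 0.99), weights=(0.5, 0.3, 0.2)):
        self.decays = np.array(decays)    # fixed time constants
        self.weights = np.array(weights)  # learned in a full model
        self.traces = np.zeros(len(self.decays))

    def read(self):
        return float(self.weights @ self.traces)   # m_t

    def step(self, x_t):
        """x_t in [0,1]: activation of the 'event' feature to be remembered."""
        error = x_t - self.read()              # input vs. current memory state
        # Gradient-style correction, gated by x_t so the memory only
        # stores/corrects when the feature is actually present.
        self.traces += x_t * error * self.weights
        self.traces *= self.decays             # scale-specific exponential decay
        return self.read()

mem = MultiscaleMemory()
for t, x in enumerate([1, 0, 0, 0, 1, 0, 0]):
    print(f"t={t}  x_t={x}  m_t={mem.step(x):.3f}")
```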
Slide 18
Sketch of Multiscale Memory Module
- Pool of self-recurrent neurons with fixed time constants
- Input is the response of a feature-detection neuron; x_t indicates whether the feature is detected at time t
- This memory module stores the particular feature that is detected
- When the feature is detected, the memory updates: the memory state is compared to the input, and a correction is made to the memory so that it represents the input strongly
[Diagram: module with fixed decay constants and learned weights]
Slide 19
Why Care About Human Memory?
Understanding human memory is essential for ML systems that predict what information will be accessible or interesting to people at any moment.
- E.g., shopping patterns
- E.g., pronominal reference
- E.g., music preferences