Exploiting Cognitive Constraints To Improve Machine-Learning Memory Models Michael C. Mozer Department of Computer Science University of Colorado, Boulder

Why Care About Human Memory?
The neural architecture of human vision has inspired computer vision. Perhaps the cognitive architecture of memory can inspire the design of RAM systems.
Understanding human memory is essential for ML systems that predict what information will be accessible or interesting to people at any moment.
  E.g., selecting material for students to review to maximize long-term retention (Lindsey et al., 2014)

The World’s Most Boring Task
Stimulus X -> Response a
Stimulus Y -> Response b
[figure: histogram of response latency (frequency vs. response latency)]

Sequential Dependencies
Dual Priming Model (Wilder, Jones, & Mozer, 2009; Jones, Curran, Mozer, & Wilder, 2013)
  Recent trial history leads to an expectation of the next stimulus
  Response latencies are fast when reality matches expectation
  Expectation is based on exponentially decaying traces of two different stimulus properties
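To make the decaying-trace idea concrete, here is a minimal, self-contained sketch in Python. It is not the fitted Dual Priming Model: the two tracked properties (stimulus identity and repeat/alternate structure), the decay parameters, and the 0/1 coding are illustrative assumptions.

```python
def dual_trace_expectation(stimuli, alpha_identity=0.5, alpha_repeat=0.8):
    """Toy sketch: build an expectation for the next trial from two
    exponentially decaying traces of recent trial history.
    `stimuli` is a sequence of 0/1 codes for the two possible stimuli;
    the decay rates alpha_* are placeholder values, not fitted parameters."""
    identity_trace = 0.5   # decayed estimate that the next stimulus is '1'
    repeat_trace = 0.5     # decayed estimate that the next trial repeats the last
    expectations = []
    prev = None
    for s in stimuli:
        expectations.append((identity_trace, repeat_trace))
        # exponentially decaying trace: old evidence fades, recent trials dominate
        identity_trace += (1 - alpha_identity) * (s - identity_trace)
        if prev is not None:
            repeat_trace += (1 - alpha_repeat) * (float(s == prev) - repeat_trace)
        prev = s
    return expectations
```

On this account, a response is predicted to be fast on trials where the observed stimulus (and its repeat/alternate status) matches the expectations computed before the trial.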

Examining Longer-Term Dependencies (Wilder, Jones, Ahmed, Curran, & Mozer, 2013)

Declarative Memory
Cepeda, Vul, Rohrer, Wixted, & Pashler (2008)
[figure: study → test paradigm timeline]

Forgetting Is Influenced By The Temporal Distribution Of Study
Spaced study produces more robust & durable learning than massed study

Experimental Paradigm To Study Spacing Effect

Cepeda, Vul, Rohrer, Wixted, & Pashler (2008)
[figure: % recall as a function of intersession interval (days)]

Optimal Spacing Between Study Sessions as a Function of Retention Interval

Predicting The Spacing Curve
[diagram: forgetting after one session plus a characterization of student and domain, together with the intersession interval, are fed to the Multiscale Context Model to produce predicted recall (% recall vs. intersession interval in days)]

Multiscale Context Model (Mozer et al., 2009)
  Neural network
  Explains spacing effects
Multiple Time Scale Model (Staddon, Chelaru, & Higa, 2002)
  Cascade of leaky integrators
  Explains rate-sensitive habituation
Kording, Tenenbaum, & Shadmehr (2007)
  Kalman filter
  Explains motor adaptation

Key Features Of Models
Each time an event occurs in the environment…
  A memory of this event is stored via multiple traces
  Traces decay exponentially at different rates
  Memory strength is a weighted sum of traces
  Slower scales are downweighted relative to faster scales
  Slower scales store memory (learn) only when faster scales fail to predict the event
[figure: fast, medium, and slow trace strengths summed into memory strength]
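A minimal numerical sketch of these features, for illustration only: the decay rates, readout weights, and the fast-to-slow cascading of prediction error are assumptions chosen to show the mechanism, not parameters of the Multiscale Context Model or the other models above.

```python
import numpy as np

DECAYS  = np.array([0.50, 0.90, 0.99])   # fast, medium, slow per-step retention (assumed)
WEIGHTS = np.array([1.00, 0.50, 0.25])   # slower scales downweighted in the readout (assumed)

def run_multiscale_memory(events):
    """events: binary sequence, 1 when the event occurs at that time step.
    Returns memory strength (weighted sum of traces) after each step."""
    traces = np.zeros(len(DECAYS))
    strengths = []
    for e in events:
        traces = DECAYS * traces                 # each trace decays exponentially at its own rate
        if e:
            residual = 1.0                       # the event, to be explained by the traces
            for i in range(len(traces)):
                error = max(residual - traces[i], 0.0)
                traces[i] += error               # a scale stores only what faster scales missed
                residual = error                 # pass the unexplained part to the next (slower) scale
        strengths.append(float(WEIGHTS @ traces))
    return strengths

# Spaced events let the fast trace decay between presentations, so more prediction
# error reaches the slow trace and the memory persists longer after the last event.
print(run_multiscale_memory([1, 0, 0, 0, 1, 0, 0, 0, 1]))
```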

[figure: trace strengths over time, with event occurrences marked]

Exponential Mixtures ➜ Scale Invariance
Infinite mixture of exponentials gives exactly a power function
Finite mixture of exponentials gives a good approximation to a power function
With appropriately chosen mixture weights and decay rates, can fit arbitrary power functions
[figure: sum of a few exponentials approximating a power function]
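One standard way to make the infinite-mixture claim precise (the slide's own formula did not survive extraction, so this identity is offered as an assumption about the intended argument): if the decay rates of the exponential traces are drawn from a Gamma density, the mixture integrates to an exact power function, and a finite set of rates with suitable weights approximates it.

```latex
% Mixture of exponential decays e^{-\lambda t} with a Gamma(a, b) density over rates \lambda:
\int_0^\infty e^{-\lambda t}\,
  \frac{b^{a}\,\lambda^{a-1} e^{-b\lambda}}{\Gamma(a)}\, d\lambda
  \;=\; \left(\frac{b}{b+t}\right)^{\!a}
  \;=\; \left(1 + \tfrac{t}{b}\right)^{-a}
% i.e., an exact power-law forgetting curve. A finite mixture
%   \sum_{k=1}^{K} w_k\, e^{-\lambda_k t}
% with well-chosen rates \lambda_k and weights w_k approximates (1 + t/b)^{-a}
% over a wide range of t.
```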

Relationship To Memory Models In Ancient NN Literature
Focused back prop (Mozer, 1989), LSTM (Hochreiter & Schmidhuber, 1997)
  Little/no decay
Multiscale backprop (Mozer, 1992), Tau net (Nguyen & Cottrell, 1997)
  Learned decay constants
  No enforced dominance of fast scales over slow scales
Hierarchical recurrent net (El Hihi & Bengio, 1995)
  Fixed decay constants
History compression (Schmidhuber, 1992; Schmidhuber, Mozer, & Prelinger, 1993)
  Event based, not time based

Sketch of Multiscale Memory Module
x_t: activation of ‘event’ in the input to be remembered, in [0,1]
m_t: memory trace strength at time t
Activation rule (memory update) based on the error between the input x_t and the memory m_t
  Activation rule consistent with the 3 models (for the Kording model, ignore KF uncertainty)
  This update is differentiable ➜ can back prop through the memory module
  Redistributes activation across time scales in a manner that depends on the temporal distribution of input events
Could add an output gate as well to make it even more LSTM-like
[diagram: memory module with fixed decay constants and a learned update mapping input x_t to memory m_t]

Sketch of Multiscale Memory Module
Pool of self-recurrent neurons with fixed time constants
Input is the response of a feature-detection neuron
  This memory module stores the particular feature that is detected
  When the feature is present, the memory updates
Update depends on the error between the memory state and whether the feature is detected at time t
When the feature is detected, the memory state is compared to the input, and a correction is made so the memory represents the input strongly
[diagram: memory module with fixed decay constants and a learned correction; feature detector output (+1/-1) feeding the memory]
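A rough, runnable sketch of the module described on the last two slides, in PyTorch so the "differentiable ➜ back prop through it" point is concrete. The per-scale learned gains, the sigmoid squashing, the readout weights, and the gating by x_t are assumptions for illustration; the actual module may differ in these details.

```python
import torch
import torch.nn as nn

class MultiscaleMemory(nn.Module):
    """Pool of self-recurrent leaky traces with fixed time constants and a
    learned, error-driven correction applied when the feature is present.
    (Sketch only; the gain/readout parameterization is assumed.)"""

    def __init__(self, decays=(0.5, 0.9, 0.99)):
        super().__init__()
        self.register_buffer("decay", torch.tensor(decays))   # fixed, not learned
        self.gain = nn.Parameter(torch.zeros(len(decays)))    # learned correction strength per scale
        self.readout = nn.Parameter(torch.tensor([1.0, 0.5, 0.25]))  # learned readout weights

    def forward(self, x_seq):
        """x_seq: (T,) tensor in [0, 1], the feature detector's response over time."""
        m = torch.zeros_like(self.decay)
        strengths = []
        for x_t in x_seq:
            m = self.decay * m                               # passive exponential decay of each trace
            error = x_t - m                                  # mismatch between input and memory state
            m = m + x_t * torch.sigmoid(self.gain) * error   # correct only when the feature is present
            strengths.append(self.readout @ m)               # memory strength = weighted sum of traces
        return torch.stack(strengths)

# Because every step is differentiable, gradients reach the learned gains and
# readout (but not the fixed decay constants):
mem = MultiscaleMemory()
strength = mem(torch.tensor([1.0, 0.0, 0.0, 1.0, 0.0]))
strength.sum().backward()
```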

Why Care About Human Memory?
Understanding human memory is essential for ML systems that predict what information will be accessible or interesting to people at any moment.
  E.g., shopping patterns
  E.g., pronominal reference
  E.g., music preferences