1
Discovering Optimal Training Policies: A New Experimental Paradigm
Robert V. Lindsey, Michael C. Mozer
Institute of Cognitive Science, Department of Computer Science, University of Colorado, Boulder
Harold Pashler
Department of Psychology, UC San Diego
2
Common Experimental Paradigm in Human Learning Research
Propose several instructional conditions to compare, based on intuition or theory
  E.g., spacing of study sessions in fact learning
    Equal: 1 – 1 – 1
    Increasing: 1 – 2 – 4
Run many participants in each condition
Perform statistical analyses to establish a reliable difference between conditions
3
What Most Researchers Interested in Improving Instruction Really Want to Do
Find the best training policy (study schedule)
  Abscissa: space of all training policies
  Performance function defined over policy space
4
Approach
Perform single-participant experiments at selected points in policy space
Use function approximation techniques to estimate the shape of the performance function
Given the current estimate, select promising policies to evaluate next
  promising = has potential to be the optimal policy
[Figure: evaluated policies (o) with linear regression and Gaussian process regression fits]
5
Gaussian Process Regression
Assumes only that functions are smooth
Uses data efficiently
Accommodates noisy data
Produces estimates of both function shape and uncertainty
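As a concrete illustration of these properties, here is a minimal GP regression sketch. The tooling (scikit-learn) and kernel settings are assumptions for illustration, not the authors' implementation; the point is that one fit yields both a mean estimate and a pointwise uncertainty.

```python
# Minimal GP regression sketch (assumed tooling: scikit-learn).
# Fits noisy 1-D observations and returns a posterior mean plus
# a pointwise uncertainty estimate.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Noisy observations of an unknown smooth performance function.
X = rng.uniform(0, 10, size=(30, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=30)

# RBF kernel encodes the smoothness assumption; WhiteKernel absorbs
# observation noise.
kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

# Posterior mean and standard deviation over a grid of candidate points.
X_grid = np.linspace(0, 10, 200).reshape(-1, 1)
mean, std = gp.predict(X_grid, return_std=True)
```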
6
Simulated Experiment
8
Embellishments on Off-the-Shelf GP Regression
Active selection heuristic: upper confidence bound
GP is embedded in a generative task model
  GP represents skill level (-∞, +∞)
  Mapped to population mean accuracy on test (0, 1)
  Mapped to individual's mean accuracy, allowing for interparticipant variability
  Mapped to # correct responses via binomial sampling
Hierarchical Bayesian approach to parameter selection
  Interparticipant variability
  GP smoothness (covariance function)
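The two embellishments can be sketched as below. The logistic link, the noise-in-skill placement of interparticipant variability, and the UCB weight kappa are illustrative assumptions; the actual model is hierarchical Bayesian and differs in detail.

```python
# Sketch of the active selection heuristic and the generative task model
# (illustrative assumptions: logistic link, fixed kappa, Gaussian
# interparticipant variability in skill space).
import numpy as np
from scipy.special import expit  # logistic function: skill -> (0, 1)

def ucb_select(candidates, gp, kappa=2.0):
    """Upper confidence bound: pick the candidate policy whose
    posterior mean + kappa * std is largest. `gp` is a fitted
    regressor whose predict() can return (mean, std)."""
    mean, std = gp.predict(candidates, return_std=True)
    return candidates[np.argmax(mean + kappa * std)]

def simulate_participant(skill, n_test=24, indiv_sd=0.5, rng=None):
    """Generative model: latent skill (-inf, +inf) -> individual skill
    (interparticipant variability) -> mean accuracy (0, 1) -> # correct
    via binomial sampling."""
    rng = rng or np.random.default_rng()
    indiv_skill = skill + rng.normal(scale=indiv_sd)
    return rng.binomial(n_test, expit(indiv_skill))
```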
9
Concept Learning Task
15
GLOPNOR = Graspability
Ease of picking up & manipulating object with one hand
Based on norms from Salmon, McMullen, & Filliter (2010)
16
Two-Dimensional Policy Space
  Fading policy
  Repetition/alternation policy
17
Two-Dimensional Policy Space
18
Policy Space
[Figure: two-dimensional policy space; axes: fading policy × repetition/alternation policy]
19
Experiment
Training
  25-trial sequence generated by the chosen policy
  Balanced positive/negative
Testing
  24 test trials, ordered randomly, balanced
  No feedback, forced choice
Amazon Mechanical Turk, $0.25 / participant
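A sketch of how a point in the two-dimensional policy space could drive trial generation follows. The encoding (a `fade` parameter ramping difficulty and a `p_alternate` parameter controlling category switches) is a hypothetical parameterization for illustration; the paper's actual policy encoding differs.

```python
# Hypothetical policy -> training-sequence encoding (illustration only).
import numpy as np

def training_sequence(fade, p_alternate, n_trials=25, rng=None):
    """Generate a training sequence from a policy (fade, p_alternate).
    `fade` ramps item difficulty over trials; `p_alternate` is the
    probability the category switches between consecutive trials."""
    rng = rng or np.random.default_rng()
    category = rng.integers(2)  # e.g., graspable vs. not graspable
    trials = []
    for t in range(n_trials):
        difficulty = fade * t / (n_trials - 1)  # fading: easy -> harder
        if rng.random() < p_alternate:          # alternation vs. repetition
            category = 1 - category
        trials.append((category, difficulty))
    return trials  # exact positive/negative balance not enforced here
```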
20
Results
[Figure: # correct of 25]
21
Best Policy
Fade from easy to semi-difficult
Repetitions initially, alternations later
22
Results
23
Final Evaluation
24
Novel Experimental Paradigm
Instead of running a few conditions, each with many participants, …
…run many conditions, each with a different participant.
Although individual participants provide a very noisy estimate of the population mean, optimization techniques allow us to determine the shape of the performance function over the policy space.
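A toy simulation makes the claim concrete. The setup below is an assumption for illustration (not the paper's simulation): each of 200 policies is evaluated by a single participant's binomial test score, yet a GP fit over all of them recovers the underlying curve far better than any single score does.

```python
# Toy demo: many one-participant evaluations still reveal function shape.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)
policies = np.linspace(0, 1, 200).reshape(-1, 1)
true_accuracy = 0.5 + 0.3 * np.sin(3 * policies[:, 0])  # hidden function

# One participant per policy: a very noisy estimate of the population mean.
scores = rng.binomial(24, true_accuracy) / 24.0

gp = GaussianProcessRegressor(
    kernel=1.0 * RBF(length_scale=0.2) + WhiteKernel(noise_level=0.02),
    normalize_y=True,
).fit(policies, scores)
estimate, _ = gp.predict(policies, return_std=True)
# `estimate` tracks `true_accuracy` closely despite single-shot noise.
```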
25
What Next?
Plea for more interesting policy spaces!
Other optimization problems
  Abstract concepts from examples
    E.g., irony
  Motivation
    Manipulations: rewards/points, trial pace, task difficulty, time pressure
    Measure: voluntary time on task
26
Leftovers
27
Optimization
E.g., time-varying repetition/alternation policy
28
How To Do Optimization?
Reinforcement Learning
  POMDPs
Function Approximation
  Gaussian process surrogate-based optimization
29
Approach
(1) Using the current policy function estimate, choose a promising next policy to evaluate
(2) Conduct a small experiment using that policy to obtain a (noisy) estimate of population mean performance for that policy
(3) Use the data collected so far to reestimate the shape of the policy function
(4) Go to step 1
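The four steps translate directly into a surrogate-based optimization loop. This is a minimal self-contained sketch: `run_experiment` is a placeholder for running one participant on a policy, and the UCB rule, kernel, and hyperparameters are assumptions rather than the authors' settings.

```python
# Sketch of the four-step loop with a GP surrogate and UCB selection.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def optimize_policy(candidates, run_experiment, n_rounds=50, kappa=2.0):
    """candidates: array (n, d) of policies; run_experiment: policy -> score."""
    rng = np.random.default_rng()
    gp = GaussianProcessRegressor(
        kernel=1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1),
        normalize_y=True,
    )
    X = [candidates[rng.integers(len(candidates))]]  # seed with a random policy
    y = [run_experiment(X[-1])]
    for _ in range(n_rounds):
        gp.fit(np.array(X), np.array(y))                 # (3) reestimate shape
        mean, std = gp.predict(candidates, return_std=True)
        nxt = candidates[np.argmax(mean + kappa * std)]  # (1) promising = UCB
        X.append(nxt)
        y.append(run_experiment(nxt))                    # (2) small noisy experiment
        # (4) loop back to step 1
    return X, y
```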