Ratbert: Nearest Sequence Memory Based Prediction Model Applied to Robot Navigation by Sergey Alexandrov iCML 2003
Defining the Problem ► Choosing a navigational action (simplified world: left, right, forward, back) ► The consequence of an action (the expected observation) is unknown given only the immediate state ► How can an unknown environment be learned well enough to predict such consequences accurately?
Approaches ► Learning the entire model (POMDP), for example with Baum-Welch (problem: slow) ► Goal-finding tasks: learning a path to a specific state (reinforcement learning problem), for example with NSM (Nearest Sequence Memory) ► Generalized observation prediction: NSMP (Nearest Sequence Memory Predictor)
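NSMP in Short ► Experience: $\mathrm{Seq}_n = \{(o_1, a_1), \dots, (o_n, a_n)\}$ ► $\mathrm{NSMP}(\mathrm{Seq}_n)$ = observation predicted by executing $a_n$ ► Derived by examining the k nearest matches (NNS) ► Example (k=4): [Figure: the current observation $o_i$ and action $a_i$ with the unknown next observation $o_{i+1}$ marked "?"; the matched sequences are labeled $o_2$, $o_3$, $o_2$, $o_1$]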
NSMP in Short (Cont.) ► Based on kNN applied to sequences of previous experience (NSM) ► Find the k nearest (here: longest) sequence matches to the immediately prior experience ► Calculate a weight for each observation reached by the k matched sequence sections (a tradeoff between long matches and frequent matches) ► Probability of each observation = normalized weight ► The predicted observation is the one with the highest probability
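The steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the slides do not give the weight formula, so the sketch assumes that each observation's weight is the sum of the match lengths of the k matches that lead to it (long matches count more, and repeated matches accumulate). The names `match_length` and `nsmp_predict` are hypothetical.

```python
from collections import defaultdict

def match_length(seq, i, n):
    """Count how many consecutive (observation, action) pairs agree
    when walking backward from positions i and n at the same time."""
    length = 0
    while i - length >= 0 and seq[i - length] == seq[n - length]:
        length += 1
    return length

def nsmp_predict(seq, k=4):
    """Predict the observation that follows the last (o, a) pair in seq.

    Returns (predicted_observation, probabilities). The weighting rule
    (sum of match lengths) is an assumption made for this sketch.
    """
    n = len(seq) - 1  # index of the query pair (o_n, a_n)
    # Every earlier step is a candidate, since its successor is known.
    candidates = [(match_length(seq, i, n), i) for i in range(n)]
    # Keep the k longest matches; zero-length matches are not matches.
    nearest = sorted((c for c in candidates if c[0] > 0), reverse=True)[:k]

    weights = defaultdict(float)
    for length, i in nearest:
        next_obs = seq[i + 1][0]  # observation reached after the match
        weights[next_obs] += length

    total = sum(weights.values())
    if total == 0:
        return None, {}  # nothing in the experience matches the query
    probs = {obs: w / total for obs, w in weights.items()}
    return max(probs, key=probs.get), probs

# Toy experience: observations are letters, the action is always "f".
experience = [("A", "f"), ("B", "f"), ("A", "f"),
              ("C", "f"), ("B", "f"), ("A", "f")]
prediction, probabilities = nsmp_predict(experience, k=4)
```

In this toy sequence the query `("A", "f")` matches position 0 with length 1 (leading to `"B"`) and position 2 with length 2 (leading to `"C"`), so the sketch predicts `"C"` with probability 2/3.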
Testing ► Ratbert: Lego-based robot capable of simple navigation inside a small maze. It senses walls to the front, left, and right, plus a noisy distance reading. ► Software simulation based on Ratbert’s sensor inputs (larger environment, more runs, longer sequences) ► Actions: {left, right, forward, back} ► Observations: {left, right, front, distance} ► For both trials (Ratbert and the simulation), a training sequence was collected via random exploration, then a testing sequence was executed, comparing each predicted observation with the actual observation. For both, k was set to 4. ► Results were compared to bigrams.
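The testing procedure above can be sketched as an evaluation loop. The slides do not define the bigram baseline, so this sketch assumes it predicts the observation that most often followed the current (observation, action) pair in the history so far. It also assumes the test sequence continues directly from the end of the training sequence. The names `bigram_predict` and `prediction_rate` are hypothetical, and any predictor taking a history of (observation, action) pairs can be passed in.

```python
from collections import Counter, defaultdict

def bigram_predict(history):
    """Predict the observation that most often followed the last
    (observation, action) pair earlier in history; None if unseen.
    This definition of the baseline is an assumption."""
    if not history:
        return None
    counts = defaultdict(Counter)
    for pair, (next_obs, _) in zip(history, history[1:]):
        counts[pair][next_obs] += 1
    followers = counts.get(history[-1])
    return followers.most_common(1)[0][0] if followers else None

def prediction_rate(train, test, predict):
    """Fraction of test steps whose next observation is predicted correctly.

    The history starts as the training sequence and grows by one
    (observation, action) pair per executed test step.
    """
    history = list(train)
    correct = total = 0
    for current, (actual_next, _) in zip(test, test[1:]):
        history.append(current)
        if predict(history) == actual_next:
            correct += 1
        total += 1
    return correct / total if total else 0.0

# Toy sequences: observations alternate between "A" and "B".
train = [("A", "f"), ("B", "f"), ("A", "f"), ("B", "f")]
test = [("A", "f"), ("B", "f"), ("A", "f")]
rate = prediction_rate(train, test, bigram_predict)
```

Passing the predictor as a function keeps the loop identical for every model being compared, so only the `predict` argument changes between runs.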
Results ► Plot: prediction rate vs. training sequence length. ► First graph is for Ratbert, second graph is for the software simulation. ► NSMP consistently produced a better, although not optimal, prediction rate.
Further Work ► Comparison to other probabilistic predictive models ► Determine optimal exploration method ► Examine situations that trip up the algorithm ► Go beyond “gridworld” concepts of left/right/forward/back to more realistic navigation ► Work on mapping real sensor data to discrete classes required by instance-based algorithms such as NSM/NSMP (for example, using single linkage hierarchical clustering until cluster distance <= sensor error)
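The clustering idea in the last bullet can be illustrated with a small sketch for one-dimensional readings such as the distance sensor. It reads the stopping rule as "keep merging the two closest clusters while their single-linkage distance is within the sensor error", which is an interpretation of the slide rather than a confirmed detail. The names `single_linkage` and `to_class` are hypothetical, and the naive pairwise search is only suitable for small data sets.

```python
def single_linkage(readings, sensor_error):
    """Agglomerative single-linkage clustering of 1-D readings.

    Repeatedly merges the two closest clusters, where the distance
    between clusters is the smallest gap between any two members,
    and stops once that distance exceeds sensor_error.
    """
    clusters = [[r] for r in readings]
    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b) of the closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(abs(x - y) for x in clusters[a] for y in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > sensor_error:
            break  # remaining clusters are farther apart than the error
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    return [sorted(c) for c in clusters]

def to_class(reading, clusters):
    """Map a reading to the index of the cluster with the nearest member."""
    return min(range(len(clusters)),
               key=lambda i: min(abs(reading - x) for x in clusters[i]))

# Toy distance readings with an assumed sensor error of 1 unit.
classes = single_linkage([10, 11, 12, 20, 21, 35], sensor_error=1)
label = to_class(34, classes)
```

With these toy readings the sketch yields three clusters, `[10, 11, 12]`, `[20, 21]`, and `[35]`, and each cluster index can then serve as a discrete observation class for an instance-based method.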
Thank You