High Dimensional Nonparametric Modeling Using Two-Dimensional Polynomial Cascades
Greg Grudic, University of Colorado, Boulder
7/17/2002

Outline
Applications of Very High Dimension Nonparametric Modeling
Define the problem domain
One solution: Polynomial Cascade Algorithm
Conclusion

Applications of High Dimensional Nonparametric Models
1. Human-to-Robot Skill Transfer (ICRA96)
2. Mobile Robot Localization (IROS98)
3. Strength Prediction of Boards
4. Defect Classification in Lumber
5. Activity recognition for the cognitively impaired
The same PC algorithm is used in all of these applications (with no parameter tuning).

Human-to-Robot Skill Transfer (ICRA96)
Problem: Human demonstrates a task via teleoperation.
  – Object locate-and-approach task.
  – 1024 raw pixel inputs and 2 actuator outputs.
Learning Data: 4 demonstrations of the task sequence.
  – 2000 to 5000 learning examples (~2 to 5 min).
Learning Time: ~5 min on a SPARC 20.
Model Size / Evaluation Speed: < 500 Kb, ~5 Hz.
Autonomous control of robot using model:
  – No failures in 30 random trials.

Mobile Robot Localization (IROS98)
Goal: Use on-board camera images to obtain position/orientation.
Workspace: 6 x 5 meters in a research lab.
Desired accuracy: 0.2 meters in position and 10 degrees in orientation.
Inputs: 3 raw pixel images (160 by 120) => 19,200 inputs.
Learning Data: 2000 image inputs with the corresponding robot position/orientation.
Learning Time: ~2 hours on a SPARC 20.
Model Size / Evaluation Speed: ~2.0 MB, 7 Hz.

Strength Prediction of Boards
Goal: Predict the strength of a board (2x4) using nondestructive scans (Slope of Grain, Elasticity, X-ray).
Current Wood Processing Industry Standard: correlation of 0.5 to
The Learning Data: Scanned 300 boards and broke each of them in 3 to 4 different places.
Model Inputs: ~5000 statistical features.
Learning time and model size: ~40 min / ~1 MB.
Model Accuracy (correlation): = 0.8.

Defect Classification in Lumber
Problem: Classify board defects using “images”.
< 10 ms per classification (Speed ~12 ft / sec).
~20 classes: 4 types of knots, pitch pockets, etc.
Many attempted solutions: analytical methods, learning methods, etc.
Model Inputs: >
Learning Examples: >
Model Accuracy: > 92%

Activity Recognition for the Cognitively Impaired
Goal:
  – To keep track of what activity a person is doing using cameras, e.g. which room is the person in; what are they doing; what have they completed?
  – Minimal engineering of environment
Solution: Attach a video camera to the person as tasks are accomplished
  – Label camera images accordingly
  – Build a model that classifies the images
~4000 raw pixels as inputs
Preliminary results: success rate of 90% for identifying 4 different tasks

Problem Domain Characteristics
Thousands of relevant input variables
  – each contributing a small but significant amount to the final model
No subset of these variables can adequately describe the desired function
The relevant variables are confounded by thousands of irrelevant variables

Why is this a difficult domain?
Very large input space!
Problem is intrinsically nonparametric
  – Don’t know which inputs are significant
  – Don’t know an optimal model structure
Problems are in general nonlinear

Constructing Models from Data
Given input/output examples $(x_1, y_1), \ldots, (x_N, y_N)$ of some phenomenon (regression/classification function) $y = f(x)$:
Construct an approximate mapping $\hat{f}$ such that, for some unseen $x$: $\hat{f}(x) \approx f(x)$

Polynomial Cascade Algorithm: Conceptual Motivation (IJCAI 97)
Problem #1: Simultaneous construction of the model is infeasible.
  – Solution: use low dimensional projections (building blocks).
    Simplest approach: 2 dimensional.
Problem #2: Finding the best low dimensional projection is infeasible.
  – Solution: Don’t find the best; use selection criteria that are independent of dimension.
    Simplest approach: random building block selection.

PC Algorithm: Conceptual Motivation (continued)
Problem #3: Low dimensional projections tend to be flat (i.e. nearly constant).
  – Solution: Subdivide the input space.
    Simplest approach: random subdivision (bootstrap samples).
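
The slide names bootstrap samples as the random subdivision but does not spell out the sampling. The sketch below assumes the standard bootstrap: draw as many indices as there are learning examples, uniformly and with replacement. The helper name `bootstrap_indices` and the sample size are illustrative choices, not taken from the talk.

```python
import random

def bootstrap_indices(n, rng):
    """Draw n indices uniformly at random, with replacement, from range(n)."""
    return [rng.randrange(n) for _ in range(n)]

rng = random.Random(0)   # fixed seed so the sketch is repeatable
n = 1000                 # hypothetical number of learning examples
sample = bootstrap_indices(n, rng)

# Each draw leaves some examples out and repeats others, so every
# building block fitted on such a sample sees a different view of the data.
unique_fraction = len(set(sample)) / n
print(f"fraction of distinct examples in this sample: {unique_fraction:.2f}")
```

Because roughly a third of the examples are left out of any one sample, successive samples weight different regions of the input space, which is the sense in which the draw acts as a random subdivision.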

Polynomial Cascade Structure...

Main PC Algorithm Characteristics
1. Building blocks (3rd order polynomials)
2. Building blocks added one at a time, in order
3. Random (repeated) order of inputs
4. Each building block constructed using a bootstrap sample
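
A minimal sketch of what a two-dimensional third-order building block and a cascade of such blocks could look like. Two things here are assumptions, since the structure figure and the formulas did not survive: each block is taken to be a full bivariate cubic (10 coefficients), and each level after the first is taken to receive the previous level's output plus one new input. The names `cubic_terms`, `eval_block`, and `eval_cascade` are hypothetical.

```python
def cubic_terms(u, v):
    """Return the 10 monomials u**i * v**j with i + j <= 3."""
    return [u**i * v**j for i in range(4) for j in range(4 - i)]

def eval_block(coeffs, u, v):
    """Evaluate one building block: sum of coeffs[k] * monomial_k(u, v)."""
    return sum(c * t for c, t in zip(coeffs, cubic_terms(u, v)))

def eval_cascade(blocks, order, x):
    """Return the output of every level for one input vector x.

    blocks: one 10-coefficient list per level (assumed layout)
    order:  input indices; level 0 reads x[order[0]] and x[order[1]],
            level i > 0 reads the previous output and x[order[i + 1]]
    """
    outputs = [eval_block(blocks[0], x[order[0]], x[order[1]])]
    for level in range(1, len(blocks)):
        outputs.append(eval_block(blocks[level], outputs[-1], x[order[level + 1]]))
    return outputs

# Usage: two blocks that each compute u + v (coefficients on the v and u terms).
add_block = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(eval_cascade([add_block, add_block], order=[0, 1, 2], x=[1, 2, 3]))
```

Whatever the input dimension, every fit in this layout involves only two variables and ten coefficients, which is what keeps each step cheap.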

PC Algorithm
STEP 1: Initialize algorithm:
  – Learning data divided into training set and validation set
  – Random order of inputs
STEP 2: Construct new section (multiple levels):
  – Use a bootstrap sample to fit each building block
  – Set each block's weight to the normalized inverse MSE of that block on the training set
  – Stop when error on validation set stops decreasing

PC Algorithm (continued)
STEP 3: Prune section: Prune back to the block which has the smallest error on the validation set
STEP 4: Update learning outputs: replace outputs with residual errors
STEP 5: Check stopping condition: GOTO STEP 2 if further error reduction is possible, otherwise STOP
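
The steps above can be sketched in a few dozen lines. This is a simplified illustration under stated assumptions, not the author's implementation: blocks are full bivariate cubics fitted by ordinary least squares, a section's prediction is the inverse-MSE weighted average of its level outputs, block outputs are clipped to the target range for numerical stability (an addition not mentioned in the slides), and the 70/30 split, `patience`, and all function names are hypothetical.

```python
import numpy as np

def features(u, v):
    """Design matrix of the 10 monomials u**i * v**j with i + j <= 3."""
    return np.stack([u**i * v**j for i in range(4) for j in range(4 - i)], axis=1)

def build_section(Xt, yt, Xv, yv, rng, patience=5, max_levels=200):
    """Grow one section level by level, then return its pruned predictions."""
    n, d = Xt.shape
    order = rng.permutation(d)                   # random order of inputs
    lo, hi = yt.min(), yt.max()
    ut, uv = Xt[:, order[0]], Xv[:, order[0]]    # first input feeds level 0
    sum_t, sum_v, wsum = np.zeros(n), np.zeros(len(yv)), 0.0
    best = (np.inf, np.zeros(n), np.zeros(len(yv)))
    since_best = 0
    for level in range(max_levels):
        k = order[(level + 1) % d]               # inputs are reused cyclically
        boot = rng.integers(0, n, size=n)        # bootstrap sample of training rows
        coef, *_ = np.linalg.lstsq(features(ut[boot], Xt[boot, k]), yt[boot], rcond=None)
        ut = np.clip(features(ut, Xt[:, k]) @ coef, lo, hi)   # clipping: assumption
        uv = np.clip(features(uv, Xv[:, k]) @ coef, lo, hi)
        w = 1.0 / (np.mean((ut - yt) ** 2) + 1e-12)           # inverse training MSE
        sum_t += w * ut
        sum_v += w * uv
        wsum += w                                # dividing by wsum normalizes weights
        err = np.mean((sum_v / wsum - yv) ** 2)
        if err < best[0]:                        # remember the best level (pruning)
            best, since_best = (err, sum_t / wsum, sum_v / wsum), 0
        else:
            since_best += 1
            if since_best >= patience:           # validation error stopped decreasing
                break
    return best

def fit_pc(X, y, n_sections=10, seed=0):
    """Return the validation MSE after each accepted section."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    cut = int(0.7 * len(y))                      # training / validation split
    tr, va = idx[:cut], idx[cut:]
    res_t, res_v = y[tr].copy(), y[va].copy()
    errs = [np.mean(res_v ** 2)]                 # baseline: predict zero
    for _ in range(n_sections):
        err, pred_t, pred_v = build_section(X[tr], res_t, X[va], res_v, rng)
        if err >= errs[-1]:                      # stopping check: no further reduction
            break
        res_t -= pred_t                          # replace outputs with residuals
        res_v -= pred_v
        errs.append(err)
    return errs

# Usage on a small synthetic problem (toy data, not from the talk).
data_rng = np.random.default_rng(1)
X = data_rng.uniform(-1, 1, size=(600, 8))
y = X.sum(axis=1) + X[:, 0] * X[:, 1]
errs = fit_pc(X, y)
print("validation MSE per accepted section:", [round(float(e), 4) for e in errs])
```

The sketch only tracks errors; a usable model would also store each level's coefficients, input index, and weight so that new inputs can be evaluated.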

Why does PC work?
Overfitting is avoided via appropriate injection of randomness, i.e. like Random Forests (Breiman, 1999)
  – Bootstrap sampling
  – Random order of inputs
Irrelevant inputs are not excluded from the cascade
  – Treated as noise and averaged out
No explicit variable selection is used

Why does PC work? (continued)
Produces stable high dimensional models
  – Projections onto 2 dimensional structures
Low dimensional projections are unlikely to be flat
  – Bootstrap sampling avoids flat projections
  – PC algorithm effectively deals with parity problems of greater than 2 dimensions, e.g. the 10 bit parity problem where, without random sampling, the building blocks are flat at all levels
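
The parity claim can be checked directly for pairs of raw inputs. On the complete 10 bit parity table, the mean label given any two bits is exactly 0.5, so a two-dimensional fit has nothing to learn; on a bootstrap sample that symmetry is broken. The sketch below is an illustration of that point only (it does not run the cascade), and the helper names are hypothetical.

```python
import itertools
import random

def cell_means(rows, i, j):
    """Mean parity label within each cell defined by the values of bits i and j."""
    sums, counts = {}, {}
    for bits, label in rows:
        key = (bits[i], bits[j])
        sums[key] = sums.get(key, 0) + label
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}

def max_deviation(rows, n_bits):
    """Largest |cell mean - 0.5| over all pairs of input bits."""
    return max(abs(mean - 0.5)
               for i, j in itertools.combinations(range(n_bits), 2)
               for mean in cell_means(rows, i, j).values())

n_bits = 10
# Full truth table: every bit pattern with its parity label.
full = [(bits, sum(bits) % 2) for bits in itertools.product((0, 1), repeat=n_bits)]

rng = random.Random(0)
boot = [rng.choice(full) for _ in full]   # bootstrap sample of the 1024 rows

print("full table      :", max_deviation(full, n_bits))
print("bootstrap sample:", max_deviation(boot, n_bits))
```

A deviation of zero means every two-bit projection is flat; any nonzero deviation gives a two-dimensional block a signal to fit.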

PC Effective on Low Dimensional Problems (surprise?)
Does as well as or better than most algorithms on low dimensional regression problems (IJCAI97)
Produces competitive models without the need for parameter tuning or kernel selection
HOWEVER:
  – Models are not sparse!

Theoretical Results
1. PCs are universal approximators
2. Conditions for convergence to zero error:
   Uncorrelated errors from level to level
   Similar to bagging and random forests (Breiman)
3. Rate of convergence (to some local error minimum), as a function of the number of learning examples, is independent of the dimension of the input space

Conclusion
There are many application areas for very high dimension, nonlinear, nonparametric modeling algorithms!
Cascaded low dimensional polynomials produce effective nonparametric models
Polynomial Cascades are most effective in problem domains characterized by
  – thousands of relevant input variables, each contributing a small but significant amount to the final model
  – no subset of these variables can adequately describe the desired function
  – the relevant variables are confounded by thousands of irrelevant variables