Slide 1: High Dimensional Nonparametric Modeling Using Two-Dimensional Polynomial Cascades
Greg Grudic, University of Colorado, Boulder (7/17/2002)

Slide 2: Outline
- Applications of very high dimensional nonparametric modeling
- Definition of the problem domain
- One solution: the Polynomial Cascade algorithm
- Conclusion

Slide 3: Applications of High Dimensional Nonparametric Models
1. Human-to-robot skill transfer (ICRA96)
2. Mobile robot localization (IROS98)
3. Strength prediction of boards
4. Defect classification in lumber
5. Activity recognition for the cognitively impaired
The same PC algorithm is used in all of these applications (with no parameter tuning).

Slide 4: Human-to-Robot Skill Transfer (ICRA96)
Problem: a human demonstrates a task via teleoperation.
- Object locate-and-approach task.
- 1024 raw pixel inputs and 2 actuator outputs.
Learning data: 4 demonstrations of the task sequence.
- 2000 to 5000 learning examples (~2 to 5 min).
Learning time: ~5 min on a SPARC 20.
Model size / evaluation speed: < 500 KB, ~5 Hz.
Autonomous control of the robot using the model: no failures in 30 random trials.

Slide 5: Mobile Robot Localization (IROS98)
Goal: use on-board camera images to obtain the robot's position and orientation.
Workspace: 6 x 5 meters in a research lab.
Desired accuracy: 0.2 meters in position and 10 degrees in orientation.
Inputs: 3 raw pixel images (160 by 120) => 19,200 inputs.
Learning data: 2000 image inputs with the corresponding robot position/orientation.
Learning time: ~2 hours on a SPARC 20.
Model size / evaluation speed: ~2.0 MB, 7 Hz.

Slide 6: Strength Prediction of Boards
Goal: predict the strength of a board (2x4) using nondestructive scans (slope of grain, elasticity, X-ray).
Current wood processing industry standard: correlation of 0.5 to ...
Learning data: scanned 300 boards and broke each in 3 to 4 different places.
Model inputs: ~5000 statistical features.
Learning time and model size: ~40 min / ~1 MB.
Model accuracy (correlation): 0.8.

Slide 7: Defect Classification in Lumber
Problem: classify board defects using images.
- < 10 ms per classification (board speed ~12 ft/sec).
- ~20 classes: 4 types of knots, pitch pockets, etc.
- Many attempted solutions: analytical methods, learning methods, etc.
Model inputs: > ...
Learning examples: > ...
Model accuracy: > 92%

Slide 8: Activity Recognition for the Cognitively Impaired
Goal:
- Keep track of what activity a person is doing using cameras, e.g. which room is the person in, what are they doing, what have they completed?
- Minimal engineering of the environment.
Solution: attach a video camera to the person as tasks are accomplished.
- Label the camera images accordingly.
- Build a model that classifies the images (~4000 raw pixels as inputs).
Preliminary results: a success rate of 90% for identifying 4 different tasks.

Slide 9: Problem Domain Characteristics
- Thousands of relevant input variables, each contributing a small but significant amount to the final model.
- No subset of these variables can adequately describe the desired function.
- The relevant variables are confounded by thousands of irrelevant variables.

Slide 10: Why Is This a Difficult Domain?
- Very large input space!
- The problem is intrinsically nonparametric: we don't know which inputs are significant, and we don't know an optimal model structure.
- The problems are in general nonlinear.

Slide 11: Constructing Models from Data
Given N input/output examples (x_i, y_i) of some phenomenon y = f(x) (a regression or classification function), construct an approximate mapping f_hat such that, for some unseen x, f_hat(x) ≈ f(x).
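In standard notation (a reconstruction of this setup; the slide's displayed equations are not reproduced here):

```latex
% Supervised learning setup: N examples of an unknown function f over d inputs.
\[
  \{(\mathbf{x}_i, y_i)\}_{i=1}^{N}, \qquad y_i = f(\mathbf{x}_i), \qquad \mathbf{x}_i \in \mathbb{R}^{d},
\]
\[
  \text{construct } \hat{f} \text{ such that, for unseen } \mathbf{x}: \quad \hat{f}(\mathbf{x}) \approx f(\mathbf{x}).
\]
```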

Slide 12: Polynomial Cascade Algorithm: Conceptual Motivation (IJCAI 97)
Problem 1: Simultaneous construction of the full model is infeasible.
- Solution: use low dimensional projections as building blocks. The simplest approach is two-dimensional blocks.
Problem 2: Finding the best low dimensional projection is infeasible.
- Solution: don't find the best; use selection criteria that are independent of dimension. The simplest approach is random building block selection.

Slide 13: PC Algorithm: Conceptual Motivation (continued)
Problem 3: Low dimensional projections tend to be flat (i.e. approximately constant).
- Solution: subdivide the input space. The simplest approach is random subdivision via bootstrap samples.

Slide 14: Polynomial Cascade Structure (cascade diagram)
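The structure diagram itself is not reproduced here. The sketch below (Python, illustrative names) shows one plausible wiring consistent with the algorithm slides that follow: the first block takes two raw inputs, and each subsequent block takes the previous block's output together with the next input in the random order.

```python
# One plausible Polynomial Cascade wiring (an assumption, since the
# original structure diagram is not available):
#   block 1: g1 = p1(x[r0], x[r1])
#   block k: gk = pk(g_{k-1}, x[rk])   for k = 2, 3, ...
def cascade_forward(x, order, blocks):
    """x: 1-D input vector; order: random input indices; blocks: callables p_k(a, b)."""
    g = blocks[0](x[order[0]], x[order[1]])
    for k, p in enumerate(blocks[1:], start=2):
        g = p(g, x[order[k]])
    return g
```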

Slide 15: Main PC Algorithm Characteristics
1. Building blocks are 3rd-order polynomials of two inputs.
2. Blocks are added one at a time, in order.
3. Inputs are taken in a random (repeated) order.
4. Each block is constructed using a bootstrap sample.
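A minimal sketch (not the author's code) of one such building block: a 3rd-order polynomial of two inputs, fit by ordinary least squares on a bootstrap sample. The feature map, function names, and toy data are illustrative assumptions.

```python
import numpy as np

def cubic_features(a, b):
    """All monomials of (a, b) up to total degree 3 (10 terms)."""
    return np.column_stack([np.ones_like(a), a, b,
                            a * a, a * b, b * b,
                            a ** 3, a * a * b, a * b * b, b ** 3])

def fit_block(a, b, y, rng):
    """Fit a 2-D cubic block to a bootstrap sample of (a, b, y)."""
    idx = rng.integers(0, len(y), size=len(y))   # bootstrap sample (with replacement)
    coef, *_ = np.linalg.lstsq(cubic_features(a[idx], b[idx]), y[idx], rcond=None)
    return coef

def eval_block(coef, a, b):
    return cubic_features(a, b) @ coef

# Toy usage: one block on two randomly chosen columns of a high dimensional input.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000))
y = np.sin(X[:, 3]) + 0.1 * rng.normal(size=200)
i, j = rng.choice(X.shape[1], size=2, replace=False)
coef = fit_block(X[:, i], X[:, j], y, rng)
print(eval_block(coef, X[:, i], X[:, j])[:5])
```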

Slide 16: PC Algorithm
STEP 1: Initialize the algorithm.
- Divide the learning data into a training set and a validation set.
- Choose a random order of the inputs.
STEP 2: Construct a new section (multiple levels).
- Fit each new building block using a bootstrap sample.
- Set each block's weight to the normalized inverse MSE of that block on the training set.
- Stop when the error on the validation set stops decreasing.

Slide 17: PC Algorithm (continued)
STEP 3: Prune the section. Prune back to the block which has the smallest error on the validation set.
STEP 4: Update the learning outputs. Replace the outputs with the residual errors (subtract the current section's prediction from each output).
STEP 5: Check the stopping condition. GOTO STEP 2 if further error reduction is possible; otherwise STOP.
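A minimal sketch (not the author's implementation) of STEPs 1-5 for a single section and the outer residual loop. The cascade wiring, the inverse-MSE-weighted and normalized combination of block outputs, and the stopping rule (two consecutive increases in validation error) are assumptions consistent with, but not spelled out in, the slides; all names and the toy data are illustrative.

```python
import numpy as np

def cubic_features(a, b):
    return np.column_stack([np.ones_like(a), a, b, a * a, a * b, b * b,
                            a ** 3, a * a * b, a * b * b, b ** 3])

def build_section(X_tr, y_tr, X_va, y_va, rng, max_blocks=50):
    """STEP 2: grow one cascade section; STEP 3: prune it back to the best block."""
    n_tr, d = X_tr.shape
    order = rng.integers(0, d, size=max_blocks + 1)        # random, repeated input order
    prev_tr, prev_va = X_tr[:, order[0]], X_va[:, order[0]]
    sum_tr, sum_va, wsum = np.zeros(n_tr), np.zeros(len(y_va)), 0.0
    blocks, va_errs, tr_preds, va_preds = [], [], [], []
    for k in range(1, max_blocks + 1):
        b_tr, b_va = X_tr[:, order[k]], X_va[:, order[k]]
        boot = rng.integers(0, n_tr, size=n_tr)            # bootstrap sample
        coef, *_ = np.linalg.lstsq(cubic_features(prev_tr[boot], b_tr[boot]),
                                   y_tr[boot], rcond=None)
        g_tr = cubic_features(prev_tr, b_tr) @ coef
        g_va = cubic_features(prev_va, b_va) @ coef
        w = 1.0 / max(float(np.mean((y_tr - g_tr) ** 2)), 1e-12)  # inverse MSE weight
        sum_tr += w * g_tr; sum_va += w * g_va; wsum += w
        tr_preds.append(sum_tr / wsum)                     # normalized combination so far
        va_preds.append(sum_va / wsum)
        va_errs.append(float(np.mean((y_va - va_preds[-1]) ** 2)))
        blocks.append((int(order[k]), coef, w))
        prev_tr, prev_va = g_tr, g_va                      # cascade: feed output forward
        if k >= 3 and va_errs[-1] > va_errs[-2] > va_errs[-3]:
            break                                          # validation error stopped decreasing
    k_best = int(np.argmin(va_errs))                       # STEP 3: prune to the best block
    return blocks[:k_best + 1], tr_preds[k_best], va_preds[k_best]

# STEP 1: split the learning data; then STEPs 4-5: replace outputs with
# residuals and keep adding sections while the validation error drops.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 200)); y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=400)
X_tr, X_va, y_tr, y_va = X[:300], X[300:], y[:300].copy(), y[300:].copy()
best = float(np.mean(y_va ** 2))
for section in range(20):
    blocks, pred_tr, pred_va = build_section(X_tr, y_tr, X_va, y_va, rng)
    err = float(np.mean((y_va - pred_va) ** 2))
    if err >= best:
        break                                              # STEP 5: no further error reduction
    best = err
    y_tr = y_tr - pred_tr                                  # STEP 4: residuals become new targets
    y_va = y_va - pred_va
```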

Slide 18: Why Does PC Work?
Overfitting is avoided via an appropriate injection of randomness, as in Random Forests (Breiman, 1999):
- Bootstrap sampling
- Random order of inputs
Irrelevant inputs are not excluded from the cascade; they are treated as noise and averaged out.
No explicit variable selection is used.

Slide 19: Why Does PC Work? (continued)
Produces stable high dimensional models via projections onto 2-dimensional structures.
Low dimensional projections are unlikely to remain flat:
- Bootstrap sampling avoids flat projections.
- The PC algorithm effectively deals with parity problems of more than 2 dimensions, e.g. the 10-bit parity problem, where without random sampling every low dimensional projection is flat at all levels.
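A small check (illustrative, not from the original slides) of the parity remark: for 10-bit parity, the mean of the target conditioned on any two input bits is exactly 0.5, so every 2-dimensional projection of the full data set is flat; on a bootstrap sample the counts are unbalanced, which is the effect the slide attributes to random sampling.

```python
import itertools
import numpy as np

X = np.array(list(itertools.product([0, 1], repeat=10)))  # all 1024 ten-bit inputs
y = X.sum(axis=1) % 2                                      # 10-bit parity target

# Condition on any two bits: the mean target is always exactly 0.5,
# so a 2-D projection of the full data set carries no information.
for i, j in [(0, 1), (3, 7)]:
    for a in (0, 1):
        for b in (0, 1):
            mask = (X[:, i] == a) & (X[:, j] == b)
            print(f"bits ({i},{j}) = ({a},{b}): mean parity = {y[mask].mean():.2f}")

# On a bootstrap sample the counts are no longer balanced, so the same
# conditional mean deviates from 0.5 and the projection is no longer flat.
rng = np.random.default_rng(0)
boot = rng.integers(0, len(y), size=len(y))
Xb, yb = X[boot], y[boot]
mask = (Xb[:, 0] == 0) & (Xb[:, 1] == 0)
print("bootstrap sample:", yb[mask].mean())
```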

Slide 20: PC Is Effective on Low Dimensional Problems (surprise?)
- Does as well as or better than most algorithms on low dimensional regression problems (IJCAI97).
- Produces competitive models without the need for parameter tuning or kernel selection.
- HOWEVER: the models are not sparse!

Slide 21: Theoretical Results
1. PCs are universal approximators.
2. Conditions for convergence to zero error: uncorrelated errors from level to level (similar to bagging and random forests, Breiman).
3. The rate of convergence (to some local error minimum), as a function of the number of learning examples, is independent of the dimension of the input space.

Slide 22: Conclusion
There are many application areas for very high dimensional, nonlinear, nonparametric modeling algorithms!
Cascaded low dimensional polynomials produce effective nonparametric models.
Polynomial Cascades are most effective in problem domains characterized by:
- thousands of relevant input variables, each contributing a small but significant amount to the final model;
- no subset of these variables that can adequately describe the desired function;
- relevant variables confounded by thousands of irrelevant variables.