Active Learning and the Importance of Feedback in Sampling. Rui Castro, Rebecca Willett, and Robert Nowak.

Presentation transcript:

Active Learning and the Importance of Feedback in Sampling. Rui Castro, Rebecca Willett, and Robert Nowak

Motivation – “twenty questions”. Goal: accurately “learn” a concept, as fast as possible, by strategically focusing on regions of interest.

Active Sampling in Regression. Learning by asking carefully chosen questions, constructed using information gleaned from previous observations.

Passive Sampling. Sample locations are chosen a priori, before any observations are made.

Active Sampling. Sample locations are chosen as a function of previous observations.

Problem Formulation

Passive vs. Active. Passive sampling: the sample locations are chosen independently of the observations. Active sampling: each sample location may depend on all previously collected samples.

Estimation and Sampling Strategies. Goal: design both the estimator and the sampling strategy (a standard formalization is sketched below).
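
A standard formalization of the setup (an assumed reconstruction, since the slide's own displays are not in the transcript):

    Y_i = f(X_i) + W_i,   i = 1, ..., n,   E[W_i] = 0,  Var(W_i) <= \sigma^2,

with unknown f : [0,1]^d -> R. A passive sampling strategy fixes X_1, ..., X_n before any observation is collected; an active strategy may choose X_i as a function of {(X_j, Y_j) : j < i}. The goal is an estimator \hat{f}_n with small risk E || \hat{f}_n - f ||^2_{L^2([0,1]^d)}.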

Classical Smoothness Spaces. Functions with homogeneous complexity over the entire domain: the Hölder smooth function class.
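
For reference (the standard definition, not taken from the slide): for 0 < \alpha <= 1 and L > 0,

    \Sigma(\alpha, L) = { f : |f(x) - f(z)| <= L ||x - z||^\alpha  for all x, z in [0,1]^d },

with the usual extension through derivatives when \alpha > 1.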

Smooth Functions – minimax lower bound. Theorem (Castro, RW, Nowak ’05): the performance one can achieve with active learning is the same as that achievable with passive learning!
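
The rate in question (quoted from standard minimax theory; the slide's display is missing from the transcript): for Hölder smoothness \alpha,

    \inf_{\hat{f}_n} \sup_{f \in \Sigma(\alpha, L)} E || \hat{f}_n - f ||^2  \asymp  n^{-2\alpha/(2\alpha + d)},

and the theorem asserts that this lower bound continues to hold even when the sample locations are chosen actively.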

Inhomogeneous Functions. Homogeneous functions: spread-out complexity. Inhomogeneous functions: localized complexity. The relevant features of inhomogeneous functions are very localized in space, making active sampling promising.

Piecewise Constant Functions – d ≥ 2. The best possible rate is summarized below.
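
For reference (rates as reported in the published version of this work; the slide's formulas are not in the transcript): when the only inhomogeneity is a discontinuity along a (d-1)-dimensional boundary, passive sampling is limited to roughly

    E || \hat{f}_n - f ||^2  \asymp  n^{-1/d},

while the best possible rate, attainable only with feedback, is of order n^{-1/(d-1)}.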

Passive Learning in the PC Class. Estimation using Recursive Dyadic Partitions (RDPs): distribute sample points uniformly over [0,1]^d, recursively divide the domain into hypercubes, decorate each partition set with a constant, and prune the partition, adapting to the data.

RDP-based Algorithm. Choose an RDP that fits the data well but is not overly complicated: minimize the empirical risk (which measures the fit to the data) plus a complexity penalty. This estimator can be computed efficiently using a tree-pruning algorithm, sketched below.
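
A minimal sketch of such a penalized tree pruning in d = 2 (my own illustration, not the authors' code; the function name fit_rdp, the maximum depth, and the penalty constant are assumptions chosen for the example):

    import numpy as np

    def fit_rdp(xs, ys, lo, hi, depth, max_depth, pen):
        # Return (penalized cost, tree) for the best pruned RDP over the box [lo, hi).
        # cost = sum of squared residuals + pen * (number of leaves);
        # tree is either a constant (leaf value) or a list of four child subtrees.
        lo, hi = np.asarray(lo, float), np.asarray(hi, float)
        inside = np.all((xs >= lo) & (xs < hi), axis=1)
        y = ys[inside]
        leaf_val = float(y.mean()) if y.size else 0.0
        leaf_cost = float(np.sum((y - leaf_val) ** 2)) + pen   # fit + one-leaf penalty

        if depth == max_depth or y.size <= 1:
            return leaf_cost, leaf_val

        mid = (lo + hi) / 2.0
        split_cost, children = 0.0, []
        for x0, x1 in ((lo[0], mid[0]), (mid[0], hi[0])):      # four dyadic sub-squares
            for y0, y1 in ((lo[1], mid[1]), (mid[1], hi[1])):
                c, t = fit_rdp(xs, ys, (x0, y0), (x1, y1), depth + 1, max_depth, pen)
                split_cost += c
                children.append(t)

        # prune: keep the split only if it lowers the penalized cost
        return (leaf_cost, leaf_val) if leaf_cost <= split_cost else (split_cost, children)

    # usage: n uniform ("passive") samples of a noisy piecewise constant function
    rng = np.random.default_rng(0)
    n = 2048
    xs = rng.random((n, 2))
    ys = (xs[:, 0] + xs[:, 1] > 1).astype(float) + 0.1 * rng.standard_normal(n)
    pen = 2 * 0.1 ** 2 * np.log(n)     # illustrative penalty (noise variance 0.1^2 assumed)
    cost, tree = fit_rdp(xs, ys, (0, 0), (1, 1), 0, max_depth=6, pen=pen)

Each box keeps its split only when splitting lowers the penalized cost, which is the bottom-up pruning that keeps the computation roughly linear in the number of cells.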

Error Bounds. Oracle bounding techniques, akin to the work of Barron ’91, can be used to upper bound the performance of our estimator: the risk splits into an approximation error and a complexity penalty, and the bound follows by balancing the two terms.
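
In generic form (a hedged reconstruction of the missing display, following the usual penalized model selection bounds), such an oracle inequality reads

    E || \hat{f}_n - f ||^2  <=  C \min_{T} { || f - f_T ||^2 + (|T| \log n) / n },

where T ranges over pruned RDPs, f_T is the best piecewise constant fit on T (approximation error), and |T| is the number of leaves (complexity penalty).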

Active Sampling in the PC Class. Key: learn the location of the boundary. Use Recursive Dyadic Partitions to find the boundary.

Active Sampling in the PC Class. Stage 1: “oversample” at coarse resolution. n/2 samples are uniformly distributed, and the resolution is limited so there are many more samples than cells. This gives a biased but very low variance result (high approximation error, but low estimation error), and the “boundary zone” is reliably detected.

Active Sampling in the PC Class. Stage 2: critically sample in the boundary zone. The remaining n/2 samples are distributed uniformly within the boundary zone; a fine partition is constructed around the boundary and pruned according to standard multiscale methods, yielding a high-resolution estimate of the boundary. A simplified end-to-end sketch of the two stages follows.
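
A simplified illustration of the two stages in d = 2 (my own sketch, not the authors' implementation; the test function, noise level sigma, coarse resolution k, and the neighbour-difference threshold used to flag boundary cells are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(1)
    f = lambda x: (x[:, 0] + x[:, 1] > 1).astype(float)   # unknown piecewise constant function
    n, k, sigma = 4096, 8, 0.1                             # budget, coarse resolution, noise level

    # Stage 1: n/2 samples uniform on [0,1]^2, averaged within the k x k coarse cells
    x1 = rng.random((n // 2, 2))
    y1 = f(x1) + sigma * rng.standard_normal(n // 2)
    idx = np.minimum((x1 * k).astype(int), k - 1)
    means = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            sel = (idx[:, 0] == i) & (idx[:, 1] == j)
            means[i, j] = y1[sel].mean() if sel.any() else 0.0

    # Flag the "boundary zone": coarse cells whose mean differs markedly from a neighbour's
    boundary = np.zeros((k, k), dtype=bool)
    for i in range(k):
        for j in range(k):
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                if 0 <= i + di < k and 0 <= j + dj < k:
                    if abs(means[i, j] - means[i + di, j + dj]) > 0.25:   # illustrative threshold
                        boundary[i, j] = True

    # Stage 2: the remaining n/2 samples are drawn only inside the flagged cells
    cells = np.argwhere(boundary)
    picks = cells[rng.integers(0, len(cells), n // 2)]
    x2 = (picks + rng.random((n // 2, 2))) / k             # uniform within each flagged cell
    y2 = f(x2) + sigma * rng.standard_normal(n // 2)
    print(f"{boundary.sum()} of {k * k} coarse cells flagged as boundary zone")

In a full implementation the Stage 2 samples would feed a fine, pruned RDP estimate of the boundary, as in the pruning sketch above; here only the sample allocation is shown.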

Main Theorem (Castro ’05). (*) Cusp-free boundaries: the boundary cannot behave like the graph of |x|^{1/2} at the origin, but milder “kinks” like |x| at 0 are allowable.

Sketch of the Proof - Approach

Controlling the Bias. Potential problem area: cells intersecting the boundary may be pruned if ‘aligned’ with a cell edge (not a problem after a shift of the partition). Solution: repeat Stage 1 d times, using d slightly offset partitions; small cells remaining in any of the d+1 partitions are passed on to Stage 2.

Multi-Stage Approach. Iterating the approach yields an L-step method; compare with the minimax lower bound (restated below).
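
The benchmark being compared against (a hedged restatement from the published version of this work; the slide's displays are missing from the transcript): for piecewise constant functions with cusp-free boundaries, the multi-stage active scheme attains

    E || \hat{f}_n - f ||^2  \lesssim  n^{-1/(d-1)}   (up to logarithmic factors in n),

approaching the active minimax lower bound of order n^{-1/(d-1)}, whereas passive sampling is limited to order n^{-1/d}.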

Learning PC Functions - Summary. Passive sampling vs. active sampling: the rates contrasted above. These rates are nearly achieved using RDP-based estimators, which are easily implemented and have low computational complexity.

Spatial Adaptivity and Active Learning. Spatially adaptive estimators based on “sparse” model selection (e.g., wavelet thresholding) may provide automatic mechanisms for guiding active learning processes. Instead of choosing “where to sample” one can also choose “where to compute” to actively reduce computation. Can active learning provably work in even more realistic situations and under little or no prior assumptions?

Piecewise Constant Functions – d = 1. Consider first the simplest inhomogeneous function class: the step function. This is a parametric class.

Passive Sampling. Distribute sample points uniformly over [0,1] and use a maximum likelihood estimator.

Active Sampling. A bisection-style strategy; a minimal sketch follows.
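
Since the slide's figure is not in the transcript, here is a minimal probabilistic-bisection sketch in the spirit of Burnashev & Zigangirov ’74 (my own simplified illustration, with the noisy regression observation reduced to a binary answer; the grid size, flip probability p, and step location theta are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(2)
    theta, p, n, m = 0.618, 0.2, 200, 2000      # step location, flip prob., sample budget, grid size
    pts = (np.arange(m) + 0.5) / m              # candidate step locations
    post = np.full(m, 1.0 / m)                  # posterior over the step location

    for _ in range(n):
        # query at the posterior median: the "carefully chosen question"
        x = pts[np.searchsorted(np.cumsum(post), 0.5)]
        obs = x >= theta                        # noiseless answer to "is the step left of x?"
        if rng.random() < p:                    # channel flips the answer with probability p
            obs = not obs
        # Bayes update: an answer of 1 favours step locations t <= x, 0 favours t > x
        like = np.where(pts <= x, 1 - p if obs else p, p if obs else 1 - p)
        post *= like
        post /= post.sum()

    print("estimate:", pts[np.argmax(post)], "truth:", theta)

Each query is placed at the posterior median, so every answer removes roughly a constant fraction of the posterior mass; this is the mechanism behind the exponential error decay quoted on the next slide.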

Learning Rates – d = 1. Passive sampling vs. active sampling (Burnashev & Zigangirov ’74); the rates are restated below.
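
The rates behind this slide (restated from the classical results this part of the talk builds on; the displays are missing from the transcript): with passive, uniformly spread samples and a maximum likelihood fit, the squared error of the estimated step function decays like

    E || \hat{f}_n - f ||^2  \asymp  n^{-1},

while active, bisection-style sampling in the spirit of Burnashev & Zigangirov ’74 drives the error down exponentially fast, E || \hat{f}_n - f ||^2 \lesssim e^{-cn} for some constant c > 0.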

Sketch of the Proof - Stage 1. Intuition tells us what the error away from the boundary should be: the Stage 1 risk splits into an estimation error away from the boundary and an error due to the approximation of the boundary regions.

Sketch of the Proof - Stage 1. Key: limit the resolution of the RDPs to cells of side length 1/k; this fixes the performance away from the boundary.

Sketch of the Proof - Stage 1. Are we finding more than the boundary? Lemma: at least we are not detecting too many cells outside the boundary zone.

Sketch of the Proof - Stage 2. The remaining n/2 samples are distributed uniformly over the boundary zone; the total error contribution from the boundary zone can then be bounded.

Sketch of the Proof – Overall Error. The total risk is the error away from the boundary plus the error in the boundary region; balancing the two errors yields the final rate.