Transductive Rademacher Complexity and its Applications. Ran El-Yaniv and Dmitry Pechyony, Technion – Israel Institute of Technology, Haifa, Israel, 24.08.2007.


Induction vs. Transduction
 Inductive learning: an unknown distribution over examples generates a training set; the learning algorithm outputs a hypothesis, which is then used to label new unlabeled examples.
 Transductive learning (Vapnik '74, '98): the learning algorithm receives the training set together with the test set, and outputs labels for the test set. Goal: minimize the error on the test set.

Distribution-free Model [Vapnik ’74,’98] X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X  Given: “Full sample” of unlabeled examples, each with its true (unknown) label.

X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X  Full sample is partitioned:  training set ( m points)  test set ( u points) Distribution-free Model [Vapnik ’74,’98]

X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X  Labels of the training examples are revealed.  Given: “Full sample” of unlabeled examples, each with its true (unknown) label.  Full sample is partitioned:  training set ( m points)  test set ( u points) Distribution-free Model [Vapnik ’74,’98]

 Labels of the training points are revealed. Goal: Label test examples X ? ? X ? ? ? ? X ? ? ? ? ? ? ? X ? X ? ? ? ? ? ? ? X ? ? ? ? ? ? X ? ? ?  Given: “Full sample” of unlabeled examples, each with its true (unknown) label.  Full sample is partitioned:  training set ( m points)  test set ( u points) Distribution-free Model [Vapnik ’74,’98]
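A minimal Python sketch of this protocol (the toy data and sizes are assumptions for illustration, not from the slides): a fixed full sample is split uniformly at random into m training points, whose labels are revealed, and u test points, whose labels must be predicted.

```python
# Sketch of the distribution-free transductive protocol on toy data.
import numpy as np

rng = np.random.default_rng(0)

m, u = 30, 70                                   # training / test sizes, m + u = full sample
X_full = rng.normal(size=(m + u, 2))            # full sample of unlabeled points
y_full = (X_full[:, 0] + X_full[:, 1] > 0).astype(int)   # true (hidden) labels

perm = rng.permutation(m + u)                   # uniformly random partition of the full sample
train_idx, test_idx = perm[:m], perm[m:]

# The learner sees: all of X_full, plus labels of the training points only.
y_train_revealed = y_full[train_idx]
# Goal: output labels for X_full[test_idx]; error is measured only on these u points.
```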

Rademacher complexity
Induction. Hypothesis space $\mathcal{H}$: a set of functions $h : \mathcal{X} \to \mathbb{R}$. $x_1, \ldots, x_m$ — training points. $\sigma_1, \ldots, \sigma_m$ — i.i.d. random variables, $\Pr(\sigma_i = +1) = \Pr(\sigma_i = -1) = 1/2$. Rademacher complexity: $R_m(\mathcal{H}) = \frac{2}{m}\,\mathbb{E}_{\sigma}\!\left[\sup_{h \in \mathcal{H}} \sum_{i=1}^{m} \sigma_i h(x_i)\right]$.
Transduction (version 1). Hypothesis space $\mathcal{H}$: a set of vectors $h = (h(1), \ldots, h(m+u)) \in \mathbb{R}^{m+u}$. $x_1, \ldots, x_{m+u}$ — full sample with the $m$ training and $u$ test points. $\sigma_1, \ldots, \sigma_{m+u}$ — distributed as in induction. Rademacher complexity: $R^{(1)}_{m+u}(\mathcal{H}) = \left(\frac{1}{m} + \frac{1}{u}\right)\mathbb{E}_{\sigma}\!\left[\sup_{h \in \mathcal{H}} \sum_{i=1}^{m+u} \sigma_i h(i)\right]$.

Transductive Rademacher complexity
Version 1: $x_1, \ldots, x_{m+u}$ — full sample with the training and test points; $\mathcal{H} \subseteq \mathbb{R}^{m+u}$ — transductive hypothesis space; $\sigma_1, \ldots, \sigma_{m+u}$ — i.i.d. random variables with $\Pr(\sigma_i = +1) = \Pr(\sigma_i = -1) = 1/2$. Rademacher complexity: $R^{(1)}_{m+u}(\mathcal{H}) = \left(\frac{1}{m} + \frac{1}{u}\right)\mathbb{E}_{\sigma}\!\left[\sup_{h \in \mathcal{H}} \sigma \cdot h\right]$.
Version 2: sparse distribution of the Rademacher variables, $\Pr(\sigma_i = +1) = \Pr(\sigma_i = -1) = p_0$ and $\Pr(\sigma_i = 0) = 1 - 2p_0$ with $p_0 = \frac{mu}{(m+u)^2}$; the complexity $R^{(2)}_{m+u}(\mathcal{H})$ is defined as above with this distribution. We develop risk bounds with $R^{(2)}_{m+u}(\mathcal{H})$. Lemma 1: $R^{(2)}_{m+u}(\mathcal{H}) \le R^{(1)}_{m+u}(\mathcal{H})$.
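As a small illustration, here is a hedged Python sketch of sampling from the sparse Rademacher distribution of version 2; the value $p_0 = mu/(m+u)^2$ follows the definition reconstructed above.

```python
# Sketch of sampling the "sparse" transductive Rademacher variables (version 2):
# sigma_i = +1 or -1 with probability p0 each, and 0 otherwise.
import numpy as np

def sparse_rademacher(m, u, rng):
    p0 = m * u / (m + u) ** 2
    return rng.choice([-1.0, 0.0, 1.0],
                      size=m + u,
                      p=[p0, 1.0 - 2.0 * p0, p0])

rng = np.random.default_rng(0)
sigma = sparse_rademacher(30, 70, rng)   # many zeros when m and u are unbalanced
```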

Risk bound
Notation: $L_u(h)$ — 0/1 error of $h$ on the $u$ test examples; $\hat{L}^{\gamma}_m(h)$ — empirical $\gamma$-margin error of $h$ on the $m$ training examples.
Theorem: For any $\delta \in (0,1)$, with probability at least $1-\delta$ over the random partition of the full sample into a training set of size $m$ and a test set of size $u$, for all hypotheses $h \in \mathcal{H}$ it holds that $L_u(h) \le \hat{L}^{\gamma}_m(h) + \frac{R^{(2)}_{m+u}(\mathcal{H})}{\gamma} + c_0\, q \sqrt{\min(m,u)} + \sqrt{\frac{S q}{2}\ln\frac{1}{\delta}}$, where $q \triangleq \frac{1}{m}+\frac{1}{u}$, $c_0$ is an absolute constant, and $S$ is a quantity close to 1 depending only on $m$ and $u$.
Proof: based on and inspired by the results of [McDiarmid, '89], [Bartlett and Mendelson, '02] and [Meir and Zhang, '03].
Previous results: [Lanckriet et al., '04] — the case $m = u$.
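To make the shape of the bound concrete, here is a simplified, illustrative Python sketch. It keeps only the empirical margin error, the $R/\gamma$ term, and a generic concentration term, and drops the constants $c_0$ and $S$ as well as the lower-order term, so it is not the exact bound of the theorem.

```python
# Illustrative sketch of how the risk bound composes (constants and lower-order
# terms from the theorem are intentionally omitted -- this is not the exact bound).
import math

def risk_bound_sketch(emp_margin_err, rad_complexity, gamma, m, u, delta):
    q = 1.0 / m + 1.0 / u
    slack = math.sqrt(q / 2.0 * math.log(1.0 / delta))   # generic concentration term
    return emp_margin_err + rad_complexity / gamma + slack

# e.g. risk_bound_sketch(0.05, 0.2, 1.0, 30, 70, 0.05) gives an indicative
# upper estimate on the test 0/1 error under the simplifications above.
```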

Inductive vs. Transductive hypothesis spaces
 Induction: To use the risk bounds, the hypothesis space must be defined before observing the training set.
 Transduction: The hypothesis space can be defined after observing the full sample, but before observing the actual training/test partition.
 Conclusion: Transduction allows choosing a data-dependent hypothesis space. For example, it can be optimized to have low Rademacher complexity. This cannot be done in induction!

Another view of transductive algorithms: Unlabeled-Labeled Decomposition (ULD)
The learner computes a matrix $U \in \mathbb{R}^{(m+u)\times(m+u)}$ from the (unlabeled) full sample and a vector $Y \in \mathbb{R}^{m+u}$ of the revealed labels; the soft classification of the full sample is $h = U \cdot Y$.
Example: $U$ — inverse of the graph Laplacian; $Y_i = y_i$ iff example $i$ is a training point, $Y_i = 0$ otherwise.
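A hedged Python sketch of the ULD view. The specific choice $U = (L + \varepsilon I)^{-1}$ with $L$ an unnormalized graph Laplacian is an illustrative assumption in the spirit of the example above, not necessarily the exact matrix used by any particular algorithm.

```python
# Minimal sketch of Unlabeled-Labeled Decomposition: soft labels h = U @ Y,
# where U is computed from the unlabeled full sample only and Y holds the
# revealed training labels (0 for test points).
import numpy as np

def uld_predict(W, train_idx, y_train, eps=1e-2):
    """W: (m+u) x (m+u) symmetric similarity matrix of the full sample."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W          # unnormalized graph Laplacian (assumed choice)
    U = np.linalg.inv(L + eps * np.eye(n))  # matrix computed from unlabeled data only
    Y = np.zeros(n)
    Y[train_idx] = y_train                  # e.g. labels in {-1, +1}; 0 elsewhere
    h = U @ Y                               # soft classification vector on the full sample
    return np.sign(h)
```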

Bounding the Rademacher complexity
Hypothesis space $\mathcal{H}_{\mathcal{A}}$: the set of all vectors $h = U \cdot Y$ obtained by operating the transductive algorithm $\mathcal{A}$ on all possible training/test partitions.
Notation: $\mathcal{U}_{\mathcal{A}}$ — the set of matrices $U$ generated by $\mathcal{A}$; $\lambda_1, \lambda_2, \ldots$ — the singular values of $U$.
Lemma 2: $R_{m+u}(\mathcal{H}_{\mathcal{A}})$ is upper-bounded by a quantity depending on the singular values of the matrices $U \in \mathcal{U}_{\mathcal{A}}$.
Lemma 2 justifies the spectral transformations performed to improve the performance of transductive algorithms ([Chapelle et al., '02], [Joachims, '03], [Zhang and Ando, '05]).

Bounds for graph-based algorithms
Consistency method [Zhou, Bousquet, Lal, Weston, Scholkopf, '03]: the resulting bound on $R_{m+u}(\mathcal{H})$ is expressed in terms of the singular values $\lambda_i$ of the matrix $U$ used by the algorithm.
Similar bounds hold for the algorithms of [Joachims, '03], [Belkin et al., '04], etc.

Topics not covered
 Bounding the Rademacher complexity when $U$ is a kernel matrix.
 For some algorithms: a data-dependent method for computing probabilistic upper and lower bounds on the Rademacher complexity.
 Risk bound for transductive mixtures.

Directions for future research
 Tighten the risk bound to allow effective model selection: a bound depending on the 0/1 empirical error.
 Use variance information to obtain better convergence rates.
 Local transductive Rademacher complexity.
 Clever data-dependent choice of hypothesis spaces with low Rademacher complexity.

Monte Carlo estimation of the transductive Rademacher complexity
Rademacher complexity: $R_{m+u}(\mathcal{H}) = \left(\frac{1}{m}+\frac{1}{u}\right)\mathbb{E}_{\sigma}\!\left[\sup_{h \in \mathcal{H}} \sigma \cdot h\right]$.
Draw uniformly $n$ vectors of Rademacher variables, $\sigma^{(1)}, \ldots, \sigma^{(n)}$, and average the corresponding suprema. By Hoeffding's inequality: for any $\delta$, with probability at least $1-\delta$, this empirical average deviates from $R_{m+u}(\mathcal{H})$ by at most $O\!\left(\sqrt{\ln(1/\delta)/n}\right)$, giving a probabilistic upper bound.
How to compute the supremum? For the Consistency Method of [Zhou et al., '03] it can be computed efficiently.
A symmetric Hoeffding inequality yields a probabilistic lower bound on the transductive Rademacher complexity.
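A hedged Python sketch of this Monte Carlo procedure for the simple case where the hypothesis space is an explicit finite set of vectors (an assumption made here for illustration; for the Consistency Method the supremum is computed differently):

```python
# Monte Carlo estimate of the transductive Rademacher complexity for a finite
# hypothesis set: draw n sparse Rademacher vectors, take the supremum of
# sigma^T h over the hypotheses for each draw, average, and scale by (1/m + 1/u).
import numpy as np

def mc_rademacher(H, m, u, n_samples, rng):
    """H: array of shape (num_hypotheses, m+u); returns the Monte Carlo estimate."""
    q = 1.0 / m + 1.0 / u
    p0 = m * u / (m + u) ** 2
    sups = []
    for _ in range(n_samples):
        sigma = rng.choice([-1.0, 0.0, 1.0], size=m + u,
                           p=[p0, 1.0 - 2.0 * p0, p0])
        sups.append(np.max(H @ sigma))      # supremum over the finite hypothesis set
    return q * float(np.mean(sups))

rng = np.random.default_rng(0)
H = rng.uniform(-1, 1, size=(50, 100))      # toy finite hypothesis set, m + u = 100
est = mc_rademacher(H, m=30, u=70, n_samples=500, rng=rng)
```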

Induction vs. Transduction: differences
 Induction: unknown underlying distribution. Transduction: no unknown distribution; each example has a unique label.
 Induction: test examples are not known, but will be sampled from the same distribution. Transduction: test examples are known.
 Induction: generate a general hypothesis; we want generalization! Transduction: only classify the given examples; no generalization!
 Induction: independent training examples. Transduction: dependent training and test examples.

Justification of spectral transformations
$\mathcal{U}_{\mathcal{A}}$ — the set of matrices $U$ generated by the algorithm $\mathcal{A}$; $\lambda_1, \lambda_2, \ldots$ — the singular values of $U$.
Lemma 2: $R_{m+u}(\mathcal{H}_{\mathcal{A}})$ is upper-bounded in terms of the singular values of the matrices $U \in \mathcal{U}_{\mathcal{A}}$.
Lemma 2 justifies the spectral transformations performed to improve the performance of transductive algorithms ([Chapelle et al., '02], [Joachims, '03], [Zhang and Ando, '05]).