Transductive Rademacher Complexity and its Applications
Ran El-Yaniv and Dmitry Pechyony, Technion – Israel Institute of Technology, Haifa, Israel, 24.08.2007

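2 Induction vs. Transduction
Inductive learning: a training set drawn from a distribution of examples is given to the learning algorithm, which outputs a hypothesis; the hypothesis then assigns labels to new, unlabeled examples.
Transductive learning (Vapnik ’74, ’98): the learning algorithm receives a training set together with an unlabeled test set, and outputs labels for the test set.
Goal: minimize the error on the test set.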

3 Distribution-free Model [Vapnik ’74, ’98]
Given: a “full sample” of unlabeled examples, each with its true (unknown) label.

4 The full sample is partitioned into a training set (m points) and a test set (u points).

5 The labels of the training examples are revealed.

6 Goal: label the test examples.
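
A minimal sketch of this protocol, assuming the full sample is represented by point indices plus a list of hidden true labels; `random_partition`, the toy labels and the sizes m = 3, u = 5 are hypothetical. It illustrates that, unlike in induction, the only randomness is which m of the m + u points end up in the training set.

```python
import random


def random_partition(n_full, m, seed=None):
    """Split indices 0..n_full-1 uniformly at random into a training set of
    size m and a test set of size u = n_full - m (sampling without replacement)."""
    rng = random.Random(seed)
    idx = list(range(n_full))
    rng.shuffle(idx)
    return sorted(idx[:m]), sorted(idx[m:])


# Hypothetical full sample of 8 points; the true labels exist but are hidden.
hidden_labels = [1, -1, 1, 1, -1, -1, 1, -1]
train_idx, test_idx = random_partition(len(hidden_labels), m=3, seed=0)

# Only the training labels are revealed; the learner sees all 8 points
# but must output labels for the points in test_idx.
revealed = {i: hidden_labels[i] for i in train_idx}
print("train:", train_idx, "test:", test_idx, "revealed:", revealed)
```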

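7 Rademacher complexity
Induction
Hypothesis space $\mathcal F$: a set of functions.
- $x_1,\dots,x_m$: training points.
- $\sigma_1,\dots,\sigma_m$: i.i.d. random variables with $P(\sigma_i=1)=P(\sigma_i=-1)=\tfrac12$.
Rademacher: $R_m(\mathcal F)=\mathbb E_\sigma\Big[\sup_{f\in\mathcal F}\frac1m\sum_{i=1}^m\sigma_i f(x_i)\Big]$.
Transduction (version 1)
Hypothesis space $\mathcal H$: a set of vectors $h\in\mathbb R^{m+u}$, one entry per point of the full sample.
- $x_1,\dots,x_{m+u}$: full sample with training and test points.
- $\sigma_1,\dots,\sigma_{m+u}$: distributed as in induction.
Rademacher: $R_{m+u}(\mathcal H,\tfrac12)=\left(\frac1m+\frac1u\right)\mathbb E_\sigma\Big[\sup_{h\in\mathcal H}\sigma\cdot h\Big]$, where $\sigma\cdot h=\sum_{i=1}^{m+u}\sigma_i h_i$.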

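8 Transductive Rademacher complexity
Version 1:
- $x_1,\dots,x_{m+u}$: full sample with training and test points.
- $\mathcal H\subseteq\mathbb R^{m+u}$: transductive hypothesis space.
- $\sigma_1,\dots,\sigma_{m+u}$: i.i.d. random variables distributed by $P(\sigma_i=1)=P(\sigma_i=-1)=p$, $P(\sigma_i=0)=1-2p$, with $p=\tfrac12$.
Rademacher complexity: $R_{m+u}(\mathcal H,p)=\left(\frac1m+\frac1u\right)\mathbb E_\sigma\Big[\sup_{h\in\mathcal H}\sigma\cdot h\Big]$.
Version 2: a sparse distribution of the Rademacher variables, with $p=p_0=\frac{mu}{(m+u)^2}$.
We develop risk bounds with $R_{m+u}(\mathcal H)=R_{m+u}(\mathcal H,p_0)$.
Lemma 1: $R_{m+u}(\mathcal H,p_0)\le R_{m+u}(\mathcal H,\tfrac12)$, i.e., version 2 never yields a larger complexity than version 1.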
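
For a tiny finite hypothesis space the expectation can be computed exactly by enumerating all $3^{m+u}$ values of $\sigma$. The sketch below is an illustration under that assumption (the function names and the toy $\mathcal H$ are hypothetical); it computes both versions, so one can check numerically that the sparse version with $p_0=mu/(m+u)^2$ is not larger than the $p=\tfrac12$ version.

```python
import itertools


def sigma_prob(s, p):
    # P(sigma_i = s) when P(+1) = P(-1) = p and P(0) = 1 - 2p.
    return p if s != 0 else 1 - 2 * p


def transductive_rademacher(H, m, u, p):
    """Exact R_{m+u}(H, p) = (1/m + 1/u) * E[max_{h in H} sigma . h] for a finite H,
    by enumerating all 3^(m+u) sign vectors (feasible only for tiny m + u)."""
    total = 0.0
    for sigma in itertools.product((-1, 0, 1), repeat=m + u):
        prob = 1.0
        for s in sigma:
            prob *= sigma_prob(s, p)
        if prob == 0.0:  # with p = 1/2, any vector containing a 0 has probability 0
            continue
        best = max(sum(s * x for s, x in zip(sigma, h)) for h in H)
        total += prob * best
    return (1 / m + 1 / u) * total


m, u = 2, 3
p0 = m * u / (m + u) ** 2  # version 2: sparse distribution
# Hypothetical toy hypothesis space: three labelings of the 5 full-sample points.
H = [(1, 1, -1, -1, 1), (-1, 1, 1, -1, -1), (1, -1, 1, 1, -1)]
r_v1 = transductive_rademacher(H, m, u, 0.5)  # version 1, p = 1/2
r_v2 = transductive_rademacher(H, m, u, p0)   # version 2, p = p0
print(f"version 1: {r_v1:.4f}, version 2: {r_v2:.4f}")
```

A single hypothesis has complexity 0, since $\mathbb E[\sigma\cdot h]=0$; the complexity grows only through the supremum over several hypotheses.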

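9 Risk bound
Notation:
- $L_u(h)$: 0/1 error of $h$ on the test examples.
- $\hat L_m^\gamma(h)$: empirical $\gamma$-margin error of $h$ on the training examples.
Theorem: For any $\delta>0$, with probability at least $1-\delta$ over the random partition of the full sample into training and test sets, for all hypotheses $h\in\mathcal H$ it holds that $L_u(h)\le\hat L_m^\gamma(h)+\frac{R_{m+u}(\mathcal H)}{\gamma}+\varepsilon$, where the slack term $\varepsilon$ depends on $m$, $u$ and $\delta$, and for fixed $\delta$ decreases as $O\big(\sqrt{1/m+1/u}\big)$.
Proof: based on, and inspired by, the results of [McDiarmid, ’89], [Bartlett and Mendelson, ’02] and [Meir and Zhang, ’03].
Previous results: [Lanckriet et al., ’04], for the case $m=u$.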

10 Inductive vs. transductive hypothesis spaces
Induction: to use the risk bounds, the hypothesis space must be defined before observing the training set.
Transduction: the hypothesis space can be defined after observing the full (unlabeled) sample, but before observing the actual partition into training and test sets.
Conclusion: transduction allows one to choose a data-dependent hypothesis space; for example, it can be optimized to have low Rademacher complexity. This cannot be done in induction!

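11 Another view on transductive algorithms: Unlabeled-Labeled Decomposition (ULD)
The learner computes an $(m+u)\times r$ matrix $U$ from the unlabeled full sample and a vector $\alpha\in\mathbb R^r$ that depends on the training labels, and outputs $h=U\alpha$.
Example: $U$ is the inverse of the graph Laplacian, and $\alpha_i=y_i$ iff $x_i$ is a training point; otherwise $\alpha_i=0$.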
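
A small sketch of the ULD example on a graph. The unnormalized Laplacian $L=D-W$ is singular (its rows sum to zero), so this sketch assumes the regularized inverse $U=(L+\varepsilon I)^{-1}$ with a small $\varepsilon>0$; the function names, $\varepsilon$ and the toy graph are hypothetical. It solves $(L+\varepsilon I)h=\alpha$ instead of forming $U$ explicitly.

```python
def laplacian(W):
    # Unnormalized graph Laplacian L = D - W for a symmetric weight matrix W.
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]


def solve(A, b):
    # Gaussian elimination with partial pivoting; returns x such that A x = b.
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def uld_predict(W, train_labels, eps=1e-2):
    """Soft labels h = U alpha with U = (L + eps*I)^(-1) (eps > 0 is an assumption
    making the Laplacian invertible); alpha_i = y_i on training points, 0 otherwise."""
    n = len(W)
    L = laplacian(W)
    A = [[L[i][j] + (eps if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = [float(train_labels.get(i, 0.0)) for i in range(n)]
    # Solving A h = alpha gives U alpha without forming U explicitly.
    return solve(A, alpha)


# Hypothetical 6-node chain: clusters {0,1,2} and {3,4,5} joined by a weak edge.
W = [[0.0] * 6 for _ in range(6)]
for i, j, w in [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 0.1), (3, 4, 1.0), (4, 5, 1.0)]:
    W[i][j] = W[j][i] = w

h = uld_predict(W, {0: 1, 5: -1})  # only points 0 and 5 are in the training set
print([round(v, 3) for v in h])
```

Here $U$ depends only on the graph, while $\alpha$ carries the revealed labels, which is exactly the split the decomposition describes.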

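12 Bounding Rademacher complexity
Hypothesis space $\mathcal H_{out}$: the set of all $h=U\alpha$ obtained by running the transductive algorithm on all possible partitions.
Notation:
- $\mathcal A_{out}$: the set of $\alpha$'s generated by the algorithm; $\mu_1=\sup_{\alpha\in\mathcal A_{out}}\|\alpha\|_2$.
- $\lambda_1,\lambda_2,\dots$: all singular values of $U$.
Lemma 2: $R_{m+u}(\mathcal H_{out})\le\mu_1\sqrt{\frac{2}{mu}\sum_i\lambda_i^2}$.
Lemma 2 justifies the spectral transformations performed to improve the performance of transductive algorithms ([Chapelle et al., ’02], [Joachims, ’03], [Zhang and Ando, ’05]): the bound depends on $U$ only through its singular values, so transforming the spectrum of $U$ directly controls it.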
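
A numerical check of Lemma 2, $R_{m+u}(\mathcal H_{out})\le\mu_1\sqrt{\frac{2}{mu}\sum_i\lambda_i^2}$, on a toy example. It assumes a small, finite set of $\alpha$ vectors (hypothetical values) so the exact complexity can be enumerated, and uses the identity $\sum_i\lambda_i^2=\|U\|_F^2$, so no SVD is needed to evaluate the bound. The function names are hypothetical.

```python
import itertools
import math


def lemma2_bound(U, alphas, m, u):
    # mu_1 = max ||alpha||_2 over the (here finite) set of alpha vectors.
    mu1 = max(math.sqrt(sum(a * a for a in alpha)) for alpha in alphas)
    # The sum of squared singular values of U equals its squared Frobenius norm.
    frob_sq = sum(x * x for row in U for x in row)
    return mu1 * math.sqrt(2.0 / (m * u) * frob_sq)


def matvec(U, a):
    # h = U alpha for an (m+u) x r matrix U given as a list of rows.
    return [sum(U[i][k] * a[k] for k in range(len(a))) for i in range(len(U))]


def exact_rademacher(H, m, u):
    # R_{m+u}(H) with sparse Rademacher variables, p0 = m*u/(m+u)^2, by enumeration.
    p0 = m * u / (m + u) ** 2
    total = 0.0
    for sigma in itertools.product((-1, 0, 1), repeat=m + u):
        prob = math.prod(p0 if s != 0 else 1 - 2 * p0 for s in sigma)
        total += prob * max(sum(s * x for s, x in zip(sigma, h)) for h in H)
    return (1 / m + 1 / u) * total


m, u = 2, 2
# Hypothetical (m+u) x r matrix U with r = 2, and a hypothetical finite set of alphas.
U = [[1.0, 0.5], [0.2, 1.0], [0.0, 0.3], [0.7, -0.4]]
alphas = [(1.0, 1.0), (1.0, -1.0), (-1.0, 0.5)]
H_out = [matvec(U, a) for a in alphas]
exact = exact_rademacher(H_out, m, u)
bound = lemma2_bound(U, alphas, m, u)
print(f"exact R = {exact:.4f} <= Lemma 2 bound = {bound:.4f}")
```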

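13 Bounds for graph-based algorithms
Consistency Method [Zhou, Bousquet, Lal, Weston, Schölkopf, ’03]: $R_{m+u}(\mathcal H_{out})\le\sqrt{\frac{2}{u}\sum_i\lambda_i^2}$, where $\lambda_i$ are the singular values of the matrix $U$ in the Consistency Method's ULD.
Similar bounds hold for the algorithms of [Joachims, ’03], [Belkin et al., ’04], etc.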
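
A worked instance of Lemma 2 ($R_{m+u}(\mathcal H_{out})\le\mu_1\sqrt{\frac{2}{mu}\sum_i\lambda_i^2}$, with $\mu_1=\sup_\alpha\|\alpha\|_2$ and $\lambda_i$ the singular values of $U$). Assume, as a sketch, that the Consistency Method's $\alpha$ has $\alpha_i=y_i\in\{\pm1\}$ on the $m$ training points and $\alpha_i=0$ elsewhere. Then every $\alpha$ has exactly $m$ entries of absolute value 1, and the bound simplifies as follows:

```latex
\|\alpha\|_2=\sqrt{m}\ \text{for every}\ \alpha\in\mathcal A_{out}
\;\Longrightarrow\;
R_{m+u}(\mathcal H_{out})
\le \sqrt{m}\,\sqrt{\frac{2}{mu}\sum_i\lambda_i^2}
= \sqrt{\frac{2}{u}\sum_i\lambda_i^2}
= \sqrt{\frac{2}{u}}\,\|U\|_F .
```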

14 Topics not covered
- Bounding the Rademacher complexity when $U$ is a kernel matrix.
- For some algorithms: a data-dependent method of computing probabilistic upper and lower bounds on the Rademacher complexity.
- A risk bound for transductive mixtures.

15 Directions for future research
- Tighten the risk bound to allow effective model selection: a bound depending on the 0/1 empirical error; use of variance information to obtain a better convergence rate; local transductive Rademacher complexity.
- A clever data-dependent choice of low-Rademacher hypothesis spaces.

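17 Monte Carlo estimation of transductive Rademacher complexity
Rademacher: $R_{m+u}(\mathcal H)=\left(\frac1m+\frac1u\right)\mathbb E_\sigma\Big[\sup_{h\in\mathcal H}\sigma\cdot h\Big]$.
Draw $n$ independent vectors of Rademacher variables, $\sigma^{(1)},\dots,\sigma^{(n)}$, and compute $\hat R_n=\left(\frac1m+\frac1u\right)\frac1n\sum_{j=1}^n\sup_{h\in\mathcal H}\sigma^{(j)}\cdot h$.
By Hoeffding's inequality: for any $\delta>0$, with probability at least $1-\delta$, $R_{m+u}(\mathcal H)\le\hat R_n+O\big(\sqrt{\ln(1/\delta)/n}\big)$.
How to compute the supremum? For the Consistency Method of [Zhou et al., ’03] it can be computed efficiently.
The symmetric Hoeffding inequality gives a probabilistic lower bound on the transductive Rademacher complexity.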
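
A sketch of the Monte Carlo procedure for a finite hypothesis space, assuming the sparse variables with $p_0=mu/(m+u)^2$ and using the crude range $|\sigma\cdot h|\le\|h\|_1$ for the Hoeffding term; the function name, seed and toy $\mathcal H$ are hypothetical. Each returned bound holds on its own with probability at least $1-\delta$.

```python
import math
import random


def mc_rademacher_bounds(H, m, u, n, delta, seed=0):
    """Monte Carlo estimate of R_{m+u}(H) for a finite H, plus Hoeffding lower and
    upper confidence bounds (each holds with probability >= 1 - delta)."""
    rng = random.Random(seed)
    p0 = m * u / (m + u) ** 2
    scale = 1 / m + 1 / u
    # Every sample lies in [-B, B], since |sigma . h| <= ||h||_1 when sigma_i in {-1, 0, 1}.
    B = scale * max(sum(abs(x) for x in h) for h in H)
    total = 0.0
    for _ in range(n):
        sigma = []
        for _ in range(m + u):
            r = rng.random()
            sigma.append(1 if r < p0 else (-1 if r < 2 * p0 else 0))
        total += scale * max(sum(s * x for s, x in zip(sigma, h)) for h in H)
    est = total / n
    # Hoeffding: deviation t with exp(-2 n t^2 / (2B)^2) = delta.
    eps = 2 * B * math.sqrt(math.log(1 / delta) / (2 * n))
    return est, est - eps, est + eps


# Hypothetical toy example: m = 2 training and u = 4 test points.
H = [(1, -1, 1, -1, 1, -1), (1, 1, 1, -1, -1, -1), (-1, 1, -1, 1, 1, 1)]
est, lower, upper = mc_rademacher_bounds(H, m=2, u=4, n=2000, delta=0.05)
print(f"estimate {est:.3f}, lower {lower:.3f}, upper {upper:.3f}")
```

The interval width shrinks as $1/\sqrt{n}$, so the number of draws directly trades computation for tightness; a tighter range than $\|h\|_1$ would shrink it further.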

18 Induction vs. Transduction: differences
Induction:
- Unknown underlying distribution.
- Test examples are not known; they will be sampled from the same distribution.
- Generate a general hypothesis. Want generalization!
- Independent training examples.
Transduction:
- No unknown distribution. Each example has a unique label.
- Test examples are known.
- Only classify the given examples. No generalization!
- Dependent training and test examples.
