1
On Agnostic Boosting and Parity Learning
Adam Tauman Kalai (Georgia Tech), Yishay Mansour (Google and Tel-Aviv), Elad Verbin (Tsinghua)
2
Defs and Outline
Definitions:
- Agnostic learning = learning with adversarial noise
- Boosting = turning a weak learner into a strong learner
- Parities = parities of subsets of the bits: f:{0,1}^n → {0,1}, e.g. f(x) = x_1 ⊕ x_3 ⊕ x_7
Outline:
1. Agnostic boosting: turning a weak agnostic learner into a strong agnostic learner
2. A 2^{O(n/log n)}-time algorithm for agnostically learning parities over any distribution
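For concreteness, a minimal Python illustration of a parity over a subset of the bits; the subset below is just an example matching f(x) = x_1 ⊕ x_3 ⊕ x_7 above (0-based indices in code).

```python
# Illustrative only: a parity function over a subset of the bits.
def parity(x, subset):
    """x: sequence of bits in {0,1}; subset: indices whose XOR defines f."""
    return sum(x[i] for i in subset) % 2

x = (1, 0, 1, 1, 0, 0, 1, 0)
print(parity(x, [0, 2, 6]))  # x_1 XOR x_3 XOR x_7 with 0-based indexing -> 1
```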
3
Agnostic boosting
Weak learner: for any noise rate < ½, produces a better-than-trivial hypothesis.
Agnostic booster: runs the weak learner as a black box.
Strong learner: produces an almost-optimal hypothesis.
4
Learning with Noise
- Learning without noise: well understood*
- Learning with random noise: well understood* (SQ model)
- Learning with agnostic noise: it's, like, a really hard model!!!
* up to well-studied open problems (i.e. we know where we're stuck)
5
Agnostic Learning: some known results
Class | Distribution | Notes
Halfspaces [Kalai, Klivans, Mansour, Servedio] | uniform, log-concave |
Parities [Feldman, Gopalan, Khot, Ponnuswami] | uniform | 2^{O(n/log n)}
Decision trees [Gopalan, Kalai, Klivans] | uniform | with MQ
Disjunctions [Kalai, Klivans, Mansour, Servedio] | all distributions | 2^{O(√n)}
??? | all distributions |
6
Agnostic Learning: some known results
Class | Distribution | Notes
Halfspaces [Kalai, Klivans, Mansour, Servedio] | uniform, log-concave |
Parities [Feldman, Gopalan, Khot, Ponnuswami] | uniform | 2^{O(n/log n)}
Decision trees [Gopalan, Kalai, Klivans] | uniform | with MQ
Disjunctions [Kalai, Klivans, Mansour, Servedio] | all distributions | 2^{O(√n)}
??? | all distributions |
Due to hardness, or lack of tools???
Agnostic boosting: a strong tool that makes it easier to design algorithms.
7
Why care about agnostic learning?
- More relevant in practice
- Impossibility results might be useful for building cryptosystems
- Non-noisy learning ≈ CSP; agnostic learning ≈ MAX-CSP
8
Noisy learning
Setup: f:{0,1}^n → {0,1} from class F; the algorithm gets samples (x, label) where x is drawn from distribution D.
- No noise: labels are f(x). The learning algorithm should approximate f up to error ε.
- Random noise: each label is flipped independently with probability η (η% noise). The learning algorithm should approximate f up to error ε.
- Adversarial (≈ agnostic) noise: the adversary is allowed to corrupt an η-fraction of the labels, i.e. labels come from an arbitrary g. The learning algorithm should approximate g up to error opt + ε.
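A small sketch (not from the talk) contrasting the three sample oracles; the target parity, the uniform choice of D, and the noise rate eta are illustrative assumptions.

```python
import random

n, eta = 8, 0.1
subset = [0, 2, 6]                                   # hypothetical target parity
def f(x):
    return sum(x[i] for i in subset) % 2

def draw_x():
    return tuple(random.randint(0, 1) for _ in range(n))    # x ~ D (uniform here)

def sample_no_noise():
    x = draw_x()
    return x, f(x)                                   # label is always f(x)

def sample_random_noise():
    x = draw_x()
    return x, f(x) ^ (random.random() < eta)         # label flipped independently w.p. eta

def sample_agnostic(g):
    x = draw_x()
    return x, g(x)                                   # labels come from an arbitrary g, which may
                                                     # disagree with every f in F on some fraction
```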
9
Agnostic learning (geometric view)
[Figure: the class F drawn as a region; g lies at distance opt from the nearest f in F; the ball of radius opt + ε around g intersects F (proper learning).]
Parameters: F, a metric. Input: an oracle for g. Goal: return some element of the ball of radius opt + ε around g.
10
Agnostic boosting: definition
Weak learner: given samples from g over distribution D, if opt ≤ ½ − α then w.h.p. it outputs h with err_D(g,h) ≤ ½ − γ.
11
Agnostic boosting
Weak learner: if opt ≤ ½ − α, then w.h.p. outputs h with err_D(g,h) ≤ ½ − γ.
Agnostic booster: samples from g, runs the weak learner poly(1/γ, 1/ε) times on modified distributions, and w.h.p. outputs h' with err_D(g,h') ≤ opt + ε.
12
Agnostic boosting
(α, γ)-weak learner: if opt ≤ ½ − α, then w.h.p. outputs h with err_D(g,h) ≤ ½ − γ.
Agnostic booster: samples from g, runs the weak learner poly(1/γ, 1/ε) times on modified distributions, and w.h.p. outputs h' with err_D(g,h') ≤ opt + α + ε.
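The Greek symbols on slides 10-12 were lost in the transcript; a possible LaTeX reconstruction of the definitions, using α, γ, ε as assumed names for the dropped quantities:

```latex
% Reconstructed from slides 10-12; \alpha, \gamma, \varepsilon are assumed names
% for the symbols dropped in the transcript.
\textbf{$(\alpha,\gamma)$-weak agnostic learner:} for any distribution $D$, if
$\mathrm{opt} = \min_{f\in F}\ \mathrm{err}_D(g,f) \le \tfrac12 - \alpha$,
then w.h.p.\ it outputs $h$ with $\mathrm{err}_D(g,h) \le \tfrac12 - \gamma$.

\textbf{Agnostic booster:} given samples from $g$, it runs the weak learner
$\mathrm{poly}(1/\gamma,\,1/\varepsilon)$ times on modified distributions and
w.h.p.\ outputs $h'$ with $\mathrm{err}_D(g,h') \le \mathrm{opt} + \alpha + \varepsilon$.
```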
13
Agnostic boosting
Weak learner: for any noise rate < ½, produces a better-than-trivial hypothesis.
Strong learner: produces an almost-optimal hypothesis.
14
"Approximation Booster" Analogy
A poly-time MAX-3-SAT algorithm that, when opt = 7/8 + ε, produces a solution of value 7/8 + ε^100
⇒ an algorithm for MAX-3-SAT that produces a solution of value at least opt − ε, with running time poly(n, 1/ε).
15
Gap
[Figure: the interval [0, 1], with ½ marked.]
No hardness gap close to ½, plus the booster, implies no gap anywhere (additive PTAS).
16
Agnostic boosting
New analysis of the Mansour-McAllester booster, which uses branching programs; the nodes are weak hypotheses.
Previous agnostic boosting: Ben-David, Long, and Mansour, and Gavinsky, defined agnostic boosting differently; their results cannot be used for our application.
17
Booster
[Figure: a branching program with a single node h1; x follows the edge h1(x)=1 to the leaf labeled 1 and the edge h1(x)=0 to the leaf labeled 0.]
18
Booster: Split step
[Figure: either the leaf reached when h1(x)=0 is split by a new node h2, or the leaf reached when h1(x)=1 is split by a new node h2', each trained on a different (induced) distribution; choose the "better" of the two options.]
19
Booster: Split step
[Figure: further splitting; the branching program now contains nodes h1, h2, h3, with leaves labeled 0 and 1.]
20
Booster: Split step
[Figure: more split steps add nodes h4, and so on.]
21
Booster: Merge step
[Figure: the branching program with nodes h1, ..., h4; nodes are merged if they are "similar".]
22
Booster: Merge step
[Figure: the branching program after the merge.]
23
Booster: Another split step
[Figure: another split adds a node h5, and the process continues.]
24
Booster: final result
[Figure: the final branching program, whose internal nodes are weak hypotheses and whose leaves are labeled 0 and 1.]
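A rough Python sketch of the split/merge loop pictured above, over a fixed sample. It is only the bookkeeping, not the procedure analyzed in the paper: the weak_learner callable, the label-fraction "similarity" test, and the merge tolerance are assumptions, and building the actual branching-program predictor for new points is omitted.

```python
from collections import Counter

def majority_label(leaf):
    """Most common label among the examples reaching this leaf."""
    if not leaf:
        return 0
    return Counter(y for _, y in leaf).most_common(1)[0][0]

def leaf_stats(leaf):
    """Fraction of 1-labels at a leaf; used here as a crude 'similarity' statistic."""
    return sum(y for _, y in leaf) / max(1, len(leaf))

def boost(weak_learner, examples, rounds, merge_tol=0.05):
    """Split/merge sketch of a branching-program booster.

    weak_learner(leaf_examples) -> h, a {0,1}-valued hypothesis assumed to beat 1/2
    on the distribution induced by that leaf (cf. slides 10-12).
    """
    leaves = [list(examples)]
    for _ in range(rounds):
        # Split step: run the weak learner on each leaf's induced sample
        # and split the leaf on the returned hypothesis.
        split = []
        for leaf in leaves:
            if not leaf:
                continue
            h = weak_learner(leaf)
            split.append([(x, y) for (x, y) in leaf if h(x) == 1])
            split.append([(x, y) for (x, y) in leaf if h(x) == 0])
        # Merge step: merge leaves whose label statistics are "similar",
        # keeping the branching program narrow.
        split.sort(key=leaf_stats)
        leaves = []
        for leaf in split:
            if leaves and abs(leaf_stats(leaves[-1]) - leaf_stats(leaf)) < merge_tol:
                leaves[-1] = leaves[-1] + leaf
            else:
                leaves.append(leaf)
    # Each remaining leaf predicts its majority label.
    return [(leaf, majority_label(leaf)) for leaf in leaves]
```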
25
Agnostically learning parities
26
Application: Parity with Noise
Model | Uniform distribution | Any distribution
Random noise | 2^{O(n/log n)} [Blum, Kalai, Wasserman] |
Agnostic learning | 2^{O(n/log n)} [Feldman, Gopalan, Khot, Ponnuswami], via Fourier | 2^{O(n/log n)}, this work*
* Non-proper learner: the hypothesis is a circuit with 2^{O(n/log n)} gates.
Feldman et al. give a black-box reduction to the random-noise case; we give a direct result.
Theorem: for every ε, there is a weak learner that, for noise rate ½ − ε, produces a hypothesis which is wrong on at most a ½ − (2ε)^{n^{0.001}}/2 fraction of the space. Running time: 2^{O(n/log n)}.
27
Corollary: Learners for many classes (without noise)
We can learn, without noise, any class with a "guaranteed correlated parity" in time 2^{O(n/log n)}, e.g. DNF; any others?
A weak parity learner running in 2^{O(n^{0.32})} time would beat the best algorithm known for learning DNF.
Good evidence that parity with noise is hard: efficient cryptosystems [Hopper-Blum, Blum-Furst et al., and many others]?
28
Idea of the weak agnostic parity learner
Main idea:
1. Take a learner that resists random noise (BKW).
2. Add randomness to its behavior until you get a weak agnostic learner.
"Between two evils, I pick the one I haven't tried before" (Mae West)
"Between two evils, I pick uniformly at random" (CS folklore)
29
Summary
Problem: it is difficult, but perhaps possible, to design agnostic learning algorithms.
Proposed solution: agnostic boosting.
Contributions:
1. A right(er) definition of a weak agnostic learner
2. Agnostic boosting
3. Learning parity with noise in the hardest noise model
4. Entertaining STOC '08 participants
30
Open Problems
1. Find other applications of agnostic boosting.
2. Improve PwN algorithms: get a proper learner for parity with noise; reduce PwN with agnostic noise to PwN with random noise.
3. Get evidence that PwN is hard: prove that if parity with noise is easy then FACTORING is easy. $128 reward!
31
May the parity be with you! The end.
32
Sketch of weak parity learner
33
Weak parity learner
Sample labeled points from the distribution, and sample an unlabeled x; let's guess f(x).
[Figure: sample vectors are bucketed according to their last 2n/log n bits; pairs within a bucket are XORed and passed to the next round.]
34
Weak parity learner
[Figure] LAST ROUND: find √n vectors whose sum, together with x, is 0; XORing their labels gives a guess for f(x).
35
Weak parity learner
[Figure] LAST ROUND: find √n vectors whose sum, together with x, is 0; XORing their labels gives a guess for f(x).
By symmetry, the probability of a mistake equals the fraction of mistakes over the space.
Claim: the fraction of mistakes is at most ½ − (2ε)^{n^{0.001}}/2 (by Cauchy-Schwarz).
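A toy sketch of the bucketing idea on slides 33-35: repeatedly bucket sample vectors by a block of bits and XOR within a bucket to zero that block out. The block sizes, noise rate, secret, and sample count below are illustrative assumptions, not the 2n/log n schedule of the actual BKW-style learner, and the noise compounds with each round.

```python
import random

def bkw_round(samples, block):
    """One BKW-style round: bucket by the bits in `block`, then XOR each sample
    with a bucket representative so those bits cancel.
    samples: list of (x, y) with x a bit tuple and y its (noisy) label."""
    buckets = {}
    for x, y in samples:
        buckets.setdefault(tuple(x[i] for i in block), []).append((x, y))
    out = []
    for group in buckets.values():
        rep_x, rep_y = group[0]                        # representative of the bucket
        for x, y in group[1:]:
            out.append((tuple(a ^ b for a, b in zip(x, rep_x)), y ^ rep_y))
    return out

# Toy usage with a hypothetical secret parity and 5% random noise.
n = 12
secret = [0, 3, 5]
def f(x):
    return sum(x[i] for i in secret) % 2

samples = []
for _ in range(4000):
    x = tuple(random.randint(0, 1) for _ in range(n))
    samples.append((x, f(x) ^ (random.random() < 0.05)))
for block in ([8, 9, 10, 11], [4, 5, 6, 7]):
    samples = bkw_round(samples, block)
# Surviving vectors are zero on bits 4..11, so their (noisier) labels are
# parities of bits 0..3 only; these feed the final guessing step.
```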
36
Intuition behind two main parts
37
Intuition behind Boosting
38
[Figure: the sample, with some points marked "decrease weight" and others "increase weight".]
39
Intuition behind Boosting
[Figure: the sample reweighted after each round; hypotheses labeled 1, 2, ...; leaves 0 and 1.]
Run, reweight, run, reweight, .... Take a majority of the hypotheses.
Algorithmic and efficient: Yao-von Neumann minimax principle.
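A generic "run, reweight, take majority" loop in the spirit of this slide. The multiplicative (AdaBoost-style) reweighting rule and the weak_learner(examples, weights) interface are standard choices assumed for illustration, not the booster analyzed in the talk.

```python
import math

def reweight_boost(weak_learner, examples, rounds):
    """Run, reweight, run, reweight, ...; return a weighted-majority predictor."""
    m = len(examples)
    w = [1.0 / m] * m
    hyps = []
    for _ in range(rounds):
        h = weak_learner(examples, w)                 # weak learner sees weighted examples
        err = sum(wi for wi, (x, y) in zip(w, examples) if h(x) != y)
        err = min(max(err, 1e-9), 1 - 1e-9)           # avoid division by zero / log of zero
        alpha = 0.5 * math.log((1 - err) / err)       # vote weight of this hypothesis
        hyps.append((alpha, h))
        # Reweight: increase weight on mistakes, decrease it on correct predictions.
        w = [wi * math.exp(-alpha if h(x) == y else alpha)
             for wi, (x, y) in zip(w, examples)]
        total = sum(w)
        w = [wi / total for wi in w]

    def predict(x):                                   # weighted majority vote
        score = sum(a if h(x) == 1 else -a for a, h in hyps)
        return 1 if score > 0 else 0
    return predict
```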