1 From batch to transductive online learning
Sham Kakade and Adam Kalai, Toyota Technological Institute (TTI)
(Batch = i.i.d. data; online model of [Littlestone89].)

2 Batch learning vs. online learning
Family of functions F (e.g., halfspaces).
Batch: a distribution D over X × {–,+}. [Figure: + and – points in the plane, with a halfspace h separating most of them.]
Agnostic model [Kearns, Schapire, Sellie 94]: algorithm H takes i.i.d. examples (x1,y1),…,(xn,yn) drawn from D and outputs h ∈ F.
Def. H learns F if, ∀D: E[err(h)] ≤ min_{f∈F} err(f) + n^{-c}, and H runs in time poly(n). ERM = "best on data".
Online: an arbitrary sequence (x1,y1),…,(xn,yn) ∈ X × {–,+}. Round 1: commit to h1 ∈ F, then observe x1 and its label –.
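
To make ERM ("best on data") concrete, here is a minimal sketch, assuming a finite hypothesis class represented as a list of callables (the talk itself treats ERM as a black-box oracle over any F, e.g. halfspaces):

```python
# Minimal ERM sketch over a finite class: return the hypothesis in F with
# the fewest errors on the labeled sample. All names here are illustrative.
def erm(F, examples):
    """F: iterable of classifiers x -> label; examples: list of (x, y) pairs."""
    return min(F, key=lambda f: sum(1 for x, y in examples if f(x) != y))
```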

3 (Same comparison, animation step.) Round 2: commit to h2, then observe x2 with label +.

4 Round 3: commit to h3, then observe x3 with label +.
Goal: err(alg) ≤ min_{f∈F} err(f) + o(1).

5 Rounds continue: – x4, + x5, + x6, …; the sequence is arbitrary.

6 Transductive online learning [Ben-David, Kushilevitz, Mansour 95]
As before, an arbitrary sequence (x1,y1),…,(xn,yn) ∈ X × {–,+}, but now the unlabeled set {x1, x2, …, xn} is given to the learner in advance.
"Proper" learning, i.e. outputting hypotheses h_i ∈ F, is an equivalent formulation.
Analogous definition: algorithm H takes (x1,y1),…,(x_{i-1},y_{i-1}) and outputs h_i ∈ F. H learns F if, ∀(x1,y1),…,(xn,yn): E[err(H)] ≤ min_{f∈F} err(f) + n^{-c}, and H runs in time poly(n).
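
As a sketch, the transductive online protocol might look like this (the interface names are mine, not the talk's; the point is that the learner sees all the unlabeled x's before any label arrives):

```python
# Transductive online protocol sketch: unlabeled points known in advance,
# labels revealed one at a time by an arbitrary (possibly adversarial) sequence.
def run_protocol(learner, labeled_sequence, unlabeled_xs):
    learner.start(unlabeled_xs)            # {x_1, ..., x_n} given up front
    history, mistakes = [], 0
    for x, y in labeled_sequence:
        h = learner.choose(history)        # commit to h_i in F before seeing y_i
        mistakes += int(h(x) != y)
        history.append((x, y))
    return mistakes
```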

7 Our results
Theorem 1. In the online transductive setting, H^ERM learns F and requires only one ERM computation per example. (H^ERM = Hallucination + ERM.)
Theorem 2. The following are equivalent for proper learning:
– F is agnostically learnable
– ERM agnostically learns F (ERM can be done efficiently and VC(F) is finite)
– F is online transductively learnable
– H^ERM online transductively learns F

8 Online ERM algorithm
Choose h_i ∈ F with minimal errors on (x1,y1),…,(x_{i-1},y_{i-1}):
h_i = argmin_{f∈F} |{j < i : f(x_j) ≠ y_j}|
Bad example: X = {(0,0)}, F = {–, +} (the two constant functions). The labels alternate:
y1 = –, y2 = +, y3 = –, y4 = +, …
while ERM flips each round: h1(x) = +, h2(x) = –, h3(x) = +, h4(x) = –, …
So ERM errs on every example, while the best fixed f ∈ F errs only half the time. (Plain online ERM fails.)
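
A small simulation of the bad example above (not from the talk; the inline argmin plays the role of the ERM oracle, with Python's min breaking ties toward the first hypothesis listed, which is exactly what the alternating adversary exploits):

```python
# X = {(0,0)}, F = the two constant classifiers. Alternating labels force
# follow-the-leader ERM to flip its hypothesis and err every single round.
PLUS, MINUS = (lambda x: '+'), (lambda x: '-')
F = [PLUS, MINUS]   # min() breaks ties toward PLUS, so h_1(x) = + as on the slide

history, mistakes = [], 0
for y in ['-', '+', '-', '+', '-', '+']:     # adversarial alternating labels
    h = min(F, key=lambda f: sum(1 for x, yy in history if f(x) != yy))
    mistakes += int(h((0, 0)) != y)
    history.append(((0, 0), y))
print(mistakes)   # 6 of 6: ERM errs every round; best fixed f errs only 3 times
```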

9 Online ERM algorithm
Choose h_i ∈ F with minimal errors on (x1,y1),…,(x_{i-1},y_{i-1}):
h_i = argmin_{f∈F} |{j < i : f(x_j) ≠ y_j}|
Online "stability" lemma [Kalai, Vempala 01]:
err(ERM) ≤ min_{f∈F} err(f) + Σ_{i∈{1,…,n}} P[h_i ≠ h_{i+1}]
Proof: easy! By induction on n = #examples.
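
The induction is the standard "be-the-leader" argument; a sketch of my reconstruction (z_i = (x_i, y_i), ℓ the 0/1 loss, h_{i+1} the ERM hypothesis after i examples):

```latex
% Be-the-leader: the one-step-lookahead leader is as good as the best fixed f.
% Base case n = 1: \ell(h_2, z_1) = \min_{f} \ell(f, z_1) by definition of h_2.
% Step: apply the inductive hypothesis, then add \ell(h_{n+1}, z_n) to both sides:
\sum_{i=1}^{n-1} \ell(h_{i+1}, z_i)
  \le \min_{f \in F} \sum_{i=1}^{n-1} \ell(f, z_i)
  \le \sum_{i=1}^{n-1} \ell(h_{n+1}, z_i)
\quad\Longrightarrow\quad
\sum_{i=1}^{n} \ell(h_{i+1}, z_i)
  \le \sum_{i=1}^{n} \ell(h_{n+1}, z_i)
  = \min_{f \in F} \sum_{i=1}^{n} \ell(f, z_i).
% With 0/1 loss, \ell(h_i, z_i) - \ell(h_{i+1}, z_i) \le \mathbf{1}[h_i \ne h_{i+1}],
% so summing over i and taking expectations yields the lemma.
```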

10 Online H^ERM algorithm
Inputs: Σ = {x1, x2, …, xn}, integer R.
For each x ∈ Σ: draw r_x^+ and r_x^- uniformly at random from {1, 2, …, R}, and hallucinate r_x^+ copies of (x, +) and r_x^- copies of (x, –).
At round i, choose h_i ∈ F that minimizes errors on the hallucinated data plus (x1,y1),…,(x_{i-1},y_{i-1}).
Stability: ∀i, P_{r_{x_i}^+, r_{x_i}^-}[h_i ≠ h_{i+1}] ≤ 1/R.
[Photo: James Hannan, whose perturbation idea hallucination follows.]

11 Online H^ERM algorithm (analysis)
Same inputs and hallucination as above. The online "stability" lemma bounds the regret by Σ_i P[h_i ≠ h_{i+1}] ≤ n/R, which trades off against the cost of the hallucinated examples (which grows with R).
Theorem 1. For R = n^{1/4}, H^ERM online transductively learns F, and it requires one ERM computation per example.
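
A sketch of H^ERM as I read it off the slide (the `erm` oracle and all names are assumptions; `erm` maps a list of (x, y) pairs to a hypothesis, e.g. the slide-2 sketch with F bound in; `random.randint(1, R)` is uniform on {1, …, R}):

```python
import random

# Hallucination + ERM: pad the history with random fake labels for every known
# point, then run plain follow-the-leader ERM. The random counts make the
# leader stable (P[h_i != h_{i+1}] <= 1/R), which plain ERM lacked.
def herm(erm, unlabeled_xs, labeled_sequence, R):
    hallucinated = []
    for x in unlabeled_xs:
        r_plus = random.randint(1, R)        # r_x^+ uniform on {1, ..., R}
        r_minus = random.randint(1, R)       # r_x^- uniform on {1, ..., R}
        hallucinated += [(x, '+')] * r_plus + [(x, '-')] * r_minus

    history, mistakes = [], 0
    for x, y in labeled_sequence:
        h = erm(hallucinated + history)      # one ERM computation per example
        mistakes += int(h(x) != y)
        history.append((x, y))
    return mistakes
```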

12 Being more adaptive (shifting bounds)
Within the full sequence (x1,y1),…,(x_i,y_i),…,(x_{i+W},y_{i+W}),…,(xn,yn), compete with the best f ∈ F on every window (x_i,y_i),…,(x_{i+W},y_{i+W}) of width W, not just on the whole sequence.

13 Related work
Inequivalence of batch and online learning in the noiseless setting [Blum90, Balcan06]:
– the ERM black box there is noiseless
– the separation is for computational reasons!
Inefficient algorithm for online transductive learning [Ben-David, Kushilevitz, Mansour 95]:
– list all ≤ (n+1)^{VC(F)} labelings of {x1,…,xn} (Sauer's lemma)
– run weighted majority over them [Littlestone, Warmuth 92]
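
For contrast, a sketch of that inefficient baseline (my reconstruction; enumerating the ≤ (n+1)^{VC(F)} induced labelings is the expensive step, so `labelings` is taken as given, each one a dict mapping a point x to its label):

```python
# Weighted majority [Littlestone, Warmuth 92] over the labelings that F
# induces on {x_1, ..., x_n}. Sauer's lemma bounds their number by
# (n+1)^{VC(F)}, which is why this route is inefficient for large VC(F).
def weighted_majority(labelings, labeled_sequence, beta=0.5):
    weights = [1.0] * len(labelings)
    mistakes = 0
    for x, y in labeled_sequence:
        plus = sum(w for w, lab in zip(weights, labelings) if lab[x] == '+')
        minus = sum(w for w, lab in zip(weights, labelings) if lab[x] == '-')
        mistakes += int(('+' if plus >= minus else '-') != y)
        weights = [w * (beta if lab[x] != y else 1.0)   # penalize wrong experts
                   for w, lab in zip(weights, labelings)]
    return mistakes
```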

14 Conclusions
An efficient algorithm for removing the i.i.d. assumption, using unlabeled data.
An interesting way to use unlabeled data online, reminiscent of bootstrap/bagging.
Adaptive version: can do well on every window.
Open: find the "right" algorithm/analysis.

