Slide 1: Hardness of Learning Halfspaces with Noise

Prasad Raghavendra
Advisor: Venkatesan Guruswami
Slide 2: Spam Problem

(The slide shows a table of training emails, each marked SPAM or NOT SPAM according to which of the words "10 Million", "Lottery", "Cheap", "Pharmacy", "Junk" it contains, feeding into a PERCEPTRON.)

The perceptron classifies an email by taking a weighted sum of binary word-indicator features and comparing it to a threshold. In the example, the feature vector (1, 1, 0, 1, 0) with weights (2, 3, 3, 1, 7) gives 2 x 1 + 3 x 1 + 3 x 0 + 1 x 1 + 7 x 0 = 6; since 6 > 3, the output is SPAM.
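A minimal sketch of this computation (the word list, weights, and threshold are read off the slide; the code itself is illustrative):

```python
# Perceptron evaluation: weighted sum of binary word features vs. a threshold.
words = ["10 Million", "Lottery", "Cheap", "Pharmacy", "Junk"]
weights = [2, 3, 3, 1, 7]   # one weight per word, as on the slide
threshold = 3

def classify(features):
    """features[i] = 1 if words[i] occurs in the email, else 0."""
    score = sum(w * x for w, x in zip(weights, features))
    return "SPAM" if score > threshold else "NOT SPAM"

print(classify([1, 1, 0, 1, 0]))  # 2 + 3 + 0 + 1 + 0 = 6 > 3 -> SPAM
```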
Slide 3: Halfspace Learning Problem

Input: training samples
  Vectors: W_1, W_2, ..., W_m in {-1,1}^n
  Labels: l_1, l_2, ..., l_m in {-1,1}

Output: a separating halfspace (A, θ), with threshold θ, such that
  A · W_i < θ if l_i = -1
  A · W_i ≥ θ if l_i = 1

(The slide's figure shows + and - examples in the plane separated by a line, with SPAM on one side and NOT SPAM on the other.)
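A minimal sketch of the output condition, assuming the sign convention above (function and variable names are mine):

```python
def separates(A, theta, samples, labels):
    """Check the slide's condition: A.W_i >= theta iff l_i = +1."""
    for W, l in zip(samples, labels):
        dot = sum(a * w for a, w in zip(A, W))
        if l == 1 and dot < theta:
            return False
        if l == -1 and dot >= theta:
            return False
    return True

# Tiny example over {-1,1}^2 with halfspace A = (1, 1), theta = 0:
print(separates([1, 1], 0, [[1, 1], [-1, -1]], [1, -1]))  # True
```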
Slide 4: Perspective

Perceptron classifiers are the simplest neural networks and are widely used for classification. Perceptron learning algorithms can learn a halfspace if the data is perfectly separable.
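For concreteness, a sketch of the classic perceptron update rule (a standard algorithm rather than anything specific to this talk; the homogeneous case θ = 0 is assumed for brevity):

```python
def perceptron(samples, labels, max_passes=100):
    """Classic perceptron: on each mistake, add l_i * W_i to the weights.
    Converges if some halfspace through the origin separates the data."""
    n = len(samples[0])
    A = [0] * n
    for _ in range(max_passes):
        mistakes = 0
        for W, l in zip(samples, labels):
            dot = sum(a * w for a, w in zip(A, W))
            if l * dot <= 0:                       # misclassified (or on boundary)
                A = [a + l * w for a, w in zip(A, W)]
                mistakes += 1
        if mistakes == 0:
            return A
    return A
```

On separable data the loop provably terminates with a separating A; on noisy data it need not, which is exactly the regime this talk addresses.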
Slide 5: Inseparability

Who said halfspaces can classify SPAM vs. NOT SPAM?
- The data may be inherently inseparable (the agnostic learning setting).
- Even if the data is separable, what about noise? Noise is inherent in many forms of data (the PAC learning setting).
Slide 6: In the Presence of Noise

Agreement: the fraction of the examples classified correctly. In the figure, the halfspace classifies 16 of the 20 examples correctly: agreement = 0.8, or 80%.

Halfspace Maximum Agreement (HSMA) problem: "Find the hyperplane that maximizes the agreement with the training examples."
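The objective being maximized, as a minimal sketch (names mine):

```python
def agreement(A, theta, samples, labels):
    """Fraction of examples on the correct side of the halfspace (A, theta)."""
    correct = 0
    for W, l in zip(samples, labels):
        dot = sum(a * w for a, w in zip(A, W))
        if (dot >= theta) == (l == 1):
            correct += 1
    return correct / len(samples)
```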
Slide 7: Related Work: Positive Results

Random classification noise (each label flipped with probability less than 1/2):
- [Blum-Frieze-Kannan-Vempala 96]: a PAC learning algorithm that outputs a decision list of halfspaces.
- [Cohen 97]: a proper learning algorithm (one that outputs a halfspace) for learning halfspaces.

Assumptions on the distribution of examples:
- [Kalai-Klivans-Mansour-Servedio 05]: an algorithm that finds a close-to-optimal halfspace when examples come from the uniform or any log-concave distribution.
Slide 8: Related Work: Negative Results

- [Amaldi-Kann 98, Ben-David-Eiron-Long 92]: HSMA is NP-hard to approximate within some constant factor (261/262 and 415/418 respectively).
- [Bshouty-Burroughs 02]: HSMA is NP-hard to approximate better than 84/85.
- [Arora-Babai-Stern-Sweedyk 97, Amaldi-Kann 98]: NP-hard to minimize disagreements within a factor of 2^O(log n).
Slide 9: Open Problem

Given that 99.9% of the examples are correct:
- No algorithm is known that finds a halfspace with agreement of even 51%.
- No hardness result rules out getting an agreement of 99%.

Closing this gap was stated as an open problem by [Blum-Frieze-Kannan-Vempala 96] and highlighted in recent work by [Feldman 06] on the tight (1-ε, 1/2+δ) hardness of learning monomials.
Slide 10: Our Result

For any ε, δ > 0, given a set of training examples, it is NP-hard to distinguish between the following two cases:
- There is a halfspace with agreement 1 - ε.
- No halfspace has agreement greater than 1/2 + δ.

Even with 99.9% of the examples non-noisy, the best we can do is output a random/trivial halfspace!
Slide 11: Remarks

[Feldman-Gopalan-Khot-Ponnuswami 06] independently showed a similar result:
- Our hardness result holds even for Boolean examples in {-1,1}^n (their result holds for R^n).
- [Feldman et al.]'s hardness result is stronger in the sub-constant regime.

We also show: given a system of linear equations over the integers that is (1-ε)-satisfiable, it is NP-hard to find an assignment that satisfies more than a δ fraction of the equations.
Slide 12: Linear Inequalities

Let the halfspace be a_1 x_1 + a_2 x_2 + ... + a_n x_n ≥ θ. The unknowns are A = (a_1, a_2, a_3, a_4) and θ.

Suppose W_1 = (-1, 1, -1, 1) with l_1 = 1. This yields the constraint
  a_1(-1) + a_2(1) + a_3(-1) + a_4(1) ≥ θ.

Learning a halfspace is thus equivalent to solving a system of linear inequalities, e.g.:
  a_1 + a_2 + a_3 + a_4 ≥ θ
  a_1 + a_2 + a_3 - a_4 < θ
  a_1 + a_2 - a_3 + a_4 < θ
  a_1 + a_2 - a_3 + a_4 ≥ θ
  a_1 - a_2 + a_3 - a_4 ≥ θ
  a_1 - a_2 + a_3 + a_4 < θ
  a_1 + a_2 - a_3 - a_4 < θ
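A sketch of this translation as a linear feasibility problem (SciPy is an assumed dependency here; strict inequalities are encoded with a unit margin, which is without loss of generality after scaling whenever a strictly separating halfspace exists):

```python
from scipy.optimize import linprog

def halfspace_feasible(samples, labels):
    """Unknowns z = (a_1, ..., a_n, theta), encoded as A_ub @ z <= b_ub:
       label +1:  -(A.W_i) + theta <= 0     (A.W_i >= theta)
       label -1:   (A.W_i) - theta <= -1    (A.W_i <  theta, unit margin)"""
    n = len(samples[0])
    A_ub, b_ub = [], []
    for W, l in zip(samples, labels):
        if l == 1:
            A_ub.append([-w for w in W] + [1.0]); b_ub.append(0.0)
        else:
            A_ub.append([float(w) for w in W] + [-1.0]); b_ub.append(-1.0)
    res = linprog(c=[0.0] * (n + 1), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (n + 1))
    return res.success

print(halfspace_feasible([[1, 1], [-1, -1]], [1, -1]))  # True
```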
Slide 13: Label Cover Problem

An instance consists of:
- U, V: two sets of vertices
- E: a set of edges between U and V
- {1, 2, ..., R}: a set of labels
- π_e: a projection constraint on each edge e

An assignment A satisfies an edge e = (u, v) in E if π_e(A(u)) = A(v). For example, if A(u) = 3 and π_e(3) = 2, the edge is satisfied exactly when A(v) = 2.

Goal: find an assignment A that satisfies the maximum number of edges.
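A minimal sketch of the objective (the data layout is mine):

```python
def satisfied_edges(edges, pi, assignment):
    """edges: list of (u, v) pairs; pi[e]: dict mapping u's label to the
    v-label it projects to; assignment: dict vertex -> label in {1..R}."""
    count = 0
    for e, (u, v) in enumerate(edges):
        if pi[e].get(assignment[u]) == assignment[v]:
            count += 1
    return count

# One edge with constraint pi_e(3) = 2, satisfied by u -> 3, v -> 2:
print(satisfied_edges([("u", "v")], [{3: 2}], {"u": 3, "v": 2}))  # 1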
Slide 14: Hardness of Label Cover [Raz 98]

There exists γ > 0 such that, given a label cover instance Γ = (U, V, E, R, π), it is NP-hard to distinguish between:
- Γ is completely satisfiable.
- No assignment satisfies more than a 1/R^γ fraction of the edges.
Slide 15: Aim

Starting from the label cover dichotomy (completely SATISFIABLE vs. at most 1/R^γ SATISFIABLE), produce a system of homogeneous inequalities with +1, -1 coefficients over variables a_1, a_2, a_3, a_4, θ, as in the system of Slide 12.
Slide 16: Variables

For each vertex u, introduce R variables u_1, u_2, ..., u_R. Intended solution: if u is assigned label k, then u_k = 1 and u_j = 0 for all j ≠ k.
Slide 17: Equation Tuples

Encode the label cover instance as linear equations:
- Every vertex is assigned exactly one label: for all u, u_1 + u_2 + ... + u_R = 1.
- Consistency: for all u, v, (u_1 + u_2 + ... + u_R) - (v_1 + v_2 + ... + v_R) = 0.
- Edge constraints: for every constraint π_e and every 1 ≤ k ≤ R, ∑ u_i = v_k, where the summation is over all i with π_e(i) = k (e.g., u_1 - v_1 = 0 and u_2 + u_3 - v_2 = 0).
- Most of the variables are zero: pick t variables u_i at random and add the equations u_i = 0.

The equations arising from one random choice form an EQUATION TUPLE.
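A schematic sketch of one tuple's equations for a single edge (u, v), with each equation stored as a coefficient dictionary plus a right-hand side; R, t, and π_e follow the slide, while all names and the data layout are mine:

```python
import random

def equation_tuple(R, pi_e, t):
    """Equations for one edge (u, v), each as ({variable: coefficient}, rhs)."""
    eqs = []
    # u is assigned exactly one label: u_1 + ... + u_R = 1.
    eqs.append(({f"u{i}": 1 for i in range(1, R + 1)}, 1))
    # Consistency: (u_1 + ... + u_R) - (v_1 + ... + v_R) = 0.
    eq = {f"u{i}": 1 for i in range(1, R + 1)}
    eq.update({f"v{k}": -1 for k in range(1, R + 1)})
    eqs.append((eq, 0))
    # Projection: for each k, the sum of u_i with pi_e(i) = k equals v_k.
    for k in range(1, R + 1):
        eq = {f"u{i}": 1 for i in range(1, R + 1) if pi_e[i] == k}
        eq[f"v{k}"] = -1
        eqs.append((eq, 0))
    # Sparsity: t randomly chosen variables are set to zero.
    for i in random.sample(range(1, R + 1), t):
        eqs.append(({f"u{i}": 1}, 0))
    return eqs
```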
Slide 18: Equation Tuples (contd.)

In the SATISFIABLE case, there is an assignment that satisfies most of the equation tuples; in the 1/R^γ-SATISFIABLE case, there is not.

Approximate satisfaction is measured relative to the scaling factor u_1 + u_2 + ... + u_R. For example, the equation u_2 + u_3 - v_2 = 0 is badly violated when |u_2 + u_3 - v_2| > ε(u_1 + u_2 + ... + u_R).
Slide 19: Next Step

In the unsatisfiable case, a tuple such as
  u_1 - v_1 = 0, u_2 + u_3 - v_2 = 0, u_1 + u_2 + u_3 - v_1 - v_2 - v_3 = 0, u_1 = 0, u_3 + v_1 - v_2 = 0
may have just one unsatisfied equation. We need two stronger properties:
- Most tuples have C equations that are not even approximately satisfied.
- Each variable appears exactly once in a tuple, with coefficient +1 or -1.

To get these, introduce several copies of the variables and add consistency checks between the different copies of the same variable.
Slide 20: Recap

- SATISFIABLE case: most tuples are completely satisfied.
- 1/R^γ-SATISFIABLE case: most tuples have C equations that are not even approximately satisfied.
- Each variable appears exactly once in a tuple, with coefficient +1 or -1.

Remaining task: using linear inequalities, distinguish between a tuple that is completely satisfied and one in which at least C of its equations are not even approximately satisfied.
Slide 21: Observation

Observation: if B > 0, the two linear inequalities A - B < 0 and A + B ≥ 0 together express |A| < B (up to the boundary case A = -B).

Pick one of the equation tuples at random, say
  u_1 - v_1 = 0
  u_4 + u_5 - v_2 = 0
  u_6 + u_2 + u_7 - v_4 - v_5 - v_6 = 0
  u_3 = 0
  u_8 + v_3 - v_7 = 0
multiply its equations by random signs in {+1, -1}, and add them up to get a single left-hand side
  L = u_1 - u_2 + u_3 + u_4 + u_5 - u_6 - u_7 - u_8 - v_1 - v_2 - v_3 + v_4 + v_5 + v_6 + v_7.
With the scaling factor u_1 + u_2 + ... + u_R playing the role of B, output the pair of inequalities
  L - (u_1 + u_2 + ... + u_R) < 0
  L + (u_1 + u_2 + ... + u_R) ≥ 0
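The absolute-value trick behind this pair of inequalities, written out (a standard fact, stated here for B > 0):

```latex
% For B > 0, a pair of linear inequalities captures |A| < B
% (up to the boundary case A = -B):
\[
  A - B < 0 \;\text{ and }\; A + B \ge 0
  \iff -B \le A < B .
\]
% Here A is the random signed combination L of the tuple's equations and
% B is the scaling factor u_1 + u_2 + \cdots + u_R.
```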
Slide 22: Good Case

Suppose the assignment satisfies every equation of the chosen tuple (e.g., u_1 - v_1 = 0, u_2 + u_3 - v_2 = 0, u_1 + u_2 + u_3 - v_1 - v_2 - v_3 = 0, u_1 = 0, u_3 + v_1 - v_2 = 0). Then the random signed combination L is also 0. The assignment additionally satisfies u_1 + u_2 + ... + u_R = 1, so
  L - (u_1 + u_2 + ... + u_R) = -1 < 0
  L + (u_1 + u_2 + ... + u_R) = +1 ≥ 0
and BOTH INEQUALITIES are SATISFIED. This happens with high probability over the choice of tuples.
Slide 23: Bad Case

Suppose at least C equations of the tuple are badly violated, i.e. their left-hand sides exceed ε(u_1 + u_2 + ... + u_R) in absolute value (e.g. |u_2 + u_3 - v_2| > ε(u_1 + u_2 + ... + u_R)). For large enough C, with high probability over the choice of the +1/-1 combination,
  |L| > u_1 + u_2 + ... + u_R,
in which case AT MOST ONE of the two inequalities
  L - (u_1 + u_2 + ... + u_R) < 0
  L + (u_1 + u_2 + ... + u_R) ≥ 0
is satisfied. This happens with high probability over the choice of equation tuple.
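An empirical sketch of this anticoncentration step, using fully independent random signs in place of the derandomized sign set of the actual proof (all parameters are illustrative):

```python
import random

def escape_probability(values, B, trials=2000, seed=0):
    """Estimate Pr over uniform signs s_j in {-1,1} that |sum_j s_j * a_j| > B."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        total = sum(rng.choice((-1, 1)) * a for a in values)
        if abs(total) > B:
            hits += 1
    return hits / trials

# C = 2000 badly violated equations, each with |LHS| = 0.3 (> eps * B for,
# say, eps = 0.25 and scaling factor B = 1):
print(escape_probability([0.3] * 2000, B=1.0))  # roughly 0.95 here
```

The escape probability tends to 1 as C grows, matching the slide's "for large enough C" condition.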
Slide 24: Interesting Set of Vectors

The set of all possible {-1,1} sign combinations is exponentially large. Instead, construct a polynomial-size subset S of {-1,1}^n such that for any vector v = (v_1, v_2, ..., v_n) with sufficiently many large coordinates (> ε), at least a 1 - δ fraction of the vectors u in S satisfy |u·v| > 1.

The construction uses a 4-wise independent family and a random grouping of coordinates.
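Stated formally (quantifier placement is mine; m(ε, δ) denotes the unspecified "sufficiently many" threshold):

```latex
% Desired property of the polynomial-size set S \subseteq \{-1,1\}^n:
% for every v \in \mathbb{R}^n with at least m(\varepsilon, \delta)
% coordinates of magnitude greater than \varepsilon,
\[
  \Pr_{u \in S}\bigl[\, |u \cdot v| > 1 \,\bigr] \;\ge\; 1 - \delta .
\]
```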
Slide 25: Construction

(Figure: a block of coordinates V_1, ..., V_7, each of magnitude > ε, hit with a sign pattern such as -V_1 + V_2 - V_3 + V_4 - V_5 + V_6 + V_7, giving an inner product of absolute value > 1.) With a four-wise independent family of sign patterns this happens with some constant probability; with all 2^n fully independent combinations it happens with probability close to 1.
Slide 26: Construction (contd.)

(Figure: the coordinates V_1, V_2, ..., V_n, each of magnitude > ε, are randomly partitioned into groups S_1, S_2, ..., S_L. Signs within the groups come from a 4-wise independent set; by the independence of the grouping, each group succeeds with constant probability, and by Chernoff bounds the guarantee holds over all the groups.)
Slide 27: Conclusion

Either an assumption on the distribution of the examples or an assumption on the noise is necessary for efficient halfspace learning algorithms.

[Raghavendra-Venkatesan] A similar hardness result holds for learning support vector machines in the presence of adversarial noise.
Slide 28: THANK YOU
Slide 29: Details

- All possible {-1,1} combinations form an exponentially large set; the construction instead uses a 4-wise independent family and a random grouping of coordinates.
- No variable should occur more than once in an equation tuple, to ensure that the final inequalities all have coefficients in {-1,1}; this is arranged by using different copies of the variables for different equations, with a careful choice of consistency checks.
Slide 30: Interesting Set of Vectors (restated)

Construct a polynomial-size subset S of {-1,1}^n such that for any vector v = (v_1, v_2, ..., v_n) with sufficiently many large coordinates (> ε), at most a δ fraction of the vectors u in S satisfy |u·v| < 1. (All possible {-1,1} combinations would form an exponentially large set.) The construction uses a 4-wise independent family and a random grouping of coordinates.
Slide 31: ε-Satisfaction

Consider an equation tuple such as:
  u_1 - v_1 = 0
  u_2 + u_3 - v_2 = 0
  u_1 + u_2 + u_3 - v_1 - v_2 - v_3 = 0
  u_1 = 0
  u_3 + v_1 - v_2 = 0

An assignment A ε-satisfies an equation if its left-hand side is small relative to the scaling factor, e.g. |u_2 + u_3 - v_2| < ε(u_1 + u_2 + u_3); A ε-satisfies an equation tuple if it ε-satisfies all the equations in the tuple.
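The definition written out (the coefficients c_i stand for a generic equation of the tuple; here R = 3, so the scaling factor is u_1 + u_2 + u_3):

```latex
% An assignment eps-satisfies an equation of the tuple with left-hand
% side \sum_i c_i x_i when
\[
  \Bigl|\sum_i c_i x_i\Bigr| \;<\; \varepsilon\,(u_1 + u_2 + \cdots + u_R),
\]
% and eps-satisfies the whole tuple when this holds for every equation in it.
```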