Slide 1: Hardness of Learning Halfspaces with Noise

Prasad Raghavendra
Advisor: Venkatesan Guruswami
Slide 2: Spam Problem

(The slide shows a table of training emails, each marked SPAM or NOT SPAM according to which of the words "10 Million", "Lottery", "Cheap", "Pharmacy", "Junk" it contains, feeding into a PERCEPTRON.)

The perceptron classifies an email by taking a weighted sum of binary word-indicator features and comparing it to a threshold. In the example, the feature vector (1, 1, 0, 1, 0) with weights (2, 3, 3, 1, 7) gives 2 x 1 + 3 x 1 + 3 x 0 + 1 x 1 + 7 x 0 = 6; since 6 > 3, the output is SPAM.
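A minimal sketch of this computation (the word list, weights, and threshold are read off the slide; the code itself is illustrative):

```python
# Perceptron evaluation: weighted sum of binary word features vs. a threshold.
words = ["10 Million", "Lottery", "Cheap", "Pharmacy", "Junk"]
weights = [2, 3, 3, 1, 7]   # one weight per word, as on the slide
threshold = 3

def classify(features):
    """features[i] = 1 if words[i] occurs in the email, else 0."""
    score = sum(w * x for w, x in zip(weights, features))
    return "SPAM" if score > threshold else "NOT SPAM"

print(classify([1, 1, 0, 1, 0]))  # 2 + 3 + 0 + 1 + 0 = 6 > 3 -> SPAM
```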
Slide 3: Halfspace Learning Problem

Input: training samples
  Vectors: W_1, W_2, ..., W_m in {-1,1}^n
  Labels: l_1, l_2, ..., l_m in {-1,1}

Output: a separating halfspace (A, θ), with threshold θ, such that
  A · W_i < θ if l_i = -1
  A · W_i ≥ θ if l_i = 1

(The slide's figure shows + and - examples in the plane separated by a line, with SPAM on one side and NOT SPAM on the other.)
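A minimal sketch of the output condition, assuming the sign convention above (function and variable names are mine):

```python
def separates(A, theta, samples, labels):
    """Check the slide's condition: A.W_i >= theta iff l_i = +1."""
    for W, l in zip(samples, labels):
        dot = sum(a * w for a, w in zip(A, W))
        if l == 1 and dot < theta:
            return False
        if l == -1 and dot >= theta:
            return False
    return True

# Tiny example over {-1,1}^2 with halfspace A = (1, 1), theta = 0:
print(separates([1, 1], 0, [[1, 1], [-1, -1]], [1, -1]))  # True
```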
Slide 4: Perspective

Perceptron classifiers are the simplest neural networks and are widely used for classification. Perceptron learning algorithms can learn a halfspace if the data is perfectly separable.
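For concreteness, a sketch of the classic perceptron update rule (a standard algorithm rather than anything specific to this talk; the homogeneous case θ = 0 is assumed for brevity):

```python
def perceptron(samples, labels, max_passes=100):
    """Classic perceptron: on each mistake, add l_i * W_i to the weights.
    Converges if some halfspace through the origin separates the data."""
    n = len(samples[0])
    A = [0] * n
    for _ in range(max_passes):
        mistakes = 0
        for W, l in zip(samples, labels):
            dot = sum(a * w for a, w in zip(A, W))
            if l * dot <= 0:                       # misclassified (or on boundary)
                A = [a + l * w for a, w in zip(A, W)]
                mistakes += 1
        if mistakes == 0:
            return A
    return A
```

On separable data the loop provably terminates with a separating A; on noisy data it need not, which is exactly the regime this talk addresses.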
Slide 5: Inseparability

Who said halfspaces can classify SPAM vs. NOT SPAM?
- The data may be inherently inseparable (the agnostic learning setting).
- Even if the data is separable, what about noise? Noise is inherent in many forms of data (the PAC learning setting).
Slide 6: In the Presence of Noise

Agreement: the fraction of the examples classified correctly. In the figure, the halfspace classifies 16 of the 20 examples correctly: agreement = 0.8, or 80%.

Halfspace Maximum Agreement (HSMA) problem: "Find the hyperplane that maximizes the agreement with the training examples."
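The objective being maximized, as a minimal sketch (names mine):

```python
def agreement(A, theta, samples, labels):
    """Fraction of examples on the correct side of the halfspace (A, theta)."""
    correct = 0
    for W, l in zip(samples, labels):
        dot = sum(a * w for a, w in zip(A, W))
        if (dot >= theta) == (l == 1):
            correct += 1
    return correct / len(samples)
```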
Slide 7: Related Work: Positive Results

Random classification noise (each label flipped with probability less than 1/2):
- [Blum-Frieze-Kannan-Vempala 96]: a PAC learning algorithm that outputs a decision list of halfspaces.
- [Cohen 97]: a proper learning algorithm (one that outputs a halfspace) for learning halfspaces.

Assumptions on the distribution of examples:
- [Kalai-Klivans-Mansour-Servedio 05]: an algorithm that finds a close-to-optimal halfspace when examples come from the uniform or any log-concave distribution.
Slide 8: Related Work: Negative Results

- [Amaldi-Kann 98, Ben-David-Eiron-Long 92]: HSMA is NP-hard to approximate within some constant factor (261/262 and 415/418 respectively).
- [Bshouty-Burroughs 02]: HSMA is NP-hard to approximate better than 84/85.
- [Arora-Babai-Stern-Sweedyk 97, Amaldi-Kann 98]: NP-hard to minimize disagreements within a factor of 2^O(log n).
Slide 9: Open Problem

Given that 99.9% of the examples are correct:
- No algorithm is known that finds a halfspace with agreement of even 51%.
- No hardness result rules out getting an agreement of 99%.

Closing this gap was stated as an open problem by [Blum-Frieze-Kannan-Vempala 96] and highlighted in recent work by [Feldman 06] on the tight (1-ε, 1/2+δ) hardness of learning monomials.
Slide 10: Our Result

For any ε, δ > 0, given a set of training examples, it is NP-hard to distinguish between the following two cases:
- There is a halfspace with agreement 1 - ε.
- No halfspace has agreement greater than 1/2 + δ.

Even with 99.9% of the examples non-noisy, the best we can do is output a random/trivial halfspace!
Slide 11: Remarks

[Feldman-Gopalan-Khot-Ponnuswami 06] independently showed a similar result:
- Our hardness result holds even for Boolean examples in {-1,1}^n (their result holds for R^n).
- [Feldman et al.]'s hardness result is stronger in the sub-constant regime.

We also show: given a system of linear equations over the integers that is (1-ε)-satisfiable, it is NP-hard to find an assignment that satisfies more than a δ fraction of the equations.
Slide 12: Linear Inequalities

Let the halfspace be a_1 x_1 + a_2 x_2 + ... + a_n x_n ≥ θ. The unknowns are A = (a_1, a_2, a_3, a_4) and θ.

Suppose W_1 = (-1, 1, -1, 1) with l_1 = 1. This yields the constraint
  a_1(-1) + a_2(1) + a_3(-1) + a_4(1) ≥ θ.

Learning a halfspace is thus equivalent to solving a system of linear inequalities, e.g.:
  a_1 + a_2 + a_3 + a_4 ≥ θ
  a_1 + a_2 + a_3 - a_4 < θ
  a_1 + a_2 - a_3 + a_4 < θ
  a_1 + a_2 - a_3 + a_4 ≥ θ
  a_1 - a_2 + a_3 - a_4 ≥ θ
  a_1 - a_2 + a_3 + a_4 < θ
  a_1 + a_2 - a_3 - a_4 < θ
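A sketch of this translation as a linear feasibility problem (SciPy is an assumed dependency here; strict inequalities are encoded with a unit margin, which is without loss of generality after scaling whenever a strictly separating halfspace exists):

```python
from scipy.optimize import linprog

def halfspace_feasible(samples, labels):
    """Unknowns z = (a_1, ..., a_n, theta), encoded as A_ub @ z <= b_ub:
       label +1:  -(A.W_i) + theta <= 0     (A.W_i >= theta)
       label -1:   (A.W_i) - theta <= -1    (A.W_i <  theta, unit margin)"""
    n = len(samples[0])
    A_ub, b_ub = [], []
    for W, l in zip(samples, labels):
        if l == 1:
            A_ub.append([-w for w in W] + [1.0]); b_ub.append(0.0)
        else:
            A_ub.append([float(w) for w in W] + [-1.0]); b_ub.append(-1.0)
    res = linprog(c=[0.0] * (n + 1), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (n + 1))
    return res.success

print(halfspace_feasible([[1, 1], [-1, -1]], [1, -1]))  # True
```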
Slide 13: Label Cover Problem

An instance consists of:
- U, V: two sets of vertices
- E: a set of edges between U and V
- {1, 2, ..., R}: a set of labels
- π_e: a projection constraint on each edge e

An assignment A satisfies an edge e = (u, v) in E if π_e(A(u)) = A(v). For example, if A(u) = 3 and π_e(3) = 2, the edge is satisfied exactly when A(v) = 2.

Goal: find an assignment A that satisfies the maximum number of edges.
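A minimal sketch of the objective (the data layout is mine):

```python
def satisfied_edges(edges, pi, assignment):
    """edges: list of (u, v) pairs; pi[e]: dict mapping u's label to the
    v-label it projects to; assignment: dict vertex -> label in {1..R}."""
    count = 0
    for e, (u, v) in enumerate(edges):
        if pi[e].get(assignment[u]) == assignment[v]:
            count += 1
    return count

# One edge with constraint pi_e(3) = 2, satisfied by u -> 3, v -> 2:
print(satisfied_edges([("u", "v")], [{3: 2}], {"u": 3, "v": 2}))  # 1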
Slide 14: Hardness of Label Cover [Raz 98]

There exists γ > 0 such that, given a label cover instance Γ = (U, V, E, R, π), it is NP-hard to distinguish between:
- Γ is completely satisfiable.
- No assignment satisfies more than a 1/R^γ fraction of the edges.
Slide 15: Aim

Starting from the label cover dichotomy (completely SATISFIABLE vs. at most 1/R^γ SATISFIABLE), produce a system of homogeneous inequalities with +1, -1 coefficients over variables a_1, a_2, a_3, a_4, θ, as in the system of Slide 12.
Slide 16: Variables

For each vertex u, introduce R variables u_1, u_2, ..., u_R. Intended solution: if u is assigned label k, then u_k = 1 and u_j = 0 for all j ≠ k.
Slide 17: Equation Tuples

Encode the label cover instance as linear equations:
- Every vertex is assigned exactly one label: for all u, u_1 + u_2 + ... + u_R = 1.
- Consistency: for all u, v, (u_1 + u_2 + ... + u_R) - (v_1 + v_2 + ... + v_R) = 0.
- Edge constraints: for every constraint π_e and every 1 ≤ k ≤ R, ∑ u_i = v_k, where the summation is over all i with π_e(i) = k (e.g., u_1 - v_1 = 0 and u_2 + u_3 - v_2 = 0).
- Most of the variables are zero: pick t variables u_i at random and add the equations u_i = 0.

The equations arising from one random choice form an EQUATION TUPLE.
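A schematic sketch of one tuple's equations for a single edge (u, v), with each equation stored as a coefficient dictionary plus a right-hand side; R, t, and π_e follow the slide, while all names and the data layout are mine:

```python
import random

def equation_tuple(R, pi_e, t):
    """Equations for one edge (u, v), each as ({variable: coefficient}, rhs)."""
    eqs = []
    # u is assigned exactly one label: u_1 + ... + u_R = 1.
    eqs.append(({f"u{i}": 1 for i in range(1, R + 1)}, 1))
    # Consistency: (u_1 + ... + u_R) - (v_1 + ... + v_R) = 0.
    eq = {f"u{i}": 1 for i in range(1, R + 1)}
    eq.update({f"v{k}": -1 for k in range(1, R + 1)})
    eqs.append((eq, 0))
    # Projection: for each k, the sum of u_i with pi_e(i) = k equals v_k.
    for k in range(1, R + 1):
        eq = {f"u{i}": 1 for i in range(1, R + 1) if pi_e[i] == k}
        eq[f"v{k}"] = -1
        eqs.append((eq, 0))
    # Sparsity: t randomly chosen variables are set to zero.
    for i in random.sample(range(1, R + 1), t):
        eqs.append(({f"u{i}": 1}, 0))
    return eqs
```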
Slide 18: Equation Tuples (contd.)

In the SATISFIABLE case, there is an assignment that satisfies most of the equation tuples; in the 1/R^γ-SATISFIABLE case, there is not.

Approximate satisfaction is measured relative to the scaling factor u_1 + u_2 + ... + u_R. For example, the equation u_2 + u_3 - v_2 = 0 is badly violated when |u_2 + u_3 - v_2| > ε(u_1 + u_2 + ... + u_R).
Slide 19: Next Step

In the unsatisfiable case, a tuple such as
  u_1 - v_1 = 0, u_2 + u_3 - v_2 = 0, u_1 + u_2 + u_3 - v_1 - v_2 - v_3 = 0, u_1 = 0, u_3 + v_1 - v_2 = 0
may have just one unsatisfied equation. We need two stronger properties:
- Most tuples have C equations that are not even approximately satisfied.
- Each variable appears exactly once in a tuple, with coefficient +1 or -1.

To get these, introduce several copies of the variables and add consistency checks between the different copies of the same variable.
Slide 20: Recap

- SATISFIABLE case: most tuples are completely satisfied.
- 1/R^γ-SATISFIABLE case: most tuples have C equations that are not even approximately satisfied.
- Each variable appears exactly once in a tuple, with coefficient +1 or -1.

Remaining task: using linear inequalities, distinguish between a tuple that is completely satisfied and one in which at least C of its equations are not even approximately satisfied.
Slide 21: Observation

Observation: if B > 0, the two linear inequalities A - B < 0 and A + B ≥ 0 together express |A| < B (up to the boundary case A = -B).

Pick one of the equation tuples at random, say
  u_1 - v_1 = 0
  u_4 + u_5 - v_2 = 0
  u_6 + u_2 + u_7 - v_4 - v_5 - v_6 = 0
  u_3 = 0
  u_8 + v_3 - v_7 = 0
multiply its equations by random signs in {+1, -1}, and add them up to get a single left-hand side
  L = u_1 - u_2 + u_3 + u_4 + u_5 - u_6 - u_7 - u_8 - v_1 - v_2 - v_3 + v_4 + v_5 + v_6 + v_7.
With the scaling factor u_1 + u_2 + ... + u_R playing the role of B, output the pair of inequalities
  L - (u_1 + u_2 + ... + u_R) < 0
  L + (u_1 + u_2 + ... + u_R) ≥ 0
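The absolute-value trick behind this pair of inequalities, written out (a standard fact, stated here for B > 0):

```latex
% For B > 0, a pair of linear inequalities captures |A| < B
% (up to the boundary case A = -B):
\[
  A - B < 0 \;\text{ and }\; A + B \ge 0
  \iff -B \le A < B .
\]
% Here A is the random signed combination L of the tuple's equations and
% B is the scaling factor u_1 + u_2 + \cdots + u_R.
```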
Slide 22: Good Case

Suppose the assignment satisfies every equation of the chosen tuple (e.g., u_1 - v_1 = 0, u_2 + u_3 - v_2 = 0, u_1 + u_2 + u_3 - v_1 - v_2 - v_3 = 0, u_1 = 0, u_3 + v_1 - v_2 = 0). Then the random signed combination L is also 0. The assignment additionally satisfies u_1 + u_2 + ... + u_R = 1, so
  L - (u_1 + u_2 + ... + u_R) = -1 < 0
  L + (u_1 + u_2 + ... + u_R) = +1 ≥ 0
and BOTH INEQUALITIES are SATISFIED. This happens with high probability over the choice of tuples.
Slide 23: Bad Case

Suppose at least C equations of the tuple are badly violated, i.e. their left-hand sides exceed ε(u_1 + u_2 + ... + u_R) in absolute value (e.g. |u_2 + u_3 - v_2| > ε(u_1 + u_2 + ... + u_R)). For large enough C, with high probability over the choice of the +1/-1 combination,
  |L| > u_1 + u_2 + ... + u_R,
in which case AT MOST ONE of the two inequalities
  L - (u_1 + u_2 + ... + u_R) < 0
  L + (u_1 + u_2 + ... + u_R) ≥ 0
is satisfied. This happens with high probability over the choice of equation tuple.
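An empirical sketch of this anticoncentration step, using fully independent random signs in place of the derandomized sign set of the actual proof (all parameters are illustrative):

```python
import random

def escape_probability(values, B, trials=2000, seed=0):
    """Estimate Pr over uniform signs s_j in {-1,1} that |sum_j s_j * a_j| > B."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        total = sum(rng.choice((-1, 1)) * a for a in values)
        if abs(total) > B:
            hits += 1
    return hits / trials

# C = 2000 badly violated equations, each with |LHS| = 0.3 (> eps * B for,
# say, eps = 0.25 and scaling factor B = 1):
print(escape_probability([0.3] * 2000, B=1.0))  # roughly 0.95 here
```

The escape probability tends to 1 as C grows, matching the slide's "for large enough C" condition.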
Slide 24: Interesting Set of Vectors

The set of all possible {-1,1} sign combinations is exponentially large. Instead, construct a polynomial-size subset S of {-1,1}^n such that for any vector v = (v_1, v_2, ..., v_n) with sufficiently many large coordinates (> ε), at least a 1 - δ fraction of the vectors u in S satisfy |u·v| > 1.

The construction uses a 4-wise independent family and a random grouping of coordinates.
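Stated formally (quantifier placement is mine; m(ε, δ) denotes the unspecified "sufficiently many" threshold):

```latex
% Desired property of the polynomial-size set S \subseteq \{-1,1\}^n:
% for every v \in \mathbb{R}^n with at least m(\varepsilon, \delta)
% coordinates of magnitude greater than \varepsilon,
\[
  \Pr_{u \in S}\bigl[\, |u \cdot v| > 1 \,\bigr] \;\ge\; 1 - \delta .
\]
```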
Slide 25: Construction

(Figure: a block of coordinates V_1, ..., V_7, each of magnitude > ε, hit with a sign pattern such as -V_1 + V_2 - V_3 + V_4 - V_5 + V_6 + V_7, giving an inner product of absolute value > 1.) With a four-wise independent family of sign patterns this happens with some constant probability; with all 2^n fully independent combinations it happens with probability close to 1.
Slide 26: Construction (contd.)

(Figure: the coordinates V_1, V_2, ..., V_n, each of magnitude > ε, are randomly partitioned into groups S_1, S_2, ..., S_L. Signs within the groups come from a 4-wise independent set; by the independence of the grouping, each group succeeds with constant probability, and by Chernoff bounds the guarantee holds over all the groups.)
Slide 27: Conclusion

Either an assumption on the distribution of the examples or an assumption on the noise is necessary for efficient halfspace learning algorithms.

[Raghavendra-Venkatesan] A similar hardness result holds for learning support vector machines in the presence of adversarial noise.
Slide 28: THANK YOU
Slide 29: Details

- All possible {-1,1} combinations form an exponentially large set; the construction instead uses a 4-wise independent family and a random grouping of coordinates.
- No variable should occur more than once in an equation tuple, to ensure that the final inequalities all have coefficients in {-1,1}; this is arranged by using different copies of the variables for different equations, with a careful choice of consistency checks.
Slide 30: Interesting Set of Vectors (restated)

Construct a polynomial-size subset S of {-1,1}^n such that for any vector v = (v_1, v_2, ..., v_n) with sufficiently many large coordinates (> ε), at most a δ fraction of the vectors u in S satisfy |u·v| < 1. (All possible {-1,1} combinations would form an exponentially large set.) The construction uses a 4-wise independent family and a random grouping of coordinates.
Slide 31: ε-Satisfaction

Consider an equation tuple such as:
  u_1 - v_1 = 0
  u_2 + u_3 - v_2 = 0
  u_1 + u_2 + u_3 - v_1 - v_2 - v_3 = 0
  u_1 = 0
  u_3 + v_1 - v_2 = 0

An assignment A ε-satisfies an equation if its left-hand side is small relative to the scaling factor, e.g. |u_2 + u_3 - v_2| < ε(u_1 + u_2 + u_3); A ε-satisfies an equation tuple if it ε-satisfies all the equations in the tuple.
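The definition written out (the coefficients c_i stand for a generic equation of the tuple; here R = 3, so the scaling factor is u_1 + u_2 + u_3):

```latex
% An assignment eps-satisfies an equation of the tuple with left-hand
% side \sum_i c_i x_i when
\[
  \Bigl|\sum_i c_i x_i\Bigr| \;<\; \varepsilon\,(u_1 + u_2 + \cdots + u_R),
\]
% and eps-satisfies the whole tuple when this holds for every equation in it.
```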