A Quarter-Century of Efficient Learnability
Rocco Servedio, Columbia University
Valiant 60th Birthday Symposium, Bethesda, Maryland, May 30, 2009
1984 and of course...
Probably Approximately Correct learning [Valiant84]
[Valiant84] presents a range of learning models and example oracles; typically:
A concept class C of Boolean functions over a domain X (e.g., X = {0,1}^n).
An unknown target concept c ∈ C to be learned from examples.
An unknown and arbitrary distribution D over X, modeling the (possibly complex) world.
The learner has access to i.i.d. draws from D labeled according to c: each example (x, c(x)) has x ∈ X drawn i.i.d. from D.
PAC learning a concept class C
Learner’s goal: come up with a hypothesis h that will have high accuracy on future examples, and do so efficiently.
For any target function c ∈ C and any distribution D over X, with probability ≥ 1 − δ the learner outputs a hypothesis h that is (1 − ε)-accurate w.r.t. D.
Algorithm must be computationally efficient: should run in time poly(n, 1/ε, 1/δ).
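Stated as a formula (a standard rendering of the PAC criterion; including size(c) in the poly(·) bound is an assumption about how the slide's bound is usually written):

```latex
% PAC criterion (for every c in C and every distribution D over X):
% given i.i.d. examples (x, c(x)) with x ~ D, the learner outputs h satisfying
\Pr\Big[\ \Pr_{x \sim D}\big[\, h(x) \neq c(x) \,\big] \le \varepsilon \ \Big] \ \ge\ 1 - \delta ,
% with running time (and hence sample size) at most
\mathrm{poly}\big(n,\ 1/\varepsilon,\ 1/\delta,\ \mathrm{size}(c)\big).
```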
So, what can be learned efficiently?
The PAC model, and its variants, provide a clean theoretical framework for studying the computational complexity of learning problems.
From [Valiant84]: “The results of learnability theory would then indicate the maximum granularity of the single concepts that can be acquired without programming.” “This paper attempts to explore the limits of what is learnable as allowed by algorithmic complexity….The identification of these limits is a major goal of the line of work proposed in this paper.”
25 years of efficient learnability
Valiant didn’t just ask the question “what can be learned efficiently” – he did a great deal towards answering it. (This talk highlights some of those contributions and how the field has evolved since then.)
In the rest of the 1980s, Valiant & colleagues gave remarkable results on the abilities and limitations of computationally efficient learning algorithms. This work introduced research directions and questions that continue to be intensively studied to this day.
Rest of talk: survey some positive results (algorithms) and negative results (two flavors of hardness results).
Positive results: learning k-DNF
Theorem [Valiant84]: k-DNF is learnable in polynomial time for any k = O(1).
Idea (e.g., k = 2): view a k-DNF as a disjunction over “metavariables” (conjunctions of at most k literals), and learn that disjunction using elimination (sketched below).
25 years later: improving this to superconstant k is still a major open question! Much has been learned in trying for this improvement…
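A minimal sketch of the elimination idea (function and variable names are illustrative, not from the talk): enumerate all conjunctions of at most k literals as metavariables, delete every metavariable that some negative example satisfies, and output the OR of the survivors.

```python
from itertools import combinations, product

def learn_k_dnf(examples, n, k):
    """Elimination-style sketch for learning a k-DNF over n Boolean variables.

    examples: list of (x, label) with x a 0/1 tuple of length n, label in {0, 1}.
    """
    def satisfies(term, x):
        # term is a set of literals (i, sign): variable i must equal sign.
        return all(x[i] == sign for i, sign in term)

    # All conjunctions ("metavariables") of at most k literals.
    terms = set()
    for size in range(1, k + 1):
        for idxs in combinations(range(n), size):
            for signs in product((0, 1), repeat=size):
                terms.add(frozenset(zip(idxs, signs)))

    # Elimination: any term satisfied by a negative example cannot belong to the target.
    for x, label in examples:
        if label == 0:
            terms = {t for t in terms if not satisfies(t, x)}

    # Hypothesis: OR of the surviving terms.
    return lambda x: any(satisfies(t, x) for t in terms)
```

No term of the target k-DNF is ever deleted (a negative example falsifies every target term), so the hypothesis agrees with the target on positives; the number of metavariables is O((2n)^k), which is where the k = O(1) requirement comes from.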
Poly-time PAC learning, general distributions
Decision lists (greedy alg.) [Rivest87]
Halfspaces (poly-time LP) [Littlestone87, BEHW89, …]
Parities, integer lattices (Gaussian elim.) [HelmboldSloanWarmuth92, FischerSimon92]
Restricted types of branching programs (DL + parities) [ErgunKumarRubinfeld95, BshoutyTamonWilson98]
Geometric concept classes (…random projections…) [BshoutyChenHomer94, BGMST98, Vempala99, …]
and more…
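For the parity item, each labeled example is one linear equation over GF(2) in the unknown indicator vector of the parity, so a consistent hypothesis can be found by Gaussian elimination; a rough sketch (names are illustrative):

```python
def learn_parity(examples, n):
    """Find a parity (subset S of [n]) consistent with labeled examples.

    Each example (x, label), with x a 0/1 tuple, gives the GF(2) equation
    sum_{i in S} x[i] = label; Gaussian elimination solves for an indicator
    vector of S.
    """
    # Augmented matrix over GF(2): one row [x | label] per example.
    rows = [list(x) + [label] for x, label in examples]
    pivot_cols = []
    r = 0
    for col in range(n):
        # Find a row with a 1 in this column at or below position r.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] == 1), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate this column from every other row (Gauss-Jordan over GF(2)).
        for i in range(len(rows)):
            if i != r and rows[i][col] == 1:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(col)
        r += 1
    # Set free variables to 0; each pivot variable equals its reduced right-hand side.
    s = [0] * n
    for row_idx, col in enumerate(pivot_cols):
        s[col] = rows[row_idx][n]
    return s  # indicator vector of the hypothesis parity
```

Any solution of the system is consistent with the sample, so the returned vector defines a hypothesis x ↦ ⊕_{i: s[i]=1} x_i that agrees with the target parity on every training example.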
General-distribution PAC learning, cont
Quasi-poly / sub-exponential-time learning:
poly-size decision trees [EhrenfeuchtHaussler89, Blum92]
poly-size DNF [Bshouty96, TaruiTsukiji99, KlivansS01]
intersections of few poly(n)-weight halfspaces [KlivansO’DonnellS02]
“PTF method” (halfspaces + metavariables) – link with complexity theory
[figures: a small decision tree and a DNF formula (an OR of ANDs)]
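The “PTF method” in one picture: if every function in the class is the sign of a low-degree polynomial, expand each example into its low-degree monomials (the metavariables) and find a consistent halfspace over the expanded features, e.g. with a feasibility LP. A hedged sketch (assumes scipy is available; the degree d is set by the relevant representation theorem, e.g. Õ(n^{1/3}) for poly-size DNF in [KlivansS01]):

```python
from itertools import combinations
import numpy as np
from scipy.optimize import linprog

def expand(x, d):
    """Map a {0,1} vector to all monomials of degree <= d (the 'metavariables')."""
    feats = [1.0]  # constant term
    for size in range(1, d + 1):
        for idxs in combinations(range(len(x)), size):
            feats.append(float(all(x[i] for i in idxs)))
    return feats

def learn_ptf(examples, d):
    """Find a degree-d polynomial threshold function consistent with the data.

    examples: list of (x, y) with x a 0/1 tuple and y in {-1, +1}.
    Solves the feasibility LP: y * <w, expand(x)> >= 1 for every example.
    """
    Phi = np.array([expand(x, d) for x, _ in examples])
    y = np.array([label for _, label in examples], dtype=float)
    # Constraints -y_i * <w, phi_i> <= -1; no objective (pure feasibility).
    A_ub = -(y[:, None] * Phi)
    b_ub = -np.ones(len(examples))
    res = linprog(c=np.zeros(Phi.shape[1]), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * Phi.shape[1])
    if not res.success:  # no degree-d PTF is consistent with the sample
        return None
    w = res.x
    return lambda x: 1 if np.dot(w, expand(x, d)) > 0 else -1
```

The feature expansion has n^{O(d)} coordinates, so degree d translates directly into n^{O(d)} running time, which is the source of the quasi-poly and 2^{Õ(n^{1/3})}-type bounds above.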
Distribution-specific learning
Theorem [KearnsLiValiant87]: monotone Boolean functions can be weakly learned (accuracy 1/2 + Ω(1/n)) in poly time under the uniform distribution on {0,1}^n.
Ushered in the study of algorithms for uniform-distribution and distribution-specific learning: halfspaces [Baum90], DNF [Verbeurgt90, Jackson95], decision trees [KushilevitzMansour93], AC0 [LinialMansourNisan89, FurstJacksonSmith91], extended AC0 [JacksonKlivansS02], juntas [MosselO’DonnellS03], general monotone functions [BshoutyTamon96, BlumBurchLangford98, O’DonnellWimmer09], monotone decision trees [O’DonnellS06], intersections of halfspaces [BlumKannan94, Vempala97, KwekPitt98, KlivansO’DonnellS08], convex sets, much more…
Key tool: Fourier analysis of Boolean functions.
Recently come full circle on monotone functions – [O’DonnellWimmer09]: poly time, accuracy 1/2 + Ω(log n / √n): optimal! (by [BlumBurchLangford98])
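Much of this line of work rests on one primitive: under the uniform distribution, each Fourier coefficient f̂(S) = E[f(x)·χ_S(x)] can be estimated from random examples, and summing the estimated low-degree part gives the “low-degree algorithm” in the style of [LinialMansourNisan89]. A hedged sketch (names are illustrative):

```python
from itertools import combinations

def chi(S, x):
    """Parity character chi_S(x) = prod_{i in S} x_i, with x in {-1, +1}^n."""
    p = 1
    for i in S:
        p *= x[i]
    return p

def estimate_low_degree(examples, n, d):
    """Estimate all Fourier coefficients of degree <= d from uniform examples.

    examples: list of (x, f(x)) with x in {-1,+1}^n drawn uniformly, f(x) in {-1,+1}.
    f_hat(S) = E[f(x) * chi_S(x)] is estimated by its empirical average.
    """
    coeffs = {}
    m = len(examples)
    for size in range(d + 1):
        for S in combinations(range(n), size):
            coeffs[S] = sum(y * chi(S, x) for x, y in examples) / m
    return coeffs

def low_degree_hypothesis(coeffs):
    """Hypothesis: sign of the estimated degree-<=d Fourier approximation."""
    def h(x):
        val = sum(c * chi(S, x) for S, c in coeffs.items())
        return 1 if val >= 0 else -1
    return h
```

When the target has most of its Fourier weight on degree ≤ d (as [LinialMansourNisan89] shows for AC0 with d = polylog(n)), the sign of the estimated low-degree part is a high-accuracy hypothesis under the uniform distribution.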
Other variants
After [Valiant84], efficient learning algorithms studied in many settings:
Learning in the presence of noise: malicious [Valiant85], agnostic [KearnsSchapireSellie93], random misclassification [AngluinLaird87], …
Related models: exact learning from queries and counterexamples [Angluin87], Statistical Query learning [Kearns93], many others…
PAC-style analyses of unsupervised learning problems: learning discrete distributions [KMRRSS94], learning mixture distributions [Dasgupta99, AroraKannan01, many others…]
Evolvability framework [Valiant07, Feldman08, …]
Nice algorithmic results in all these settings.
Limits of efficient learnability: is proper learning feasible?
A proper learning algorithm for class C must use hypotheses from C.
There are efficient proper learning algorithms for conjunctions, disjunctions, halfspaces, decision lists, parities, k-DNF, k-CNF.
What about k-term DNF – can we learn it using k-term DNF as hypotheses?
Proper learning is computationally hard
Theorem [PittValiant87]: If RP ≠ NP, no poly-time algorithm can learn 3-term DNF using 3-term DNF hypotheses.
Given a graph G, the reduction produces a distribution over labeled examples such that a high-accuracy 3-term DNF exists iff G is 3-colorable.
Note: can learn 3-term DNF in poly time using 3-CNF hypotheses! “Often a change of representation can make a difficult learning task easy.”
[figure: graph G → reduction → distribution over labeled examples, e.g. (011111, +), (001111, −), (101111, +), (010111, −), (110111, +), (011101, −), …]
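The reduction itself is small enough to sketch (a hedged reconstruction of the standard [PittValiant87] construction; it matches the examples on the slide for a 6-vertex graph, with positions 1-indexed on the slide and 0-indexed below): each vertex contributes a positive example and each edge a negative example.

```python
def coloring_to_examples(n_vertices, edges):
    """Graph G -> labeled examples over {0,1}^n, Pitt-Valiant style.

    Vertex v   -> positive example: all 1s except a 0 in position v.
    Edge (u,v) -> negative example: all 1s except 0s in positions u and v.
    """
    def example(zero_positions):
        x = [1] * n_vertices
        for p in zero_positions:
            x[p] = 0
        return tuple(x)

    positives = [(example([v]), 1) for v in range(n_vertices)]
    negatives = [(example([u, v]), 0) for (u, v) in edges]
    return positives + negatives
```

A proper 3-coloring yields a consistent 3-term DNF (the term for color class A is the AND of the variables of all vertices not colored A), and conversely any consistent 3-term DNF induces a proper 3-coloring by assigning each vertex the color of a term that accepts its positive example.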
From 1987…
This work showed computational barriers to learning with restricted representations in general, not just proper learning:
Theorem [PittValiant87]: Learning k-term DNF using (2k-3)-term DNF hypotheses is hard.
Opened the door to a whole range of hardness results: class C is hard to learn using hypotheses from a (possibly richer) representation class H.
… to 2009
Great progress in recent years using sophisticated machinery from hardness of approximation.
[ABFKP04]: Hard to learn n-term DNF using n^100-size OR-of-halfspace hypotheses. [Feldman06]: Holds even if the learner can make membership queries to the target function.
[KhotSaket08]: Hard to (even weakly) learn an intersection of 2 halfspaces using 100 halfspaces as the hypothesis.
If data is corrupted with 1% noise, then [FeldmanGopalanKhotPonnuswami08]: Hard to (even weakly) learn an AND using an AND as hypothesis. Same for halfspaces. [GopalanKhotSaket07, Viola08]: Hard to (even weakly) learn a parity even using degree-100 GF(2) polynomials as hypotheses.
Active area with lots of ongoing work.
Representation-Independent Hardness
Suppose there are no hypothesis restrictions: any poly-size circuit OK. Are there learning problems that are still hard for computational reasons? Yes: [Valiant84]: Existence of pseudorandom functions [GoldreichGoldwasserMicali84] implies that general Boolean circuits are (representation-independently) hard to learn.
PKC and hardness of learning
Key insight of [KearnsValiant89]: public-key cryptosystems ⇒ hard-to-learn functions.
An adversary can create labeled examples of the decryption function by herself (using the public key)… so it must not be learnable from labeled examples, or else the cryptosystem would be insecure!
Theorem [KearnsValiant89]: Simple classes of functions – NC1, TC0, poly-size DFAs – are inherently hard to learn.
Theorem [Regev05, KlivansSherstov06]: Really simple functions – poly-size ORs of halfspaces – are inherently hard to learn.
Closing the gap: Can these results be extended to show that DNF are inherently hard to learn? Or are DNF efficiently learnable?
Efficient learnability: Model and Results
Valiant provided an elegant model for the computational study of learning, and followed this up with foundational results on what is (and isn’t) efficiently learnable.
These fundamental questions continue to be intensively studied and cross-fertilize other topics in TCS.
Thank you, Les!