A Quarter-Century of Efficient Learnability

A Quarter-Century of Efficient Learnability
Rocco Servedio, Columbia University
Valiant 60th Birthday Symposium, Bethesda, Maryland, May 30, 2009

1984 and of course...

Probably Approximately Correct learning [Valiant84]
[Valiant84] presents a range of learning models and oracles.
- Concept class C of Boolean functions over a domain X (typically X = {0,1}^n or R^n)
- Unknown target concept c in C, to be learned from examples
- Unknown and arbitrary distribution D over X; D models the (possibly complex) world
- Learner has access to i.i.d. draws from D, labeled according to c: each example (x, c(x)) has x in X, drawn i.i.d. from D

PAC learning a concept class C
Learner's goal: efficiently come up with a hypothesis h that will have high accuracy on future examples.
- For any target function c in C and any distribution D over X, with probability at least 1 − δ the learner outputs a hypothesis h that is ε-accurate w.r.t. D, i.e. Pr_{x~D}[h(x) ≠ c(x)] ≤ ε.
- Algorithm must be computationally efficient: should run in time poly(n, 1/ε, 1/δ).
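To make the definitions above concrete, here is a minimal Python sketch of the PAC setup (my own illustrative code, not from the talk; the helper names `ex_oracle` and `pac_error` are hypothetical): the example oracle returns i.i.d. draws from D labeled by the target c, and error is measured on fresh draws from the same D.

```python
import random

def ex_oracle(target, dist):
    """EX(c, D): one labeled example (x, c(x)) with x drawn from D."""
    x = dist()                       # dist() samples one point of the domain X
    return x, target(x)

def pac_error(hypothesis, target, dist, trials=10_000):
    """Estimate Pr_{x ~ D}[h(x) != c(x)] by sampling fresh examples."""
    mistakes = 0
    for _ in range(trials):
        x, y = ex_oracle(target, dist)
        mistakes += (hypothesis(x) != y)
    return mistakes / trials

# Toy instance: X = {0,1}^4 under the uniform distribution, target c(x) = x_0 OR x_2.
n = 4
uniform = lambda: tuple(random.randint(0, 1) for _ in range(n))
target = lambda x: int(x[0] or x[2])
print(pac_error(lambda x: x[0], target, uniform))   # about 0.25: this h is not accurate
```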

So, what can be learned efficiently? The PAC model, and its variants, provide a clean theoretical framework for studying the computational complexity of learning problems. From [Valiant84]: "The results of learnability theory would then indicate the maximum granularity of the single concepts that can be acquired without programming." "This paper attempts to explore the limits of what is learnable as allowed by algorithmic complexity…. The identification of these limits is a major goal of the line of work proposed in this paper."

25 years of efficient learnability
Valiant didn't just ask the question "what can be learned efficiently?"; he did a great deal towards answering it. (This talk highlights some of these contributions and how the field has evolved since then.)
In the rest of the 1980s, Valiant and colleagues gave remarkable results on the abilities and limitations of computationally efficient learning algorithms. This work introduced research directions and questions that continue to be intensively studied to this day.
Rest of talk: survey some
- positive results (algorithms)
- negative results (two flavors of hardness results)

Positive results: learning k-DNF
Theorem [Valiant84]: k-DNF is learnable in polynomial time for any k = O(1).
Idea (e.g. for k = 2): view a k-DNF as a disjunction over "metavariables" (one per conjunction of at most k literals), and learn the disjunction by elimination (a sketch of this algorithm appears below).
25 years later: improving this beyond constant k is still a major open question! Much has been learned in trying for this improvement…
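A sketch of the elimination algorithm just described, in Python (my own rendering of the standard construction, not taken verbatim from [Valiant84]): every conjunction of at most k literals becomes a metavariable, and any metavariable that fires on a negative example is eliminated. The running time is n^{O(k)}, which is polynomial only for constant k.

```python
from itertools import combinations, product

def all_terms(n, k):
    """All conjunctions of at most k literals over x_0..x_{n-1}.
    A term is a tuple of (index, sign) pairs: sign 1 means x_i, sign 0 means NOT x_i."""
    for size in range(1, k + 1):
        for idxs in combinations(range(n), size):
            for signs in product((0, 1), repeat=size):
                yield tuple(zip(idxs, signs))

def satisfies(x, term):
    return all(x[i] == s for i, s in term)

def learn_k_dnf(examples, n, k):
    """examples: list of (x, label) pairs, x a 0/1 tuple, label in {0,1}.
    Returns a hypothesis consistent with any noise-free sample labeled by a k-DNF."""
    terms = set(all_terms(n, k))
    for x, label in examples:
        if label == 0:
            # No term of the true k-DNF can fire on a negative example,
            # so eliminate every metavariable that does.
            terms = {t for t in terms if not satisfies(x, t)}
    return lambda x: int(any(satisfies(x, t) for t in terms))
```

The surviving set of metavariables is a superset of the true terms, so the hypothesis is automatically correct on positive examples as well; standard sample-complexity arguments then yield the PAC guarantee.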

Poly-time PAC learning, general distributions
- Decision lists (greedy algorithm) [Rivest87]
- Halfspaces (poly-time LP) [Littlestone87, BEHW89, …]
- Parities, integer lattices (Gaussian elimination) [HelmboldSloanWarmuth92, FischerSimon92]
- Restricted types of branching programs (DL + parities) [ErgunKumarRubinfeld95, BshoutyTamonWilson98]
- Geometric concept classes (…random projections…) [BshoutyChenHomer94, BGMST98, Vempala99, …]
- and more…
(figure: positively and negatively labeled points separated by a halfspace)
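As one illustration from the list above, here is a short sketch (my own code, hedged accordingly) of learning an unknown parity by Gaussian elimination over GF(2): each labeled example (x, y) contributes one linear equation a·x = y (mod 2) in the unknown indicator vector a of the parity.

```python
def learn_parity(examples, n):
    """examples: list of (x, y) with x a 0/1 tuple of length n and y in {0,1}.
    Returns a vector a consistent with all examples (free coordinates set to 0)."""
    rows = [list(x) + [y] for x, y in examples]     # augmented matrix over GF(2)
    pivots, r = [], 0
    for c in range(n):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    if any(row[n] and not any(row[:n]) for row in rows):
        raise ValueError("examples are inconsistent with any parity")
    a = [0] * n
    for row_idx, c in enumerate(pivots):
        a[c] = rows[row_idx][n]
    return a
```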

General-distribution PAC learning, continued
Quasi-poly / sub-exponential-time learning:
- poly-size decision trees [EhrenfeuchtHaussler89, Blum92]
- poly-size DNF [Bshouty96, TaruiTsukiji99, KlivansS01]
- intersections of few poly(n)-weight halfspaces [KlivansO'DonnellS02]
"PTF method" (halfspaces + metavariables): link with complexity theory
(figures: a decision tree, a DNF as an OR of ANDs, and a halfspace dataset)
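A rough illustration of the "PTF method" mentioned above (an informal sketch of mine under simplifying assumptions, not any specific paper's algorithm): if every concept in the class is sign-represented by a degree-d real polynomial, expand each example into all monomials of degree at most d and run any halfspace learner in the resulting n^{O(d)}-dimensional space. The actual results use linear programming to find a consistent halfspace; a plain perceptron pass stands in for it here for brevity.

```python
from itertools import combinations

def expand(x, d):
    """Map x in {0,1}^n to the vector of all monomials of degree <= d (incl. constant)."""
    feats = [1]
    for size in range(1, d + 1):
        for idxs in combinations(range(len(x)), size):
            prod = 1
            for i in idxs:
                prod *= x[i]
            feats.append(prod)
    return feats

def perceptron_ptf(examples, d, passes=100):
    """examples: (x, y) pairs with y in {-1, +1}; returns a classifier."""
    dim = len(expand(examples[0][0], d))
    w = [0.0] * dim
    for _ in range(passes):
        for x, y in examples:
            phi = expand(x, d)
            if y * sum(wi * fi for wi, fi in zip(w, phi)) <= 0:
                w = [wi + y * fi for wi, fi in zip(w, phi)]   # mistake-driven update
    return lambda x: 1 if sum(wi * fi for wi, fi in zip(w, expand(x, d))) > 0 else -1
```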

Distribution-specific learning
Theorem [KearnsLiValiant87]: monotone Boolean functions can be weakly learned (accuracy 1/2 + Ω(1/n)) in poly time under the uniform distribution on {0,1}^n.
Ushered in the study of algorithms for uniform-distribution and distribution-specific learning: halfspaces [Baum90], DNF [Verbeurgt90, Jackson95], decision trees [KushilevitzMansour93], AC^0 [LinialMansourNisan89, FurstJacksonSmith91], extended AC^0 [JacksonKlivansS02], juntas [MosselO'DonnellS03], general monotone functions [BshoutyTamon96, BlumBurchLangford98, O'DonnellWimmer09], monotone decision trees [O'DonnellS06], intersections of halfspaces [BlumKannan94, Vempala97, KwekPitt98, KlivansO'DonnellS08], convex sets, much more…
Key tool: Fourier analysis of Boolean functions (a sketch of the basic low-degree approach appears below).
Recently come full circle on monotone functions: [O'DonnellWimmer09] achieves, in poly time, accuracy matching the upper bound of [BlumBurchLangford98]: optimal!
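To illustrate the Fourier-analytic toolkit, here is a hedged sketch of the uniform-distribution "low-degree algorithm" in the spirit of [LinialMansourNisan89] (the ±1 encoding and helper names are my own choices): estimate every Fourier coefficient of degree at most d from uniform random examples and predict with the sign of the truncated expansion.

```python
from itertools import combinations

def chi(S, x):
    """Parity character chi_S(x) = product of x_i over i in S, with x in {-1,+1}^n."""
    out = 1
    for i in S:
        out *= x[i]
    return out

def low_degree_learn(examples, n, d):
    """examples: (x, y) pairs with x in {-1,+1}^n, y in {-1,+1}, x drawn uniformly."""
    m = len(examples)
    coeffs = {}
    for size in range(d + 1):
        for S in combinations(range(n), size):
            # Empirical estimate of the Fourier coefficient hat{f}(S) = E[f(x) chi_S(x)].
            coeffs[S] = sum(y * chi(S, x) for x, y in examples) / m
    def h(x):
        val = sum(c * chi(S, x) for S, c in coeffs.items())
        return 1 if val >= 0 else -1
    return h
```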

Other variants
After [Valiant84], efficient learning algorithms have been studied in many settings:
- Learning in the presence of noise: malicious noise [Valiant85], agnostic learning [KearnsSchapireSellie93], random misclassification noise [AngluinLaird87], …
- Related models: exact learning from queries and counterexamples [Angluin87], Statistical Query learning [Kearns93], many others…
- PAC-style analyses of unsupervised learning problems: learning discrete distributions [KMRRSS94], learning mixture distributions [Dasgupta99, AroraKannan01, many others…]
- Evolvability framework [Valiant07, Feldman08, …]
Nice algorithmic results in all these settings.

Limits of efficient learnability: is proper learning feasible?
A proper learning algorithm for class C must use hypotheses from C.
There are efficient proper learning algorithms for conjunctions, disjunctions, halfspaces, decision lists, parities, k-DNF, k-CNF.
What about k-term DNF: can we learn it using k-term DNF as hypotheses?

Proper learning is computationally hard
Theorem [PittValiant87]: If RP ≠ NP, no poly-time algorithm can learn 3-term DNF using 3-term DNF hypotheses.
Given a graph G, the reduction produces a distribution over labeled examples, e.g. (011111, +), (001111, -), (101111, +), (010111, -), (110111, +), (011101, -), …, such that a high-accuracy 3-term DNF exists iff G is 3-colorable (a sketch of the reduction appears below).
Note: can learn 3-term DNF in poly time using 3-CNF hypotheses! "Often a change of representation can make a difficult learning task easy."
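The reduction behind this theorem can be sketched in a few lines (my own rendering of the standard construction; indexing conventions are mine): each vertex contributes a positive example and each edge a negative one, producing exactly the kind of labeled strings shown above.

```python
def coloring_to_sample(n, edges):
    """n vertices 0..n-1, edges: list of pairs (i, j). Returns a list of (x, label)."""
    sample = []
    for i in range(n):
        x = [1] * n
        x[i] = 0
        sample.append((tuple(x), 1))        # positive example for vertex i
    for i, j in edges:
        x = [1] * n
        x[i] = x[j] = 0
        sample.append((tuple(x), 0))        # negative example for edge {i, j}
    return sample

# If G is 3-colorable with color classes C_0, C_1, C_2, the 3-term DNF whose c-th term
# is the AND of x_i over all vertices i NOT in C_c labels this sample correctly;
# conversely, any consistent 3-term DNF yields a valid 3-coloring.
# Example matching the slide's strings: 6 vertices, edge {0, 1} gives (0,0,1,1,1,1) labeled -.
print(coloring_to_sample(6, [(0, 1)]))
```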

From 1987…
This work showed computational barriers to learning with restricted representations in general, not just proper learning:
Theorem [PittValiant87]: Learning k-term DNF using (2k-3)-term DNF hypotheses is hard.
Opened the door to a whole range of hardness results of the form: class C is hard to learn using hypotheses from class H.

… to 2009
Great progress in recent years using sophisticated machinery from hardness of approximation:
- [ABFKP04]: Hard to learn n-term DNF using n^100-size OR-of-halfspace hypotheses. [Feldman06]: Holds even if the learner can make membership queries to the target function.
- [KhotSaket08]: Hard to (even weakly) learn an intersection of 2 halfspaces using 100 halfspaces as the hypothesis.
- If data is corrupted with 1% noise, then [FeldmanGopalanKhotPonnuswami08]: Hard to (even weakly) learn an AND using an AND as the hypothesis. Same for halfspaces.
- [GopalanKhotSaket07, Viola08]: Hard to (even weakly) learn a parity even using degree-100 GF(2) polynomials as hypotheses.
Active area with lots of ongoing work.

Representation-Independent Hardness Suppose there are no hypothesis restrictions: any poly-size circuit OK. Are there learning problems that are still hard for computational reasons? Yes: [Valiant84]: Existence of pseudorandom functions [GoldreichGoldwasserMicali84] implies that general Boolean circuits are (representation-independently) hard to learn.

PKC and hardness of learning
Key insight of [KearnsValiant89]: public-key cryptosystems yield hard-to-learn functions.
An adversary can create labeled examples of the decryption function by herself, using only the public key… so decryption must not be learnable from labeled examples, or else the cryptosystem would be insecure!
Theorem [KearnsValiant89]: Simple classes of functions – NC^1 circuits, TC^0 circuits, poly-size DFAs – are inherently hard to learn.
Theorem [Regev05, KlivansSherstov06]: Really simple functions – poly-size ORs of halfspaces – are inherently hard to learn.
Closing the gap: Can these results be extended to show that DNF are inherently hard to learn? Or are DNF efficiently learnable?
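The core of the Kearns-Valiant argument can be phrased as a few lines of code (purely illustrative; `keygen` and `encrypt` stand for a hypothetical public-key cryptosystem interface, not a real library API): anyone holding only the public key can manufacture labeled examples of the secret decryption function.

```python
import random

def examples_of_decryption(public_key, encrypt, n_examples):
    """Create (ciphertext, plaintext-bit) pairs using only public information.
    These are exactly labeled examples of the secret decryption function Dec_sk."""
    sample = []
    for _ in range(n_examples):
        b = random.randint(0, 1)            # a plaintext bit we picked ourselves
        c = encrypt(public_key, b)          # ciphertext: an input to Dec_sk
        sample.append((c, b))               # label: the value Dec_sk(c)
    return sample

# If a PAC learner could learn Dec_sk from such a sample, an eavesdropper could use the
# learned hypothesis to decrypt fresh ciphertexts, contradicting security. For suitable
# cryptosystems Dec_sk lies in a simple circuit class, which makes that class hard to learn.
```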

Efficient learnability: model and results
Valiant provided an elegant model for the computational study of learning, and followed this up with foundational results on what is (and isn't) efficiently learnable.
These fundamental questions continue to be intensively studied and cross-fertilize other topics in TCS.
Thank you, Les!