Learnability of DNF with Representation-Specific Queries
Liu Yang
Joint work with Avrim Blum & Jaime Carbonell
Carnegie Mellon University
© Liu Yang 2012

Learning DNF formulas
DNF formulas: n = # of variables; poly-sized DNF: # of terms = n^{O(1)}, e.g. functions like f = (x1 ∧ x2) ∨ (x1 ∧ x4).
- A natural form of knowledge representation.
- Posed by [Valiant 1984]; a great challenge for over 20 years.
- PAC-learning DNF appears to be very hard.
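A minimal sketch (not from the slides) of representing and evaluating a poly-sized DNF: each term is a list of (variable index, required value) pairs, and the formula is the OR of its terms. Indices are 0-based, so the slide's f = (x1 ∧ x2) ∨ (x1 ∧ x4) uses indices 0, 1 and 0, 3.

```python
def eval_term(term, x):
    # a term is satisfied iff every listed variable has its required value
    return all(x[i] == v for i, v in term)

def eval_dnf(terms, x):
    # a DNF is satisfied iff at least one term is satisfied
    return any(eval_term(t, x) for t in terms)

f = [[(0, 1), (1, 1)], [(0, 1), (3, 1)]]
print(eval_dnf(f, (1, 0, 0, 1)))   # True: the second term (x1 AND x4) is satisfied
```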

Best Known Algorithms in the Standard Model
Learning DNF with membership queries (MQ):
- No efficient MQ algorithm known under general distributions.
- Can improperly learn DNF with MQ under the uniform distribution [Jac94].
PAC-learning general DNF:
- Not known whether DNF can be learned efficiently from random examples.
- Cannot properly learn 2-term DNF from random examples unless NP = RP [PV88].
- Fastest algorithm [KS01] PAC-learns poly(n)-term DNF in time 2^{Õ(n^{1/3})}.
PAC-learning DNF under the uniform distribution:
- Fastest known algorithm runs in time n^{O(log n)} [Ver90].
- Learning k-juntas: the best known algorithm needs time roughly n^{ωk/(ω+1)}, where ω < 2.376 is the matrix-multiplication exponent [MOS04].
PAC-learning monotone DNF under the uniform distribution (partial results):
- Can efficiently learn 2^{O(√log n)}-term monotone DNF under uniform [Ser04].
- Can efficiently learn monotone decision trees under uniform [OS07].

Query: Similarity about TYPE
What if you have similarity information about TYPE?
Card fraud detection: are two frauds of the same type?
- Identity theft / skimming / stolen cards
Type of query for DNF learning:
- Binary: for a pair of POSITIVE examples from a random dataset, the teacher says YES iff they satisfy a common term of the target DNF.
Can we efficiently learn DNF with this query?
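A small sketch (my own, not the paper's code) of the two representation-specific pairwise oracles used throughout the talk. Here `terms_satisfied` knows the hidden target; a real oracle would expose only the query answers.

```python
def terms_satisfied(terms, x):
    # indices of target terms that example x satisfies
    return {i for i, t in enumerate(terms) if all(x[j] == v for j, v in t)}

def binary_query(terms, x, y):
    # YES (1) iff the two positive examples satisfy at least one term in common
    return int(len(terms_satisfied(terms, x) & terms_satisfied(terms, y)) > 0)

def numerical_query(terms, x, y):
    # number of terms the two examples satisfy in common
    return len(terms_satisfied(terms, x) & terms_satisfied(terms, y))

# toy target: (x0 AND x1) OR (x0 AND x3); both examples satisfy the first term
target = [[(0, 1), (1, 1)], [(0, 1), (3, 1)]]
print(binary_query(target, (1, 1, 0, 0), (1, 1, 1, 1)))      # 1
print(numerical_query(target, (1, 1, 0, 0), (1, 1, 1, 1)))   # 1 (only the first term is shared)
```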

Outline
- Our model
- Hardness results
- Positive results: uniform distribution
- Positive results: general distributions

Notation
- Instance space X = {0, 1}^n
- Concept space C: a collection of functions h: X -> {-1, 1}
- Distribution D over X
- Unknown target function h*: the true labeling function (realizable case: h* ∈ C)
- err(h) = P_{x~D}[h(x) ≠ h*(x)]
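A small illustration (mine, not from the slides) of the error definition: a Monte Carlo estimate of err(h), here taking D to be uniform over {0,1}^n.

```python
import random

def empirical_error(h, h_star, n, m=10000):
    # Monte Carlo estimate of err(h) = P_{x ~ Unif({0,1}^n)}[h(x) != h*(x)]
    mistakes = 0
    for _ in range(m):
        x = tuple(random.randint(0, 1) for _ in range(n))
        mistakes += h(x) != h_star(x)
    return mistakes / m

# toy check: the two functions disagree exactly when x0 = 1, so err is about 1/2
print(empirical_error(lambda x: 1, lambda x: -1 if x[0] else 1, n=4))
```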

Warm Up: Disjoint DNF with Boolean Queries
Neighborhood method:
- A nice seed for term T_i: an example satisfying T_i and no other term.
- Collect all of the seed's neighbors in the query graph (positive examples answering YES with the seed) and learn a conjunction from them.
Lemma. With probability 1 - δ, the neighborhood method produces an ε-accurate DNF, provided that for each term T_i of the target DNF with probability of satisfaction ≥ ε/2t, a random example satisfies T_i and no other term with probability ≥ 1/poly(n, 1/ε).
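My sketch of the neighborhood method, under stated assumptions: `positives` is a list of positive examples (0/1 tuples) and `query(x, y)` is the boolean pairwise oracle (1 iff x and y share a target term). Since the DNF is disjoint, every positive example satisfies exactly one term, so every positive example can serve as a seed.

```python
def most_specific_conjunction(examples, n):
    # keep each literal on which all the examples agree
    return [(i, examples[0][i]) for i in range(n)
            if all(x[i] == examples[0][i] for x in examples)]

def neighborhood_method(positives, query, n):
    learned = []
    for seed in positives:
        # the seed's neighbors all satisfy the seed's (unique) term
        neighbors = [y for y in positives if query(seed, y)]
        term = most_specific_conjunction(neighbors, n)
        if term and term not in learned:
            learned.append(term)
    return learned   # hypothesis DNF: the OR of the learned conjunctions
```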

Warm Up: Adaptively Constructed Examples
- Oracle(x, x') -> number of terms that x and x' satisfy in common.
- Move(x, x') moves x' away from x by one bit, trying to maintain at least one common term.
- LearnTerm(x) -> a term of the target function that x satisfies.
(Slide animation: start with y = x, so K(x, y) = 1, and flip one bit of y at a time. If flipping bit i makes K(x, y) drop to 0, variable x_i is relevant, so flip it back and continue; in the example the output is x3 ∧ x7.)
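My reconstruction of LearnTerm as sketched in the animation (assumptions: `k(x, y)` is the boolean pairwise oracle and x is a positive example). The walk keeps the flips it can afford and reverts the ones that would destroy every shared term; the reverted literals form a term the target satisfies.

```python
def learn_term(x, k, n):
    y = list(x)
    relevant = []
    for i in range(n):
        y[i] = 1 - y[i]            # try to move y away from x on bit i
        if k(x, tuple(y)) == 0:    # lost every common term: bit i is relevant
            y[i] = 1 - y[i]        # flip it back
            relevant.append((i, x[i]))
        # otherwise keep the flip and continue
    return relevant                # literals of one target term satisfied by x, e.g. x3 AND x7
```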

Outline
- Our model
- Hardness results
- Positive results: uniform distribution
- Positive results: general distributions

Hardness Results: Boolean Queries (Group Learning)
Thm. Learning DNF from random data under arbitrary distributions with boolean queries is as hard as learning DNF from random data under arbitrary distributions with labels only (no queries).
Proof idea: a reduction from group-learning DNF in the standard model to the extended-queries model. Given an algorithm A that learns from polynomially many examples in the extended model, how do we use it to group-learn? (Slide figure: two merged "giant" examples with K(giant 1, giant 2) = 1.)

Hardness Results: Approximate Numerical Queries
Thm. Learning DNF from random data under arbitrary distributions with approximate numerical queries is as hard as learning DNF from random data under arbitrary distributions with labels only. Here, if C is the number of terms that x_i and x_j satisfy in common, the oracle returns a value in [(1 - τ)C, (1 + τ)C].

Outline
- Our model
- Hardness results
- Positive results: uniform distribution
  - Can learn DNF with numerical queries
  - Can learn juntas with boolean queries
  - Can learn DNF having ≤ 2^{O(√log n)} terms
- Positive results: general distributions

Learning a Sum of Monotone Terms
Thm. We can efficiently learn a sum of t monotone terms, f(x) = T_1(x) + T_2(x) + … + T_t(x), over the uniform distribution, using time and samples poly(t, n, 1/ε).
Observations:
- The Fourier representation of a single monotone term T (T(x) = 1 if the term is satisfied, 0 otherwise) is simple: a term on variable set A expands as 2^{-|A|} Σ_{S ⊆ A} χ_S(x), so all of its nonzero coefficients equal 2^{-|A|}.
- The Fourier coefficient of f at S is the sum of the terms' coefficients at S.
- Each T_i has L1-length 1, so L1(f) ≤ t ⇒ for any threshold θ, at most t/θ coefficients of f have magnitude ≥ θ.
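A small brute-force illustration (mine, for tiny n) of the sparsity claim above: compute the Fourier coefficients of f = T_1 + … + T_t over the uniform distribution and count how many have magnitude at least θ.

```python
from itertools import combinations, product

def chi(S, x):
    # parity character chi_S(x) = prod_{i in S} (-1)^{x_i}
    return (-1) ** sum(x[i] for i in S)

def fourier_coefficient(f, S, n):
    # \hat{f}(S) = E_{x uniform over {0,1}^n}[ f(x) * chi_S(x) ]
    return sum(f(x) * chi(S, x) for x in product([0, 1], repeat=n)) / 2 ** n

n, theta = 4, 0.1
terms = [{0, 1}, {0, 3}]                     # f = (x0 AND x1) + (x0 AND x3), so t = 2
f = lambda x: sum(all(x[i] for i in t) for t in terms)
subsets = [S for r in range(n + 1) for S in combinations(range(n), r)]
big = [S for S in subsets if abs(fourier_coefficient(f, S, n)) >= theta]
print(len(big), "coefficients with magnitude >=", theta)   # far fewer than t/theta = 20
```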

Algorithm: Learn DNF with Numerical Queries
- Set θ = ε/(8t).
- Estimate from data the Fourier coefficient of each size-1 parity function (to accuracy θ/4); place all coefficients of magnitude ≥ θ/2 into a list L_1.
- For j = 2, 3, ... repeat:
  - For each parity function Φ_S in list L_{j-1} and each variable x_i not in S, estimate the Fourier coefficient of Φ_{S ∪ {i}}.
  - If the estimate has magnitude ≥ θ/2, add it to list L_j (if not already there).
  - Maintain L_j as the list of size-j parity functions with coefficient magnitude ≥ θ.
- Construct g: the weighted sum of the parities with identified coefficients.
- Output h(x) = [g(x)], i.e., round/threshold g.
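My sketch of the level-by-level search above, under stated assumptions: `samples` is a list of (x, f(x)) pairs with x uniform over {0,1}^n and f(x) the number of target terms x satisfies (available via a numerical self-query K(x, x)); θ would be set to ε/(8t) as on the slide.

```python
def estimate_coefficient(samples, S):
    # empirical \hat{f}(S): average of f(x) * chi_S(x), chi_S(x) = (-1)^{sum_{i in S} x_i}
    return sum(fx * (-1) ** sum(x[i] for i in S) for x, fx in samples) / len(samples)

def find_heavy_coefficients(samples, n, theta):
    heavy = {frozenset(): estimate_coefficient(samples, frozenset())}
    frontier = {frozenset([i]) for i in range(n)}          # size-1 parities first
    while frontier:
        next_frontier = set()
        for S in frontier:
            est = estimate_coefficient(samples, S)
            if abs(est) >= theta / 2:
                heavy[S] = est
                # grow S by one variable at a time, as in the slide
                next_frontier |= {S | {i} for i in range(n) if i not in S}
        frontier = next_frontier - set(heavy)
    return heavy

def round_hypothesis(heavy):
    # g(x) = weighted sum of the identified parities; h(x) thresholds g
    def h(x):
        g = sum(c * (-1) ** sum(x[i] for i in S) for S, c in heavy.items())
        return 1 if g >= 0.5 else 0
    return h
```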

Algorithm: Learn DNF with Numerical Queries
Thm. Under the uniform distribution, with numerical pairwise queries, we can learn any poly(n)-term DNF.

r-Junta under Uniform (r = log n), Using Boolean Queries
Lemma. For x and y independent uniform samples, if the target function has r relevant variables and the i-th variable is relevant, then P[k(x, y) = 1 ∧ x_i = y_i] exceeds P[k(x, y) = 1 ∧ x_i ≠ y_i] by a gap on the order of (1/4)^r, which the test on the next slide detects.

Algorithm: Learn a (log n)-Junta
Thm. Under the uniform distribution, with boolean queries, we can properly learn any DNF having O(log n) relevant variables (a junta).
For each variable i, sample 16^r log(n/δ) random pairs (x, y):
- Evaluate k(x, y).
- Compute the difference of empirical probabilities: (fraction of pairs with k(x, y) = 1 ∧ x_i = y_i) − (fraction of pairs with k(x, y) = 1 ∧ x_i ≠ y_i).
- If this difference exceeds (1/2)(1/4)^r, declare variable i relevant; otherwise irrelevant.
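A sketch of the relevance test above (my reconstruction; the constants are as I read them off the slide). `k(x, y)` is the boolean pairwise oracle; how it behaves on pairs that are not both positive is an assumption here, and a NO / not-applicable answer is simply treated as 0.

```python
import math
import random

def relevant_variables(k, n, r, delta):
    m = int(16 ** r * math.log(n / delta)) + 1
    relevant = []
    for i in range(n):
        same = diff = 0
        for _ in range(m):
            x = tuple(random.randint(0, 1) for _ in range(n))
            y = tuple(random.randint(0, 1) for _ in range(n))
            if k(x, y) == 1:
                if x[i] == y[i]:
                    same += 1
                else:
                    diff += 1
        # declare i relevant when the empirical gap clears half the lemma's bound
        if (same - diff) / m > 0.5 * 0.25 ** r:
            relevant.append(i)
    return relevant
```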

Learn DNF Having ≤ 2^{O(√log n)} Terms
Builds on FindVariables [Ser04] for monotone DNF; for monotone f the procedure simplifies. [Ser04] uses membership queries (MQ); here they are replaced by random data together with LearnTerm.

Learn DNF Having ≤ 2^{O(√log n)} Terms: Uniform Distribution
- Sample m = poly(t/ε) labeled examples x^(1), …, x^(m) at random.
- For each j ≤ m, define K_j(y) = K(x^(j), y).
- K_j is a monotone DNF after transforming the feature space to φ_j(y) = (I[y_1 = x^(j)_1], I[y_2 = x^(j)_2], …, I[y_n = x^(j)_n]); call it a landmark DNF. Terms of K_j correspond to terms of the target satisfied by x^(j).
- Run FindVariables on each K_j; let S_f be the union of the returned variable sets.
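A small illustration (my own) of the landmark transformation above: with a positive landmark x^(j) fixed, φ_j maps y to agreement indicators, and "K(x^(j), y) > 0" is exactly a monotone DNF over those indicators, with one monotone term per target term that x^(j) satisfies.

```python
def phi(x_j, y):
    # agreement indicators: 1 where y matches the landmark, 0 where it differs
    return tuple(int(yi == xi) for yi, xi in zip(y, x_j))

def landmark_dnf(terms, x_j, y):
    agree = phi(x_j, y)
    # target terms satisfied by the landmark x_j (terms are (index, value) lists)
    sat_by_xj = [t for t in terms if all(x_j[i] == v for i, v in t)]
    # OR over those terms of AND over their variables of "y agrees with x_j there"
    return int(any(all(agree[i] for i, _ in t) for t in sat_by_xj))
```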

Outline
- Our model
- Hardness results
- Positive results: uniform distribution
- Positive results: general distributions

Common Profile Approach
- Take a sample of poly(n, 1/ε, log(1/δ)) random labeled examples.
- K(x, y): the number of terms satisfied in common by a pair of positive examples (x, y).
- "Profile" of x: the set of terms T_i of the target DNF satisfied by x.
- For each positive example x, identify the set S of examples y with K(x, y) = K(x, x); these points satisfy at least all the terms x satisfies.
- For each such S, learn a minimal conjunction consistent with those examples.
Lemma. If the target DNF has ≤ poly(n) possible profiles, then with probability ≥ 1 - δ the common-profile approach produces a DNF with error rate ≤ ε.
Implication: with numerical-valued queries, under arbitrary distributions, we can properly learn any DNF with O(log n) relevant variables.
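A rough sketch of the common-profile approach (my reconstruction): `data` is a list of (x, label) pairs and `K(x, y)` the numerical pairwise oracle. For each positive x we collect the examples matching x's full profile, learn the most specific conjunction they all satisfy, and OR the learned conjunctions together.

```python
def common_profile_dnf(data, K, n):
    positives = [x for x, label in data if label == 1]
    hypothesis = []
    for x in positives:
        # y shares x's full profile iff the common count equals x's own count
        group = [y for y in positives if K(x, y) == K(x, x)]
        # these y satisfy at least every term x satisfies; keep the agreed literals
        term = [(i, x[i]) for i in range(n) if all(y[i] == x[i] for y in group)]
        if term and term not in hypothesis:
            hypothesis.append(term)
    return hypothesis   # hypothesis DNF: the OR of the learned conjunctions
```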

Other Positive Results (binary vs. numerical queries)
- O(log n)-term DNF under any distribution: numerical ✔
- 2-term DNF under any distribution: binary ✔, numerical ✔
- DNF in which every variable appears in O(log n) terms (uniform): binary ✔, numerical ✔

Thanks!
© Liu Yang 2012