
1 Learning and testing k-modal distributions. Rocco A. Servedio (Columbia University). Joint work (in progress) with Ilias Diakonikolas (UC Berkeley) and Costis Daskalakis (MIT).

2 What this talk is about. Probability distributions over [N] = {1,2,…,N}. Monotone increasing distribution: p(i) ≤ p(i+1) for all i. (Whole talk: "increasing" means "non-decreasing".)

3 k-modal distributions. k-modal: k "peaks and valleys". Pictured: a 3-modal distribution, a unimodal distribution, and another unimodal distribution. A monotone distribution is 0-modal.

4 The learning problem. The target distribution p is an unknown k-modal distribution over [N]. The algorithm gets samples from p. Goal: output a hypothesis h that is ε-close to p in total variation distance. Want an algorithm that uses few samples & is computationally efficient.
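For concreteness, here is a small helper for the distance measure used throughout: total variation distance between two distributions over [N], represented as probability vectors. This is a minimal sketch, not part of the talk.

```python
import numpy as np

def total_variation(p, q):
    """d_TV(p, q) = (1/2) * sum_i |p(i) - q(i)|."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return 0.5 * np.abs(p - q).sum()

# Example: a point mass vs. the uniform distribution over [4] has d_TV = 3/4.
print(total_variation([1, 0, 0, 0], [0.25, 0.25, 0.25, 0.25]))  # 0.75
```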

5 The testing problem. q is a known k-modal distribution over [N]. p is an unknown k-modal distribution over [N]; the algorithm gets samples from p. Goal: output "yes" w.h.p. if p = q, "no" w.h.p. if p is ε-far from q.

6 Please note. The testing problem is not: given samples from an unknown distribution p, determine whether p is k-modal versus ε-far from every k-modal distribution. That problem requires a number of samples growing polynomially with N, even for k = 0: the uniform distribution over [N] is hard to distinguish from a random distribution that is far from monotone.

7 Why study these questions? k-modal distributions seem natural; it would be nice if k-modal structure were exploitable by efficient learning / testing algorithms. Post hoc justification: the solutions exhibit interesting connections between testing and learning.

8 The general case: learning. If we drop the k-modal assumption, the learning problem becomes: learn an arbitrary distribution over [N] to total variation distance ε. Θ(N/ε²) samples are necessary and sufficient.

9 The general case: testing. If we drop the k-modal assumption, the testing problem becomes: q is a known, arbitrary distribution over [N]; p is an unknown, arbitrary distribution over [N]; the algorithm gets samples from p. Goal: output "yes" if p = q, "no" if p is ε-far from q. Roughly √N · poly(1/ε) samples are necessary and sufficient [GR00, BFFKRW02, P08].

10 This work: main learning result. We give an algorithm that learns any k-modal distribution over [N] to accuracy ε. It is sample-efficient and runs efficiently. Close to optimal: a nearly matching sample lower bound holds for any algorithm.

11 Main testing result. We give an algorithm that solves the k-modal testing problem over [N] to accuracy ε. It is sample- and time-efficient, and any testing algorithm must use a comparable number of samples. Its sample complexity is lower than that of learning: testing is easier than learning!

12 Prior work. k = 0, 1: [BKR04] gave a sample-efficient algorithm for the testing problem (p, q both available via sample access). k = 0, 1: [Birge87, Birge87a] gave an O(log(N)/ε³)-sample efficient algorithm for learning, and a matching lower bound. We'll use this algorithm as a black box in our results.

13 Outline of rest of talk Background: some tools Learning k-modal distributions Testing k-modal distributions

14 First tool: Learning monotone distributions. Theorem [B87]: There is an efficient algorithm that learns any monotone decreasing distribution over [n] to accuracy ε. It uses O(log(n)/ε³) samples and runs in time linear in its input size. [B87b] also gave a matching lower bound for learning a monotone distribution.

15 Second tool: Learning a CDF – the Dvoretzky-Kiefer-Wolfowitz inequality. Theorem [DKW56]: Let p be any distribution over [N] with CDF F, and let F̂ be the empirical estimate of F obtained from m samples. Then with high probability, F̂ is uniformly ε-close to F once m = O(1/ε²). Morally, this means you can partition [N] into intervals each of mass roughly ε under p, using only O(1/ε²) samples. Note: for this weaker partitioning guarantee, a comparable sample bound already follows from an easy Chernoff bound argument. (Picture: true CDF vs. empirical CDF.)
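A minimal sketch of this partitioning step (the helper names are mine, and the greedy cut rule is one straightforward choice, not necessarily the talk's exact procedure): build an empirical CDF and cut [N] into consecutive intervals of roughly ε empirical mass.

```python
import numpy as np

def empirical_cdf(samples, N):
    """Empirical CDF over the domain {1, ..., N}."""
    counts = np.bincount(samples, minlength=N + 1)[1:]  # counts[i-1] = #occurrences of i
    return np.cumsum(counts) / len(samples)

def partition_by_mass(samples, N, eps):
    """Split [N] into consecutive intervals of roughly `eps` empirical mass each."""
    cdf = empirical_cdf(samples, N)
    breakpoints, last_mass = [], 0.0
    for i in range(N):
        if cdf[i] - last_mass >= eps:
            breakpoints.append(i + 1)      # current interval ends at point i+1
            last_mass = cdf[i]
    intervals, start = [], 1
    for b in breakpoints + [N]:
        intervals.append((start, b))
        start = b + 1
    return [iv for iv in intervals if iv[0] <= iv[1]]

# Usage: draw m = O(1/eps^2) samples from the unknown distribution, then partition.
rng = np.random.default_rng(0)
samples = rng.integers(1, 101, size=4000)   # stand-in for samples from p over [100]
print(partition_by_mass(samples, N=100, eps=0.1))
```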

16 Learning k-modal distributions

17 The problem. Learn an unknown k-modal distribution over [N].

18 What should we shoot for? Easy lower bound: one essentially has to solve k separate monotone-distribution-learning problems, each over a domain of size about N/k, to accuracy ε, so roughly k times the monotone-learning sample bound is needed. Want an algorithm that uses roughly this many samples and runs efficiently.

19 The problem, again. Goal: learn an unknown k-modal distribution over [N]. We know how to efficiently learn an unknown monotone distribution… This would be easy if we knew the k peaks/valleys… Guessing them exactly: infeasible. Guessing them approximately: not too great either.

20 A first approach. Break up [N] into many intervals. Since p is k-modal, p is not monotone on at most k of the intervals, so running the monotone distribution learner on each interval will usually give a good answer.

21 First approach in more detail.
1. Use [DKW] to divide [N] into intervals and obtain estimates of their masses under p. (Assumes each single point has small mass; heavier points are easy to detect and deal with separately.)
2. Run the monotone distribution learner on each interval to get a hypothesis for p conditioned on that interval. (Actually run it twice: once for increasing, once for decreasing. Do hypothesis testing to pick one.)
3. Combine the hypotheses in the obvious way: scale each interval's hypothesis by the estimated mass of that interval and sum. (A sketch of this skeleton appears below.)
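A minimal sketch of that skeleton (the name `monotone_learner` is a hypothetical stand-in for the [B87] learner, and `partition_by_mass` is from the DKW sketch above; this is an illustration, not the talk's exact algorithm):

```python
import numpy as np

def first_approach(samples, N, eps, monotone_learner):
    """`monotone_learner(interval_samples, interval)` is assumed to return a
    probability vector over the interval (e.g. a Birge-style flattened histogram)."""
    samples = np.asarray(samples)
    intervals = partition_by_mass(samples, N, eps)        # DKW-based partition from above
    h = np.zeros(N)
    for (a, b) in intervals:
        in_iv = samples[(samples >= a) & (samples <= b)]
        if len(in_iv) == 0:
            continue
        weight = len(in_iv) / len(samples)                # estimate of p's mass on [a, b]
        cond_hyp = monotone_learner(in_iv, (a, b))        # hypothesis for p conditioned on [a, b]
        h[a - 1:b] = weight * np.asarray(cond_hyp)        # scale back up by the interval's mass
    return h / h.sum()                                    # renormalize the combined hypothesis
```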

22 Sketch of analysis.
1. Using [DKW] to divide [N] into intervals and obtain mass estimates takes relatively few samples.
2. Running the monotone distribution learner on each interval is the dominant cost: there are roughly 1/ε intervals, each requiring a run of the learner.
3. Combining the hypotheses costs no extra samples.
Total error = (error from the at most k non-monotone intervals) + (error from the scaling factors) + (error from estimating the interval masses with their empirical estimates).

23 Improving the approach. The dominant cost came from running the monotone distribution learner on all (roughly 1/ε) intervals rather than just on the ~k intervals that actually need it. If we could somehow check – more cheaply than learning – whether an interval is monotone before running the learner, we could run the learner fewer times and save… …this is a property testing problem! The more sophisticated algorithm has two new ingredients.

24 First ingredient: testing k-modal distributions for monotonicity. Consider the following property testing problem. The algorithm gets samples from an unknown k-modal distribution p over [N]. Goal: output "yes" w.h.p. if p is monotone increasing, "no" w.h.p. if p is ε-far from monotone increasing. Note: the k-modal promise for p might save us from the lower bound for testing monotonicity of arbitrary distributions…

25 Efficiently testing k-modal distributions for monotonicity. The algorithm gets samples from an unknown k-modal distribution p over [N]. Goal: output "yes" w.h.p. if p is monotone increasing, "no" w.h.p. if p is ε-far from monotone increasing. Theorem: There is a tester for this problem whose sample complexity is much smaller than that of learning. We'll use this to identify sub-intervals of [N] where p is monotone or very close to monotone… but then, can we efficiently learn close-to-monotone distributions?

26 Second ingredient: agnostically learning monotone distributions. Consider the following "agnostic learning" problem. The algorithm gets samples from an unknown distribution p over [N] that is opt-close to monotone. Goal: output a hypothesis distribution h whose distance from p is roughly opt + ε. If opt = 0, this is the original "learn a monotone distribution" problem. Want to handle the general case as efficiently as the opt = 0 case.

27 Agnostically learning monotone distributions. The algorithm gets samples from an unknown distribution p over [N] that is opt-close to monotone. Goal: output a hypothesis distribution h whose distance from p is roughly opt + ε. Theorem: There is a computationally efficient learning algorithm for this problem with essentially the same sample complexity as the opt = 0 case.

28 Semi-agnostically learning monotone distributions. Same setting: the algorithm gets samples from an unknown distribution p over [N] that is opt-close to monotone, and must output a hypothesis distribution h close to p. Theorem: There is a computationally efficient learning algorithm for this semi-agnostic version of the problem – guaranteeing distance O(opt) + ε rather than opt + ε – with essentially the same sample complexity. The [Birge87] monotone distribution learner does the job. In our application opt will be small, so O(opt) versus opt doesn't matter.

29 The learning algorithm: first phase.
1. Use [DKW] to divide [N] into intervals and obtain estimates of their masses.
2. Run the (increasing and decreasing) monotonicity testers on the first interval, then on the union of the first two intervals, and so on, until the first time both testers say "no". Mark the interval where this happens and continue from the next interval. This uses a bounded number of invocations of the tester in total. (Alternative: use binary search for fewer invocations of the tester in total. A sketch of this phase appears below.)
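A minimal sketch of this scanning phase (the tester is treated as a hypothetical black box `monotonicity_tester(samples, interval, direction) -> bool`; the bookkeeping details are mine, not the talk's exact procedure):

```python
def first_phase(samples, intervals, monotonicity_tester):
    superintervals, marked = [], []
    run_start = 0
    for j in range(len(intervals)):
        a = intervals[run_start][0]          # left end of the current run of intervals
        b = intervals[j][1]                  # right end of the current run
        inc_ok = monotonicity_tester(samples, (a, b), "increasing")
        dec_ok = monotonicity_tester(samples, (a, b), "decreasing")
        if not inc_ok and not dec_ok:
            # p looks far from monotone on [a, b]: the intervals before I_j form a
            # close-to-monotone superinterval, and I_j itself gets marked.
            if run_start < j:
                superintervals.append((a, intervals[j - 1][1]))
            marked.append(intervals[j])
            run_start = j + 1                # start a fresh run after the marked interval
    if run_start < len(intervals):           # the final run is also a superinterval
        superintervals.append((intervals[run_start][0], intervals[-1][1]))
    return superintervals, marked
```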

30 The algorithm (continued). Again step 2: run the testers on growing unions of intervals until the first time both say "no"; mark that interval and continue. Each time an interval is marked, the block of unmarked intervals right before it is close-to-monotone; call this block a superinterval. Each marked interval "uses up" (at least) one of the k peaks/valleys of p.

31 The learning algorithm: second phase. After the first phase, [N] is partitioned into superintervals (each close to monotone) and at most k "marked" intervals (each of small weight). Rest of the algorithm:
3. Run the semi-agnostic monotone distribution learner on each superinterval to get an accurate hypothesis for p conditioned on that superinterval.
4. Output the final hypothesis obtained by combining the per-superinterval hypotheses, scaled by the estimated interval masses.

32 Analysis of the algorithm. Sample complexity: a bounded number of runs of the tester (roughly one per marked interval), each using few samples, plus one run of the semi-agnostic monotone learner per superinterval, each using the monotone-learning sample bound. Error rate: (error from the marked intervals) + (total error from estimating the interval masses with their empirical estimates) + (total error from the scaling factors).

33 I owe you a tester. Theorem: There is a sample-efficient tester for the following problem. The algorithm gets samples from an unknown k-modal distribution p over [N]. Goal: output "yes" w.h.p. if p is monotone increasing, "no" w.h.p. if p is ε-far from monotone increasing.

34 The testing algorithm. Algorithm: run [DKW] with suitable accuracy and let p̂ be the resulting empirical PDF. If there exist intervals witnessing a significant decrease – i.e., the average value of p̂ over some earlier interval [a, b] exceeds the average value of p̂ over some later interval by more than a threshold – then output "no"; otherwise output "yes". Completeness: if p is monotone increasing, then its interval averages never decrease and p̂ tracks p closely, so the test passes w.h.p.
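A minimal sketch of a test of this shape (the threshold and the set of interval pairs compared are placeholders, not the talk's actual choices):

```python
import numpy as np
from itertools import combinations

def looks_monotone_increasing(samples, N, eps, intervals):
    counts = np.bincount(np.asarray(samples), minlength=N + 1)[1:]
    p_hat = counts / len(samples)                      # empirical PDF over [N]
    def avg(iv):
        a, b = iv
        return p_hat[a - 1:b].mean()                   # average empirical mass on [a, b]
    for early, late in combinations(intervals, 2):     # intervals listed left to right
        if avg(early) > avg(late) + eps / 2:           # placeholder threshold
            return False                               # "no": looks far from increasing
    return True                                        # "yes"
```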

35 Soundness. Soundness lemma: if p is k-modal and no pair of intervals witnesses a significant decrease in the average value of p̂, then p is close to monotone increasing. (Recall the algorithm: run [DKW], let p̂ be the resulting empirical PDF, output "no" if such a witnessing pair of intervals exists, otherwise output "yes".) To prove the soundness lemma: show that under the lemma's hypothesis, we can "correct" each peak/valley of p by "spending" only a small amount of variation distance.

36 Correcting a peak of p. (Lemma, restated: if p is k-modal and no pair of intervals witnesses a significant decrease, then p is close to monotone increasing.) Consider a peak of p. Draw a horizontal line at a height such that (mass of the "hill" above the line) = (missing mass of the "valley" below the line). Correct the peak by bulldozing the hill into the valley, i.e., flatten this stretch of p to the height of the line.
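A minimal sketch of this bulldozing step (a generic water-filling illustration under my own simplifying assumption that the whole peak-then-valley stretch is flattened to one constant height; not the talk's exact construction):

```python
import numpy as np

def bulldoze(region):
    """region: 1-D array of probability masses covering a peak-then-valley stretch."""
    region = np.asarray(region, dtype=float)
    h = region.mean()                       # at this height, excess above == deficit below
    moved = np.clip(region - h, 0, None).sum()   # mass pushed from the hill into the valley
    return np.full_like(region, h), moved   # flattened region, variation distance spent

flat, cost = bulldoze([0.01, 0.05, 0.09, 0.03, 0.02])   # small peak followed by a dip
print(flat, cost)
```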

37 Why it works. (Same lemma.) The test's condition bounds how much mass any hill can have above the surrounding level, so each correction spends only a small amount of variation distance; summing over the at most k peaks/valleys of p gives that p is close to monotone increasing.

38 Summary. Sample- and time-efficient algorithms for learning and testing k-modal distributions over [N]. Upper bounds are pretty close to lower bounds for these problems. Testing is easier than learning. Learning algorithms have a testing component.

39 Future work. More efficient algorithms for restricted classes of k-modal distributions? [DDS11]: any sum of independent Bernoulli random variables – a "Poisson Binomial Distribution," a special type of unimodal distribution – is learnable using a number of samples independent of the number of summands.

40 Thank you

41 Key ingredient: oblivious decomposition. Decompose [n] into intervals whose widths increase as powers of (1+ε); this gives O(log(n)/ε) intervals. Call these the oblivious buckets.
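A minimal sketch of such a decomposition (the rounding convention is mine; the point is only that widths grow roughly like powers of (1+ε)):

```python
import math

def oblivious_buckets(n, eps):
    buckets, start, i = [], 1, 0
    while start <= n:
        width = max(1, math.floor((1 + eps) ** i))   # widths 1, (1+eps), (1+eps)^2, ...
        end = min(start + width - 1, n)
        buckets.append((start, end))
        start, i = end + 1, i + 1
    return buckets

print(oblivious_buckets(100, 0.5))   # widths 1, 1, 2, 3, 5, 7, 11, ... capped at n
```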

42 Flattening a monotone distribution using the oblivious decomposition. Given a monotone decreasing distribution p, the flattened version of p, denoted p̄, spreads p's weight uniformly within each bucket of the oblivious decomposition. Lemma [B87]: for any monotone decreasing distribution p, the flattened version p̄ is O(ε)-close to p. (Picture: true pdf vs. flattened version.)
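A minimal sketch of the flattening operation (bucket boundaries from `oblivious_buckets` above):

```python
import numpy as np

def flatten(p, buckets):
    p = np.asarray(p, dtype=float)
    flat = np.empty_like(p)
    for (a, b) in buckets:
        flat[a - 1:b] = p[a - 1:b].sum() / (b - a + 1)   # spread the bucket's mass uniformly
    return flat
```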

43 Learning monotone distributions using the oblivious decomposition [B87]. Reduce "learning monotone distributions over [n] to accuracy ε" to "learning arbitrary distributions over a small (O(log(n)/ε)-element) set to accuracy ε": view the flattened version of p as an arbitrary distribution over the set of oblivious buckets. Algorithm: draw samples from p; output as hypothesis the flattened empirical distribution. Analysis: the flattened version of p is close to p, and the empirical bucket masses are accurate once enough samples are drawn.
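A minimal sketch of a Birgé-style learner along the lines of this reduction (using `oblivious_buckets` from above; offered as an illustration under those assumptions, not as the exact [B87] procedure):

```python
import numpy as np

def learn_monotone(samples, n, eps):
    samples = np.asarray(samples)
    buckets = oblivious_buckets(n, eps)                   # oblivious decomposition of [n]
    h = np.empty(n)
    for (a, b) in buckets:
        mass = np.mean((samples >= a) & (samples <= b))   # empirical mass of the bucket
        h[a - 1:b] = mass / (b - a + 1)                   # spread it uniformly in the bucket
    return h / h.sum()                                    # guard against rounding drift
```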

44 Testing monotone distributions using the oblivious decomposition. One can use the learning algorithm to get a testing algorithm (learn p, then compare the hypothesis to the known q). But we can do better by using the oblivious decomposition directly: reduce "testing equality of monotone distributions over [n] to accuracy ε" to "testing equality of arbitrary distributions over the small bucket set to accuracy ε" – the known monotone distribution q over [n] becomes a known distribution over the buckets, and the unknown monotone distribution p over [n] becomes an unknown distribution over the buckets. Using [BFFKRW02] on the reduced problem gives a sample-efficient testing algorithm, and a nearly matching lower bound can be shown for any tester.

45 [BKR04] implicitly gave an O(log²(n) · log log(n) / ε⁵)-sample algorithm for learning a monotone distribution.


