Algorithmic Construction of Sets for k-Restrictions Dana Moshkovitz Joint work with Noga Alon and Muli Safra Tel-Aviv University
Talk Plan Problem definition: k-restrictions Applications: … group testing generalized hashing Set-Cover Hardness Background Techniques and Results
Techniques Greedine$$ k-wise approximating distributions Concatenation multi-way splitters via the topological Necklace Splitting Theorem
Problem Definition
On Forgetful Hot-Tempered Pirates and Helpless Goldsmiths One day the hot-tempered pirate asks the goldsmith to prepare him a nice string in Σ^m.
But the capricious pirate has various contradicting local demands he may pose when he comes to collect it… this pattern! should differ!
What will the goldsmith do?
make many strings, so every demand is met!
Formal Definition [~NSS95] Input: alphabet Σ, length m, demands f_1,…,f_s: Σ^k → {0,1}. Solution: A ⊆ Σ^m s.t. for every 1 ≤ i_1 < … < i_k ≤ m and every 1 ≤ j ≤ s, there is a ∈ A s.t. f_j(a(i_1),…,a(i_k)) = 1. Measure: how small |A| is.
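The definition can be checked by brute force on small instances. The sketch below is illustrative only: the function name `is_solution`, the toy demands (one per binary pattern, i.e. a universal-set instance) and the set `A` are assumptions made for this example, not part of the talk.

```python
from itertools import combinations

def is_solution(A, m, k, demands):
    """True iff for every i1 < ... < ik and every demand f, some a in A has f(a(i1..ik)) = 1."""
    for positions in combinations(range(m), k):
        for f in demands:
            if not any(f(tuple(a[i] for i in positions)) for a in A):
                return False
    return True

# Hypothetical toy instance: alphabet {0,1}, m = 4, k = 2,
# one demand per pattern in {0,1}^2 ("this pattern must appear").
patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
demands = [lambda window, p=p: window == p for p in patterns]

A = [tuple(map(int, s)) for s in ["0000", "1111", "0101", "1010", "0011", "1100"]]

print(is_solution(A, 4, 2, demands))      # six strings suffice here
print(is_solution(A[:2], 4, 2, demands))  # 0000 and 1111 alone do not
```

The check runs over all C(m,k) position sets, so it is only meant for sanity-testing small constructions.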
Applications
Goldsmith-Pirate Games Capture Many Known Problems universal sets hashing and its generalizations group testing set-cover gadget separating codes superimposed codes color coding …
Application I Universal Set: every configuration of every k of the m inputs is tried. [Figure: a circuit with inputs 1,…,m]
Application II Hashing Goal: a small set of functions [m]→[q] such that for every k ≤ q elements of [m], some function maps them to k different elements. [Figure: elements u_1,…,u_m mapped to r_1,…,r_q]
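As a small illustration of the hashing goal, the following sketch checks by brute force whether a family of functions [m]→[q] separates every k elements. Functions are represented as lists of length m; the name `is_perfect_hash_family` and the two example functions are assumptions for this example.

```python
from itertools import combinations

def is_perfect_hash_family(functions, m, k):
    """True iff every k-subset of range(m) is mapped to k distinct values by some function."""
    for subset in combinations(range(m), k):
        if not any(len({h[i] for i in subset}) == k for h in functions):
            return False
    return True

# Hypothetical example: m = 4, q = 2, k = 2.
# h1 and h2 read off the two bits of the index 0..3.
h1 = [0, 1, 0, 1]
h2 = [0, 0, 1, 1]

print(is_perfect_hash_family([h1, h2], 4, 2))  # every pair differs in some bit
print(is_perfect_hash_family([h1], 4, 2))      # h1 alone maps 0 and 2 to the same value
```

Viewed as a k-restriction problem, the alphabet is [q] and the single demand asks that the k symbols be pairwise different.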
Generalized Hashing Theorem Definition: (t,u)-hash families [ACKL]: for all T ⊆ U, |T|=t, |U|=u, some function f satisfies f(i)≠f(j) for every i ∈ T, j ∈ U-{i}. Theorem: For any fixed 2≤t<u, for any δ>0, one can construct efficiently a (t,u)-hash family over an alphabet of size t+1, whose rate (i.e. log_q m / n) is ≥ (1-δ)·t!·(u-t)^(u-t) / (u^(u+1)·ln(t+1)).
Application III Group Testing [DH,ND…] m people, at most k-1 are ill. Can test a group: does it contain illness? Goal: identify the ill people by few tests.
Group-Tests Theorem Theorem: For every δ>0, there exists d(δ), s.t. for any number of ill people d>d(δ), there exists an algorithm that outputs a set of at most (1+δ)·e·d^2·ln m group-tests in time polynomial in the population's size (m).
Application IV Orientations [AYZ94] Input: directed graph G. Question: is there a simple k-path? If G were a DAG…
Application IV Orientations [AYZ94] Need several orientations, s.t. wherever the path is, one reflects it. Pick an orientation. Delete 'bad' edges. Now G is a DAG… [Figure: a path on vertices 1,…,5 and an ordering of the vertices]
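The orientation idea can be sketched as follows. This is a hedged illustration, not the construction from the talk: it draws random vertex orderings where the talk asks for a small deterministic family of orientations, and the function names are hypothetical. Each ordering keeps only edges that go forward, which leaves a DAG in which a longest-path scan suffices (any path in a DAG is simple).

```python
import random

def has_k_path_in_dag(n, edges, k):
    """Longest-path scan; assumes every edge (u, v) satisfies u < v."""
    longest = [1] * n                 # longest[v] = number of vertices on the longest path ending at v
    for u, v in sorted(edges):        # sorted by tail, so longest[u] is final when it is read
        longest[v] = max(longest[v], longest[u] + 1)
    return max(longest, default=0) >= k

def has_simple_k_path(n, edges, k, trials, seed=0):
    """One-sided test: True is always correct; False may be wrong if too few orderings are tried."""
    rng = random.Random(seed)
    for _ in range(trials):
        order = list(range(n))
        rng.shuffle(order)                              # pick an orientation
        rank = {v: r for r, v in enumerate(order)}
        forward = [(rank[u], rank[v]) for u, v in edges if rank[u] < rank[v]]  # delete 'bad' edges
        if has_k_path_in_dag(n, forward, k):
            return True
    return False

# Hypothetical example: a directed 4-cycle contains a simple path on 3 vertices,
# but no simple path on 5 vertices.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
found_3 = has_simple_k_path(4, cycle, 3, trials=200)
found_5 = has_simple_k_path(4, cycle, 5, trials=200)
```

Replacing the random orderings by an explicit small family that "reflects" every possible k-path is exactly where a k-restriction construction would enter.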
Application V Set-Cover Gadget. Gadget: a succinct set-cover instance (sets over elements) so that a small, illegal sub-collection is not a cover. Legal cover: a set and its complement. Small: its total weight ≤ … Sets and complements differ in weight.
Approximability of Set-Cover: approximation ratio (up to low-order terms). Upper bound: ln n, by known approximation algorithms [Lov75,Sla95,Sri99]. Hardness: ln n if NP ⊄ DTIME(n^(log log n)) [Feige96]; a weaker logarithmic bound if NP ≠ P [RS97].
Background: Random and Pseudo-Random Solutions
Density. D: Σ^m → [0,1] is a probability distribution. The density w.r.t. D is: ε = min_{I,j} Pr_{a∼D}[ f_j(a(I)) = 1 ]
Probabilistic Strategy Claim: t = ε^-1·(k ln m + ln s + 1) random strings from D form a solution with probability ≥ ½, where ε is the density w.r.t. D. Let C_j be some constraint. Pr_r[r ∉ C_j] ≤ 1-ε. Choose a random set of strings A. Pr[A∩C_j = ∅] ≤ (1-ε)^|A| ≤ e^(-ε|A|). The probability that some constraint has A∩C_j = ∅ is ≤ s·C(m,k)·e^(-ε|A|) ≤ s·m^k·e^(-εt) = e^-1 < ½ for |A| = t.
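The arithmetic of the claim can be checked numerically. The sketch below is only an illustration; the helper names and the sample parameters (m = 8, k = 2, s = 4 demands, density 1/4, as for 2-bit patterns under the uniform distribution) are assumptions for this example.

```python
import math

def random_solution_size(m, k, s, eps):
    """Number of random strings from the claim: (k ln m + ln s + 1) / eps, rounded up."""
    return math.ceil((k * math.log(m) + math.log(s) + 1) / eps)

def failure_bound(m, k, s, eps, t):
    """Union bound s * C(m,k) * exp(-eps * t) on the probability that some demand is missed."""
    return s * math.comb(m, k) * math.exp(-eps * t)

t = random_solution_size(m=8, k=2, s=4, eps=0.25)
bound = failure_bound(m=8, k=2, s=4, eps=0.25, t=t)
print(t)      # 27
print(bound)  # below 1/2, as the claim requires
```

Since C(m,k) ≤ m^k, the bound never exceeds 1/e for this choice of t, whatever the parameters.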
Deterministic Construction!
First Observation: support(D) is a solution if the density ε w.r.t. D is positive: every demand is satisfied w.p. ≥ ε. Note |support(uniform)| = q^m.
Second Observation: The support of a distribution that is k-wise O(ε)-close to D is a solution: every demand is satisfied w.p (1-..). Theorem [EGLNV98]: Product distributions are efficiently (poly(q^k, m, ε^-1)) approximatable.
So What's the Problem? It's much more costly than a random solution! Random solution: ~ k log m / ε for all distributions! k-wise ε-close to uniform: O(2^k·k^2·log^2 m / ε^2) [AGHP90], i.e. 2^k·ε^-2·k^2·log^2 m for the uniform distribution; for other distributions, the state of affairs is usually much worse…
Background Sum-Up Random strings are good solutions for k-restriction problems if one picks the ‘right’ distribution… k-wise approximating distributions are deterministic solutions of larger size… Our goal: simulate deterministically the probabilistic bound
Our Results
Outline: k=O(1): Greedy on approximation. k=O(log m/log log m): + Concatenation (assumes invariance under permutations). Larger k's: + multi-way splitters (works for some problems).
Greedine$$ Claim: Can find a solution of size ⌈ε^-1·(k ln m + ln s)⌉, the same as a random solution, in time poly(C(m,k), s, |support|), where ε is the density. Proof: Formulate as Set-Cover: elements: <position,constraint>; sets: <support vector>. Apply the greedy strategy.
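The Set-Cover formulation can be sketched directly. This is a minimal illustration under stated assumptions: the support is enumerated explicitly (here all of {0,1}^5, the support of the uniform distribution), the demands are the four 2-bit patterns (density 1/4), and the function name `greedy_k_restriction` is hypothetical.

```python
from itertools import combinations, product

def greedy_k_restriction(m, k, demands, support):
    """Greedy Set-Cover: elements are (positions, demand) pairs, sets are support vectors."""
    uncovered = {(I, j) for I in combinations(range(m), k) for j in range(len(demands))}
    solution = []
    while uncovered:
        def gain(a):
            # number of still-uncovered elements that vector a would cover
            return sum(1 for (I, j) in uncovered if demands[j](tuple(a[i] for i in I)))
        best = max(support, key=gain)
        covered = {(I, j) for (I, j) in uncovered
                   if demands[j](tuple(best[i] for i in I))}
        if not covered:
            raise ValueError("some demand cannot be met by any support vector")
        solution.append(best)
        uncovered -= covered
    return solution

# Hypothetical toy instance: m = 5, k = 2, one demand per 2-bit pattern.
patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
demands = [lambda window, p=p: window == p for p in patterns]
support = list(product((0, 1), repeat=5))

solution = greedy_k_restriction(5, 2, demands, support)
print(len(solution))
```

For this instance the claimed bound is ⌈4·(2 ln 5 + ln 4)⌉ = 19 strings. The running time is polynomial in C(m,k), s and the support size, which is why the greedy step alone is only practical when the support is small.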
Concatenation [Figure: the m positions are hashed to m' positions by a hash family; an inefficient solution is used on the m' positions; labels m, m', N]
Concatenation Works For Permutation-Invariant Demands
Theorem: Fix some efficiently approximatable distribution D. Given a k-restriction problem with density ε w.r.t. D, one can obtain a solution of size arbitrarily close to (2k ln k + ln s)/ε × k^4 log m in time poly(m, s, k^k, q^k, ε^-1).
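A minimal sketch of the concatenation step, under assumptions: a hash family from m positions to m' positions that separates every k positions, and an "inner" solution over the m' positions for a demand set closed under permutations (here: all 2-bit patterns). Each output string is the inner string read through a hash function, a(i) = b(h(i)). The names and the toy parameters are hypothetical.

```python
from itertools import combinations, product

def concatenate(hash_family, inner_solution):
    """Compose every hash function h: [m] -> [m'] with every inner string b of length m'."""
    return sorted({tuple(b[h_i] for h_i in h) for h in hash_family for b in inner_solution})

# Hypothetical toy parameters: m = 4, m' = 2, k = 2.
hash_family = [[0, 1, 0, 1], [0, 0, 1, 1]]   # separates every pair of the 4 positions
inner = list(product((0, 1), repeat=2))      # all of {0,1}^2: a trivial solution on m' = 2 positions

outer = concatenate(hash_family, inner)
print(outer)

# Check that the result is a (4,2)-universal set: every pair of positions sees all 4 patterns.
universal = all(
    {(a[i], a[j]) for a in outer} == set(product((0, 1), repeat=2))
    for i, j in combinations(range(4), 2)
)
print(universal)
```

The size is at most |hash family| × |inner solution|; permutation invariance is needed because a hash function may reorder the k positions it separates.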
Dividing Into BLOCKS [Figure: the m positions divided into blocks]
Splitters [NSS95] What are they? Several block divisions, such that any k positions are split by one of them: a k-restriction problem! How to construct? Each division needs only (b-1) cuts; use concatenation.
Multi-Way Splitters: For any I_1⊎…⊎I_t ⊆ [m], |⊎I_j| ≤ k, some partition to b blocks is a split. A k-restriction problem!
Necklace Splitting [A87] b thieves t types How many splits?
Necklace Splitting [A87]
Necklace Splitting Theorem
Theorem (Alon, 1987): Every necklace with b·a_i beads of color i, 1 ≤ i ≤ t, has a b-splitting of size at most (b-1)t. Tight!
Corollary: A multi-way splitter of size b^((b-1)t+1)·C(m, (b-1)t) is efficiently constructible.
Noga's result (continuous splitting): Given a t-coloring of [0,1], divide it into b pairwise disjoint Lebesgue measurable subfamilies, s.t. each captures 1/b of the measure of every color.
Reduction from discrete splitting to continuous splitting: Given a necklace, color [0,1] accordingly. Problem: single beads can be split too. Solution: show we can get rid of 'bad cuts'. Draw a multi-graph: a vertex for each thief, and an edge for each color i s.t. a bead of this color is split between the two thieves. Note: all degrees are even! [Each color is split evenly between the thieves.] Hence, there is an Euler cycle in that multi-graph. Now slide the cuts along this cycle.
Proving the continuous version: (1) Show it holds for every prime b (the case b=2 was already proven). (2) Note this implies the general case: say we prove it for (t,k) and (t,l). For (t,kl): first split into k parts; gather all of them, re-scale, and split into l parts. Topological proof for (1): via a generalization of the Borsuk-Ulam theorem [any continuous function from an n-sphere into Euclidean n-space maps some pair of antipodal points to the same point].
With concatenation: C(k^2, ·)·|Hash_{m,k^2,k}|
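On tiny necklaces the theorem can be explored by exhaustive search. The sketch below is a brute-force illustration only (exponential time, hypothetical function name `find_splitting`); it is not the topological argument and not the efficient construction from the corollary. A necklace is a string of color letters; a splitting is a set of cut positions plus an owner (thief) for each resulting piece.

```python
from itertools import combinations, product
from collections import Counter

def find_splitting(necklace, b, max_cuts):
    """Return (cuts, owners) using the fewest cuts (at most max_cuts), or None if none exists."""
    n = len(necklace)
    total = Counter(necklace)
    if any(count % b for count in total.values()):
        raise ValueError("each color count must be divisible by b")
    target = Counter({color: count // b for color, count in total.items()})
    for num_cuts in range(max_cuts + 1):
        for cuts in combinations(range(1, n), num_cuts):
            bounds = (0, *cuts, n)
            pieces = [necklace[bounds[i]:bounds[i + 1]] for i in range(num_cuts + 1)]
            for owners in product(range(b), repeat=num_cuts + 1):
                shares = [Counter() for _ in range(b)]
                for piece, owner in zip(pieces, owners):
                    shares[owner].update(piece)       # thief `owner` receives this piece
                if all(share == target for share in shares):
                    return cuts, owners
    return None

# b = 2 thieves, t = 2 colors: the theorem promises at most (b-1)t = 2 cuts.
print(find_splitting("AABB", b=2, max_cuts=2))  # ((1, 3), (0, 1, 0))
print(find_splitting("AABB", b=2, max_cuts=1))  # None: one cut is not enough
```

The necklace "AABB" shows why the bound is marked tight for b = t = 2: no single cut gives each thief one bead of each color.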
The b=t=2 Case
Sum-Up Beat k-wise approximations for k-restriction problems. Multi-way splitters via Necklace Splitting. Substantial improvements for: Group Testing Generalized Hashing Set-Cover
Further Research Applications: complexity, algorithms, combinatorics, cryptography… Better constructions? different techniques?