
1 Correlation Immune Functions and Learning Lisa Hellerstein Polytechnic Institute of NYU Brooklyn, NY Includes joint work with Bernard Rosell (AT&T), Eric Bach and David Page (U. of Wisconsin), and Soumya Ray (Case Western)

2 Identifying relevant variables from random examples
x                      f(x)
(1,1,0,0,0,1,1,0,1,0)  1
(0,1,0,0,1,0,1,1,0,1)  1
(1,0,0,1,0,1,0,0,1,0)  0

3 Technicalities
Assume random examples are drawn from the uniform distribution over {0,1}^n
Have access to a source of random examples

4 Detecting that a variable is relevant
Look for dependence between input variables and the output.
If x_i is irrelevant: P(f=1 | x_i=1) = P(f=1 | x_i=0)
If x_i is relevant: P(f=1 | x_i=1) ≠ P(f=1 | x_i=0), for the function on the previous slide

5 Unfortunately…
For some functions (e.g. parity):
x_i relevant: P(f=1 | x_i=1) = 1/2 = P(f=1 | x_i=0)
x_i irrelevant: P(f=1 | x_i=1) = 1/2 = P(f=1 | x_i=0)
Finding a relevant variable is easy for some functions, not so easy for others.

6 How to find the relevant variables
Suppose you know r (the number of relevant variables)
Assume r << n (think of r = log n)
Get m random examples, where m = poly(2^r, log n, 1/δ)
With probability > 1-δ, we have enough information to determine which r variables are relevant
–All other sets of r variables can be ruled out

7
x1  x2  x3  x4  x5  x6  x7  x8  x9  x10   f
(1,  1,  0,  1,  1,  0,  1,  0,  1,  0)   1
(0,  1,  1,  1,  1,  0,  1,  1,  0,  0)   0
(1,  1,  1,  0,  0,  0,  0,  0,  0,  0)   1
(0,  0,  0,  1,  1,  0,  0,  0,  0,  0)   0
(1,  1,  1,  0,  0,  0,  1,  1,  1,  1)   0

8
x1  x2  x3  x4  x5  x6  x7  x8  x9  x10   f
(1,  1,  0,  1,  1,  0,  1,  0,  1,  0)   1
(0,  1,  1,  1,  1,  0,  1,  1,  0,  0)   0
(1,  1,  1,  0,  0,  0,  0,  0,  0,  0)   1
(0,  0,  0,  1,  1,  0,  0,  0,  0,  0)   0
(1,  1,  1,  0,  0,  0,  1,  1,  0,  1)   0

9
x1  x2  x3  x4  x5  x6  x7  x8  x9  x10   f
(1,  1,  0,  1,  1,  0,  1,  0,  1,  0)   1
(0,  1,  1,  1,  1,  0,  1,  1,  0,  0)   0
(1,  1,  1,  0,  0,  0,  0,  0,  0,  0)   1
(0,  0,  0,  1,  1,  0,  0,  0,  0,  0)   0
(1,  1,  1,  0,  0,  0,  1,  1,  0,  1)   0
x_3, x_5, x_9 can't be the relevant variables: rows 3 and 5 agree on x_3, x_5, x_9 but have different labels

10
x1  x2  x3  x4  x5  x6  x7  x8  x9  x10   f
(1,  1,  0,  1,  1,  0,  1,  0,  1,  0)   1
(0,  1,  1,  1,  1,  0,  1,  1,  0,  0)   0
(1,  1,  1,  0,  0,  0,  0,  0,  0,  0)   1
(0,  0,  0,  1,  1,  0,  0,  0,  0,  0)   0
(1,  1,  1,  0,  0,  0,  1,  1,  1,  1)   0
x_1, x_3, x_10 ok
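
The rule-out test on the preceding slides can be sketched in code: a candidate set S survives only if no two examples agree on S yet disagree on f. This is an illustrative sketch, not code from the talk; the function names, the AND target, and the sample size are assumptions.

```python
import itertools
import random

def surviving_sets(examples, r):
    """Return all r-subsets of variables not yet ruled out: a set S survives
    if no two examples agree on S's coordinates but disagree on f."""
    n = len(examples[0][0])
    survivors = []
    for S in itertools.combinations(range(n), r):
        seen = {}  # projection of x onto S -> label observed for it
        if all(seen.setdefault(tuple(x[i] for i in S), y) == y
               for x, y in examples):
            survivors.append(S)
    return survivors

# Hypothetical target: f depends only on x0 and x2 (here an AND), n = 6.
random.seed(0)
n, r = 6, 2
f = lambda x: x[0] & x[2]
examples = [(x, f(x)) for x in
            (tuple(random.randint(0, 1) for _ in range(n)) for _ in range(500))]
print(surviving_sets(examples, r))
```

With enough random examples, every wrong r-subset is ruled out with high probability and only the true set {x0, x2} survives.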

11 Naïve algorithm: try all combinations of r variables. Time ≈ n^r
Mossel, O’Donnell, Servedio [STOC 2003]
–Algorithm that takes time ≈ n^(cr), where c ≈ 0.704
–Subroutine: find a single relevant variable
Still open: can this bound be improved?

12 Problematic functions
If the output of f is dependent on x_i, we can detect the dependence (whp) in time poly(n, 2^r) and identify x_i as relevant.
But for some functions, every variable is independent of the output of f:
P[f=1 | x_i=0] = P[f=1 | x_i=1] for all x_i
Equivalently, all degree-1 Fourier coefficients are 0
Functions with this property are said to be CORRELATION-IMMUNE
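
The single-variable dependence test can be sketched as an empirical comparison of the two conditional probabilities; the function names, threshold, and sample size below are illustrative assumptions, not details from the talk.

```python
import random

def looks_relevant(sample, i, threshold=0.05):
    """Flag x_i as relevant when the empirical gap
    |P[f=1 | x_i=1] - P[f=1 | x_i=0]| exceeds a threshold."""
    ones = [y for x, y in sample if x[i] == 1]
    zeros = [y for x, y in sample if x[i] == 0]
    gap = sum(ones) / len(ones) - sum(zeros) / len(zeros)
    return abs(gap) > threshold

random.seed(1)
n, m = 8, 4000
xs = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(m)]
f_or = lambda x: x[0] | x[1]      # both relevant variables correlate with f
f_parity = lambda x: x[0] ^ x[1]  # correlation-immune: the test sees nothing
sample_or = [(x, f_or(x)) for x in xs]
sample_parity = [(x, f_parity(x)) for x in xs]
print([i for i in range(n) if looks_relevant(sample_or, i)])
print([i for i in range(n) if looks_relevant(sample_parity, i)])
```

For OR the test flags exactly x0 and x1; for parity it flags nothing, which is the correlation-immunity problem of the next slides.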

13 P[f=1 | x_i=0] = P[f=1 | x_i=1] for all x_i
Geometrically, e.g. n=2: [figure: the four points 00, 01, 10, 11 of the Boolean square]

14 P[f=1 | x_i=0] = P[f=1 | x_i=1] for all x_i
Geometrically: [figure: the Boolean square labeled with Parity(x_1,x_2): f(00)=0, f(01)=1, f(10)=1, f(11)=0]

15 P[f=1 | x_i=0] = P[f=1 | x_i=1] for all x_i
Geometrically: [figure: the same square split into the halves x_1=0 and x_1=1; each half contains one 0-point and one 1-point]

16 P[f=1 | x_i=0] = P[f=1 | x_i=1] for all x_i
[figure: the same square split into the halves x_2=0 and x_2=1; again each half contains one 0-point and one 1-point]

17 Other correlation-immune functions besides parity?
–f(x_1,…,x_n) = 1 iff x_1 = x_2 = … = x_n
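
Correlation-immunity of small functions like the "all bits equal" function above can be checked by exhaustive enumeration over the uniform distribution. A minimal sketch (the helper name and the majority counterexample are my additions):

```python
from itertools import product

def is_correlation_immune(f, n):
    """Check P[f=1 | x_i=1] == P[f=1 | x_i=0] for every variable i,
    by enumerating all of {0,1}^n under the uniform distribution."""
    cube = list(product([0, 1], repeat=n))
    for i in range(n):
        # Each half of the cube has 2^(n-1) points, so comparing the
        # counts of 1-outputs is the same as comparing the probabilities.
        p1 = sum(f(x) for x in cube if x[i] == 1)
        p0 = sum(f(x) for x in cube if x[i] == 0)
        if p1 != p0:
            return False
    return True

n = 4
parity = lambda x: sum(x) % 2
all_equal = lambda x: int(len(set(x)) == 1)  # f = 1 iff x_1 = ... = x_n
majority = lambda x: int(sum(x) > n / 2)

print(is_correlation_immune(parity, n))     # True
print(is_correlation_immune(all_equal, n))  # True
print(is_correlation_immune(majority, n))   # False
```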

18 Other correlation-immune functions besides parity?
–All reflexive functions

19 Other correlation-immune functions besides parity?
–All reflexive functions
–More…

20 Correlation-immune functions and decision tree learners
Decision tree learners in ML
–Popular machine learning approach (CART, C4.5)
–Given a set of examples of a Boolean function, build a decision tree
Heuristics for decision tree learning
–Greedy, top-down
–Differ in the way they choose which variable to put in a node
–Pick the variable having the highest “gain”
–P[f=1 | x_i=1] = P[f=1 | x_i=0] means 0 gain
Correlation-immune functions are problematic for decision tree learners
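
The zero-gain claim can be checked directly with the entropy-based information gain used by C4.5-style learners: for parity every split has gain 0, while a non-immune function like a conjunction gives a positive gain. A sketch under that assumption (names are mine):

```python
from itertools import product
from math import log2

def entropy(labels):
    p = sum(labels) / len(labels)
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def gain(f, n, i):
    """Information gain of splitting on x_i, over the full uniform cube.
    Each side of the split holds exactly half the points, hence weight 0.5."""
    cube = list(product([0, 1], repeat=n))
    labels = [f(x) for x in cube]
    side1 = [f(x) for x in cube if x[i] == 1]
    side0 = [f(x) for x in cube if x[i] == 0]
    return entropy(labels) - 0.5 * entropy(side1) - 0.5 * entropy(side0)

n = 3
parity = lambda x: sum(x) % 2
conj = lambda x: x[0] & x[1] & x[2]
print([round(gain(parity, n, i), 6) for i in range(n)])  # all zero
print(gain(conj, n, 0) > 0)                              # True
```

Since every variable looks equally useless on parity, a greedy learner has nothing to guide its first split.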

21 Skewing: An efficient alternative to lookahead for decision tree induction. IJCAI 2003 [Page, Ray]
Why skewing works: learning difficult Boolean functions with greedy tree learners. ICML 2005 [Rosell, Hellerstein, Ray, Page]

22 Story Part One

23 How many difficult functions? More than 2^(2^(n-1))
n      0  1  2  3   4    5
# fns  2  2  4  18  648  3140062

24 How many different hard functions? More than 2^(2^(n/2))
n      0  1  2  3   4    5
# fns  2  2  4  18  648  3140062
SOMEONE MUST HAVE STUDIED THESE FUNCTIONS BEFORE…

25 [image]

26 [image]

27 Story Part Two

28 I had lunch with Eric Bach

29 Roy, B. K. 2002. A Brief Outline of Research on Correlation Immune Functions. In Proceedings of the 7th Australasian Conference on Information Security and Privacy (July 3-5, 2002). L. M. Batten and J. Seberry, Eds. Lecture Notes in Computer Science, vol. 2384. Springer-Verlag, London, 379-394.

30 Correlation-immune functions
k-correlation immune function
–For every subset S of the input variables s.t. 1 ≤ |S| ≤ k: P[f | S] = P[f]
–[Xiao, Massey 1988] Equivalently, all Fourier coefficients of degree i are 0, for 1 ≤ i ≤ k

31 Siegenthaler’s Theorem
If f is k-correlation immune, then the GF[2] polynomial for f has degree at most n-k.

32 Siegenthaler’s Theorem [1984]
If f is k-correlation immune, then the GF[2] polynomial for f has degree at most n-k.
The algorithm of Mossel, O’Donnell, Servedio [STOC 2003] is based on this theorem.

33 End of Story

34 Non-uniform distributions
Correlation-immune functions are defined wrt the uniform distribution
What if the distribution is biased? e.g. each bit 1 with probability 3/4

35 f(x_1,x_2) = parity(x_1,x_2), each bit 1 with probability 3/4
x    parity(x)  P[x]
00   0          1/16
01   1          3/16
10   1          3/16
11   0          9/16
P[f=1 | x_1=1] ≠ P[f=1 | x_1=0]

36 f(x_1,x_2) = parity(x_1,x_2), each bit 1 with probability 3/4
x    parity(x)  P[x]
00   0          1/16
01   1          3/16
10   1          3/16
11   0          9/16
P[f=1 | x_1=1] ≠ P[f=1 | x_1=0]
For an added irrelevant variable, the two probabilities would be equal
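
The numbers in the table above can be reproduced by summing the p-biased weights over the cube; this sketch (helper name is an assumption) also checks the irrelevant-variable claim by adding a third, unused bit.

```python
from itertools import product

def cond_prob(f, n, p, i, b):
    """P_{D_p}[f=1 | x_i = b] under the p-biased product distribution,
    computed exactly by enumerating {0,1}^n."""
    num = den = 0.0
    for x in product([0, 1], repeat=n):
        if x[i] != b:
            continue
        w = 1.0
        for bit in x:
            w *= p if bit == 1 else 1 - p  # product weight of the point x
        den += w
        num += w * f(x)
    return num / den

parity = lambda x: (x[0] + x[1]) % 2  # x at index 2 is irrelevant
p = 3 / 4
print(cond_prob(parity, 3, p, 0, 1), cond_prob(parity, 3, p, 0, 0))
# relevant variable: 0.25 vs 0.75, so the bias exposes it
print(cond_prob(parity, 3, p, 2, 1) == cond_prob(parity, 3, p, 2, 0))
# irrelevant variable: the two conditionals coincide
```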

37 Correlation-immunity wrt p-biased distributions
Definitions:
–f is correlation-immune wrt distribution D if P_D[f=1 | x_i=1] = P_D[f=1 | x_i=0] for all x_i
–p-biased distribution D_p: each bit set to 1 independently with probability p
For all p-biased distributions D_p, P_{D_p}[f=1 | x_i=1] = P_{D_p}[f=1 | x_i=0] for all irrelevant x_i

38 Lemma: Let f(x_1,…,x_n) be a Boolean function with r relevant variables. Then f is correlation-immune wrt D_p for at most r-1 values of p.
Pf: Correlation-immune wrt D_p means
P[f=1 | x_i=1] - P[f=1 | x_i=0] = 0   (*)
for all x_i. Consider a fixed f and x_i. The lhs of (*) can be written as a polynomial h(p).

39 e.g. f(x_1,x_2,x_3) = parity(x_1,x_2,x_3), p-biased distribution D_p
h(p) = P_{D_p}[f=1 | x_1=1] - P_{D_p}[f=1 | x_1=0]
     = ( p^2 + (1-p)^2 ) - ( p(1-p) + (1-p)p )
If we add an irrelevant variable, this polynomial doesn’t change.
For arbitrary f and variable x_i, h(p) has degree ≤ r-1, where r is the number of relevant variables.
So f is correlation-immune wrt at most r-1 values of p, unless h(p) is identically 0 for all x_i.

40 h(p) = P_{D_p}[f=1 | x_i=1] - P_{D_p}[f=1 | x_i=0]
P_{D_p}[f=1 | x_i=1] = Σ_d w_d p^d (1-p)^(n-1-d)
where w_d is the number of inputs x for which f(x)=1, x_i=1, and x contains exactly d additional 1’s,
i.e. w_d = number of positive assignments of f_{x_i←1} of Hamming weight d
Similar expression for P_{D_p}[f=1 | x_i=0]

41 P_{D_p}[f=1 | x_i=1] - P_{D_p}[f=1 | x_i=0] = Σ_d (w_d - r_d) p^d (1-p)^(n-1-d)
where w_d = number of positive assignments of f_{x_i←1} of Hamming weight d
      r_d = number of positive assignments of f_{x_i←0} of Hamming weight d
Not identically 0 iff w_d ≠ r_d for some d
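
The sum over the counts w_d and r_d can be checked numerically against a direct computation of the two conditional probabilities. A sketch for 3-variable parity (function names are mine); both routes give h(p) = (2p-1)^2 here.

```python
from itertools import product

def h_direct(f, n, i, p):
    """h(p) = P_{D_p}[f=1|x_i=1] - P_{D_p}[f=1|x_i=0], by enumeration."""
    def cond(b):
        num = den = 0.0
        for x in product([0, 1], repeat=n):
            if x[i] != b:
                continue
            w = 1.0
            for bit in x:
                w *= p if bit else 1 - p
            num += w * f(x)
            den += w
        return num / den
    return cond(1) - cond(0)

def h_via_counts(f, n, i, p):
    """Same h(p) via the Hamming-weight counts w_d, r_d of f_{x_i<-1}, f_{x_i<-0}."""
    w = [0] * n
    r = [0] * n
    for rest in product([0, 1], repeat=n - 1):
        d = sum(rest)  # number of 1's among the other n-1 variables
        w[d] += f(rest[:i] + (1,) + rest[i:])
        r[d] += f(rest[:i] + (0,) + rest[i:])
    return sum((w[d] - r[d]) * p**d * (1 - p)**(n - 1 - d) for d in range(n))

parity = lambda x: sum(x) % 2
for p in (0.2, 0.5, 0.9):
    assert abs(h_direct(parity, 3, 0, p) - h_via_counts(parity, 3, 0, p)) < 1e-12
print("formulas agree")
```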

42 Property of Boolean functions
Lemma: If f has at least one relevant variable, then for some relevant variable x_i and some d, w_d ≠ r_d,
where w_d = number of positive assignments of f_{x_i←1} of Hamming weight d
      r_d = number of positive assignments of f_{x_i←0} of Hamming weight d

43 How much does it help to have access to examples from different distributions?

44 How much does it help to have access to examples from different distributions?
Exploiting Product Distributions to Identify Relevant Variables of Correlation Immune Functions [Hellerstein, Rosell, Bach, Ray, Page]

45 Even if f is not correlation-immune wrt D_p, we may need a very large sample to detect a relevant variable
–if the value of p is very near a root of h(p)
Lemma: If h(p) is not identically 0, then for some value of p in the set { 1/(r+1), 2/(r+1), 3/(r+1), …, r/(r+1) }, |h(p)| ≥ 1/(r+1)^(r-1)

46 Algorithm to find a relevant variable
–Uses examples from distributions D_p, for p = 1/(r+1), 2/(r+1), 3/(r+1), …, r/(r+1)
–Sample size poly((r+1)^r, log n, log 1/δ)
–[Essentially the same algorithm was found independently by Arpe and Mossel, using very different techniques]
Another algorithm to find a relevant variable
–Based on proving (roughly) that if we choose a random p, then h^2(p) is likely to be reasonably large. Uses the prime number theorem.
–Uses examples from poly(2^r, log 1/δ) distributions D_p.
–Sample size poly(2^r, log n, log 1/δ)
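
The first algorithm can be sketched as: draw a sample from D_p for each grid value p = j/(r+1) and run the conditional-probability test at every grid point; by the lemma, some grid point exposes a relevant variable. The names, threshold, and sample size below are illustrative assumptions, not the paper's parameters.

```python
import random

def grid_find_relevant(f, n, r, m=20000, threshold=0.05, seed=0):
    """Sketch of the grid algorithm: for each p = j/(r+1), estimate
    P[f=1|x_i=1] - P[f=1|x_i=0] from m examples drawn from D_p, and
    report every variable whose empirical gap clears the threshold."""
    rng = random.Random(seed)
    flagged = set()
    for j in range(1, r + 1):            # p = 1/(r+1), ..., r/(r+1)
        p = j / (r + 1)
        sample = []
        for _ in range(m):
            x = tuple(int(rng.random() < p) for _ in range(n))
            sample.append((x, f(x)))
        for i in range(n):
            ones = [y for x, y in sample if x[i] == 1]
            zeros = [y for x, y in sample if x[i] == 0]
            if not ones or not zeros:
                continue
            gap = sum(ones) / len(ones) - sum(zeros) / len(zeros)
            if abs(gap) > threshold:
                flagged.add(i)
    return flagged

# Parity of x0, x1, x2 is correlation-immune under the uniform distribution,
# but the biased grid points p = 1/4 and p = 3/4 expose all three variables.
parity3 = lambda x: (x[0] + x[1] + x[2]) % 2
print(grid_find_relevant(parity3, n=8, r=3))
```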

47 Better algorithms?

48 Summary
Finding relevant variables (junta-learning)
Correlation-immune functions
Learning from p-biased distributions

49 Moral of the Story
The Handbook of Integer Sequences can be useful in doing a literature search.
Eating lunch with the right person can be much more useful.

