1 Slides by Elery Pfeffer and Elad Hazan, Based on slides by Michael Lewin & Robert Sayegh. Adapted from Oded Goldreich’s course lecture notes by Eilon Reshef.
2 Introduction
In this lecture we will show our main result: $\mathrm{BPSPACE}(\log n) \subseteq \mathrm{DSPACE}(\log^2 n)$.
The result will be derived in the following order:
- Formal definitions.
- Execution graph representation.
- A pseudorandom generator based on universal hash functions (UHF).
- Analysis of the execution graph traversal by the pseudorandom generator.
3 Definition of BPSPACE(·)
BPSPACE(·) is the family of bounded-probability, space-bounded complexity classes.
Def: The complexity class BPSPACE(s(·)) is the set of all languages L s.t. there exists a randomized TM M that on input x:
- M uses at most $s(|x|)$ space.
- The running time of M is bounded by $\exp(s(|x|))$.
- $x \in L \Rightarrow \Pr[M(x) = 1] \ge 2/3$.
- $x \notin L \Rightarrow \Pr[M(x) = 1] \le 1/3$.
Here s(·) is any complexity function s.t. $s(\cdot) \ge \log(\cdot)$. We focus on BPL, namely BPSPACE(log).
Without the running-time condition: $\mathrm{NSPACE}(\cdot) = \mathrm{RSPACE}(\cdot) \subseteq \mathrm{BPSPACE}(\cdot)$.
16.2
4 Execution Graphs
We represent the execution of a BPSPACE machine M on input x as a layered directed graph $G_{M,x}$.
The vertices in the i-th layer correspond to all the possible configurations of M after it has used i random bits.
Each vertex has 2 outgoing edges, corresponding to reading a "0" or a "1" bit from the random tape.
Note:
- Width of $G_{M,x}$ = size of a layer $\le 2^{s(n)} \cdot s(n) \cdot n \le \exp(s(n))$.
- Depth of $G_{M,x}$ = number of layers $\le \exp(s(n))$.
16.3
5 Execution Graph Example
[Figure: an example execution graph, annotated with its width and depth.]
6 Execution Graph Definitions
The set of final vertices is partitioned into:
- $V_{acc}$: the set of accepting configurations.
- $V_{rej}$: the set of rejecting configurations.
A random walk on $G_{M,x}$ is a sequence of steps emanating from the initial configuration and randomly traversing the directed edges of $G_{M,x}$.
A guided walk on $G_{M,x}$ (with guide R) is a sequence of steps emanating from the initial configuration and traversing the i-th edge in $G_{M,x}$ according to the i-th bit given by the guide R.
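The definitions of a guided walk and of a random walk can be sketched in a few lines. This is only an illustration under an assumed encoding: the layer format, the toy graph `LAYERS`, and the names `guided_walk` and `acceptance_probability` are hypothetical and do not come from the slides.

```python
from itertools import product
from typing import Dict, List, Sequence, Set, Tuple

# Assumed encoding: layers[i][v] = (successor of v on bit 0, successor of v on bit 1).
Layer = Dict[str, Tuple[str, str]]

def guided_walk(layers: List[Layer], start: str, guide: Sequence[int]) -> str:
    """Traverse the i-th edge according to the i-th bit of the guide."""
    vertex = start
    for layer, bit in zip(layers, guide):
        vertex = layer[vertex][bit]
    return vertex

def acceptance_probability(layers: List[Layer], start: str, v_acc: Set[str]) -> float:
    """Probability that a uniformly random guide ends in an accepting vertex."""
    depth = len(layers)
    hits = sum(
        guided_walk(layers, start, guide) in v_acc
        for guide in product((0, 1), repeat=depth)
    )
    return hits / 2 ** depth

# Toy depth-2 graph (purely illustrative).
LAYERS: List[Layer] = [
    {"s": ("a", "b")},
    {"a": ("yes", "yes"), "b": ("yes", "no")},
]
V_ACC = {"yes"}
```

Averaging the guided walk over all guides is exactly the random walk, which is why the acceptance probability of M can be read off the graph.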
7 Execution Graph Definitions
Denote by $\mathrm{ACC}(G_{M,x},R)$ the event that the R-guided walk reaches a vertex in $V_{acc}$.
Thus, $\Pr[M \text{ accepts } x] = \Pr_R[\mathrm{ACC}(G_{M,x},R)]$.
Summarizing, we learn that for a language L in BPL there exists a (D,W)-graph s.t.
- Width: $W(n) = \exp(s(n)) = \mathrm{poly}(n)$.
- Depth: $D(n) = \exp(s(n)) = \mathrm{poly}(n)$.
- For a random guide R:
  - $x \in L \Rightarrow \Pr_R[\mathrm{ACC}(G_{M,x},R)] \ge 2/3$.
  - $x \notin L \Rightarrow \Pr_R[\mathrm{ACC}(G_{M,x},R)] \le 1/3$.
8 Execution Graph Definitions
We note that the following technical step preserves a random walk on $G_{M,x}$: prune the layers of $G_{M,x}$ so that only every l-th layer remains, contracting edges where necessary. Each edge of the pruned graph then corresponds to a block of l random bits.
Denote the new pruned graph by G.
This is done to ease the analysis of the pseudorandom generator later on.
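9 Execution Graph Definitions
[Figure: the pruned graph G, keeping layers 0, l, 2l, …; each edge is labeled by an l-bit string such as 0101… …010.]
Clearly, a random walk on $G_{M,x}$ is equivalent to a random walk on G.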
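The pruning step can be sketched as follows. The layer encoding (a list of dictionaries mapping each vertex to its successors on bits 0 and 1) and the name `prune` are assumptions made for this illustration, not part of the slides.

```python
from itertools import product
from typing import Dict, List, Tuple

# Assumed encoding: layers[i][v] = (successor of v on bit 0, successor of v on bit 1).
Layer = Dict[str, Tuple[str, str]]
Block = Tuple[int, ...]

def prune(layers: List[Layer], l: int) -> List[Dict[str, Dict[Block, str]]]:
    """Keep only every l-th layer; each new edge is labeled by the bits it contracts."""
    pruned = []
    for start in range(0, len(layers), l):
        group = layers[start:start + l]
        contracted: Dict[str, Dict[Block, str]] = {}
        for v in group[0]:
            edges: Dict[Block, str] = {}
            for bits in product((0, 1), repeat=len(group)):
                u = v
                for layer, bit in zip(group, bits):
                    u = layer[u][bit]      # follow one original edge
                edges[bits] = u            # contracted edge, labeled by the bit block
            contracted[v] = edges
        pruned.append(contracted)
    return pruned

# Toy depth-2 graph (purely illustrative).
LAYERS: List[Layer] = [
    {"s": ("a", "b")},
    {"a": ("yes", "yes"), "b": ("yes", "no")},
]
PRUNED = prune(LAYERS, 2)
```

Since every l-bit block labels exactly one contracted edge out of each vertex, choosing a uniform block in the pruned graph has the same effect as l uniform single-bit steps in the original graph.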
10 Universal Hash Functions
Def: A family of functions $H = \{h: A \to B\}$ is called a universal family of hash functions if for every $x_1 \ne x_2$ in A and every $y_1, y_2$ in B,
$\Pr_{h \in H}[h(x_1) = y_1 \text{ and } h(x_2) = y_2] = 1/|B|^2$.
We will use a linear universal family of hash functions seen previously:
$H_l = \{h: \{0,1\}^l \to \{0,1\}^l\}$, $h_{a,b}(x) = ax + b$, where $a, b \in \{0,1\}^l$ and the arithmetic is over the field $GF(2^l)$.
This family has a succinct representation (2l space) and can be computed in linear space.
16.4
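A small sketch of the linear family $h_{a,b}(x) = ax + b$, assuming the arithmetic is carried out in $GF(2^l)$. The toy length l = 3, the modulus $x^3 + x + 1$, and the names `gf_mul`, `h` and `count_matching` are choices made for this illustration only.

```python
L_BITS = 3        # toy block length l, small enough for exhaustive checks
MODULUS = 0b1011  # x^3 + x + 1, irreducible over GF(2)

def gf_mul(a: int, b: int, l: int = L_BITS, modulus: int = MODULUS) -> int:
    """Multiply a and b as polynomials over GF(2), reduced modulo `modulus`."""
    result = 0
    for _ in range(l):
        if b & 1:
            result ^= a          # add the current shifted copy of a
        b >>= 1
        a <<= 1
        if (a >> l) & 1:
            a ^= modulus         # reduce once the degree reaches l
    return result

def h(a: int, b: int, x: int) -> int:
    """The hash function h_{a,b}(x) = a*x + b; addition in GF(2^l) is XOR."""
    return gf_mul(a, x) ^ b

def count_matching(x1: int, y1: int, x2: int, y2: int) -> int:
    """Number of pairs (a, b) with h(x1) = y1 and h(x2) = y2."""
    n = 1 << L_BITS
    return sum(
        1
        for a in range(n)
        for b in range(n)
        if h(a, b, x1) == y1 and h(a, b, x2) == y2
    )
```

For $x_1 \ne x_2$, exactly one of the $2^{2l}$ pairs (a, b) matches any given $(y_1, y_2)$, which gives the probability $1/|B|^2$ required by the definition.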
11 Universal Hash Functions
For every $A \subseteq \{0,1\}^l$ denote by m(A) the probability that a random element hits the set A: $m(A) = |A| / 2^l$.
Hash Proposition: For every universal family of hash functions $H_l$, and for every two sets $A, B \subseteq \{0,1\}^l$, all but a $2^{-l/5}$ fraction of the functions $h \in H_l$ satisfy
$|\Pr_x[x \in A \text{ and } h(x) \in B] - m(A) \cdot m(B)| \le 2^{-l/5}$.
That is, a large fraction of the hash functions extend well to the sets A and B.
12 Construction Overview
Def: A function $H: \{0,1\}^k \to \{0,1\}^D$ is a (D,W)-pseudorandom generator if for every (D,W)-graph G:
$|\Pr_{R \in \{0,1\}^D}[\mathrm{ACC}(G,R)] - \Pr_{R' \in \{0,1\}^k}[\mathrm{ACC}(G,H(R'))]| \le 1/10$.
Prop: There exists a (D,W)-pseudorandom generator H(·) with $k(n) = O(\log D \cdot \log W)$. Further, H(·) is computable in space linear in its input.
16.5
13 Construction Overview
Corollary: There exists a (D,W)-pseudorandom generator H(·) with the following parameters:
- $s(n) = \Theta(\log n)$
- $D(n) \le \mathrm{poly}(n)$
- $W(n) \le \mathrm{poly}(n)$
- $k(n) = O(\log^2 n)$
By trying all seeds of H, it follows that $\mathrm{BPL} \subseteq \mathrm{DSPACE}(\log^2 n)$.
This is NOT surprising, since $\mathrm{RL} \subseteq \mathrm{NL} \subseteq \mathrm{DSPACE}(\log^2 n)$.
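"Trying all seeds" amounts to a majority vote over the seeds. With generator error at most 1/10, the fraction of accepting seeds is at least 2/3 - 1/10 > 1/2 for $x \in L$ and at most 1/3 + 1/10 < 1/2 otherwise. The sketch below only illustrates that vote: `derandomize` and both of its callbacks are hypothetical names, and the generator used in the example is a placeholder rather than the H of these slides.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Bits = Tuple[int, ...]

def derandomize(
    accepts_with_guide: Callable[[Sequence[int]], bool],
    generator: Callable[[Bits], Sequence[int]],
    seed_len: int,
) -> bool:
    """Accept iff more than half of the seeds lead to an accepting guided walk."""
    total = 2 ** seed_len
    accepting = sum(
        1
        for seed in product((0, 1), repeat=seed_len)
        if accepts_with_guide(generator(seed))
    )
    return 2 * accepting > total

# Placeholder generator for illustration only: it repeats the seed twice.
def toy_generator(seed: Bits) -> Bits:
    return seed + seed
```

A space-bounded implementation would keep only the current seed and a counter, reusing the work space across seeds; holding the seed costs $k(n) = O(\log^2 n)$ bits, which dominates the space used.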
14 The Pseudorandom Generator
We will define a (D,W)-pseudorandom generator, H. Assume without loss of generality $D \le W$.
H extends strings of length $O(l^2)$ to strings of length $D \le \exp(l)$, for $l = \Theta(\log W)$.
The input to H is the tuple $I = (r, \langle h_1 \rangle, \langle h_2 \rangle, \ldots, \langle h_{l'} \rangle)$,
where $|r| = l$, the $\langle h_i \rangle$ are representations of functions in $H_l$, and $l' = \log(D/l)$.
Obviously, $|I| = O(l^2)$.
16.6
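15 The Pseudorandom Generator
The computation of the PRG can be represented as a complete binary tree of depth l'. The output of H is the concatenation of the binary values of the leaves.
Formally:
$H(r, \langle h_i \rangle, \ldots, \langle h_{l'} \rangle) = H(r, \langle h_{i+1} \rangle, \ldots, \langle h_{l'} \rangle) \circ H(h_i(r), \langle h_{i+1} \rangle, \ldots, \langle h_{l'} \rangle)$, starting with $H(z) = z$.
[Figure: the tree for l' = 3, rooted at r; a node z at level i has left child z and right child $h_i(z)$. Node labels include $r$, $h_1(r)$, $h_2(r)$, $h_2(h_1(r))$, $h_3(h_2(r))$ and $h_3(h_2(h_1(r)))$.]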
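The recursion can be transcribed almost literally. In this sketch the hash functions are passed in as ordinary callables, and the `symbolic` helper, which builds formal strings instead of real hashes, is a hypothetical device used only to make the leaf order visible.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def generate(r: T, hashes: Sequence[Callable[[T], T]]) -> List[T]:
    """Leaves of the generator tree, left to right: H(r, h_i, ..., h_l')."""
    if not hashes:
        return [r]                      # base case H(z) = z
    first, rest = hashes[0], hashes[1:]
    # Left subtree keeps r, right subtree continues from h_i(r).
    return generate(r, rest) + generate(first(r), rest)

def symbolic(i: int) -> Callable[[str], str]:
    """Formal stand-in for h_i: maps the string z to the string 'h_i(z)'."""
    return lambda z: f"h{i}({z})"

LEAVES = generate("r", [symbolic(1), symbolic(2), symbolic(3)])
```

With l' = 3 this yields the eight blocks r, h3(r), h2(r), h3(h2(r)), h1(r), h3(h1(r)), h2(h1(r)), h3(h2(h1(r))), so l' hash functions stretch one block into $2^{l'}$ blocks.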
16 Intuition
[Figure: the generator tree; its leaves, read left to right, form the resulting pseudorandom sequence, which guides a walk through layers 0, l, 2l, …, 7l of the execution graph.]
The output of the pseudorandom generator represents a path in the execution graph.
The sequence is constructed using only l bits for r and l' hash functions.
17 Analysis
Claim: H is indeed a (D,W)-pseudorandom generator.
We will show that a guided walk on $G_{M,x}$ using the guide H(I) behaves almost like a truly random walk.
We will perform a sequence of coarsenings. At each coarsening we will use a new hash function from $H_l$ and reduce the number of random bits by a factor of 2.
After l' such coarsenings, the only random bits remaining are the l random bits of r and the l' representations of the hash functions.
16.7
18 Analysis
At the first coarsening we replace the random guide $R = (R_1, R_2, \ldots, R_{D/l})$ with the semi-random guide $R' = (R_1, h_{l'}(R_1), R_3, h_{l'}(R_3), \ldots, R_{D/l-1}, h_{l'}(R_{D/l-1}))$.
We will show that this semi-random guide succeeds in "fooling" $G_{M,x}$.
[Figure: the blocks $r_1, h_3(r_1), r_3, h_3(r_3), r_5, h_3(r_5), r_7, h_3(r_7)$ (here l' = 3).]
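19 Intuition
[Figure: the random seed blocks $r_1, r_3, r_5$ and the pseudorandom blocks $h_3(r_1), h_3(r_3), h_3(r_5)$; replacing random blocks by pseudorandom ones changes P(Acc) only slightly at each level.]
By summing over all l' levels in the tree we will show that the total difference in the probability to accept is less than 1/10.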
20 Analysis
At the second coarsening we again replace half of the random bits by choosing a new hash function:
$R' = (R_1, h_{l'}(R_1), R_3, h_{l'}(R_3), \ldots, R_{D/l-1}, h_{l'}(R_{D/l-1}))$
$R'' = (R_1, h_{l'}(R_1), h_{l'-1}(R_1), h_{l'}(h_{l'-1}(R_1)), \ldots)$
Again we show that this semi-random guide also succeeds in "fooling" $G_{M,x}$.
[Figure: the blocks $r_1, h_3(r_1), h_2(r_1), h_3(h_2(r_1)), r_5, h_3(r_5), h_2(r_5), h_3(h_2(r_5))$.]
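The guides obtained after successive coarsenings can be generated uniformly: after k coarsenings, each remaining truly random block is expanded by the generator recursion using only the last k hash functions. The sketch below assumes that reading; `guide_after` and `symbolic` are illustrative names, and the hashes are formal strings rather than real hash functions.

```python
from typing import Callable, List, Sequence

def generate(r: str, hashes: Sequence[Callable[[str], str]]) -> List[str]:
    """Generator recursion: H(r, h_i, ...) = H(r, h_{i+1}, ...) followed by H(h_i(r), h_{i+1}, ...)."""
    if not hashes:
        return [r]
    first, rest = hashes[0], hashes[1:]
    return generate(r, rest) + generate(first(r), rest)

def guide_after(k: int, seeds: Sequence[str],
                hashes: Sequence[Callable[[str], str]]) -> List[str]:
    """Guide after k coarsenings: each seed block expands into 2^k blocks."""
    suffix = hashes[len(hashes) - k:]   # the last k hash functions; empty when k = 0
    blocks: List[str] = []
    for seed in seeds:
        blocks.extend(generate(seed, suffix))
    return blocks

def symbolic(i: int) -> Callable[[str], str]:
    """Formal stand-in for h_i."""
    return lambda z: f"h{i}({z})"

HASHES = [symbolic(1), symbolic(2), symbolic(3)]   # l' = 3
```

With k = 0 the guide is fully random, with k = 1 it has the form of R', with k = 2 the form of R'', and with k = l' a single seed block produces the whole output H(I).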
21 Analysis
And so on, until we have performed l' such coarsenings, at which point we will have proven that the generator H is indeed a (D,W)-pseudorandom generator.
We recall that the pruned execution graph was denoted G.
22 Analysis
At the first coarsening we replace the random guide $R = (R_1, R_2, \ldots, R_{D/l})$ with the semi-random guide $R' = (R_1, h_{l'}(R_1), R_3, h_{l'}(R_3), \ldots, R_{D/l-1}, h_{l'}(R_{D/l-1}))$.
We show that:
$|\Pr_R[\mathrm{ACC}(G,R)] - \Pr_{R'}[\mathrm{ACC}(G,R')]| < \varepsilon$
23 Analysis
We perform preprocessing by removing from G all edges (u,v) whose traversal probability is very small, that is, $\Pr_R[u \to v] < 1/W^2$. Denote by G' the new graph.
Lemma 1: For $\varepsilon_1 = 2/W$,
1. $|\Pr_R[\mathrm{ACC}(G',R)] - \Pr_R[\mathrm{ACC}(G,R)]| < \varepsilon_1$
2. $|\Pr_{R'}[\mathrm{ACC}(G',R')] - \Pr_{R'}[\mathrm{ACC}(G,R')]| < \varepsilon_1$
Proof: For the first part, the probability that a random walk uses a low-probability edge is at most $D \cdot (1/W^2) < 1/W < \varepsilon_1$.
For the second part, we consider two consecutive steps. The first step is truly random, so the traversal probability is at most $1/W^2$. On the second step we use the Hash Proposition for the set $\{0,1\}^l$ and the set of low-probability edges.
24 Analysis
Proof (continued): For all but a $2^{-l/5}$ fraction of the hash functions, the traversal probability is bounded by $1/W^2 + 2^{-l/5} < 2/W^2$.
On the whole, except for a total fraction of $(D/2) \cdot 2^{-l/5} < \varepsilon_1/2$ of the hash functions, the overall probability to traverse a low-probability edge is bounded by $D \cdot (2/W^2) < \varepsilon_1/2$.
Thus the total probability is bounded by $\varepsilon_1$.
Thus removing low-probability edges does not significantly affect the outcome of G.
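The step from the Hash Proposition to the $2/W^2$ bound can be spelled out as follows. This is a sketch that reads the bound as $1/W^2 + 2^{-l/5}$, taking $A = \{0,1\}^l$ (so $m(A) = 1$) and B a set of low-probability edge labels with $m(B) < 1/W^2$; the explicit constant 10 is derived here and is not stated in the slides.

```latex
\[
\Pr_x[x \in A \text{ and } h(x) \in B]
  \le m(A)\, m(B) + 2^{-l/5}
  < \frac{1}{W^2} + 2^{-l/5},
\]
\[
\frac{1}{W^2} + 2^{-l/5} < \frac{2}{W^2}
  \iff 2^{-l/5} < W^{-2}
  \iff l > 10 \log_2 W .
\]
```

So this step already requires the constant hidden in $l = \Theta(\log W)$ to exceed 10.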
25 Analysis
We will show that on G' the semi-random guide R' performs similarly to the truly random guide R.
Lemma 2: $|\Pr_R[\mathrm{ACC}(G',R)] - \Pr_{R'}[\mathrm{ACC}(G',R')]| < \varepsilon_2$
Proof: Consider first 3 consecutive vertices u, v, w and the sets of edges between them, $E_{u,v}$ and $E_{v,w}$.
The probability that a random walk leaves u and reaches w through v is:
$\Pr_{u\text{-}v\text{-}w} = \Pr_{R_1,R_2 \in \{0,1\}^l}[R_1 \in E_{u,v} \text{ and } R_2 \in E_{v,w}]$
Since we removed low-probability edges: $\Pr_{u\text{-}v\text{-}w} \ge 1/W^4$.
26 Analysis
Proof (continued): The probability that a semi-random walk, determined by the hash function h, leaves u and reaches w through v is:
$\Pr^h_{u\text{-}v\text{-}w} = \Pr_{R \in \{0,1\}^l}[R \in E_{u,v} \text{ and } h(R) \in E_{v,w}]$
Using the Hash Proposition with respect to the sets $E_{u,v}$ and $E_{v,w}$, we learn that except for a $2^{-l/5}$ fraction of the h's:
$|\Pr^h_{u\text{-}v\text{-}w} - \Pr_{u\text{-}v\text{-}w}| \le 2^{-l/5}$
Applying this to all possible triplets, we learn that except for a fraction of $\varepsilon_3 = W^3 \cdot 2^{-l/5}$ of the hash functions:
$\forall u,v,w: \; |\Pr^h_{u\text{-}v\text{-}w} - \Pr_{u\text{-}v\text{-}w}| \le 2^{-l/5}$
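27 Analysis
Proof (continued): Denote by $\Pr_{u\text{-}w}$ ($\Pr^h_{u\text{-}w}$) the probability of reaching w from u for the random (semi-random) guide:
$\Pr_{u\text{-}w} = \sum_v \Pr_{u\text{-}v\text{-}w}$ and $\Pr^h_{u\text{-}w} = \sum_v \Pr^h_{u\text{-}v\text{-}w}$
Consequently, if we assume that h is a "good" hash function,
$|\Pr_{u\text{-}w} - \Pr^h_{u\text{-}w}| \le W \cdot 2^{-l/5} \le W^4 \cdot 2^{-l/5} \cdot \Pr_{u\text{-}w} \le \varepsilon_4 \cdot \Pr_{u\text{-}w}$
for a large enough constant in $l = \Theta(\log W)$.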
28 Analysis
Proof (continued): Since the probability of traversing any path P in G is determined by the probabilities of traversing each of its two-hop segments u-v-w, we learn that:
$|\Pr[R' = P] - \Pr[R = P]| \le \varepsilon_4 \cdot \Pr[R = P]$
Summing over all accepting paths,
$|\Pr_R[\mathrm{ACC}(G',R)] - \Pr_{R'}[\mathrm{ACC}(G',R')]| \le \varepsilon_4 \cdot \Pr_R[\mathrm{ACC}(G',R)] \le \varepsilon_4$
The probability that h is not a good hash function is bounded by $\varepsilon_3$. Therefore, if we define $\varepsilon_2 = \varepsilon_3 + \varepsilon_4$, we prove the lemma:
$|\Pr_R[\mathrm{ACC}(G',R)] - \Pr_{R'}[\mathrm{ACC}(G',R')]| \le \varepsilon_2$
29 Analysis
Applying both lemmas, we prove that the semi-random guide R' behaves well in the original graph $G_{M,x}$:
$|\Pr_R[\mathrm{ACC}(G_{M,x},R)] - \Pr_{R'}[\mathrm{ACC}(G_{M,x},R')]| < \varepsilon$,
where, by the triangle inequality, one can take $\varepsilon = 2\varepsilon_1 + \varepsilon_2$.
We have proved that the first coarsening succeeds. To proceed, we contract every two adjacent layers of G and create a single edge for every two-hop path taken by R'.
Lemmas 1 and 2 can be reapplied consecutively until, after l' iterations, we are left with a bipartite graph with a truly random guide.
30 Analysis
All in all, we have shown:
$|\Pr_{R \in \{0,1\}^D}[\mathrm{ACC}(G,R)] - \Pr_{I \in \{0,1\}^k}[\mathrm{ACC}(G,H(I))]| \le \varepsilon \cdot l' \le 1/10$
which concludes the proof that H is a (D,W)-pseudorandom generator.
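The final bound is a standard hybrid (telescoping) argument over the l' coarsenings. In the sketch below, $R^{(j)}$ denotes the guide after j coarsenings, so $R^{(0)} = R$ is truly random and $R^{(l')} = H(I)$; this superscript notation is introduced here for illustration and does not appear in the slides.

```latex
\[
\bigl|\Pr[\mathrm{ACC}(G,R^{(0)})] - \Pr[\mathrm{ACC}(G,R^{(l')})]\bigr|
\le \sum_{j=1}^{l'} \bigl|\Pr[\mathrm{ACC}(G,R^{(j-1)})] - \Pr[\mathrm{ACC}(G,R^{(j)})]\bigr|
\le l' \cdot \varepsilon
\le \frac{1}{10}.
\]
```

Each term in the sum is the error of a single coarsening, bounded by $\varepsilon$ via Lemmas 1 and 2, so it suffices to choose the constant in $l = \Theta(\log W)$ large enough that $\varepsilon \le 1/(10\, l')$.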
31 Analysis
Problem: h is a hash function dependent on $O(\log n)$ bits, and M is a log-space machine. Why can't M differentiate between a truly random guide and a pseudorandom guide by just looking at four consecutive blocks of the pseudorandom sequence, z, h(z), z', h(z'), and fully determining h by solving linear equations in log-space?
Solution: During the analysis we required that $l = \Theta(\log W)$ be large enough. In the underlying computation model this corresponds to the fact that M cannot even retain a description of the hash function h.
32 Extensions and Related Results
We have shown that $\mathrm{BPL} \subseteq \mathrm{DSPACE}(\log^2 n)$, but the running time of the straightforward derandomized algorithm is $\Omega(\exp(\log^2 n))$.
Here we sketch the following result: $\mathrm{BPL} \subseteq \mathrm{SC}$ ("Steve's Class"), where SC is the class of all languages that can be recognized in poly(n) time and polylog(n) space.
Thm: $\mathrm{BPL} \subseteq \mathrm{SC}$
Proof Sketch: If we could guess a "good" set of hash functions $h_1, h_2, \ldots, h_{l'}$, then all that would be left to do is to enumerate over r, which takes poly(n) time.
We will show that we can efficiently find a good set of hash functions.
16.8
33 Extensions and Related Results
Proof Sketch (continued): We will incrementally fix the hash functions one at a time, from $h_{l'}$ to $h_1$.
The important point to notice is that, due to the recursive nature of H, whether h is a good hash function or not depends only on the hash functions fixed before it.
Therefore it is enough to incrementally find good hash functions.
In order to check whether h is a good hash function we must test whether Lemma 1 and Lemma 2 hold. This requires creating the proper pruned graph G and checking the probabilities on different subsets of edges. Both of these tasks can be performed in poly(n) time.
34 Extensions and Related Results
Proof Sketch (continued): Hence, the total time required is $l' \cdot \mathrm{poly}(n) = \mathrm{poly}(n)$, and the total space required, $O(\log^2 n)$, is dominated by storing the functions $h_1, h_2, \ldots, h_{l'}$.
Further Results (without proof):
1. $\mathrm{BPL} \subseteq \mathrm{DSPACE}(\log^{1.5} n)$
2. Every random computation that can be carried out in polynomial time and in linear space can also be carried out in polynomial time and in linear space, but using only a linear amount of randomness.