The 1st annual (?) workshop
2 Communication under Channel Uncertainty: Oblivious channels
Michael Langberg, California Institute of Technology
3 Coding theory
Error-correcting codes: an encoder C: {0,1}^k → {0,1}^n maps a message m ∈ {0,1}^k to a codeword x = C(m) ∈ {0,1}^n.
X transmits x, the channel adds noise, and Y receives y and decodes it to recover m.
4 Communication channels
The design of C depends on the properties of the channel.
Channel W: W(e|x) = probability that error e is imposed by the channel when x = C(m) is transmitted. In this case y = x ⊕ e is received.
BSC_p: Binary Symmetric Channel. Each bit is flipped independently with probability p: W(e|x) = p^|e| (1-p)^(n-|e|).
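As an aside (not part of the slides): the BSC_p defined above is easy to simulate, which makes the formula W(e|x) = p^|e| (1-p)^(n-|e|) concrete. A minimal sketch in Python (function name is mine):

```python
import random

def bsc(x, p, rng):
    """Pass codeword x (a list of bits) through BSC_p: each bit is
    flipped independently with probability p; returns y = x XOR e and e."""
    e = [1 if rng.random() < p else 0 for _ in x]
    y = [xi ^ ei for xi, ei in zip(x, e)]
    return y, e

rng = random.Random(0)
n, p = 10_000, 0.1
y, e = bsc([0] * n, p, rng)
print(sum(e) / n)  # empirical flip fraction, concentrated near p
```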
5 Success criteria
Let D: {0,1}^n → {0,1}^k be a decoder.
C is said to allow the communication of m over W (with D) if Pr_e[D(C(m) ⊕ e) = m] ~ 1. Probability over W(e|C(m)).
C is said to allow the communication of {0,1}^k over W (with D) if Pr_{m,e}[D(C(m) ⊕ e) = m] ~ 1. Probability uniform over {0,1}^k and over W(e|C(m)).
The rate of C is k/n.
BSC_p [Shannon]: there exist codes with rate ~ 1-H(p) (optimal).
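A quick numeric check of the Shannon rate 1-H(p) quoted above (a sketch; the function names are my own):

```python
from math import log2

def H(p):
    """Binary entropy: H(p) = -p*log2(p) - (1-p)*log2(1-p)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_rate(p):
    """Best achievable rate k/n over BSC_p [Shannon]."""
    return 1 - H(p)

print(bsc_rate(0.11))  # roughly 0.5: about half the transmitted bits carry information
```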
6 Channel uncertainty
What if the properties of the channel are not known? The channel can be any channel in a family W = {W}.
Objective: design a code that allows communication no matter which W is chosen in W.
C is said to allow the communication of {0,1}^k over the channel family W if there exists a decoder D s.t. for each W ∈ W: C, D allow communication of {0,1}^k over W.
7 The family W_p
A channel W is a p-channel if it can change only a p-fraction of the transmitted bits: W(e|x) = 0 if |e| > pn.
W_p = family of all p-channels; a power constraint on W.
Communicating over W_p: design a code that enables communication no matter which p-fraction of bits is flipped.
Adversarial model: the channel W is chosen maliciously by an adversarial jammer within the limits of W_p.
8 Communicating over W_p
Communicating over W_p: design a code C that enables communication no matter which p-fraction of bits is flipped.
“Equivalently”: the minimum distance of C is 2pn.
What is the maximum achievable rate over W_p? Major open problem.
Known: 1-H(2p) ≤ R < 1-H(p).
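The distance condition above can be illustrated on a toy code (my example, not the talk's): the 3-fold repetition code has minimum distance 3 > 2pn for pn = 1, so any single adversarial flip is corrected.

```python
from itertools import combinations

def dist(a, b):
    """Hamming distance."""
    return sum(ai != bi for ai, bi in zip(a, b))

def min_distance(code):
    return min(dist(a, b) for a, b in combinations(code, 2))

def nearest(code, y):
    """Nearest-neighbor decoding."""
    return min(code, key=lambda c: dist(c, y))

# 3-fold repetition code: n = 3, k = 1, minimum distance 3 > 2*pn for pn = 1
rep3 = [(0, 0, 0), (1, 1, 1)]
print(min_distance(rep3))

# every single-bit flip of every codeword is decoded back correctly
ok = all(nearest(rep3, tuple(b ^ (j == i) for j, b in enumerate(c))) == c
         for c in rep3 for i in range(3))
print(ok)
```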
9 This talk
Communication over W_p is not fully understood: W_p does not allow communication with rate 1-H(p), while BSC_p allows communication at rate 1-H(p), and in “essence” BSC_p ∈ W_p (power constraint).
We close this gap by considering restrictions of W_p: oblivious channels.
Oblivious channels: communication over W_p under the assumption that the channel has a limited view of the transmitted codeword.
10 Recall …
Communicating over W_p: design a code C that enables communication no matter which p-fraction of bits is flipped (“equivalently”: the minimum distance of C is 2pn). Known: 1-H(2p) ≤ R < 1-H(p).
Communicating over W_p corresponds to an energy constraint on the channel’s behavior.
We would like to study the rate achievable under additional limitations.
11 Oblivious channels
Communicating over W_p: only a p-fraction of bits can be flipped. Think of the channel as an adversarial jammer that acts maliciously according to the codeword sent.
Additional constraint: we would like to limit the amount of information the adversary has about the codeword x sent.
For example: a channel with a “window” view.
In general: the correlation between the codeword x and the error e imposed by W is limited.
12 Oblivious channels: model
A channel W is oblivious if W(e|x) is independent of x. BSC_p is an oblivious channel.
A channel W is partially oblivious if the dependence of W(e|x) on x is limited: intuitively, I(e; x) is small.
Partially oblivious, by definition: for each x, W(e|x) = W_x(e) is a distribution over {0,1}^n; limit the size of the family {W_x | x}.
Example: let W_0 and W_1 be two distributions over errors, and define W(e|x) = W_0(e) if the first bit of x is 0, and W(e|x) = W_1(e) if the first bit of x is 1. W is almost completely oblivious.
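The two-distribution example above can be instantiated concretely (my instantiation, not from the talk): a jammer whose pn flips land in the first or second half of the word depending only on the first bit of x, so the family {W_x | x} has size 2.

```python
import random

def partially_oblivious_error(x, pn, rng):
    """W(e|x) = W0(e) if x[0] == 0, else W1(e): the error distribution
    depends on x only through its first bit."""
    n = len(x)
    if x[0] == 0:
        flips = rng.sample(range(n // 2), pn)       # W0: flips in the first half
    else:
        flips = rng.sample(range(n // 2, n), pn)    # W1: flips in the second half
    e = [0] * n
    for i in flips:
        e[i] = 1
    return e

rng = random.Random(0)
e0 = partially_oblivious_error([0] * 100, 5, rng)
e1 = partially_oblivious_error([1] + [0] * 99, 5, rng)
print(sum(e0), sum(e1))  # both errors have weight 5 = pn
```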
13 Families of oblivious channels
A family of channels W* ⊆ W_p is (partially) oblivious if each W ∈ W* is (partially) oblivious.
We study the rate achievable when communicating over W*: the jammer is limited in both power and knowledge.
BSC_p is an oblivious channel “in” W_p, and the rate on BSC_p is ~ 1-H(p).
Natural question: can this be extended to any family of oblivious channels?
14 Our results
We study both oblivious and partially oblivious families.
For oblivious families W* one can achieve rate ~ 1-H(p).
For families W* of partially oblivious channels in which, for each W ∈ W*, the family {W_x | x} has size at most 2^(εn): the achievable rate is ~ 1-H(p)-ε (if ε < (1-H(p))/3).
We sketch the proof for oblivious W*.
15 Previous work
Oblivious channels in W* have been addressed by [CsiszarNarayan] as a special case of Arbitrarily Varying Channels with state constraints; they show that rate ~ 1-H(p) is achievable for oblivious channels in W* using the “method of types”.
Partially oblivious channels were not defined previously. For partially oblivious channels, [CsiszarNarayan] implicitly show 1-H(p)-30ε (compare with our 1-H(p)-ε).
Our proof techniques are substantially different.
16 Linear codes
Observation: linear codes will not allow rate 1-H(p) on W* unless they allow communication over W_p (distance 2pn):
Suppose there exists a codeword c of weight less than 2pn. Take W to be the channel that always imposes an error e s.t. |e| ≤ pn and dist(e, c) ≤ pn.
One can show: W does not allow the transmission of half the messages.
No codes of distance 2pn and rate 1-H(p) exist, so linear codes do not suffice (even though linear codes work well vs. BSC_p).
Natural candidate: random codes (each codeword chosen at random).
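The observation above can be spelled out numerically (a sketch; the code and codeword are hypothetical): if a linear code contains a codeword c with |c| < 2pn, the jammer fixes e to cover half of c's support; the received word then lies within distance pn of two distinct codewords, so nearest-neighbor decoding cannot distinguish them.

```python
def attack_error(c):
    """Given codeword c with |c| < 2*pn, flip half of c's one-positions:
    the resulting e has |e| <= pn and dist(e, c) <= pn."""
    support = [i for i, b in enumerate(c) if b]
    e = [0] * len(c)
    for i in support[: (len(support) + 1) // 2]:
        e[i] = 1
    return e

dist = lambda a, b: sum(ai != bi for ai, bi in zip(a, b))

n, pn = 12, 3
c = [1, 1, 1, 1] + [0] * 8           # weight 4 < 2*pn = 6
e = attack_error(c)

x = [0] * n                          # in a linear code, x and x XOR c are both codewords
y = [xi ^ ei for xi, ei in zip(x, e)]
x_plus_c = [xi ^ ci for xi, ci in zip(x, c)]
print(dist(y, x), dist(y, x_plus_c))  # both distances <= pn: y is ambiguous
```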
17 Proof technique: random codes
Let C be a code (of rate 1-H(p)) in which each codeword is picked at random.
Show: with high probability C allows communication over any oblivious channel in W* (any channel W which always imposes the same distribution over errors).
This implies: there exists a code C that allows communication over W* with rate 1-H(p).
18 Proof sketch
Show: with high probability C allows communication over any oblivious channel in W*.
Step 1: show that C allows communication over W* iff C allows communication over channels W that always impose a single error e (|e| ≤ pn).
Step 2: let W_e be the channel that always imposes error e; show that w.h.p. C allows communication over W_e.
Step 3: as there are only ~ 2^(H(p)n) channels W_e, use a union bound.
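The count in Step 3 is the number of error vectors of weight at most pn, bounded by the standard estimate Σ_{i ≤ pn} C(n, i) ≤ 2^(H(p)n) for p ≤ 1/2; a quick numeric sanity check (a sketch, function names mine):

```python
from math import comb, log2

def H(p):
    """Binary entropy."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def num_error_channels(n, p):
    """Number of channels W_e, i.e. of error vectors e with |e| <= pn."""
    return sum(comb(n, i) for i in range(int(p * n) + 1))

n, p = 100, 0.1
count = num_error_channels(n, p)
print(log2(count), H(p) * n)  # log2(count) stays below H(p)*n
```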
19 Proof of Step 2
Step 2: let W_e be the channel that always imposes error e; show that w.h.p. C allows communication over W_e.
Let D be the nearest-neighbor decoder. By definition, C allows communication over W_e iff for most codewords x = C(m): D(x ⊕ e) = m.
A codeword x = C(m) is disturbed if D(x ⊕ e) ≠ m.
For a random C, the expected number of disturbed codewords is small (i.e., in expectation C allows communication).
We need to prove that the number of disturbed codewords is small w.h.p.
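Step 2 can be watched in simulation (tiny parameters, my choices, not the talk's): pick a random code with rate well below 1-H(p), fix one error e of weight pn, and count the disturbed codewords under nearest-neighbor decoding; for such rates the count is typically zero or tiny.

```python
import random

def dist(a, b):
    return sum(ai != bi for ai, bi in zip(a, b))

def disturbed(code, e):
    """Count codewords x with D(x XOR e) != x under nearest-neighbor D."""
    bad = 0
    for x in code:
        y = tuple(xi ^ ei for xi, ei in zip(x, e))
        if min(code, key=lambda c: dist(c, y)) != x:
            bad += 1
    return bad

rng = random.Random(1)
n, k, pn = 20, 4, 2                  # rate k/n = 0.2, well below 1 - H(0.1)
code = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(2 ** k)]
e = tuple(1 if i < pn else 0 for i in range(n))   # one fixed channel W_e
print(disturbed(code, e), "of", len(code), "codewords disturbed")
```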
20 Concentration
The expected number of disturbed codewords is small; we need to prove that it is small w.h.p.
Standard tools: concentration inequalities (Azuma, Talagrand, Chernoff). These work well when the random variable has a small Lipschitz coefficient.
We study the Lipschitz coefficient of our process.
21 Lipschitz coefficient
The Lipschitz coefficient in our setting: let C and C’ be two codes that differ in a single codeword; the Lipschitz coefficient is the difference between the numbers of disturbed codewords in C and in C’ w.r.t. W_e.
One can show that the Lipschitz coefficient is very large, so standard concentration techniques cannot be applied.
What next?
22 Lipschitz coefficient
The Lipschitz coefficient in our setting is large. However, one may show that “on average” the Lipschitz coefficient is small; this is done by studying the list-decoding properties of a random C.
Once we establish that the “average” Lipschitz coefficient is small, a recent concentration result of Vu yields the proof. Establishing the “average” Lipschitz coefficient is technically involved.
[Vu]: a random process in which the Lipschitz coefficient has small “expectation” and “variance” has exponential concentration: the probability of deviating from the expectation is exponential in the deviation.
[KimVu]: concentration of low-degree multivariate polynomials (extends Chernoff).
23 Conclusions and future research
Theme: communication over W_p is not fully understood; gain understanding of certain relaxations of W_p.
Seen: oblivious channels W* ⊆ W_p allow rate 1-H(p).
Other relaxations: “online adversaries”; adversaries restricted to changing certain locations (unknown to X and Y).