General Strong Polarization
Madhu Sudan, Harvard University
Joint work with Jaroslaw Blasiok (Harvard), Venkatesan Guruswami (CMU), Preetum Nakkiran (Harvard) and Atri Rudra (Buffalo)
January 10, 2019, UCI
Part I: Motivation. Achieving Shannon Capacity
Shannon and Channel Capacity
- Shannon ['48]: For every (noisy) channel of communication, there exists a capacity C such that communication at rate R < C is possible and at rate R > C is not. (Formalization omitted.)
- Example: BSC(p), the binary symmetric channel. Acts independently on bits: X in F_2 is received as X w.p. 1 - p and as 1 - X w.p. p.
- Capacity = 1 - H(p), where H(p) = p * log(1/p) + (1 - p) * log(1/(1 - p)) is the binary entropy.
- Achieving capacity at rate R: there exist E: F_2^{k_n} -> F_2^n and D: F_2^n -> F_2^{k_n} with Pr[D(BSC(E(m))) != m] = o(1) and lim_{n -> inf} k_n / n >= R.
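As a quick concrete check of the capacity formula above, here is a small sketch (mine, not from the talk) computing the binary entropy and the BSC capacity 1 - H(p); the function names are my own:

```python
from math import log2

def binary_entropy(p: float) -> float:
    """H(p) = p*log2(1/p) + (1-p)*log2(1/(1-p)), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return p * log2(1 / p) + (1 - p) * log2(1 / (1 - p))

def bsc_capacity(p: float) -> float:
    """Shannon capacity of BSC(p): 1 - H(p)."""
    return 1 - binary_entropy(p)

print(bsc_capacity(0.11))  # roughly 0.5: a BSC with 11% flips halves the rate
```

Note that capacity is 0 at p = 1/2 (pure noise) and 1 at p = 0 or p = 1 (deterministic channel).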
"Achieving" Shannon Capacity
- Commonly used phrase, with many mathematical interpretations. Some possibilities:
- How small can n be? Shannon '48: n = O(1/eps^2), where eps := C - R.
- Get R > C - eps with poly-time algorithms? Forney '66: time = poly(n, 2^{1/eps^2}).
- Running time poly(n/eps)? (Or n = poly(1/eps)?) [Luby et al. '95]. Open till 2008.
- Arikan '08: invented "Polar Codes". Final resolution: Guruswami+Xia '13, Hassani+Alishahi+Urbanke '13, via a strong analysis.
Polar Codes and Polarization
- Arikan: defined polar codes, one for every integer t; the t-th code has length n = 2^t.
- Associated random variables X_0, ..., X_t, ... with X_t in [0,1].
- If Pr[X_t in (tau_t, 1 - delta_t)] <= eps_t, then the t-th code is (eps_t + delta_t)-close to capacity, and Pr[Decode(BSC(Encode(m))) != m] <= n * tau_t.
- Need tau_t = o(1/n); strong polarization gives tau = neg(n).
- Need eps_t = 1/n^{Omega(1)}; this follows from strong polarization.
- Arikan et al.: tau = neg(n), eps = o(1). [GX13, HAU13]: the strong bounds above.
Part II: Martingales, Polarization, Strong & Local
Martingales and Polarization
- [0,1]-martingale: a sequence of random variables X_0, X_1, X_2, ... with X_t in [0,1] and E[X_{t+1} | X_0, ..., X_t] = X_t.
- Well-studied in algorithms: [Motwani-Raghavan, Sec. 4.4], [Mitzenmacher-Upfal, Sec. 12]. E.g., X_t = expected approximation factor of an algorithm given its first t random choices.
- Usually one studies concentration: e.g., whp X_final in [X_0 +/- sigma] (in the limit, X_t looks like N(X_0, sigma^2)).
- Today, polarization: whp X_final not in (tau, 1 - tau) (in the limit, X_t -> Bern(X_0)).
Some examples
- X_{t+1} = X_t + 2^{-t/2} or X_t - 2^{-t/2}, each w.p. 1/2: converges!
- X_{t+1} = X_t + 2^{-t} or X_t - 2^{-t}, each w.p. 1/2: converges to the uniform distribution on [0,1].
- X_{t+1} = (3/2) X_t or (1/2) X_t, each w.p. 1/2 (when X_t <= 1/2; mirrored at the high end): polarizes (weakly)!
- X_{t+1} = X_t^2 or 2 X_t - X_t^2, each w.p. 1/2: polarizes (strongly)!
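A quick simulation (mine, not from the talk) contrasting the last two examples. Strong polarization demands escape from an interval (tau_t, 1 - tau_t) that shrinks exponentially, say tau_t = 4^{-t}; the multiplicative example fails this test while the squaring example passes:

```python
import random

random.seed(0)

def step_weak(x):
    """Example 3: multiply by 3/2 or 1/2, each w.p. 1/2 (mirrored for x > 1/2)."""
    c = 1.5 if random.random() < 0.5 else 0.5
    return x * c if x <= 0.5 else 1 - (1 - x) * c

def step_strong(x):
    """Example 4: x -> x^2 or 2x - x^2, each w.p. 1/2 (mean is x: a martingale)."""
    return x * x if random.random() < 0.5 else 2 * x - x * x

def middle_mass(step, T, tau, trials=2000, x0=0.5):
    """Empirical Pr[X_T in (tau, 1 - tau)]."""
    stuck = 0
    for _ in range(trials):
        x = x0
        for _ in range(T):
            x = step(x)
        stuck += tau < x < 1 - tau
    return stuck / trials

T = 60
print("weak  :", middle_mass(step_weak, T, 4.0 ** -T))    # stays near 1
print("strong:", middle_mass(step_strong, T, 4.0 ** -T))  # drops near 0
```

The weak example does drift toward {0, 1}, but only at a fixed exponential rate, so it never gets below 4^{-t}; the squaring example roughly doubles log(1/X_t) on its down branch, which is exactly the "suction" of the next slides.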
Arikan Martingale
- Local behavior: well understood.
- Challenge: limiting behavior.
Main Result: Definition and Theorem
- Strong polarization (informally): Pr[X_t in (tau, 1 - tau)] <= eps with both tau and eps exponentially small in t.
- Formally: for all gamma > 0 there exist beta < 1 and c such that for all t: Pr[X_t in (gamma^t, 1 - gamma^t)] <= c * beta^t.
- Local polarization:
  - Variance in the middle (X_t in (tau, 1 - tau)): for all tau > 0 there exists sigma > 0 such that for all t: X_t in (tau, 1 - tau) implies Var[X_{t+1} | X_t] >= sigma.
  - Suction at the ends (X_t not in (tau, 1 - tau)): there exists theta > 0 such that for all c < inf there exists tau > 0 such that X_t < tau implies Pr[X_{t+1} < X_t / c] >= theta. (This is the low-end condition; the high-end condition is similar.)
- Both definitions are qualitative!
- Theorem: Local polarization implies strong polarization.
Proof (Idea):
- Step 1: The potential Phi_t := min(sqrt(X_t), sqrt(1 - X_t)) decreases by a constant factor in expectation in each step (variance in the middle plus suction, via concavity of sqrt). So E[Phi_T] = exp(-T), and Pr[X_T >= exp(-T/2)] <= exp(-T/2) by Markov.
- Step 2: Over the next T time steps, X_t plummets whp:
  - 2.1: Say, if X_t <= tau then Pr[X_{t+1} <= X_t / 100] >= 1/2.
  - 2.2: Pr[exists t in [T, 2T] s.t. X_t > tau] <= X_T / tau [Doob].
  - 2.3: If the above doesn't happen, X_{2T} < 5^{-T} whp. QED
Part III: Polar Codes. Encoding, Decoding, Martingale, Polarization
Lesson 0: Compression implies Coding
- Defn (linear compression scheme): (M, D) form a compression scheme for Bern(p)^n if M: F_2^n -> F_2^m is a linear map and Pr_{Z ~ Bern(p)^n}[D(M * Z) != Z] = o(1). Hopefully m/n <= H(p) + eps.
- Compression implies coding: let M_perp be such that M * M_perp = 0.
  - Encoder: X -> M_perp * X.
  - Error-corrector: Y = M_perp * X + Z -> Y - D(M * Y) = Y - D(M * M_perp * X + M * Z) = M_perp * X, w.p. 1 - o(1).
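The reduction above can be seen on a toy instance (my own illustration, not from the talk): take M to be a 2x3 parity check whose dual is the 3-bit repetition code, and let D map each syndrome to the most likely (lowest-weight) noise pattern:

```python
def mat_vec(M, v):
    """Matrix-vector product over F_2."""
    return tuple(sum(m * x for m, x in zip(row, v)) % 2 for row in M)

# Toy instance: M is a 2x3 parity check (the "compressor"); its dual is
# the 3-bit repetition code with generator M_perp = (1,1,1), so M.M_perp = 0.
M = [(1, 1, 0), (0, 1, 1)]

def encode(x):
    """X -> M_perp . X : repetition encoding of one bit."""
    return (x, x, x)

# D inverts compression for the likely (low-weight) noise patterns:
SYNDROME_TO_Z = {(0, 0): (0, 0, 0), (1, 0): (1, 0, 0),
                 (1, 1): (0, 1, 0), (0, 1): (0, 0, 1)}

def correct(y):
    """Y = M_perp.X + Z  ->  Y - D(M.Y), recovering the codeword."""
    z = SYNDROME_TO_Z[mat_vec(M, y)]
    return tuple((yi - zi) % 2 for yi, zi in zip(y, z))

y = encode(1)                       # codeword (1, 1, 1)
received = (1, 0, 1)                # one bit flipped by the channel
print(correct(received))            # -> (1, 1, 1)
```

The key point, exactly as in the slide: M * Y = M * Z because M annihilates the codeword, so a decompressor for the noise source yields an error corrector.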
Question: How to compress?
- Arikan's key idea: start with the 2x2 "polarization transform" (U, V) -> (U + V, V).
- Invertible, so it does nothing? But if U, V are independent, then U + V is "more random" than either, while V | U + V is "less random" than either.
- Iterate (ignoring conditioning): end with bits that are almost random, or almost determined (by the others). Output the "totally random part" to get compression!
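The "more random / less random" claim is easy to verify numerically. For independent U, V ~ Bern(p), the sum U + V is Bern(2p - 2p^2), and by the chain rule H(U+V) + H(V | U+V) = 2H(p); a small check (mine):

```python
from math import log2

def H(p):
    """Binary entropy."""
    return 0.0 if p in (0, 1) else p * log2(1 / p) + (1 - p) * log2(1 / (1 - p))

p = 0.11
q = 2 * p - 2 * p * p        # U + V ~ Bern(q) for independent U, V ~ Bern(p)

h_sum = H(q)                 # H(U + V): more random than either input
h_cond = 2 * H(p) - H(q)     # H(V | U + V), by the chain rule

print(H(p), h_sum, h_cond)   # middle value splits into a larger and a smaller one
```

For p = 0.11 the entropies come out to roughly 0.50 -> (0.71, 0.29): one step of the transform pushes the two entropy values apart while conserving their sum.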
Iteration elaborated:
- Martingale: X_t := conditional entropy of a random output bit after t iterations.
- [Diagram: eight inputs A, ..., H fed through levels of the 2x2 transform, producing outputs such as A+B, C+D, A+B+C+D, E+F, G+H, E+F+G+H.]
The Polarization Butterfly
- P_n(U, V) = (P_{n/2}(U + V), P_{n/2}(V)), where U = (Z_1, ..., Z_{n/2}), V = (Z_{n/2+1}, ..., Z_n), and U + V is coordinatewise.
- Polarization implies there exists S subset of [n] with H(W_{[n] - S} | W_S) -> 0 and |S| <= (H(p) + eps) * n.
- Compression: E(Z) = P_n(Z)|_S. Encoding time = O(n log n) (given S).
- To be shown: 1. Decoding in time O(n log n). 2. Polarization!
- [Diagram: butterfly network mapping Z_1, ..., Z_n to W_1, ..., W_n; the first level computes Z_i + Z_{n/2+i} on the top half and passes Z_{n/2+i} through on the bottom half.]
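The recursion P_n(U, V) = (P_{n/2}(U + V), P_{n/2}(V)) translates directly into code. A minimal sketch (mine, assuming the natural recursive output ordering rather than any particular bit-reversal convention):

```python
def polar_transform(z):
    """Arikan's butterfly P_n(U, V) = (P_{n/2}(U + V), P_{n/2}(V)) over F_2.

    len(z) must be a power of 2. Runs in O(n log n) time:
    O(n) work plus two recursive calls of half the size.
    """
    n = len(z)
    if n == 1:
        return list(z)
    u, v = z[:n // 2], z[n // 2:]
    plus = [(a + b) % 2 for a, b in zip(u, v)]   # U + V, coordinatewise
    return polar_transform(plus) + polar_transform(v)

z = [1, 0, 1, 1, 0, 0, 1, 0]
w = polar_transform(z)
print(w)                              # first output = parity of all inputs
print(polar_transform(w) == z)        # -> True: the transform is an involution
```

Since the transform is linear and invertible (in this ordering it is even its own inverse over F_2), compression really is just "apply P_n and keep the coordinates in S".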
The Polarization Butterfly: Decoding
- Decoding idea: given S and W_S = P_n(U, V)|_S, first compute U + V ~ Bern(2p - 2p^2), then determine V. But from what distribution? Coordinatewise, V | U + V ~ prod_i Bern(r_i).
- Key idea: stronger induction! Decode with respect to a non-identical product distribution.
- Decoding algorithm: given S, W_S = P_n(U, V)|_S, and p_1, ..., p_n:
  1. Compute q_1, ..., q_{n/2} from p_1, ..., p_n.
  2. Determine U + V = D(W_S^+, (q_1, ..., q_{n/2})).
  3. Compute r_1, ..., r_{n/2} from (p_1, ..., p_n; U + V).
  4. Determine V = D(W_S^-, (r_1, ..., r_{n/2})).
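The probability bookkeeping in steps 1 and 3 can be sketched concretely. This is only the coordinatewise update rule, not the full recursive decoder, and the function names are my own:

```python
def check_update(p1, p2):
    """q = Pr[Z1 + Z2 = 1] for independent Zi ~ Bern(pi):
    the parameter fed to the U + V (plus) branch."""
    return p1 * (1 - p2) + (1 - p1) * p2

def posterior_update(p1, p2, s):
    """r = Pr[Z2 = 1 | Z1 + Z2 = s]: the (non-identical!) Bernoulli
    parameter fed to the V (minus) branch once U + V is known."""
    if s == 0:
        num = p1 * p2                          # both bits are 1
        den = p1 * p2 + (1 - p1) * (1 - p2)
    else:
        num = (1 - p1) * p2                    # only Z2 is 1
        den = p1 * (1 - p2) + (1 - p1) * p2
    return num / den

p = 0.11
print(check_update(p, p))             # 2p - 2p^2, as on the previous slide
print(posterior_update(p, p, 0))      # below p: V is now easier to guess
print(posterior_update(p, p, 1))      # exactly 1/2 by symmetry
```

Even starting from identical p_i, one step produces distinct parameters r_i depending on the observed sums, which is exactly why the induction must handle non-identical product distributions.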
Polarization Martingale
- Idea: follow X_t = H(Z_{i,t} | Z_{<i,t}) for a random index i, where Z_{i,t} denotes the i-th bit after t levels of the butterfly (so we track Z_{i,0}, Z_{i,1}, ..., Z_{i,log n}).
- X_t is a martingale!
- Does it polarize? Strongly? By the main theorem, it suffices to check that it polarizes locally.
Local Polarization of the Arikan Martingale
- Variance in the middle: roughly, one step maps (H(p), H(p)) to (H(2p - 2p^2), 2H(p) - H(2p - 2p^2)), each branch w.p. 1/2. For p in (tau, 1 - tau), 2p - 2p^2 is far from p; by continuity of H, H(2p - 2p^2) is far from H(p).
- Suction, high end: H(1/2 - gamma) -> H(1/2 - 2 gamma^2) on the plus branch. Since H(1/2 - gamma) = 1 - Theta(gamma^2), this step is 1 - Theta(gamma^2) -> 1 - Theta(gamma^4).
- Suction, low end: H(p) ~ p log(1/p), so 2H(p) ~ 2p log(1/p), while H(2p - 2p^2) ~ H(2p) ~ 2p log(1/(2p)) ~ 2H(p) - 2p ~ (2 - 1/log(1/p)) * H(p). Hence the minus branch 2H(p) - H(2p - 2p^2) ~ 2p ~ H(p)/log(1/p): a drop by a large factor.
- Dealing with conditioning: more work (lots of Markov).
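The two suction estimates above are easy to sanity-check numerically (my own check, not from the talk):

```python
from math import log2

def H(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0, 1) else p * log2(1 / p) + (1 - p) * log2(1 / (1 - p))

# Low end: one step maps (H(p), H(p)) to (H(2p - 2p^2), 2H(p) - H(2p - 2p^2)).
# The estimate H(2p - 2p^2) ~ 2H(p) - 2p says the minus branch
# 2H(p) - H(2p - 2p^2) is ~ 2p, a factor-log(1/p) drop below H(p).
for p in [1e-2, 1e-4, 1e-6]:
    minus_branch = 2 * H(p) - H(2 * p - 2 * p * p)
    print(p, minus_branch / (2 * p))      # ratio approaches 1 as p -> 0

# High end: for p = 1/2 - gamma, 1 - H(p) = Theta(gamma^2), and the plus
# branch gives H(1/2 - 2*gamma^2) = 1 - Theta(gamma^4): the gap squares.
g = 1e-2
print((1 - H(0.5 - 2 * g * g)) / (1 - H(0.5 - g)) ** 2)   # stays bounded
```

So near 0 the entropy drops by a super-constant factor in one step, and near 1 the gap to 1 squares, which is precisely the strong "suction at the ends" the theorem needs.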
"New contributions"
- So far: reproduced old work (with simpler proofs?).
- Main new technical contributions:
  - Strong polarization of general transforms, e.g. (U, V, W) -> (U + V, V, W)?
  - Exponentially strong polarization [BGS'18]: suction at the low end is very strong! A random k x k matrix yields X_{t+1} ~ X_t^{k^{.99}} whp.
  - Strong polarization of Markovian sources [GNS'19?]: separation of compression of known sources from unknown ones.
Conclusions
- Polarization: an important phenomenon associated with bounded random variables. Not always bad! Needs better understanding.
- Recently also used in CS: [Lovett-Meka] (implicitly), [Chattopadhyay-Hatami-Hosseini-Lovett] (explicitly).
Thank You!