General Strong Polarization Madhu Sudan Harvard University Joint work with Jaroslaw Blasiok (Harvard), Venkatesan Gurswami (CMU), Preetum Nakkiran (Harvard) and Atri Rudra (Buffalo) August 11, 2017 IITB: General Strong Polarization
Shannon and Channel Capacity Shannon [‘48]: For every (noisy) channel of communication, ∃ capacity 𝐶 s.t. communication at rate 𝑅<𝐶 is possible and 𝑅>𝐶 is not. Needs lot of formalization – omitted. Example: Acts independently on bits Capacity =1 – 𝐻(𝑝) ; 𝐻(𝑝) = binary entropy! 𝐻 𝑝 =𝑝⋅ log 1 𝑝 + 1−𝑝 ⋅ log 1 1−𝑝 𝑋∈ 𝔽 2 𝑋 w.p. 1−𝑝 1−𝑋 w.p. 𝑝 𝐵𝑆𝐶(𝑝) August 11, 2017 IITB: General Strong Polarization
“Achieving” Channel Capacity Commonly used phrase. Many mathematical interpretations. Some possibilities: How small can 𝑛 be? Get 𝑅>𝐶−𝜖 with polytime algorithms? [Luby et al.’95] Make running time poly 1 𝜖 ? Open till 2008 Arikan’08: Invented “Polar Codes” … Final resolution: Guruswami+Xia’13, Hassani+Alishahi+Urbanke’13 – Strong analysis Shannon ‘48 : 𝑛=𝑂 𝐶−𝑅 −2 Forney ‘66 :time =𝑝𝑜𝑙𝑦 𝑛, 2 1 𝜖 2 August 11, 2017 IITB: General Strong Polarization
Context for today’s talk [Arikan’08]: General class of codes. [GX’13, HAU’13]: One specific code within. Why? Bottleneck: Strong Polarization Arikan et al.: General class polarizes [GX,HAU]: Specific code polarizes strongly! This work/talk: Introduce Local Polarization Local ⇒ Strong Polarization Thm: All known codes that polarize also polarize strongly! August 11, 2017 IITB: General Strong Polarization
IITB: General Strong Polarization This talk: What are Polar Codes “Local vs. Global” (Strong) Polarization Analysis of Polarization August 11, 2017 IITB: General Strong Polarization
Lesson 0: Compression ⇒ Coding Linear Compression Scheme: 𝐻,𝐷 for compression scheme for bern 𝑝 𝑛 if Linear map 𝐻: 𝔽 2 𝑛 → 𝔽 2 𝑚 Pr 𝑍∼bern 𝑝 𝑛 𝐷 𝐻⋅𝑍 ≠𝑍 ≤𝜖 Hopefully 𝑚 𝑛 →𝐻(𝑝) Compression ⇒ Coding Let 𝐻 ⊥ be such that 𝐻⋅𝐻 ⊥ =0; Encoder: 𝑋↦ 𝐻 ⊥ ⋅𝑋 Error-Corrector: 𝑌= 𝐻 ⊥ ⋅𝑋+𝜂↦𝑌−𝐷 𝐻⋅𝑌 = 𝑤.𝑝. 1−𝜖 𝐻 ⊥ ⋅𝑋 August 11, 2017 IITB: General Strong Polarization
Question: How to compress? Arikan’s key idea: Start with 2×2 “Polarization Transform”: 𝑈,𝑉 → 𝑈+𝑉,𝑉 Invertible – so does nothing? If 𝑈,𝑉 independent, then 𝑈+𝑉 “more random” than either 𝑉 | 𝑈+𝑉 “less random” than either Iterate (ignoring conditioning) End with bits that are completely random, or completed determined (by others). Output “totally random part” to get compression! August 11, 2017 IITB: General Strong Polarization
Information Theory Basics Entropy: Defn: for random variable 𝑋∈Ω, 𝐻 𝑋 = 𝑥∈Ω 𝑝 𝑥 log 1 𝑝 𝑥 where 𝑝 𝑥 = Pr 𝑋=𝑥 Bounds: 0≤𝐻 𝑋 ≤ log Ω Conditional Entropy Defn: for jointly distributed 𝑋,𝑌 𝐻 𝑋 𝑌 = 𝑦 𝑝 𝑦 𝐻(𝑋|𝑌=𝑦) Chain Rule: 𝐻 𝑋,𝑌 =𝐻 𝑋 +𝐻(𝑌|𝑋) Conditioning does not increase entropy: 𝐻 𝑋 𝑌 ≤𝐻(𝑋) Equality ⇔ Independence: 𝐻 𝑋 𝑌 =𝐻 𝑋 ⇔ 𝑋⊥𝑌 August 11, 2017 IITB: General Strong Polarization
Iteration by Example: 𝐴,𝐵,𝐶,𝐷 → 𝐴+𝐵+𝐶+𝐷,𝐵+𝐷,𝐶+𝐷, 𝐷 𝐸,𝐹,𝐺,𝐻 → 𝐸+𝐹+𝐺+𝐻,𝐹+𝐻,𝐺+𝐻,𝐻 𝐵+𝐷 𝐹+𝐻 𝐵+𝐷+𝐹+𝐻 𝐻 𝐵+𝐷 𝐴+𝐵+𝐶+𝐷) 𝐻 𝐵+𝐷+𝐹+𝐻| 𝐴+𝐵+𝐶+𝐷, 𝐸+𝐹+𝐺+𝐻 𝐻 𝐹+𝐻| 𝐴+𝐵+𝐶+𝐷, 𝐸+𝐹+𝐺+𝐻. 𝐵+𝐷+𝐹+𝐻 𝐻 𝐹+𝐻 𝐸+𝐹+𝐺+𝐻) August 11, 2017 IITB: General Strong Polarization
IITB: General Strong Polarization Algorithmics Summary: Iterating 𝑡 times yields 𝑛×𝑛-polarizing transform for 𝑛= 2 𝑡 : (𝑍 1 ,…, 𝑍 𝑛 )→( 𝑊 1 ,…, 𝑊 𝑛 ) → 𝑊 𝑆 ( 𝑆 ≈𝐻 𝑝 ⋅𝑛) Encoding/Compression: Clearly poly time. Decoding/Decompression: “Successive cancellation decoder”: implementable in poly time. Can compute Pr 𝑍∼bern(𝑝) 𝑊 𝑖 =1 | 𝑊 <𝑖 recursively Can reduce both run times to 𝑂(𝑛 log 𝑛) Only barrier: Determining 𝑆 Resolved by [Tal-Vardy’13] August 11, 2017 IITB: General Strong Polarization
Analysis: The Polarization martingale Martingale: Sequence of r.v., 𝑋 0 , 𝑋 1 ,… ∀𝑖, 𝑥 0 ,…, 𝑥 𝑖−1 , 𝔼 𝑋 𝑖 | 𝑥 0 ,…, 𝑥 𝑖−1 = 𝑥 𝑖−1 Polarization martingale: 𝑋 𝑖 = Conditional entropy of random output at 𝑖th stage. Why martingale? At 𝑖th stage two inputs take (𝑈,𝐴); (𝑉,𝐵) Output = (𝑈+𝑉, A∪𝐵) and 𝑉, 𝐴∪𝐵∪𝑈+𝑉 Input entropies = 𝑥 𝑖−1 (𝐻 𝑈 𝐴 =𝐻 𝑉 𝐵 = 𝑥 𝑖−1 ) ⇒ Output entropies average to 𝑥 𝑖−1 𝐻 𝑈+𝑉 𝐴,𝐵 +𝐻 𝑉 𝐴,𝐵,𝑈+𝑉 =2 𝑥 𝑖−1 August 11, 2017 IITB: General Strong Polarization
IITB: General Strong Polarization Quantitative issues How “quickly” does this martingale converge? (𝜆) How “well” does it converge? 𝜖 Formally, want: Pr 𝑋 𝑡 ∈ 𝜆,1−𝜆 ≤𝜖 𝜖: Gives gap to capacity 𝜆: Governs decoding failure ≈𝑛⋅ 𝜆 Reminder: 𝑛= 2 𝑡 For useable code, need: 𝜆≪ 4 −𝑡 For poly 1 𝜖 block length, need: 𝜖≤ 𝛼 𝑡 , 𝛼<1 August 11, 2017 IITB: General Strong Polarization
Regular and Strong Polarization (Regular) Polarization: ∀𝛾>0, Lim 𝑡→∞ Pr 𝑋 𝑡 ∈ 𝛾 𝑡 ,1− 𝛾 𝑡 →0. Strong Polarization: ∀𝛾>0 ∃𝛼>0 ∀𝑡, Pr 𝑋 𝑡 ∈ 𝛾 𝑡 , 1− 𝛾 𝑡 ≤ 𝛼 𝑡 Both “global” properties (large 𝑡 behavior) Construction yields: Local Property How does 𝑋 𝑖 relate to 𝑋 𝑖−1 ? Local vs. Global Relationship? Regular polarization well understood. Strong understood only 𝑈,𝑉 ↦(𝑈+𝑉,𝑉) What about 𝑈,𝑉,𝑊 ↦ 𝑈+𝑉,𝑉,𝑊 ?, more generally? August 11, 2017 IITB: General Strong Polarization
IITB: General Strong Polarization General Polarization Basic underlying ingredient 𝑀∈ 𝔽 2 𝑘×𝑘 Local polarization step: Pick 𝑈 𝑗 , 𝐴 𝑗 𝑗∈[𝑘] i.i.d. One step: generate 𝐿 1 ,…, 𝐿 𝑘 =𝑀⋅ 𝑈 1 ,…, 𝑈 𝑘 𝑋 𝑖−1 =𝐻 𝑈 𝑗 | ∪ ℓ 𝐴 ℓ , 𝑈 <𝑗 =𝐻 𝑈 𝑗 | 𝐴 𝑗 𝑋 𝑖 =𝐻 𝐿 𝑗 | ∪ ℓ 𝐴 ℓ , 𝐿 <𝑗 Examples 𝐻 2 = 1 1 1 0 ; 𝐻 3 ′ = 1 1 0 1 0 0 0 0 1 Which matrices polarize? Strongly? August 11, 2017 IITB: General Strong Polarization
IITB: General Strong Polarization Mixing matrices Observation 1: Identity matrix does not polarize. Observation 1’: Upper triangular matrix does not polarize. Observation 2: (Row) permutation of non-polarizing matrix does not polarize! Corresponds to permuting 𝑈 1 ,…, 𝑈 𝑘 Definition: 𝑀 mixing if no row permutation is upper-triangular. [KSU]: 𝑀 mixing ⇒ martingale polarizes. [Our main theorem]: 𝑀 mixing ⇒ martingale polarizes strongly. August 11, 2017 IITB: General Strong Polarization
Key Concept: Local (Strong) Polarization Martingale 𝑋 0 , 𝑋 1 ,… locally polarizes if it exhibits Variance in the middle: ∀𝜏>0 ∃𝜎>0, Var 𝑋 𝑖 𝑋 𝑖−1 ∈ 𝜏,1−𝜏 >𝜎 Suction at the end: ∃𝜃>0 ∀𝑐<∞, ∃𝜏>0 Pr 𝑋 𝑖 < 𝑋 𝑖−1 𝑐 | 𝑋 𝑖−1 ≤𝜏 ≥𝜃 (similarly with 1 − 𝑋 𝑖 ) Definition local: compares 𝑋 𝑖 with 𝑋 𝑖−1 Implications global: Thm 1: Local Polarization ⇒ Strong Polarization. Thm 2: Mixing matrices ⇒ Local Polarization. August 11, 2017 IITB: General Strong Polarization
Mixing Matrices ⇒ Local Polarization 2×2 case: Ignoring conditioning, we have 𝐻 𝑝 → 𝐻 2𝑝 1−𝑝 w.p. 1 2 2𝐻 𝑝 −𝐻 2𝑝 1−𝑝 w.p. 1 2 Variance, Suction ⇐ Taylor series for log . … 𝑘×𝑘 case: Reduction to 2×2 case: 1 2 probs. →∼ 1 𝑘 August 11, 2017 IITB: General Strong Polarization
Local ⇒ (Global) Strong Polarization Step 1: Medium Polarization: ∃𝛼,𝛽<1 Pr 𝑋 𝑡 ∈ 𝛽 𝑡 ,1− 𝛽 𝑡 ≤ 𝛼 𝑡 Proof: Potential Φ 𝑡 =𝔼 min 𝑋 𝑡 , 1− 𝑋 𝑡 Variance+Suction ⇒ ∃𝜂<1 s.t. Φ 𝑡 ≤𝜂⋅ 𝜙 𝑡−1 (Aside: Why 𝑋 𝑡 ? Clearly 𝑋 𝑡 won’t work. And 𝑋 𝑡 2 increases!) ⇒ Φ 𝑡 ≤ 𝜂 𝑡 ⇒ Pr 𝑋 𝑡 ∈ 𝛽 𝑡 ,1− 𝛽 𝑡 ≤ 𝜂 𝛽 𝑡 August 11, 2017 IITB: General Strong Polarization
Local ⇒ (Global) Strong Polarization Step 1: Medium Polarization: ∃𝛼,𝛽<1 Pr 𝑋 𝑡 ∈ 𝛽 𝑡 ,1− 𝛽 𝑡 ≤ 𝛼 𝑡 Step 2: Medium Polarization + 𝑡-more steps ⇒ Strong Polarization: Proof: Assume 𝑋 0 ≤ 𝛼 𝑡 ; Want 𝑋 𝑡 ≤ 𝛾 𝑡 w.p. 1 −𝑂 𝛼 1 𝑡 Pick 𝑐 large & 𝜏 s.t. Pr 𝑋 𝑖 < 𝑋 𝑖−1 𝑐 | 𝑋 𝑖−1 ≤𝜏 ≥𝜃 2.1: Pr ∃𝑖 𝑠.𝑡. 𝑋 𝑖 ≥𝜏 ≤ 𝜏 −1 ⋅ 𝛼 𝑡 (Doob’s Inequality) 2.2: Let 𝑌 𝑖 = log 𝑋 𝑖 ; Assuming 𝑋 𝑖 ≤𝜏 for all 𝑖 𝑌 𝑖 − 𝑌 𝑖−1 ≤− log 𝑐 w.p. ≥𝜃; & ≥𝑘 w.p. ≤ 2 −𝑘 . 𝔼xp 𝑌 𝑡 ≤−𝑡 log 𝑐 𝜃 +1 ≤2𝑡⋅ log 𝛾 Concentration! Pr 𝑌 𝑡 ≥𝑡⋅ log 𝛾 ≤ 𝛼 2 𝑡 QED! August 11, 2017 IITB: General Strong Polarization
IITB: General Strong Polarization Conclusion Simple General Analysis of Strong Polarization But main reason to study general polarization: Maybe some matrices polarize even faster! This analysis does not get us there. But hopefully following the (simpler) steps in specific cases can lead to improvements. August 11, 2017 IITB: General Strong Polarization
IITB: General Strong Polarization Thank You! August 11, 2017 IITB: General Strong Polarization