General Strong Polarization

Presentation transcript:

General Strong Polarization
Madhu Sudan, Harvard University
Joint work with Jaroslaw Blasiok (Harvard), Venkatesan Guruswami (CMU), Preetum Nakkiran (Harvard) and Atri Rudra (Buffalo)
August 11, 2017, IITB

Shannon and Channel Capacity
Shannon ['48]: For every (noisy) channel of communication, there exists a capacity $C$ such that communication at rate $R < C$ is possible and at rate $R > C$ is not. (This needs a lot of formalization, omitted here.)
Example: the binary symmetric channel $\mathrm{BSC}(p)$ acts independently on bits. An input $X \in \mathbb{F}_2$ is received as $X$ w.p. $1-p$ and as $1-X$ w.p. $p$.
Capacity $= 1 - H(p)$, where $H(p)$ is the binary entropy: $H(p) = p \log \frac{1}{p} + (1-p) \log \frac{1}{1-p}$.
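As a quick numerical companion to the capacity formula, here is a minimal sketch that evaluates $H(p)$ and $1 - H(p)$. The function names are illustrative, and the logarithm is assumed to be base 2 (so capacity is in bits per channel use), which the slide does not state explicitly.

```python
import math

def binary_entropy(p: float) -> float:
    """H(p) = p*log2(1/p) + (1-p)*log2(1/(1-p)), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

def bsc_capacity(p: float) -> float:
    """Capacity of BSC(p) in bits per channel use (base-2 log assumed)."""
    return 1.0 - binary_entropy(p)

for p in (0.0, 0.11, 0.5):
    print(f"p = {p}: H(p) = {binary_entropy(p):.4f}, capacity = {bsc_capacity(p):.4f}")
```

A crossover probability of 1/2 makes the output independent of the input, which is why the capacity drops to zero there.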

"Achieving" Channel Capacity
A commonly used phrase with many mathematical interpretations. Some possibilities:
How small can $n$ be? Shannon '48: $n = O((C-R)^{-2})$.
Get $R > C - \epsilon$ with polytime algorithms? [Luby et al.'95]. Forney '66: time $= \mathrm{poly}(n, 2^{1/\epsilon^2})$.
Make running time $\mathrm{poly}(1/\epsilon)$? Open till 2008.
Arikan'08: Invented "Polar Codes" ...
Final resolution: Guruswami+Xia'13, Hassani+Alishahi+Urbanke'13 (strong analysis).

Context for today's talk
[Arikan'08]: General class of codes.
[GX'13, HAU'13]: One specific code within.
Why? Bottleneck: Strong Polarization.
Arikan et al.: The general class polarizes.
[GX, HAU]: The specific code polarizes strongly!
This work/talk: Introduce Local Polarization; show Local ⇒ Strong Polarization.
Thm: All known codes that polarize also polarize strongly!

This talk:
What are Polar Codes
"Local vs. Global" (Strong) Polarization
Analysis of Polarization

Lesson 0: Compression ⇒ Coding
Linear Compression Scheme: $(H, D)$ form a compression scheme for $\mathrm{bern}(p)^n$ if
$H: \mathbb{F}_2^n \to \mathbb{F}_2^m$ is a linear map, and
$\Pr_{Z \sim \mathrm{bern}(p)^n}[D(H \cdot Z) \ne Z] \le \epsilon$.
Hopefully $m/n \to H(p)$.
Compression ⇒ Coding:
Let $H^\perp$ be such that $H \cdot H^\perp = 0$.
Encoder: $X \mapsto H^\perp \cdot X$.
Error-corrector: $Y = H^\perp \cdot X + \eta \mapsto Y - D(H \cdot Y)$, which equals $H^\perp \cdot X$ w.p. $1 - \epsilon$.
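The compression-to-coding reduction can be tried on a toy instance. The sketch below is an illustration only: it assumes the [7,4] Hamming code, with `H` as its parity-check matrix, `H_PERP` as a generator matrix satisfying $H \cdot H^\perp = 0$, and a decompressor `decompress` that maps each syndrome to the lowest-weight noise pattern. None of these names or choices come from the talk, and this scheme only recovers noise of weight at most one.

```python
# Toy instance (assumption): the [7,4] Hamming code over F_2.
H = [  # 3x7 parity-check matrix; column j is the binary expansion of j+1
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
H_PERP = [  # 7x4 generator matrix with H * H_PERP = 0 (mod 2)
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

def mat_vec(matrix, vec):
    """Matrix-vector product over F_2."""
    return [sum(a * b for a, b in zip(row, vec)) % 2 for row in matrix]

def decompress(syndrome):
    """D: map the compressed noise H*Z back to the lowest-weight Z."""
    pos = syndrome[0] + 2 * syndrome[1] + 4 * syndrome[2]
    z = [0] * 7
    if pos:
        z[pos - 1] = 1
    return z

def encode(x):
    """Encoder: X -> H_PERP * X."""
    return mat_vec(H_PERP, x)

def correct(y):
    """Error-corrector: Y -> Y - D(H * Y)."""
    z_hat = decompress(mat_vec(H, y))
    return [(a - b) % 2 for a, b in zip(y, z_hat)]

message = [1, 0, 1, 1]
received = encode(message)
received[2] ^= 1  # the channel flips one bit
print(correct(received) == encode(message))
```

Because $H \cdot Y = H \cdot \eta$, the corrector only ever sees the compressed noise, which is the point of the reduction.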

Question: How to compress?
Arikan's key idea: Start with a 2×2 "Polarization Transform": $(U, V) \to (U+V, V)$.
Invertible, so does nothing?
If $U, V$ are independent, then $U+V$ is "more random" than either, and $V \mid U+V$ is "less random" than either.
Iterate (ignoring conditioning).
End with bits that are completely random, or completely determined (by others).
Output the "totally random part" to get compression!
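To make "more random" and "less random" concrete, the following sketch enumerates the joint distribution of $(U+V, V)$ for independent $U, V \sim \mathrm{bern}(p)$ and computes the three entropies. The helper names are illustrative and base-2 logarithms are assumed.

```python
import math
from itertools import product

def entropy(dist):
    """Shannon entropy (bits) of a dict mapping outcome -> probability."""
    return sum(q * math.log2(1 / q) for q in dist.values() if q > 0)

def one_step(p):
    """Return (H(U), H(U+V), H(V | U+V)) for independent U, V ~ bern(p)."""
    joint = {}  # distribution of the pair (U+V, V)
    for u, v in product((0, 1), repeat=2):
        prob = (p if u else 1 - p) * (p if v else 1 - p)
        key = ((u + v) % 2, v)
        joint[key] = joint.get(key, 0.0) + prob
    marginal = {}  # distribution of U+V alone
    for (s, _), q in joint.items():
        marginal[s] = marginal.get(s, 0.0) + q
    h_u = entropy({0: 1 - p, 1: p})
    h_sum = entropy(marginal)
    h_v_given_sum = entropy(joint) - h_sum  # chain rule
    return h_u, h_sum, h_v_given_sum

h_u, h_sum, h_cond = one_step(0.11)
print(f"H(U) = {h_u:.4f}, H(U+V) = {h_sum:.4f}, H(V|U+V) = {h_cond:.4f}")
```

The transform is invertible, so the total entropy is unchanged; it is only redistributed between the two outputs.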

Information Theory Basics
Entropy: for a random variable $X \in \Omega$, $H(X) = \sum_{x \in \Omega} p_x \log \frac{1}{p_x}$, where $p_x = \Pr[X = x]$.
Bounds: $0 \le H(X) \le \log |\Omega|$.
Conditional entropy: for jointly distributed $(X, Y)$, $H(X \mid Y) = \sum_y p_y H(X \mid Y = y)$.
Chain rule: $H(X, Y) = H(X) + H(Y \mid X)$.
Conditioning does not increase entropy: $H(X \mid Y) \le H(X)$.
Equality ⇔ independence: $H(X \mid Y) = H(X) \Leftrightarrow X \perp Y$.
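These identities are easy to check numerically on a small joint distribution. The sketch below uses a hypothetical table `JOINT` (a uniform bit $X$ and a copy $Y$ that is flipped with probability 0.1). The table and helper names are assumptions made for illustration.

```python
import math
from collections import defaultdict

def entropy(dist):
    """H = sum_x p_x * log2(1/p_x) over a dict mapping outcome -> probability."""
    return sum(q * math.log2(1 / q) for q in dist.values() if q > 0)

def marginal(joint, axis):
    """Marginal distribution of coordinate `axis` of a joint distribution."""
    out = defaultdict(float)
    for outcome, q in joint.items():
        out[outcome[axis]] += q
    return dict(out)

def conditional_entropy(joint, target, given):
    """H(target | given) = sum_y p_y * H(target | given = y)."""
    total = 0.0
    for y, p_y in marginal(joint, given).items():
        if p_y == 0:
            continue
        cond = {o[target]: q / p_y for o, q in joint.items() if o[given] == y}
        total += p_y * entropy(cond)
    return total

# Hypothetical example: outcomes are (x, y) pairs.
JOINT = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}

h_x = entropy(marginal(JOINT, 0))
h_y_given_x = conditional_entropy(JOINT, target=1, given=0)
h_x_given_y = conditional_entropy(JOINT, target=0, given=1)
print(f"H(X,Y) = {entropy(JOINT):.4f}, H(X) + H(Y|X) = {h_x + h_y_given_x:.4f}")
print(f"H(X|Y) = {h_x_given_y:.4f} <= H(X) = {h_x:.4f}")
```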

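Iteration by Example:
$(A, B, C, D) \to (A+B+C+D,\ B+D,\ C+D,\ D)$
$(E, F, G, H) \to (E+F+G+H,\ F+H,\ G+H,\ H)$
The next stage combines the matching outputs $B+D$ and $F+H$ into $B+D+F+H$ and $F+H$.
Input entropies: $H(B+D \mid A+B+C+D)$ and $H(F+H \mid E+F+G+H)$.
Output entropies: $H(B+D+F+H \mid A+B+C+D,\ E+F+G+H)$ and $H(F+H \mid A+B+C+D,\ E+F+G+H,\ B+D+F+H)$.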

Algorithmics
Summary: Iterating $t$ times yields an $n \times n$ polarizing transform for $n = 2^t$: $(Z_1, \ldots, Z_n) \to (W_1, \ldots, W_n) \to W_S$, with $|S| \approx H(p) \cdot n$.
Encoding/Compression: clearly poly time.
Decoding/Decompression: "successive cancellation decoder", implementable in poly time. Can compute $\Pr_{Z \sim \mathrm{bern}(p)^n}[W_i = 1 \mid W_{<i}]$ recursively.
Can reduce both run times to $O(n \log n)$.
Only barrier: determining $S$. Resolved by [Tal-Vardy'13].
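The iterated transform itself can be written as a short recursion. The sketch below is one possible formulation, chosen so that the output ordering for four inputs matches the $(A+B+C+D,\ B+D,\ C+D,\ D)$ example; the function name and the halving convention are assumptions, not taken from the talk. Each level does linear work, so the total is $O(n \log n)$ bit operations.

```python
def polar_transform(z):
    """Apply the iterated (U, V) -> (U+V, V) transform to a bit list of length 2^t.

    Split z into halves (u, v), form (u + v, v) coordinate-wise over F_2,
    then recurse on each half. Cost: T(n) = 2*T(n/2) + O(n) = O(n log n).
    """
    n = len(z)
    if n < 1 or n & (n - 1):
        raise ValueError("input length must be a power of two")
    if n == 1:
        return list(z)
    half = n // 2
    u, v = z[:half], z[half:]
    mixed = [(a + b) % 2 for a, b in zip(u, v)]
    return polar_transform(mixed) + polar_transform(v)

# Feeding in unit vectors shows which outputs each input bit contributes to.
for name, unit in zip("ABCD", ([1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1])):
    print(name, polar_transform(unit))
```

Over $\mathbb{F}_2$ the 2×2 step is its own inverse, so applying `polar_transform` twice returns the input. This is one way to see that compression by the transform alone loses nothing.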

Analysis: The Polarization martingale
Martingale: a sequence of r.v.s $X_0, X_1, \ldots$ such that $\forall i, x_0, \ldots, x_{i-1}$: $\mathbb{E}[X_i \mid x_0, \ldots, x_{i-1}] = x_{i-1}$.
Polarization martingale: $X_i$ = conditional entropy of a random output at the $i$th stage.
Why a martingale?
At the $i$th stage, the two inputs are $(U, A)$ and $(V, B)$.
Outputs: $(U+V,\ A \cup B)$ and $(V,\ A \cup B \cup \{U+V\})$.
Input entropies $= x_{i-1}$, i.e. $H(U \mid A) = H(V \mid B) = x_{i-1}$
⇒ output entropies average to $x_{i-1}$: $H(U+V \mid A, B) + H(V \mid A, B, U+V) = 2 x_{i-1}$.
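The averaging identity can be verified by brute force on a small example. The sketch below assumes a particular kind of side information, purely for illustration: $A$ is $U$ seen through a $\mathrm{BSC}(q)$ and $B$ is $V$ seen through an independent $\mathrm{BSC}(q)$. All function names are hypothetical.

```python
import math
from itertools import product

def entropy_of(outcomes, f):
    """Entropy (bits) of f(outcome); outcomes is a list of (outcome, prob)."""
    dist = {}
    for o, q in outcomes:
        key = f(o)
        dist[key] = dist.get(key, 0.0) + q
    return sum(q * math.log2(1 / q) for q in dist.values() if q > 0)

def cond_entropy(outcomes, target, given):
    """H(target | given) = H(target, given) - H(given)."""
    return entropy_of(outcomes, lambda o: (target(o), given(o))) - entropy_of(outcomes, given)

def bern(bit, r):
    return r if bit else 1 - r

def two_input_outcomes(p, q):
    """All outcomes (U, V, A, B), with A = U + noise and B = V + noise (assumed model)."""
    outs = []
    for u, v, nu, nv in product((0, 1), repeat=4):
        prob = bern(u, p) * bern(v, p) * bern(nu, q) * bern(nv, q)
        outs.append(((u, v, u ^ nu, v ^ nv), prob))
    return outs

def martingale_step(p, q):
    outs = two_input_outcomes(p, q)
    x_prev = cond_entropy(outs, lambda o: o[0], lambda o: o[2])  # H(U | A)
    x_up = cond_entropy(outs, lambda o: o[0] ^ o[1], lambda o: (o[2], o[3]))  # H(U+V | A, B)
    x_down = cond_entropy(outs, lambda o: o[1], lambda o: (o[2], o[3], o[0] ^ o[1]))  # H(V | A, B, U+V)
    return x_prev, x_up, x_down

x_prev, x_up, x_down = martingale_step(p=0.3, q=0.1)
print(f"x_prev = {x_prev:.4f}, up = {x_up:.4f}, down = {x_down:.4f}, average = {(x_up + x_down) / 2:.4f}")
```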

Quantitative issues
How "quickly" does this martingale converge? ($\lambda$)
How "well" does it converge? ($\epsilon$)
Formally, want: $\Pr[X_t \in (\lambda, 1-\lambda)] \le \epsilon$.
$\epsilon$: gives the gap to capacity.
$\lambda$: governs decoding failure $\approx n \cdot \lambda$.
Reminder: $n = 2^t$.
For a usable code, need: $\lambda \ll 4^{-t}$.
For $\mathrm{poly}(1/\epsilon)$ block length, need: $\epsilon \le \alpha^t$, $\alpha < 1$.
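The link between $\epsilon \le \alpha^t$ and polynomial block length is a one-line calculation: $t \approx \log(1/\epsilon)/\log(1/\alpha)$ steps suffice, so $n = 2^t \approx (1/\epsilon)^{1/\log_2(1/\alpha)}$. A minimal sketch, with hypothetical function names and assuming $0 < \epsilon < 1$ and $0 < \alpha < 1$:

```python
import math

def steps_needed(eps: float, alpha: float) -> int:
    """Number of steps t with alpha**t <= eps (up to floating-point rounding)."""
    return math.ceil(math.log(1 / eps) / math.log(1 / alpha))

def block_length(eps: float, alpha: float) -> int:
    """Block length n = 2**t for that t."""
    return 2 ** steps_needed(eps, alpha)

for eps in (0.1, 0.01, 0.001):
    t = steps_needed(eps, alpha=0.9)
    print(f"eps = {eps}: t = {t}, n = 2^{t}")
```

The exponent $1/\log_2(1/\alpha)$ grows as $\alpha$ approaches 1, so the polynomial degree depends on how fast the martingale polarizes.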

Regular and Strong Polarization
(Regular) Polarization: $\forall \gamma > 0$, $\lim_{t \to \infty} \Pr[X_t \in (\gamma^t, 1-\gamma^t)] = 0$.
Strong Polarization: $\forall \gamma > 0\ \exists \alpha < 1\ \forall t$: $\Pr[X_t \in (\gamma^t, 1-\gamma^t)] \le \alpha^t$.
Both are "global" properties (large $t$ behavior).
The construction yields a local property: how does $X_i$ relate to $X_{i-1}$?
Local vs. global relationship?
Regular polarization is well understood.
Strong polarization is understood only for $(U, V) \mapsto (U+V, V)$.
What about $(U, V, W) \mapsto (U+V, V, W)$? More generally?
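For intuition about the quantity $\Pr[X_t \in (\gamma^t, 1-\gamma^t)]$, it helps to look at a case where the martingale has a closed form. The sketch below swaps in the binary erasure channel, which is not the $\mathrm{bern}(p)$ source setting of the talk: there, one step of the 2×2 transform sends $x$ to $x^2$ or $1-(1-x)^2$, each with probability 1/2. The function names and the choice $\gamma = 1/2$ are assumptions.

```python
def erasure_martingale(x0: float, t: int) -> list:
    """All 2^t equally likely values of X_t for the erasure-channel martingale."""
    values = [x0]
    for _ in range(t):
        values = [y for x in values for y in (x * x, 1 - (1 - x) ** 2)]
    return values

def unpolarized_fraction(values, lam: float) -> float:
    """Empirical Pr[X_t in (lam, 1 - lam)] over equally likely values."""
    return sum(1 for x in values if lam < x < 1 - lam) / len(values)

gamma = 0.5
for t in (4, 8, 12, 16):
    vals = erasure_martingale(0.5, t)
    print(f"t = {t}: Pr[X_t in (gamma^t, 1 - gamma^t)] = {unpolarized_fraction(vals, gamma ** t):.4f}")
```

Strong polarization asks that this printed probability decay exponentially in $t$ even though the window $(\gamma^t, 1-\gamma^t)$ widens with $t$.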

General Polarization
Basic underlying ingredient: $M \in \mathbb{F}_2^{k \times k}$.
Local polarization step:
Pick $(U_j, A_j)_{j \in [k]}$ i.i.d.
One step: generate $(L_1, \ldots, L_k) = M \cdot (U_1, \ldots, U_k)$.
$X_{i-1} = H(U_j \mid \cup_\ell A_\ell, U_{<j}) = H(U_j \mid A_j)$
$X_i = H(L_j \mid \cup_\ell A_\ell, L_{<j})$
Examples: $H_2 = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$; $H_3' = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$.
Which matrices polarize? Strongly?

Mixing matrices
Observation 1: The identity matrix does not polarize.
Observation 1': An upper-triangular matrix does not polarize.
Observation 2: A (row) permutation of a non-polarizing matrix does not polarize! This corresponds to permuting $U_1, \ldots, U_k$.
Definition: $M$ is mixing if no row permutation of $M$ is upper-triangular.
[KSU]: $M$ mixing ⇒ martingale polarizes.
[Our main theorem]: $M$ mixing ⇒ martingale polarizes strongly.
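For small $k$ the mixing condition can be checked by brute force over all row permutations. The sketch below implements only the definition as stated on the slide; it does not check invertibility or any other condition the full theorem may require, and the function names are illustrative.

```python
from itertools import permutations

def is_upper_triangular(rows) -> bool:
    """True if every entry strictly below the diagonal is zero."""
    return all(rows[i][j] == 0 for i in range(len(rows)) for j in range(i))

def is_mixing(matrix) -> bool:
    """Slide definition: no row permutation of the matrix is upper-triangular."""
    return not any(is_upper_triangular(perm) for perm in permutations(matrix))

H2 = [[1, 1], [1, 0]]
H3_PRIME = [[1, 1, 0], [1, 0, 0], [0, 0, 1]]
IDENTITY = [[1, 0], [0, 1]]

for name, m in (("H2", H2), ("H3'", H3_PRIME), ("identity", IDENTITY)):
    print(name, "mixing" if is_mixing(m) else "not mixing")
```

The search takes $k!$ permutations, which is fine for the small kernels considered here but not a scalable test.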

Key Concept: Local (Strong) Polarization
A martingale $X_0, X_1, \ldots$ locally polarizes if it exhibits:
Variance in the middle: $\forall \tau > 0\ \exists \sigma > 0$: $\mathrm{Var}[X_i \mid X_{i-1} \in (\tau, 1-\tau)] > \sigma$.
Suction at the ends: $\exists \theta > 0\ \forall c < \infty\ \exists \tau > 0$: $\Pr[X_i < X_{i-1}/c \mid X_{i-1} \le \tau] \ge \theta$ (similarly with $1 - X_i$).
The definition is local: it compares $X_i$ with $X_{i-1}$.
The implications are global:
Thm 1: Local Polarization ⇒ Strong Polarization.
Thm 2: Mixing matrices ⇒ Local Polarization.

Mixing Matrices ⇒ Local Polarization
2×2 case: Ignoring conditioning, we have $H(p) \to H(2p(1-p))$ w.p. $\frac{1}{2}$, and $H(p) \to 2H(p) - H(2p(1-p))$ w.p. $\frac{1}{2}$.
Variance, Suction ⇐ Taylor series for $\log$. …
$k \times k$ case: Reduction to the 2×2 case: the $\frac{1}{2}$ probabilities become $\sim \frac{1}{k}$.
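The two local properties can be observed numerically for the 2×2 step, under the same "ignoring conditioning" simplification. The sketch below computes the one-step variance for a mid-range entropy and the ratio of the smaller child to the parent for small $p$. Function names are hypothetical and base-2 logarithms are assumed.

```python
import math

def h(p: float) -> float:
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

def two_by_two_step(p: float):
    """Children of H(p): H(2p(1-p)) and 2H(p) - H(2p(1-p)), each w.p. 1/2."""
    up = h(2 * p * (1 - p))
    down = 2 * h(p) - up
    return up, down

def step_variance(p: float) -> float:
    """Variance of X_i given X_{i-1} = H(p), for two equally likely children."""
    up, down = two_by_two_step(p)
    return ((up - down) / 2) ** 2

def suction_ratio(p: float) -> float:
    """Smaller child divided by parent; small values mean strong suction."""
    return two_by_two_step(p)[1] / h(p)

print(f"variance at p = 0.11: {step_variance(0.11):.4f}")
for p in (1e-2, 1e-4, 1e-6):
    print(f"p = {p}: down/parent = {suction_ratio(p):.4f}")
```

The ratio shrinks only slowly as $p \to 0$ (roughly like $2/\log(1/p)$), which is consistent with the suction condition letting $\tau$ depend on $c$.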

Local ⇒ (Global) Strong Polarization
Step 1: Medium Polarization: $\exists \alpha, \beta < 1$ such that $\Pr[X_t \in (\beta^t, 1-\beta^t)] \le \alpha^t$.
Proof:
Potential $\Phi_t = \mathbb{E}[\min(\sqrt{X_t}, \sqrt{1-X_t})]$.
Variance + Suction ⇒ $\exists \eta < 1$ s.t. $\Phi_t \le \eta \cdot \Phi_{t-1}$.
(Aside: Why $\sqrt{X_t}$? Clearly $X_t$ won't work. And $X_t^2$ increases!)
⇒ $\Phi_t \le \eta^t$ ⇒ $\Pr[X_t \in (\beta^t, 1-\beta^t)] \le (\eta/\sqrt{\beta})^t$.
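The geometric decay of the potential can be watched directly in a closed-form case. The sketch below takes the potential to be $\Phi_t = \mathbb{E}[\sqrt{\min(X_t, 1-X_t)}]$ and, as an illustrative assumption, uses the erasure-channel martingale ($x \to x^2$ or $1-(1-x)^2$, each w.p. 1/2) instead of the general setting of the talk. Function names are hypothetical.

```python
import math

def erasure_step(values):
    """One martingale step: every value x splits into x^2 and 1 - (1 - x)^2."""
    return [y for x in values for y in (x * x, 1 - (1 - x) ** 2)]

def potential(values) -> float:
    """Phi = E[sqrt(min(X, 1 - X))] over equally likely values."""
    return sum(math.sqrt(min(x, 1 - x)) for x in values) / len(values)

values = [0.5]
prev = potential(values)
for t in range(1, 13):
    values = erasure_step(values)
    cur = potential(values)
    print(f"t = {t}: Phi_t = {cur:.5f}, Phi_t / Phi_(t-1) = {cur / prev:.4f}")
    prev = cur
```

Without the square root the potential would be $\mathbb{E}[\min(X_t, 1-X_t)]$, which need not contract at a uniform rate. The concave square root is what turns variance and suction into a fixed shrink factor.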

Local ⇒ (Global) Strong Polarization (continued)
Step 1: Medium Polarization: $\exists \alpha, \beta < 1$ such that $\Pr[X_t \in (\beta^t, 1-\beta^t)] \le \alpha^t$.
Step 2: Medium Polarization + $t$ more steps ⇒ Strong Polarization.
Proof:
Assume $X_0 \le \alpha^t$; want $X_t \le \gamma^t$ w.p. $1 - O(\alpha_1^t)$.
Pick $c$ large and $\tau$ s.t. $\Pr[X_i < X_{i-1}/c \mid X_{i-1} \le \tau] \ge \theta$.
2.1: $\Pr[\exists i \text{ s.t. } X_i \ge \tau] \le \tau^{-1} \cdot \alpha^t$ (Doob's inequality).
2.2: Let $Y_i = \log X_i$. Assuming $X_i \le \tau$ for all $i$:
$Y_i - Y_{i-1} \le -\log c$ w.p. $\ge \theta$, and $Y_i - Y_{i-1} \ge k$ w.p. $\le 2^{-k}$.
$\mathbb{E}[Y_t] \le -t \log c^{\theta} + 1 \le 2t \cdot \log \gamma$.
Concentration! $\Pr[Y_t \ge t \cdot \log \gamma] \le \alpha_2^t$. QED!

Conclusion
A simple, general analysis of Strong Polarization.
But the main reason to study general polarization: maybe some matrices polarize even faster!
This analysis does not get us there.
But hopefully following the (simpler) steps in specific cases can lead to improvements.

Thank You!