Mathematical Theories of Communication: Old and New
Madhu Sudan, Harvard University
HLF, September 26, 2017
Communication = What?
Today: digital communication, i.e., communicating bits/bytes/data, as opposed to “radio waves for sound”.
Challenge? Communication can be:
expensive (e.g., a satellite with bounded energy to communicate),
noisy (bits flipped, DVD scratched),
interactive (which complicates the above further),
contextual (assumes you are running a specific OS, IPvX).
Theory = Why?
Why build a theory and not just use whatever works?
Ad hoc solutions work today, but will they work tomorrow?
Formulating the problem allows comparison of solutions, and some creative solutions can be surprisingly better than the naive ones.
Better understanding of limits: what cannot be achieved.
Old? New?
Why new theories? Were the old ones not good enough?
Quite the opposite: the old ones were too good. They provided the right framework and took us from ground level to “orbit”! Now we can explore all of “space”, but new possibilities lead to new challenges.
Reliable Communication?
Problem from the 1940s: advent of the digital age.
Communication media are always noisy, but digital information is less tolerant of noise!
Example: Alice sends “We are now ready”; a single corrupted letter and Bob reads “We are not ready”.
Reliability by Repetition
Can repeat (every letter of) the message to improve reliability:
Sent:     WWW EEE AAA RRR EEE NNN OOO WWW …
Received: WXW EEA ARA SSR EEE NMN OOP WWW …
Elementary reasoning:
More repetitions ⇒ lower probability of decoding error, but still positive.
Longer transmission ⇒ more expected errors.
Combining the above: the rate of repetition coding → 0 as the length of the transmission increases.
Belief (pre-1940): the rate of any scheme → 0 as length → ∞.
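A minimal simulation (not from the talk) of the trade-off described above: repeating each bit r times over a binary symmetric channel with flip probability p and decoding by majority vote drives the per-bit error down, but the rate 1/r tends to 0. Parameter values are illustrative.

```python
import random

def send_with_repetition(bits, r, p, rng):
    """Encode each bit by repeating it r times, flip each transmitted bit w.p. p,
    and decode each block by majority vote. Returns the decoded bits."""
    decoded = []
    for b in bits:
        received = [b ^ (rng.random() < p) for _ in range(r)]
        decoded.append(1 if sum(received) > r / 2 else 0)
    return decoded

rng = random.Random(0)
p, n = 0.1, 10000
message = [rng.randint(0, 1) for _ in range(n)]
for r in (1, 3, 5, 9):        # odd r to avoid majority-vote ties
    decoded = send_with_repetition(message, r, p, rng)
    errors = sum(d != b for d, b in zip(decoded, message))
    print(f"r={r}: rate={1/r:.2f}, bit-error rate ~ {errors/n:.4f}")
```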
Shannon’s Theory [1948]
Sender “encodes” before transmitting; receiver “decodes” after receiving. Encoder/decoder are arbitrary functions:
E: {0,1}^k → {0,1}^n,  D: {0,1}^n → {0,1}^k.
Rate = k/n. Requirement: m = D(E(m) + error) with high probability.
What are the best E, D (with the highest rate)?
(Picture: Alice → Encoder → noisy channel → Decoder → Bob.)
Shannon’s Theorem
If every bit is flipped with probability p, then Rate → 1 − H(p) can be achieved, where
H(p) ≜ p log₂(1/p) + (1−p) log₂(1/(1−p)).
This is best possible.
Examples: p = 0 ⇒ Rate = 1; p = 1/2 ⇒ Rate = 0; monotone decreasing for p ∈ (0, 1/2); positive rate for p = 0.4999, even as k → ∞.
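A quick numerical sketch of the quantities in the theorem: the binary entropy H(p) and the capacity 1 − H(p), including the p = 0.4999 example (the chosen p values are illustrative only).

```python
from math import log2

def binary_entropy(p):
    """H(p) = p log2(1/p) + (1-p) log2(1/(1-p)), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return p * log2(1 / p) + (1 - p) * log2(1 / (1 - p))

for p in (0.0, 0.1, 0.25, 0.4999, 0.5):
    print(f"p={p}: capacity 1-H(p) = {1 - binary_entropy(p):.8f}")
# Note: even at p = 0.4999 the capacity is positive (about 3e-8).
```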
Shannon’s contributions
Far-reaching architecture: Alice → Encoder → channel → Decoder → Bob.
Profound analysis: first (?) use of the probabilistic method.
Deep mathematical discoveries: Entropy, Information, the Bit.
Challenges post-Shannon
Encoding/decoding functions are not “constructive”: Shannon picked E at random; D is brute-force search.
Consequence: D takes time ~2^k to compute (on a computer); E takes time ~2^(2^k) to find!
Algorithmic challenge: find E, D more explicitly; both should take time ~k, k², k³, … to compute.
Solutions: 1948–2017.
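To make the cost concrete, here is a toy rendering (tiny k and n, my own parameter choices) of the non-constructive approach: the encoder is a random table with 2^k entries and the decoder does a brute-force nearest-codeword search over all 2^k codewords, which is infeasible for realistic k.

```python
import random

k, n, p = 4, 12, 0.1          # toy parameters; real k makes 2**k searches infeasible
rng = random.Random(1)

# Random code: a table mapping each of the 2^k messages to a random n-bit codeword.
code = {m: [rng.randint(0, 1) for _ in range(n)] for m in range(2 ** k)}

def decode(received):
    """Brute-force nearest-codeword decoding for the BSC: 2^k distance comparisons."""
    return min(code, key=lambda m: sum(a != b for a, b in zip(code[m], received)))

message = rng.randrange(2 ** k)
received = [bit ^ (rng.random() < p) for bit in code[message]]
print("decoded correctly:", decode(received) == message)
```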
Chomsky: Theory of “Language”
Formalized the syntax of languages mathematically (with caveats). Initially the goal was to understand inter-human communication: the structure behind languages and its implications for acquisition.
Major side effect: the use of context-free grammars to specify programming languages! Automated parsing, compiling, interpretation!
Modern Theories: Communication is interactive
E.g., “buying an airline ticket online”. It involves large inputs on either side:
Travel agent = vast portfolio of airlines + flights.
Traveler = complex collection of constraints and preferences.
Best protocol ≠ travel agent sends the brochure, and ≠ traveler sends the entire list of constraints.
How to model this? What happens if errors occur? How well should context be shared?
Communication Complexity: Yao
The model (with shared randomness R): Alice holds x, Bob holds y, and they want to compute f: (x,y) ↦ Σ.
CC(f) = # bits exchanged by the best protocol that outputs f(x,y) w.p. 2/3.
Usually studied for lower bounds. This talk: CC as a positive model.
Some short protocols!
Problem 1: Alice ← (x_1, …, x_n); Bob ← (y_1, …, y_n). Want to know: (x_1 − y_1) + ⋯ + (x_n − y_n).
Solution: Alice → Bob: S ≝ x_1 + ⋯ + x_n. Bob → Alice: S − (y_1 + ⋯ + y_n).
Problem 2: Alice ← (x_1, …, x_n); Bob ← (y_1, …, y_n). Want to know: (x_1 − y_1)² + ⋯ + (x_n − y_n)².
Solution? Deterministically: needs n bits of communication. Randomized: say Alice + Bob ← r_1, r_2, …, r_n ∈ {−1,+1} random.
Alice → Bob: S_1 = x_1² + ⋯ + x_n²; S_2 = r_1 x_1 + ⋯ + r_n x_n.
Bob → Alice: T_1 = y_1² + ⋯ + y_n²; T_2 = r_1 y_1 + ⋯ + r_n y_n.
Thm: E[S_1 + T_1 − 2 S_2 T_2] = (x_1 − y_1)² + ⋯ + (x_n − y_n)².
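A sketch of the randomized protocol for Problem 2, checking the stated expectation numerically; the vector lengths, trial count, and helper names are my own choices.

```python
import random

def sq_distance_estimate(x, y, rng):
    """One run of the protocol: shared random signs r_i, then two numbers each way."""
    n = len(x)
    r = [rng.choice((-1, 1)) for _ in range(n)]                               # shared randomness
    S1, S2 = sum(xi * xi for xi in x), sum(ri * xi for ri, xi in zip(r, x))   # Alice -> Bob
    T1, T2 = sum(yi * yi for yi in y), sum(ri * yi for ri, yi in zip(r, y))   # Bob -> Alice
    return S1 + T1 - 2 * S2 * T2

rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(50)]
y = [rng.uniform(-1, 1) for _ in range(50)]
exact = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
trials = 20000
estimate = sum(sq_distance_estimate(x, y, rng) for _ in range(trials)) / trials
print(f"exact = {exact:.4f}, average of estimates = {estimate:.4f}")
```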
Application to Buying Air Tickets
If we can express every flight and every user’s preference as n numbers (commonly done in machine learning), then the number of bits communicated ≈ 2 × (description of the final itinerary), with only two rounds of communication!
Challenge: express user preferences as numbers. Not there yet, but soon your cellphone will do it!
Interaction + Errors: Schulman
Consider distributed updating of a shared document (Alice and Bob interacting through a server).
Typical interaction: Server → User: current state of the document. User → Server: update.
What if there are errors in the interaction? Errors must be detected immediately, or else all future communication is wasted. But detecting too early might lead to false alarms!
Interactive Coding Schemes
If bits are flipped with probability p, what is the achievable rate of communication?
The limits are still not precisely determined! But the rate is a positive constant for p ≤ 1/8 (and scales more like 1 − H(p)).
Non-trivial mathematics, some of it still not fully constructive: surprising even given Shannon’s theory!
[Schulman; Braverman-Rao; Kol-Raz; Haeupler]
Contextual Communication & Uncertainty
Communication Complexity: Yao
The model (with shared randomness R): Alice holds x, Bob holds y; f: (x,y) ↦ Σ.
CC(f) = # bits exchanged by the best protocol that outputs f(x,y) w.p. 2/3.
Uncertain Communication
Boole’s “modest” ambition
“The design of the following treatise is to investigate the fundamental laws of those operations of the mind by which reasoning is performed; to give expression to them in the symbolical language of a Calculus, and upon this foundation to establish the science of Logic and construct its method; to make that method itself the basis of a general method for the application of the mathematical doctrine of Probabilities; and, finally, to collect from the various elements of truth brought to view in the course of these inquiries some probable intimations concerning the nature and constitution of the human mind.” [G. Boole, “On the laws of thought …”, p. 1]
Sales Pitch + Intro
Most of communication theory [à la Shannon, Hamming] is built around the sender and receiver being perfectly synchronized, so the large shared context (codes, protocols, priors) is ignored.
Most human communication (and also device-to-device communication) does not assume perfect synchronization. So context is relevant: qualitatively (the receiver may take the wrong action) and quantitatively (inputs are long!).
Aside: Contextual Proofs & Uncertainty
Mathematical proofs assume a large context. “By some estimates a proof that 2+2=4 in ZFC would require about 20000 steps … so we will use a huge set of axioms to shorten our proofs – namely, everything from high-school mathematics” [Lehman, Leighton, Meyer].
Context shortens proofs. But context is uncertain! What is “high-school mathematics”? Is it a fixed set of axioms, or a set from which others can be derived? Is the latter amenable to efficient reasoning? What is efficiency with a large context?
Sales Pitch + Intro (contd.)
Since context matters both qualitatively and quantitatively, we need a theory. What are the problems? Starting point = Shannon? Yao?
Communication Complexity
The model (with shared randomness R): Alice holds x, Bob holds y; f: (x,y) ↦ Σ.
CC(f) = # bits exchanged by the best protocol that outputs f(x,y) w.p. 2/3.
Usually studied for lower bounds. This talk: CC as a positive model.
Aside: Easy CC Problems [Ghazi, Kamath, S. ’15]
∃ problems with large inputs and small communication?
Equality testing: EQ(x,y) = 1 ⇔ x = y; CC(EQ) = O(1).
  Protocol: fix an error-correcting code E: {0,1}^n → {0,1}^N; shared randomness i ← [N]; exchange E(x)_i, E(y)_i; accept iff E(x)_i = E(y)_i.
Hamming distance: H_k(x,y) = 1 ⇔ Δ(x,y) ≤ k; CC(H_k) = O(k log k) [Huang et al.].
  Protocol: use common randomness to hash [n] → [k²]; poly(k) communication.
Small set intersection: ∩_k(x,y) = 1 ⇔ wt(x), wt(y) ≤ k and ∃i s.t. x_i = y_i = 1; CC(∩_k) = O(k) [Håstad, Wigderson].
Gap (real) inner product: x, y ∈ ℝ^n with ‖x‖_2 = ‖y‖_2 = 1; GIP_{c,s}(x,y) = 1 if ⟨x,y⟩ ≥ c, = 0 if ⟨x,y⟩ ≤ s (⟨x,y⟩ ≜ Σ_i x_i y_i); CC(GIP_{c,s}) = O(1/(c−s)²) [Alon, Matias, Szegedy].
  Main insight: if G ← N(0,1)^n, then E[⟨G,x⟩ · ⟨G,y⟩] = ⟨x,y⟩.
Unstated philosophical contribution of CC à la Yao: communication with a focus (“only need to determine f(x,y)”) can be more effective (shorter than |x|, H(x), H(y), I(x;y), …).
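Two small sketches of protocols from this slide. For equality testing I instantiate the ECC with random parities (a random linear code in place of a generic code), so each round exchanges one shared random parity of x and of y; for gap inner product I use the stated insight E[⟨G,x⟩·⟨G,y⟩] = ⟨x,y⟩ with a shared Gaussian G. Parameters and names are illustrative.

```python
import random

rng = random.Random(0)
n = 64

def equality_test(x, y, repetitions=20):
    """Shared-randomness equality test: each round exchanges one random parity.
    Equal inputs always agree; unequal inputs agree in any round with probability 1/2,
    so 20 rounds err with probability 2**-20."""
    for _ in range(repetitions):
        r = [rng.randint(0, 1) for _ in range(n)]                 # shared randomness
        if sum(ri * xi for ri, xi in zip(r, x)) % 2 != sum(ri * yi for ri, yi in zip(r, y)) % 2:
            return False
    return True

x = [rng.randint(0, 1) for _ in range(n)]
y = list(x); y[3] ^= 1
print(equality_test(x, x), equality_test(x, y))                   # True False (w.h.p.)

def inner_product_estimate(x, y, samples=10000):
    """Estimate <x,y> by averaging <G,x> * <G,y> over shared Gaussian vectors G."""
    total = 0.0
    for _ in range(samples):
        G = [rng.gauss(0, 1) for _ in range(n)]
        total += sum(g * xi for g, xi in zip(G, x)) * sum(g * yi for g, yi in zip(G, y))
    return total / samples

u = [rng.gauss(0, 1) for _ in range(n)]
v = [rng.gauss(0, 1) for _ in range(n)]
exact = sum(a * b for a, b in zip(u, v))
print(f"exact <u,v> = {exact:.2f}, sketch estimate = {inner_product_estimate(u, v):.2f}")
```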
Modelling Shared Context + Imperfection
Many possibilities; an ongoing effort. Three sources of imperfection:
1. Alice + Bob may only have estimates of x and y; more generally, (x,y) are close (in some sense).
2. Shared randomness: Alice + Bob may not have identical copies.
3. Knowledge of f, the function Bob wants to compute, may not be exactly known to Alice!
1. Compression
Classical compression: Alice ← P, m ∼ P; Bob ← P. Alice → Bob: y = E_P(m); Bob: m̂ = D_P(y) ≟ m.
[Shannon]: m̂ = m with E_{m∼P}[|E_P(m)|] ≤ H(P) + 1, where H(P) ≝ E_{m∼P}[−log P(m)].
Uncertain compression [Juba, Kalai, Khanna, S.]: Alice ← P, m ∼ P; Bob ← Q. Alice → Bob: y = E_P(m); Bob: m̂ = D_Q(y) ≟ m.
P, Q Δ-close: ∀m, |log P(m) − log Q(m)| ≤ Δ.
Can we get E_{m∼P}[|E_P(m)|] ≤ O(H(P) + Δ)?
[JKKS]: Yes, with shared randomness. [CGMS]: Yes, with shared correlation. [Haramaty + S.]: Deterministically, O(H(P) + log log |Ω|), where Ω is the universe of messages.
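A toy sketch in the spirit of the [JKKS] shared-randomness scheme (my simplification, with made-up priors): Alice sends roughly log(1/P(m)) + Δ + O(1) shared-random hash bits of m, and Bob decodes to the most likely message under Q consistent with the hash.

```python
import random
from math import log2

rng = random.Random(0)
messages = list(range(1000))

# Toy priors (made up): P is geometric-ish in m mod 20; Q perturbs P by factors of 2.
P = {m: 2.0 ** (-(m % 20) - 1) for m in messages}
Z = sum(P.values()); P = {m: p / Z for m, p in P.items()}
Q = {m: p * rng.choice((0.5, 2.0)) for m, p in P.items()}
Z = sum(Q.values()); Q = {m: q / Z for m, q in Q.items()}
Delta = max(abs(log2(P[m]) - log2(Q[m])) for m in messages)

def shared_hash(m, length):
    """Shared randomness: 'length' pseudorandom bits associated with message m."""
    r = random.Random(1000003 * m + 12345)
    return tuple(r.randint(0, 1) for _ in range(length))

def encode(m, slack=8):
    """Alice, who knows only P: hash length ~ log(1/P(m)) + Delta + slack."""
    length = int(log2(1 / P[m])) + int(Delta) + 1 + slack
    return length, shared_hash(m, length)

def decode(length, h):
    """Bob, who knows only Q: most likely message under Q that matches the hash."""
    matches = [m for m in messages if shared_hash(m, length) == h]
    return max(matches, key=lambda m: Q[m])

m = 7
length, h = encode(m)
print(f"Delta={Delta:.2f}, hash length={length}, decoded correctly: {decode(length, h) == m}")
```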
Deterministic Compression: Challenge
Say Alice and Bob have rankings of N players. Rankings = bijections π, σ: [N] → [N]; π(i) = rank of the i-th player in Alice’s ranking.
Further suppose they know the rankings are close: ∀i ∈ [N], |π(i) − σ(i)| ≤ 2.
Bob wants to know: is π⁻¹(1) = σ⁻¹(1)?
How many bits does Alice need to send (non-interactively)?
With shared randomness: O(1). Deterministically: O(1)? O(log N)? O(log log log N)?
With Elad Haramaty: O(log* N).
Compression as a proxy for language
An information-theoretic study of language? The goal of language: an effective means of expressing information/action. An implicit objective of language: make frequent messages short. Compression!
Is frequency known globally or learned locally? If the latter, everyone can’t possibly agree on it, yet they need to agree on the language (mostly)!
Similar to the problem of uncertain compression. Studied formally in [Ghazi, Haramaty, Kamath, S. ITCS ’17].
2. Imperfectly Shared Randomness
Recall: communication becomes more effective with randomness (identity, Hamming distance, small set intersection, inner product).
How does performance degrade if the players only share correlated variables? E.g., Alice ← r, Bob ← s, with (r,s) = ((r_i, s_i))_i i.i.d.; r_i, s_i ∈ {−1,1}; E[r_i] = E[s_i] = 0; E[r_i s_i] = ρ ∈ (0,1).
[CGMS ’16]: communication k with perfect randomness ⇒ communication O_ρ(2^k) with imperfectly shared randomness.
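A small sketch of the correlated-randomness model above: generate ±1 pairs with E[r_i s_i] = ρ and check the correlation empirically (ρ and the sample size are arbitrary choices).

```python
import random

def correlated_pair(rho, rng):
    """Return (r, s) in {-1,+1}^2 with E[r] = E[s] = 0 and E[r*s] = rho."""
    r = rng.choice((-1, 1))
    s = r if rng.random() < (1 + rho) / 2 else -r
    return r, s

rng = random.Random(0)
rho, n = 0.6, 100000
pairs = [correlated_pair(rho, rng) for _ in range(n)]
print("empirical E[r*s] =", sum(r * s for r, s in pairs) / n)
```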
Imperfectly Shared Randomness
Easy (complete) problem: Gap Inner Product: x, y ∈ ℝ^n; GIP(x,y) = 1 if ⟨x,y⟩ ≥ ε·‖x‖_2·‖y‖_2, = 0 if ⟨x,y⟩ ≤ 0. Decidable with O_ρ(1/ε²) (one-way) communication.
Hard problem: Sparse Gap Inner Product: GIP on sparse x; x ∈ {0,1}^n, y ∈ {−1,1}^n, ‖x‖_1 = 2εn. Classical communication = O(log 1/ε) [uses sparsity]; there is no way to use the sparsity with imperfectly shared randomness.
3. Functional Uncertainty [Ghazi, Komargodski, Kothari, S. ’16]
Recall the positive message of Yao’s model: communication can be brief if Alice knows which function f(x,y) Bob wants to compute.
What if Alice only knows f approximately? Can communication still be short?
The Model
Recall distributional complexity: (x,y) ∼ μ; error_μ(Π) ≝ Pr_{(x,y)∼μ}[f(x,y) ≠ Π(x,y)]; complexity cc_{μ,ε}(f) ≝ min_{Π : error_μ(Π) ≤ ε} max_{x,y} |Π(x,y)|.
Functional Uncertainty Model I: an adversary picks (f,g); nature picks (x,y) ∼ μ; Alice ← (f,x); Bob ← (g,y); compute g(x,y).
Promise: δ_μ(f,g) ≝ Pr_μ[f(x,y) ≠ g(x,y)] ≤ δ_0.
Goal: compute (any) Π(x,y) with δ_μ(g,Π) ≤ ε_1 (we just want ε_1 → 0 as δ_0 → 0).
If (f,g) is part of the input, this is the complexity of what?
Modelling Uncertainty
Modelled by a graph 𝒢 of possible input pairs (f,g); protocols know 𝒢 but not (f,g).
cc_{μ,ε}(𝒢) ≝ max_{(f,g)∈𝒢} cc_{μ,ε}(g);  δ_μ(𝒢) ≝ max_{(f,g)∈𝒢} δ_μ(f,g).
Uncertain error: error_{μ,𝒢}(Π) ≝ max_{(f,g)∈𝒢} Pr_μ[g(x,y) ≠ Π(f,g,x,y)].
Uncertain complexity: Ucc_{μ,ε}(𝒢) ≝ min_{Π : error(Π) ≤ ε} max_{f,g,x,y} |Π(f,g,x,y)|.
Compare cc_{μ,ε_0}(𝒢) vs. Ucc_{μ,ε_1}(𝒢); want ε_1 → 0 as ε_0, δ_μ(𝒢) → 0.
Main Results
Thm 1 (negative): ∃ 𝒢, μ s.t. δ_μ(𝒢) = o(1) and cc_{μ,o(1)}(𝒢) = 1, but Ucc_{μ,0.1}(𝒢) = Ω(√n) (n = |x| = |y|).
Thm 2 (positive): ∀𝒢 and product μ: Ucc^{oneway}_{μ,ε_1}(𝒢) = O(cc^{oneway}_{μ,ε_0}(𝒢)), where ε_1 → 0 as ε_0, δ_μ(𝒢) → 0.
Thm 2′ (positive): ∀𝒢, μ: Ucc^{oneway}_{μ,ε_1}(𝒢) = O(cc^{oneway}_{μ,ε_0}(𝒢) · (1 + I(x;y))), where I(x;y) = mutual information between x and y.
Protocols are not continuous w.r.t. the function being computed.
Details of the Negative Result
μ: x ∼ U({0,1}^n); y = Noisy(x) with Pr[x_i ≠ y_i] = 1/√n.
𝒢 = {(⊕_S(x⊕y), ⊕_T(x⊕y)) : |S ⊕ T| = o(√n)}, where ⊕_S(z) = ⊕_{i∈S} z_i.
Certain communication: Alice → Bob: ⊕_T(x).
δ_μ(𝒢) = max_{S,T} Pr_{x,y}[⊕_S(x⊕y) ≠ ⊕_T(x⊕y)] = max_{U : |U| = o(√n)} Pr_{z ∼ Bernoulli(1/√n)^n}[⊕_U(z) = 1] = o(1).
Uncertain lower bound: standard cc_{μ,ε}(F) where F((S,x); (T,y)) = ⊕_T(x⊕y); the lower bound is obtained by picking S, T randomly: S uniform, T a noisy copy of S.
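A quick Monte Carlo check of the δ_μ(𝒢) = o(1) claim as reconstructed above (the 1/√n noise rate and o(√n) set size are my reading of the garbled slide): the parity of o(√n) Bernoulli(1/√n) bits is 1 only rarely, and more rarely as n grows.

```python
import random
from math import sqrt

rng = random.Random(0)
trials = 20000
for n in (10**4, 10**6):
    p = 1 / sqrt(n)                 # per-coordinate noise rate (assumed reading of the slide)
    set_size = int(n ** 0.25)       # |S xor T|, taken to be o(sqrt(n))
    odd = sum(sum(rng.random() < p for _ in range(set_size)) % 2 for _ in range(trials))
    print(f"n={n}: Pr[parity over S xor T is 1] ~ {odd / trials:.4f}")
```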
Positive result (Thm. 2)
Consider the communication matrix of g (rows = x, columns = y). A one-way protocol for g partitions the rows into 2^k blocks; Bob wants to know which block x’s row lies in.
Common randomness: y_1, …, y_m. Alice → Bob: f(x, y_1), …, f(x, y_m). Bob (w.h.p.) recognizes the block and uses it. m = O(k) suffices.
“CC is preserved under uncertainty for one-way communication if x ⊥ y.”
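A simplified, self-contained rendering of the idea (my own toy version, not the paper’s exact protocol): f is a small perturbation of g, Alice reports f on shared random sample columns, and Bob answers with his own function g on the row that best matches her report.

```python
import random

rng = random.Random(0)
R, C, m = 32, 64, 40          # rows (x values), columns (y values), number of shared samples

# g: Bob's function, as an R x C 0/1 matrix; f: Alice's function, a slightly perturbed copy.
g = [[rng.randint(0, 1) for _ in range(C)] for _ in range(R)]
f = [[bit ^ (rng.random() < 0.02) for bit in row] for row in g]

def protocol(x, y):
    samples = [rng.randrange(C) for _ in range(m)]       # shared randomness: y_1..y_m (product mu)
    message = [f[x][j] for j in samples]                 # Alice -> Bob: f(x, y_1) ... f(x, y_m)
    best_row = min(range(R),
                   key=lambda r: sum(g[r][j] != b for j, b in zip(samples, message)))
    return g[best_row][y]                                # Bob answers with his own g on the matched row

errors = sum(protocol(x, y) != g[x][y] for x in range(R) for y in range(C))
print("disagreement with g(x,y):", errors / (R * C))
```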
Analysis Details
1. W.p. 1 − ε, ∃j s.t. δ(g(x_j,·), g(x,·)) ≤ ε.
2. Main idea: if Π_g(x) = Π_g(x_j), then w.h.p. δ(g(x_j,·), g(x,·)) ≤ ε.
   If j ∈ [K] is s.t. δ(g(x_j,·), g(x,·)) ≥ 2ε, then Pr[j is selected] = exp(−m).
But step 2 works only if y_i ∼ μ|_x.
Thm 2′: Main Idea
Now we cannot sample y_1, …, y_m independently of x. Instead, use [HJMR ’07] to sample y_i ∼ μ|_x; each sample costs about I(x;y) bits of communication. The analysis goes through.
4. Contextual Proofs and Uncertainty?
Scenario: Alice + Bob start with axioms A: a subset of clauses on X_1, …, X_n. Alice wishes to prove A ⇒ C for some clause C, but the proof Π: A ⇒ C may be long (~2^n).
Context to the rescue: maybe Alice + Bob share a context D ⇐ A, and a contextual proof Π′: D ⇒ C is short (poly(n)).
Uncertainty: Alice’s context D_A ≠ D_B (Bob’s context). Alice writes a proof Π′: D_A ⇒ C. When can Bob verify Π′ given D_B?
4. Contextual Proofs and Uncertainty? (2)
Scenario: Alice + Bob start with axioms A: a subset of clauses on X_1, …, X_n; Alice wishes to prove A ⇒ C for some clause C; Alice writes a proof Π′: D_A ⇒ C. When can Bob verify Π′ given D_B?
Surely if D_A ⊆ D_B. What if D_A ∖ D_B = {C′} and Π″: D_B ⇒ C′ is one step long? Can Bob still verify Π′: D_A ⇒ C in poly(n) time?
This needs a feasible data structure that allows it. None is known to exist; it might be harder than Partial Match Retrieval.
Summarizing
Perturbing the “common information” assumptions in Shannon/Yao theory leads to many changes. Some are debilitating; some are not, but require careful protocol choices. In general, communication protocols are not continuous functions of the common information.
The uncertain model (𝒢) needs more exploration. Some open questions from our work:
Tighten the gap: cc(f)·I vs. cc(f) + I.
Multi-round setting? Two rounds?
What if the randomness and the function are both imperfectly shared? [Preliminary results in [Ghazi + S. ’17]]
Thank You!