
Randomization
Carmella Kroitoru
Seminar on Communication Complexity

What's new? Until now, Alice and Bob were all powerful, but deterministic. But what if Alice and Bob could act in a randomized fashion? What's a randomized fashion? Well, just flipping a coin, for instance.

What now? Now the players "toss coins", and the coins influence the messages the protocol sends. The communication over an input (x, y) is not fixed anymore; it's a random variable. And so is the protocol's output on (x, y).

So, how can we define the success of the protocol?
The conservative way – Las Vegas protocols: success = the protocol always gives the right f(x,y).
The liberal option – Monte Carlo protocols: success = with high probability the protocol gives the right f(x,y).

Where do we get all the coins?
Denote: r(I) – a random string of arbitrary length held by player I.

                         Deterministic                       Randomized
Input                    Alice gets x, Bob gets y            Alice gets x, Bob gets y
Additional information   none                                Alice has r(A), Bob has r(B)
Output dependence        f(x,y) depends only on x and y      the output depends on x, y, r(A), r(B)

So how does the decision tree look?
[Figure: the decision tree before randomization vs. after randomization.]

But what if we get the 'wrong' r(I)? Does it mean the output will be wrong? It can be! For the same input (x,y), the protocol's output can differ, depending on r(A) and r(B). So how will we get the right answer? Through the magic of probability.

Definitions: let P be a randomized protocol.
Zero error – for every (x,y): Pr[P(x,y) = f(x,y)] = 1.
ε error – for every (x,y): Pr[P(x,y) = f(x,y)] ≥ 1 − ε.
One sided ε error – for every (x,y):
if f(x,y) = 0 then Pr[P(x,y) = 0] = 1;
if f(x,y) = 1 then Pr[P(x,y) = 1] ≥ 1 − ε.

Now the protocol's output can vary depending on r(A) and r(B). What about D(f)? Is the number of bits exchanged on a given input (x,y) fixed? No! So how will we measure the running time?

Running time – first method
The worst case running time of a randomized protocol P on input (x,y) is the maximum, over all choices of the random strings r(A) and r(B), of the number of bits exchanged on (x,y).
The worst case cost of P is this quantity maximized over all inputs:
D_worst(P) = max over (x,y) of [ max over r(A), r(B) of #{bits exchanged on (x,y)} ].

Running time – second method
The average case running time of a randomized protocol P on input (x,y) is the expected number of bits exchanged on (x,y), over the choice of r(A) and r(B).
The average case cost of P is this expectation maximized over all inputs:
D_avg(P) = max over (x,y) of E over r(A), r(B) of #{bits exchanged on (x,y)}.

Average over what? r(A) and r(B) are chosen independently, according to some probability distribution. Should we also average over a distribution on (x,y)? No – there is no 'average input'! We average only over the random strings, and still take the worst-case input.

Now what's all that 'probability' talk? Didn't I pass that course already? Let's do a quick review.

Binomial Distribution
A binomial experiment (a sequence of Bernoulli trials) is a statistical experiment with the following properties:
n independent trials y_1, …, y_n.
Each trial is a success or a failure (denoted 1/0, or T/F).
The probability of success, p, is the same on every trial.
Denote S = ∑ y_i – the number of successes.
Example: coin tosses, where success = 'Heads'.

trial:   y_1  y_2  y_3  …  y_n
result:   0    1    1   …   0        S = ∑ y_i
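To make the setup concrete, here is a minimal simulation sketch in Python (the function name and parameters are ours, not part of the slides):

```python
import random

def binomial_experiment(n, p):
    """Simulate n independent Bernoulli trials with success probability p.
    Returns the list of results y_1..y_n and the number of successes S."""
    ys = [1 if random.random() < p else 0 for _ in range(n)]
    return ys, sum(ys)

ys, S = binomial_experiment(n=10, p=0.5)   # e.g. ten fair coin tosses
print(ys, "S =", S)
```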

Exercise: Given a box with red and blue balls, suppose the red balls are at least a δ fraction of the total. What is the probability that k independent draws see no red ball? At most (1 − δ)^k ≤ e^(−δk). But what if a constant bound like 1/3 isn't good enough? What if we want the probability to be < α? Take k ≥ ln(1/α)/δ draws; then e^(−δk) ≤ α.
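A quick way to check these numbers (a Python sketch under the assumption of independent draws with replacement; the helper names are ours):

```python
import math

def miss_probability_bound(delta, k):
    """Upper bound (1 - delta)^k on the probability that k independent draws
    see no red ball, when red balls are at least a delta fraction of the box."""
    return (1 - delta) ** k

def draws_needed(delta, alpha):
    """Smallest k for which the bound e^(-delta*k) drops below alpha."""
    return math.ceil(math.log(1 / alpha) / delta)

print(miss_probability_bound(0.1, 30))   # (0.9)^30 ≈ 0.042
print(draws_needed(0.1, 0.01))           # 47 draws suffice for error < 1%
```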

Expectation
Linearity:
E(a) = a for a constant a.
E(a·X + b) = a·E(X) + b.
E(X + Y) = E(X) + E(Y), even when X and Y are dependent.
Non-multiplicativity:
In general E(X·Y) ≠ E(X)·E(Y); equality holds when X and Y are independent.

What is the probability that a random variable deviates from its expectation? We focus on sums and averages of n bounded variables:
S = y_1 + y_2 + … + y_n, where each y_i takes values in {0,1}.
Note: similar bounds can be obtained for other bounded variables.

Don't know anything about the y_i's:
If the y_i's are non-negative, but we don't know anything else, then we can use Markov's inequality to bound S:
Pr[S ≥ t·E(S)] ≤ 1/t, for every t > 1.
Some intuition – no more than 1/5th of the population can have more than 5 times the average income.
Example: toss 100 coins. Pr(# of heads ≥ 60) ≤ 5/6 (E(S) = 50, t = 6/5).
So without knowing whether the y_i's are bounded, or whether the y_i's are independent, we get an upper bound. It's not a great bound, though, and it says nothing about deviations below the expectation.
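A small experiment showing how loose Markov's bound is on this example (a Python sketch; the names are ours):

```python
import random

def estimate_tail(n=100, threshold=60, trials=20_000):
    """Empirically estimate Pr(# of heads >= threshold) over n fair coin tosses."""
    hits = sum(
        1 for _ in range(trials)
        if sum(random.randint(0, 1) for _ in range(n)) >= threshold
    )
    return hits / trials

print("empirical   :", estimate_tail())   # around 0.028
print("Markov bound:", 5 / 6)             # 0.833..., valid but very loose
```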

Bounding S = ∑ y_i when the y_i's are independent

An exact result: using the cumulative binomial probability.
This refers to the probability that the binomial random variable falls within a specified range.
Example: toss 100 coins. Pr(# of heads ≥ 60) = ?
Solution: we compute 41 individual probabilities using the binomial formula; their sum is the answer we seek. Denote k = # of heads.
Pr(k ≥ 60; n = 100, p = 0.5) = Pr(k = 60; 100, 0.5) + Pr(k = 61; 100, 0.5) + … + Pr(k = 100; 100, 0.5)
Pr(k ≥ 60; n = 100, p = 0.5) ≈ 0.028
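The same sum can be computed directly (Python sketch; the helper name is ours):

```python
from math import comb

def binomial_upper_tail(n, p, k0):
    """Exact Pr(k >= k0) for a Binomial(n, p) random variable."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k0, n + 1))

print(binomial_upper_tail(100, 0.5, 60))   # ≈ 0.0284
```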

More generally, when the y_i's are independent 0/1 variables we can use Chernoff's inequality to bound S. For any 0 < δ ≤ 1:
Pr[S ≥ (1 + δ)·E(S)] ≤ e^(−δ²·E(S)/3)
Example: toss 100 coins independently. Pr(# of heads ≥ 60) ≤ e^(−(1/5)²·50/3) = e^(−2/3) ≈ 0.51 (δ = 1/5).
Dramatically better bounds than Markov's as E(S) and the deviation grow, since the bound decays exponentially.
Worse than the exact cumulative probability, but much easier to use and applicable in many more cases.
Gives bounds in both directions (S bigger or smaller than E(S)).
Back to running time.
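A side-by-side comparison of the three bounds on the 100-coin example (Python sketch, using the e^(−δ²·E(S)/3) form of Chernoff stated above):

```python
import math
from math import comb

def chernoff_upper_tail_bound(mean, delta):
    """Chernoff: Pr[S >= (1 + delta) * E(S)] <= exp(-delta**2 * mean / 3),
    for a sum of independent 0/1 variables and 0 < delta <= 1."""
    return math.exp(-delta ** 2 * mean / 3)

def exact_upper_tail(n, p, k0):
    """Exact binomial tail, for comparison."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k0, n + 1))

# 100 fair coins, Pr(# of heads >= 60):
print("Markov  :", 5 / 6)                               # ≈ 0.833
print("Chernoff:", chernoff_upper_tail_bound(50, 0.2))  # e^(-2/3) ≈ 0.513
print("Exact   :", exact_upper_tail(100, 0.5, 60))      # ≈ 0.028
```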

3 types of errors – 3 complexity measures (1)
Let f : X × Y → {0,1}. Let's define complexity measures for randomized protocols P computing f:
R_0(f) is the minimum average case cost of a randomized protocol P that computes f with zero error.

3 types of errors – 3 complexity measures (2)
For 0 < ε < 1, R_ε(f) is the minimum worst case cost of a randomized protocol P that computes f with error ε.
For 0 < ε < 1, R¹_ε(f) is the minimum worst case cost of a randomized protocol P that computes f with one sided error ε.

Wait! Why do we take the average case cost for zero error protocols, and the worst case cost for protocols with error?

Worst case = Θ(Average case)
Reminder: D_avg(A) – the average running time over the random strings r(A), r(B), maxed over all possible inputs (x,y).
Suppose a protocol A makes error ε/2 and D_avg(A) = t. Define A':
~ Execute A as long as at most 2t/ε bits are exchanged.
~ If A finished, return its answer.
~ Else return 0.
The number of bits exchanged by A' in the worst case is D_worst(A') = 2t/ε.
So we found k = 2/ε such that D_worst(A') = k · D_avg(A).
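A minimal sketch of the truncation idea (Python; run_A is a hypothetical generator we invent that yields one transmitted bit at a time and returns A's answer when it finishes – the slides of course speak of abstract protocols, not code):

```python
def truncated_protocol(run_A, t, eps):
    """Run A, but give up and output 0 once more than 2t/eps bits are exchanged."""
    limit = 2 * t / eps
    bits = 0
    gen = run_A()
    while True:
        try:
            next(gen)              # one more bit crosses the channel
        except StopIteration as done:
            return done.value      # A finished within the budget: return its answer
        bits += 1
        if bits > limit:
            return 0               # truncated: may be wrong, happens with prob <= eps/2
```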

But do A and A' have the same error?
A has error ε/2. What's the error of A'?
~ A' can return the answer A output; that has a chance of ε/2 to be wrong.
~ Or A' can output 0 because A wasn't done after 2t/ε bits. What's the chance of that? We'll use Markov:
Pr[A exchanged more than 2t/ε bits] = Pr[#{bits exchanged in A} > (2/ε)·t] ≤ 1/(2/ε) = ε/2.
So A' has error at most ε/2 + ε/2 = ε, i.e., twice the error of A; and if A's error is one sided, so is A''s, since the truncation only adds extra outputs of 0.

What if ε = 0? For zero error protocols, using the worst case cost gives exactly the deterministic communication complexity. How come? A deterministic protocol can just fix some r(A) and r(B) and proceed: since the randomized protocol is always correct, the fixed protocol is correct on every input, and its cost is at most the worst case cost. So for zero error protocols, we only care about the average case cost.

Exercise (1):
A: a protocol that computes f(x, y), for some fixed (x, y), with one sided error ε.
A': execute A independently t times. Denote:
~ f_i – the result of execution #i.
~ If f_i = 1 for some i then res(A') = 1, else res(A') = 0.
Pr[res(A') ≠ f(x, y)] = ?
Solution: If res(A') = 1, that's the right answer with certainty (by the definition of one sided error, A never outputs 1 when f(x, y) = 0).
If res(A') = 0, it might be the right result, or we got the wrong result in all t executions. What's the chance of that? Pr[mistake in A] ≤ ε, so Pr[mistake in A'] ≤ ε^t.
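A simulation of this amplification (Python sketch; noisy_protocol is a stand-in we invent for a protocol with one sided error ε):

```python
import random

def noisy_protocol(true_value, eps=0.3):
    """Stand-in for a one sided error protocol:
    never errs when the true value is 0, errs with probability eps when it is 1."""
    if true_value == 0:
        return 0
    return 0 if random.random() < eps else 1

def amplified(true_value, t, eps=0.3):
    """Run the protocol t times and output 1 if any run said 1."""
    return 1 if any(noisy_protocol(true_value, eps) == 1 for _ in range(t)) else 0

trials = 100_000
err = sum(amplified(1, t=5) != 1 for _ in range(trials)) / trials
print(err, "vs the bound", 0.3 ** 5)   # empirical error ≈ 0.0024, bound ≈ 0.00243
```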

Exercise (2):
A: a protocol that computes f(x, y), for some fixed (x, y), with (two sided) error ε.
A': execute A independently t times. Denote:
f_i – the result of execution #i.
res(A') = maj{f_i}.
What's the probability that A' gives a result we can't trust?
Solution (1): Define a Bernoulli trial: y_i = 1 if f_i ≠ f(x, y). To get res(A') wrong we need more than half of the y_i's to be 1. E[∑ y_i] = E[S] ≤ εt. So what's the chance that S > t/2?
Hint: use Chernoff for that.

Solution (2): Let's fix ε ≤ ¼ (this can be generalized to any ε < ½) and take δ = 1. Then E(S) ≤ t/4, and S > t/2 means S ≥ (1 + δ)·(t/4), so Chernoff gives
Pr[S > t/2] ≤ e^(−(t/4)·1²/3) = e^(−t/12)
(treating the per-run error as exactly ¼ only increases this tail).
What if we want to bound the error probability by some smaller α? Then we need to take t = 12·ln(1/α).
So the error probability can be reduced by taking a bigger t, meaning, enlarging the communication complexity by only a small penalty.
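The penalty can be computed explicitly, and the majority vote simulated (Python sketch; the names are ours):

```python
import math
import random

def repetitions_needed(alpha):
    """t = 12*ln(1/alpha) repetitions suffice when the per-run error is <= 1/4."""
    return math.ceil(12 * math.log(1 / alpha))

def majority_amplified(true_value, t, eps=0.25):
    """Run a two sided error protocol t times and take the majority vote."""
    correct_votes = sum(random.random() >= eps for _ in range(t))
    return true_value if correct_votes > t / 2 else 1 - true_value

print(repetitions_needed(0.01))   # 56 repetitions for error < 1%
runs = 20_000
err = sum(majority_amplified(1, repetitions_needed(0.01)) != 1 for _ in range(runs)) / runs
print(err)                        # empirically far below 0.01
```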

Field
A non-empty set F with two binary operations + (addition) and * (multiplication) is called a field if it has:
Closure of F under + and *
Associativity of + and *
Commutativity of + and *
Distributivity of * over +
Identity elements for + and * (0 and 1)
Inverses for + and * (multiplicative inverses for every non-zero element)
Examples: the rational numbers, GF[7] = {0, 1, 2, 3, 4, 5, 6} with arithmetic mod 7.

Example: EQ
Denote Alice's input A = a_0 a_1 … a_{n-1} and Bob's input B = b_0 b_1 … b_{n-1}.
Let's think of these inputs as polynomials over GF[p], where n² < p < 2n² and p is prime. That is,
A(x) = a_0 + a_1·x + a_2·x² + … + a_{n-1}·x^(n-1) mod p
B(x) = b_0 + b_1·x + b_2·x² + … + b_{n-1}·x^(n-1) mod p
Alice picks t in GF[p] uniformly at random and sends Bob both t and A(t). Bob outputs 1 if A(t) = B(t) and 0 otherwise.
#{bits exchanged} = O(log p) = O(log n)
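A sketch of this protocol in Python (we represent the inputs as lists of bits; the prime search and helper names are ours, chosen for illustration only):

```python
import random

def is_prime(q):
    """Naive primality test, good enough for illustration."""
    if q < 2:
        return False
    d = 2
    while d * d <= q:
        if q % d == 0:
            return False
        d += 1
    return True

def poly_eval(coeffs, t, p):
    """Evaluate coeffs[0] + coeffs[1]*t + ... + coeffs[n-1]*t^(n-1) mod p (Horner's rule)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * t + c) % p
    return acc

def randomized_eq(a_bits, b_bits):
    """One round of the polynomial fingerprint protocol for EQ."""
    n = len(a_bits)
    p = n * n + 1
    while not is_prime(p):        # some prime p > n^2 (Bertrand guarantees one below 2n^2)
        p += 1
    t = random.randrange(p)       # Alice's random evaluation point
    return 1 if poly_eval(a_bits, t, p) == poly_eval(b_bits, t, p) else 0

a = [1, 0, 1, 1, 0, 1, 0, 0]
print(randomized_eq(a, a))                         # always 1 when A = B
print(randomized_eq(a, [1, 0, 1, 1, 0, 1, 0, 1]))  # 0 with probability >= 1 - 1/n
```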

Correctness
Note that if A = B then A(t) = B(t) for all t, so the protocol outputs 1.
If A ≠ B then we have two distinct polynomials of degree at most n−1. Such polynomials can agree on at most n−1 (out of p) elements of the field (since their difference is a non-zero polynomial of degree ≤ n−1, and has at most n−1 roots).
Hence the probability of error is at most (n−1)/p ≤ n/n² = 1/n.
So we have shown that R(EQ) = O(log n); in fact the error is one sided, since the protocol never errs when A = B.
In contrast, D(NE) = D(EQ) = n + 1.

Exercise
Prove that the following protocol for EQ achieves similar performance:
Alice and Bob view their inputs A and B as n-bit integers (between 1 and 2^n).
Alice chooses a random prime p among the first n primes (if n = 3 then p is 2, 3 or 5).
She sends p and (A mod p) to Bob.
Bob outputs 1 if A mod p = B mod p, and 0 otherwise.

Solution
First note that if A = B, then of course A mod p = B mod p, and Bob accepts.
If A ≠ B and Bob accepts, it means that A ≡ B (mod p).
Define BAD_{A,B} = {p | p is among the first n primes and A ≡ B (mod p)}. For example, BAD_{5,8} = {3}.
Claim (without proof): |BAD_{A,B}| ≤ ½·n.
Then the probability that Bob accepts although A ≠ B is at most |BAD_{A,B}|/n ≤ ½.
We can repeat this 100 times to get error 2^(−100).
So we get a cost of O(log n) bits per repetition, since max{first n primes} < n³ and O(log n³) = O(log n).
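A sketch of this second protocol as well (Python; the helper names are ours, and the 100 repetitions mirror the remark above):

```python
import random

def first_n_primes(n):
    """Return the first n primes (naive search, fine for illustration)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % q for q in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def mod_prime_eq(a, b, n, repetitions=100):
    """EQ via random prime fingerprints: a, b are n-bit integers.
    Each repetition errs with probability <= 1/2 (by the claim |BAD| <= n/2),
    so the overall error is at most 2^(-repetitions)."""
    primes = first_n_primes(n)
    for _ in range(repetitions):
        p = random.choice(primes)   # Alice's random prime
        if a % p != b % p:
            return 0                # a and b are certainly different
    return 1                        # equal with high probability

print(mod_prime_eq(0b10110100, 0b10110100, n=8))   # 1
print(mod_prime_eq(0b10110100, 0b10110101, n=8))   # almost surely 0
```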

So how much better can randomized protocols be than deterministic ones?
Lemma: D(f) ≤ 2^{O(R(f))}, i.e., randomization can save at most an exponential amount of communication.
We will prove a more delicate statement: for every 0 < ε < ½,
D(f) ≤ 2^{R_ε(f)} · ( R_ε(f) + log(1/(½ − ε)) ) + O(1).

Proof 1:
Let's present a deterministic simulation of a given randomized protocol P.
For each leaf l of the protocol P, Alice will send Bob p(A, l) – the probability, over r(A), that given x she responds in a way leading to l.
Bob will compute p(B, l) – the probability, over r(B), that given y he responds in a way leading to l.
Bob will then compute p(l) = p(A, l) · p(B, l) – the probability of reaching l.

Proof 2:
Alice and Bob do this for every leaf, thus calculating the probability of every leaf.
Bob then checks which output value has probability at least 1 − ε over the leaves, and that is the right f(x,y).
What's the problem? We would need to send exact probability values, that is, real numbers. But we can send each probability rounded to k bits of precision, for a suitably chosen k.

Proof 3:
Rounding to k bits guarantees a deviation of at most 2^(−k) in each p(A, l).
This implies that the computed p(l) is at most 2^(−k) far from the true p(l) (since p(B, l) ≤ 1).
Choosing k = R_ε(f) + log(1/(½ − ε)), the total error over all (at most 2^{R_ε(f)}) leaves is at most ½ − ε.
Therefore Bob only needs to check which of the values (0 or 1) has probability of more than ½, and that is the correct f(x,y) value. Q.E.D.