Hebrew University of Jerusalem,

Presentation transcript:

Influence and Noise. Gil Kalai, Hebrew University of Jerusalem, Yale University, and Microsoft R&D, Israel. ICS2011, Beijing, January 2011

Part I: Influence

Cause When does event A cause event B? Example (Kira Radinsky’s automatic system for making deductions based on internet searches): “Earthquake causes tsunami”; “Cabbage grew causes Linux release”.

Causality When does event A cause event B? A central problem in philosophy, law, economics, physics, statistics, CS… (Examples are often somber.) Two people try to assassinate a third person who plans a trip to the desert: one puts poison in his jar, the other empties it. A person throws a baby from a tall building; another is waiting with a sharp sword.

Influence The word “influence” (dating back, according to the Merriam-Webster dictionary, to the 14th century) is close to the word “fluid”. The original definition of influence is: “an ethereal fluid held to flow from the stars and to affect the actions of humans.” The modern meaning (according to Wiktionary) is: “The power to affect, control or manipulate something or someone.”

Influence What is the influence an event A has on another event B? Can be regarded as an approach to causality and also as a generalization.

Influence (Of a variable on a function of many variables) The “amount” that changing the value of a variable will change the value of the function.

Boolean Functions We consider a BOOLEAN FUNCTION $f:\{-1,1\}^n \to \{-1,1\}$, written $f(x_1,x_2,\dots,x_n)$. It is convenient to regard $\{-1,1\}^n$ as a probability space with the uniform probability distribution.

Influence We consider a BOOLEAN FUNCTION $f:\{-1,1\}^n \to \{-1,1\}$. The influence of the $k$th variable $x_k$ on $f$, denoted by $I_k(f)$, is the probability that flipping the value of the $k$th variable will flip the value of $f$. $I(f)$ is the sum of all individual influences.

Examples 1) Dictatorship: $f(x_1,x_2,\dots,x_n)=x_1$. Here $I_1(f)=1$ and $I_k(f)=0$ for $k>1$. 2) Majority: $f(x_1,x_2,\dots,x_n)=1$ iff $x_1+x_2+\dots+x_n>0$. Here $I_k(f)$ behaves like $n^{-1/2}$ for every $k$.
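
The two examples above can be checked by brute force for small $n$. The following is a minimal sketch, not part of the original slides; the function names (`influence`, `dictator`, `majority`) and the choice $n=5$ are illustrative assumptions.

```python
from itertools import product


def influence(f, n, k):
    """Fraction of inputs x in {-1,1}^n for which flipping coordinate k flips f(x)."""
    flips = 0
    for x in product((-1, 1), repeat=n):
        y = list(x)
        y[k] = -y[k]  # flip the k-th variable only
        if f(x) != f(tuple(y)):
            flips += 1
    return flips / 2 ** n


def dictator(x):
    # f(x) = x_1 (index 0 in Python)
    return x[0]


def majority(x):
    # f(x) = 1 iff x_1 + ... + x_n > 0 (n odd, so the sum is never 0)
    return 1 if sum(x) > 0 else -1


n = 5  # illustrative size; the enumeration costs 2^n evaluations per variable
dict_infl = [influence(dictator, n, k) for k in range(n)]
maj_infl = [influence(majority, n, k) for k in range(n)]
total_maj = sum(maj_infl)  # total influence I(f) of majority
```

For majority on 5 variables each influence is the probability that the other four votes split 2-2, which is 6/16.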

Critical Percolation

Examples (cont.) 3) The crossing event for percolation: every hexagon corresponds to a variable; $x_i=-1$ if the hexagon is white and $x_i=1$ if it is grey. $f=1$ if there is a left-to-right grey crossing. $I_k(f)$ behaves like $n^{-3/8}$ for every $k$ but a few.

Examples (cont.) 4) Recursive majority of threes: $I_k(f)$ behaves like $n^{-\log_3 2}$ for every $k$ (with $n=3^h$ variables, each variable has influence $2^{-h}$). 5) Ben-Or Linial TRIBE example: divide the variables into tribes of size $\log n-\log\log n+\log\log 2$; $f=1$ iff for some tribe all variables are equal to 1. $I_k(f)$ behaves like $K\log n/n$ for some constant $K$.

KKL theorem: There always exists an influential variable. Theorem (Kahn, Kalai, Linial 1988): Let $f$ be a Boolean function and suppose that $\mathrm{Prob}(f=1)=s$. Then there is a variable $k$ such that $I_k(f) > C\,s(1-s)\log n/n$, where $C$ is an absolute constant. This result was a conjecture by Ben-Or and Linial.

KKL theorem

Fourier Given a Boolean function $f:\{-1,1\}^n \to \{-1,1\}$, the Fourier expansion of $f$ is simply writing $f(x)$ as a sum of multilinear (square-free) monomials. Write formulas on whiteboard
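
The whiteboard formulas are not part of the transcript. A plausible reconstruction, using the standard notation $\hat f(S)$ for the coefficient of the monomial $\chi_S$ (this notation is an assumption, not taken from the slides), is:

```latex
f(x) = \sum_{S \subseteq \{1,\dots,n\}} \hat f(S)\,\chi_S(x),
\qquad
\chi_S(x) = \prod_{i \in S} x_i,
\qquad
\hat f(S) = \mathbb{E}_x\bigl[f(x)\,\chi_S(x)\bigr].

% Worked example: majority of three variables
\mathrm{Maj}_3(x_1,x_2,x_3)
  = \tfrac12 x_1 + \tfrac12 x_2 + \tfrac12 x_3 - \tfrac12\, x_1 x_2 x_3 .
```

The expectation is over the uniform distribution on $\{-1,1\}^n$; the example can be checked by plugging in the eight sign patterns.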

Fourier Spectrum For every set $S$ of variables we have the associated Fourier coefficient. The sum of squares of the Fourier coefficients is 1. This defines a probability distribution called the “Fourier spectrum” (or “Fourier distribution”). The probability that $k$ belongs to $S$, when $S$ is distributed according to the Fourier spectrum, is the influence of variable $k$ on $f$. Write formulas on whiteboard
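
The claims on this slide can be verified numerically on a small function. The sketch below is an illustration only: `fourier_coefficients` and `maj3` are hypothetical helper names, and the coefficient of a set $S$ is computed as the average of $f(x)\prod_{i\in S}x_i$ over all inputs, which is the standard definition rather than something stated in the transcript.

```python
from itertools import combinations, product
from math import prod


def fourier_coefficients(f, n):
    """Map each subset S (as a tuple of indices) to E[f(x) * prod_{i in S} x_i]."""
    points = list(product((-1, 1), repeat=n))
    return {
        S: sum(f(x) * prod(x[i] for i in S) for x in points) / len(points)
        for size in range(n + 1)
        for S in combinations(range(n), size)
    }


def maj3(x):
    # Majority of three +-1 votes
    return 1 if sum(x) > 0 else -1


coeffs = fourier_coefficients(maj3, 3)

# Squared coefficients form the "Fourier spectrum", a probability distribution on subsets.
spectrum = {S: c * c for S, c in coeffs.items()}

# Probability that variable k belongs to a random S drawn from the spectrum.
spectral_influence = [
    sum(w for S, w in spectrum.items() if k in S) for k in range(3)
]
```

For majority of three, each variable gets spectral weight 1/4 from its singleton and 1/4 from the full set, giving 1/2, which is also the probability that the other two votes disagree.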

Fourier The study of Boolean functions based on their Fourier expansion is fruitful. It can be regarded as a very special case of spectral methods in graph theory. Write formulas on whiteboard

Hypercontractivity A very useful technical tool: the ratio between $p$-norms of low-degree polynomials is bounded. (Khintchine, Nelson, Bonami, Gross, Beckner…) Write formulas on whiteboard
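
The whiteboard formulas are not in the transcript. One standard way to state the tool, assuming the usual notation in which $\hat f(S)$ is the Fourier coefficient of the monomial $\chi_S(x)=\prod_{i\in S}x_i$ and norms are taken with respect to the uniform distribution on $\{-1,1\}^n$, is the Bonami-Beckner inequality and its consequence for low-degree polynomials:

```latex
% Noise operator
T_\rho f = \sum_{S} \rho^{|S|}\,\hat f(S)\,\chi_S, \qquad 0 \le \rho \le 1.

% Hypercontractive inequality (Bonami, Beckner)
\|T_\rho f\|_q \le \|f\|_p
\quad\text{whenever } 1 < p \le q \text{ and } \rho \le \sqrt{\tfrac{p-1}{q-1}} .

% Consequence: norms of a polynomial of degree at most d are comparable
\|f\|_q \le (q-1)^{d/2}\,\|f\|_2 \qquad (q \ge 2).
```

The last line is the precise sense in which "the ratio between $p$-norms of low-degree polynomials is bounded": the bound depends on the degree $d$ and on $q$, but not on the number of variables.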

A glance at further advances and open problems Extensions to general Bernoulli product spaces. Notions of influence for continuous product spaces; larger alphabets; graph products. Symmetry and influence. Understanding influence of sets of variables; power laws for influence. Write formulas on whiteboard

Part II: Choice, power, rationality and manipulation

Individual and collective rationality and judgments Boolean functions can model how individual preferences between two alternatives aggregate. We can consider aggregation of individual preferences between m alternatives. (Social welfare functions.) We can consider aggregation of judgments on r different binary questions, when there are certain consistency requirements. (Judgment aggregation.)

Elections: measures of power Influence = Banzhaf power index. Shapley-Shubik power index: the integral over $p$ of the influence of the $k$th player with respect to the Bernoulli probability measure with parameter $p$. (The Shapley-Shubik indices sum to 1.) Write formulas on whiteboard
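
The two indices can be compared on a tiny weighted voting game. This is a sketch under stated assumptions: the game (weights 2, 1, 1 with quota 3) and all function names are hypothetical. The integral over $p$ is evaluated in closed form using $\int_0^1 p^{j}(1-p)^{n-1-j}\,dp = j!\,(n-1-j)!/n!$, and the result is cross-checked against the classical "pivotal player in a random ordering" description of the Shapley-Shubik index.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import factorial

WEIGHTS = (2, 1, 1)  # hypothetical voter weights
QUOTA = 3            # hypothetical quota: a coalition wins with total weight >= QUOTA


def wins(coalition):
    return sum(WEIGHTS[i] for i in coalition) >= QUOTA


def pivotal_sets(k):
    """Coalitions T of the other players that lose alone but win once k joins."""
    others = [i for i in range(len(WEIGHTS)) if i != k]
    for size in range(len(others) + 1):
        for T in combinations(others, size):
            if wins(T + (k,)) and not wins(T):
                yield T


def banzhaf(k):
    """Influence of k under the uniform distribution (p = 1/2)."""
    n = len(WEIGHTS)
    return Fraction(sum(1 for _ in pivotal_sets(k)), 2 ** (n - 1))


def shapley_shubik(k):
    """Integral over p of the p-biased influence of k, in closed form."""
    n = len(WEIGHTS)
    return sum(
        Fraction(factorial(len(T)) * factorial(n - 1 - len(T)), factorial(n))
        for T in pivotal_sets(k)
    )


def shapley_by_orderings(k):
    """Fraction of orderings of the players in which k turns a losing set into a winning one."""
    n = len(WEIGHTS)
    count = 0
    for order in permutations(range(n)):
        before = order[: order.index(k)]
        if wins(before + (k,)) and not wins(before):
            count += 1
    return Fraction(count, factorial(n))
```

In this game the Banzhaf indices are 3/4, 1/4, 1/4 (they need not sum to 1), while the Shapley-Shubik indices are 2/3, 1/6, 1/6.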

Voting Paradoxes Condorcet’s paradox; Arrow’s paradox; the “doctrinal paradox”. Write formulas on whiteboard
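
The slide only names the paradoxes. As an illustration of Condorcet's paradox (pairwise majority can be cyclic even though every voter is rational), the sketch below enumerates all profiles of three voters over three alternatives and counts the cyclic ones. The alternative labels and function names are illustrative assumptions.

```python
from itertools import permutations, product

ALTERNATIVES = "abc"
RANKINGS = list(permutations(ALTERNATIVES))  # the 6 strict rankings, best first


def majority_prefers(profile, x, y):
    """True if a strict majority of voters rank x above y."""
    votes = sum(1 for ranking in profile if ranking.index(x) < ranking.index(y))
    return votes > len(profile) / 2


def is_cyclic(profile):
    a, b, c = ALTERNATIVES
    forward = (
        majority_prefers(profile, a, b)
        and majority_prefers(profile, b, c)
        and majority_prefers(profile, c, a)
    )
    backward = (
        majority_prefers(profile, b, a)
        and majority_prefers(profile, c, b)
        and majority_prefers(profile, a, c)
    )
    return forward or backward


profiles = list(product(RANKINGS, repeat=3))  # all 6^3 profiles of three voters
cyclic = [p for p in profiles if is_cyclic(p)]
cycle_probability = len(cyclic) / len(profiles)
```

A typical cyclic profile is (a, b, c), (b, c, a), (c, a, b): a beats b, b beats c, and c beats a, each by two votes to one.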

Manipulation A social choice function is a function from the profile of individual order relations to the set of alternatives. Manipulation: a situation where a voter, by reporting an incorrect preference relation, improves the outcome from his or her own point of view.

A measure for manipulation The Gibbard-Satterthwaite theorem asserts that for a non-dictatorial social choice function with at least 3 possible outcomes there are preferences that lead to manipulation. The manipulation power (Friedgut, Kalai and Nisan) of an individual $k$ for a social choice function $f$, denoted by $M_k(f)$, is the probability that $x'_k$ is a profitable manipulation for voter $k$ when the profile of preferences $x_1, x_2, \dots, x_n$ and $x'_k$ are chosen uniformly at random.

Conclusion of the Algorithmic GT/Econ Part The notions discussed in this lecture (measures for influence, power, manipulation, noise sensitivity…) may be of interest to other GT/econ models. For example, the model of exchange economy. Off-topic comment: Why is it rational and important to give incentives to difficult technical works. Added in proof: There is a work supporting this thought by Kleinberg and Oren. And the applied/pure tension and debate.

Part III: Randomness

Collective coin flipping We need to create a random bit using a protocol based on random bits contributed by n processors. Some of the processors are malicious. A simple suggestion: Choice based on a Boolean function where each processor contributes a single bit. (An often asked question: why not choose among these bits at random?)

Randomness as a computational resource “So, yes, I know that the theory folks consider derandomization an open problem, but from my perspective, it is a solved problem for all practical purposes.” AnonCSProf on Shtetl-Optimized

What is Randomness? What randomness is and what probability is are fundamental questions in many areas. Computational complexity offers a deep understanding of randomness, but its asymptotic nature makes it of little use in statistics. Q: What allows much simpler answers in statistics? A: The interactive proof system known as “statistical hypothesis testing”.

What is the source of randomness? Is human uncertainty the only source of randomness? What is the explanation of the apparent randomness of high-level phenomena in nature? For example, the distribution of females vs. males in a population (I am referring to randomness in terms of unpredictability, not in the sense of necessarily being evenly distributed). 1. Is it accepted that these phenomena are not really random, meaning that given enough information one could predict them? If so, isn't that the case for all random phenomena? 2. If there is true randomness and the outcome cannot be predicted, what is the origin of that randomness? (Is it a result of the randomness in the micro world: quantum phenomena etc.?)

Sharp threshold phenomena: Determinism from randomness Law of large numbers: Large stochastic systems behave deterministically. Sharp threshold phenomena: Choose the value of each variable to be 1 with probability $p$, independently. As $p$ increases, the probability that $f=1$ will rapidly change from nearly 0 to nearly 1.
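
The sharpening of the threshold can be seen exactly for the majority function, where the probability of outcome 1 is a binomial tail. The following is a minimal sketch; the function names, the cut-off `eps = 0.1`, the grid resolution, and the sizes 11, 101, 401 are all illustrative assumptions.

```python
from math import comb


def prob_majority_one(n, p):
    """P(more than half of n independent bits equal 1), each bit being 1 with probability p (n odd)."""
    return sum(
        comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(n // 2 + 1, n + 1)
    )


def threshold_width(n, eps=0.1, steps=400):
    """Length of the range of p (on a grid) where the outcome probability lies in [eps, 1 - eps]."""
    grid = [i / steps for i in range(steps + 1)]
    inside = [p for p in grid if eps <= prob_majority_one(n, p) <= 1 - eps]
    return max(inside) - min(inside)


# The threshold interval around p = 1/2 shrinks as the number of voters grows.
widths = {n: threshold_width(n) for n in (11, 101, 401)}
```

For majority the interval shrinks roughly like $n^{-1/2}$, in line with the law of large numbers mentioned on the slide.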

Sharp threshold phenomena: Influence is a sort of derivative. Large total influence corresponds to a small threshold interval. Theorem (Friedgut, Kalai, 96): Symmetry implies sharp threshold. Theorem (Kalai, 04): Sharp threshold is equivalent to diminishing maximum Shapley-Shubik power index. The economics term: complete aggregation of information.

Influence without independence (Haggstrom, Kalai, Mossel; Graham, Grimmett.) Influence version A: The probability that changing the value of a variable will change the value of the function. Influence version B: The (normalized) correlation between the value of the variable and the value of the function. Write formulas on whiteboard

Part IV: Noise

Noise Sensitivity We consider a BOOLEAN FUNCTION $f:\{-1,1\}^n \to \{-1,1\}$, written $f(x_1,x_2,\dots,x_n)$. Given $x_1,x_2,\dots,x_n$ we define $y_1,y_2,\dots,y_n$ as follows, independently for each $i$: $y_i = x_i$ with probability $1-t$, and $y_i = -x_i$ with probability $t$.

Noise Sensitivity Let $C(f;t)$ be the correlation between $f(x_1,x_2,\dots,x_n)$ and $f(y_1,y_2,\dots,y_n)$. A sequence of Boolean functions $(f_n)$ is noise-sensitive if for every $t>0$, $C(f_n;t)$ tends to zero with $n$.
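
For small $n$ the correlation $C(f;t)$ can be computed exactly by enumerating every input and every noise pattern. The sketch below is illustrative only (function names and the choices $n=7$, $t=0.1$ are assumptions). It computes $\mathbb{E}[f(x)f(y)]$, which coincides with the correlation for balanced functions such as the three examples used here.

```python
from itertools import product
from math import prod


def noise_correlation(f, n, t):
    """E[f(x) f(y)] for uniform x, where y flips each coordinate of x independently with probability t."""
    total = 0.0
    for x in product((-1, 1), repeat=n):
        for flips in product((1, -1), repeat=n):  # -1 marks a flipped coordinate
            d = flips.count(-1)
            y = tuple(a * b for a, b in zip(x, flips))
            total += f(x) * f(y) * t ** d * (1 - t) ** (n - d)
    return total / 2 ** n


def parity(x):
    return prod(x)


def dictator(x):
    return x[0]


def majority(x):
    return 1 if sum(x) > 0 else -1


n, t = 7, 0.1
c_parity = noise_correlation(parity, n, t)      # equals (1 - 2t)^n, decays with n
c_majority = noise_correlation(majority, n, t)
c_dictator = noise_correlation(dictator, n, t)  # equals 1 - 2t for every n
```

Parity is the extreme noise-sensitive example (its correlation $(1-2t)^n$ tends to zero for every fixed $t>0$), while dictatorship keeps correlation $1-2t$ regardless of $n$.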

Noise Stability Let us fix a small $s$ (say $s=0.0001$). A Boolean function is noise stable to noise level $t$ if the probability that $f(x_1,x_2,\dots,x_n)$ is different from $f(y_1,y_2,\dots,y_n)$ is smaller than $s$.

Noise sensitivity, and non-classical stochastic processes; black noise Closely related notions to “noise sensitivity” were studied by Tsirelson and Vershik. In their terminology “noise sensitivity” translates to “non Fock processes”, “black noise”, and “non-classical stochastic processes”. Their motivation is closer to mathematical quantum physics.

BKS theorem Theorem (Benjamini, Kalai, Schramm 1999): Monotone balanced Boolean functions are noise sensitive unless they have substantial correlation with some weighted majority functions.

Percolation is Noise sensitive Corollary [BKS, 1999]: The crossing event for the critical planar percolation model is noise-sensitive. Theorem (Schramm and Smirnov, 2010): Percolation is a 2-dimensional black noise.

Percolation is Noise sensitive Imagine two separate pictures of n by n hexagonal models for percolation. A hexagon is grey with probability ½. If the grey and white hexagons are independent in the two pictures the probability for crossing in both is ¼. If for each hexagon the correlation between its colors in the two pictures is 0.99, still the probability for crossing in both pictures is very close to ¼ as n grows! If you put one drawing on top of the other you will hardly notice a difference!

Other cases of noise sensitivity First Passage Percolation (Benjamini, Kalai, Schramm). The recursive majority of threes example by Ben-Or and Linial (BKS). Eigenvalues of random Gaussian matrices (essentially follows from the work of Tracy-Widom); here, we leave the Boolean setting. Examples related to random walks (these required replacing the discrete cube by trees), and more... The eigenvalue examples were further studied by Ofer Zeitouni and me with Itai Benjamini, Gadi Kozma and Elchanan Mossel.

Majority is noise stable Sheppard's Theorem (1899): Suppose that there is a probability $t$ for a mistake in counting each vote. The probability that the outcome of the election is reversed is $\arccos(1-2t)/\pi + o(1)$ (the correlation between a true vote and its counted value is $1-2t$). When $t$ is small this behaves like $t^{1/2}$.
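
The limiting formula can be compared with an exact finite computation. The sketch below makes its assumptions explicit: each vote is miscounted (flipped) independently with probability $t$, so a true vote and its counted value have correlation $1-2t$, and the limiting value used here is $\arccos(1-2t)/\pi$. The function names and the choices $n=101$, $t=0.1$ are illustrative.

```python
from math import acos, comb, pi


def binom_pmf(m, j, q):
    """P(Binomial(m, q) = j)."""
    return comb(m, j) * q ** j * (1 - q) ** (m - j)


def reversal_probability(n, t):
    """Exact P(majority of true votes != majority of counted votes), n odd."""
    half = n // 2
    prob = 0.0
    for a in range(half + 1, n + 1):       # a = number of true +1 votes (true winner is +1)
        pa = comb(n, a) / 2 ** n
        for u in range(a + 1):             # u = +1 votes miscounted as -1
            pu = binom_pmf(a, u, t)
            for v in range(n - a + 1):     # v = -1 votes miscounted as +1
                if a - u + v <= half:      # counted winner is -1
                    prob += pa * pu * binom_pmf(n - a, v, t)
    return 2 * prob                        # by symmetry, double the "+1 reversed to -1" case


n, t = 101, 0.1
exact = reversal_probability(n, t)
limit = acos(1 - 2 * t) / pi               # about 0.205 for t = 0.1
```

For a single voter the reversal probability is just $t$; it grows with the number of voters toward the arccos limit, which for small $t$ is approximately $(2/\pi)\sqrt{t}$.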

Majority is noise stable (cont.) Weighted majority functions are also noise stable (BKS, Peres) Is there a more stable voting rule? Sure! dictatorship

Majority is stablest! Theorem (Mossel, O’Donnell and Oleszkiewicz, 2005): Let $(f_n)$ be a sequence of balanced Boolean functions with diminishing maximum influence, i.e., $\lim_{n\to\infty}\max_k I_k(f_n) = 0$. Then the probability that the outcome of the election is reversed, when every vote is flipped with probability $t$, is at least $(1-o(1))\arccos(1-2t)/\pi$.

Majority is stablest: Two applications 1) The probabilities of cyclic outcomes for voting rules with diminishing influences are minimized for the majority voting rule. 2) Improving the Goemans-Williamson 0.878567 approximation algorithm for MAX-CUT is hard (unique-games-hard).

Is the universe noise sensitive? Are the basic models of high energy physics noise stable? If this is indeed the case, does it reflect some law of physics? Otherwise, will noise sensitivity allow additional modeling power?

Part V: Computational complexity Are the notions discussed here related to computational complexity? (We already mentioned the relation to PCP; there are some interesting connections to randomized decision trees.) But there is more to computation than computational complexity. (And there are also various notions of complexity which are not based on computation.)

Diversion: CC and modeling But, there is more to computation than computational complexity. (And also there are various notions of complexity which are not based on computation.)

Computational complexity and Influence (… Ajtai, Furst, Saxe, Sipser, Yao, Hastad, Boppana, Linial, Mansour, Nisan…) The total influence of Boolean functions that can be described by depth $D$, size $M$ Boolean circuits is at most $(\log M)^{D-1}$.

Computational complexity and Noise stability Conjecture 1: Let $f$ be a monotone Boolean function described by monotone threshold circuits of size $M$ and depth $D$. Then $f$ is stable to $(1/t)$-noise where $t=(\log M)^{100D}$. Conjecture 2: For some $\eta>0$, every balanced monotone function in TC0 has correlation at least $\eta$ with a function in monotone TC0.

Complexity of sampling the Fourier spectrum Suppose that f is a Boolean function in P. Can we approximately sample according to its Fourier spectrum? This is unknown and it might be hard. But... it is in BQP. (Namely, it is known to be easy for quantum computers.)

Computation with noise Fault tolerant (quantum) computation. Are quantum computers possible? (This is a main research interest for me in recent years.) The hope regarding FTQC: No matter what the quantum computer computes or simulates, nearly all of the noise will be a mixture of states that are not codewords in the error correcting code, but which are correctable to states in the code. The concern: The process for creating a quantum error correcting code will necessarily lead to a mixture of the desired codeword with undesired codewords.

Polymath3 A recent endeavor: you are most welcome to join. Details can be found on my blog: http://gilkalai.wordpress.com/2010/02/10/noise-stability-and-threshold-circuits/ 1) The AC0 analog 2) Positivity vs. monotonicity 3) Natural proof obstructions

谢谢 Thank you! 谢谢 תודה רבה! It is great to be in Beijing!