Pseudorandom Generators for Halfspaces
Yi Wu (CMU). Joint work with Parikshit Gopalan (MSR SVC), Ryan O'Donnell (CMU), and David Zuckerman (UT Austin).



Outline
- Introduction: Pseudorandom Generators, Halfspaces, Pseudorandom Generators for Halfspaces
- Our Result
- Proof
- Conclusion

Deterministic Algorithm
Program: Input → Output. The algorithm deterministically outputs the correct result.

Randomized Algorithm
Program: Input + Random Bits → Output. The algorithm outputs the correct result with high probability.

Randomized Algorithms
- Primality testing
- ST-connectivity
- Order statistics
- Searching
- Polynomial and matrix identity verification
- Interactive proof systems
- Faster algorithms for linear programming
- Rounding linear program solutions to integers
- Minimum spanning trees, shortest paths, minimum cuts
- Counting and enumeration: matrix permanent, counting combinatorial structures

Is Randomness Necessary?
Open problem: Can we simulate every randomized polynomial-time algorithm by a deterministic polynomial-time algorithm (the "BPP = P" conjecture)?
Known derandomizations of randomized algorithms:
- Primality testing [AKS]
- ST-connectivity [Reingold]
- Quadratic residues [?]

How to Generate Randomness?
Question: How to generate randomness for every randomized algorithm?
Simpler question: How to generate "pseudorandomness" for some class of programs?

Pseudorandom Generator (PRG)
A PRG stretches k << n truly random bits into n "pseudorandom" bits. Given either n truly random bits or the n pseudorandom bits as input, the program answers Yes/No with almost the same probability.
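This fooling condition can be checked exactly for small n by brute force. The sketch below uses a toy program and a toy "PRG"; `accept`, `tile_prg`, and the other names are illustrative, not from the talk.

```python
import itertools

def accept(bits):
    # toy Yes/No program: majority of its input bits
    return sum(bits) > len(bits) / 2

def acceptance_prob_uniform(program, n):
    # exact acceptance probability over all 2^n truly random inputs
    inputs = list(itertools.product([0, 1], repeat=n))
    return sum(program(x) for x in inputs) / len(inputs)

def acceptance_prob_prg(program, prg, k):
    # exact acceptance probability when the input comes from a random k-bit seed
    seeds = list(itertools.product([0, 1], repeat=k))
    return sum(program(prg(s)) for s in seeds) / len(seeds)

def tile_prg(seed, n):
    # toy "PRG": tile the k seed bits up to length n (not a real PRG,
    # but enough to show what "fooling up to error eps" means)
    return [seed[i % len(seed)] for i in range(n)]

n, k = 8, 4
error = abs(acceptance_prob_uniform(accept, n)
            - acceptance_prob_prg(accept, lambda s: tile_prg(s, n), k))
# here error = 13/256, i.e. this toy PRG fools `accept` up to about 0.05
```

A real PRG must make this error small for *every* program in the class, not just one.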

Why Study PRGs?
- Algorithmic applications: when k = O(log n), we can enumerate all 2^k = poly(n) seeds and derandomize the algorithm in polynomial time; streaming algorithms.
- Complexity-theoretic implications: circuit-class lower bounds; learning theory.
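The seed-enumeration argument can be sketched as follows; `derandomize` and its toy arguments are hypothetical names for illustration.

```python
import itertools

def derandomize(program, prg, k):
    # run the program on the PRG's output for every one of the 2^k seeds
    # and return the majority answer; this is deterministic, and it runs
    # in polynomial time whenever k = O(log n)
    yes_votes = sum(bool(program(prg(seed)))
                    for seed in itertools.product([0, 1], repeat=k))
    return 2 * yes_votes > 2 ** k

# with the identity "PRG" (n = k = 3), this just majority-votes the program
answer = derandomize(lambda x: sum(x) >= 1, lambda s: list(s), 3)
```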

PRGs for Classes of Programs
- Space-bounded programs [Nis92]
- Constant-depth circuits [Nis91, Baz07, Bra09]
- Halfspaces [DGJSV09, MZ09]


Halfspaces
Halfspaces are Boolean functions h: R^n → {-1,1} of the form h(x) = sgn(w_1 x_1 + … + w_n x_n − θ), where w_1, …, w_n, θ ∈ R. They are well studied in complexity theory and widely used in machine learning: Perceptron, Winnow, boosting, Support Vector Machines, Lasso, linear regression.
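In code, a halfspace is just a thresholded linear form; this minimal sketch fixes the (arbitrary) convention sgn(0) = 1.

```python
import numpy as np

def halfspace(w, theta):
    """The halfspace h(x) = sgn(w_1 x_1 + ... + w_n x_n - theta) as a
    function R^n -> {-1, 1}.  Convention: sgn(0) = 1."""
    w = np.asarray(w, dtype=float)
    return lambda x: 1 if float(np.dot(w, x)) - theta >= 0 else -1

# majority on three +/-1 bits is the halfspace sgn(x1 + x2 + x3)
maj3 = halfspace([1.0, 1.0, 1.0], 0.0)
```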

Product Distributions
For a halfspace h(x), the input x is sampled from some product distribution; i.e., each x_i is sampled independently from a distribution D_i. For example, each D_i can be:
1. The uniform distribution on {-1,1}
2. The uniform distribution on [-1,1]
3. The Gaussian distribution
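Sampling from a product distribution just means drawing each coordinate independently; the helper names below are illustrative.

```python
import random

def product_sample(dists, rng=random):
    # draw each coordinate independently from its own distribution D_i
    return [d(rng) for d in dists]

# the three example coordinate distributions from the slide
rademacher = lambda rng: rng.choice([-1, 1])      # uniform on {-1, 1}
solid      = lambda rng: rng.uniform(-1.0, 1.0)   # uniform on [-1, 1]
gaussian   = lambda rng: rng.gauss(0.0, 1.0)      # standard Gaussian

x = product_sample([rademacher, solid, gaussian])
```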


PRG for Halfspaces
For h(x) = sgn(w_1 x_1 + … + w_n x_n − θ): the PRG uses k << n random bits to produce pseudorandom variables x_1, …, x_n. On these, the halfspace answers Yes/No with almost the same probability as on x_1, …, x_n drawn from the true product distribution.

Geometric Interpretation
PRG for the uniform distribution over [-1,1]^2: the PRG's support is a set of poly(dim) points such that, for every halfspace, the fraction of points falling inside the halfspace approximates the halfspace's (normalized) area.

Application to Machine Learning
How many test points are enough to estimate the accuracy of an N-dimensional linear classifier? A good PRG implies we only need to deterministically check the accuracy on a fixed set of poly(N) points!

Other Theoretical Applications
- Discrepancy sets for convex polytopes
- Circuit lower bounds for functions of halfspaces
- Counting the solutions of knapsacks


Previous Results [DiGoJaSeVi, MeZu]
PRG for halfspaces over the uniform distribution on the Boolean cube {-1,1}^n with seed length O(log n).

Our Results: Arbitrary Product Distributions
PRG for halfspaces under an arbitrary product distribution over R^n with the same seed length. The only requirement is that E[x_i^4] is bounded by a constant. This covers:
1. The Gaussian distribution
2. The uniform distribution on the solid cube
3. The uniform distribution on the hypercube
4. Biased distributions on the hypercube
5. Almost any "natural" distribution
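The fourth-moment condition is easy to check for a given coordinate distribution; `fourth_moment` below is a hypothetical Monte Carlo helper, not part of the construction.

```python
import random
import statistics

def fourth_moment(sampler, trials=100_000, rng=random):
    # Monte Carlo estimate of E[x_i^4] for one coordinate distribution;
    # the PRG only needs this moment to be bounded by a constant
    return statistics.fmean(sampler(rng) ** 4 for _ in range(trials))

# E[x^4] is exactly 1 on {-1,1}, 1/5 on [-1,1], and 3 for a standard Gaussian
m_cube  = fourth_moment(lambda rng: rng.choice([-1, 1]))
m_solid = fourth_moment(lambda rng: rng.uniform(-1.0, 1.0))
```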

Our Results: Functions of k Halfspaces
- PRG for intersections of k halfspaces with seed length k·log(n).
- PRG for arbitrary functions of k halfspaces with seed length k^2·log(n).


Key Observation: Dichotomy of Halfspaces
Under product distributions, every halfspace is close to one of the following:
- A "dictator": a halfspace depending on very few variables, e.g. f(x) = sgn(x_1).
- A "majority": no variable has too much weight, e.g. f(x) = sgn(x_1 + x_2 + x_3 + … + x_n).

Dichotomy of the Weight Distribution
Either the weights decrease fast (geometrically), or the weights are stable after a certain index.

Weights Decrease Fast (Geometrically)
Intuition: consider sgn(2^n x_1 + 2^{n-1} x_2 + 2^{n-2} x_3 + … + x_n). If each x_i is from {-1,1}, this is just sgn(x_1).
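The dictator collapse in this example can be verified exhaustively for small n; this sketch uses the sgn(0) = 1 convention.

```python
import itertools

n = 10
# geometrically decreasing weights 2^n, 2^(n-1), ..., 2: the first weight
# strictly exceeds the sum of all the others
w = [2 ** (n - i) for i in range(n)]

def sgn(t):
    return 1 if t >= 0 else -1

# over all 2^n sign vectors, the halfspace agrees with the dictator sgn(x_1)
agree = all(
    sgn(sum(wi * xi for wi, xi in zip(w, x))) == sgn(x[0])
    for x in itertools.product([-1, 1], repeat=n)
)
```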

Weights Are Stable
Intuition: consider sgn(100 x_1 + x_2 + x_3 + … + x_n). For every fixing of x_1, it is a majority on the rest of the variables.

Our PRG for Halfspaces (Rough)
1. Randomly hash all the coordinates into groups.
2. Use a 4-wise independent distribution within each group.
If the halfspace is "dictator-like": all the important variables land in different groups (with high probability).
If it is "majority-like": x_1 + x_2 + … + x_n is close to Gaussian, and a 4-wise independent distribution can (with some work) handle the Gaussian case.
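The two steps can be sketched as below. This is a minimal illustration, not the paper's actual generator: the real construction also derandomizes the hashing and handles general product distributions, and the ±1 outputs here carry a tiny O(1/p) bias that a real construction removes.

```python
import random

def four_wise_bits(n, p=2_147_483_647, rng=random):
    """Sketch of 4-wise independent +/-1 values: evaluate a random degree-3
    polynomial over GF(p) (p prime) at the points 1..n and threshold at p/2.
    The field values are exactly 4-wise independent."""
    c = [rng.randrange(p) for _ in range(4)]
    def val(i):
        return (c[0] + c[1] * i + c[2] * i * i + c[3] * i * i * i) % p
    return [1 if val(i) < p // 2 else -1 for i in range(1, n + 1)]

def hash_into_groups(n, num_groups, rng=random):
    # step 1: randomly assign each coordinate to one of the groups
    return [rng.randrange(num_groups) for _ in range(n)]

def rough_prg_sample(n, num_groups=4, rng=random):
    # step 2: draw an independent 4-wise independent string per group,
    # then read off each coordinate from its group's string
    groups = hash_into_groups(n, num_groups, rng)
    strings = [four_wise_bits(n, rng=rng) for _ in range(num_groups)]
    return [strings[groups[i]][i] for i in range(n)]

x = rough_prg_sample(16)
```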


Conclusion
We construct PRGs for halfspaces under arbitrary product distributions, and for functions of k halfspaces, with small seed length.
Future work: build PRGs for larger classes of programs, e.g., polynomial threshold functions (SVMs with a polynomial kernel).
