Topics in Algorithms 2007 Ramesh Hariharan. Random Projections.


Topics in Algorithms 2007 Ramesh Hariharan

Random Projections

Solving High Dimensional Problems
How do we make the problem smaller? Sampling, divide and conquer, what else?

Projections
Can n points in m dimensions be projected to d << m dimensions while maintaining geometry (pairwise distances)?
Johnson-Lindenstrauss: YES, for d ~ log n / ε^2; each distance stretches/contracts by only a (1 ± O(ε)) factor.
So an algorithm with running time f(n,m) now takes f(n, log n / ε^2), and the results don't change very much (hopefully!)
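A minimal numerical sketch of the statement (assuming numpy is available; the projection here scales by 1/sqrt(d), which plays the same role as the sqrt(d/m) factor derived later, since the rows are not normalized to unit length):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, eps = 500, 10_000, 0.2
d = int(4 * np.log(n) / eps**2)               # d ~ log n / eps^2 (constant chosen for illustration)

points = rng.standard_normal((n, m))          # n arbitrary points in m dimensions
R = rng.standard_normal((d, m)) / np.sqrt(d)  # Gaussian projection, scaled to preserve lengths in expectation
projected = points @ R.T                      # each point mapped to d dimensions

# Compare pairwise distances before and after projection on a sample of pairs.
i, j = rng.integers(0, n, size=(2, 2000))
mask = i != j
orig = np.linalg.norm(points[i[mask]] - points[j[mask]], axis=1)
proj = np.linalg.norm(projected[i[mask]] - projected[j[mask]], axis=1)
ratio = proj / orig
print(ratio.min(), ratio.max())               # should lie within roughly [1 - O(eps), 1 + O(eps)]
```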

Which d dimensions?
Any d coordinates? Random d coordinates? A random d-dimensional subspace?

Random Subspaces
How is this defined/chosen computationally?
How do we choose a random (spherically symmetric) line (a 1-d subspace)?

Random Subspaces
We need to choose m coordinates. Candidate choices:
Independent uniform distributions on the Cartesian coordinates
- Not defined: infinite ranges
- Not spherically symmetric: axis segments of the same length map to surface patches with different surface areas
Independent uniform distributions on the polar coordinates
- Works in 2d
- Does not work in higher dimensions!! Why? For a 3d sphere, half (not one-third) of the surface area lies within 30 degrees of the equator, as the sketch below checks.
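A quick Monte Carlo check of that claim (a sketch assuming numpy; "within 30 degrees" is measured from the equator, i.e., polar angle between 60 and 120 degrees):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Sampler 1: polar angle theta uniform in [0, pi] (the "uniform polar coordinates" idea).
theta_uniform = rng.uniform(0.0, np.pi, n)

# Sampler 2: truly uniform surface area on the sphere
# (z = cos(theta) uniform in [-1, 1], Archimedes' hat-box theorem).
theta_area = np.arccos(rng.uniform(-1.0, 1.0, n))

# Fraction of points within 30 degrees of the equator (polar angle in [60, 120] degrees).
band = lambda t: np.mean(np.abs(t - np.pi / 2) <= np.pi / 6)
print(band(theta_uniform))  # ~ 1/3: what uniform polar angles would suggest
print(band(theta_area))     # ~ 1/2: the actual surface-area fraction
```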

Random Subspaces
What works? The normal distribution on Cartesian coordinates:
Choose independent random variables X_1, …, X_m, each N(0,1).
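In code, a spherically symmetric direction is just a normalized vector of independent standard normals (a sketch assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 1000
x = rng.standard_normal(m)            # X_1, ..., X_m, each N(0,1)
direction = x / np.linalg.norm(x)     # a uniformly random unit vector (random 1-d subspace) in R^m

# Sanity check of spherical symmetry in 2d: angles fall evenly into 8 sectors.
xy = rng.standard_normal((100_000, 2))
angles = np.arctan2(xy[:, 1], xy[:, 0])
print(np.histogram(angles, bins=8, range=(-np.pi, np.pi))[0])  # roughly equal counts
```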

Why do Normals work?
Take 2d: which points on the circle are more likely?
e^{-x^2/2} dx · e^{-y^2/2} dy = e^{-(x^2+y^2)/2} dx dy
The density depends only on x^2 + y^2, i.e., on the radius, so the distribution is spherically symmetric…

A random d-dim subspace
How do we extend to d dimensions? Choose d random vectors:
Choose independent random variables X^i_1, …, X^i_m, each N(0,1), for i = 1..d.
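As a sketch (assuming numpy), the d random vectors can be stacked into a d x m matrix whose rows play the role of the X^i's; projecting a point is then a matrix-vector product:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 1000, 20
R = rng.standard_normal((d, m))   # row i holds X^i_1, ..., X^i_m, each N(0,1)

p = rng.standard_normal(m)        # some point in R^m
image = R @ p                     # its image in the random d-dimensional subspace
                                  # (the normalization by the row lengths is discussed below)
```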

Distance preservation
There are C(n,2) distances. What happens to each after projection?
What happens to one distance after projection? Consider a single unit vector along the x axis.
Length of its projection: sqrt[(X^1_1/l_1)^2 + … + (X^d_1/l_d)^2], where X^i_1 is the first coordinate of the i-th random vector and l_i is that vector's length.

Orthogonality
Not exactly: the random vectors aren't orthogonal.
How far away from orthogonal are they? In high dimensions, they are nearly orthogonal!!

Assume Orthogonality
Assume orthogonality for the moment.
How do we determine bounds on the distribution of the projection length sqrt[(X^1_1/l_1)^2 + … + (X^d_1/l_d)^2]?
The expected value of (X^1_1)^2 + … + (X^d_1)^2 is d (by linearity of expectation).
The expected value of each l_i^2 is m (by linearity of expectation).
Roughly speaking, the overall expectation is sqrt(d/m).
So a distance scales by sqrt(d/m) after projection in the "expected" sense; how much does it depart from this value?
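A numerical check of the sqrt(d/m) scaling (a sketch assuming numpy; each row is explicitly divided by its length l_i, as in the formula above):

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, trials = 2000, 50, 200
lengths = []
for _ in range(trials):
    R = rng.standard_normal((d, m))                        # d random Gaussian vectors
    rows = R / np.linalg.norm(R, axis=1, keepdims=True)    # divide row i by its length l_i
    e1 = np.zeros(m); e1[0] = 1.0                          # unit vector along the x axis
    lengths.append(np.linalg.norm(rows @ e1))              # sqrt[(X^1_1/l_1)^2 + ... + (X^d_1/l_d)^2]
print(np.mean(lengths), np.sqrt(d / m))                    # both should be close to sqrt(d/m)
```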

What does Expectation give us?
But E(A/B) != E(A)/E(B).
And even if it were, the distribution need not be tight around the expectation.
How do we determine tightness of a distribution? Tail bounds for sums of independent random variables; summing gives concentration (unity gives strength!!)

Tail Bounds
P(|Σ_{i=1..k} X_i^2 − k| > εk) < 2e^{−ε^2 k/4}
Recall the projected length sqrt[(X^1_1/l_1)^2 + … + (X^d_1/l_d)^2]:
Each l_i^2 is within (1 ± ε)m except with probability inverse exponential in ε^2 m.
(X^1_1)^2 + … + (X^d_1)^2 is within (1 ± ε)d except with probability inverse exponential in ε^2 d.
So sqrt[(X^1_1/l_1)^2 + … + (X^d_1/l_d)^2] is within (1 ± O(ε)) sqrt(d/m) except with probability inverse exponential in ε^2 d (by the union bound).
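An empirical look at the first tail bound (a sketch assuming numpy; the exact constant in the exponent is not important here):

```python
import numpy as np

rng = np.random.default_rng(0)
k, eps, trials = 200, 0.2, 100_000
s = (rng.standard_normal((trials, k)) ** 2).sum(axis=1)   # independent sums of k squared N(0,1)'s
empirical = np.mean(np.abs(s - k) > eps * k)               # observed failure probability
bound = 2 * np.exp(-eps**2 * k / 4)                        # the stated tail bound
print(empirical, bound)                                    # the empirical rate should fall below the bound
```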

One distance to many distances
So one distance D has length D (1 ± O(ε)) sqrt(d/m) after projection, except with probability inverse exponential in ε^2 d.
How about many distances (could some of them go astray)? There are C(n,2) distances.
Each has probability of failure (stretching/compressing beyond (1 ± O(ε)) sqrt(d/m)) inverse exponential in ε^2 d.
What is the probability of no failure? Choose d so that e^{−ε^2 d/4} · C(n,2) << 1 (union bound again), so d is Θ(log n / ε^2).
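A small calculator for the target dimension implied by this union bound (a sketch; the failure probability delta and the constant 4 are illustrative choices, not part of the slide):

```python
import math

def jl_dim(n: int, eps: float, delta: float = 0.01) -> int:
    """Smallest d with C(n,2) * 2 * exp(-eps^2 * d / 4) <= delta."""
    pairs = n * (n - 1) // 2
    return math.ceil(4.0 / eps**2 * math.log(2 * pairs / delta))

print(jl_dim(n=10_000, eps=0.1))   # d = Theta(log n / eps^2), independent of the original dimension m
```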

Orthogonality
What is the distribution of angles between two spherically symmetric vectors?
Tight around 90 degrees!!
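A quick simulation of this concentration (a sketch assuming numpy): in m = 1000 dimensions the angle between two independent spherically symmetric vectors stays within a few degrees of 90.

```python
import numpy as np

rng = np.random.default_rng(0)
m, trials = 1000, 2000
u = rng.standard_normal((trials, m))
v = rng.standard_normal((trials, m))
cos = np.sum(u * v, axis=1) / (np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1))
angles = np.degrees(np.arccos(cos))
print(angles.mean(), angles.std())   # mean ~ 90 degrees, std ~ 1/sqrt(m) radians (about 1.8 degrees)
```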

Orthogonality
The distribution of one vector X is proportional to e^{−(X^T X)/2} dX.
It is invariant under rotation: a rotation is multiplication by a square matrix R with R^T R = I.
Take Y = RX, so X = R^T Y; the distribution of Y is e^{−((R^T Y)^T (R^T Y))/2} dY = e^{−(Y^T Y)/2} dY.
So take two independent spherically symmetric vectors X and Y, and rotate so that X lies along a coordinate axis: the rotated Y (i.e., RY) is still spherically symmetric, comprising m independent N(0,1)'s.
The portion of RY orthogonal to RX has m−1 independent N(0,1)'s, so Y is spherically symmetric in the (m−1)-dimensional space orthogonal to X.

Orthogonality
Start with v = (1, 0, 0, 0, …).
Its projection on the first random vector has length X/L_1, where X is N(0,1) and L_1 is the square root of a sum of m squared N(0,1)'s.
Project v and all the other random vectors into the space orthogonal to the first vector.
Again rotate within this orthogonal space so that v lies along an axis; its length is now sqrt[1 − (X/L_1)^2].
All the other random vectors are still spherically symmetric in this orthogonal space.
So the projection of v on the second vector has length (X′/L′) · sqrt[1 − (X/L_1)^2], and so on…
The overall dampening is (sqrt[1 − (X/L_1)^2])^d = (1 − O(log n)/m)^{d/2} whp, which is negligible when d log n << m; see the check below.
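A check that treating the random vectors as orthogonal is indeed harmless in high dimension (a sketch assuming numpy; np.linalg.qr is used only to produce a true orthonormal basis of the same subspace for comparison):

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 5000, 100
R = rng.standard_normal((d, m))
rows = R / np.linalg.norm(R, axis=1, keepdims=True)   # d random unit vectors, only nearly orthogonal
Q, _ = np.linalg.qr(R.T)                              # orthonormal basis of the same d-dim subspace
e1 = np.zeros(m); e1[0] = 1.0                         # the vector v = (1, 0, 0, ...)
print(np.linalg.norm(rows @ e1))   # length computed as if the vectors were orthogonal
print(np.linalg.norm(Q.T @ e1))    # true length of the projection onto the subspace
# The two agree up to a factor very close to 1, matching the (1 - O(log n)/m)^{d/2} dampening estimate.
```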

Tail Bound Proof
Show P(|Σ_{i=1..k} X_i^2 − k| > εk) < 2e^{−ε^2 k/4}.
Two parts:
P(Σ X_i^2 < (1−ε)k) < e^{−ε^2 k/4}
P(Σ X_i^2 > (1+ε)k) < e^{−ε^2 k/4}

Tail Bound Proof
P(Σ_{i=1..k} X_i^2 < (1−ε)k) < e^{−ε^2 k/4}:
P(Σ X_i^2 < (1−ε)k) = P(e^{−tΣ X_i^2} > e^{−t(1−ε)k})    [for any t > 0]
  ≤ E(e^{−tΣ X_i^2}) / e^{−t(1−ε)k}    [Markov's inequality]
  = E(e^{−tX^2})^k · e^{t(1−ε)k}    [independence]
  = (2t+1)^{−k/2} · e^{t(1−ε)k}    [why?]
Minimize over t and complete the proof.
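One way to finish the minimization for this lower tail (a sketch of the calculation, not from the slides; the upper tail is left to the exercise below):

```latex
\frac{d}{dt}\Big[-\tfrac{k}{2}\ln(1+2t) + t(1-\varepsilon)k\Big]
  = -\frac{k}{1+2t} + (1-\varepsilon)k = 0
  \;\Rightarrow\; t = \frac{\varepsilon}{2(1-\varepsilon)}, \qquad 1+2t = \frac{1}{1-\varepsilon}.

(1+2t)^{-k/2}\, e^{t(1-\varepsilon)k}
  = (1-\varepsilon)^{k/2}\, e^{\varepsilon k/2}
  = \big[(1-\varepsilon)e^{\varepsilon}\big]^{k/2}
  \le e^{-\varepsilon^2 k/4},
\quad \text{since } \ln(1-\varepsilon) \le -\varepsilon - \varepsilon^2/2 \text{ for } 0<\varepsilon<1.
```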

Exercises
Complete the proof of the tail bound.
Take a random walk in which you move left with prob p and right with prob 1−p; what can you say about your distance from the start after n steps?