Topics in learning from high dimensional data and large scale machine learning Ata Kaban School of Computer Science University of Birmingham

High dimensional data

Problem 1 We have seen that the workings of machine learning algorithms depend in one way or another on the geometry of the data – lengths of vectors, distances, angles, shapes. High dimensional geometry is very different from low dimensional geometry. It defeats our intuitions: we can draw in 2D and we can imagine things in 3D, but what happens in larger d?

Problem 2 Most machine learning methods take computation time that increases quickly (sometimes exponentially) with the dimensionality of the data. These and a suite of other issues caused by high dimensionality are usually referred to as “the curse of dimensionality”. Fortunately, high dimensionality also has its blessings!

Concentration of norms Generate points in d = 2 dimensions, with coordinates drawn independently at random from a distribution with mean 0 and variance 1/d. Create a histogram of the norms (lengths) of these points. Repeat at larger dimensions d. What happens?
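Below is a minimal sketch of this simulation, assuming NumPy and Matplotlib; the Gaussian choice for the coordinate distribution, the sample size and the set of dimensions are illustrative assumptions, not prescribed by the slide.

```python
# Sketch: histogram of norms of random points whose coordinates are i.i.d.
# with mean 0 and variance 1/d, repeated for increasing dimension d.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_points = 5000

for d in (2, 10, 100, 1000):
    # n_points vectors in R^d; each coordinate has mean 0 and variance 1/d
    X = rng.normal(loc=0.0, scale=np.sqrt(1.0 / d), size=(n_points, d))
    norms = np.linalg.norm(X, axis=1)
    plt.hist(norms, bins=50, density=True, alpha=0.5, label=f"d = {d}")

plt.xlabel("norm (length)")
plt.ylabel("density")
plt.legend()
plt.show()
```

The histograms become ever narrower and pile up around 1 as d grows.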

Near-orthogonality Now, generate pairs of vectors in the same way and look at the distribution of their dot products. Recall that the dot product is 0 if and only if the vectors are orthogonal. What happens as you increase d?
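A sketch of this second experiment, in the same illustrative setting as the previous one (Gaussian coordinates, arbitrary sample size):

```python
# Sketch: histogram of dot products of independent random vector pairs,
# coordinates i.i.d. with mean 0 and variance 1/d, for increasing d.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n_pairs = 5000

for d in (2, 10, 100, 1000):
    U = rng.normal(0.0, np.sqrt(1.0 / d), size=(n_pairs, d))
    V = rng.normal(0.0, np.sqrt(1.0 / d), size=(n_pairs, d))
    dots = np.sum(U * V, axis=1)   # dot product of each pair
    plt.hist(dots, bins=50, density=True, alpha=0.5, label=f"d = {d}")

plt.xlabel("dot product")
plt.ylabel("density")
plt.legend()
plt.show()
```

The dot products cluster ever more tightly around 0, i.e. the pairs become nearly orthogonal.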

What is happening as d → ∞? We can see from the simulation plots that:
– As d increases, any two of our random vectors end up being nearly orthogonal to each other.
– As d increases, each of our random vectors ends up having about the same length.
Can we explain why these things are happening? Yes we can, but we need some math tools for that… [on separate slides]
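A rough sketch of the key calculation (not the full argument from the separate slides), assuming the coordinates x_i, y_i are i.i.d. with mean 0 and variance 1/d:

```latex
\mathbb{E}\!\left[\lVert x\rVert^2\right]
  = \sum_{i=1}^{d} \mathbb{E}\!\left[x_i^2\right]
  = d \cdot \tfrac{1}{d} = 1,
\qquad
\mathbb{E}\!\left[x \cdot y\right]
  = \sum_{i=1}^{d} \mathbb{E}[x_i]\,\mathbb{E}[y_i] = 0 .
```

Moreover, for e.g. Gaussian coordinates the variances of ‖x‖² and of x·y both shrink like 1/d, so these quantities concentrate around 1 and 0 respectively as d grows.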

Consequences for machine learning When data has little structure (e.g. many attributes that are independent of each other – like in the data we generated in the earlier slides), then the ‘nearest neighbour’ is at about the same distance as the furthest one! When the data does have structure (e.g. nicely separated classes, or data that lives on a lower dimensional subspace), then we can use a small collection of random vectors onto which to project our high dimensional data without losing much of the structure! – Cheap dimensionality reduction by Random Projections
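A small illustration of the first point, again on unstructured i.i.d. data (the sample sizes, seed and single-query setup are illustrative assumptions):

```python
# Sketch: the nearest and furthest neighbour of a query point become almost
# equally far away as d grows, when the data has no structure (i.i.d. coordinates).
import numpy as np

rng = np.random.default_rng(2)
n_points = 1000

for d in (2, 10, 100, 1000):
    X = rng.normal(0.0, np.sqrt(1.0 / d), size=(n_points, d))
    query = rng.normal(0.0, np.sqrt(1.0 / d), size=d)
    dists = np.linalg.norm(X - query, axis=1)
    print(f"d={d:5d}  nearest={dists.min():.3f}  "
          f"furthest={dists.max():.3f}  ratio={dists.max() / dists.min():.2f}")
```

The ratio of furthest to nearest distance approaches 1, which is what makes ‘nearest neighbour’ lose its meaning in this setting.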

Random Projections This result can be used for large scale machine learning. You just generate a k×d matrix, where k << d, with entries drawn i.i.d. at random, e.g. from a standard normal distribution, and pre-multiply your data points with it to get k-dimensional data that has much the same structure as the original.
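A minimal sketch of such a random projection, assuming NumPy; the 1/√k scaling is a common convention (an assumption here, not stated on the slide) that keeps squared distances roughly unbiased, and all sizes and names are illustrative:

```python
# Sketch: project n points from d dimensions down to k << d dimensions
# with a random Gaussian matrix, and check that pairwise distances survive.
import numpy as np

rng = np.random.default_rng(3)
n, d, k = 500, 10_000, 100                    # k << d

X = rng.normal(size=(n, d))                   # original high dimensional data
R = rng.normal(size=(k, d)) / np.sqrt(k)      # k x d random projection matrix
Y = X @ R.T                                   # projected data, shape (n, k)

# Compare a few pairwise distances before and after projection
idx = rng.choice(n, size=(2, 5), replace=False)
orig = np.linalg.norm(X[idx[0]] - X[idx[1]], axis=1)
proj = np.linalg.norm(Y[idx[0]] - Y[idx[1]], axis=1)
print(np.round(proj / orig, 3))               # ratios roughly 1 (within ~1/sqrt(k))
```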

Summary Curse of dimensionality for data that has little structure:
– Nearly equal lengths
– Near orthogonality
– Nearest neighbour becomes meaningless (as well as other methods that rely on distances)
Blessing of dimensionality for data that has structure:
– Random Projections can be used as a cheap dimensionality reduction technique that has surprisingly strong guarantees of preserving the data geometry.

Related readings
R. J. Durrant and A. Kaban. When is 'Nearest Neighbor' Meaningful: A Converse Theorem and Implications. Journal of Complexity, Volume 25, Issue 4, August 2009.
Our tutorial at ECML'12, “Random Projections for Machine Learning and Data Mining: Theory and Applications” – with many references therein!