1
CS 231A Section 1: Linear Algebra & Probability Review
Jonathan Krause 9/28/2012
2
Topics
- Support Vector Machines
- Boosting
  - Viola-Jones face detector
- Linear Algebra Review
  - Notation
  - Operations & Properties
  - Matrix Calculus
- Probability
  - Axioms
  - Basic Properties
  - Bayes Theorem, Chain Rule
3
Which hyperplane is best?
Linear classifiers: find a linear function (hyperplane), defined by parameters w and b, that separates the positive and negative examples. Many hyperplanes do this, so which one is best?
4
Support vector machines
Find the hyperplane that maximizes the margin between the positive and negative examples. The examples that lie closest to the hyperplane are the support vectors, and their distance from the hyperplane defines the margin.
5
Support Vector Machines (SVM)
We wish to perform binary classification, i.e. find a linear classifier. We are given data x_1, ..., x_n in R^d with labels y_1, ..., y_n, where each y_i is in {-1, +1}. When the data is linearly separable we can find our linear classifier by solving the optimization problem
min_{w,b} (1/2) ||w||^2   subject to   y_i (w^T x_i + b) >= 1 for all i.
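For small, linearly separable problems this primal is just a quadratic program. Below is a minimal MATLAB sketch using quadprog from the Optimization Toolbox; the data matrix X (n-by-d) and label vector y (n-by-1, entries in {-1,+1}) are assumed inputs, and this is an illustration rather than how you would solve SVMs in practice.
% Variables are z = [w; b]. Minimize (1/2)||w||^2 subject to y_i*(w'*x_i + b) >= 1.
[n, d] = size(X);
H = blkdiag(eye(d), 0);                  % quadratic term: only w is penalized
f = zeros(d + 1, 1);
A = -[bsxfun(@times, y, X), y];          % -y_i*[x_i', 1]*z <= -1  <=>  y_i*(w'*x_i + b) >= 1
b = -ones(n, 1);
z = quadprog(H, f, A, b);
w = z(1:d); b0 = z(d+1);                 % separating hyperplane: predict sign(w'*x + b0)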
6
Nonlinear SVMs Datasets that are linearly separable work out great:
But what if the dataset is just too hard? We can map it to a higher-dimensional space, e.g. mapping each 1-D point x to the 2-D point (x, x^2). This is one way to deal with non-separable problems. Slide credit: Andrew Moore
7
Nonlinear SVMs General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable, via a lifting transformation Φ: x → φ(x). Slide credit: Andrew Moore
8
SVM – l1 regularization What if data is not linearly separable?
We can use regularization to handle this: we solve a new optimization problem with slack variables ξ_i,
min_{w,b,ξ} (1/2) ||w||^2 + C Σ_i ξ_i   subject to   y_i (w^T x_i + b) >= 1 - ξ_i,  ξ_i >= 0,
and "tune" the regularization parameter C. This is another way to deal with non-separable problems.
9
Solving the SVM There are many different packages for solving SVMs
In PS0 we have you use the liblinear package. This is an efficient implementation, but it can only use a linear kernel. If you wish to have more flexibility in your choice of kernel, you can use the LIBSVM package. There are also other tricks, e.g. for large-scale problems.
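As a sketch of how liblinear's MATLAB interface is typically called (the option string, variable names, and exact output handling here are assumptions; check the package's README for authoritative usage):
% Assumes liblinear's MATLAB wrappers (train/predict) are compiled and on the path.
% Xtrain: n-by-d feature matrix, ytrain: n-by-1 labels in {-1,+1}.
model = train(ytrain, sparse(Xtrain), '-c 1');          % '-c' sets the regularization parameter C
[pred, stats, ~] = predict(ytest, sparse(Xtest), model); % predicted labels and accuracy statistics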
10
Topics
- Support Vector Machines
- Boosting
  - Viola-Jones face detector
- Linear Algebra Review
  - Notation
  - Operations & Properties
  - Matrix Calculus
- Probability
  - Axioms
  - Basic Properties
  - Bayes Theorem, Chain Rule
11
Boosting
Boosting is a sequential procedure that builds a complex classifier out of simpler ones by combining them additively. Each data point x_t has a class label y_t ∈ {+1, -1} and a weight, initially w_t = 1.
Y. Freund and R. Schapire, A short introduction to boosting, Journal of Japanese Society for Artificial Intelligence, 14(5), September 1999.
12
Toy example
Weak learners come from the family of lines. Each data point has a class label y_t ∈ {+1, -1} and a weight, initially w_t = 1. A classifier h with p(error) = 0.5 is at chance.
13
Toy example
Each data point has a class label y_t ∈ {+1, -1} and a weight, initially w_t = 1. This line seems to be the best. It is a 'weak classifier': it performs slightly better than chance.
14
Toy example (repeated over several rounds)
Each data point has a class label y_t ∈ {+1, -1}. After each round we update the weights: w_t ← w_t · exp(-y_t H_t(x_t)), so points the current classifier gets wrong receive more weight in the next round.
18
Toy example
The strong (non-linear) classifier is built as the combination of all the weak (linear) classifiers f1, f2, f3, f4.
19
Boosting
Defines a classifier using an additive model:
H(x) = α_1 h_1(x) + α_2 h_2(x) + α_3 h_3(x) + ...
where H(x) is the strong classifier, each h_t(x) is a weak classifier, α_t is its weight, and x is the feature vector.
20
Boosting
Defines a classifier using an additive model:
H(x) = α_1 h_1(x) + α_2 h_2(x) + α_3 h_3(x) + ...
We need to define a family of weak classifiers from which the h_t(x) are chosen.
21
Why boosting?
- A simple algorithm for learning robust classifiers (Freund & Schapire, 1995; Friedman, Hastie & Tibshirani, 1998)
- Provides an efficient algorithm for sparse visual feature selection (Tieu & Viola, 2000; Viola & Jones, 2003)
- Easy to implement; doesn't require external optimization tools.
22
Boosting - mathematics
Weak learners: h_t(x) = +1 if f_t(x) > θ_t and -1 otherwise, where f_t(x) is the value of a rectangle feature and θ_t is a threshold.
Final strong classifier: H(x) = sign(Σ_t α_t h_t(x)).
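The generic AdaBoost loop behind this is short. Below is a minimal MATLAB sketch with single-threshold weak learners over a precomputed feature matrix; the function name, the inputs F, y, T, and the brute-force stump search are illustrative assumptions, not the Viola-Jones training code (save as adaboost_sketch.m to run).
% F: n-by-d matrix of feature values, y: n-by-1 labels in {-1,+1}, T: number of rounds.
function [stumps, alphas] = adaboost_sketch(F, y, T)
  [n, d] = size(F);
  w = ones(n, 1) / n;                            % example weights
  stumps = zeros(T, 3);                          % each row: [feature index, threshold, polarity]
  alphas = zeros(T, 1);
  for t = 1:T
    best_err = inf;
    for j = 1:d                                  % try every feature, threshold, and polarity
      for thr = F(:, j)'
        for p = [1 -1]
          h = p * sign(F(:, j) - thr); h(h == 0) = p;
          err = sum(w(h ~= y));                  % weighted training error
          if err < best_err
            best_err = err; best = [j, thr, p];
          end
        end
      end
    end
    alpha = 0.5 * log((1 - best_err) / max(best_err, eps));
    j = best(1); thr = best(2); p = best(3);
    h = p * sign(F(:, j) - thr); h(h == 0) = p;
    w = w .* exp(-alpha * y .* h);               % up-weight misclassified examples
    w = w / sum(w);                              % renormalize
    stumps(t, :) = best; alphas(t) = alpha;
  end
end
% Prediction on a new feature vector: H(x) = sign(sum_t alphas(t) * h_t(x)).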
23
Weak classifier
Four kinds of rectangle filters. Value = Σ(pixels in white area) - Σ(pixels in black area).
For real problems, results are only as good as the features used; this is the main piece of ad-hoc (or domain) knowledge. Rather than the raw pixels, we select from a very large set of simple functions that are sensitive to edges and other critical features of the image, at multiple scales. Since the final classifier is a perceptron, it is important that the features be non-linear; otherwise the final classifier will be a simple perceptron. We introduce a threshold to yield binary features. Slide credit: S. Lazebnik
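For efficiency, the Viola-Jones paper evaluates these rectangle sums in constant time with an integral image. A minimal MATLAB sketch follows; the file name and rectangle coordinates are made-up examples, and the image is assumed to be grayscale.
img = double(imread('face.png'));              % assumed grayscale test image
s = cumsum(cumsum(img, 1), 2);                 % running 2D sums
ii = zeros(size(img) + 1);                     % pad with a zero row and column
ii(2:end, 2:end) = s;                          % ii(r+1,c+1) = sum of img(1:r,1:c)
% Sum of pixels in the rectangle with top-left (r1,c1) and bottom-right (r2,c2):
boxsum = @(r1, c1, r2, c2) ii(r2+1, c2+1) - ii(r1, c2+1) - ii(r2+1, c1) + ii(r1, c1);
% A two-rectangle (edge-like) feature: white box on top, black box below.
value = boxsum(10, 10, 17, 25) - boxsum(18, 10, 25, 25);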
24
Weak classifier (figure: source image and rectangle-filter response). Slide credit: S. Lazebnik
25
Viola & Jones algorithm
1. Evaluate each rectangle filter on each example; thresholding the filter value gives a weak classifier.
P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001.
26
Viola & Jones algorithm
For a 24x24 detection region, the number of possible rectangle features is very large (on the order of 160,000), far more than the number of pixels.
P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001.
27
Viola & Jones algorithm
2. Select the best filter/threshold combination:
- Normalize the weights.
- For each feature j, evaluate its weighted error with respect to the current weights.
- Choose the classifier h_t with the lowest error.
3. Reweight the examples.
P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001.
28
Viola & Jones algorithm
4. The final strong classifier is a weighted linear combination of the T weak hypotheses, where each classifier's weight grows as its training error shrinks (in the paper, α_t = log((1 - ε_t) / ε_t)).
P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001.
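As a worked example of that weighting (illustrative numbers, natural logarithms): a weak classifier with weighted error ε_t = 0.1 receives weight log(0.9/0.1) ≈ 2.2, while one with ε_t = 0.4 receives only log(0.6/0.4) ≈ 0.4.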
29
Boosting for face detection
For each round of boosting:
- Evaluate each rectangle filter on each example
- Select the best filter/threshold combination
- Reweight the examples
30
The implemented system
Training data:
- 5000 faces, all frontal, rescaled to 24x24 pixels
- 300 million non-face windows sampled from 9500 non-face images
- Faces are normalized for scale and translation
- Many variations: across individuals, illumination, pose
This situation with negative examples is actually quite common: negative examples are essentially free.
P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001.
31
System performance Training time: “weeks” on 466 MHz Sun workstation
- 38 layers, 6061 features in total
- Average of 10 features evaluated per window on the test set
- "On a 700 MHz Pentium III processor, the face detector can process a 384 by 288 pixel image in about .067 seconds" (roughly 15 Hz)
- 15 times faster than the previous detector of comparable accuracy (Rowley et al., 1998)
- (And 2001 is forever ago.)
P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001.
32
Output of Face Detector on Test Images
P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001.
33
Topics
- Support Vector Machines
- Boosting
  - Viola-Jones face detector
- Linear Algebra Review
  - Notation
  - Operations & Properties
  - Matrix Calculus
- Probability
  - Axioms
  - Basic Properties
  - Bayes Theorem, Chain Rule
You probably know most of this; it is just a review.
34
Linear Algebra in Computer Vision
Representation:
- 3D points in the scene
- 2D points in the image (images are matrices)
Transformations:
- Mapping 2D to 2D
- Mapping 3D to 2D
This also explains why so many people use Matlab.
35
Notation
We write A ∈ R^(m x n) for a real-valued matrix with m rows and n columns. We write x ∈ R^(n x 1) for a column vector and x ∈ R^(1 x n) for a row vector. For vectors, the 'by 1' is implicit, so we usually just write x ∈ R^n.
36
Notation
To indicate the element in the i-th row and j-th column of a matrix A we write a_ij. Similarly, to indicate the i-th entry of a vector x we write x_i.
37
Norms
Intuitively, the norm of a vector is a measure of its "length". The l2 norm is defined as ||x||_2 = sqrt(Σ_i x_i^2). In this class we will use the l2 norm unless otherwise noted, so we drop the 2 subscript for convenience. Note that ||x||^2 = x^T x. There are other norms, e.g. the l1 norm ||x||_1 = Σ_i |x_i|. Formally, norms have to satisfy some properties (positive scalability, triangle inequality, p(v) = 0 implies v = 0), but that's not important now.
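In MATLAB these norms can be checked with the built-in norm function; the vector here is just an example.
x = [3; 4];
norm(x)          % l2 norm: sqrt(3^2 + 4^2) = 5
norm(x, 1)       % l1 norm: |3| + |4| = 7
norm(x, Inf)     % l-infinity norm: max(|3|, |4|) = 4
x' * x           % equals norm(x)^2 = 25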
38
Linear Independence and Rank
A set of vectors is linearly independent if no vector in the set can be represented as a linear combination of the remaining vectors in the set. The rank of a matrix is the maximal number of linearly independent columns (or rows) of the matrix. A^T A is called the Gram matrix.
39
Range and Nullspace
The range of a matrix A ∈ R^(m x n) is the span of its columns: R(A) = {v ∈ R^m : v = Ax for some x ∈ R^n}. The nullspace of A is the set of vectors that, when multiplied by the matrix, result in 0: N(A) = {x ∈ R^n : Ax = 0}.
40
Eigenvalues and Eigenvectors
Given a square matrix A ∈ R^(n x n), λ and x (with x ≠ 0) are said to be an eigenvalue and the corresponding eigenvector of the matrix if Ax = λx. We can solve for the eigenvalues by finding the roots of the characteristic polynomial det(A - λI).
41
Eigenvalue Properties
- The rank of a (diagonalizable) matrix is equal to the number of its non-zero eigenvalues.
- The eigenvalues of a diagonal matrix D = diag(d_1, ..., d_n) are simply the diagonal entries d_1, ..., d_n.
- A matrix A is said to be diagonalizable if we can write A = X Λ X^(-1), where Λ is a diagonal matrix whose elements are the eigenvalues and the columns of X contain the corresponding eigenvectors.
42
Eigenvalues & Eigenvectors of Symmetric Matrices
Eigenvalues of symmetric matrices are real. Eigenvectors of symmetric matrices can be chosen to be orthonormal. Consider the optimization problem max_x x^T A x subject to ||x||_2 = 1, where A is symmetric: the maximizing x is the eigenvector corresponding to the largest eigenvalue of A.
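A quick MATLAB sanity check of the last point on a random symmetric matrix (illustrative only, not from the slide):
A = randn(5); A = (A + A') / 2;          % random symmetric matrix
[V, D] = eig(A);
[lam_max, i] = max(diag(D));
x = V(:, i);                             % unit eigenvector for the largest eigenvalue
x' * A * x                               % equals lam_max, the maximum of x'*A*x over ||x|| = 1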
43
Generalized Eigenvalues
Generalized eigenvalue problem: Ax = λBx. Generalized eigenvalues must satisfy det(A - λB) = 0. This reduces to the ordinary eigenvalue problem B^(-1)Ax = λx when B^(-1) exists. Generalized eigenvalues are used in Fisherfaces.
44
Singular Value Decomposition (SVD)
The SVD of a matrix A ∈ R^(m x n) is given by A = U Σ V^T, where the columns u_i of U are called the left singular vectors, Σ is a diagonal matrix whose values σ_1 ≥ σ_2 ≥ ... ≥ 0 are called the singular values, and the columns v_i of V are called the right singular vectors.
45
SVD
If the matrix A has rank r, then A has r non-zero singular values. The left singular vectors u_1, ..., u_r are an orthonormal basis for the range of A. The non-zero singular values of A are the square roots of the non-zero eigenvalues of A^T A (or A A^T).
46
Matlab
[V,D] = eig(A): the eigenvectors of A are the columns of V; D is a diagonal matrix whose entries are the eigenvalues of A.
[V,D] = eig(A,B): the generalized eigenvectors are the columns of V; D is a diagonal matrix whose entries are the generalized eigenvalues.
[U,S,V] = svd(X): the columns of U are the left singular vectors of X; S is a diagonal matrix whose entries are the singular values of X; the columns of V are the right singular vectors of X. Recall X = U*S*V'.
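A small script verifying these relationships on a random matrix (illustrative only):
X = randn(4, 3);
[U, S, V] = svd(X);
norm(X - U*S*V')                       % ~0: the factorization reconstructs X
sort(sqrt(eig(X'*X)), 'descend')       % matches the singular values diag(S)
[Q, D] = eig(X'*X);
norm(Q'*Q - eye(3))                    % ~0: eigenvectors of a symmetric matrix are orthonormal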
47
Matrix Calculus -- Gradient
Let f : R^(m x n) → R; then the gradient of f with respect to A is the m x n matrix of partial derivatives, (∇_A f(A))_ij = ∂f/∂a_ij. The gradient is always the same size as A; thus if we just have a vector x ∈ R^n, the gradient is simply the n-vector with entries ∂f/∂x_i. The gradient is really just a way to express many partial-derivative equations at once, and it is hugely useful in ML and optimization.
48
Gradients
Gradients are built from partial derivatives. Some common gradients: ∇_x (a^T x) = a, ∇_x (x^T A x) = (A + A^T) x (which is 2Ax when A is symmetric), and ∇_x ||x||_2^2 = 2x.
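These identities are easy to spot-check numerically. The MATLAB sketch below compares the closed-form gradient of x^T A x with central finite differences; A and x are arbitrary examples.
A = randn(3); x = randn(3, 1);
f = @(x) x' * A * x;                        % scalar-valued function of x
g_exact = (A + A') * x;                     % closed-form gradient
g_fd = zeros(3, 1);
h = 1e-6;
for i = 1:3
  e = zeros(3, 1); e(i) = h;
  g_fd(i) = (f(x + e) - f(x - e)) / (2*h);  % central difference approximation
end
norm(g_exact - g_fd)                        % should be tiny (roundoff level)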
49
Topics
- Support Vector Machines
- Boosting
  - Viola-Jones face detector
- Linear Algebra Review
  - Notation
  - Operations & Properties
  - Matrix Calculus
- Probability
  - Axioms
  - Basic Properties
  - Bayes Theorem, Chain Rule
50
Probability in Computer Vision
Foundation for algorithms that solve:
- Tracking problems
- Human activity recognition
- Object recognition
- Segmentation
51
Probability Axioms
Sample space Ω: the set of all the outcomes of a random experiment.
Event space F: a set whose elements A ∈ F (called events) are subsets of Ω.
Probability measure: a function P : F → R that satisfies P(A) ≥ 0 for all A ∈ F, P(Ω) = 1, and P(∪_i A_i) = Σ_i P(A_i) for disjoint events A_1, A_2, .... Rigorously this is part of measure theory (the event space is a sigma-algebra). Note that a probability measure operates on sets of outcomes (events), not on individual outcomes.
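As a concrete illustration (not from the original slide): for a fair six-sided die, the sample space is Ω = {1, 2, 3, 4, 5, 6}, the event "the roll is even" is the subset A = {2, 4, 6}, and the probability measure assigns P(A) = 3/6 = 1/2; the axioms are easy to verify for this case.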
52
Basic Properties
All derivable from the axioms, for example: P(∅) = 0; if A ⊆ B then P(A) ≤ P(B); P(A^c) = 1 - P(A); P(A ∪ B) = P(A) + P(B) - P(A ∩ B) ≤ P(A) + P(B).
53
Conditional Probability
The conditional probability of A given B is defined as P(A | B) = P(A ∩ B) / P(B) (for P(B) > 0); conditional probability is a definition! Two events are independent if P(A ∩ B) = P(A)P(B); there are other equivalent ways to express independence, e.g. P(A | B) = P(A). Conditional independence: A and B are conditionally independent given C if P(A ∩ B | C) = P(A | C) P(B | C).
54
Product Rule
From the definition of conditional probability we can write P(A ∩ B) = P(A | B) P(B) = P(B | A) P(A). From the product rule we can derive the chain rule of probability: P(A_1 ∩ ... ∩ A_n) = P(A_1) P(A_2 | A_1) P(A_3 | A_1, A_2) ... P(A_n | A_1, ..., A_(n-1)).
55
Bayes Theorem
P(A | B) = P(B | A) P(A) / P(B), where P(B | A) is the likelihood, P(A) is the prior probability, P(A | B) is the posterior probability, and P(B) is the normalizing constant. Trivial to prove: it is just one step beyond the definition of conditional probability, and it is the foundation of Bayesian statistics.
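As an illustrative (hypothetical) calculation in the detection setting: suppose 1% of windows contain a face, so P(face) = 0.01, and a detector fires with P(+ | face) = 0.95 and P(+ | no face) = 0.05. Then P(face | +) = (0.95)(0.01) / ((0.95)(0.01) + (0.05)(0.99)) = 0.0095 / 0.059 ≈ 0.16, so even a good detector yields a modest posterior when the prior is small.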