1
Machine Learning Week 4 Lecture 1
2
Hand-In: The data is coming online later today. I keep a test set with approx. 1000 test images; that will be your real test. You are most welcome to add regularization as we discussed last week, but it is not a requirement. Hand-in Version 4 is available.
3
Recap: what is going on, and ways to fix it.
4
Overfitting: as the amount of data increases, overfitting decreases; as noise increases, overfitting increases; as target complexity increases, overfitting increases.
5
Learning Theory Perspective: out-of-sample error ≤ in-sample error + model complexity. Instead of picking a simpler hypothesis set, prefer "simpler" hypotheses h from H. Define what "simple" means in a complexity measure Ω(h), and minimize E_in(h) + λ Ω(h).
6
Regularization: in-sample error + model complexity. Weight decay: minimize E_aug(w) = E_in(w) + λ w^T w. Gradient descent then updates w ← (1 − 2ηλ) w − η ∇E_in(w): every round we take a step towards the zero vector.
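As a concrete illustration of the decay effect, here is a minimal NumPy sketch of one gradient step on the augmented error, assuming a squared-error E_in; the function names and synthetic data are illustrative, not part of the hand-in.

import numpy as np

def weight_decay_step(w, X, y, lr=0.1, lam=0.01):
    """One gradient-descent step on E_aug(w) = E_in(w) + lam * w.T w,
    with squared-error E_in."""
    grad_in = 2 * X.T @ (X @ w - y) / len(y)   # gradient of the in-sample error
    # The lam * w.T w penalty adds 2*lam*w to the gradient, so every round
    # the update shrinks w towards the zero vector: w <- (1 - 2*lr*lam) w - lr * grad_in
    return w - lr * (grad_in + 2 * lam * w)

# Tiny usage example on random data
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
w = np.zeros(3)
for _ in range(200):
    w = weight_decay_step(w, X, y)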
7
Why are small weights better? Practical perspective: because in practice we believe that noise is noisy. Stochastic noise is high frequency; deterministic noise is also non-smooth. Sometimes the weights are weighted differently, and the bias term gets a free ride.
8
Regularization Summary: more art than science. Use VC theory and bias-variance as guides. Weight decay is a universal technique, based on the practical belief that noise is noisy (non-smooth). Question: which λ to use? Many other regularizers exist. Extremely important; the book calls it a "necessary evil".
9
Validation. Regularization estimates the model complexity penalty; validation estimates the out-of-sample error directly. Remember the test set.
10
Model Selection: t models m_1, …, m_t. Which is better? Train each on D_train, validate on D_val, and compute E_val(m_1), E_val(m_2), …, E_val(m_t). Pick the model with the minimum validation error. Use this to find λ for my weight decay.
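A minimal sketch of this selection loop, assuming weight decay on a linear model with a closed-form (ridge-style) fit; the helper names and synthetic data are illustrative.

import numpy as np

def fit_weight_decay(X, y, lam):
    """Closed-form minimizer of ||Xw - y||^2 + lam * ||w||^2."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def val_error(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.3 * rng.normal(size=100)

# Split once into D_train and D_val
X_tr, y_tr = X[:70], y[:70]
X_val, y_val = X[70:], y[70:]

# Each candidate lambda defines one model m_i; pick the minimum E_val
lambdas = [0.0, 0.01, 0.1, 1.0, 10.0]
errors = [val_error(fit_weight_decay(X_tr, y_tr, lam), X_val, y_val) for lam in lambdas]
best_lam = lambdas[int(np.argmin(errors))]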
11
Cross Validation. Dilemma with the validation set size K: as K increases, the E_val estimate tightens, but E_val itself increases because less data is left for training. Small K vs. large K: we would like to have both. Solution: cross validation.
12
K-Fold Cross Validation: split the data into N/K parts of size K; train on all but one part and test on the remaining part; pick the model that is best on average over the N/K partitions. Usual choice: K = N/10, i.e. 10 folds (we do not have all day).
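A sketch of the procedure in NumPy, following the slide's convention that each held-out part has size K (so there are N/K folds); the ridge-style fit and the synthetic data are illustrative assumptions.

import numpy as np

def fit_weight_decay(X, y, lam):
    """Ridge-style fit: minimize ||Xw - y||^2 + lam * ||w||^2."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cross_val_error(X, y, lam, K, seed=2):
    """Average squared validation error over N/K folds, each held-out fold of size K."""
    N = len(y)
    idx = np.random.default_rng(seed).permutation(N)
    errors = []
    for start in range(0, N, K):
        val = idx[start:start + K]                    # held-out part of size K
        tr = np.concatenate([idx[:start], idx[start + K:]])
        w = fit_weight_decay(X[tr], y[tr], lam)       # train on all but one part
        errors.append(np.mean((X[val] @ w - y[val]) ** 2))
    return float(np.mean(errors))

# Usage: pick the lambda that is best on average, with fold size K = N/10
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.3 * rng.normal(size=100)
K = len(y) // 10
lambdas = [0.0, 0.01, 0.1, 1.0, 10.0]
best_lam = min(lambdas, key=lambda lam: cross_val_error(X, y, lam, K))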
13
Today: Support Vector Machines. Margins intuition, the optimization problem, convex optimization, Lagrange multipliers, Lagrange for SVM. WARNING: linear algebra and functional analysis coming up.
14
Support Vector Machines: today and next time.
15
Notation: the target y is in {-1, +1}. We write the parameters as w and b. The hyperplane we consider is w^T x + b = 0. Data D = {(x_i, y_i)}. For now assume D is linearly separable.
16
Hyperplanes again: if w^T x + b ≥ 0 return +1, else return -1. For w, b to classify x_i correctly we need y_i (w^T x_i + b) > 0.
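A minimal sketch of this decision rule; the function and array names are illustrative.

import numpy as np

def predict(w, b, X):
    """Classify each row of X with the hyperplane w^T x + b = 0:
    return +1 where w^T x + b >= 0, else -1."""
    return np.where(X @ w + b >= 0, 1, -1)

def correctly_classified(w, b, X, y):
    """x_i is classified correctly exactly when y_i * (w^T x_i + b) > 0."""
    return y * (X @ w + b) > 0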
17
Functional Margins. Which prediction are you more certain about? Intuition: find w, b such that for all x_i, |w^T x_i + b| is large and x_i is classified correctly.
18
Functional Margins (useful later). For each point we define the functional margin γ̂_i = y_i (w^T x_i + b). Define the functional margin of the hyperplane, i.e. of the parameters w, b, as γ̂ = min_i γ̂_i. It is negative if w, b misclassifies a point.
19
Geometric Margin. Idea: maximize the geometric margin. Let's get to work.
20
Learning Theory Perspective: there are far fewer large-margin hyperplanes.
21
Geometric Margin. How far is x_i from the hyperplane w^T x + b = 0? Let L be the point where the perpendicular segment from x_i meets the hyperplane; how long is the segment from x_i to L? By the definition of L, L = x_i − γ_i w/||w||. Since L is on the hyperplane, w^T L + b = 0. Multiply in: w^T x_i − γ_i ||w|| + b = 0. Solve: γ_i = (w^T x_i + b)/||w||.
22
Geometric Margin. If x_i is on the other side of the hyperplane we get an identical calculation. In general we get γ_i = y_i (w^T x_i + b)/||w||.
23
Geometric Margin, another view: the distance to the hyperplane is the length of the projection of x_i onto w, where w is taken as an orthonormal basis direction (normalized so ||w|| = 1), plus the shift given by b: distance in the w direction + shift.
24
Geometric Margins. For each point we define the geometric margin γ_i = y_i (w^T x_i + b)/||w||. Define the geometric margin of the hyperplane, i.e. of the parameters w, b, as γ = min_i γ_i. The geometric margin is invariant under scaling of w, b.
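A small sketch that computes functional and geometric margins and checks the scale invariance mentioned above; the toy data is illustrative.

import numpy as np

def functional_margins(w, b, X, y):
    """Per-point functional margins: y_i * (w^T x_i + b)."""
    return y * (X @ w + b)

def geometric_margin(w, b, X, y):
    """Geometric margin of the hyperplane: min_i y_i*(w^T x_i + b) / ||w||.
    Invariant under rescaling of (w, b)."""
    return np.min(functional_margins(w, b, X, y)) / np.linalg.norm(w)

# Scale invariance check: multiplying (w, b) by a constant does not change it
w, b = np.array([2.0, -1.0]), 0.5
X = np.array([[1.0, 1.0], [-1.0, -2.0], [2.0, 0.0]])
y = np.array([1, -1, 1])
assert np.isclose(geometric_margin(w, b, X, y),
                  geometric_margin(10 * w, 10 * b, X, y))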
25
Functional and geometric margins are related by ||w||: γ = γ̂ / ||w||.
26
Optimization. Maximize the geometric margin γ subject to the point margins: y_i (w^T x_i + b)/||w|| ≥ γ for all i. Scale constraint: we may scale w, b any way we want, and rescaling w, b rescales the functional margin, so force the functional margin to be 1: y_i (w^T x_i + b) ≥ 1 for all i.
27
Optimization. Maximize 1/||w|| subject to y_i (w^T x_i + b) ≥ 1 for all i. Maximizing 1/||w|| is the same as minimizing (1/2)||w||^2, which gives a quadratic program: convex.
28
Linearly Separable SVM: minimize (1/2)||w||^2 subject to y_i (w^T x_i + b) ≥ 1 for all i. This is a constrained problem; we need to study the theory of Lagrange multipliers to understand it in detail.
29
Lagrange Multipliers. For the problem "minimize f(x) subject to g_i(x) ≤ 0 and h_i(x) = 0", define the Lagrangian L(x, α, β) = f(x) + Σ_i α_i g_i(x) + Σ_i β_i h_i(x). We only consider convex f, g_i and affine h_i (the method is more general). α, β are called Lagrange multipliers.
30
Primal Problem. Define θ_P(x) = max_{α ≥ 0, β} L(x, α, β). x is primal infeasible if g_i(x) > 0 for some i or h_i(x) ≠ 0 for some i. If x is primal infeasible, θ_P(x) = ∞: if g_i(x) > 0 for some i, maximizing over α_i ≥ 0 makes α_i g_i(x) unbounded; if h_i(x) ≠ 0 for some i, maximizing over β makes β_i h_i(x) unbounded.
31
If x is primal feasible: g_i(x) ≤ 0 for all i, so when maximizing over α_i ≥ 0 the optimum is α_i = 0; h_i(x) = 0 for all i, so β_i h_i(x) = 0 and β is irrelevant. Hence θ_P(x) = f(x).
32
Primal Problem: minimize θ_P(x) over x, i.e. min_x max_{α ≥ 0, β} L(x, α, β), with optimal value p*. We made the constraints into an ∞ value in the optimization function, so a minimizer of θ_P is an optimal x for the original constrained problem, which is what we are looking for!
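Written out in display form (a reconstruction in standard notation, since the slide's formulas are images):

\theta_P(x) = \max_{\alpha \ge 0,\, \beta} L(x, \alpha, \beta) =
\begin{cases} f(x) & \text{if } g_i(x) \le 0 \text{ and } h_i(x) = 0 \text{ for all } i \\ \infty & \text{otherwise} \end{cases}

p^* = \min_x \theta_P(x) = \min_x \max_{\alpha \ge 0,\, \beta} L(x, \alpha, \beta)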
33
Dual Problem. Define θ_D(α, β) = min_x L(x, α, β), and the dual problem: maximize θ_D(α, β) over α ≥ 0, β, with optimal value d*. α, β are dual feasible if α_i ≥ 0 for all i. This implies that for any dual feasible α, β and any primal feasible x, θ_D(α, β) ≤ L(x, α, β) ≤ f(x), so θ_D(α, β) ≤ p*.
34
Weak and Strong Duality: d* = max_{α ≥ 0, β} min_x L(x, α, β) ≤ min_x max_{α ≥ 0, β} L(x, α, β) = p* (weak duality). Question: when are they equal?
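For reference, the weak duality chain behind this inequality (a standard argument, not spelled out in the transcript):

For any dual feasible (\alpha, \beta), i.e. \alpha_i \ge 0, and any primal feasible x:
\theta_D(\alpha, \beta) = \min_{x'} L(x', \alpha, \beta) \le L(x, \alpha, \beta) = f(x) + \sum_i \alpha_i g_i(x) + \sum_i \beta_i h_i(x) \le f(x),
because \alpha_i g_i(x) \le 0 and h_i(x) = 0. Maximizing the left side over (\alpha, \beta) and minimizing the right side over x gives d^* \le p^*.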
35
Strong Duality, Slater's Condition: if f and the g_i are convex, the h_i are affine, and the problem is strictly feasible, i.e. there exists a primal feasible x such that g_i(x) < 0 for all i, then d* = p* (strong duality). Assume from now on that this is the case.
36
Complementary Slackness. Let x* be primal optimal and α*, β* dual optimal, with p* = d*. Then f(x*) = θ_D(α*, β*) = min_x L(x, α*, β*) ≤ L(x*, α*, β*) = f(x*) + Σ_i α_i* g_i(x*) + Σ_i β_i* h_i(x*) ≤ f(x*), so all inequalities are equalities. Every term α_i* g_i(x*) is ≤ 0 (α_i* ≥ 0, g_i(x*) ≤ 0) and their sum must be 0, so α_i* g_i(x*) = 0 for all i: complementary slackness.
37
Karush-Kuhn-Tucker (KKT) Conditions. Let x* be primal optimal and α*, β* dual optimal (p* = d*). Then: g_i(x*) ≤ 0 and h_i(x*) = 0 for all i (primal feasibility); α_i* ≥ 0 for all i (dual feasibility); α_i* g_i(x*) = 0 for all i (complementary slackness); and since x* minimizes L(x, α*, β*), ∇_x L(x*, α*, β*) = 0 (stationarity). The KKT conditions are necessary and sufficient for optimality.
38
Finally Back To SVM: minimize (1/2)||w||^2 subject to y_i (w^T x_i + b) ≥ 1, i.e. g_i(w, b) = 1 − y_i (w^T x_i + b) ≤ 0. Define the Lagrangian (no β required, there are no equality constraints): L(w, b, α) = (1/2)||w||^2 + Σ_i α_i [1 − y_i (w^T x_i + b)].
39
SVM Dual Form. We need to minimize L over w, b. Take derivatives and solve for 0: ∂L/∂w = w − Σ_i α_i y_i x_i = 0, so w = Σ_i α_i y_i x_i. The optimal w is a specific linear combination of the input points.
40
SVM Dual Form. ∂L/∂b = −Σ_i α_i y_i, which must be 0. We get the constraint Σ_i α_i y_i = 0.
41
SVM Dual Form. Insert w = Σ_i α_i y_i x_i into the Lagrangian above.
42
SVM Dual Form. Continue the substitution: the term with b vanishes because Σ_i α_i y_i = 0.
43
SVM Dual Form: L(α) = Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j x_i^T x_j.
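The substitution in the last few slides works out as follows (reconstructed in standard notation, since the slide formulas are images):

L(w, b, \alpha) = \tfrac{1}{2}\|w\|^2 + \sum_i \alpha_i\,[1 - y_i(w^\top x_i + b)]
= \tfrac{1}{2} w^\top w + \sum_i \alpha_i - w^\top \sum_i \alpha_i y_i x_i - b \sum_i \alpha_i y_i.
Substituting w = \sum_i \alpha_i y_i x_i and using \sum_i \alpha_i y_i = 0:
L(\alpha) = \sum_i \alpha_i - \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j\, x_i^\top x_j.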
44
SVM Dual Problem. We found the minimum over w, b; now maximize over α: maximize Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j x_i^T x_j subject to α_i ≥ 0 for all i and Σ_i α_i y_i = 0. Remember w = Σ_i α_i y_i x_i.
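A minimal sketch of solving this dual numerically with a general-purpose solver (SciPy's SLSQP) rather than a dedicated QP solver; the toy data is illustrative and assumed to be linearly separable.

import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data (illustrative)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)
G = (y[:, None] * X) @ (y[:, None] * X).T   # G[i, j] = y_i y_j x_i^T x_j

def neg_dual(alpha):
    # Negative of the dual objective: sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j x_i^T x_j
    return 0.5 * alpha @ G @ alpha - alpha.sum()

res = minimize(neg_dual, x0=np.zeros(n), method="SLSQP",
               bounds=[(0.0, None)] * n,                               # alpha_i >= 0
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])   # sum_i alpha_i y_i = 0
alpha = res.x
w = (alpha * y) @ X             # w = sum_i alpha_i y_i x_i
sv = alpha > 1e-6               # support vectors have alpha_i > 0
b = np.mean(y[sv] - X[sv] @ w)  # intercept from the tight constraints y_i (w^T x_i + b) = 1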
45
Intercept b*. For a support vector with y_i = 1 the constraint is tight: w*^T x_i + b* = 1, so b* = 1 − w*^T x_i. For a support vector with y_i = −1: w*^T x_i + b* = −1, so b* = −1 − w*^T x_i.
46
Making Predictions: predict the sign of w*^T x + b* = Σ_i α_i* y_i x_i^T x + b*. Only the support vectors (points with α_i* > 0) contribute to the sum.
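A small sketch of this prediction rule in the dual form, assuming α and b come from a solver such as the one sketched above; the names are illustrative.

import numpy as np

def svm_predict(alpha, b, X_train, y_train, X_new, tol=1e-6):
    """Predict sign(sum_i alpha_i y_i x_i^T x + b) for each row of X_new.
    Only the support vectors (alpha_i > 0) contribute to the sum."""
    sv = alpha > tol                                             # support vector mask
    scores = (alpha[sv] * y_train[sv]) @ (X_train[sv] @ X_new.T) + b
    return np.where(scores >= 0, 1, -1)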
47
Complementary slackness: α_i* > 0 only for points with y_i (w*^T x_i + b*) = 1, i.e. points lying exactly on the margin. Support vectors are the vectors that support the plane.
48
SVM Summary: minimize (1/2)||w||^2 subject to y_i (w^T x_i + b) ≥ 1 for all i. The solution w = Σ_i α_i y_i x_i is determined by the support vectors.
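In practice one would hand this problem to a library. Assuming scikit-learn is available (an assumption, not part of the course material), a linear SVC with a large C approximates the hard-margin SVM of this lecture and exposes the support vectors directly:

import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C ~ hard margin
print(clf.support_vectors_)                   # the x_i with alpha_i > 0
print(clf.coef_, clf.intercept_)              # w and b
print(clf.predict([[1.0, 0.5]]))              # sign(w^T x + b)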