Slide 1: Machine Learning Week 2 Lecture 1
Slide 2: Quiz and Hand-in Data
Test what you know so I can adapt! We need data for the hand-in.
Slide 3: Quiz
Any problems? Any questions?
Slide 4: Recap
Supervised learning: an unknown target f, a data set, a hypothesis set, and a learning algorithm that produces a hypothesis h with h(x) ≈ f(x).
Example problems: classification (e.g. 10 classes) and regression.
Regression example: target: house price; input: size, rooms, age, garage, ...; data: historical data of house sales.
Slide 5: Linear Models
Example: target: house price; input: size, rooms, age, garage, ...; data: historical house sales.
Weight each input dimension so it affects the target function in a good way, e.g.
House Price = θ0·x0 + θ1·Size + θ2·Rooms + θ3·Age + θ4·Garage (e.g. θ1 = 1234, θ2 = 42),
i.e. h(x) = θᵀx with x0 = 1 (a matrix product). The model is linear in θ; applying a nonlinear transform to the inputs still leaves it linear in θ.
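A minimal numeric sketch of this hypothesis (the feature values and coefficients below are made up for illustration):

    import numpy as np

    def h(theta, x):
        """Linear hypothesis: theta^T x, with x[0] = 1 as the bias input."""
        return theta @ x

    # x = [1 (bias), size, rooms, age, garage]
    x = np.array([1.0, 120.0, 4.0, 30.0, 1.0])
    theta = np.array([5000.0, 1234.0, 42.0, -100.0, 2000.0])
    print(h(theta, x))  # predicted house price

    # Nonlinear transform: the model stays linear in theta even if we
    # first map x through nonlinear features, e.g. appending size^2.
    z = np.append(x, x[1] ** 2)
    theta_z = np.append(theta, 0.5)
    print(h(theta_z, z))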
Slide 6: Three Models
Classification: the perceptron, h(x) = sign(wᵀx).
Logistic regression: estimating probabilities, h(x) = σ(wᵀx) = 1 / (1 + e^(−wᵀx)).
Classify y = 1 if h(x) ≥ 1/2, which is equivalent to wᵀx ≥ 0.
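A small sketch of that decision rule (the weights are hypothetical):

    import numpy as np

    def sigmoid(s):
        return 1.0 / (1.0 + np.exp(-s))

    def classify(w, x):
        # sigmoid(w @ x) >= 0.5 exactly when w @ x >= 0
        return 1 if w @ x >= 0 else -1

    w = np.array([-1.0, 0.5])   # hypothetical weights (bias, feature)
    x = np.array([1.0, 3.0])    # x[0] = 1 is the bias input
    print(sigmoid(w @ x), classify(w, x))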
Slide 7: Maximum Likelihood
Assumption: independent data, so the likelihood is a product over the data points. Use the logarithm to turn the product into a sum, then optimize. For logistic regression we get the cross-entropy error.
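In one standard form (assuming labels y_n ∈ {−1, +1}, as in the course book), maximizing the log-likelihood is equivalent to minimizing the cross-entropy error:

\[
L(w) = \prod_{n=1}^{N} P(y_n \mid x_n),
\qquad
E_{\text{in}}(w) = \frac{1}{N} \sum_{n=1}^{N} \ln\!\left(1 + e^{-y_n\, w^{\top} x_n}\right)
\]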
Slide 8: Convex Optimization
Convex vs. non-convex problems. In the standard form, the objective f and the constraints g are convex and h is affine. For convex problems, local minima are global minima.
[Figure: a convex function, with the chord between (x, f(x)) and (y, f(y)) above the graph, and the tangent-line lower bound f(x) + f'(x)(y − x) below it.]
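The two standard characterizations behind the figure: chords lie above the graph, and (for differentiable f) tangent lines lie below it:

\[
f(\lambda x + (1-\lambda) y) \le \lambda f(x) + (1-\lambda) f(y), \quad \lambda \in [0,1]
\]
\[
f(y) \ge f(x) + f'(x)(y - x) \quad \text{for all } x, y
\]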
Slide 9: Descent Methods
Assume f is twice continuously differentiable. Iteratively move toward a better solution:
Pick a start point x.
Repeat until a stopping criterion is satisfied:
  Compute a descent direction v.
  Line search: compute a step size t.
  Update: x = x + t·v.
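A minimal Python sketch of this recipe (the function names and the example objective are mine, for illustration):

    import numpy as np

    def descent(f, grad, x0, direction, line_search, tol=1e-8, max_iter=1000):
        """Generic descent method with pluggable direction and line search."""
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) < tol:    # stopping criterion
                break
            v = direction(g)               # compute descent direction
            t = line_search(f, x, v)       # compute step size
            x = x + t * v                  # update
        return x

    # Steepest descent on f(x) = x1^2 + 10*x2^2 with a fixed step size.
    f = lambda x: x[0] ** 2 + 10 * x[1] ** 2
    grad = lambda x: np.array([2 * x[0], 20 * x[1]])
    x_min = descent(f, grad, [10.0, 1.0],
                    direction=lambda g: -g,
                    line_search=lambda f, x, v: 0.01)
    print(x_min)  # close to (0, 0)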
Slide 10: Simple Gradient Descent
Pick a start point x and a learning rate LR = 0.1. Repeat for 50 rounds: set v = −∇f(x); update x = x + LR·v. The descent direction is the negative gradient; the step size is the fixed learning rate.
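The same loop with the slide's fixed choices, as runnable code (the example function is hypothetical, since the slide's f is not shown):

    import numpy as np

    def simple_gradient_descent(grad, x0, lr=0.1, rounds=50):
        x = np.asarray(x0, dtype=float)
        for _ in range(rounds):
            v = -grad(x)      # descent direction: negative gradient
            x = x + lr * v    # fixed step size (learning rate)
        return x

    grad = lambda x: 2 * x                         # gradient of f(x) = x^2
    print(simple_gradient_descent(grad, [3.0]))    # approaches 0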
Slide 11: Learning Rate
[Figure: three gradient descent runs on the same function with different learning rates.]
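A sketch of code that regenerates figures like these, using f(x) = x² as a stand-in for the lecture's function (learning-rate values are my own picks for "too small, good, too large"):

    import numpy as np
    import matplotlib.pyplot as plt

    f = lambda x: x ** 2
    grad = lambda x: 2 * x

    fig, axes = plt.subplots(1, 3, figsize=(12, 3))
    for ax, lr in zip(axes, [0.05, 0.5, 1.05]):
        x, path = 3.0, [3.0]
        for _ in range(15):
            x = x - lr * grad(x)   # gradient descent step
            path.append(x)
        xs = np.linspace(-4, 4, 200)
        ax.plot(xs, f(xs))                          # the objective
        ax.plot(path, [f(p) for p in path], "o-")   # the iterates
        ax.set_title(f"learning rate = {lr}")
    plt.show()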
Slide 12: Gradient Descent Jump Around
[Figure: gradient descent with exact line search, starting from (10, 1), zigzags toward the minimum.]
Slide 13: Gradient Checking
If you use gradient descent, make sure you compute the gradient correctly. Choose a small h and use the two-sided formula (f(x + h) − f(x − h)) / (2h); it reduces the estimation error significantly compared to the one-sided difference. For an n-dimensional gradient, apply the formula to each variable in turn (move ε along each coordinate). Usually works well.
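A compact gradient checker along these lines (the example function is mine):

    import numpy as np

    def numerical_gradient(f, x, h=1e-5):
        """Two-sided difference applied to each coordinate in turn."""
        grad = np.zeros_like(x)
        for i in range(x.size):
            e = np.zeros_like(x)
            e[i] = h
            grad[i] = (f(x + e) - f(x - e)) / (2 * h)
        return grad

    f = lambda x: x[0] ** 2 + 3 * x[0] * x[1]               # example function
    analytic = lambda x: np.array([2 * x[0] + 3 * x[1], 3 * x[0]])
    x = np.array([1.0, 2.0])
    print(numerical_gradient(f, x), analytic(x))            # should agree closely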
Slide 14: Hand-in 1: Supervised Learning
It comes online after class today. It includes Matlab examples but not a long intro; Google is your friend. Questions are always welcome. Get busy!
Slide 15: Today
Learning feasibility. Probabilistic approach. Learning formalized.
Slide 16: Learning Diagram
Unknown target f; data set (x1, y1), ..., (xn, yn); hypothesis set; learning algorithm; final hypothesis h with h(x) ≈ f(x).
Slide 17: Impossibility of Learning!
[Table: truth table over binary inputs x1, x2, x3, with f(x) given on the observed rows and marked ? on the rest.]
What is f? There are 2^8 = 256 potential boolean functions, and 8 of them have in-sample error 0, one for each way of filling in the three unseen rows. The data alone cannot distinguish between them: assumptions are needed.
Slide 18: No Free Lunch
"All models are wrong, but some models are useful." (George Box)
Machine learning has many different models and algorithms. There is no single model that works best for all problems (the No Free Lunch theorem). Assumptions that work well in one domain may fail in another.
Slide 19: Probabilistic Games
Slide 20: Probabilistic Approach
Flip a coin with unknown bias μ, repeating N times independently. Sample: h, h, h, t, t, h, t, t, h. Sample mean: ν = #heads / N. What does the sample mean say about μ? With certainty? Nothing, really. Probabilistically? Yes: the sample mean is likely close to the bias.
Slide 21: Hoeffding's Inequality (Binary Variables)
The sample mean ν is probably close to the coin bias μ, and the probability increases with the number of samples N:
P(|ν − μ| > ε) ≤ 2e^(−2ε²N).
The bound is independent of the sample mean and of the actual probabilities, e.g. the probability distribution P(x). The sample mean is probably approximately correct: PAC.
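A quick simulation (my own, under the slide's coin setup) showing the empirical deviation probability sits below the bound:

    import numpy as np

    rng = np.random.default_rng(0)
    mu, N, eps, trials = 0.6, 100, 0.1, 100_000
    nu = rng.binomial(N, mu, size=trials) / N        # sample means
    empirical = np.mean(np.abs(nu - mu) > eps)       # P(|nu - mu| > eps), estimated
    bound = 2 * np.exp(-2 * eps**2 * N)              # Hoeffding bound
    print(empirical, bound)                          # empirical stays below the bound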
Slide 22: Classification Connection: Testing a Hypothesis
Fix a hypothesis h, an unknown target f, and a probability distribution over the inputs x. Then μ is the probability of picking x such that f(x) ≠ h(x), and 1 − μ is the probability of picking x such that f(x) = h(x); μ is just the sum of the probabilities of all points x where the hypothesis is wrong. The sample mean becomes the in-sample error, and μ the out-of-sample error.
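In symbols, using the book's E_in/E_out notation (1[·] is the indicator function):

\[
E_{\text{out}}(h) = \mathbb{P}_{x \sim P(x)}\!\left[ h(x) \neq f(x) \right] = \mu,
\qquad
E_{\text{in}}(h) = \frac{1}{N} \sum_{n=1}^{N} \mathbf{1}\!\left[ h(x_n) \neq f(x_n) \right] = \nu
\]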
Slide 23: Learning Diagram with Input Distribution
Unknown target f; unknown input probability distribution P(x); data set (x1, y1), ..., (xn, yn); hypothesis set; learning algorithm; final hypothesis h with h(x) ≈ f(x).
Slide 24: Coins to Hypotheses
A sample of size N, e.g. h, h, h, t, t, h, t, t, h, has a sample mean ν that estimates the unknown μ. For a hypothesis, the data set plays the role of the sample: ν is the in-sample error and the unknown μ is the out-of-sample error.
Slide 25: Not Learning Yet
The hypothesis must be fixed before seeing the data. Every hypothesis has its own error (a different coin for each hypothesis). In learning, the training algorithm picks the "best" hypothesis from the set after seeing the data, so we are only verifying a fixed hypothesis, not learning. Hoeffding has left the building again.
Slide 26: Coin Analogy (Exercise 1.10 in the Book)
Flip a fair coin 10 times. What is the probability of 10 heads? (1/2)^10 = 1/1024 ≈ 0.1%. Now repeat 1000 times (1000 coins). What is the probability that some coin has 10 heads? 1 − (1 − 1/1024)^1000 ≈ 63%.
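Verifying the numbers:

    # One coin: 10 heads in 10 fair flips; then at least one of 1000 coins.
    p_ten_heads = 0.5 ** 10
    p_some_coin = 1 - (1 - p_ten_heads) ** 1000
    print(p_ten_heads, p_some_coin)   # ~0.000977, ~0.624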
Slide 27: Crude Approach: Apply the Union Bound
The learning algorithm may pick any hypothesis, so bound the bad event by "it is true for some hypothesis": apply the union bound over the hypothesis set, and then Hoeffding to each term.
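Spelled out, for a finite hypothesis set {h_1, ..., h_M} and final hypothesis g:

\[
\mathbb{P}\!\left[\,|E_{\text{in}}(g) - E_{\text{out}}(g)| > \epsilon\,\right]
\le \mathbb{P}\!\left[\,\exists\, m:\ |E_{\text{in}}(h_m) - E_{\text{out}}(h_m)| > \epsilon\,\right]
\le \sum_{m=1}^{M} \mathbb{P}\!\left[\,|E_{\text{in}}(h_m) - E_{\text{out}}(h_m)| > \epsilon\,\right]
\le 2M e^{-2\epsilon^2 N}
\]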
Slide 28: Result
Classification problem; the error is f(x) ≠ h(x). Finite hypothesis set with M hypotheses, data set with N points. The bound explains the idea of what we are looking for (model complexity seems to be a factor). But our "simple" linear models have infinite hypothesis sets...
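One standard way to state the resulting bound (set δ = 2Me^(−2ε²N) and solve for ε): with probability at least 1 − δ,

\[
E_{\text{out}}(g) \le E_{\text{in}}(g) + \sqrt{\frac{1}{2N} \ln \frac{2M}{\delta}}
\]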
Slide 29: New Learning Diagram
Unknown target f; input probability distribution P(x); data set (x1, y1), ..., (xn, yn); finite hypothesis set; learning algorithm; final hypothesis h with h(x) ≈ f(x).
Slide 30: Learning Feasibility
Deterministically, with no assumptions: NOT SO MUCH. Probabilistically: YES.
Generalization: get the out-of-sample error close to the in-sample error, and make the in-sample error small.
If the target function is complex, learning should be harder? The bound does not seem to care. But complex targets need complex hypothesis sets, which increases their complexity, i.e. M. So M is a very simple and crude measure.
Slide 31: Error Functions
User specified, heavily problem dependent.
Identity system using fingerprints: is the person who he says he is?

h(x) \ f(x)      | Lying          | True
Estimate Lying   | true negative  | false negative
Estimate True    | false positive | true positive

Two applications weight these errors very differently (the slide's cost matrices use weights 1000 and 1):
Walmart, giving a discount to a given person: a false negative (rejecting a genuine customer) is the expensive error.
CIA access (Friday bar stock): a false positive (accepting an impostor) is the expensive error.
Slide 32: Error Functions If Not Given
If no one tells you: base the error function on making the problem "solvable"; making it smooth and convex seems like a good idea (least squares linear regression was very nice indeed). Or base it on assumptions about the target and the noise: logistic regression's likelihood gives cross-entropy; assuming a linear target with Gaussian noise gives least squares.
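A sketch of the second route, assuming y = wᵀx + ε with Gaussian noise ε ~ N(0, σ²). The log-likelihood of independent data is

\[
\ln \prod_{n=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(y_n - w^{\top} x_n)^2}{2\sigma^2}\right)
= \text{const} - \frac{1}{2\sigma^2} \sum_{n=1}^{N} (y_n - w^{\top} x_n)^2,
\]

so maximizing the likelihood is exactly minimizing the squared error.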
Slide 33: Formalize Everything
Unknown target; unknown probability distribution P(x); data set (x1, y1), ..., (xn, yn); hypothesis set; learning algorithm; final hypothesis h with h(x) ≈ f(x).
Slide 34: Final Diagram
Unknown target distribution P(y | x); unknown input probability distribution P(x); data set; hypothesis set; learning algorithm; error measure e; final hypothesis. The input distribution sets the importance of each point: if x has very low probability, it is not really going to count.
Slide 35: Words on the Out-of-Sample Error
Imagine X and Y are finite sets.
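Then the out-of-sample error is a finite weighted sum; in the diagram's notation (assuming a deterministic target for simplicity):

\[
E_{\text{out}}(h) = \mathbb{E}_{x \sim P(x)}\!\left[ e\big(h(x), f(x)\big) \right]
= \sum_{x \in X} P(x)\, e\big(h(x), f(x)\big)
\]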
Slide 36: Quick Summary
Learning without assumptions is impossible. Probabilistically, learning is possible: the Hoeffding bound. Work is needed for infinite hypothesis spaces! The error function depends on the problem. Formalized learning approach: ensure the out-of-sample error is close to the in-sample error, and minimize the in-sample error. The complexity of the hypothesis set (its size M, currently) matters. More data helps.