CSE 4705 Artificial Intelligence

1 CSE 4705 Artificial Intelligence
Jinbo Bi Department of Computer Science & Engineering

2 Machine learning (1) Supervised learning algorithms

3 Supervised learning: definition
Given a collection of examples (the training set), where each example contains a set of attributes (independent variables), one of which is the target (dependent variable), find a model that predicts the target as a function of the values of the other attributes. Goal: previously unseen examples should be predicted as accurately as possible. A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets; the training set is used to build the model and the test set to validate it.
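The split described above can be sketched in a few lines of NumPy; the 80/20 ratio and the toy sine data are illustrative choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 10 examples, 1 attribute, 1 target.
X = np.linspace(0, 1, 10).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel()

# Shuffle the example indices, then split 80% train / 20% test.
idx = rng.permutation(len(X))
n_train = int(0.8 * len(X))
train_idx, test_idx = idx[:n_train], idx[n_train:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]
```

Shuffling before splitting matters: if the data are ordered (e.g. by x), a contiguous split would give a test set unlike the training set.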

4 Supervised learning: classification
When the dependent variable is categorical, it is a classification problem.

5 Supervised learning: regression
When the dependent variable is continuous, it is a regression problem.

6 Practice A very simple example
We observe the following data; how can we find a function f that best approximates y = f(x)? This dataset was created using y = sin(2πx) + N(0, σ), where σ = 0.1.
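A minimal sketch of the simulation described on this slide, assuming the x values are drawn uniformly from [0, 1] (the slides later say x is evenly distributed on [0, 1]) and N = 10 points:

```python
import numpy as np

rng = np.random.default_rng(42)
N, sigma = 10, 0.1

x = rng.uniform(0, 1, N)                              # inputs on [0, 1]
y = np.sin(2 * np.pi * x) + rng.normal(0, sigma, N)   # target plus Gaussian noise
```

The learning task is then to recover something close to sin(2πx) from the noisy pairs (x, y) alone.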

7 Simple regression algorithms
Least squares methods: linear regression; polynomial regression with least squares. Very important practical issues: overfitting and underfitting. Regularized linear regression: ridge regression; the least absolute shrinkage and selection operator (LASSO).

8 Least squares We wish to use some real-valued input variables x to predict the value of a target y We collect training data of pairs (xi,yi), i=1,…N Suppose we have a model f that maps each x example to a value of y’ Sum of squares function: Sum of the squares of the deviation between the observed target value y and the predicted value y’

9 Least squares Find a function f such that the sum of squares is minimized For example, your function is in the form of linear functions f (x) = wTx Least squares with a linear function of parameters w is called “linear regression”

10 Linear regression Linear regression has a closed-form solution for w.
The minimum is attained at the zero derivative: setting the derivative of the sum of squares to zero gives w = (XᵀX)⁻¹Xᵀy, where X stacks the training inputs as rows and y collects the targets.

11 Linear regression A simple example where we observe three data points
(x1, y1), (x2, y2) and (x3, y3), where each x is a vector of 2 entries.
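A minimal numeric sketch of this setup; the particular points are illustrative, not from the slides. Each x has 2 entries (here a constant 1 playing the role of an intercept feature, plus one input), and w is found from the normal equations w = (XᵀX)⁻¹Xᵀy:

```python
import numpy as np

# Three examples, each x a vector of 2 entries; targets follow y = 1 + 2*x exactly.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

# Closed-form least squares: solve (X^T X) w = X^T y.
w = np.linalg.solve(X.T @ X, X.T @ y)
```

Because the targets here lie exactly on a line, the recovered w reproduces the generating coefficients (1 and 2).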

12 Polynomial fitting with least squares
For example, suppose our function is no longer linear but a polynomial of x. Assume we have one independent variable x, and we are building a polynomial of order M to approximate y: f(x, w) = w₀ + w₁x + w₂x² + … + w_M x^M.

13 Polynomial fitting with least squares
Just as least squares has a closed-form solution for linear regression, we have the same closed-form solution when the function is a polynomial: the powers 1, x, x², …, x^M act as features, so the model is still linear in the coefficients w.
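A sketch of this reduction, assuming a cubic (M = 3) and a few points on the sine curve: build the matrix of powers 1, x, …, x^M and solve the resulting linear least-squares problem.

```python
import numpy as np

M = 3                                      # polynomial order
x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
y = np.sin(2 * np.pi * x)                  # noise-free targets for illustration

# Columns 1, x, x^2, ..., x^M: polynomial fitting is linear regression in w.
Phi = np.vander(x, M + 1, increasing=True)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
y_hat = Phi @ w
```

For these five symmetric points a cubic happens to pass through all of them exactly, so the fitted values match the targets; with noisy data the least-squares fit would instead smooth over the points.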

14 Supervised learning – practical issues
Underfitting and overfitting. Before introducing these important concepts, let us study the behavior of a simple regression algorithm: polynomial regression.

15 Polynomial curve fitting
x is evenly distributed on [0, 1]; y = f(x) + random error, i.e. y = sin(2πx) + ε, ε ~ N(0, σ).

16 Polynomial curve fitting

17 Polynomial curve fitting

18 Sum-of-Squares error function

19 0th order polynomial

20 1st order polynomial

21 3rd order polynomial

22 9th order polynomial

23 Over-fitting Root-Mean-Square (RMS) error: E_RMS = √(2E(w*)/N).
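Dividing by N before taking the square root lets us compare training and test sets of different sizes. A sketch, assuming the convention E_RMS = √(2E(w*)/N) with E(w) = ½ Σ (y_i − f(x_i))², which reduces to the ordinary root-mean-square of the residuals:

```python
import numpy as np

def rms_error(y_true, y_pred):
    # E_RMS = sqrt(2 E(w*) / N) with E(w*) = 0.5 * sum of squared residuals,
    # i.e. the root of the mean squared residual.
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

y_true = np.array([0.0, 1.0, 0.0, -1.0])
y_pred = np.array([0.1, 0.9, 0.1, -0.9])
err = rms_error(y_true, y_pred)
```

Here every residual has magnitude 0.1, so the RMS error is 0.1 regardless of N.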

24 Over-fitting If we create 100 more points using the same simulation procedure, we can measure the Root-Mean-Square (RMS) error on this new data as well.

25 Polynomial coefficients

26 Data set size 9th Order Polynomial

27 Data set size 9th Order Polynomial

28 Regularization Penalize large coefficient values: ridge regression adds a penalty λ‖w‖² on the squared norm of the coefficients to the sum-of-squares error.

29 Regularization Let us fit a polynomial of order 9 again, but this time using regularization.

30 Regularization

31 Regularization vs.

32 Polynomial coefficients

33 Ridge regression Derive the analytic solution to the optimization problem for ridge regression. Again, setting the first-order derivative to 0 gives w = (XᵀX + λI)⁻¹Xᵀy.
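A sketch of this closed form on the order-9 polynomial fit from the earlier slides; the data generation and the value of λ are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 10)

M, lam = 9, 1e-3                          # order-9 polynomial, penalty lambda
Phi = np.vander(x, M + 1, increasing=True)

# Ridge closed form: w = (Phi^T Phi + lambda*I)^{-1} Phi^T y
w_ridge = np.linalg.solve(Phi.T @ Phi + lam * np.eye(M + 1), Phi.T @ y)
```

Compared with the unregularized least-squares fit (which with 10 points and 10 coefficients interpolates the noise), the ridge solution has a much smaller coefficient norm, which is exactly the shrinkage effect the slides illustrate.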

34 Regularization Penalize large coefficient values in another way: LASSO (the least absolute shrinkage and selection operator) penalizes the sum of absolute values of the coefficients, λ‖w‖₁. There is no closed-form solution, so we need to run an iterative algorithm.
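The slides do not name a specific iterative algorithm; one standard choice is proximal gradient descent (ISTA), sketched below on a toy problem where only the first of five features matters. The step size, penalty strength, and data are illustrative assumptions:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1: shrink each entry toward 0 by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=5000):
    """Minimize 0.5*||y - Xw||^2 + lam*||w||_1 by proximal gradient (ISTA)."""
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)           # gradient of the smooth part
        w = soft_threshold(w - grad / L, lam / L)
    return w

# Toy problem: y depends (almost) only on the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = 3.0 * X[:, 0] + rng.normal(0, 0.1, 50)
w = lasso_ista(X, y, lam=5.0)
```

The L1 penalty drives the irrelevant coefficients to (near) zero while keeping the relevant one close to its true value, which is the "selection" part of the operator's name.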

35 Questions?

