1
Topics on Final: Perceptrons, SVMs, Precision/Recall/ROC, Decision Trees, Naive Bayes, Bayesian networks, Adaboost, Genetic algorithms, Q learning. Not on the final: MLPs, PCA.
2
Rules for Final. Open book, notes, computer, calculator. No discussion with others. You can ask me or Dona general questions about a topic. Read each question carefully. Hand in your own work only. Turn in to the box at the CS front desk or to me (hardcopy or e-mail) by 5pm Wednesday, March 21. No extensions.
3
Short recap of important topics
4
Perceptrons
5
Training a perceptron
1. Start with random weights, w = (w_1, w_2, ..., w_n).
2. Select a training example (x_k, t_k).
3. Run the perceptron with input x_k and weights w to obtain output o.
4. Let η be the learning rate (a user-set parameter). Now update each weight: w_i ← w_i + η (t_k − o) x_{k,i}.
5. Go to 2.
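A minimal sketch of this training loop (the helper name, learning rate, and toy data are illustrative, not from the slides):

```python
import numpy as np

def train_perceptron(X, t, eta=0.1, epochs=20):
    """Perceptron training: w_i <- w_i + eta * (t_k - o) * x_k,i for each example."""
    rng = np.random.default_rng(0)
    w = rng.uniform(-0.5, 0.5, size=X.shape[1] + 1)   # w[0] is the bias weight
    for _ in range(epochs):
        for x_k, t_k in zip(X, t):
            o = 1 if w[0] + w[1:] @ x_k > 0 else -1   # run the perceptron
            w[0]  += eta * (t_k - o)                  # update the bias
            w[1:] += eta * (t_k - o) * x_k            # update the other weights
    return w

# Toy AND-like data with targets in {-1, +1}
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([-1, -1, -1, 1])
print(train_perceptron(X, t))
```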
6
Support Vector Machines
7
Here, assume positive and negative instances are to be separated by the hyperplane. Equation of line: w_1 x_1 + w_2 x_2 + b = 0. (2-D plot of the data on axes x_1 and x_2.)
8
Intuition: the best hyperplane (for future generalization) will “maximally” separate the examples
9
Definition of Margin
10
Minimizing ||w||. Find w and b by doing the following minimization: minimize (1/2)||w||^2 subject to y_i (w · x_i + b) ≥ 1 for all i. This is a quadratic optimization problem. Use "standard optimization tools" to solve it.
11
Dual formulation: It turns out that w can be expressed as a linear combination of a small subset of the training examples x_i, namely those that lie exactly on the margin (minimum distance to the hyperplane): w = Σ_i α_i y_i x_i, with α_i ≠ 0 only for those x_i that lie exactly on the margin. These training examples are called "support vectors". They carry all relevant information about the classification problem.
12
The results of the SVM training algorithm (involving solving a quadratic programming problem) are the α_i and the bias b. The support vectors are all x_i such that α_i > 0. Clarification: In the slides below we use α_i to denote |α_i| y_i, where y_i ∈ {−1, 1}.
13
For a new example x, we can now classify x using the support vectors: f(x) = sgn( Σ_i α_i (x_i · x) + b ). This is the resulting SVM classifier.
14
SVM review. Equation of line: w_1 x_1 + w_2 x_2 + b = 0. Define the margin using the constraint y_i (w · x_i + b) ≥ 1. Margin distance: 1 / ||w||. To maximize the margin, we minimize ||w|| subject to the constraint that positive examples fall on one side of the margin and negative examples on the other: y_i (w · x_i + b) ≥ 1 for all i. We can relax this constraint using "slack variables".
15
SVM review. To do the optimization, we use the dual formulation: maximize Σ_i α_i − ½ Σ_{i,j} α_i α_j y_i y_j (x_i · x_j) subject to α_i ≥ 0 and Σ_i α_i y_i = 0. The results of the optimization "black box" are the α_i and b. The support vectors are all x_i such that α_i ≠ 0.
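As a sanity check, a standard solver can play the role of the "black box". This sketch uses scikit-learn (not mentioned in the slides) on the data from the worked example that follows; dual_coef_ holds α_i y_i, matching the slides' signed-α convention:

```python
import numpy as np
from sklearn.svm import SVC

# Training data from the worked example below
X = np.array([[1, 1], [1, 2], [2, 1], [-1, 0], [0, -1], [-1, -1]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6)   # a very large C approximates the hard-margin SVM
clf.fit(X, y)

print(clf.support_vectors_)   # the x_i with alpha_i != 0
print(clf.dual_coef_)         # alpha_i * y_i, the slides' signed alpha convention
print(clf.intercept_)         # the bias b
```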
16
SVM review. Once the optimization is done, we can classify a new example x as follows: f(x) = sgn( Σ_i α_i (x_i · x) + b ). That is, classification is done entirely through a linear combination of dot products with training examples. This is a "kernel" method.
17
Example. (2-D plot of the training points on axes x_1 and x_2.)
18
Example. Input to SVM optimizer:
x_1   x_2   class
 1     1     1
 1     2     1
 2     1     1
−1     0    −1
 0    −1    −1
−1    −1    −1
19
Example. Input as above. Output from SVM optimizer:
Support vector    α
(−1, 0)        −0.208
(1, 1)          0.416
(0, −1)        −0.208
b = −0.376
20
Example (same input and SVM optimizer output as above). Weight vector: w = Σ_i α_i x_i = −0.208·(−1, 0) + 0.416·(1, 1) − 0.208·(0, −1) = (0.624, 0.624).
21
Example (same data as above). Separation line: 0.624 x_1 + 0.624 x_2 − 0.376 = 0, i.e., x_2 = −x_1 + 0.603.
22
Example. Classifying a new point x: f(x) = sgn( Σ_i α_i (x_i · x) + b ).
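A small numerical check of this classification rule, using the support vectors, α values, and bias from the example above (the test points themselves are made up):

```python
import numpy as np

# Support vectors and signed alphas (each alpha_i already includes the class sign)
sv    = np.array([[-1, 0], [1, 1], [0, -1]])
alpha = np.array([-0.208, 0.416, -0.208])
b     = -0.376

w = alpha @ sv                      # weight vector: sum_i alpha_i * x_i
print(w)                            # approximately [0.624, 0.624]

def classify(x):
    """f(x) = sgn( sum_i alpha_i (x_i . x) + b )"""
    return np.sign(alpha @ (sv @ x) + b)

print(classify(np.array([2, 2])))   # a point on the positive side -> +1
print(classify(np.array([-1, -1]))) # a point on the negative side -> -1
```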
23
Precision/Recall/ROC
24
Creating a Precision/Recall Curve. Results of classifier: for each threshold (0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, −∞), fill in the accuracy, precision, and recall. (Table with columns Threshold, Accuracy, Precision, Recall; one row per threshold.)
25
Creating a ROC Curve. Results of classifier: for each threshold (0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, −∞), fill in the accuracy, TPR, and FPR. (Table with columns Threshold, Accuracy, TPR, FPR; one row per threshold.)
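A sketch of how each row of these two tables could be computed from classifier scores; the scores and labels below are invented for illustration:

```python
import numpy as np

scores = np.array([0.95, 0.85, 0.8, 0.7, 0.55, 0.45, 0.3, 0.2, 0.1, 0.05])
labels = np.array([1, 1, 0, 1, 1, 0, 0, 1, 0, 0])   # 1 = positive, 0 = negative

for thr in [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, -np.inf]:
    pred = (scores >= thr).astype(int)               # predict positive above threshold
    tp = np.sum((pred == 1) & (labels == 1))
    fp = np.sum((pred == 1) & (labels == 0))
    fn = np.sum((pred == 0) & (labels == 1))
    tn = np.sum((pred == 0) & (labels == 0))
    accuracy  = (tp + tn) / len(labels)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall    = tp / (tp + fn)                        # also the TPR for the ROC curve
    fpr       = fp / (fp + tn)                        # FPR for the ROC curve
    print(f"{thr:>5}: acc={accuracy:.2f} prec={precision:.2f} "
          f"rec/TPR={recall:.2f} FPR={fpr:.2f}")
```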
26
Precision/Recall versus ROC curves: http://blog.crowdflower.com/2008/06/aggregate-turker-judgments-threshold-calibration/
27
28
Decision Trees
29
Naive Bayes
30
Naive Bayes classifier: Assume the attributes are conditionally independent given the class: P(a_1, a_2, ..., a_n | c) = Π_i P(a_i | c). Given this assumption, here's how to classify an instance x = ⟨a_1, a_2, ..., a_n⟩: c_NB = argmax_c P(c) Π_i P(a_i | c). We can estimate the values of these various probabilities over the training set.
31
In-class example. Training set:
a_1   a_2   a_3   class
 0     1     0     +
 1     0     1     +
 0     0     1     −
 1     1     0     −
 1     0     0     −
What class would be assigned by a NB classifier to ⟨1, 1, 1⟩?
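A short sketch that works this example by direct counting (the code and variable names are illustrative, not part of the slides):

```python
# Training set from the slide: (a1, a2, a3, class)
data = [
    (0, 1, 0, '+'), (1, 0, 1, '+'),
    (0, 0, 1, '-'), (1, 1, 0, '-'), (1, 0, 0, '-'),
]
x = (1, 1, 1)   # instance to classify

for c in ['+', '-']:
    rows = [r for r in data if r[3] == c]
    prior = len(rows) / len(data)                 # P(c)
    lik = 1.0
    for i in range(3):                            # product of P(a_i = x_i | c)
        lik *= sum(1 for r in rows if r[i] == x[i]) / len(rows)
    print(c, prior * lik)
# NB assigns the class with the larger product.
```

Counting gives 0.4 · (1/2)(1/2)(1/2) = 0.05 for '+' and 0.6 · (2/3)(1/3)(1/3) ≈ 0.044 for '−', so NB assigns '+'.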
32
Laplace smoothing (also called "add-one" smoothing). For each class c_j and attribute a_i with value z, add one "virtual" instance. That is, recalculate: P(a_i = z | c_j) = (n_z + 1) / (n_{c_j} + k), where n_{c_j} is the number of training examples in class c_j, n_z is the number of those with a_i = z, and k is the number of possible values of attribute a_i.
Training set:
a_1   a_2   a_3   class
 0     1     0     +
 0     0     1     +
 1     1     1     −
 1     1     0     −
 1     0     1     −
Exercise: compute the smoothed P(a_1 = 1 | +), P(a_1 = 0 | +), P(a_1 = 1 | −), and P(a_1 = 0 | −).
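A sketch of the smoothed estimate on this slide's training set, assuming k = 2 possible values per attribute:

```python
def smoothed(data, attr, value, cls, k=2):
    """Laplace ('add-one') estimate of P(attr = value | class)."""
    in_class = [r for r in data if r[3] == cls]
    match    = [r for r in in_class if r[attr] == value]
    return (len(match) + 1) / (len(in_class) + k)

data = [(0, 1, 0, '+'), (0, 0, 1, '+'),
        (1, 1, 1, '-'), (1, 1, 0, '-'), (1, 0, 1, '-')]

print(smoothed(data, 0, 1, '+'))   # P(a1=1 | +) = (0+1)/(2+2) = 0.25
print(smoothed(data, 0, 0, '+'))   # P(a1=0 | +) = (2+1)/(2+2) = 0.75
print(smoothed(data, 0, 1, '-'))   # P(a1=1 | -) = (3+1)/(3+2) = 0.80
print(smoothed(data, 0, 0, '-'))   # P(a1=0 | -) = (0+1)/(3+2) = 0.20
```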
34
Bayesian Networks
35
Methods used in computing probabilities. Definition of conditional probability: P(A | B) = P(A, B) / P(B). Bayes theorem: P(A | B) = P(B | A) P(A) / P(B). Semantics of Bayesian networks: P(A ∧ B ∧ C ∧ D) = P(A | Parents(A)) P(B | Parents(B)) P(C | Parents(C)) P(D | Parents(D)). Calculating marginal probabilities.
36
What is P(Cloudy| Sprinkler)?
37
What is P(Cloudy| Wet Grass)?
38
Markov Chain Monte Carlo Algorithm
Markov blanket of a variable X_i: its parents, children, and children's other parents.
MCMC algorithm, for a given set of evidence variables {X_j = x_k}:
Repeat for NumSamples:
– Start with a random sample over the variables, with evidence variables fixed: (x_1, ..., x_n). This is the current "state" of the algorithm.
– Next state: randomly sample a value for one non-evidence variable X_i, conditioned on the current values in the "Markov blanket" of X_i.
Finally, return the estimated distribution of each non-evidence variable X_i.
39
Example. Query: what is P(Sprinkler = true | WetGrass = true)?
MCMC:
– Random sample, with evidence variables fixed: [Cloudy, Sprinkler, Rain, WetGrass] = [true, true, false, true].
– Repeat:
1. Sample Cloudy, given the current values of its Markov blanket: Sprinkler = true, Rain = false. Suppose the result is false. New state: [false, true, false, true]. Note that the current values of the Markov blanket remain fixed.
2. Sample Sprinkler, given the current values of its Markov blanket: Cloudy = false, Rain = false, WetGrass = true. Suppose the result is true. New state: [false, true, false, true].
40
Each sample contributes to the estimate for the query P(Sprinkler = true | WetGrass = true). Suppose we perform 50 such samples, 20 with Sprinkler = true and 30 with Sprinkler = false. Then the answer to the query is Normalize(⟨20, 30⟩) = ⟨0.4, 0.6⟩.
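A compact Gibbs-sampling sketch of this procedure. The conditional-probability tables below are the usual textbook numbers for this network, which these slides do not list, so the exact estimate is only illustrative:

```python
import random

# Standard textbook CPTs for the Cloudy/Sprinkler/Rain/WetGrass network
# (assumed values; the slides themselves do not show the CPTs).
def joint(c, s, r, w):
    p_c = 0.5
    p_s = (0.1 if c else 0.5) if s else (0.9 if c else 0.5)
    p_r = (0.8 if c else 0.2) if r else (0.2 if c else 0.8)
    p_w_true = {(True, True): 0.99, (True, False): 0.90,
                (False, True): 0.90, (False, False): 0.0}[(s, r)]
    return p_c * p_s * p_r * (p_w_true if w else 1.0 - p_w_true)

def mcmc_sprinkler(num_samples=50_000):
    """Estimate P(Sprinkler = true | WetGrass = true) by Gibbs sampling."""
    state = {'C': True, 'S': True, 'R': False, 'W': True}   # W is evidence, kept fixed
    count_s = 0
    for _ in range(num_samples):
        for var in ('C', 'S', 'R'):           # resample each non-evidence variable
            probs = []
            for val in (True, False):         # conditional given everything else
                state[var] = val
                probs.append(joint(state['C'], state['S'], state['R'], state['W']))
            state[var] = random.random() < probs[0] / (probs[0] + probs[1])
        count_s += state['S']
    return count_s / num_samples

print(mcmc_sprinkler())   # roughly 0.43 with the CPTs assumed above
```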
41
Adaboost
42
Sketch of algorithm. Given data S and learning algorithm L: Repeatedly run L on training sets S_t ⊆ S to produce h_1, h_2, ..., h_T. At each step, derive S_t from S by choosing examples probabilistically according to probability distribution w_t. Use S_t to learn h_t. At each step, derive w_{t+1} by giving more probability to examples that were misclassified at step t. The final ensemble classifier H is a weighted sum of the h_t's, with each weight being a function of the corresponding h_t's error on its training set.
43
Adaboost algorithm. Given S = {(x_1, y_1), ..., (x_N, y_N)} where x_i ∈ X, y_i ∈ {+1, −1}. Initialize w_1(i) = 1/N (uniform distribution over the data).
44
For t = 1, ..., T:
– Select a new training set S_t from S with replacement, according to w_t.
– Train L on S_t to obtain hypothesis h_t.
– Compute the training error ε_t of h_t on S: ε_t = Σ_{i : h_t(x_i) ≠ y_i} w_t(i).
– If ε_t ≥ 0.5, break from the loop.
– Compute coefficient α_t = ½ ln((1 − ε_t) / ε_t).
45
– Compute new weights on the data: w_{t+1}(i) = w_t(i) exp(−α_t y_i h_t(x_i)) / Z_t, where Z_t is a normalization factor chosen so that w_{t+1} will be a probability distribution: Z_t = Σ_i w_t(i) exp(−α_t y_i h_t(x_i)).
46
At the end of T iterations of this algorithm, we have h_1, h_2, ..., h_T. We also have α_1, α_2, ..., α_T, where α_t = ½ ln((1 − ε_t) / ε_t). Ensemble classifier: H(x) = sgn( Σ_{t=1}^{T} α_t h_t(x) ). Note that hypotheses with higher accuracy on their training sets are weighted more strongly.
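A compact sketch of this resampling variant of Adaboost. Decision stumps (via scikit-learn) stand in for the base learner L, and the data are assumed to be numpy arrays with labels in {−1, +1}:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, T=10, seed=0):
    """Resampling AdaBoost as in the slides; decision stumps stand in for L."""
    rng = np.random.default_rng(seed)
    N = len(y)
    w = np.full(N, 1.0 / N)                              # w_1(i) = 1/N
    hyps, alphas = [], []
    for t in range(T):
        idx = rng.choice(N, size=N, replace=True, p=w)   # draw S_t from S using w_t
        h = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
        pred = h.predict(X)
        eps = np.sum(w[pred != y])                       # weighted error on S
        if eps >= 0.5 or eps == 0:                       # eps == 0 guard avoids div by 0
            break
        alpha = 0.5 * np.log((1 - eps) / eps)
        w *= np.exp(-alpha * y * pred)                   # reweight the data
        w /= w.sum()                                     # normalize (Z_t)
        hyps.append(h)
        alphas.append(alpha)
    return hyps, alphas

def ensemble_predict(hyps, alphas, X):
    """H(x) = sgn( sum_t alpha_t h_t(x) )"""
    votes = sum(a * h.predict(X) for h, a in zip(hyps, alphas))
    return np.sign(votes)
```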
47
A Simple Example. t = 1. S = Spam8.train: x_1, x_2, x_3, x_4 (class +1); x_5, x_6, x_7, x_8 (class −1). w_1 = {1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8}. S_1 = {x_1, x_2, x_2, x_5, x_5, x_6, x_7, x_8}. Run svm_light on S_1 to get h_1. Run h_1 on S. Classifications: {1, −1, −1, −1, −1, −1, −1, −1}. Calculate error: ε_1 = w_1(2) + w_1(3) + w_1(4) = 3/8 (x_2, x_3, x_4 are misclassified).
48
Calculate α_1: α_1 = ½ ln((1 − ε_1) / ε_1). Calculate new w's: w_2(i) = w_1(i) exp(−α_1 y_i h_1(x_i)) / Z_1.
49
t = 2. w_2 = {0.102, 0.163, 0.163, 0.163, 0.102, 0.102, 0.102, 0.102}. S_2 = {x_1, x_2, x_2, x_3, x_4, x_4, x_7, x_8}. Run svm_light on S_2 to get h_2. Run h_2 on S. Classifications: {1, 1, 1, 1, 1, 1, 1, 1}. Calculate error: ε_2 = w_2(5) + w_2(6) + w_2(7) + w_2(8) = 4 × 0.102 = 0.408 (x_5 through x_8 are misclassified).
50
Calculate α_2: α_2 = ½ ln((1 − ε_2) / ε_2). Calculate new w's: w_3(i) = w_2(i) exp(−α_2 y_i h_2(x_i)) / Z_2.
51
t = 3. w_3 = {0.082, 0.139, 0.139, 0.139, 0.125, 0.125, 0.125, 0.125}. S_3 = {x_2, x_3, x_3, x_3, x_5, x_6, x_7, x_8}. Run svm_light on S_3 to get h_3. Run h_3 on S. Classifications: {1, 1, −1, 1, −1, −1, 1, −1}. Calculate error: ε_3 = w_3(3) + w_3(7) = 0.139 + 0.125 = 0.264 (x_3 and x_7 are misclassified).
52
Calculate α_3: α_3 = ½ ln((1 − ε_3) / ε_3). Ensemble classifier: H(x) = sgn( α_1 h_1(x) + α_2 h_2(x) + α_3 h_3(x) ).
53
On test examples 1–8. (Table of the classifications made by the hypotheses trained on S_1, S_2, S_3; in the scraped slide only the +1 cells survive: x_1 and x_4 are classified +1 by all three hypotheses, x_5, x_6, and x_8 by two of them, and x_2, x_3, and x_7 by one each.) Test accuracy: 3/8.
54
Genetic Algorithms
55
Selection methods: fitness proportionate selection, rank selection, elite selection, tournament selection.
56
Example. Fitness values: individual 1: 30, individual 2: 20, individual 3: 50, individual 4: 10. Fitness proportionate probabilities? Rank probabilities? Elite probabilities (top 50%)?
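One way to compute these three sets of probabilities. Rank and elite selection have several conventions, so the particular formulas below (selection probability proportional to rank position, uniform over the elite half) are just one reasonable choice:

```python
# Selection probabilities for the four individuals on this slide.
fitness = {1: 30, 2: 20, 3: 50, 4: 10}

# Fitness-proportionate: p_i = f_i / sum_j f_j
total = sum(fitness.values())
prop = {i: f / total for i, f in fitness.items()}

# Rank selection: rank 1 (worst) .. 4 (best); p_i = rank_i / sum of ranks
order = sorted(fitness, key=fitness.get)                 # worst to best
ranks = {ind: r for r, ind in enumerate(order, start=1)}
rank_p = {i: ranks[i] / sum(ranks.values()) for i in fitness}

# Elite selection (top 50%): the best half share probability uniformly, the rest get 0
elite = sorted(fitness, key=fitness.get, reverse=True)[:len(fitness) // 2]
elite_p = {i: (1 / len(elite) if i in elite else 0.0) for i in fitness}

print(prop)     # approximately {1: 0.27, 2: 0.18, 3: 0.45, 4: 0.09}
print(rank_p)   # {1: 0.3, 2: 0.2, 3: 0.4, 4: 0.1}
print(elite_p)  # individuals 3 and 1 each get 0.5, the others 0
```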
57
Reinforcement Learning / Q Learning
58
Q learning algorithm
– For each (s, a), initialize Q(s, a) to be zero (or a small value).
– Observe the current state s.
– Do forever:
Select an action a and execute it. Receive immediate reward r.
Learn:
– Observe the new state s′.
– Update the table entry for Q(s, a) as follows: Q(s, a) ← Q(s, a) + η (r + γ max_{a′} Q(s′, a′) − Q(s, a)).
– s ← s′.
60
Simple illustration of Q learning. C gives a reward of 5 points. Each action has a reward of −1. No other rewards or penalties. States are the numbered squares. Actions (N, E, S, W) are selected at random. Assume γ = 0.8, η = 1. (Grid figure: six squares, 1–3 on the top row and 4–6 on the bottom row; R marks the agent and C the reward square.)
61
Step 1. Current state s = 1.
Q(s, a)   N   S   E   W
   1      0   0   0   0
   2      0   0   0   0
   3      0   0   0   0
   4      0   0   0   0
   5      0   0   0   0
   6      0   0   0   0
62
Step 1. Current state s = 1. Select action a = Move South. (Q-table: all entries still 0.)
63
Step 1. Current state s = 1. Select action a = Move South. Reward r = −1. New state s′ = 4. (Q-table: all entries still 0.)
64
Step 1. Current state s = 1. Select action a = Move South. Reward r = −1. New state s′ = 4. Learn: Q(s, a) ← Q(s, a) + η (r + γ max_{a′} Q(s′, a′) − Q(s, a)), so Q(1, S) ← 0 + 1 · (−1 + 0.8 · 0 − 0) = −1. (Q-table: Q(1, S) = −1; all other entries 0.)
65
Step 1. Current state s = 1. Select action a = Move South. Reward r = −1. New state s′ = 4. Learn: Q(1, S) = −1, as above. Update state: current state = 4. (Q-table: Q(1, S) = −1; all other entries 0.)
66
Step 2. Current state s = 4. Select action a = ___. Reward r = ___. New state s′ = ___. Learn: Q(s, a) ← Q(s, a) + η (r + γ max_{a′} Q(s′, a′) − Q(s, a)). Update state: current state = ___. (Q-table: Q(1, S) = −1; all other entries 0.)
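A tabular sketch of this loop on the 2×3 grid. The square holding C and the exact wall behavior are assumptions (the slides leave step 2 as an exercise and never pin down C's square), so treat the dynamics as illustrative. Note that with η = 1 and γ = 0.8, the very first update from state 1 with action South reproduces Q(1, S) = −1 from step 1.

```python
import random

# 2x3 grid: squares 1-3 on the top row, 4-6 on the bottom row.
MOVES = {'N': -3, 'S': 3, 'E': 1, 'W': -1}
GOAL = 3          # assumed square for C; the slides don't state it explicitly
eta, gamma = 1.0, 0.8

def step(s, a):
    """Illustrative dynamics: every move costs -1; reaching C adds +5."""
    s2 = s + MOVES[a]
    off_grid = (s2 < 1 or s2 > 6 or
                (a == 'E' and s % 3 == 0) or (a == 'W' and s % 3 == 1))
    if off_grid:
        s2 = s                                    # bumped into a wall, stay put
    return s2, -1 + (5 if s2 == GOAL else 0)

Q = {(s, a): 0.0 for s in range(1, 7) for a in MOVES}
s = 1
for _ in range(1000):
    a = random.choice(list(MOVES))                # actions are selected at random
    s2, r = step(s, a)
    target = r + gamma * max(Q[(s2, a2)] for a2 in MOVES)
    Q[(s, a)] += eta * (target - Q[(s, a)])       # Q-learning update
    s = s2

print({k: round(v, 2) for k, v in Q.items() if k[0] == 1})   # learned values for state 1
```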