The joy of data. Plus, bonus feature: fun with differentiation. Reading: DH&S Ch. 9.6.0-9.6.3.

The joy of data. Plus, bonus feature: fun with differentiation. Reading: DH&S Ch. 9.6.0-9.6.3

Administrivia Homework 1 due date moved to this Thurs (Feb 2) If you didn’t know this, subscribe to ml-class mail list

Where we are Last time: Bias Entropy The full DT alg Today: HW1 FAQ Some mathematical reminders Measuring performance

HW1 FAQ Q1: Waaaaah! I don’t know where to start! A1: Be at peace, grasshopper. Enlightenment is not a path to a door, but a road leading forever towards the horizon. A1’: I suggest the following problem order: 8, 1, 11, 5, 6

HW1 FAQ Q2: For problem 11, what should I turn in for the answer? A2: Basically, this calls for 3 things: 1. An algorithm (pseudocode) demonstrating how to learn a cost-sensitive tree: csTree=buildCostSensDT(X,y,Lambda) 2. An algorithm (pseudocode) for classifying a point given such a tree: label=costDTClassify(x) 3. A description of why these are the right algorithms

HW1 FAQ Q3: On problem 6, what’s Δi(s,t)? A3: A mistake. At least, that’s our best guess... A3’: I think what they mean is: “the change in impurity at node s, due to splitting on attribute t”. A3’’: What they really want is: “prove that Equation 5 is always non-negative”. Treat it as just Δi(N).
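For reference, the impurity drop for a binary split at node N is usually written (this is the standard DH&S-style definition; the homework’s exact “Equation 5” isn’t reproduced in this transcript):

  Δi(N) = i(N) − P_L · i(N_L) − (1 − P_L) · i(N_R)

where N_L and N_R are the children produced by the split and P_L is the fraction of N’s examples sent to the left child.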

HW1 FAQ Q4: What’s with that whole concavity thing? A4: It all boils down to this picture: [figure: the impurity i(P1) plotted against P1 as a concave curve, with the two child points Pa1 and Pb1 marked and the gap between the curve and the chord labeled Δi(N)]

HW1 FAQ Q4: What’s with that whole concavity thing? A4: It all boils down to this picture: Now you just have to prove that algebraically, for the general (c>2) case...
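A sketch of the algebra the picture stands in for, in the binary-split case (the general c > 2 case is the same argument with more terms in the mixture): the parent’s class-probability vector is the P_L-weighted mixture of the children’s,

  p(N) = P_L · p(N_L) + (1 − P_L) · p(N_R),

so if the impurity i(·) is a concave function of that vector, Jensen’s inequality gives

  i(N) = i(p(N)) ≥ P_L · i(p(N_L)) + (1 − P_L) · i(p(N_R)),

i.e., Δi(N) ≥ 0, with equality only when both children have the same class distribution (for strictly concave i).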

HW1 FAQ Q5: Do I really have to show you every single calculation for problem 8? A5: Yes. I’m a slave driver and I enjoy making students do mindless arithmetic make-work that a computer is better suited for. I also torture kittens and donate Brussels sprouts to food drives. A5’: No, of course not. You can use a computer to do some of the calculations. But I do want you to demonstrate: You know how to calculate each formula. You know which formula gets applied when. The impurity and gain at each node/split. Why each split you choose is the best one.
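If you do lean on a computer, a tiny helper like this does the bookkeeping for entropy and impurity drop (a sketch, not part of the assignment; the function names are mine):

  import numpy as np

  def entropy(labels):
      # entropy (in bits) of a sequence of class labels
      _, counts = np.unique(labels, return_counts=True)
      p = counts / counts.sum()
      return -np.sum(p * np.log2(p))

  def impurity_drop(parent_labels, child_label_lists):
      # parent entropy minus the size-weighted entropies of the children
      n = len(parent_labels)
      weighted = sum(len(c) / n * entropy(c) for c in child_label_lists)
      return entropy(parent_labels) - weighted

  # toy check: 5 '+' and 4 '-' at the parent, split into (3 '+') and (2 '+', 4 '-')
  print(impurity_drop(['+']*5 + ['-']*4, [['+']*3, ['+']*2 + ['-']*4]))  # ≈ 0.38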

HW1 FAQ Q6: How do I show what the maximum entropy of something is? Or that it’s concave, for that matter? A6: That brings us to...

5 minutes of math Finding the maximum of a function (of 1 variable): Old rule from calculus:
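(The formula on the slide isn’t preserved in this transcript, but the rule being referred to is the standard one-variable test: find x* with f′(x*) = 0; then f″(x*) < 0 means x* is a local maximum, and f″(x*) > 0 means it is a local minimum.)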

5 minutes of math Ok, so what’s the multivariate equivalent? Ex: What’s the derivative of this with respect to x?

Rule is: for a scalar function of a vector argument f(x) First derivative w.r.t. x is the vector of first partials:
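That is, the gradient:

  ∇_x f(x) = [ ∂f/∂x_1, ∂f/∂x_2, …, ∂f/∂x_d ]^T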

5 minutes of math Second derivative: Hessian matrix Matrix of all possible second partial combinations
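In symbols, for x in R^d:

  H = ∇²f(x), with entries H_ij = ∂²f / (∂x_i ∂x_j)

a d×d matrix, symmetric whenever the second partials are continuous.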

5 minutes of math Equivalent of the second derivative test: the Hessian matrix is negative definite, i.e., all eigenvalues of H are negative Use this to show: an extremum is a maximum; the system is concave
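A quick numerical way to check this (a sketch; the helper name is mine):

  import numpy as np

  def is_negative_definite(H):
      # all eigenvalues of the symmetric matrix H negative?
      return bool(np.all(np.linalg.eigvalsh(H) < 0))

  # e.g., f(x, y) = -(x**2 + y**2) has the constant Hessian below,
  # so its stationary point at the origin is a maximum
  H = np.array([[-2.0, 0.0],
                [0.0, -2.0]])
  print(is_negative_definite(H))   # True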

Exercise Given the function: Find the extremum Show that the extremum is really a minimum
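(The particular function from the slide isn’t preserved in this transcript, but the procedure looks like this for, say, f(x, y) = x² + xy + y², a stand-in of mine, not necessarily the slide’s:

  ∇f = [ 2x + y, x + 2y ]^T = 0  ⟹  (x, y) = (0, 0)

  H = [[2, 1], [1, 2]], with eigenvalues 1 and 3, both positive ⟹ H is positive definite ⟹ (0, 0) is a minimum.)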

Back to performance...

Separation of train & test Fundamental principle (1st amendment of ML): Don’t evaluate accuracy (performance) of your classifier (learning system) on the same data used to train it!

Holdout data Usual to “hold out” a separate set of data for testing, not used to train the classifier A.k.a. test set, holdout set, evaluation set, etc. E.g., accuracy measured on the training data is training-set (or empirical) accuracy; accuracy measured on the held-out data is test-set (or generalization) accuracy
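(The formulas from the original slide aren’t preserved here; the standard 0/1 versions would be:

  acc_train = (1/|D_train|) · Σ_{(x_i, y_i) ∈ D_train} 1[ f(x_i) = y_i ]
  acc_test  = (1/|D_test|)  · Σ_{(x_i, y_i) ∈ D_test}  1[ f(x_i) = y_i ]

where f is the learned classifier and 1[·] is 1 when the prediction matches the true label and 0 otherwise.)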

Gotchas... What if you’re unlucky when you split data into train/test? E.g., all train data are class A and all test are class B? No “red” things show up in the training data Best answer: stratification Try to make sure class (+feature) ratios are the same in train/test sets (and the same as in the original data) Why does this work?
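In practice, one common way to do this (a sketch with made-up data; scikit-learn’s train_test_split accepts a stratify argument):

  import numpy as np
  from sklearn.model_selection import train_test_split

  rng = np.random.default_rng(0)
  X = rng.normal(size=(100, 2))           # toy features
  y = np.array([0] * 80 + [1] * 20)       # imbalanced labels, 80/20

  # stratify=y keeps the 80/20 class ratio in both halves of the split
  Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25,
                                        stratify=y, random_state=0)
  print(ytr.mean(), yte.mean())           # both come out ≈ 0.2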

Gotchas... What if you’re unlucky when you split data into train/test? E.g., all train data are class A and all test are class B? No “red” things show up in training data Almost as good: randomization Shuffle data randomly before split Why does this work?
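And the “almost as good” version, just shuffling the rows before a plain split (again a sketch with made-up data):

  import numpy as np

  rng = np.random.default_rng(0)
  X = rng.normal(size=(100, 2))
  y = np.array([0] * 80 + [1] * 20)

  perm = rng.permutation(len(y))          # random shuffle of the row indices
  X, y = X[perm], y[perm]
  split = int(0.75 * len(y))              # then a contiguous 75/25 split
  Xtr, ytr, Xte, yte = X[:split], y[:split], X[split:], y[split:]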

Gotchas What if the data is small? N=50 or N=20 or even N=10 Can’t do perfect stratification Can’t get representative accuracy from any single train/test split

Gotchas No good answer Common answer: cross-validation Shuffle data vectors Break into k chunks Train on first k-1 chunks Test on last 1 Repeat, with a different chunk held-out Average all test accuracies together

Gotchas In code:

  for (i = 0; i < k; ++i) {
    [Xtrain, Ytrain, Xtest, Ytest] = splitData(X, Y, N/k, i);
    model[i]  = train(Xtrain, Ytrain);
    cvAccs[i] = measureAcc(model[i], Xtest, Ytest);
  }
  avgAcc = mean(cvAccs);
  stdAcc = stddev(cvAccs);
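A runnable version of the same loop, as a sketch (using scikit-learn’s KFold and a decision tree on a built-in dataset; variable names mirror the pseudocode above):

  import numpy as np
  from sklearn.datasets import load_iris
  from sklearn.model_selection import KFold
  from sklearn.tree import DecisionTreeClassifier

  X, y = load_iris(return_X_y=True)
  kf = KFold(n_splits=5, shuffle=True, random_state=0)      # shuffle, then k chunks

  cvAccs = []
  for train_idx, test_idx in kf.split(X):
      model = DecisionTreeClassifier().fit(X[train_idx], y[train_idx])
      cvAccs.append(model.score(X[test_idx], y[test_idx]))  # accuracy on the held-out chunk

  print(np.mean(cvAccs), np.std(cvAccs))                    # avgAcc, stdAcc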

CV in pix: original data [X; y] → random shuffle → [X’; y’] → k-way partition → [X1’; y1’], [X2’; y2’], …, [Xk’; yk’] → k train/test sets → k accuracies (e.g., 53.7%, 85.1%, 73.2%)