Supervised Learning I, Cont’d Reading: Bishop, Ch 14.4, 1.6, 1.5.

Administrivia I
- Machine learning reading group
  - Not part of / related to this class
  - We read advanced (current research) papers in the ML field
  - Might be of interest; all are welcome
  - Meets Fri, 2:00-3:30, FEC325 conf room
  - More info:
- Lecture notes online

Administrivia II
- Microsoft on campus for talk/recruiting Feb 5
  - Mock interviews
  - Hiring undergrads, grads, and interns
- Office hours: Wed 9:00-10:00 and 11:00-noon
- Final exam time/day
  - Incorrect in syllabus (last year's. Oops.)
  - Should be: Tues, May 8, 10:00-noon

Yesterday & today
- Last time:
  - Basic ML problem: definitions and such
  - Statement of the supervised learning problem
- Today:
  - HW 1 assigned
  - Hypothesis spaces
  - Intro to decision trees

Homework 1
- Due: Tues, Jan 30
- Bishop, problems 14.10, 14.11, 1.36, 1.38
- Plus:
  A) Show that entropy gain is concave (anti-convex)
  B) Show that a binary, categorical decision tree, using information gain as a splitting criterion, always increases purity. That is, information gain is non-negative for all possible splits, and is 0 only when the split leaves the data distribution unchanged in both leaves.
  (Reference definitions of entropy and information gain are sketched below.)
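For reference, here is a common textbook definition of entropy and of the information gain of a binary split into children S_L and S_R; this is an assumption about the intended notation, not taken from the slides, so check it against Bishop's conventions:

    H(S) = -\sum_{k=1}^{K} p_k \log_2 p_k
    \qquad
    \mathrm{IG}(S \to S_L, S_R) = H(S) - \frac{|S_L|}{|S|} H(S_L) - \frac{|S_R|}{|S|} H(S_R)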

Review of notation
- Feature (attribute): x_j
- Instance (example): x_i = (x_{i,1}, ..., x_{i,d})
- Label (class): y_i
- Feature space: the set of all possible instances
- Training data: D = {(x_1, y_1), ..., (x_N, y_N)}
(A toy example in code follows.)
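As a small, hypothetical illustration of this notation in code, assuming the usual convention of stacking instances as rows of a matrix (the numbers here are made up):

    import numpy as np

    # Training data: N = 4 instances, d = 2 features.
    # Each row of X is one instance x_i; each entry of y is its label y_i.
    X = np.array([[1.4, 3.5],
                  [4.7, 3.2],
                  [1.3, 3.0],
                  [5.1, 2.8]])   # feature space here is R^2
    y = np.array([0, 1, 0, 1])   # class labels

    N, d = X.shape               # number of instances, number of features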

Finally, goals
- Now that we have the training data X and the labels y, we have a (mostly) well defined job
- The supervised learning problem: find the function f that most closely approximates the "true" function that generated the labels

Goals?
Key questions:
- What candidate functions do we consider?
- What does "most closely approximates" mean?
- How do you find the one you're looking for?
- How do you know you've found the "right" one?

Hypothesis spaces
- The "true" function we want is usually called the target concept (also true model, target function, etc.)
- The set of all possible functions we'll consider is called the hypothesis space, H
- NOTE! The target concept is not necessarily part of the hypothesis space!
- Example hypothesis spaces (a small code sketch follows below):
  - All linear functions
  - Quadratic & higher-order functions
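A minimal sketch of what "choosing a hypothesis space" looks like in code, assuming a toy 1-D regression setting and using least-squares polynomial fits as the hypothesis spaces; the data and function names are illustrative, not from the lecture:

    import numpy as np

    # Toy 1-D data from an unknown target function (a sinusoid) plus noise.
    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 20)
    y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.shape)

    # Hypothesis space 1: all linear functions f(x) = w1*x + w0.
    linear_fit = np.polyfit(x, y, deg=1)

    # Hypothesis space 2: quadratic polynomials.
    quadratic_fit = np.polyfit(x, y, deg=2)

    # Each fit is the member of its hypothesis space that best matches the data;
    # neither space necessarily contains the true target function.
    print(np.poly1d(linear_fit))
    print(np.poly1d(quadratic_fit))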

Visually...
[Figure: the hypothesis space H drawn as a region within the space of all functions on the feature space; the target concept might be inside H, or it might be outside it]

More hypothesis spaces
Rules:

    if (x.skin == "fur") {
      if (x.liveBirth == "true") {
        return "mammal";
      } else {
        return "marsupial";
      }
    } else if (x.skin == "scales") {
      switch (x.color) {
        case ("yellow") { return "coral snake"; }
        case ("black")  { return "mamba snake"; }
        case ("green")  { return "grass snake"; }
      }
    } else {
      ...
    }

More hypothesis spaces
Decision trees
[Two example decision-tree figures omitted]

Finding a good hypothesis
- Our job is now: given a training set X in some feature space and labels y, find the best hypothesis f we can by searching the hypothesis space H
[Figure: H drawn as a region within the space of all functions on the feature space]

Measuring goodness
- What does it mean for a hypothesis to be "as close as possible"?
- Could be a lot of things
- For the moment, we'll think about accuracy: the fraction of training instances the hypothesis labels correctly
- (Or, with a higher sigma-shock factor: acc(f) = (1/N) \sum_{i=1}^{N} 1[f(x_i) = y_i], sketched in code below)
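A minimal sketch of the accuracy measure, assuming a hypothesis is just a function from an instance to a predicted label; the threshold rule and data here are made up for illustration:

    import numpy as np

    def accuracy(f, X, y):
        """Fraction of training instances on which hypothesis f predicts the correct label."""
        predictions = np.array([f(x) for x in X])
        return np.mean(predictions == y)

    # Example: a (hypothetical) threshold rule on the first feature.
    f = lambda x: 1 if x[0] > 2.0 else 0
    X = np.array([[1.4, 3.5], [4.7, 3.2], [1.3, 3.0], [5.1, 2.8]])
    y = np.array([0, 1, 0, 1])
    print(accuracy(f, X, y))   # 1.0 on this toy data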

Aside: Risk & loss functions
- The quantity R_emp(f) = (1/N) \sum_{i=1}^{N} L(f(x_i), y_i) is called a risk function
- A.k.a. an empirical loss function
- It is an approximation to the true (expected) loss: R(f) = E[L(f(x), y)]
- (Sort of) a measure of distance between the "true" concept and our approximation to it
[Figure: the distance between the target concept and the hypothesis, drawn in the space of all functions on the feature space]

Constructing DTs, intro
- Hypothesis space: the set of all trees, with all possible node labelings and all possible leaf labelings
  - How many are there?
- Proposed search procedure:
  1. Propose a candidate tree, t_i
  2. Evaluate accuracy of t_i w.r.t. X and y
  3. Keep the max-accuracy t_i
  4. Go to 1
- Will this work?

A more practical alg.
- Can't really search all possible trees
- Instead, construct a single tree
  - Greedily
  - Recursively
- At each step, pick the decision that most improves the current tree

A more practical alg.

    DecisionTree buildDecisionTree(X, Y) {
      // Input: instance set X, label set Y
      if (Y.isPure()) {
        // All labels agree: stop recursing and emit a leaf
        return new LeafNode(Y);
      } else {
        // Greedy step: pick the single best feature to split on
        // (a sketch of getBestSplitFeature follows below)
        Feature a = getBestSplitFeature(X, Y);
        DecisionNode n = new DecisionNode(a);
        // Partition the data by the chosen feature's values
        [X0, ..., Xk, Y0, ..., Yk] = a.splitData(X, Y);
        for (i = 0; i <= k; ++i) {
          // Recursively build a subtree for each partition
          n.addChild(buildDecisionTree(Xi, Yi));
        }
        return n;
      }
    }
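The pseudocode leaves getBestSplitFeature unspecified. Here is a sketch of one common choice, picking the categorical feature with the highest information gain; the function and variable names are illustrative, not part of the lecture's API:

    import numpy as np

    def entropy(y):
        """Entropy (in bits) of the label distribution y."""
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def best_split_feature(X, y):
        """Index of the categorical feature whose split maximizes information gain."""
        best_feature, best_gain = None, -np.inf
        for j in range(X.shape[1]):
            gain = entropy(y)
            for value in np.unique(X[:, j]):
                mask = X[:, j] == value
                gain -= mask.mean() * entropy(y[mask])   # subtract weighted child entropy
            if gain > best_gain:
                best_feature, best_gain = j, gain
        return best_feature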

A bit of geometric intuition
[Figure: scatterplot with x1 = petal length, x2 = sepal width]

The geometry of DTs
- A decision tree splits the space with a series of axis-orthogonal decision surfaces
  - A.k.a. axis-parallel
- Each test is equivalent to a half-space; intersecting the half-spaces along a path yields a set of hyper-rectangles (rectangles in d > 3 dimensional space)
- In each hyper-rectangle, the DT assigns a constant label
- So a DT is a piecewise-constant approximator over a set of hyper-rectangular regions (toy sketch below)
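A toy sketch of that piecewise-constant view: a depth-2 tree over two numerical features is just nested axis-parallel threshold tests, so every input falls into exactly one rectangle with a constant label. The thresholds and class names here are made up for illustration:

    def toy_tree(x1, x2):
        """Depth-2 decision tree over (x1, x2); each test is an axis-parallel half-space."""
        if x1 > 2.5:               # right half-plane
            if x2 > 3.0:
                return "class A"   # rectangle: x1 > 2.5 and x2 > 3.0
            return "class B"       # rectangle: x1 > 2.5 and x2 <= 3.0
        return "class C"           # rectangle: x1 <= 2.5 (any x2)

    print(toy_tree(4.0, 3.5))   # class A
    print(toy_tree(1.0, 3.5))   # class C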