Machine Learning CS311 David Kauchak Spring 2013 Some material borrowed from: Sara Owsley Sood and others.



Admin Two talks this week –Tuesday lunch –Thursday lunch Midterm exam posted later today Assignment 4 –How’s it going? –Due Friday at 6pm No office hours on Friday

Classifiers so far Naïve Bayes k-nearest neighbors (k-NN) Both fairly straightforward to understand and implement How do they work?

Many, many classifiers

Many, many classifiers: what we will cover Problem setup –Training vs. testing –Feature-based learning –Evaluation Introduction to a few models Model comparison –When to apply one vs the other (maybe…) –How models differ –Pros/cons

Many, many classifiers: what we won’t cover –quite a few models –the theoretical underpinnings (we won’t dive too deep) –meta-learning (i.e. combining classifiers) –recommender systems, aka collaborative filtering (but can be viewed as a classification problem)

Bias/Variance Bias: How well does the model predict the training data? –high bias – the model doesn’t do a good job of predicting the training data (high training set error) –The model predictions are biased by the model Variance: How sensitive to the training data is the learned model? –high variance – changing the training data can drastically change the learned model

Bias/Variance Another way to think about it is model complexity Simple models –may not model data well –high bias Complicated models –may overfit to the training data –high variance

Bias/variance trade-off We want to fit a polynomial to this data: which one should we use?
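To make the polynomial question concrete, here is a small sketch (assuming NumPy is available; the quadratic-plus-noise data is made up for illustration, not taken from the slides) comparing training error as the polynomial degree grows:

```python
import numpy as np

# Hypothetical data: a quadratic with a little noise, standing in for
# the points on the slide.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 20)
y = x**2 + rng.normal(scale=0.1, size=x.size)

train_error = {}
for degree in (1, 2, 9):
    coeffs = np.polyfit(x, y, degree)                  # least-squares fit
    train_error[degree] = float(np.mean((np.polyval(coeffs, x) - y) ** 2))
    print(degree, train_error[degree])
```

Training error always shrinks as the degree grows, but the degree-1 line underfits (high bias) while the degree-9 curve starts chasing the noise (high variance); only held-out error would expose the latter.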

Bias/variance trade-off High variance OR high bias? Bias: How well does the model predict the training data? –high bias – the model doesn’t do a good job of predicting the training data (high training set error) –The model predictions are biased by the model Variance: How sensitive to the training data is the learned model? –high variance – changing the training data can drastically change the learned model

Bias/variance trade-off High bias Bias: How well does the model predict the training data? –high bias – the model doesn’t do a good job of predicting the training data (high training set error) –The model predictions are biased by the model Variance: How sensitive to the training data is the learned model? –high variance – changing the training data can drastically change the learned model

Bias/variance trade-off High variance OR high bias? Bias: How well does the model predict the training data? –high bias – the model doesn’t do a good job of predicting the training data (high training set error) –The model predictions are biased by the model Variance: How sensitive to the training data is the learned model? –high variance – changing the training data can drastically change the learned model

Bias/variance trade-off High variance Bias: How well does the model predict the training data? –high bias – the model doesn’t do a good job of predicting the training data (high training set error) –The model predictions are biased by the model Variance: How sensitive to the training data is the learned model? –high variance – changing the training data can drastically change the learned model

Bias/variance trade-off What do we want? Bias: How well does the model predict the training data? –high bias – the model doesn’t do a good job of predicting the training data (high training set error) –The model predictions are biased by the model Variance: How sensitive to the training data is the learned model? –high variance – changing the training data can drastically change the learned model

Bias/variance trade-off Compromise between bias and variance Bias: How well does the model predict the training data? –high bias – the model doesn’t do a good job of predicting the training data (high training set error) –The model predictions are biased by the model Variance: How sensitive to the training data is the learned model? –high variance – changing the training data can drastically change the learned model

k-NN vs. Naïve Bayes How do k-NN and NB sit on the variance/bias spectrum? k-NN has high variance and low bias –more complicated model –can model any boundary –but very dependent on the training data NB has low variance and high bias –decision surface has to be linear (more on this later) –cannot model all data –but less variation based on the training data

Bias vs. variance : Choosing the correct model capacity Which separating line should we use?

Playing tennis You want to decide whether or not to play tennis today –Outlook: Sunny, Overcast, Rain –Humidity: High, Normal –Wind: Strong, Weak Tell me what your classifier should do.

A decision tree is an intuitive way of representing a decision: a tree with internal nodes labeled by features, branches labeled by tests on that feature (e.g. outlook = sunny, x > 100), and leaves labeled with classes
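The representation described here can be sketched directly in code. The tree below is illustrative (the features, branch values, and labels are made up in the spirit of the tennis example, not taken from a slide):

```python
# Internal nodes test a feature, branches correspond to outcomes,
# and leaves hold a class label.
tree = ("outlook",                      # feature tested at the root
        {"sunny":    ("humidity", {"high": "no", "normal": "yes"}),
         "overcast": "yes",             # a leaf: the class label
         "rain":     "yes"})

def classify(tree, example):
    """Walk from the root to a leaf, following the branch that matches."""
    if isinstance(tree, str):           # reached a leaf
        return tree
    feature, branches = tree
    return classify(branches[example[feature]], example)

print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # yes
```

Classifying an example is just a walk from the root to a leaf, answering one feature test per node.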

Another decision tree Document classification: wheat or not wheat?

Another decision tree Document: wheat, agriculture, bushel Which category?

Decision tree learning Class 1 Class 2 Class 3 What does a decision node look like in the feature space?

Decision tree learning Class 1 Class 2 Class 3 A node in the tree is a threshold on that dimension (test: x > threshold vs. x < threshold)

Decision tree learning Class 1 Class 2 Class 3 Features are x and y A node in the tree is a threshold on that dimension How could we learn a tree from data?

Decision tree learning Class 1 Class 2 Class 3 Features are x and y A node in the tree is a threshold on that dimension How could we learn a tree from data? (first split: a threshold test on x)

Decision tree learning Class 1 Class 2 Class 3 Features are x and y A node in the tree is a threshold on that dimension How could we learn a tree from data? (splits: a threshold test on x, then one on y)

Decision Tree Learning Start at the top and work our way down –Examine all of the features to see which feature best separates the data –Split the data into subsets based on the feature test –Test the remaining features to see which best separates the data in each subset –Repeat this process in all branches until:

Decision Tree Learning Start at the top and work our way down –Examine all of the features to see which feature best separates the data –Split the data into subsets based on the feature test –Test the remaining features to see which best separates the data in each subset –Repeat this process in all branches until: all examples in a subset are of the same type there are no examples left (or some small number left) there are no attributes left

Decision Tree Learning Start at the top and work our way down –Examine all of the features to see which feature best separates the data –Split the data into subsets based on the feature test –Test the remaining features to see which best separates the data in each subset –Repeat this process in all branches until: all examples in a subset are of the same type there are no examples left there are no attributes left Ideas?
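A minimal sketch of this top-down procedure in plain Python. The training data is hypothetical (in the spirit of the tennis example), and "best separates" is scored here by the weighted entropy left after the split, which is equivalent to picking the highest information gain, introduced a few slides later:

```python
from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def build_tree(examples, labels, features):
    # Stop when a subset is pure or no features remain (the slide's
    # criteria); branches below only cover observed feature values,
    # so recursive subsets are never empty.
    if not features or len(set(labels)) == 1:
        return Counter(labels).most_common(1)[0][0]    # (majority) label
    # "Best" feature: the one whose split leaves the least weighted
    # entropy in the resulting subsets.
    def split_score(f):
        score = 0.0
        for value in {e[f] for e in examples}:
            sub = [l for e, l in zip(examples, labels) if e[f] == value]
            score += len(sub) / len(labels) * entropy(sub)
        return score
    best = min(features, key=split_score)
    branches = {}
    for value in {e[best] for e in examples}:
        sub_ex = [e for e in examples if e[best] == value]
        sub_lb = [l for e, l in zip(examples, labels) if e[best] == value]
        branches[value] = build_tree(sub_ex, sub_lb, features - {best})
    return (best, branches)

def predict(tree, example):
    while not isinstance(tree, str):
        feature, branches = tree
        tree = branches[example[feature]]
    return tree

# Hypothetical training data.
examples = [{"outlook": "sunny", "humidity": "high"},
            {"outlook": "sunny", "humidity": "normal"},
            {"outlook": "overcast", "humidity": "high"},
            {"outlook": "rain", "humidity": "normal"}]
labels = ["no", "yes", "yes", "yes"]
tree = build_tree(examples, labels, {"outlook", "humidity"})
print(tree)
```

Each recursive call repeats the same examine/split/recurse steps on a smaller subset, exactly as the slide describes.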

KL-Divergence Given two probability distributions P and Q When is this large? small?

KL-Divergence Given two probability distributions P and Q When P = Q, D_KL(P||Q) = 0

KL-Divergence Given two probability distributions P and Q, the example distributions on the slide give D_KL(P||Q) = 6.89. KL-divergence is a measure of the distance between two probability distributions (though it’s not a distance metric!)
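As a sketch of the computation (the base-2 logarithm is an assumption — the slides don’t fix a base — and the example distributions here are made up, since the slide’s values didn’t survive transcription):

```python
import math

def kl_divergence(p, q):
    """D_KL(P||Q) = sum over x of P(x) * log2(P(x) / Q(x)).
    Assumes q[x] > 0 wherever p[x] > 0 (otherwise the divergence
    is infinite)."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

p = {1: 0.5, 2: 0.5}
q = {1: 0.9, 2: 0.1}
print(kl_divergence(p, p))   # 0.0 when the distributions match
print(kl_divergence(p, q))   # positive, and grows as Q diverges from P
```

Note the asymmetry: D_KL(P||Q) and D_KL(Q||P) generally differ, which is why it is not a distance metric.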

Information Gain KL-divergence is a measure of the distance between two probability distributions (though it’s not a distance metric!) What is this asking?

Information Gain What is the distance between the probability of a class (i.e. the prior) and the probability of that class conditioned on f? What information do we gain about the class decision, given the feature f? Use information gain to decide the most informative feature to split on
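A small sketch of the information-gain computation, written as entropy of the class minus the expected entropy after conditioning on the feature (the outlook/play data is hypothetical, echoing the tennis example):

```python
from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """H(class) - H(class | feature): how much knowing the feature's
    value reduces uncertainty about the class."""
    remainder = 0.0
    for v in set(feature_values):
        subset = [l for fv, l in zip(feature_values, labels) if fv == v]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical data: outlook values paired with play/don't-play labels.
outlook = ["sunny", "sunny", "overcast", "rain"]
play    = ["no",    "no",    "yes",      "yes"]
print(information_gain(outlook, play))  # 1.0: outlook fully determines play
```

A gain of 0 would mean the feature tells us nothing about the class; the feature with the largest gain is the one to split on.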

Decision tree learning Class 1 Class 2 Features are x and y A node in the tree is threshold on that dimension What would be the learned tree?

Decision tree learning Class 1 Class 2 Features are x and y A node in the tree is threshold on that dimension Do you think this is right?

Overfitting Decision trees: high-variance or high-bias? Bias: How well does the model predict the training data? –high bias – the model doesn’t do a good job of predicting the training data (high training set error) –The model predictions are biased by the model Variance: How sensitive to the training data is the learned model? –high variance – changing the training data can drastically change the learned model

Overfitting Decision trees can have a high variance The model can be too complicated and overfit to the training data Ideas?

Pruning Ideas?

Pruning Measure accuracy on a hold-out set (i.e. not used for training) –Stop splitting when accuracy decreases –Prune tree from the bottom up, replacing split nodes with the majority label, while accuracy doesn’t decrease Other ways look at the complexity of the model with respect to characteristics of the training data: –Don’t split if information gain gets below a certain threshold –Don’t split if the number of examples is below some threshold –Don’t split if tree depth goes beyond a certain depth –…
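The "don’t split" (pre-pruning) checks above can be sketched as one predicate; the threshold values here are illustrative defaults, not numbers from the slides:

```python
def should_stop(info_gain, n_examples, depth,
                min_gain=0.01, min_examples=5, max_depth=10):
    """Return True if a node should become a leaf instead of splitting:
    the split gains too little information, covers too few examples,
    or would make the tree too deep."""
    return (info_gain < min_gain
            or n_examples < min_examples
            or depth >= max_depth)

print(should_stop(info_gain=0.2, n_examples=50, depth=3))    # False: keep splitting
print(should_stop(info_gain=0.001, n_examples=50, depth=3))  # True: gain too small
```

In practice these thresholds are themselves tuned on held-out data, since setting them too aggressively trades overfitting for underfitting.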

Decision trees: good and bad Good –Very human friendly: easy to understand, people can modify it –fairly quick to train Bad –overfitting/pruning can be tricky –greedy approach: if you make a split you’re stuck with it –performance is OK, but we can do better

Midterm Open book –still only 2 hours, so don’t rely on it too much Anything we’ve talked about in class or read about is fair game Written questions are a good place to start

Review Intro to AI –what is AI –goals –challenges –problem areas

Review Uninformed search –reasoning through search –agent paradigm (sensors, actuators, environment, etc.) –setting up problems as search state space (starting state, next state function, goal state) actions costs –problem characteristics observability determinism known/unknown state space –techniques BFS DFS uniform cost search depth limited search Iterative deepening

Review Uninformed search cont. –things to know about search algorithms time space completeness optimality when to use them –graph search vs. tree search Informed search –heuristic function admissibility combining functions dominance –methods greedy best-first search A*

Review Adversarial search –game playing through search ply depth branching factor state space sizes optimal play –game characteristics observability # of players discrete vs. continuous real-time vs. turn-based determinism

Review Adversarial search cont. –minimax algorithm –alpha-beta pruning optimality, etc. –evaluation functions (heuristics) horizon effect –improvements transposition table history/end-game tables –dealing with chance/non-determinism expected minimax –dealing with partially observable games

Review Local search –when to use/what types of problems –general formulation –hill-climbing greedy random restarts randomness simulated annealing local beam search –genetic algorithms

Review Basic probability –why probability (vs. say logic)? –vocabulary experiment sample event random variable probability distribution –unconditional/prior probability –joint distribution –conditional probability –Bayes rule –estimating probabilities

Review Machine learning (up through last Thursday) –Bayesian classification problem formulation, argmax, etc. NB model –k-nearest neighbor –training, testing, evaluation –bias vs. variance –model characteristics