1
Machine Learning CPS4801
2
Research Day
Keynote Speaker
o Tuesday 9:30-11:00 STEM Lecture Hall (2nd floor)
o Meet-and-Greet 11:30 STEM 512
Faculty Presentation
o Tuesday 11:00-3:00 STEM
o Prof. Liou 2:00 Room 415
Student Poster
o Wednesday 10:00-3:00
o Computer Science 10:00-12:00 STEM Atrium
Schedule: http://orsp.kean.edu/ResearchDays_Schedule.html
3
Outline Introduction Decision tree learning Clustering Artificial Neural Networks Genetic algorithms
4
Learning from Examples An agent is learning if it improves its performance on future tasks after making observations about the world. One class of learning problem: o from a collection of input-output pairs, learn a function that predicts the output for new inputs.
5
Why learning?
The designer cannot anticipate all possible situations
o A robot designed to navigate mazes must learn the layout of each new maze.
The designer cannot anticipate all changes
o A program designed to predict tomorrow’s stock market prices must learn to adapt when conditions change.
Programmers sometimes have no idea how to program a solution
o recognizing faces
6
Types of Learning
Supervised learning
o given example input-output pairs, learn a function that maps inputs to outputs
Unsupervised learning
o correct answers are not given
o clustering: a taxi agent must develop a concept of “good traffic days” and “bad traffic days”
Reinforcement learning
o learning from rewards or punishments
o taxi agent: lack of a tip
o chess game: two points for a win
7
Supervised Learning
Learning a function/rule from specific input-output pairs is also called inductive learning.
Given a training set of N example pairs:
o (x1,y1), (x2,y2), ..., (xN,yN)
o where each yi was generated by an unknown target function y = f(x)
Problem: find a hypothesis h such that h ≈ f.
h generalizes well if it correctly predicts the value of y for novel examples (the test set).
8
Supervised Learning
When the output y is one of a finite set of values (sunny, cloudy, rainy), the learning problem is called classification.
o Boolean or binary classification when there are only two possible values
When y is a number (tomorrow’s temperature), the problem is called regression.
9
Inductive learning method The points are in the (x,y) plane, where y = f(x). We approximate f with h selected from a hypothesis space H. Construct/adjust h to agree with f on training set
10
Inductive learning method Construct/adjust h to agree with f on training set E.g., curve fitting:
11
Inductive learning method Construct/adjust h to agree with f on training set E.g., curve fitting:
12
Inductive learning method Construct/adjust h to agree with f on training set (h is consistent if it agrees with f on all examples) E.g., curve fitting:
13
Inductive learning method Construct/adjust h to agree with f on training set (h is consistent if it agrees with f on all examples) E.g., curve fitting: How to choose from among multiple consistent hypotheses?
14
Inductive learning method
Ockham’s razor: prefer the simplest hypothesis consistent with the data (named for the 14th-century English philosopher William of Ockham).
There is a tradeoff between complex hypotheses that fit the training data well and simpler hypotheses that may generalize better.
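As a concrete illustration of this tradeoff, the sketch below fits polynomials of increasing degree to a few noisy samples of an assumed target function; the target, noise level, and degrees are illustrative choices, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Assumed "unknown" target function, chosen only for illustration.
    return np.sin(2 * np.pi * x)

# Small noisy training set and a larger held-out test set.
x_train = rng.uniform(0, 1, 10)
y_train = f(x_train) + rng.normal(0, 0.1, 10)
x_test = rng.uniform(0, 1, 200)
y_test = f(x_test) + rng.normal(0, 0.1, 200)

for degree in (1, 3, 9):
    # Hypothesis h: a polynomial of the given degree fit to the training set.
    h = np.poly1d(np.polyfit(x_train, y_train, degree))
    train_mse = np.mean((h(x_train) - y_train) ** 2)
    test_mse = np.mean((h(x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

The degree-9 polynomial is (nearly) consistent with the 10 training points, but it typically does worse on the held-out data than the simpler fits, which is exactly the tradeoff the razor addresses.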
15
Cross-Validation
Split the labeled data (1566 examples) into 10 folds.
Train the model on 9 folds (approx. 1409 examples) and evaluate it on the remaining fold (approx. 157 examples).
Lather, rinse, repeat (10 times), then report the average.
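A minimal sketch of this 10-fold procedure; the made-up labels and the majority-class baseline are placeholders, and any learner with the same train_and_score signature could be plugged in.

```python
import numpy as np

def cross_validate(X, y, train_and_score, k=10, seed=0):
    """Split the labeled data into k folds; train on k-1 folds,
    evaluate on the held-out fold, repeat k times, report the average."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    folds = np.array_split(order, k)
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(train_and_score(X[train_idx], y[train_idx],
                                       X[test_idx], y[test_idx]))
    return float(np.mean(scores))

# Toy usage with made-up labels and a majority-class baseline "model".
X = np.arange(1566).reshape(-1, 1)            # 1566 labeled examples, as on the slide
y = (np.arange(1566) % 3 == 0).astype(int)

def majority_baseline(Xtr, ytr, Xte, yte):
    prediction = int(round(ytr.mean()))       # most common training label
    return float(np.mean(yte == prediction))  # accuracy on the held-out fold

print(cross_validate(X, y, majority_baseline))
```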
16
Learning decision trees One of the simplest and yet most successful forms of machine learning. A decision tree represents a function that takes as input a vector of attribute values and returns a “decision” – a single output. o discrete input, Boolean classification
17
Learning decision trees
Problem: decide whether to wait for a table at a restaurant, based on the following attributes:
1. Alternate: is there an alternative restaurant nearby?
2. Bar: is there a comfortable bar area to wait in?
3. Fri/Sat: is today Friday or Saturday?
4. Hungry: are we hungry?
5. Patrons: number of people in the restaurant (None, Some, Full)
6. Price: price range ($, $$, $$$)
7. Raining: is it raining outside?
8. Reservation: have we made a reservation?
9. Type: kind of restaurant (French, Italian, Thai, Burger)
10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)
18
Decision trees
One possible representation for hypotheses. E.g., the “true” tree for deciding whether to wait (it uses neither Price nor Type):
19
Expressiveness
Decision trees can express any function of the input attributes.
E.g., for Boolean functions, truth table row → path to leaf: Goal ⇔ (Path1 ∨ Path2 ∨ Path3 ∨ ...)
Trivially, there is a consistent decision tree for any training set with one path to leaf for each example.
Prefer to find more compact decision trees.
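For example, XOR of two Boolean attributes A and B is captured by a tree that tests A at the root and B on each branch; its four root-to-leaf paths correspond one-to-one to the four truth-table rows, and the two positive rows (A true, B false; A false, B true) become the paths ending in a positive leaf.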
21
Constructing the Decision Tree
Goal: Find the smallest decision tree consistent with the examples.
Divide-and-conquer: test the most important attribute first; this divides the problem up into smaller subproblems that can be solved recursively.
o “Most important”: the attribute that best splits the examples
Form tree with root = best attribute
For each value vi (or range) of the best attribute:
o Select those examples with best = vi
o Construct subtree_i by recursively calling the decision-tree learner on that subset of examples, with all attributes except best
o Add a branch to the tree with label = vi and subtree = subtree_i
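A minimal Python sketch of this recursion, assuming examples are (attribute-dict, label) pairs; choose_best here uses a simple majority-vote error count as a stand-in for the information-gain criterion introduced on the later slides.

```python
from collections import Counter

def plurality(examples):
    """Most common label among the examples (used for ties and empty splits)."""
    return Counter(label for _, label in examples).most_common(1)[0][0]

def choose_best(attributes, examples):
    """Stand-in for 'most important attribute': pick the attribute whose split
    leaves the fewest majority-vote errors (information gain is the textbook choice)."""
    def errors(attr):
        total = 0
        for v in {attrs[attr] for attrs, _ in examples}:
            labels = [label for attrs, label in examples if attrs[attr] == v]
            total += len(labels) - Counter(labels).most_common(1)[0][1]
        return total
    return min(attributes, key=errors)

def learn_tree(examples, attributes, parent_examples=()):
    if not examples:                      # no examples left: use parent's majority
        return plurality(parent_examples)
    labels = {label for _, label in examples}
    if len(labels) == 1:                  # all examples agree: leaf node
        return labels.pop()
    if not attributes:                    # no attributes left to test
        return plurality(examples)
    best = choose_best(attributes, examples)
    tree = {best: {}}
    for v in {attrs[best] for attrs, _ in examples}:
        subset = [(attrs, label) for attrs, label in examples if attrs[best] == v]
        remaining = [a for a in attributes if a != best]
        tree[best][v] = learn_tree(subset, remaining, examples)
    return tree

# Tiny illustrative call (attribute names echo the restaurant example):
examples = [({"Patrons": "Some", "Hungry": "Yes"}, True),
            ({"Patrons": "None", "Hungry": "No"}, False),
            ({"Patrons": "Full", "Hungry": "Yes"}, False)]
print(learn_tree(examples, ["Patrons", "Hungry"]))
```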
22
Decision tree learning Aim: find a small tree consistent with the training examples Idea: (recursively) choose "most significant" attribute as root of (sub)tree
23
Choosing an attribute Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative" Which is a better choice?
24
Attribute-based representations
Examples described by attribute values
A training set of 12 examples, e.g., situations where I will/won't wait for a table
Classification of examples is positive (T) or negative (F)
25
Choosing the Best Attribute: Binary Classification
We want a formal measure that returns a maximum value when an attribute makes a perfect split and a minimum value when it makes no distinction.
Information theory (Shannon and Weaver, 1949)
o Entropy: a measure of the uncertainty of a random variable
  A coin that always comes up heads → 0 bits
  A flip of a fair coin (heads or tails) → 1 bit
  The roll of a fair four-sided die → 2 bits
o Information gain: the expected reduction in entropy caused by partitioning the examples according to this attribute
26
Formula for Entropy
H(p1, ..., pn) = -Σi pi log2 pi
Examples:
Suppose we have a collection of 10 examples, 5 positive, 5 negative:
H(1/2, 1/2) = -(1/2) log2(1/2) - (1/2) log2(1/2) = 1 bit
Suppose we have a collection of 100 examples, 1 positive and 99 negative:
H(1/100, 99/100) = -0.01 log2(0.01) - 0.99 log2(0.99) ≈ 0.08 bits
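A quick way to check these numbers, assuming nothing beyond the formula above (a minimal sketch in Python):

```python
import math

def entropy(probabilities):
    """H(p1,...,pn) = -sum_i pi*log2(pi), treating 0*log2(0) as 0."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))       # fair coin: 1.0 bit
print(entropy([0.25] * 4))       # fair four-sided die: 2.0 bits
print(entropy([0.01, 0.99]))     # 1 positive, 99 negative: ~0.08 bits
```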
27
Information gain
Information gain (from an attribute test) = the difference between the original information requirement and the new requirement after the split.
Information Gain (IG), the reduction in entropy from the attribute test:
Gain(A) = I(p/(p+n), n/(p+n)) - Remainder(A), where
Remainder(A) = Σi (pi + ni)/(p + n) · I(pi/(pi+ni), ni/(pi+ni))
and I(·,·) is the entropy H of the two class proportions.
Choose the attribute with the largest IG.
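A minimal sketch of this computation for a Boolean classification, following the formula above (the counts and variable names are illustrative):

```python
import math

def bool_entropy(p, n):
    """I(p/(p+n), n/(p+n)): entropy of a set with p positive and n negative examples."""
    h = 0.0
    for count in (p, n):
        if count:
            q = count / (p + n)
            h -= q * math.log2(q)
    return h

def information_gain(split, p, n):
    """split: one (pi, ni) pair of counts per attribute value.
    Gain = entropy before the split minus the expected entropy after it."""
    remainder = sum((pi + ni) / (p + n) * bool_entropy(pi, ni) for pi, ni in split)
    return bool_entropy(p, n) - remainder

# A perfect split recovers all the original entropy...
print(information_gain([(6, 0), (0, 6)], p=6, n=6))   # 1.0
# ...while a split that leaves every subset with the same 50/50 mix gains nothing.
print(information_gain([(3, 3), (3, 3)], p=6, n=6))   # 0.0
```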
28
Information gain
For the whole training set, p = n = 6, so I(6/12, 6/12) = 1 bit.
Consider the attributes Patrons and Type (and the others too):
Patrons has the highest IG of all the attributes and so is chosen by the DTL algorithm as the root.
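Assuming the attribute-value counts of the standard 12-example restaurant set from AIMA (which these slides follow), the two gains work out roughly as:
Gain(Patrons) = 1 - [2/12·I(0/2, 2/2) + 4/12·I(4/4, 0/4) + 6/12·I(2/6, 4/6)] ≈ 1 - 0.459 ≈ 0.541 bits
Gain(Type) = 1 - [2/12·I(1/2, 1/2) + 2/12·I(1/2, 1/2) + 4/12·I(2/4, 2/4) + 4/12·I(2/4, 2/4)] = 0 bits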
29
Example contd. Decision tree learned from the 12 examples: Substantially simpler than the “true” tree