MACHINE LEARNING

What is learning?
- A computer program learns if it improves its performance at some task through experience (T. Mitchell, 1997).
- Any change in a system that allows it to perform better (Simon, 1983).

What do we learn?
- Descriptions
- Rules for how to recognize/classify objects, states, and events
- Rules for how to transform an initial situation to achieve a goal (final state)

How do we learn?
- Rote learning: storage of computed information.
- Taking advice from others (advice may need to be operationalized).
- Learning from problem-solving experiences: remembering experiences and generalizing from them (may add efficiency but not new knowledge).
- Learning from examples (may or may not involve a teacher).
- Learning by experimentation and discovery (decreasing burden on the teacher, increasing burden on the learner).

Approaches to Machine Learning
- Symbol-based learning
- Connectionist learning
- Evolutionary learning

Inductive Symbol-Based Machine Learning
- Concept learning:
  - version space search
  - decision trees: the ID3 algorithm
- Explanation-based learning
- Supervised learning
- Reinforcement learning

Version space search for concept learning
- Concepts describe classes of objects.
- Concepts consist of feature sets.
- Operations on concept descriptions (sketched in code below):
  - Generalization: replace a feature with a variable.
  - Specialization: instantiate a variable with a feature.
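As a minimal sketch, assuming hypotheses are encoded as tuples of feature values with '?' standing for a variable (the same encoding the worked example later in these slides uses), the two operations look like this in Python:

```python
# A tiny illustration of generalization and specialization on a hypothesis
# encoded as a tuple of feature values; '?' stands for a variable.

def generalize(hypothesis, i):
    """Replace the feature at position i with a variable."""
    return hypothesis[:i] + ('?',) + hypothesis[i + 1:]

def specialize(hypothesis, i, value):
    """Instantiate the variable at position i with a concrete feature."""
    assert hypothesis[i] == '?'
    return hypothesis[:i] + (value,) + hypothesis[i + 1:]

h = ('Japan', 'Honda', 'Blue')
g = generalize(h, 2)             # ('Japan', 'Honda', '?')
s = specialize(g, 2, 'White')    # ('Japan', 'Honda', 'White')
```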

Positive and negative examples of a concept
- The concept description has to match all positive examples.
- The concept description has to be false for the negative examples.

Plausible descriptions
The version space represents all the alternative plausible descriptions of the concept. A plausible description is one that is applicable to all known positive examples and no known negative examples.

Algorithm: Candidate elimination
Given:
- a representation language;
- a set of positive and negative examples expressed in that language.
Compute: a concept description that is consistent with all the positive examples and none of the negative examples.

Hypotheses
The version space contains two sets of hypotheses:
- G: the most general hypotheses that match the training data
- S: the most specific hypotheses that match the training data
Each hypothesis is represented as a vector of values of the known attributes.

Example of a version space
Consider the task of learning a description of the concept "Japanese economy car". The attributes under consideration are: Origin, Manufacturer, Color, Decade, Type.
Training data:
- Positive example: (Japan, Honda, Blue, 1980, Economy)
- Positive example: (Japan, Honda, White, 1980, Economy)
- Negative example: (Japan, Toyota, Green, 1970, Sports)

Example continued
The most general hypothesis that matches the data is (?, Honda, ?, ?, Economy), where the symbol '?' means that the attribute may take any value. The most specific hypothesis that matches the examples is (Japan, Honda, ?, ?, Economy).

Algorithm: Candidate elimination
1. Initialize G to contain one element: the null description (all features are variables).
2. Initialize S to contain one element: the first positive example.
3. Accept a new training example.

Matching positive examples
Remove from G any descriptions that do not cover the example. Update the S set to contain the most specific set of descriptions in the version space that cover the example and the current elements of the S set (i.e., generalize the elements of S as little as possible so that they cover the new training example).

Matching negative examples
Remove from S any descriptions that cover the negative example. Update the G set to contain the most general set of descriptions in the version space that do not cover the example (i.e., specialize the elements of G as little as possible so that the negative example is no longer covered by any of the elements of G).

Comparing G and S
If S and G are both singleton sets, then:
- if they are identical, output their value and halt;
- if they are different, the training cases were inconsistent: output this result and halt.
Otherwise, continue accepting new training examples. The complete procedure is sketched in code below.
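The whole procedure fits in a short Python sketch. This is a simplified reading of the slides for conjunctive attribute-vector hypotheses, not Mitchell's full formulation; the helper names are illustrative, and the first training example is assumed to be positive:

```python
# A sketch of candidate elimination for hypotheses encoded as tuples of
# feature values, with '?' meaning "any value". Simplifying assumption:
# the first training example is positive.

def covers(h, example):
    """A hypothesis covers an example if every non-'?' feature agrees."""
    return all(a == '?' or a == b for a, b in zip(h, example))

def is_more_general(h1, h2):
    """True if h1 covers everything h2 covers."""
    return all(a == '?' or a == b for a, b in zip(h1, h2))

def minimally_generalize(h, example):
    """Generalize h as little as possible so that it covers the example."""
    return tuple(a if a == b else '?' for a, b in zip(h, example))

def minimal_specializations(h, example, s):
    """Specialize h as little as possible so the negative example is no
    longer covered, using values from S to stay inside the version space."""
    out = []
    for i, a in enumerate(h):
        if a == '?' and s[i] != '?' and s[i] != example[i]:
            out.append(h[:i] + (s[i],) + h[i + 1:])
    return out

def candidate_elimination(examples):
    """examples: a list of (attribute_tuple, is_positive) pairs."""
    s = examples[0][0]                       # S: the first positive example
    g = [tuple('?' for _ in s)]              # G: the null description
    for example, positive in examples:
        if positive:
            g = [h for h in g if covers(h, example)]
            s = minimally_generalize(s, example)
        else:
            new_g = []
            for h in g:
                if covers(h, example):
                    new_g.extend(minimal_specializations(h, example, s))
                else:
                    new_g.append(h)
            # keep only the maximally general members of G
            g = [h for h in new_g
                 if not any(h2 != h and is_more_general(h2, h)
                            for h2 in new_g)]
    return s, g
```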

Learning the concept of "Japanese economy car"
Features: Origin, Manufacturer, Color, Decade, Type
POSITIVE EXAMPLE: (Japan, Honda, Blue, 1980, Economy)
Initialize G to a singleton set that includes everything; initialize S to a singleton set containing the first positive example.
G = {(?, ?, ?, ?, ?)}
S = {(Japan, Honda, Blue, 1980, Economy)}

Example continued
NEGATIVE EXAMPLE: (Japan, Toyota, Green, 1970, Sports)
Specialize G to exclude the negative example.
G = {(?, Honda, ?, ?, ?), (?, ?, Blue, ?, ?), (?, ?, ?, 1980, ?), (?, ?, ?, ?, Economy)}
S = {(Japan, Honda, Blue, 1980, Economy)}

Example continued
POSITIVE EXAMPLE: (Japan, Toyota, Blue, 1990, Economy)
Remove from G the descriptions inconsistent with the positive example; generalize S to include the positive example.
G = {(?, ?, Blue, ?, ?), (?, ?, ?, ?, Economy)}
S = {(Japan, ?, Blue, ?, Economy)}

Example continued
NEGATIVE EXAMPLE: (USA, Chrysler, Red, 1980, Economy)
Specialize G to exclude the negative example (while staying within the version space, i.e., staying consistent with S).
G = {(?, ?, Blue, ?, ?), (Japan, ?, ?, ?, Economy)}
S = {(Japan, ?, Blue, ?, Economy)}

Example continued
POSITIVE EXAMPLE: (Japan, Honda, White, 1980, Economy)
Remove from G the descriptions inconsistent with the positive example; generalize S to include the positive example.
G = {(Japan, ?, ?, ?, Economy)}
S = {(Japan, ?, ?, ?, Economy)}
S = G, both singleton => done!
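Feeding this training sequence to the candidate_elimination sketch above reproduces the trace (the tuple encoding is an assumption; the resulting boundary sets are the ones shown in the slides):

```python
examples = [
    (('Japan', 'Honda', 'Blue', 1980, 'Economy'), True),
    (('Japan', 'Toyota', 'Green', 1970, 'Sports'), False),
    (('Japan', 'Toyota', 'Blue', 1990, 'Economy'), True),
    (('USA', 'Chrysler', 'Red', 1980, 'Economy'), False),
    (('Japan', 'Honda', 'White', 1980, 'Economy'), True),
]

s, g = candidate_elimination(examples)
print(s)   # ('Japan', '?', '?', '?', 'Economy')
print(g)   # [('Japan', '?', '?', '?', 'Economy')]
```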

Decision trees
A decision tree is a structure that represents a procedure for classifying objects based on their attributes. Each object is represented as a set of attribute/value pairs and a classification.

Example
A set of medical symptoms might be represented as follows:

         Cough   Fever   Weight   Pain      Classification
Mary     no      yes     normal   throat    flu
Fred     no      yes     normal   abdomen   appendicitis
Julie    yes     yes     skinny   none      flu
Elvis    yes     no      obese    chest     heart disease

The system is given a set of training instances along with their correct classifications and develops a decision tree based on these examples.
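For later use with the ID3 sketch below, the table can be encoded in the attribute/value representation just described (the dict encoding is hypothetical, not part of the slides):

```python
# The medical example above as attribute/value pairs plus a classification.
patients = [
    {'cough': 'no',  'fever': 'yes', 'weight': 'normal', 'pain': 'throat',  'class': 'flu'},
    {'cough': 'no',  'fever': 'yes', 'weight': 'normal', 'pain': 'abdomen', 'class': 'appendicitis'},
    {'cough': 'yes', 'fever': 'yes', 'weight': 'skinny', 'pain': 'none',    'class': 'flu'},
    {'cough': 'yes', 'fever': 'no',  'weight': 'obese',  'pain': 'chest',   'class': 'heart disease'},
]
```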

Choosing Good Attributes If a crucial attribute is not represented, then no decision tree will be able to learn the concept. If two training instances have the same representation but belong to different classes, then the attribute set is said to be inadequate. It is impossible for the decision tree to distinguish the instances.

Learning of Decision Trees
Algorithm: the ID3 learning algorithm (Quinlan, 1986)
- If all examples in S belong to the same class Cj, label the leaf with Cj.
- Otherwise:
  - select the "best" decision attribute A, with values v1, v2, ..., vn, for the next node;
  - divide the training set S into S1, ..., Sn according to the values v1, ..., vn;
  - recursively build subtrees T1, ..., Tn for S1, ..., Sn;
  - generate decision tree T.
Which attribute is best?
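A direct Python transliteration of this pseudocode might look as follows; it uses the dict encoding from the medical example above, adds a majority-class fallback when no attributes remain (an addition not in the slide's pseudocode), and assumes a hypothetical best_attribute() helper sketched after the information-gain slide below:

```python
from collections import Counter

def id3(examples, attributes):
    """Build a decision tree; examples are dicts with a 'class' label."""
    classes = [e['class'] for e in examples]
    if len(set(classes)) == 1:                 # all examples in one class Cj
        return classes[0]                      # label the leaf with Cj
    if not attributes:                         # no attributes left:
        return Counter(classes).most_common(1)[0][0]   # majority class
    a = best_attribute(examples, attributes)   # "best" decision attribute A
    tree = {a: {}}
    for v in set(e[a] for e in examples):      # divide S into S1..Sn by value
        subset = [e for e in examples if e[a] == v]
        rest = [b for b in attributes if b != a]
        tree[a][v] = id3(subset, rest)         # recursively build subtree Ti
    return tree
```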

Entropy
S: a sample of training examples; p+ (p-) is the proportion of positive (negative) examples in S.
Entropy(S): the expected number of bits needed to encode the classification of an arbitrary member of S.
Information theory: an optimal-length code assigns -log2(p) bits to a message having probability p. The expected number of bits to encode "+" or "-" for a random member of S is therefore
Entropy(S) = -p+ log2(p+) - p- log2(p-)
More generally, for c different classes:
Entropy(S) = -Σ pi log2(pi), summing over i = 1..c
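A short sketch of this formula, assuming `labels` is the list of class labels of the examples in S:

```python
import math

def entropy(labels):
    """Entropy(S) = -sum over the classes of p_i * log2(p_i)."""
    n = len(labels)
    result = 0.0
    for c in set(labels):
        p = labels.count(c) / n       # proportion p_i of class c in S
        result -= p * math.log2(p)
    return result

entropy(['+', '+', '-', '-'])   # 1.0: a maximally mixed sample
entropy(['+', '+', '+', '+'])   # 0.0: a pure sample
```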

Information Gain Search Heuristic
Gain(S, A): the expected reduction in entropy caused by partitioning the examples of S according to the attribute A; a measure of the effectiveness of an attribute in classifying the training data.
Values(A): the possible values of the attribute A
Sv: the subset of S for which attribute A has value v
Gain(S, A) = Entropy(S) - Σ (v in Values(A)) (|Sv| / |S|) Entropy(Sv)
The best attribute has maximal Gain(S, A); the aim is to minimize the number of tests needed for classification.
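Gain(S, A) and the best_attribute() helper assumed by the id3() sketch above can then be written as follows (again a sketch over the hypothetical dict encoding, not code from the slides):

```python
def information_gain(examples, attribute):
    """Expected reduction in entropy from partitioning on `attribute`."""
    labels = [e['class'] for e in examples]
    gain = entropy(labels)
    for v in set(e[attribute] for e in examples):
        sv = [e['class'] for e in examples if e[attribute] == v]
        gain -= len(sv) / len(examples) * entropy(sv)  # |Sv|/|S| * Entropy(Sv)
    return gain

def best_attribute(examples, attributes):
    """The attribute with maximal Gain(S, A)."""
    return max(attributes, key=lambda a: information_gain(examples, a))
```

On the medical table encoded earlier, the pain attribute separates the four classes perfectly and would be chosen at the root:

```python
tree = id3(patients, ['cough', 'fever', 'weight', 'pain'])
# {'pain': {'throat': 'flu', 'abdomen': 'appendicitis',
#           'none': 'flu', 'chest': 'heart disease'}}
```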

Training Examples [the original slide showed a table of training instances as an image]

Sources
- Ashwin Ram, Assistant Professor, College of Computing, Georgia Institute of Technology, Atlanta
- J. Kubalik, Machine Learning I: Outline, Gerstner Laboratory for Intelligent Decision Making and Control