Machine learning... a fundamental aspect of intelligent systems – not just a short-cut to knowledge acquisition / complex behaviour.



learning is not well understood by philosophy, psychology, or physiology
AI asks: how can a machine...
– generate rules from case histories
– reorganise its knowledge as it expands
– generate its own software
– learn to discriminate different phenomena

learning is not well understood by philosophy, psychology, or physiology
AI asks: how can a machine...
– generate rules – induction
– reorganise knowledge – generalisation & induction
– generate software – evolution (& others?)
– discriminate phenomena – neural nets (etc.)

implementation issues
supervised vs unsupervised
– with / without known outputs
classification vs regression
– discrete versus continuous values
what is learned – “inductive bias”
– types of functions learned
– algorithms restrict what is learned

a generic model(?)

lecture programme
– induction intro
– rules from semantic nets (tutorial)
– nearest neighbour
– splitting feature space
– forming decision trees (& then rules)
– generalisation (semantic nets)
– near-miss
– evolution
– neural networks

induction #1
def: “automatically acquire a function from inputs to output values, based on previously seen inputs and output values”
input: feature values
output: classification
e.g. speech recognition, object identification

induction #2
aims: generate rules from examples
formally: given a collection of pairs {x → y}, find a hypothesis h such that h(x) ≈ y for (nearly) all x & y

issues
feature selection
– what features to use? how do they relate to each other?
– how sensitive is the technique to feature selection? (irrelevant, noisy, or absent features; feature types)
complexity & generalisation
– matching the training data vs performance on new data
NB: some of the following slides are based on examples from the machine learning programme at The University of Chicago Department of Computer Science

induction – principles
Occam’s razor: the world is inherently simple, so the most likely hypothesis is the simplest one that is consistent with the observations
other principles:
– use negative as well as positive evidence
– seek concomitant variation in cause/result
– more frequently observed associations are more reliable

rules from semantic nets
a tutorial problem: think bicycle, cart, car, motor-bike
– build nets from examples (later)
– build rules from nets

nearest neighbour
supervised, classification (usually)
training: record inputs → outputs
use:
– find the “nearest” trained case & return its associated output value
+ve: fast training, general purpose
-ve: expensive prediction, the definition of distance is complex, sensitive to noise
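A minimal sketch of the idea in Python (illustrative code, not from the lecture): training simply memorises the labelled cases, and prediction returns the label of the closest stored case under Euclidean distance.

```python
import math

def train(examples):
    """Training is just memorisation: keep the (features, label) pairs."""
    return list(examples)

def predict(model, query):
    """Return the label of the stored case nearest to `query` (Euclidean distance)."""
    def dist(xs, ys):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)))
    _, label = min(model, key=lambda case: dist(case[0], query))
    return label

# toy usage: two numeric features, two classes
model = train([((1.0, 1.0), "A"), ((1.2, 0.9), "A"), ((5.0, 5.2), "B")])
print(predict(model, (4.8, 5.0)))  # -> B
```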

feature space splitting
supervised, classification
training: record inputs → outputs
+ve: fast, tolerant to (some) noise
-ve: some limitations, issues around feature selection, etc.

splitting feature-space

real examples have many dimensions
– splitting by clusters can give “better” rules
– wider empty zones between clusters give “better” rules

identification trees (aka “decision trees”)
supervised, classification
+ve: copes better with irrelevant attributes & noise, fast in use
-ve: more limited than nearest neighbour (& feature-space splitting)

ID trees
train: build the tree by repeatedly forming the subsets with least disorder
use:
– traverse the tree via its feature tests & assign the leaf node’s label
– OR: use a ruleset derived from the tree
+ve: robust to irrelevant features & some noise, fast prediction, readable rules
-ve: poor feature combination, poor handling of feature dependencies, optimal trees not guaranteed

identification trees

name    hair    height  weight  screen  result
sarah   blonde  ave     light   N       burn
dana    blonde  tall    ave     Y       ok
alex    dark    tall    ave     Y       ok
annie   blonde  short   ave     N       burn
emily   red     ave     heavy   N       burn
pete    dark    tall    heavy   N       ok
john    dark    ave     heavy   N       ok
katie   blonde  short   light   Y       ok

sunburn
goal: predict burn / no burn for new cases
– cannot rely on exact matching (same features → same output): the feature space is too large
– nearest neighbour? but: what counts as “close”? which features matter?

Sunburn Identification Tree
Hair Color?
– Blonde → Lotion Used?
   – No → Sarah: Burn, Annie: Burn
   – Yes → Dana: None, Katie: None
– Red → Emily: Burn
– Brown → Alex: None, John: None, Pete: None
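As a sketch of how the finished tree is used, the sunburn tree above can be written directly as a chain of feature tests (illustrative Python, attribute values as on the slide):

```python
def classify(hair, lotion):
    """Walk the sunburn tree: test hair colour first, then lotion use for blondes."""
    if hair == "blonde":
        return "burn" if lotion == "no" else "none"
    if hair == "red":
        return "burn"
    return "none"  # brown / dark hair

print(classify(hair="blonde", lotion="no"))  # -> burn (Sarah, Annie)
print(classify(hair="dark", lotion="yes"))   # -> none (Alex)
```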

building ID trees
aim: build a small tree such that all samples at each leaf have the same label
at each node, pick the test whose branches come closest to having a single class:
– split into subsets with least “disorder” (disorder ~ entropy)
– i.e. find the test that minimizes disorder

Minimizing Disorder – candidate first tests (all eight people)
Hair Color:
– Blonde → Sarah: B, Dana: N, Annie: B, Katie: N
– Red → Emily: B
– Brown → Alex: N, Pete: N, John: N
Height:
– Short → Alex: N, Annie: B, Katie: N
– Average → Sarah: B, Emily: B, John: N
– Tall → Dana: N, Pete: N
Weight:
– Light → Sarah: B, Katie: N
– Average → Dana: N, Alex: N, Annie: B
– Heavy → Emily: B, Pete: N, John: N
Lotion:
– No → Sarah: B, Annie: B, Emily: B, Pete: N, John: N
– Yes → Dana: N, Alex: N, Katie: N

Minimizing Disorder – second test, within the Blonde branch
Height:
– Short → Annie: B, Katie: N
– Average → Sarah: B
– Tall → Dana: N
Weight:
– Light → Sarah: B, Katie: N
– Average → Dana: N, Annie: B
Lotion:
– No → Sarah: B, Annie: B
– Yes → Dana: N, Katie: N

measuring disorder
Problem:
– large databases don’t yield homogeneous subsets
Solution:
– use an information-theoretic measure of disorder (entropy)
– disorder of a set = −Σ p_c log₂ p_c, where p_c is the fraction of the set belonging to class c
Homogeneous set: least disorder = 0
Even (two-way) split: most disorder = 1

sunburn entropy #1 (first test, all eight people)
Hair color = 4/8 · (−2/4 log₂ 2/4 − 2/4 log₂ 2/4) + 1/8 · 0 + 3/8 · 0 = 0.5
Height = 0.69
Weight = 0.94
Lotion = 0.61
→ hair colour has the least average disorder, so it is chosen as the first test

sunburn entropy #2 (within the Blonde branch)
Height = 2/4 · (−1/2 log₂ 1/2 − 1/2 log₂ 1/2) + 1/4 · 0 + 1/4 · 0 = 0.5
Weight = 2/4 · (−1/2 log₂ 1/2 − 1/2 log₂ 1/2) + 2/4 · (−1/2 log₂ 1/2 − 1/2 log₂ 1/2) = 1
Lotion = 0
→ split the blonde branch on lotion (zero disorder)
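These weighted-disorder figures can be recomputed with a short script; a sketch in Python (illustrative, not lecture code). One assumption: Alex's height is taken as “short”, which matches the Height split on the Minimizing Disorder slide and the 0.69 figure above, although the data-table slide lists it as “tall”.

```python
from collections import Counter
from math import log2

# The sunburn dataset from the slides: (hair, height, weight, lotion, result).
DATA = [
    ("blonde", "ave",   "light", "no",  "burn"),  # Sarah
    ("blonde", "tall",  "ave",   "yes", "ok"),    # Dana
    ("dark",   "short", "ave",   "yes", "ok"),    # Alex ("short" as assumed above)
    ("blonde", "short", "ave",   "no",  "burn"),  # Annie
    ("red",    "ave",   "heavy", "no",  "burn"),  # Emily
    ("dark",   "tall",  "heavy", "no",  "ok"),    # Pete
    ("dark",   "ave",   "heavy", "no",  "ok"),    # John
    ("blonde", "short", "light", "yes", "ok"),    # Katie
]
FEATURES = {"hair": 0, "height": 1, "weight": 2, "lotion": 3}

def entropy(labels):
    """Disorder of a set of labels: 0 for a homogeneous set, 1 for an even two-way split."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def avg_disorder(rows, feature_index):
    """Weighted average disorder of the subsets produced by testing one feature."""
    subsets = {}
    for row in rows:
        subsets.setdefault(row[feature_index], []).append(row[-1])
    n = len(rows)
    return sum(len(labels) / n * entropy(labels) for labels in subsets.values())

for name, idx in FEATURES.items():
    print(f"{name:7s} {avg_disorder(DATA, idx):.2f}")
# hair 0.50, height 0.69, weight 0.94, lotion 0.61 -> hair colour wins the first split

blondes = [row for row in DATA if row[0] == "blonde"]
for name, idx in FEATURES.items():
    if name != "hair":
        print(f"{name:7s} {avg_disorder(blondes, idx):.2f}")
# height 0.50, weight 1.00, lotion 0.00 -> split the blonde branch on lotion
```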

building ID trees with disorder
until each leaf is as homogeneous as possible:
– select a non-homogeneous leaf node
– replace that node by the test that creates subsets with the least average disorder
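A compact sketch of that recursion (illustrative Python, not the lecture's own code): split on the attribute whose subsets have the least average disorder, then recurse into each subset until the leaves are homogeneous (or no attributes remain, in which case take the majority label).

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def build_tree(rows, features):
    """rows: (attribute-dict, label) pairs; features: attribute names still untested."""
    labels = [label for _, label in rows]
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]   # leaf: the (majority) label

    def avg_disorder(feature):
        groups = {}
        for attrs, label in rows:
            groups.setdefault(attrs[feature], []).append(label)
        return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

    best = min(features, key=avg_disorder)            # ties broken by list order
    branches = {}
    for attrs, label in rows:
        branches.setdefault(attrs[best], []).append((attrs, label))
    remaining = [f for f in features if f != best]
    return (best, {value: build_tree(subset, remaining)
                   for value, subset in branches.items()})

# toy usage
rows = [({"hair": "blonde", "lotion": "no"},  "burn"),
        ({"hair": "blonde", "lotion": "yes"}, "ok"),
        ({"hair": "dark",   "lotion": "no"},  "ok")]
print(build_tree(rows, ["hair", "lotion"]))
# -> ('hair', {'blonde': ('lotion', {'no': 'burn', 'yes': 'ok'}), 'dark': 'ok'})
```

Run on the full sunburn table, this reproduces the hair-colour-then-lotion tree shown on the earlier slide.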

features in ID Trees: Pros
Feature selection:
– tests features that yield low disorder, i.e. selects the features that are important!
– ignores irrelevant features
Feature type handling:
– discrete type: one branch per value
– continuous type: branch on >= some value; need to search to find the best breakpoint (see the sketch below)
Absent features: distribute uniformly
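One way that breakpoint search might look (illustrative Python, names assumed): try a threshold midway between each pair of adjacent sorted values and keep the one whose two subsets have the least weighted disorder.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def best_breakpoint(values, labels):
    """Return the threshold t minimising the weighted disorder of {v < t} vs {v >= t}."""
    pairs = sorted(zip(values, labels))
    best = (float("inf"), None)
    for (a, _), (b, _) in zip(pairs, pairs[1:]):
        if a == b:
            continue
        t = (a + b) / 2                   # candidate threshold between adjacent values
        below = [lab for v, lab in pairs if v < t]
        above = [lab for v, lab in pairs if v >= t]
        disorder = (len(below) * entropy(below) + len(above) * entropy(above)) / len(pairs)
        best = min(best, (disorder, t))
    return best[1]

# toy usage: a numeric feature against a burn/ok outcome
print(best_breakpoint([60, 65, 70, 80, 85], ["ok", "ok", "burn", "burn", "burn"]))  # -> 67.5
```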

Features in ID Trees: Cons
– features are assumed independent: if a group effect is wanted, it must be modelled explicitly, e.g. by adding a new feature “A or B”
– feature tests are conjunctive

From ID Trees to Rules
(one rule per root-to-leaf path of the sunburn tree above)
(if (equal haircolor blonde) (equal lotionused yes) (then None))
(if (equal haircolor blonde) (equal lotionused no) (then Burn))
(if (equal haircolor red) (then Burn))
(if (equal haircolor brown) (then None))
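The same rules could equally be held as data and matched by a small generic interpreter; a sketch (hypothetical Python encoding, not from the lecture):

```python
# Each rule is a (conditions, conclusion) pair mirroring the four rules above.
RULES = [
    ({"haircolor": "blonde", "lotionused": "yes"}, "None"),
    ({"haircolor": "blonde", "lotionused": "no"},  "Burn"),
    ({"haircolor": "red"},                         "Burn"),
    ({"haircolor": "brown"},                       "None"),
]

def apply_rules(case):
    """Return the conclusion of the first rule whose conditions all hold for `case`."""
    for conditions, conclusion in RULES:
        if all(case.get(attr) == value for attr, value in conditions.items()):
            return conclusion
    return None  # no rule fires

print(apply_rules({"haircolor": "blonde", "lotionused": "no"}))  # -> Burn
```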

generalisation has yellow