machine learning
learning is a fundamental aspect of intelligent systems – not just a short-cut to knowledge acquisition or complex behaviour
learning
not well understood by philosophy, psychology or physiology
AI asks: how can a machine...
– generate rules from case histories
– reorganise its knowledge as it expands
– generate its own software
– learn to discriminate different phenomena
learning
not well understood by philosophy, psychology or physiology
AI asks: how can a machine...
– generate rules – induction
– reorganise knowledge – generalisation & induction
– generate software – evolution (& others?)
– discriminate phenomena – neural nets (etc)
implementation issues
– supervised vs unsupervised: with / without known outputs
– classification vs regression: discrete versus continuous output values
– what is learned ("inductive bias"): the types of functions that can be learned; each algorithm restricts what is learned
a generic model(?)
lecture programme
– induction intro
– rules from semantic nets (tutorial)
– nearest neighbour
– splitting feature space
– forming decision trees (& then rules)
– generalisation (semantic nets)
– near-miss
– evolution
– neural networks
induction #1
def: “automatically acquire a function from inputs to output values, based on previously seen inputs and output values”
input: feature values
output: a classification
e.g. speech recognition, object identification
induction #2
aim: generate rules from examples
formally: given a collection of pairs {x → y}, find a hypothesis h such that h(x) ≈ y for (nearly) all x and y
issues
feature selection:
– what features to use? how do they relate to each other?
– how sensitive is the technique to the features chosen? (irrelevant, noisy or absent features; feature types)
complexity & generalisation:
– fitting the training data vs performing well on new data
NB: some of the following slides are based on examples from the machine learning programme at The University of Chicago Department of Computer Science
induction – principles
Occam’s razor: the world is inherently simple, so the most likely hypothesis is the simplest one that is consistent with the observations
other principles:
– use negative as well as positive evidence
– seek concomitant variation in cause and result
– more frequently observed associations are more reliable
rules from semantic nets
a tutorial problem: think bicycle, cart, car, motor-bike
– build nets from examples (later)
– build rules from the nets
nearest neighbour
supervised, classification (usually)
training: record input–output pairs
use: find the “nearest” trained case and return its associated output value
+ve: fast training, general purpose
-ve: expensive prediction, defining “distance” is complex, sensitive to noise
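To make the training/use split concrete, here is a minimal 1-nearest-neighbour sketch in Python. The slides do not prescribe an implementation, so the Euclidean distance and all names here are illustrative assumptions.

```python
# Minimal 1-nearest-neighbour sketch (illustrative only; the slides do
# not fix a distance measure or a data representation).

def euclidean(a, b):
    # straight-line distance between two numeric feature vectors
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def nearest_neighbour(train, query, distance=euclidean):
    # training just means recording (feature_vector, label) pairs;
    # use means finding the closest recorded case and returning its label
    features, label = min(train, key=lambda case: distance(case[0], query))
    return label

train = [((170, 60), "ok"), ((150, 45), "burn")]
print(nearest_neighbour(train, (168, 58)))  # -> "ok"
```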
feature space splitting
supervised, classification
training: record input–output pairs
+ve: fast, tolerant to (some) noise
-ve: some limitations; issues around feature selection, etc.
splitting feature-space
– real examples have many dimensions
– splitting by clusters can give “better” rules
– wider empty zones between clusters give “better” rules
identification trees (aka “decision trees”)
supervised, classification
+ve: copes better with irrelevant attributes & noise, fast in use
-ve: more limited than nearest neighbour (& feature-space splitting)
ID trees
train: build the tree by repeatedly forming subsets of least disorder
use: traverse the tree based on feature tests and assign the leaf node’s label, OR use a ruleset derived from the tree
+ve: robust to irrelevant features & some noise, fast prediction, readable rules
-ve: poor at combining features, poor handling of feature dependencies, optimal trees not guaranteed
identification trees

name   hair    height  weight  screen  result
sarah  blonde  ave     light   N       burn
dana   blonde  tall    ave     Y       ok
alex   brown   short   ave     Y       ok
annie  blonde  short   ave     N       burn
emily  red     ave     heavy   N       burn
pete   brown   tall    heavy   N       ok
john   brown   ave     heavy   N       ok
katie  blonde  short   light   Y       ok
sunburn
goal: predict burn / no burn for new cases
– cannot rely on an exact match (same features → same output): the feature space is too large
– nearest neighbour? but what counts as “close”? which features matter?
Sunburn Identification Tree

Hair Color?
├─ Blonde → Lotion Used?
│    ├─ No  → Sarah: Burn, Annie: Burn
│    └─ Yes → Katie: None, Dana: None
├─ Red → Emily: Burn
└─ Brown → Alex: None, John: None, Pete: None
building ID trees
aim: build a small tree such that all samples at the leaves have the same label
at each node, pick the test whose branches come closest to containing a single class each:
– split into the subsets with least “disorder” (disorder ~ entropy)
– i.e. find the test that minimizes disorder
Minimizing Disorder (candidate first tests, all eight cases)

Hair Color: Blonde – Sarah:B, Dana:N, Annie:B, Katie:N | Red – Emily:B | Brown – Alex:N, Pete:N, John:N
Height: Short – Alex:N, Annie:B, Katie:N | Average – Sarah:B, Emily:B, John:N | Tall – Dana:N, Pete:N
Weight: Light – Sarah:B, Katie:N | Average – Dana:N, Alex:N, Annie:B | Heavy – Emily:B, Pete:N, John:N
Lotion: No – Sarah:B, Annie:B, Emily:B, Pete:N, John:N | Yes – Dana:N, Alex:N, Katie:N
Minimizing Disorder (candidate second tests, blonde subset only)

Height: Short – Annie:B, Katie:N | Average – Sarah:B | Tall – Dana:N
Weight: Light – Sarah:B, Katie:N | Average – Dana:N, Annie:B | Heavy – (empty)
Lotion: No – Sarah:B, Annie:B | Yes – Dana:N, Katie:N
measuring disorder
problem: large databases don’t yield homogeneous subsets
solution: use the information-theoretic measure of disorder (entropy):
disorder(S) = − Σ_c p_c log₂ p_c, where p_c is the proportion of class c in S
– homogeneous set: least disorder = 0
– even two-class split: most disorder = 1
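This measure transcribes directly into Python. A small sketch (the function name is my own):

```python
import math

def disorder(labels):
    # entropy of a list of class labels:
    # 0 for a homogeneous set, 1 for an even two-class split
    n = len(labels)
    total = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        total -= p * math.log2(p)
    return total

print(disorder(["ok", "ok", "ok"]))   # homogeneous -> 0.0
print(disorder(["burn", "ok"]))       # even split  -> 1.0
```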
sunburn entropy #1 (average disorder of each candidate first test)
Hair color = 4/8 · (−2/4 log₂ 2/4 − 2/4 log₂ 2/4) + 1/8 · 0 + 3/8 · 0 = 0.5
Height = 0.69
Weight = 0.94
Lotion = 0.61
sunburn entropy #2 (within the blonde subset)
Height = 2/4 · (−1/2 log₂ 1/2 − 1/2 log₂ 1/2) + 1/4 · 0 + 1/4 · 0 = 0.5
Weight = 2/4 · (−1/2 log₂ 1/2 − 1/2 log₂ 1/2) + 2/4 · (−1/2 log₂ 1/2 − 1/2 log₂ 1/2) = 1
Lotion = 0
so the blonde subset is split on Lotion
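These figures can be checked mechanically. The sketch below recomputes the average disorder of each first-level test from the sunburn table; the dict keys mirror the table’s column names (the table’s “screen” column is the slides’ “Lotion used”), and the helper names are my own.

```python
import math

def disorder(labels):
    # entropy of a list of class labels, as in the earlier sketch
    n = len(labels)
    total = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        total -= p * math.log2(p)
    return total

def avg_disorder(rows, feature):
    # weighted average disorder of the subsets a feature test produces
    n = len(rows)
    total = 0.0
    for v in {r[feature] for r in rows}:
        subset = [r["result"] for r in rows if r[feature] == v]
        total += len(subset) / n * disorder(subset)
    return total

data = [
    {"name": "sarah", "hair": "blonde", "height": "ave",   "weight": "light", "screen": "N", "result": "burn"},
    {"name": "dana",  "hair": "blonde", "height": "tall",  "weight": "ave",   "screen": "Y", "result": "ok"},
    {"name": "alex",  "hair": "brown",  "height": "short", "weight": "ave",   "screen": "Y", "result": "ok"},
    {"name": "annie", "hair": "blonde", "height": "short", "weight": "ave",   "screen": "N", "result": "burn"},
    {"name": "emily", "hair": "red",    "height": "ave",   "weight": "heavy", "screen": "N", "result": "burn"},
    {"name": "pete",  "hair": "brown",  "height": "tall",  "weight": "heavy", "screen": "N", "result": "ok"},
    {"name": "john",  "hair": "brown",  "height": "ave",   "weight": "heavy", "screen": "N", "result": "ok"},
    {"name": "katie", "hair": "blonde", "height": "short", "weight": "light", "screen": "Y", "result": "ok"},
]

for f in ("hair", "height", "weight", "screen"):
    print(f, round(avg_disorder(data, f), 2))
# -> hair 0.5, height 0.69, weight 0.94, screen 0.61
```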
building ID trees with disorder
until each leaf is as homogeneous as possible:
– select a non-homogeneous leaf node
– replace that node with a test that creates subsets of least average disorder
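The slide’s loop can also be expressed recursively. A compact sketch, assuming the avg_disorder helper and data table from the previous example (the nested-dict tree format is my own choice, not the lecture’s):

```python
def build_tree(rows, features):
    labels = [r["result"] for r in rows]
    if len(set(labels)) == 1 or not features:
        return labels[0]                     # homogeneous (or untestable) leaf
    # pick the test of least average disorder, then recurse on each branch
    best = min(features, key=lambda f: avg_disorder(rows, f))
    branches = {}
    for v in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == v]
        branches[v] = build_tree(subset, [f for f in features if f != best])
    return {"test": best, "branches": branches}

tree = build_tree(data, ["hair", "height", "weight", "screen"])
# -> splits on "hair" first (disorder 0.5), then on "screen" within the
#    blondes, reproducing the sunburn identification tree above
```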
features in ID trees: pros
feature selection:
– tests the features that yield low disorder, i.e. selects the features that matter
– ignores irrelevant features
feature type handling:
– discrete types: one branch per value
– continuous types: branch on >= some value; need to search for the best breakpoint (see the sketch below)
absent features: distribute cases uniformly across the branches
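For the continuous case, a minimal breakpoint search might look like this (a sketch under my own assumptions; it reuses the disorder function from the earlier example and tries midpoints between consecutive sorted values):

```python
def best_breakpoint(values, labels):
    # try each midpoint between consecutive distinct values and keep the
    # threshold whose two subsets have the least average disorder
    pairs = sorted(zip(values, labels))
    best_t, best_d = None, float("inf")
    for (a, _), (b, _) in zip(pairs, pairs[1:]):
        if a == b:
            continue
        t = (a + b) / 2
        left = [l for v, l in pairs if v < t]
        right = [l for v, l in pairs if v >= t]
        d = (len(left) * disorder(left) + len(right) * disorder(right)) / len(pairs)
        if d < best_d:
            best_t, best_d = t, d
    return best_t

print(best_breakpoint([15, 16, 18, 19], ["burn", "burn", "ok", "ok"]))  # -> 17.0
```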
features in ID trees: cons
– features are assumed independent: if a group effect is wanted it must be modelled explicitly, e.g. by making a new combined feature AorB
– feature tests along a path are purely conjunctive
From ID Trees to Rules
one rule per path from the root to a leaf of the sunburn tree above:
(if (equal haircolor blonde) (equal lotionused yes) (then None))
(if (equal haircolor blonde) (equal lotionused no) (then Burn))
(if (equal haircolor red) (then Burn))
(if (equal haircolor brown) (then None))
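Extracting such rules mechanically is just a matter of walking every root-to-leaf path. A sketch, assuming the nested-dict tree format from the build_tree example above:

```python
def tree_to_rules(tree, conditions=()):
    # leaf reached: emit one rule whose if-part is the path's tests
    if not isinstance(tree, dict):
        return [(list(conditions), tree)]
    rules = []
    for value, subtree in tree["branches"].items():
        rules += tree_to_rules(subtree, conditions + ((tree["test"], value),))
    return rules

# e.g. tree_to_rules(build_tree(data, ["hair", "height", "weight", "screen"]))
# yields, in some branch order:
#   [([("hair", "blonde"), ("screen", "N")], "burn"),
#    ([("hair", "blonde"), ("screen", "Y")], "ok"),
#    ([("hair", "red")], "burn"),
#    ([("hair", "brown")], "ok")]
```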
generalisation
[semantic-net figure lost in extraction; only the fragment “has yellow” survives]