Published by Stuart O’Connor. Modified over 9 years ago.
1
Support Vector Machines: a different approach to finding the decision boundary, particularly good at generalisation (finishing off last lecture …)
2
Suppose we can divide the classes with a simple hyperplane
3
There will be infinitely many such lines
4
One of them is ‘optimal’
5
Because it maximises the margin: the distance from the hyperplane to the ‘support vectors’, the instances that are closest to instances of a different class
6
A Support Vector Machine (SVM) finds this hyperplane
7
But, usually there is no simple hyperplane that separates the classes!
8
One dimension (x), two classes
9
Two dimensions (x, x*sin(x))
10
Now we can separate the classes
11
SVMs do this: if we add enough extra dimensions/fields using arbitrary functions of the existing fields, then it becomes very likely we can separate the data.
SVMs:
- apply such a transformation
- then find the optimal separating hyperplane.
The ‘optimality’ of the separating hyperplane means good generalisation properties.
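The effect of adding a dimension can be illustrated with a small sketch. It only shows the transformation step, not an SVM itself. The toy data, the labelling rule and the helper names (`separable_by_threshold`, `feature_map`) are assumptions made for illustration and do not come from the slides, apart from the mapping x -> (x, x*sin(x)).

```python
import math

# Hypothetical 1-D toy data: the label depends on the sign of x*sin(x),
# so the two classes alternate along the x axis.
xs = [0.5 + 0.5 * i for i in range(24)]          # 0.5, 1.0, ..., 12.0
labels = [1 if x * math.sin(x) > 0 else -1 for x in xs]

def separable_by_threshold(values, labels):
    """True if a single threshold on `values` puts each class on its own side."""
    pairs = sorted(zip(values, labels))
    # Count how often the label changes when walking along the sorted values.
    changes = sum(1 for (_, a), (_, b) in zip(pairs, pairs[1:]) if a != b)
    return changes <= 1

def feature_map(x):
    """Map x to the 2-D point (x, x*sin(x)) used on the slides."""
    return (x, x * math.sin(x))

mapped = [feature_map(x) for x in xs]
second_coord = [point[1] for point in mapped]

print(separable_by_threshold(xs, labels))            # original axis only
print(separable_by_threshold(second_coord, labels))  # new coordinate
```

On this toy data no single threshold on x separates the classes, whereas the horizontal line at height 0 in the (x, x*sin(x)) plane does. A real SVM would additionally pick the separating hyperplane with the largest margin.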
13
Decision Trees
14
Real world applications of DTs
See here for a list: http://www.cbcb.umd.edu/~salzberg/docs/murthy_thesis/survey/node32.html
Includes: Agriculture, Astronomy, Biomedical Engineering, Control Systems, Financial analysis, Manufacturing and Production, Medicine, Molecular biology, Object recognition, Pharmacology, Physics, Plant diseases, Power systems, Remote Sensing, Software development, Text processing
15
Field names
16
Field values
17
Field names Field values Class values
18
Why decision trees?
Popular, since they are interpretable ... and correspond to human reasoning/thinking about decision-making.
Can perform quite well in accuracy when compared with other approaches ... and there are good algorithms to learn decision trees from data.
20
Figure 1. Binary Strategy as a tree model. Mohammed MA, Rudge G, Wood G, Smith G, et al. (2012) Which Is More Useful in Predicting Hospital Mortality -Dichotomised Blood Test Results or Actual Test Values? A Retrospective Study in Two Hospitals. PLoS ONE 7(10): e46860. doi:10.1371/journal.pone.0046860 http://www.plosone.org/article/info:doi/10.1371/journal.pone.0046860
21
Figure 1. Binary Strategy as a tree model.
22
We will learn the ‘classic’ algorithm to learn a DT from categorical data:
23
ID3
24
Suppose we want a tree that helps us predict someone’s politics, given their gender, age, and wealth

gender  age          wealth  politics
male    middle-aged  rich    Right-wing
male    young        rich    Right-wing
female  young        poor    Left-wing
female  middle-aged  poor    Left-wing
male    young        poor    Right-wing
male    old          poor    Right-wing
25
Choose a start node (field) at random
26
Choose a start node (field) at random

?
27
Choose a start node (field) at random

Age
28
Add branches for each value of this field

Age
  young
  mid
  old
29
Check to see what has filtered down

Age
  young: 1 L, 2 R
  mid:   1 L, 1 R
  old:   0 L, 1 R
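The ‘filtering down’ step amounts to grouping the rows by the value of the chosen field and counting the classes in each group. A minimal sketch, assuming the six-row table from the slides is stored as a list of dictionaries; the names `DATA` and `class_counts_by_value` are illustrative, not from the slides.

```python
from collections import Counter, defaultdict

FIELDS = ("gender", "age", "wealth", "politics")
ROWS = [
    ("male",   "middle-aged", "rich", "Right-wing"),
    ("male",   "young",       "rich", "Right-wing"),
    ("female", "young",       "poor", "Left-wing"),
    ("female", "middle-aged", "poor", "Left-wing"),
    ("male",   "young",       "poor", "Right-wing"),
    ("male",   "old",         "poor", "Right-wing"),
]
DATA = [dict(zip(FIELDS, row)) for row in ROWS]

def class_counts_by_value(rows, field, target="politics"):
    """For each value of `field`, count how many rows fall into each class."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[field]][row[target]] += 1
    return counts

for value, counter in class_counts_by_value(DATA, "age").items():
    print(value, counter["Left-wing"], "L,", counter["Right-wing"], "R")
```

A branch whose counts contain only one class can be given that class value straight away; the others need further nodes.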
30
Where possible, assign a class value

Age
  young: 1 L, 2 R
  mid:   1 L, 1 R
  old:   0 L, 1 R  ->  Right-Wing
31
Otherwise, we need to add further nodes

Age
  young: 1 L, 2 R  ->  ?
  mid:   1 L, 1 R  ->  ?
  old:   0 L, 1 R  ->  Right-Wing
32
Repeat this process every time we need a new node

Age
  young: 1 L, 2 R  ->  ?
  mid:   1 L, 1 R  ->  ?
  old:   0 L, 1 R  ->  Right-Wing
33
Starting with first new node – choose field at random

Age
  young: 1 L, 2 R  ->  wealth
  mid:   1 L, 1 R  ->  ?
  old:   0 L, 1 R  ->  Right-Wing
34
Check the classes of the data at this node…

Age
  young: 1 L, 2 R  ->  wealth
    rich: 0 L, 1 R
    poor: 1 L, 1 R
  mid:   1 L, 1 R  ->  ?
  old:   0 L, 1 R  ->  Right-Wing
35
And so on …

Age
  young: 1 L, 2 R  ->  wealth
    rich: Right-wing
    poor: 1 L, 1 R
  mid:   1 L, 1 R  ->  ?
  old:   0 L, 1 R  ->  Right-Wing
36
But we can do better than randomly chosen fields!
37
This is the tree we get if first choice is ‘gender’
38
This is the tree we get if first choice is ‘gender’

gender
  male:   Right-Wing
  female: Left-Wing
39
Algorithms for building decision trees (of this type)

Initialise: tree T contains one ‘unexpanded’ node
Repeat until no unexpanded nodes:
  remove an unexpanded node U from T
  expand U by choosing a field
  add the resulting nodes to T
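The loop above can be sketched in Python as follows. This is one possible reading of the pseudocode, not the lecture's own implementation: the dictionary-based node layout, the stopping rule (stop when a node is pure or no fields are left) and the names `build_tree`, `choose_field` and `predict` are all assumptions. By default the field is chosen at random, as in the earlier slides; a different rule can be passed in.

```python
import random
from collections import Counter

FIELDS = ("gender", "age", "wealth", "politics")
ROWS = [
    ("male",   "middle-aged", "rich", "Right-wing"),
    ("male",   "young",       "rich", "Right-wing"),
    ("female", "young",       "poor", "Left-wing"),
    ("female", "middle-aged", "poor", "Left-wing"),
    ("male",   "young",       "poor", "Right-wing"),
    ("male",   "old",         "poor", "Right-wing"),
]
DATA = [dict(zip(FIELDS, row)) for row in ROWS]

def build_tree(rows, fields, target="politics", choose_field=None, seed=0):
    rng = random.Random(seed)
    if choose_field is None:
        # Placeholder rule from the early slides: pick a field at random.
        choose_field = lambda node_rows, candidates: rng.choice(candidates)
    root = {"rows": rows, "fields": list(fields)}
    unexpanded = [root]                      # T starts with one unexpanded node
    while unexpanded:                        # repeat until none remain
        node = unexpanded.pop()              # remove an unexpanded node U
        labels = Counter(r[target] for r in node["rows"])
        if len(labels) == 1 or not node["fields"]:
            # Pure node (or nothing left to split on): assign a class value.
            node["label"] = labels.most_common(1)[0][0]
            continue
        field = choose_field(node["rows"], node["fields"])  # expand U
        node["field"] = field
        node["children"] = {}
        remaining = [f for f in node["fields"] if f != field]
        for value in sorted({r[field] for r in node["rows"]}):
            child = {"rows": [r for r in node["rows"] if r[field] == value],
                     "fields": remaining}
            node["children"][value] = child
            unexpanded.append(child)         # add the resulting nodes to T
    return root

def predict(tree, row):
    """Follow the branches matching `row` until a labelled leaf is reached."""
    node = tree
    while "label" not in node:
        node = node["children"][row[node["field"]]]
    return node["label"]

tree = build_tree(DATA, ["gender", "age", "wealth"])
print(all(predict(tree, r) == r["politics"] for r in DATA))
```

Passing `choose_field=lambda rows, fields: fields[0]` with `gender` listed first reproduces the two-leaf tree shown for ‘gender’.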
40
Algorithms for building decision trees (of this type) – expanding a node ?
41
Algorithms for building decision trees (of this type) – the essential step

Field
  Value = X: ?
  Value = Y: ?
  Value = Z: ?
42
So, which field?

Field
  Value = X: ?
  Value = Y: ?
  Value = Z: ?
43
Three choices: gender, age, or wealth
44
Suppose we choose age (table now sorted by age values)

gender  age          wealth  politics
male    middle-aged  rich    Right-wing
female  middle-aged  poor    Left-wing
male    old          poor    Right-wing
male    young        rich    Right-wing
female  young        poor    Left-wing
male    young        poor    Right-wing

Two of the values have a mixture of classes
45
Suppose we choose wealth (table now sorted by wealth values)

gender  age          wealth  politics
female  middle-aged  poor    Left-wing
male    old          poor    Right-wing
female  young        poor    Left-wing
male    young        poor    Right-wing
male    middle-aged  rich    Right-wing
male    young        rich    Right-wing

One of the values has a mixture of classes - this choice is a bit less mixed up than age?
46
Suppose we choose gender (table now sorted by gender values)

gender  age          wealth  politics
female  middle-aged  poor    Left-wing
female  young        poor    Left-wing
male    old          poor    Right-wing
male    middle-aged  rich    Right-wing
male    young        poor    Right-wing
male    young        rich    Right-wing

The classes are not mixed up at all within the values
47
So, at each step where we choose a node to expand, we make the choice where the relationship between the field values and the class values is least mixed up
48
Measuring ‘mixed-up’ness: Shannon’s entropy measure

Suppose you have a bag of N discrete things, and there are T different types of things. Where p_t is the proportion of things in the bag that are of type t, the entropy of the bag is:

H = -\sum_{t=1}^{T} p_t \log(p_t)
49
Examples (the values shown use base-10 logarithms):

This mixture: { left left left right right }
has entropy: − ( 0.6 log(0.6) + 0.4 log(0.4) ) = 0.292

This mixture: { A A A A A A A A B C }
has entropy: − ( 0.8 log(0.8) + 0.1 log(0.1) + 0.1 log(0.1) ) = 0.278

This mixture: { same same same same same same }
has entropy: − ( 1.0 log(1.0) ) = 0

Lower entropy = less mixed up
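These figures can be checked with a few lines of Python. A minimal sketch; the function name `entropy` and its `base` parameter are illustrative choices, and base 10 is used because it reproduces the values on the slide.

```python
import math
from collections import Counter

def entropy(items, base=10):
    """Shannon entropy of a bag of discrete items."""
    n = len(items)
    proportions = [count / n for count in Counter(items).values()]
    return sum(-p * math.log(p, base) for p in proportions)

print(round(entropy(["left"] * 3 + ["right"] * 2), 3))   # 0.292
print(round(entropy(["A"] * 8 + ["B", "C"]), 3))         # 0.278
print(entropy(["same"] * 6))                             # 0.0
```

Using a different base (for example 2, which gives entropy in bits) rescales every value by the same factor, so the ranking of mixtures from least to most mixed up does not change.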
50
ID3 chooses fields based on entropy

Field1   Field2   Field3   …
val1     val1     val1
val2     val2     val2
val3     val3

Each val has an entropy value – how mixed up the classes are for that value choice
51
ID3 chooses fields based on entropy

Field1      Field2      Field3      …
val1 x p1   val1 x p1   val1 x p1
val2 x p2   val2 x p2   val2 x p2
val3 x p3   val3 x p3

Each val has an entropy value – how mixed up the classes are for that value choice
And each val also has a proportion – how much of the data at this node has this val
52
ID3 chooses fields based on entropy

Field1          Field2          Field3          …
val1 x p1       val1 x p1       val1 x p1
val2 x p2       val2 x p2       val2 x p2
val3 x p3       val3 x p3
= H(D|Field1)   = H(D|Field2)   = H(D|Field3)

So ID3 works out H(D|Field) for each field, which is the sum of the entropies of the values weighted by their proportions.
53
ID3 chooses fields based on entropy

Field1          Field2          Field3          …
val1 x p1       val1 x p1       val1 x p1
val2 x p2       val2 x p2       val2 x p2
val3 x p3       val3 x p3
= H(D|Field1)   = H(D|Field2)   = H(D|Field3)

So ID3 works out H(D|Field) for each field, which is the sum of the entropies of the values weighted by their proportions.
The one with the lowest value is chosen – this maximises ‘Information Gain’
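The weighting can be written compactly. In this sketch the symbols F (a field), v (one of its values), D_v (the subset of the data at this node for which F = v) and Gain are notation assumed here rather than taken from the slides:

```latex
H(D \mid F) = \sum_{v \in \mathrm{values}(F)} \frac{|D_v|}{|D|} \, H(D_v),
\qquad
\mathrm{Gain}(F) = H(D) - H(D \mid F)
```

Since H(D) is the same whichever field is considered, the field with the lowest H(D|F) is also the field with the highest gain.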
54
Back here: gender, age, or wealth
55
Suppose we choose age (table now sorted by age values)

gender  age          wealth  politics
male    middle-aged  rich    Right-wing
female  middle-aged  poor    Left-wing
male    old          poor    Right-wing
male    young        rich    Right-wing
female  young        poor    Left-wing
male    young        poor    Right-wing

H(D|age) = proportion-weighted entropy
  = 0.3333 x − ( 0.5 x log(0.5) + 0.5 x log(0.5) )      [middle-aged]
  + 0.1666 x − ( 1 x log(1) )                           [old]
  + 0.5 x − ( 0.33 x log(0.33) + 0.66 x log(0.66) )     [young]
56
Suppose we choose wealth (table now sorted by wealth values)

gender  age          wealth  politics
female  middle-aged  poor    Left-wing
male    old          poor    Right-wing
female  young        poor    Left-wing
male    young        poor    Right-wing
male    middle-aged  rich    Right-wing
male    young        rich    Right-wing

H(D|wealth)
  = 0.6666 x − ( 0.5 x log(0.5) + 0.5 x log(0.5) )      [poor]
  + 0.3333 x − ( 1 x log(1) )                           [rich]
57
Suppose we choose gender (table now sorted by gender values)

gender  age          wealth  politics
female  middle-aged  poor    Left-wing
female  young        poor    Left-wing
male    old          poor    Right-wing
male    middle-aged  rich    Right-wing
male    young        poor    Right-wing
male    young        rich    Right-wing

H(D|gender)
  = 0.3333 x − ( 1 x log(1) )      [female]
  + 0.6666 x − ( 1 x log(1) )      [male]

This is the one we would choose...
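The three calculations can be reproduced with a short script. This is a sketch under assumptions: the table is stored as a list of dictionaries, logarithms are base 10 to match the earlier entropy examples, and the names `entropy`, `conditional_entropy` and `DATA` are illustrative.

```python
import math
from collections import Counter, defaultdict

FIELDS = ("gender", "age", "wealth", "politics")
ROWS = [
    ("male",   "middle-aged", "rich", "Right-wing"),
    ("male",   "young",       "rich", "Right-wing"),
    ("female", "young",       "poor", "Left-wing"),
    ("female", "middle-aged", "poor", "Left-wing"),
    ("male",   "young",       "poor", "Right-wing"),
    ("male",   "old",         "poor", "Right-wing"),
]
DATA = [dict(zip(FIELDS, row)) for row in ROWS]

def entropy(items, base=10):
    """Shannon entropy of a bag of discrete items."""
    n = len(items)
    return sum(-(c / n) * math.log(c / n, base) for c in Counter(items).values())

def conditional_entropy(rows, field, target="politics"):
    """H(D|field): entropy of each value's subset, weighted by its proportion."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[field]].append(row[target])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

scores = {f: conditional_entropy(DATA, f) for f in ("gender", "age", "wealth")}
for field, score in scores.items():
    print(f"H(D|{field}) = {score:.4f}")

# ID3 picks the field with the lowest proportion-weighted entropy.
print("chosen field:", min(scores, key=scores.get))
```

With base-10 logarithms this gives roughly 0.239 for age, 0.201 for wealth and 0 for gender, so gender is chosen, in line with the slides.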
58
Alternatives to Information Gain - all, somehow or other, give a measure of mixed-upness and have been used in building DTs:
- Chi Square
- Gain Ratio
- Symmetric Gain Ratio
- Gini index
- Modified Gini index
- Symmetric Gini index
- J-Measure
- Minimum Description Length
- Relevance
- RELIEF
- Weight of Evidence
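As one illustration of an alternative measure, the Gini index in its common ‘Gini impurity’ form is one minus the sum of the squared class proportions. The slides do not give a formula, so this definition and the function name `gini` are assumptions based on the usual textbook form.

```python
from collections import Counter

def gini(items):
    """Gini impurity: 1 - sum of squared proportions of each type."""
    n = len(items)
    return 1.0 - sum((count / n) ** 2 for count in Counter(items).values())

print(gini(["left"] * 3 + ["right"] * 2))   # mixed bag, about 0.48
print(gini(["same"] * 6))                   # pure bag, 0.0
```

Like entropy, it is 0 for a pure bag and grows as the bag becomes more mixed up.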
59
Decision Trees

Further reading is on Google. Interesting topics in context are:
- Pruning: close a branch down before you hit 0 entropy (why?)
- Discretization and regression: trees that deal with real-valued fields
- Decision Forests: what do you think these are?