1
Figure 1.1 Rules for the contact lens data.
2
Figure 1.2 Decision tree for the contact lens data.
3
Figure 1.3 Decision trees for the labor negotiations data. (a)(b)
4
Figure 2.1 A family tree and two ways of expressing the sister-of relation.
5
Figure 2.2 ARFF file for the weather data.
6
Figure 3.1 Decision tree for a simple disjunction.
7
Figure 3.2 The exclusive-or problem.
If x = 1 and y = 0 then class = a
If x = 0 and y = 1 then class = a
If x = 0 and y = 0 then class = b
If x = 1 and y = 1 then class = b
8
Figure 3.3 Decision tree with a replicated subtree.
If x = 1 and y = 1 then class = a
If z = 1 and w = 1 then class = a
Otherwise class = b
9
Figure 3.4 Rules for the Iris data.
Default: Iris-setosa
  except if petal-length ≥ 2.45 and petal-length < 5.355
            and petal-width < 1.75
         then Iris-versicolor
              except if petal-length ≥ 4.95 and petal-width < 1.55
                     then Iris-virginica
              else if sepal-length < 4.95 and sepal-width ≥ 2.45
                     then Iris-virginica
  else if petal-length ≥ 3.35
         then Iris-virginica
              except if petal-length < 4.85 and sepal-length < 5.95
                     then Iris-versicolor
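Read as nested if/else statements, the rule set above translates directly into code. The following is a minimal Java sketch (class and method names are our own, and measurements are assumed to be in centimetres), not the book's implementation:

```java
/** Minimal sketch: the rule set of Figure 3.4 as nested if/else statements. */
public class IrisRules {
    static String classify(double sepalLength, double sepalWidth,
                           double petalLength, double petalWidth) {
        String cls = "Iris-setosa";                       // default rule
        if (petalLength >= 2.45 && petalLength < 5.355 && petalWidth < 1.75) {
            cls = "Iris-versicolor";
            // exceptions to the versicolor rule
            if (petalLength >= 4.95 && petalWidth < 1.55) {
                cls = "Iris-virginica";
            } else if (sepalLength < 4.95 && sepalWidth >= 2.45) {
                cls = "Iris-virginica";
            }
        } else if (petalLength >= 3.35) {
            cls = "Iris-virginica";
            // exception to the virginica rule
            if (petalLength < 4.85 && sepalLength < 5.95) {
                cls = "Iris-versicolor";
            }
        }
        return cls;
    }
    public static void main(String[] args) {
        // illustrative measurements, not taken from the Iris dataset
        System.out.println(classify(5.9, 3.0, 4.2, 1.5));
    }
}
```

The inner if blocks play the role of the exceptions: they fire only once the enclosing rule has already matched.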
10
Figure 3.5 The shapes problem. Shaded: standing Unshaded: lying
11
Figure 3.6(a) Models for the CPU performance data: linear regression. PRP = - 56.1 + 0.049 MYCT + 0.015 MMIN + 0.006 MMAX + 0.630 CACH - 0.270 CHMIN + 1.46 CHMAX
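As a quick illustration, the equation can be evaluated directly; the sketch below (class name and the sample attribute values are ours, chosen purely for illustration) plugs one hypothetical machine's attributes into the model:

```java
/** Sketch: evaluate the linear model of Figure 3.6(a) for one hypothetical machine. */
public class CpuLinearModel {
    static double predictPRP(double myct, double mmin, double mmax,
                             double cach, double chmin, double chmax) {
        return -56.1 + 0.049 * myct + 0.015 * mmin + 0.006 * mmax
               + 0.630 * cach - 0.270 * chmin + 1.46 * chmax;
    }
    public static void main(String[] args) {
        // Illustrative attribute values, not taken from the CPU dataset.
        System.out.println(predictPRP(125, 256, 6000, 16, 8, 32));
    }
}
```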
12
Figure 3.6(b) Models for the CPU performance data: regression tree.
13
Figure 3.6(c) Models for the CPU performance data: model tree.
14
Figure 3.7 Different ways of partitioning the instance space. (a)(b)(c)(d)
15
Figure 3.8 Different ways of representing clusters. (a)(b)(c)(d)
Cluster membership probabilities:
      1    2    3
 a  0.4  0.1  0.5
 b  0.1  0.8  0.1
 c  0.3  0.3  0.4
 d  0.1  0.1  0.8
 e  0.4  0.2  0.4
 f  0.1  0.4  0.5
 g  0.7  0.2  0.1
 h  0.5  0.4  0.1
 …
16
Figure 4.1 Pseudo-code for 1R.
17
Figure 4.2 Tree stumps for the weather data. (a)(b) (c)(d)
18
Figure 4.3 Expanded tree stumps for the weather data. (a)(b) (c)
19
Figure 4.4 Decision tree for the weather data.
20
Figure 4.5 Tree stump for the ID code attribute.
21
Figure 4.6 (a) Operation of a covering algorithm; (b) decision tree for the same problem.
22
Figure 4.7 The instance space during operation of a covering algorithm.
23
Figure 4.8 Pseudo-code for a basic rule learner.
24
Figure 5.1 A hypothetical lift chart.
25
Figure 5.2 A sample ROC curve.
26
Figure 5.3 ROC curves for two learning schemes.
27
Figure 6.1 Example of subtree raising, where node C is “raised” to subsume node B. (a)(b)
28
Figure 6.2 Pruning the labor negotiations decision tree.
29
Figure 6.3 Generating rules using a probability measure.
30
Figure 6.4 Definitions for deriving the probability measure. p = number of instances of that class that the rule selects; t = total number of instances that the rule selects; P = total number of instances of that class in the dataset; T = total number of instances in the dataset.
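One way these four quantities are typically combined (a plausible reconstruction for readers without the figure, not necessarily the book's exact formula) is the hypergeometric probability that a random selection of t of the T instances would contain p or more of the P class instances:

```latex
m(R) \;=\; \sum_{i=p}^{\min(t,\,P)} \frac{\binom{P}{i}\,\binom{T-P}{t-i}}{\binom{T}{t}}
```

The smaller this probability, the less likely the rule's success on the training data is due to chance.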
31
Figure 6.5 Algorithm for forming rules by incremental reduced error pruning.
32
Figure 6.6 Algorithm for expanding examples into a partial tree.
33
Figure 6.7 Example of building a partial tree. (a) (b) (c)
34
Figure 6.7 (continued) Example of building a partial tree. (d)(e)
35
Figure 6.8 Rules with exceptions, for the Iris data. Exceptions are represented as dotted paths, alternatives as solid ones.
36
Figure 6.9 A maximum margin hyperplane.
37
Figure 6.10 A boundary between two rectangular classes.
38
Figure 6.11 Pseudo-code for model tree induction.
39
Figure 6.12 Model tree for a dataset with nominal attributes.
40
Figure 6.13 Clustering the weather data. (a) (c) (b)
41
Figure 6.13 (continued) Clustering the weather data. (d) (e)
42
Figure 6.13 (continued) Clustering the weather data. (f)
43
Figure 6.14 Hierarchical clusterings of the Iris data. (a)
44
Figure 6.14 (continued) Hierarchical clusterings of the Iris data. (b)
45
Figure 6.15 A two-class mixture model.
data: A 51, A 43, B 62, B 64, A 45, A 42, A 46, A 45, A 45, B 62, A 47, A 52, B 64, A 51, B 65, A 48, A 49, A 46, B 64, A 51, A 52, B 62, A 49, A 48, B 62, A 43, A 40, A 48, B 64, A 51, B 63, A 43, B 65, B 66, B 65, A 46, A 39, B 62, B 64, A 52, B 63, B 64, A 48, B 64, A 48, A 51, A 48, B 64, A 42, A 48, A 41
model: μ_A = 50, σ_A = 5, p_A = 0.6;  μ_B = 65, σ_B = 2, p_B = 0.4
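Given these parameters, the expectation step of EM assigns each value a probability of having been generated by cluster A rather than B. A minimal Java sketch (class name and the example value x = 55 are our own):

```java
/** Sketch: responsibility of cluster A for a value x under the two-class mixture
    model of Figure 6.15 (muA = 50, sdA = 5, pA = 0.6; muB = 65, sdB = 2, pB = 0.4). */
public class MixtureResponsibility {
    static double gaussian(double x, double mu, double sd) {
        return Math.exp(-(x - mu) * (x - mu) / (2 * sd * sd))
               / (Math.sqrt(2 * Math.PI) * sd);
    }
    public static void main(String[] args) {
        double muA = 50, sdA = 5, pA = 0.6;
        double muB = 65, sdB = 2, pB = 0.4;
        double x = 55;                              // hypothetical observation
        double fA = pA * gaussian(x, muA, sdA);     // weighted density under cluster A
        double fB = pB * gaussian(x, muB, sdB);     // weighted density under cluster B
        System.out.println("Pr[A | x=55] = " + fA / (fA + fB));
    }
}
```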
46
Figure 7.1 Attribute space for the weather dataset.
47
Figure 7.2 Discretizing temperature using the entropy method.
48
Figure 7.3 The result of discretizing temperature.
temperature:  64  65  68  69  70  71  72  72  75  75  80  81  83  85
play:         yes no  yes yes yes no  no  yes yes yes no  yes yes no
cut-point labels:  F    E    D    C    B    A
cut points:       66.5 70.5 73.5 77.5 80.5 84
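The cut points above come from recursively choosing the split that minimises the class entropy (the method of Figure 7.2). The sketch below shows just that entropy score for one candidate cut; the class names and the example counts (a cut at 71.5 on the 14 weather instances) are stated assumptions, and the recursive bookkeeping is omitted:

```java
/** Sketch of the entropy score used to evaluate a candidate cut point:
    the information, in bits, of the two resulting intervals. */
public class SplitEntropy {
    static double entropy(double... counts) {
        double total = 0, e = 0;
        for (double c : counts) total += c;
        for (double c : counts)
            if (c > 0) e -= (c / total) * (Math.log(c / total) / Math.log(2));
        return e;
    }
    /** Weighted average entropy of a two-way split given yes/no counts on each side. */
    static double splitInfo(double yes1, double no1, double yes2, double no2) {
        double n1 = yes1 + no1, n2 = yes2 + no2, n = n1 + n2;
        return (n1 / n) * entropy(yes1, no1) + (n2 / n) * entropy(yes2, no2);
    }
    public static void main(String[] args) {
        // Cut at 71.5: 4 yes / 2 no below the cut, 5 yes / 3 no above it.
        System.out.println(splitInfo(4, 2, 5, 3) + " bits");
    }
}
```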
49
Figure 7.4 Class distribution for a two-class, two-attribute problem.
50
Figure 7.5 Number of international phone calls from Belgium, 1950–1973.
51
Figure 7.6 Algorithm for bagging.
model generation
  Let n be the number of instances in the training data.
  For each of t iterations:
    Sample n instances with replacement from training data.
    Apply the learning algorithm to the sample.
    Store the resulting model.
classification
  For each of the t models:
    Predict class of instance using model.
  Return class that has been predicted most often.
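Translated into code, the procedure looks roughly as follows. This is a generic Java sketch, not Weka's implementation; the Learner and Model interfaces are hypothetical stand-ins for any learning scheme:

```java
import java.util.*;

/** Sketch of the bagging procedure in Figure 7.6. */
interface Model { String classify(double[] instance); }
interface Learner { Model learn(List<double[]> data, List<String> labels); }

class Bagging {
    /** Model generation: build t models, each from a bootstrap sample of size n. */
    static List<Model> generate(Learner learner, List<double[]> data,
                                List<String> labels, int t, Random rnd) {
        int n = data.size();
        List<Model> models = new ArrayList<>();
        for (int i = 0; i < t; i++) {
            List<double[]> sampleX = new ArrayList<>();
            List<String> sampleY = new ArrayList<>();
            for (int j = 0; j < n; j++) {            // sample n instances with replacement
                int k = rnd.nextInt(n);
                sampleX.add(data.get(k));
                sampleY.add(labels.get(k));
            }
            models.add(learner.learn(sampleX, sampleY));
        }
        return models;
    }
    /** Classification: return the class predicted most often across the models. */
    static String classify(List<Model> models, double[] instance) {
        Map<String, Integer> votes = new HashMap<>();
        for (Model m : models) votes.merge(m.classify(instance), 1, Integer::sum);
        return Collections.max(votes.entrySet(),
                               Map.Entry.comparingByValue()).getKey();
    }
}
```

Sampling with replacement means each bootstrap sample omits some training instances and repeats others, which is what makes the t models differ.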
52
Figure 7.7 Algorithm for boosting.
model generation
  Assign equal weight to each training instance.
  For each of t iterations:
    Apply learning algorithm to weighted dataset and store resulting model.
    Compute error e of model on weighted dataset and store error.
    If e equal to zero, or e greater or equal to 0.5:
      Terminate model generation.
    For each instance in dataset:
      If instance classified correctly by model:
        Multiply weight of instance by e / (1 - e).
    Normalize weight of all instances.
classification
  Assign weight of zero to all classes.
  For each of the t (or less) models:
    Add -log(e / (1 - e)) to weight of class predicted by model.
  Return class with highest weight.
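The heart of the algorithm is the reweighting step. The sketch below (a standalone Java method, not Weka's code; the weights and correct arrays are hypothetical inputs describing one iteration) performs that step and returns the model's voting weight:

```java
/** Sketch of one reweighting step of the boosting procedure in Figure 7.7. */
class BoostingStep {
    static double reweight(double[] weights, boolean[] correct) {
        double total = 0, error = 0;
        for (int i = 0; i < weights.length; i++) {
            total += weights[i];
            if (!correct[i]) error += weights[i];
        }
        double e = error / total;                        // weighted error of the model
        if (e == 0 || e >= 0.5) return Double.NaN;       // signal: terminate model generation
        for (int i = 0; i < weights.length; i++) {
            if (correct[i]) weights[i] *= e / (1 - e);   // shrink weights of correct instances
        }
        double sum = 0;
        for (double w : weights) sum += w;
        for (int i = 0; i < weights.length; i++) {
            weights[i] *= total / sum;                   // normalize weights back to the original total
        }
        return -Math.log(e / (1 - e));                   // weight added to the class this model predicts
    }
}
```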
53
Figure 8.1 Weather data: (a) in spreadsheet; (b) comma-separated.
54
Figure 8.1 Weather data: (c) in ARFF format.
55
Figure 8.2 Output from the J4.8 decision tree learner.
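Output like this can be reproduced programmatically. Below is a minimal sketch using Weka's Java API; the file name weather.arff is an assumption, and the package path shown is the one used by recent Weka releases (the release contemporary with this figure placed the class in a different package, roughly weka.classifiers.j48):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import weka.core.Instances;
import weka.classifiers.trees.J48;   // package path depends on the Weka version installed

/** Sketch: build the J4.8 (J48) decision tree on the weather data and print it. */
public class RunJ48 {
    public static void main(String[] args) throws Exception {
        Instances data = new Instances(new BufferedReader(new FileReader("weather.arff")));
        data.setClassIndex(data.numAttributes() - 1);    // class is the last attribute
        J48 tree = new J48();
        tree.buildClassifier(data);
        System.out.println(tree);                        // prints the pruned decision tree
    }
}
```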
56
Figure 8.3 Using Javadoc: (a) the front page; (b) the weka.core package.
57
Figure 8.4 A class of the weka.classifiers package.
58
Figure 8.5 Output from the M5′ program for numeric prediction.
59
Figure 8.6 Output from J4.8 with cost-sensitive classification.
60
Figure 8.7 Effect of AttributeFilter on the weather dataset.
61
Figure 8.8 Output from the APRIORI association rule learner.
62
Figure 8.9 Output from the EM clustering scheme.
63
Figure 8.10 Source code for the message classifier.
64
Figure 8.10 (continued)
67
Figure 8.11 Source code for the ID3 decision tree learner.
68
Figure 8.11 (continued)
70
Figure 8.11 (continued)
71
Figure 8.12 Source code for a filter that replaces the missing values in a dataset.
72
Figure 8.12 (continued)
73
Figure 9.1 Representation of Iris data: (a) one dimension.
74
Figure 9.1 Representation of Iris data: (b) two dimensions.
75
Figure 9.2 Visualization of classification tree for grasses data.