Data Mining CSCI 307, Spring 2019 Lecture 6


1 Data Mining CSCI 307, Spring 2019 Lecture 6
Output: Tables, Linear Models, Trees

2 Output: Representing Structural Patterns
- Many different ways of representing patterns: decision trees, rules, instance-based, ...
- Also called "knowledge" representation
- Representation determines the inference method
- Understanding the output is the key to understanding the underlying learning methods
- Different types of output for different learning problems (e.g. classification, regression, ...)

3 Tables Simplest way of representing output:
Use the same format as the input! Decision table for the weather problem: simply find the row with the matching conditions and assign the class (in this case, play or not).

    Outlook   Temperature  Humidity  Windy  Play
    Sunny     Hot          High      False  No
    Sunny     Hot          High      True   No
    Overcast  Hot          High      False  Yes
    Rainy     Mild         High      False  Yes
    Rainy     Cool         Normal    False  Yes
    Rainy     Cool         Normal    True   No
    Overcast  Cool         Normal    True   Yes
    ...

If the task is numeric prediction, the concept is the same, except instead of calling it a decision table, it's called a regression table.
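As a rough illustration (not from the slides), a decision table can be stored as a dictionary keyed on the attribute values; the rows below are a hypothetical fragment of the weather data:

```python
# A minimal sketch of a decision-table lookup (illustrative only).
# Keys are (outlook, temperature, humidity, windy) tuples; values are the class.
decision_table = {
    ("Sunny",    "Hot",  "High", False): "No",
    ("Sunny",    "Hot",  "High", True):  "No",
    ("Overcast", "Hot",  "High", False): "Yes",
    ("Rainy",    "Mild", "High", False): "Yes",
}

def classify(outlook, temperature, humidity, windy):
    # Find the row with the matching conditions and return its class.
    return decision_table.get((outlook, temperature, humidity, windy), "unknown")

print(classify("Overcast", "Hot", "High", False))  # -> "Yes"
```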

4 Tables Sometimes some of the attributes aren't necessary for the decision. What if we don't need temperature and windy attributes? A smaller, condensed table might be better: Main problem: selecting the right attributes so as to make the right decision. Outlook Humidity Play Sunny High No Sunny Normal Yes Overcast High Yes Overcast Normal Yes Rainy High No Rainy Normal No

5 Another Simple Representation: Linear Models
- Regression model
- Used when all the inputs (attribute values) and the output are numeric
- Output is the sum of weighted attribute values
- The trick is to find good values for the weights (see the sketch below)
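A minimal sketch of evaluating such a model, with made-up weights and a bias term for illustration:

```python
# A minimal sketch of a linear model (hypothetical weights, illustrative only).
def linear_model(attributes, weights, bias):
    # Output is the bias plus the weighted sum of the attribute values.
    return bias + sum(w * a for w, a in zip(weights, attributes))

# Example: three numeric attributes with made-up weights.
print(linear_model([1.0, 2.5, 0.3], weights=[0.4, -1.2, 2.0], bias=0.7))
```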

6 A Linear Regression Function for the CPU Performance Data
Only the cache attribute (CACH) is used here to predict CPU performance. (It is easier to see in two dimensions.)

    PRP = 37.06 + 2.47 CACH

The 37.06 is the "bias" term and is a weight, just as 2.47 is the weight on the cache attribute. The least squares linear regression method was used to come up with the weights from the training data. (We'll see how in Chapter 4.) Given a test instance, plug the value of the cache attribute into the expression and the predicted performance (i.e. the output/class) will lie on the line.
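To make the idea concrete, here is a sketch of recovering least squares weights for a single attribute with NumPy; the cache/performance pairs are made up and merely stand in for the real CPU data:

```python
import numpy as np

# A sketch of one-attribute least squares regression (made-up data points
# standing in for the real CPU performance dataset).
cach = np.array([0.0, 8.0, 16.0, 32.0, 64.0])      # cache size (hypothetical)
prp  = np.array([40.0, 58.0, 75.0, 120.0, 195.0])  # performance (hypothetical)

# np.polyfit with degree 1 returns [slope, intercept] minimizing squared error.
slope, intercept = np.polyfit(cach, prp, 1)
print(f"PRP = {intercept:.2f} + {slope:.2f} * CACH")

# Prediction for a test instance: plug its cache value into the expression.
print(intercept + slope * 24.0)
```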

7 Linear Models for Binary Classification
- The line separates the two classes
- Decision boundary: defines where the decision changes from one class value to the other
- A prediction is made by plugging the observed attribute values into the expression
- Predict one class if the output >= 0, and the other class if the output < 0
- The boundary becomes a high-dimensional plane (a hyperplane) when there are multiple attributes

8 A Linear Decision Boundary Separating Iris Setosas from Iris Versicolors
The boundary is the line

    2.0 - 0.5 PetalLength - 0.8 PetalWidth = 0

Predict setosa if the result is >= 0; versicolor if the result is < 0.
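A minimal sketch of applying this boundary to a test instance (the petal measurements are made-up examples):

```python
# Apply the slide's decision boundary 2.0 - 0.5*PetalLength - 0.8*PetalWidth = 0.
def classify_iris(petal_length, petal_width):
    result = 2.0 - 0.5 * petal_length - 0.8 * petal_width
    return "setosa" if result >= 0 else "versicolor"

print(classify_iris(1.4, 0.2))  # small petals  -> "setosa"
print(classify_iris(4.5, 1.5))  # larger petals -> "versicolor"
```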

9 Trees “Divide-and-conquer” approach produces tree
- Nodes involve testing a particular attribute
- Usually, an attribute value is compared to a constant
- Other possibilities:
  - Comparing the values of two attributes
  - Using a function of one or more attributes
  - Option nodes (choose more than one branch): an instance leads to two (or more) leaves, and the alternative predictions must then be combined somehow (e.g. majority voting)
- Leaves assign a classification, a set of classifications, or a probability distribution to instances
- An unknown instance is routed down the tree (see the sketch below)
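A minimal sketch of routing an instance down a tree, using a small hypothetical tree for the weather data (nominal tests only, one branch per value):

```python
# A minimal sketch of routing an instance down a decision tree
# (hypothetical tree for the weather data, illustrative only).
tree = ("Outlook", {
    "Sunny":    ("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Yes",
    "Rainy":    ("Windy", {True: "No", False: "Yes"}),
})

def route(node, instance):
    # A leaf is just a class label; an internal node tests one attribute.
    if not isinstance(node, tuple):
        return node
    attribute, branches = node
    return route(branches[instance[attribute]], instance)

instance = {"Outlook": "Sunny", "Humidity": "Normal", "Windy": False}
print(route(tree, instance))  # -> "Yes"
```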

10 Nominal and Numeric Attributes
- Nominal: the number of children is usually equal to the number of attribute values ==> the attribute won't get tested more than once
- Other possibility: division into two subsets, so the attribute may get tested more than once

11 Nominal and Numeric Attributes
- Numeric: test whether the value is greater or less than a constant ==> the attribute may get tested several times
- Other possibility: a three-way split (or multi-way split), as in the sketch below
  - Integer: less than, equal to, greater than
  - Real: below, within (i.e. close enough to be equal), above
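A minimal sketch of a three-way split on a real-valued attribute, with a hypothetical threshold and tolerance:

```python
# Three-way split on a real-valued attribute (hypothetical threshold/tolerance).
def three_way_split(value, threshold, tolerance=0.0):
    # Real-valued: below / within (close enough to be equal) / above.
    if value < threshold - tolerance:
        return "below"
    if value > threshold + tolerance:
        return "above"
    return "within"

print(three_way_split(71.5, threshold=75.0, tolerance=2.0))  # -> "below"
print(three_way_split(74.0, threshold=75.0, tolerance=2.0))  # -> "within"
```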

12 Missing Values Does absence of value have some significance?
- Yes ==> "missing" is a separate value
- No ==> "missing" must be treated in a special way
  - Solution A: assign the instance to the most popular branch
  - Solution B: split the instance into pieces (see the sketch below)
    - Pieces receive weights according to the fraction of training instances that go down each branch
    - Classifications from the leaf nodes are combined using the weights that have percolated to them
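A minimal sketch of Solution B, with hypothetical branch fractions and leaf predictions:

```python
# Split an instance with a missing attribute value into weighted pieces.
# Suppose 60% of training instances went down the "High" humidity branch
# and 40% down "Normal", and the leaves predict class probabilities.
branch_fractions = {"High": 0.6, "Normal": 0.4}
leaf_predictions = {"High":   {"Yes": 0.2, "No": 0.8},
                    "Normal": {"Yes": 0.9, "No": 0.1}}

combined = {"Yes": 0.0, "No": 0.0}
for branch, weight in branch_fractions.items():
    for cls, prob in leaf_predictions[branch].items():
        combined[cls] += weight * prob  # the weight percolates to the leaf

print(combined)  # -> approximately {'Yes': 0.48, 'No': 0.52}
```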

