CSCI 347, Data Mining Chapter 4 – Functions, Rules, Trees, and Instance Based Learning
Functions/Linear Models
Example: Let x be the value of a house in Butte, MT: x = c + b * num_bedrooms + d * num_bathrooms + p * price_houses_in_neigh, where c, b, d, and p are coefficients "learned" by a data mining algorithm
Simple Linear Regression Equation for the CPU Performance Data: PRP = w0 + w1 * CACH, where the weights w0 and w1 are fit to the data
More Precise Linear Regression Equation for the CPU Data: PRP = w0 + w1 * MYCT + w2 * MMIN + w3 * MMAX + w4 * CACH + w5 * CHMIN + w6 * CHMAX, with all weights fit to the data
Linear Models Work most naturally with numeric attributes. The outcome is a linear combination of the attributes a1, a2, …, ak and weights w0, w1, …, wk: x = w0 + w1*a1 + w2*a2 + … + wk*ak
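A linear model of this form can be fit with ordinary least squares. A minimal sketch in NumPy, using the house-value attributes from the earlier example; all the numeric training values below are made up for illustration, not taken from the chapter:

```python
import numpy as np

# Hypothetical training data: each row is one house
# (num_bedrooms, num_bathrooms, avg. neighborhood price).
X = np.array([
    [2, 1, 150_000],
    [3, 2, 200_000],
    [4, 2, 250_000],
    [3, 1, 180_000],
], dtype=float)
y = np.array([140_000, 210_000, 260_000, 175_000], dtype=float)

# Prepend a column of ones so the intercept w0 is learned with the weights.
A = np.hstack([np.ones((X.shape[0], 1)), X])

# Least-squares fit: find w minimizing ||A @ w - y||^2.
w, *_ = np.linalg.lstsq(A, y, rcond=None)

# Predict: x = w0 + w1*a1 + w2*a2 + w3*a3 for a new house.
new_house = np.array([1, 3, 2, 220_000], dtype=float)
prediction = new_house @ w
```

The same least-squares idea, with more attributes, produces the CPU performance equations above.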
Rules as Covering Algorithms
Covering Algorithms Rather than looking at which attribute to split on, start with a particular class. Class by class, develop rules that "cover" that class.
Example: Generating a Rule
If x > 1.2 then class = a
If x > 1.2 and y > 2.6 then class = a
If ??? then class = a
Example: Generating a Rule
Possible rule set for class "b":
If x > 1.2 and y > 2.6 then class = a
If x ≤ 1.2 then class = b
If x > 1.2 and y ≤ 2.6 then class = b
Could add more rules to get a "perfect" rule set
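An ordered rule set like this applies naturally as a sequence of tests where the first matching rule wins. A sketch, with thresholds taken from the example above (the ≤ tests are the complements of the learned > tests):

```python
def classify(x, y):
    """Apply the covering rules in order; the first matching rule wins."""
    if x > 1.2 and y > 2.6:
        return "a"
    if x <= 1.2:
        return "b"
    if x > 1.2 and y <= 2.6:
        return "b"
```

Because the tests exhaust the (x, y) plane, every instance is covered by exactly one rule.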
Decision Trees
Decision trees use a divide-and-conquer strategy, which can be expressed recursively.
Decision Tree Algorithm Constructing a decision tree can be expressed recursively:
1. Select an attribute to place at the root node
2. Make one branch for each possible value, splitting the example set into subsets, one for every value of the attribute
3. Repeat the process for each branch (recursion)
Base case: stop if all instances have the same class, there are no more attributes to split on, or a pre-defined maximum depth of the tree has been reached.
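The recursive procedure above can be sketched in a few lines. This toy version simply takes attributes in the order given rather than choosing the best split (attribute selection is covered under "Which Attribute to Select?"), and returns the majority class at each base case:

```python
from collections import Counter

def build_tree(rows, attributes, depth=0, max_depth=5):
    """rows: list of (features_dict, label) pairs.
    Returns either a class label (leaf) or (attribute, {value: subtree})."""
    labels = [label for _, label in rows]
    # Base cases: pure node, no attributes left, or depth limit reached.
    if len(set(labels)) == 1 or not attributes or depth == max_depth:
        return Counter(labels).most_common(1)[0][0]
    attr = attributes[0]  # a real learner would pick the *best* attribute here
    branches = {}
    # One branch per observed value of the attribute; recurse on each subset.
    for value in {feats[attr] for feats, _ in rows}:
        subset = [(f, l) for f, l in rows if f[attr] == value]
        branches[value] = build_tree(subset, attributes[1:], depth + 1, max_depth)
    return (attr, branches)
```

Each recursive call works on a smaller subset of the examples with one fewer attribute, so the recursion is guaranteed to reach a base case.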
Recursion Recursion is the process of repeating items in a similar way. Example: Definition of a person's ancestors: One's parents are one's ancestors (base case) The ancestors of one's ancestors are also one's ancestors (recursion step) Example: Definition of the Fibonacci sequence Fib(0)=0 and Fib(1)=1 (base cases) For all integers n>1, Fib(n) = Fib(n-1) + Fib(n-2) (recursion step)
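The Fibonacci definition translates directly into code, base cases first, then the recursion step:

```python
def fib(n):
    """Fibonacci numbers, following the recursive definition above."""
    if n == 0:        # base case
        return 0
    if n == 1:        # base case
        return 1
    return fib(n - 1) + fib(n - 2)  # recursion step
```

The decision tree algorithm has the same shape: check the base cases, otherwise call the procedure on smaller subproblems.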
Which Attribute to Select?
Rules vs. Trees A corresponding decision tree produces exactly the same predictions. The covering algorithm concentrates on one class value at a time, whereas a decision tree learner takes all class values into account.
Instance-Based Learning
No structure is learned. Given an instance to predict, simply predict the class of its nearest neighbor. Alternatively, predict the class that appears most frequently among its k nearest neighbors.
Example: Predict the class value of the following:
Outlook  Temp  Humidity  Windy  Play
rainy    hot   normal    false  ?
Example: Predict the class value of the following (Temp and Humidity unknown):
Outlook  Temp  Humidity  Windy  Play
rainy                    false  ?
Manhattan Distance In two dimensions: if p = (p1, p2) and q = (q1, q2), then
d(p, q) = |p1 − q1| + |p2 − q2|
Euclidean Distance The ordinary distance one would measure with a ruler. In two dimensions: if p = (p1, p2) and q = (q1, q2), then
d(p, q) = √((p1 − q1)² + (p2 − q2)²)
Uses the Pythagorean Theorem
Euclidean Distance In n dimensions:
d(p, q) = √((p1 − q1)² + (p2 − q2)² + … + (pn − qn)²)
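Both distance measures, and the nearest-neighbor prediction described earlier, fit in a few lines. A sketch (the function names here are illustrative, not from the chapter):

```python
import math

def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def euclidean(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def nearest_neighbor_class(instance, training, distance=euclidean):
    """training: list of (point, class) pairs.
    Predict the class of the single closest training point (1-NN)."""
    point, label = min(training, key=lambda pc: distance(instance, pc[0]))
    return label
```

For k-NN, one would instead take the k smallest distances and return the most frequent class among them; numeric attributes are usually normalized first so no single attribute dominates the distance.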