CSCI 347, Data Mining Chapter 4 – Functions, Rules, Trees, and Instance Based Learning.



2 CSCI 347, Data Mining Chapter 4 – Functions, Rules, Trees, and Instance Based Learning

3 Audio With PowerPoints Audio can be included in PowerPoint slides when they are viewed as a slideshow. While watching a slide show, click the icon below to hear a description of the slide:

4 Functions/Linear Models

5 Example Let x be the value of a house in Butte, MT: x = c + b * num_bedrooms + d * num_bathrooms + p * price_houses_in_neigh, where c, b, d, and p are coefficients "learned" by the data mining algorithm.
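A minimal sketch of applying such a model, assuming the coefficients have already been learned (the values below are made up for illustration, not fit to real Butte housing data):

    # Hypothetical coefficients; in practice these would be learned
    # from data by a regression algorithm.
    c, b, d, p = 20000.0, 15000.0, 10000.0, 0.3

    def house_value(num_bedrooms, num_bathrooms, price_houses_in_neigh):
        """Linear model: x = c + b*bedrooms + d*bathrooms + p*neighborhood_price."""
        return c + b * num_bedrooms + d * num_bathrooms + p * price_houses_in_neigh

    print(house_value(3, 2, 250000.0))  # 160000.0 with the coefficients above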

6 Simple Linear Regression Equation for the CPU Performance Data: PRP = 37.06 + 2.47 CACH

7 More Precise Linear Regression Equation for the CPU Data: PRP = −56.1 + 0.049 MYCT + 0.015 MMIN + 0.006 MMAX + 0.630 CACH − 0.270 CHMIN + 1.46 CHMAX
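A sketch of how such coefficients can be fit by ordinary least squares with NumPy. The rows below are placeholders standing in for the CPU performance dataset, not the real data:

    import numpy as np

    # Placeholder rows: [MYCT, MMIN, MMAX, CACH, CHMIN, CHMAX] and target PRP.
    X = np.array([[125,   256,  6000, 256, 16, 128],
                  [ 29,  8000, 32000,  32,  8,  32],
                  [ 29,  8000, 16000,  32,  8,  16],
                  [ 26,  8000, 32000,  64,  8,  32],
                  [ 23, 16000, 32000,  64, 16,  32],
                  [ 23, 16000, 64000,  64, 16,  32],
                  [400,  1000,  3000,   0,  1,   2]], dtype=float)
    y = np.array([198, 269, 172, 318, 367, 489, 40], dtype=float)

    # Prepend a column of ones so the intercept w0 is learned too.
    A = np.hstack([np.ones((len(X), 1)), X])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    print(w)  # intercept followed by one weight per attribute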

8 Linear Models
• Work most naturally with numeric attributes
• The outcome is a linear combination of attributes a1, a2, …, ak and weights w0, w1, …, wk: x = w0 + w1*a1 + w2*a2 + … + wk*ak

9 Rules as Covering Algorithms

10 Covering Algorithms
• Rather than looking at what attribute to split on, start with a particular class
• Class by class, develop rules that "cover" the class

11 Example: Generating a Rule
If x > 1.2 then class = a
If x > 1.2 and y > 2.6 then class = a
If ??? then class = a

12 Example: Generating a Rule
Possible rule set for class "b":
If x ≤ 1.2 then class = b
If x > 1.2 and y ≤ 2.6 then class = b
(For class "a": If x > 1.2 and y > 2.6 then class = a)
Could add more rules to get a "perfect" rule set
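A minimal sketch of one covering step on two numeric attributes. The toy data is invented here, and the selection criterion (precision of the covered set) is a simplification of what full covering algorithms such as PRISM use:

    # Toy 2-D instances: (x, y, class)
    data = [(1.0, 2.0, 'b'), (1.1, 3.0, 'b'), (1.5, 2.0, 'b'),
            (1.4, 3.0, 'a'), (2.0, 3.5, 'a'), (2.5, 2.8, 'a')]

    def best_test(instances, target):
        """Greedily pick the attribute/threshold test with the highest
        precision (covered-and-correct / covered) for the target class."""
        best = None
        for attr in (0, 1):                       # 0 -> x, 1 -> y
            for inst in instances:
                thresh = inst[attr]
                for op in ('>', '<='):
                    covered = [i for i in instances
                               if (i[attr] > thresh) == (op == '>')]
                    if not covered:
                        continue
                    acc = sum(1 for i in covered if i[2] == target) / len(covered)
                    if best is None or acc > best[0]:
                        best = (acc, attr, op, thresh)
        return best

    # Prints (1.0, 0, '>', 1.5): the rule "if x > 1.5 then class = a"
    print(best_test(data, 'a'))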

13 Decision Trees

14
• Divide-and-conquer strategy
• Can be expressed recursively

15 Decision Tree Algorithm
Constructing a decision tree can be expressed recursively (see the sketch below):
• Select an attribute to place at the root node
• Make one branch for each possible value, splitting the example set into subsets, one for every value of the attribute
• Repeat the process for each branch (recursion)
• Base case: stop if all instances have the same class, there are no more attributes to split on, or a pre-defined tree depth has been reached
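A minimal sketch of this recursion. It is not the course's exact algorithm: attribute selection here is a placeholder that simply takes the first remaining attribute instead of an information-based measure:

    from collections import Counter

    def build_tree(instances, attributes, depth=0, max_depth=5):
        """instances: list of (dict_of_attribute_values, class_label) pairs."""
        classes = [label for _, label in instances]
        # Base cases: pure node, no attributes left, or depth limit reached.
        if len(set(classes)) == 1 or not attributes or depth >= max_depth:
            return Counter(classes).most_common(1)[0][0]   # majority class
        attr = attributes[0]        # placeholder for a real selection measure
        tree = {}
        for value in {inst[0][attr] for inst in instances}:
            subset = [inst for inst in instances if inst[0][attr] == value]
            tree[value] = build_tree(subset, attributes[1:], depth + 1, max_depth)
        return (attr, tree)

    weather = [({'outlook': 'sunny', 'windy': False}, 'no'),
               ({'outlook': 'rainy', 'windy': True}, 'no'),
               ({'outlook': 'overcast', 'windy': False}, 'yes')]
    print(build_tree(weather, ['outlook', 'windy']))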

16 Recursion
Recursion is the process of repeating items in a self-similar way.
Example: definition of a person's ancestors:
• One's parents are one's ancestors (base case)
• The ancestors of one's ancestors are also one's ancestors (recursion step)
Example: definition of the Fibonacci sequence:
• Fib(0) = 0 and Fib(1) = 1 (base cases)
• For all integers n > 1, Fib(n) = Fib(n−1) + Fib(n−2) (recursion step)
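The Fibonacci definition translates directly into code; a short sketch:

    def fib(n):
        """Fibonacci by direct recursion."""
        if n == 0:
            return 0                         # base case
        if n == 1:
            return 1                         # base case
        return fib(n - 1) + fib(n - 2)       # recursion step

    print([fib(n) for n in range(8)])        # [0, 1, 1, 2, 3, 5, 8, 13]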

17 Recursion

18 Which Attribute to Select?

19 Rules vs. Trees
The corresponding decision tree produces exactly the same predictions. A covering algorithm concentrates on one class value at a time, whereas a decision tree learner takes all class values into account.

20 Instance-Based Learning

21
• No structure is learned
• Given an instance to predict, simply predict the class of its nearest neighbor
• Alternatively, predict the class that appears most frequently among the k nearest neighbors
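A minimal sketch of k-nearest-neighbor prediction over numeric attributes, using the Euclidean distance defined on the later slides. The training tuples below are invented for illustration, not taken from the weather data:

    import math
    from collections import Counter

    def knn_predict(train, query, k=1):
        """train: list of (feature_tuple, class_label); query: feature tuple."""
        def dist(p, q):
            return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))
        neighbors = sorted(train, key=lambda inst: dist(inst[0], query))[:k]
        # Majority vote over the k nearest training instances.
        return Counter(label for _, label in neighbors).most_common(1)[0][0]

    train = [((85.0, 85.0), 'no'), ((80.0, 90.0), 'no'), ((83.0, 78.0), 'yes')]
    print(knn_predict(train, (89.0, 65.0), k=1))   # 'yes'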

22 Example
Predict the class value of the following:
Outlook | Temp | Humidity | Windy | Play
rainy   | hot  | normal   | false | ?

23 Example
Predict the class value of the following:
Outlook | Temp | Humidity | Windy | Play
rainy   | 89.0 | 65.0     | false | ?

24 Manhattan Distance
In two dimensions: if p = (p1, p2) and q = (q1, q2), then d(p, q) = |p1 − q1| + |p2 − q2|

25 Euclidean Distance
The ordinary distance one would measure with a ruler. In two dimensions: if p = (p1, p2) and q = (q1, q2), then, by the Pythagorean Theorem, d(p, q) = √((p1 − q1)² + (p2 − q2)²)

26 Euclidean Distance
In n dimensions: d(p, q) = √((p1 − q1)² + (p2 − q2)² + … + (pn − qn)²)
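Both distances in n dimensions, as a short sketch:

    import math

    def manhattan(p, q):
        """Sum of absolute coordinate differences."""
        return sum(abs(pi - qi) for pi, qi in zip(p, q))

    def euclidean(p, q):
        """Square root of the sum of squared coordinate differences."""
        return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

    print(manhattan((1, 2), (4, 6)))   # 7
    print(euclidean((1, 2), (4, 6)))   # 5.0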

