Data Classification for Data Mining


1 Data Classification for Data Mining
Wei-Min Shen Information Sciences Institute University of Southern California 2/5/98 UCLA Data Mining Short Course (1)

2 Outline
What is Data Classification?
Instance Space and Class Space
Representation of Instances and Classes
Issues in Data Classification
Symbolic Algorithms
Statistical Algorithms

3 Data Classification: predict future based on past
Supervised learning (vs. unsupervised)
Inputs: attributes and training instances
Attributes: [independents, dependent/target]
[Day, Outlook, Temperature, Humidity, Wind, PlayTennis]
Instance: (D1, Sunny, Hot, High, Weak, Yes)
Outputs: expressions that predict the target from the independent attributes, e.g.
(Outlook = Sunny) ==> (PlayTennis = Yes)
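The attribute-value representation above can be sketched in a few lines: an instance is a dict of attribute values, and a learned expression is a function from an instance to a predicted target value. The rule is the one on the slide, completed with an assumed "else No" default.

```python
def play_tennis_rule(instance):
    # (Outlook = Sunny) ==> (PlayTennis = Yes); "else No" is an assumed default
    return "Yes" if instance["Outlook"] == "Sunny" else "No"

d1 = {"Day": "D1", "Outlook": "Sunny", "Temperature": "Hot",
      "Humidity": "High", "Wind": "Weak"}
print(play_tennis_rule(d1))  # -> Yes
```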

4 Instance and Class Spaces
Attributes: X = {a, b, c}, Y = {d, e}
Instances: I1 = [a,d], I2 = [a,e], ...
Instance space: I = {I1, I2, I3, I4, I5, I6}
Class space: a class is a set of instances, so the number of classes is 2^|I| = 2^6 = 64
Classes have a partial order: (X=a) is a subset of (X=a ∨ X=b)
Exercise: draw a picture of all possible classes of I
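The counts above can be checked directly: the instance space is the cross product of the attribute domains, and every subset of it is a candidate class.

```python
# Enumerate the instance space for X = {a,b,c}, Y = {d,e} and count
# the classes (all subsets of the instance space).
from itertools import product

X, Y = ["a", "b", "c"], ["d", "e"]
instances = list(product(X, Y))    # I1..I6
num_classes = 2 ** len(instances)  # every subset of I is a class

print(len(instances))  # -> 6
print(num_classes)     # -> 64
```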

5 Biases in Concept Learning
Bias = a restricted subset H of 2^I
Measurements of bias:
Size of H: BI(H) = |H|
Capacity of H (relative to an instance set S): let ΠH(S) = { S ∩ h | h ∈ H } be the set of all subsets of S that can be "expressed" by H; then the capacity of H is ΠH(m) = max |ΠH(S)| over all S of size m
Forms of restriction: pure conjunctions of atoms, ..., k-DNF, ...
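The capacity measure can be computed by brute force on a toy domain. The hypothesis class H below is an assumed example, not from the slides; the code just evaluates ΠH(S) = { S ∩ h | h ∈ H } over all sample sets S of size m.

```python
# Brute-force capacity of a small hypothesis class H over a 4-element domain.
from itertools import combinations

domain = frozenset(range(4))
H = [frozenset(), frozenset({0}), frozenset({0, 1}), domain]  # assumed toy class

def capacity(H, domain, m):
    best = 0
    for S in combinations(domain, m):
        S = frozenset(S)
        # |{ S ∩ h : h in H }| = number of subsets of S that H can express
        best = max(best, len({S & h for h in H}))
    return best

print(capacity(H, domain, 2))  # -> 3
```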

6 Criteria for Correctness
Exactly Correct
Reliable and Useful: never say anything wrong; may say "don't know"
Probably Approximately Correct (PAC): with high probability (1-δ), learn a concept with a small error ε, for any fixed training and testing distribution
Uses a sample size that is polynomial in 1/δ, 1/ε, and the complexity of the target concept
At least as hard as Exactly Correct
Distribution-free and distribution-sensitive (active)
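The polynomial sample size can be made concrete with the standard PAC bound for a consistent learner over a finite hypothesis class H (a known textbook result, not stated on the slide): m ≥ (1/ε)(ln|H| + ln(1/δ)).

```python
# Sample-size bound for PAC-learning a finite hypothesis class.
import math

def pac_sample_size(h_size, eps, delta):
    # m >= (1/eps) * (ln|H| + ln(1/delta))
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / eps)

print(pac_sample_size(h_size=2 ** 10, eps=0.1, delta=0.05))  # -> 100
```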

7 Representation
How to represent instances and classes?
Attribute-based instances and classes:
Decision Trees
Decision Lists
Logical Rules
Relation-based instances and classes:
Instances: [..., On(Book1, Table1), ...]
Classes: graphs of generalized objects and relations

8 Issues of Data Classification
Large amounts of data
Huge number of possible classes
Data contain noise and errors
Instances have missing values

9 Decision Trees
An example tree for PlayTennis:
Outlook?
├─ Sunny:    Humidity? (High → No, Normal → Yes)
├─ Overcast: Yes
└─ Rain:     Wind? (Strong → No, Weak → Yes)
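The tree above can be sketched as nested dicts: an internal node maps an attribute's values to subtrees, and a leaf is a class label. This representation is an assumption for illustration, not the course's own data structure.

```python
# The PlayTennis decision tree as nested dicts, plus a classifier that
# walks from the root to a leaf.
tree = {"Outlook": {
    "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
}}

def classify(tree, instance):
    while isinstance(tree, dict):
        attr = next(iter(tree))              # the attribute tested at this node
        tree = tree[attr][instance[attr]]    # follow the branch for its value
    return tree                              # a leaf label

print(classify(tree, {"Outlook": "Sunny", "Humidity": "Normal"}))  # -> Yes
```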

10 Build Decision Trees from Data
Given a node in the tree and all its examples S:
Select the best attribute A for this node
For each value vi of A, grow a subtree (or a leaf) under the node

11 Select the best attribute?
Let S be a set of examples, A an attribute, s an instance, v a value, and c the number of classes
Sv = { s ∈ S | A(s) = v }
Entropy(S) = Σi=1..c -pi log2 pi
The gain of an attribute A with respect to S is
Gain(S, A) = Entropy(S) - Σv∈Values(A) (|Sv| / |S|) Entropy(Sv)
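The two formulas can be sketched directly; `entropy` takes a list of class labels, and `gain` takes examples as dicts plus the attribute and target names (the dict representation is an assumption for illustration).

```python
# Entropy and information gain, following the formulas above.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(examples, attr, target):
    def ent(exs):
        return entropy([e[target] for e in exs])
    total = ent(examples)
    for v in {e[attr] for e in examples}:
        sv = [e for e in examples if e[attr] == v]  # Sv = examples with A(s)=v
        total -= len(sv) / len(examples) * ent(sv)
    return total

# 9 positive and 5 negative examples give entropy ≈ 0.940
print(round(entropy(["Yes"] * 9 + ["No"] * 5), 3))  # -> 0.94
```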

12 ID3 Algorithm
ID3(Examples, TargetAttr, Attributes):
Create a Root node for the tree;
If all Examples are positive (negative), return Root with label = Yes (label = No);
If Attributes is empty, return Root labeled with the most common value of TargetAttr in Examples;
Otherwise select the best attribute A from Attributes, and for each value vi of A:
add a new branch for A = vi, and let E' be the examples with A = vi;
if E' is empty, add a leaf labeled with the most common value of TargetAttr in Examples,
else add the subtree ID3(E', TargetAttr, Attributes - {A})
Return Root
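The pseudocode above can be sketched as a compact recursive function, using information gain to pick the best attribute. Examples are dicts and the tree is nested dicts `{attr: {value: subtree}}` with labels at the leaves (both representation choices are assumptions for illustration).

```python
# A runnable sketch of ID3.
import math
from collections import Counter

def _entropy(exs, target):
    n = len(exs)
    counts = Counter(e[target] for e in exs)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def _gain(exs, attr, target):
    g = _entropy(exs, target)
    for v in {e[attr] for e in exs}:
        sv = [e for e in exs if e[attr] == v]
        g -= len(sv) / len(exs) * _entropy(sv, target)
    return g

def id3(examples, target, attributes):
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:               # all examples share one label
        return labels[0]
    if not attributes:                      # no attributes left to split on
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: _gain(examples, a, target))
    node = {best: {}}
    for v in {e[best] for e in examples}:   # one branch per value of best
        sub = [e for e in examples if e[best] == v]
        node[best][v] = id3(sub, target, [a for a in attributes if a != best])
    return node
```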

13 An Example
[Day, Outlook,  Temp, Humidity, Wind,   PlayTennis]
(D1   Sunny     Hot   High      Weak    No)
(D2   Sunny     Hot   High      Strong  No)
(D3   Overcast  Hot   High      Weak    Yes)
(D4   Rain      Mild  High      Weak    Yes)
(D5   Rain      Cool  Normal    Weak    Yes)
(D6   Rain      Cool  Normal    Strong  No)
(D7   Overcast  Cool  Normal    Strong  Yes)
(D8   Sunny     Mild  High      Weak    No)
(D9   Sunny     Cool  Normal    Weak    Yes)
(D10  Rain      Mild  Normal    Weak    Yes)
(D11  Sunny     Mild  Normal    Strong  Yes)
(D12  Overcast  Mild  High      Strong  Yes)
(D13  Overcast  Hot   Normal    Weak    Yes)
(D14  Rain      Mild  High      Strong  No)

14 An Example Tree Built
Gain(S, Outlook) = 0.247
Gain(S, Humidity) = 0.152
Gain(S, Wind) = 0.048
Gain(S, Temperature) = 0.029
Outlook is selected for the root over S = [D1, D2, ..., D14]:
Sunny → [D1,D2,D8,D9,D11] (?)
Overcast → [D3,D7,D12,D13] (Yes)
Rain → [D4,D5,D6,D10,D14] (?)
For the Sunny branch, E' = [D1,D2,D8,D9,D11]:
Gain(E', Humidity) = 0.97
Gain(E', Temperature) = 0.57
Gain(E', Wind) = 0.019
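The root-level gains can be recomputed from the 14 PlayTennis examples on the previous slide; this self-contained check reproduces them to three decimals.

```python
# Recompute Gain(S, A) for each attribute of the PlayTennis data.
import math
from collections import Counter

ROWS = [  # Day, Outlook, Temp, Humidity, Wind, PlayTennis
    ("D1","Sunny","Hot","High","Weak","No"), ("D2","Sunny","Hot","High","Strong","No"),
    ("D3","Overcast","Hot","High","Weak","Yes"), ("D4","Rain","Mild","High","Weak","Yes"),
    ("D5","Rain","Cool","Normal","Weak","Yes"), ("D6","Rain","Cool","Normal","Strong","No"),
    ("D7","Overcast","Cool","Normal","Strong","Yes"), ("D8","Sunny","Mild","High","Weak","No"),
    ("D9","Sunny","Cool","Normal","Weak","Yes"), ("D10","Rain","Mild","Normal","Weak","Yes"),
    ("D11","Sunny","Mild","Normal","Strong","Yes"), ("D12","Overcast","Mild","High","Strong","Yes"),
    ("D13","Overcast","Hot","Normal","Weak","Yes"), ("D14","Rain","Mild","High","Strong","No"),
]
ATTRS = ["Day", "Outlook", "Temp", "Humidity", "Wind", "PlayTennis"]
S = [dict(zip(ATTRS, r)) for r in ROWS]

def entropy(exs):
    n = len(exs)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(e["PlayTennis"] for e in exs).values())

def gain(exs, attr):
    g = entropy(exs)
    for v in {e[attr] for e in exs}:
        sv = [e for e in exs if e[attr] == v]
        g -= len(sv) / len(exs) * entropy(sv)
    return g

for a in ["Outlook", "Humidity", "Wind", "Temp"]:
    print(a, round(gain(S, a), 3))
# -> Outlook 0.247, Humidity 0.152, Wind 0.048, Temp 0.029
```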

15 Issues in Decision Tree Learning
Avoid overfitting (use pruning)
Splitting continuous values (split the examples S into c subsets)
Missing values (use the most common value, or distribute by frequency)
Different costs of attributes (use Gain(S,A) / Cost(A))

16 Decision Lists
L = (f1, v1), ..., (fj, vj), ..., (fr, vr)
Each fj is a decision test and vj its decision (class) value; the last decision test fr is always true
An instance is classified by scanning the tests from left to right and returning the value of the first test it satisfies
k-DL is more expressive than k-DNF, k-CNF, and k-Decision Trees
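The left-to-right classification rule can be sketched with (test, value) pairs, where each test is a predicate on the instance. The particular list below is a shortened variant of DL1 from the next slide, assumed here just for illustration.

```python
# A decision list as (test, value) pairs, scanned left to right.
def classify(decision_list, x):
    for test, value in decision_list:
        if test(x):
            return value
    raise ValueError("a decision list must end with a true test")

dl = [
    (lambda x: not x["x1"] and not x["x3"], 1),  # (¬x1∧¬x3, 1)
    (lambda x: x["x1"] and x["x2"], 1),          # (x1∧x2, 1)
    (lambda x: True, 0),                         # (true, 0): the final test
]
print(classify(dl, {"x1": False, "x2": False, "x3": False}))  # -> 1
```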

17 Decision Lists and Trees
[Figure: a decision tree over x1, x2, x3, x4 with leaf labels [0] and [1]]
DL1 = (¬x1∧¬x3, 1) (x1∧x2, 1) (x1∧¬x2∧x4, 1) (true, 0)
DL2 = (¬x1∧¬x3, 1) (¬x1∧x3, 0) (¬x2∧¬x4, 0) (true, 1)
DL3 = (¬x1∧x3, 0) (x2, 1) (x1∧¬x4, 0) (true, 1)

18 Complementary Discrimination Learning
Maintain the boundary between a concept hypothesis H and its complement ~H
Move the boundary based on training examples until H becomes the target concept
Algorithm: for every training example:
determine whether and where to move the boundary
find out how large the movement should be
actually move the boundary

19 CDL2 Algorithm
Let x be a new instance and vx be its concept value
Loop: let Dj = (fj, vj) be the decision that applies to x
If vx = vj, that is, the decision is correct, then store x as an example of Dj and return
Otherwise let (g1, ..., gd) = DIFFERENCES(examplesOf(Dj), x)
Replace (fj, vj) by (fj ∧ g1, vj), ..., (fj ∧ gd, vj)
Distribute the examples of Dj into the new decisions
If Dj was the last decision, append (true, vx) at the end of the decision list

20 An Example of CDL2
Target concept: ¬x1∧¬x3 ∨ x1∧x2 ∨ x1∧¬x2∧x4
Training examples: D1 = 00000, ..., D32 = 11111
Initial decision list: ((true, +)) with [D1, D2, D3, D4]
Surprise: (D5 = 00100, -), difference = ¬x3
New decision list: ((¬x3, +) (true, -))
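The result of the surprise step can be sketched directly: instances are 5-bit strings x1..x5, and after seeing (00100, -) the list becomes ((¬x3, +), (true, -)). Representing tests as small predicates on the bit string is an assumption for illustration.

```python
# The updated decision list from the CDL2 example, applied to bit strings.
def classify(decision_list, bits):
    for test, value in decision_list:
        if test(bits):
            return value

dl = [
    (lambda b: b[2] == "0", "+"),   # ¬x3 (x3 is the third bit)
    (lambda b: True, "-"),          # the final, always-true test
]
print(classify(dl, "00000"))  # -> +
print(classify(dl, "00100"))  # -> -
```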

21 Relational Concepts
Structured concepts consist of relations, e.g. the relation position(Circle1, Box2) = in
Structured concept space:
Language 1: existential conjunctive expressions ∃*(x1, x2, ..., xr) f1 ∧ f2 ∧ ... ∧ fs, where the objects are distinct and each fi is a relation
Language 2: Horn clauses R ← L1, L2, ..., Ln
Negations are allowed (but not constants)

22 Rule learning algorithms
CN2: learning a set of classification rules
FOIL: learning Horn clauses
GOLEM: inductive logic programming
Vere's generalization and specialization
CDL3: learning first-order decision lists
FOIDL: learning first-order decision lists
Unsupervised learning of relational patterns

23 A Small Network Example
[Figure: a directed network over the nodes 1-8]

24 Positive Examples
[Tables of the positive examples for the Can-Reach and Linked-To relations]

25 Satisfying a Horn Rule
Given a Horn clause R ← L1, L2, ..., Ln, an instance i satisfies the clause iff there is a variable binding that makes R match i and that can be extended, by binding the remaining variables, so that each Li is true
CanReach(x,y) ← LinkedTo(x,z), CanReach(z,y)
The instance [0,8] satisfies the rule, but [1,8] does not
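Checking the recursive rule amounts to a search over bindings for z. The edge set below is an assumption (the network figure is not reproduced in this transcript), chosen so that CanReach(0,8) holds but CanReach(1,8) does not.

```python
# A sketch of testing CanReach(x,y) <- LinkedTo(x,z), CanReach(z,y).
LINKED_TO = {(0, 1), (0, 3), (3, 8)}  # assumed edges, for illustration only

def can_reach(x, y, seen=()):
    if (x, y) in LINKED_TO:            # base case: a direct link
        return True
    # try every binding of z with LinkedTo(x,z), avoiding cycles
    return any(can_reach(z, y, seen + (x,))
               for (a, z) in LINKED_TO if a == x and z not in seen)

print(can_reach(0, 8))  # -> True
print(can_reach(1, 8))  # -> False
```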

26 FOIL Algorithm
FOIL(POS, NEG):
VARS ← the variables used in R; E+ ← POS
Until E+ is empty, do:
(BODY, VARS) ← LearnClauseBody(NEG, E+, VARS)
Create a Horn clause R ← BODY
Remove from E+ all examples that satisfy BODY
Return all Horn clauses created

27 FOIL Algorithm (continued)
LearnClauseBody(E-, E+, VARS):
Let BODY be empty
Until E- is empty, do:
select a connected literal L that is satisfied by the largest number of examples in E+ and the fewest examples in E-
add the new variables in L to VARS
conjoin L with BODY
remove from E- all examples that do not satisfy BODY
Return BODY and VARS
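The shape of the two covering loops can be sketched in a much-simplified propositional form: rules are conjunctions of (attribute, value) literals, grown greedily to exclude negatives, and new rules are added until all positives are covered. Real FOIL selects first-order literals with variables (scored by its gain heuristic); this sketch only mirrors the loop structure.

```python
# A simplified propositional analogue of the FOIL covering loops.
def satisfies(body, x):
    return all(x[a] == v for a, v in body)

def learn_body(neg, pos, attrs):
    body, neg = [], list(neg)
    while neg:
        # pick the literal covering the most positives and fewest negatives
        cands = [(a, x[a]) for x in pos for a in attrs]
        lit = max(set(cands), key=lambda l: (
            sum(satisfies(body + [l], p) for p in pos)
            - sum(satisfies(body + [l], n) for n in neg)))
        body.append(lit)
        neg = [n for n in neg if satisfies(body, n)]  # keep still-covered negatives
    return body

def cover(pos, neg, attrs):
    rules, pos = [], list(pos)
    while pos:
        body = learn_body(neg, pos, attrs)
        rules.append(body)
        pos = [p for p in pos if not satisfies(body, p)]  # remove covered positives
    return rules
```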

28 Examples of FOIL Learning
To learn CanReach(x,y) from the network example:
The background predicates LinkedTo and CanReach are defined by the tables of positive examples
FOIL first builds CanReach(x,y) ← LinkedTo(x,y)
then it builds CanReach(x,y) ← LinkedTo(x,z), CanReach(z,y)

