Data Classification for Data Mining


Data Classification for Data Mining
Wei-Min Shen
Information Sciences Institute, University of Southern California
UCLA Data Mining Short Course (1), 2/5/98

Outline
What is Data Classification?
Instance Space and Class Space
Representation of Instances and Classes
Issues in Data Classification
Symbolic algorithms
Statistical algorithms

Data Classification: predict the future based on the past
Supervised learning (vs. unsupervised)
Inputs: attributes and training instances
  Attributes: [independents, dependent/target]
  [Day, Outlook, Temperature, Humidity, Wind, PlayTennis]
  Instance: (D1, Sunny, Hot, High, Weak, Yes)
Outputs: expressions of prediction from the independents to the target
  (Outlook = Sunny) ==> (PlayTennis = Yes)
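
To make the input/output setup concrete, here is a minimal sketch of one instance and the example prediction rule above (the dict-based encoding is my own illustration, not the course's code):

```python
# One attribute list and one training instance, as on the slide.
attributes = ["Day", "Outlook", "Temperature", "Humidity", "Wind", "PlayTennis"]
d1 = dict(zip(attributes, ["D1", "Sunny", "Hot", "High", "Weak", "Yes"]))

# A learned expression predicting the target from an independent attribute:
# (Outlook = Sunny) ==> (PlayTennis = Yes)
def rule(instance):
    if instance["Outlook"] == "Sunny":
        return "Yes"
    return None  # the rule does not fire on this instance

print(rule(d1))  # Yes
```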

Instance and Class Spaces
Attributes: X = {a, b, c}, Y = {d, e}
Instances: I1 = [a, d], I2 = [a, e], ...
Instance space: I = {I1, I2, I3, I4, I5, I6}
Class space: a class is a set of instances, so the number of possible classes is 2^|I|
Classes have a partial order: (X=a) is a subset of (X=a ∨ X=b)
Draw a picture of all possible classes of I
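
A small sketch that enumerates this instance space and counts the possible classes (variable names such as instance_space are mine, for illustration only):

```python
from itertools import product

X = ["a", "b", "c"]
Y = ["d", "e"]

# The instance space I1..I6 from the slide.
instance_space = [(x, y) for x, y in product(X, Y)]
print(instance_space)            # [('a','d'), ('a','e'), ('b','d'), ('b','e'), ('c','d'), ('c','e')]
print(len(instance_space))       # 6 instances

# A "class" is any subset of the instance space, so there are 2^|I| of them.
print(2 ** len(instance_space))  # 64 possible classes

# Partial order example: the class (X=a) is a subset of the class (X=a or X=b).
class_a = {i for i in instance_space if i[0] == "a"}
class_ab = {i for i in instance_space if i[0] in ("a", "b")}
print(class_a <= class_ab)       # True
```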

Biases in Concept Learning
Bias = a restricted subset H of 2^I
Measurements of bias:
  Size of H: BI(H) = |H|
  Capacity of H (relative to an instance set S): let Π_H(S) = { S ∩ h | h in H } be the set of all subsets of S that can be "expressed" by H; then the capacity of H is Π_H(m) = max of |Π_H(S)| over all S of size m
Forms of restriction: pure conjunctions of atoms, ..., k-DNF, ...

Criteria for Correctness
Exactly Correct
Reliable and Useful: never say anything wrong; may say "don't know"
Probably Approximately Correct (PAC):
  learn, with high probability (1 - δ), a concept with a small error ε, for any fixed training and testing distribution
  uses a sample size that is polynomial in 1/ε, 1/δ, and the complexity of the target concept
  at least as hard as Exactly Correct
Distribution-free and distribution-sensitive (active)
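
For reference, the standard sample-complexity bound for a finite hypothesis space H (a well-known result from the PAC literature, not a formula on the slide) says that a learner outputting a hypothesis consistent with the training data needs

    m \ge \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)

examples to guarantee, with probability at least 1 - δ, a true error of at most ε. The dependence on ln|H| is one reason the size of the bias H from the previous slide matters.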

Representation
How do we represent instances and classes?
Attribute-based instances and classes:
  Decision Trees
  Decision Lists
  Logical Rules
Relation-based instances and classes:
  Instances: […, On(Book1, Table1), …]
  Classes: graphs of generalized objects and relations

Issues in Data Classification
Large amounts of data
Huge number of possible classes
Data contains noise and errors
Instances have missing values

Decision Trees
An example tree for PlayTennis:

Outlook?
  Sunny    -> Humidity?  (High -> No, Normal -> Yes)
  Overcast -> Yes
  Rain     -> Wind?      (Weak -> Yes, Strong -> No)
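
A minimal sketch of this tree as a nested Python dict plus a classifier (the encoding is an illustration I added, not the course's representation):

```python
TREE = {
    "attribute": "Outlook",
    "branches": {
        "Sunny": {"attribute": "Humidity",
                  "branches": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"attribute": "Wind",
                 "branches": {"Weak": "Yes", "Strong": "No"}},
    },
}

def classify(tree, instance):
    """Walk the tree, following the branch for each tested attribute, until a leaf label."""
    while isinstance(tree, dict):
        tree = tree["branches"][instance[tree["attribute"]]]
    return tree

print(classify(TREE, {"Outlook": "Sunny", "Humidity": "High", "Wind": "Weak"}))  # No
```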

Build Decision Trees from Data
Given a node in the tree and all of its examples S:
  Select the best attribute A for this node
  For each value vi of A, grow a subtree (or a leaf) under the node

Select the best attribute?
Let S be all examples, A an attribute, s an instance, v a value, and c the number of classes
  S_v = { s ∈ S | A(s) = v }
  Entropy(S) = Σ_{i=1..c} −p_i log2 p_i
The gain of an attribute A with respect to S is
  Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v)
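
A hedged sketch of these two formulas in Python (function and argument names are mine; examples are assumed to be dicts keyed by attribute name):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = sum over classes of -p_i * log2(p_i)."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(examples, attribute, target):
    """Gain(S, A) = Entropy(S) - sum over values v of (|S_v|/|S|) * Entropy(S_v)."""
    labels = [e[target] for e in examples]
    gain = entropy(labels)
    for v in {e[attribute] for e in examples}:
        subset = [e[target] for e in examples if e[attribute] == v]
        gain -= (len(subset) / len(examples)) * entropy(subset)
    return gain

# Tiny usage: 9 positive / 5 negative labels give Entropy(S) of about 0.940.
print(round(entropy(["Yes"] * 9 + ["No"] * 5), 3))  # 0.94
```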

ID3 Algorithm
ID3(Examples, TargetAttr, Attributes):
  Create a Root for the tree
  If all Examples are positive (negative), return the Root with label = Yes (No)
  If Attributes is empty, return Root with the most common class value in Examples
  Select the best attribute A from Attributes; for each value vi of A, add a new branch for A = vi and let E' be the examples with A = vi:
    if E' is empty, add a leaf node labeled with the most common class value in Examples,
    else add the node ID3(E', TargetAttr, Attributes − {A})
  Return Root
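
A compact Python reconstruction of the procedure above (an illustrative sketch, not the original implementation; it repeats the entropy/information-gain helpers from the previous sketch so that it runs on its own):

```python
import math
from collections import Counter

def entropy(labels):
    counts = Counter(labels)
    return -sum((n / len(labels)) * math.log2(n / len(labels)) for n in counts.values())

def information_gain(examples, attribute, target):
    labels = [e[target] for e in examples]
    gain = entropy(labels)
    for v in {e[attribute] for e in examples}:
        subset = [e[target] for e in examples if e[attribute] == v]
        gain -= (len(subset) / len(examples)) * entropy(subset)
    return gain

def id3(examples, target, attributes):
    """Grow a tree: a leaf is a class label, an inner node a dict {attribute, branches}."""
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:                  # all examples share one class: return a leaf
        return labels[0]
    if not attributes:                         # no attributes left: most common class value
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    node = {"attribute": best, "branches": {}}
    for v in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == v]
        # E' is never empty here because v is drawn from the examples themselves; the
        # empty-branch case on the slide applies when values come from a fixed list.
        node["branches"][v] = id3(subset, target, [a for a in attributes if a != best])
    return node
```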

An Example
[Day, Outlook, Temp, Humidity, Wind, PlayTennis]
(D1  Sunny    Hot  High   Weak   No)
(D2  Sunny    Hot  High   Strong No)
(D3  Overcast Hot  High   Weak   Yes)
(D4  Rain     Mild High   Weak   Yes)
(D5  Rain     Cool Normal Weak   Yes)
(D6  Rain     Cool Normal Strong No)
(D7  Overcast Cool Normal Strong Yes)
(D8  Sunny    Mild High   Weak   No)
(D9  Sunny    Cool Normal Weak   Yes)
(D10 Rain     Mild Normal Weak   Yes)
(D11 Sunny    Mild Normal Strong Yes)
(D12 Overcast Mild High   Strong Yes)
(D13 Overcast Hot  Normal Weak   Yes)
(D14 Rain     Mild High   Strong No)

An Example Tree Built
Gain(S, Outlook) = 0.246
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Gain(S, Temperature) = 0.029

[D1, D2, ..., D14]: split on Outlook
  Sunny    -> ?   [D1, D2, D8, D9, D11]
  Overcast -> Yes [D3, D7, D12, D13]
  Rain     -> ?   [D4, D5, D6, D10, D14]

For the Sunny branch, E' = [D1, D2, D8, D9, D11]:
  Gain(E', Humidity) = 0.97
  Gain(E', Temperature) = 0.57
  Gain(E', Wind) = 0.019

Issues in Decision Tree Learning
Avoiding overfitting (use pruning)
Splitting on continuous values (split the examples S into discrete subsets)
Missing values (use the most common value, or distribute by frequency)
Different attribute costs (use Gain(S, A) / Cost(A))

Decision Lists
L = (f1, v1), ..., (fj, vj), ..., (fr, vr)
  Each decision test fj has a decision (class) value vj; the last decision test fr is always true
Classify an instance from left to right
k-DL is more expressive than k-DNF, k-CNF, and k-decision trees
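
A minimal sketch of this left-to-right evaluation (the lambda-based encoding and the example list are illustrations I added, not learned from the tennis data):

```python
def classify_dl(decision_list, instance):
    """Return the value of the first decision test the instance satisfies."""
    for condition, value in decision_list:
        if condition(instance):
            return value
    raise ValueError("the last decision test should always be true")

# Example list: (Outlook=Overcast -> Yes), (Humidity=High -> No), (true -> Yes)
dl = [
    (lambda e: e["Outlook"] == "Overcast", "Yes"),
    (lambda e: e["Humidity"] == "High", "No"),
    (lambda e: True, "Yes"),
]
print(classify_dl(dl, {"Outlook": "Sunny", "Humidity": "High"}))  # No
```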

Decision Lists and Trees
[Figure: a small decision tree over Boolean attributes x1–x4, with leaves labeled [0] and [1]]
DL1 = (~x1 ∧ ~x3, 1), (x1 ∧ x2, 1), (x1 ∧ ~x2 ∧ x4, 1), (true, 0)
DL2 = (~x1 ∧ ~x3, 1), (~x1 ∧ x3, 0), (~x2 ∧ ~x4, 0), (true, 1)
DL3 = (~x1 ∧ x3, 0), (x2, 1), (x1 ∧ ~x4, 0), (true, 1)
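
As a sanity check on the reconstructed operators (the Boolean-tuple encoding is mine), this sketch evaluates DL1–DL3 on all 16 assignments of x1..x4 and confirms that they compute the same function:

```python
from itertools import product

def dl_classify(dl, x):
    for cond, val in dl:
        if cond(x):
            return val

# x is a tuple (x1, x2, x3, x4) of 0/1 values.
DL1 = [(lambda x: not x[0] and not x[2], 1),
       (lambda x: x[0] and x[1], 1),
       (lambda x: x[0] and not x[1] and x[3], 1),
       (lambda x: True, 0)]
DL2 = [(lambda x: not x[0] and not x[2], 1),
       (lambda x: not x[0] and x[2], 0),
       (lambda x: not x[1] and not x[3], 0),
       (lambda x: True, 1)]
DL3 = [(lambda x: not x[0] and x[2], 0),
       (lambda x: x[1], 1),
       (lambda x: x[0] and not x[3], 0),
       (lambda x: True, 1)]

assignments = list(product([0, 1], repeat=4))
print(all(dl_classify(DL1, x) == dl_classify(DL2, x) == dl_classify(DL3, x)
          for x in assignments))   # True
```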

Complementary Discrimination Learning
Maintain the boundary between a concept hypothesis H and its complement ~H
Move the boundary, based on training examples, until H becomes the target concept
Algorithm: for every training example:
  determine if and where to move the boundary
  find out how large the movement should be
  actually move the boundary

CDL2 Algorithm
Let x be a new instance and vx its concept value
Loop:
  Let Dj = (fj, vj) be the decision on x
  If vx = vj (that is, the decision is correct), then store x as an example of Dj and return
  Let (g1, ..., gd) = DIFFERENCES(examplesOf(Dj), x)
  Replace (fj, vj) by (fj ∧ g1, vj), ..., (fj ∧ gd, vj)
  Distribute the examples of Dj into the new decisions
  If Dj was the last decision, then append (true, vx) at the end of the decision list
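
A compressed Python sketch of this update over Boolean attribute vectors (the frozenset-of-literals encoding and helper names are mine; noise handling and other details of the real CDL2 are omitted):

```python
def satisfies(cond, x):
    """cond is a set of (index, value) literals; the empty set means 'true'."""
    return all(x[i] == v for i, v in cond)

def differences(stored, x):
    """Literals (index, value) true of every stored example but false of the new instance x."""
    return [(i, stored[0][i]) for i in range(len(x))
            if all(e[i] == stored[0][i] for e in stored) and x[i] != stored[0][i]]

def cdl2_update(dlist, x, vx):
    """dlist is a list of [condition, value, stored_examples]; updated in place."""
    for k, (cond, val, examples) in enumerate(dlist):
        if not satisfies(cond, x):
            continue                                  # not the decision on x; keep scanning
        if val == vx:                                 # correct decision: just remember x
            examples.append(x)
            return
        diffs = differences(examples, x)              # surprise: specialize this decision
        new_decisions = [[cond | {g}, val, []] for g in diffs]
        for e in examples:                            # distribute old examples into the new decisions
            for nd in new_decisions:
                if satisfies(nd[0], e):
                    nd[2].append(e)
                    break
        was_last = (k == len(dlist) - 1)
        dlist[k:k + 1] = new_decisions
        if was_last:                                  # append (true, vx) at the end of the list
            dlist.append([frozenset(), vx, [x]])
        else:
            cdl2_update(dlist, x, vx)                 # x now falls through to a later decision
        return

# Reproducing the example on the next slide: start with ((true, +)) holding D1..D4,
# then see the surprise (D5 = 00100, -).
dl = [[frozenset(), "+", [(0,0,0,0,0), (0,0,0,0,1), (0,0,0,1,0), (0,0,0,1,1)]]]
cdl2_update(dl, (0, 0, 1, 0, 0), "-")
print([(sorted(c), v) for c, v, _ in dl])  # [([(2, 0)], '+'), ([], '-')]  i.e. ((~x3, +), (true, -))
```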

An example of CDL2
Target concept: (~x1 ∧ ~x3) ∨ (x1 ∧ x2) ∨ (x1 ∧ ~x2 ∧ x4)
Training examples: D1 = 00000, ..., D32 = 11111
Initial decision list: ((true, +)) with examples [D1, D2, D3, D4]
Surprise: (D5 = 00100, −), difference = ~x3
New decision list: ((~x3, +), (true, −))

Relational Concepts
Structured concepts consist of relations, e.g. the relation position(Circle1, Box2) = in
Structured concept space:
  Language 1: existential conjunctive expressions ∃(x1, x2, ..., xr): f1 ∧ f2 ∧ ... ∧ fs, where the objects are distinct and each fi is a relation
  Language 2: Horn clauses R ← L1, L2, ..., Ln; negations are allowed (but not constants)

Rule learning algorithms
CN2: learning a set of classification rules
FOIL: learning Horn clauses
GOLEM: inductive logic programming
Vere's generalization and specialization
CDL3: learning first-order decision lists
FOIDL: learning first-order decision lists
Unsupervised learning of relational patterns

A Small Network Example
[Figure: a small directed network over nodes 1–8 (plus node 0), used for the LinkedTo/CanReach examples below]

Positive Examples
[Tables of positive tuples for the Can-reach and Linked-to relations, taken from the network figure]

Satisfy a Horn Rule
Given a Horn clause R ← L1, L2, ..., Ln, an instance i satisfies the clause iff there is a binding of the variables to i that makes R true and that can be extended, by binding the remaining (unbound) variables, so that each Li is true.
  CanReach(x, y) ← LinkedTo(x, z) ∧ CanReach(z, y)
  The instance [0, 8] satisfies this rule, but [1, 8] does not.
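
A sketch of this satisfaction test in Python. The fact tables below are a small hypothetical stand-in (the slide's network figure is not reproduced here), so the specific true/false answers are illustrative rather than the slide's [0,8]/[1,8] result:

```python
# Hypothetical ground facts for a 4-node chain 0 -> 1 -> 2 -> 3.
LINKED_TO = {(0, 1), (1, 2), (2, 3)}
CAN_REACH = {(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)}

def satisfies_rule(x, y):
    """[x, y] satisfies CanReach(x,y) <- LinkedTo(x,z), CanReach(z,y)
    iff some binding of z makes both body literals true."""
    nodes = {n for pair in LINKED_TO for n in pair}
    return any((x, z) in LINKED_TO and (z, y) in CAN_REACH for z in nodes)

print(satisfies_rule(0, 3))  # True: binding z = 1 satisfies the body
print(satisfies_rule(3, 0))  # False: no binding of z satisfies the body
```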

FOIL Algorithm
FOIL(POS, NEG):
  VARS ← the variables used in R
  E+ ← POS
  Until E+ is empty, do:
    (BODY, VARS) ← LearnClauseBody(NEG, E+, VARS)
    Create a Horn clause R ← BODY
    Remove from E+ all examples that satisfy BODY
  Return all Horn clauses created

FOIL Algorithm (continued)
LearnClauseBody(E−, E+, VARS):
  Let BODY be empty
  Until E− is empty, do:
    select a connected literal L that is satisfied by the largest number of examples in E+ and the smallest number of examples in E−
    add the new variables in L to VARS
    conjoin L with BODY
    remove from E− all examples that do not satisfy BODY
  Return BODY and VARS

Examples of FOIL Learning
To learn CanReach(x, y) from the network example, the background predicates LinkedTo and CanReach are defined by the tables.
FOIL first builds CanReach(x, y) ← LinkedTo(x, y),
then it builds CanReach(x, y) ← LinkedTo(x, z), CanReach(z, y).