
ARTIFICIAL INTELLIGENCE (CS 461D)
Dr. Abeer Mahmoud, Computer Science Department, Princess Nora University, Faculty of Computer & Information Systems

(CHAPTER 18) LEARNING FROM EXAMPLES: DECISION TREES

Outline
1- Introduction
2- Know your data
3- Classification tree construction schema

Know your data?

Types of Data Sets
Record:
- Relational records
- Data matrix, e.g., numerical matrix, crosstabs
- Document data: text documents as term-frequency vectors
- Transaction data
Graph and network:
- World Wide Web
- Social or information networks
- Molecular structures
Ordered:
- Video data: sequence of images
- Temporal data: time series
- Sequential data: transaction sequences
- Genetic sequence data
Spatial, image and multimedia:
- Spatial data: maps
- Image data
- Video data

Types of Variables: data that are counted vs. data that are measured

Types of Variables: Nominal
Nominal variables are "named", i.e. classified into one or more qualitative categories or descriptions. In medicine, nominal variables are often used to describe the patient. Examples of nominal variables might include:
- Gender (male, female)
- Eye color (blue, brown, green, hazel)
- Surgical outcome (dead, alive)
- Blood type (A, B, AB, O)

Types of Variables: Ordinal
Ordinal variables have values with a meaningful order (ranking). In medicine, ordinal variables often describe the patient's characteristics, attitude, behavior, or status. Examples of ordinal variables might include:
- Stage of cancer (stage I, II, III, IV)
- Education level (elementary, secondary, college)
- Pain level (mild, moderate, severe)
- Satisfaction level (very dissatisfied, dissatisfied, neutral, satisfied, very satisfied)
- Agreement level (strongly disagree, disagree, neutral, agree, strongly agree)

Types of Variables: Discrete and Interval
Discrete variables assume values corresponding to isolated points along a line interval; that is, there is a gap between any two values.
Interval variables have constant, equal distances between values, but the zero point is arbitrary. Examples of interval variables:
- Intelligence (IQ test score of 100, 110, 120, etc.)
- Pain level (1-10 scale)
- Body length in infants

Types of Variables: Continuous
Continuous variables can assume an uncountable number of values: any value along a line interval, including every possible value between any two values.
- Practically, real values can only be measured and represented using a finite number of digits
- Continuous attributes are typically represented as floating-point variables

Classification

Classification: Definition
Given a collection of records (the training set), where each record contains a set of attributes and one of the attributes is the class:
- Find a model for the class attribute as a function of the values of the other attributes.
- Goal: previously unseen records should be assigned a class as accurately as possible.
A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
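
As a concrete illustration of this workflow (not part of the original slides), here is a minimal sketch using scikit-learn; the iris data stands in for any table of records whose attributes include a class column.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Illustrative records: attribute values in X, class labels in y.
X, y = load_iris(return_X_y=True)

# Divide the data set: the training set builds the model,
# the test set estimates accuracy on previously unseen records.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```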

Illustrating Classification Task

Examples of Classification Task Applications
- Predicting tumor cells as benign or malignant
- Classifying credit card transactions as legitimate or fraudulent
- Categorizing news stories as finance, weather, entertainment, sports, etc.

Different Types of Classifiers
- Linear discriminant analysis (LDA)
- Quadratic discriminant analysis (QDA)
- Density estimation methods
- Nearest neighbor methods
- Logistic regression
- Neural networks
- Fuzzy set theory
- Decision trees

Decision Tree Learning

Definition
A decision tree is a classifier in the form of a tree structure:
- Decision node: specifies a test on a single attribute
- Leaf node: indicates the value of the target attribute
- Arc/edge: one outcome of the split on an attribute
- Path: a conjunction of tests that leads to the final decision
Decision trees classify instances or examples by starting at the root of the tree and moving through it until a leaf node is reached.

Key requirements
- Attribute-value description: the object or case must be expressible in terms of a fixed collection of properties or attributes (e.g., hot, mild, cold).
- Predefined classes (target values): the target function has discrete output values (single or multiclass).
- Sufficient data: enough training cases should be provided to learn the model.

Training Examples

Shall we play tennis today?

Decision Tree Construction
Top-down tree construction schema (a minimal sketch follows below):
- Examine the training database and find the best splitting predicate for the root node
- Partition the training database
- Recurse on each child node
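
A minimal sketch of this schema (illustrative, not the exact algorithm from the slides): records are assumed to be dictionaries with a "class" key, and choose_split can be any split-selection heuristic, such as the ones discussed later.

```python
from collections import Counter

def build_tree(records, attributes, choose_split):
    """Top-down construction: find the best splitting attribute for this node,
    partition the training records on its values, and recurse on each child."""
    classes = [r["class"] for r in records]
    if len(set(classes)) == 1 or not attributes:
        # Pure node, or no attributes left: stop with a majority-class leaf.
        return {"leaf": Counter(classes).most_common(1)[0][0]}

    best = choose_split(records, attributes)   # e.g. the max-gain attribute
    children = {}
    for value in {r[best] for r in records}:
        subset = [r for r in records if r[best] == value]
        rest = [a for a in attributes if a != best]
        children[value] = build_tree(subset, rest, choose_split)
    return {"split": best, "children": children}
```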

Advantages of using DT
- Fast to implement
- Simple to implement, because it performs classification without much computation
- Can convert the result to a set of easily interpretable rules that can be used in knowledge systems such as databases, where rules are built from the labels of the nodes and the labels of the arcs
- Can handle continuous and categorical variables
- Can handle noisy data
- Provides a clear indication of which fields are most important for prediction or classification

Dr.Abeer Mahmoud "Univariate" splits/partitioning using only one attribute at a time so limits types of possible trees large decision trees may be hard to understand Perform poorly with many class and small data. Disadvantages of using DT 23

Decision Tree Construction (cont.)
Five algorithmic components:
1. Splitting (ID3, CART, C4.5, QUEST, CHAID, ...)
2. Pruning (direct stopping rule, test dataset pruning, cost-complexity pruning, statistical tests, bootstrapping)
3. Missing values
4. Data access (CLOUDS, SLIQ, SPRINT, RainForest, BOAT, UnPivot operator)
5. Evaluation

1- Splitting

2.1- Splitting
Splitting depends on:
- Database attribute types
- Statistical calculation technique or method (ID3, CART, C4.5, QUEST, CHAID, ...)

2.1- Split selection
Split selection is the selection of an attribute to test at each node: choosing the most useful attribute for classifying examples.
How do we specify the test condition at each node?
(Figure: a training table whose columns are the attributes or features and whose last column is the class label or decision.)

How to specify the test condition at each node? Some possibilities are:
- Random: select any attribute at random
- Least-Values: choose the attribute with the smallest number of possible values (fewer branches)
- Most-Values: choose the attribute with the largest number of possible values (smaller subsets)
- Max-Gain: choose the attribute that has the largest expected information gain, i.e. select the attribute that will result in the smallest expected size of the subtrees rooted at its children

How to specify the test condition at each node? It depends on the attribute type:
- Nominal
- Ordinal
- Continuous

Nominal or categorical (discrete) attributes
Domain is a finite set without any natural ordering (e.g., occupation, marital status (single, married, divorced, ...)).
Each non-leaf node is a test, and its edges partition the attribute's values into subsets (easy for discrete attributes), e.g. temperature: hot / mild / cool.

Ordinal attributes
Domain is ordered, but absolute differences between values are unknown (e.g., preference scale, severity of an injury).

Continuous or numerical attributes
Domain is ordered and can be represented on the real line (e.g., age, income, temperature degrees). Two common ways to split:
- Convert to binary: create a new boolean attribute A_c by looking for a threshold c
- Discretization: form an ordinal categorical attribute whose ranges can be found, e.g., by equal intervals
How do we find the best threshold? (A sketch follows below.)
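
One possible way to find a binary threshold (a sketch, assuming the entropy measure introduced in the next slides as the impurity criterion; the sample values and labels are illustrative): try each midpoint between consecutive sorted values and keep the one with the lowest weighted child impurity.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Candidate thresholds are midpoints between consecutive sorted values;
    each defines a boolean test A <= c, scored by weighted child entropy."""
    pairs = sorted(zip(values, labels))
    best_c, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                      # identical values give no new threshold
        c = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v <= c]
        right = [l for v, l in pairs if v > c]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if score < best_score:
            best_c, best_score = c, score
    return best_c

# Toy temperature readings with play/don't-play labels.
print(best_threshold([64, 65, 68, 69, 70, 71],
                     ["yes", "no", "yes", "yes", "yes", "no"]))   # 70.5
```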

2.1- Splitting (cont.)
Splitting depends on:
- Database attribute types (covered above)
- Statistical calculation technique or method (ID3, CART, C4.5, QUEST, CHAID, ...), discussed next

Statistical calculation techniques
- Entropy
- Information gain
- Gain ratio

Entropy
Entropy at a given node t:

Entropy(t) = -Σ_j p(j|t) * log2(p(j|t))

(NOTE: p(j|t) is the relative frequency of class j at node t.)
Entropy measures the homogeneity of a node:
- Maximum (log2(n_c), where n_c is the number of classes) when records are equally distributed among all classes, implying least information
- Minimum (0.0) when all records belong to one class, implying most information
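
In code, the entropy of a node's class labels might be computed as follows (a small sketch; the label lists are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(t) = -sum_j p(j|t) * log2 p(j|t), with p(j|t) the relative
    frequency of class j among the records at node t."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy(["yes"] * 6))                 # 0.0: all records in one class
print(entropy(["yes"] * 3 + ["no"] * 3))    # 1.0: evenly split over 2 classes = log2(2)
```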

Example

No | Risk     | Credit history | Debt | Collateral | Income
 1 | high     | bad            | high | none       | 0-15$
 2 | high     | unknown        | high | none       | 15-35$
 3 | moderate | unknown        | low  | none       | 15-35$
 4 | high     | unknown        | low  | none       | 0-15$
 5 | low      | unknown        | low  | none       | over 35$
 6 | low      | unknown        | low  | adequate   | over 35$
 7 | high     | bad            | low  | none       | 0-15$
 8 | moderate | bad            | low  | adequate   | over 35$
 9 | low      | good           | low  | none       | over 35$
10 | low      | good           | high | adequate   | over 35$
11 | high     | good           | high | none       | 0-15$
12 | moderate | good           | high | none       | 15-35$
13 | low      | good           | high | none       | over 35$
14 | high     | bad            | high | none       | 15-35$

Example
Starting with the population of loans:
- Suppose we first select the income property
- This separates the examples into three partitions
- All examples in the leftmost partition (income 0-15$) have the same conclusion: HIGH RISK
- The other partitions can be further subdivided by selecting another property

ID3 & information theory
The selection of which property to split on next is based on information theory. The information content of a tree is defined by

I[tree] = Σ_i -prob(classification_i) * log2(prob(classification_i))

e.g., in the credit risk data there are 14 samples:
prob(high risk) = 6/14, prob(moderate risk) = 3/14, prob(low risk) = 5/14
The information content of a tree that correctly classifies these examples:
I[tree] = -6/14 * log2(6/14) + -3/14 * log2(3/14) + -5/14 * log2(5/14)
        = 0.524 + 0.476 + 0.531
        = 1.531
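
The same number can be reproduced directly from the class counts (a small check, not from the original slides):

```python
import math

# Class counts in the 14 credit-risk records: 6 high, 3 moderate, 5 low.
counts = {"high": 6, "moderate": 3, "low": 5}
n = sum(counts.values())

i_tree = -sum((c / n) * math.log2(c / n) for c in counts.values())
print(round(i_tree, 3))   # 1.531, as on the slide
```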

ID3 & more information theory - example
After splitting on a property, consider the expected (or remaining) information content of the subtrees:

E[property] = Σ_i (# in subtree_i / # of samples) * I[subtree_i]

For the income split (three partitions of size 4, 4 and 6, with high/moderate/low counts of 4/0/0, 2/2/0 and 0/1/5):
E[income] = 4/14 * I[subtree_1] + 4/14 * I[subtree_2] + 6/14 * I[subtree_3]
          = 4/14 * (-4/4 log2(4/4) + -0/4 log2(0/4) + -0/4 log2(0/4))
          + 4/14 * (-2/4 log2(2/4) + -2/4 log2(2/4) + -0/4 log2(0/4))
          + 6/14 * (-0/6 log2(0/6) + -1/6 log2(1/6) + -5/6 log2(5/6))
          = 4/14 * (0) + 4/14 * (1.0) + 6/14 * (0.65)
          = 0 + 0.286 + 0.279
          = 0.57

Credit risk example (cont.)
What about the other property options? E[debt]? E[history]? E[collateral]?
After further analysis:
E[income] = 0.57
E[debt] = 1.47
E[history] = 1.26
E[collateral] = 1.33
The ID3 selection rule splits on the property that produces the minimal E[property]:
- In this example, income will be the first property split
- Then repeat the process on each subtree
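
These E[...] values can be reproduced from the table above. The sketch below (an illustrative encoding of the 14 records, not from the slides) computes the expected remaining information for every candidate attribute and confirms that income minimizes it.

```python
import math
from collections import Counter

# (risk, credit history, debt, collateral, income) for the 14 records above.
records = [
    ("high", "bad", "high", "none", "0-15"),
    ("high", "unknown", "high", "none", "15-35"),
    ("moderate", "unknown", "low", "none", "15-35"),
    ("high", "unknown", "low", "none", "0-15"),
    ("low", "unknown", "low", "none", "over35"),
    ("low", "unknown", "low", "adequate", "over35"),
    ("high", "bad", "low", "none", "0-15"),
    ("moderate", "bad", "low", "adequate", "over35"),
    ("low", "good", "low", "none", "over35"),
    ("low", "good", "high", "adequate", "over35"),
    ("high", "good", "high", "none", "0-15"),
    ("moderate", "good", "high", "none", "15-35"),
    ("low", "good", "high", "none", "over35"),
    ("high", "bad", "high", "none", "15-35"),
]
attributes = {"history": 1, "debt": 2, "collateral": 3, "income": 4}

def info(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def expected_info(col):
    """E[property] = sum_i (# records in subtree i / # samples) * I[subtree i]."""
    groups = {}
    for r in records:
        groups.setdefault(r[col], []).append(r[0])   # group risk labels by value
    return sum(len(g) / len(records) * info(g) for g in groups.values())

for name, col in sorted(attributes.items(), key=lambda kv: expected_info(kv[1])):
    print(f"E[{name}] = {expected_info(col):.2f}")
# income has by far the smallest E[...], so ID3 splits on income first
# (the printed values match the slide up to rounding).
```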

Information Gain
Information gain measures how well a given attribute separates the training examples according to their target classification.
Parent node p is split into k partitions; n_i is the number of records in partition i:

GAIN_split = Entropy(p) - Σ_{i=1..k} (n_i/n) * Entropy(i)

- Measures the reduction in entropy achieved because of the split. Choose the split that achieves the most reduction (maximizes GAIN).
- [The first term in the equation is the entropy of the original collection S; the second term is the expected value of the entropy after S is partitioned using attribute A.]
- Used in ID3 and C4.5
- Disadvantage: tends to prefer splits that result in a large number of partitions, each being small but pure (larger trees).
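
A sketch of the gain computation (reusing the entropy helper from above; the income partition labels come from the worked credit-risk example):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent_labels, partitions):
    """GAIN_split = Entropy(parent) - sum_i (n_i / n) * Entropy(partition_i):
    the reduction in entropy achieved by the split."""
    n = len(parent_labels)
    after = sum(len(p) / n * entropy(p) for p in partitions)
    return entropy(parent_labels) - after

# Splitting the 14 credit-risk records on income:
parent = ["high"] * 6 + ["moderate"] * 3 + ["low"] * 5
income_partitions = [["high"] * 4,
                     ["high", "high", "moderate", "moderate"],
                     ["moderate"] + ["low"] * 5]
print(round(information_gain(parent, income_partitions), 2))  # ~0.97 = I[tree] - E[income]
```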

Gain Ratio
Problem: if an attribute has many values, GAIN will select it (imagine using Date = Jun_3_1996 as an attribute).
Gain ratio adjusts information gain by the entropy of the partitioning (SplitINFO). Parent node p is split into k partitions; n_i is the number of records in partition i:

GainRATIO_split = GAIN_split / SplitINFO
SplitINFO = -Σ_{i=1..k} (n_i/n) * log2(n_i/n)

- Higher-entropy partitioning (a large number of small partitions) is penalized
- Used in C4.5
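
A sketch of the corresponding computation (the 0.97 gain for the income split comes from the previous sketch; the 14-singleton split illustrates the Date-like problem):

```python
import math

def split_info(partition_sizes):
    """SplitINFO = -sum_i (n_i / n) * log2(n_i / n): the entropy of the
    partitioning itself, large when there are many small partitions."""
    n = sum(partition_sizes)
    return -sum((s / n) * math.log2(s / n) for s in partition_sizes)

def gain_ratio(gain, partition_sizes):
    """GainRATIO_split = GAIN_split / SplitINFO (the C4.5 criterion)."""
    return gain / split_info(partition_sizes)

print(round(gain_ratio(0.97, [4, 4, 6]), 2))   # income split: 0.97 / 1.56 ~ 0.62
print(round(gain_ratio(1.53, [1] * 14), 2))    # Date-like split: 1.53 / log2(14) ~ 0.40
```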

2.3- Missing Values
What is the problem?
- During computation of the splitting predicate, we can selectively ignore records with missing values (note that this has some problems)
- But if a record r misses the value of the variable in the splitting attribute, r cannot participate further in tree construction
Algorithms for missing values address this problem. The simplest approach:
- If X is a numerical attribute, impute the overall mean
- If X is a discrete (categorical) attribute, impute the most common value
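
A simple illustration of this imputation strategy (a sketch assuming pandas; the column names and values are hypothetical):

```python
import pandas as pd

# Hypothetical records with missing entries.
df = pd.DataFrame({
    "income": [1200.0, None, 800.0, 950.0, None],      # numerical attribute
    "history": ["good", "bad", None, "good", "good"],  # discrete attribute
    "risk": ["low", "high", "high", "low", "low"],
})

# Numerical attribute: impute the overall mean.
df["income"] = df["income"].fillna(df["income"].mean())
# Discrete attribute: impute the most common value.
df["history"] = df["history"].fillna(df["history"].mode()[0])
print(df)
```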

Thank you - End of Chapter 18, part 2