Decision Trees and Voting

Decision Trees and Voting
Bruce R. Maxim, UM-Dearborn
12/5/2018

Decision Trees
- Output a prediction based on inputs
- Inputs can be either continuous or symbolic values
- Two DT varieties:
  - Classification trees output categorical values
  - Regression trees output continuous values
- The decision results are stored in the leaves
- Decisions are made by traversing the tree
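
As a concrete, made-up illustration of the idea, a tiny classification tree can be written as nested tests: each internal node asks one question, and the answer selects the branch to follow until a leaf holds the result. The weapon names and thresholds below are invented, not taken from these slides.

    #include <string>

    // Hypothetical two-level classification tree: tests are internal nodes,
    // results are stored in the leaves.
    std::string choose_weapon(double distance, bool enemy_visible) {
        if (!enemy_visible)        // root test
            return "hold fire";    // leaf
        if (distance < 200.0)      // second-level test
            return "shotgun";      // leaf
        return "railgun";          // leaf
    }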

Data Samples
- The use of data samples is common in pattern recognition work
- The data is obtained from observations of the phenomenon to be predicted
- The data set contains a set of attributes known as predictor variables (these are the inputs)
- The data set also contains a response variable, also known as the dependent variable (this is the output)

Decision Tests
- Conditional tests:
  - Boolean (true, false)
  - Sign (+, -)
  - Class (enumerated type)
  - Range (domain divided into classes)
- The more predictor variables in the data sample, the taller and wider the tree

DT Traversal Algorithm

node = root;
repeat
    result = node.evaluate(sample);
    for each branch from node
        if branch.match(result)
            node = branch.child;
until node is leaf;
return node.value;

Tree Induction
- Most DTs are small enough to be built by hand, like an expert system
- They can also be learned or induced automatically by studying examples
- Most induction learning algorithms are based on the recursive partitioning algorithm
- This approach is fast and easy to implement

Recursive Partitioning - 1
- Divide and conquer
- The algorithm creates the DT incrementally by batch processing the dataset
- Statistical tests are used to create decision nodes and the necessary conditional tests
- Initially the tree is empty; attributes are chosen to divide the data into rough subsets
- These subsets are further split until the desired level of precision is obtained

Recursive Partitioning - 2

void function partition(dataset, node) {
    if (create_decision(dataset, node)) {
        for each sample in dataset {
            result = node.evaluate(sample);
            subset[result].add(sample);
        }

Recursive Partitioning - 3

        for each result in subset {
            partition(subset[result], child);
            node.add(branch, result);
            branch.add(child);
        }
    }
}

Performance
- DTs created by recursive partitioning usually have heights < 10
- The depth of the tree is related to the number of attributes (inputs) that must be considered
- Each conditional test is only used once in a given DT
- It is possible that the same attribute may be used in more than one conditional test
- In fact it can be beneficial to split twice along the same dimension but in different places

Splitting Data Sets - 1
- Attributes are usually selected in a greedy manner, meaning the attribute that provides the best split is chosen each time
- Subsets that minimize error are given preference by greedy processes
- Batch processing is used to minimize the shortsightedness of basing decisions on local data

Splitting Data Sets - 2
- The goal is to create homogeneous subsets whose attribute values are similar
- The measure of impurity used for sets is entropy:
  - 0 means all examples are the same
  - 1 means a 50% / 50% mix of values
- For a set S with c classes, entropy is the weighted sum of the logarithm of each class proportion:
  Entropy(S) = Σ_i −p_i * log2(p_i), where p_i = |S_i| / |S|
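
As a quick worked example of the formula (the numbers are chosen for illustration): a set with 9 samples of one class and 5 of another has entropy −(9/14)·log2(9/14) − (5/14)·log2(5/14) ≈ 0.94, close to the maximum impurity of 1 for two classes, while a set containing only one class has entropy 0.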

Splitting Data Sets - 3
- To determine the best split we compute the information gain that results from splitting on a particular attribute A:
  Gain(S, A) = Entropy(S) − Σ_i p_i * Entropy(S_i)
- The information gain is the entropy of the set minus the weighted entropy of the subsets created by the split
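
A minimal, self-contained sketch of these two formulas; the function and variable names are illustrative rather than taken from the slides' codebase.

    #include <cmath>
    #include <map>
    #include <string>
    #include <vector>

    // Entropy(S) = sum over classes of -p_i * log2(p_i), with p_i = |S_i| / |S|.
    double entropy(const std::vector<std::string>& labels) {
        std::map<std::string, int> counts;
        for (const auto& label : labels) ++counts[label];
        double h = 0.0;
        for (const auto& entry : counts) {
            double p = static_cast<double>(entry.second) / labels.size();
            h -= p * std::log2(p);
        }
        return h;
    }

    // Gain(S, A) = Entropy(S) - sum over subsets of p_i * Entropy(S_i),
    // where the subsets are produced by splitting S on attribute A.
    double information_gain(const std::vector<std::string>& labels,
                            const std::vector<std::vector<std::string>>& subsets) {
        double remainder = 0.0;
        for (const auto& subset : subsets) {
            double p = static_cast<double>(subset.size()) / labels.size();
            remainder += p * entropy(subset);
        }
        return entropy(labels) - remainder;
    }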

Finding Best Partitioning Attribute - 1

void function create_decision(dataset, node) {
    max = 0;
    // find impurity of the training dataset
    entropy = compute_entropy(dataset);
    for each attribute in dataset {
        // split and compute the subset entropy
        e = entropy - compute_entropy_split(attribute, dataset);

Finding Best Partitioning Attribute - 2

        // find the best positive gain
        if (e > max) {
            max = e;
            best = attribute;
        }
    } // end for
    // create a test if there is a good attribute
    if (best)
        node.evaluation = create_test(best);
    else
        // otherwise use a leaf node
        node.class = find_class(dataset);
}

Training Procedure
- Training dataset management is important to prevent overfitting of DTs
- Three datasets are used:
  - Training
  - Validation
  - Testing
- For DT creation it often works to randomly assign 1/3 of the available samples to each dataset
- Training requires the most samples, testing the least
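
A sketch of the random one-third split described above; the function name and the templated Sample type are assumptions, not the slides' API.

    #include <algorithm>
    #include <random>
    #include <vector>

    // Shuffle the samples, then hand out one third each to training,
    // validation, and testing.
    template <typename Sample>
    void split_dataset(std::vector<Sample> samples,
                       std::vector<Sample>& training,
                       std::vector<Sample>& validation,
                       std::vector<Sample>& testing) {
        std::mt19937 rng(std::random_device{}());
        std::shuffle(samples.begin(), samples.end(), rng);
        const auto third = samples.size() / 3;
        training.assign(samples.begin(), samples.begin() + third);
        validation.assign(samples.begin() + third, samples.begin() + 2 * third);
        testing.assign(samples.begin() + 2 * third, samples.end());
    }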

Pruning Branches - 1
- Done as a validation activity after learning is complete
- If the classification result can be improved by pruning a DT branch, we do so
- This requires each node to store a decision as if it were a leaf node
- Must use a different dataset than the one used for training

Pruning Branches - 2
- Done as a post-processing step after learning is complete
- As this separate dataset is processed, note how often a parent node would classify a set of inputs correctly on its own
- If the parent's count of correct classifications is higher than that of the leaf nodes below it, the branch is pruned
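
A sketch of the pruning check in the spirit described on these two slides; the node layout and counting scheme are assumptions for illustration. Each node keeps a count of held-out samples it would classify correctly if it were a leaf, and a node whose own count matches or beats its subtree's count is collapsed into a leaf.

    #include <memory>
    #include <vector>

    struct PruneNode {
        int correct_as_leaf = 0;                      // counted on the held-out set
        std::vector<std::unique_ptr<PruneNode>> children;
    };

    // Returns the number of held-out samples the (possibly pruned) subtree
    // rooted at node classifies correctly.
    int prune(PruneNode& node) {
        if (node.children.empty())
            return node.correct_as_leaf;
        int subtree_correct = 0;
        for (auto& child : node.children)
            subtree_correct += prune(*child);
        if (node.correct_as_leaf >= subtree_correct) {
            node.children.clear();                    // prune: the node becomes a leaf
            return node.correct_as_leaf;
        }
        return subtree_correct;
    }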

Bagging and Boosting - 1
- Techniques for combining the results from several weak classifiers to create a better one
- The combined classifier is usually better than any of the individual DTs used as input
- There is no guarantee the combined classifier will be any better
- The cost (memory and traversal time) of combining classifiers increases linearly with the number of DTs used

Bagging and Boosting - 2
- Bagging:
  - Arbitrarily selects samples from the training set to build each DT
  - The class with the most votes from the individual DTs becomes the combined result
- Boosting:
  - Assigns weights to samples to influence the final results
  - DTs with the best training performance have the greatest impact (most votes) on the final result
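
A minimal sketch of the bagging vote described above; the Tree type and its Classify call are placeholders rather than the interface used later in these slides.

    #include <map>
    #include <string>
    #include <vector>

    // Placeholder for one trained decision tree.
    struct Tree {
        std::string Classify(const std::vector<double>& inputs) const;
    };

    // Bagging-style combination: every tree votes, the majority class wins.
    std::string bagged_prediction(const std::vector<Tree>& trees,
                                  const std::vector<double>& inputs) {
        std::map<std::string, int> votes;
        for (const auto& tree : trees)
            ++votes[tree.Classify(inputs)];
        std::string best;
        int best_count = 0;
        for (const auto& entry : votes) {
            if (entry.second > best_count) {
                best_count = entry.second;
                best = entry.first;
            }
        }
        return best;
    }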

DT Advantages
- DTs are very efficient for computing estimates for unknown data samples
- The data structure is compact and easy to traverse (good for data mining)
- Allow processing of both symbolic and continuous values
- Good for batch learning

DT Disadvantages
- Poor efficiency for real-time in-game learning
- Recursive partitioning is a greedy algorithm, often creating suboptimal trees and results (even if this is good enough for games)
- Data overfitting problems make it less likely to generalize to unknown data samples (secondary pruning may help reduce this risk)

Detox
- Uses an architecture based on a decision tree
- Decisions include movement, aiming, and weapon selection
- Learns by imitating human players
- Performance is poor since there is no decomposition of actions based on either behavior or capabilities (not available in archive)

DT Learning Approaches
- DTs use situation features as predictor variables (e.g., health, terrain, distance)
- There are four different models for using additional context information:
  - Learning weapon appropriateness
  - Learning weapon fitness
  - Learning weapon properties
  - Learning property importance

Learning Appropriate Weapon - 1
- A set of environmental features is mapped to a particular weapon type
- Problems with this approach:
  - The DT only returns one weapon for a given situation, which fails if that weapon is not available
  - The DT provides little insight into the selection process since it returns a single choice (compiled knowledge problem)
  - The AI needs to determine the best weapon manually to supervise DT learning

Learning Appropriate Weapon - 2
- Possible fixes:
  - Provide different DTs, each based on a different pool of available weapons (takes more memory and longer learning times)
  - Available weapons could be used as additional inputs (combinatorial explosion), making the DT more error prone
  - The DT could provide an ordered list of selected weapons (the DT needs to be modified to allow multi-dimensional responses)

Learning Weapon Fitness - 1
- A DT for a particular weapon maps situation features to a single fitness value, and a script is used to find the weapon with the highest fitness value
- Advantages:
  - Each weapon has its own DT
  - Organizes selection skills by weapon and allows for modular learning

Learning Weapon Fitness - 2
- Disadvantages:
  - Requires additional memory and program code
  - Supervising the learning is difficult in some situations
- Possible fixes:
  - Use weapon type as a predictor variable and use one tree to predict the fitness of all weapons
  - Use an existing voting system to induce the DT, which can then approximate the voting system

Learning Weapon Properties - 1
- A DT can learn weapon properties based on situations captured in data logs
- Advantages:
  - Can take animat skill into account as well as different situations
  - Can generalize to multiple situations if the data is rich enough
  - Can use learning to enhance the voting system

Learning Weapon Properties - 2
- Disadvantages:
  - The DT is not a self-standing solution
  - Relies on other aspects of the AI

Learning Property Votes
- Property votes based on features are learned from an existing voting system
- Once learning is complete, the voting system is replaced by a hierarchy of decisions
- Trades accuracy (voting system) for efficiency (DT)
- The fitness of a weapon is the sum of the fitness values of the weapon's characteristics

Methodology - 1
- Learning weapon fitness is the approach used to implement Selector
- Weapon fitness is learned within the game by inducing an incremental regression tree for each weapon
- The DTs learn to estimate the damage inflicted on targets based on situation features

Methodology - 2
- Simulation is used to allow the AI to learn trends associated with weapons
- The regression trees are queried to find the weapon with the highest damage estimate

Regression Tree Advantages
- The decision process is less complex (only two stages)
- The DT is compact and can be searched efficiently
- The AI uses generalization to deal with new situations
- It is possible to induce weapon selection criteria from expert advice instead of voting
- Induced trees are easy to modify

Initialization
- Purpose is to describe the DT variable types

<predictor name="health" type="integer" min="0" max="100" />
<predictor name="distance" type="real" min="0" step="36" />
<predictor name="visible" type="boolean" />
<response name="fitness" type="real" min="0" max="1" step="0.1" />

DT Node Structure

<Node>
  <decision> ... </decision>
  <branch>
    <match> ... </match>
    <Node> ... </Node>
  </branch>
</Node>

- Some questions need to be answered for these nodes:
  - What limits are placed on attributes and branches?
  - Are pattern recognition techniques available?

Interface
- To allow different predictor variable types to be input to the DT, an "any" type is predefined

any Predict(const vector<any>& inputs);

- Two learning interfaces:

float Increment(const Sample& sample);
float Batch(const vector<Sample>& samples);

Data Structures
- Tree Nodes
  - Contain children (an array of nodes), a decision, and the response
- Decision
  - Base virtual class that allows any decision needed to be implemented
- Attributes
  - Represent both predictor and response variables (contain name, range, type, etc.)
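
A minimal sketch of how these three pieces could fit together; everything beyond the members listed above, including standing in for the "any" type with a string, is an assumption for illustration.

    #include <memory>
    #include <string>
    #include <vector>

    using any = std::string;   // stand-in for the predefined variant-like "any" type

    // Attribute: describes one predictor or response variable.
    struct Attribute {
        std::string name;      // e.g. "health", "distance", "fitness"
        std::string type;      // e.g. "integer", "real", "boolean"
        double min = 0.0, max = 0.0, step = 0.0;
    };

    // Decision: base virtual class, one subclass per kind of conditional test.
    struct Decision {
        virtual ~Decision() = default;
        // Returns the index of the branch matched by the given inputs.
        virtual int Evaluate(const std::vector<any>& inputs) const = 0;
    };

    // Tree node: children, a decision, and the response stored at leaves.
    struct Node {
        std::vector<std::unique_ptr<Node>> children;
        std::unique_ptr<Decision> decision;   // null at leaf nodes
        any response;                         // meaningful only at leaves
    };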

Tree Induction
- Incremental and batch learning algorithms are based on recursive partitioning
- The on-line learning approach tries to minimize tree changes by only allowing relearning when necessary

Weapon Selection - 1

function select_weapon() {
    // get sensor data
    env = interpret_features();
    // find best weapon from inventory
    max = 0;
    for each (weapon, ammo) in inventory {
        // compute fitness
        fitness = dt.predict(env + weapon + ammo);

Weapon Selection - 2

        // remember it if it is better
        if (fitness > max) {
            max = fitness;
            best = weapon;
        }
    } // end for
    // weapon can be selected
    return best;
}

DT Induction Phases
- Interpreting features of the environment provided by the sensors
- Monitoring fight episodes when weapons are being used
- Computing the desired fitness for each episode

Interpreting Environmental Features
- Distance (near, medium, far)
- Health (low, high)
- Ammo (low, medium, high)
- Traveling (forward, backward)
- Constriction (low, medium, high)
- Fitness [0 … 100]
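
A sketch of this kind of feature interpretation; only the category names come from the slide, and the numeric boundaries below are invented for illustration.

    #include <string>

    std::string interpret_distance(double distance) {
        if (distance < 200.0) return "near";      // hypothetical thresholds
        if (distance < 600.0) return "medium";
        return "far";
    }

    std::string interpret_health(int health) {
        return health < 50 ? "low" : "high";      // hypothetical threshold
    }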

Monitoring Fight Episodes - 1
- Self-damage
  - Identified by a pain event broadcast by the body shortly after a projectile launch event
- Hit probability
  - Number of enemy hits / number of bullets fired
- Maximal damage
  - Tracks the most pain suffered by the enemy for a particular weapon

Monitoring Fight Episodes - 2
- Potential damage per second
  - Average damage over the total time the weapon was used
- Note: identifying the cause of damage can be a difficult task
  - Matching pain signals and explosions with actions can help sort this out
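
A sketch of a per-episode record collecting the statistics above; the field names loosely mirror the pseudo-code on the "Learning Desired Fitness" slides, but the exact layout is an assumption.

    // Hypothetical per-weapon fight episode record.
    struct Episode {
        int   self_health = 100;               // animat health during the episode
        int   enemy_health = 100;
        float self_damage = 0.0f;              // damage caused by our own projectiles
        int   hits = 0;                        // enemy hits landed
        int   shots_fired = 0;                 // bullets fired
        float max_potential = 0.0f;            // most pain inflicted in a single hit
        float enemy_damage_per_second = 0.0f;  // average damage over weapon use time
        float enemy_position = 0.0f;           // > 0 means the enemy is facing away

        // Hit probability: number of enemy hits / number of bullets fired.
        float accuracy() const {
            return shots_fired > 0 ? static_cast<float>(hits) / shots_fired : 0.0f;
        }
    };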

Computing Fitness Assumptions
- Low personal health means minimize self-damage
- Low enemy health means try to increase hit probability
- When the enemy is facing away, the animat should try to maximize potential damage
- When the enemy is facing the animat, try to maximize potential damage per second
- All fitness values are rescaled to [0..100]

Learning Desired Fitness - 1

void function learn_weapon(weapon, episode) {
    // gather sensor information
    env = interpret_features();
    // compute fitness
    if (episode.self_health < 25)
        fitness = -episode.self_damage;
    else if (episode.enemy_health < 40)
        fitness = episode.accuracy;

Learning Desired Fitness - 2

    // enemy facing away
    else if (episode.enemy_position > 0)
        fitness = episode.max_potential;
    else
        fitness = episode.enemy_damage_per_second;
    // incrementally induce fitness
    dt.increment(env + weapon, fitness);
}

Selector
- Uses a decision tree to evaluate each weapon's benefit
- In general, chooses the best weapon based on the situation

Using Data
- It appears that the super shotgun works well up close and becomes less effective as distance to the target increases
- The railgun works well both at larger distances and up close
- The constant aiming errors produce realistic aiming behavior, but could be improved by taking movement and relative direction of travel into account

Evaluation - 1
- The animat seems to make reasonable weapon selection choices based on context
- The animat often has a very limited number of weapons to select from, and this leads to unusual choices at times
- Firefights are too brief to test weapon choice
- It might be wise to prune weapons with no ammo from the selection process

Evaluation - 2
- This solution requires a lot of supervision to get the DT set up and trained properly
- Fitness functions need to be computed manually, or an expert needs to provide the training examples
- It would be preferable if the system could learn to choose the best weapon without having to worry about the selection criteria