Induction and Decision Trees
Artificial Intelligence
The design and development of computer systems that exhibit intelligent behavior.
What is intelligence? The Turing test, developed in 1950 by Alan Turing (a pioneer of computer science):
- A computer and a human are in one room; a human “interrogator” is in another room
- The interrogator asks questions; either the human or the computer answers
- If the interrogator cannot tell whether the human or the computer is answering, then the computer is “intelligent”
Classification of AI Systems
Knowledge Representation Systems
- Capture existing expert knowledge and use it to consult with end-users and provide decision support
- Main types: rule-based expert systems, case-based reasoning systems, frame-based knowledge systems, semantic networks
Machine Learning
- Algorithms that use mathematical or logical techniques to find patterns in data and to discover or create new knowledge
- Main types: artificial neural networks, genetic algorithms, inductive decision trees, naïve Bayesian algorithms, clustering and pattern-recognition algorithms
Data mining primarily involves the “machine learning” form of AI.
Data Mining
Textbook definition: knowledge discovery in databases
- Using statistical, mathematical, AI, and machine learning techniques to extract useful information, and subsequently knowledge, from large databases
Key point: identifying patterns in large data sets.
Microsoft SQL Server Data Mining Algorithms
- Decision Trees
- Naïve Bayesian
- Clustering
- Sequence Clustering
- Association Rules
- Neural Network
- Time Series
Decision Trees for Machine Learning
Decision-tree induction is based on inductive logic. Three types of logical structures are commonly used in AI systems:
- Deduction
- Abduction
- Induction
Deduction
Premise (rule): if p then q
Fact (axiom, observation): p
Conclude: q
This is classical logic (modus ponens). If the rule is correct and the fact is correct, then the conclusion is guaranteed to be correct.
We are given the rule.
Abduction
Premise (rule): if p then q
Fact (axiom, observation): q
Conclude: p
This form of reasoning is the logical fallacy of “affirming the consequent.” The conclusion may be wrong, but it is a plausible explanation of the fact, given the rule, which makes abduction useful for diagnostic tasks (e.g., observing a fever and hypothesizing an infection).
We are given the rule.
Induction
1. Observe p and q together
2. Observe p and q together
...
n. Observe p and q together
Conclude: if p then q
This is stereotypical thinking, and it is highly error-prone.
We create the rule.
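For comparison, the three patterns can be written side by side as inference rules. This fragment merely restates the three slides above; it is not from the original deck.

```latex
% The three inference patterns as inference rules (requires amsmath).
\[
\underbrace{\frac{p \to q \qquad p}{q}}_{\text{deduction}}
\qquad
\underbrace{\frac{p \to q \qquad q}{p}}_{\text{abduction (fallible)}}
\qquad
\underbrace{\frac{(p, q)\ \text{observed together } n \text{ times}}{p \to q}}_{\text{induction (creates the rule)}}
\]
```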
Example – Baby in the kitchen
ID3 Decision Tree Algorithm
“Iterative Dichotomizer 3,” developed by Ross Quinlan (1979). It is the basis for many commercial induction products.
The goal of the algorithm is to find rules resulting in YES or NO values (therefore, the generated rules have 2 possible outcomes).
ID3 generates a tree in which each path through the tree represents a rule: the leaf node is the THEN part of the rule, and the nodes leading to it are the ANDed attribute-value combinations in the IF part of the rule.
ID3 Algorithm Starting Point
- An empty tree (this tree will eventually represent the final rules created by ID3)
- A recordset of data elements (e.g., records from a database)
- A set of attributes (fields), each with some finite number of possible values
NOTE: one of the attributes is the “decision” field, with a YES or NO value (or some other two-valued option: GOOD/BAD, HIGH/LOW, WIN/LOSE, etc.)
Output: a tree in which each path represents a rule
ID3 Algorithm
1. If all records in the recordset are positive (i.e., have YES values for their decision attribute), create a YES node and stop (end the recursion)
2. If all records in the recordset are negative, create a NO node and stop (end the recursion)
3. Select the attribute that best discriminates among the records (using an entropy function)
4. Create a tree node representing that attribute, with n branches, where n is the number of values for the selected attribute
5. Divide the records of the recordset into subsets subrecordset 1, subrecordset 2, ..., subrecordset n, corresponding to each value of the selected attribute
6. Recursively apply the algorithm to each subrecordset i, with a reduced attribute set (don’t reuse attributes already used higher up the path)
A sketch of this recursion in Python follows.
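A minimal sketch of the recursion, assuming each record is a dict mapping attribute names to values and the decision attribute is two-valued. The names id3, best_attribute, and entropy are ours, not from the lecture.

```python
# A minimal ID3 sketch, assuming records are dicts (attribute -> value)
# and the decision attribute is two-valued. Names are illustrative.
import math
from collections import Counter

def entropy(records, decision):
    """Entropy (log base 2) of the decision attribute over a recordset."""
    total = len(records)
    counts = Counter(r[decision] for r in records)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def best_attribute(records, attributes, decision):
    """Step 3: the attribute whose value-weighted decision entropy is lowest."""
    def weighted_entropy(attr):
        score = 0.0
        for value in {r[attr] for r in records}:
            subset = [r for r in records if r[attr] == value]
            score += (len(subset) / len(records)) * entropy(subset, decision)
        return score
    return min(attributes, key=weighted_entropy)

def id3(records, attributes, decision):
    """Steps 1-6: return a nested dict; leaf nodes are decision values."""
    outcomes = {r[decision] for r in records}
    if len(outcomes) == 1:               # steps 1-2: all YES or all NO -> leaf
        return outcomes.pop()
    if not attributes:                   # no attributes left: majority vote
        return Counter(r[decision] for r in records).most_common(1)[0][0]
    attr = best_attribute(records, attributes, decision)   # step 3
    remaining = [a for a in attributes if a != attr]       # step 6: reduced set
    tree = {attr: {}}                                      # step 4: n-branch node
    for value in {r[attr] for r in records}:               # step 5: partition
        subset = [r for r in records if r[attr] == value]
        tree[attr][value] = id3(subset, remaining, decision)
    return tree
```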
Calculating Entropy
Entropy measures mixture (chaos). We want to pick the attribute with the lowest entropy: ideally, a particular value of the input attribute leads to ALL yes or ALL no in the outcome attribute, or comes as close to this as possible.
An attribute’s entropy is

E(A) = -\sum_{i=1}^{n} P(x_i) \log_2 P(x_i)

where n is the total number of possible values for the attribute, x_i is the i-th value, and P(x_i) is the proportion of records with that value.
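A one-function sketch of the formula; the function name and the example probabilities are ours, and the 2/3–1/3 split anticipates the worked example on the next slides.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: -sum(p * log2(p)); zero-probability terms skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0   -> maximum mixture for a two-valued outcome
print(entropy([2/3, 1/3]))   # ~0.918
print(entropy([1.0]))        # 0.0   -> pure: every record has the same value
```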
Baby’s RecordSet of Oven-Touching Experiences
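The slide’s table itself is not reproduced in these notes. The recordset below is a hypothetical reconstruction: six records chosen only to be consistent with the probabilities quoted on the next slide and with the final rules; the actual classroom data may differ.

```python
# Hypothetical recordset -- the original slide table is not reproduced here.
# These six records are an assumption consistent with the probabilities
# quoted on the next slide, not the actual classroom data.
records = [
    {"Touch_Oven": "Yes", "Mom_In_Kitchen": "Yes", "BOO_BOO": "Yes"},
    {"Touch_Oven": "Yes", "Mom_In_Kitchen": "Yes", "BOO_BOO": "Yes"},
    {"Touch_Oven": "Yes", "Mom_In_Kitchen": "No",  "BOO_BOO": "No"},
    {"Touch_Oven": "No",  "Mom_In_Kitchen": "Yes", "BOO_BOO": "No"},
    {"Touch_Oven": "No",  "Mom_In_Kitchen": "Yes", "BOO_BOO": "No"},
    {"Touch_Oven": "No",  "Mom_In_Kitchen": "No",  "BOO_BOO": "No"},
]
```

Running the id3 sketch above on these records, id3(records, ["Touch_Oven", "Mom_In_Kitchen"], "BOO_BOO"), produces a tree rooted at Touch_Oven with a Mom_In_Kitchen subtree under its Yes branch, matching the rules at the end of this section.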
ID3 Applied to Baby-in-the-Kitchen
Which attribute should we start with? Based on the entropy measure (assuming log base 2):
- Touch Stove entropy ≈ 0.92
- Mom in Kitchen entropy = 1.0
To see this, note that:
- The probability of touching the stove leading to “ouch” is .67, and of it not leading to “ouch” is .33: -(.67 log2 .67 + .33 log2 .33) ≈ 0.92
- The probability of mom being in the kitchen accompanying “ouch” is .5, and of it not accompanying “ouch” is also .5: -(.5 log2 .5 + .5 log2 .5) = 1.0
Touch Stove has the lower entropy, so it best discriminates among the records and is selected first.
Applying the Touch Stove Attribute
Recurse… apply the Mom in Kitchen attribute where needed
Resulting decision rules
- If Touch_Oven = No then BOO_BOO = No
- If Touch_Oven = Yes and Mom_In_Kitchen = Yes then BOO_BOO = Yes
- If Touch_Oven = Yes and Mom_In_Kitchen = No then BOO_BOO = No
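Read off the tree, the three rules amount to a simple function; here is a sketch (the function name boo_boo is illustrative).

```python
def boo_boo(touch_oven: str, mom_in_kitchen: str) -> str:
    """The three induced rules, applied to a single new observation."""
    if touch_oven == "No":
        return "No"                    # rule 1
    if mom_in_kitchen == "Yes":        # Touch_Oven = "Yes" from here on
        return "Yes"                   # rule 2
    return "No"                        # rule 3

print(boo_boo("Yes", "Yes"))   # Yes
print(boo_boo("Yes", "No"))    # No
```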
Now we’ll do this with Microsoft SQL Server