CS 8520: Artificial Intelligence. Machine Learning 1. Paula Matuszek, Spring 2013.


What is learning?
"Learning denotes changes in a system that... enable a system to do the same task more efficiently the next time." – Herbert Simon
"Learning is constructing or modifying representations of what is being experienced." – Ryszard Michalski
"Learning is making useful changes in our minds." – Marvin Minsky

Why learn?
Understand and improve the efficiency of human learning
– Improve methods for teaching and tutoring people (better CAI)
Discover new things or structure that were previously unknown to humans
– Examples: data mining, scientific discovery
Fill in skeletal or incomplete specifications about a domain
– Large, complex AI systems cannot be completely derived by hand and require dynamic updating to incorporate new information.
– Learning new characteristics expands the domain of expertise and lessens the "brittleness" of the system
Build software agents that can adapt to their users or to other software agents
Reproduce an important aspect of intelligent behavior

Learning Systems
Many machine learning systems can be viewed as an iterative process:
– produce a result
– evaluate it against the expected results
– tweak the system
Machine learning is also used for systems which discover patterns without prior expected results.
A learning system may be open or black box:
– Open: changes are clearly visible in the KB and understandable to humans
– Black box: changes are made to a system whose internals are not readily visible or understandable

Learner Architecture
Any learning system needs to somehow implement four components:
– Knowledge base: what is being learned; a representation of a problem space or domain
– Performer: does something with the knowledge base to produce results
– Critic: evaluates the results produced against the expected results
– Learner: takes the output from the critic and modifies something in the KB or performer
It may also need a "problem generator" to test performance against.

A Very Simple Learning Program
Animals Guessing Game (sketched in code below):
– Representation is a binary tree
– Performer is a tree walker interacting with a human
– Critic is the human player
– Learning component elicits new questions and modifies the binary tree
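
Below is a minimal Python sketch of this game, assuming simple console interaction; the starting tree, prompts, and names are invented for illustration. The binary tree is the knowledge base, the walker is the performer, the human player is the critic, and the elicitation step is the learner.

```python
# Animals guessing game: the tree grows by one question per wrong guess.
class Node:
    def __init__(self, text, yes=None, no=None):
        self.text = text      # a question, or an animal name at a leaf
        self.yes = yes
        self.no = no

    def is_leaf(self):
        return self.yes is None and self.no is None

def ask(prompt):
    return input(prompt + " (y/n) ").strip().lower().startswith("y")

def play(node):
    # Performer: walk the tree by asking the stored questions.
    while not node.is_leaf():
        node = node.yes if ask(node.text) else node.no
    # Critic: the human tells us whether the guess was right.
    if ask("Is it a " + node.text + "?"):
        print("Got it!")
    else:
        # Learner: elicit a new question and grow the tree in place.
        animal = input("What was your animal? ")
        question = input("Give a yes/no question that distinguishes a " +
                         animal + " from a " + node.text + ": ")
        old = node.text
        node.text = question
        if ask("For a " + animal + ", is the answer yes?"):
            node.yes, node.no = Node(animal), Node(old)
        else:
            node.yes, node.no = Node(old), Node(animal)

root = Node("Does it live in water?", Node("fish"), Node("dog"))
while True:
    play(root)
    if not ask("Play again?"):
        break
```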

What Are We Learning?
– A direct mapping from the current state to actions
– A way to infer relevant properties of the world from the current percept sequence
– Information about how the world changes, and prediction of the results of actions
– The desirability of states and actions
– Goals

Representing The Problem
Representing the problem to be solved is the first decision to be made in any machine learning application. It's also the most important, and the one that depends most on knowing the domain -- the field in which the problem is set.

Representation
How do you describe your problem?
– I'm guessing an animal: binary decision tree
– I'm playing chess: the board itself, plus sets of rules for choosing moves
– I'm categorizing documents: vector of word frequencies for this document and for the corpus of documents
– I'm fixing computers: frequency matrix of causes and symptoms
– I'm OCRing digits: probability of this digit; 6x10 matrix of pixels; % light; # straight lines
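
As a toy illustration of the document representation above, here is a sketch (plain Python; the two-document corpus is invented) that turns each document into a vector of word frequencies over the corpus vocabulary:

```python
# Represent each document as counts over the shared corpus vocabulary.
from collections import Counter

corpus = ["the cat sat on the mat", "the dog ate my homework"]
vocabulary = sorted({w for doc in corpus for w in doc.split()})

def to_vector(doc):
    counts = Counter(doc.split())
    return [counts[w] for w in vocabulary]

for doc in corpus:
    print(doc, "->", to_vector(doc))
```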

Performer
We are building a machine learning system because we want to do something:
– make a prediction
– sort into categories
– look for similarities
The performer is the part of the system that actually does things. Once a system has learned, or been trained, this is the component we continue to use. It may be as simple as a formula to be applied, or it may be a complex program.

Performer
How do you take action?
– Guessing an animal: walk the tree and ask the associated questions
– Playing chess: chain through the rules to identify a move; use conflict resolution to choose one; output it
– Categorizing documents: apply a function to the vector of features (word frequencies) to determine which category to put the document in
– Fixing computers: use known symptoms to identify potential causes; check the matrix for additional diagnostic symptoms
– OCRing digits: input the features for a digit; output the probability that it's each digit 0-9

Critic
This component provides the experience we learn from. Typically it is a set of examples, each with the decision that should be reached or the action that should be taken. But it may be any kind of feedback that gives an indication of how close we are to where we want to be.

Critic
How do you judge correct actions?
– Guessing an animal: human feedback
– Fixing computers: human input about the symptoms and cause observed for a specific case
– OCRing digits: a human-categorized training set
– Categorizing documents: match to a set of human-categorized test documents
– Categorizing documents (unsupervised): which are most similar in language or content?
– Playing chess: who won? (the credit assignment problem)
Critics can be generally categorized as supervised, unsupervised, or reinforcement.

Supervised Learning
In supervised learning, we provide the system with example training data and the result we want to see from those data.
– Each example, or training case, consists of a set of variables or features describing one case, including the decision that should be made
– The system builds a model from the examples and uses the model to make a decision
– The critic compares the actual decision to the desired decision
– ...and tweaks the model to make the actual and desired decisions more similar

Supervised Learning Examples
– Learn to detect spam from example spam and non-spam
– Decide whether a house will sell from a list of its features
– Decide the age and gender of a skeleton
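
A sketch of the supervised loop in code, using scikit-learn as an assumed toolkit (the slides name none) and invented house-sale data; the feature choices and numbers are purely illustrative:

```python
# Toy supervised run: features plus known answers in, a fitted model out.
from sklearn.tree import DecisionTreeClassifier

# Each row: [square footage, age in years, price in $1000s]; label 1 = sold.
X_train = [[1200, 30, 250], [2000, 5, 400], [900, 60, 150],
           [1800, 10, 500], [1500, 20, 300], [1000, 50, 450]]
y_train = [1, 1, 1, 0, 1, 0]

model = DecisionTreeClassifier().fit(X_train, y_train)

# The performer: apply the learned model to a new case.
print(model.predict([[1600, 15, 350]]))  # e.g. [1] -> predicted to sell
```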

Unsupervised Learning
In an unsupervised learning application, we do not give the system any a priori decisions. The task instead is to find similarities among the examples given and group them.
The critic is some measure of similarity among the cases in a group compared to those in a different group.
The data we provide define the kinds of similarities and groupings we will find.

Unsupervised Learning
The goal in unsupervised learning is often focused more on discovery than on specific decisions. Some examples:
– Do my search results have some natural grouping? (e.g., "bank" should give results related to finance and results related to rivers)
– Can I identify categories or genres of books based on what people purchase?
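
A comparable unsupervised sketch, again assuming scikit-learn: k-means groups unlabeled points by similarity alone, with no "right answers" supplied. The points are invented:

```python
# Unsupervised grouping: no labels given, only a similarity measure
# (here, Euclidean distance between 2-D points).
from sklearn.cluster import KMeans

X = [[1, 1], [1, 2], [2, 1],      # one natural group
     [8, 8], [9, 8], [8, 9]]      # another
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print(labels)  # e.g. [0 0 0 1 1 1]: the discovered grouping
```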

Reinforcement Learning
Reinforcement learning systems learn a series of actions or decisions, rather than a single decision, based on feedback given at the end of the series.
– For instance, the Animals game makes multiple moves, but the critic reports only whether the game was won or lost.

Learner
The learner is the core of a machine learning system. It will:
– examine the information provided by the critic
– use it to modify the representation to move toward a more desirable action the next time
– repeat until the performance is satisfactory, or until it stops improving
There are many existing tools and systems which can be used here.

Learner
What does the learner do?
– Guessing an animal: elicit a question from the user and add it to the binary tree
– Fixing computers: update the frequency matrix with the actual symptoms and outcome
– OCRing digits: modify the weights on a network of associations
– Categorizing documents: modify the weights on the function to improve categorization
– Playing chess: increase the weight for some rules and decrease it for others

General Model of a Learning Agent
[Diagram: an agent, connected to its environment through sensors and effectors, containing a critic, a learning element, a problem generator, and a performer with KB. A performance standard feeds the critic; the critic sends feedback to the learning element, which sends learning goals to the problem generator and changes/knowledge to the performer.]

The Inputs
We defined learning as changes in behavior based on experience. The nature of that experience is critical to the success of a learning system. In machine learning, that means we need to give careful attention to the examples we give the system to learn from.

Representative Examples
The goal of our machine learning system is to act correctly over some set of inputs or cases: the population, or corpus, it will be applied to.
The machine learning examples must accurately reflect the field or domain that we want to learn; they must be typical of the population for which we will eventually make decisions.

Typical Mistakes in Choosing Examples
The "convenience" sample:
– using the examples that are easy to get, whether or not they reflect the cases we will make decisions on later
The positive-only sample:
– using only examples with one of the two possible outcomes; only loans made, for instance, not those denied
The unbalanced sample:
– choosing different proportions of positive and negative examples than occur in the population. For instance, about 7% of mammograms show some abnormality; if you wanted to train a system to recognize abnormal mammogram films, you should not use 50% normal and 50% abnormal films as examples.

Are These Good Samples?
– All email received in the last two days
– Every 10th test for strep received by a lab
– The first 10,000 credit card acceptance records and the first 10,000 credit card rejection records
– All incoming freshmen for Villanova in 2013
– Every 10th book at ...

Feature Spaces
We saw in knowledge representation that we are always using an abstraction which omits detail. Machine learning is similar: which features of our examples do we include?
– They should be relevant to what we want to learn
– They should ideally be observable for every example
– They should be independent of one another

Relevant Features
We want our system to look at some features and some decision, and find the patterns which led to the decision. This will only work if the features we give the system are in fact related to the decision being made.
Example: to decide whether a house will sell
– Probably relevant: price, square footage, age
– Probably irrelevant: name of the owner, day of the week

Which Are Relevant? (1)
To decide whether an illness is influenza:
1. presence of fever
2. last name of patient
3. color of shoes
4. presence of cough
5. date

Which Are Relevant? (2)
To decide the gender of a skeleton:
1. shape of pelvis
2. gender of the examiner
3. length of femur
4. date
5. number of ribs
6. position of bones

Relevant Features
A supervised learning algorithm can use the expected answer to ignore irrelevant features, but if the relevant ones aren't included the system cannot perform well. For unsupervised systems, giving the system only relevant features is critical. You must know your domain and have some feel for what matters in order to use these techniques successfully.

Irrelevant Features: A Painful Example
There were about ten million articles relevant to medicine indexed in Medline in 2011; about 28,500 of them concerned diabetes. Clearly, if you are doing research about diabetes, you are not going to read them all. Medline provides a reference and abstract for each. Can we cluster the articles by giving this information to a machine learning system, and get some idea of the overall pattern of the research?

Example Medline Abstract
Acta Diabetol. Nov 16. [Epub ahead of print]
Polymorphisms in the Selenoprotein S gene and subclinical cardiovascular disease in the Diabetes Heart Study.
Cox AJ, Lehtinen AB, Xu J, Langefeld CD, Freedman BI, Carr JJ, Bowden DW.
Source: Center for Human Genomics, Wake Forest School of Medicine, Winston-Salem, NC, USA.
Abstract: Selenoprotein S (SelS) has previously been associated with a range of inflammatory markers, particularly in the context of cardiovascular disease (CVD). The aim of this study was to examine the role of SELS genetic variants in risk for subclinical CVD and mortality in individuals with type 2 diabetes mellitus (T2DM). The association between 10 polymorphisms tagging SELS and coronary (CAC), carotid (CarCP) and abdominal aortic calcified plaque, carotid intima media thickness and other known CVD risk factors was examined in 1220 European Americans from the family-based Diabetes Heart Study. The strongest evidence of association for SELS SNPs was observed for CarCP; rs (5' region; β = 0.329, p = 0.044), rs (intron 5; β = 0.329, p = 0.036), rs (3' region; β = 0.331, p = 0.039) and rs (downstream; β = 0.375, p = 0.016) were all associated. In addition, rs (intron 5) was associated with CAC (β = , p = 0.032), and rs , rs and rs were all associated with self-reported history of prior CVD (p = ). These results suggest a potential role for the SELS region in the development of subclinical CVD in this sample enriched for T2DM. Further understanding the mechanisms underpinning these relationships may prove important in predicting and managing CVD complications in T2DM.
PMID: [PubMed - as supplied by publisher]

Diabetes Clustering
– Representation: the entire text of a set of Medline abstracts relevant to diabetes
– Actor (performer): a tool to display clusters of related documents
– Critic: a measure of how much vocabulary two abstracts have in common
– Learner: a method which uses the critic to draw cluster boundaries such that the abstracts in a cluster have similar vocabularies

Result
A good clustering tool created tight clusters. But the vocabulary the clustered abstracts mostly had in common was the abbreviated journal titles:
– Acta Diabetol
So all the clustering did was assign articles to journals. Useful sometimes, but not here. It would have been better to omit the title, date, authors, etc., and include just the actual text.

Some Data Sets
These are included with the Weka distribution. Most are drawn from the UC Irvine Machine Learning Repository at ...

Onward!
Okay: when we have a good sample and good features, what do we do with them?

The Inductive Learning Problem
Extrapolate from a given set of examples to make accurate predictions about future examples.
Concept learning, or classification:
– Given a set of examples of some concept/class/category, determine whether a given example is an instance of the concept or not
– If it is an instance, we call it a positive example
– If it is not, it is called a negative example
Usually called supervised learning.

Inductive Learning Framework
The representation must extract from the possible observations a feature vector of relevant features for each example. The number of attributes and the values for the attributes are fixed (although values can be continuous). Each example is represented as a specific feature vector and is identified as a positive or negative instance.
Each example can thus be interpreted as a point in an n-dimensional feature space, where n is the number of attributes.

Hypotheses
The task of a supervised learning system can be viewed as learning a function which predicts the outcome from the inputs:
– Given a training set of N example pairs (x1, y1), (x2, y2), ..., (xN, yN), where each yj was generated by an unknown function y = f(x), discover a function h that approximates the true function f.
h is our hypothesis, and learning is the process of finding a good h in the space of possible hypotheses.
– Prefer the simplest h consistent with the data
– Tradeoff between fit and generalizability
– Tradeoff between fit and computational complexity
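
A small sketch of the fit-versus-generalizability tradeoff, using NumPy (an assumption; the slides name no toolkit) to fit polynomial hypotheses of different complexity to noisy samples of an unknown f. The data are synthetic:

```python
# Fit two candidate hypotheses h to noisy (x, y) pairs from an unknown f.
# The degree-1 h is simpler; the degree-9 h fits the training data more
# closely but tends to generalize worse.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = 2 * x + 1 + rng.normal(0, 0.1, 10)   # the "unknown" f plus noise

h_simple = np.polyfit(x, y, 1)    # degree-1 hypothesis
h_complex = np.polyfit(x, y, 9)   # degree-9 hypothesis: likely overfits

x_new = 1.5                        # a point outside the training range
print(np.polyval(h_simple, x_new))   # near f(1.5) = 4
print(np.polyval(h_complex, x_new))  # often far off
```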

Decision Tree Induction
A very common machine learning and data mining technique.
Given:
– Examples
– Attributes
– A goal (a classification, typically)
Pick an "important" attribute: one which divides the set cleanly. Recur with the subsets not yet classified.

Expressiveness
Decision trees can express any function of the input attributes. E.g., for Boolean functions: truth table row → path to leaf.
Trivially, there is a consistent decision tree for any training set, with one path to a leaf for each example (unless f is nondeterministic in x), but it probably won't generalize to new examples.
Prefer to find more compact decision trees.

ID3
A greedy algorithm for decision tree construction, originally developed by Ross Quinlan (1986).
Top-down construction of the decision tree by recursively selecting the "best attribute" to use at the current node in the tree:
– Once an attribute is selected, generate child nodes, one for each possible value of the selected attribute
– Partition the examples using the possible values of the attribute, and assign the subsets of examples to the appropriate child node
– Repeat for each child node until all examples associated with a node are either all positive or all negative
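
A compact sketch of ID3 in Python (a simplification of Quinlan's algorithm, assuming discrete attributes; the attribute names and tiny training set are invented):

```python
# ID3 sketch: recursively pick the attribute with the highest information
# gain, split the examples on its values, and recur until a node is pure.
import math
from collections import Counter

def entropy(examples):
    counts = Counter(ex["goal"] for ex in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attr):
    remainder = 0.0
    for value in {ex[attr] for ex in examples}:
        subset = [ex for ex in examples if ex[attr] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(examples) - remainder

def id3(examples, attrs):
    classes = {ex["goal"] for ex in examples}
    if len(classes) == 1:              # pure node: make a leaf
        return classes.pop()
    if not attrs:                      # no attributes left: majority vote
        return Counter(ex["goal"] for ex in examples).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(examples, a))
    tree = {}
    for value in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == value]
        tree[(best, value)] = id3(subset, [a for a in attrs if a != best])
    return tree

# Tiny invented training set: wait for a table or not?
data = [{"hungry": "yes", "raining": "no", "goal": "wait"},
        {"hungry": "yes", "raining": "yes", "goal": "wait"},
        {"hungry": "no", "raining": "no", "goal": "leave"},
        {"hungry": "no", "raining": "yes", "goal": "leave"}]
print(id3(data, ["hungry", "raining"]))
```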

Best Attribute
What's the best attribute to choose? The one with the best information gain.
– If we choose Bar, we have no: 3-, 3+ and yes: 3-, 3+
– If we choose Hungry, we have no: 4-, 1+ and yes: 1-, 5+
– Hungry has given us more information about the correct classification.
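
For reference, the computation behind this comparison, where H is Shannon entropy in bits and the counts are those given on the slide:

$$H(S) = -\sum_i p_i \log_2 p_i \qquad \mathrm{Gain}(A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\,H(S_v)$$

With Bar, both branches keep the even 3-/3+ split, so H(S_v) = 1 bit on each side and the gain is 0. With Hungry, the branches are skewed: H(4-, 1+) ≈ 0.72 bits and H(1-, 5+) ≈ 0.65 bits, so the weighted remainder is well under 1 bit and the gain is positive.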

Textbook Restaurant Domain
Develop a decision tree to model the decision a patron makes when deciding whether or not to wait for a table at a restaurant.
Two classes: wait, leave.
Ten attributes: Alternative available? Bar in restaurant? Is it Friday? Are we hungry? How full is the restaurant? How expensive? Is it raining? Do we have a reservation? What type of restaurant is it? What's the purported waiting time?
Training set of 12 examples; ~7000 possible cases.

Thinking About It
What might you expect a decision tree to have as the first question? The second?

A Decision Tree from Introspection
[Figure: a hand-built decision tree for the restaurant domain; not reproduced in the transcript.]

A Training Set
[Table: the 12 restaurant training examples; not reproduced in the transcript.]

Choosing an Attribute
Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative".
Patrons? is a better choice.

Learned Tree
[Figure: the decision tree induced from the 12 training examples; not reproduced in the transcript.]

How Well Does It Work?
Many case studies have shown that decision trees are at least as accurate as human experts.
– A study of diagnosing breast cancer had humans correctly classifying the examples 65% of the time; the decision tree classified 72% correctly.
– British Petroleum designed a decision tree for gas-oil separation on offshore oil platforms that replaced an earlier rule-based expert system.
– Cessna designed an airplane flight controller using 90,000 examples and 20 attributes per example.

Evaluating Classifiers
With a decision tree, or with any classifier, we need to know how well our trained model performs on other data.
Train on sample data, evaluate on test data. (Why?)
Some things to look at:
– classification accuracy: the percent correctly classified
– the confusion matrix
– sensitivity and specificity

Evaluating Classifying Systems
Standard methodology:
1. Collect a large set of examples (all with correct classifications).
2. Randomly divide the collection into two disjoint sets: training and test.
3. Apply the learning algorithm to the training set.
4. Measure performance with respect to the test set.
Important: keep the training and test sets disjoint!
To study the efficiency and robustness of an algorithm, repeat steps 2-4 for different training sets.
If you modify your algorithm, start again with step 1, to avoid evolving the algorithm to work well on just this collection.
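
Steps 2-4 as a sketch, again assuming scikit-learn; the built-in iris dataset stands in for the large labeled collection of step 1:

```python
# Steps 2-4: random disjoint split, train on one part, measure on the other.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)           # step 1 stand-in: labeled examples
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)  # step 2: disjoint split

model = DecisionTreeClassifier().fit(X_train, y_train)   # step 3
print(accuracy_score(y_test, model.predict(X_test)))     # step 4
```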

Confusion Matrix

  Is it spam?     Predicted yes       Predicted no
  Actually yes    True positives      False negatives
  Actually no     False positives     True negatives

Note that "positive" vs. "negative" is arbitrary.
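
Computing the four cells with scikit-learn (an assumed toolkit; the toy labels use 1 for spam):

```python
# Rows of sklearn's confusion matrix are actual classes, columns are
# predictions; ravel() on the binary case yields tn, fp, fn, tp.
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 0, 1]   # 1 = spam, 0 = not spam
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fn, fp, tn)  # 3 1 1 3
```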

Specificity and Sensitivity
Sensitivity: the ratio of correctly labeled positives to actual positives, i.e., TP / (TP + FN).
– How much of the spam are we finding?
Specificity: the ratio of correctly labeled negatives to actual negatives, i.e., TN / (TN + FP).
– How much "real" email are we correctly calling non-spam?

More on Evaluating Classifiers
Overfitting: a very close fit to the training data which takes advantage of irrelevant variations in the instances.
– Performance on test data will be much lower.
– It may mean that your training sample isn't representative.
Is the classifier actually useful?
– Compare it to the "majority" classifier.

Concept Check
For binary classifiers A and B, on balanced data:
– Which is better: A is 80% accurate, or B is 60% accurate?
– Which is better: A has 90% sensitivity, or B has 70% sensitivity?
– Which is the better classifier: A with 100% sensitivity and 50% specificity, or B with 80% sensitivity and 80% specificity?
Would you use a spam filter that was 80% accurate?
Would you use a classifier for who needs major surgery that was 80% accurate?
Would you ever use a binary classifier that is 50% accurate?

Pruning
With enough levels, a decision tree can always get the leaves to be 100% positive or negative. But if we are down to one or two cases in each leaf, we are probably overfitting.
It is useful to prune leaves; stop when:
– we reach a certain level
– we reach a small enough leaf size
– our information gain is increasing too slowly
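
These stopping criteria map directly onto the knobs most tree libraries expose; for example, in scikit-learn (the parameter names are that library's, used here as an assumed illustration, and the values are arbitrary):

```python
# Pre-pruning controls: cap depth, require a minimum leaf size, and
# require a minimum impurity improvement before splitting further.
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier(
    max_depth=4,                 # stop at a certain level
    min_samples_leaf=5,          # stop at a small enough leaf
    min_impurity_decrease=0.01)  # stop when gain grows too slowly
```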

Strengths of Decision Trees
Strengths include:
– Fast to learn and to use
– Simple to implement
– You can look at the tree and see what is going on: relatively "white box"
– Empirically validated in many commercial products
– Handles noisy data (with pruning)
C4.5 and C5.0 are extensions of ID3 that account for unavailable values, continuous attribute value ranges, pruning of decision trees, and rule derivation.

Weaknesses and Issues
Weaknesses include:
– Univariate splits/partitioning (one attribute at a time) limits the types of possible trees
– Large decision trees may be hard to understand
– Requires fixed-length feature vectors
– Non-incremental (i.e., a batch method)
– Overfitting

Decision Tree Architecture
– Knowledge base: the decision tree itself
– Performer: a tree walker
– Critic: the actual outcome in the training case
– Learner: ID3 or its variants
This is an example of a large class of learners that need all of the examples at once in order to learn: batch, not incremental.

Strengths of Decision Trees
Strengths include:
– Fast if the data set isn't too large
– Simple to implement
– Often produces a simpler tree than human experts would
– Output is reasonably understandable by humans
– Empirically validated in many commercial products
– Handles noisy data

Decision Tree Weaknesses
Weaknesses include:
– Univariate splits/partitioning (one attribute at a time) limits the types of possible trees
– Large decision trees may be hard to understand
– Requires fixed-length feature vectors
– Non-incremental (i.e., a batch method)
– For continuous or real-valued features, requires additional complexity to choose decision points

Summary: Decision Tree Learning
One of the most widely used learning methods in practice. Can out-perform human experts on many problems.