
The Good News: Why Are Decision Trees Currently Quite Popular for Classification Problems?
- Very robust: good average testing performance; they outperform other methods over sets of diverse benchmarks.
- Decision trees are still somewhat understandable for domain experts.
- Very useful in the early stages of a data analysis project: attributes near the root are very important, attributes near the leaves are somewhat important, and attributes that do not occur, or occur only rarely near the leaves, are not important.
- The information gain heuristic avoids searching a huge search space; the claim is that it searches an NP-hard search space quite well.
- The approach avoids the combinatorial explosion of rules/nodes that other approaches face, through sophisticated pruning techniques and its hierarchical knowledge representation.
- Can cope with missing data, noisy data, and mixed (numerical and symbolic) data.
- Easy to use; the user is not required to provide additional domain knowledge.
- The simplicity of the approach is appealing.
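The information gain heuristic mentioned above can be sketched in a few lines. This is an illustrative toy implementation (the function names and data are my own, not from the slides): it computes the Shannon entropy of the class labels and the gain from splitting a numeric attribute at a given threshold.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction from splitting a numeric attribute at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder

# Toy data: one numeric attribute and two classes.
xs = [1, 2, 3, 7, 8, 9]
ys = ['a', 'a', 'a', 'b', 'b', 'b']
print(information_gain(xs, ys, 3))  # perfect split: gain = 1.0 bit
```

A tree learner in the ID3/C4.5 family greedily picks, at each node, the attribute and threshold maximizing this gain, which is why it never has to enumerate the full space of trees.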

Decision Trees: The Bad News
- Rely on rectangular (axis-parallel) approximations; this kind of approximation is sometimes not well suited to particular application domains.
- Decision trees rely on the ordering of attribute values, not on their absolute differences; e.g. 5>3>1 and 3.0001>3>2.9999 are the same in the context of C5.0. Basically, decision trees employ ordering-based classification, in contrast to the distance-based classification used by techniques such as nearest neighbors. If the notion of distance is of key importance for an application, decision trees might be less suitable for it.
- Not necessarily good for applications in which many attributes have a minor impact and very few or no attributes have a major impact on a decision; this violates the hierarchical nature of decision trees.
- Data collections have to be in flat-file format, which causes problems with multi-valued attributes (but other approaches face similar problems).
Summary: Although decision trees might not be "perfect" for all applications, I consider decision trees one of the most promising machine learning and data mining technologies for classification tasks.
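The ordering-versus-distance point can be demonstrated concretely. In this toy sketch (my own illustration, not from the slides), a best binary split is chosen purely by the rank order of the attribute values: two datasets whose values have very different spacing, but the same ordering, yield exactly the same partition of instances.

```python
def best_split(values, labels):
    """Best ordering-based binary split by misclassification count.

    Returns the set of instance indices sent to the left branch; only
    the rank order of `values` matters, never their magnitudes.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    best_left, best_err = set(), len(labels)
    for i in range(1, len(order)):
        left = [labels[j] for j in order[:i]]
        right = [labels[j] for j in order[i:]]
        # Errors if each branch predicts its majority class.
        err = (len(left) - max(left.count(c) for c in set(left))
               + len(right) - max(right.count(c) for c in set(right)))
        if err < best_err:
            best_left, best_err = set(order[:i]), err
    return best_left

labels = ['a', 'a', 'a', 'b', 'b', 'b']
spread = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]        # classes far apart
tight  = [1.0, 2.0, 2.9999, 3.0001, 4.0, 9.0]  # classes almost touching
print(best_split(spread, labels))  # {0, 1, 2}
print(best_split(tight, labels))   # {0, 1, 2} -- same split; spacing is ignored
```

A distance-based method such as nearest neighbors would treat these two datasets very differently, since the gap between the classes shrinks from 4.0 to 0.0002.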

Decision Trees & the Concept Learning / Classification Tool Market
Main competitors (performance is "comparable" to decision trees):
- Neural networks (good overall learning performance, but it is hard to tell what they learned)
- Support vector machines (somewhat new)
Other competitors ("inferior performance" or other problems):
- Fuzzy techniques (combinatorial explosion of rules, not easy to use, lack of heuristics, poor learning performance)
- Discriminant analysis (sound theoretical foundation, but not very stable learning performance: does very well on some benchmarks and very badly on others)
- Association rule learning (needs symbolic data sets; combinatorial explosion of rules)
- Bayesian rule-learning approaches (many diverse approaches, which makes it difficult to evaluate the members of this group; most approaches are restricted to symbolic data sets)
- Classical and symbolic regression (poor learning performance)
- Nearest neighbor (success strongly depends on the availability of a "good" distance function; learning performance is not very stable)
- Logic-based rule-learning approaches, such as the AQ family (currently not very popular)
Remark: This evaluation is based on research projects, conducted by the author and his students Y.J. Kim, Brandon Rabke, Ruijiang Zhang, Jim Reynolds, and Zheng Wen, that benchmarked the various approaches.
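The claim that nearest neighbor "strongly depends on a good distance function" is easy to see in code. In this minimal 1-NN sketch (an illustration with made-up data, not from the slides), the same query point gets different labels under plain Euclidean distance versus a distance that down-weights a feature measured on a much larger scale.

```python
import math

def nearest_neighbor(train, query, dist):
    """1-NN: label of the training point closest to `query` under `dist`."""
    return min(train, key=lambda point_label: dist(point_label[0], query))[1]

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def weighted(p, q, w=(1.0, 0.0001)):
    # Down-weights the second feature, which is on a much larger scale.
    return math.sqrt(sum(wi * (a - b) ** 2 for wi, a, b in zip(w, p, q)))

# Feature 2 ranges over [0, 100]; feature 1 over [0, 1].
train = [((0.0, 0.0), 'a'), ((1.0, 100.0), 'b')]
query = (0.1, 60.0)
print(nearest_neighbor(train, query, euclidean))  # 'b' -- feature 2 dominates
print(nearest_neighbor(train, query, weighted))   # 'a' -- feature 1 decides
```

Nothing in the data changed between the two calls; only the distance function did, which is exactly why the slide flags instability when no principled distance is available.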