Shared Ensemble Learning using Multi-trees
Department of Electronic, Electrical and Computer Engineering, G201249003, Youngje Kim, Database Lab.


Introduction
- What is a decision tree?
  - Each node in the tree specifies a test for some attribute of the instance
  - Each branch corresponds to an attribute value
  - Each leaf node assigns a classification
[Figure: Decision Tree for PlayTennis]
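To make the representation concrete, here is a minimal sketch (not from the slides or the paper) of a decision tree as a data structure, using the classic PlayTennis attributes; the `Leaf`/`Node` classes and the `classify` helper are illustrative names.

```python
# A minimal, illustrative decision-tree representation (hypothetical names,
# following the classic PlayTennis example; not code from the paper).

class Leaf:
    """A leaf node assigns a classification."""
    def __init__(self, label):
        self.label = label

class Node:
    """An internal node tests one attribute; each branch is an attribute value."""
    def __init__(self, attribute, branches):
        self.attribute = attribute   # attribute tested at this node
        self.branches = branches     # dict: attribute value -> subtree

def classify(tree, instance):
    """Follow the branch matching each attribute value until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.branches[instance[tree.attribute]]
    return tree.label

# The textbook decision tree for PlayTennis.
play_tennis = Node("Outlook", {
    "Sunny":    Node("Humidity", {"High": Leaf("No"), "Normal": Leaf("Yes")}),
    "Overcast": Leaf("Yes"),
    "Rain":     Node("Wind", {"Strong": Leaf("No"), "Weak": Leaf("Yes")}),
})

print(classify(play_tennis, {"Outlook": "Sunny", "Humidity": "Normal"}))  # Yes
```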

Cost Associated with Machine Learning
- Generation costs
  - Computational costs, i.e. computer resource consumption
  - Goal: give better solutions for the resources provided

Cost Associated with Machine Learning
- Application costs (first)
  - Models are accurate on average, but this does not mean they are seamless and confident
  - A model can be highly accurate for frequent cases yet extremely inaccurate for infrequent, critical situations, e.g. diagnosis or fault detection

Cost Associated with Machine Learning
- Application costs (second)
  - Even accurate models can be useless if the purpose is to obtain new knowledge:
    - the knowledge may not be expressed in the form of rules
    - the number of rules may be too high
    - interpreting the results then carries significant costs, and may even be impossible

Construction of Decision Tree
- Tree construction
  - Construction is driven by a splitting criterion that selects the best split
  - The selected split is applied to generate new branches
  - The rest of the splits are discarded
  - The algorithm stops when all the examples that fall into a branch belong to the same class
(A sketch of this greedy loop follows below.)
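A minimal sketch of the greedy loop, assuming categorical attributes and information gain as the splitting criterion (the paper's actual criterion may differ); it reuses the `Leaf` and `Node` classes from the sketch above.

```python
# Sketch of the greedy construction loop (assumes categorical attributes and
# information gain as the splitting criterion; the paper's criterion may
# differ). Reuses Leaf and Node from the sketch above.
import math
from collections import Counter

def entropy(labels):
    counts = Counter(labels)
    return -sum(c / len(labels) * math.log2(c / len(labels))
                for c in counts.values())

def gain(examples, attr):
    """Information gain of splitting `examples` (pairs of instance, label)."""
    labels = [y for _, y in examples]
    groups = {}
    for x, y in examples:
        groups.setdefault(x[attr], []).append(y)
    remainder = sum(len(g) / len(examples) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def build(examples, attributes):
    labels = [y for _, y in examples]
    if len(set(labels)) == 1:        # stop: all examples share the same class
        return Leaf(labels[0])
    if not attributes:               # no tests left: predict the majority class
        return Leaf(Counter(labels).most_common(1)[0][0])
    best = max(attributes, key=lambda a: gain(examples, a))  # best split kept,
    rest = [a for a in attributes if a != best]              # the rest discarded
    return Node(best, {v: build([(x, y) for x, y in examples if x[best] == v], rest)
                       for v in {x[best] for x, _ in examples}})
```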

Construction of Decision Tree
- Pruning
  - Removal of the parts of the tree that are not useful, in order to avoid over-fitting
  - Pre-pruning: performed during the construction of the tree
  - Post-pruning: performed by analyzing the leaves once the tree has been built
(A post-pruning sketch follows below.)
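One possible post-pruning step, sketched in the reduced-error style on a held-out validation set; the slides do not say which pruning method the paper uses, so this is an assumption. `classify` comes from the first sketch.

```python
# Sketch of post-pruning in the reduced-error style: bottom-up, a subtree is
# replaced by a majority-class leaf whenever that does not hurt accuracy on a
# held-out validation set. This method is an assumption; it also assumes every
# attribute value in the validation set has a branch in the tree.
from collections import Counter

def accuracy(tree, examples):
    return sum(classify(tree, x) == y for x, y in examples) / len(examples)

def prune(tree, validation):
    if isinstance(tree, Leaf) or not validation:
        return tree
    for value, sub in list(tree.branches.items()):   # prune the children first
        subset = [(x, y) for x, y in validation if x[tree.attribute] == value]
        tree.branches[value] = prune(sub, subset)
    majority = Counter(y for _, y in validation).most_common(1)[0][0]
    leaf = Leaf(majority)
    if accuracy(leaf, validation) >= accuracy(tree, validation):
        return leaf            # the whole subtree collapses into one leaf
    return tree
```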

Merit and Demerit of Decision Tree
- Merit
  - Allows the quick construction of a model, because decision trees are built in an eager (greedy) way
- Demerit
  - It may produce bad models because of bad decisions

Multi-tree Structure
- Rejected splits are not removed, but stored as suspended nodes
- Beyond the criteria used to build a single decision tree, two new criteria are required:
  - Suspended node selection: to populate a multi-tree, a criterion is needed that selects one of the suspended nodes
  - Selection of a model: select one or more comprehensible models according to a selection criterion
(A sketch of this structure follows below.)
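A sketch of how suspended nodes could be stored and later woken, under the illustrative assumption that a suspended node records the rejected attribute together with the examples and attributes needed to expand it; random selection stands in for the paper's suspended-node selection criterion, and `Leaf`, `gain`, and `Counter` come from the sketches above.

```python
# Sketch of a multi-tree: the losing candidate splits are kept as suspended
# nodes (here: the rejected attribute plus the data needed to expand it) so
# that alternative trees can be grown later. All names are illustrative, not
# the paper's. Reuses Leaf, gain and Counter from the sketches above.
import random

class MultiNode:
    def __init__(self, attribute, branches):
        self.attribute = attribute
        self.branches = branches     # value -> child subtree
        self.suspended = []          # rejected splits: (attr, examples, attrs)
        self.alternatives = []       # subtrees grown from woken suspended nodes

def expand(attr, examples, attributes):
    """Expand one split into a MultiNode."""
    rest = [a for a in attributes if a != attr]
    return MultiNode(attr, {v: build_multi([(x, y) for x, y in examples
                                            if x[attr] == v], rest)
                            for v in {x[attr] for x, _ in examples}})

def build_multi(examples, attributes):
    labels = [y for _, y in examples]
    if len(set(labels)) == 1 or not attributes:
        return Leaf(Counter(labels).most_common(1)[0][0])
    ranked = sorted(attributes, key=lambda a: gain(examples, a), reverse=True)
    node = expand(ranked[0], examples, attributes)   # only the best is explored
    node.suspended = [(a, examples, attributes) for a in ranked[1:]]
    return node

def wake_one(root, pick=random.choice):
    """Wake one suspended node somewhere in the multi-tree and expand it.

    Random `pick` is only a placeholder for the paper's selection criterion."""
    pool, stack = [], [root]
    while stack:
        n = stack.pop()
        if isinstance(n, MultiNode):
            pool += [(n, s) for s in n.suspended]
            stack += list(n.branches.values()) + n.alternatives
    if pool:
        owner, split = pick(pool)
        owner.suspended.remove(split)
        owner.alternatives.append(expand(*split))    # grow the alternative split
```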

[Figure: Multi-tree Structure]

Shared Ensembles
- Combination
  - Combining a set of classifiers improves on the accuracy of the individual classifiers
  - Combination methods: Boosting, Bagging, Randomization, Stacking, Windowing
  - Drawback: the large amount of memory required to store all the components of the ensemble
  - Solution: share the common parts of the components of the ensemble, using the multi-tree
(A fusion sketch follows below.)
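Here is a small sketch of how the class vectors predicted by the trees sharing the multi-tree might be fused into one prediction; the five fusion rules are the ones compared in the experiments later, while the function names and plain-list vectors are assumptions.

```python
# Sketch of fusing one class vector per ensemble member into a single
# prediction. The five fusion rules are the ones compared in the experiments
# below; everything else (names, plain-list vectors) is illustrative.
import math

FUSION_RULES = {
    "arithmetic": lambda col: sum(col) / len(col),
    "sum":        sum,
    "product":    math.prod,
    "maximum":    max,
    "minimum":    min,
}

def fuse(vectors, rule="maximum"):
    """Combine the per-tree class vectors column-wise with the chosen rule."""
    return [FUSION_RULES[rule](col) for col in zip(*vectors)]

def ensemble_predict(vectors, rule="maximum"):
    fused = fuse(vectors, rule)
    return fused.index(max(fused))   # index of the winning class

# Example: two trees voting over three classes.
print(fuse([[40, 10, 30], [7, 2, 10]], "sum"))        # [47, 12, 40]
print(ensemble_predict([[40, 10, 30], [7, 2, 10]]))   # 0 (class 0 wins)
```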

[Figure: Shared Ensembles Combination]

Examples of the vector transformation methods:

Original        Good loser    Bad loser     Majority     Difference
{40, 10, 30}    {80, 0, 0}    {40, 0, 0}    {1, 0, 0}    {0, -60, -20}
{7, 2, 10}      {0, 0, 19}    {0, 0, 10}    {0, 0, 1}    {-5, -15, 1}
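The transformations in the table can be reproduced with a short sketch. The formulas below are inferred from the two example rows (an assumption, since the slides do not state them): "good loser" hands the winning class the whole vote mass, "bad loser" keeps only the winner's own votes, "majority" casts a one-hot vote, and "difference" replaces each component by its value minus the sum of the others.

```python
# Sketch reproducing the table above; the formulas are inferred from the two
# example rows, not stated in the slides.
def transform(v, method):
    total = sum(v)
    w = max(range(len(v)), key=lambda i: v[i])            # winning class index
    if method == "good_loser":
        return [total if i == w else 0 for i in range(len(v))]
    if method == "bad_loser":
        return [v[w] if i == w else 0 for i in range(len(v))]
    if method == "majority":
        return [1 if i == w else 0 for i in range(len(v))]
    if method == "difference":
        return [v[i] - (total - v[i]) for i in range(len(v))]
    return list(v)                                        # "original"

print(transform([40, 10, 30], "good_loser"))   # [80, 0, 0]
print(transform([40, 10, 30], "difference"))   # [0, -60, -20]
print(transform([7, 2, 10], "difference"))     # [-5, -15, 1]
```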

Experiments
[Table: Information about the datasets used in the experiments. Columns: dataset, size, classes, nominal attributes, numeric attributes. Datasets: Balance-scale, Cars, Dermatology, Ecoli, Iris, House-votes, Monks (three datasets), New-thyroid, Post-operative, Soybean-small, Tae, Tic-tac, Wine.]

Experiments
[Table: Comparison between fusion techniques (Arit., Sum, Prod., Max., Min.); accuracy and deviation per dataset, with the geometric mean.]

Experiments
[Table: Comparison between vector transformation methods (Max+Orig., Max+Good, Max+Bad, Max+Majo., Max+Diff.); accuracy and deviation per dataset, with the geometric mean.]

Experiments
[Table: Influence of the size of the multi-tree; accuracy and deviation per dataset for four multi-tree sizes, with the geometric mean.]


References
- V. Estruch, C. Ferri, J. Hernandez-Orallo, M.J. Ramirez-Quintana. Shared Ensemble Learning using Multi-trees. (54-escicucrri.pdf)
- Wikipedia

Thank you for listening