Conditional Classification Trees using Instrumental Variables
Roberta Siciliano, Valerio Aniello Tutore, Massimo Aria
Department of Mathematics and Statistics, University of Naples Federico II

Outline
The framework
- Standard CART (Breiman et al., 1984)
- Two-stage methods (Mola and Siciliano, 1992)
The problem
- Specific application contexts involving structural information on either the objects or the variables
- Conditional data analysis using trees
The proposed approach
- Introduction of instrumental variables
- Two methods and some applications

Tree-based model
[Figure: a binary tree with root N1, internal nodes, and terminal nodes N7, N8, N9, N12, N13; each split s1, ..., s6 sends objects to the left (s = 0) or right (s = 1) branch, and the terminal nodes carry the predictions y7, y8, y9, y12, y13.]
Sample L = {(y_n, x_n); n = 1, ..., N} from the distribution of (Y, X), where Y is the response and X is the set of predictors.
Segmentation: recursive partitioning of the objects into subgroups homogeneous w.r.t. Y, by means of a sequence of splitting variables generated by the predictors.
Supervised classification and regression to predict Y.
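To fix ideas, here is a minimal sketch of the standard (unconditional) recursive partitioning step that this framework builds on. scikit-learn's DecisionTreeClassifier stands in for CART; the data X, y and all parameter values are illustrative assumptions, not part of the original slides.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Illustrative data: N objects, a response Y and a set of predictors X.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                   # predictors
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # response

# Recursive partitioning into subgroups homogeneous w.r.t. Y,
# via a sequence of binary splits generated by the predictors.
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=10).fit(X, y)
print(export_text(tree, feature_names=["x1", "x2", "x3"]))
```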

Two problems when using trees
There are applications where the objects are not intrinsically homogeneous, because they belong to different subsamples: the final partitioning cannot ignore this condition!
There are applications where the predictors are correlated within groups and organized into blocks: standard trees yield unstable and unfair partitioning!

The genesis of our contribution
This work defines a segmentation methodology for a three-way data matrix, building on some recent results (Tutore et al., 2006). A three-way data matrix consists of measurements of a response variable, a set of predictors and, in addition, a stratifying or descriptor variable of categorical type. The stratifying variable plays the role of an instrumental variable, identifying either subsamples of objects or groups of predictors. Two basic methods are proposed:
- Partial predictability trees
- Multiple discriminant trees

Partial predictability trees: the idea
In a classification context with categorical predictors X_m (m = 1, ..., M), the instrumental (categorical or categorized) variable serves to distinguish different subsamples of objects: a standard splitting criterion would divide the objects regardless of the subsample they belong to. We introduce a criterion that finds the best split conditional on the instrumental variable. Partial predictability trees can be understood as an extension of both two-stage segmentation and the CART methodology.

Partial predictability trees: the method
We consider the two-stage splitting criterion based on the predictability index of Goodman and Kruskal for two-way cross-classifications, and its extension to three-way cross-classifications due to Gray and Williams.
1. First stage: the best predictor is found by maximizing the global predictability of the response variable conditional on the instrumental variable.
2. Second stage: the best split of the best predictor is found by maximizing the partial predictability.

The proposed splitting criterion
First stage: at each node, among all predictors X_m we maximize the partial index

τ(Y | X_m · Z) = [τ(Y | X_m Z) − τ(Y | Z)] / [1 − τ(Y | Z)]

to find the best predictor conditioned by the instrumental variable Z, where τ(Y | X_m Z) and τ(Y | Z) are the multiple and the simple predictability measures.
Second stage: we find the best split of the best predictor by maximizing the partial index among all possible splits of that predictor.
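As a concrete illustration, a minimal sketch of the first stage in Python (not the authors' code): gk_tau computes the Goodman-Kruskal predictability index, partial_gk_tau the Gray-Williams partial index, and best_predictor picks the conditioned best predictor. The function names and the numpy/pandas encoding of the categorical variables are assumptions.

```python
import numpy as np
import pandas as pd

def gk_tau(y, x):
    """Goodman-Kruskal predictability index tau(Y|X): proportional
    reduction in the prediction error of Y once X is known."""
    p = pd.crosstab(np.asarray(x), np.asarray(y), normalize=True).to_numpy()
    py, px = p.sum(axis=0), p.sum(axis=1)            # marginals of Y and X
    err_y = 1.0 - np.sum(py ** 2)                    # error ignoring X
    err_y_x = 1.0 - np.sum((p ** 2).sum(axis=1) / px)  # error given X
    return (err_y - err_y_x) / err_y

def partial_gk_tau(y, x, z):
    """Gray-Williams partial index tau(Y|X.Z): predictability added
    by the predictor X over the instrumental variable Z alone."""
    xz = [f"{a}|{b}" for a, b in zip(x, z)]          # joint predictor (X, Z)
    tz = gk_tau(y, z)
    return (gk_tau(y, xz) - tz) / (1.0 - tz)

def best_predictor(y, X, z):
    """First stage: the predictor maximizing the partial index; the
    second stage then scans the splits of this predictor likewise."""
    return max(X.columns, key=lambda m: partial_gk_tau(y, X[m], z))
```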

Partial predictability trees – An example
A dataset on credit scoring in Germany, from the UCI repository (Professor Dr. Hans Hofmann, University of Hamburg):
- 2026 objects
- 20 variables:
  - 1 categorical response: bad or good client
  - 18 predictors
  - 1 instrumental variable: amount of credit requested (four classes)

Path 1-23: Good clients

Multiple discriminant trees: the idea
The data set consists of G blocks X_g (g = 1, ..., G) of internally correlated covariates, and a dummy response variable Y.
The idea is to find a compromise of all predictors within each block, and a compromise of all blocks, so as to maximize the predictability of Y.
The approach is an improvement of TS-DIS segmentation (Siciliano and Mola, 2002).

Multiple discriminant trees: the method
I. Within-block latent compromises: in each block g, find the linear combination z_g of the covariates in X_g using discriminant functions.
II. Across-block latent compromise: find the discriminant function z of all the z_g.
III. Multiple factorial split: find the best split of z.
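A minimal sketch of steps I-III (my rendering, not the authors' software), assuming a numeric predictor matrix X, a binary dummy response y as a numpy array, and blocks given as lists of column indices; scikit-learn's LinearDiscriminantAnalysis stands in for the discriminant functions:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def multiple_factorial_split(X, y, blocks):
    """Steps I-III: per-block discriminant scores z_g, an across-block
    discriminant compromise z, then the best cut-point on z."""
    # I. Within-block latent compromises: one discriminant score per block.
    Z = np.column_stack([
        LinearDiscriminantAnalysis().fit_transform(X[:, cols], y).ravel()
        for cols in blocks
    ])
    # II. Across-block latent compromise: discriminant score of the z_g.
    z = LinearDiscriminantAnalysis().fit_transform(Z, y).ravel()

    # III. Multiple factorial split: cut-point on z minimizing weighted Gini.
    def gini(labels):
        _, counts = np.unique(labels, return_counts=True)
        return 1.0 - np.sum((counts / labels.size) ** 2)

    cuts = np.unique(z)[:-1]
    impurity = [y[z <= c].size * gini(y[z <= c]) + y[z > c].size * gini(y[z > c])
                for c in cuts]
    return cuts[int(np.argmin(impurity))]  # threshold defining the binary split
```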

Multiple discriminant trees – An example
A real dataset on customer satisfaction with a local public transport service:
- 13 variables:
  - 12 predictors, organized into 4 groups (each group of 3 predictors is internally correlated)
  - 1 categorical response: global satisfaction

Path 1-8: Unsatisfied customers

Path 1-55: Satisfied customers

Some remarks
Partial predictability trees: it can be shown that using only the second stage (trying out all possible splits of all predictors) yields a conditional version of the CART-like splitting criterion (based on the decrease of the impurity measure).
Multiple discriminant trees: the method extends to regression trees, and to classification trees with more than two response classes. Define the dummy response variable so as to distinguish the two groups of response values/classes that are most distant, without regard to the predictors (the so-called retrospective split, Siciliano and Mola, 2002); a sketch follows below.
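For the extension just mentioned, a hedged sketch of how the retrospective split could define the dummy response for a numeric Y (one reading of Siciliano and Mola, 2002, not their code): cut the ordered response values into the two most distant groups, measured by between-group deviance, ignoring the predictors.

```python
import numpy as np

def retrospective_dummy(y):
    """Dummy response via a 'retrospective split': cut the ordered
    response values into the two most distant groups, without caring
    about the predictors."""
    y = np.asarray(y, dtype=float)
    best_cut, best_bss = None, -np.inf
    for c in np.unique(y)[:-1]:
        left, right = y[y <= c], y[y > c]
        # between-group sum of squares: distance between the two groups
        bss = (left.size * (left.mean() - y.mean()) ** 2
               + right.size * (right.mean() - y.mean()) ** 2)
        if bss > best_bss:
            best_cut, best_bss = c, bss
    return (y > best_cut).astype(int)  # dummy Y for the discriminant stage
```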

Concluding remarks
Main results of the proposed conditional methods:
- Multiple discriminant trees reduce input dimensionality by introducing a multidimensional split that takes into account the structural relationships among the variables.
- Partial predictability trees provide a three-dimensional classification that takes into account the distinct subsamples at each node.

Last but not least
Matching two scientific worlds:
- Computational Statistics / Data Mining: the science of extracting useful information from large data sets by means of a strategy of analysis covering data preprocessing and statistical methods.
- Computer Science / Machine Learning: the approach that combines data-driven procedures with computationally intensive methods, exploiting information technology so as to obtain a comprehensive and detailed explanation of the phenomenon under analysis.
Intelligent Data Analysis: Statistical Learning and Information Management. Turning data into information, and then information into knowledge, are the main steps of the knowledge discovery process of the statistical learning paradigm.

References
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J. (1984): Classification and Regression Trees. Wadsworth, Belmont, CA.
Hastie, T.J., Tibshirani, R.J., Friedman, J. (2001): The Elements of Statistical Learning. Springer Verlag.
Mola, F., Siciliano, R. (1992): A two-stage predictive splitting algorithm in binary segmentation. In: Y. Dodge, J. Whittaker (Eds.): Computational Statistics: COMPSTAT '92, 1, Physica Verlag, Heidelberg.
Mola, F., Siciliano, R. (1997): A Fast Splitting Procedure for Classification Trees. Statistics and Computing, 7.
Siciliano, R., Mola, F. (2002): Discriminant Analysis and Factorial Multiple Splits in Recursive Partitioning for Data Mining. In: Roli, F., Kittler, J. (Eds.): Proceedings of the International Conference on Multiple Classifier Systems (Chia, June 24-26, 2002), Lecture Notes in Computer Science, Springer, Heidelberg.
Siciliano, R., Aria, M., Conversano, C. (2004): Harvesting trees: methods, software and applications. In: Proceedings in Computational Statistics: 16th Symposium of IASC (COMPSTAT 2004), Prague, August 23-27, 2004. Electronic Edition (CD), Physica-Verlag, Heidelberg.