
Decision Tree under MapReduce Week 14 Part II

Decision Tree

A decision tree is a tree‐structured plan of a set of attributes to test in order to predict the output

Decision Tree
Decision trees:
– Split the data at each internal node
– Each leaf node makes a prediction
Lecture today:
– Binary splits: X(j) < v
– Numerical attributes
– Regression

How to Make Predictions?
Input: example x_i
Output: predicted y_i'
– “Drop” x_i down the tree until it hits a leaf node
– Predict the value stored in the leaf that x_i hits
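
A tree with binary splits X(j) < v is just a nested structure of test nodes and leaf nodes, and prediction is a walk from the root to a leaf. A minimal sketch in Python (the class and field names are illustrative, not from the lecture):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(eq=False)  # eq=False keeps nodes hashable by identity, so they can be used as dict keys later
class TreeNode:
    feature: Optional[int] = None        # index j of the attribute X(j) tested at an internal node
    threshold: Optional[float] = None    # split value v
    left: Optional["TreeNode"] = None    # child taken when x[feature] < threshold
    right: Optional["TreeNode"] = None   # child taken when x[feature] >= threshold
    prediction: Optional[float] = None   # value stored at a leaf (None for internal nodes)

def predict(node: TreeNode, x) -> float:
    """'Drop' example x down the tree until it hits a leaf, then return the leaf's value."""
    while node.prediction is None:
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.prediction

# Usage: a depth-1 tree that splits on attribute 0 at value 5.0
root = TreeNode(feature=0, threshold=5.0,
                left=TreeNode(prediction=1.0), right=TreeNode(prediction=0.0))
print(predict(root, [3.2]))  # 3.2 < 5.0, so the left leaf is hit -> 1.0
```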

Decision Tree vs. SVM

How to Construct a Tree? Training dataset D*, |D*|=100 examples

How to Construct a Tree?
Imagine we are currently at some node G
– Let D_G be the data that reaches G
There is a decision we have to make: do we continue building the tree?
– If yes: which variable and which value do we use for a split? Continue building the tree recursively.
– If not: how do we make a prediction? We need to build a “predictor node”.

3 Steps in Constructing a Tree

Step (1): How to Split?
Pick an attribute and value that optimizes some criterion
– Regression: Purity
Find the split (X(j), v) that creates D, D_L, D_R (parent, left, and right child datasets) and maximizes the purity criterion, as in the sketch below.
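
The purity formula itself did not survive in the transcript. A common choice for regression splits, and the one assumed in the sketches that follow, is variance reduction: maximize |D|·Var(D) − |D_L|·Var(D_L) − |D_R|·Var(D_R).

```python
import numpy as np

def variance_reduction(y_parent, y_left, y_right) -> float:
    """Purity gain of a split under the (assumed) variance-reduction criterion."""
    def sse(y):
        # |D| * Var(D): sum of squared deviations from the mean; 0 for an empty set
        y = np.asarray(y, dtype=float)
        return float(((y - y.mean()) ** 2).sum()) if len(y) else 0.0
    return sse(y_parent) - sse(y_left) - sse(y_right)
```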

Step (1): How to Split?
Pick an attribute and value that optimizes some criterion
– Classification: Information Gain
Measures how much a given attribute X tells us about the class Y.
IG(Y | X): we must transmit Y over a binary link. How many bits, on average, would it save us if both ends of the line knew X?

Information Gain? Entropy
Entropy: what is the smallest possible number of bits, on average, per symbol, needed to transmit a stream of symbols drawn from X’s distribution?
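
The answer is the entropy of X's distribution, H(X) = −Σ_x p(x) log2 p(x) (the standard definition; the slide's own formula is not in the transcript). A minimal sketch estimating it from a sample of symbols:

```python
from collections import Counter
from math import log2

def entropy(symbols) -> float:
    """H(X) = -sum over x of p(x) * log2 p(x), with p estimated from the observed symbols."""
    counts = Counter(symbols)
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values())

print(entropy(["math", "math", "cs", "history"]))  # 1.5 bits
```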

Information Gain? Entropy
Suppose we want to predict Y and we have X
– X = College Major
– Y = Like “Chicago”

Information Gain? Entropy
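
The formulas on this slide are not in the transcript; the standard definitions are the conditional entropy H(Y | X) = Σ_x p(x) · H(Y | X = x) and the information gain IG(Y | X) = H(Y) − H(Y | X). A sketch building on the entropy helper above:

```python
from collections import defaultdict

def conditional_entropy(y_values, x_values) -> float:
    """H(Y | X) = sum over x of p(x) * H(Y | X = x), estimated from paired samples."""
    groups = defaultdict(list)
    for x, y in zip(x_values, y_values):
        groups[x].append(y)
    n = len(y_values)
    return sum(len(ys) / n * entropy(ys) for ys in groups.values())

def info_gain(y_values, x_values) -> float:
    """IG(Y | X) = H(Y) - H(Y | X): bits about Y saved, on average, by knowing X."""
    return entropy(y_values) - conditional_entropy(y_values, x_values)
```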

Suppose you are trying to predict whether someone is going to live past 80 years.
From historical data you might find:
– IG(LongLife | HairColor) = 0.01
– IG(LongLife | Smoker) = 0.3
– IG(LongLife | Gender) = 0.25
– IG(LongLife | LastDigitOfSSN) ≈ 0
IG tells us how much information about Y is contained in X
– So an attribute X with high IG(Y | X) is a good split!
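
Choosing the split attribute then amounts to picking the one with the highest information gain, reusing the info_gain sketch above (columns is a hypothetical dict mapping attribute names to their value lists):

```python
def best_attribute(y_values, columns: dict) -> str:
    """Return the attribute name whose values give the highest IG(Y | X)."""
    return max(columns, key=lambda name: info_gain(y_values, columns[name]))
```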

Step (2): When to Stop?
Many different heuristic options. Two ideas:
– (1) When the leaf is “pure”: the target variable does not vary too much, Var(y_i) < ε
– (2) When the number of examples in the leaf is too small, for example |D| ≤ 10
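
Both heuristics translate directly into a stopping check; a sketch with illustrative default thresholds (ε and the minimum leaf size are tuning knobs, not values fixed by the lecture):

```python
import numpy as np

def should_stop(y_leaf, var_eps: float = 1e-3, min_examples: int = 10) -> bool:
    """Stop splitting if the leaf is (nearly) pure or contains too few examples."""
    if len(y_leaf) <= min_examples:
        return True
    return float(np.var(np.asarray(y_leaf, dtype=float))) < var_eps
```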

Step (3): How to Predict?
Many options
– Regression: predict the average y_i of the examples in the leaf, or build a linear regression model on the examples in the leaf
– Classification: predict the most common y_i of the examples in the leaf
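
The two simplest leaf predictors from the slide, as a sketch (a per-leaf linear regression would replace the mean in the regression case):

```python
import numpy as np
from collections import Counter

def leaf_prediction_regression(y_leaf) -> float:
    """Regression leaf: predict the average y_i of the examples in the leaf."""
    return float(np.mean(y_leaf))

def leaf_prediction_classification(y_leaf):
    """Classification leaf: predict the most common y_i among the examples in the leaf."""
    return Counter(y_leaf).most_common(1)[0][0]
```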

Build Decision Tree Using MapReduce
Problem: given a large dataset with hundreds of attributes, build a decision tree.
General considerations:
– The tree is small (we can keep it in memory): shallow (~10 levels)
– The dataset is too large to keep in memory
– The dataset is too big to scan over on a single machine
– MapReduce to the rescue!

PLANET Algorithm
PLANET: Parallel Learner for Assembling Numerous Ensemble Trees [Panda et al., VLDB ’09]
– A sequence of MapReduce jobs that builds a decision tree
Setting:
– Hundreds of numerical (discrete and continuous, but not categorical) attributes
– Target variable is numerical: regression
– Splits are binary: X(j) < v
– Decision tree is small enough for each Mapper to keep it in memory
– Data is too large to keep in memory

PLANET: Build the Tree

Decision Tree under MapReduce

PLANET: Overview
We build the tree level by level
– One MapReduce step builds one level of the tree
Mapper:
– Considers a number of possible splits (X(j), v) on its subset of the data
– For each split it stores partial statistics
– Partial split statistics are sent to Reducers
Reducer:
– Collects all partial statistics and determines the best split
Master grows the tree by one level

PLANET: Overview
Mapper loads the model and info about which attribute splits to consider
– Each mapper sees a subset of the data D*
– Mapper “drops” each datapoint down the current tree to find the appropriate leaf node L
– For each leaf node L it keeps statistics about (1) the data reaching L and (2) the data in the left/right subtree under each candidate split S
Reducer aggregates the statistics (1) and (2) and determines the best split for each tree node

PLANET: Components
Master
– Monitors everything (runs multiple MapReduce jobs)

PLANET: Components
– Master node
– MapReduce: Initialization (run once, first)
– MapReduce: FindBestSplit (run multiple times)
– MapReduce: InMemoryBuild (run once, last)

Master Node
Controls the entire process.
Determines the state of the tree and grows it:
(1) Decides if nodes should be split
(2) If there is little data entering a tree node, the Master runs an InMemoryBuild MapReduce job to grow the entire subtree
(3) For larger nodes, the Master launches a FindBestSplit MapReduce job to evaluate candidates for the best split; the Master also collects the results from FindBestSplit and chooses the best split for each node
(4) Updates the model
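
A rough sketch of the control loop the Master implements. The job names follow the slides; everything else (function signatures, the frontier/size bookkeeping, the threshold value) is illustrative, not the actual PLANET interface:

```python
def master_loop(frontier, node_size, in_memory_build, find_best_split,
                small_node_threshold: int = 100_000):
    """Grow the tree level by level, the way PLANET's Master does.

    frontier:         tree nodes that still need to be grown
    node_size:        dict node -> number of training examples reaching it
    in_memory_build:  callback that grows entire subtrees for small nodes on one worker
    find_best_split:  callback (one MapReduce job per level) returning, for each large
                      node, its chosen split as a list of (child_node, child_size) pairs
    """
    while frontier:
        small = [n for n in frontier if node_size[n] <= small_node_threshold]
        large = [n for n in frontier if node_size[n] > small_node_threshold]
        if small:
            in_memory_build(small)                               # (2) finish small subtrees locally
        next_frontier = []
        if large:
            for node, children in find_best_split(large).items():  # (3) evaluate candidate splits
                for child, size in children:                        # (4) update the model
                    node_size[child] = size
                    next_frontier.append(child)
        frontier = next_frontier
```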

Initialization
Initialization job: identifies all the attribute values which need to be considered for splits
– The initialization process generates “attribute metadata” to be loaded in memory by other tasks
Which splits to even consider?

Initialization
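
For a numerical attribute, an obvious set of candidates is the midpoints between consecutive distinct values, falling back to quantile boundaries when there are too many distinct values (the actual PLANET initialization job uses approximate histograms for continuous attributes, so treat this as an illustration):

```python
import numpy as np

def candidate_splits(values, max_candidates: int = 100):
    """Candidate thresholds v for splits of the form X(j) < v on one numerical attribute."""
    distinct = np.unique(np.asarray(values, dtype=float))
    if len(distinct) <= max_candidates:
        # Midpoints between consecutive distinct values
        return ((distinct[:-1] + distinct[1:]) / 2).tolist()
    # Too many distinct values: fall back to (approximate) quantile boundaries
    qs = np.linspace(0, 1, max_candidates + 2)[1:-1]
    return np.unique(np.quantile(distinct, qs)).tolist()
```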

FindBestSplit
Goal: for a particular tree node, find the attribute X(j) and value v that maximize purity (the variance-reduction criterion sketched under Step (1)).

FindBestSplit
To compute purity for a candidate split we only need a few aggregates over the labels in the left and right child sets: the count, the sum of y_i, and the sum of y_i^2; variance, and hence purity, can be computed from these three numbers, as sketched below.
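
This is what makes the MapReduce decomposition work: the three aggregates are additive, so partial statistics computed by different mappers can simply be merged. A sketch:

```python
from dataclasses import dataclass

@dataclass
class LabelStats:
    """Sufficient statistics of a set of labels: count, sum, and sum of squares."""
    n: int = 0
    sum_y: float = 0.0
    sum_y2: float = 0.0

    def add(self, y: float) -> None:
        self.n += 1
        self.sum_y += y
        self.sum_y2 += y * y

    def merge(self, other: "LabelStats") -> "LabelStats":
        # Partial statistics from different mappers combine by simple addition
        return LabelStats(self.n + other.n, self.sum_y + other.sum_y,
                          self.sum_y2 + other.sum_y2)

    def sse(self) -> float:
        """|D| * Var(D) = sum(y^2) - (sum(y))^2 / n."""
        return self.sum_y2 - self.sum_y ** 2 / self.n if self.n else 0.0
```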

FindBestSplit: Mapper
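
A sketch of what a FindBestSplit mapper does for its chunk of the data, reusing the TreeNode and LabelStats sketches above. The key/value conventions (including using node objects as keys rather than serializable node ids) are illustrative, not the actual PLANET implementation:

```python
from collections import defaultdict

def drop_to_leaf(tree: "TreeNode", x) -> "TreeNode":
    """Like predict() above, but return the leaf node itself rather than its stored value."""
    node = tree
    while node.prediction is None:
        node = node.left if x[node.feature] < node.threshold else node.right
    return node

def find_best_split_mapper(data_chunk, tree, candidate_splits_for):
    """Map phase: accumulate partial split statistics over this mapper's subset of the data.

    data_chunk:           iterable of (x, y) training examples seen by this mapper
    tree:                 the current model (small enough to keep in memory)
    candidate_splits_for: dict leaf node -> list of candidate (feature, value) splits
    Emits ((leaf, feature, value), LabelStats of the LEFT child) for every candidate split,
    plus ((leaf, None, None), LabelStats of all data reaching the leaf); the reducer can
    recover the right-child statistics by subtraction.
    """
    stats = defaultdict(LabelStats)
    for x, y in data_chunk:
        leaf = drop_to_leaf(tree, x)
        stats[(leaf, None, None)].add(y)              # statistics of the data reaching the leaf
        for feature, value in candidate_splits_for[leaf]:
            if x[feature] < value:
                stats[(leaf, feature, value)].add(y)  # statistics of the would-be left child
    for key, partial in stats.items():
        yield key, partial                            # partial statistics sent to the reducers
```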

FindBestSplit: Reducer
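
And a matching reducer sketch: it sums up the partial statistics for every (leaf, split) key and then, per leaf, picks the split with the largest variance reduction. Again this is illustrative (a single in-memory function standing in for the grouped reduce phase), reusing LabelStats:

```python
def find_best_split_reducer(emitted):
    """Reduce phase: aggregate partial statistics and choose the best split per leaf node.

    emitted: iterable of ((leaf, feature, value), LabelStats) pairs from all mappers.
    Returns {leaf: (feature, value, purity_gain)} for the Master to apply.
    """
    totals = {}
    for key, partial in emitted:
        totals[key] = totals.get(key, LabelStats()).merge(partial)

    best = {}
    for (leaf, feature, value), left in totals.items():
        if feature is None:
            continue                                  # this key holds the whole-leaf statistics
        node_total = totals[(leaf, None, None)]       # always emitted by the mapper
        right = LabelStats(node_total.n - left.n,
                           node_total.sum_y - left.sum_y,
                           node_total.sum_y2 - left.sum_y2)
        gain = node_total.sse() - left.sse() - right.sse()   # variance reduction
        if leaf not in best or gain > best[leaf][2]:
            best[leaf] = (feature, value, gain)
    return best
```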

Overall System Architecture

Back to the Master