Random Survival Forests


1 Random Survival Forests
In Python. Nayan Chaudhary, Siddharth Verma

2 Contents
Decision Trees and Random Forests
Survival Analysis: Problem Statement
Survival Random Forests
Project Goals

3 Tree-based algorithms
For Classification

4 What are Decision Trees?
A decision tree is a flowchart-like structure in which:
Each internal node represents a "test" on an attribute (e.g. whether a coin flip comes up heads or tails).
Each branch represents the outcome of the test.
Each leaf node represents a class label (the decision taken after computing all attributes).
The paths from root to leaf represent classification rules.
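As a concrete illustration (not part of the original slides), here is a minimal Python sketch that fits such a flowchart-like tree with scikit-learn and prints its node tests; the toy "play outside" data and feature names are made up:

```python
# Minimal decision-tree sketch with scikit-learn (illustrative data is made up).
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy "play outside?" data: [temperature_celsius, is_raining]
X = [[25, 0], [30, 0], [18, 1], [10, 1], [22, 0], [8, 0]]
y = ["yes", "yes", "no", "no", "yes", "no"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# Each internal node is a test on a feature, each leaf a class label.
print(export_text(tree, feature_names=["temperature", "is_raining"]))
```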

5 A Quick Example
Source: http://www.doc.ic.ac.uk

6 The Decision Tree Algorithm
Source: A Course in Machine Learning by Hal Daumé III

7 Types of Decision Trees
ID3 (Iterative Dichotomiser 3)
C4.5 (successor of ID3)
CART (Classification And Regression Trees)
CHAID (CHi-squared Automatic Interaction Detector): performs multi-level splits when computing classification trees.
MARS: extends decision trees to handle numerical data better.
Improvements in C4.5 over ID3:
Handling both continuous and discrete attributes: to handle continuous attributes, C4.5 creates a threshold and splits the list into those whose attribute value is above the threshold and those that are less than or equal to it.
Handling training data with missing attribute values: C4.5 allows attribute values to be marked as "?" for missing; missing attribute values are simply not used in gain and entropy calculations.
Handling attributes with differing costs.
Pruning trees after creation: C4.5 goes back through the tree once it has been created and attempts to remove branches that do not help, replacing them with leaf nodes.
CHAID uses the Bonferroni correction when testing multiple hypotheses.

8 Splitting Criterion
The score function can differ based on various criteria, such as entropy/information gain and the Gini impurity index.
Entropy and Information Gain (inspired by information theory; used in ID3, C4.5 and C5.0 trees)
Let p_1, p_2, ..., p_n be the class probabilities in the child node that results from a split in the tree, with \sum_i p_i = 1.
Entropy: H(T) = I_E(p_1, p_2, ..., p_n) = -\sum_{i=1}^{n} p_i \log p_i
Information gain: IG(T, a) = H(T) - H(T|a), where H(T) is the entropy of the parent node and H(T|a) is the weighted sum of the entropies of the child nodes.
Gini Impurity Index (used in CART)
A measure of how often a randomly chosen element from the set would be incorrectly labeled if it were labeled randomly according to the distribution of labels in the subset:
I_G(p) = \sum_{i=1}^{n} p_i \sum_{j \ne i} p_j = 1 - \sum_{i=1}^{n} p_i^2
where p_i is the probability of an item with class label i being chosen at random and \sum_{j \ne i} p_j is the probability of categorizing that item incorrectly.
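A small Python sketch (helper names are my own) that computes these quantities directly from class probabilities, matching the formulas above:

```python
# Splitting-criterion sketch: entropy, information gain, and Gini impurity.
import numpy as np

def entropy(p):
    """H(T) = -sum_i p_i * log2(p_i), ignoring zero-probability classes."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def gini(p):
    """I_G(p) = 1 - sum_i p_i^2."""
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

def information_gain(parent_p, children):
    """IG(T, a) = H(T) - H(T|a); children = [(weight, probs), ...]."""
    return entropy(parent_p) - sum(w * entropy(p) for w, p in children)

# Example: a 50/50 parent split into two fairly pure children.
print(entropy([0.5, 0.5]))   # 1.0 bit
print(gini([0.5, 0.5]))      # 0.5
print(information_gain([0.5, 0.5],
                       [(0.5, [0.9, 0.1]), (0.5, [0.1, 0.9])]))
```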

9 Decision Tree Boundary
[Figure: piece-wise linear decision boundary over Feature 1 and Feature 2. Source: Machine Learning with Spark]

10 Decision Trees are great…
They are simple to understand and interpret; people can understand decision tree models after a brief explanation.
They have value even with little hard data.
They allow the addition of new possible scenarios.
They help determine worst, best and expected values for different scenarios.
They can be combined with other decision techniques.

11 … except when they aren’t!
Most splitting criteria follow a greedy approach: they make the locally optimal decision at each node without taking the global optimum into account. Decision trees are also prone to overfitting, especially when a tree is particularly deep. How can we minimize the effects of bias and variance in decision trees?

12 Enter Random Forest Random forest builds multiple decision trees and merges them together to get a more accurate and stable prediction. Instead of searching for the best feature while splitting a node, it searches for the best feature among a random subset of features. This is known as an “ensemble approach” and creates a wide diversity, which generally results in a better model.

13 A Quick Example
Source: Towards Data Science

14 The Random Forest Algorithm
1. At the current node, randomly select p features from the available features D. The number of features p is usually much smaller than the total number of features D.
2. Compute the best split point for tree k using the specified splitting metric (Gini impurity, information gain, etc.), split the current node into daughter nodes, and reduce the number of features D from this node on.
3. Repeat steps 1 and 2 until either a maximum tree depth l has been reached or the splitting metric reaches some extremum.
4. Repeat steps 1 to 3 for each tree k in the forest.
5. Vote (class mode) or aggregate (e.g. averaging) over the outputs of all trees in the forest.
Source: Towards Data Science
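A minimal usage sketch of this ensemble idea in Python with scikit-learn's RandomForestClassifier; the dataset and hyperparameter values below are illustrative choices, not part of the slides:

```python
# Random-forest sketch with scikit-learn: many trees, each split chosen from a
# random feature subset (max_features), final prediction by majority vote.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,      # number of trees k
    max_features="sqrt",   # p features considered at each split
    max_depth=None,        # grow until splits stop improving purity
    random_state=0,
)
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))
```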

15 Time Complexities
Building a decision tree (unpruned). Parameters needed:
Number of observations, n
Number of attributes, v
TC = O(v * n log n)
Building a random forest. Parameters needed:
Number of observations, n
Number of trees, nt
Number of attributes (per tree), vt
TC = O(nt * vt * n log n)

16 Survival Analysis

17 Survival Analysis
Think of the following scenarios:
How long will a patient 'survive', given pre-existing medical conditions and a certain treatment?
How long before a customer 'makes' a purchase, given their demographics and surfing history?
How long before a device 'fails', given the device characteristics?
How long before a student 'drops out', given their family history, grades, locality, etc.?

18 Survival Data
[Figure: subjects over time within an analysis window, showing events, other events, and censored observations.]

19 Survival Analysis
Predicting the 'time-to-event' for an entity, given predictors of the entity.
A given instance i is represented by a triplet (X_i, y_i, \delta_i), where:
X_i is the feature vector;
\delta_i is the binary event indicator: \delta_i = 1 for an uncensored (evented) instance and \delta_i = 0 for a censored instance;
y_i is the observed time: y_i = T_i if \delta_i = 1, and y_i = C_i if \delta_i = 0.
For a new instance j, predict T_j given its feature vector X_j.
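One possible way (an assumption, not prescribed by the slides) to hold such (X_i, y_i, delta_i) triplets in Python is a NumPy structured array, the layout commonly used by survival-analysis packages:

```python
# Survival data as (features, event indicator, observed time) triplets.
import numpy as np

X = np.array([[63, 1], [49, 0], [71, 1]])     # feature vectors X_i
y = np.array([(True, 14.0),                   # delta_i = 1: event observed at T_i
              (False, 30.0),                  # delta_i = 0: censored at C_i
              (True, 7.5)],
             dtype=[("event", bool), ("time", float)])

print(y["event"], y["time"])
```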

20 Survival Analysis
Focus on 'time-to-event' data. Important definitions:
Survival function: the probability that an instance 'survives' beyond a certain time t: S(t) = \Pr(T > t).
Hazard function: the event rate at time t conditional on survival until time t or later (i.e. T > t): \lambda(t) = f(t) / S(t) = -d(\ln S(t)) / dt.
Cumulative hazard function: \Lambda(t) = \int_0^t \lambda(u) du.
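A tiny NumPy sketch of the relationship these definitions imply, namely \Lambda(t) = -\ln S(t) when S(0) = 1 (the survival probabilities below are made up):

```python
# Relation between survival and cumulative hazard: Lambda(t) = -ln S(t).
import numpy as np

S = np.array([1.0, 0.9, 0.75, 0.6])   # survival probabilities at some times t
cum_hazard = -np.log(S)               # cumulative hazard Lambda(t)
print(cum_hazard)
# and back again: S(t) = exp(-Lambda(t))
print(np.exp(-cum_hazard))
```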

21 Survival Analysis Methods
Statistical methods:
Non-parametric: Kaplan-Meier
Semi-parametric: Cox regression
Parametric: Accelerated Failure Time
Machine learning methods:
Survival Trees
Survival SVM
Survival Ensembles
...

23 Kaplan-Meier
S(t) = S(previous event time) * (N(t) - E(t)) / N(t), where:
N(t) is the number at risk at time t
E(t) is the number of events at time t
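A minimal, self-contained Python implementation of this product-limit formula (function and variable names are my own; the example data is made up):

```python
# Minimal Kaplan-Meier estimator: S(t) = S(prev) * (N(t) - E(t)) / N(t).
import numpy as np

def kaplan_meier(times, events):
    """times: observed times; events: 1 = event, 0 = censored."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.sort(np.unique(times[events == 1]))
    surv, S = [], 1.0
    for t in event_times:
        n_at_risk = np.sum(times >= t)                    # N(t)
        n_events = np.sum((times == t) & (events == 1))   # E(t)
        S *= (n_at_risk - n_events) / n_at_risk
        surv.append((t, S))
    return surv

print(kaplan_meier([2, 3, 3, 5, 8, 8], [1, 1, 0, 1, 0, 1]))
```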

24 Survival Forest - Algorithm
1. Draw B bootstrap samples from the original data. Note that each bootstrap sample excludes on average 37% of the data, called the out-of-bag (OOB) data.
2. Grow a survival tree for each bootstrap sample. At each node of the tree, randomly select p candidate variables. The node is split using the candidate variable that maximizes the survival difference between daughter nodes.
3. Grow the tree to full size under the constraint that a terminal node should have no fewer than d0 > 0 unique deaths.
4. Calculate a CHF (cumulative hazard function) for each tree. Average to obtain the ensemble CHF.
5. Using the OOB data, calculate the prediction error for the ensemble CHF.
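The project aims at a new Python implementation of this algorithm; as a point of reference, here is a hedged sketch using the existing scikit-survival package, assuming it is installed (the data is synthetic and illustrative, and the exact API may differ between versions):

```python
# Hedged sketch of a random survival forest via scikit-survival
# (https://scikit-survival.readthedocs.io).
import numpy as np
from sksurv.ensemble import RandomSurvivalForest
from sksurv.util import Surv

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                 # feature vectors
time = rng.exponential(scale=10, size=200)    # observed times
event = rng.random(200) < 0.7                 # roughly 70% uncensored

y = Surv.from_arrays(event=event, time=time)  # structured (event, time) array

rsf = RandomSurvivalForest(
    n_estimators=100,       # B bootstrap samples / trees
    max_features="sqrt",    # p candidate variables per node
    min_samples_leaf=15,    # terminal-node size constraint (cf. d0 unique deaths)
    random_state=0,
)
rsf.fit(X, y)
print("concordance index on training data:", rsf.score(X, y))
```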

25 Survival Difference? The Log-Rank Test
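A hedged sketch of the two-sample log-rank test using the lifelines package, assuming it is installed; the two groups below are made-up stand-ins for candidate daughter nodes:

```python
# Two-group log-rank test via lifelines (illustrative, hand-made data).
from lifelines.statistics import logrank_test

# Durations and event indicators for two candidate daughter nodes.
durations_a = [6, 7, 10, 15, 19, 25]
events_a    = [1, 0, 1, 1, 0, 1]
durations_b = [1, 2, 3, 4, 8, 9]
events_b    = [1, 1, 1, 1, 0, 1]

result = logrank_test(durations_a, durations_b,
                      event_observed_A=events_a,
                      event_observed_B=events_b)
print(result.test_statistic, result.p_value)
```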

26 Evaluation Metric(s)
We cannot use R² because the problem is not standard regression, and we cannot use standard classification measures either. Instead, use specialized evaluation metrics for survival analysis:
C-index
Brier score
Mean absolute error

27 Evaluation metric: C-index
The C-index is a rank-order statistic for predictions against true outcomes, defined as the ratio of concordant pairs to the total number of comparable pairs. Given a comparable pair of instances (i, j), where t_i and t_j are the actual observed times and S(t_i) and S(t_j) are the predicted survival times:
The pair (i, j) is concordant if t_i > t_j and S(t_i) > S(t_j).
The pair (i, j) is discordant if t_i > t_j and S(t_i) < S(t_j).
With censoring, an event is comparable only with instances censored after that event time. Without censoring, all pairs are comparable (e.g. 5 instances give 5C2 = 10 comparable pairs).
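A naive pairwise implementation of the C-index (the function name is my own). It ranks by predicted risk scores, where higher risk means shorter expected survival, which flips the inequality relative to the predicted survival times in the definition above:

```python
# Naive concordance index: concordant comparable pairs / all comparable pairs.
import itertools

def c_index(times, events, predicted_risk):
    """Higher predicted_risk should correspond to shorter survival time."""
    concordant, comparable = 0.0, 0
    for i, j in itertools.combinations(range(len(times)), 2):
        if times[i] > times[j]:
            i, j = j, i                      # make i the earlier time
        # Pair is comparable only if the earlier time is an actual event;
        # tied times are skipped for simplicity.
        if events[i] == 1 and times[i] < times[j]:
            comparable += 1
            if predicted_risk[i] > predicted_risk[j]:
                concordant += 1
            elif predicted_risk[i] == predicted_risk[j]:
                concordant += 0.5
    return concordant / comparable

print(c_index([5, 10, 12, 3], [1, 1, 0, 1], [0.8, 0.4, 0.2, 0.9]))
```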

28 Project Goals
Replicate the R package for Random Survival Forests in Python.
Write good documentation that increases ease of use.
Something like this, but with a Survival Forests implementation.

29 References
Wikipedia pages on Decision Trees and Random Forests
"The Random Forest Algorithm", Towards Data Science, Niklas Donges
A Course in Machine Learning, Hal Daumé III
Machine Learning with Spark
"Random Survival Forests", Ishwaran et al., 2008
"Machine Learning for Survival Analysis", Reddy et al.

