Learning Decision Trees Brief tutorial by M Werner.


1 Learning Decision Trees Brief tutorial by M Werner

2 Medical Diagnosis Example
Goal: Diagnose a disease from a blood test
Clinical Use:
– A blood sample is obtained from a patient
– The blood is tested to measure the current expression of various proteins, say by using a DNA microarray
– The data is analyzed to produce a Yes or No answer

3 Data Analysis
Use a decision tree such as:
[Figure: decision tree with test nodes P1 > K1, P2 > K2, P3 > K3 and P4 > K4; each test has Y and N branches leading either to another test or to a Yes/No leaf]
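As a rough illustration of how such a tree is evaluated, the sketch below walks a patient's protein levels through nested threshold tests. The numeric thresholds and the branch layout are assumptions made for this example; the slide gives no values for K1..K4, and the exact shape of its tree is not recoverable from the transcript. The function name diagnose is also hypothetical.

```python
# Hypothetical thresholds K1..K4; the slide does not give numeric values.
K = {"P1": 2.0, "P2": 1.5, "P3": 0.8, "P4": 3.0}


def diagnose(levels):
    """Return "Yes" or "No" for a dict of protein expression levels.

    The branch layout is an assumed illustration, not the exact tree
    drawn on the slide.
    """
    if levels["P1"] > K["P1"]:
        # P1 is elevated: the answer depends on P2 alone.
        if levels["P2"] > K["P2"]:
            return "Yes"
        return "No"
    # P1 is not elevated: check P3, then fall back to P4.
    if levels["P3"] > K["P3"]:
        return "Yes"
    if levels["P4"] > K["P4"]:
        return "Yes"
    return "No"


patient = {"P1": 3.0, "P2": 2.0, "P3": 0.0, "P4": 0.0}
answer = diagnose(patient)
```

Each patient follows exactly one path from the root to a leaf, so only the proteins on that path need to be compared against their thresholds.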

4 How to Build the Decision Tree
Start with samples of blood from patients known to either have the disease or not (the training set). Suppose there are 20 patients, 10 known to have the disease and 10 known not to.
From the training set, get expression levels for all proteins of interest, i.e. if there are 20 patients and 50 proteins we get a 50 x 20 array of real numbers:
– Rows are proteins
– Columns are patients

5 Choosing the decision nodes
We would like the tree to be as short as possible.
Start with all 20 patients in one group (10 have the disease, 10 don't).
Choose a protein and a level that gains the most information.
Possible splitting condition: Px > Kx splits the 10/10 group into 9/3 (mostly diseased) and 1/7 (mostly not diseased).
Alternative splitting condition: Py > Ky splits the 10/10 group into 7/7 and 3/3.

6 How to determine information gain
Purity: a measure of the extent to which the patients in a group share the same outcome.
– A group that splits 1/7 is fairly pure: most patients don't have the disease
– 0/8 is even purer
– 4/4 is the opposite of pure. This group is said to have high entropy. Knowing that a patient is in this group does not make her more or less likely to have the disease.
The decision tree should reduce entropy as test conditions are evaluated.

7 Measuring Purity (Entropy)
Let f(i,j) = Prob(Outcome = j in node i), e.g. if node 2 has a 9/3 split:
– f(2,0) = 9/12 = .75
– f(2,1) = 3/12 = .25
Gini impurity: I_G(i) = 1 - \sum_j f(i,j)^2
Entropy: I_E(i) = -\sum_j f(i,j) \log_2 f(i,j)
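To make the two measures concrete, here is a minimal sketch that evaluates them for the 9/3 node above. It assumes the standard definitions (Gini impurity is one minus the sum of squared outcome fractions; entropy is the negated sum of each fraction times its base-2 logarithm). The function names are chosen for this example only.

```python
import math


def class_fractions(counts):
    """Turn outcome counts such as (9, 3) into fractions f(i, j)."""
    total = sum(counts)
    return [c / total for c in counts]


def gini_impurity(counts):
    """1 minus the sum of squared outcome fractions."""
    return 1.0 - sum(f * f for f in class_fractions(counts))


def entropy(counts):
    """Negated sum of f * log2(f); empty outcomes contribute nothing."""
    return -sum(f * math.log2(f) for f in class_fractions(counts) if f > 0)


node2_gini = gini_impurity((9, 3))   # 0.375
node2_entropy = entropy((9, 3))      # about 0.811
even_entropy = entropy((4, 4))       # 1.0, the maximum for two outcomes
```

Both measures are zero for a perfectly pure group such as 0/8 and largest for an even split such as 4/4.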

8 Computing Entropy

9 Goal is to use a test which best reduces total entropy in the subgroups
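One common way to score a test is information gain: the entropy of the parent group minus the size-weighted average entropy of its subgroups. The sketch below applies that to the two candidate splits from slide 5, assuming base-2 entropy; the function names are illustrative.

```python
import math


def entropy(counts):
    """Base-2 entropy of a group given its outcome counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the subgroups."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Px > Kx splits the 10/10 group into 9/3 and 1/7.
gain_x = information_gain((10, 10), [(9, 3), (1, 7)])   # about 0.296 bits
# Py > Ky splits the 10/10 group into 7/7 and 3/3.
gain_y = information_gain((10, 10), [(7, 7), (3, 3)])   # essentially zero
```

The Py test leaves both subgroups as mixed as the original group, so it gains nothing, while the Px test removes roughly 0.3 bits of uncertainty and would be preferred.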

10 Building the Tree
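The content of this slide was not preserved in the transcript. A common greedy approach, sketched here under that caveat, is to pick the protein and threshold with the lowest weighted entropy, split the patients, and recurse on each subgroup until it is pure. The toy data (2 proteins x 6 patients, rows as proteins and columns as patients) and all function names are made up for illustration.

```python
import math


def entropy(labels):
    """Base-2 entropy of a list of outcome labels."""
    n = len(labels)
    result = 0.0
    for value in set(labels):
        f = labels.count(value) / n
        result -= f * math.log2(f)
    return result


def best_split(data, labels, patients):
    """Return (score, protein, threshold, above, below) or None."""
    best = None
    for p, row in enumerate(data):
        values = sorted({row[j] for j in patients})
        # Candidate thresholds: midpoints between consecutive observed levels.
        for lo, hi in zip(values, values[1:]):
            k = (lo + hi) / 2
            above = [j for j in patients if row[j] > k]
            below = [j for j in patients if row[j] <= k]
            score = (len(above) * entropy([labels[j] for j in above])
                     + len(below) * entropy([labels[j] for j in below]))
            score /= len(patients)
            if best is None or score < best[0]:
                best = (score, p, k, above, below)
    return best


def build_tree(data, labels, patients):
    """Recursively build a tree; leaves are outcome labels."""
    group = [labels[j] for j in patients]
    majority = max(set(group), key=group.count)  # ties broken arbitrarily
    if entropy(group) == 0.0:
        return majority  # pure group: stop splitting
    split = best_split(data, labels, patients)
    if split is None:
        return majority  # no protein distinguishes these patients
    _, p, k, above, below = split
    return {"protein": p, "threshold": k,
            "yes": build_tree(data, labels, above),
            "no": build_tree(data, labels, below)}


def classify(tree, levels):
    """Follow the tests from the root until a leaf label is reached."""
    while isinstance(tree, dict):
        branch = "yes" if levels[tree["protein"]] > tree["threshold"] else "no"
        tree = tree[branch]
    return tree


# Made-up training set: rows are proteins, columns are patients.
data = [[5.0, 6.0, 7.0, 1.0, 2.0, 1.5],   # protein 0
        [0.2, 0.9, 0.4, 0.3, 0.8, 0.5]]   # protein 1
labels = [1, 1, 1, 0, 0, 0]               # 1 = has the disease
tree = build_tree(data, labels, list(range(len(labels))))
```

On this toy data a single test on protein 0 separates the two groups, so the resulting tree has one decision node. Real data usually needs a stopping rule (a depth limit or minimum group size) to avoid fitting noise.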

11 Links
http://www.ece.msstate.edu/research/isip/publications/courses/ece_8463/lectures/current/lecture_27/lecture_27.pdf
Decision Trees & Data Mining
Andrew Moore Tutorial
