
1 Decision trees part II

2 LESSON TOPICS
- CHAID method: Chi-Squared Automatic Interaction Detection
- Chi-square test
- Bonferroni correction factor
- Examples

3 Principal features of the CHAID method

4 CHAID merges categories of the predictor that are homogeneous with respect to the dependent variable, but keeps distinct all the categories which are heterogeneous

5 CHAID uses the Bonferroni multiplier to adjust significance levels when several statistical tests are carried out simultaneously

6 Unlike other iterative partitioning methods, CHAID is restricted to ordinal and nominal variables

7 It uses the chi-square test of independence between variables, together with the Bonferroni factor, to assess the significance of a partition

8 Chi-square test of independence

$$\chi^2 = \sum_i \sum_j \frac{\left(n_{ij} - n^*_{ij}\right)^2}{n^*_{ij}}$$

9 where $n_{ij}$ is the empirical frequency of the combination of category $i$ of the first variable with category $j$ of the second variable, and

10 $n^*_{ij}$ is the corresponding theoretical frequency under the hypothesis of independence between the two variables:

$$n^*_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n}$$

where $n_{i\cdot}$ and $n_{\cdot j}$ are the row and column totals and $n$ is the total number of observations.
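The two formulas above can be sketched in Python. The sketch assumes the two-way table is given as a list of rows of counts. The function names `expected_frequencies` and `chi_square_statistic` are hypothetical and chosen for illustration only.

```python
def expected_frequencies(table):
    """Theoretical frequencies n*_ij = n_i. * n_.j / n under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Sum over all cells of (n_ij - n*_ij)^2 / n*_ij."""
    expected = expected_frequencies(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
        if exp > 0  # empty rows/columns have no expected count; skip them
    )


# Rows proportional to each other: exact independence, so chi^2 is 0.
print(chi_square_statistic([[10, 20], [20, 40]]))
```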

11 EXAMPLE Families according to residence and personal computer ownership (empirical frequencies)

12
Geographic zone | PC owner: YES | PC owner: NO | Total
North-Center | 150 | 500 | 650
South | 100 | 250 | 350
Total | 250 | 750 | 1,000

13 Families according to residence and personal computer ownership (theoretical frequencies)

14
Geographic zone | PC owner: YES | PC owner: NO | Total
North-Center | 162.5 | 487.5 | 650.0
South | 87.5 | 262.5 | 350.0
Total | 250.0 | 750.0 | 1,000.0

15 Test calculations:

$$\chi^2 = \frac{(500-487.5)^2}{487.5} + \frac{(87.5-100)^2}{87.5} + \frac{(162.5-150)^2}{162.5} + \frac{(250-262.5)^2}{262.5} \approx 3.66$$
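The calculation can be checked with a short Python sketch. It assumes a 2×2 table, which has $(2-1)(2-1) = 1$ degree of freedom. Only in that case does the upper-tail chi-square probability equal `math.erfc(math.sqrt(x / 2))`, so the p-value shortcut below does not carry over to larger tables.

```python
import math

# Rows: North-Center, South; columns: owns a PC (YES), does not (NO).
observed = [[150, 500], [100, 250]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, n_ij in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n  # theoretical frequency
        chi2 += (n_ij - expected) ** 2 / expected

df = (len(row_totals) - 1) * (len(col_totals) - 1)  # 1 for a 2x2 table
# Valid only for df = 1: P(chi^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value:.3f}")
```

With these counts the sketch gives χ² ≈ 3.66 and p ≈ 0.056. The statistic is below 3.84, the 5% critical value of a chi-square with one degree of freedom. Independence between residence and PC ownership is therefore not rejected at α = 0.05.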

16 Bonferroni adjustment factor. Let α be the type I error probability of the independence test on the two-way table of B and R (for example, α = 0.05). Consider the dependent variable R and two predictors: B, with five categories, and A, with two

17 There are $2^{4} - 1 = 15$ different ways to turn B into a dichotomous variable. If the 15 hypothesis tests were independent, the probability of making at least one type I error would be $1-(1-\alpha)^{15} > \alpha$

18 In the above example, 15 is called the Bonferroni factor. If α is small, $1-(1-\alpha)^M \approx M\alpha$. Predictor A has two categories and therefore only one possible split, so its probability of making a type I error is simply α
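The count of dichotomies and the inflation of the type I error can be sketched in Python. The sketch assumes B is a free predictor, meaning any categories may be grouped together, which is what the count $2^4 - 1$ implies. The function names are hypothetical.

```python
def n_dichotomies(c):
    """Ways to split c categories of a free predictor into two non-empty groups."""
    return 2 ** (c - 1) - 1


def familywise_error(alpha, m):
    """P(at least one type I error) over m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


m_B = n_dichotomies(5)  # predictor B: 5 categories -> 15 dichotomies
m_A = n_dichotomies(2)  # predictor A: 2 categories -> 1 dichotomy
print("A:", m_A, "B:", m_B)
for a in (0.05, 0.01, 0.001):
    # exact family-wise error versus the Bonferroni approximation M * alpha
    print(a, round(familywise_error(a, m_B), 4), round(m_B * a, 4))
```

For α = 0.001 the exact value 1 − 0.999^15 ≈ 0.0149 is very close to Mα = 0.015. For α = 0.05 the approximation is much looser (about 0.537 against 0.75). In general Mα is an upper bound on $1-(1-\alpha)^M$, so the Bonferroni correction errs on the conservative side.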

19 In the CHAID method, the p-value of the independence test for variable A is compared with the p-value for variable B corrected by (multiplied by) the Bonferroni factor
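Here is an illustration with made-up p-values. Suppose the best dichotomy of B has a smaller raw p-value than A, but B had 15 candidate dichotomies to choose from. After the Bonferroni correction, A can win. The numbers and the helper name `bonferroni_adjust` are hypothetical. Capping the adjusted value at 1 is a common convention, assumed here.

```python
def bonferroni_adjust(p_value, multiplier):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p_value * multiplier)


# Hypothetical raw p-values, for illustration only.
p_A = 0.030  # A: two categories, a single possible split, multiplier 1
p_B = 0.004  # B: best of its 15 possible dichotomies, multiplier 15

adj_A = bonferroni_adjust(p_A, 1)
adj_B = bonferroni_adjust(p_B, 15)
best = "A" if adj_A <= adj_B else "B"
print(f"A: {adj_A:.3f}  B: {adj_B:.3f}  -> choose {best}")
```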

20 Basic components of CHAID:

21
1. A categorical dependent variable
2. A set of independent variables, also categorical, whose combinations define the partitions
3. A set of parameters

22 At each step of the analysis, every subgroup is analyzed and the best predictor is selected: the one with the smallest p-value after Bonferroni correction

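23 Kinds of predictive variables in CHAID
1. Monotonic: ordinal predictor; only adjacent categories can be merged
2. Free: nominal predictor; any categories can be merged
3. Floating: monotonic, except for one category (typically the missing values) that can be merged with any other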
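The slides give the number of possible groupings only for the free case (15 dichotomies for five categories). The sketch below assumes the Bonferroni multipliers commonly quoted for CHAID following Kass (1980). Here c is the number of original categories and r is the number of groups after merging:

- monotonic: $\binom{c-1}{r-1}$
- free: the Stirling number of the second kind $S(c, r)$
- floating: $\binom{c-2}{r-2} + r\binom{c-2}{r-1}$

Treat these formulas as an assumption to check against the implementation in use. The function names are hypothetical.

```python
from math import comb, factorial


def _comb(n, k):
    """Binomial coefficient that returns 0 instead of raising for k < 0."""
    return comb(n, k) if 0 <= k <= n else 0


def multiplier_monotonic(c, r):
    """Ways to group c ordered categories into r contiguous groups."""
    return _comb(c - 1, r - 1)


def multiplier_free(c, r):
    """Ways to partition c unordered categories into r non-empty groups,
    i.e. the Stirling number of the second kind S(c, r)."""
    total = sum((-1) ** j * comb(r, j) * (r - j) ** c for j in range(r + 1))
    return total // factorial(r)  # exact: total equals r! * S(c, r)


def multiplier_floating(c, r):
    """Monotonic grouping of the c - 1 ordered categories, with the floating
    category either kept alone or merged into one of the r groups."""
    return _comb(c - 2, r - 2) + r * _comb(c - 2, r - 1)


for kind, f in [("monotonic", multiplier_monotonic),
                ("free", multiplier_free),
                ("floating", multiplier_floating)]:
    print(kind, f(5, 2))
```

For c = 5 and r = 2 the sketch gives 4 (monotonic), 15 (free, matching slide 17) and 7 (floating).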

24 The CHAID algorithm
Step 1: Merging
Step 2: Splitting
Step 3: Stopping

25 Merging

26 For each predictor

27 1. Construct the complete two-way table of the predictor against the dependent variable

28 2. For each pair of categories that can be merged (given the predictor type), compute the chi-square test on the corresponding subtable. If the least significant pair is not significant, merge it and go to step 3. If all the remaining pairs are significant, go to step 4

29 3. For each category resulting from the merging of three or more original categories, use the chi-square test to check whether each original category can be split off from the others. Return to step 2

30 4. Merge each category with too few observations into the category with which it gives the smallest chi-square value
5. Compute the Bonferroni-corrected p-value on the table resulting from the merging process
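The merging steps can be sketched in simplified form in Python. The sketch rests on several assumptions:

- Only free and monotonic predictors are handled; floating predictors are omitted.
- Step 3 (re-splitting merged categories) and step 4 (absorbing small categories) are left out.
- `alpha_merge` is a hypothetical name for the merging threshold.
- The chi-square tail probability uses closed-form expressions for integer degrees of freedom, so only the standard library is needed.

```python
import itertools
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable, integer df."""
    if x <= 0 or df < 1:
        return 1.0
    if df % 2 == 0:  # even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    p = math.erfc(math.sqrt(x / 2))  # odd df: normal tail plus a finite series
    term = total = math.sqrt(x)
    for i in range(1, (df - 1) // 2):
        term *= x / (2 * i + 1)
        total += term
    if df > 1:
        p += 2 * math.exp(-x / 2) / math.sqrt(2 * math.pi) * total
    return p


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom of a two-way table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = sum((obs - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i, r in enumerate(table) for j, obs in enumerate(r)
               if rows[i] and cols[j])
    df = (sum(1 for t in rows if t) - 1) * (sum(1 for t in cols if t) - 1)
    return stat, df


def merge_categories(counts, kind="free", alpha_merge=0.05):
    """Simplified merging: counts maps each category to its list of counts of
    the dependent-variable classes (dict order = category order when
    kind="monotonic"). Returns the merged groups and the unadjusted p-value
    of the final merged table."""
    groups = [((cat,), list(c)) for cat, c in counts.items()]
    while len(groups) > 1:
        if kind == "monotonic":  # only adjacent categories may be merged
            pairs = [(i, i + 1) for i in range(len(groups) - 1)]
        else:  # "free": any two categories may be merged
            pairs = itertools.combinations(range(len(groups)), 2)
        # Least significant pair = largest p-value of its 2 x K subtable.
        p_max, (i, j) = max(
            (chi2_sf(*chi_square([groups[a][1], groups[b][1]])), (a, b))
            for a, b in pairs
        )
        if p_max <= alpha_merge:
            break  # every remaining pair differs significantly: stop merging
        merged = [x + y for x, y in zip(groups[i][1], groups[j][1])]
        groups[i] = (groups[i][0] + groups[j][0], merged)
        del groups[j]  # j > i, so the order of the other groups is kept
    p_final = chi2_sf(*chi_square([g[1] for g in groups]))
    return [g[0] for g in groups], p_final


# Made-up counts of (responders, non-responders) per category.
print(merge_categories({"A": [50, 50], "B": [52, 48], "C": [90, 10]}))
```

With these invented counts, the free predictor merges A and B and keeps C apart. If the same categories are treated as monotonic in the order A, C, B, no merge happens, because A and B are not adjacent. The returned p-value would then be multiplied by the Bonferroni factor, as in step 5.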

31 Splitting: select as the best predictor the one with the smallest Bonferroni-corrected p-value. If no predictor has a significant corrected p-value, do not split that subgroup
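The splitting step can be sketched as follows. The sketch assumes each predictor has already been merged and is summarized by two numbers: the unadjusted p-value of its merged table and its Bonferroni multiplier. `choose_split` and `alpha_split` are hypothetical names, and the p-values in the usage line are invented.

```python
def choose_split(candidates, alpha_split=0.05):
    """Pick the predictor with the smallest Bonferroni-corrected p-value.

    candidates: dict predictor -> (unadjusted p-value of its merged table,
                Bonferroni multiplier for its merged categories).
    Returns (predictor, adjusted_p), or (None, adjusted_p) when even the best
    predictor is not significant, i.e. the subgroup is not split.
    """
    best, best_adj = None, 1.0
    for name, (p, multiplier) in candidates.items():
        adj = min(1.0, p * multiplier)
        if best is None or adj < best_adj:
            best, best_adj = name, adj
    if best is None or best_adj > alpha_split:
        return None, best_adj
    return best, best_adj


# Invented values: X has the smallest raw p-value but loses after correction.
print(choose_split({"X": (0.001, 22), "Y": (0.01, 1), "Z": (0.2, 1)}))
```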

32 Stopping: return to step 1 and analyze the next subgroup. Stop when every subgroup has been analyzed or contains too few observations
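The three steps fit together as a recursion: each subgroup produced by a split is analyzed again from step 1. The following is a minimal sketch under these assumptions:

- Records are dicts with a 0/1 response stored under the hypothetical key `"y"`.
- A hypothetical `find_split` callback stands in for merging and splitting.
- `min_size` plays the role of the "too few observations" rule.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    n: int                      # observations in the subgroup
    rate: float                 # share of positive responses (y == 1)
    predictor: Optional[str] = None
    children: dict = field(default_factory=dict)  # category group -> Node


def grow(records, find_split: Callable, min_size: int = 100) -> Node:
    """Recursively partition records; find_split(records) returns
    (predictor, list of category groups) or None."""
    n = len(records)
    node = Node(n, sum(r["y"] for r in records) / n if n else 0.0)
    if n < min_size:
        return node  # too few observations: stop here
    split = find_split(records)
    if split is None:
        return node  # no significant predictor: do not split
    node.predictor, groups = split
    for group in groups:
        subset = [r for r in records if r[node.predictor] in group]
        node.children[group] = grow(subset, find_split, min_size)  # next subgroup
    return node


def toy_find_split(records):
    """Hypothetical stand-in: split on "size" into fixed groups if possible."""
    groups = [("1",), ("2", "3"), ("4", "5")]
    present = [g for g in groups if any(r["size"] in g for r in records)]
    return ("size", present) if len(present) > 1 else None


records = [{"size": s, "y": int(i % 50 == 0)}
           for i, s in enumerate(["1", "2", "3", "4", "5"] * 100)]
tree = grow(records, toy_find_split, min_size=50)
print(tree.predictor, {g: c.n for g, c in tree.children.items()})
```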

33 Example of the CHAID method. Dependent variable: response to a promotional offer to subscribe to a magazine

34 Independent variables

35
- Gender: 2 categories, monotonic (GENDER)
- Presence of children: 2 categories, monotonic (KIDS)
- Family income: 8 categories, monotonic (INCOME)
- Age of the head of the family: 5 categories, floating (AGE)

36
- Credit card: 2 categories, monotonic (BANKCARD)
- Number of household components: 6 categories, floating (HHSIZE)
- Occupational status: 4 categories, free (OCCUP)

37 Representation of the partition process by a dendrogram

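38 Response rate (%) and number of households at each node:
Total: 1.15%, n = 81,040, split on HHSIZE
- HHSIZE = 1: 1.09%, n = 25,384 (segment 1)
- HHSIZE = 2 or 3: 1.52%, n = 16,132, split on OCCUP
  - OCCUP = W: 2.39%, n = 1,758 (segment 2)
  - OCCUP = B, O or ?: 1.42%, n = 14,374 (segment 3)
- HHSIZE = 4 or 5: 1.92%, n = 6,198 (segment 4)
- HHSIZE = ? (missing): 0.87%, n = 33,326, split on GENDER
  - GENDER = M: 0.81%, n = 25,531 (segment 5)
  - GENDER = F: 1.08%, n = 7,795 (segment 6)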

39 Interpretation of results: comparison of response rates by household size (HHSIZE) before and after merging

40
HHSIZE | Frequency | % responses before merging | % responses after merging
1 | 25,384 | 1.09 | 1.09
2 | 11,240 | 1.49 | 1.52
3 | 4,892 | 1.59 | 1.52
4 | 3,187 | 1.79 | 1.92
5 | 3,011 | 2.06 | 1.92
Missing | 33,326 | 0.87 | 0.87
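The "after merging" column can be checked: the response rate of a merged category is the frequency-weighted mean of the rates of its original categories. The short Python check below uses the rounded rates from the table, so its results agree with the slide only to two decimals. The helper name `merged_rate` is hypothetical.

```python
# HHSIZE categories with frequencies and % response before merging (slide 40).
freq = {"1": 25384, "2": 11240, "3": 4892, "4": 3187, "5": 3011, "?": 33326}
rate = {"1": 1.09, "2": 1.49, "3": 1.59, "4": 1.79, "5": 2.06, "?": 0.87}


def merged_rate(group):
    """Pooled % response of a group of categories (frequency-weighted mean)."""
    n = sum(freq[c] for c in group)
    return sum(freq[c] * rate[c] for c in group) / n


for group in [("1",), ("2", "3"), ("4", "5"), ("?",)]:
    print(group, round(merged_rate(group), 2))
print("total", sum(freq.values()), round(merged_rate(tuple(freq)), 2))
```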

41 Ranking of segments according to response rate

42
Rank | Segment | Description | Response rate (%)
1 | Segment 2 | Households with two or three components, head white collar | 2.39
2 | Segment 4 | Households with four or more components | 1.92

43
Rank | Segment | Description | Response rate (%)
3 | Segment 3 | Households with two or three components, head with occupational status other than white collar | 1.42
4 | Segment 1 | Households with one component | 1.09

44
Rank | Segment | Description | Response rate (%)
5 | Segment 6 | Households with missing number of components, head female | 1.06
6 | Segment 5 | Households with missing number of components, head male | 0.81

