Contraceptive Method Choice
Advisor: Dr. 黃三益
Team members: B 王俐文, B 謝孟凌, B 陳怡珺
Background and Motivation
As the world's population grows rapidly, people today pay more attention to contraceptive methods.
Step one: Translate the Business Problem into a Data Mining Problem
Topic: Contraceptive Method Choice
Predict a woman's current contraceptive method choice (no use, long-term methods, or short-term methods) from her demographic and socio-economic characteristics. We are especially interested in what kinds of couples choose long-term methods.
Step two: Select Appropriate Data
Title: Contraceptive Method Choice
Sources:
Origin: Subset of the 1987 National Indonesia Contraceptive Prevalence Survey
Creator: Tjen-Sien Lim
Date: June 7, 1997
Number of instances: 1473
There are no missing values in this dataset.
Number of attributes: 10 (including the class attribute)
Wife's age; Wife's education; Husband's education; Number of children ever born; Wife's religion; Wife's now working?; Husband's occupation; Standard-of-living index; Media exposure; Contraceptive method used (class attribute)
Step three: Get to Know the Data
Attribute Information
Attribute name | Attribute type | Description of attribute values
Contraceptive method used | Class attribute | 1 = No use, 2 = Long-term, 3 = Short-term
Wife's age | Numerical |
Wife's education | Categorical | 1 = low, 2, 3, 4 = high
Husband's education | Categorical | 1 = low, 2, 3, 4 = high
Number of children ever born | Numerical |
Wife's religion | Binary | 0 = Non-Islam, 1 = Islam
Wife's now working? | Binary | 0 = Yes, 1 = No
Husband's occupation | Categorical | 1, 2, 3, 4
Standard-of-living index | Categorical | 1 = low, 2, 3, 4 = high
Media exposure | Binary | 0 = Good, 1 = Not good
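A minimal Python sketch of reading one record with these attributes, assuming each record is a comma-separated line of ten integer codes in the order listed in Step two (class attribute last). The `Respondent` class, its field names, `parse_row`, and the sample line are hypothetical illustrations, not part of the dataset documentation.

```python
from dataclasses import dataclass

# Class labels taken from the attribute table.
METHOD_LABELS = {1: "No-use", 2: "Long-term", 3: "Short-term"}

@dataclass
class Respondent:
    wife_age: int
    wife_education: int        # 1 = low ... 4 = high
    husband_education: int     # 1 = low ... 4 = high
    num_children_born: int
    wife_religion: int         # 0 = Non-Islam, 1 = Islam
    wife_working: int          # 0 = Yes, 1 = No (note the inverted coding)
    husband_occupation: int    # categories 1-4
    living_standard: int       # 1 = low ... 4 = high
    media_exposure: int        # 0 = Good, 1 = Not good
    method_used: int           # class attribute: 1, 2 or 3

def parse_row(line: str) -> Respondent:
    """Parse one comma-separated record holding the 10 attribute codes."""
    values = [int(v) for v in line.strip().split(",")]
    if len(values) != 10:
        raise ValueError(f"expected 10 values, got {len(values)}")
    return Respondent(*values)

r = parse_row("24,2,3,3,1,1,2,3,0,1")  # hypothetical record
print(r.wife_age, METHOD_LABELS[r.method_used])  # prints: 24 No-use
```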
Step Four: Create a Model Set
Raw Data
Total: 1473 samples. 75% of the data forms the training set and the remaining 25% forms the testing set, drawn by random sampling in RapidMiner.
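The 75%/25% random split can be sketched with Python's standard library. This is only a stand-in for RapidMiner's sampling: the function name `random_split` and the fixed seed are hypothetical. Truncating 75% of 1473 gives 1104 training and 369 testing records; RapidMiner may round differently.

```python
import random

def random_split(records, train_fraction=0.75, seed=42):
    """Shuffle a copy of the records and cut it into training and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

train, test = random_split(range(1473))
print(len(train), len(test))  # prints: 1104 369
```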
Step Five: Fix Problems with the Data
No missing values
Skewed distributions
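One way to check whether a numeric distribution is skewed is the moment coefficient of skewness, g1 = m3 / m2^1.5. Values well above 0 indicate a long right tail. The slides do not say which skewness measure, if any, was computed. The sample values below are hypothetical.

```python
import statistics

def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5

children = [0, 1, 1, 2, 2, 3, 4, 12]  # hypothetical "number of children" values
print(round(skewness(children), 2))    # positive: long right tail
```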
Step Six: Transform Data to Bring Information to the Surface
Most values of the Media exposure attribute are “Good”.
The numeric variables (wife's age and number of children ever born) are analyzed statistically to find outliers.
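The slides do not name the outlier test. This sketch assumes Tukey's fences, a common choice, which flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR. It also computes the share of one code in a binary attribute such as Media exposure (0 = Good). All data values are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]

def share_of(values, code):
    """Fraction of records whose value equals `code`."""
    return values.count(code) / len(values)

children = [0, 1, 1, 2, 2, 3, 3, 4, 16]  # hypothetical "number of children" values
media = [0, 0, 0, 1, 0]                   # hypothetical Media exposure codes
print(iqr_outliers(children))             # prints: [16]
print(share_of(media, 0))                 # prints: 0.8
```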
Step 7: Build Model
Using RapidMiner, we build a decision tree.
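A decision tree grows by repeatedly choosing the attribute whose split best separates the classes. RapidMiner's Decision Tree operator offers several split criteria, and the slides do not say which one was used. The sketch below therefore illustrates one of them, information gain (entropy reduction), on hypothetical toy data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(attribute_values, labels):
    """Entropy reduction obtained by splitting `labels` on a categorical attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy data: Wife_education codes and method_used classes.
education = [1, 1, 2, 4, 4, 4]
method = [1, 1, 1, 2, 2, 3]
print(round(information_gain(education, method), 3))
```

The tree builder evaluates this score for every candidate attribute and splits on the best one, then repeats the process inside each branch.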
Step 7: Build Model (cont'd)
Ripper rules (counts in parentheses: covered training examples of class 1 / 2 / 3)
if wife_age > 30 and Num_children_born <= 1 then 1 (53 / 1 / 3)
if Num_children_born <= 0 then 1 (36 / 0 / 0)
if Wife_education = 4 and wife_age 3 then 2 (0 / 14 / 0)
if Wife_education = 1 and Husband_occupation = 2 then 1 (17 / 0 / 1)
if Wife_education = 4 and wife_age > 33 and Num_children_born > 2 and Husband_occupation = 1 and Num_children_born <= 3 then 2 (1 / 10 / 2)
if Num_children_born > 2 and wife_age 28 then 3 (1 / 0 / 13)
if wife_age 4 and Media_exposure = 0 then 3 (1 / 2 / 12)
if Husband_education = 4 and wife_age 37 then 2 (0 / 5 / 0)
else 1 (305 / 168 / 281)
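Ripper output is an ordered rule list. A record receives the class of the first rule whose condition it satisfies, and the final else clause supplies the default class. The Python sketch below encodes only the rules whose conditions survived intact and leaves out the rules with garbled conditions. It therefore illustrates how the list is applied and does not reproduce the full model. The dictionary keys reuse the attribute names from the rules.

```python
# Each entry: (condition, predicted class). Order matters: the first match wins.
# Only the intact rules are included; the garbled ones are omitted.
RULES = [
    (lambda r: r["wife_age"] > 30 and r["Num_children_born"] <= 1, 1),
    (lambda r: r["Num_children_born"] <= 0, 1),
    (lambda r: r["Wife_education"] == 1 and r["Husband_occupation"] == 2, 1),
    # "Num_children_born > 2 and Num_children_born <= 3" written as one range
    (lambda r: r["Wife_education"] == 4 and r["wife_age"] > 33
               and 2 < r["Num_children_born"] <= 3
               and r["Husband_occupation"] == 1, 2),
]
DEFAULT_CLASS = 1  # the final "else 1" rule

def predict(record):
    """Return the class of the first rule that matches, else the default."""
    for condition, label in RULES:
        if condition(record):
            return label
    return DEFAULT_CLASS

woman = {"wife_age": 35, "Num_children_born": 3, "Wife_education": 4,
         "Husband_occupation": 1}
print(predict(woman))  # prints: 2
```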
Weka JRip rules (in parentheses: instances covered / instances misclassified)
(Wife_education = 4) and (Num_children_born >= 3) and (wife_age >= 35) => method_used=2 (178.0/76.0)
(wife_age = 3) => method_used=3 (271.0/120.0)
(wife_age = 1) and (wife_age method_used=3 (106.0/51.0)
=> method_used=1 (771.0/342.0)
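JRip output is usually read as (n/m): n training instances are covered by the rule and m of them are misclassified. Under that reading, each rule's accuracy on the instances it covers is (n - m) / n. The short Python calculation below applies this to the four counts listed above.

```python
# (covered, misclassified) pairs as printed by JRip for each rule.
JRIP_COUNTS = {
    "rule 1 (=> method_used=2)": (178.0, 76.0),
    "rule 2 (=> method_used=3)": (271.0, 120.0),
    "rule 3 (=> method_used=3)": (106.0, 51.0),
    "default (=> method_used=1)": (771.0, 342.0),
}

def rule_accuracy(covered, errors):
    """Fraction of the covered instances that the rule classifies correctly."""
    return (covered - errors) / covered

for name, (covered, errors) in JRIP_COUNTS.items():
    print(f"{name}: {rule_accuracy(covered, errors):.3f}")
# prints 0.573, 0.557, 0.519 and 0.556 for the four rules
```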
Step 8: Assess Model
Decision Tree
Ripper Rules
JRip Rules
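The sketch below illustrates one way to assess a three-class model such as these. It computes overall accuracy and per-class recall from a confusion matrix, and it reports recall for the long-term class because that class is the focus of the study. The matrix values are hypothetical. Rows are taken as true classes and columns as predicted classes; RapidMiner's performance view may show the transpose, so check the orientation before reading off numbers.

```python
def accuracy(matrix):
    """Overall accuracy: correct predictions (the diagonal) over all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def class_recall(matrix, k):
    """Recall of class index k, with rows = true class and columns = predicted class."""
    return matrix[k][k] / sum(matrix[k])

# Hypothetical confusion matrix; index 0 = No-use, 1 = Long-term, 2 = Short-term.
M = [[10, 2, 3],
     [1, 5, 2],
     [4, 1, 8]]
print(f"accuracy: {accuracy(M):.3f}")                  # prints: accuracy: 0.639
print(f"long-term recall: {class_recall(M, 1):.3f}")   # prints: long-term recall: 0.625
```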
Conclusion
Result
Problems we should address: more data; ignoring some attributes; unclear attribute details; a changed period and environment (the survey dates from 1987).
Thank you for listening.