1
Annual Income Prediction Modeling Using SVM
Xinjue YU 12/14/2010
2
Annual Income Prediction
Why this problem?
Useful in industries such as insurance, banking, and marketing
Of interest for studying the income distribution
The goal: to predict whether a person has an annual income of more than $50,000
The information we have: age, gender, education level, working hours per week, etc.
3
The Dataset
The Adult dataset: 32,561 training records and 16,281 test records
Extracted from the 1994 Census database
A set of reasonably clean records was extracted using the conditions ((AAGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0))
databases/adult/
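The extraction condition quoted above can be read as a simple predicate over one record. The following is a minimal sketch in Python, assuming each record is a dict keyed by the field names that appear in the condition (AAGE, AGI, AFNLWGT, HRSWK); the function name and the sample values are hypothetical and only illustrate the logic.

```python
def is_clean_record(rec):
    """Return True when a record passes the extraction condition from the slide."""
    return (
        rec["AAGE"] > 16
        and rec["AGI"] > 100
        and rec["AFNLWGT"] > 1
        and rec["HRSWK"] > 0
    )


# Made-up records, for illustration only (not taken from the dataset).
sample = [
    {"AAGE": 39, "AGI": 40000, "AFNLWGT": 77516, "HRSWK": 40},
    {"AAGE": 15, "AGI": 500, "AFNLWGT": 1200, "HRSWK": 10},  # fails AAGE > 16
]

clean = [r for r in sample if is_clean_record(r)]
print(len(clean), "of", len(sample), "records kept")
```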
4
Preparation
There are 14 features in the raw dataset; 4 of the 14 are used
The 4 features used: gender, education level, age and working hours per week
Quantizing the features:
Education level: 1 (<= high school), 2 (< grad school) & 3 (>= grad school)
Gender: 0 (female) & 1 (male)
Age: 1 (<30), 2 (30-50) & 3 (>50)
Working hours per week: 1 (<=40) & 2 (>40)
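The quantization rules above can be sketched as one small function. This is an illustrative Python version, not the author's code. It assumes education is given as the numeric education code of the Adult dataset, with 9 standing for a high-school graduate and 14 and above for graduate-level degrees; that cutoff choice, the function name and the argument names are assumptions.

```python
def quantize(sex, education_num, age, hours_per_week):
    """Map raw values to the coarse levels listed on the slide."""
    # Gender: 0 (female), 1 (male)
    gender_q = 1 if sex.strip().lower() == "male" else 0

    # Education: 1 (<= high school), 2 (< grad school), 3 (>= grad school)
    # Assumed cutoffs: code 9 = high-school graduate, code 14+ = graduate degree.
    if education_num <= 9:
        edu_q = 1
    elif education_num <= 13:
        edu_q = 2
    else:
        edu_q = 3

    # Age: 1 (<30), 2 (30-50), 3 (>50)
    if age < 30:
        age_q = 1
    elif age <= 50:
        age_q = 2
    else:
        age_q = 3

    # Working hours per week: 1 (<=40), 2 (>40)
    hours_q = 1 if hours_per_week <= 40 else 2

    return (gender_q, edu_q, age_q, hours_q)


print(quantize("Male", 13, 39, 40))
```

Each record then becomes a 4-tuple of small integers, from which the 2-D feature pairs can be taken by selecting two positions.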
5
The Approach
Using a Support Vector Machine (SVM), in the context of artificial neural networks
The data are assumed to be non-separable
Using SVM for non-separable pattern classification
Trying different kernels: linear, RBF, polynomial and sigmoid
Using 2-D feature pairs first: (gender, education level) and (age, working hours per week)
Using all 4 features in further study (increased complexity)
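For reference, the four kernels named above are commonly defined as follows. This sketch writes them in plain Python for two feature vectors; the parameter names (gamma, coef0, degree) and their default values are assumptions chosen for illustration, since the slides do not state which settings were used.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def rbf_kernel(x, y, gamma=0.5):
    # exp(-gamma * ||x - y||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=3):
    # (gamma * <x, y> + coef0) ^ degree
    return (gamma * dot(x, y) + coef0) ** degree


def sigmoid_kernel(x, y, gamma=0.5, coef0=0.0):
    # tanh(gamma * <x, y> + coef0)
    return math.tanh(gamma * dot(x, y) + coef0)


KERNELS = {
    "linear": linear_kernel,
    "rbf": rbf_kernel,
    "polynomial": polynomial_kernel,
    "sigmoid": sigmoid_kernel,
}

# Two hypothetical quantized points from the (gender, education level) pair.
a = (1, 2)
b = (0, 3)
for name, kernel in KERNELS.items():
    print(name, kernel(a, b))
```

Only the linear kernel gives a straight-line decision boundary in the original 2-D feature space; the other three allow curved boundaries.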
6
The Expected Results
Predict whether a person's annual income is more than $50K from the SVM output (classification/clustering involved)
Use the testing data to get the error rate of each kernel
Compare the results of the different kernels
The linear kernel is expected to have the highest error rate
Try to keep the error rate within 20%-30%
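The error rate used for the comparison is the fraction of test records whose predicted class differs from the true class. A minimal Python sketch follows; the labels and the configuration names "kernel_a" and "kernel_b" are made up for illustration and are not results from this study.

```python
def error_rate(y_true, y_pred):
    """Fraction of predictions that differ from the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


# Hypothetical labels: 1 means income > $50K, 0 means income <= $50K.
y_true = [1, 0, 0, 1, 0]
predictions = {
    "kernel_a": [1, 0, 1, 1, 0],  # one mistake
    "kernel_b": [0, 0, 1, 1, 1],  # three mistakes
}

rates = {name: error_rate(y_true, y_pred) for name, y_pred in predictions.items()}
best = min(rates, key=rates.get)
print(rates, "lowest error:", best)
```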