Download presentation
Presentation is loading. Please wait.
Published byAgatha Owen Modified over 9 years ago
1
Participation in the NIPS 2003 Challenge Theodor Mader ETH Zurich, tmader@student.ethz.ch Five Datasets were provided for experiments: ARCENE: cancer diagnosis DEXTER: text categorization DOROTHEA: drug discovery GISETTE: digit recognition MADELON: artificial data All problems were two-class Problems DATASETS BACKGROUND Feature Selection is a vital part for any classification procedure. Thus it is very important to understand common procedures and algorithms and see how they perform on real world data. The students had the chance to apply the knowledge acquired during class to real world datasets, learning about the strengths and weaknesses of the individual approaches. SUMMARY In the course on feature extraction of the winter semester 2005-2006 the students had to participate in the NIPS 2003 feature selection challenge. They experimented with different feature selection and classification methods in order to achieve high rankings. METHODS The students were provided with a Matlab machine learning package containing a variety of classification and feature extraction algorithms like supportvector machines, random probe method,... The results were ranked according to the balanced error rate (BER), i.e. the error rate of both classes weighted by the number of examples of each class. For this purpose a test set was used which was not available to the students. RESULTS ( BER in % ) DOROTHEA: drug discovery Size: 5000 features, 300 training examples, 300 validation examples Our Results indicate: feature selection by TP statistic was already sufficient, the relief criterion didn’t give much improvement. We optimized the number of features in order to beat the baseline model. Submission: my_model=chain({TP('f_max=1400’),naive, bias}) CONCLUSIONS Participating in the NIPS 2003 challenge was a very interesting opportunity. It could be clearly seen that in most cases there was no need to use very complicated classifiers, rather simple techniques (when applied correctly) and combined with powerful feature selection methods were faster and more successful. The importance of feature selection was hencee shown very clearly. Only using a fraction of the original features improved classification results and speed significantly. ARCENE: cancer diagnosis 10000 features, 100 training examples, 100 validation examples Very small number of training examples, merging training and validation set improved performance significantly. We empiricaly found that the probe method performed better than signal to noise ratio Optimal number of features found by crossvalidation: Submission: my_classif=svc({'coef0=1','degree=3','gamma=0','shrinkage=0.1'}) my_model=chain({probe(relief,{'p_num=2000', ‘f_max=2400'}), normalize, my_classif}) DEXTER: text categorization, 5000 features, 300 training examples, 300 validation examples. Also very small number of training examples => we merged training and validation set Initial experiments with principal component analysis showed no improvement We sticked to the baseline model and optimized the shrinkage and number of features parameter Our final submission: Feature selection by signal to noise ratio (1210 features kept) Normalization of features Support vector classifier with linear kernel GISETTE: digit recognition, 5000 features, 6000 training examples, 1000 validation examples Garbage features were easily removed by feature selection according to signal to noise ratio Applying gaussian blur to the input images brought significant improvement. Shrinkage could further improve the results my_classif=svc({'coef0=1', 'degree=4', 'gamma=0', 'shrinkage=0.5'}); my_model=chain({convolve(’dim1=5’,’dim2=5’), normalize, my_classif}) MADELON: artificial data 5000 features, 6000 training examples, 1000 validation examples The relief method performed extremely well: With only 20 features the classification error could be reduced to below 7% Shrinkage proved to be nearly as important as kernel width for the gaussian kernel. my_classif=svc({'coef0=1', 'degree=0', 'gamma=0.5', 'shrinkage=0.3'}) my_model=chain({probe(relief,{’f_max=20', 'pval_max=0'}), standardize, my_classif})
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.