Frontiers in the Convergence of Bioscience and Information Technologies 2007
Seyed Koosha Golmohammadi, Lukasz Kurgan, Brendan Crowley, and Marek Reformat
University of Alberta, Department of Electrical and Computer Engineering
Classification of Cell Membrane Proteins
This presentation and other related information are available at
Problem definition
Knowledge of cell membrane protein type is important
–Critical for determining their function
–Determining the type of a protein using traditional experimental methods is costly and time-consuming
Large and widening gap between known proteins (over 3.3 million) and annotated proteins
Automated and accurate methods of classifying uncharacterized proteins are highly desirable
Cell Membrane Proteins
Methodology
Datasets and test procedures
Two datasets were used to design and test our system. These standard benchmark datasets allow for a fair comparison with other methods
–2059 proteins were used to design the prediction system
  Chou, Prediction of protein cellular attributes using pseudo amino acid composition, Proteins (2001) 43:
–2625 proteins were used for an independent test
  Chou and Elrod, Prediction of membrane protein types and subcellular locations, Proteins (1999) 34:
Three test methods were used to evaluate the performance of the proposed prediction system
–in-sample resubstitution (self-consistency) on the design dataset
–out-of-sample jackknife (leave-one-out) on the design dataset
–out-of-sample test on the independent dataset
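The difference between the resubstitution and jackknife tests can be illustrated with a minimal sketch. It assumes a plain 1-nearest-neighbor classifier and a hypothetical six-point toy dataset; neither is the classifier or the data used in the study, and the function names are made up for this example.

```python
from math import dist

def nearest_label(query, points, labels):
    """Return the label of the training point closest to `query` (1-NN)."""
    best = min(range(len(points)), key=lambda i: dist(query, points[i]))
    return labels[best]

def resubstitution_accuracy(points, labels):
    """In-sample test: each sample is predicted by a model that has already seen it."""
    hits = sum(nearest_label(p, points, labels) == y for p, y in zip(points, labels))
    return hits / len(points)

def jackknife_accuracy(points, labels):
    """Leave-one-out test: each sample is predicted from all the other samples."""
    hits = 0
    for i, (p, y) in enumerate(zip(points, labels)):
        rest_points = points[:i] + points[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        hits += nearest_label(p, rest_points, rest_labels) == y
    return hits / len(points)

# Hypothetical 2-D feature vectors; the last "B" point sits close to the "A" cluster.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (0.3, 0.3)]
labels = ["A", "A", "A", "B", "B", "B"]

print("resubstitution:", resubstitution_accuracy(points, labels))
print("jackknife:     ", jackknife_accuracy(points, labels))
```

With a 1-NN classifier every sample is its own nearest neighbor, so the resubstitution score is trivially perfect, while the jackknife score drops because the outlying point is misclassified once it is held out. This is why the out-of-sample tests are the more informative ones.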
Feature-based sequence representation
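As one simple illustration of a feature-based representation, the sketch below computes amino acid composition, which the conclusions name as the representation used by some existing methods. It is not a reconstruction of the seven feature sets used in this work; the function name and the example sequence are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

def amino_acid_composition(sequence):
    """Return a 20-element list with the relative frequency of each standard residue."""
    seq = sequence.upper()
    # Ignore symbols outside the 20 standard residues (for example X for unknown).
    counted = [residue for residue in seq if residue in AMINO_ACIDS]
    if not counted:
        raise ValueError("sequence contains no standard amino acids")
    return [counted.count(aa) / len(counted) for aa in AMINO_ACIDS]

# Hypothetical sequence, used only to show the shape of the output.
features = amino_acid_composition("MKTLLLTLVVVTIVCLDLGYT")
print(len(features), round(sum(features), 6))
```

Every protein, whatever its length, maps to a fixed-length numeric vector, which is what lets standard classifiers operate on sequences.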
Applying different classifiers to feature-based representation of proteins
–Decision Tree with Naive Bayes at the leaves
–K* nearest neighbor
–Support Vector Machine with polynomial kernel
–K-nearest neighbor
–Neural Network with back propagation training
9 classifiers with the highest total accuracy
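Ranking candidate classifiers by total accuracy can be sketched as follows. The candidates here (a k-nearest-neighbor vote with two values of k and a majority-class baseline) and the toy data are stand-ins chosen for brevity; they are assumptions of this example and not the implementations evaluated in the study.

```python
from collections import Counter
from math import dist

def knn_predict(query, train, k):
    """Majority vote among the k training samples closest to `query`."""
    neighbours = sorted(train, key=lambda item: dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def total_accuracy(predict, test):
    """Fraction of test samples whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in test) / len(test)

def rank_classifiers(candidates, test):
    """Return (accuracy, name) pairs sorted from best to worst total accuracy."""
    scored = [(total_accuracy(predict, test), name) for name, predict in candidates]
    return sorted(scored, reverse=True)

# Hypothetical training data: (feature vector, class label); one "B" is noisy.
train = [((0.0, 0.0), "A"), ((0.2, 0.1), "A"), ((0.1, 0.3), "A"),
         ((1.0, 1.0), "B"), ((0.9, 1.2), "B"), ((1.1, 0.8), "B"),
         ((0.15, 0.1), "B")]
test = [((0.1, 0.1), "A"), ((0.2, 0.2), "A"), ((1.0, 0.9), "B"), ((0.9, 1.0), "B")]

majority = Counter(label for _, label in train).most_common(1)[0][0]
candidates = [
    ("1-NN", lambda x: knn_predict(x, train, 1)),
    ("3-NN", lambda x: knn_predict(x, train, 3)),
    ("majority class", lambda x: majority),
]

for accuracy, name in rank_classifiers(candidates, test):
    print(f"{name}: {accuracy:.2f}")
```

Keeping each candidate behind the same `predict(x)` interface means further classifiers can be added to the comparison without changing the ranking code.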
Our method's results at a glance
Test method (columns): Self-consistency, Jackknife, Independent
Accuracy [%] (rows): Overall, Type I, Type II, Multipass, Lipid, GPI
Specificity [%] (rows): Type I, Type II, Multipass, Lipid, GPI
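The slide does not define its measures, so the following sketch assumes common definitions: overall accuracy is the fraction of all proteins classified correctly, per-type accuracy is TP / (TP + FN) for that type, and specificity is TN / (TN + FP). The labels and predictions are made-up toy values, not results from the study.

```python
def per_class_metrics(y_true, y_pred):
    """Return overall accuracy and, per class, accuracy and specificity.

    Assumed definitions (not confirmed by the slides):
      per-class accuracy = TP / (TP + FN)
      specificity        = TN / (TN + FP)
    """
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    overall = sum(t == p for t, p in pairs) / n
    report = {}
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        tn = n - tp - fn - fp
        report[c] = {
            "accuracy": tp / (tp + fn),
            # Undefined when no sample belongs to another class.
            "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        }
    return overall, report

# Hypothetical true and predicted membrane protein types.
y_true = ["Type I", "Type I", "Type II", "Multipass", "Multipass", "Multipass"]
y_pred = ["Type I", "Multipass", "Type II", "Multipass", "Multipass", "Type I"]

overall, report = per_class_metrics(y_true, y_pred)
print(f"overall accuracy: {overall:.3f}")
for name, scores in report.items():
    print(name, scores)
```

Reporting both measures per type matters on imbalanced data, where a high overall accuracy can hide poor performance on a small class.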
Our method outperforms existing methods
Columns: Classifier, Reference, Test method (Self-consistency, Jackknife, Independent)
–K*: This paper
–Ensemble of NNs: Shen and Chou 2007, not available
–Fuzzy KNN: Shen and Chou 2006, not available
–Stacking: Wang et al
–OET-KNN: Shen et al
–Weighted SVM: Wang et al
–SLLE: Wang et al, not available
–Augmented covariant discriminant: Chou
–SVM: Cai et al, not available
Conclusions
The proposed method outperforms existing methods
–higher accuracy in both jackknife and independent dataset tests
The improved prediction quality of our method is a result of applying a comprehensive feature-based sequence representation
–existing methods use either composition or pseudo amino acid composition for protein representation
–in contrast, our method uses seven feature sets for the same task
–there might be other features that were not tested in this study and could further improve the prediction accuracy