Frontiers in the Convergence of Bioscience and Information Technologies 2007 Seyed Koosha Golmohammadi, Lukasz Kurgan, Brendan Crowley, and Marek Reformat.

Presentation transcript:

Frontiers in the Convergence of Bioscience and Information Technologies 2007
Seyed Koosha Golmohammadi, Lukasz Kurgan, Brendan Crowley, and Marek Reformat
University of Alberta, Department of Electrical and Computer Engineering
Classification of Cell Membrane Proteins
This presentation and other related information are available at

Problem definition
Knowledge of cell membrane protein type is important
–Critical for determining protein function
–Determining protein type with traditional experimental methods is costly and time-consuming
There is a large and widening gap between known proteins (over 3.3 million) and annotated proteins
Automated and accurate methods of classifying uncharacterized proteins are highly desirable

Cell Membrane Proteins

Methodology

Datasets and test procedures
Two datasets were used to design and test our system. These standard benchmark datasets allow for a fair comparison with other methods
–2059 proteins were used to design the prediction system
  Chou, Prediction of protein cellular attributes using pseudo amino acid composition, Proteins (2001) 43:
–2625 proteins were used for an independent test
  Chou and Elrod, Prediction of membrane protein types and subcellular locations, Proteins (1999) 34:
Three test methods were used to evaluate the performance of the proposed prediction system
–in-sample resubstitution (self-consistency) on the design dataset
–out-of-sample jackknife (leave-one-out) on the design dataset
–out-of-sample test on the independent dataset
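The three test procedures can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nearest-centroid classifier, the function names, and the toy two-feature data are all hypothetical stand-ins, not the classifiers or the 2059/2625-protein datasets used in the presentation.

```python
import math
from collections import defaultdict


def nearest_centroid_predict(train, queries):
    """Hypothetical stand-in classifier: label each query by its closest class centroid."""
    sums, counts = {}, defaultdict(int)
    for features, label in train:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    centroids = {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}
    return [min(centroids, key=lambda lab: math.dist(centroids[lab], q)) for q in queries]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def resubstitution(data):
    """In-sample test: train on the design set and predict the same set."""
    predicted = nearest_centroid_predict(data, [x for x, _ in data])
    return accuracy(predicted, [y for _, y in data])


def jackknife(data):
    """Leave-one-out test: each example is predicted by a model trained on all the others."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += nearest_centroid_predict(rest, [x])[0] == y
    return hits / len(data)


def independent_test(design, independent):
    """Out-of-sample test: train on the design set, predict a separate set."""
    predicted = nearest_centroid_predict(design, [x for x, _ in independent])
    return accuracy(predicted, [y for _, y in independent])


# Toy, made-up feature vectors; real inputs would be feature-based protein representations.
design = [([0.0, 0.1], "Type I"), ([0.2, 0.0], "Type I"), ([0.1, 0.2], "Type I"),
          ([1.0, 0.9], "Multipass"), ([0.9, 1.1], "Multipass"), ([1.1, 1.0], "Multipass")]
independent = [([0.1, 0.1], "Type I"), ([1.0, 1.0], "Multipass")]

print(resubstitution(design), jackknife(design), independent_test(design, independent))
```

Resubstitution reuses the training examples for testing, so it tends to be optimistic; the jackknife and independent tests never score an example that the model was trained on.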

Feature-based sequence representation

Applying different classifiers to feature-based representation of proteins
–Decision Tree with Naive Bayes at the leaves
–K*-nearest neighbor
–Support Vector Machine with polynomial kernel
–K-nearest neighbor
–Neural Network with back propagation training
9 classifiers with the highest total accuracy
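As an illustration of one classifier family from the list, a K-nearest neighbor vote might look like the following. It is a minimal sketch, assuming Euclidean distance, k = 3, and made-up two-dimensional feature vectors; the transcript does not state the distance measure, the value of k, or the feature layout the authors used.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k closest training examples.

    `train` is a list of (feature_vector, label) pairs. On a tied vote, the
    label that first appears among the nearest neighbors wins.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical toy data; the labels reuse two membrane protein types named in the slides.
train = [([0, 0], "Type I"), ([0, 1], "Type I"), ([1, 0], "Type I"),
         ([5, 5], "GPI"), ([5, 6], "GPI"), ([6, 5], "GPI")]

print(knn_predict(train, [0.5, 0.5]))
print(knn_predict(train, [5.2, 5.1]))
```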

Our method's results at a glance
Test methods (columns): self-consistency, jackknife, independent
Accuracy [%] (rows): Overall, Type I, Type II, Multipass, Lipid, GPI
Specificity [%] (rows): Type I, Type II, Multipass, Lipid, GPI
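Per-type accuracy and specificity of the kind tabulated on this slide can be computed from true and predicted labels. The transcript does not give the formulas, so the sketch below assumes common definitions: per-type accuracy is the fraction of proteins of a type that are predicted as that type, and specificity is TN / (TN + FP). The labels and predictions are made up for illustration.

```python
def per_class_metrics(actual, predicted, label):
    """Return (accuracy, specificity) for one class under the assumed definitions."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == label and p == label for a, p in pairs)  # correctly assigned to the class
    fn = sum(a == label and p != label for a, p in pairs)  # class members that were missed
    fp = sum(a != label and p == label for a, p in pairs)  # wrongly assigned to the class
    tn = sum(a != label and p != label for a, p in pairs)  # correctly kept out of the class
    accuracy = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return accuracy, specificity


def overall_accuracy(actual, predicted):
    """Fraction of all proteins whose predicted type matches the true type."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical labels, not results from the presentation.
actual = ["Type I", "Type I", "Multipass", "Multipass", "Multipass", "GPI"]
predicted = ["Type I", "Multipass", "Multipass", "Multipass", "Multipass", "GPI"]

for protein_type in ("Type I", "Multipass", "GPI"):
    print(protein_type, per_class_metrics(actual, predicted, protein_type))
print("Overall", overall_accuracy(actual, predicted))
```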

Our method outperforms existing methods
Columns: Classifier, Reference, Test method (self-consistency, jackknife, independent)
–K*: this paper
–Ensemble of NNs: Shen and Chou 2007 (some results not available)
–Fuzzy KNN: Shen and Chou 2006 (some results not available)
–Stacking: Wang et al
–OET-KNN: Shen et al
–Weighted SVM: Wang et al
–SLLE: Wang et al (some results not available)
–Augmented covariant discriminant: Chou
–SVM: Cai et al (some results not available)

Conclusions
The proposed method outperforms existing methods
–higher accuracy in both jackknife and independent dataset tests
The improved prediction quality of our method results from applying a comprehensive feature-based sequence representation
–existing methods use either composition or pseudo amino acid composition for protein representation
–in contrast, our method uses seven feature sets for the same task
–other features, not tested in this study, might further improve the prediction accuracy
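For context, the plain amino acid composition that the conclusions attribute to existing methods is a 20-element vector of residue frequencies. The sketch below shows that baseline only; the function name and example sequence are hypothetical, and the seven feature sets of the proposed method are not specified in this transcript, so they are not reproduced here.

```python
from collections import Counter

# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in `sequence` (20 values)."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Made-up peptide used only to exercise the function.
composition = amino_acid_composition("MKVLAAGLL")
print(dict(zip(AMINO_ACIDS, composition)))
```

Every protein maps to a vector of the same length regardless of sequence length, which is what lets fixed-input classifiers be trained on it.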