Biological sequence analysis and information processing by artificial neural networks.

Pairwise alignment
>carp Cyprinus carpio growth hormone, 210 aa
vs.
>chicken Gallus gallus growth hormone, 216 aa
Scoring matrix: BLOSUM50, gap penalties: -12/ % identity; Global alignment score:

carp    MA--RVLVLLSVVLVSLLVNQGRASDN-----QRLFNNAVIRVQHLHQLAAKMINDFEDSLLPEERRQLSKIFPLSFCNSD
        ::. :...:.:. : :.. :: :::.:.:::: :::...::..::..:.:.:: :.
chicken MAPGSWFSPLLIAVVTLGLPQEAAATFPAMPLSNLFANAVLRAQHLHLLAAETYKEFERTYIPEDQRYTNKNSQAAFCYSE

carp    YIEAPAGKDETQKSSMLKLLRISFHLIESWEFPSQSLSGTVSNSLTVGNPNQLTEKLADLKMGISVLIQACLDGQPNMDDN
        : ::.:::..:..:..:::.:. ::.:: : : ::..:.:. :.... ::: ::. ::..:.. :.:.
chicken TIPAPTGKDDAQQKSDMELLRFSLVLIQSWLTPVQYLSKVFTNNLVFGTSDRVFEKLKDLEEGIQALMRELEDRSPR---G

carp    DSLPLP-FEDFYLTM-GENNLRESFRLLACFKKDMHKVETYLRVANCRRSLDSNCTL
        .: :.. :...:. :... ::.:::::.:::::::.:.:::.::::.
chicken PQLLRPTYDKFDIHLRNEDALLKNYGLLSCFKKDLHKVETYLKVMKCRRFGESNCTI

Biological Neural network

Biological neuron

Diversity of interactions in a network enables complex calculations
- Similar in biological and artificial systems
- Excitatory (+) and inhibitory (-) relations between compute units

Biological neuron structure

Transfer of biological principles to artificial neural network algorithms
- Non-linear relation between input and output
- Massively parallel information processing
- Data-driven construction of algorithms
- Ability to generalize to new data items

Simplest non-trivial classification problem
- CNHSYYP, HIETRRA, NWQSADY, NQYSEPR, WHITRCA, DYHSANY, ...
- Two categories: positives and negatives
- Data described by features, e.g. charge, sidechain volume, molecular weight, number of atoms, ...

Features of phosphorylation sites
(figure: kinases shown are PKG, cGMP-dependent kinase; PKC; CaM-II, Ca++/calmodulin-dependent kinase; cdc2, cyclin-dependent kinase 2; CK-II, casein kinase 2)

Neural networks
- Neural networks can learn higher order correlations
- XOR function: 0 0 => 0, 1 0 => 1, 0 1 => 1, 1 1 => 0
- (figure: the points (1,1), (1,0), (0,0), (0,1) in the plane)
- No linear function can separate the points
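A short argument for the last statement, written as a sketch. The names w_1, w_2 and b are assumptions introduced here for the weights and bias of a single linear threshold unit that outputs 1 when w_1 x_1 + w_2 x_2 + b > 0; they do not appear on the slide.

```latex
\begin{align*}
(0,0) \mapsto 0 &: \quad b \le 0 \\
(1,0) \mapsto 1 &: \quad w_1 + b > 0 \\
(0,1) \mapsto 1 &: \quad w_2 + b > 0 \\
(1,1) \mapsto 0 &: \quad w_1 + w_2 + b \le 0
\end{align*}
% Adding the second and third lines gives
%   w_1 + w_2 + 2b > 0,  so  w_1 + w_2 + b > -b \ge 0,
% which contradicts the fourth line.
```

The same contradiction appears with the inequalities reversed, so neither orientation of a single line separates the two classes.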

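Neural networks: linear function
(figure: network with weights v1, v2)

Neural networks: higher order function
(figure: network with weights w11, w12, w21, w22 into the hidden layer and v1, v2 into the output)

Neural networks. How does it work?
(figure: network with weights w11, w12, w21, w22, v1, v2, and bias weights wt1, wt2, vt from a bias input fixed at 1)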

Neural networks, input (0 0), bias input 1
o1 = -6, O1 = 0; o2 = -2, O2 = 0; y1 = -4.5, Y1 = 0

Neural networks, input (1 0) or (0 1), bias input 1
o1 = -2, O1 = 0; o2 = 4, O2 = 1; y1 = 4.5, Y1 = 1

Neural networks, input (1 1), bias input 1
o1 = 2, O1 = 1; o2 = 10, O2 = 1; y1 = -4.5, Y1 = 0
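The three worked cases above can be reproduced with a small forward pass. This is a minimal sketch: the weight values are not given in the transcript and are inferred here so that they yield the quoted o1, o2 and y1 values, and the step activation (output 1 when the sum is positive) is likewise an assumption consistent with the O and Y values shown.

```python
def step(a):
    """Threshold activation: fire (1) only when the summed input is positive."""
    return 1 if a > 0 else 0


# Assumed weights, inferred from the o1, o2 and y1 values quoted on the slides.
HIDDEN = [
    (4.0, 4.0, -6.0),  # hidden unit 1: weight on x1, weight on x2, bias weight
    (6.0, 6.0, -2.0),  # hidden unit 2: weight on x1, weight on x2, bias weight
]
OUTPUT = (-9.0, 9.0, -4.5)  # weight on O1, weight on O2, bias weight


def forward(x1, x2):
    """Return (o, O, y, Y): hidden sums, hidden outputs, output sum, output."""
    o = [w1 * x1 + w2 * x2 + b for (w1, w2, b) in HIDDEN]
    O = [step(v) for v in o]
    y = OUTPUT[0] * O[0] + OUTPUT[1] * O[1] + OUTPUT[2]
    return o, O, y, step(y)


if __name__ == "__main__":
    for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        o, O, y, Y = forward(x1, x2)
        print((x1, x2), "o =", o, "O =", O, "y =", y, "Y =", Y)
```

Hidden unit 1 only fires when both inputs are on, hidden unit 2 fires when at least one is on, and the output unit subtracts the first from the second, which is one way of reading the XOR result.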

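What is going on?
XOR function: 0 0 => 0, 1 0 => 1, 0 1 => 1, 1 1 => 0
(figure: network with bias input 1 and hidden outputs y1, y2)

What is going on?
(figure: points (1,1), (1,0), (0,0), (0,1) plotted on axes x1, x2; points (1,0), (2,2), (0,0) plotted on axes y1, y2)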

DEMO

Training and error reduction

Transfer of biological principles to neural network algorithms
- Non-linear relation between input and output
- Massively parallel information processing
- Data-driven construction of algorithms

Neural network training
- A network contains a very large set of parameters
  - A network with 5 hidden neurons predicting binding for 9-mer peptides has 9 x 20 x 5 = 900 weights
- Overfitting is a problem
- Stop training when test performance is optimal
(figure: temperature plotted against years)

Neural network training. Cross validation
- Train on 4/5 of data
- Test on 1/5
- => Produce 5 different neural networks, each with a different prediction focus
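The 4/5 and 1/5 split can be sketched as follows. The function name `five_fold_partitions` and the interleaved assignment of items to folds are assumptions made for illustration; the slides do not say how the data were partitioned.

```python
def five_fold_partitions(items, n_folds=5):
    """Return a list of (train, test) pairs, one per fold.

    Items are dealt into folds in round-robin order (an assumption); each
    fold serves once as the test set while the other folds form the
    training set.
    """
    folds = [items[i::n_folds] for i in range(n_folds)]
    partitions = []
    for k in range(n_folds):
        test = folds[k]
        train = [x for j, fold in enumerate(folds) if j != k for x in fold]
        partitions.append((train, test))
    return partitions


if __name__ == "__main__":
    data = list(range(10))  # hypothetical stand-ins for data points
    for k, (train, test) in enumerate(five_fold_partitions(data)):
        print("network", k, "train on", train, "test on", test)
```

One network is trained per (train, test) pair, which gives the five networks mentioned above.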

Neural network training curve
- Maximum test set performance
- Most capable of generalizing
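Stopping at the maximum of the test set curve can be sketched as a generic loop. Everything named here is hypothetical: `train_one_epoch` and `test_performance` stand for whatever training and evaluation routines are in use, and the `patience` parameter (how many epochs without improvement to tolerate) is an added assumption, not something taken from the slides.

```python
import copy


def train_with_early_stopping(model, train_one_epoch, test_performance,
                              max_epochs=100, patience=10):
    """Train and keep the model state with the best test set performance.

    train_one_epoch(model) updates the model in place; test_performance(model)
    returns a score where higher is better. Both are caller-supplied.
    """
    best_score = float("-inf")
    best_epoch = -1
    best_model = copy.deepcopy(model)
    for epoch in range(max_epochs):
        train_one_epoch(model)
        score = test_performance(model)
        if score > best_score:
            # New maximum of the test curve: remember this state.
            best_score, best_epoch = score, epoch
            best_model = copy.deepcopy(model)
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop training.
            break
    return best_model, best_epoch, best_score
```

The returned model is the snapshot taken at the peak of the test curve, not the state at the moment training stopped.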

Network training
- Encoding of sequence data
  - Sparse encoding
  - Blosum encoding
  - Sequence profile encoding

Sparse encoding of amino acid sequence windows

Sparse encoding
(table: input neurons by amino acid, rows A, R, N, D, C, Q, E)

Sparse encoding of nucleotide sequence windows
- Nucleotides: 4 letter alphabet
- ACGTAGGCAATCTCAGACGTTTATC
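A minimal sketch of sparse (one-hot) encoding for both alphabets. The function names and the position of each letter within its alphabet are assumptions for illustration; the amino acid order follows the A R N D ... V order used for the BLOSUM matrix in this deck.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # 20 letter alphabet
NUCLEOTIDES = "ACGT"                  # 4 letter alphabet


def sparse_encode(sequence, alphabet):
    """Encode each symbol as a vector with a single 1, then concatenate."""
    encoded = []
    for symbol in sequence:
        vec = [0] * len(alphabet)
        vec[alphabet.index(symbol)] = 1
        encoded.extend(vec)
    return encoded


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    window = "ACGTAGGCAATCTCAGACGTTTATC"
    print(len(sparse_encode(window, NUCLEOTIDES)), "inputs for the window")
    v = sparse_encode("V", AMINO_ACIDS)
    l = sparse_encode("L", AMINO_ACIDS)
    print("V . L =", dot(v, l))
```

Any two different letters have a dot product of 0 under this encoding, so the network input carries no notion of which amino acids are similar.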

BLOSUM encoding (Blosum50 matrix)
(table: rows and columns A R N D C Q E G H I L K M F P S T W Y V)

Sequence encoding (continued)
- Sparse encoding: V · L = 0 (unrelated)
- Blosum encoding: V · L = 0.88 (highly related)
- Blosum encoding: V · R: close to unrelated

Applications of artificial neural networks
- Speech recognition
- Prediction of protein secondary structure
- Prediction of signal peptides
- Post-translational modifications
  - Glycosylation
  - Phosphorylation
- Proteasomal cleavage
- MHC:peptide binding