Artificial Neural Networks Thomas Nordahl Petersen & Morten Nielsen



Use of artificial neural networks
- A data-driven method to predict a feature, given a set of training data
- In biology, input features could be amino acid sequences or nucleotides
- Secondary structure prediction
- Signal peptide prediction
- Surface accessibility
- Propeptide prediction
- O- and N-glycosylation
[Figure: protein precursor from N- to C-terminus: signal peptide, propeptide, mature/active protein]

Neural network prediction methods http://www.cbs.dtu.dk/services/

Pattern recognition

Biological Neural network

Biological neuron structure

Diversity of interactions in a network enables complex calculations
- Similar in biological and artificial systems
- Excitatory (+) and inhibitory (-) relations between compute units

Transfer of biological principles to artificial neural network algorithms
- Non-linear relation between input and output
- Massively parallel information processing
- Data-driven construction of algorithms
- Ability to generalize to new data items
- Example: Google Translate (https://www.youtube.com/watch?v=0zKU7jDA2nc)

Sparse encoding of amino acid sequence windows

Sparse encoding
Rows: amino acid; columns: input neuron

Input neuron:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
A              1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
R              0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
N              0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
D              0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
C              0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
Q              0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
E              0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0
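The table can be generated in a few lines. This is a minimal sketch, not the authors' code: the slide only shows the first seven rows, so the order of the remaining residues is assumed to follow the usual BLOSUM column order, and the function names are made up for illustration.

```python
# Residue order: A, R, N, D, C, Q, E as in the table; the remaining 13
# residues are ASSUMED to follow the usual BLOSUM column order.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

def sparse_encode(residue):
    """Return a 20-element vector with a single 1 at the residue's position."""
    vec = [0] * len(AMINO_ACIDS)
    vec[AMINO_ACIDS.index(residue)] = 1
    return vec

def encode_window(window):
    """Concatenate the sparse vectors of every residue in a sequence window."""
    encoded = []
    for residue in window:
        encoded.extend(sparse_encode(residue))
    return encoded

print(sparse_encode("N"))            # the single 1 sits at the third position
print(len(encode_window("IKEEH")))   # 5 residues x 20 inputs = 100
```

A window of n residues therefore needs 20 x n input neurons, exactly n of which are switched on.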

BLOSUM encoding (labelled "Blosum50 matrix" on the slide; the values shown are those of BLOSUM62)

    A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
A   4  -1  -2  -2   0  -1  -1   0  -2  -1  -1  -1  -1  -2  -1   1   0  -3  -2   0
R  -1   5   0  -2  -3   1   0  -2   0  -3  -2   2  -1  -3  -2  -1  -1  -3  -2  -3
N  -2   0   6   1  -3   0   0   0   1  -3  -3   0  -2  -3  -2   1   0  -4  -2  -3
D  -2  -2   1   6  -3   0   2  -1  -1  -3  -4  -1  -3  -3  -1   0  -1  -4  -3  -3
C   0  -3  -3  -3   9  -3  -4  -3  -3  -1  -1  -3  -1  -2  -3  -1  -1  -2  -2  -1
Q  -1   1   0   0  -3   5   2  -2   0  -3  -2   1   0  -3  -1   0  -1  -2  -1  -2
E  -1   0   0   2  -4   2   5  -2   0  -3  -3   1  -2  -3  -1   0  -1  -3  -2  -2
G   0  -2   0  -1  -3  -2  -2   6  -2  -4  -4  -2  -3  -3  -2   0  -2  -2  -3  -3
H  -2   0   1  -1  -3   0   0  -2   8  -3  -3  -1  -2  -1  -2  -1  -2  -2   2  -3
I  -1  -3  -3  -3  -1  -3  -3  -4  -3   4   2  -3   1   0  -3  -2  -1  -3  -1   3
L  -1  -2  -3  -4  -1  -2  -3  -4  -3   2   4  -2   2   0  -3  -2  -1  -2  -1   1
K  -1   2   0  -1  -3   1   1  -2  -1  -3  -2   5  -1  -3  -1   0  -1  -3  -2  -2
M  -1  -1  -2  -3  -1   0  -2  -3  -2   1   2  -1   5   0  -2  -1  -1  -1  -1   1
F  -2  -3  -3  -3  -2  -3  -3  -3  -1   0   0  -3   0   6  -4  -2  -2   1   3  -1
P  -1  -2  -2  -1  -3  -1  -1  -2  -2  -3  -3  -1  -2  -4   7  -1  -1  -4  -3  -2
S   1  -1   1   0  -1   0   0   0  -1  -2  -2   0  -1  -2  -1   4   1  -3  -2  -2
T   0  -1   0  -1  -1  -1  -1  -2  -2  -1  -1  -1  -1  -2  -1   1   5  -2  -2   0
W  -3  -3  -4  -4  -2  -2  -3  -2  -2  -3  -2  -3  -1   1  -4  -3  -2  11   2  -3
Y  -2  -2  -2  -3  -2  -1  -2  -3   2  -1  -1  -2  -1   3  -3  -2  -2   2   7  -1
V   0  -3  -3  -3  -1  -2  -2  -3  -3   3   1  -2   1  -1  -2  -2   0  -3  -1   4

Sequence encoding (continued)

Sparse encoding
V: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
L: 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
V.L = 0 (unrelated)

Blosum encoding
V:  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
L: -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1
V.L = 0.88 (highly related)
V.R = -0.08 (close to unrelated)
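The comparison can be reproduced with a short script. The slide does not say how the dot products are normalized. Dividing by the two vector lengths (cosine similarity) is an assumption here, and with the V, L and R rows of the matrix printed above it gives values near, though not identical to, 0.88 and -0.08.

```python
import math

# Rows copied from the matrix printed above (column order A R N D C Q E G H I L K M F P S T W Y V).
BLOSUM_ROWS = {
    "V": [0, -3, -3, -3, -1, -2, -2, -3, -3, 3, 1, -2, 1, -1, -2, -2, 0, -3, -1, 4],
    "L": [-1, -2, -3, -4, -1, -2, -3, -4, -3, 2, 4, -2, 2, 0, -3, -2, -1, -2, -1, 1],
    "R": [-1, 5, 0, -2, -3, 1, 0, -2, 0, -3, -2, 2, -1, -3, -2, -1, -1, -3, -2, -3],
}

def normalized_dot(u, v):
    """Dot product divided by both vector lengths (ASSUMED normalization)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def one_hot(index, size=20):
    """Sparse vector with a single 1 at the given 0-based index."""
    return [1 if i == index else 0 for i in range(size)]

# Sparse vectors of two different residues never overlap, so the product is 0.
print(normalized_dot(one_hot(19), one_hot(10)))

# BLOSUM vectors share structure when the residues are chemically similar.
print(round(normalized_dot(BLOSUM_ROWS["V"], BLOSUM_ROWS["L"]), 2))
print(round(normalized_dot(BLOSUM_ROWS["V"], BLOSUM_ROWS["R"]), 2))
```

The point of the comparison is that the BLOSUM encoding lets a network treat similar residues (V and L) as similar inputs, which the sparse encoding cannot express.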

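Simplified feed-forward neural network (no bias neurons are shown)

[Figure: inputs I1, I2, I3 connect to hidden neurons h1, h2 through weights w1,1, w1,2, w2,1, w2,2, w3,1, w3,2; the hidden neurons connect to the output O1 through weights v1,1 and v2,1]

Hidden layer: $h_j = \sum_i I_i \, w_{i,j}$ and $H_j = \sigma(h_j)$, with $\sigma(x) = 1 / (1 + e^{-x})$
Output: $o = H_1 v_{1,1} + H_2 v_{2,1}$ and $O_1 = \sigma(o)$
Error: $\mathrm{Error} = O_1 - \mathrm{True}$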
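A forward pass through this 3-2-1 network can be sketched as follows. The weight values and the input are arbitrary illustrative numbers, not taken from the slides, and the function names are hypothetical.

```python
import math

def sigmoid(x):
    """Logistic function 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w, v):
    """w[i][j]: weight from input i to hidden neuron j; v[j]: hidden j to output."""
    n_hidden = len(v)
    hidden = []
    for j in range(n_hidden):
        h_j = sum(inputs[i] * w[i][j] for i in range(len(inputs)))
        hidden.append(sigmoid(h_j))              # H_j = sigma(h_j)
    o = sum(hidden[j] * v[j] for j in range(n_hidden))
    return sigmoid(o), hidden                    # O_1 = sigma(o)

# Illustrative values only (ASSUMED, not from the slides).
inputs = [1.0, 0.0, 1.0]
w = [[0.5, -0.5], [0.3, 0.8], [-0.5, 0.5]]
v = [1.0, -1.0]

output, hidden = forward(inputs, w, v)
error = output - 1.0                             # Error = O - True, with True = 1
print(hidden, output, error)
```

With these particular numbers both hidden sums are 0, so H1 = H2 = 0.5, the output sum is 0 and O1 = 0.5.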

Sigmoidal or logistic function
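For reference, the logistic function and a few of its standard properties. The derivative identity is textbook material rather than something stated on the slide; it is what makes gradient-based training of such networks convenient.

```latex
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
\frac{d\sigma}{dx} = \sigma(x)\,\bigl(1 - \sigma(x)\bigr), \qquad
\sigma(0) = \tfrac{1}{2}, \qquad
\lim_{x \to -\infty} \sigma(x) = 0, \qquad
\lim_{x \to +\infty} \sigma(x) = 1
```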

Training and error reduction
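The slides do not spell out the training algorithm. As a rough sketch, assuming plain gradient descent on the squared error E = (O - True)^2 / 2 with back-propagation through a 3-2-1 sigmoid network, one training step could look like this. The learning rate, the starting weights and the function names are all illustrative assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(inputs, target, w, v, rate):
    """One gradient-descent update (ASSUMED algorithm); returns the error before the update."""
    n_in, n_hid = len(inputs), len(v)
    # Forward pass
    hidden = [sigmoid(sum(inputs[i] * w[i][j] for i in range(n_in)))
              for j in range(n_hid)]
    output = sigmoid(sum(hidden[j] * v[j] for j in range(n_hid)))
    error = output - target
    # Backward pass: chain rule through the output and hidden sigmoids
    delta_out = error * output * (1.0 - output)
    delta_hid = [delta_out * v[j] * hidden[j] * (1.0 - hidden[j])
                 for j in range(n_hid)]
    # Weight updates (in place), moving against the gradient
    for j in range(n_hid):
        v[j] -= rate * delta_out * hidden[j]
        for i in range(n_in):
            w[i][j] -= rate * delta_hid[j] * inputs[i]
    return 0.5 * error ** 2

# Illustrative starting weights and a single training example (ASSUMED values).
w = [[0.1, -0.2], [0.4, 0.3], [-0.3, 0.2]]
v = [0.2, -0.1]
errors = [train_step([1.0, 0.0, 1.0], 1.0, w, v, rate=0.5) for _ in range(200)]
print(errors[0], errors[-1])   # the error shrinks as training proceeds
```

Real training loops over many examples and monitors the error on held-out data to decide when to stop; this sketch only shows the update rule on one example.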


Training and error reduction: size matters

Secondary structure elements: β-strand, helix, bend, turn

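Neural network architecture
[Figure: a window (here I K E E H V I I Q A E) slides along the sequence IKEEHVIIQAEFYLNPDQSGEF….. and feeds the input layer; weights connect the input layer to the hidden layer and the hidden layer to the output layer, which has one output for each of H, E and C]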
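The windowing step can be sketched as below. The window size of 11 matches the number of residues drawn in the figure but is otherwise an assumption, as is padding the sequence ends with "X" so that every residue gets a full window.

```python
def sliding_windows(sequence, size, pad="X"):
    """Yield (central_residue, window) for each position; size is assumed odd."""
    half = size // 2
    padded = pad * half + sequence + pad * half   # ASSUMED end padding
    for pos, residue in enumerate(sequence):
        yield residue, padded[pos:pos + size]

seq = "IKEEHVIIQAEFYLNPDQSGEF"
windows = list(sliding_windows(seq, size=11))
for center, window in windows[:3]:
    print(center, window)
```

Each window is then encoded (sparse or BLOSUM) and presented to the input layer, and the network predicts the class of the central residue.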

Predictions and reliability of a prediction
- Normally the best prediction is obtained by averaging results from several predictions ("wisdom of the crowd")
- Two types of neural networks:
  - Prediction of features in classes/bins, e.g. H, E or C (1,0,0). Output values close to 1 or 0 indicate a more reliable prediction than values close to 1/2
  - Prediction of real values, e.g. surface accessibility (0.43). Reliability of a prediction is more difficult to estimate
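A small sketch of the averaging idea for a three-class (H, E, C) predictor. The three networks' outputs are made-up numbers, and using the gap between the two highest averaged outputs as a reliability score is an assumption for illustration, not the measure used by any particular server.

```python
CLASSES = ("H", "E", "C")

def average_predictions(predictions):
    """Average the per-class outputs of several networks for one residue."""
    n = len(predictions)
    return [sum(p[k] for p in predictions) / n for k in range(len(CLASSES))]

def classify(outputs):
    """Return the winning class and a simple (ASSUMED) reliability score."""
    ranked = sorted(outputs, reverse=True)
    best = CLASSES[outputs.index(ranked[0])]
    reliability = ranked[0] - ranked[1]   # gap between the two highest outputs
    return best, reliability

# Outputs of three hypothetical networks for the same residue.
networks = [[0.9, 0.1, 0.2], [0.7, 0.2, 0.3], [0.8, 0.0, 0.1]]
label, reliability = classify(average_predictions(networks))
print(label, reliability)
```

An averaged output vector near (1, 0, 0) gives a large gap and a confident call; one near (0.5, 0.4, 0.1) gives a small gap and should be trusted less.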

Signal peptide http://www.cbs.dtu.dk/services/SignalP

Eukaryotic SP & TM
- Signal peptide cleavage: 1523 sequences
- C-terminal end of TM-regions: 669 sequences

Signal peptide prediction
- Signal peptide likeness
- Cleavage site
- Combined information
A signal peptide is predicted to be present if the D-score is above the threshold.
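The decision rule can be illustrated with a toy function. The slide does not define how the D-score combines the two kinds of information. The sketch below assumes, purely for illustration, a plain average of a likeness score and a cleavage-site score and a hypothetical threshold of 0.5; consult the SignalP documentation for the actual definition and thresholds.

```python
def d_score(likeness_score, cleavage_score):
    """Combine the two scores; a plain average is ASSUMED here."""
    return (likeness_score + cleavage_score) / 2.0

def has_signal_peptide(likeness_score, cleavage_score, threshold=0.5):
    """Predict a signal peptide when the combined score exceeds the threshold.

    The default threshold is a hypothetical placeholder value.
    """
    return d_score(likeness_score, cleavage_score) > threshold

print(has_signal_peptide(0.9, 0.7))   # strong evidence on both counts
print(has_signal_peptide(0.2, 0.4))   # weak evidence on both counts
```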

Propeptide prediction

Many secretory proteins and peptides are synthesized as inactive precursors that in addition to signal peptide cleavage undergo post-translational processing to become biologically active polypeptides. Precursors are usually cleaved at sites composed of single or paired basic amino acid residues by members of the subtilisin/kexin-like proprotein convertase (PC) family. In mammals, seven members have been identified, with furin being the one first discovered and best characterized. Recently, the involvement of furin in diseases ranging from Alzheimer's disease and cancer to anthrax and Ebola fever has created additional focus on proprotein processing. We have developed a method for prediction of cleavage sites for PCs based on artificial neural networks. Two different types of neural networks have been constructed: a furin-specific network based on experimental results derived from the literature, and a general PC-specific network trained on data from the Swiss-Prot protein database. The method predicts cleavage sites in independent sequences with a sensitivity of 95% for the furin neural network and 62% for the general PC network.
Protein Engineering, Design and Selection: 17: 107-112, 2004.

General cleavage: R/K-Xn-R/K, n = 0, 2, 4, 6
Furin cleavage: R-X-R/K-R
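The two consensus motifs can be located with regular expressions. This only lists motif matches, which is far cruder than the neural network method described in the abstract. The function names and the example sequence are made up for illustration.

```python
import re

# Lookahead so that overlapping motif occurrences are all reported.
FURIN = re.compile(r"(?=(R.[RK]R))")

def furin_sites(sequence):
    """Return (start_index, motif) for every R-X-R/K-R match (0-based start)."""
    return [(m.start(), m.group(1)) for m in FURIN.finditer(sequence)]

def general_pc_sites(sequence, spacers=(0, 2, 4, 6)):
    """Return (start_index, n, motif) for every R/K-Xn-R/K match."""
    hits = []
    for n in spacers:
        pattern = re.compile(r"(?=([RK].{%d}[RK]))" % n)
        hits.extend((m.start(), n, m.group(1)) for m in pattern.finditer(sequence))
    return sorted(hits)

example = "AARAKRSS"                  # hypothetical sequence
print(furin_sites(example))           # [(2, 'RAKR')]
print(general_pc_sites(example))      # [(2, 2, 'RAKR'), (4, 0, 'KR')]
```

A motif match is only a candidate site; the trained networks exist precisely because many such matches are not cleaved.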

Propeptide prediction http://www.cbs.dtu.dk/services/ProP/ Furin cleavage

NetSurfP http://www.cbs.dtu.dk/services/NetSurfP

NetSurfP secondary structure classes: a = α-helix, b = β-strand, c = coil

O- and N-glycosylation
- O-glycosylation of Ser and Thr (mucin-type glycosylation): N-acetylgalactosamine (GalNAc)
- N-glycosylation of Asn-X-Ser/Thr, where X is any residue except Pro: N-acetylglucosamine (GlcNAc)
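Candidate sites for both modifications can be listed directly from the sequence rules above. This is a minimal sketch with made-up function names. The sequence pattern is necessary but not sufficient for glycosylation, so the output is a list of candidates, not a prediction.

```python
import re

# Asn-X-Ser/Thr with X != Pro; lookahead keeps overlapping sequons.
SEQUON = re.compile(r"(?=N[^P][ST])")

def n_glyc_candidates(sequence):
    """0-based positions of Asn residues inside an Asn-X-Ser/Thr sequon."""
    return [m.start() for m in SEQUON.finditer(sequence)]

def o_glyc_candidates(sequence):
    """0-based positions of every Ser and Thr (possible O-glycosylation sites)."""
    return [i for i, residue in enumerate(sequence) if residue in "ST"]

example = "ANASNPTNGT"                # hypothetical sequence
print(n_glyc_candidates(example))     # [1, 7]; the Asn at 4 is followed by Pro
print(o_glyc_candidates(example))     # [3, 6, 9]
```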