1
Biological sequence analysis and information processing by artificial neural networks
Søren Brunak
Center for Biological Sequence Analysis, Technical University of Denmark
brunak@cbs.dtu.dk
2
Pairwise alignment
>carp Cyprinus carpio growth hormone, 210 aa
vs.
>chicken Gallus gallus growth hormone, 216 aa
Scoring matrix: BLOSUM50; gap penalties: -12/-2
40.6% identity; global alignment score: 487

carp     MA--RVLVLLSVVLVSLLVNQGRASDN-----QRLFNNAVIRVQHLHQLAAKMINDFEDSLLPEERRQLSKIFPLSFCNSD
chicken  MAPGSWFSPLLIAVVTLGLPQEAAATFPAMPLSNLFANAVLRAQHLHLLAAETYKEFERTYIPEDQRYTNKNSQAAFCYSE

carp     YIEAPAGKDETQKSSMLKLLRISFHLIESWEFPSQSLSGTVSNSLTVGNPNQLTEKLADLKMGISVLIQACLDGQPNMDDN
chicken  TIPAPTGKDDAQQKSDMELLRFSLVLIQSWLTPVQYLSKVFTNNLVFGTSDRVFEKLKDLEEGIQALMRELEDRSPR---G

carp     DSLPLP-FEDFYLTM-GENNLRESFRLLACFKKDMHKVETYLRVANCRRSLDSNCTL
chicken  PQLLRPTYDKFDIHLRNEDALLKNYGLLSCFKKDLHKVETYLKVMKCRRFGESNCTI
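The global alignment score on this slide comes from dynamic programming over the two sequences. The sketch below is a simplified illustration, not the exact method behind the slide: it uses a flat match/mismatch score and a linear gap penalty instead of BLOSUM50 with the -12/-2 gap penalties, so it will not reproduce the score of 487. The function names, the default scores, and the identity definition (identical columns divided by columns without a gap) are assumptions made for the example.

```python
def global_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Assumption: flat match/mismatch scores stand in for a substitution
    matrix such as BLOSUM50, and every gap position costs the same.
    """
    # prev[j] = best score for aligning the processed prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against an empty prefix of b
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


def percent_identity(aligned_a, aligned_b):
    """Identical columns as a percentage of columns that contain no gap.

    Assumption: other tools may divide by a different length.
    """
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != "-" and y != "-"]
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTACA"))
    print(percent_identity("MA--RV", "MAPGSW"))
```

Only one row of the score table is kept at a time, which is enough when the score is needed but not the alignment itself.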
10
Biological neuron
12
Diversity of interactions in a network enables complex calculations
- Similar in biological and artificial systems
- Excitatory (+) and inhibitory (-) relations between compute units
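One way to see why mixing excitatory and inhibitory connections matters is the XOR function, which a single threshold unit cannot compute but a small network can. The sketch below is a hand-wired illustration: the weights, biases, and function names are chosen for this example and are not taken from the presentation.

```python
def step(x):
    """Threshold activation: the unit fires (1) when its net input is positive."""
    return 1 if x > 0 else 0


def unit(inputs, weights, bias):
    """One compute unit: weighted sum of inputs plus bias, then threshold."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor_network(x1, x2):
    # Hidden unit that fires when at least one input is on (OR).
    h_or = unit([x1, x2], [1.0, 1.0], -0.5)
    # Hidden unit that fires only when both inputs are on (AND).
    h_and = unit([x1, x2], [1.0, 1.0], -1.5)
    # Output: excitatory (+1) connection from OR, inhibitory (-2) from AND.
    return unit([h_or, h_and], [1.0, -2.0], -0.5)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, xor_network(a, b))
```

Removing the inhibitory connection turns the output into a plain OR, so the negative weight is what makes the more complex calculation possible here.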
14
Transfer of biological principles to neural network algorithms
- Non-linear relation between input and output
- Massively parallel information processing
- Data-driven construction of algorithms
- Ability to generalize to new data items
17
Simplest non-trivial classification problem
- CNHSYYP, HIETRRA, NWQSADY, NQYSEPR, WHITRCA, DYHSANY, ...
- Two categories: positives and negatives
- Data described by two features, e.g. charge, sidechain volume, molecular weight, number of atoms, ...
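As a rough illustration of describing each peptide by two features, the sketch below computes net charge and molecular weight for the example peptides. The simplifications are assumptions of this example: K and R count as +1, D and E as -1, histidine is treated as neutral, and the residue masses are approximate average values. The slide does not say which two features were actually used.

```python
# Assumption: simple integer charges at neutral pH; histidine treated as neutral.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}

# Assumption: approximate average residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02  # one water molecule is added for the free peptide termini


def net_charge(peptide):
    """Sum of per-residue charges; uncharged residues contribute 0."""
    return sum(CHARGE.get(aa, 0) for aa in peptide)


def molecular_weight(peptide):
    """Approximate peptide mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def features(peptide):
    """Two-dimensional feature vector (charge, molecular weight)."""
    return (net_charge(peptide), molecular_weight(peptide))


if __name__ == "__main__":
    for p in ["CNHSYYP", "HIETRRA", "NWQSADY", "NQYSEPR", "WHITRCA", "DYHSANY"]:
        charge, weight = features(p)
        print(f"{p}  charge={charge:+d}  weight={weight:.1f}")
```

Each peptide becomes a point in a two-dimensional plane, which is the setting in which a classifier has to separate positives from negatives.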
18
Features of phosphorylation sites
- PKG: cGMP-dependent kinase
- PKC
- CaM-II: Ca++/calmodulin-dependent kinase
- cdc2: Cyclin-dependent kinase 2
- CK-II: Casein kinase 2
21
Homotypical cerebral cortex (from primate): 6 layers
26
DEMO
28
Training and error reduction (figure labels: negative, positive)
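Training and error reduction can be sketched with a single sigmoid unit fitted by gradient descent to positive and negative examples described by two features. Everything specific below is an assumption for illustration: the toy data points, the squared-error measure, the learning rate, and the number of epochs. The slide does not state which training procedure was demonstrated.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Output of one sigmoid unit for a two-feature input x."""
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)


def mean_squared_error(w, b, data):
    return sum((predict(w, b, x) - t) ** 2 for x, t in data) / len(data)


def train(data, epochs=200, lr=0.5):
    """Online gradient descent; returns weights, bias, and the error per epoch."""
    w, b = [0.0, 0.0], 0.0
    errors = []
    for _ in range(epochs):
        for x, t in data:
            y = predict(w, b, x)
            # Gradient of 0.5 * (y - t)^2 with respect to the unit's net input.
            delta = (y - t) * y * (1.0 - y)
            w[0] -= lr * delta * x[0]
            w[1] -= lr * delta * x[1]
            b -= lr * delta
        errors.append(mean_squared_error(w, b, data))
    return w, b, errors


# Hypothetical toy data: (feature1, feature2) -> 1 (positive) or 0 (negative).
DATA = [
    ((1.0, 2.0), 1), ((2.0, 1.5), 1), ((1.5, 3.0), 1),
    ((-1.0, -1.5), 0), ((-2.0, -0.5), 0), ((-0.5, -2.5), 0),
]

if __name__ == "__main__":
    w, b, errors = train(DATA)
    print("error after first epoch:", round(errors[0], 4))
    print("error after last epoch: ", round(errors[-1], 4))
```

Before training, every output is 0.5, so the mean squared error starts at 0.25 and shrinks as the weights move the decision boundary between the two classes.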
29
Transfer of biological principles to neural network algorithms
- Non-linear relation between input and output
- Massively parallel information processing
- Data-driven construction of algorithms
30
Sparse encoding of amino acid sequence windows
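A minimal sketch of sparse (one-hot) encoding for amino acid windows follows. It assumes a 20-letter alphabet ordered alphabetically by one-letter code and no extra symbol for unknown residues or sequence ends; the window size and function names are illustrative choices, not details from the slide.

```python
# Assumption: the 20 standard amino acids, ordered by one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue):
    """20-element vector with a single 1 at the residue's position."""
    vec = [0] * len(AMINO_ACIDS)
    vec[AMINO_ACIDS.index(residue)] = 1  # raises ValueError for unknown letters
    return vec


def encode_window(window):
    """Concatenate the one-hot vectors of all residues in the window."""
    encoded = []
    for residue in window:
        encoded.extend(one_hot(residue))
    return encoded


def sliding_windows(sequence, size):
    """All contiguous windows of the given size, moving one residue at a time."""
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]


if __name__ == "__main__":
    vec = encode_window("CNHSYYP")
    print(len(vec), sum(vec))  # 7 residues x 20 units, one active unit per residue
    print(sliding_windows("MARVLVL", 3))
```

The encoding is sparse because only one of every 20 input units is active, which avoids imposing an artificial ordering or distance on the amino acids.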
31
Sparse encoding of nucleotide sequence windows
- Nucleotides: 4-letter alphabet
- Normally no need for a fifth letter
ACGTAGGCAATCTCAGACGTTTATC
1000010000100001100000100010010010001000000101000001010010000010100001000010000100010001100000010100
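The bit string on this slide is consistent with the mapping A=1000, C=0100, G=0010, T=0001, giving 4 bits for each of the 25 nucleotides. The sketch below applies that mapping; the function name and the choice to reject any other letter are assumptions of this example.

```python
NUCLEOTIDES = "ACGT"  # bit order inferred from the example on the slide


def encode_dna(sequence):
    """Return the sparse encoding of a DNA sequence as a string of 0s and 1s."""
    bits = []
    for base in sequence:
        if base not in NUCLEOTIDES:
            # Assumption: no fifth letter is supported, so reject anything else.
            raise ValueError(f"unsupported nucleotide: {base!r}")
        bits.append("".join("1" if n == base else "0" for n in NUCLEOTIDES))
    return "".join(bits)


if __name__ == "__main__":
    encoded = encode_dna("ACGTAGGCAATCTCAGACGTTTATC")
    print(len(encoded))
    print(encoded)
```

For the 25-nucleotide example, the result is 100 bits long with exactly 25 bits set.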
32
NetTalk
- Network learned to pronounce English text (mapped text to phonemes)
- Network input: moving window of 7 characters
- Network output: phoneme code for center character in input window
- Output fed to a phoneme-to-speech converter
- Each input character represented by a group of 29 units (localist representation)
- 203 total input units
- 80 hidden units
- 26 output units for phonemes
- Trained on 1024 words using a side-by-side English/phoneme source
- Intelligible speech after 10 training epochs; 95% accuracy on training corpus after 50 epochs
- Some hidden units developed meaningful responses (e.g., vowels vs. consonants)
- Generalization: 78% accuracy on continuation of training text
- Damaging network produced graceful degradation, with rapid recovery on retraining
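The layer sizes above follow from the window arithmetic: 7 characters times 29 units per character gives 203 input units. The sketch below builds the moving windows and counts the weights of a fully connected 203-80-26 network. The 29-symbol alphabet (26 letters plus space, period, and comma), the space padding at the text ends, and the assumption of one bias per hidden and output unit are illustrative guesses, not details confirmed by the slide.

```python
WINDOW = 7    # characters per input window
SYMBOLS = 29  # units per character (localist representation)
HIDDEN = 80
OUTPUTS = 26

# Assumption: 26 lowercase letters plus three extra symbols.
ALPHABET = "abcdefghijklmnopqrstuvwxyz" + " .,"


def moving_windows(text, size=WINDOW, pad=" "):
    """Pair each character with the window centered on it (ends padded)."""
    half = size // 2
    padded = pad * half + text + pad * half
    return [(padded[i:i + size], padded[i + half]) for i in range(len(text))]


def encode(window):
    """One active unit per character position: len(window) * SYMBOLS units."""
    vec = [0] * (len(window) * SYMBOLS)
    for pos, ch in enumerate(window):
        vec[pos * SYMBOLS + ALPHABET.index(ch)] = 1
    return vec


def parameter_count(n_in, n_hidden, n_out):
    """Weights plus biases of a fully connected network with one hidden layer."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out


if __name__ == "__main__":
    window, center = moving_windows("cat")[0]
    print(repr(window), center, len(encode(window)))
    print(parameter_count(WINDOW * SYMBOLS, HIDDEN, OUTPUTS))
```

Under these assumptions the network has 18,426 adjustable parameters, which gives a sense of scale relative to the 1024-word training set.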