Presentation on theme: "Lecture 10, CS567 — Neural Network Applications: Problems, Input transformation, Network Architectures, Assessing Performance" — Presentation transcript:

1 Lecture 10, CS567 — Neural Network Applications
Problems
Input transformation
Network architectures
Assessing performance

2 Lecture 10, CS567 — Problems
Deducing the genetic code
Predicting genes
Predicting signal peptide cleavage sites

3 Lecture 10, CS567 — Deducing the genetic code
Problem: Given a codon, predict the corresponding amino acid
Of didactic value
– Trivial mapping table, after the fact
Perfect classification problem, rather than prediction
– With a minimal network
Learning issues
– 'Similar' codons code for 'similar' amino acids
– Abundance of amino acids is proportional to code redundancy (this and the previous point undermine the effect of mutations)
– Third-base 'wobble'
– N:1 mapping between codons and amino acids

4 Lecture 10, CS567 — The genetic code
http://molbio.info.nih.gov/molbio/gcode.html

  TTT Phe (F)   TCT Ser (S)   TAT Tyr (Y)   TGT Cys (C)
  TTC Phe (F)   TCC Ser (S)   TAC Tyr (Y)   TGC Cys (C)
  TTA Leu (L)   TCA Ser (S)   TAA Ter       TGA Ter
  TTG Leu (L)   TCG Ser (S)   TAG Ter       TGG Trp (W)
  CTT Leu (L)   CCT Pro (P)   CAT His (H)   CGT Arg (R)
  CTC Leu (L)   CCC Pro (P)   CAC His (H)   CGC Arg (R)
  CTA Leu (L)   CCA Pro (P)   CAA Gln (Q)   CGA Arg (R)
  CTG Leu (L)   CCG Pro (P)   CAG Gln (Q)   CGG Arg (R)
  ATT Ile (I)   ACT Thr (T)   AAT Asn (N)   AGT Ser (S)
  ATC Ile (I)   ACC Thr (T)   AAC Asn (N)   AGC Ser (S)
  ATA Ile (I)   ACA Thr (T)   AAA Lys (K)   AGA Arg (R)
  ATG Met (M)   ACG Thr (T)   AAG Lys (K)   AGG Arg (R)
  GTT Val (V)   GCT Ala (A)   GAT Asp (D)   GGT Gly (G)
  GTC Val (V)   GCC Ala (A)   GAC Asp (D)   GGC Gly (G)
  GTA Val (V)   GCA Ala (A)   GAA Glu (E)   GGA Gly (G)
  GTG Val (V)   GCG Ala (A)   GAG Glu (E)   GGG Gly (G)
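As a sanity check on the "trivial mapping table" point from the previous slide, the table above can be written as a small Python dictionary (a sketch; codons enumerated in TCAG order, stop codons marked '*'). The N:1 redundancy is visible as repeated values:

```python
# Standard genetic code as a lookup table: 64 codons -> 20 amino acids + stop.
from itertools import product

# One-letter codes in TCAG codon order (first base slowest, third fastest).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product("TCAG", repeat=3)]
GENETIC_CODE = dict(zip(CODONS, AMINO))

def translate(codon):
    """Return the one-letter amino acid (or '*' for stop) for a codon."""
    return GENETIC_CODE[codon.upper()]
```

A classifier for this problem only has to reproduce this fixed, noise-free mapping, which is why the slides call it classification rather than prediction.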

5 Lecture 10, CS567 — Network Architecture
Orthogonal coding (4×3) → 2 hidden neurons (Is this a linear or non-linear problem?)
20 output neurons
– Winner takes all
Total of ≈86 parameters (How?)
FFBP

6 Lecture 10, CS567 — Deducing the genetic code (Fig 6.7)

7 Lecture 10, CS567 — Deducing the genetic code (Fig 6.8)

8 Lecture 10, CS567 — Improving classification error
Training rate high for misclassified codons, low otherwise (in addition to iteration dependence)
Balanced cycles (balanced in terms of amino acids, not codons)
Adaptive training
– Present misclassified examples more often
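The adaptive-training idea can be sketched as follows. `predict` and `train_step` are hypothetical stand-ins for the network's forward pass and one weight update (not code from the lecture), and the boost factor of 3 is an arbitrary illustration:

```python
def adaptive_epoch(examples, predict, train_step, boost=3):
    """One training pass in which currently misclassified examples
    are presented `boost` times instead of once."""
    presentations = 0
    for x, y in examples:
        repeats = boost if predict(x) != y else 1
        for _ in range(repeats):
            train_step(x, y)
            presentations += 1
    return presentations
```

Raising the effective frequency of hard examples plays the same role as the per-example training rate on this slide: both concentrate weight updates on the codons the network still gets wrong.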

9 Lecture 10, CS567 — Is it a gene or not a gene?
Approaches depend on:
– Bias at junctions of coding and non-coding regions
  Donor (5' end of intron) and acceptor (3' end of intron) sites have biases in composition (GT [junk]+ C/U+ AG)
– Bias in composition of coding regions (but not of non-coding regions, e.g., introns)
  Exons are "regular guys"; introns are "freshman dorm rooms"
  Seen as GC bias, codon usage frequency, and codon bias
– Inverse relationship between the two (splice-site strength and regularity within exons)
  "A food exit sign on the highway doesn't need prominent restaurant signs"
  "A stretch of prominent restaurant signs doesn't need a sign indicating food"

10 Lecture 10, CS567 — Regularity within coding regions (Fig 6.11; panels: Bacteria, Mammals, C. elegans, A. thaliana)

11 Lecture 10, CS567 — Predicting Exons: The holy GRAIL
Neural networks for gene prediction
– Input representation/transformation is key
– The NN per se is trivial: an MLP with a single hidden layer and a single output neuron
– Input = coding-region candidate, transformed to:
  6-mer (di-codon) score of the candidate region
  6-mer (di-codon) score of the flanking regions
  GC composition of the candidate region
  GC composition of the flanking regions
  Markov model score
  Length of the candidate
  Splice site score
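Two of the listed inputs can be sketched directly. This is illustrative, not GRAIL's actual code; the di-codon log-odds table (`logodds`, a hypothetical name) would come from counting 6-mers in known coding versus non-coding sequence:

```python
def gc_content(seq):
    """Fraction of G/C bases in a region (candidate or flank)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def dicodon_score(seq, logodds):
    """Average log-odds over all overlapping 6-mers (di-codons).
    `logodds` maps a 6-mer to log(P(6-mer | coding) / P(6-mer | non-coding));
    unseen 6-mers contribute 0."""
    kmers = [seq[i:i + 6] for i in range(len(seq) - 5)]
    return sum(logodds.get(k, 0.0) for k in kmers) / len(kmers)
```

Each feature is a single number per candidate region, which is what lets the MLP itself stay so small: the biology lives in the input transformation.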

12 Lecture 10, CS567 — Signal peptide (SignalP) prediction
Signal peptides are N-terminal subsequences in proteins that act as "export tags", including a "dotted line" (cleavage site) indicating the point of detachment
– Coding is species-specific
Problem analogous to exon/intron delineation
– Distinguish between the signal peptide and the rest of the protein
– Find the junction between the signal peptide and the rest of the protein

13 Lecture 10, CS567 — Signal peptide (SignalP) prediction
Two kinds of network that output, for each position:
– S-score: probability of classification as signal peptide
– C-score: probability of being the junction
Key is post-processing: using the S and C scores to come up with the final prediction
C-score prediction: based on asymmetric windows (why?)
S-score prediction: based on symmetric windows (why?)
Y-score: Y_i = (C_i · Δ_d S_i)^(1/2), where Δ_d S_i = average difference in S_i between windows of size d flanking position i
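Under one plausible reading of the window convention (mean S over the d positions before i minus mean over the d positions from i onward; this exact convention is an assumption, not stated on the slide), the Y-score can be sketched as:

```python
from math import sqrt

def y_score(C, S, i, d):
    """Geometric mean of the C-score at position i and the drop in the
    S-score across i. The S-score should fall sharply at a true cleavage
    site, so a high Y requires both a C peak and an S drop at i."""
    before = sum(S[i - d:i]) / d   # mean S over the d positions before i
    after = sum(S[i:i + d]) / d    # mean S over the d positions from i on
    delta = before - after
    product = C[i] * delta
    return sqrt(product) if product > 0 else 0.0
```

Combining the two scores this way is what filters out spurious C peaks inside (or outside) the signal peptide, where S does not drop.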

14 Lecture 10, CS567 — Signal peptide (SignalP) prediction (Fig 6.5)

