Example of regression by RBF-ANN
Prediction of charge on peptides after electrospray ionization in mass spectrometry
What are the best attributes to predict charge?
Review of molecular biology
DNA sequence determines protein sequence
What are amino acids?
[Figure: amino acid structure with labels N-terminus, C-terminus, Side chain]
Amino acids with different side chains have different names

  Name            3-letter  1-letter
  glycine         gly       G
  alanine         ala       A
  valine          val       V
  leucine         leu       L
  isoleucine      ile       I
  methionine      met       M
  proline         pro       P
  phenylalanine   phe       F
  tryptophan      trp       W
  serine          ser       S
  cysteine        cys       C
  threonine       thr       T
  glutamine       gln       Q
  asparagine      asn       N
  histidine       his       H
  tyrosine        tyr       Y
  glutamic acid   glu       E
  aspartic acid   asp       D
  lysine          lys       K
  arginine        arg       R
chemical properties of amino acids
More properties of amino acids
Table columns: code, mass, pI, pK1, pK2, charge, Hydrophobic?, Polar?

  code  Hydrophobic?  Polar?
  A     T             F
  R     F             F
  N     F             T
  D     F             F
  C     F             T
  E     F             F
  Q     F             T
  G     T             F
  H     F             T
  I     T             F
  L     T             F
  K     F             F
  M     T             F
  F     T             F
  P     T             F
  S     F             T
  T     F             T
  W     T             T
  Y     F             T
  V     T             F
Amino Acids Polymerize to Form Proteins (polypeptides)
[Figure: formation of peptide bond; backbone -N-C-C-N-C-C-N- with H, O, and R substituents]
Proteases: enzymes that cut proteins at the peptide bond
[Figure: peptide backbone -N-C-C-N-C-C-N- with H, O, and R substituents]
Most proteases have cleavage specificity.
Trypsin cleaves mainly at arginine (R) and lysine (K).
Digestion of a protein with trypsin produces peptides of various lengths.
Analysis of the digestion mixture yields information about the proteins in the sample.
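The cleavage rule above can be sketched as an in-silico digest. This is a minimal illustration, assuming the simplified rule that trypsin cuts immediately after every K or R; real digestion has exceptions and missed cleavages that are ignored here. The function name trypsin_digest and the input string (two peptides from the dataset joined together, plus a made-up tail) are hypothetical.

```python
def trypsin_digest(protein):
    """Split a sequence after every K or R (simplified trypsin rule)."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        if residue in "KR":
            # Cut on the C-terminal side of the K or R residue.
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # Keep whatever remains after the last cleavage site.
        peptides.append(protein[start:])
    return peptides


# Hypothetical input: two dataset peptides concatenated, plus "GG".
print(trypsin_digest("AAAAADLANRAAAAAQASASAAAKGG"))
# -> ['AAAAADLANR', 'AAAAAQASASAAAK', 'GG']
```

Under this rule every peptide except possibly the last ends in K or R, which matches the sequences shown in the dataset.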
Liquid chromatography coupled to mass spectrometry
Digested protein mixture -> LC column -> Electrospray ionization -> Mass spectrometer
Peptides are retained for differing times on the LC column.
Peptides may have multiple charges.
Charges in the dataset are averages from several runs.
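Averaging over runs is why a charge in the dataset can be non-integer. A minimal sketch of that averaging, assuming each run reports one (sequence, charge) observation; the function name average_charges, the sequences, and the run values below are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict


def average_charges(observations):
    """Average the observed charge for each peptide sequence."""
    totals = defaultdict(lambda: [0.0, 0])  # sequence -> [sum, count]
    for sequence, charge in observations:
        totals[sequence][0] += charge
        totals[sequence][1] += 1
    return {seq: total / count for seq, (total, count) in totals.items()}


# Hypothetical observations from five runs (made-up values).
runs = [
    ("PEPTIDEK", 3), ("PEPTIDEK", 3), ("PEPTIDEK", 3),
    ("PEPTIDEK", 3), ("PEPTIDEK", 2),
    ("SAMPLER", 2), ("SAMPLER", 2),
]
print(average_charges(runs))
```

Here a peptide seen four times at charge 3 and once at charge 2 averages to 2.8, so the regression target is a real number rather than a class label.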
First 4 of ~23,000 data pairs:

  Sequence                            Charge
  AAAAAAPDDVAAQLVVADLDLVGGHVEDAFAR    2.8
  AAAAADLANR                          2
  AAAAAQASASAAAK
  AAAAAVAQGGPIEDAER                   2

Can peptide sequence be an input?
What inputs can we calculate from the input sequence?
Some suggestions for inputs from properties of amino acids
  Length of peptide
  Mass of peptide
  First amino acid
  Last amino acid
  Fractions of amino acids of each type
  Fractions of hydrophobic, polar, and charged residues
  Net formal charge
  Average isoelectric point
  Average dissociation constant
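Several of these inputs can be computed directly from the sequence string. The sketch below is one possible version, not the method used for the results in this deck. It takes the hydrophobic and polar flags from the amino acid property table and makes two assumptions: residues flagged neither hydrophobic nor polar (R, D, E, K) are treated as the charged group, and net formal charge counts K and R as +1 and D and E as -1. Mass, isoelectric point, and dissociation constants are omitted because their per-residue values are not listed here. The name peptide_features is hypothetical.

```python
AMINO_ACIDS = "ARNDCEQGHILKMFPSTWYV"
HYDROPHOBIC = set("AGILMFPWV")   # Hydrophobic? = T in the property table
POLAR = set("NCQHSTWY")          # Polar? = T in the property table
CHARGED = set("RDEK")            # assumption: neither flag set
FORMAL_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}  # assumption


def peptide_features(seq):
    """Compute simple sequence-derived inputs for one non-empty peptide."""
    n = len(seq)
    features = {"length": n, "first": seq[0], "last": seq[-1]}
    # Fraction of each of the 20 amino acid types.
    for aa in AMINO_ACIDS:
        features["frac_" + aa] = seq.count(aa) / n
    # Fractions by chemical class.
    features["frac_hydrophobic"] = sum(r in HYDROPHOBIC for r in seq) / n
    features["frac_polar"] = sum(r in POLAR for r in seq) / n
    features["frac_charged"] = sum(r in CHARGED for r in seq) / n
    # Net formal charge from side chains only.
    features["net_formal_charge"] = sum(FORMAL_CHARGE.get(r, 0) for r in seq)
    return features


f = peptide_features("AAAAADLANR")
print(f["length"], f["last"], f["frac_hydrophobic"], f["net_formal_charge"])
```

The first and last amino acids are categorical, so they would need an encoding (for example one input per amino acid type) before being fed to a network.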
MLP with default options. 600 examples reserved for the test set. Poor results.
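Reserving a test set can be sketched as a random hold-out split. This is only an illustration: the deck does not say how the 600 examples were chosen, so random selection, the seed, and the name holdout_split are assumptions, and 23000 stands in for the approximate dataset size.

```python
import random


def holdout_split(n_examples, n_test=600, seed=0):
    """Return (train_indices, test_indices) with n_test examples held out."""
    if n_test > n_examples:
        raise ValueError("test set larger than dataset")
    order = list(range(n_examples))
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(order)
    return order[n_test:], order[:n_test]


# 23000 approximates the "~23,000 data pairs" in the dataset.
train_idx, test_idx = holdout_split(23000)
print(len(train_idx), len(test_idx))  # 22400 600
```

Keeping the test indices fixed across models makes the comparison between the MLP and other regression options fair.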
Other regression options