SDPpred: a method for identification of amino acid residues that determine differences in functional specificity of homologous proteins and application thereof to the MIP family of membrane transporters
Olga V. Kalinina, Pavel S. Novichkov, Andrey A. Mironov, Mikhail S. Gelfand, Aleksandra B. Rakhmaninova
Large families of proteins: generally similar biochemical function, but many different specificities…
Example: ~800 transcription factors of the LacI family.
– Average sequence identity: 30%.
– They bind different effectors and operators.
Some effectors:
– lactose (LacI)
– D-fructose-6-phosphate (FruR)
– guanine, hypoxanthine (PurR)
– cytidine, adenosine (CytR)
– trehalose-6-phosphate (TreR)
– D-gluconate (GntR)
– D-galactose (GalR)
– D-ribose (RbsR)
– maltose (MalR)
– raffinose (RafR)
– …
– X??
[Figure: scheme of the approach. Boxes: Alignment; Description of specificity groups; SDPpred; Positions that account for specificity; Assignment of specificity to new proteins; Experiment; Testing on families that include proteins with resolved 3D structure]
Description of specificity groups:
Group A: No. 1-10, 13…
Group B: No. 12, 14-16…
Group C: No. …
…
Alignment:
Q9KDW MSPFLGEVIGTMILIILGGGVVAGVVLKGTK
Q8Y6Z1 ----MIDTSLATQFLGEVIGTAILIILGAGVVAGVSLKRSK
Q97JG MTIFFAELVGTLLLILLGDGVVANVVLKNSK
GLPF_ECOLI MSQT---STLKGQCIAEFLGTGLLIFFGVGCVA--ALKVAG
Q8ZJK5 MSQTA-SSTLKGQCIAEFLGTGLLIFFGAGCVA--ALKLAG
GLPF_HAEIN MDKS-----LKANCIGEFLGTALLIFFGVGCVA--ALKVAG
GLPF_PSEAE MTTAAPTPSLFGQCLAEFLGTALLIFFGTGCVA--ALKVAG
AQPZ_BRUME MLNKLSAEFFGTFWLVFGGCGSAILAA--AFP
Q92NM MFRKLSVEFLGTFWLVLGGCGSAVLAA--AFP
Q8UJW MGRKLLAEFFGTFWLVFGGCGSAVFAA--AFP
AQPZ_ECOLI MFRKLAAECFGTFWLVFGGCGSAVLAA--GFP
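The group description lists members by sequence number (Group A: No. 1-10, 13…). A minimal sketch of expanding such lists into a sequence-to-group map, assuming 1-based numbering and comma-separated ranges; parse_members and build_groups are hypothetical helpers, not part of SDPpred:

```python
def parse_members(spec: str) -> list[int]:
    """Expand a member list such as '1-10,13' into sorted sequence numbers."""
    members: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"bad range: {part!r}")
            members.update(range(start, end + 1))
        else:
            members.add(int(part))
    return sorted(members)


def build_groups(description: dict[str, str]) -> dict[int, str]:
    """Map each sequence number to its group label; each sequence may belong to one group only."""
    assignment: dict[int, str] = {}
    for label, spec in description.items():
        for number in parse_members(spec):
            if number in assignment:
                raise ValueError(f"sequence {number} is in groups {assignment[number]} and {label}")
            assignment[number] = label
    return assignment


# Hypothetical input mirroring the slide's notation (the trailing "…" omitted).
groups = build_groups({"A": "1-10, 13", "B": "12, 14-16"})
print(groups[13], groups[15])  # A B
```

Rejecting overlaps reflects the definition of a specificity group: each protein has one specificity.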
What are SDPs? (SDP = Specificity-Determining Position)
Specificity group = a group of proteins that have the same specificity (from experimental data, genome analysis, etc.)
SDP = an alignment position that is conserved within specificity groups but differs between them
An SDP is not equivalent to a functionally important position!
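To make the definition concrete, here is a crude threshold-based check. It is not SDPpred's scoring, which uses mutual information and Z-scores; the 0.8 conservation threshold, the name is_sdp_like, and the toy columns are illustrative assumptions. The second case is a column conserved in every group: possibly functionally important, but not an SDP.

```python
from collections import Counter


def is_sdp_like(column_by_group: dict[str, str], min_conservation: float = 0.8) -> bool:
    """Crude check of the SDP definition for one alignment column.

    column_by_group maps a group label to the residues its sequences have
    in this column. The column passes if every group is dominated by one
    residue (share >= min_conservation) and the dominant residues are not
    all the same.
    """
    dominant = []
    for residues in column_by_group.values():
        residue, count = Counter(residues).most_common(1)[0]
        if count / len(residues) < min_conservation:
            return False  # not conserved within this group
        dominant.append(residue)
    return len(set(dominant)) > 1  # differs between groups


print(is_sdp_like({"AQP": "TTTT", "GLP": "DDDD"}))  # True
print(is_sdp_like({"AQP": "GGGG", "GLP": "GGGG"}))  # False: conserved, but not an SDP
print(is_sdp_like({"AQP": "TVDK", "GLP": "DDDD"}))  # False: not conserved within AQP
```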
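Algorithm
Mutual information $I_p$ reflects the extent to which alignment position $p$ tends to be an SDP:
$$I_p = \sum_{i=1}^{N} \sum_{\alpha} f_{\alpha,i}(p) \log \frac{f_{\alpha,i}(p)}{f_i \, f_{\alpha}(p)},$$
where $N$ is the number of groups; $f_i$ is the fraction of proteins in group $i$; $f_{\alpha,i}(p)$ is the number of occurrences of amino acid $\alpha$ in group $i$ at position $p$, divided by the length of the whole alignment column; $f_{\alpha}(p)$ is the frequency of amino acid $\alpha$ in the whole alignment column at position $p$.
Smoothed amino acid frequencies: a leucine is closer to a methionine than to a valine, and any arginine has a dash of lysine…
Statistical significance of $I_p$: the expected mutual information $I_p^{\mathrm{exp}}$ of an alignment column (Mirny & Gelfand, 2002, J Mol Biol, 321(1)) and the Z-score
$$Z_p = \frac{I_p - I_p^{\mathrm{exp}}}{\sigma_p},$$
where $\sigma_p$ is the standard deviation of the expected mutual information.
Are 5 SDPs with Z-score > 10.5 better than 10 SDPs with Z-score > 9.0? A Bernoulli estimator selects the proper number of SDPs.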
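A minimal sketch of the mutual information $I_p$ and its Z-score for a single column. It uses raw counts rather than SDPpred's smoothed frequencies, and it assumes the expected mutual information is estimated by randomly permuting group labels while keeping group sizes fixed; the function names and the 1000-shuffle default are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(column: str, labels: list[str]) -> float:
    """I_p for one alignment column, using raw (unsmoothed) frequencies.

    column[k] is the residue of sequence k at position p; labels[k] is the
    specificity group of sequence k.
    """
    n = len(column)
    joint = Counter(zip(labels, column))  # occurrences of amino acid a in group i
    group_size = Counter(labels)          # n * f_i
    residue_count = Counter(column)       # n * f_a(p)
    info = 0.0
    for (group, residue), count in joint.items():
        f_ai = count / n                  # divided by the whole column length
        f_i = group_size[group] / n
        f_a = residue_count[residue] / n
        info += f_ai * math.log(f_ai / (f_i * f_a))
    return info


def z_score(column: str, labels: list[str], n_shuffles: int = 1000, seed: int = 0) -> float:
    """Z-score of I_p against I_p values obtained after shuffling group labels."""
    rng = random.Random(seed)
    observed = mutual_information(column, labels)
    shuffled = list(labels)
    samples = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # same group sizes, random membership
        samples.append(mutual_information(column, shuffled))
    mean = sum(samples) / len(samples)
    sd = math.sqrt(sum((s - mean) ** 2 for s in samples) / (len(samples) - 1))
    return (observed - mean) / sd if sd > 0 else 0.0


labels = list("AAAABBBB")
print(round(mutual_information("TTTTDDDD", labels), 3))  # 0.693 (= ln 2)
print(z_score("TTTTDDDD", labels) > z_score("TDTDTDTD", labels))  # True
```

The logarithm base only rescales $I_p$ and cancels in the Z-score, so the natural log is used here.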
Kalinina OV, Mironov AA, Gelfand MS, Rakhmaninova AB. (2004) Automated selection of positions determining functional specificity of proteins by comparative analysis of orthologous groups in protein families. Protein Sci 13(2):
Kalinina OV, Novichkov PS, Mironov AA, Gelfand MS, Rakhmaninova AB. (2004) SDPpred: a tool for prediction of amino acid residues that determine differences in functional specificity of homologous proteins. Nucl Acids Res 32(Web Server issue): W424-8.
Web interface
Input: multiple alignment of proteins divided into specificity groups
=== AQP ===
%sp|Q9L772|AQPZ_BRUME
mlnklsaeffgtfwlvfggcgsa ilaa--afp elgigflgvalafgltvltmayavggisg--ghfnpavslgltv iiilgsts slap qlwlfwvaplvgavigaiiwkgllgrd
%sp|P48838|AQPZ_ECOLI
mfrklaaecfgtfwlvfggcgsa vlaa--gfp elgigfagvalafgltvltmafavghisg--ghfnpavtiglwa lvihgatd kfap qlwffwvvpivggiiggliyrtllekrd
%tr|Q92ZW
mfkklcaeflgtcwlvlggcgsa vlas--afp qvgigllgvsfafgltvltmaytvggisg--ghfnpavslglav iiilgsth rrvp qlwlfwiaplfgaaiagivwksvgeefrpvd
=== GLP ===
%sp|P11244|GLPF_ECOLI
msqt---stlkgqciaeflgtglliffgvgcv aalkvag a-sfgqweisviwglgvamaiyltagvsg--ahlnpavtialwl glilaltd dgn g-vpr -flvplfgpivgaivgafayrkligrhlpcdicvveek--etttpseqkasl
%sp|P44826|GLPF_HAEIN
mdks-----lkancigeflgtalliffgvgcv …
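A sketch of reading this input in Python, assuming the layout shown on the slide: a line "=== NAME ===" opens a specificity group, a line starting with "%" holds a sequence header, and the following lines hold the aligned sequence, with blanks ignored and "-" kept as a gap. The parser name and these rules are inferred from the example, not taken from SDPpred's documentation.

```python
def parse_grouped_alignment(text: str) -> dict[str, dict[str, str]]:
    """Return {group label: {sequence header: aligned sequence}}."""
    groups: dict[str, dict[str, str]] = {}
    group = None
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("===") and line.endswith("==="):
            group = line.strip("=").strip()  # "=== AQP ===" -> "AQP"
            groups.setdefault(group, {})
            header = None
        elif line.startswith("%"):
            if group is None:
                raise ValueError("sequence header before any group marker")
            header = line[1:]
            groups[group][header] = ""
        else:
            if header is None:
                raise ValueError(f"sequence data without a header: {line[:20]!r}")
            # Drop layout blanks, keep gap characters '-'.
            groups[group][header] += "".join(line.split()).upper()
    return groups


# Fragments of the records shown on the slide.
example = """
=== AQP ===
%sp|Q9L772|AQPZ_BRUME
mlnklsaeffgtfwlvfggcgsa ilaa--afp
%sp|P48838|AQPZ_ECOLI
mfrklaaecfgtfwlvfggcgsa vlaa--gfp
=== GLP ===
%sp|P11244|GLPF_ECOLI
msqt---stlkgqciaeflgtglliffgvgcv aalkvag
"""

alignment = parse_grouped_alignment(example)
print({g: len(seqs) for g, seqs in alignment.items()})  # {'AQP': 2, 'GLP': 1}
```

A real pipeline would also check that all aligned sequences have the same length before computing column statistics.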
Web interface
Output:
– alignment of the family with the SDPs highlighted (Alignment view);
– detailed description of each SDP (List of SDPs);
– plot of the probabilities used by the Bernoulli estimator to set the cutoff (Probability plot view).
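The question of whether 5 SDPs with Z > 10.5 beat 10 SDPs with Z > 9.0 can be framed as comparing chance probabilities. A sketch of one Bernoulli-type estimator, under stated assumptions: Z-scores of non-SDP columns follow a standard normal distribution, columns are independent, and for each k the probability that at least k of L columns reach the k-th highest Z-score by chance is a binomial tail; the cutoff is the k with the smallest probability. This illustrates the idea and is not necessarily SDPpred's exact formula; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def log_binomial_tail(n: int, k: int, p: float) -> float:
    """log P(X >= k) for X ~ Binomial(n, p), summed in log space."""
    if k <= 0 or p >= 1.0:
        return 0.0
    if p <= 0.0:
        return float("-inf")
    log_p, log_q = math.log(p), math.log1p(-p)
    terms = [
        math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
        + j * log_p + (n - j) * log_q
        for j in range(k, n + 1)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def choose_cutoff(z_scores: list[float]) -> tuple[int, list[float]]:
    """Return the number k of top-scoring positions with the smallest
    chance probability, plus log P_k for every k (the values one would plot)."""
    z_sorted = sorted(z_scores, reverse=True)
    n = len(z_sorted)
    log_probs = []
    for k, z in enumerate(z_sorted, start=1):
        p = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z), standard normal
        log_probs.append(log_binomial_tail(n, k, p))
    best_k = min(range(n), key=log_probs.__getitem__) + 1
    return best_k, log_probs


# Hypothetical example: 3 strong columns among 297 columns of pure noise,
# whose Z-scores are taken as evenly spaced standard-normal quantiles.
noise = [NormalDist().inv_cdf((j - 0.5) / 297) for j in range(1, 298)]
best_k, log_probs = choose_cutoff([12.0, 11.0, 10.5] + noise)
print(best_k)  # 3
```

Working in log space matters: with dozens of positions scoring near Z = 10, the tail probabilities quickly underflow a double.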
Examples: the LacI family of bacterial transcription factors
Training set: 459 sequences, average length 338 amino acids, 85 specificity groups; 44 SDPs
(d_min: minimal distance to the effector, operator, or other subunit)
– 10 residues contact NPF (analog of the effector)
– 6 residues make up intersubunit contacts
– 7 residues contact the operator sequence
– 7 residues in the effector contact zone (5 Å < d_min < 10 Å)
– 5 residues in the intersubunit contact zone (5 Å < d_min < 10 Å)
– 6 residues in the operator contact zone (5 Å < d_min < 10 Å)
[Figure: LacI from E. coli]
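The distance classes rely on d_min, the minimal interatomic distance between a residue and its partner. A small sketch computing it from atom coordinates; the Atom layout and the coordinates in the example are hypothetical, and real coordinates would come from a structure file.

```python
import math

# One atom: (atom name, (x, y, z)), coordinates in angstroms.
Atom = tuple[str, tuple[float, float, float]]


def d_min(residue_atoms: list[Atom], partner_atoms: list[Atom]) -> tuple[float, str, str]:
    """Smallest interatomic distance between a residue and a partner
    (effector, DNA, or another subunit) and the atom pair that realizes it."""
    best = (math.inf, "", "")
    for name_a, xyz_a in residue_atoms:
        for name_b, xyz_b in partner_atoms:
            d = math.dist(xyz_a, xyz_b)  # Euclidean distance (Python 3.8+)
            if d < best[0]:
                best = (d, name_a, name_b)
    return best


# Hypothetical coordinates, for illustration only.
residue = [("CA", (0.0, 0.0, 0.0)), ("OH", (3.0, 0.0, 0.0))]
partner = [("O", (5.6, 0.0, 0.0)), ("CA", (9.0, 1.0, 0.0))]
distance, atom_a, atom_b = d_min(residue, partner)
print(f"{distance:.1f} {atom_a} {atom_b}")  # 2.6 OH O
```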
Examples: bacterial membrane channels of the MIP family
Training set: 17 sequences, average length 280 amino acids, 2 specificity groups (aquaporins and glyceroaquaporins); 21 SDPs
– 8 residues contact glycerol, the substrate (d_min < 5 Å)
– 8 residues are oriented toward the channel
– 5 residues make up contacts with other subunits
[Figure: GlpF from E. coli]
Why does the prediction make sense? LacI from E. coli
Total: 348 amino acids, 44 SDPs
[Figure legend: contacting residues (distance to the DNA, effector, or the other subunit < 5 Å); contact zone (may be functional); non-contacting residues (distance to the DNA, effector, or the other subunit > 10 Å)]
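The three categories can be assigned mechanically from d_min. A sketch assuming the 5 Å and 10 Å cutoffs; the slide leaves the exact boundary values open, so here 5 Å and 10 Å are (arbitrarily) put in the contact zone. Residue numbers and distances in the example are hypothetical.

```python
from collections import Counter


def distance_class(d_min: float, contact: float = 5.0, zone: float = 10.0) -> str:
    """Three distance classes (cutoffs in angstroms)."""
    if d_min < contact:
        return "contacting"
    if d_min <= zone:
        return "contact zone"
    return "non-contacting"


def summarize(sdp_distances: dict[int, float]) -> Counter:
    """Count predicted SDPs per distance class."""
    return Counter(distance_class(d) for d in sdp_distances.values())


# Hypothetical residue numbers and d_min values, for illustration only.
example = {101: 3.4, 102: 4.9, 103: 7.5, 104: 12.0}
print(summarize(example))
```

Comparing such counts for predicted SDPs with the counts for all residues of the protein shows whether SDPs are enriched near the functional partners.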
Why does the prediction make sense? GlpF from E. coli
Total: 281 amino acids, 21 SDPs
[Figure legend: contacting residues (distance to the substrate or another subunit < 5 Å); contact zone (may be functional); non-contacting residues (distance to the substrate or another subunit > 10 Å)]
GlpF from E. coli, a membrane channel of the MIP family: SDPs either interact with the substrate or are located on the outer surface of the monomer
[Figure: structure of the GlpF monomer; predicted SDPs and glycerol highlighted]
SDPs located on the outer surface of the GlpF monomer form subunit contacts:
– Glu43 from all four subunits;
– 20Leu, 24Phe, 108Tyr of one subunit with 193Ser from another subunit.
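SDPs located on the outer surface of the GlpF monomer (continued)
Contacts of Glu43 between subunit I and subunits II / IV:
Subunit I residue | Atom | Residue in subunit II or IV | Atom | Distance (Å)
Glu43 | OE1 | Ser38 | O | 4.8
Glu43 | OE2 | Glu43 | OE2 | 4.1
Glu43 | CG | Trp42 | CD1 | 3.7
Glu43 | OE2 | Glu43 | OE2 | 4.1
Contacts between subunits I and II:
Subunit I residue | Atom | Subunit II residue | Atom | Distance (Å)
Leu20 | CD2 | Ile158 | CD1 | 4.3
Leu20 | CD1 | Leu162 | CD2 | 4.5
Phe24 | CZ | Ile158 | CG2 | 3.9
Phe24 | CZ | Leu186 | CD1 | 3.9
Phe24 | CE2 | Val189 | CG2 | 3.8
Phe24 | CE2 | Ile190 | CG1 | 3.7
Phe24 | CA | Ser193 | CB | 3.9
Phe24 | O | Ser193 | OG | 4.2
Phe24 | O | Ser193 | CB | 3.3
Gly27 | O | Ser193 | O | 3.2
Cys28 | CA | Ser193 | CA | 3.8
Tyr108 | OH | Ser193 | O | 2.6
Tyr108 | CE1 | Met194 | CE | 3.7
Tyr108 | CE1 | Leu197 | CD1 | 3.9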
SDPs located on the outer surface of the GlpF monomer (continued)
[Figure panels: structure of contacts in the type A cluster; structure of contacts in the type B cluster]
Conclusions I. SDPpred: the SDP prediction method
A method for identification of amino acid residues that account for differences in protein functional specificity:
– does not rely on the protein 3D structure;
– automatically determines the number of significant positions;
– considers substitutions according to the chemical properties of the substituted amino acids.
Results agree with available structural and experimental data.
Applicable to any protein family in a standard way.
Conclusions II. SDPs for GlpF from E. coli
In protein families whose members function as oligomers, predicted SDPs are often localized on the contact surface between subunits.
5 “surface” SDPs in GlpF: 20Leu, 24Phe, 43Glu, 108Tyr, 193Ser. All of them participate in forming the quaternary structure.
Evolutionary pressure on amino acids that establish intersubunit contacts correlates with evolutionary pressure on amino acids that account for correct recognition of the substrate.
These residues form compact spatial clusters, “structural clasps” for recognition of the proper subunits.
Olga V. Kalinina, Pavel S. Novichkov, Andrey A. Mironov, Mikhail S. Gelfand, Aleksandra B. Rakhmaninova
– Department of Bioengineering and Bioinformatics, Moscow State University, Moscow, Russia
– Institute for Information Transmission Problems RAS, Moscow, Russia
– State Scientific Center GosNIIGenetika, Moscow, Russia
Acknowledgements
– Leonid A. Mirny
– Olga Laikova
– Vsevolod Makeev
– Roman Sutormin
– Shamil Sunyaev
– Aleksey Finkelstein