SDPpred: a method for identification of amino acid residues that determine differences in functional specificity of homologous proteins and application thereof to the MIP family of membrane transporters

1 SDPpred: a method for identification of amino acid residues that determine differences in functional specificity of homologous proteins and application thereof to the MIP family of membrane transporters
Olga V. Kalinina, Pavel S. Novichkov, Andrey A. Mironov, Mikhail S. Gelfand, Aleksandra B. Rakhmaninova

2 Large families of proteins: generally similar biochemical function but many different specificities… Example: ~800 transcription factors of the LacI family, with an average sequence identity of 30%, that bind different effectors and operators. Some effectors: lactose (LacI); D-fructose-6-phosphate (FruR); guanine, hypoxanthine (PurR); cytidine, adenosine (CytR); trehalose-6-phosphate (TreR); D-gluconate (GntR); D-galactose (GalR); D-ribose (RbsR); maltose (MalR); raffinose (RafR); … X??

3 Workflow: an alignment and a description of specificity groups are the input to SDPpred, which outputs the positions that account for specificity. These positions are then used to assign specificity to new proteins and to guide experiments. The method is tested on families that include proteins with resolved 3D structure.
Description of specificity groups: Group A: No. 1-10, 13…; Group B: No. 12, 14-16…; Group C: No. 17-45…; …
Alignment (fragment):
Q9KDW9     ----------MSPFLGEVIGTMILIILGGGVVAGVVLKGTK
Q8Y6Z1     ----MIDTSLATQFLGEVIGTAILIILGAGVVAGVSLKRSK
Q97JG6     ----------MTIFFAELVGTLLLILLGDGVVANVVLKNSK
GLPF_ECOLI MSQT---STLKGQCIAEFLGTGLLIFFGVGCVA--ALKVAG
Q8ZJK5     MSQTA-SSTLKGQCIAEFLGTGLLIFFGAGCVA--ALKLAG
GLPF_HAEIN MDKS-----LKANCIGEFLGTALLIFFGVGCVA--ALKVAG
GLPF_PSEAE MTTAAPTPSLFGQCLAEFLGTALLIFFGTGCVA--ALKVAG
AQPZ_BRUME ---------MLNKLSAEFFGTFWLVFGGCGSAILAA--AFP
Q92NM3     ---------MFRKLSVEFLGTFWLVLGGCGSAVLAA--AFP
Q8UJW4     ---------MGRKLLAEFFGTFWLVFGGCGSAVFAA--AFP
AQPZ_ECOLI ---------MFRKLAAECFGTFWLVFGGCGSAVLAA--GFP

4 What are SDPs? (SDP = Specificity-Determining Position)
Specificity group: a group of proteins that have the same specificity (established from experimental data, genome analysis, etc.).
SDP: an alignment position that is conserved within specificity groups but differs between them.
An SDP is not equivalent to a functionally important position!
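The definition can be illustrated with a minimal sketch. It reads "differs between them" in its strictest form, so every group must carry its own residue. The function name `is_ideal_sdp` and the dictionary input are illustrative assumptions and are not part of SDPpred. The sketch also shows why an SDP is not the same as a functionally important position: a column conserved across all groups passes the within-group test but fails the between-group test.

```python
from typing import Dict, List


def is_ideal_sdp(column_by_group: Dict[str, List[str]]) -> bool:
    """Strict (hypothetical) reading of the SDP definition for one column.

    column_by_group maps a specificity group to the residues its proteins
    have at this alignment position.
    """
    group_residues = []
    for residues in column_by_group.values():
        distinct = set(residues)
        if len(distinct) != 1:
            return False  # not conserved within this specificity group
        group_residues.append(distinct.pop())
    # Conserved within groups; now require a different residue in every group
    return len(set(group_residues)) == len(group_residues)


# Hypothetical toy columns: two groups of three proteins each
print(is_ideal_sdp({"AQP": ["E", "E", "E"], "GLP": ["Q", "Q", "Q"]}))  # True
print(is_ideal_sdp({"AQP": ["G", "G", "G"], "GLP": ["G", "G", "G"]}))  # False: conserved everywhere
print(is_ideal_sdp({"AQP": ["E", "Q", "E"], "GLP": ["Q", "Q", "Q"]}))  # False: varies within AQP
```

Real alignment columns are rarely this clean. An all-or-nothing test like this one is therefore only a reference point for the graded, statistical score used by the method.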

5 Algorithm
The mutual information I_p reflects the extent to which alignment position p tends to be an SDP:
$$I_p = \sum_{i=1}^{N} \sum_{\alpha} f_{\alpha,i}(p) \log \frac{f_{\alpha,i}(p)}{f_i \, f_{\alpha}(p)}$$
where N is the number of groups, f_i is the fraction of proteins in group i, f_{α,i}(p) is the ratio of the number of occurrences of amino acid α in group i at position p to the length of the whole alignment column, and f_α(p) is the frequency of amino acid α in the whole alignment column at position p.
Smoothed amino acid frequencies: a leucine is more a methionine than a valine, and any arginine has a dash of lysine…
Statistical significance of I_p: the expected mutual information I_p^exp of an alignment column and the Z-score
$$Z_p = \frac{I_p - I_p^{\mathrm{exp}}}{\sigma_p}$$
where σ_p is the standard deviation of the mutual information expected for the column (Mirny & Gelfand, 2002, J Mol Biol, 321(1)).
Are 5 SDPs with Z-score > 10.5 better than 10 SDPs with Z-score > 9.0? A Bernoulli estimator is used to select the proper number of SDPs.
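The score for a single column can be sketched in Python. The sketch makes several assumptions. It uses raw (unsmoothed) frequencies and the natural logarithm. Any symbol, including the gap character, is counted as a letter. I_p^exp and σ_p are estimated by randomly shuffling the residues of the column among the proteins. The smoothing and expectation procedure in SDPpred itself may differ, and the Bernoulli cutoff is not reproduced here. The names `mutual_information` and `z_score` are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Sequence


def mutual_information(column: Sequence[str], groups: Sequence[str]) -> float:
    """I_p for one alignment column (unsmoothed frequencies, natural log).

    column[k] is the residue (or gap) of protein k at position p and
    groups[k] is the specificity group of protein k.
    """
    n = len(column)
    joint = Counter(zip(groups, column))  # n * f_{alpha,i}(p)
    group_counts = Counter(groups)  # n * f_i
    aa_counts = Counter(column)  # n * f_alpha(p)
    info = 0.0
    for (group, aa), count in joint.items():
        f_ai = count / n
        f_i = group_counts[group] / n
        f_a = aa_counts[aa] / n
        info += f_ai * math.log(f_ai / (f_i * f_a))
    return info


def z_score(column: Sequence[str], groups: Sequence[str],
            n_shuffles: int = 1000, seed: int = 0) -> float:
    """Z_p = (I_p - I_p^exp) / sigma_p, with I_p^exp and sigma_p estimated
    from random permutations of the column (an assumed null model)."""
    rng = random.Random(seed)
    observed = mutual_information(column, groups)
    shuffled = list(column)
    samples = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # keeps residue composition, breaks link to groups
        samples.append(mutual_information(shuffled, groups))
    expected = sum(samples) / n_shuffles
    sigma = math.sqrt(sum((s - expected) ** 2 for s in samples) / n_shuffles)
    return (observed - expected) / sigma if sigma > 0 else 0.0


if __name__ == "__main__":
    labels = ["AQP"] * 3 + ["GLP"] * 3
    print(mutual_information(list("EEEQQQ"), labels))  # ln 2, about 0.693
    print(mutual_information(list("GGGGGG"), labels))  # 0.0: conserved in all groups
```

A column that perfectly separates two equal groups reaches the maximum, ln 2. A column conserved in every protein scores 0, even though such a position may be functionally important.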

6 Kalinina OV, Mironov AA, Gelfand MS, Rakhmaninova AB. (2004) Automated selection of positions determining functional specificity of proteins by comparative analysis of orthologous groups in protein families. Protein Sci 13(2): 443-56.
http://math.belozersky.msu.ru/~psn/
Kalinina OV, Novichkov PS, Mironov AA, Gelfand MS, Rakhmaninova AB. (2004) SDPpred: a tool for prediction of amino acid residues that determine differences in functional specificity of homologous proteins. Nucl Acids Res 32(Web Server issue): W424-8.

7 Web interface
Input: multiple alignment of proteins divided into specificity groups
=== AQP ===
%sp|Q9L772|AQPZ_BRUME
-------------------------------------mlnklsaeffgtfwlvfggcgsa
ilaa--afp-------elgigflgvalafgltvltmayavggisg--ghfnpavslgltv
iiilgsts------------------------------slap------------------
qlwlfwvaplvgavigaiiwkgllgrd---------------------------------
------
%sp|P48838|AQPZ_ECOLI
-------------------------------------mfrklaaecfgtfwlvfggcgsa
vlaa--gfp-------elgigfagvalafgltvltmafavghisg--ghfnpavtiglwa
lvihgatd------------------------------kfap------------------
qlwffwvvpivggiiggliyrtllekrd--------------------------------
------
%tr|Q92ZW9
-------------------------------------mfkklcaeflgtcwlvlggcgsa
vlas--afp-------qvgigllgvsfafgltvltmaytvggisg--ghfnpavslglav
iiilgsth------------------------------rrvp------------------
qlwlfwiaplfgaaiagivwksvgeefrpvd-----------------------------
------
=== GLP ===
%sp|P11244|GLPF_ECOLI
----------------------------msqt---stlkgqciaeflgtglliffgvgcv
aalkvag---------a-sfgqweisviwglgvamaiyltagvsg--ahlnpavtialwl
glilaltd------------------------------dgn--------------g-vpr
-flvplfgpivgaivgafayrkligrhlpcdicvveek--etttpseqkasl--------
------
%sp|P44826|GLPF_HAEIN
----------------------------mdks-----lkancigeflgtalliffgvgcv
…
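Input in the layout above can be read with a small parser. The sketch assumes that each `=== name ===` line opens a specificity group, each `%` line opens a sequence, and the lines that follow hold the wrapped aligned sequence. `parse_grouped_alignment` is a hypothetical helper, not the server's own code, and the sample input is a truncated toy example.

```python
from typing import Dict, List, Tuple


def parse_grouped_alignment(text: str) -> Dict[str, List[Tuple[str, str]]]:
    """Parse '=== group ===' / '%name' / sequence lines into
    {group: [(name, aligned_sequence), ...]} (hypothetical helper)."""
    groups: Dict[str, List[Tuple[str, List[str]]]] = {}
    group = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("===") and line.endswith("==="):
            group = line.strip("= ")  # "=== AQP ===" -> "AQP"
            groups.setdefault(group, [])
        elif line.startswith("%"):
            if group is None:
                raise ValueError("sequence header found before any '=== group ===' line")
            groups[group].append((line[1:], []))
        else:
            if group is None or not groups[group]:
                raise ValueError("sequence data found before a '%' header")
            groups[group][-1][1].append(line)  # one wrapped chunk of the sequence
    result = {g: [(name, "".join(chunks)) for name, chunks in entries]
              for g, entries in groups.items()}
    lengths = {len(seq) for entries in result.values() for _, seq in entries}
    if len(lengths) > 1:
        raise ValueError("sequences are not aligned to a common length")
    return result


sample = """=== AQP ===
%sp|P48838|AQPZ_ECOLI
mfrk
la--
=== GLP ===
%sp|P11244|GLPF_ECOLI
msqt
---s
"""
parsed = parse_grouped_alignment(sample)
p = 0  # alignment position of interest
column = [seq[p] for entries in parsed.values() for _, seq in entries]
labels = [g for g, entries in parsed.items() for _ in entries]
print(column, labels)  # ['m', 'm'] ['AQP', 'GLP']
```

Each column, paired with its group labels, is the per-position input that a specificity score needs.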

8 Web interface
Output:
– alignment of the family with the SDPs highlighted (Alignment view);
– detailed description of each SDP (List of SDPs);
– plot of probabilities used by the Bernoulli estimator to set the cutoff (Probability plot view).

9 Examples: the LacI family of bacterial transcription factors
Training set: 459 sequences, average length 338 amino acids, 85 specificity groups; 44 SDPs (structure: LacI from E. coli):
– 10 residues contact NPF (an analog of the effector);
– 6 residues make up intersubunit contacts;
– 7 residues contact the operator sequence;
– 7 residues in the effector contact zone (5 Å < d_min < 10 Å);
– 5 residues in the intersubunit contact zone (5 Å < d_min < 10 Å);
– 6 residues in the operator contact zone (5 Å < d_min < 10 Å).

10 Examples: bacterial membrane channels of the MIP family
Training set: 17 sequences, average length 280 amino acids, 2 specificity groups (aquaporins and glyceroaquaporins); 21 SDPs (structure: GlpF from E. coli):
– 8 residues contact glycerol, the substrate (d_min < 5 Å);
– 8 residues are oriented toward the channel;
– 5 residues make up contacts with other subunits.

11 Why does the prediction make sense? LacI from E. coli: 348 amino acids in total, 44 SDPs.
Figure legend: contacting residues (distance to the DNA, the effector, or the other subunit < 5 Å); contact zone, 5 to 10 Å (may be functional); non-contacting residues (distance > 10 Å).

12 Why does the prediction make sense? GlpF from E. coli: 281 amino acids in total, 21 SDPs.
Figure legend: contacting residues (distance to the substrate or another subunit < 5 Å); contact zone, 5 to 10 Å (may be functional); non-contacting residues (distance > 10 Å).
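The three distance classes on slides 11 and 12 can be sketched as a classification of each residue by d_min. Here d_min is the residue's smallest interatomic distance to the DNA, effector, substrate or partner subunit, and the coordinates would come from the 3D structure. The helper names are hypothetical. Treating exactly 5 Å and 10 Å as part of the contact zone is an assumption, because the slides only give strict inequalities.

```python
import math
from collections import Counter
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float, float]


def min_distance(residue_atoms: Iterable[Point], partner_atoms: Sequence[Point]) -> float:
    """d_min: smallest distance between any atom of a residue and any partner atom."""
    return min(math.dist(a, b) for a in residue_atoms for b in partner_atoms)


def classify(d_min: float) -> str:
    """Distance classes from the slides; boundary handling is an assumption."""
    if d_min < 5.0:
        return "contacting"
    if d_min <= 10.0:
        return "contact zone"  # 5 to 10 Å, may be functional
    return "non-contacting"


def tally(sdp_distances: Iterable[float]) -> Counter:
    """Count predicted SDPs per distance class."""
    return Counter(classify(d) for d in sdp_distances)
```

Comparing such a tally for the predicted SDPs with the same tally for all residues of the protein is one way to check whether SDPs are enriched near functional sites.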

13 GlpF from E. coli, a membrane channel from the MIP family: SDPs either interact with the substrate or are located on the outer surface of the monomer.
Figure: structure of the GlpF monomer with the predicted SDPs and glycerol.

14 SDPs located on the outer surface of the GlpF monomer form subunit contacts: Glu43 from all four subunits; 20Leu, 24Phe and 108Tyr of one subunit with 193Ser of another subunit.

15 SDPs located on the outer surface of the GlpF monomer (continued)

Contacts of Glu43 of subunit I with subunits II and IV:
Subunit I         Subunit II or IV   Distance
Residue  Atom     Residue  Atom      (Å)
Glu43    OE1      Ser38    O         4.8
Glu43    OE2      Glu43    OE2       4.1
Glu43    CG       Trp42    CD1       3.7
Glu43    OE2      Glu43    OE2       4.1

Contacts between subunit I and subunit II:
Subunit I         Subunit II         Distance
Residue  Atom     Residue  Atom      (Å)
Leu20    CD2      Ile158   CD1       4.3
Leu20    CD1      Leu162   CD2       4.5
Phe24    CZ       Ile158   CG2       3.9
Phe24    CZ       Leu186   CD1       3.9
Phe24    CE2      Val189   CG2       3.8
Phe24    CE2      Ile190   CG1       3.7
Phe24    CA       Ser193   CB        3.9
Phe24    O        Ser193   OG        4.2
Phe24    O        Ser193   CB        3.3
Gly27    O        Ser193   O         3.2
Cys28    CA       Ser193   CA        3.8
Tyr108   OH       Ser193   O         2.6
Tyr108   CE1      Met194   CE        3.7
Tyr108   CE1      Leu197   CD1       3.9
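Atom-pair tables like the ones above can be produced by scanning all atom pairs of two subunits and keeping the pairs closer than a cutoff. The sketch below assumes a 5 Å cutoff, since every distance listed is below it, and uses a hypothetical `Atom` record. The coordinates in the example are made up and are not taken from the GlpF structure.

```python
import math
from typing import List, NamedTuple, Tuple


class Atom(NamedTuple):
    residue: str  # residue label, e.g. "Tyr108"
    name: str  # PDB atom name, e.g. "OH"
    xyz: Tuple[float, float, float]  # coordinates in Å


def interface_contacts(subunit_a: List[Atom], subunit_b: List[Atom],
                       cutoff: float = 5.0) -> List[Tuple[str, str, str, str, float]]:
    """All atom pairs (one atom from each subunit) closer than `cutoff` Å,
    sorted by distance, as (residue_a, atom_a, residue_b, atom_b, distance)."""
    rows = []
    for a in subunit_a:
        for b in subunit_b:
            d = math.dist(a.xyz, b.xyz)
            if d < cutoff:
                rows.append((a.residue, a.name, b.residue, b.name, round(d, 1)))
    return sorted(rows, key=lambda row: row[4])


# Made-up coordinates, for illustration only
subunit_1 = [Atom("Tyr108", "OH", (0.0, 0.0, 0.0)), Atom("Leu20", "CD1", (20.0, 0.0, 0.0))]
subunit_2 = [Atom("Ser193", "O", (2.6, 0.0, 0.0))]
print(interface_contacts(subunit_1, subunit_2))  # [('Tyr108', 'OH', 'Ser193', 'O', 2.6)]
```

The double loop is quadratic in the number of atoms, which is fine for a pair of subunits. A spatial grid or k-d tree would be the usual choice for whole complexes.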

16 SDPs located on the outer surface of the GlpF monomer (continued)
Figures: structure of contacts in the type A cluster; structure of contacts in the type B cluster.

17 Conclusions I. SDPpred: the SDP prediction method
A method for identification of amino acid residues that account for differences in protein functional specificity:
– does not rely on the protein 3D structure;
– automatically determines the number of significant positions;
– considers substitutions according to the chemical properties of the substituted amino acids.
Results agree with available structural and experimental data.
Applicable to any protein family in a standard way.
Kalinina OV, Mironov AA, Gelfand MS, Rakhmaninova AB. (2004) Automated selection of positions determining functional specificity of proteins by comparative analysis of orthologous groups in protein families. Protein Sci 13(2): 443-56.
http://math.belozersky.msu.ru/~psn/
Kalinina OV, Novichkov PS, Mironov AA, Gelfand MS, Rakhmaninova AB. (2004) SDPpred: a tool for prediction of amino acid residues that determine differences in functional specificity of homologous proteins. Nucl Acids Res 32(Web Server issue): W424-8.

18 Conclusions II. SDPs for GlpF from E. coli
In protein families whose members function as oligomers, predicted SDPs are often localized on the contact surface between subunits.
5 “surface” SDPs in GlpF: 20Leu, 24Phe, 43Glu, 108Tyr, 193Ser. All of them participate in forming the quaternary structure → evolutionary pressure on amino acids that establish intersubunit contacts correlates with evolutionary pressure on amino acids that account for correct recognition of the substrate.
These residues form compact spatial clusters → “structural clasps” for recognition of proper subunits.

19 Olga V. Kalinina, Pavel S. Novichkov, Andrey A. Mironov, Mikhail S. Gelfand, Aleksandra B. Rakhmaninova
– Department of Bioengineering and Bioinformatics, Moscow State University, Moscow, Russia
– Institute for Information Transmission Problems RAS, Moscow, Russia
– State Scientific Center GosNIIGenetika, Moscow, Russia
Acknowledgements: Leonid A. Mirny, Olga Laikova, Vsevolod Makeev, Roman Sutormin, Shamil Sunyaev, Aleksey Finkelstein
