PREDICTING THE EXPRESSION AND SOLUBILITY OF MEMBRANE PROTEINS

Center for High Throughput Structural Biology

Mark E. Dumont *†, Michael A. White *, Kathy Clark †, Elizabeth J. Grayhack *†, and Eric M. Phizicky *†
Departments of * Biochemistry and Biophysics, and † Pediatrics, University of Rochester Medical Center, Rochester, NY

Conclusions
1. Membrane proteins can be overexpressed on a genomic scale, many of them at high levels.
2. Many factors affect overexpression of soluble and membrane proteins similarly.
3. While overall hydrophobicity of membrane proteins is negatively correlated with expression, hydrophobicity of membrane regions is positively correlated with expression.
4. The presence of a predicted signal sequence, topological orientation in the membrane, and normal subcellular localization do not appear to affect the ability of yeast membrane proteins to be overexpressed.
5. The majority of yeast membrane proteins can be solubilized using a small set of detergents.
6. Solubility in shorter chain detergents is dependent on specific protein properties.
7. Increasing polarity of protein TM segments tends to decrease efficiency of solubilization by short chain detergents.

Summary
The challenge of overexpression and solubilization of eukaryotic integral membrane proteins is one of the most significant obstacles to structure determination of this important class of proteins. To identify properties of membrane proteins that may be predictive of successful overexpression, we analyzed expression levels of the genomic complement of over 1,000 predicted membrane proteins in a recently completed Saccharomyces cerevisiae protein expression library [1]. We detected statistically significant positive and negative correlations between high membrane protein expression and protein properties such as size, overall protein hydrophobicity, number of transmembrane helices, and amino acid composition of transmembrane segments.
Expression levels of membrane and soluble proteins exhibited a nearly identical negative correlation with protein size and overall hydrophobicity. However, high-level membrane protein expression was positively correlated with the hydrophobicity of predicted transmembrane segments. To further characterize yeast membrane proteins as potential targets for structure determination, we tested the solubility of 123 of the most highly expressed yeast membrane proteins in six commonly used detergents. Over 75% of our test proteins could be classified into just four detergent solubility patterns. Protein size, number of transmembrane segments, and the hydrophobicity of predicted transmembrane segments all showed significant correlations with solubility in some detergents. These results suggest that bioinformatic approaches may be capable of identifying certain classes of membrane proteins most likely to be amenable to high-level recombinant expression and efficient detergent solubilization, facilitating structural genomics approaches to membrane protein structure determination.

[1] Gelperin DM, White MA, Wilkinson ML, Kon Y, Kung LA, Wise KJ, Lopez-Hoyo N, Jiang L, Piccirillo S, Yu H, Gerstein M, Dumont ME, Phizicky EM, Snyder M, and Grayhack EJ. (2005) Genes Dev. 19,

Prediction of Yeast Transmembrane Proteins
Two different transmembrane helix prediction programs were used to identify and classify membrane proteins in the yeast genome. We used TMHMM v , to predict 1,155 integral membrane proteins in the MORF collection. From this set of 1,155 proteins, we removed 63 that were predicted by the Phobius program [3] to have only a signal peptide and no transmembrane segments. This left a total of 1,092 proteins predicted to have one or more transmembrane helices. Since TMHMM may not be best for determining the actual topology of a membrane protein [4], we used HMMTOP predictions [5] to predict the topology of the membrane proteins identified as such by TMHMM.
In the very few cases where we were aware of good experimental data suggesting a topology different from the HMMTOP prediction, we used the experimentally determined topology in our analysis.

[2] Krogh A, Larsson B, von Heijne G, and Sonnhammer EL. (2001) J Mol Biol. 305:
[3] Kall L, Krogh A, and Sonnhammer EL. (2004) J Mol Biol. 338:
[4] Lehnert U, Xia Y, Royce TE, Goh CS, Liu Y, Senes A, Yu H, Zhang ZL, Engelman DM, and Gerstein M. (2004) Q Rev Biophys. 37:
[5] Tusnady GE and Simon I. (2001) Bioinformatics 17:

The MORF Yeast Protein Overexpression Library
The yeast MORF library is a genomic collection of Saccharomyces cerevisiae strains expressing C-terminally tagged proteins under Gal control [1].
The MORF library contains 5,574 sequence-verified clones tested for protein expression by Western blot.

Factors evaluated for correlations with membrane protein expression and solubilization
Codon usage, codon adaptation index
Molecules per cell under chromosomal expression
Percentage of total protein residues that are aromatic
Isoelectric point
Size (kDa)
GRAVY score (overall protein hydrophobicity)
Homolog in yeast or other organism
Percentage of protein in transmembrane segments
Percentage of transmembrane residues that are hydrophobic (WFLIVMY)
Percentage of transmembrane residues that are charged/polar (EDKRHNQST)
Percentage of transmembrane residues that are aromatic (WYF)

Testing Solubilization of High-Expressing Yeast Proteins in Six Different Detergents
Detergents used: Triton X-100 (TX-100), lauryldimethylamine-N-oxide (LDAO), Fos-choline 12 (FC-12, dodecylphosphocholine), tetraethyleneglycol monooctyl ether (C8E4), n-octyl-β-D-glucoside (OG), and n-dodecyl-β-D-maltoside (DDM).
Procedure: 9 µl of yeast whole-cell lysate (about 4.2 µg protein) were solubilized by addition of 141 µl of 1% TX-100, LDAO, FC-12, or DDM, or 2% OG or C8E4, in 20 mM Hepes pH 7.5, 500 mM NaCl, and 10% glycerol, followed by centrifugation at 109,000 × g for 1 hour at 21 °C.
A portion of the supernatant was diluted in loading buffer for SDS-PAGE, then analyzed by immunoblotting using anti-HA antibodies.

[Figure: MORF library vector insert region (Gateway cloning). Labeled elements: PGAL, ATT site, ORF, His6, HA, 3C, ZZ]

Transmembrane proteins can be expressed almost as well as soluble proteins in yeast
95% of cloned soluble proteins in the MORF library are expressed.
88% of cloned predicted membrane proteins in the MORF library are expressed.
Expression was detected by immunoblotting of whole-cell lysates using antibodies against the HA-epitope tag (“ND”, not detected).
[Figure panels: Membrane Proteins; All MORF Proteins]

[Figure: Disagreement between TMHMM and HMMTop in predictions of Transmembrane Proteins in the Yeast Proteome]

[Figure panels: Cellular membrane localization; Predicted to contain a signal peptide]

Membrane protein characteristics
[Figure panels: Number of predicted transmembrane segments; N- and C-terminal orientation across membrane; Average transmembrane segment length]

[Figure panels: Number of transmembrane segments; Percent of protein in TM segments; Percent charged and polar residues in TM segments; Percent hydrophobic residues in TM segments. Bars = Number of ORFs per bin]
Factors such as size, overall hydrophobicity, and pI have similar effects on soluble and membrane protein expression. However, there are membrane-specific factors.

Solubilization Efficiency (123 total proteins)
[Figure legend: Solid bars = effective solubilization; hatched bars = partial solubilization]

Detergent solubilization of membrane proteins: Correlations with number and polarity of TM segments
[Figure panels: DDM, TX-100, OG, and C8E4, each plotted against Number of TM segments and against Percent charged and polar residues in TM segments]

[Figure: Venn diagram of Proteins Solubilized by Different Detergents]
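
Several of the sequence-derived factors evaluated above (GRAVY score, percentage of protein in transmembrane segments, and the hydrophobic, charged/polar, and aromatic content of transmembrane residues) can be computed directly from a protein sequence and a list of predicted transmembrane segments. The following Python sketch illustrates one way to do this. It is not the pipeline used for the poster: it assumes GRAVY is the mean Kyte-Doolittle hydropathy value over the sequence (the standard definition), and the function names, the toy sequence, and the (start, end) segment format (1-based, inclusive) are hypothetical choices made for illustration. The residue classes are the ones given in the factor list.

```python
# Kyte-Doolittle hydropathy values; GRAVY is taken here as their mean
# over all residues (assumed definition, not confirmed by the poster).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Residue classes as listed among the evaluated factors.
HYDROPHOBIC = set("WFLIVMY")
CHARGED_POLAR = set("EDKRHNQST")
AROMATIC = set("WYF")


def gravy(sequence):
    """Mean Kyte-Doolittle hydropathy of a protein sequence."""
    seq = sequence.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def percent_in_set(residues, residue_set):
    """Percentage of residues that belong to residue_set."""
    if not residues:
        return 0.0
    return 100.0 * sum(aa in residue_set for aa in residues) / len(residues)


def tm_composition(sequence, tm_segments):
    """Composition statistics for predicted transmembrane segments.

    tm_segments is a hypothetical input format: a list of (start, end)
    pairs, 1-based and inclusive, e.g. parsed from a predictor's output.
    """
    seq = sequence.upper()
    tm_residues = "".join(seq[start - 1:end] for start, end in tm_segments)
    return {
        "pct_protein_in_tm": 100.0 * len(tm_residues) / len(seq),
        "pct_tm_hydrophobic": percent_in_set(tm_residues, HYDROPHOBIC),
        "pct_tm_charged_polar": percent_in_set(tm_residues, CHARGED_POLAR),
        "pct_tm_aromatic": percent_in_set(tm_residues, AROMATIC),
    }


if __name__ == "__main__":
    # Toy sequence and segment, invented for illustration only.
    toy_sequence = "MKLLIVAWFDE"
    toy_segments = [(3, 9)]
    print("GRAVY:", round(gravy(toy_sequence), 3))
    for name, value in tm_composition(toy_sequence, toy_segments).items():
        print(name, round(value, 1))
```

Keeping the whole-protein score (GRAVY) separate from the segment-level statistics mirrors the distinction drawn in the conclusions, where overall hydrophobicity and the hydrophobicity of transmembrane regions correlate with expression in opposite directions.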