1
Summary
Protein design seeks to find amino acid sequences that fold stably into specific 3-D structures. Modeling the inherent flexibility of the protein backbone in the design algorithm is necessary to capture the behaviour of real proteins and is a prerequisite for the accurate exploration of sequence space. We present a broad exploration of protein sequence space, with backbone flexibility, through a novel approach: large-scale protein design to structural ensembles. An application is demonstrated in which designed sequences, used in place of natural sequence homologues, increase the utility of comparative modeling.
Results
We designed hundreds of thousands of diverse sequences for 264 naturally occurring proteins in 55 fold classes. Protein folds show distinct variation in “designability”. Our novel “reverse BLAST” approach uses designed sequences to identify up to 5-fold more high-quality structural templates for comparative modeling than standard PSI-BLAST. Reverse BLAST identifies at least one new modeling target in 41 of 49 genomes tested.
2
Protein design
Challenges in computational protein design:
- choosing sufficiently accurate energy functions
- finding intelligent ways to efficiently search the large (O(10^n)) space of protein sequences
- modeling peptide backbone flexibility
Some highlights of the design algorithm (SPA):
- initial rotamer filtering step
- Amber/OPLS parameter set; implicit solvation
- amino acid baseline corrections to maintain reasonable sequence compositions
- genetic algorithm to search for low energy sequences to match the target structure
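The genetic-algorithm search step can be illustrated with a minimal sketch. This is not the SPA implementation: the energy function here (`toy_energy`, a mismatch count against an arbitrary reference string) is a hypothetical stand-in for a physical energy function, and the population size, mutation rate and truncation selection scheme are all assumptions chosen for brevity.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def toy_energy(seq, reference):
    """Hypothetical energy: number of positions differing from a reference.

    A real design run would score each sequence on the target backbone
    with a force field; this placeholder only makes the search runnable.
    """
    return sum(a != b for a, b in zip(seq, reference))


def genetic_search(energy, length, pop_size=50, generations=200,
                   mutation_rate=0.05, seed=0):
    """Search for a low-energy sequence of the given length (length >= 2)."""
    rng = random.Random(seed)
    pop = ["".join(rng.choice(AMINO_ACIDS) for _ in range(length))
           for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: keep the lower-energy half unchanged (elitism).
        pop.sort(key=energy)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = "".join(                        # per-residue point mutation
                rng.choice(AMINO_ACIDS) if rng.random() < mutation_rate else c
                for c in child
            )
            children.append(child)
        pop = parents + children
    return min(pop, key=energy)


if __name__ == "__main__":
    reference = "ACDEFGHIKL"
    best = genetic_search(lambda s: toy_energy(s, reference), len(reference))
    print(best, toy_energy(best, reference))
```

Because the surviving parents are carried over unchanged, the best energy in the population can never get worse from one generation to the next.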
3
Peptide backbone flexibility through structural ensembles
Ten representative backbone traces from the structural ensemble used in designing sequences for 1abo, the SH3 domain from Abl tyrosine kinase. The structural variants appear in yellow, with the original crystal structure backbone traced in purple. All structures are within 1 Å rmsd of each other.
Designing to a structural ensemble generates more diverse sequences than fixed-backbone methods.
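The 1 Å criterion can be checked with a short sketch like the following. It assumes the backbone coordinates are already superimposed and listed atom for atom in the same order; no optimal superposition is performed, and the function names are illustrative only.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z).

    Assumes the two structures are already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def ensemble_within(ensemble, cutoff=1.0):
    """True if every pair of structures in the ensemble is within the cutoff (in Å)."""
    return all(
        rmsd(a, b) <= cutoff
        for i, a in enumerate(ensemble)
        for b in ensemble[i + 1:]
    )
```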
4
More non-native-like sequences are designed
Distribution of identity to the native parent sequence for 253 proteins. Identity to the native sequence was calculated for the set of sequences designed using only the fixed parent backbone as a target template (all residues: black dashed line; buried residues: grey dashed line) and for the set of sequences designed using a structural ensemble target (all residues: black solid line; buried residues: grey solid line).
Using structural ensembles of 100 structural variants as target templates narrows and lowers the distribution of identity to the parent native sequence, indicating broader exploration of sequence space.
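Identity to the native sequence, over all residues or over a subset such as the buried positions, can be computed as in this minimal sketch. It assumes designed and native sequences have the same length with no gaps (as expected when designing onto the native backbone); the `positions` argument is a hypothetical way of passing the buried-residue indices.

```python
def percent_identity(designed, native, positions=None):
    """Percent of positions where the designed residue matches the native one.

    positions: optional iterable of 0-based indices (e.g. buried residues);
    if omitted, all positions are compared.
    """
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    idx = list(range(len(native))) if positions is None else list(positions)
    if not idx:
        raise ValueError("no positions to compare")
    matches = sum(designed[i] == native[i] for i in idx)
    return 100.0 * matches / len(idx)
```

For example, `percent_identity("ACDE", "ACDF")` gives 75.0, while restricting the comparison to positions 0 and 3 gives 50.0.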
5
Overall sequence diversity is determined by the protein fold
Sequence entropy distributions of designed sequences, grouped by structure into folds. The six folds are identified by their PFAM families.
The relatively tight clustering of sequence entropies within a fold and the separation of sequence entropy distributions for different folds suggest a) that the diversity of the designed sequence set for a structure is primarily determined by its overall fold and b) that the designability principle postulated from studies of simple models may hold in real proteins.
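The slides do not define the entropy measure. One common choice, assumed here, is the Shannon entropy of the residue distribution at each alignment position, averaged over positions (natural logarithm, no gap handling). A minimal sketch under that assumption:

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in nats) of the residues observed at one position."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def mean_sequence_entropy(alignment):
    """Average per-position entropy over a list of equal-length sequences."""
    columns = list(zip(*alignment))
    return sum(column_entropy(col) for col in columns) / len(columns)
```

A fully conserved position contributes zero entropy, and a position with all 20 residues equally represented contributes the maximum, ln 20.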
6
Designed sequences identify structural homologues accurately
The E-value of the most significant hit from each of 264 “reverse BLAST” searches is plotted. Dark grey columns represent predictions that are true structural homologues; light grey columns represent false positives.
Our novel “reverse BLAST searching” uses alignments of designed sequences as PSI-BLAST queries against a genome to identify structural templates for structure prediction of gene sequences. 251 of the 264 designed sequence alignments produced hits (against PDB as a test set) with E-values below 10. At a significance level of E<0.01, a commonly used threshold in comparative modeling, all hits were against true structural homologues, with 47% (124/264) coverage.
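The coverage figure is the fraction of queries whose best hit passes the E-value cutoff: 124/264 ≈ 0.47. The sketch below shows one way such a tally could be made. The input format (a list of `(e_value, is_true_homologue)` tuples, with `None` for queries that returned no hit) is an assumption for illustration and is not the output format of any BLAST tool.

```python
def coverage_and_precision(best_hits, e_cutoff=0.01):
    """Summarise best hits at an E-value cutoff.

    best_hits: one entry per query, either None (no hit) or a tuple
    (e_value, is_true_homologue). Hypothetical format for this sketch.
    Returns (coverage, precision); precision is None if nothing passes.
    """
    significant = [h for h in best_hits if h is not None and h[0] < e_cutoff]
    true_positives = sum(1 for _, is_true in significant if is_true)
    coverage = len(significant) / len(best_hits)
    precision = true_positives / len(significant) if significant else None
    return coverage, precision
```

With the reported numbers, 124 significant hits out of 264 queries, all true homologues, this would return a coverage of about 0.47 and a precision of 1.0.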
7
“Reverse BLAST” identifies more templates for homology modeling
Light grey: the number of genes for which structural templates were identified by PSI-BLAST searching against the set of 264 structures in the test set. Dark grey: the number of novel genes for which structural templates were identified by “reverse BLAST” searching using 264 alignments of computationally designed sequences.
Reverse BLAST searching identified at least one additional structural template for use in homology modeling (not identified by standard PSI-BLAST) for 41 of 49 genomes. In ten cases, the reverse BLAST method more than doubled the number of structural templates identified.
8
Conclusions
- The task of large-scale protein sequence design has been efficiently parallelized on a massive scale.
- Design to a structural ensemble greatly increases the diversity of sequences generated, without loss of sequence quality.
- Similar structures produce sequence sets of similar diversity, and the distributions of sequence entropies for different folds segregate, supporting the designability postulate seen in simple models.
- “Reverse BLAST searching” uses designed sequences to accurately identify structural homologues.
- Reverse BLAST searching allows increased identification of structural templates for homology modeling without the need for natural sequence homologues.
9
Future Directions
- Use sequence profiles for specific proteins to generate biased combinatorial libraries for protein synthesis. This will experimentally test the ability of the design algorithm to produce viable sequences.
- Introduce functional constraints into the design process to produce new sequences which are both stable and functional.
- Refine methods for generating high sequence diversity for a given structure, allowing more extensive sampling of sequence space.
- Use computational design to redesign peptide ligands for applications in drug discovery and understanding protein-protein interactions.