Download presentation
Presentation is loading. Please wait.
Published byDorothy Russell Modified over 9 years ago
1
Protein Structure Database Introduction Database of Comparative Protein Structure Models ModBase 生資所 g934251 詹濠先
2
General Information MODBASE is a queryable database of annotated protein structure models. The models are derived by ModPipe, an automated modeling pipeline relying on the programs PSI-BLAST and MODELLER. The database also includes fold assignments and alignments on which the models were based. MODBASE contains theoretically calculated models, which may contain significant errors, not experimentally determined structures. Thus, special care is taken to assess the quality of the models. More information about MODBASE can be found in the HTML or PDF version of ModBase: A database of annotated comparative protein structure models. R. Sánchez, U. Pieper, N. Mirkovic, P.I.W. deBakker, E. Wittenstein & A. Sali. Nucl. Acids Res. 28, 250-253, 2000.PSI-BLAST MODELLERHTML PDFModBase: A database of annotated comparative protein structure models.
3
MODBASE core Models in MODBASE are calculated using MODPIPE, our entirely automated software pipeline for comparative modeling (16). MODPIPE can calculate comparative models for a large number of protein sequences, using many different template structures and sequence– structure alignments. MODPIPE relies on the various modules of MODELLER for its functionality and is streamlined for large-scale operation on a cluster of PCs using scripts written in PERL.16
4
Models in MODBASE are organized into data sets. The largest data set contains models of all sequences in the Swiss-Prot/TrEMBL database that are detectably related to at least one known structure in the PDB. Currently, there are 1,262,629 models for domains in 659,495 of the 1,182,126 sequences in the Swiss- Prot/TrEMBL database, with an average length of 235 residues per model. Human 32,985:sequences, Arabidopsis thaliana : 22 880 sequences, Drosophila melanogaster : 15,195 sequences Escherichia coli : 9,691 sequences Because the sequence databases contain sequence information of different strains and mutations, the number of unique sequences for a given organism exceeds the number of genes in the genome.
5
Specialities Predicted interacting proteins Residue contacts between the two models are predicted based on a match of both modeled sequences to different parts of a single PDB file. The residue contacts in a hypothetical interface are scored by their propensities to span an interface. False positive ratio : 25%
6
Specialities Predicted Ligand Binding Sites ModBase contains a list of the binding sites of known structure for ∼ 50 000 ligands found in the PDB. Forty-four percent of the models in MODBASE have at least one predicted binding site for a small ligand. Application of MODBASE to Structural Genomics NYSGXRC structures, PSI-BLAST E-value
7
Access and Interface MODBASE is queryable http://salilab.org/modbase PDB codes, Swiss-Prot/TrEMBL and GenPept accession numbers, annotation keywords, model reliability, model size, target–template sequence identity, alignment significance, and sequence similarity to the modeled sequences as detected by BLAST The output of a search is displayed on pages with varying amounts of information about the modeled sequences, template structures, alignments and functional annotations. These tables also contain links to other sequence, structure and function annotation databases, such as PDB (4), GenBank (3), Swiss- Prot/TrEMBL (2), CATH (32), Pfam (33), ProDom (34), and UCSC Genome Browser (35). In addition, MODBASE models are directly accessible from the Swiss- Prot/TrEMBL sequence pages at http://www.expasy.org and UCSC Genome Browser at http://genome.ucsc.edu.43232333435 http://www.expasy.orghttp://genome.ucsc.edu Enter PDB, Swiss-Prot GenPept…etc here. Press “Search”! 1 2
8
Search Select any of your interests. Select your purpose, e.g., “Find Ligand binding site”
9
Homolog Structure Prediction of Ligand Binding Sites
10
Reference Ursula, Pieper et al. MODBASE, a database of annotated comparative protein structure models, and associated resources. Nucleic Acids Res. 2004 January 1; 32(Database issue): D217–D222.
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.