Protein Structure Database Introduction Database of Comparative Protein Structure Models ModBase 生資所 g934251 詹濠先.

Slides:



Advertisements
Similar presentations
1 Genome information GenBank (Entrez nucleotide) Species-specific databases Protein sequence GenBank (Entrez protein) UniProtKB (SwissProt) Protein structure.
Advertisements

1 Welcome to the Protein Database Tutorial This tutorial will describe how to navigate the section of Gramene that provides collective information on proteins.
Protein Tertiary Structure Prediction
Intro to Bioinformatics Summary. What did we learn Pairwise alignment – Local and Global Alignments When? How ? Tools : for local blast2seq, for global.
Sequence Similarity Searching Class 4 March 2010.
Optimatization of a New Score Function for the Detection of Remote Homologs Kann et al.
Expect value Expect value (E-value) Expected number of hits, of equivalent or better score, found by random chance in a database of the size.
Protein structure (Part 2 of 2).
Biological Databases Notes adapted from lecture notes of Dr. Larry Hunter at the University of Colorado.
Computational Biology, Part 10 Protein Structure Prediction and Display Robert F. Murphy Copyright  1996, 1999, All rights reserved.
Summary Protein design seeks to find amino acid sequences which stably fold into specific 3-D structures. Modeling the inherent flexibility of the protein.
The Protein Data Bank (PDB)
Protein Modules An Introduction to Bioinformatics.
Protein Structure and Function Prediction. Predicting 3D Structure –Comparative modeling (homology) –Fold recognition (threading) Outstanding difficult.
Protein Tertiary Structure Prediction Structural Bioinformatics.
Chapter 2 Sequence databases A list of the databases’ uniform resource locators (URLs) discussed in this section is in Box 2.1.
Protein Structure Prediction II
Detecting the Domain Structure of Proteins from Sequence Information Niranjan Nagarajan and Golan Yona Department of Computer Science Cornell University.
Protein Tertiary Structure Prediction Structural Bioinformatics.
Genome Evolution: Duplication (Paralogs) & Degradation (Pseudogenes)
Pattern databasesPattern databasesPattern databasesPattern databases Gopalan Vivek.
© Wiley Publishing All Rights Reserved. Searching Sequence Databases.
Protein Tertiary Structure Prediction
Part II : Introduction To Protein Structure Kong Lesheng Victor Tong Joo Chuan National University of Singapore.
Protein domains. Protein domains are structural units (average 160 aa) that share: Function Folding Evolution Proteins normally are multidomain (average.
Functional Linkages between Proteins. Introduction Piles of Information Flakes of Knowledge AGCATCCGACTAGCATCAGCTAGCAGCAGA CTCACGATGTGACTGCATGCGTCATTATCTA.
Practical session 2b Introduction to 3D Modelling and threading 9:30am-10:00am 3D modeling and threading 10:00am-10:30am Analysis of mutations in MYH6.
Comparative Protein Structure Modeling Lecture 4.1
COMPARATIVE or HOMOLOGY MODELING
1 PyMOL Evolutionary Trace Viewer 1.1 Lichtarge Lab Sept. 13, 2010.
BASys: A Web Server for Automated Bacterial Genome Annotation Gary Van Domselaar †, Paul Stothard, Savita Shrivastava, Joseph A. Cruz, AnChi Guo, Xiaoli.
The Pfam and MEROPS databases EMBO course 2004 Robert Finn
Sequence analysis: Macromolecular motif recognition Sylvia Nagl.
Corrections. - The cacao genome is currently being sequenced - Human Chromosome 1 sequence Search ‘Genome’
Protein Folding Programs By Asım OKUR CSE 549 November 14, 2002.
CISC667, F05, Lec9, Liao CISC 667 Intro to Bioinformatics (Fall 2005) Sequence Database search Heuristic algorithms –FASTA –BLAST –PSI-BLAST.
NIGMS Protein Structure Initiative: Target Selection Workshop ADDA and remote homologue detection Liisa Holm Institute of Biotechnology University of Helsinki.
Protein Structure & Modeling Biology 224 Instructor: Tom Peavy Nov 18 & 23, 2009
Other biological databases and ontologies. Biological systems Taxonomic data Literature Protein folding and 3D structure Small molecules Pathways and.
Bioinformatics how to … use publicly available free tools to predict protein structure by comparative modeling.
Protein Sequence Analysis - Overview - NIH Proteomics Workshop 2007 Raja Mazumder Scientific Coordinator, PIR Research Assistant Professor, Department.
Copyright OpenHelix. No use or reproduction without express written consent1.
Basic Local Alignment Search Tool BLAST Why Use BLAST?
Protein Structure Database for Structural Genomics Group Jessica Lau December 13, 2004 M.S. Thesis Defense.
Protein Homologue Clustering and Molecular Modeling L. Wang.
Large-scale Prediction of Yeast Gene Function Introduction to Bio-Informatics Winter Roi Adadi Naama Kraus
Point Specific Alignment Methods PSI – BLAST & PHI – BLAST.
Guidelines for sequence reports. Outline Summary Results & Discussion –Sequence identification –Function assignment –Fold assignment –Identification of.
Sequence Search Abhishek Niroula Department of Experimental Medical Science Lund University
Genome Annotation Assessment in Drosophila melanogaster by Reese, M. G., et al. Summary by: Joe Reardon Swathi Appachi Max Masnick Summary of.
Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy Prediction of protein function from sequence analysis.
What is BLAST? Basic BLAST search What is BLAST?
Protein Tertiary Structure Prediction Structural Bioinformatics.
The Biologist’s Wishlist A complete and accurate set of all genes and their genomic positions A set of all the transcripts produced by each gene The location.
Using the Fisher kernel method to detect remote protein homologies Tommi Jaakkola, Mark Diekhams, David Haussler ISMB’ 99 Talk by O, Jangmin (2001/01/16)
Sequence: PFAM Used example: Database of protein domain families. It is based on manually curated alignments.
What is BLAST? Basic BLAST search What is BLAST?
Bio/Chem-informatics
Basics of BLAST Basic BLAST Search - What is BLAST?
Demo: Protein Information Resource
Genome Annotation Continued
GEP Annotation Workflow
Prediction of Protein Structure and Function on a Proteomic Scale
BLAST.
Prediction of protein structure
Basic Local Alignment Search Tool
Protein structure prediction.
Nora Pierstorff Dept. of Genetics University of Cologne
Basic Local Alignment Search Tool
TF candidate selection pipeline.
Presentation transcript:

Protein Structure Database Introduction Database of Comparative Protein Structure Models ModBase 生資所 g 詹濠先

General Information MODBASE is a queryable database of annotated protein structure models. The models are derived by ModPipe, an automated modeling pipeline relying on the programs PSI-BLAST and MODELLER. The database also includes fold assignments and alignments on which the models were based. MODBASE contains theoretically calculated models, which may contain significant errors, not experimentally determined structures. Thus, special care is taken to assess the quality of the models. More information about MODBASE can be found in the HTML or PDF version of ModBase: A database of annotated comparative protein structure models. R. Sánchez, U. Pieper, N. Mirkovic, P.I.W. deBakker, E. Wittenstein & A. Sali. Nucl. Acids Res. 28, , 2000.PSI-BLAST MODELLERHTML PDFModBase: A database of annotated comparative protein structure models.

MODBASE core Models in MODBASE are calculated using MODPIPE, our entirely automated software pipeline for comparative modeling (16). MODPIPE can calculate comparative models for a large number of protein sequences, using many different template structures and sequence– structure alignments. MODPIPE relies on the various modules of MODELLER for its functionality and is streamlined for large-scale operation on a cluster of PCs using scripts written in PERL.16

Models in MODBASE are organized into data sets. The largest data set contains models of all sequences in the Swiss-Prot/TrEMBL database that are detectably related to at least one known structure in the PDB. Currently, there are 1,262,629 models for domains in 659,495 of the 1,182,126 sequences in the Swiss- Prot/TrEMBL database, with an average length of 235 residues per model. Human 32,985:sequences, Arabidopsis thaliana : sequences, Drosophila melanogaster : 15,195 sequences Escherichia coli : 9,691 sequences Because the sequence databases contain sequence information of different strains and mutations, the number of unique sequences for a given organism exceeds the number of genes in the genome.

Specialities Predicted interacting proteins Residue contacts between the two models are predicted based on a match of both modeled sequences to different parts of a single PDB file. The residue contacts in a hypothetical interface are scored by their propensities to span an interface. False positive ratio : 25%

Specialities Predicted Ligand Binding Sites ModBase contains a list of the binding sites of known structure for ∼ ligands found in the PDB. Forty-four percent of the models in MODBASE have at least one predicted binding site for a small ligand. Application of MODBASE to Structural Genomics NYSGXRC structures, PSI-BLAST E-value

Access and Interface MODBASE is queryable PDB codes, Swiss-Prot/TrEMBL and GenPept accession numbers, annotation keywords, model reliability, model size, target–template sequence identity, alignment significance, and sequence similarity to the modeled sequences as detected by BLAST The output of a search is displayed on pages with varying amounts of information about the modeled sequences, template structures, alignments and functional annotations. These tables also contain links to other sequence, structure and function annotation databases, such as PDB (4), GenBank (3), Swiss- Prot/TrEMBL (2), CATH (32), Pfam (33), ProDom (34), and UCSC Genome Browser (35). In addition, MODBASE models are directly accessible from the Swiss- Prot/TrEMBL sequence pages at and UCSC Genome Browser at Enter PDB, Swiss-Prot GenPept…etc here. Press “Search”! 1 2

Search Select any of your interests. Select your purpose, e.g., “Find Ligand binding site”

Homolog Structure Prediction of Ligand Binding Sites

Reference Ursula, Pieper et al. MODBASE, a database of annotated comparative protein structure models, and associated resources. Nucleic Acids Res January 1; 32(Database issue): D217–D222.