Introduction to RCSB PDB Data, Tools and Resources

Introduction to RCSB PDB Data, Tools and Resources Maria Dominguez, Shuchismita Dutta, Ph.D.

Learning Objectives Introduction to PDB RCSB PDB Query Explore and Learn Visualize and Analyze

Protein Data Bank (PDB) First open access digital resource in biology (est. 1971 with 7 entries) Single global archive of 3-D macromolecular structures (contains >122,000 entries) US PDB = RCSB PDB Headquartered at Rutgers/UCSD (NSF, NIH, DOE) Part of Worldwide PDB (with EU and Japan) Makes PDB data freely available to all via www.rcsb.org Some of the first few structures in the PDB

Why Use PDB Data?
Visualize: the molecule, its parts or complexes
Analyze: stability and interactions of the molecule; structure-function relationships
Compare Structures: under different conditions; bound to various molecules (ligands or partner proteins); in health and disease
Engineer and Design: mutations, additions, deletions to manipulate the function; facilitate tracking; discover drugs
2hhb/1hho

Where Does the Data Come From? Sample Structural Data Pipeline Target Selection Isolation, Expression, Purification, Crystallization Data Collection Structure Determination PDB Deposition & Release X-ray NMR http://cnx.org/content/col11496/1.6/ Structures in the PDB are experimentally determined by X-ray crystallography, nuclear magnetic resonance (NMR) or electron microscopy (EM). To determine a structure, a target is chosen and an adequate amount of protein is produced and purified. For X-ray structures the protein has to be crystallized, for NMR a concentrated solution has to be prepared, and for EM the sample is placed on a grid. Data are collected in the respective experiment and used to build a model of the protein(s) in the structure. The structural coordinates and the experimental data used to compute the model are submitted to the PDB along with details about the experiment. The RCSB PDB enables users to freely access and use this material. 3D Models Annotations Publications EM You come here www.rcsb.org info@rcsb.org

PDB Data Atomic coordinates and primary experimental data Experimental details - sample preparation, data collection and structure solution Sequence(s) of polymers (proteins and nucleic acids) in the structure Information about ligands in the structure Links to various resources that describe sequence, function and other properties of the molecule. Classification of structures by sequence, structure, function and other criteria A million files like this are downloaded every day. The primary data archived in the PDB are the 3D coordinates of all atoms in the structure. In addition, information about the experiment, the experimental data used to determine the structure, and links to various other bioinformatics databases are included.

Using the RCSB PDB Website … Default View View for Students and Educators What can you do at the RCSB PDB website? Query: find relevant structure(s) Structure Summary: what is in the structure Visualize: what does the structure look like Integrate: to explore structure function relationships

Educational Resources (pdb101.rcsb.org) Resources to help understand biology at molecular and atomic levels Paper Models Animations Posters

Learning Objectives Introduction to PDB RCSB PDB Query Search Browse Explore and Learn Visualize and Analyze

Search and Refine By … Name, PDB ID, keywords Entry properties e.g. author, deposition/release date, citation Sequence Annotations Chemical components (ligands, drugs, etc.)

RCSB PDB Query Reports

Browse By Annotation For example Gene Ontology Source Organism Biological process Cellular component Molecular function Source Organism Molecular Structure SCOP CATH EC numbers Membrane proteins Anatomical Therapeutic Chemical

Search by PDB ID PDB ID: a 4-character identifier for an entry in the Protein Data Bank that is both unique and immutable. PDB IDs are the most direct way to retrieve structures from the database. They are assigned at random at the time of deposition and have no particular meaning. One or more PDB IDs can be typed or copied into the search box; to search for several IDs at once, separate them with commas or line breaks.
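The multiple-ID input described on this slide can be sketched in Python. This is only an illustration, not the RCSB PDB's own parser: the function name `parse_pdb_ids` is hypothetical, and the only validity check assumed is the one the slide states, namely that an ID has four characters (taken here to be letters or digits).

```python
import re


def parse_pdb_ids(text):
    """Split a search-box style string into a list of PDB IDs.

    IDs may be separated by commas or line breaks (spaces are tolerated too).
    Assumption: an ID is valid if it is exactly 4 letters or digits.
    """
    ids = []
    for token in re.split(r"[,\s]+", text.strip()):
        if not token:
            continue  # empty input or stray separator
        if not re.fullmatch(r"[0-9A-Za-z]{4}", token):
            raise ValueError(f"not a 4-character PDB ID: {token!r}")
        token = token.upper()
        if token not in ids:  # drop repeats, keep first-seen order
            ids.append(token)
    return ids


print(parse_pdb_ids("4ins, 2hhb\n1hho"))
```

Normalizing to upper case and dropping repeats are design choices of this sketch, made so that `4ins` and `4INS` count as the same entry.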

Learning Objectives Introduction to PDB RCSB PDB Query Explore and Learn Structure Summary Page Links to Other Resources Visualize and Analyze

Result: Search by PDB ID (4INS) Seen on all pages - for new search from anywhere Searching for a PDB entry by PDB ID will show this page – called Structure Summary Page Entry specific information, details

Tabs on Top of Page
Structure Summary: overall structure information plus details about the composition of the PDB structure
3D View: options to interactively explore the structure
Annotations: information about the PDB structure or its components from other bioinformatics resources
Sequence: sequences of all polymers (protein, DNA, RNA) in the structure, with annotations of secondary structure, mutations, etc.
Sequence Similarity: comparison of the given structure to the entire PDB by sequence
Structure Similarity: comparison of the given structure to the entire PDB by structure
Experiment: details of how the structure of the protein/complex was determined
Literature: list of primary or other articles that reference the given structure
Not discussed here

Structure Summary Page -1 PDB ID Display/Download file Structure/Experiment Description; Validation Summary Visualization of Structure Literature The literature box was introduced on the RCSB PDB website in 2010. With many publications made freely available online, the RCSB PDB took the opportunity to directly link structural data with the literature.
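The Display/Download file option also has a scriptable counterpart. The sketch below assumes that PDB-format files are served from the address pattern `https://files.rcsb.org/download/<ID>.pdb`; that pattern and the helper names `pdb_file_url` and `fetch_pdb_text` are not part of the slides and should be checked against the current RCSB PDB documentation before use.

```python
from urllib.request import urlopen

# Assumption: RCSB PDB serves PDB-format coordinate files at this address pattern.
DOWNLOAD_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"


def pdb_file_url(pdb_id):
    """Build the download address for one entry, e.g. '4ins' -> .../4INS.pdb."""
    return DOWNLOAD_URL.format(pdb_id=pdb_id.strip().upper())


def fetch_pdb_text(pdb_id, timeout=30):
    """Download the coordinate file as text (needs network access)."""
    with urlopen(pdb_file_url(pdb_id), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Only the address is built here; call fetch_pdb_text("4ins") to download.
print(pdb_file_url("4ins"))
```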

PDB Coordinates The PDB archives the 3D coordinates of molecular structures. Clicking on PDB Format downloads the molecule’s 3D coordinates. For each atom, the coordinate file lists the residue name, residue number, X, Y and Z coordinates, occupancy and B-factor. Occupancy is usually 1 (the atom is present at that location) or 0 (it is not present at that location). The same atom can also be listed with two or more occupancy values, indicating alternate positions that arise from conformational flexibility of the molecule. Two equally occupied conformations are written as 0.5/0.5 (50:50); other ratios appear when the conformations are unequally populated. The B-factor describes the temperature-dependent vibration of an atom, as measured in the X-ray crystallography experiment. Lower values indicate that the atom is more reliably located at that position.
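To make the fields listed above concrete, here is a minimal Python sketch that reads one ATOM record using the fixed-column layout of the PDB file format. The function name, the dictionary keys and the example record are illustrative assumptions; the numeric values are made up and are not taken from entry 4INS.

```python
def parse_atom_line(line):
    """Parse one ATOM/HETATM record using the fixed-column PDB layout."""
    return {
        "record": line[0:6].strip(),      # columns 1-6: record type
        "atom_name": line[12:16].strip(), # columns 13-16: atom name
        "alt_loc": line[16].strip(),      # column 17: alternate location flag
        "res_name": line[17:20].strip(),  # columns 18-20: residue name
        "chain_id": line[21],             # column 22: chain identifier
        "res_seq": int(line[22:26]),      # columns 23-26: residue number
        "x": float(line[30:38]),          # columns 31-38: X in angstroms
        "y": float(line[38:46]),          # columns 39-46: Y in angstroms
        "z": float(line[46:54]),          # columns 47-54: Z in angstroms
        "occupancy": float(line[54:60]),  # columns 55-60: occupancy
        "b_factor": float(line[60:66]),   # columns 61-66: B-factor
    }


# Made-up example record, assembled with explicit field widths so that
# every value lands in the column range the parser expects.
example = (
    f"{'ATOM':<6}{1:>5} {' N':<4}{'':1}{'GLY':>3} {'A'}{1:>4}{'':4}"
    f"{-8.863:>8.3f}{16.944:>8.3f}{14.289:>8.3f}{1.0:>6.2f}{21.88:>6.2f}"
)

atom = parse_atom_line(example)
print(atom["res_name"], atom["res_seq"], atom["occupancy"], atom["b_factor"])
```

An atom with alternate positions would appear as two records that differ in the alternate location flag (for example `A` and `B`), each with its own occupancy.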

Experiment Description The angstrom (Å) is a unit of length used to measure distances between atoms: 1 Å = 10⁻¹⁰ m (one ten-billionth of a meter) = 0.1 nm. Resolution is reported in angstroms, and smaller values mean finer detail, so a structure determined at 1.5 Å resolution is of very high quality. Like other hormones, insulin circulates in the blood and regulates glucose uptake by cells and tissues. The insulin used for this experiment came from the organism Sus scrofa (pig). The description box presents a simple yet highly valuable overview of the molecule being analyzed. Experimental Data and Validation (refer to the RCSB PDB website for further details: http://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/introduction): validation allows users to assess structure quality. Does the structure model match the experimental data? Does it agree with prior knowledge about its structure and function? The deposition authors are those responsible for solving this particular structure. The structure was determined using X-ray diffraction.

Structure Summary Page - 2 Macromolecule Entities Small Molecules The blue bars in the Macromolecules box indicate the sequence of each molecule in the PDB entry, matched against the UniProt sequence and annotated with domains and other features. Small molecules, or ligands, may interact covalently or non-covalently with the proteins and DNA/RNA in the structure. A ligand has at least one atom and may have many atoms with complex connectivity and structure. Where available, the ligand’s binding properties (when in complex with the protein/DNA/RNA) are also reported. These data are usually experimentally determined and are drawn from other bioinformatics data resources.

Macromolecular Entities Indicates the number of copies of this protein and their polymer chain identifiers (chain IDs). This row indicates secondary structure: yellow areas represent β-strands and red areas represent α-helices. Mouse over a region to see the exact residues involved in each helix or strand. For each kind of protein (with a different polymer sequence) you can see how the region in the structure maps to the sequence of the complete gene product as listed in UniProt. Other annotations, such as regions of helix and strand, are also marked here. Clicking on the + sign in the bottom left corner of the box opens the Protein Feature View (see next slide), which displays the region of the protein present in the structure compared to the complete gene product. Two copies of Insulin protein Chain B Two copies of Insulin protein Chain A

Protein Feature View PDB IDs and regions of protein included in the experiment Links to UniProt with sequence, function and links to various other bioinformatics resources UniProt, Pfam, etc. annotations This page maps all PDB entries that match the protein sequence listed under the UniProt ID. In this case the UniProt ID is P01315 (pig insulin). The lower panels list the PDB IDs and the regions (domains) that they map to on the UniProt (protein) sequence. This page can be used to identify other relevant structures, e.g. with different domains, mutations, etc.

Structure Summary Page - 3 Experimental Data and Validation Entry History

Learning Objectives Introduction to PDB RCSB PDB Query Explore and Learn Visualize and Analyze 3D Structures Ligands and their neighborhood Analyzing interactions

Visualization Metaphors/Conventions What does a molecule look like? Wireframe Ribbons Before delving into visualization, it helps to understand some basics. The coordinates may be represented as atoms and bonds, ribbons, or surfaces. Combinations of these representations may also be used. All atoms Backbone Spacefill

Visualize: Biological Assembly Deposited coordinates (or Asymmetric Unit) Toggle through various biological assemblies (monomer, dimer, trimer and hexamer) Learn more about Biological Assemblies at http://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/missing-coordinates-and-biological-assemblies

Visualize from Structure Summary Page Clicking on the NGL, JSmol, or PV links takes you to the 3D View tab, where you can explore the molecule visually. These visualization tools can also be reached by opening the 3D View tab and selecting a tool from the drop-down menu next to the image. See next slide

3D View (Jmol/JSmol) Display symmetry/assembly Jmol/JSmol is one of the most popular online visualization tools and can be used directly, without installing any visualization software package. Structure display options Explore interactions

Explore Ligand Interactions Small-molecule ligands (ions, cofactors, inhibitors or drugs) found in a structure are usually important for its structure and/or function. Exploring the interactions around ligands can highlight amino acid residues critical to the protein’s function. Exploring the kinds of interactions (hydrogen bond, hydrophobic, charge-based, etc.) can help in understanding the protein’s mechanism of action. Analyze interactions around ligand Using online resources, the environment of key ligands can be explored as shown above. The coordinates may also be visualized using other software, e.g. Chimera, PyMOL, etc., for analysis and for making publication-quality images.
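The idea of a ligand's neighborhood can also be expressed as a simple distance search. The following is a rough Python sketch under stated assumptions: the function name `residues_near_ligand`, the 4.0 Å cutoff and the toy coordinates are all illustrative and do not describe how the RCSB PDB website or any particular viewer selects neighboring residues.

```python
import math


def residues_near_ligand(protein_atoms, ligand_atoms, cutoff=4.0):
    """List residues with at least one atom within `cutoff` angstroms of the ligand.

    protein_atoms: iterable of (chain_id, residue_number, residue_name, (x, y, z))
    ligand_atoms:  iterable of (x, y, z)
    Assumption: a plain distance cutoff; it does not classify the interaction type.
    """
    ligand_atoms = list(ligand_atoms)
    nearby = set()
    for chain_id, res_seq, res_name, position in protein_atoms:
        if any(math.dist(position, lig) <= cutoff for lig in ligand_atoms):
            nearby.add((chain_id, res_seq, res_name))
    return sorted(nearby)


# Toy coordinates, not taken from a real entry.
protein_atoms = [
    ("A", 10, "HIS", (0.0, 0.0, 2.1)),  # 2.1 angstroms from the ligand atom
    ("A", 11, "LEU", (9.0, 0.0, 0.0)),  # 9.0 angstroms away, outside the cutoff
    ("B", 5, "HIS", (0.0, 2.0, 0.0)),   # 2.0 angstroms from the ligand atom
]
ligand_atoms = [(0.0, 0.0, 0.0)]        # e.g. a single bound ion

print(residues_near_ligand(protein_atoms, ligand_atoms))
```

Deciding whether a contact is a hydrogen bond, a hydrophobic contact or a charge-based interaction needs atom types and geometry in addition to distance, which this sketch leaves out.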

Summary Introduction to PDB RCSB PDB Query Search Browse Explore and Learn Structure Summary Page Links to Other Resources Visualize and Analyze 3D Structures Ligands and their neighborhood Analyzing interactions