PSI Structural Genomics Knowledgebase Helen M. Berman Bottlenecks Workshop April 14, 2008.



PSI SG Knowledgebase

Knowledgebase Vision
- The PSI Structural Genomics Knowledgebase (PSI SG KB) will turn the products of the PSI effort into major advances in knowledge that can be used to understand living systems and human disease
- The PSI SG KB will be a key resource for the advancement of biology, biochemistry, functional genomics, pharmacology, bioinformatics, chemistry, education and clinical medicine

Knowledgebase Goals

To provide a “marketplace of ideas” that
- connects protein sequence information to 3D structures and homology models
- enhances functional annotations
- provides access to new experimental protocols and materials

To kick-start and enable advancements in structural genomics
- by communicating the information and technology advances of the PSI and making them visible and accessible
- through presentation and discussion of the most provocative challenges with the general community
- by fostering community collaborations

Scope

To capture, make accessible, and highlight elements of the high-throughput pipelines for general use in the community, and to leverage such information through the generation of hundreds of thousands of molecular models and functional annotations. Standard metrics will be used to measure progress.

[Diagram] Pipeline stages: Genomic Based Target Selection; Isolation, Expression, Purification, Crystallization; Data Collection; Structure Determination; PDB Deposition & Release
[Diagram] Knowledgebase components: Target Selection, Experimental Tracking, Materials, Models, Annotations, Publications, Metrics, Technology

Knowledgebase Users
- Biologists
- Biochemists
- Functional Genomicists
- Pharmacologists
- Bioinformaticians
- Chemists
- Clinical Researchers and Physicians
- Teachers and Students

KB Site Features News and Events Molecules of Unknown Function Link to Functional Sleuth Gallery Featured Structure Link to Technology Module Technology Feature Search by - Sequence - Keyword - PDB ID

PSI SG KB Portal
- Collects sequences, common features, and common identifiers
- Maintains correspondences in local database
- Delivers aggregate reports, inventories, and e-publications which contain links to PSI projects, modules and external resources
- Delivers featured articles describing: PSI news and events, featured molecules and technologies, molecules of unknown function
- Provides collaborative environments for discussion, annotation, and target suggestions
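The "maintains correspondences" and "delivers aggregate reports" steps above can be pictured with a small sketch. This is only an illustration and not the portal's actual implementation: the `ResourceEntry` type, the helper functions, the identifiers and the sequences are all hypothetical, and matching on identical sequence strings is an assumed simplification.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceEntry:
    """One record collected from a resource (hypothetical structure)."""
    resource: str    # resource name, e.g. "TargetDB" or "PDB"
    identifier: str  # made-up identifier within that resource
    sequence: str    # protein sequence, used here as the shared key


def build_correspondences(entries):
    """Group entries from different resources that share the same sequence."""
    by_sequence = defaultdict(list)
    for entry in entries:
        by_sequence[entry.sequence].append(entry)
    return dict(by_sequence)


def aggregate_report(correspondences, sequence):
    """Return sorted (resource, identifier) links known for one sequence."""
    return sorted((e.resource, e.identifier)
                  for e in correspondences.get(sequence, []))


# Made-up example data, not real TargetDB or PDB records.
entries = [
    ResourceEntry("TargetDB", "HYPO-TARGET-1", "MKTAYIAK"),
    ResourceEntry("PDB", "HYPO-PDB-1", "MKTAYIAK"),
    ResourceEntry("TargetDB", "HYPO-TARGET-2", "MSLLNV"),
]
correspondences = build_correspondences(entries)
report = aggregate_report(correspondences, "MKTAYIAK")
print(report)
```

A query by sequence then returns every linked record in one report, which is the kind of cross-resource view the portal slide describes.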

[Diagram: PSI SG KB Portal data flow]
- Queries: PDB ID, Sequence, Keyword
- Sources: PSI Modules, PSI Centers, PSI Info Site, Related Biological Resources, Archival Sequence Databases, Domain Databases (Pfam), Literature (PubMed), TargetDB, PepcDB, PDB
- PSI SG KB Portal Databases: TargetDB Sequences, PDB Sequences, Portal Resource Database, Keyword Database
- Models Portal

Modules

Modules derived from PSI information and external resources
- Target Selection & Experimental Data Tracking
- Materials Repository
- Models
- Annotation
- Metrics
- Technology
- Outreach

Target Selection & Experimental Data Tracking
- Target Selection – PSI-2 BIG4
  - Family definitions and target management
- TargetDB
  - Search by sequence, Target ID, project site, status, update date, protein name, and source organism
  - Links to other sequence databases, domain databases, other structural genomics centers, and PDB
  - Download target data
  - Target statistics summary
- PepcDB
  - All the functionality of TargetDB plus
    - Experimental protocols
    - Detailed status history of experimental trials
    - Information on failed experiments
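As a rough illustration of the TargetDB features listed above (search by field and a target statistics summary), here is a minimal in-memory sketch. It does not reflect TargetDB's real schema or interface: the `Target` fields are taken from the search options on the slide, while the record values, status strings and function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, fields


@dataclass
class Target:
    """A target record with a subset of the searchable fields (assumed layout)."""
    target_id: str
    project_site: str
    status: str
    protein_name: str
    source_organism: str


SEARCHABLE = {f.name for f in fields(Target)}


def search_targets(targets, **criteria):
    """Return targets whose fields match every criterion, ignoring case."""
    unknown = set(criteria) - SEARCHABLE
    if unknown:
        raise ValueError(f"Unknown search field(s): {sorted(unknown)}")
    return [
        t for t in targets
        if all(getattr(t, name).lower() == value.lower()
               for name, value in criteria.items())
    ]


def status_summary(targets):
    """Count targets per status, a simple stand-in for a statistics summary."""
    return Counter(t.status for t in targets)


# Made-up records for illustration only.
targets = [
    Target("HYPO-1", "Center A", "Crystallized", "hypothetical protein", "Escherichia coli"),
    Target("HYPO-2", "Center A", "In PDB", "kinase", "Homo sapiens"),
    Target("HYPO-3", "Center B", "In PDB", "hydrolase", "Escherichia coli"),
]
hits = search_targets(targets, source_organism="escherichia coli", status="in pdb")
print([t.target_id for t in hits])
print(status_summary(targets))
```

Sequence search is left out on purpose: it would need an alignment tool rather than exact field matching.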

PSI SG Knowledgebase Experimental Tracking PepcDB Search Form Protocol Keywords Search


Experimental Tracking Module


Materials Repository

PSI SG Knowledgebase PSI Materials Repository Module


Modeling Portal

Current Phase 1 Model Portal
- Contains models from 4 PSI centers and 2 public model databases (SwissModel and ModBase), integrated on a common UniProt reference system.
- The current release consists of 5.8 million comparative protein models for 1.97 million distinct UniProt entries.
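The two release figures above imply an average number of models per UniProt entry. The short calculation below uses only those two rounded numbers from the slide; the variable names are illustrative.

```python
# Release figures quoted on the slide (rounded values).
models = 5_800_000    # comparative protein models
entries = 1_970_000   # distinct UniProt entries

# Average number of models available per UniProt entry.
models_per_entry = models / entries
print(f"{models_per_entry:.2f} models per UniProt entry on average")
```

Because the inputs are rounded, the result is only indicative: roughly three models per entry.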

PSI SG Knowledgebase Modeling Portal

Metrics Module
- Provides objective measures of the progress and output of the PSI project
- Centered around “Goals and Milestones” document

PSI-2 Summary Statistics (updated April 1, 2008)

I.1.A  Number of novel experimental PSI-2 structures: 1031
I.1.B  Number of distinct experimental PSI-2 structures (non-redundant sequences): 1428
I.1.D  Total number of experimental PSI-2 structures: 1628
I.1.E  Numbers of experimentally determined distinct residues
       Numbers of experimentally determined novel residues
I.2.J  Number of experimental structures of human proteins: 61
I.2.K  Number of experimental structures of eukaryotic proteins: 186
I.2.M  Number of experimental structures of membrane proteins: 1
I.2.N  Number of experimental structures determined at the atomic level using X-ray crystallography: 1484
       Number of experimental structures determined at the atomic level using NMR methods: 144
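The counts in these statistics can be cross-checked with simple arithmetic. The sketch below uses only numbers from the slide; the dictionary keys are illustrative labels, and treating the X-ray and NMR counts as a partition of the total is an assumption that the figures happen to satisfy.

```python
# Counts copied from the PSI-2 summary statistics (updated April 1, 2008).
stats = {
    "novel": 1031,     # I.1.A novel experimental structures
    "distinct": 1428,  # I.1.B distinct structures
    "total": 1628,     # I.1.D total experimental structures
    "xray": 1484,      # I.2.N X-ray crystallography
    "nmr": 144,        # I.2.N NMR methods
}

# Structures counted by method should add up to the reported total.
by_method = stats["xray"] + stats["nmr"]
print("X-ray + NMR:", by_method, "| reported total:", stats["total"])

# Share of all structures that are counted as novel, in percent.
novel_share = 100 * stats["novel"] / stats["total"]
print(f"Novel structures: {novel_share:.1f}% of total")
```

The method counts add up to the reported total of 1628, so the table is internally consistent on that point.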

PSI-2 Summary Statistics for Domain and Modeling Leverage

I.1.C  Number and Size of BIG Domain Families for which PSI-2 provides the first Experimental Structure Representative: 474
       Number and Size of MEGA Domain Families for which PSI-2 provides the first Experimental Structure Representative: 399
I.1.E  Numbers of Experimentally Determined Distinct BIG Family Residues
       Numbers of Experimentally Determined Distinct MEGA Family Residues
I.3.A  Total Modeling Leverage
I.3.B  Novel Modeling Leverage

Updated January 15, 2008
Updated February 21, 2008

Technology Module

PSI Centers are actively developing technologies and methodologies for all aspects of the structure determination pipeline

[Diagram] Pipeline stages: Genomic Based Target Selection; Isolation, Expression, Purification, Crystallization; Data Collection; Structure Determination; PDB Deposition & Release; Functional Annotation; Publication

Technology Module Progress
- Phase 1 Technology Portal in place
- Summary Information from all PSI Centers
- Keyword search from KB portal


Outreach Module

Provides information to the public about the products and accomplishments of the PSI
- Media reports
- Publications
- Community activities
- Plans for a Nature Gateway


Current Annotation Module
- 10 PSI Interactive Services for Sequence, Structure and Functional Annotations
- 11 PSI Galleries and Summaries of Sequence, Structure and Functional Annotations
- 35 other resources for annotation

Provides paths to unravel sequence, structure, function relationships

PSI SG Knowledgebase Annotation Module


Biological Annotation of Novel Proteins
March 7, Calit2, UCSD
- Participants
  - PSI groups
  - Annotation system authors
  - General biological community
- Outcome
  - Recommendations for standard annotations
  - Processes for community input

Standard Annotations
- Genomic features: gene identifier, name and synonyms, operon/regulon mappings
- Protein sequence features: amino acid sequence, taxonomy and phylogeny, sequence database accession, isoforms, SNPs, PTMs, sequence families, residue conservation
- Structure features: oligomeric state; structural and functional domains; DNA-binding motifs; nests and clefts; sites of interaction; residue regions of protein-protein and ligand-protein interaction; catalytic sites; secondary structure; structural neighbors and comparison of groups of structures with a common feature; properties and features mapped to 3D and their similarities (e.g. electrostatics, cavities, conserved residues, quality assessment)
- Ligands: chemical structure, interactions, functional role
- Functional classification: GO, FunCat, EC, epitope mapping, cellular location, organ location, substrate specificity, disease involvement
- Mapping to biological systems: mapping to networks and pathways (e.g. Reactome, KEGG, HPRD, BioCyc, NetPath, MINT, MIPS, DIP, STRING, STITCH, PROLINKS)
- Literature: synonyms for protein names, links to PubMed by database identifier and related text and authors

Future Improvements

Experimental Data Tracking
- Standardization of the protocols in PepcDB
- PepcDB data deposition tool
- Integration with the Materials Repository

Materials Repository
- Searchable database of clones
- Ordering system
- Integration with PepcDB and PSI SGKB

Models Module
- Public web service interface
- Additional quality assessment
- Interactive homology modeling

Future Improvements (continued)

Technology Module
- Improved navigation over technology topic areas
- Keyword search option of descriptions and publications

PSI SGKB
- Integration with Nature Gateway
- Simple presentation and search of standard annotations
- Incorporation of data about ligands and modified residues
- Molecular visualization tool

Acknowledgements

KB Team
- Wendy Tao
- Raship Shah
- James Chun
- John Westbrook

Modules
- Torsten Schwede (Models)
- Andrei Kouranov (Exp. Data Tracking)
- Paul Adams (Technology)
- Wladek Minor (Publications)
- Josh La Baer (Materials)
- Rajesh Nair (Metrics)

Access Information