Digital Archives for Molecular Microscopy A community database for biological research Christoph Best European Bioinformatics Institute, Cambridge, UK.

Presentation transcript:

Digital Archives for Molecular Microscopy A community database for biological research Christoph Best European Bioinformatics Institute, Cambridge, UK Matthew T. Dougherty NCMI - Baylor College of Medicine Houston, Texas

Bioimage Informatics: informatics in support of biological imaging. Why? Image data are rapidly increasing: (confocal) fluorescence microscopy (Cellular B.); EMDB: electron microscopy (Structural Biology); high-throughput methods (Genome Biology). Enabling science by making data accessible, reliable, and understandable: standards and conventions, public databases, quality assessment. Open Microscopy Environment; S. Haertel, U. Chile; J. Swedlow, U. Dundee; EMDB, EBI.

Structural Databases at EBI. Protein Data Bank (PDB): atomic structures (positions of atoms); PDB file format, mmCIF; derived from X-ray crystallography; long tradition, curated database; huge: 65,000+ entries, 3 wwPDB sites. Electron Microscopy Data Bank (EMDB): part of PDB at EBI and Rutgers; 600 density maps of macromolecular structures and subcellular complexes; started 2002; curated, but limited metadata and experiment information; XML-based.

SCIENTIFIC BACKGROUND

Electron microscope (from Schweikert, 2004, Biocenter, U. Helsinki)

Single-particle method: Tripeptidyl-peptidase II (TPP II), courtesy of B. Rockel, Martinsried. Molecular structure: many images computationally combined; 3D from 2D; resolution increase by averaging.
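The benefit of averaging can be illustrated with a small simulation. This is a minimal sketch, not the processing used for TPP II: it assumes perfectly aligned particle images whose noise is independent and Gaussian, in which case the residual noise after averaging N images shrinks roughly as 1/sqrt(N). The function name and parameters are hypothetical.

```python
import random
import statistics


def noise_after_averaging(n_images, n_pixels=2000, sigma=1.0, seed=0):
    """Standard deviation of the noise left after averaging n_images.

    Assumption: every pixel has a true signal of 0, and each image adds
    independent Gaussian noise with standard deviation sigma.
    """
    rng = random.Random(seed)
    averaged = []
    for _ in range(n_pixels):
        samples = [rng.gauss(0.0, sigma) for _ in range(n_images)]
        averaged.append(sum(samples) / n_images)
    return statistics.pstdev(averaged)


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        # The printed value should fall roughly as sigma / sqrt(n).
        print(n, round(noise_after_averaging(n), 3))
```

In real data, alignment errors and structural heterogeneity limit the gain, so the square-root behaviour is an upper bound rather than a guarantee.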

Single-particle analysis: GroEL to 4 Å (Ludtke et al., Structure, 2008)

Data Management Issues. Initial EM images: O(1000) images of 4k x 4k pixels -> O(10 GPixel). Particle stacks: O(100,000) particles of 256 x 256 pixels -> O(10 GPixel). Final data set: 1 MVoxel (small). Processing power: O(100) cores for some weeks on lab-owned clusters. Software: 1970s FORTRAN codes, 1990s C codes; fragmented communities, lack of standards.
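The order-of-magnitude figures above can be checked with a few lines of arithmetic. The sketch below assumes that "4k" means 4096 pixels per side; the helper name is hypothetical.

```python
def gigapixels(count, width, height):
    """Total number of pixels in `count` images of width x height, in units of 1e9."""
    return count * width * height / 1e9


# Assumption: "4k x 4k" is taken as 4096 x 4096 pixels.
micrographs = gigapixels(1000, 4096, 4096)
particle_stack = gigapixels(100_000, 256, 256)

# The final map is quoted as about 1 MVoxel, i.e. 0.001 in the same units.
final_map = 1e6 / 1e9

if __name__ == "__main__":
    print(f"micrographs:    {micrographs:.2f} GPixel")
    print(f"particle stack: {particle_stack:.2f} GPixel")
    print(f"reduction from raw data to map: about {micrographs / final_map:.0f}x")
```

Both raw quantities land within a factor of two of the quoted O(10 GPixel), while the deposited result is about four orders of magnitude smaller than the data it was computed from.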

Electron tomography: 3D reconstruction by taking a series of images from different angles. Difficulty: nanometer accuracy. Problems: limited tilt range ↔ missing wedge ⇒ distortion; imperfections of the tilt ↔ alignment ⇒ limited resolution. Computational reconstruction algorithms.
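To make the missing wedge concrete: for a single-axis tilt series covering -a to +a degrees, the unsampled wedge in Fourier space spans 180 - 2a degrees out of 180, so the missing fraction is (90 - a)/90. The sketch below assumes single-axis geometry and evenly spaced tilts; the function names and the example values are illustrative only.

```python
def missing_wedge_fraction(max_tilt_deg):
    """Fraction of Fourier space left unsampled by a single-axis tilt
    series covering -max_tilt_deg .. +max_tilt_deg."""
    if not 0 <= max_tilt_deg <= 90:
        raise ValueError("max_tilt_deg must lie between 0 and 90")
    return (90.0 - max_tilt_deg) / 90.0


def tilt_angles(max_tilt_deg, step_deg):
    """Evenly spaced tilt angles from -max_tilt_deg to +max_tilt_deg."""
    n_steps = int(round(2 * max_tilt_deg / step_deg))
    return [-max_tilt_deg + i * step_deg for i in range(n_steps + 1)]


if __name__ == "__main__":
    # Hypothetical acquisition: +/-60 degrees in 2-degree increments.
    angles = tilt_angles(60, 2)
    print(len(angles), "images")
    print(f"missing wedge: {missing_wedge_fraction(60):.1%} of Fourier space")
```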

Tomography of eukaryotic cells: projection and slice, Dictyostelium discoideum (O. Medalia et al., Science, 2002)

Image enhancement, before: cytoskeleton of Spiroplasma melliferum (J. Kürner et al., Science, 2005)

Image enhancement, after; yellow: geodetic line (J. Kürner et al., Science, 2005)

Automated image analysis: manual vs. automatic. Automatic segmentation to identify points, lines, and surfaces (A. Linaroudis, Ph.D. thesis, 2006)

Data Management Issues. Original data: 60 images of 8k x 8k pixels -> O(4 GPixel). Reconstruction: 8k x 8k x 256 -> O(16 GPixel)? Software: 1970s algorithms in 1990s software. Visualization: “let's buy more memory”. Future: web-based applications (Google Maps)?
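The memory pressure behind “let's buy more memory” follows from a quick estimate. The sketch below assumes "8k" means 8192, that raw images are stored as 16-bit integers, and that the reconstruction is held as 32-bit floats; these storage types are assumptions, not figures from the slides.

```python
def size_gib(n_elements, bytes_per_element):
    """Storage needed for n_elements values, in GiB (2**30 bytes)."""
    return n_elements * bytes_per_element / 2**30


# Assumption: "8k x 8k" is taken as 8192 x 8192.
tilt_series_pixels = 60 * 8192 * 8192
volume_voxels = 8192 * 8192 * 256

if __name__ == "__main__":
    print(f"tilt series:    {tilt_series_pixels / 1e9:.1f} GPixel, "
          f"{size_gib(tilt_series_pixels, 2):.1f} GiB at 16 bit")
    print(f"reconstruction: {volume_voxels / 1e9:.1f} GVoxel, "
          f"{size_gib(volume_voxels, 4):.1f} GiB at 32 bit")
```

Under these assumptions the reconstructed volume alone would not fit in the main memory of a typical workstation, which is why tiled, web-based viewing in the style of map applications is raised as a possible alternative.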

The Electron Microscopy Data Bank: contains EM-derived density maps; complementary to coordinate sets in PDB; established at EBI (Kim Henrick); web-based submission and retrieval; hand-curated (R. Newman). A bit like eBay, and you won't make any money, either.

THE ELECTRON MICROSCOPY DATA BANK

A Unified Data Resource for EM: NIH-funded joint project. Baylor College of Medicine, Houston, TX (W. Chiu, M. Baker); Rutgers University, Piscataway, NJ (H. Berman, C. Lawson); PDBe, European Bioinformatics Institute, Cambridge, UK (K. Henrick, C. Best, R. Newman).

Characteristics. Curated community archive: PDB and EMDB. NIH, EU (in the past), and BBSRC funding (+ EMBL). Worldwide cooperation. Advisory boards and task forces from the community. Open deposition and retrieval → alternative access systems by other institutions. 760 entries, 26 GB of data, ca. 100 entries/year. Curation both in Europe and the US.

Growth of EMDB

EMDep deposition system: 750 entries, current rate approx /month. Contents of an entry: metadata (XML header) → experimental metadata; map (any format, converted to CCP4/MRC); additional files. Java/Tomcat/XML.
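Because every deposited map is converted to CCP4/MRC, basic properties can be read directly from the fixed 1024-byte header of that format. The sketch below is not EMDep code. It relies only on the commonly documented layout (the first four 32-bit words are NX, NY, NZ, and MODE, and bytes 208-211 hold the stamp "MAP "), assumes little-endian byte order, and handles only a few common data modes. The function name is hypothetical.

```python
import struct

# Bytes per voxel for a few common MRC data modes
# (0: 8-bit integer, 1: 16-bit integer, 2: 32-bit float, 6: unsigned 16-bit).
BYTES_PER_VOXEL = {0: 1, 1: 2, 2: 4, 6: 2}


def read_mrc_header(header):
    """Extract grid size and data mode from the first 1024 bytes of a map.

    Assumption: the file is little-endian; big-endian files would need ">4i".
    """
    if len(header) < 1024:
        raise ValueError("an MRC/CCP4 header is 1024 bytes long")
    nx, ny, nz, mode = struct.unpack("<4i", header[:16])
    voxel_bytes = BYTES_PER_VOXEL.get(mode)
    return {
        "nx": nx,
        "ny": ny,
        "nz": nz,
        "mode": mode,
        # Older files may lack the stamp, so report it instead of failing.
        "has_map_stamp": header[208:212] == b"MAP ",
        "data_bytes": None if voxel_bytes is None else nx * ny * nz * voxel_bytes,
    }


if __name__ == "__main__":
    # Build a synthetic header for a 64^3 float map instead of reading a real file.
    synthetic = bytearray(1024)
    struct.pack_into("<4i", synthetic, 0, 64, 64, 64, 2)
    synthetic[208:212] = b"MAP "
    print(read_mrc_header(bytes(synthetic)))
```

For a real entry, the same function could be applied to the first 1024 bytes read from the map file in binary mode.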

Unified data resource plan

Joint deposition system

EMDB search system Java/Tomcat

EMDB Atlas pages XSLT

ISSUES

Metadata management. Difficult: many rounds of consulting the community, yet most fields still remain empty. Data harvesting: LIMS, PIMS -> rarely used; processing pipelines, image processing software -> lack of standards, idiosyncrasies. Image formats: appalling lack of standards.

Data issues. Current: deposit the final result of experiment and computation. How much of the original/intermediate data should be deposited? Issues: cost/practicability; reproducibility of experiment; intellectual property (unexploited results?); usefulness.

Non-data issues. Embargo: image data can be withheld for up to two years, which allows the original researcher to exploit them further. Journals and funders must define what data must be deposited and when they are to be released. Quality standards: require community acceptance; technically difficult. The data bank does enrich/annotate, but does not do science → quality standards must be set by scientists.

Image data formats. Current: variety of historical ad hoc formats; unclear definitions, variations between different software. Need: interoperability standards. Technical level? Acceptance? → Question for the community. HDF5: common container format to deal with numerical data; heavyweight library, but widely available (but Java?); would at least solve low-level format problems; metadata format still needs to be specified.

Ontologies: a systematic way to define classes of objects, attributes of these objects, and relationships between objects. Provides a framework for metadata models. Advantage: powerful formal method. Disadvantage: not yet widely used.
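The three ingredients named above (classes, attributes, relationships) can be sketched in a few lines. This is a toy illustration rather than an existing EM ontology: all class, attribute, and relation names below are hypothetical, and a real project would more likely use a dedicated ontology language and tooling.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class OntologyClass:
    """A class of objects with typed attributes and an optional parent class."""
    name: str
    parent: Optional["OntologyClass"] = None
    attributes: Dict[str, type] = field(default_factory=dict)

    def is_a(self, other):
        """True if this class is `other` or one of its descendants."""
        node = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False

    def all_attributes(self):
        """Own attributes plus everything inherited from parent classes."""
        inherited = self.parent.all_attributes() if self.parent else {}
        return {**inherited, **self.attributes}


@dataclass
class Relation:
    """A named relationship allowed between a subject class and an object class."""
    name: str
    domain: OntologyClass
    range_: OntologyClass

    def allows(self, subject, obj):
        return subject.is_a(self.domain) and obj.is_a(self.range_)


# Hypothetical vocabulary, for illustration only.
dataset = OntologyClass("Dataset", attributes={"title": str})
density_map = OntologyClass("DensityMap", parent=dataset,
                            attributes={"voxel_size_angstrom": float})
tomogram = OntologyClass("Tomogram", parent=density_map)
specimen = OntologyClass("Specimen", attributes={"organism": str})
derived_from = Relation("derived_from", domain=dataset, range_=specimen)

if __name__ == "__main__":
    print(sorted(tomogram.all_attributes()))
    print(derived_from.allows(tomogram, specimen))
```

The point of the formalism is that a metadata record can be checked mechanically: a relation between two records is accepted only if their classes fit the declared domain and range.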

TECHNICAL DEVELOPMENTS

Rich data sets. Submissions consist of maps (increasingly more than one); relations between data sets → unexpressed. XML-based standards for representing relationships between data: subject-predicate-object relationships (RDF framework). Harvesting interface to EM processing software. Web-based visualization for submission and retrieval; complex submissions assembled interactively (AJAX).
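The subject-predicate-object idea can be shown without any RDF library. In the minimal sketch below, the relations between the files of one submission are stored as plain triples and queried by pattern; the identifiers and predicate names are hypothetical and do not come from the EMDB schema.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return all triples matching the given pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# Hypothetical submission with several related data sets.
triples = [
    ("map-2", "filteredVersionOf", "map-1"),
    ("mask-1", "appliesTo", "map-1"),
    ("model-1", "fittedInto", "map-1"),
    ("map-1", "reconstructedFrom", "particle-stack-1"),
]

if __name__ == "__main__":
    # Everything that refers to the primary map:
    for s, p, o in match(triples, obj="map-1"):
        print(f"{s} --{p}--> {o}")
```

Serialized as RDF/XML or a similar format, such statements would make explicit the relations that are currently left unexpressed in a multi-map submission.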

Rich data submissions

Possible XML representation

Bioimage informatics tools. Current EMDB interface: simple and efficient, but must be extended to accommodate more complex experiments. OMERO interface: geared at labs, not public databases; all the beauty of AJAX; high-performance visualization.

Bioimage informatics tools: BISQUE/Bisquik (UCSB): multichannel images, lab notebook, tagging, image markup.

Current Imaging Workflow Paradigm: no standards. Experiment? Image? Analytics? Annotations? (Jason Swedlow, U. Dundee)

Towards Image Informatics

OMERO in 2007/8/9 Jason Swedlow (Univ. Dundee)

CONCLUSIONS

A Virtual Research Community (diagram): users, imaging centers, databases, software, and grid/cloud computing and storage. Labels: acquisition, storage, and management of images; in-house storage; storage and computing engines; data submission; data harvesting; storage; distribution; quality assessment.

CONCLUSIONS: Community databases are a central part of the scientific data infrastructure. Image databases are growing rapidly. Technical challenges: data formats, size. Standards and interoperability. Improve metadata collection. Keep the community engaged.