Recording Metadata Inbal adir 26/4/17.

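Before we start… Opening: hand out slips of paper and have everyone write down details about how they got to class: how long it took, how they travelled (by car/on foot), where they came from (another class/home), and so on. Collect all the slips and give them to one person. Now ask that person to say how many people came here on foot, how many came from home, and so on.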

Table of contents GSC - The Genomic Standards Consortium MIxS - Minimum Information about any (x) Sequence GCDML - Genomic Contextual Data Markup Language

The Genomic Standards Consortium (GSC)¹ Established in late 2005. Open-membership organization that drives community-based standardization activities. Working towards better descriptions of genomes and metagenomes through community-level, consensus-driven solutions.

Why standardization? Thousands of genomes. Hundreds of metagenomes. Tens of thousands of marker gene data sets. And rapidly increasing… These data sets should be treated as part of a larger whole: a catalogue of life on earth.

Goals Implementation of new genomic standards. Methods of capturing and exchanging the information captured in these standards. Harmonization of information collection and analysis efforts across the wider genomics community

Working groups Maintenance of an extensible markup language (GCDML). Development of tools and software. Compliance and curation. Biodiversity.

More… Create a journal, “Standards in Genomic Sciences”. Participate in the ‘Microbial Earth Project’, which calls for the coordinated sequencing of over 9,000 type strains. Participate in the ‘M5 Project’, which calls for the coordinated development of a next-generation computational infrastructure.

MIxS² Minimum Information about any (x) Sequence

MIxS Created by GSC. Three minimum-information checklists, to be filled in upon submission of: Genomes (MIGS), Metagenomes (MIMS), Environmental marker sequences (MIMARKS). Researchers wishing to conduct comparative genomic studies need an adequate description of the environmental context and the experimental methods used.

Main reasons for creating MIxS Testing hypotheses using comparative evo- and eco- genomic approaches. Allow useful grouping, sorting and searching of genomes in databases. Growth in genome sequence data from environmental isolates and metagenomes.

MIxS contains: Information that cannot be calculated from the raw genomic sequence. Core descriptors specific to the major taxonomic groups (eukaryotes, bacteria and archaea, plasmids, viruses, organelles) and metagenomes.

Minimum Information about a Metagenome Sequence (MIMS) checklist version 2.0 Project name. Environment: Geographic location: latitude and longitude (float), depth and altitude of sample (integer). Time of sample collection (UTC). Habitat (EnvO). Water body: temperature, pH, salinity, pressure, chlorophyll, conductivity, light intensity, dissolved organic carbon (DOC), atmospheric data, density, alkalinity, dissolved oxygen, particulate organic carbon (POC), phosphate, nitrate, sulfates, sulfides, primary production.

Minimum Information about a Metagenome Sequence (MIMS) checklist version 2.0 Nucleic acid sequence source: Isolation and growth conditions. Biomaterial treatment (e.g., filtering of sea water). Volume of sample (integer). Sampling strategy (enriched, screened, normalized). Nucleic acid preparation (extraction method; amplification). Library construction (library size (integer), number of reads sequenced (integer), vector). Sequencing method. Assembly: assembly method, estimated error rate and method of calculation.
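To make the checklist concrete, here is a minimal Python sketch of a record holding a handful of the MIMS fields listed above, with a few sanity checks. The class name `MimsRecord`, the field names, and the specific checks are assumptions made for illustration; they are not part of the MIMS specification or of any GSC tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass
class MimsRecord:
    """Hypothetical container for a few MIMS checklist fields."""
    project_name: str
    latitude: float            # float, as on the checklist
    longitude: float           # float, as on the checklist
    depth: int                 # integer, as on the checklist (unit not specified here)
    collected_at: datetime     # expected to be timezone-aware UTC
    habitat: str               # an EnvO term
    sequencing_method: str
    sample_volume: Optional[int] = None
    water_body: Dict[str, float] = field(default_factory=dict)  # e.g. {"pH": 8.1}

    def problems(self) -> List[str]:
        """Return readable problems; an empty list means the checks passed."""
        found = []
        if not self.project_name.strip():
            found.append("project name is empty")
        if not -90.0 <= self.latitude <= 90.0:
            found.append("latitude out of range")
        if not -180.0 <= self.longitude <= 180.0:
            found.append("longitude out of range")
        # A naive datetime has utcoffset() None, so it is flagged as well.
        if self.collected_at.utcoffset() != timedelta(0):
            found.append("collection time is not UTC")
        return found


record = MimsRecord(
    project_name="Demo project",
    latitude=32.1,
    longitude=34.8,
    depth=5,
    collected_at=datetime(2017, 4, 26, 9, 0, tzinfo=timezone.utc),
    habitat="sea water",
    sequencing_method="shotgun",
    water_body={"pH": 8.1, "salinity": 39.0},
)
print(record.problems())
```

Keeping the checks in one method means a submission can be rejected with a full list of what is missing or malformed, rather than failing on the first bad field.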

GCDML³ Genomic Contextual Data Markup Language

GCDML Also created by GSC. Free. XML Schema for generating MIGS/MIMS-compliant reports for data entry, exchange, and storage. XML is widely used to describe data capture and exchange formats, with applications in: Sequence annotation. Protein modeling.

GCDML application Tracking the geographic origin and habitat of a sample. Comparative ecological genomic studies. Capture the pathogenicity of a sequenced organism. Describe the host–microbiome relationship in human microbiome studies.

Aspects of GCDML Scope: what to put in? Technical: how to build it?

Scope Defined by the MIGS/MIMS checklist. GCDML will provide the GSC’s official implementation of the checklist. Support different subsets of descriptors across taxa (eukaryotes vs. bacteria vs. viruses or metagenomes).

Aspects of GCDML Scope: what to put in? Technical: how to build it?

Technical Compliant with MIGS/MIMS, while allowing richer expression. Support the integration of term lists. Allow the recording of legacy data, even when fields are missing. Open and extensible to allow evolution of the MIGS/MIMS specification. Open to link, map, or incorporate other standards. Support versioning.

<nasReport> nasReport = Nucleic Acid Sequence report. Root element for the MIGS reports of all five taxa. Acts as a container for any number of reports per XML document.

<originalSample> Elements of a <nasReport>: <originalSample>: sample collection. <isolate>: single organism. <dnaExtract>: DNA extraction. <DNALibrary>: DNA library. <sequencing>: sequencing details, including assembly.
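As a rough illustration of how these elements might appear in a document, the Python sketch below builds a small XML tree with the standard library. Only the element names come from the slide; the flat nesting under <nasReport>, the `id` attribute, and the text content are assumptions for illustration and are not taken from the actual GCDML schema.

```python
import xml.etree.ElementTree as ET


def build_report(sample_id: str, organism: str) -> ET.Element:
    """Build a toy report using the element names from the slide."""
    root = ET.Element("nasReport")
    # The "id" attribute is hypothetical, not a documented GCDML attribute.
    ET.SubElement(root, "originalSample", {"id": sample_id})
    isolate = ET.SubElement(root, "isolate")
    isolate.text = organism
    # Empty placeholders for the remaining processing steps.
    ET.SubElement(root, "dnaExtract")
    ET.SubElement(root, "DNALibrary")
    ET.SubElement(root, "sequencing")
    return root


report = build_report("sample-1", "Example organism")
xml_text = ET.tostring(report, encoding="unicode")
print(xml_text)
```

A real GCDML document would be validated against the published XML Schema, which defines the true nesting and the allowed content of each element.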

<simpleType> Enumeration of valid terms. Syntactic restriction (free-text fields are kept to an absolute minimum).
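The idea of replacing free text with an enumeration of valid terms can be sketched in Python with an `Enum`. The three terms are the sampling strategies named on the MIMS checklist slide; the class and function names are hypothetical, and this is an analogy to an XML Schema enumeration rather than GCDML itself.

```python
from enum import Enum


class SamplingStrategy(Enum):
    """Valid terms, taken from the MIMS checklist slide."""
    ENRICHED = "enriched"
    SCREENED = "screened"
    NORMALIZED = "normalized"


def parse_sampling_strategy(raw: str) -> SamplingStrategy:
    """Accept only the enumerated terms; anything else raises ValueError."""
    return SamplingStrategy(raw.strip().lower())


print(parse_sampling_strategy(" Enriched "))

try:
    parse_sampling_strategy("a bit of everything")
except ValueError as error:
    print("rejected:", error)
```

Rejecting unknown terms at entry time is what makes later grouping, sorting, and searching reliable.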

<union> Integrate noncompliant legacy data. A special simple type with an enumeration that indicates where and why data are missing. Explicitly state reasons for missing data throughout the schema.
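A union lets a field hold either a real value or an explicit reason why the value is absent. The Python sketch below mimics that idea for an integer depth field. The reason terms and all names here are assumptions chosen for illustration; the actual GCDML enumeration of missing-data reasons may differ.

```python
from enum import Enum
from typing import Union


class MissingReason(Enum):
    """Hypothetical reasons for missing data; not the GCDML term list."""
    NOT_COLLECTED = "not collected"
    NOT_APPLICABLE = "not applicable"
    UNKNOWN = "unknown"


# A depth is either an integer or a stated reason for its absence.
DepthValue = Union[int, MissingReason]


def parse_depth(raw: str) -> DepthValue:
    """Return a MissingReason if the text names one, otherwise an int."""
    text = raw.strip().lower()
    for reason in MissingReason:
        if reason.value == text:
            return reason
    # int() raises ValueError for anything that is neither case.
    return int(text)


print(parse_depth("25"))
print(parse_depth("Not collected"))
```

With this shape a legacy record that lacks a depth can still be stored, and a later query can distinguish "never measured" from "does not apply" instead of seeing an empty field.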

Summary GSC MIxS GCDML

References
1. Field, Dawn, et al. "The genomic standards consortium." PLoS Biol 9.6 (2011): e1001088.
2. Field, Dawn, et al. "The minimum information about a genome sequence (MIGS) specification." Nature biotechnology 26.5 (2008): 541-547.
3. Kottmann, Renzo, et al. "A standard MIGS/MIMS compliant XML Schema: toward the development of the Genomic Contextual Data Markup Language (GCDML)." OMICS A Journal of Integrative Biology 12.2 (2008): 115-121.