1
Recording Metadata Inbal adir 26/4/17
2
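Before we start… Opening: hand out slips of paper and have everyone write down details of how they got to class: how long it took, how they travelled (by car or on foot), where they came from (another class or home), and so on. Collect all the slips and give them to one person. Then ask that person to say how many people came here on foot, how many came from home, and so on.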
3
Table of contents
GSC - The Genomic Standards Consortium
MIxS - Minimum Information about any (x) Sequence
GCDML - Genomic Contextual Data Markup Language
4
The Genomic Standards Consortium (GSC)¹
Established in late 2005. Open-membership organization that drives community based standardization activities. Working towards better descriptions of genomes and metagenomes through community level, consensus-driven solutions.
5
Why standardization?
Thousands of genomes, hundreds of metagenomes, and tens of thousands of marker gene data sets, and rapidly increasing. These data sets should be treated as part of a larger whole: a catalogue of life on Earth.
6
Goals Implementation of new genomic standards.
Methods of capturing and exchanging the information captured in these standards. Harmonization of information collection and analysis efforts across the wider genomics community
7
Working groups Maintenance of an extensible markup language (GCDML).
Development of tools and software. Compliance and curation. Biodiversity.
8
More… Created a journal, “Standards in Genomic Sciences”.
Participates in the ‘Microbial Earth Project’, which calls for the coordinated sequencing of over 9,000 type strains. Participates in the ‘M5 Project’, which calls for the coordinated development of a next-generation computational infrastructure.
9
MIxS² Minimum Information about any (x) Sequence
10
MIxS
Created by the GSC. Three minimum information checklists that describe sequences upon submission: genomes (MIGS), metagenomes (MIMS), and environmental marker sequences (MIMARKS). Researchers wishing to conduct comparative genomic studies need an adequate description of the environmental context and the experimental methods used.
11
Main reasons for creating MIxS
Testing hypotheses using comparative evo- and eco- genomic approaches. Allow useful grouping, sorting and searching of genomes in databases. Growth in genome sequence data from environmental isolates and metagenomes.
12
MIxS contains: Information that cannot be calculated from the raw genomic sequence. Core descriptors specific to the major taxonomic groups (eukaryotes, bacteria and archaea, plasmids, viruses, organelles) and to metagenomes.
14
Minimum Information about a Metagenome Sequence (MIMS) checklist version 2.0
Project name. Environment: geographic location (latitude and longitude as floats; depth and altitude of sample as integers); time of sample collection (UTC); habitat (EnvO). Water body: temperature, pH, salinity, pressure, chlorophyll, conductivity, light intensity, dissolved organic carbon (DOC), atmospheric data, density, alkalinity, dissolved oxygen, particulate organic carbon (POC), phosphate, nitrate, sulfates, sulfides, primary production.
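The typed descriptors on this slide can be sketched as a small record type. This is only an illustration, not part of MIMS or GCDML: the class name `EnvironmentRecord`, the field names, and the example values are all hypothetical, and only a handful of the listed descriptors are included.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class EnvironmentRecord:
    """Hypothetical container for a few MIMS environment descriptors."""

    project_name: str
    latitude: float            # decimal degrees (float, as on the slide)
    longitude: float           # decimal degrees (float, as on the slide)
    depth: int                 # integer, as on the slide; the unit is an assumption
    collection_time: datetime  # must be timezone-aware; normalized to UTC
    habitat: str               # an EnvO term (label or identifier)

    def __post_init__(self):
        # Reject coordinates that cannot exist on Earth.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError(f"latitude out of range: {self.latitude}")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError(f"longitude out of range: {self.longitude}")
        # A naive timestamp cannot be converted to UTC reliably.
        if self.collection_time.tzinfo is None:
            raise ValueError("collection_time must be timezone-aware")
        self.collection_time = self.collection_time.astimezone(timezone.utc)


# Made-up example values, for illustration only.
record = EnvironmentRecord(
    project_name="example project",
    latitude=32.07,
    longitude=34.78,
    depth=5,
    collection_time=datetime(2017, 4, 26, 9, 30, tzinfo=timezone.utc),
    habitat="sea water",
)
```

Storing the coordinates and time as typed values, rather than free text, is what makes the "how many samples came from X" kind of question answerable by a query.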
15
Minimum Information about a Metagenome Sequence (MIMS) checklist version 2.0
Nucleic acid sequence source: Isolation and growth conditions. Biomaterial treatment (e.g., filtering of sea water). Volume of sample (integer). Sampling strategy (enriched, screened, normalized). Nucleic acid preparation (extraction method; amplification). Library construction (library size (integer), number of reads sequenced (integer), vector). Sequencing method. Assembly: assembly method, estimated error rate and method of calculation.
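A checklist like this lends itself to a simple completeness check. The sketch below is a simplification under stated assumptions: it treats every item on the slide as required, and the key names in `REQUIRED_FIELDS` and the function `missing_fields` are hypothetical, not taken from the MIMS specification.

```python
# Hypothetical key names, one per checklist item on the slide.
REQUIRED_FIELDS = [
    "isolation_and_growth_conditions",
    "biomaterial_treatment",
    "sample_volume",
    "sampling_strategy",
    "nucleic_acid_preparation",
    "library_construction",
    "sequencing_method",
    "assembly",
]


def missing_fields(report: dict) -> list:
    """Return the checklist items that are absent or empty in a report."""
    return [
        name
        for name in REQUIRED_FIELDS
        if report.get(name) is None or report.get(name) == ""
    ]


# Made-up partial report, for illustration only.
partial_report = {
    "sample_volume": 2,
    "sampling_strategy": "enriched",
    "sequencing_method": "pyrosequencing",
}

for name in missing_fields(partial_report):
    print("missing:", name)
```

A real compliance tool would also need to know which items are mandatory for which kind of report; that information is not on the slide and is left out here.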
16
GCDML³ Genomic Contextual Data Markup Language
17
GCDML Also created by GSC. Free.
XML Schema for generating MIGS/MIMS-compliant reports for data entry, exchange, and storage. XML is widely used as a data capture and exchange format, with applications in sequence annotation and protein modeling.
18
GCDML application Tracking the geographic origin and habitat of a sample. Comparative ecological genomic studies. Capture the pathogenicity of a sequenced organism. Describe the host–microbiome relationship in human microbiome studies.
19
Aspects of GCDML
Scope: what to put in? Technical: how to build it?
20
Scope Defined by the MIGS/MIMS checklist.
GCDML will provide the GSC’s official implementation of the checklist. Support different subsets of descriptors across taxa (eukaryotes vs. bacteria vs. viruses or metagenomes).
21
Aspects of GCDML
Scope: what to put in? Technical: how to build it?
22
Technical
Compliant with MIGS/MIMS, while allowing more expression. Support the integration of term lists. Allow the recording of legacy data, even when fields are missing. Open and extensible to allow evolution of the MIGS/MIMS specification. Open to linking to, mapping to, or incorporating other standards. Support versioning.
23
<nasReport> nasReport = Nucleic Acid Sequence reports
Root element for all five taxa MIGS reports. Acts as a container for any number of reports per XML document.
24
<originalSample>
<nasReport> <originalSample>: sample collection <isolate>: single organism <dnaExtract>: DNA extraction <DNALibrary>: DNA library <sequencing>: sequencing details, including assembly
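The nesting listed on this slide can be sketched with Python's standard `xml.etree.ElementTree`. This only mirrors the element names shown here; the real GCDML schema defines namespaces, attributes, and inner structure that the slide does not show, so the helper `build_report_skeleton` and its empty child elements are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Element names as listed on the slide, in the same order.
CHILD_ELEMENTS = [
    "originalSample",  # sample collection
    "isolate",         # single organism
    "dnaExtract",      # DNA extraction
    "DNALibrary",      # DNA library
    "sequencing",      # sequencing details, including assembly
]


def build_report_skeleton() -> ET.Element:
    """Build an empty <nasReport> with one child per element on the slide."""
    root = ET.Element("nasReport")
    for name in CHILD_ELEMENTS:
        ET.SubElement(root, name)
    return root


skeleton = build_report_skeleton()
xml_text = ET.tostring(skeleton, encoding="unicode")
print(xml_text)
```

A document produced this way is not a valid GCDML report; it only shows how the sample-to-sequence workflow maps onto nested elements.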
25
<simpleType> Enumeration of valid terms.
Syntactical restriction (the use of free-text fields is decreased to an absolute minimum)
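The idea of replacing free text with an enumeration of valid terms can be sketched in Python with an `Enum`. The three terms come from the "sampling strategy" item of the MIMS checklist; the names `SamplingStrategy` and `parse_sampling_strategy` are hypothetical, and this is an analogy to the schema's enumerated simple types rather than GCDML itself.

```python
from enum import Enum


class SamplingStrategy(Enum):
    """Valid terms for the sampling strategy descriptor (from the checklist)."""

    ENRICHED = "enriched"
    SCREENED = "screened"
    NORMALIZED = "normalized"


def parse_sampling_strategy(text: str) -> SamplingStrategy:
    """Accept only enumerated terms; anything else raises ValueError."""
    return SamplingStrategy(text.strip().lower())


strategy = parse_sampling_strategy("Enriched")
```

Rejecting unknown terms at entry time is what keeps later grouping and searching reliable, since "enriched", "Enriched " and "enrichment culture" no longer coexist as distinct free-text values.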
26
<union> Integrate noncompliant legacy data.
Special simple type for enumerations that indicates where and why data is missing. Explicitly state reasons for missing data throughout the schema.
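A union of "a valid value" and "a stated reason for its absence" can be sketched as follows. The reason terms in `MissingReason` and the function `parse_depth` are hypothetical placeholders; the actual terms defined by the GCDML schema are not given on the slide.

```python
from enum import Enum
from typing import Union


class MissingReason(Enum):
    """Hypothetical reasons a legacy record may lack a value."""

    NOT_COLLECTED = "not collected"
    NOT_APPLICABLE = "not applicable"
    UNKNOWN = "unknown"


def parse_depth(text: str) -> Union[int, MissingReason]:
    """Return an integer depth, or the stated reason why it is missing.

    Anything that is neither an integer nor a known reason raises ValueError.
    """
    cleaned = text.strip()
    try:
        return int(cleaned)
    except ValueError:
        return MissingReason(cleaned.lower())


measured = parse_depth("12")
absent = parse_depth("Not collected")
```

The caller can then distinguish a real measurement from an explained gap with a type check, instead of guessing what an empty field means.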
27
Summary GSC MIxS GCDML
28
References
1. Field, Dawn, et al. "The genomic standards consortium." PLoS Biol 9.6 (2011): e
2. Field, Dawn, et al. "The minimum information about a genome sequence (MIGS) specification." Nature Biotechnology 26.5 (2008):
3. Kottmann, Renzo, et al. "A standard MIGS/MIMS compliant XML Schema: toward the development of the Genomic Contextual Data Markup Language (GCDML)." OMICS: A Journal of Integrative Biology 12.2 (2008):