Published by Nelson Ira Simmons. Modified over 9 years ago.

SERNEC Image/Metadata Database: Goals and Components
Steve Baskauf, 2009-11-04

Overall goals
To create a metadata database structure that is flexible and can handle specimen data, specimen images, and live plant images. The database will be designed to easily output to consumers including Morphbank, GBIF, and a SERNEC web portal.
To create contributor interface(s) that will allow rapid data entry or transfer with minimal contributor effort.

Conceptual scheme: players
[Diagram. Contributors: institutional database (via conversion utility); contributors without institutional infrastructure. Center: SERNEC database. Consumers: SERNEC Web portal, Morphbank, GBIF.]

General Principles
SERNEC acts as a facilitator.
– Participation in the SERNEC database doesn’t prevent contributors from doing anything that they were already doing
– SERNEC doesn’t “own” anything
– SERNEC sets minimum standards for participation that will allow the system to operate and that will ensure the quality of the metadata served
Components in the system are “black boxes” that don’t require participants to understand other parts
Interactions among components are governed by generally recognized standards for communication: XML, LSIDs or LSID-based HTTP URIs, Darwin Core, MRTG
System should not collapse if any component disappears.

Facts About Persistent Identifiers
Persistent identifiers (universally unique identifiers = UUIDs = GUIDs) are coming.
In a complex system, unique identifiers are needed to determine whether a resource exists already (to prevent creation of duplicate records)
Use comes with responsibilities:
– Must guarantee uniqueness
– Must guarantee persistence
– Should be actionable (provide metadata to users)

LSIDs (or HTTP URI) assignment
urn:lsid:<authority>:<namespace>:<object id>
or
http://authority.org/urn:lsid:<authority>:<namespace>:<object id>
It appears likely that a resolution service will be provided centrally by a big player like GBIF, i.e. they will be the authority: gbif.org.
Individual users will be responsible for making sure that their resources have unique string identifiers.
SERNEC is probably going to have to be the party ensuring that the namespace is unique (by negotiation with the authority)
Some users may generate their own persistent identifiers, and SERNEC will have to accept them.
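
To make the two identifier forms on this slide concrete, here is a minimal Python sketch that splits an LSID into its authority, namespace, and object parts and rebuilds the LSID-based HTTP URI. It assumes the three-part form only (the optional LSID revision part is ignored); the function names and the sample namespace and object values are hypothetical, and `http://authority.org/` is simply the placeholder proxy from the slide.

```python
from typing import NamedTuple


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str


def parse_lsid(text: str) -> Lsid:
    """Split 'urn:lsid:authority:namespace:object' into its three parts."""
    parts = text.split(":")
    # The 'urn' and 'lsid' prefixes are treated as case-insensitive here.
    if len(parts) != 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not a three-part LSID: {text!r}")
    if not all(parts[2:]):
        raise ValueError(f"empty LSID component in {text!r}")
    return Lsid(*parts[2:])


def to_http_uri(lsid: Lsid, proxy: str = "http://authority.org/") -> str:
    """Prefix the LSID with an HTTP proxy to get the LSID-based HTTP URI."""
    return f"{proxy}urn:lsid:{lsid.authority}:{lsid.namespace}:{lsid.object_id}"


# Hypothetical example values; only 'gbif.org' comes from the slide.
example = parse_lsid("urn:lsid:gbif.org:XYZ:12345")
print(example.namespace)
print(to_http_uri(example))
```

Keeping the parts separate is what lets a namespace be checked for uniqueness within an authority before any identifier is published.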

Strategy for Generating Internal Unique IDs
Each participating institution MUST have unique IDs within each of their collections (this is the <object id>)
SERNEC keeps a list of institution codes checked with biocol.org for uniqueness.
If IDs are unique within the institution, <namespace> is institutioncode
If IDs are unique within a collection but not within the institution, <namespace> is institutioncode_collectioncode
Internal Unique ID = <namespace>:<object id>
When an authority is willing to handle our GUIDs, we check to make sure that each SERNEC namespace is unique within their authority, then concatenate the internal unique ID to the authority part of the LSID.
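
The rules on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SERNEC's actual tooling: the function names are hypothetical, the institution and collection codes in the example are made up, and the sketch reads the internal unique ID as the namespace and the locally unique ID joined by a colon.

```python
def make_namespace(institution_code: str, collection_code: str = "",
                   unique_within_institution: bool = True) -> str:
    """Choose the namespace according to where the local IDs are unique."""
    if unique_within_institution:
        return institution_code
    if not collection_code:
        raise ValueError("a collection code is needed when IDs are only unique per collection")
    return f"{institution_code}_{collection_code}"


def internal_unique_id(namespace: str, local_id: str) -> str:
    """Internal unique ID: namespace and local ID joined by a colon."""
    return f"{namespace}:{local_id}"


def to_lsid(authority: str, internal_id: str) -> str:
    """Concatenate the internal unique ID to the authority part of an LSID."""
    return f"urn:lsid:{authority}:{internal_id}"


# Hypothetical institution 'ABC' whose IDs are only unique per collection.
ns = make_namespace("ABC", "VASC", unique_within_institution=False)
internal = internal_unique_id(ns, "000123")
print(internal)
print(to_lsid("gbif.org", internal))
```

Because the authority is added last, internal IDs can be assigned now and turned into LSIDs later, once an authority agrees to handle them.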

System component: the database
Structure needs to be able to handle both specimen and live plant images
Must keep track of the status of resources
– Are they new with non-redundant IDs?
– Have they been updated?
– Has the data/metadata been passed on to the consumers?
Should be simple enough or exportable enough to outlive SERNEC if necessary
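
One way to answer the three status questions above is to keep, for each resource ID, a digest of its current metadata plus the digest last sent to each consumer. The Python sketch below is an assumption about how that bookkeeping could work, not a description of the SERNEC schema; the class names, the use of SHA-256 over JSON, and the sample IDs are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field


def _digest(metadata: dict) -> str:
    """Stable fingerprint of a metadata record (key order does not matter)."""
    return hashlib.sha256(json.dumps(metadata, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class ResourceRecord:
    metadata_hash: str
    # consumer name -> metadata hash that was last delivered to it
    delivered: dict = field(default_factory=dict)


class Registry:
    def __init__(self) -> None:
        self._records = {}

    def submit(self, resource_id: str, metadata: dict) -> str:
        """Return 'new', 'updated', or 'unchanged' for an incoming record."""
        digest = _digest(metadata)
        record = self._records.get(resource_id)
        if record is None:
            self._records[resource_id] = ResourceRecord(digest)
            return "new"
        if record.metadata_hash != digest:
            record.metadata_hash = digest
            return "updated"
        return "unchanged"

    def pending(self, consumer: str) -> list:
        """Resource IDs whose current metadata has not reached this consumer."""
        return [rid for rid, rec in self._records.items()
                if rec.delivered.get(consumer) != rec.metadata_hash]

    def mark_delivered(self, resource_id: str, consumer: str) -> None:
        record = self._records[resource_id]
        record.delivered[consumer] = record.metadata_hash
```

A record that changes after delivery shows up in `pending` again, which is the trigger for pushing an update to that consumer.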

[Diagram labels: Individual, Herbarium specimen, Specimen image; Individual, Live plant image, Specimen image]
Relevant occurrence types are specimens & images
Record fields governed by:
– Darwin Core (general specimen & live-plant image metadata)
– MRTG (image-specific specimen & live-plant metadata)
Individuals may be represented by a composite of the relationship types shown if the plant is both imaged directly and collected.

Determination structure compatible with annotations
Determination structure compatible with taxonomic concept mapping (multiple possible names)
Determination structure capable of tracking resources used to make determination
Determinations linked to standardized taxon units (ITIS TSNs and/or LSIDs)
[Diagram labels: Individual (I), resource, determination 1 (D1), determination 2 (D2), taxon 1 (T1), taxon 2 (T2)]
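
The relationships in this diagram (one individual, several determinations, each pointing at a taxon and at the resources it was based on) could be modeled roughly as follows. This Python sketch is only an illustration: the class and field names are hypothetical, and no real ITIS TSNs or LSIDs are filled in.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Taxon:
    name: str
    itis_tsn: Optional[int] = None   # ITIS taxonomic serial number, if known
    lsid: Optional[str] = None       # taxon LSID, if known


@dataclass
class Determination:
    taxon: Taxon
    determined_by: str
    date: str                        # ISO 8601 date, e.g. '2009-11-04'
    based_on: list = field(default_factory=list)  # IDs of resources examined


@dataclass
class Individual:
    individual_id: str
    resources: list = field(default_factory=list)       # specimen and image IDs
    determinations: list = field(default_factory=list)

    def add_determination(self, det: Determination) -> None:
        """Attach a determination; earlier ones are kept, never overwritten."""
        unknown = [r for r in det.based_on if r not in self.resources]
        if unknown:
            raise ValueError(f"determination cites unknown resources: {unknown}")
        self.determinations.append(det)

    def names(self) -> list:
        """All names ever applied to this individual, in the order added."""
        return [d.taxon.name for d in self.determinations]

    def latest(self) -> Optional[Determination]:
        """Most recent determination by date (ISO dates sort as strings)."""
        return max(self.determinations, key=lambda d: d.date, default=None)
```

Storing every determination, instead of a single name field, is what keeps the structure compatible with annotations and with multiple possible names for the same plant.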

SERNEC database/consumer relationships
SERNEC web portal: regional data, end-user educational resources, facilitation of collaboration
Morphbank: permanent image repository, provider to downstream secondary consumers (e.g. EOL)
GBIF: primary biodiversity database, possible future resolution service for persistent identifiers
[Diagram: SERNEC database feeding the consumers SERNEC Web portal, Morphbank, and GBIF]

SERNEC database/web portal
Support Flora of the Southeast or successor web documentation efforts
Provide user-friendly mechanisms for searching for data and images; organize “courtesy requests” for non-commercial use of large numbers of images
Provide access to data-driven educational/research applications, e.g. visual keys, iPhone data apps, teacher lesson plans

SERNEC database/Morphbank
Generate the XML needed by Morphbank for image submission.
Query Morphbank services to determine whether the contributor has already uploaded the image to Morphbank.
Update Morphbank image records if the contributor changes metadata.

SERNEC database/GBIF
Provide primary biodiversity records to GBIF using IPT/TAPIR protocol for institutions not capable of maintaining their own services.
Assuming at some point in the future GBIF or another organization provides resolution services for organizations not capable of acting as LSID authorities, data from the SERNEC database would be passed to the resolution provider to be used for LSID resolution.

SERNEC database/provider relationships
Contributors without institutional infrastructure: SERNEC-created web-based tools would allow users having limited record-keeping capabilities and IT infrastructure to submit metadata and images
Contributors with institutional infrastructure: SERNEC would create customized conversion utilities that would accept database output of various formats and convert them to a form that can be recognized by the SERNEC database
[Diagram: institutional database (via conversion utility) and contributors without institutional infrastructure feeding the SERNEC database]

SERNEC/Contributors without IT infrastructure
Users would be responsible for:
– Collecting and organizing their own metadata using software (e.g. Specify or Excel) capable of simple text (CSV or tab-delimited) or Excel output.
– Maintaining identifiers (strings) that are unique within their institution.
SERNEC-provided software would generate LSIDs and convert metadata to fit the SERNEC database data model, as well as facilitating the association of images with metadata
It is assumed that contributors will have little or no interaction with consumers (GBIF, Morphbank) outside of that facilitated by SERNEC
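
The conversion step described above (simple text in, records that fit the data model out) might look something like this Python sketch. It is a sketch under assumptions: the contributor column headers on the left of `COLUMN_MAP` are invented for illustration, the targets on the right are standard Darwin Core term names, and the real SERNEC data model is not specified in these slides.

```python
import csv
import io

# Hypothetical contributor headers mapped to Darwin Core terms.
COLUMN_MAP = {
    "Accession": "catalogNumber",
    "Species": "scientificName",
    "Collector": "recordedBy",
}


def convert(text: str, column_map: dict, delimiter: str = ",") -> list:
    """Convert CSV or tab-delimited text to a list of Darwin Core style records.

    Raises ValueError if a row has no identifier or repeats an earlier one,
    since contributors are responsible for identifiers unique within their
    institution.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    records = []
    seen = set()
    # Data rows start on line 2 because line 1 holds the column headers.
    for row_number, row in enumerate(reader, start=2):
        record = {term: (row.get(column) or "").strip()
                  for column, term in column_map.items()}
        identifier = record.get("catalogNumber", "")
        if not identifier:
            raise ValueError(f"row {row_number}: missing identifier")
        if identifier in seen:
            raise ValueError(f"row {row_number}: duplicate identifier {identifier!r}")
        seen.add(identifier)
        records.append(record)
    return records


sample = "Accession,Species,Collector\n001,Acer rubrum,J. Doe\n002,Quercus alba,J. Doe\n"
for rec in convert(sample, COLUMN_MAP):
    print(rec)
```

Passing `delimiter="\t"` handles tab-delimited exports with the same mapping.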

SERNEC/contributors with IT infrastructure
Contributors may have their own system for:
– maintaining a complex database for metadata
– generating LSIDs and either maintaining their own authority or transmitting metadata directly to another institution acting as the authority (e.g. GBIF)
– managing specimen and live-plant images and associating them with the appropriate metadata in their database
Conversion utility enables the SERNEC database to “talk” to the contributor’s system and update the SERNEC database

Main points
All the necessary components (standards, contributors, consumer organizations) exist or will exist within the next year.
SERNEC has established relationships with all of the required players.
Players are willing to participate and have a vested interest in seeing it succeed.
SERNEC has the human, financial, and IT resources to pull this off.
Participants take care of themselves to the maximum extent possible; SERNEC “helps” smaller institutions to participate on the same level as bigger players.