SERNEC Image/Metadata Database Goals and Components
Steve Baskauf, 2009-11-04

Overall goals
To create a metadata database structure flexible enough to handle specimen data, specimen images, and live plant images. The database will be designed to output easily to consumers, including Morphbank, GBIF, and a SERNEC web portal.
To create contributor interface(s) that allow rapid data entry or transfer with minimal contributor effort.

Conceptual scheme: players
[Diagram. Contributors: institutional databases (via a conversion utility) and contributors without institutional infrastructure. Hub: SERNEC database. Consumers: SERNEC web portal, Morphbank, GBIF.]

General Principles
SERNEC acts as a facilitator:
– Participation in the SERNEC database doesn’t prevent contributors from doing anything they were already doing.
– SERNEC doesn’t “own” anything.
– SERNEC sets minimum standards for participation that allow the system to operate and ensure the quality of the metadata served.
Components in the system are “black boxes” that don’t require participants to understand the other parts.
Interactions among components are governed by generally recognized communication standards: XML, LSIDs or LSID-based HTTP URIs, Darwin Core, and MRTG.
The system should not collapse if any component disappears.

Facts About Persistent Identifiers
Persistent identifiers (universally unique identifiers = UUIDs = GUIDs) are coming.
In a complex system, unique identifiers are needed to determine whether a resource already exists (to prevent the creation of duplicate records).
Use comes with responsibilities:
– Must guarantee uniqueness
– Must guarantee persistence
– Should be actionable (provide metadata to users)
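
A minimal sketch of the duplicate check described above, assuming records are keyed by their persistent identifier. `IdentifierRegistry` is a hypothetical in-memory stand-in, not part of any SERNEC software; a real system would back it with the database.

```python
from typing import Dict, Optional


class IdentifierRegistry:
    """Hypothetical in-memory stand-in for the database's identifier index."""

    def __init__(self) -> None:
        self._records: Dict[str, dict] = {}

    def exists(self, identifier: str) -> bool:
        # Checking before creating is what prevents duplicate records.
        return identifier in self._records

    def register(self, identifier: str, metadata: dict) -> bool:
        """Store a new record; return False and change nothing if the ID is taken."""
        if self.exists(identifier):
            return False
        self._records[identifier] = dict(metadata)
        return True

    def resolve(self, identifier: str) -> Optional[dict]:
        """Return the metadata behind an identifier (the "actionable" part), or None."""
        return self._records.get(identifier)
```

`register` returns `True` the first time an identifier is seen and `False` afterwards, so a repeated submission creates no second record.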

LSID (or HTTP URI) assignment
urn:lsid:<authority>:<namespace>:<object ID>, optionally followed by :<revision>
It appears likely that a resolution service will be provided centrally by a big player such as GBIF, which would then be the authority: gbif.org.
Individual users will be responsible for making sure that their resources have unique string identifiers.
SERNEC will probably have to be the party ensuring that the namespace is unique (by negotiation with the authority).
Some users may generate their own persistent identifiers, and SERNEC will have to accept that.

Strategy for Generating Internal Unique IDs
Each participating institution MUST have unique IDs within each of its collections (this is the <object ID>).
SERNEC keeps a list of institution codes, checked with biocol.org for uniqueness.
If IDs are unique within the institution, <namespace> is institutioncode.
If IDs are unique within each collection but not within the institution, <namespace> is institutioncode_collectioncode.
Internal unique ID = <namespace>:<object ID>
When an authority is willing to handle our GUIDs, we check that each SERNEC namespace is unique within that authority, then concatenate the internal unique ID to the authority part of the LSID.
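
The ID strategy above can be sketched as a few string functions. The institution codes, the `example.org` authority, and the function names are hypothetical; the sketch also assumes that an object ID must not contain a colon, since colons separate LSID parts.

```python
from typing import Optional

# Hypothetical codes standing in for the list SERNEC checks with biocol.org.
REGISTERED_INSTITUTIONS = {"ABC", "XYZ"}


def sernec_namespace(institution_code: str, collection_code: Optional[str] = None) -> str:
    """institutioncode if IDs are unique institution-wide, else institutioncode_collectioncode."""
    if institution_code not in REGISTERED_INSTITUTIONS:
        raise ValueError(f"unregistered institution code: {institution_code!r}")
    if collection_code:
        return f"{institution_code}_{collection_code}"
    return institution_code


def internal_unique_id(namespace: str, object_id: str) -> str:
    """Internal unique ID = <namespace>:<object ID>."""
    # A colon inside the object ID would be confused with the LSID part separator.
    if not object_id or ":" in object_id:
        raise ValueError(f"invalid object ID: {object_id!r}")
    return f"{namespace}:{object_id}"


def to_lsid(authority: str, internal_id: str) -> str:
    """Concatenate the internal unique ID to the authority part of the LSID."""
    return f"urn:lsid:{authority}:{internal_id}"
```

For example, `to_lsid("example.org", internal_unique_id(sernec_namespace("ABC", "BRYO"), "000123"))` gives `urn:lsid:example.org:ABC_BRYO:000123`.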

System component: the database
The structure needs to be able to handle both specimen and live plant images.
It must keep track of the status of resources:
– Are they new, with non-redundant IDs?
– Have they been updated?
– Have the data/metadata been passed on to the consumers?
It should be simple enough, or exportable enough, to outlive SERNEC if necessary.
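
One way to answer the status questions is to fingerprint each record's metadata and remember the fingerprint that was last passed on to consumers. This is a sketch under that assumption; `StatusTracker` and the status names are hypothetical, and a real database would store the fingerprints in columns rather than in dictionaries.

```python
import hashlib
import json
from enum import Enum
from typing import Dict, List


class Status(Enum):
    NEW = "new"              # never passed on to consumers
    UPDATED = "updated"      # changed since it was last passed on
    PASSED_ON = "passed on"  # consumers have the current version


def fingerprint(metadata: dict) -> str:
    """Stable hash of a record's metadata, so edits can be detected by comparison."""
    canonical = json.dumps(metadata, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class StatusTracker:
    def __init__(self) -> None:
        self._current: Dict[str, str] = {}  # record ID -> fingerprint of latest metadata
        self._sent: Dict[str, str] = {}     # record ID -> fingerprint last passed on

    def upsert(self, record_id: str, metadata: dict) -> None:
        self._current[record_id] = fingerprint(metadata)

    def status(self, record_id: str) -> Status:
        current = self._current[record_id]  # raises KeyError for unknown IDs
        if record_id not in self._sent:
            return Status.NEW
        if self._sent[record_id] != current:
            return Status.UPDATED
        return Status.PASSED_ON

    def mark_passed_on(self, record_id: str) -> None:
        self._sent[record_id] = self._current[record_id]

    def pending(self) -> List[str]:
        """Record IDs that still need to be sent to consumers."""
        return [rid for rid in self._current if self.status(rid) is not Status.PASSED_ON]
```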

[Diagram: individual → herbarium specimen → specimen image; individual → live plant image]
Relevant occurrence types are specimens and images.
Record fields are governed by:
– Darwin Core (general specimen and live-plant image metadata)
– MRTG (image-specific specimen and live-plant metadata)
Individuals may be represented by a composite of the relationship types shown if the plant is both imaged directly and collected.
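
The relationship types can be expressed as a small data model, sketched here with Python dataclasses. The class and field names (including `media_uri`) are hypothetical, not Darwin Core or MRTG terms; an individual that is both photographed and collected simply has entries in both lists.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Image:
    image_id: str
    media_uri: str  # hypothetical field: where the image file can be retrieved


@dataclass
class Specimen:
    specimen_id: str
    images: List[Image] = field(default_factory=list)  # specimen images


@dataclass
class Individual:
    individual_id: str
    specimens: List[Specimen] = field(default_factory=list)  # herbarium specimens collected
    live_images: List[Image] = field(default_factory=list)   # images of the living plant

    def all_images(self) -> List[Image]:
        """Live-plant images plus images of every specimen taken from this individual."""
        images = list(self.live_images)
        for specimen in self.specimens:
            images.extend(specimen.images)
        return images
```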

Determination structure:
– compatible with annotations
– compatible with taxonomic concept mapping (multiple possible names)
– capable of tracking the resources used to make a determination
Determinations are linked to standardized taxon units (ITIS TSNs and/or LSIDs).
[Diagram: individual (I) and resource; determination 1 (D1) and determination 2 (D2); taxon 1 (T1) and taxon 2 (T2)]
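
A sketch of a determination structure along these lines, assuming each annotation is stored as a new determination rather than overwriting the old one, and that the most recent determination counts as current. All names are hypothetical; `tsn` stands for an ITIS TSN and `evidence` holds IDs of the resources consulted.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Taxon:
    name: str
    tsn: Optional[int] = None   # ITIS Taxonomic Serial Number, if one exists
    lsid: Optional[str] = None  # or a taxon LSID


@dataclass
class Determination:
    taxon: Taxon
    determined_by: str
    determined_on: date
    evidence: List[str] = field(default_factory=list)  # IDs of resources consulted


@dataclass
class DeterminedIndividual:
    individual_id: str
    determinations: List[Determination] = field(default_factory=list)

    def annotate(self, determination: Determination) -> None:
        # Annotations are appended; earlier determinations are never overwritten.
        self.determinations.append(determination)

    def current(self) -> Optional[Determination]:
        # Assumption: the most recent determination is treated as current.
        return max(self.determinations, key=lambda d: d.determined_on, default=None)

    def candidate_names(self) -> List[str]:
        # Every name ever applied, as input to taxonomic concept mapping.
        return sorted({d.taxon.name for d in self.determinations})
```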

SERNEC database/consumer relationships
SERNEC web portal: regional data, end-user educational resources, facilitation of collaboration
Morphbank: permanent image repository, provider to downstream secondary consumers (e.g. EOL)
GBIF: primary biodiversity database, possible future resolution service for persistent identifiers
[Diagram: SERNEC database → consumers: SERNEC web portal, Morphbank, GBIF]

SERNEC database/web portal
Support Flora of the Southeast or successor web documentation efforts.
Provide user-friendly mechanisms for searching for data and images; organize “courtesy requests” for non-commercial use of large numbers of images.
Provide access to data-driven educational/research applications, e.g. visual keys, iPhone data apps, teacher lesson plans.

SERNEC database/Morphbank
Be capable of generating the XML needed by Morphbank for image submission.
Query Morphbank services to determine whether a contributor has already uploaded the image to Morphbank.
Update Morphbank image records if a contributor changes metadata.
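
A sketch of generating submission XML with Python's standard `xml.etree.ElementTree`. The element and field names are placeholders, not Morphbank's actual submission schema, which would have to be followed exactly; the sketch only shows the shape of the conversion.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable

# Hypothetical element names; Morphbank's real schema should be used instead.
IMAGE_FIELDS = ("specimenId", "viewName", "fileName")


def image_submission_xml(records: Iterable[Dict[str, str]]) -> str:
    """Serialize image records from the SERNEC database into one XML document."""
    root = ET.Element("imageSubmission")
    for record in records:
        image = ET.SubElement(root, "image", attrib={"id": record["imageId"]})
        for name in IMAGE_FIELDS:
            if record.get(name):  # omit empty fields rather than emit empty elements
                ET.SubElement(image, name).text = record[name]
    return ET.tostring(root, encoding="unicode")
```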

SERNEC database/GBIF
Provide primary biodiversity records to GBIF using IPT/TAPIR for institutions not capable of maintaining their own services.
Assuming that at some point GBIF or another organization provides resolution services for organizations not capable of acting as LSID authorities, data from the SERNEC database would be passed to that resolution provider for use in LSID resolution.
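
A sketch of mapping SERNEC records onto Darwin Core term names as a flat CSV table. The term names are real Darwin Core terms, but the choice of terms is an assumption, and the code implements neither TAPIR nor the IPT; it only illustrates the field mapping that such publishing would need.

```python
import csv
import io
from typing import Dict, Iterable

# A small subset of Darwin Core term names; which terms to publish is an assumption.
DWC_TERMS = ["occurrenceID", "basisOfRecord", "institutionCode",
             "collectionCode", "catalogNumber", "scientificName"]


def to_darwin_core_csv(records: Iterable[Dict[str, str]]) -> str:
    """Write records as a CSV table whose header row uses Darwin Core term names."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_TERMS,
                            extrasaction="ignore",  # drop internal, non-DwC fields
                            lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(record)  # missing terms become empty cells
    return buffer.getvalue()
```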

SERNEC database/provider relationships
Contributors without institutional infrastructure: SERNEC-created web-based tools would allow users with limited record-keeping capabilities and IT infrastructure to submit metadata and images.
Contributors with institutional infrastructure: SERNEC would create customized conversion utilities that accept database output in various formats and convert it to a form the SERNEC database can recognize.
[Diagram: institutional database → conversion utility → SERNEC database; contributors without institutional infrastructure → SERNEC database]

SERNEC/Contributors without IT infrastructure
Users would be responsible for:
– Collecting and organizing their own metadata using software (e.g. Specify or Excel) capable of simple text (CSV or tab-delimited) or Excel output.
– Maintaining identifiers (strings) that are unique within their institution.
SERNEC-provided software would generate LSIDs, convert metadata to fit the SERNEC database data model, and facilitate the association of images with metadata.
It is assumed that contributors will have little or no interaction with consumers (GBIF, Morphbank) outside of that facilitated by SERNEC.
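
A sketch of the import step for these contributors, assuming the local identifier sits in a column named `catalogNumber` (hypothetical) and that tab- versus comma-delimited files can be told apart from the header row. Native Excel files are not handled here; they would first be saved as CSV or tab-delimited text.

```python
import csv
import io
from typing import Dict, List

ID_COLUMN = "catalogNumber"  # hypothetical name of the contributor's local-ID column


def read_contributor_table(text: str) -> List[Dict[str, str]]:
    """Parse comma- or tab-delimited text and enforce unique local identifiers."""
    lines = text.splitlines()
    header = lines[0] if lines else ""
    delimiter = "\t" if "\t" in header else ","  # simple heuristic on the header row
    rows = []
    seen = set()
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        local_id = (row.get(ID_COLUMN) or "").strip()
        if not local_id:
            raise ValueError(f"row without a {ID_COLUMN}: {row}")
        if local_id in seen:
            raise ValueError(f"duplicate {ID_COLUMN}: {local_id}")
        seen.add(local_id)
        rows.append(row)
    return rows
```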

SERNEC/contributors with IT infrastructure
Contributors may have their own systems for:
– maintaining a complex metadata database
– generating LSIDs and either maintaining their own authority or transmitting metadata directly to another institution acting as the authority (e.g. GBIF)
– managing specimen and live-plant images and associating them with the appropriate metadata in their database
A conversion utility enables the SERNEC database to “talk” to the contributor’s system and update the SERNEC database.
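
A conversion utility can be as simple as a per-institution field mapping, sketched here under that assumption. `EXAMPLE_MAPPING` and its source column names are hypothetical; `catalogNumber`, `scientificName`, and `recordedBy` are Darwin Core term names used as the assumed SERNEC field names.

```python
from typing import Dict, Mapping

# Hypothetical mapping for one institution: its column name -> SERNEC field name.
EXAMPLE_MAPPING = {
    "CatNo": "catalogNumber",
    "Taxon": "scientificName",
    "Coll": "recordedBy",
}


def convert_record(source: Mapping[str, str], mapping: Mapping[str, str]) -> Dict[str, str]:
    """Rename mapped fields, trim whitespace, and drop unmapped or empty values."""
    converted = {}
    for source_field, sernec_field in mapping.items():
        value = (source.get(source_field) or "").strip()
        if value:
            converted[sernec_field] = value
    return converted
```

Only the mapping changes from one institution to the next, which keeps each contributor's system a “black box” to the SERNEC database.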

Main points
All the necessary components (standards, contributors, consumer organizations) exist or will exist within the next year.
SERNEC has established relationships with all of the required players.
The players are willing to participate and have a vested interest in seeing the system succeed.
SERNEC has the human, financial, and IT resources to pull this off.
Participants take care of themselves to the maximum extent possible; SERNEC “helps” smaller institutions participate on the same level as bigger players.