Integrating Live Plant Images with Other Types of Biodiversity Records
Steve Baskauf
Vanderbilt Dept. of Biological Sciences
August 3, 2010
I. Challenges in Biodiversity Informatics
Common interest in databasing metadata. Metadata describe resources and their properties.
Resource: anything that can be assigned an identifier (e.g. a tree, a specimen, an image, a taxon, a name).
Property: a string literal that describes the resource, or a relationship between the subject resource and some other resource.
Example: Vanderbilt Arboretum, with 5935 identified and geolocated trees
Example
Subject resource (a tree):
– literal property: establishmentMeans = “native” (a text string)
– relationship property: depiction, pointing to an object resource (an image)
Relationship “graph”
– the tree (7-314) has establishmentMeans “native”
– the tree (7-314) has depiction image (79657)
Traditional database (typical for specimens): a flat table with the columns Tree ID, Establishment Means, and Image ID (e.g. tree 7-314, native).
Non-“flat” relationships in live-plant imaging
Diagram elements: live tree; whole tree image, leaf image, bark image (standardized views); determination; taxon.
Baskauf and Kirchoff (2008) Vulpia 7:16-30
Duplicate herbarium specimens
Diagram elements: live tree; herbarium specimen at institution A; duplicate herbarium specimen at institution B (same individual); specimen image; determination A and taxon A; determination B and taxon B.
Complex relationships: individual-based organization system
Diagram elements: live tree (individual organism); whole tree image, leaf image, bark image; herbarium specimen; specimen image; determination A and taxon A; determination B and taxon B.
Baskauf (2010) Biodiversity Informatics 7:17-44
II. Building blocks of a Web-based metadata system: design principles
1. We need to be able to unambiguously identify the resources (globally unique identifiers = GUIDs).
2. We need standardized property definitions (e.g. Darwin Core terms).
3. We need a technological solution for communicating properties and relationships to a user anywhere (RDF/XML representation sent to the user via the Internet).
Building block #1: GUIDs
A globally unique identifier (GUID) should be:
1. globally unique
2. actionable
3. persistent
Anyone on the planet should be able to use the GUID to find out about the particular thing that it identifies, forever. That is a pretty tall order (but you can do it)!
1. How do you make an identifier globally unique?
Create a locally unique identifier:
– identifier (catalog number) unique within a collection, e.g. a GIS tree ID number
– namespace (collection code) unique within the institution, e.g. vanderbilt
– combined: vanderbilt/7-314
Make it globally unique by combining it with a domain name that you control, e.g. bioimages.vanderbilt.edu
Complete HTTP URI GUID
Combine “http://” with the other pieces (domain name, namespace, and local identifier).
This identifier looks like a URL! An HTTP URI is a uniform resource identifier as well as a resource locator (web address = URL).
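The assembly of a GUID from its pieces can be sketched in a few lines of Python. This is only an illustration: the function name `make_guid` and the path layout (domain, then namespace, then local identifier) are assumptions, not something the slides specify.

```python
from urllib.parse import quote


def make_guid(domain: str, namespace: str, local_id: str) -> str:
    """Join a domain name, a namespace, and a local identifier into an HTTP URI.

    The domain/namespace/identifier layout is an assumed convention.
    """
    # Percent-encode each path segment so odd characters cannot break the URI.
    path = "/".join(quote(part, safe="") for part in (namespace, local_id))
    return f"http://{domain}/{path}"


# Pieces taken from the slides: domain, collection namespace, tree ID.
guid = make_guid("bioimages.vanderbilt.edu", "vanderbilt", "7-314")
print(guid)
```

Keeping the construction in one function means every identifier is built the same way, which helps when the goal is never to change a URI once it has been issued.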
2. What does actionable mean?
Something happens when you put an actionable GUID in a Web browser (the GUID is “resolved”).
HTTP URIs:
– unlike LSIDs and DOIs, they work in any web browser
– are resolved using existing Internet infrastructure
– are the consensus GUID of the Linked Data (Semantic Web) community
3. Persistent URIs always work
URIs “break” when filenames change. Compare a JavaScript-based URI with a URI that is independent of the delivery method: both eventually lead to the same page, but the second URI is simpler and won’t change.
URIs “break” when domain names disappear: bioblitznashville.org vs. vanderbilt.edu
Planning for URI permanence is important.
How long is “persistent”?
Forever is a pretty long time. The Internet is only 40 years old and the Web only 20.
Plan for your institution and domain name to last at least 10 years.
Don’t change the URI of anything that you are trying to identify!
Building block #2: Standardized property definitions
Recent consensus on metadata terms:
– Dublin Core Metadata Initiative (DCMI) = describes generic resources
– Friend-Of-A-Friend (FOAF) = describes people and their affiliations
– Darwin Core (DwC) = describes biodiversity resources
– Media Resources Task Group (MRTG) = describes media (e.g. images) in a biodiversity context
A property described by a metadata term:
– is an HTTP URI
– has a definition that can be accessed via the Internet
– has an abbreviated form that usually makes sense to humans: the prefix dwc: stands for the Darwin Core namespace URI, so the abbreviated URI for the term is dwc:establishmentMeans
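The link between an abbreviated term and its full HTTP URI can be shown with a small lookup. The sketch below is illustrative: the helper names `expand` and `abbreviate` are made up for this example, while the namespace URIs are the published Darwin Core, FOAF, and Dublin Core terms namespaces.

```python
# Prefix table: abbreviation -> namespace URI.
PREFIXES = {
    "dwc": "http://rs.tdwg.org/dwc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def expand(abbreviated: str) -> str:
    """Turn 'prefix:localName' into the full term URI."""
    prefix, local_name = abbreviated.split(":", 1)
    return PREFIXES[prefix] + local_name


def abbreviate(uri: str) -> str:
    """Turn a full term URI back into 'prefix:localName' when the prefix is known."""
    for prefix, namespace in PREFIXES.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri  # unknown namespace: leave the URI as it is


full_uri = expand("dwc:establishmentMeans")
print(full_uri)
print(abbreviate(full_uri))
```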
Building block #3: Communicating relationships
Resource Description Framework (RDF) graph:
– subject resource (tree), dwc:establishmentMeans, “native”
– subject resource (tree), foaf:depiction, object resource (image)
How do you translate relationships into language a computer can understand?
Resource Description Framework (RDF) graph: dwc:establishmentMeans “native”; foaf:depiction (image)
RDF in XML format (a tiny snippet) expresses the same graph, including the literal “native”.
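To make the translation from graph to RDF/XML concrete, here is a minimal sketch that writes the two statements (establishmentMeans and depiction) as RDF/XML with Python's standard library. The function name and the example.org URIs are hypothetical placeholders; the namespace URIs are the standard RDF, Darwin Core, and FOAF ones. It is not the presentation's own snippet.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DWC = "http://rs.tdwg.org/dwc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Ask ElementTree to use the familiar prefixes when serializing.
for prefix, uri in (("rdf", RDF), ("dwc", DWC), ("foaf", FOAF)):
    ET.register_namespace(prefix, uri)


def tree_to_rdfxml(tree_uri: str, establishment_means: str, image_uri: str) -> str:
    """Serialize one literal property and one relationship property as RDF/XML."""
    root = ET.Element(f"{{{RDF}}}RDF")
    # rdf:about names the subject resource (the tree).
    description = ET.SubElement(
        root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": tree_uri}
    )
    # Literal property: the value is plain text.
    means = ET.SubElement(description, f"{{{DWC}}}establishmentMeans")
    means.text = establishment_means
    # Relationship property: rdf:resource points at the object resource (the image).
    ET.SubElement(
        description, f"{{{FOAF}}}depiction", {f"{{{RDF}}}resource": image_uri}
    )
    return ET.tostring(root, encoding="unicode")


# Hypothetical URIs, used only for illustration.
rdf_xml = tree_to_rdfxml(
    "http://example.org/tree/1", "native", "http://example.org/image/1"
)
print(rdf_xml)
```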
III. Why use a new way to describe metadata?
People are good at figuring out what web pages mean.
Computers (like a GoogleBot) have to guess what the information on a web page means.
The Semantic Web (a.k.a. Web 3.0) offers a way to give information to computers explicitly.
Content Negotiation, part 1
Client (“I am a human”): GET the resource's URI, MIME type: text/html
Web server (“I cannot send this guy a tree!”): returns a web page
Content Negotiation, part 2
Client (“I am a computer”): GET the resource's URI, MIME type: application/rdf+xml
Web server: returns an XML file
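The two exchanges above can be sketched from both sides in Python. This is a simplified illustration: `choose_representation` is a hypothetical helper that ignores quality values (`q=`) in the Accept header, and the example.org URI is a placeholder. The request object is only built here, not sent.

```python
import urllib.request


def choose_representation(accept_header: str) -> str:
    """Server side: pick 'rdf' when the client asks for RDF/XML, otherwise 'html'."""
    # Split "type/subtype;q=0.9, ..." into bare media types.
    media_types = [
        part.split(";")[0].strip().lower() for part in accept_header.split(",")
    ]
    if "application/rdf+xml" in media_types:
        return "rdf"
    return "html"


# Client side: a semantic web client states what it wants in the Accept header.
request = urllib.request.Request(
    "http://example.org/tree/1",  # hypothetical resource URI
    headers={"Accept": "application/rdf+xml"},
)

print(choose_representation(request.get_header("Accept")))
print(choose_representation("text/html,application/xhtml+xml;q=0.9"))
```

The same URI serves both audiences; only the Accept header differs, which is why the identifier can stay stable while the representations change.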
What’s so great about this?
– A computer can crawl the Web and discover metadata about resources that are identified by HTTP URI GUIDs.
– RDF metadata from many sources can be assembled into a database (RDF “triple store”).
– The database can be searched or used to generate web content.
– Source data does not need to be “sent” to the database; any “semantic web client” can retrieve it at will.
– The format is standard; no special communication protocols are required.
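The idea of a triple store can be illustrated with a toy in-memory version. This is a teaching sketch only, not a real RDF database: the class `TripleStore`, its `match` method, and the example.org URIs are all invented for this example.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

DWC_MEANS = "http://rs.tdwg.org/dwc/terms/establishmentMeans"
FOAF_DEPICTION = "http://xmlns.com/foaf/0.1/depiction"


class TripleStore:
    """A toy store of (subject, predicate, object) statements."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Triple]:
        """Return the triples that fit the pattern; None acts as a wildcard."""
        return sorted(
            t
            for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


# Metadata gathered from (hypothetical) sources is pooled in one store.
store = TripleStore()
store.add("http://example.org/tree/1", DWC_MEANS, "native")
store.add("http://example.org/tree/1", FOAF_DEPICTION, "http://example.org/image/1")
store.add("http://example.org/tree/2", DWC_MEANS, "cultivated")

# Search: which resources are native?
native_trees = [s for s, _, _ in store.match(predicate=DWC_MEANS, obj="native")]
print(native_trees)
```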
Why would this benefit me now?
RDF/XML metadata files for numerous resources can be transformed directly into web pages using a single program file (XSLT and/or AJAX).
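As a rough illustration of "one program, many pages", the sketch below turns an RDF/XML record into a small HTML page. It uses Python's ElementTree as a stand-in for XSLT or AJAX, so it shows the idea rather than the technique named on the slide. The function name, the sample record, and the example.org URIs are assumptions.

```python
import html
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DWC = "http://rs.tdwg.org/dwc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"


def rdfxml_to_html(rdf_xml: str) -> str:
    """Render the first rdf:Description in an RDF/XML document as an HTML page."""
    root = ET.fromstring(rdf_xml)
    description = root.find(f"{{{RDF}}}Description")
    if description is None:
        raise ValueError("no rdf:Description element found")
    about = description.get(f"{{{RDF}}}about", "")
    means = description.findtext(f"{{{DWC}}}establishmentMeans", default="unknown")
    image_uris = [
        element.get(f"{{{RDF}}}resource", "")
        for element in description.findall(f"{{{FOAF}}}depiction")
    ]
    # Escape every value so metadata cannot inject markup into the page.
    images = "".join(
        f'<li><img src="{html.escape(uri, quote=True)}" alt="depiction"></li>'
        for uri in image_uris
    )
    return (
        f"<html><body><h1>{html.escape(about)}</h1>"
        f"<p>Establishment means: {html.escape(means)}</p>"
        f"<ul>{images}</ul></body></html>"
    )


# Hypothetical record; any file with the same structure goes through the same function.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dwc="{DWC}" xmlns:foaf="{FOAF}">
  <rdf:Description rdf:about="http://example.org/tree/1">
    <dwc:establishmentMeans>native</dwc:establishmentMeans>
    <foaf:depiction rdf:resource="http://example.org/image/1.jpg"/>
  </rdf:Description>
</rdf:RDF>"""

page = rdfxml_to_html(SAMPLE)
print(page)
```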
Benefits (cont.) Branding in the URI.
Benefits (cont.)
HTTP URI GUIDs provide direct access to metadata about a resource to anyone with Internet access.
– Clickable attribution link in a website
– Reference link in a publication PDF
– Physical QR codes for smartphone access
QR code on a museum display