Steven Perry, Dave Vieglais
Wasabi: Web Applications for the Semantic Architecture of Biodiversity Informatics

Overview
WASABI is a framework for building scientific data networks based on RDF, OWL, and open data access protocols.

Objective
Build a data access network that:
– Can handle many types of objects
– Is resilient to changes in data models
– Refers to objects with GUIDs
– Allows fast and efficient searches
– Allows incremental harvesting
– Simplifies the creation of client software

RDF and OWL
RDF described by OWL allows:
– Machine-readable controlled vocabularies
– A distinction between classes and properties
– Data objects as resources identified by globally unique LSIDs
– Query languages that examine patterns of relationships between objects

Framework Components
– Provides access to RDF data sets through multiple protocols
– Libraries for building client applications
– Web-based client for accessing data on a Wasabi network

[Figure: A Simple Network]

Wasabi Server
– Stores a cached copy of the source data in RDF format, called a data set
– Each data set is bound to one or more protocol handlers
– Standard protocols include OAI, SimpleLSID, and SPARQL
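
The binding between a data set and its protocol handlers can be pictured as a registry keyed by protocol name. The sketch below is a minimal illustration under that assumption only: `DataSet`, `bind`, `handle`, and `count_handler` are hypothetical names, not Wasabi's Java API, and the placeholder handler stands in for real OAI, SimpleLSID, or SPARQL handlers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical model: a data set is a named, cached RDF graph (here just a list
# of N-Triples strings) that one or more protocol handlers can serve.
@dataclass
class DataSet:
    name: str
    triples: list
    handlers: Dict[str, Callable[["DataSet", dict], str]] = field(default_factory=dict)

    def bind(self, protocol: str, handler: Callable[["DataSet", dict], str]) -> None:
        """Bind this data set to a handler for one protocol (e.g. 'oai', 'lsid', 'sparql')."""
        self.handlers[protocol] = handler

    def handle(self, protocol: str, params: dict) -> str:
        """Dispatch a request to the handler bound for the given protocol."""
        if protocol not in self.handlers:
            raise KeyError(f"data set {self.name!r} is not bound to {protocol!r}")
        return self.handlers[protocol](self, params)

def count_handler(ds: DataSet, params: dict) -> str:
    # Placeholder: a real handler would speak OAI-PMH, SimpleLSID, or SPARQL.
    return f"{ds.name}: {len(ds.triples)} triples"

ds = DataSet("specimens", ['<urn:lsid:auth.org:spec:657> <http://example.org/terms#genus> "Heteractis" .'])
ds.bind("sparql", count_handler)
```

Because handlers are looked up per data set, one cached copy of the data can be exposed through several protocols without duplicating it.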

Loading Data: RDF Data
– RDF data can be loaded into Wasabi directly from one or more files
– Wasabi does not assign new LSIDs to loaded RDF
– Wasabi checks whether any data objects are new or have changed, and can scan for deleted data objects
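
One way to detect new, changed, and deleted objects on reload is to fingerprint each object's statements and compare against the fingerprints from the previous load. This is a minimal sketch, assuming the RDF arrives as N-Triples, that a data object is everything sharing one subject URI, and that there are no blank nodes (their labels are not stable across loads). The function names and the property URI are hypothetical, not Wasabi's code.

```python
import hashlib
from collections import defaultdict

def group_by_subject(ntriples_lines):
    """Group N-Triples statements by subject (the first term on each line)."""
    objects = defaultdict(list)
    for line in ntriples_lines:
        line = line.strip()
        if line and not line.startswith("#"):
            objects[line.split(None, 1)[0]].append(line)
    return objects

def fingerprint(statements):
    """Order-independent digest of one data object's statements."""
    return hashlib.sha256("\n".join(sorted(statements)).encode("utf-8")).hexdigest()

def diff_load(cached, incoming_lines):
    """Compare cached {subject: digest} with a fresh load of RDF statements.
    Returns (new, changed, deleted) sets of subjects."""
    incoming = {s: fingerprint(st) for s, st in group_by_subject(incoming_lines).items()}
    new = incoming.keys() - cached.keys()
    changed = {s for s in incoming.keys() & cached.keys() if incoming[s] != cached[s]}
    deleted = cached.keys() - incoming.keys()
    return set(new), changed, set(deleted)

G = "<http://example.org/terms#genus>"  # hypothetical property URI
previous = [f'<urn:lsid:auth.org:spec:1> {G} "Heteractis" .',
            f'<urn:lsid:auth.org:spec:2> {G} "Heteractis" .']
cached = {s: fingerprint(st) for s, st in group_by_subject(previous).items()}
new, changed, deleted = diff_load(cached, [
    f'<urn:lsid:auth.org:spec:1> {G} "Other" .',       # changed
    f'<urn:lsid:auth.org:spec:3> {G} "Heteractis" .',  # new
])                                                     # spec:2 is gone: deleted
```

Storing only a digest per object keeps the comparison cheap, and the same three sets are what an OAI-PMH endpoint needs in order to report changes and deletions to harvesters.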

Loading Data: Non-RDF Data
– Wasabi uses a synchronizer program to generate RDF from SQL output or delimited files
– The synchronizer must know the format of your source data
– Wasabi can assign LSIDs if needed
– Wasabi checks whether any data objects are new or have changed, and can scan for deleted data objects
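
The requirement that the synchronizer "must know about your source data format" can be captured as a mapping from source columns to RDF properties, with an LSID minted from a key column. The sketch below assumes tab-delimited input and N-Triples output. The authority, namespace, column names, and property URIs are all hypothetical placeholders, not Wasabi's actual configuration.

```python
import csv
import io

# Hypothetical configuration: LSID authority/namespace and column-to-property map.
AUTHORITY = "auth.org"
NAMESPACE = "spec"
PROPS = {"genus": "http://example.org/terms#genus",
         "species": "http://example.org/terms#species"}

def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def rows_to_ntriples(delimited_text, id_column="catalog_no", delimiter="\t"):
    """Yield N-Triples for each row, assigning urn:lsid:<authority>:<namespace>:<id>."""
    reader = csv.DictReader(io.StringIO(delimited_text), delimiter=delimiter)
    for row in reader:
        lsid = f"urn:lsid:{AUTHORITY}:{NAMESPACE}:{row[id_column]}"
        for column, predicate in PROPS.items():
            if row.get(column):  # skip empty or missing values
                yield f'<{lsid}> <{predicate}> "{escape_literal(row[column])}" .'

data = "catalog_no\tgenus\tspecies\n657\tHeteractis\t\n"
triples = list(rows_to_ntriples(data))
```

Deriving the LSID from a stable source key means reloading the same row yields the same identifier, which is what lets the change detection treat it as the same data object.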

OAI-PMH
Open Archives Initiative Protocol for Metadata Harvesting
– The Wasabi implementation allows efficient harvesting
– Supports incremental harvesting: “What objects have changed since Oct ?”
– Notifies clients about deletions
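
In OAI-PMH an incremental harvest is a `ListRecords` request with a `from` date. Deleted objects come back as record headers with `status="deleted"`, and long result lists are paged with a `resumptionToken`. The sketch below builds the request URL and splits one response page. The base URL, the `rdf` metadata prefix, the date, and the sample identifiers are assumptions for illustration. Network access is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def list_records_url(base_url, metadata_prefix, from_date=None, token=None):
    """Build an OAI-PMH ListRecords request URL.

    resumptionToken is an exclusive argument, so follow-up requests carry
    only the verb and the token."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # only records changed on or after this date
    return f"{base_url}?{urlencode(params)}"

def parse_list_records(xml_text):
    """Return (updated ids, deleted ids, resumption token or None)."""
    root = ET.fromstring(xml_text)
    updated, deleted = [], []
    for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
        identifier = header.findtext("oai:identifier", namespaces=OAI_NS)
        if header.get("status") == "deleted":
            deleted.append(identifier)   # the client should drop its copy
        else:
            updated.append(identifier)   # new or changed: (re)process metadata
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return updated, deleted, token

# Abbreviated sample response (identifiers and token are made up).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>urn:lsid:auth.org:spec:657</identifier>
      <datestamp>2006-01-20</datestamp></header><metadata/></record>
    <record><header status="deleted"><identifier>urn:lsid:auth.org:spec:12</identifier>
      <datestamp>2006-01-21</datestamp></header></record>
    <resumptionToken>abc123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

A harvester loops until `parse_list_records` returns no token, then remembers the time of the harvest to use as the next `from` value.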

LSID Resolution
Life Science Identifier Metadata Resolution
– Wasabi supports a simple HTTP GET LSID metadata resolution service
– Supports metadata resolution: “What is the RDF metadata for urn:lsid:auth.org:ns:23?”
– Fully compliant LSID resolution is available through a plug-in for the IBM LSID resolver
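
An LSID has the form `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`, as in `urn:lsid:auth.org:ns:23`. The sketch below validates that structure and builds an HTTP GET request for the identifier's RDF metadata. The resolver base URL and the `lsid` query parameter name are assumptions for illustration. They are not the documented SimpleLSID interface.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlencode

class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

def parse_lsid(text: str) -> LSID:
    """Split urn:lsid:<authority>:<namespace>:<object>[:<revision>]."""
    parts = text.strip().split(":")
    if (len(parts) not in (5, 6)
            or parts[0].lower() != "urn" or parts[1].lower() != "lsid"
            or not all(parts[2:5])):
        raise ValueError(f"not a valid LSID: {text!r}")
    return LSID(*parts[2:])

def metadata_request_url(resolver_base: str, lsid_text: str) -> str:
    """Hypothetical HTTP GET URL asking a resolver for an LSID's RDF metadata.
    The 'lsid' query parameter name is an assumption."""
    parse_lsid(lsid_text)  # reject malformed identifiers before any network call
    return f"{resolver_base}?{urlencode({'lsid': lsid_text.strip()})}"
```

A plain GET like this is easy to call from any HTTP client. A fully compliant LSID resolver instead goes through authority discovery and the standard resolution service calls.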

SPARQL Protocol
– SPARQL is the W3C candidate standard for querying RDF
– The SPARQL protocol is bound to HTTP GET
– ASK and SELECT queries return SPARQL XML results
– DESCRIBE and CONSTRUCT queries return RDF/XML results
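
Over HTTP GET, the SPARQL protocol sends the query text in the `query` parameter. Which result format comes back depends on the query form. The sketch below builds the request URL and maps each form to the media type listed above. The endpoint URL is hypothetical. The prologue handling is deliberately simple and does not cope with comments.

```python
import re
from urllib.parse import urlencode

# Result formats per query form, as listed on the slide.
RESULT_TYPES = {
    "SELECT": "application/sparql-results+xml",
    "ASK": "application/sparql-results+xml",
    "DESCRIBE": "application/rdf+xml",
    "CONSTRUCT": "application/rdf+xml",
}

# Leading PREFIX/BASE declarations, which may precede the query form keyword.
PROLOGUE = re.compile(r"(?is)^\s*(?:(?:PREFIX\s+\S*:\s*<[^>]*>|BASE\s+<[^>]*>)\s*)*")

def query_form(query):
    """Return SELECT, ASK, DESCRIBE, or CONSTRUCT, skipping a PREFIX/BASE prologue."""
    body = PROLOGUE.sub("", query, count=1)
    match = re.match(r"(?i)(SELECT|ASK|DESCRIBE|CONSTRUCT)\b", body)
    if not match:
        raise ValueError("unrecognised SPARQL query form")
    return match.group(1).upper()

def sparql_get_request(endpoint, query):
    """Return the HTTP GET URL (query in the 'query' parameter) and the
    result media type to expect for this query form."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return url, RESULT_TYPES[query_form(query)]

url, media_type = sparql_get_request(
    "http://example.org/wasabi/sparql",  # hypothetical endpoint
    "DESCRIBE <urn:lsid:auth.org:person:3424>")
```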

SPARQL Query Language Example
– “What is urn:lsid:auth.org:person:3424?”
  DESCRIBE <urn:lsid:auth.org:person:3424>
– Result: an RDF/XML document (<rdf:RDF xmlns:j.0="…" xmlns:rdf="…"> …) describing the person Steven Perry

SPARQL Query Language Example
– “What is the genus of the specimen urn:lsid:auth.org:spec:657?”
  SELECT ?genus WHERE { <urn:lsid:auth.org:spec:657> <…> ?txname . ?txname <…> ?genus }
– Result: ?genus = “Heteractis”
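
A client reads the genus from a SELECT query's result by parsing the SPARQL Query Results XML Format. In that format each `<result>` holds `<binding name="…">` elements wrapping a `<uri>`, `<literal>`, or `<bnode>`. An ASK query returns a single `<boolean>`. The minimal sketch below keeps only the lexical values and ignores datatypes and language tags. The sample document mirrors the slide's result and is otherwise made up.

```python
import xml.etree.ElementTree as ET

SR = "{http://www.w3.org/2005/sparql-results#}"

def parse_select_results(xml_text):
    """Turn a SPARQL XML results document into a list of {variable: value} dicts."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.iter(f"{SR}result"):
        row = {}
        for binding in result.findall(f"{SR}binding"):
            term = next(iter(binding), None)  # the single uri, literal, or bnode child
            row[binding.get("name")] = term.text if term is not None else None
        rows.append(row)
    return rows

def parse_ask_result(xml_text):
    """Return the boolean answer of an ASK query."""
    return ET.fromstring(xml_text).findtext(f"{SR}boolean") == "true"

SELECT_XML = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="genus"/></head>
  <results>
    <result><binding name="genus"><literal>Heteractis</literal></binding></result>
  </results>
</sparql>"""
rows = parse_select_results(SELECT_XML)
```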

Wasabi Server
OAI, SPARQL, and LSID are standard protocols, so Wasabi services can be used by non-Wasabi clients.

Wasabi Client Library
– Contains client implementations of the protocols used by Wasabi
– Can be included in projects that need to communicate with Wasabi servers
– Provides programmatic access to services, hiding the XML messaging layer
– Provides status and progress listeners
– Can be used to query non-Wasabi implementations of OAI or SPARQL
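
Status and progress listeners follow the observer pattern: the client calls every registered callback as work proceeds, and the caller never touches the message layer. The sketch below is a toy illustration of that pattern only. `HarvestClient`, `add_listener`, the listener signature, and the `fetch_pages` stand-in for the XML messaging layer are hypothetical, not the Wasabi client library's API.

```python
from typing import Callable, Iterable, List

# Hypothetical listener signature: (records_done, records_total, message).
ProgressListener = Callable[[int, int, str], None]

class HarvestClient:
    """Toy client that hides the request/response layer behind one method
    and reports status and progress to registered listeners."""

    def __init__(self, fetch_pages: Callable[[], Iterable[List[str]]]):
        # fetch_pages stands in for the XML messaging layer: it yields
        # successive pages of record identifiers (e.g. one per resumption token).
        self._fetch_pages = fetch_pages
        self._listeners: List[ProgressListener] = []

    def add_listener(self, listener: ProgressListener) -> None:
        self._listeners.append(listener)

    def _notify(self, done: int, total: int, message: str) -> None:
        for listener in self._listeners:
            listener(done, total, message)

    def harvest(self, expected_total: int) -> List[str]:
        records: List[str] = []
        self._notify(0, expected_total, "started")
        for page in self._fetch_pages():
            records.extend(page)
            self._notify(len(records), expected_total, "page received")
        self._notify(len(records), expected_total, "finished")
        return records

events = []
client = HarvestClient(lambda: iter([["a", "b"], ["c"]]))
client.add_listener(lambda done, total, msg: events.append((done, total, msg)))
ids = client.harvest(expected_total=3)
```

Keeping listeners as plain callbacks lets a command-line tool print progress and a GUI update a progress bar from the same client code.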

Wasabi Indexer
– Harvests from one or more RDF sources
– Sources can be Wasabi servers (via OAI), sets of RDF files, etc.
– Multiple types of indices can be fed from a single set of descriptions
– Indexers can filter by object type, etc.
– Indexers should understand incremental updates and deletions
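
An indexer that understands incremental updates and deletions must be able to remove an object's old entries before adding new ones. The sketch below shows this with a small in-memory inverted index that also filters by object type. It is a stand-in for a real Lucene index, not Wasabi's indexer. The class name and the type names `Specimen` and `Person` are hypothetical.

```python
import re
from collections import defaultdict

class KeywordIndex:
    """Minimal inverted index fed by harvested descriptions. It filters by
    object type and applies incremental updates and deletions."""

    def __init__(self, accepted_types):
        self.accepted_types = set(accepted_types)
        self.postings = defaultdict(set)   # term -> set of object ids
        self.terms_of = {}                 # object id -> terms indexed for it

    @staticmethod
    def _tokenize(text):
        return set(re.findall(r"\w+", text.lower()))

    def remove(self, object_id):
        """Apply a deletion: drop every posting for this object."""
        for term in self.terms_of.pop(object_id, set()):
            self.postings[term].discard(object_id)
            if not self.postings[term]:
                del self.postings[term]

    def upsert(self, object_id, object_type, text):
        """Apply a new or changed object; the old entry is replaced, and
        objects of unaccepted types are not indexed."""
        self.remove(object_id)
        if object_type not in self.accepted_types:
            return
        terms = self._tokenize(text)
        self.terms_of[object_id] = terms
        for term in terms:
            self.postings[term].add(object_id)

    def search(self, word):
        return set(self.postings.get(word.lower(), set()))

idx = KeywordIndex({"Specimen"})
idx.upsert("urn:lsid:auth.org:spec:657", "Specimen", "Heteractis specimen")
idx.upsert("urn:lsid:auth.org:person:3424", "Person", "Steven Perry")  # filtered out
```

Tracking the terms indexed per object is what makes updates and deletions cheap. Without it, removing an object would mean scanning every posting list.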

Wasabi Portal
– Customizable user interface that provides access to one or more Wasabi servers
– The default portal requires a Lucene index of harvested data; most portal queries run against the index
– To retrieve and display data objects, the portal makes repeated LSID resolution calls to the servers, so the servers can log access
– The portal automatically configures search forms and renderers based on downloaded OWL ontologies
– Provides simple search, advanced search, ontology browsing, and export of downloaded data to CSV or RDF files
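
One plausible reading of "configures search forms based on downloaded OWL ontologies" is to collect each `owl:DatatypeProperty`, group it by its `rdfs:domain` class, and give the class's search form one field per property, labelled by `rdfs:label`. The sketch below is our own assumption, not the portal's method. It only handles the abbreviated RDF/XML form shown, and a production version would read the ontology with an RDF library such as Jena. The example ontology URIs are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

def search_fields(owl_xml):
    """Map each class URI to (property URI, label) pairs for the
    owl:DatatypeProperty declarations whose rdfs:domain is that class."""
    root = ET.fromstring(owl_xml)
    fields = defaultdict(list)
    for prop in root.iter(f"{{{OWL}}}DatatypeProperty"):
        uri = prop.get(f"{{{RDF}}}about")
        domain = prop.find(f"{{{RDFS}}}domain")
        if uri is None or domain is None:
            continue  # not usable as a form field without a URI and a class
        # Fall back to the URI fragment when no rdfs:label is given.
        label = prop.findtext(f"{{{RDFS}}}label") or uri.rsplit("#", 1)[-1]
        fields[domain.get(f"{{{RDF}}}resource")].append((uri, label))
    return dict(fields)

ONTOLOGY = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/terms#Specimen"/>
  <owl:DatatypeProperty rdf:about="http://example.org/terms#genus">
    <rdfs:domain rdf:resource="http://example.org/terms#Specimen"/>
    <rdfs:label>Genus</rdfs:label>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:about="http://example.org/terms#catalogNumber">
    <rdfs:domain rdf:resource="http://example.org/terms#Specimen"/>
  </owl:DatatypeProperty>
</rdf:RDF>"""
fields = search_fields(ONTOLOGY)
```

Driving the forms from the ontology rather than from hard-coded fields is what lets the portal keep working when the data models change.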

[Figure: A More Realistic Network]

Implementation
– Java 1.5 with Spring, Jena, Lucene, and more
– The server requires a servlet container (Tomcat, WebLogic, etc.)
– The server requires a JDBC-accessible database (MySQL, PostgreSQL, etc.)

Current State
– Server, Client Library, and Indexer components are feature-complete
– Portal is still under development
– Using experimental OWL data models while awaiting the TDWG ontology

Future Plans
– Complete the portal
– Construct the FishNet2 network (25+ servers)
– Construct the PlantCollections network (15+ servers)

Conclusion
WASABI is a framework for building scientific data networks based on RDF, OWL, and open data access protocols.
– RDF allows us to share complex data models
– OWL allows machines to understand the data models and makes it possible to extend them over time
– Standard protocols (OAI, LSID, and SPARQL) allow integration across data networks and with the Semantic Web

Support
Development of Wasabi is supported by the National Science Foundation as part of the Integrated Community Infrastructure (ICI) project.