Ana Macario, Bastian Onken and Hans Pfeiffenberger. Plankton*Net: Content aggregation and information re-use.

1 Plankton*Net: Content aggregation and information re-use
Ana Macario, Bastian Onken and Hans Pfeiffenberger
Alfred Wegener Institute for Polar and Marine Research

2 About us
- Helmholtz Association: big infrastructure labs
- AWI: RV "Polarstern" (100 M€) and stations
- ~400 scientists
- ~50 TB of ship- and station-generated datasets, among them time series up to 100 years old
- Computer centre in charge of supplying
  - the IT part of a productive working environment
  - preservation of valuable or at least costly datasets, since finished Ph.D.s (almost) don't care
  => mostly in that order of precedence
- We try to take the middle ground, at the institute as well as here
Why plankton? 45 Gt/a primary production, ~50% of living matter

3 Road map
1. EU-project: Plankton*Net
2. Introduction to taxonomy
3. Rich content
4. Towards NOA for Plankton*Net

4 Background
-> 2-year EU project (acronym "Plankton-Net") with 6 partners: AWI; Marine Biology Lab, Woods Hole; Station Biologique, Roscoff; Universidade de Lisboa; IPIMAR, Lisbon; Natural History Museum, London
-> Original scope: to create a network of interoperable repositories on plankton taxonomy
-> Motivation: to support taxonomists in the hard task of identifying species, and to rescue historically relevant collections
-> Scope keeps growing… an information system which aggregates taxonomic content, descriptions, assets (images, documents), environmental and molecular data, annotations, etc., and supplies an interactive environment for contributing
In early 2004, AWI started a small project with MBL to archive images and taxonomic keys/descriptions for phytoplankton found in the North Sea …

5 Road map
1. EU-project: Plankton*Net
2. Introduction to taxonomy
3. Rich content
4. Towards NOA for Plankton*Net

6 Taxonomy and its challenges
Information about organisms is often linked to a name. This can create problems in information retrieval…
- one taxon can have many names
- the same name can refer to many taxa
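The two problems on this slide can be sketched as a many-to-many mapping between names and taxa. The following is only an illustration under stated assumptions, not the design of any real name server: the class `NameThesaurus`, its methods, and the taxon identifiers are hypothetical, and the two example names are well-known textbook cases (a renamed coccolithophore, and a genus name shared by figs and a sea snail) rather than data from the slides.

```python
from collections import defaultdict


class NameThesaurus:
    """Toy many-to-many index between names and taxon identifiers."""

    def __init__(self):
        self._taxa_by_name = defaultdict(set)
        self._names_by_taxon = defaultdict(set)

    def link(self, name, taxon_id):
        """Record that `name` has been used for the taxon `taxon_id`."""
        cleaned = name.strip()
        self._taxa_by_name[cleaned.lower()].add(taxon_id)
        self._names_by_taxon[taxon_id].add(cleaned)

    def taxa_for(self, name):
        """All taxa a name may refer to (more than one for homonyms)."""
        return set(self._taxa_by_name.get(name.strip().lower(), set()))

    def names_for(self, taxon_id):
        """All names recorded for one taxon (synonyms, vernacular names)."""
        return set(self._names_by_taxon.get(taxon_id, set()))

    def expand(self, name):
        """All names that share at least one taxon with `name`."""
        names = set()
        for taxon_id in self.taxa_for(name):
            names |= self.names_for(taxon_id)
        return names


thesaurus = NameThesaurus()
# One taxon, many names (hypothetical identifier "taxon:1").
thesaurus.link("Emiliania huxleyi", "taxon:1")
thesaurus.link("Coccolithus huxleyi", "taxon:1")
# One name, many taxa (hypothetical identifiers for the plant and the snail genus).
thesaurus.link("Ficus", "taxon:2")
thesaurus.link("Ficus", "taxon:3")

print(sorted(thesaurus.expand("Coccolithus huxleyi")))
print(sorted(thesaurus.taxa_for("Ficus")))
```

A search that expands the query name before retrieval finds records filed under any synonym; a name that resolves to more than one taxon signals that the user has to disambiguate.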

7 Taxonomic Name Server
The uBio Taxonomic Name Server (MBL-WHOI Library, Woods Hole, USA), implemented as a web service, acts as a name thesaurus. Two services are offered:
- NameBank is a repository of millions of recorded biological names and facts that link those names together.
- ClassificationBank stores multiple classifications and taxonomic concepts that are the result of expert opinions. It extends the functionality of NameBank.

8 What's in a name?
NameBank screenshot labels: Alternative names; Vernacular names; More or less specific
Scientific names evolve over time as the names of specimens are updated over the years. With vernacular (common) names the problem is even more difficult, because a name may appear in several languages.

9 What's in a classification?
ClassificationBank is a taxon concept server

10 Road map
1. EU-project: Plankton*Net
2. Introduction to taxonomy
3. Rich content
4. Towards NOA for Plankton*Net

11 What is the content of PlanktonNet?
Data and metadata associated with organisms (taxa):
"by value"
- Descriptive metadata (Darwin Core schema)
- Images, SEM photos, schematic drawings, etc.
- Annotations
"by reference": linkout, include via Web-Service
- Taxonomic keys, synonyms and classification
- Bibliographical references
- Geo-referenced environmental data
- Molecular data

12 http://planktonnet.awi.de
Screenshot callouts: from BioPedia, re-use via WS quality linkouts to PANGAEA, WDC-Mare

17
- This is the working, local "prototype" (not a vision!)
- It has been fitted with an OAI-PMH module to enable it as a data provider
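For readers unfamiliar with OAI-PMH: a data provider answers plain HTTP requests that carry one of six protocol verbs plus a few arguments. The sketch below only builds such request URLs; the base URL and the set name are assumptions made for illustration, and the helper `build_oai_request` is hypothetical, not part of the prototype.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def build_oai_request(base_url, verb, arguments=None):
    """Return the GET URL for one OAI-PMH request.

    `arguments` is a dict of protocol arguments such as
    "metadataPrefix", "set", "from", "until" or "resumptionToken".
    """
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = [("verb", verb)]
    for key, value in (arguments or {}).items():
        if value is not None:
            query.append((key, value))
    return f"{base_url}?{urlencode(query)}"


# Hypothetical endpoint and set name, for illustration only.
BASE_URL = "http://example.org/oai"

print(build_oai_request(BASE_URL, "Identify"))
# http://example.org/oai?verb=Identify
print(build_oai_request(
    BASE_URL, "ListRecords",
    {"metadataPrefix": "oai_dc", "set": "phytoplankton"},
))
# http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&set=phytoplankton
```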

18 Road map
1. EU-project: Plankton*Net
2. Introduction to taxonomy
3. Rich content
4. Towards Network Overlays for Plankton*Net

19 Rich Content for and from Plankton*Net
[Diagram. Labels: WDC/Pangaea (water temperature and salinity, nutrients, lipid biomarkers stratigraphy); Plankton*net@AWI; Plankton*net@Roscoff; ……..; digitalization of biodiversity-related literature; molecular data; description; taxonomic naming & classification. Connections labelled SP, OAI-PMH (+), DP.]

20 Reality check
- Highly heterogeneous information systems
- Metadata harvesting is problematic; lacking OAI-PMH compliance
- Providing web services is not standard
- Schema use is not standard; crosswalks problematic
- "Why RDF-Ontology (and such things) when one can do tagging (and annotation) with Flickr?"

23 Short-term goals
- Create a central "catalog" with Dublin Core metadata as minimum and Darwin Core as an extended metadata format for "Plankton*Net"
- Harvest all Plankton*Net data providers (with respective "set" information) using OAI-PMH
- Long-term archival of all harvested records in a repository
- Create a portal for accessing the locally harvested items as well as remote ones
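The harvesting goal can be sketched as follows: parse one OAI-PMH `ListRecords` response, keep the identifier, the "set" membership and the Dublin Core titles of each record, and return the resumption token needed to fetch the next page. This is a minimal sketch, not the project's harvester; the function name `parse_list_records`, the sample response, and its identifiers are invented for illustration. Only the XML namespaces and element names come from the OAI-PMH and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_list_records(xml_text):
    """Return (records, resumption_token) for one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        header = record.find("oai:header", NS)
        # Skip malformed records and records the provider marks as deleted.
        if header is None or header.get("status") == "deleted":
            continue
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        titles = [] if dc is None else [t.text for t in dc.findall("dc:title", NS)]
        records.append({
            "identifier": header.findtext("oai:identifier", default="", namespaces=NS),
            "sets": [s.text for s in header.findall("oai:setSpec", NS)],
            "titles": titles,
        })
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    # An absent or empty token means the list is complete.
    return records, (token.strip() or None)


# Invented sample response; identifiers and titles are hypothetical.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:img-1</identifier>
        <datestamp>2006-01-01</datestamp>
        <setSpec>phytoplankton</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Hypothetical diatom image</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header status="deleted">
        <identifier>oai:example.org:img-2</identifier>
        <datestamp>2006-01-02</datestamp>
      </header>
    </record>
    <resumptionToken>token-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE_RESPONSE)
print(records)
print(token)
```

A full harvester would repeat the request with the returned token until none is left, then store the records in the central catalog.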

24 Short-term goals (cont.)
Limitations:
- Only metadata is harvested
- Relationships limited to collection item
- Restricted to publicly available items
- No support for collaborative work (e.g., resource annotation/revision)

25 Long-term goals
- Harvesting of metadata AND data (images, documents, etc.) associated with a given resource
  - relevant for preservation / mirroring purposes
- "Branding" as a result of targeted quality control of metadata from field experts
  - workflow needed
- Versioning and traceability
- Access control policies at item level

26 Long-term goals (cont.)
- Expression of rich relationships
  - beyond simple collection items (e.g., structural, equivalence and annotation type of relationships)
- "Combine" and "disseminate" harvested content with other, re-used content in flexible ways -> foundation for a rich service offering
=> Networked Overlay Architecture (NOA) with FEDORA
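One way to picture "rich relationships" is as typed links between items, so that a service can ask for all structural, equivalence or annotation links of a resource. The sketch below is an assumption-laden illustration and not how Fedora or NOA model relationships: the names `RelationKind`, `Relationship` and `related`, and all item identifiers, are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class RelationKind(Enum):
    STRUCTURAL = "structural"    # e.g. an image is part of a taxon page
    EQUIVALENCE = "equivalence"  # e.g. a mirrored copy of the same item
    ANNOTATION = "annotation"    # e.g. an expert comment on an item


@dataclass(frozen=True)
class Relationship:
    subject: str
    kind: RelationKind
    target: str


def related(relationships, subject, kind):
    """Targets linked from `subject` by relationships of the given kind."""
    return [
        r.target
        for r in relationships
        if r.subject == subject and r.kind is kind
    ]


# Hypothetical items and links, for illustration only.
links = [
    Relationship("item:image-1", RelationKind.STRUCTURAL, "item:taxon-page-1"),
    Relationship("item:image-1", RelationKind.EQUIVALENCE, "mirror:image-1"),
    Relationship("item:image-1", RelationKind.ANNOTATION, "note:1"),
    Relationship("item:image-1", RelationKind.ANNOTATION, "note:2"),
]

print(related(links, "item:image-1", RelationKind.ANNOTATION))
# ['note:1', 'note:2']
```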

27 Conclusions
- Ontologists can learn from hundreds of years of taxonomy
- Though an old field, information is a moving target (preservation vs. improvement?)
- Where is the (inter-)"action" happening?
- What (where and when) do we "preserve"?
- We believe that the visions and concepts of Fedora and NOA are appropriate to the problem
- The scope of the problem and user ambitions have to be contained and satisfied in stages

28 Thank you! Questions?

29
- Traditional field: dates back to the 4th century BC
- Specimen identification is not straightforward; world-wide experts on a class or genus level
- Information quality is relevant in several cases (e.g., harmful algal blooms and associated health consequences)
- Revision/annotation as unstructured metadata about a resource
- Information on both metadata provenance and annotation provenance is relevant for branding
- Type of desired queries:
  - Find resources contributed by...
  - Find resources revised / annotated by..., etc.
"Branding" and taxonomy => Versioning, traceability
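The desired queries on this slide presuppose that every contribution, revision and annotation is recorded together with who made it. A minimal sketch under that assumption follows; the record type `ProvenanceEvent`, the function `resources_by`, and all names and identifiers in the example are hypothetical and not taken from the Plankton*Net system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceEvent:
    resource_id: str
    actor: str
    action: str  # assumed vocabulary: "contributed", "revised", "annotated"


def resources_by(events, actor, actions):
    """Resources on which `actor` performed any of `actions`, in first-seen order."""
    found = []
    for event in events:
        if (
            event.actor == actor
            and event.action in actions
            and event.resource_id not in found
        ):
            found.append(event.resource_id)
    return found


# Hypothetical provenance log, for illustration only.
log = [
    ProvenanceEvent("image-1", "expert-a", "contributed"),
    ProvenanceEvent("image-1", "expert-b", "revised"),
    ProvenanceEvent("image-2", "expert-b", "annotated"),
    ProvenanceEvent("image-2", "expert-b", "revised"),
]

# "Find resources contributed by..."
print(resources_by(log, "expert-a", {"contributed"}))
# ['image-1']
# "Find resources revised / annotated by..."
print(resources_by(log, "expert-b", {"revised", "annotated"}))
# ['image-1', 'image-2']
```

Keeping the events append-only, rather than overwriting metadata in place, is what makes both versioning and traceability possible.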

