Experts Workshop on the IPT, v. 2, Copenhagen, Denmark
The Pathway to the Integrated Publishing Toolkit version 2
Tim Robertson, Systems Architect, Global Biodiversity Information Facility (GBIF)
20 June 2011
Agenda ‣ Why an IPT? ‣ The project history ‣ IPT version 1.0 ‣ The rationale for version 2.0 ‣ Key functionality of the IPT v2.0
Who has used an IPT?
Who has installed an IPT?
The IPT Vision ‣ A single platform allowing the sharing of ‣ Primary biodiversity data ‣ Species name information ‣ Dataset descriptions (metadata)
The IPT Vision ‣ The ability to register with GBIF ‣ Technical contact information ‣ E.g. Internet URLs ‣ Physical contact information ‣ E.g. telephone details ‣ Institutional affiliations ‣ Accurate attribution
The IPT Vision ‣ Connect databases ‣ Upload text files ‣ Lower the technical threshold for participation
The IPT Vision ‣ Flexibility to accommodate data extensions ‣ Support efficient and simple transfer of content ‣ An open source project
Why an IPT? ‣ Biodiversity provider tools existed ‣ DiGIR ‣ PHP implementation ‣ BioCASe ‣ Python implementation ‣ TAPIR ‣ PHP/.NET implementation
Why an IPT? ‣ Limitations in existing tools ‣ Checklist content lacking ‣ No formally recognized metadata standards ‣ No automatic registration with GBIF ‣ Schemas either simple or very complex ‣ Data transfer sub-optimal (e.g. speed) ‣ No ability to upload data
Why an IPT?
Who has used the IPT v1.0?
Who had trouble using the IPT v1.0?
IPT v1.0 ‣ First released 2009 ‣ Java based web application
IPT v1.0: Feature rich ‣ Administration ‣ Users, organisations, extensions, vocabularies ‣ Datasets ‣ Text files, connect a database ‣ Discovery of content ‣ Graphs, metrics, maps, search, browse ‣ Interfaces ‣ DwC Archive, TAPIR, OGC WMS
Consequences of features ‣ Required an embedded database ‣ Limited performance ‣ Required a mapping server ‣ Significant resources (memory)
Community Feedback ‣ Server requirements too high for many ‣ Performance unsatisfactory ‣ Dataset size limitations a barrier ‣ Stability unacceptable ‣ Data loss in 2 instances ‣ Complexity too high for some
The concept was sound! …rationale for
Who has used the IPT v2.0?
Who has installed the IPT v2.0?
v2.0: Key functionality ‣ User management ‣ Extension management ‣ Institution management ‣ Configuring datasets ‣ Managing dataset state ‣ Interfaces
User management ‣ Administrator ‣ Manager (different trust levels) ‣ With registration permissions ‣ Without registration permissions ‣ General user
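The trust levels above can be sketched as a small permission model. This is an illustration only, not the IPT's actual code: the `Role` names and both helper functions are hypothetical, and it assumes the administrator implicitly holds registration rights and that a general user cannot manage datasets.

```python
from enum import Enum


class Role(Enum):
    """Hypothetical names for the trust levels listed on the slide."""
    ADMIN = "administrator"
    MANAGER_WITH_REGISTRATION = "manager with registration permissions"
    MANAGER_WITHOUT_REGISTRATION = "manager without registration permissions"
    USER = "general user"


def can_manage_datasets(role: Role) -> bool:
    # Assumption: every role except the general user may manage datasets.
    return role is not Role.USER


def can_register_with_gbif(role: Role) -> bool:
    # Assumption: the administrator implicitly holds registration rights.
    return role in (Role.ADMIN, Role.MANAGER_WITH_REGISTRATION)
```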
Extension management ‣ By communicating with the GBIF registry, automatically discover ‣ Data extensions ‣ Vocabularies
Institution management ‣ No ability to create institutions ‣ By communicating with the GBIF registry, select ‣ Institution hosting the IPT ‣ Institutions that will share datasets in the IPT
Configure Datasets ‣ Author metadata ‣ GBIF Metadata profile ‣ Upload text files ‣ CSV, tab delimited etc. ‣ Connect a database ‣ MySQL, Oracle, SQL Server, PostgreSQL etc.
Configure Datasets ‣ Map content to extensions ‣ Manage user permissions ‣ Shared dataset management
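Mapping content to an extension amounts to pairing each source column with a term. The sketch below illustrates the idea for an uploaded CSV file. The source column names (`id`, `species`, `lat`, `lon`) and the `map_rows` helper are hypothetical; the Darwin Core term URIs are real. It is not how the IPT stores its mappings.

```python
import csv
import io

DWC = "http://rs.tdwg.org/dwc/terms/"

# Hypothetical source columns mapped to Darwin Core term URIs.
COLUMN_TO_TERM = {
    "id": DWC + "occurrenceID",
    "species": DWC + "scientificName",
    "lat": DWC + "decimalLatitude",
    "lon": DWC + "decimalLongitude",
}


def map_rows(text, mapping):
    """Yield one dict per CSV row, keyed by term URI; unmapped columns are dropped."""
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        yield {term: row[col] for col, term in mapping.items() if col in row}


sample = "id,species,lat,lon,notes\n1,Puma concolor,10.5,-84.1,seen at dusk\n"
rows = list(map_rows(sample, COLUMN_TO_TERM))
```

Here the `notes` column has no mapping, so it does not appear in the mapped record.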
Configure Datasets ‣ Manage dataset state ‣ Private: visible only to the managers ‣ Public: visible to anybody ‣ Registered: published on the GBIF network
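The three dataset states can be read as a visibility rule. A minimal sketch follows, with hypothetical names throughout and one assumption beyond the slide: a registered dataset remains visible to anybody.

```python
from enum import Enum


class DatasetState(Enum):
    PRIVATE = "private"        # only the managers can see it
    PUBLIC = "public"          # anybody can see it
    REGISTERED = "registered"  # also listed on the GBIF network


def can_view(state: DatasetState, is_manager: bool) -> bool:
    # Private datasets are restricted to managers; assumption: registered
    # datasets stay publicly visible, like public ones.
    return is_manager or state is not DatasetState.PRIVATE


def is_on_gbif_network(state: DatasetState) -> bool:
    return state is DatasetState.REGISTERED
```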
Interfaces ‣ Darwin Core Archive ‣ Ecological Metadata Language ‣ Now as a manuscript also in
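For readers unfamiliar with the format, a Darwin Core Archive is a zip file holding a delimited data file, a `meta.xml` descriptor and a metadata document. The sketch below assembles a minimal one using only the Python standard library. The function names are hypothetical, the `eml.xml` content passed in is a placeholder rather than valid EML, and it assumes a tab-delimited occurrence core whose first column is the record identifier. It is not the IPT's own implementation.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/"
DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"


def build_meta_xml(data_file, terms):
    """Describe a tab-delimited core file; terms[0] is treated as the id column."""
    ET.register_namespace("", DWC_TEXT_NS)
    archive = ET.Element(f"{{{DWC_TEXT_NS}}}archive", {"metadata": "eml.xml"})
    core = ET.SubElement(archive, f"{{{DWC_TEXT_NS}}}core", {
        "encoding": "UTF-8",
        "fieldsTerminatedBy": "\\t",   # written literally as \t in meta.xml
        "linesTerminatedBy": "\\n",
        "ignoreHeaderLines": "1",
        "rowType": DWC_TERMS + "Occurrence",
    })
    files = ET.SubElement(core, f"{{{DWC_TEXT_NS}}}files")
    ET.SubElement(files, f"{{{DWC_TEXT_NS}}}location").text = data_file
    ET.SubElement(core, f"{{{DWC_TEXT_NS}}}id", {"index": "0"})
    for index, term in enumerate(terms[1:], start=1):
        ET.SubElement(core, f"{{{DWC_TEXT_NS}}}field",
                      {"index": str(index), "term": DWC_TERMS + term})
    return ET.tostring(archive, encoding="unicode")


def build_archive(rows, terms, eml_xml):
    """Return the bytes of a zip with occurrence.txt, meta.xml and eml.xml."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        lines = ["\t".join(terms)] + ["\t".join(row) for row in rows]
        zf.writestr("occurrence.txt", "\n".join(lines) + "\n")
        zf.writestr("meta.xml", build_meta_xml("occurrence.txt", terms))
        zf.writestr("eml.xml", eml_xml)  # placeholder metadata document
    return buffer.getvalue()
```

Calling `build_archive([["1", "Puma concolor"]], ["occurrenceID", "scientificName"], "<eml/>")` returns the archive as bytes, ready to be written to disk or served over HTTP.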
‣ Reduced functionality ‣ TAPIR ‣ Geoserver ‣ Visualisations ‣ Search and browse
‣ Reduced server requirements ‣ Memory: 1-2GB (v1.0), now 256MB (v2.0)
‣ Increased performance ‣ 24 million records ‣ 50 minutes ‣ MySQL ‣ 256MB memory
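To put the figures above in perspective, the implied average throughput can be worked out directly. This assumes the record count and the 50 minutes describe a single run from start to finish; the variable names are illustrative.

```python
records = 24_000_000   # records processed, as quoted on the slide
minutes = 50           # elapsed time, as quoted on the slide

records_per_second = records / (minutes * 60)
print(f"{records_per_second:,.0f} records/second")  # 8,000 records/second
```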
‣ No internal database ‣ Increased robustness with simple files