Biodiversity Informatics
Biodiversity informatics and the manipulation of biological information
Jim Croft

Outline
‘Biodiversity Informatics’
Australia’s Virtual Herbarium as a model of use and management of biodiversity knowledge
New ways of managing biological knowledge
Information management issues
Current trends and future directions in biodiversity knowledge management

Biodiversity Informatics Management of our knowledge of biodiversity using modern techniques of data and information management
Taxonomy of Database Interoperability (Sheth & Larson, 1990)
Multi-database systems
– Non-federated
– Federated
  – Loosely coupled (multiple schemas)
  – Tightly coupled (unified schema)
[Autonomous]

Tightly Coupled
Central administration
Semantic consistency
– Schemas
– Authority files
Common technology
Difficult to implement
Proprietary solutions tolerated
Expensive

Loosely Coupled
Closer to Reality
Independent management
Suited to scientific systems
Common publication syntax
– Export schema
Less functionality …
Doable
Need open standards

Intermediate Coupling
Scientific Independence
Common syntax & semantics for the exchange of information
– Import/export
– HISPID, Darwin Core, TDWG/CODATA ABCD
Leverage Existing Open Standards
– Participation in wider, more loosely coupled federations
– Simplicity
– Distribution of effort

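
The export-schema idea behind loose and intermediate coupling can be sketched in a few lines: each database keeps its own internal structure and publishes only a mapping onto shared terms. This is a minimal illustration, not any herbarium's actual implementation. The local column names, the herbarium labels and the collector names are invented, and the shared terms are only loosely modelled on Darwin Core-style names.

```python
# Hypothetical local column names for two independently managed databases.
# The shared export terms on the right are illustrative assumptions.
EXPORT_MAPPINGS = {
    "herbarium_a": {
        "sci_name": "scientificName",
        "coll": "recordedBy",
        "coll_date": "eventDate",
    },
    "herbarium_b": {
        "taxon": "scientificName",
        "collector": "recordedBy",
        "date_collected": "eventDate",
    },
}


def to_export_record(source, local_record):
    """Translate a local record into the shared export schema.

    Fields with no mapping stay private to the custodian and are dropped.
    """
    mapping = EXPORT_MAPPINGS[source]
    return {mapping[key]: value
            for key, value in local_record.items()
            if key in mapping}


# Made-up records: the same species held in two differently structured databases.
record_a = to_export_record(
    "herbarium_a",
    {"sci_name": "Acacia salicina", "coll": "Example Collector A", "shelf": "12B"},
)
record_b = to_export_record(
    "herbarium_b",
    {"taxon": "Acacia salicina", "collector": "Example Collector B"},
)
```

Both records come out with the same field names, so a federation can combine them without either custodian changing its internal schema.
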
Data Refinement
Increasing refinement & utility of data: data, information, knowledge, action
The real world: observations
Environmental decision making: conservation, restoration biology, resource management, utilization
Policy & strategy: government, corporate, individual

Herbarium Specimens
Specimen Data Capture
Specimen Data
The core information is from herbarium specimens
Beyond taxonomy & names
Collections data:
– Scientific name
– Collection date
– Collector name & number
– Location
– Soils
– Habitat (incl. topography)
– Vegetation community
– Associated species

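
As a rough illustration, the collections data listed above could be held in a record type like the one below. The field names, types, and the choice of which fields are required are assumptions made for this sketch, not the structure of any real herbarium database.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SpecimenRecord:
    """One herbarium specimen label; field names are illustrative only."""
    scientific_name: str
    collector_name: str
    collector_number: Optional[str] = None
    collection_date: Optional[str] = None  # assumed to be ISO 8601 text
    location: Optional[str] = None
    soils: Optional[str] = None
    habitat: Optional[str] = None          # incl. topography
    vegetation_community: Optional[str] = None
    associated_species: List[str] = field(default_factory=list)

    def missing_fields(self):
        """Names of fields still empty, e.g. to prioritise data capture."""
        return [name for name, value in vars(self).items()
                if value in (None, [], "")]


# A partially captured record; the collector name is a placeholder.
record = SpecimenRecord("Acacia salicina", "Example Collector")
todo = record.missing_fields()
```

Here `todo` lists every field that still needs to be keyed in from the specimen label.
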
A Herbarium Database Structure
What do we want to know?
What species does a plant belong to?
What is its name?
What other species is it related to?
What does it look like?
Where does it grow?
Where might it grow?
What other species grow with it?
What species grow in a defined area?
How did they get there?

What is a Virtual Herbarium? An on-line digital representation of a scientific collection of preserved plant specimens and botanical information
What is the AVH?
Spread across Australian herbaria
Data distributed; resides with custodians
Each herbarium has a portal to receive requests and to deliver data
A common single-query AVH interface in each herbarium polls all herbaria
Major Australian Herbaria

AVH Partners
State Herbarium of South Australia
Queensland Herbarium
Australian National Herbarium
Northern Territory Herbarium
Tasmanian Herbarium
National Herbarium of Victoria
National Herbarium of New South Wales
Western Australian Herbarium
Australian Biological Resources Study
Industry Partner: KE Software

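
The "one query polls all herbaria" pattern can be sketched as follows. This is an assumption-laden toy, not the AVH's actual protocol. The portals are in-memory stand-ins rather than network services, and the herbarium codes and sheet numbers are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def make_portal(code, records):
    """Stand-in for one herbarium's portal: answers scientific-name queries
    from data that stays with the custodian."""
    def portal(name):
        return [dict(r, source=code) for r in records
                if r["scientific_name"] == name]
    return portal


# Hypothetical custodians, each holding its own data.
PORTALS = {
    "HERB1": make_portal("HERB1", [{"scientific_name": "Acacia salicina", "sheet": "1"}]),
    "HERB2": make_portal("HERB2", [{"scientific_name": "Platyzoma microphyllum", "sheet": "7"}]),
    "HERB3": make_portal("HERB3", [{"scientific_name": "Acacia salicina", "sheet": "3"}]),
}


def query_all(name, portals=PORTALS):
    """Send the same query to every portal in parallel and merge the answers."""
    with ThreadPoolExecutor() as pool:
        futures = {code: pool.submit(portal, name)
                   for code, portal in portals.items()}
        merged = []
        for code in sorted(futures):  # stable order for display
            merged.extend(futures[code].result())
    return merged


results = query_all("Acacia salicina")
```

Each merged record carries a `source` tag, so the user sees a single result set while the data remains attributable to its custodian.
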
Why is there an AVH?
Pressure on herbaria to work more efficiently
Demand for access to larger amounts of data
Demand to access data more quickly
Demand to view data in different ways
Pressure on herbaria to appear and to be more responsive to community needs

What is the AVH task?
> 18,000 species of higher plants
> 64,000 available names
Extensive synonymy (4 names per plant)
8 major government-funded herbaria
Similar number of university herbaria
> 6,500,000 specimens in Aust. herbaria
data elements per specimen
Several Kb per specimen (excl. images)

Herbarium database status
The AVH Agreement
$10M over 5 years to database all major Australian herbarium collections
$10 million:
– $4 million Commonwealth
– $4 million State/Territory
– $2 million private
Initial focus on capture of herbarium specimen data
Ultimate aim: a complete flora information system

Australia’s Virtual Herbarium On-line access to herbarium specimen information and botanical knowledge
Australian Plant Name Index (APNI)
Acacia salicina
Research Potential: Plant distribution analysis
[Figure: Pultenaea distribution classes in eastern Australia; classes: Incurved, Recurved, ?]

Flora Information Systems
On-line systems
Often regionally based
Integrating:
– Plant names and synonyms
– Descriptive Flora treatments
– Illustrations
– Distributions
– etc.

Botanical illustrations
National Plant Photograph Index
Search all records on-line
Digital images available (‘best of class’)
35,000 images of Australian plants and vegetation

Type Images on demand
High resolution image of type specimen of Austrobaileya downloaded over the Internet from the Herbarium of the New York Botanical Garden

Flora & Revision Databases New ways of managing and delivering botanical information
A Flora in XML
Example in HTML
Platyzoma microphyllum R.Br., Prodr. 160 (1810) Gleichenia platyzoma F.Muell., Veg. Chatham.-Isl. 63 (1864). T: Facing Island, Qld, R.Brown Iter Austral. 102; lecto: BM. Illus.: S.B.Andrews… Rhizome short-creeping… Sporangia in zones in distal half of frond. Fig. 55 Widespread across northern Australia… Grows in sandy or swampy soils.... Map 135. W.A.: 14.4 km NW of Mt…
Example in XML
Platyzoma microphyllum R.Br, Prodr Gleichenia platyzoma F.Muell. Veg. Chatham.-Isl T: Facing Island, Qld, … Illus.: S.B.Andrews… Rhizome short-creeping… Sporangia in zones in distal half of frond. Fig. 55 Widespread across northern Australia… Grows in sandy or swampy soils... Map 135. W.A.: 14.4 km NW of Mt…

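
The element names used on the original slide are not preserved in this text, so the sketch below uses invented ones (`treatment`, `acceptedName`, `synonym`, `habitat`) purely to show the point of marking up a Flora treatment. Once the parts of the text are tagged by meaning rather than by appearance, software can pick them out. Gleichenia platyzoma is treated as a synonym here, following the layout of the example treatment.

```python
import xml.etree.ElementTree as ET

# Build a fragment of the Platyzoma treatment with hypothetical element names.
treatment = ET.Element("treatment")

accepted = ET.SubElement(treatment, "acceptedName")
ET.SubElement(accepted, "taxon").text = "Platyzoma microphyllum"
ET.SubElement(accepted, "author").text = "R.Br."
ET.SubElement(accepted, "publication").text = "Prodr. 160 (1810)"

synonym = ET.SubElement(treatment, "synonym")
ET.SubElement(synonym, "taxon").text = "Gleichenia platyzoma"
ET.SubElement(synonym, "author").text = "F.Muell."

ET.SubElement(treatment, "habitat").text = "Grows in sandy or swampy soils"

# Serialise to text, as it might be stored or exchanged.
xml_text = ET.tostring(treatment, encoding="unicode")

# Read it back and query by meaning rather than by typography.
parsed = ET.fromstring(xml_text)
accepted_name = parsed.findtext("acceptedName/taxon")
synonyms = [s.findtext("taxon") for s in parsed.findall("synonym")]
```

The same XML source could feed a printed Flora, a web page or a names index, which is the motivation for moving from presentation markup to semantic markup.
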
A Flora XML Schema fragment
A Flora database structure
A Flora database report
An old process of publication (diagram): Botanist, W-P file, Editors, W-P file, Publisher, C-R Copy, Book, etc.
A new process of publication (diagram): Botanist, W-P file, Editors, W-P file, Publisher, C-R Copy, Book, etc.; XML file, Database, XML file, Outputs
A future process of publication (diagram): Botanist, Editors, Publisher, C-R Copy, Book, etc.; XML file, Database, Outputs; Database, Outputs
Interactive Identification Using computers to identify and name plant species and display information about them
Interactive Plant Identification
Current trends, future directions ?
Trends in Biodiversity Information Management
Nomenclatural → Taxonomic
Regional → Global
Text-based → Image-based
Taxon-based → Spatially-based
Individual effort → Partnerships
Single user → Multiuser
Standalone → Networked
Centralized → Distributed
Proprietary System → Open System
Idiosyncratic Design → Standard Architecture
Nonstandard data content → Standard data content
Conventional → Innovative
Developmental → Stable
Access charges → Freely available

Global Organization
Several parallel and complementary initiatives:
– Global Biodiversity Information Facility (GBIF)
– Taxonomic Databases Working Group (TDWG)
– Global Taxonomy Initiative (GTI)
– International Organization for Plant Information (IOPI)
– Species 2000
– All Species Foundation (ALL)

Data Flow within GBIF Network (diagram)
Components: Collection Nodes, Participant Nodes (each with Service Metadata), GBIF Portal, User Browser
Data labels: Specimen Index Data, Detailed Specimen Data, Aggregated Data, HTML Data

Requirements for Interoperability Standards…
Standards for Interoperability of Biodiversity Databases
URL, UML, ABCD, URI, XHTML, HTTP, UDDI, XSLT, XPath, RDF, PNG, SVG, DOM, CSS, SAX, HISPID, ITF, BNF, Z39.50, WAIS, ASN.1, XML Schema, Dublin Core, RDFS, Z39.19, SOAP, CGI, RMI, Darwin Core, WSDL