Download presentation
Presentation is loading. Please wait.
Published byRodger George Modified over 9 years ago
1
Online tools and standards for Biodiversity data in the Semantic Web Dr Dimitris Koureas Biodiversity Informatics Group | Department of Life Sciences The Natural History Museum London
2
http://… What is the semantic web? Slide adjusted from Page R. presentation in pro-iBiosphere
3
http://… link, What is the semantic web? Slide adjusted from Page R. presentation in pro-iBiosphere
4
http://… What is the semantic web? Slide adjusted from Page R. presentation in pro-iBiosphere
5
http://… is a author of person Fred book What is the semantic web? Slide adjusted from Page R. presentation in pro-iBiosphere
6
The Semantic web: “The future of the web …and always will be” – Peter Norvig (Google) What is the semantic web? Slide adjusted from Page R. presentation in pro-iBiosphere
7
Biodiversity informatics The study of the transformation and communication of information in Life and Earth sciences provides the means (generating and enhancing the necessary infrastructure)
8
Research vs Infrastructure Slide adapted from Patterson D. 2013, Tempe, Arizona
9
vs Infrastructure Discovery Ephemeral Individualistic Massive redundancy Optional Risk taking Slide adapted from Patterson D. 2013, Tempe, Arizona Research
10
vs Infrastructure Discovery Ephemeral Individualistic Massive redundancy Optional Risk taking Implementation Communal / agreed Essential Persistent Robust & reliable Adaptable Slide adapted from Patterson D. 2013, Tempe, Arizona Research
11
What are the current challenges in Biodiversity informatics?
12
Publications based on countless specimens, images, maps, keys and datasets Current taxonomic data production Typically generated by small communities for “local” research projects Figure from Costello M.J et al, 2013 doi: 10.1126/science.1230318
13
15-20k new spp. described annually (2M total) 1 30k nomenclatural acts (12M total) 1 20k phylogenies (750k total) 2 31k taxa sequenced (360k taxa total) 3 800k BioMed papers (40M total pp. of taxonomy) 4 Countless specimens, images, maps, keys and datasets Our current taxonomic data production Figures from 1) Zhang, Zootaxa 2011 4, 1-4; 2) Web-of-Science; 3) Genbank and 4) PubMed. 1.8 M described spp. (17M names) 300M pages (over last 250 years) 1.5-3B specimens
14
Estimates of 7.5 million species still undescribed 1 1 How Many Species Are There on Earth and in the Ocean? Mora C et al. doi:10.1371/journal.pbio.1001127 Now imagine that…
15
Biodiversity informatics landscape Key problems Landscape is complex, fragmented & hard to navigate Many audiences (policy makers, scientists, amateurs, citizen scientists) Many scales (global solutions to local problems) Figure adapted from Peterson et al, Syst. & Biodiv. 2010 doi: 10.1080/14772001003739369
16
Science is carried out “locally” By local scientists Being part of local infrastructures Having local funders Science is global It needs global standards Global workflows Cooperation of global players BUT
17
Expected volume of taxonomic and biodiversity data Need of extracting, aggregating and linking data on a global level
18
Cyndy Parr, Rob Guralnick, Nico Cellinese and Rod Page. TREE doi:10.1016/j.tree.2011.11.001 This requires data, information & knowledge to be… Digital Not printed paper Openly accessible Not behind barriers (e.g. paywalls) Linked-up Not in silos “ Link together evolutionary data … by developing analytical tools and proper documentation and then use this framework to conduct comparative analyses, studies of evolutionary process and biodiversity analyses” To achieve this…
19
Hour-glass motif for big data infrastructure Data re-use Data generation Data pool Slide adapted from Patterson D. 2013, Tempe, Arizona
20
Big data world with re-use data AggregationVisualizationAnalysisManipulation ModelsObservationsExperimentsProcessed Data re-use Data generation Data pool
21
AggregationVisualizationAnalysisManipulation ModelsObservationsExperimentsProcessed Data re-use Data generation Data pool Big data world with re-use data
22
Nodes interconnected Slide adapted from Patterson D. 2013, Tempe, Arizona
23
But how many biodiversity informatics projects are out there?
24
At least 679 ! But how many biodiversity informatics projects are out there? Sources: EDIT, TDWG & ViBRANT 2013 Categories: Data Aggregator - a web site that collates data from a variety of sources (digital and hardcopy) and presents it in one form Data Indexer - a web site that provides lists or indexes of other sites that provide data Data Provider - a web site that provides data directly from research or other studies Data Standards - a web site that contributes to formulating or developing standards for data Facilitator - a web site that facilitates the provision of data by other projects or web sites
25
GBIF: Our global leader in occurrence data Aggregators
26
http://www.eu-nomen.eu/portal/ EU-NOMEN - PESI Aggregators
27
Making taxonomy digital, open & linked Aggregators
28
Scratchpads are an integrated system to Enter, Curate, Mark-up, Link and Publish data taxonomic workflow in a single virtual environment
29
A Scratchpad is a website that holds data for you and your community The Scratchpads concept Your data External data & services
30
65,000 unique visitors/month Per month unique visitors to Scratchpads sites 580 Scratchpads Communities by 8,185 active registered users covering 55,607 taxa in 653,274 pages. In total more than 1,300,000 visitors
31
Researchers can assemble, test, and analyse their data records in BOLD before uploading them to: International Nucleotide Sequence Database Collaboration (DDBJ, ENA, GenBank) BOLD Barcode of Life Data Systems Facilitators
32
Biodiversity literature openly available to the world as part of a global biodiversity community Biodiversity Heritage Library BHL http://www.biodiversitylibrary.org/ > 40 M pages of legacy literature Providers
33
Standard Exchange formats
34
http://rs.tdwg.org/dwc/index.htm Darwin Core (DwC) Primarily used as a specimen records metadata standard Standard Exchange formats
35
Access to Biological Collection Data (ABCD) http://www.tdwg.org/standards/115/ highly detailed and aims to provide a complete set of data elements for natural history collection items Standard Exchange formats
36
Audubon Core Multimedia Resources Metadata Schema http://www.tdwg.org/standards/638/ The Audubon Core metadata schema ("AC") is a representation-neutral metadata vocabulary for describing biodiversity-related multimedia resources and collections. Standard Exchange formats
37
http://tdwg.napier.ac.uk/index.php?pagename=HomePage Taxonomic Concept Transfer Schema (TCS) Mechanism to exchange data concerning the names of organisms Standard Exchange formats
38
Standards facilitate systems interoperability
39
UPIDs to identify content Identifiers A key to find something in a database. We need Unique Identifiers
40
10.4289/0013-8797.115.1.75 We need Unique Identifiers
41
http://hdl.handle.net/10.4289/0013-8797.115.1.75 http://dx.doi.org/10.4289/0013-8797.115.1.75 http://www.google.co.uk/search?q=10.4289/0013-8797.115.1.75 http://zoobank.org/10.4289/0013-8797.115.1.75 We need Unique Identifiers
42
Can a taxonomic name be used as a UPID? Is it Unique? Is it Persistent? Is it an Identifier? Are taxonomic names enough for communication between Scientists? YES Are taxonomic names enough for communication between machines? CAN BE IF We need Unique Identifiers
43
For example: Page R., Brief Bioinform (2008) 9 (5): 345-354. doi: 10.1093/bib/bbn022 We need Unique Identifiers
44
ONLY IF Name reconciliation Patterson, D. J. et al. 2010. Names are key to the big new biology. TREE 25: 686-691 doi: 10.1016/j.tree.2010.09.004 We need Unique Identifiers
45
The need for Controlled Vocabularies and Ontologies Knowledge Organisation Systems Google has done it: http://googleblog.blogspot.co.uk/2012/05/introducing-knowledge-graph-things-not.html Ontologies Plant anatomical and structural development Ontology http://www.plantontology.org/
46
Deans A. et al. Time to change how we describe biodiversity, Trends in Ecology & Evolution 2012 doi:10.1016/j.tree.2011.11.007 Example of ontology usage
47
Examples of integrated projects http://protectedplanet.net http://thymus.myspecies.info
48
How are all this relevant to my work ? What should I take home ?
49
Repositories #bigdata Repositories #bigdata Providers Data silos Community
50
The four nodes of data workflow 1. We collect and generate data 2. We curate, link and structure data 3. We analyse data 4. We publish data
51
Data curation Data curation Data analysis Data analysis Data publishing Data publishing The four nodes of data workflow Data collection & generation Data collection & generation What are the bottlenecks in the workflow ?
52
Data curation Data curation Data analysis Data analysis Data publishing Data publishing What we need is… Data collection & generation Data collection & generation a seamless workflow
53
Old Joke: A drunk is crawling around a lamp post on his hands and knees. A cop comes along … Cop: What are you doing? Drunk: Looking for my car keys. Cop: Are you sure you dropped them here? Drunk: No, I dropped them in the alley. Cop: So why are you looking here? Drunk: Because the light’s better. Old Joke
54
Science is a ‘light’s better’ endeavor in that research effort is not directed at areas where the work is technically infeasible. Research is directed where real, interpretable results may be obtained. We do, in fact, conduct research where the light’s better. But, when the light changes, so does science. With better illumination, we look in new areas. We find new things… Old Joke
55
Addressing the challenges of biodiversity informatics “…the field [of biodiversity informatics] appears to be growing in a void of overarching, motivating questions, effectively making it a set of technologies in search of questions to address.” Peterson et al, Syst. & Biodiv. 2010 doi: 10.1080/14772001003739369
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.