GACS: Towards a common concept scheme for information in agriculture International Conference on Big Data and Knowledge Discovery Bangalore, March 9-11, 2016 Food and Agriculture Organization of the UN
GACS project: participants Working group: – FAO (AGROVOC) – CABI (CABT) – NAL (NALT) Steering committee: – Those above + – INRA – CGIAR consortium – AgroKnow Caterina Caracciolo, 11/03/2016
GACS = Global Agriculture Concept Scheme To make a common repository of terminological and conceptual information in agriculture – To enable.. Of information To achieve efficiency of scale by maintaining core concepts in cooperation A voluntary project – So far, only funded by in-kind contribution of the partners
Why bother? #1 To have smoother access to data Thesauri/vocabularies are used in applications, as part of document metadata
Why bother? #2 To “inherit” information from the others Example: outbound links of AGROVOC SKOS mappings FAO: Biotechnology Glossary, Geopolitical Ontology, AGROVOC
A vision for the future
Requirements 1. Ensure compatibility with existing databases 2. Maximize reuse of work, e.g., available translations 3. Use RDF-based technologies (e.g., URIs, SKOS) for integration with other resources on the web 4. Be available as Linked Open Data
Three thesauri
First estimate of overlap* * Obtained by automatic mapping using AgreementMakerLight
The process 1. Selected a core from the three thesauri 2. Mapped concepts to one another 3. Cleaned up the results
Step 1: concept selection From each: the 10,000 concepts most frequently used in their respective databases. Plus: all countries AND all higher-level organisms
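The selection rule above can be sketched in a few lines. This is only an illustration, not the project's actual tooling: the function name `select_core`, the usage counts and the concept labels are all made up for the example, and `top_n` is shrunk from 10,000 to 2 so the result is easy to read.

```python
from collections import Counter

def select_core(usage_counts, always_include, top_n=10_000):
    """Return the top_n most frequently used concepts plus a fixed set.

    usage_counts: dict mapping concept ID -> number of uses in the database
    always_include: concepts kept regardless of usage (e.g. countries)
    """
    most_used = [cid for cid, _ in Counter(usage_counts).most_common(top_n)]
    return set(most_used) | set(always_include)

# Hypothetical usage counts for one thesaurus (illustrative numbers only).
usage = {"rice": 5200, "cereals": 3100, "Oryza sativa": 900, "Padia": 2}
core = select_core(usage, always_include={"India"}, top_n=2)
print(sorted(core))
```

The same function would be run once per thesaurus, each against its own database counts.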
Step 2: concepts mapped Automatic mapping Tool: AgreementMakerLight, applied to the full thesauri, for completeness Manual validation Tool: spreadsheets Evaluated 60 to 150 rows/hour Evaluation took 500 to 600 hours for GACS Beta.
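As a rough sanity check on the validation figures, the rates and hours can be multiplied to bound the number of rows reviewed. This assumes the slowest rate could pair with the fewest hours and the fastest rate with the most hours, which the slide does not state; the helper `rows_reviewed` is introduced here for illustration only.

```python
def rows_reviewed(hours, rows_per_hour):
    """Rows evaluated at a constant rate over a given number of hours."""
    return hours * rows_per_hour

# Figures from the slide: 60 to 150 rows/hour, 500 to 600 hours.
low = rows_reviewed(500, 60)    # slowest rate, fewest hours
high = rows_reviewed(600, 150)  # fastest rate, most hours
print(low, high)  # 30000 90000
```

Under these assumptions the manual validation covered somewhere between 30,000 and 90,000 spreadsheet rows.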
Step 3: clean up results The goal is: To obtain a “core” of concepts usable in real life to replace (part of) AGROVOC and NALT To have a “core” to which other resources may hook up – e.g., thesauri, code lists
Something to consider...
Differences in concept granularity AGROVOC: [animal oil], [animal fats] NALT: [animal fats and oils]
Different modelling of scientific names and taxonomies ! Very many scientific and common names of organisms in each thesaurus ! Common names and scientific names - are they different things? Or different names for the same things? – Salmo salar, Salmo carpio, Salmo ferox – salmon
The same thing?
Scientific names and taxonomies AGROVOC – has separate hierarchies (taxonomic / common sense), but often not very up-to-date CABI – has mostly scientific names NAL – has both scientific and common names, all together. More up-to-date
Lumps: clusters of concepts originating from the mapping
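One way to find such clusters is to treat every mapping as an edge and collect the connected groups. The sketch below uses a small union-find for that. It assumes, for illustration, that a "lump" is a cluster holding more than one concept from the same thesaurus; the slides do not spell out the exact criterion, and the concept IDs are invented.

```python
from collections import Counter, defaultdict

def find_clusters(mappings):
    """Group concept IDs connected by mapping pairs (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in mappings:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    return list(clusters.values())

def is_lump(cluster):
    """Assumed criterion: some thesaurus contributes more than one concept."""
    sources = Counter(cid.split(":", 1)[0] for cid in cluster)
    return any(count > 1 for count in sources.values())

# Hypothetical IDs; the fats/oils case mirrors the granularity example.
mappings = [
    ("agrovoc:animal_oil", "nalt:animal_fats_and_oils"),
    ("agrovoc:animal_fats", "nalt:animal_fats_and_oils"),
    ("agrovoc:rice", "nalt:rice"),
    ("cabt:rice", "nalt:rice"),
]
clusters = find_clusters(mappings)
lumps = [c for c in clusters if is_lump(c)]
print(len(clusters), len(lumps))  # 2 1
```

Here the rice cluster is a clean one-per-thesaurus match, while the fats and oils cluster would be flagged for manual resolution.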
Example: lumps in March 2015 NOW: all solved 15,090 concepts 972 lumps
Only hierarchies? – No, add Thematic groups Plus Product
Currently: Beta ~ 15,000 concepts, ~ 400,000 terms, 29 languages. All RDF-SKOS Editing and visualization tools from partners: – VocBench (U Tor Vergata, Rome IT & FAO) – SKOSMOS (Finnish National Library)
Beta 1.6
AGROVOC and NALT may be phased out Extension module(s)? GACS CABT GACS
Ongoing Quality improvement – Hierarchies, labels, scope notes, definitions Consolidate decisions – E.g., on taxonomies and common names Write guidelines 1. To document decisions taken 2. To guide future editors of GACS Some new issues coming up – copyright on definitions?
Next 1. Release GACS data officially 2. Define AGROVOC and CABI as “extension”
References A web site will be set up soon Now, all reports are available from AIMS To follow progress, subscribe to the AIMS community!
Forming GACS concepts by merging the source concepts and aggregating their information (Note: GACS uses SKOS, not traditional thesaurus tags)
GACS concepts:
– rice UF paddy UF paddy rice
– cereals UF feed cereals UF small grain cereals (grain)
– Oryza sativa UF Oryza glutinosa UF Oryza indica UF Oryza japonica UF Oryza sativa … (subsp, var etc.)
– Oryza UF Padia UF rice (plant)
Mapped to the thesauri of origin:
– agrovoc:c_5435 cabt:82917 nalt:56271 exactMatch
– agrovoc:c_5438 cabt:82935 nalt:56277 exactMatch
– agrovoc:c_1474 cabt:26247 exactMatch
– agrovoc:c_6599 cabt: nalt:56293 exactMatch
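The merge step can be sketched as follows. This is a minimal illustration rather than the GACS implementation: the function `merge_concepts`, the dictionary layout, the placeholder IDs (`agrovoc:X`, `cabt:Y`, `nalt:Z`) and the way the labels are split across the three sources are all assumptions made for the example. Only the labels themselves (rice, paddy, paddy rice) come from the slide.

```python
def merge_concepts(source_concepts):
    """Merge mapped source concepts into one SKOS-style concept.

    The first concept's preferred label becomes the prefLabel (an assumed
    rule); every other label is aggregated as an altLabel, and each source
    ID is kept as an exactMatch link.
    """
    pref = source_concepts[0]["pref"]
    alt = set()
    for concept in source_concepts:
        alt.add(concept["pref"])
        alt.update(concept["alt"])
    alt.discard(pref)  # the preferred label is not repeated as an altLabel
    return {
        "prefLabel": pref,
        "altLabel": sorted(alt),
        "exactMatch": [concept["id"] for concept in source_concepts],
    }

# Placeholder IDs and a hypothetical distribution of labels per source.
sources = [
    {"id": "agrovoc:X", "pref": "rice", "alt": ["paddy"]},
    {"id": "cabt:Y", "pref": "rice", "alt": []},
    {"id": "nalt:Z", "pref": "rice", "alt": ["paddy rice"]},
]
merged = merge_concepts(sources)
print(merged["prefLabel"], merged["altLabel"], merged["exactMatch"])
```

In SKOS terms the UF entries correspond to `skos:altLabel` and the links back to the source thesauri to `skos:exactMatch`.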