Published by Doris Glenn. Modified over 8 years ago
1
GACS: Towards a common concept scheme for information in agriculture
International Conference on Big Data and Knowledge Discovery, Bangalore, March 9-11, 2016
Caterina.Caracciolo@fao.org, Food and Agriculture Organization of the UN
2
GACS project: participants
Working group:
– FAO (AGROVOC)
– CABI (CABT)
– NAL (NALT)
Steering committee:
– Those above, plus:
– INRA
– CGIAR consortium
– AgroKnow
Caterina Caracciolo, 11/03/2016
3
GACS = Global Agriculture Concept Scheme
To make a common repository of terminological and conceptual information in agriculture
– To enable.. of information
To achieve economies of scale by maintaining core concepts in cooperation
A voluntary project
– So far, funded only by in-kind contributions from the partners
4
Why bother? #1
To have smoother access to data
Thesauri/vocabularies are used in applications, as part of document metadata
5
Why bother? #2
To “inherit” information from the others
[Figure: Example: outbound links of AGROVOC SKOS mappings. Labels include FAO: Biotechnology Glossary, Geopolitical Ontology, AGROVOC]
6
A vision for the future
http://aims.fao.org/sites/default/files/Report_workshop_Agrisemantics.pdf
7
Requirements
1. Ensure compatibility with existing databases
2. Maximize reuse of work, e.g., available translations
3. Use RDF-based technologies (e.g., URIs, SKOS) for integration with other resources on the web
4. Be available as Linked Open Data
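Requirements 3 and 4 amount to publishing every concept under a URI with SKOS properties. As a minimal sketch (Python standard library only; the example.org URIs and the function names are hypothetical, not GACS identifiers), one concept with multilingual labels could be serialized as N-Triples like this:

```python
# Sketch: serialize one SKOS concept as N-Triples with the standard library only.
# The example.org URIs are hypothetical placeholders, not GACS identifiers.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text, lang):
    """Return a language-tagged N-Triples literal, escaping backslashes, quotes, newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}'


def concept_to_ntriples(uri, pref_labels, alt_labels=(), exact_matches=()):
    """Build N-Triples lines for a skos:Concept.

    pref_labels: dict of language tag -> preferred label (one per language)
    alt_labels: iterable of (label, language tag) pairs
    exact_matches: iterable of URIs of equivalent concepts in other schemes
    """
    subject = f"<{uri}>"
    lines = [f"{subject} <{RDF_TYPE}> <{SKOS}Concept> ."]
    for lang, label in sorted(pref_labels.items()):
        lines.append(f"{subject} <{SKOS}prefLabel> {literal(label, lang)} .")
    for label, lang in alt_labels:
        lines.append(f"{subject} <{SKOS}altLabel> {literal(label, lang)} .")
    for target in exact_matches:
        lines.append(f"{subject} <{SKOS}exactMatch> <{target}> .")
    return lines


if __name__ == "__main__":
    for line in concept_to_ntriples(
        "http://example.org/gacs/C0001",
        {"en": "rice", "fr": "riz"},
        alt_labels=[("paddy", "en")],
        exact_matches=["http://example.org/thesaurus/rice"],
    ):
        print(line)
```

A real deployment would more likely use an RDF library and a triple store; plain strings are used here only to keep the sketch self-contained.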
8
Three thesauri
9
First estimate of overlap*
* Obtained by automatic mapping using AgreementMakerLight
10
The process
1. Select a core from the three thesauri
2. Map concepts to one another
3. Clean up the results
11
Step 1: concept selection
From each thesaurus: the 10,000 concepts most frequently used in its respective database
Plus: all countries AND all higher-level organisms
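Step 1 can be sketched as follows, assuming each thesaurus can report a usage count per concept from its database (the input shape and the function name select_core are hypothetical). Ties at the cut-off are broken alphabetically here, which is an arbitrary choice:

```python
# Sketch of Step 1 for one thesaurus. The input (usage counts per concept ID)
# and the function name select_core are hypothetical.


def select_core(usage_counts, always_include, top_n=10_000):
    """Return the concept IDs one thesaurus contributes to the core.

    usage_counts: dict of concept ID -> number of uses in that thesaurus's database
    always_include: IDs kept regardless of frequency (e.g. countries and
        higher-level organisms)
    """
    # Most used first; ties broken alphabetically by ID so the result is deterministic.
    ranked = sorted(usage_counts.items(), key=lambda item: (-item[1], item[0]))
    top = {concept for concept, _count in ranked[:top_n]}
    return top | set(always_include)


if __name__ == "__main__":
    counts = {"c_rice": 950, "c_maize": 720, "c_sorghum": 80}  # toy numbers
    print(sorted(select_core(counts, {"c_india"}, top_n=2)))
    # ['c_india', 'c_maize', 'c_rice']
```

Running this once per thesaurus gives three candidate sets, which would then go on to the mapping step.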
12
Step 2: concepts mapped
Automatic mapping
– Tool: AgreementMakerLight, applied to the full thesauri for completeness
Manual validation
– Tool: spreadsheets
– Evaluated 60 to 150 rows per hour
– Evaluation took 500 to 600 hours for GACS Beta
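The slide gives a validation rate and a total effort but not the number of rows validated. Multiplying the bounds gives only a rough range, assuming the whole effort went into row-by-row validation at those rates:

```python
# Rough bounds on the number of spreadsheet rows validated, assuming the whole
# 500-600 hours were spent validating at 60-150 rows per hour.


def implied_rows(rate_range, hours_range):
    """Return (low, high) row counts for a rate range and an effort range."""
    (rate_low, rate_high), (hours_low, hours_high) = rate_range, hours_range
    return rate_low * hours_low, rate_high * hours_high


if __name__ == "__main__":
    print(implied_rows((60, 150), (500, 600)))  # (30000, 90000)
```

The factor of three between the bounds comes mostly from the spread in validation speed (2.5x) rather than in total hours (1.2x).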
13
Step 3: clean up results
The goal is:
– To obtain a “core” of concepts usable in real life to replace (part of) AGROVOC and NALT
– To have a “core” to which other resources may hook up, e.g., thesauri, code lists
14
Something to consider...
15
Differences in concept granularity
AGROVOC: [animal oil], [animal fats]
NALT: [animal fats and oils]
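One way SKOS can record a granularity mismatch without forcing an exactMatch is with skos:broadMatch and skos:narrowMatch, where "A skos:broadMatch B" states that B is broader than A. A minimal sketch, with hypothetical example.org URIs standing in for the real concept URIs (this is not necessarily how GACS resolved the case):

```python
# Sketch: express a granularity mismatch with SKOS mapping properties instead of
# exactMatch. URIs are hypothetical placeholders for the real concept URIs.
SKOS = "http://www.w3.org/2004/02/skos/core#"


def granularity_mappings(coarse, finer):
    """Return (subject, predicate, object) triples linking finer concepts to a coarser one.

    "A skos:broadMatch B" means B is broader than A; skos:narrowMatch is its inverse.
    """
    triples = []
    for fine in finer:
        triples.append((fine, SKOS + "broadMatch", coarse))
        triples.append((coarse, SKOS + "narrowMatch", fine))
    return triples


if __name__ == "__main__":
    nalt_fats_and_oils = "http://example.org/nalt/animal_fats_and_oils"
    agrovoc_finer = [
        "http://example.org/agrovoc/animal_oil",
        "http://example.org/agrovoc/animal_fats",
    ]
    for s, p, o in granularity_mappings(nalt_fats_and_oils, agrovoc_finer):
        print(f"<{s}> <{p}> <{o}> .")
```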
16
Differences in the modelling of scientific names and taxonomies
! Very many scientific and common names of organisms in each thesaurus
! Common names and scientific names: are they different things, or different names for the same things?
– Salmo salar, Salmo carpio, Salmo ferox vs. salmon
17
The same thing?
18
Scientific names and taxonomies
AGROVOC: has separate hierarchies (taxonomic / common sense), but they are often not very up to date
CABI: has mostly scientific names
NAL: has both scientific and common names, all together; more up to date
19
Lumps: clusters of concepts originating from the mapping
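A sketch of how lumps could be detected, under an assumed definition: a lump is a cluster of concepts connected through exactMatch links that contains two or more concepts from the same thesaurus, so it cannot collapse into a single GACS concept. Clusters are found with union-find; the concept IDs and the "thesaurus:id" naming convention are hypothetical:

```python
from collections import defaultdict


def find_lumps(exact_matches):
    """Return clusters of linked concepts that hold two or more concepts from one thesaurus.

    exact_matches: iterable of (concept_a, concept_b) pairs with IDs like "agrovoc:x"
    (the part before ":" is taken as the source thesaurus; an assumption).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in exact_matches:
        union(a, b)

    clusters = defaultdict(set)
    for concept in list(parent):
        clusters[find(concept)].add(concept)

    lumps = []
    for members in clusters.values():
        sources = [m.split(":", 1)[0] for m in members]
        if len(sources) != len(set(sources)):  # same thesaurus appears twice
            lumps.append(members)
    return lumps


if __name__ == "__main__":
    links = [
        ("agrovoc:animal_oil", "nalt:animal_fats_and_oils"),
        ("agrovoc:animal_fats", "nalt:animal_fats_and_oils"),
        ("agrovoc:rice", "cabt:rice"),
    ]
    for lump in find_lumps(links):
        print(sorted(lump))
```

Union-find is used because lumps arise transitively: two concepts from the same thesaurus may never be linked directly, only through a shared partner in another thesaurus.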
20
Example: lumps in March 2015: 15,090 concepts, 972 lumps
Now: all solved
21
Only hierarchies?
– No, thematic groups are added as well
– Plus: Product
22
Currently: Beta 1.6
http://tester-os-kktest.lib.helsinki.fi/gacsdemo/gacs/en/
~15,000 concepts, ~400,000 terms, 29 languages
All in RDF/SKOS
Editing and visualization tools from partners:
– VocBench (University of Rome Tor Vergata, Italy, and FAO)
– SKOSMOS (National Library of Finland)
23
Beta 1.6
http://tester-os-kktest.lib.helsinki.fi/gacsdemo/gacs/en/
24
AGROVOC and NALT may be phased out
[Diagram labels: GACS, CABT, extension module(s)?]
25
Ongoing
Quality improvement
– Hierarchies, labels, scope notes, definitions
Consolidate decisions
– E.g., on taxonomies and common names
Write guidelines
1. To document decisions taken
2. To guide future editors of GACS
Some new issues coming up
– Copyright on definitions?
26
Next
1. Release GACS data officially
2. Define AGROVOC and CABI as “extension”
27
References
A web site will be set up soon
For now, all reports are available from the AIMS website at FAO: http://aims.fao.org
To follow progress, subscribe to the AIMS community!
28
Forming GACS concepts by merging the source concepts and aggregating their information
[Figure: GACS concepts mapped to the thesauri of origin]
GACS concepts, with non-preferred terms (UF = used for):
– rice: UF paddy, UF paddy rice
– cereals: UF feed cereals, UF small grain cereals (grain)
– Oryza sativa: UF Oryza glutinosa, UF Oryza indica, UF Oryza japonica, UF Oryza sativa … (subsp, var etc.)
– Oryza: UF Padia, UF rice (plant)
exactMatch mappings to the thesauri of origin:
– agrovoc:c_5435, cabt:82917, nalt:56271
– agrovoc:c_5438, cabt:82935, nalt:56277
– agrovoc:c_1474, cabt:26247
– agrovoc:c_6599, cabt:101613, nalt:56293
(Note: GACS uses SKOS, not traditional thesaurus tags)
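The merge step can be sketched as below, assuming one English label set per source concept and a fixed order deciding which thesaurus supplies the preferred label (both are assumptions; the IDs in the usage are hypothetical, not the ones in the figure). The remaining labels become altLabels, the SKOS counterpart of UF, and every source concept is kept as an exactMatch:

```python
# Sketch: merge mapped source concepts into one GACS-style concept record.
# The input shape, the priority order and the IDs in the usage are hypothetical.


def merge_concepts(sources, priority=("agrovoc", "nalt", "cabt")):
    """Merge source concepts that were mapped to each other.

    sources: dict of thesaurus name -> {"id": str, "pref": str, "alt": list of str}
    priority: order in which thesauri are consulted; the first one present
        supplies the preferred label.
    """
    order = [name for name in priority if name in sources]
    order += sorted(set(sources) - set(priority))
    pref = sources[order[0]]["pref"]
    seen = {pref.casefold()}
    alt = []
    for name in order:
        for label in [sources[name]["pref"], *sources[name]["alt"]]:
            key = label.casefold()  # case-insensitive de-duplication
            if key not in seen:
                seen.add(key)
                alt.append(label)
    return {
        "prefLabel": pref,
        "altLabels": alt,  # the SKOS counterpart of the UF terms
        "exactMatch": sorted(src["id"] for src in sources.values()),
    }


if __name__ == "__main__":
    print(merge_concepts({
        "agrovoc": {"id": "agrovoc:A1", "pref": "rice", "alt": ["paddy"]},
        "nalt": {"id": "nalt:N1", "pref": "rice", "alt": ["paddy rice"]},
        "cabt": {"id": "cabt:C1", "pref": "Rice", "alt": ["paddy"]},
    }))
```

A multilingual version would keep one such label set per language tag.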