Division of semantic labor in the Global WordNet Grid Piek Vossen, VU University Amsterdam German Rigau, University of the Basque Country 5 th Global Wordnet.


Division of semantic labor in the Global WordNet Grid
Piek Vossen, VU University Amsterdam
German Rigau, University of the Basque Country
5th Global Wordnet Conference, Mumbai, India, Jan 30 – Feb 5, 2010

Overview
KYOTO as a domain implementation of the Global Wordnet Grid
Scope of knowledge integration
Division of linguistic labor
How to integrate resources?
How to make inferences?

KYOTO – some statistics
European-Asian project, March 2008 – March […]
Countries: The Netherlands, Italy, Germany, Spain, Taiwan, Japan, Czech Republic
12 sites:
– Universities & research institutes: VUA, CNR-ILC, CNR-IIT, BBAW, EHU, AS, NICT, Masaryk
– Companies: Synthema, Irion
– User organizations: ECNC, WWF
7 languages: English, Italian, Japanese, Dutch, Spanish, Basque, Chinese

KYOTO – Overall architecture Overview of the KYOTO process

GWC2010, Mumbai 5 Applying ontology mappings

Global Wordnet Grid
[Figure: grid diagram with the labels Domain, Ontology (DOLCE/SUMO, OntoWordnet), Base concepts, Wn]

Available repositories in KYOTO
Environment domain
Term database: 500,000 terms per 1,000 documents per language
Open data project:
– DBPedia: 2.6 million things, including at least 213,000 persons, 328,000 places, 57,000 music albums, 36,000 films, 20,000 companies. The knowledge base consists of 274 million pieces of information (RDF triples)
– GeoNames: 8 million geographical names; 6.5 million unique features, of which 2.2 million are populated places, and 1.8 million alternate names
Domain thesauri and taxonomies: Species 2000: 2.1 million species
Wordnets for 7 languages: about 50,000 to 120,000 synsets per language
Ontologies: SUMO, DOLCE, SIMPLE

Kyoto Knowledge Base
[Figure: grid diagram with the labels Domain, Ontology (DOLCE/SUMO, OntoWordnet), Base concepts, Wn, DBPedia, Terms (500K), Species (2,100K)]

Species in the ontology
Storing the species in the ontology implies storing 2.1 million species twice.

Should all knowledge be stored in the central ontology?
Vocabularies are too large for full inferencing with current reasoners
Vocabularies are linguistically too diverse to be represented in an ontology
The inferencing capabilities of formal ontologies are not needed for all levels of knowledge

Modeling knowledge in a domain
Knowledge needs to be divided over different lexical and ontological layers:
– Precisely define the relations between lexical and ontological layers
– Precisely define the inferencing based on the distributed knowledge layers

Division of linguistic labor principle
Putnam 1975:
– No need to know all the necessary and sufficient properties to determine if something is "gold"
– Assume that there is a way to determine these properties and that domain experts know how to recognize instances of these concepts
– Speakers can still use the word "gold" and communicate useful information

Division of semantic labor principle
Digital version of Putnam (1975):
– The computer does not need to have all the necessary and sufficient properties to determine if something is a "European tree frog"
– The computer assumes that there is a way to determine this and that domain experts (people) know how to recognize instances of these concepts
– Computers can still reason with semantics and do useful things with textual data

What does the computer need to know?
Distinction between rigid and non-rigid (Welty & Guarino 2002):
– being a "cat" is essential to an individual's existence and therefore rigid
– being a "pet" is a temporary role and therefore non-rigid; a cat can become a pet and stop being a pet without ceasing to exist
– Felix is born as a cat and will always be a cat, but during some period Felix can become a pet and stop being a pet while he continues to exist as a cat
All 2.1 million species are rigid concepts

What does the computer need to know?
Roles and processes in documents have more information value than the defining properties of species:
– Species are defined in terms of physical properties already known to experts
– Roles such as "invasive species", "migration species", "threatened species" express THE important properties of instances of species
Roles, not species, are typically the terms we learn from the text!

Wordnet-ontology-relations
Rigid synset relations to ontology:
– Synset:Endurant(Object); Synset:Perdurant(Event); Synset:Quality:
– sc_equivalenceOf (= relation in WN-SUMO) or sc_subclassOf (+ relation in WN-SUMO)
Non-rigid synset relations to ontology:
– Synset:Role; Synset:Endurant(Object); Synset:Perdurant(Event)
– sc_domainOf: range of ontology types that restricts a role
– sc_playRole: role that is being played
– sc_participantOf: the process in which the role is played
Rigidity can be detected automatically (Rudify, 80% precision, IAG 80%) and is stored in wordnets as attributes of synsets

Global Wordnet Grid Model
KYOTO Ontology in OWL-DL (extension of DOLCE LT):
– perdurant > change-of-location > migration
– endurant > object > organism > bird
– role: done-by, has-source, has-destination, has-path
English Wordnet in WN-LMF:
– bird_1_N (rigid): sc_equivalentOf bird
– duck_1_N (rigid): hyponym (subclass)
– migration_bird_1_N (non-rigid): sc_domainOf bird; sc_playRole done-by; sc_participantOf migration
– migration_4_N: sc_equivalentOf migration
– migrate_1_V: sc_equivalentOf migration
Dutch Wordnet:
– migrerende dieren_1_N (migrating species), non-rigid: sc_domainOf organism; sc_playRole done-by; sc_participantOf migration; equivalent_hypernym eng n (bird)
– eend_1_N (duck): equivalent eng n (duck)
Spanish Wn, Basque Wn, Italian Wn, Japanese Wn, Chinese Wn, ...
Cross-lingual equivalence mappings are expressed through wordnet mappings

Wordnet to ontology mappings
{create, produce, make} Verb, English
  -> sc_equivalenceOf construction
{artifact, artefact} Noun, English
  -> sc_domainOf physical_object
  -> sc_playRole result-existence
  -> sc_participantOf construction
{kunststof} Noun, Dutch // lit. artifact substance
  -> sc_domainOf amount_of_matter
  -> sc_playRole result-existence
  -> sc_participantOf construction
{meat} Noun, English
  -> sc_domainOf cow, sheep, pig
  -> sc_playRole patient
  -> sc_participantOf eat
{,, } Noun, Chinese
  -> sc_domainOf animal
  -> sc_playRole patient
  -> sc_participantOf eat
{ غذاء, لحم, طعام} Noun, Arabic
  -> sc_domainOf cow, sheep
  -> sc_playRole patient
  -> sc_participantOf eat

Wordnet to ontology mappings
{teacher} Noun, English
  -> sc_domainOf human
  -> sc_playRole done-by
  -> sc_participantOf teach
{leraar} Noun, Dutch // lit. male teacher
  -> sc_domainOf man
  -> sc_playRole done-by
  -> sc_participantOf teach
{lerares} Noun, Dutch // lit. female teacher
  -> sc_domainOf woman
  -> sc_playRole done-by
  -> sc_participantOf teach
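The non-rigid mappings above can be held in a small record type, which makes it easy to see that language-specific synsets share a role and process while differing in their domain restriction. This is only a minimal sketch: the `RoleMapping` class, the `MAPPINGS` list and the helper functions are hypothetical names introduced for illustration, not part of WN-LMF or the KYOTO software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleMapping:
    """One non-rigid synset with its three ontology relations (illustrative)."""
    lemma: str
    language: str
    domain_of: tuple      # sc_domainOf: ontology types that restrict the role
    play_role: str        # sc_playRole: the role being played
    participant_of: str   # sc_participantOf: the process in which it is played


# Entries copied from the slides; the container itself is an assumption.
MAPPINGS = [
    RoleMapping("teacher", "English", ("human",), "done-by", "teach"),
    RoleMapping("leraar", "Dutch", ("man",), "done-by", "teach"),
    RoleMapping("lerares", "Dutch", ("woman",), "done-by", "teach"),
    RoleMapping("meat", "English", ("cow", "sheep", "pig"), "patient", "eat"),
]


def same_role(a, b):
    """True when two synsets play the same role in the same process."""
    return (a.play_role, a.participant_of) == (b.play_role, b.participant_of)


def role_equivalents(lemma, mappings=MAPPINGS):
    """Lemmas that share role and process with `lemma`, ignoring the domain."""
    source = next(m for m in mappings if m.lemma == lemma)
    return [m.lemma for m in mappings if m is not source and same_role(source, m)]
```

With this toy data, `role_equivalents("teacher")` returns the two Dutch lemmas; the gender restriction stays visible in their `domain_of` fields rather than being lost in the match.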

Wordnet-LMF

WN-LMF Synset relations

Division of labor in knowledge sources
SKOS database (2.1 million species):
– Animalia > Chordata > Amphibia > Anura > Leptodactylidae > Eleutherodactylus > Eleutherodactylus augusti, Eleutherodactylus atrabracus
Wordnet-LMF (100,000 synsets):
– animal:1 > chordate:1 > vertebrate:1, craniate:1 > amphibian:3 > frog:1, toad:1, toad frog:1, anuran:1, batrachian:1, salientian:1 > barking frog
Ontology-OWL-DL (2,000 types):
– Base Concept
– endurant > physical-object > organism
– perdurant > endanger
Term database (500,000 terms):
– endemic frog, endangered frog, poisonous frog, alien frog

How to make inferences?
SPARQL queries to large Virtuoso databases: aligned Species 2000, DBPedia
SQL queries to the term database
Graph matching on wordnets stored in DebVisDic
Reasoning on a small ontology

KYOTO Project meeting, Jan 13-14th 2010, PolyU Hong Kong 26
Ontotagger applied to KAF
Apply WSD to every term in the KAF representation of a text
For each term in the KAF representation of a text:
(a) If it has a wordnet synset (WSD), check for ontology mappings; if there are none, traverse the wordnet hierarchy to find the first mapping
(b) Else check the SKOS database for a wordnet mapping; if necessary, traverse broader relations up to the first wordnet mapping and go to (a)
(c) Else check the term database for wordnet mappings; if necessary, traverse parent relations up to the first wordnet mapping and go to (a)
Collect all mappings from the ontology and all (relevant) ontological implications and insert them into the KAF representation of the text.
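The fallback order (a), (b), (c) can be sketched with plain dictionaries standing in for the wordnet, the SKOS database and the term database. Everything here is an assumption made for illustration: the table names, the `climb` and `ontology_type` functions, and in particular the single ontology mapping from `animal:1` to `organism`, which is toy data rather than a KYOTO mapping.

```python
# Toy stand-ins for the three knowledge layers (all values are illustrative).
ONTOLOGY_MAPPING = {"animal:1": "organism"}            # synset -> ontology type
WORDNET_HYPERNYM = {                                   # synset -> hypernym
    "barking frog:1": "frog:1",
    "frog:1": "amphibian:3",
    "amphibian:3": "vertebrate:1",
    "vertebrate:1": "chordate:1",
    "chordate:1": "animal:1",
}
SKOS_TO_WORDNET = {"Eleutherodactylus augusti": "barking frog:1",
                   "Amphibia": "amphibian:3"}
SKOS_BROADER = {                                       # concept -> broader
    "Eleutherodactylus atrabracus": "Eleutherodactylus",
    "Eleutherodactylus": "Leptodactylidae",
    "Leptodactylidae": "Anura",
    "Anura": "Amphibia",
}
TERM_TO_WORDNET = {"frog": "frog:1"}
TERM_PARENT = {"endangered frog": "frog"}              # term -> parent term


def climb(start, direct, parent):
    """Walk parent links from `start` until `direct` has an entry for a node."""
    seen = set()
    node = start
    while node is not None and node not in seen:
        if node in direct:
            return direct[node]
        seen.add(node)                 # guards against cycles in the data
        node = parent.get(node)
    return None


def ontology_type(term, synset=None):
    """Return the first ontology type reachable for a term, or None."""
    if synset is None:                 # (b) no WSD result: try the SKOS layer
        synset = climb(term, SKOS_TO_WORDNET, SKOS_BROADER)
    if synset is None:                 # (c) otherwise try the term database
        synset = climb(term, TERM_TO_WORDNET, TERM_PARENT)
    if synset is None:
        return None
    # (a) look for a mapping on the synset, else go up the hypernym chain
    return climb(synset, ONTOLOGY_MAPPING, WORDNET_HYPERNYM)
```

The same `climb` helper serves all three layers because each step has the same shape: follow one kind of parent link until a mapping into the next layer appears.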

Examples
1. Migration birds in the Humber Estuary.
2. The migration of birds to the Humber Estuary
3. Bird migration in the Humber Estuary
4. Birds that migrate to the Humber Estuary

Annotation of ontological implications in KAF

Kybot profiles
IF T1 + to + T2 & T1.impliedType="change_of_location" & T1.impliedRole="has-target" & T2.Type="location" THEN
IF T1 + from + T2 & T1.impliedType="change_of_location" & T1.impliedRole="has-source" & T2.Type="location" THEN
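The two profile conditions can be read as a sliding window over three adjacent terms. The sketch below assumes a hypothetical `Term` record with the attributes named in the profiles, and it assumes that a term can carry several implied roles. The THEN part of the profiles is not preserved in the slides, so the `(T1, role, T2)` triple returned here is only a stand-in for whatever the real profiles emit.

```python
from dataclasses import dataclass


@dataclass
class Term:
    """A text term with the ontological annotations the profiles test (illustrative)."""
    lemma: str
    type: str = ""
    implied_type: str = ""
    implied_roles: tuple = ()


# (preposition, role) pairs taken from the two profiles on the slide.
PROFILES = [("to", "has-target"), ("from", "has-source")]


def match_profiles(terms):
    """Scan T1 + preposition + T2 windows and return (T1, role, T2) triples."""
    hits = []
    for t1, prep, t2 in zip(terms, terms[1:], terms[2:]):
        for word, role in PROFILES:
            if (prep.lemma == word
                    and t1.implied_type == "change_of_location"
                    and role in t1.implied_roles
                    and t2.type == "location"):
                hits.append((t1.lemma, role, t2.lemma))
    return hits


# "Birds that migrate to the Humber Estuary", reduced to annotated terms.
example = [
    Term("bird"),
    Term("migrate", implied_type="change_of_location",
         implied_roles=("has-source", "has-target")),
    Term("to"),
    Term("Humber Estuary", type="location"),
]
```

Here `match_profiles(example)` yields one hit linking `migrate` to `Humber Estuary` through `has-target`; swapping `to` for `from` would select `has-source` instead.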

Kybot Knowledge Patterns
<event eid="e1" target="t2" lemma="feed" pos="V" tense="PAST" aspect="NONE" polarity="POS"/>
<event eid="e2" target="t20" lemma="migrate" pos="V" tense="PRESENT" aspect="NONE" polarity="POS"/>

Conclusion: Should all knowledge be stored in the central ontology?
Vocabularies are too large for full inferencing
Vocabularies are linguistically too diverse to be represented in an ontology
The inferencing capabilities of formal ontologies are not needed for all levels of knowledge
A model of division of labor (along the lines of Putnam 1975) in which knowledge is stored in 3 layers:
– SKOS vocabularies and term databases
– wordnet (WN-LMF)
– ontology (OWL-DL)
Each layer supports different types of inferencing, ranging from SPARQL queries and graph algorithms to reasoning.
Mapping relations support the division of labor and the different types of inferencing, and allow for the encoding of language-specific lexicalizations and restrictions.

Conclusions
Ontologies are abstract and minimal, and lexicons are large and rich
Semantic relations in lexicons are complementary to ontological relations
Semantic relations expressed in a language system should be compatible with ontologies
Large vocabularies of types (rigid things in the world) can be mapped to the ontology through combinations of lexical relations and basic ontological mappings
Lexicalizations of contextual and subjective concepts need to be expressed through more complex relations
Equivalences across languages are expressed partially through ontological expressions and partially across lexicons

Applying WSD to terms

How to integrate the data?
Species 2000 vocabulary: 2,171,281 concepts in a MySQL database with parent relations:
– Kingdom -> Class -> Order -> Family -> Genus -> Species -> Infra species
– Animalia -> Chordata -> Amphibia -> Anura -> Leptodactylidae -> Eleutherodactylus -> Eleutherodactylus augusti
Converted to SKOS format
Aligned with DBPedia for language labels
Aligned with Wordnet using vocabulary and relation mappings
Published in Virtuoso, accessed with SPARQL queries
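Converting a chain of parent relations to SKOS amounts to emitting one `skos:prefLabel` per concept and one `skos:broader` link per parent relation. The sketch below shows this for the example lineage using only the standard library. The `BASE` namespace and the function names are hypothetical, and the slides do not say how the actual conversion was implemented; only the two SKOS property URIs are standard.

```python
from urllib.parse import quote

BASE = "http://example.org/species/"              # hypothetical concept namespace
SKOS = "http://www.w3.org/2004/02/skos/core#"     # standard SKOS namespace


def concept_uri(name):
    """Build a URI for a taxon name (the naming scheme is an assumption)."""
    return BASE + quote(name.replace(" ", "_"))


def lineage_to_skos(lineage):
    """Turn names ordered from most general to most specific into triples."""
    triples = []
    parent = None
    for name in lineage:
        uri = concept_uri(name)
        triples.append((uri, SKOS + "prefLabel", name))
        if parent is not None:
            # each parent relation becomes one skos:broader link
            triples.append((uri, SKOS + "broader", parent))
        parent = uri
    return triples


lineage = ["Animalia", "Chordata", "Amphibia", "Anura",
           "Leptodactylidae", "Eleutherodactylus", "Eleutherodactylus augusti"]
triples = lineage_to_skos(lineage)
```

For the seven names in the example this produces seven labels and six `broader` links; a triple store such as Virtuoso could then load them once serialized in an RDF syntax.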

How to integrate data? Extending language labels using DBPedia
Language    Species 2000    DBPedia extension
English     69,045          834,821
Spanish     1,731           358,499
Italian     17,552          215,511
Dutch       5,397           185,437
Chinese     58,774          83,756
Japanese    4,625           139,754

How to integrate data? Alignment of Species 2000 with wordnet
Vocabulary match with Wordnet synsets
If polysemous, then SSI-Dijkstra weighting of senses based on the hyperonym chain
Results still to be evaluated:
– Animalia (animal:1) -> Chordata (chordate:1) -> Amphibia (amphibian:3) -> Anura -> Leptodactylidae -> Eleutherodactylus -> Eleutherodactylus augusti (barking frog:1)

How to integrate data? Alignment of terms with wordnet
Word-sense disambiguation is applied to terms in KAF (Kyoto Annotation Format)
Term hierarchy is extracted from KAF:
– land:5 grassland:1 -> biome:1 woodland:1 -> biome:1 cropland urban land
Results still to be evaluated: SemEval2010