KYOTO (ICT-211423)
Knowledge Yielding Ontologies for Transition-Based Organization
Intelligent Content and Semantics
The First KYOTO Workshop, February 2-3 2009
Overall Kyoto Architecture and Kyoto Annotation Format
Carlo Aliprandi - SyNTHEMA
Kyoto Architecture: Baselines
KYOTO is an information sharing system that enables the extraction of deep semantics (Web 3.0) from texts in a selected domain, anchoring meaning across cultures and languages.
KYOTO is also a social platform (Web 2.0) for knowledge sharing and transfer, supporting people and organizations in building, maintaining and improving knowledge.
Baselines for the KYOTO architecture:
- a strong backbone for data exchange among components
- adopt and adapt existing standards
- an open and public system
- synchronization across versions, languages, NLP tools and research groups
- APIs to connect to sources and services
- services to plug and unplug different knowledge sources (lexicons, wordnets, ontologies)
- a trade-off between generic and domain-specific resources
The First KYOTO Workshop, Amsterdam, February 2-3 2009
System components
- Capture Server
  - system for selecting, converting and storing documents in the Kyoto document DB
  - linguistic processors producing KAF annotations
- Wikyoto system
  - wiki system for yielding wordnets and ontologies
  - main interface for concept users and fact users
  - Document Manager, Term Editor, Kybot Editor
System components
- Tybot Server
  - automatic extraction of terms and relations from KAF documents, and population of the term database
  - validation of terms, and population of and mapping to domain wordnets (D-WNs) via Wikyoto
- Kybot Server
  - semi-automatic fact annotation of KAF documents using patterns (Kybots)
- Kyoto Search system
  - main interface for end users
  - fact search system
  - fact alert system
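A minimal sketch of the kind of term candidate extraction the Tybot Server performs, assuming only what the slides show of KAF: the <term> element with its lemma and pos attributes. It is a plain frequency filter over noun lemmas, not the actual Tybot algorithm; the function name candidate_terms, the <terms> wrapper element and the toy second document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Toy KAF term layers (spans omitted for brevity). The lemmas come from the
# example sentence; the second document is a hypothetical extra input.
DOCS = [
    """<terms>
         <term tid="t3" type="open" lemma="species" pos="N"/>
         <term tid="t4" type="open" lemma="population" pos="N"/>
         <term tid="t5" type="open" lemma="decline" pos="V"/>
       </terms>""",
    """<terms>
         <term tid="t1" type="open" lemma="population" pos="N"/>
         <term tid="t2" type="open" lemma="decline" pos="V"/>
       </terms>""",
]


def candidate_terms(kaf_docs, pos="N", min_freq=2):
    """Count lemmas with the given POS across documents and keep frequent ones.

    A plain frequency filter (hypothetical), not the real Tybot algorithm.
    """
    counts = Counter()
    for doc in kaf_docs:
        root = ET.fromstring(doc)
        counts.update(t.get("lemma") for t in root.iter("term") if t.get("pos") == pos)
    # most_common() returns (lemma, count) pairs by descending count
    return [(lemma, n) for lemma, n in counts.most_common() if n >= min_freq]


print(candidate_terms(DOCS))              # [('population', 2)]
print(candidate_terms(DOCS, min_freq=1))  # [('population', 2), ('species', 1)]
```

A real term extractor would also use multiword terms and chunk structure; frequency alone is only the simplest possible signal.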
(Simplified) architecture: domain expert point of view
Overall architecture (diagram). Components shown: Capture, Linguistic Processors (English, Dutch, Basque, Italian), Document Base, Indexing, Tybot Server, Extracted Terms, Term DBs (Basque, Japan), Wikyoto (Doc. Manager, Term Editor, Kybot), Kybots DB, Wordnets (Dutch, Spanish, Japanese, Chinese) and Domain Wordnet, Ontologies (Kyoto Ontology, DOLCE, SUMO, FrameNet) and Domain Ontology, Search App. (Search, Browse). Users: Concept User, Fact User, End User.
Data formats: KAF
Kyoto Annotation Format (Level 1), a multi-layered annotation format for:
- tokenization and word form segmentation
- POS tagging
- lemmatization and term extraction
- constituency tagging
- dependency tagging
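The first KAF layer, tokenization and word form segmentation, can be illustrated with a deliberately naive sketch that turns raw text into <wf> elements carrying the wid, sent and para attributes used in the KAF examples. The regular expressions, the blank-line paragraph convention and the function name to_kaf_text are assumptions; KYOTO's linguistic processors use language-specific tools.

```python
import re
import xml.etree.ElementTree as ET

# Naive token pattern (assumption): runs of word characters, or any single
# other non-space symbol such as punctuation.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")
# Naive sentence boundary (assumption): whitespace after '.', '!' or '?'.
SENT_RE = re.compile(r"(?<=[.!?])\s+")
# Paragraph boundary (assumption): one or more blank lines.
PARA_RE = re.compile(r"\n\s*\n")


def to_kaf_text(raw):
    """Build a <text> layer of <wf> elements with wid, sent and para attributes."""
    text = ET.Element("text")
    wid = sent = 0
    paragraphs = [p.strip() for p in PARA_RE.split(raw) if p.strip()]
    for para_no, para in enumerate(paragraphs, start=1):
        for sentence in SENT_RE.split(para):
            tokens = TOKEN_RE.findall(sentence)
            if not tokens:
                continue
            sent += 1  # sentence numbers run across the whole document
            for token in tokens:
                wid += 1
                wf = ET.SubElement(text, "wf", wid=f"w{wid}",
                                   sent=str(sent), para=str(para_no))
                wf.text = token
    return text


layer = to_kaf_text("Tropical terrestrial species populations declined by 55 per cent "
                    "on average from 1970 to 2003.")
print(ET.tostring(layer[0], encoding="unicode"))
# <wf wid="w1" sent="1" para="1">Tropical</wf>
```

Unlike the slide example, which stops at w15, this tokenizer also emits the final full stop as its own <wf>.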
Semantic Annotation
KAF Level 2 (SemKAF), a semantic annotation format for:
- Named Entity Recognition (time, events, quant. …)
- Word Sense Disambiguation (D-WSD)
- Semantic Role Labeling (SRL)
Data formats
Level of annotation        Standard format
Morpho-syntax annotation   KAF
Semantic annotation        KAF
Terms representation       TMF
Facts annotation           KAF
Wordnets                   LMF
Ontologies                 OWL
KAF annotation: words
Tropical terrestrial species populations declined by 55 per cent on average from 1970 to 2003.
<text>
  <wf wid="w1" sent="1" para="1">Tropical</wf>
  <wf wid="w2" sent="1" para="1">terrestrial</wf>
  <wf wid="w3" sent="1" para="1">species</wf>
  <wf wid="w4" sent="1" para="1">populations</wf>
  <wf wid="w5" sent="1" para="1">declined</wf>
  <wf wid="w6" sent="1" para="1">by</wf>
  <wf wid="w7" sent="1" para="1">55</wf>
  <wf wid="w8" sent="1" para="1">per</wf>
  <wf wid="w9" sent="1" para="1">cent</wf>
  <wf wid="w10" sent="1" para="1">on</wf>
  <wf wid="w11" sent="1" para="1">average</wf>
  <wf wid="w12" sent="1" para="1">from</wf>
  <wf wid="w13" sent="1" para="1">1970</wf>
  <wf wid="w14" sent="1" para="1">to</wf>
  <wf wid="w15" sent="1" para="1">2003</wf>
</text>
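Because every <wf> carries its own sentence and paragraph numbers, the word layer alone is enough to index tokens and rebuild sentences. A small sketch with Python's standard xml.etree.ElementTree; the helper names word_index and sentences are hypothetical, and joining tokens with single spaces only approximates the original text, since <wf> does not record spacing.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# First five word forms of the example sentence.
KAF_TEXT = """<text>
  <wf wid="w1" sent="1" para="1">Tropical</wf>
  <wf wid="w2" sent="1" para="1">terrestrial</wf>
  <wf wid="w3" sent="1" para="1">species</wf>
  <wf wid="w4" sent="1" para="1">populations</wf>
  <wf wid="w5" sent="1" para="1">declined</wf>
</text>"""


def word_index(text_layer):
    """Map each wid to its word form, for resolving span targets later."""
    return {wf.get("wid"): wf.text for wf in text_layer.iter("wf")}


def sentences(text_layer):
    """Group word forms by their sent attribute and join them with spaces.

    Original spacing is not stored in <wf>, so the result is approximate.
    """
    by_sent = defaultdict(list)
    for wf in text_layer.iter("wf"):
        by_sent[int(wf.get("sent"))].append(wf.text or "")
    return {n: " ".join(words) for n, words in sorted(by_sent.items())}


layer = ET.fromstring(KAF_TEXT)
print(word_index(layer)["w4"])  # populations
print(sentences(layer)[1])      # Tropical terrestrial species populations declined
```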
KAF annotation: terms
Tropical terrestrial species populations declined by 55 per cent on average from 1970 to 2003.
<term tid="t5" type="open" lemma="decline" pos="V">
  <spans>
    <target id="w5"/>
  </spans>
</term>
<term tid="t7" type="open" lemma="55 per cent" pos="N">
  <spans>
    <target id="w7"/>
    <target id="w8"/>
    <target id="w9"/>
  </spans>
</term>
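Terms point to word forms through <spans>/<target>, so the surface string of a term, including a multiword one such as "55 per cent", can be recovered from the word layer. The sketch below wraps the slide's fragments in <KAF> and <terms> container elements, an assumption made only so the snippet parses as one document; term_table is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Word forms w5-w9 and the two terms from the slide, wrapped in <KAF> and
# <terms> containers (container layout assumed for this sketch).
KAF = """<KAF>
  <text>
    <wf wid="w5" sent="1" para="1">declined</wf>
    <wf wid="w6" sent="1" para="1">by</wf>
    <wf wid="w7" sent="1" para="1">55</wf>
    <wf wid="w8" sent="1" para="1">per</wf>
    <wf wid="w9" sent="1" para="1">cent</wf>
  </text>
  <terms>
    <term tid="t5" type="open" lemma="decline" pos="V">
      <spans><target id="w5"/></spans>
    </term>
    <term tid="t7" type="open" lemma="55 per cent" pos="N">
      <spans><target id="w7"/><target id="w8"/><target id="w9"/></spans>
    </term>
  </terms>
</KAF>"""


def term_table(kaf):
    """Return tid -> (lemma, pos, surface string) by following span targets."""
    words = {wf.get("wid"): wf.text for wf in kaf.iter("wf")}
    table = {}
    for term in kaf.iter("term"):
        surface = " ".join(words[t.get("id")] for t in term.iter("target"))
        table[term.get("tid")] = (term.get("lemma"), term.get("pos"), surface)
    return table


terms = term_table(ET.fromstring(KAF))
print(terms["t5"])  # ('decline', 'V', 'declined')
print(terms["t7"])  # ('55 per cent', 'N', '55 per cent')
```

Keeping the lemma on <term> and the inflected form on <wf> is what lets "declined" and "decline" coexist without duplication.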
KAF annotation: constituents
Tropical terrestrial species populations declined by 55 per cent on average from 1970 to 2003.
<chunks>
  <!-- terrestrial species -->
  <chunk cid="2" head="t3" phrase="NP">
    <spans>
      <target id="t2"/>
      <target id="t3"/>
    </spans>
  </chunk>
  <!-- terrestrial species populations -->
  <chunk cid="3" head="t4" phrase="NP">
    <spans>
      <target id="t2"/>
      <target id="t3"/>
      <target id="t4"/>
    </spans>
  </chunk>
  <!-- Tropical terrestrial species -->
  <chunk cid="4" head="t3" phrase="NP">
    <spans>
      <target id="t1"/>
      <target id="t2"/>
      <target id="t3"/>
    </spans>
  </chunk>
</chunks>
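Chunks reference terms rather than word forms, and each chunk names its head term. A sketch that resolves chunk spans to lemmas (taken from the term list shown with the dependency example) and checks that each head lies inside its own span; read_chunks and that consistency check are illustrative assumptions, not part of a documented KAF validator.

```python
import xml.etree.ElementTree as ET

# Lemmas of t1-t4, as listed with the dependency example.
LEMMAS = {"t1": "tropical", "t2": "terrestrial", "t3": "species", "t4": "population"}

CHUNKS = """<chunks>
  <chunk cid="2" head="t3" phrase="NP">
    <spans><target id="t2"/><target id="t3"/></spans>
  </chunk>
  <chunk cid="3" head="t4" phrase="NP">
    <spans><target id="t2"/><target id="t3"/><target id="t4"/></spans>
  </chunk>
  <chunk cid="4" head="t3" phrase="NP">
    <spans><target id="t1"/><target id="t2"/><target id="t3"/></spans>
  </chunk>
</chunks>"""


def read_chunks(chunks, lemmas):
    """Return (cid, phrase, head lemma, span lemmas) for each chunk.

    Raises ValueError if a chunk's head term is not inside its own span.
    """
    rows = []
    for chunk in chunks.iter("chunk"):
        span = [t.get("id") for t in chunk.iter("target")]
        head = chunk.get("head")
        if head not in span:
            raise ValueError(f"chunk {chunk.get('cid')}: head {head} not in span")
        rows.append((chunk.get("cid"), chunk.get("phrase"), lemmas[head],
                     " ".join(lemmas[t] for t in span)))
    return rows


rows = read_chunks(ET.fromstring(CHUNKS), LEMMAS)
for row in rows:
    print(row)
# ('2', 'NP', 'species', 'terrestrial species')
# ('3', 'NP', 'population', 'terrestrial species population')
# ('4', 'NP', 'species', 'tropical terrestrial species')
```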
KAF annotation: dependencies
Tropical terrestrial species populations declined by 55 per cent on average from 1970 to 2003.
<deps>
  <dep from="t4" to="t5" rfunc="subj"/>
  <dep from="t4" to="t1" rfunc="mod"/>
  <dep from="t4" to="t2" rfunc="mod"/>
  <dep from="t4" to="t3" rfunc="mod"/>
</deps>
Terms referenced above:
<term tid="t1" type="open" lemma="tropical" pos="G">...</term>
<term tid="t2" type="open" lemma="terrestrial" pos="G">...</term>
<term tid="t3" type="open" lemma="species" pos="N">...</term>
<term tid="t4" type="open" lemma="population" pos="N">...</term>
<term tid="t5" type="open" lemma="decline" pos="V">...</term>
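The dependency layer is a flat list of labeled term pairs. This sketch groups the pairs by rfunc and looks up the terms linked from a given term; it reads from and to exactly as written on the slide and makes no claim about which end is the governor. The helper names relations_by_function and related are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

LEMMAS = {"t1": "tropical", "t2": "terrestrial", "t3": "species",
          "t4": "population", "t5": "decline"}

DEPS = """<deps>
  <dep from="t4" to="t5" rfunc="subj"/>
  <dep from="t4" to="t1" rfunc="mod"/>
  <dep from="t4" to="t2" rfunc="mod"/>
  <dep from="t4" to="t3" rfunc="mod"/>
</deps>"""


def relations_by_function(deps):
    """Group (from, to) term pairs by their rfunc label."""
    groups = defaultdict(list)
    for dep in deps.iter("dep"):
        groups[dep.get("rfunc")].append((dep.get("from"), dep.get("to")))
    return groups


def related(groups, rfunc, term_id, lemmas):
    """Lemmas of the terms that term_id links to (from -> to) with rfunc."""
    return [lemmas[to] for frm, to in groups[rfunc] if frm == term_id]


groups = relations_by_function(ET.fromstring(DEPS))
print(related(groups, "mod", "t4", LEMMAS))   # ['tropical', 'terrestrial', 'species']
print(related(groups, "subj", "t4", LEMMAS))  # ['decline']
```

Using a defaultdict means an unused label such as "obj" simply yields an empty list instead of raising.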
KAF annotation: word to sense mapping
Before mapping:
<term tid="t4" type="open" lemma="population" pos="N">
  <spans>
    <target id="w4"/>
  </spans>
</term>
After mapping, with the candidate senses listed in <senseAlt>:
<term tid="t4" type="open" lemma="population" pos="N">
  <spans>
    <target id="w4"/>
  </spans>
  <senseAlt>
    <sense sensecode="EN-17-00861095-n" />
    <sense sensecode="EN-17-00859568-n" />
    .......
  </senseAlt>
</term>
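Word-to-sense mapping attaches every candidate sense for a term's lemma and POS as <sense> children of a <senseAlt> element. A sketch in which a plain Python dictionary stands in for the wordnet lookup; LEXICON and add_sense_alternatives are hypothetical, and only the two sense codes visible on the slide are included.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a wordnet lookup, keyed by (lemma, pos).
# Only the two sense codes shown on the slide are listed here.
LEXICON = {("population", "N"): ["EN-17-00861095-n", "EN-17-00859568-n"]}

TERM = """<term tid="t4" type="open" lemma="population" pos="N">
  <spans><target id="w4"/></spans>
</term>"""


def add_sense_alternatives(term, lexicon):
    """Append a <senseAlt> with one <sense> per candidate code, if any."""
    codes = lexicon.get((term.get("lemma"), term.get("pos")), [])
    if codes:
        alt = ET.SubElement(term, "senseAlt")
        for code in codes:
            ET.SubElement(alt, "sense", sensecode=code)
    return term


term = add_sense_alternatives(ET.fromstring(TERM), LEXICON)
print([s.get("sensecode") for s in term.iter("sense")])
# ['EN-17-00861095-n', 'EN-17-00859568-n']
```

Terms with no lexicon entry are left untouched, so the layer stays valid for words outside the wordnet.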
KAF annotation: WSD
Before WSD (candidate senses without scores):
<term tid="t4" type="open" lemma="population" pos="N">
  <spans>
    <target id="w4"/>
  </spans>
  <senseAlt>
    <sense sensecode="EN-17-00861095-n" />
    <sense sensecode="EN-17-00859568-n" />
    .......
  </senseAlt>
</term>
After WSD (each candidate sense carries a confidence score):
<term tid="t4" type="open" lemma="population" pos="N">
  <spans>
    <target id="w4"/>
  </spans>
  <senseAlt>
    <sense sensecode="EN-17-00859568-n" confidence="0.80"/>
    <sense sensecode="EN-17-00257849-n" confidence="0.13"/>
    <sense sensecode="EN-17-00962397-n" confidence="0.07"/>
  </senseAlt>
</term>
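After WSD the <senseAlt> list carries a confidence per sense (here 0.80, 0.13 and 0.07, which sum to 1). A sketch that ranks the senses and optionally prunes low-confidence ones; the function names and the pruning step with its 0.10 threshold are assumptions for illustration, not part of the KAF proposal.

```python
import xml.etree.ElementTree as ET

TERM = """<term tid="t4" type="open" lemma="population" pos="N">
  <spans><target id="w4"/></spans>
  <senseAlt>
    <sense sensecode="EN-17-00859568-n" confidence="0.80"/>
    <sense sensecode="EN-17-00257849-n" confidence="0.13"/>
    <sense sensecode="EN-17-00962397-n" confidence="0.07"/>
  </senseAlt>
</term>"""


def ranked_senses(term):
    """(sensecode, confidence) pairs, highest confidence first."""
    pairs = [(s.get("sensecode"), float(s.get("confidence", "0")))
             for s in term.iter("sense")]
    return sorted(pairs, key=lambda p: p[1], reverse=True)


def prune_senses(term, threshold):
    """Remove <sense> elements whose confidence is below threshold (hypothetical step)."""
    alt = term.find("senseAlt")
    if alt is not None:
        for sense in list(alt):  # iterate over a copy while removing
            if float(sense.get("confidence", "0")) < threshold:
                alt.remove(sense)
    return term


term = ET.fromstring(TERM)
print(ranked_senses(term)[0])       # ('EN-17-00859568-n', 0.8)
prune_senses(term, threshold=0.10)  # hypothetical cut-off
print(len(term.find("senseAlt")))   # 2
```

Keeping all scored alternatives, rather than only the winner, leaves later components free to apply their own threshold.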
Linguistic processors
Processor types: KAF parser (Kyoto core); free KAF parser (Kyoto +); Semantic KAF (post-WSD); Semantic KAF (embedded WSD)
Languages: English (yes), Dutch, Italian, Spanish (yes*), Basque, Chinese, Japanese
Kyoto openness
The kernel of the system:
- core components available as open source
- integration of existing resources
- usable by anybody in the 7 Kyoto languages
- fast delivery: at M12, beta versions available for several components (Capture Server, LPs, Tybot server, Wikyoto …)
Third-party resources as plug-ins:
- third-party (open-source) linguistic processors
- new languages
- Search Interface
- Fact Alert System - News Monitoring System
Thanks