FP7, Information Day Call 5, Luxembourg, May 11-12, 2009
KYOTO (ICT )
Knowledge Yielding Ontologies for Transition-Based Organization
FP7: Intelligent Content and Semantics
Piek Vossen, VU University Amsterdam
Project goals
Open platform for knowledge sharing across languages and cultures
  – Wiki environment that allows people in the field to maintain their knowledge and agree on meaning without knowledge engineering skills
  – Bootstraps this knowledge through open text mining & concept learning
  – Enables knowledge transition and information search across different target groups, crossing linguistic, cultural and geographic boundaries
  – Enables deep semantic search for facts and knowledge
Free, open source license (GPL)
Scope
Languages:
  – English, Dutch, Italian, Spanish, Basque, Chinese, Japanese
Domain:
  – Environmental domain, but usable in any domain
Global:
  – Both European and non-European languages
Available:
  – Free: as open source system and data (GPL)
Future perspective:
  – Content standardization that supports worldwide communication
KYOTO (ICT )
Funded:
  – 7th Framework Programme (ICT) of the European Union: Intelligent Content and Semantics
  – Taiwan and Japan funded by national grants
STREP project: research & development
Duration:
  – March 2008 – March 2011
Effort:
  – 364 person months of work
Consortium
1. Vrije Universiteit Amsterdam (Amsterdam, The Netherlands)
2. Consiglio Nazionale delle Ricerche (Pisa, Italy)
3. Berlin-Brandenburg Academy of Sciences and Humanities (Berlin, Germany)
4. Euskal Herriko Unibertsitatea (San Sebastian, Spain)
5. Academia Sinica (Taipei, Taiwan)
6. National Institute of Information and Communications Technology (Kyoto, Japan)
7. Irion Technologies (Delft, The Netherlands)
8. Synthema (Rome, Italy)
9. European Centre for Nature Conservation (Tilburg, The Netherlands)
Subcontractors:
  – World Wide Fund for Nature (Zeist, The Netherlands)
  – Masaryk University (Brno, Czech Republic)
Current situation in the environment domain
Vast amount of information in all kinds of formats and structures: websites, documents, databases, experts, community networks
Scattered over the world: different regions, languages and cultures
Highly dynamic and developing
Increasing time and information pressure
Technology gap: people use the first results from Google
Critical knowledge dependency
KYOTO cycle
KYOTO's solution
Text mining:
  – Massive and accurate indexing of facts from vast amounts of text
  – In any language/culture, from scattered sources
  – Again and again, to detect trends and changes
  – Direct relation between knowledge modeling effort and text mining
Knowledge modeling:
  – Automatic learning of terms and concepts from text in any language
  – Formalization of knowledge in a computer-usable format -> wordnets & ontologies
Community software:
  – For experts in the field, not knowledge engineers
  – Continuous and collaborative effort: adapt to the changing domain; consensus in the field; consensus across languages and cultures
  – Produce interoperable, formal, standardized knowledge structures
  – Relate knowledge structures to expressions in languages
[Figure: KYOTO cycle diagram; only its labels survive]
1 Capture text: "Sudden increase of CO2 emissions in 2008 in Europe"
2 CO2 emission
3 Wikyoto: maintain terms & concepts
4 Index facts: Process: Emission; Involves: CO2; Property: increase, sudden; When: 2008; Where: Europe
5 Text & fact index; semantic search
6 Citizens, governments, companies
Components: Tybot (term yielding robot), Kybot (knowledge yielding robot), wordnets, ontology
Ontology labels: Top, Middle, Domain; Abstract, Physical, Process, Substance; H2O, CO2; CO2 emission, H2O pollution, greenhouse gas
Other labels: environmental organizations; distributed, diverse & dynamic data
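The fact indexed in step 4 can be pictured as a small structured record. The following is a minimal sketch in Python, assuming a hypothetical `Fact` class and `search` helper; the names and the slot layout are illustrative only and are not KYOTO's actual data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """Hypothetical record mirroring the slots shown in step 4 of the slide."""
    process: str
    involves: str
    properties: tuple = ()
    when: str = ""
    where: str = ""


def search(facts, **criteria):
    """Return the facts whose slots equal every given criterion (illustrative only)."""
    return [f for f in facts
            if all(getattr(f, slot) == value for slot, value in criteria.items())]


# The slide's example: "Sudden increase of CO2 emissions in 2008 in Europe"
fact = Fact(process="Emission", involves="CO2",
            properties=("increase", "sudden"), when="2008", where="Europe")
index = [fact]

hits = search(index, process="Emission", where="Europe")
```

Here `hits` contains the single CO2 emission fact, while a query such as `search(index, where="Asia")` finds nothing. Searching on slots rather than on words is what separates a fact index from a plain text index.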
Achievements after 1st year
First version of all system components
  – Wordnets in 7 languages in uniform database formats
  – Standard representation for output of linguistic processing for 7 languages, based on ISO proposals
  – Tybot (term extraction), Kybot (fact extraction) and Wikyoto (user editor)
  – Semantic search
Extensive definition of user requirements
Integration of system components
Potential impact
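[Figure: KYOTO knowledge base. Wordnets for Italian (WnIT), English (WnEN), Basque (WnEU), Dutch (WnNL), Japanese (WnJP), Chinese (WnCH) and Spanish (WnES), each with a domain extension, together with an ontology and a domain ontology.]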
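One way to read this knowledge base is that terms from each language's wordnet point to shared ontology concepts, so a term in one language can be related to terms in the others. The sketch below assumes that reading. The `WORDNETS` table, the concept identifier `GreenhouseGas` and both helper functions are hypothetical, and real wordnets link synsets rather than bare strings.

```python
# Hypothetical, hand-made entries: language code -> term -> ontology concept.
WORDNETS = {
    "en": {"greenhouse gas": "GreenhouseGas"},
    "nl": {"broeikasgas": "GreenhouseGas"},
    "es": {"gas de efecto invernadero": "GreenhouseGas"},
    "it": {"gas serra": "GreenhouseGas"},
}


def concept_for(term, lang):
    """Look up the shared ontology concept for a term in one language."""
    return WORDNETS.get(lang, {}).get(term.lower())


def translations(term, lang):
    """Find terms in the other languages that point to the same concept."""
    concept = concept_for(term, lang)
    if concept is None:
        return {}
    return {other: [t for t, c in entries.items() if c == concept]
            for other, entries in WORDNETS.items() if other != lang}
```

For example, `translations("greenhouse gas", "en")` maps `"nl"` to `["broeikasgas"]`. The terms never reference each other directly; the shared concept is the only link between languages.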
Linking Open Data dataset cloud
[Figure: dataset cloud with KYOTO-style resources added. Labels: wordnets with environment, medical, legal and sailing terms (several wordnets for environment terms); ontologies with environment, medical, legal and sailing concepts; environment, medical and legal facts.]
Project characteristics
Why a STREP project?
Major technical challenges
Cross-cultural and cross-lingual
Small consortium for intense collaboration and discussion
Bridge the gap between users and technology: two-directional process
Roll-out needs to follow from technical achievements
How to keep focus?
Use existing state-of-the-art technology
Start from current practice as baseline
Develop a robust platform that adds to the baseline, with the baseline as fallback
Gradually add richer data, more precision and new functionalities
Allow end-users to control the process, driven by textual examples
Open standardized architecture that can be developed further
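The idea of adding to a baseline while keeping it as a fallback can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `keyword_search` stands in for current practice, `semantic_search` is a hypothetical lookup in a toy fact index, and all names and data are invented for the example.

```python
def keyword_search(query, documents):
    """Baseline: return the documents that contain every query word."""
    words = query.lower().split()
    return [d for d in documents if all(w in d.lower() for w in words)]


def semantic_search(query, fact_index):
    """Stand-in for a richer search: look the query up in a fact index (hypothetical)."""
    return fact_index.get(query.lower(), [])


def search(query, documents, fact_index):
    """Prefer semantic results; fall back to the keyword baseline when there are none."""
    return semantic_search(query, fact_index) or keyword_search(query, documents)


documents = ["Sudden increase of CO2 emissions in 2008 in Europe"]
fact_index = {"co2 emission": ["Emission of CO2 (Europe, 2008)"]}

semantic_hit = search("CO2 emission", documents, fact_index)
baseline_hit = search("Europe 2008", documents, fact_index)
```

The first query is answered from the fact index; the second has no indexed fact and is answered by the baseline, so users never get less than they had before.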
Thank you for your attention