1
Briefing on the Call for Proposals of the EU Seventh Framework Programme, Socio-economic Sciences and Humanities (EU-FP7 SSH), 2009.12.30
National Sun Yat-sen University, National Contact Point for Humanities and Social Sciences of the EU Framework Programme
KYOTO (ICT-211423): Knowledge Yielding Ontologies for Transition-Based Organization
FP7: Intelligent Content and Semantics
http://www.kyoto-project.eu/
Chu-Ren Huang 黃居仁, Academia Sinica
December 30 2009, NSYSU, Kaohsiung
2
Overview
History: Background information
What is KYOTO
Personal Journey: Building an internationally recognized career on Taiwan-based research
Key Perspectives:
–Global View
–Integrative Thinking/Opposable Mind
3
History: Background information
4
Pre-History
Pioneering Chinese Language Resources and Language Processing: since 1988
Construction of WordNet: since 2000
Organized COLING: 2002
ISLE: International Standards in Language Engineering, 2000-2002
–EC (ISLE – IST-1999-10647) + NSF + Asia
5
Brief History of KYOTO
January 2006: Concept of the Global WordNet Grid
2006: Discussion of possibilities
January 2007: Meeting in Kyoto (Amsterdam, Princeton/Berlin, Pisa, Kyoto, Taipei)
–Identify the FP7 call to submit to
–Identify ecology/environment as the domain
6
Application Timeline I (2007)
Feb 15: General comments
Feb 15: Contact end-users
Feb 22: Find out the possibilities for non-European partners
Feb 22: Determine the final consortium, based among other things on the outcome of 2
Feb 28: Determine the details (part A of the proposal) required by the EU for each partner
7
Application Timeline III (2007)
Mar-Apr: Revising and finalizing the proposal
May 10: Formal submission acknowledged
July 15: Review result
–8 out of 45 projects passed review
–Call ID: FP7-ICT-2007-1
–Instrument: CP-FP-INFSO
–Title: Knowledge Yielding Ontologies for Transition-based Organization
8
Application Timeline III (2007)
Apr 13: Collect all forms (part A of the proposal) and signatures from the partners (PISA, AMSTERDAM)
Apr 13: Finalize the proposal part B (PISA, AMSTERDAM)
May 02: Submit proposal parts A and B (PISA, AMSTERDAM)
9
What is KYOTO
10
KYOTO (ICT-211423) Overview
Title: Knowledge Yielding Ontologies for Transition-Based Organization
Funded:
–7th Framework Programme (ICT) of the European Union: Intelligent Content and Semantics
–Taiwan and Japan funded by national grants
Goal:
–Open and free platform for knowledge sharing across languages and cultures
–Wiki environment that allows people in the field to maintain their knowledge and agree on meaning without knowledge engineering skills
–Bootstrap through open text mining and concept learning
–Enables knowledge transition and information search across different target groups, crossing linguistic, cultural and geographic boundaries
–Enables deep semantic search for facts and knowledge
URL: http://www.kyoto-project.eu/
Duration:
–March 2008 – March 2011
Effort:
–364 person months of work
11
Consortium
1. Vrije Universiteit Amsterdam (Amsterdam, The Netherlands)
2. Consiglio Nazionale delle Ricerche (Pisa, Italy)
3. Berlin-Brandenburg Academy of Sciences and Humanities (Berlin, Germany)
4. Euskal Herriko Unibertsitatea (San Sebastian, Spain)
5. Academia Sinica (Taipei, Taiwan)
6. National Institute of Information and Communications Technology (Kyoto, Japan)
7. Irion Technologies (Delft, The Netherlands)
8. Synthema (Rome, Italy)
9. European Centre for Nature Conservation (Tilburg, The Netherlands)
Subcontractors:
–World Wide Fund for Nature (Zeist, The Netherlands)
–Masaryk University (Brno, Czech Republic)
12
KYOTO (ICT-211423) Overview
Languages:
–English, Dutch, Italian, Spanish, Basque, Chinese, Japanese
Domain:
–Environmental domain, BUT usable in any domain
Global:
–Both European and non-European languages
Available:
–Free: as open source system and data (GPL)
Future perspective:
–Content standardization that supports worldwide communication
13
The Taiwan Team
PI: Chu-Ren Huang
Co-I: Jason S. Chang (NTHU), Shu-Kai Hsieh (NTNU), Sue-jin Ker (SCU)
Other Participants: Kathleen Ahrens (NTU), Ya-min Chou (MCU), Shu-chuan Tseng (AS)
Funded by: NSC
14
Background: Multilingualism’s Challenges to HLT
The scaling up of language resources in a complex and distributed environment
Language resources are inherently distributed
Language resources are best created and updated where the language is spoken and by people who speak it: human expertise, keeping up with linguistic change
It is impractical to maintain all language resources at the same site: huge quantity, rights
15
Multilingualism: Challenges to HLT II
The scaling up of language resources in a complex and distributed environment
To overcome linguistic diversity to support shared tasks and applications: web search etc.
To create synergy of information from different languages
To function as a foundation of inter-cultural collaboration
16
Proposed Answer to the Challenge: Wordnet as shared language resource
Wordnet: a concept-driven and relation-based lexical knowledge base
–About 40 language wordnets have been built
–Sharing a basic representation of meaning (synset indexes), which is mapped to an upper ontology (SUMO, among others)
–Sharing a (universal) set of lexical semantic relations
Information can be exchanged using the same format regardless of source language
17
Proposed Answer to the Challenge: Wordnets as Web Services
Wordnets are distributed, just like grid nodes
–Each wordnet site will be a grid node
–Each will be a natural host for language-related information services based on wordnet
–Including any meta-NLP task: bootstrapping wordnets, harmonizing ontologies, building bilingual lexica, supporting cross-lingual alignments, etc.
–And applications: multilingual query expansion, second language e-learning, machine translation, etc.
18
The Global Wordnet Grid
First discussed at the 3rd GWA at Jeju, Korea in February 2006, by Chu-Ren Huang, Adam Pease, and Piek Vossen, among others
A call for contributions can be found on the GWA website: http://www.globalwordnet.org/gwa/gwa_grid.htm
A small-scale experiment is being carried out by the ILC-CNR (Italy) and Academia Sinica (Taiwan) teams
–Soria et al. (2006)
Planned strategic session in January 2007 in Kyoto
19
State of the art in the environment domain
21
Baseline retrieval results
6 persons, 30 high-level questions

Result rank   CONFIRMED       DISAPPROVED      UNDECIDED       Total
0             13   20.31%      27   20.30%     10   15.87%      50   19.23%
1              6    9.38%       9    6.77%      9   14.29%      24    9.23%
2              8   12.50%      13    9.77%      7   11.11%      28   10.77%
3              5    7.81%       6    4.51%      3    4.76%      14    5.38%
4              8   12.50%       6    4.51%      2    3.17%      16    6.15%
5              2    3.13%       7    5.26%      3    4.76%      12    4.62%
6              2    3.13%       6    4.51%      4    6.35%      12    4.62%
7              2    3.13%       2    1.50%      1    1.59%       5    1.92%
8              4    6.25%       3    2.26%      1    1.59%       8    3.08%
9              1    1.56%       5    3.76%      0    0.00%       6    2.31%
(unlabeled)   13   20.31%      49   36.84%     23   36.51%      85   32.69%
Total         64   24.62%     133   51.15%     63   24.23%     260
22
KYOTO's Solution
Text mining:
–Massive and accurate indexing of facts from vast amounts of text
–In any language/culture, from scattered sources
–Again and again, to detect trends and changes
–Direct relation between knowledge modeling effort and text mining
Knowledge modeling:
–Automatic learning of terms and concepts from text in any language
–Formalization of knowledge in a computer-usable format -> wordnets & ontologies
Community software:
–For experts in the field, not knowledge engineers
–Continuous and collaborative effort: adapt to the changing domain; consensus in the field; consensus across languages and cultures
–Produce interoperable, formal, standardized knowledge structures
–Relate knowledge structures to expressions in languages
23
[Figure: KYOTO system architecture. The figure numbers six steps; the recoverable labels are:]
Input: Distributed, diverse & dynamic data from environmental organizations
Capture text: "Sudden increase of CO2 emissions in 2008 in Europe"
Tybot (term yielding robot): extracts terms, e.g. "CO2 emission"
Wikyoto: maintain terms & concepts
Kybot (knowledge yielding robot): index facts, e.g. Process: Emission; Involves: CO2; Property: increase, sudden; When: 2008; Where: Europe
Text & Fact Index
Semantic Search, used by citizens, governments, companies
Knowledge layers: Ontology (Top: Abstract, Physical, Process, Substance; Middle: H2O, CO2), Wordnets, Domain (CO2 Emission, H2O Pollution, Greenhouse Gas)
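The fact indexed in the figure can be pictured as a small structured record. The following is only a minimal sketch under stated assumptions: the `Fact` class, its field names, and the `find_facts` helper are hypothetical illustrations modeled on the slide's labels (Process, Involves, Property, When, Where), not the actual Kybot output format.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Fact:
    """One indexed fact; the schema is an assumption based on the slide's labels."""
    process: str
    involves: Tuple[str, ...]
    properties: Tuple[str, ...] = ()
    when: Optional[str] = None
    where: Optional[str] = None


def find_facts(facts: Iterable[Fact],
               process: Optional[str] = None,
               involves: Optional[str] = None,
               where: Optional[str] = None) -> List[Fact]:
    """Return the facts that match every criterion that is not None."""
    hits = []
    for fact in facts:
        if process is not None and fact.process != process:
            continue
        if involves is not None and involves not in fact.involves:
            continue
        if where is not None and fact.where != where:
            continue
        hits.append(fact)
    return hits


# The example sentence from the slide:
# "Sudden increase of CO2 emissions in 2008 in Europe"
co2_fact = Fact(process="Emission", involves=("CO2",),
                properties=("increase", "sudden"),
                when="2008", where="Europe")

matches = find_facts([co2_fact], process="Emission", where="Europe")
```

A search over such records filters on structured fields rather than on the surface wording of the sentence, which is the point of indexing facts in addition to text.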
24
Integration of knowledge
25
Available data repositories
Open data projects:
–DBPedia: 2.6 million things, including at least 213,000 persons, 328,000 places, 57,000 music albums, 36,000 films, 20,000 companies. The knowledge base consists of 274 million pieces of information (RDF triples).
–GeoNames
Domain database:
–Species 2000: 2.1 million species
Term database: 500,000 terms per 10,000 documents per language
Wordnets for 7 languages: about 50,000 to 120,000 synsets per language
Ontologies: EuroWordNet top ontology, SUMO, DOLCE
26
How to integrate the data?
Species 2000 vocabulary: 2,171,281 concepts in a MySQL database with parent relations:
–Kingdom -> Phylum -> Class -> Order -> Family -> Genus -> Species -> Infraspecies
–Animalia -> Chordata -> Amphibia -> Anura -> Leptodactylidae -> Eleutherodactylus -> Eleutherodactylus augusti
Converted to SKOS format
Aligned with DBPedia for language labels
Aligned with Wordnet using vocabulary and relation mappings
Published in Virtuoso, accessed with SPARQL queries
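The conversion of parent relations to SKOS can be sketched as follows. This is an illustration only, not the project's converter: the `BASE` namespace and the `lineage_to_skos` helper are hypothetical, and the only vocabulary taken as given is the W3C SKOS core namespace with its `prefLabel` and `broader` properties.

```python
from typing import List, Tuple

BASE = "http://example.org/species2000/"       # hypothetical namespace
SKOS = "http://www.w3.org/2004/02/skos/core#"  # W3C SKOS core namespace

Triple = Tuple[str, str, str]


def concept_uri(name: str) -> str:
    """Build a concept URI from a taxon name (spaces become underscores)."""
    return BASE + name.replace(" ", "_")


def lineage_to_skos(lineage: List[str]) -> List[Triple]:
    """Turn a top-down lineage into prefLabel and broader triples."""
    triples: List[Triple] = []
    parent = None
    for name in lineage:
        concept = concept_uri(name)
        triples.append((concept, SKOS + "prefLabel", name))
        if parent is not None:
            # Each taxon points to its parent via skos:broader.
            triples.append((concept, SKOS + "broader", parent))
        parent = concept
    return triples


# The example lineage from the slide, from most general to most specific.
lineage = ["Animalia", "Chordata", "Amphibia", "Anura",
           "Leptodactylidae", "Eleutherodactylus",
           "Eleutherodactylus augusti"]

triples = lineage_to_skos(lineage)
```

Once the parent relations are expressed as `skos:broader` links, a triple store can answer hierarchy questions (for example, all species under Anura) with a SPARQL query instead of database-specific joins.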
27
How to integrate data?
Extending language labels using DBPedia

Language    Species 2000    DBPedia extension
English     69,045          83,4821
Spanish     1,731           35,8499
Italian     17,552          21,5511
Dutch       5,397           18,5437
Chinese     58,774          83,756
Japanese    4,625           13,9754
28
[Figure: Kyoto Knowledge Base. Recoverable labels: Ontology (DOLCE/OntoWordnet); Base concepts; Wn (wordnets); Domain Wn (domain wordnets); Terms (T), 500K; Vocabularies (V), 2,100K; DBPedia; Domain]
29
Should all knowledge be stored in the central ontology?
Vocabularies are too large for full inferencing
Vocabularies are linguistically too diverse to be represented in an ontology
The inferencing capabilities of formal ontologies are not needed for all levels of knowledge
A model of division of labor (along the lines of Putnam 1975) in which knowledge is stored in 3 layers:
–SKOS vocabularies and term databases
–wordnet (WN-LMF)
–ontology (OWL-DL)
Each layer supports different types of inferencing, ranging from SPARQL queries and graph algorithms to reasoning.
Mapping relations support the division of labor and the different types of inferencing, and allow for the encoding of language-specific lexicalizations and restrictions.
30
What does the computer need to know?
Distinction between rigid and non-rigid (Welty & Guarino 2002):
–being a "cat" is essential to an individual's existence and therefore rigid
–being a "pet" is a temporary role and therefore non-rigid; a cat can become a pet and stop being a pet without ceasing to exist
–Felix is born as a cat and will always be a cat, but during some period Felix can become a pet and stop being a pet while it continues to exist
All 2.1 million species are rigid concepts
31
What does the computer need to know?
Roles and processes in documents have more information value than the defining properties of species:
–Species are defined in terms of physical properties already known to experts
–Roles such as "invasive species", "migratory species", "threatened species" express THE important properties of instances of species
Telicity: Roles, not species, are typically the terms we learn from the text!
32
Division of labor in knowledge sources
SKOS database (2.1 million species): Animalia -> Chordata -> Amphibia -> Anura -> Leptodactylidae -> Eleutherodactylus -> Eleutherodactylus augusti, Eleutherodactylus atrabracus
Wordnet (100,000 synsets): animal:1 -> chordate:1 -> {vertebrate:1, craniate:1} -> amphibian:3 -> {frog:1, toad:1, toad frog:1, anuran:1, batrachian:1, salientian:1} -> barking frog
Ontology (1,000 types): endurant -> physical-endurant -> physical-object
Term database (500,000 terms): endemic frog, endangered frog, poisonous frog, alien frog
Base Concept
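The layered lookup suggested by this slide can be sketched as three small parent tables plus cross-layer mappings. This is a hedged illustration: the chains are copied from the slide (one literal per synset for brevity), but the two mapping tables `SKOS_TO_WN` and `WN_TO_ONTO`, and all function names, are assumptions made for the example; the slide does not spell out which nodes are linked.

```python
from typing import Dict, List, Optional

# Layer 1: SKOS vocabulary, child -> parent (from the Species 2000 example).
SKOS_BROADER = {
    "Eleutherodactylus augusti": "Eleutherodactylus",
    "Eleutherodactylus": "Leptodactylidae",
    "Leptodactylidae": "Anura",
    "Anura": "Amphibia",
    "Amphibia": "Chordata",
    "Chordata": "Animalia",
}

# Layer 2: wordnet, synset -> hypernym (one literal per synset for brevity).
WN_HYPERNYM = {
    "barking frog": "frog:1",
    "frog:1": "amphibian:3",
    "amphibian:3": "vertebrate:1",
    "vertebrate:1": "chordate:1",
    "chordate:1": "animal:1",
}

# Layer 3: ontology, type -> supertype.
ONTO_SUPERTYPE = {
    "physical-object": "physical-endurant",
    "physical-endurant": "endurant",
}

# Cross-layer mappings: ASSUMED for illustration, not given on the slide.
SKOS_TO_WN = {"Eleutherodactylus augusti": "barking frog"}
WN_TO_ONTO = {"animal:1": "physical-object"}


def chain(start: str, parents: Dict[str, str]) -> List[str]:
    """Follow parent links upward from start; returns the path including start."""
    path = [start]
    while path[-1] in parents:
        path.append(parents[path[-1]])
    return path


def first_mapped(path: List[str], mapping: Dict[str, str]) -> Optional[str]:
    """Return the mapping target of the first node on the path that has one."""
    for node in path:
        if node in mapping:
            return mapping[node]
    return None


def ontology_types(species: str) -> List[str]:
    """Climb SKOS, then wordnet, then the ontology; empty list if unmapped."""
    synset = first_mapped(chain(species, SKOS_BROADER), SKOS_TO_WN)
    if synset is None:
        return []
    onto_type = first_mapped(chain(synset, WN_HYPERNYM), WN_TO_ONTO)
    if onto_type is None:
        return []
    return chain(onto_type, ONTO_SUPERTYPE)


types = ontology_types("Eleutherodactylus augusti")
```

Only the small ontology has to hold formal types; the 2.1 million species inherit them by climbing the cheaper layers, which is the division of labor the slide describes.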
33
Wordnet-ontology relations
Rigid synsets:
–Synset:Endurant; Synset:Perdurant; Synset:Quality
–sc_equivalenceOf or sc_subclassOf
Non-rigid synsets:
–Synset:Role
–sc_domainOf: range of ontology types that restricts a role
–sc_playRole: role that is being played
34
Lexicalization of process-related concepts
{create, produce, make} Verb, English
-> sc_equivalenceOf ConstructionProcess
{artifact, artefact} Noun, English
-> sc_domainOf PhysicalObject
-> sc_playRole ConstructedRole
{kunststof} Noun, Dutch (lit. "artifact substance")
-> sc_domainOf AmountOfMatter
-> sc_playRole ConstructedRole
{meat} Noun, English
-> sc_domainOf Cow, Sheep, Pig
-> sc_playRole EatenRole
{肉, 食物, 餐} Noun, Chinese (meat, food, meal)
-> sc_domainOf Cow, Sheep, Pig, Rat, Mole
-> sc_playRole EatenRole
{غذاء, لحم, طعام} Noun, Arabic (nourishment, meat, food)
-> sc_domainOf Cow, Sheep
-> sc_playRole EatenRole
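The language-specific restrictions above can be sketched as data plus a simple lookup. This is a minimal sketch, assuming a hypothetical `RoleSynset` record and `languages_allowing` helper; only the synset literals, the `sc_domainOf` types, and the `sc_playRole` values come from the slide.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class RoleSynset:
    """A non-rigid synset with its ontology mappings (hypothetical record)."""
    literals: Tuple[str, ...]
    language: str
    domain_of: Tuple[str, ...]  # sc_domainOf: types that may play the role
    play_role: str              # sc_playRole: the role being played


# The three "meat" synsets from the slide.
SYNSETS = [
    RoleSynset(("meat",), "English",
               ("Cow", "Sheep", "Pig"), "EatenRole"),
    RoleSynset(("肉", "食物", "餐"), "Chinese",
               ("Cow", "Sheep", "Pig", "Rat", "Mole"), "EatenRole"),
    RoleSynset(("غذاء", "لحم", "طعام"), "Arabic",
               ("Cow", "Sheep"), "EatenRole"),
]


def languages_allowing(role: str, onto_type: str,
                       synsets: Iterable[RoleSynset]) -> List[str]:
    """Languages whose synset lets onto_type play the given role, sorted."""
    return sorted(s.language for s in synsets
                  if s.play_role == role and onto_type in s.domain_of)


pig_languages = languages_allowing("EatenRole", "Pig", SYNSETS)
rat_languages = languages_allowing("EatenRole", "Rat", SYNSETS)
```

The role (`EatenRole`) is shared across languages, while the set of types allowed to play it differs per language, so cultural differences are encoded in the mappings without changing the ontology itself.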
35
How to make inferences?
SPARQL queries to large Virtuoso databases: aligned Species 2000, DBPedia
SQL queries to the term database
Graph matching on wordnets
Reasoning on a small ontology
36
Semantic Search (skipped)
37
The core Kyoto system is distributed under the free open source license (GPL)
38
Personal Journey: Building an internationally recognized career on Taiwan-based research
39
Pre-History
Pioneering Chinese Language Resources and Language Processing: since 1988
Construction of WordNet: since 2000
Organized COLING: 2002
ISLE: International Standards in Language Engineering, 2000-2002
–EC (ISLE – IST-1999-10647) + NSF + Asia
40
Key Perspectives:
–Global View
–Integrative Thinking/Opposable Mind
41
Global View
Think and Act Globally
–Put what is good for the world before what is good for Taiwan
–What is good for the world must be good for Taiwan, but what is good for Taiwan (thinking parochially) may not be good for the world
–Hence it cannot be supported by other partners
–What CANNOT be done is NOT GOOD for Taiwan
42
Think Globally
Research Direction: Think of Global Impact
–Not of local ranking
–Find your own niche: "Better to be the head of a chicken than the tail of an ox" (寧為雞首,不為牛後)
Think of the scale of Taiwan
–And act strategically
–Contributing Team Partner vs. Team Leader: Choose the RIGHT team, NOT my team
43
Integrative Thinking/Opposable Mind
Create a Win-Win Situation out of a Zero-Sum Game
The Opposable Mind (Roger Martin 2007)
The Design of Business: Why Design Thinking is the Next Competitive Advantage (Martin 2009)
44
In Sum: Befriend the upright, the trustworthy, and the well-informed (友直, 友諒, 友多聞)