Chinese Core Ontology Construction from a Bilingual Term Bank. Chen Yirong, Lu Qin, Li Wenjie, Cui Gaoying. Department of Computing, The Hong Kong Polytechnic University.

Presentation transcript:

1 Chen Yirong, Lu Qin, Li Wenjie, Cui Gaoying Department of Computing The Hong Kong Polytechnic University Chinese Core Ontology Construction from a Bilingual Term Bank

2 Outline: Introduction; Related Works; Algorithm Design (COCA); Performance Evaluation; Conclusion

3 Introduction: What is a core ontology? A mid-level ontology that bridges the gap between an upper ontology and a domain ontology.

4 Concepts and Terminologies
  Upper Ontology: a general ontology that ensures reusability across different domains (e.g., Computer Program in SUMO)
  Domain Ontology: an ontology that conceptualizes a specific domain (e.g., Free Software in the IT domain); more application dependent, more extents of concepts
  Mid-level Ontology (Core Concepts): the basic concepts of a domain; more application independent, more intents of concepts
  Core ontology (e.g., Software): frequently used, with the ability to form other concepts
  Core Terms: lexical units of core concepts

5 Related Works
  Manually constructed ontologies: SUMO, a famous upper-level ontology
  Works based on lexicons: CoreLex (Buitelaar, P., 1998); EuroWordNet (Rodríguez, 1998)
  Ontology harmonization (core ontology): "Towards a Core Ontology for Information Integration" (M. Doerr, 2003)
  The most similar work: "Enriching Core Ontology with Domain Thesaurus through Concept and Relation Classification" (Huang, 2007), which uses concept and relation classification to enrich a core ontology

6 Our Previous Works
  Chinese terminology extraction
  Chinese core term extraction (Ji et al., 2007)
  Preliminary work on the automatic construction of a core ontology using an English-Chinese term bank (MRCOCA, Ontolex 2007; Chen, 2007)
    Bilingual lexicon
    Extended strings
    Frequency information in synsets
    Weights from extended strings are integrated into the final weight by simple addition
    Mapping to synsets and SUMO achieves an accuracy of only about 50%

7 Issues: What kinds of concepts should be included? How should core concepts be identified? If through core terms, disambiguation is needed. Which relations should be identified, and how? Making use of available resources: Chinese NLP resources are scarce, while English NLP resources are abundant.

8 Requirements of Core Ontology: The concepts must be widely accepted and commonly referenced. The corresponding core terms must be highly used and productive. The concepts/terms can be mapped to an upper ontology, so that the core ontology inherits the attributes provided by the upper ontology.

9 Core Ontology Construction Algorithm (COCA) for Chinese: Extract Chinese core terms from a bilingual term bank. Map each core term T_C to English terms. Map the English terms to WordNet synsets. Map each synset to an upper ontology concept in SUMO.

10 COCA - Resources Used: ITCTerm, a domain-specific core term list (Chen, 2007). CETBank, a Chinese-English bilingual term bank (the 1,500 most productive core terms extracted can serve as suffixes to form more than 50% of the terms in CETBank). WordNet. SUMO. Mappings between WordNet and SUMO.

11 The Framework of COCA

12 COCA - Statistical Translation Module: Translation ambiguity: each Chinese core term T_C ∈ ITCTerm has a set of translations T_Set_E, with T_E ∈ T_Set_E. Objective: to estimate the likelihood of every translation, P(T_E | T_C) for all T_E ∈ T_Set_E, using the extended terms of T_C.
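
The slide does not say how P(T_E | T_C) is computed from the extended terms. As a minimal sketch of one possible estimate, the code below uses relative frequency: it counts how often each candidate translation appears on the English side of term-bank entries whose Chinese side ends with the core term. The function name, the suffix rule, the uniform fallback, and the toy entries are all assumptions made for illustration, not the authors' method or data.

```python
from collections import Counter


def translation_probs(core_term, candidates, term_bank):
    """Relative-frequency estimate of P(T_E | T_C) over extended terms.

    term_bank is an iterable of (chinese_term, english_term) pairs.
    """
    counts = Counter()
    for zh, en in term_bank:
        # Assumed rule: an extended term is longer than the core term
        # and ends with it (the core term acts as the headword).
        if zh == core_term or not zh.endswith(core_term):
            continue
        words = en.lower().split()
        for cand in candidates:
            if cand.lower() in words:
                counts[cand] += 1
    total = sum(counts.values())
    if total == 0:
        # No evidence from extended terms: fall back to a uniform guess.
        return {c: 1.0 / len(candidates) for c in candidates}
    return {c: counts[c] / total for c in candidates}


# Toy term bank, invented for this example.
toy_bank = [
    ("软件", "software"),
    ("应用软件", "application software"),
    ("系统软件", "system software"),
    ("免费软件", "free software"),
    ("支持软件", "support facility"),
]
probs = translation_probs("软件", ["software", "facility"], toy_bank)
print(probs)
```

On this toy bank, three of the four extended terms support "software" and one supports "facility", so the estimate is 0.75 and 0.25 respectively.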

13 COCA - Sense Disambiguation Module: Maps a given T_C to a synset S through its translation set T_Set_E(T_C). Mapping probability of an English term T_E taking a synset S, using frequency information in WordNet. Mapping probability of T_C taking a particular synset S via an English translation T_E.
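
The formulas for these two mapping probabilities did not survive in the transcript. A plausible reading, sketched below under that assumption, is that P(S | T_E) is the normalized WordNet sense frequency and that the probability of one path T_C → T_E → S is the product P(T_E | T_C) · P(S | T_E). The synset labels, frequencies, and translation probabilities are made-up placeholders; no WordNet library is called.

```python
def sense_probs(sense_freqs):
    """Normalize sense frequencies of one English term into P(S | T_E)."""
    total = sum(sense_freqs.values())
    if total == 0:
        # No frequency information: treat all senses as equally likely.
        return {s: 1.0 / len(sense_freqs) for s in sense_freqs}
    return {s: f / total for s, f in sense_freqs.items()}


def path_probs(trans_probs, freqs_by_term):
    """Probability of each path T_C -> T_E -> S (assumed to be a product)."""
    paths = {}
    for t_e, p_te in trans_probs.items():
        for synset, p_s in sense_probs(freqs_by_term.get(t_e, {})).items():
            paths[(t_e, synset)] = p_te * p_s
    return paths


# Hypothetical inputs: P(T_E | T_C) and per-sense frequency counts.
trans_probs = {"software": 0.75, "facility": 0.25}
freqs_by_term = {
    "software": {"software.n.01": 10},
    "facility": {"facility.n.01": 6, "facility.n.02": 2},
}
paths = path_probs(trans_probs, freqs_by_term)
for key, p in sorted(paths.items(), key=lambda kv: -kv[1]):
    print(key, round(p, 4))
```

Because both factors are normalized, the path probabilities for one core term sum to 1, which makes them directly comparable across candidate synsets.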

14 COCA - Concept Selection Module: Combines three features (multi-path feature, hypernyms feature, part-of-speech feature) using the union probability of independent events.

15 Feature 1 - Multi-Paths to Synset: Multiple paths are the paths between a Chinese core term and a synset via different English translations. The feature merges the probabilities of the multiple paths.

16 Feature 2 - Hyponyms in Domain: Incorporates information on all the extended strings. An extended string uses the core term as its headword and is a hyponym of the core term. Length ratio. Union probability of independent events.

17 Feature 3 - Part of Speech: Probability of the POS tag pos(S) owned by a synset S given a core term T_C. POS tag estimation: heuristics on adjective, verb, and noun based on position.

18 Integrate Features Using Union Probability of Independent Events
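
For independent events with probabilities p_1, ..., p_n, the probability that at least one occurs is 1 - (1 - p_1)(1 - p_2)...(1 - p_n). The sketch below applies that formula to three feature scores per candidate synset and picks the synset with the highest combined weight. The feature values and synset labels are hypothetical, and treating each feature score as an event probability is an assumption about how the slide's formula is used.

```python
from math import prod


def union_probability(probs):
    """P(at least one event occurs) for independent events."""
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical scores per candidate synset, in the order
# (multi-path, hyponym, part-of-speech).
features = {
    "software.n.01": [0.5, 0.4, 0.2],
    "facility.n.01": [0.1, 0.2, 0.1],
}
weights = {synset: union_probability(fs) for synset, fs in features.items()}
best = max(weights, key=weights.get)
print(weights, best)
```

Unlike simple addition, this combination always stays within [0, 1] and never decreases when another supporting feature is added.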

19 Evaluation: Algorithm output: for each Chinese core term, the pair with the highest mapping weight. Evaluation standard: for each T_c_i, whether its mapping to a synset is the best match with respect to this domain. Answer preparation: the answers were prepared manually by two IT-domain experts, each working separately on the same set of data.

20 Performance: The evaluation was conducted on the top N most frequent core terms (N is 28 in this paper). COCA achieves 71% accuracy, compared with only 50% for MRCOCA (Chen, 2007). Two examples of core-term-to-synset mappings generated by the algorithm are given, for "软件" and "网络".

21
No. | Zh | En | SUMO Concept | Synset
1 | 软件 (SC) | Software | ComputerProgram+ | software, software_system: (computer science) written programs or procedures or rules and associated documentation pertaining to the operation of a computer system and that are stored in read/write memory
2 | 软件 | Facility | StationaryArtifact+ | facility, installation: something created to provide a particular service; "the assembly plant is an enormous facility"
3 | 软件 | Facility | SubjectiveAssessmentAttribute+ | proficiency, facility, technique: skillfulness in the command of fundamentals deriving from practice and familiarity; "practice greatly improves proficiency"
4 | 软件 | Facility | SubjectiveAssessmentAttribute+ | adeptness, adroitness, deftness, facility, quickness: skillful performance without difficulty; "his quick adeptness was a product of good design"
5 | 软件 | facility | Room+ | toilet, lavatory, lav, can, facility, john, privy, bathr: a room equipped with washing and toilet facilities
6 | 软件 | facility | SubjectiveAssessmentAttribute+ | facility, readiness: a natural effortlessness; "a happy readiness of conversation"--Jane Austen
7 | 网络 (S) | net | Artifact+ | network, net, mesh, meshwork, reticulation: an interconnected or intersecting configuration or system of components
8 | 网络 (C) | network | Collection+ | network, web: an intricately connected system of things or people; "a network of spies" or "a web of intrigue"
9 | 网络 | network | SocialInteraction+ | network: communicate with and within a group; "You have to network if you want to get a good job"
10 | 网络 | net | Pursuing+ | net, nett: catch with a net; "net a fish"
11 | 网络 | net | Making+ | web, net: construct or form a web, as if by weaving
12 | 网络 | net | SubjectiveAssessmentAttribute+ | final, last, net: conclusive in a process or progression; "the final answer"; "a last resort"; "the net result"
13 | 网络 | net | CurrencyMeasure+ | net, nett: remaining after all deductions; "net profit"

22 Conclusion: Evaluation of COCA, repeated on an English-Chinese bilingual term bank with more than 130K entries, shows a 42% improvement in accuracy over MRCOCA (our previous work). The three features and the new probability-based algorithm account for the improvement.
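
The slides do not say how the 42% figure is derived. One reading that matches the accuracies on the Performance slide (71% for COCA, about 50% for MRCOCA) is a relative improvement, as this short check shows; the interpretation is an assumption, not something the slides state.

```python
coca_accuracy = 0.71    # reported for COCA
mrcoca_accuracy = 0.50  # reported for MRCOCA (about 50%)

absolute_gain = coca_accuracy - mrcoca_accuracy   # gain in accuracy points
relative_gain = absolute_gain / mrcoca_accuracy   # gain relative to MRCOCA

print(f"absolute: {absolute_gain:.2f}, relative: {relative_gain:.0%}")
```

Under this reading the absolute gain is 21 percentage points, and the relative gain is 42%.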

23 A term bank can help to quickly construct a domain core ontology by selecting the concept nodes and relations used in the domain. A bilingual term bank can further introduce the second-language realization of the core ontology effectively and automatically.

24 Future Works: Evaluation of the three features: how effective they are and how much they contribute to the final performance. Consideration of more features, such as abbreviations and the synset of the head word of a core term. Use of other resources.

25 Q&A
