1 Chen Yirong, Lu Qin, Li Wenjie, Cui Gaoying Department of Computing The Hong Kong Polytechnic University Chinese Core Ontology Construction from a Bilingual.


1 Chen Yirong, Lu Qin, Li Wenjie, Cui Gaoying
  Department of Computing, The Hong Kong Polytechnic University
  Chinese Core Ontology Construction from a Bilingual Term Bank

2 Outline
  Introduction
  Related Works
  Algorithm Design: COCA
  Performance Evaluation
  Conclusion

3 Introduction
  What is a core ontology?
    A mid-level ontology
    Bridges the gap between an upper ontology and a domain ontology

4 Concepts and Terminologies
  Upper Ontology
    A general ontology that ensures reusability across different domains (e.g., Computer Program in SUMO)
  Domain Ontology
    An ontology that conceptualizes a specific domain (e.g., Free Software in the IT domain)
    More application dependent; emphasizes the extension of concepts
  Mid-level Ontology (Core Concepts)
    Basic concepts of a domain
    More application independent; emphasizes the intension of concepts
    Core ontology (e.g., Software)
    Frequently used, able to form other concepts
  Core Terms
    Lexical units of core concepts

5 Related Works
  Manually constructed ontologies
    SUMO: a well-known upper-level ontology
  Works based on lexicons
    CoreLex (Buitelaar, P., 1998)
    EuroWordNet (Rodríguez, 1998)
  Ontology harmonization: core ontology
    "Towards a Core Ontology for Information Integration" (M. Doerr, 2003)
  The most similar work
    "Enriching Core Ontology with Domain Thesaurus through Concept and Relation Classification" (Huang, 2007)
    Uses concept and relation classification to enrich a core ontology

6 Our Previous Works
  Chinese terminology extraction
    Chinese core term extraction (Ji et al., 2007)
  Preliminary work on automatic core ontology construction using an English-Chinese term bank (MRCOCA, OntoLex 2007; Chen, 2007)
    Bilingual lexicon
    Extended strings
    Frequency information in synsets
    Weights from extended strings are integrated into the final weight by simple addition
    Mapping to synsets and SUMO achieves an accuracy of only about 50%

7 Issues
  What kinds of concepts should be included?
  How to identify core concepts?
    If through core terms, disambiguation is needed
  What relations to identify, and how?
  Making use of available resources
    Chinese NLP resources are scarce
    English NLP resources are abundant

8 Requirements of Core Ontology
  The concepts must be widely accepted and commonly referenced
  The corresponding core terms must be frequently used and productive
  The concepts/terms can be mapped to an upper ontology, so that the core ontology inherits the attributes provided by the upper ontology

9 Core Ontology Construction Algorithm (COCA) for Chinese
  Extract Chinese core terms from a bilingual term bank
  Map each core term T_C to its English terms
  Map the English terms to WordNet synsets
  Map each synset to an upper ontology concept in SUMO

10 COCA - Resources Used
  ITCTerm: a domain-specific core term list (Chen, 2007)
  CETBank: a Chinese-English bilingual term bank
    1,500 most productive core terms extracted (they can serve as suffixes to form more than 50% of the terms in CETBank)
  WordNet
  SUMO
  Mappings between WordNet and SUMO

11 The Framework of COCA

12 COCA - Statistical Translation Module
  Translation ambiguity: each Chinese core term T_C ∈ ITCTerm has a set of translations T_Set_E, with T_E ∈ T_Set_E
  Objective: estimate the likelihood of every translation using the extended terms of T_C, i.e., P(T_E | T_C) for all T_E ∈ T_Set_E
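
The slide does not show the estimator itself, so the following is only a minimal sketch of one plausible reading: P(T_E | T_C) is taken as the relative frequency with which each candidate translation appears as the head word of the English side of the extended terms. The function name, the suffix test for "extended term", the head-word rule, and the toy entries are all assumptions made for illustration, not the authors' method or data.

```python
from collections import Counter

# Hypothetical toy entries (Chinese term, English translation); not taken from CETBank.
TOY_TERM_BANK = [
    ("系统软件", "system software"),
    ("应用软件", "application software"),
    ("免费软件", "free software"),
    ("教学软件", "teaching facility"),
]


def translation_likelihoods(core_term, term_bank, candidates):
    """Estimate P(T_E | T_C) by relative frequency over extended terms.

    Assumptions (not from the slides): an extended term is any longer term
    that ends with the core term, and the last word of its English side is
    treated as the translation of the core term.
    """
    counts = Counter()
    for zh, en in term_bank:
        if zh != core_term and zh.endswith(core_term):
            words = en.lower().split()
            if words and words[-1] in candidates:
                counts[words[-1]] += 1
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()} if total else {}


probs = translation_likelihoods("软件", TOY_TERM_BANK, {"software", "facility"})
print(probs)
```

Restricting the counts to the candidate set keeps the estimate inside T_Set_E, so the returned values sum to 1 over the translations that were actually observed.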

13 COCA - Sense Disambiguation Module
  Map a given T_C to a synset S through its translation set T_Set_E(T_C)
  Mapping probability of an English term T_E taking a synset S, using frequency information in WordNet
  Mapping probability of T_C taking a particular synset S via an English translation T_E
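
The formulas for these two probabilities are not preserved in the transcript. As a hedged sketch, one natural reading is to normalize sense frequencies into P(S | T_E) and to chain the two steps by multiplication, P(S | T_C via T_E) = P(T_E | T_C) * P(S | T_E). The function names, synset labels, and frequency counts below are hypothetical and stand in for real WordNet data.

```python
def synset_given_english(sense_freqs):
    """Normalize {synset_id: frequency} into P(S | T_E)."""
    total = sum(sense_freqs.values())
    return {s: f / total for s, f in sense_freqs.items()} if total else {}


def path_probabilities(p_en_given_zh, freqs_by_term):
    """Score every path T_C -> T_E -> S as P(T_E | T_C) * P(S | T_E).

    The product rule is an assumption; the slides only name the two
    probabilities without showing how they are combined.
    """
    paths = {}
    for t_e, p_t in p_en_given_zh.items():
        p_senses = synset_given_english(freqs_by_term.get(t_e, {}))
        for synset, p_s in p_senses.items():
            paths[(t_e, synset)] = p_t * p_s
    return paths


# Hypothetical inputs: translation likelihoods and made-up sense frequencies.
p_translation = {"software": 0.75, "facility": 0.25}
toy_freqs = {
    "software": {"software.n.01": 10},
    "facility": {"facility.n.01": 6, "facility.n.02": 2},
}

paths = path_probabilities(p_translation, toy_freqs)
for path, p in sorted(paths.items(), key=lambda kv: -kv[1]):
    print(path, round(p, 4))
```

Keeping one score per (translation, synset) path, rather than summing immediately, leaves room for a later step to merge paths that reach the same synset.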

14 COCA - Concept Selection Module
  Combines three features
    Multi-path feature
    Hypernym feature
    Part-of-speech feature
  Using the union probability of independent events

15 Feature 1 - Multi-Paths to Synset
  Multiple paths are the paths between a Chinese core term and a synset via different English translations
  The feature merges the probabilities of the multiple paths

16 Feature 2 - Hyponyms in Domain
  Incorporates information on all the extended strings
  An extended string uses the core term as its headword and is a hyponym of the core term
  Length ratio
  Union probability of independent events

17 Feature 3 - Part of Speech
  Probability of the POS tag pos(S) owned by a synset S, given a core term T_C
  POS tag estimation: heuristics on adjective, verb, and noun based on position

18 Integrate Features
  Using the union probability of independent events
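
For independent events with probabilities p_1, ..., p_n, the probability that at least one occurs is 1 - (1 - p_1)(1 - p_2)...(1 - p_n). The sketch below applies that standard rule to three feature scores for a single synset; the function name and the example scores are hypothetical, and the slides do not say whether any weighting is applied before the combination.

```python
import math


def union_probability(probs):
    """Return P(at least one event) = 1 - prod(1 - p_i) for independent events."""
    probs = list(probs)
    if any(p < 0.0 or p > 1.0 for p in probs):
        raise ValueError("every probability must lie in [0, 1]")
    return 1.0 - math.prod(1.0 - p for p in probs)


# Hypothetical feature scores for one candidate synset:
# multi-path, extended-string (hyponym), and part-of-speech features.
feature_scores = [0.5, 0.4, 0.2]
print(round(union_probability(feature_scores), 4))
```

Unlike simple addition, this combination never exceeds 1, and a single strong feature is enough to give a high final score.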

19 Evaluation
  Algorithm output
    A (core term, synset) pair for each Chinese core term, with the highest mapping weight
  Evaluation standard
    For each T_c_i, whether its mapping to a synset is the best match with respect to this domain
  Answer preparation
    Answers were prepared manually by two IT-domain experts, working independently on the same data set

20 Performance
  The evaluation was conducted on the top N most frequent core terms (N is 28 in this paper)
  COCA achieves 71% accuracy, compared with only 50% for MRCOCA (Chen, 2007)
  Two examples of core-term-to-synset mappings generated by the algorithm are given for "软件" and "网络"

21
  No. | Zh | En | SUMO Concept | Synset
  1 | 软件 (SC) | Software | ComputerProgram+ | software, software_system: (computer science) written programs or procedures or rules and associated documentation pertaining to the operation of a computer system and that are stored in read/write memory
  2 | 软件 | Facility | StationaryArtifact+ | facility, installation: something created to provide a particular service; "the assembly plant is an enormous facility"
  3 | 软件 | Facility | SubjectiveAssessmentAttribute+ | proficiency, facility, technique: skillfulness in the command of fundamentals deriving from practice and familiarity; "practice greatly improves proficiency"
  4 | 软件 | Facility | SubjectiveAssessmentAttribute+ | adeptness, adroitness, deftness, facility, quickness: skillful performance without difficulty; "his quick adeptness was a product of good design"
  5 | 软件 | facility | Room+ | toilet, lavatory, lav, can, facility, john, privy, bathr: a room equipped with washing and toilet facilities
  6 | 软件 | facility | SubjectiveAssessmentAttribute+ | facility, readiness: a natural effortlessness; "a happy readiness of conversation"--Jane Austen
  7 | 网络 (S) | net | Artifact+ | network, net, mesh, meshwork, reticulation: an interconnected or intersecting configuration or system of components
  8 | 网络 (C) | network | Collection+ | network, web: an intricately connected system of things or people; "a network of spies" or "a web of intrigue"
  9 | 网络 | network | SocialInteraction+ | network: communicate with and within a group; "You have to network if you want to get a good job"
  10 | 网络 | net | Pursuing+ | net, nett: catch with a net; "net a fish"
  11 | 网络 | net | Making+ | web, net: construct or form a web, as if by weaving
  12 | 网络 | net | SubjectiveAssessmentAttribute+ | final, last, net: conclusive in a process or progression; "the final answer"; "a last resort"; "the net result"
  13 | 网络 | net | CurrencyMeasure+ | net, nett: remaining after all deductions; "net profit"

22 Conclusion
  Evaluation of COCA on an English-Chinese bilingual term bank with more than 130K entries shows a 42% improvement in accuracy over MRCOCA (our previous work)
  The three features and the new probability-based algorithm account for the improvement
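
The slides do not state how the 42% figure is computed. It is consistent with a relative improvement over the accuracies reported on the Performance slide (71% for COCA, about 50% for MRCOCA), which is the assumption behind this worked calculation:

```latex
\text{relative improvement}
  = \frac{\text{Acc}_{\text{COCA}} - \text{Acc}_{\text{MRCOCA}}}{\text{Acc}_{\text{MRCOCA}}}
  = \frac{0.71 - 0.50}{0.50}
  = 0.42 = 42\%
```

Under the same reading, the absolute gain is 21 percentage points.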

23
  A term bank can help to quickly construct a domain core ontology by selecting the concept nodes and relations used in the domain
  A bilingual term bank can further provide the second-language realization of the core ontology effectively and automatically

24 Future Works
  Evaluation of the three features
    How effective they are
    How much each contributes to the final performance
  Consideration of more features, such as abbreviations and the synset of the head word of a core term
  Use of other resources

25 Q&A
