BILINGUAL CO-TRAINING FOR MONOLINGUAL HYPONYMY-RELATION ACQUISITION. Jong-Hoon Oh, Kiyotaka Uchimoto, Kentaro Torisawa. ACL 2009
Outline: Goal, Motivation, Co-Training Concept, Task, Co-Training Algorithm, System Architecture, Experiment, Conclusion
Goal Use bilingual co-training to improve monolingual acquisition of semantic knowledge. A hyponym is a word whose meaning is included in that of another, more general word (an is-a relationship). Builds on prior work that mined Wikipedia to acquire hyponymy relations (Sumida and Torisawa, 2008, "Hacking Wikipedia for Hyponymy Relation Acquisition"), which needs manually labeled training data.
Motivation Developing high-level NLP applications requires accurate semantic knowledge. Hyponymy-relation acquisition can be cast as a binary classification task, so this technique can inexpensively grow a semantic knowledge base. Learning conditions vary from language to language, and an instance that is classified unreliably in one language may be classified reliably in the other; the reliable classification can then correct the unreliable one.
Bilingual Co-Training Concept
Task Hyponymy-relation acquisition from Wikipedia. The original approach of Sumida and Torisawa (2008) recognized relations between words such as 酵素 (enzyme) and 加水分解酵素 (hydrolase), which share a substring in Japanese; this cue cannot account for the English pair enzyme and hydrolase, which share no substring. Solution: borrow confidently classified instances from Japanese and add them to the English training data, then continually swap back and forth to grow both training sets.
Should we use Machine Translation? NO! Since we are dealing only with nouns, simple dictionary look-up suffices. Consistent results were achieved without machine translation.
Co-Training 1/5 S and T are two different languages. CL = {yes, no} is the set of binary class labels. X = X_S ∪ X_T is the set of instances in languages S and T to be classified. A classifier c assigns a class label cl ∈ CL and a confidence value r ∈ R+ to an instance x: c(x) = (x, cl, r). With a Support Vector Machine (SVM), the distance between a sample and the hyperplane determined by the SVM is used as the confidence value r.
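To make the confidence value concrete, here is a minimal sketch of distance-to-hyperplane scoring. It uses scikit-learn's LinearSVC as a stand-in for the TinySVM used in the paper, and the tiny feature matrix is invented for illustration.

```python
# Sketch: using the distance to the SVM hyperplane as the confidence
# value r described above. LinearSVC is a stand-in for TinySVM;
# X_train/y_train are placeholder data.
import numpy as np
from sklearn.svm import LinearSVC

X_train = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
y_train = np.array(["yes", "yes", "no", "no"])  # CL = {yes, no}

svm = LinearSVC().fit(X_train, y_train)

def classify(x):
    """Return c(x) = (x, cl, r): a label plus a confidence r >= 0."""
    score = svm.decision_function([x])[0]        # signed margin w.x + b
    cl = svm.classes_[1] if score > 0 else svm.classes_[0]
    r = abs(score) / np.linalg.norm(svm.coef_)   # normalize to a distance
    return x, cl, r

print(classify([0.9, 0.1]))
```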
Co-training 2/5 L ⊆ X × CL is a set of labeled instances, and a classifier c is trained from training data L: c = LEARN(L). Manually labeled training data L_S and L_T are prepared for S and T. The bilingual instance dictionary D_BI contains translation pairs of instances in X_S and X_T: D_BI = {(s, t)} ⊆ X_S × X_T, e.g. s = (enzyme, hydrolase), t = (酵素, 加水分解酵素).
Co-training 3/5 c^0_S and c^0_T are learned from the manually labeled instances L_S and L_T. At iteration i, c^i_S and c^i_T are applied to classify the instances in X_S and X_T. CR^i_S is the set of classification results of c^i_S on instances of X_S that are not in L^i_S and are registered in D_BI. Newly labeled instances are selected from CR^i_S and added to the other language's training set: TopN(CR^i_S) is the set of results c^i_S(x) whose confidence r_S is among the top-N highest in CR^i_S. Here c^i_S acts as teacher and c^i_T as student.
Co-training 4/5 The teacher instructs the student with class label cl_S for x_T, the translation of x_S through D_BI, only if the teacher is sufficiently confident (r_S > θ) and either the student is unconfident (r_T < θ) or both agree (cl_S = cl_T); this avoids the case where the student is confident but disagrees with the teacher. The roles are then reversed. Classical co-training relies on different feature sets over the same instances; here the two views are the two languages (see the sketch after Co-Training 5/5).
Co-Training 5/5
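The algorithm figure from this slide is not reproduced here, so the following is a minimal Python sketch of the loop described in Co-Training 1/5 through 4/5. The learn/classify helpers and the toy data are hypothetical stand-ins; only the control flow (confidence threshold, top-N transfer, agreement check, role reversal) follows the slides.

```python
# Sketch of the bilingual co-training loop from the preceding slides.
# learn(), classify(), and the toy data are stand-ins for the paper's
# SVM-based implementation.

THETA = 1.0   # confidence threshold (slide: Threshold = 1)
TOP_N = 900   # newly labeled instances transferred per direction

def teach(c_teacher, c_student, X_teacher, L_teacher, L_student,
          dic, classify):
    """One direction: the teacher labels its unlabeled instances and
    transfers the top-N confident ones to the student through dic."""
    # CR: results on instances not yet labeled and registered in D_BI
    cr = [(x,) + classify(c_teacher, x)
          for x in X_teacher if x not in L_teacher and x in dic]
    cr.sort(key=lambda t: t[2], reverse=True)        # sort by confidence r
    for x, cl_t, r_t in cr[:TOP_N]:
        if r_t <= THETA:
            continue                                  # teacher not confident
        x_stu = dic[x]                                # translate via D_BI
        cl_s, r_s = classify(c_student, x_stu)
        # instruct only if the student is unconfident or already agrees
        if r_s < THETA or cl_s == cl_t:
            L_student[x_stu] = cl_t

def co_train(L_S, L_T, X_S, X_T, d_bi, learn, classify, iterations=100):
    d_rev = {t: s for s, t in d_bi.items()}
    for _ in range(iterations):
        c_S, c_T = learn(L_S), learn(L_T)
        teach(c_S, c_T, X_S, L_S, L_T, d_bi, classify)   # S teaches T
        teach(c_T, c_S, X_T, L_T, L_S, d_rev, classify)  # T teaches S
    return learn(L_S), learn(L_T)

if __name__ == "__main__":
    # Toy demo: the "model" is just its training data; classify() votes
    # with the majority label and is confident only if it has seen data.
    def learn(L):
        return dict(L)
    def classify(model, x):
        if not model:
            return "no", 0.0
        labels = list(model.values())
        return max(set(labels), key=labels.count), 2.0
    L_S = {("animal", "tiger"): "yes"}                 # English seed
    L_T = {}                                           # no Japanese seed
    d_bi = {("enzyme", "hydrolase"): ("酵素", "加水分解酵素")}
    co_train(L_S, L_T, [("enzyme", "hydrolase")], [], d_bi,
             learn, classify, iterations=1)
    print(L_T)   # {('酵素', '加水分解酵素'): 'yes'}
```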
System Architecture 1/4
System Architecture - Candidate Extraction 2/4 Every English and Japanese article is parsed into its hierarchical structure: title, sections, subsections, and list items. A node paired with one of its descendants in this hierarchy becomes a hyponymy-relation candidate, e.g. (Tiger, Siberian tiger).
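As a concrete illustration of this step, here is a minimal sketch that assumes the article hierarchy has already been parsed into nested (name, children) tuples; the real system parses Wikipedia markup, and pairing every node with every descendant is a simplification of the extraction rules.

```python
# Sketch: enumerate hyponymy-relation candidates from an article's
# hierarchy (title -> sections -> subsections -> list items).
# The nested-tuple format is a toy stand-in for parsed Wikipedia markup.

def descendants(node):
    """Yield the names of all descendants of (name, children)."""
    _, children = node
    for child in children:
        yield child[0]
        yield from descendants(child)

def candidates(node):
    """Yield (ancestor, descendant) hyponymy-relation candidates."""
    name, children = node
    for desc in descendants(node):
        yield (name, desc)
    for child in children:
        yield from candidates(child)

article = ("Tiger",
           [("Subspecies",
             [("Siberian tiger", []),
              ("Bengal tiger", [])])])
print(list(candidates(article)))
# [('Tiger', 'Subspecies'), ('Tiger', 'Siberian tiger'),
#  ('Tiger', 'Bengal tiger'), ('Subspecies', 'Siberian tiger'),
#  ('Subspecies', 'Bengal tiger')]
```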
System Architecture - Hyponymy-Relation Classification 3/4 hyper is a hypernym candidate, hypo is a hyponym candidate, and the pair (hyper, hypo) is the hyponymy-relation candidate to be classified.
System Architecture - Bilingual Instance Dictionary Construction 4/4 Multilingual Wikipedia articles are linked by cross-language links. Linked English and Japanese articles are extracted and their titles are regarded as translation pairs; these pairs are used to build the bilingual instance dictionary.
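A minimal sketch of this step, assuming the (English title, Japanese title) pairs have already been harvested from the cross-language links; an English relation candidate then maps to a Japanese one only when both of its terms have linked counterparts.

```python
# Sketch: build the bilingual instance dictionary D_BI from
# cross-language title pairs. The title_pairs below are illustrative,
# taken from the slides' running example.

title_pairs = [("Enzyme", "酵素"), ("Hydrolase", "加水分解酵素")]
en2ja = dict(title_pairs)

def build_d_bi(instances_en):
    """instances_en: iterable of (hyper, hypo) English candidates."""
    d_bi = {}
    for hyper, hypo in instances_en:
        if hyper in en2ja and hypo in en2ja:
            d_bi[(hyper, hypo)] = (en2ja[hyper], en2ja[hypo])
    return d_bi

print(build_d_bi([("Enzyme", "Hydrolase")]))
# {('Enzyme', 'Hydrolase'): ('酵素', '加水分解酵素')}
```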
Experiment 1/3 Data: the May 2008 English Wikipedia and the June 2008 Japanese Wikipedia. 24,000 randomly selected hyponymy-relation candidates were manually checked for each language; about 8,000 relations were found in the manually checked data for both languages. Settings: TinySVM, 100 co-training iterations, threshold θ = 1, and TopN = 900.
Experiment 2/3 Three experiments show the effects of bilingual co-training, training-data size, and the bilingual instance dictionary. Compared systems: SYT = Sumida and Torisawa (2008); INIT = the initial classifier trained on the manually labeled data; TRAN = a classifier whose training data is augmented with instances translated from the other language; BICO = bilingual co-training.
Experiment 3/3 Can performance always be improved through bilingual co-training with one strong and one weak classifier? One language uses training data of size 20,000 for a strong classifier; the other language uses various weak classifiers trained on 1,000, 5,000, 10,000, or 15,000 instances.
Conclusion BICO showed a 3.6%–10.9% improvement in F1. It can help reduce the cost of preparing new training data in other languages, and it can improve a weak classifier in one language whenever a strong classifier exists in the other.