Published by Blake Watkins. Modified over 9 years ago.
1
© Ch. Boitet & Wang-Ju Tsai (GETA, CLIPS), ICUKL-2002, Goa, 25-29/11/02
Proposals for solving some problems in UNL encoding
International Conference on Universal Knowledge and Language (ICUKL2002), Goa, 25-29 November 2002
Christian BOITET, GETA, CLIPS, IMAG, Grenoble
Christian.Boitet@imag.fr
2
Which problems?
What Igor said "remains to be done":
1. representation of multi-word concepts (« long UWs »);
2. elliptical expressions;
3. treatment of arguments both in the UW dictionary and in the UNL expressions;
and
1. conventions about attributes;
2. XML formats for UNL documents.
3
Representation of multi-word concepts (long UWs) (1)
Problematic examples of "UNKNOWN LONG UWs":
"Institute of Advanced studies (UNU/IAS)"(icl>…)
"East-Asia cooperation office"
East-Asia cooperation office
east-asia cooperation office(icl>…)
"Tokyo University"
"University of Kyoto"
"World Bank(icl>…)"
4
Representation of long UWs (2)
What are the problems?
1. There is no hope of including all these long UWs in our UNL-LLL dictionaries, because the number of such UWs is potentially immense and unbounded. We may never cover more than 5% or 10% of them in open domains.
2. It is necessary to include an analyzer of English compounds in order to translate "unknown long UWs" piece by piece, but such compounds are extremely ambiguous.
5
Let us think a bit more
Proper nouns CAN be decomposed.
This is NOT to say that their translation is always compositional.
Compositional: World Bank ==> Banque du Monde (false)
Idiomatic: World Bank ==> Banque mondiale (correct)
So we should have a solution allowing BOTH:
Compositional deconversion if the long UW is unknown
Idiomatic deconversion after it has been put in the UNL-LLL dictionary
6
Proposal of a solution
Origin
Proposed by H. Uchida at a meeting in Tokyo (1999?)
Not yet included, but still needed and still the best
Principle
The headword encodes a UNL representation of the compound
Possible syntax
"(mod(bank(icl>entity).@entry,world):01)"(icl>entity)
"(mod(bank(icl>entity).@entry,world))"(icl>entity)
… or a better one!
7
How to deconvert
Case 1: "(mod(bank(icl>institution).@entry,world))"(icl>institution) is not in the UNL-FR dictionary
==> The French deconverter "unwraps" mod(bank(icl>institution).@entry,world) into a scope of the UNL graph
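The fallback described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `unwrap_long_uw` and `deconvert`, the dictionary passed in, and the regular expression for the wrapper syntax are all hypothetical, and the returned inner expression stands in for what a real deconverter would insert as a scope of the UNL graph.

```python
import re

# Assumed wrapper shape, taken from the slide examples:
#   "(<inner UNL expression>[:NN])"(<restriction>)
LONG_UW = re.compile(r'^"\((?P<inner>.*)\)"\((?P<restr>[^()]*)\)$')


def unwrap_long_uw(uw):
    """Return the inner UNL expression of a long UW, or None if uw is not one."""
    m = LONG_UW.match(uw)
    if m is None:
        return None
    # Drop an optional two-digit scope identifier such as ":01".
    return re.sub(r":\d\d$", "", m.group("inner"))


def deconvert(uw, dictionary):
    """Prefer the idiomatic dictionary entry; otherwise fall back to unwrapping."""
    if uw in dictionary:
        return ("idiomatic", dictionary[uw])
    inner = unwrap_long_uw(uw)
    if inner is not None:
        # A real deconverter would now process this expression as a scope.
        return ("compositional", inner)
    return ("unknown", uw)


world_bank = '"(mod(bank(icl>institution).@entry,world))"(icl>institution)'
print(deconvert(world_bank, {}))
print(deconvert(world_bank, {world_bank: "Banque mondiale"}))
```

With an empty dictionary the long UW is unwrapped for compositional deconversion; once the same string is added as a dictionary key, the idiomatic equivalent is returned instead.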
8
Another example
«"(mod(university.@entry,Tokyo(icl>town)):01)"(icl>entity)»
Compositional deconversion
Université de Tokyo
University of Tokyo
Universität von Tokyo
Tokyo no daigaku (or Tokyo ni daigaku)
Idiomatic deconversion
Université de Tokyo (or Todai!)
Tokyo University / University of Tokyo
Universität Tokyo
Tokyo daigaku / Todai
9
Elliptical expressions
Example
Do you prefer the first or the second solution? I prefer the first.
Je préfère le premier? Je préfère la première?
==> A bad deconversion will be very misleading.
Possible solution
Encode the elided element and put .@eld on it.
That is equivalent to "preediting" the input text: I prefer the first solution.
…and in the spirit of the new idea by H. Uchida of preediting for semantic relations
10
Treatment of arguments in the UW dictionary and in the UNL expressions
See talk by I. Boguslavskij
The solution proposed entails
1. a very small change in the UNL syntax:
allow attributes .@A, .@B, .@C, .@D on arcs, hence also on restrictions by semantic relations;
2. a discipline in UW creation:
all arguments should appear as restrictions.
11
"Argument-full" + "readable" UW
Argument-full
look(icl>do,agt.@A>person,obj.@B>thing);
look(icl>do,agt.@A>person,gol.@B>thing);
look(icl>do,agt.@A>person,dst.@B>thing);
Readable
look(icl>do, agt.@A>person, obj.@B>thing); look for something
Even more readable
look for(icl>do,agt.@A>person, obj.@B>thing); look for something
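To make the proposed notation concrete, here is a minimal Python sketch that splits an argument-full UW into its headword and its restrictions, keeping the argument label (A to D) when one is present. The names `Restriction` and `parse_uw` and the two regular expressions are assumptions made for illustration; the sketch handles only flat restriction lists like those on this slide, not nested restrictions.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Restriction(NamedTuple):
    relation: str             # e.g. "icl", "agt", "obj"
    arg_label: Optional[str]  # "A".."D" if the restriction is an argument
    value: str                # e.g. "do", "person", "thing"


UW_RE = re.compile(r"^(?P<head>[^()]+)\((?P<body>.*)\)$")
RESTR_RE = re.compile(r"^(?P<rel>[a-z]+)(?:\.@(?P<label>[A-D]))?>(?P<value>.+)$")


def parse_uw(uw: str) -> Tuple[str, List[Restriction]]:
    """Split 'head(rel[.@X]>value, ...)' into headword and restrictions."""
    text = uw.strip().rstrip(";").strip()
    m = UW_RE.match(text)
    if m is None:
        raise ValueError(f"not a restricted UW: {uw!r}")
    restrictions = []
    for part in m.group("body").split(","):
        r = RESTR_RE.match(part.strip())
        if r is None:
            raise ValueError(f"bad restriction: {part!r}")
        restrictions.append(
            Restriction(r.group("rel"), r.group("label"), r.group("value"))
        )
    return m.group("head").strip(), restrictions


head, restrs = parse_uw("look for(icl>do,agt.@A>person, obj.@B>thing);")
arguments = {r.arg_label: r.relation for r in restrs if r.arg_label}
print(head, arguments)
```

Restrictions without a label (here `icl>do`) are ordinary semantic restrictions, while labeled ones identify the arguments of the UW.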
12
Continuing that list…
look for(icl>do,agt.@A>person, obj.@B>thing); look for something
look at(icl>do,agt.@A>person, plt.@B>thing); look at something
or
look at(icl>do,agt.@A>person, obj.@B>thing); look at something
look like(icl>do,agt.@A>person, cmp.@B>thing); look like something
look like(icl>do,agt.@A>person, obj.@B>thing); look like something
might also cover "look as" in "he looks as a good man"
or
look as if(icl>do,agt.@A>person, obj.@B>thing); it looks as if…
look(icl>do,agt.@A>person, obj.@B>thing); look for something
13
Attributes
The problem
lion(icl>mammal).@plur ==> un lion, les lions, lions?
We don't know whether definiteness
has been computed ==> it is .@undef ==> use it
or not ==> it is UNKNOWN ==> compute default
Solution: for every attribute XXXX, put
.@XXXX for +XXXX (1 or true)
.@unXXXX for -XXXX (0 or false)
nothing for XXXX unknown (? or undefined)
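The three-valued convention can be sketched as follows in Python. The helper names `node_attributes` and `attribute_state` are hypothetical, and the sketch assumes that no attribute name itself begins with "un", since that would make `.@unXXXX` ambiguous.

```python
import re
from typing import Optional, Set


def node_attributes(node: str) -> Set[str]:
    """Collect the attribute names written after the UW, e.g. {'plur'}."""
    tail = node[node.rfind(")") + 1:]  # text after the closing parenthesis
    return set(re.findall(r"\.@(\w+)", tail))


def attribute_state(attrs: Set[str], name: str) -> Optional[bool]:
    """True for .@name, False for .@unname, None when neither is present."""
    positive = name in attrs
    negative = ("un" + name) in attrs
    if positive and negative:
        raise ValueError(f"contradictory values for attribute {name!r}")
    if positive:
        return True
    if negative:
        return False
    return None  # unknown: the deconverter must compute a default


attrs = node_attributes("lion(icl>mammal).@plur")
print(attribute_state(attrs, "plur"))  # number is marked
print(attribute_state(attrs, "def"))   # definiteness was never computed
```

Returning `None` rather than `False` for a missing attribute is the point of the proposal: the deconverter can then tell "computed and negative" apart from "not computed".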
14
XML formats for UNL documents
A minimal UNL-xml format, strictly equivalent to UNL-htmlr
– proposed and used by Tsai W.J. for the SWIIVRE-UNL web site and his Ph.D.
Methodology for defining and using other, more detailed UNL-xml-xyz formats:
– xyz is an application (e.g. a graphical editor, a statistics-gathering tool, etc.),
– automatic parsing of the basic UNL-xml format introduces new tags,
– a document object model (DOM) suitable for application xyz can then be defined and used.