© Ch. Boitet & Wang-Ju Tsai (GETA, CLIPS) ICUKL-2002, Goa, 25-29/11/02
Proposals for solving some problems in UNL encoding
International Conference on Universal Knowledge and Language (ICUKL2002), Goa, November 2002
Christian BOITET, GETA, CLIPS, IMAG, Grenoble
Which problems?
What Igor said "remains to be done":
1. representation of multi-word concepts (« long UWs »);
2. elliptical expressions;
3. treatment of arguments, both in the UW dictionary and in the UNL expressions;
and also:
1. conventions about attributes;
2. XML formats for UNL documents.
Representation of multi-word concepts (long UWs), part 1
Problematic examples of "UNKNOWN LONG UWs":
- "Institute of Advanced studies (UNU/IAS)"(icl>…)
- "East-Asia cooperation office" / East-Asia cooperation office / east-asia cooperation office(icl>…)
- "Tokyo University"
- "University of Kyoto"
- "World Bank(icl>…)"
Representation of long UWs, part 2
What are the problems?
1. There is no hope of including all these long UWs in our UNL-LLL dictionaries, because their number is potentially immense and unbounded. In open domains, we may never cover more than 5% to 10% of them.
2. To translate "unknown long UWs" piece by piece, we would need to include an analyzer of English compounds, but such compounds are extremely ambiguous.
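To make point 2 concrete: one source of ambiguity in English compounds (not the only one, since word senses are another) is bracketing. The sketch below, built around a hypothetical helper `bracketings`, enumerates every binary bracketing of a compound. Their number grows as the Catalan numbers: 2 for three words, 5 for four, 14 for five.

```python
from functools import lru_cache


def bracketings(words):
    """Return every binary bracketing of a word sequence as a nested string."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def build(i, j):
        # All bracketings of the sub-sequence words[i:j]
        if j - i == 1:
            return (words[i],)
        results = []
        for k in range(i + 1, j):  # split point between left and right parts
            for left in build(i, k):
                for right in build(k, j):
                    results.append(f"[{left} {right}]")
        return tuple(results)

    return list(build(0, len(words)))


for b in bracketings(["East-Asia", "cooperation", "office"]):
    print(b)
```

The loop prints the two readings "[East-Asia [cooperation office]]" and "[[East-Asia cooperation] office]". A compound analyzer would have to choose between them, and between the word senses inside each.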
Let us think a bit more
Proper nouns CAN be decomposed. This is NOT to say that their translation is always compositional.
- Compositional: World Bank ==> Banque du Monde (wrong)
- Idiomatic: World Bank ==> Banque mondiale (correct)
So we need a solution that allows BOTH:
- compositional deconversion if the long UW is unknown;
- idiomatic deconversion once it has been put in the UNL-LLL dictionary.
Proposal of a solution
Origin: proposed by H. Uchida at a meeting in Tokyo (1999?). Not yet included, but still needed and still the best.
Principle: the headword encodes a UNL representation of the compound.
Possible syntax: "(mod(bank(icl>entity), world(icl>entity)))" … or a better one!
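Here is a minimal sketch of how a deconverter could read such a headword. It assumes the headword holds a single binary UNL relation written as `rel(head, dependent)`, possibly wrapped in quotes and one extra pair of parentheses. The slide leaves the exact syntax open. The function names `split_top_level` and `parse_long_uw` are hypothetical.

```python
def split_top_level(s, sep=","):
    """Split s on sep, ignoring separators nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def parse_long_uw(headword):
    """Parse a (hypothetical) long-UW headword such as
    '"(mod(bank(icl>entity), world(icl>entity)))"'
    into a (relation, head_uw, dependent_uw) triple."""
    body = headword.strip().strip('"')
    # Drop one pair of enclosing parentheses, if present
    if body.startswith("(") and body.endswith(")"):
        body = body[1:-1].strip()
    rel, _, rest = body.partition("(")
    if not rest.endswith(")"):
        raise ValueError(f"malformed long UW: {headword!r}")
    args = split_top_level(rest[:-1])
    if len(args) != 2:
        raise ValueError(f"expected a binary relation: {headword!r}")
    return rel.strip(), args[0], args[1]


print(parse_long_uw('"(mod(bank(icl>entity), world(icl>entity)))"'))
```

Because the arguments are split only at the top-level comma, a restriction such as `icl>entity` stays attached to its UW. The triple is then the "scope" that a deconverter can unwrap when the long UW is not in its dictionary.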
How to deconvert
Case 1: the long UW is not in the UNL-FR dictionary ==> the French deconverter "unwraps" it into a scope of the UNL graph, which is then deconverted compositionally.
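The two deconversion paths can be sketched as a dictionary lookup with a compositional fallback. Everything below is a hypothetical toy: the dictionary layout, the UWs, and the single rule for `mod` (head form followed by a prepositional modifier form). A real deconverter would unwrap the scope into the UNL graph and run its full generation grammar on it.

```python
# Hypothetical toy UNL-FR data; the UWs are illustrative only.
LONG_UW_DICT = {
    # Idiomatic entry, available once the long UW is in the UNL-FR dictionary
    ("mod", "bank(icl>entity)", "world(icl>entity)"): "Banque mondiale",
}
HEAD_FORMS = {"bank(icl>entity)": "Banque", "university(icl>entity)": "Université"}
MODIFIER_FORMS = {"world(icl>entity)": "du Monde", "Tokyo(iof>city)": "de Tokyo"}


def deconvert_long_uw(rel, head, dependent, long_uw_dict=LONG_UW_DICT):
    """Idiomatic translation if the long UW is known, else a compositional one."""
    key = (rel, head, dependent)
    if key in long_uw_dict:
        return long_uw_dict[key]  # idiomatic deconversion
    # Case 1: unknown long UW, so deconvert the unwrapped scope piece by piece
    if rel != "mod":
        raise NotImplementedError(f"no toy rule for relation {rel!r}")
    return f"{HEAD_FORMS[head]} {MODIFIER_FORMS[dependent]}"


print(deconvert_long_uw("mod", "bank(icl>entity)", "world(icl>entity)"))
print(deconvert_long_uw("mod", "bank(icl>entity)", "world(icl>entity)", long_uw_dict={}))
```

With the entry present, the sketch prints "Banque mondiale". With an empty dictionary, it falls back to the compositional "Banque du Monde", which is exactly the wrong-but-understandable output shown earlier for World Bank.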
Another example
Compositional deconversion:
- FR: Université de Tokyo
- EN: University of Tokyo
- DE: Universität von Tokyo
- JA: Tokyo no daigaku (or Tokyo ni daigaku)
Idiomatic deconversion:
- FR: Université de Tokyo (or Todai!)
- EN: Tokyo University / University of Tokyo
- DE: Universität Tokyo
- JA: Tokyo daigaku / Todai
Elliptical expressions
Example: "Do you prefer the first or the second solution?" "I prefer the first."
Je préfère le premier? Je préfère la première?
==> A bad deconversion will be very misleading.
Possible solution: encode the elided element. This is equivalent to pre-editing the input text into "I prefer the first solution." It is also in the spirit of H. Uchida's new idea of pre-editing for semantic relations.
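Here is a toy illustration of why the elided noun matters for French. The form of "the first" agrees in gender with the noun it stands for. The lexicon and the function `render_the_first` are hypothetical. If the noun is not encoded, the deconverter can only guess between the two forms.

```python
# Hypothetical toy French lexicon: grammatical gender of noun UWs.
NOUN_GENDER = {"solution(icl>thing)": "f", "problem(icl>thing)": "m"}
# French forms of "the first", by gender of the (possibly elided) noun.
THE_FIRST = {"m": "le premier", "f": "la première"}


def render_the_first(elided_noun_uw=None):
    """Return the candidate French renderings of 'the first'.

    If the elided noun was encoded in the UNL graph, its gender decides;
    otherwise every form remains a candidate and the deconverter must guess."""
    if elided_noun_uw is None:
        return sorted(THE_FIRST.values())
    return [THE_FIRST[NOUN_GENDER[elided_noun_uw]]]


print(render_the_first())                       # ['la première', 'le premier']
print(render_the_first("solution(icl>thing)"))  # ['la première']
```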
Treatment of arguments
- in the UW dictionary
- in the UNL expressions
See the talk by I. Boguslavsky.
The solution proposed entails:
1. a very small change in the UNL syntax: allow … on arcs, hence also on restrictions;
2. a discipline in UW creation: all arguments should appear as restrictions.
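The discipline in point 2 could be checked mechanically. Below is a sketch that assumes flat restriction lists, with no nested UWs inside restrictions. It uses `agt` and `obj` as the argument relations of a hypothetical UW `look(icl>do, agt>thing, obj>thing)`. Which relations a given UW actually requires would have to come from the UW dictionary. The functions `parse_uw` and `missing_arguments` are hypothetical.

```python
def parse_uw(uw):
    """Split a UW such as 'look(icl>do, agt>thing)' into its headword and a
    list of (relation, value) restrictions. Assumes flat restrictions."""
    head, _, rest = uw.partition("(")
    if not rest:
        return head.strip(), []  # bare headword, no restrictions
    if not rest.endswith(")"):
        raise ValueError(f"unbalanced UW: {uw!r}")
    restrictions = []
    for item in rest[:-1].split(","):
        rel, sep, value = item.strip().partition(">")
        if not sep:
            raise ValueError(f"restriction without '>': {item!r}")
        restrictions.append((rel.strip(), value.strip()))
    return head.strip(), restrictions


def missing_arguments(uw, argument_relations):
    """Return the argument relations that do not appear as restrictions."""
    _, restrictions = parse_uw(uw)
    present = {rel for rel, _ in restrictions}
    return [rel for rel in argument_relations if rel not in present]


print(missing_arguments("look(icl>do)", ["agt", "obj"]))  # ['agt', 'obj']
```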
"Argument-full" + "readable" UW
- Argument-full: look(icl>do, …
- Readable: … for something
- Even more readable: look for something
Continuing that list…
- look for something
- look at something
- look like something (might also cover "look as" in "he looks as a good man"), or a separate entry: look as
- looks as if…
Attributes
The problem: un lion, les lions, or lions? We don't know whether definiteness has been computed:
- if it has ==> use it (or not);
- if it has not ==> it is UNKNOWN ==> compute a default.
Solution: for every attribute XXXX, use one marking for +XXXX (1 or true), another for -XXXX (0 or false), and nothing for XXXX unknown (? or undefined).
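Here is a minimal sketch of the three-valued convention. It uses the slide's +XXXX / -XXXX notation, leaving the concrete attribute syntax in UNL expressions open, and a toy French determiner choice for the masculine noun "lion". When definiteness is unknown, the sketch uses a fixed default parameter, whereas a real deconverter would compute the default from context. Both function names are hypothetical.

```python
from typing import Optional, Set


def attribute_value(attributes: Set[str], name: str) -> Optional[bool]:
    """Three-valued reading: +name -> True, -name -> False, absent -> None (unknown)."""
    plus, minus = f"+{name}", f"-{name}"
    if plus in attributes and minus in attributes:
        raise ValueError(f"contradictory values for {name}")
    if plus in attributes:
        return True
    if minus in attributes:
        return False
    return None


def french_determiner(attributes: Set[str], plural: bool, default_definite: bool = True) -> str:
    """Toy determiner choice for a masculine noun such as 'lion'."""
    definite = attribute_value(attributes, "def")
    if definite is None:
        # Unknown: not computed by the enconverter, so fall back to a default
        definite = default_definite
    if definite:
        return "les" if plural else "le"
    return "des" if plural else "un"


print(french_determiner({"+def"}, plural=True), "lions")   # les lions
print(french_determiner({"-def"}, plural=False), "lion")   # un lion
```

Keeping "unknown" distinct from "false" is the point: with only two values, a missing `+def` would be read as "indefinite" and the deconverter would never know it should compute a default.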
XML formats for UNL documents
- A minimal UNL-xml format, strictly equivalent to UNL-html
  - proposed and used by W.J. Tsai for the SWIIVRE-UNL web site and his Ph.D.
- A methodology for defining and using other, more detailed UNL-xml-xyz formats:
  - xyz is an application (e.g. a graphical editor, a statistics-gathering tool, etc.);
  - automatic parsing of the basic UNL-xml format introduces new tags;
  - a document object model (DOM) suitable for application xyz can then be defined and used.
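The methodology can be sketched with Python's standard `xml.etree.ElementTree`. The sketch starts from a minimal document that stores each sentence's UNL text verbatim, then parses that text and adds new tags for a hypothetical application that needs arcs as separate elements. The tag names (`unl-document`, `s`, `org`, `unl`, `graph`, `arc`) and the function `enrich` are assumptions for illustration, not the actual format used on the SWIIVRE-UNL site.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal UNL-xml document: the UNL text is kept verbatim.
MINIMAL_DOC = """<unl-document>
  <s id="1">
    <org>I prefer the first solution.</org>
    <unl>
      obj(prefer(icl>like), solution(icl>thing))
      mod(solution(icl>thing), first)
    </unl>
  </s>
</unl-document>"""


def split_top_level(s, sep=","):
    """Split s on sep, ignoring separators nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in s:
        depth += (ch == "(") - (ch == ")")
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def parse_arc(line):
    """Turn 'rel(from, to)' into a (rel, from, to) triple."""
    rel, _, rest = line.strip().partition("(")
    source, target = split_top_level(rest[:-1])
    return rel.strip(), source, target


def enrich(xml_text):
    """Parse the basic format and add <graph>/<arc> tags for an application."""
    root = ET.fromstring(xml_text)
    for sentence in list(root.iter("s")):
        unl = sentence.find("unl")
        if unl is None or not unl.text:
            continue
        graph = ET.SubElement(sentence, "graph")
        for line in unl.text.strip().splitlines():
            if line.strip():
                rel, source, target = parse_arc(line)
                ET.SubElement(graph, "arc", {"rel": rel, "from": source, "to": target})
    return root


root = enrich(MINIMAL_DOC)
print(ET.tostring(root, encoding="unicode"))
```

The basic format stays strictly equivalent to UNL-html, since it adds nothing but containers. Only the enriched tree, which an application would load as its own DOM, carries the extra structure.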