1
M. Lafourcade (LIRMM) & Ch. Boitet (GETA, CLIPS), LREC-02, Las Palmas, 31/5/2002
UNL Lexical Selection with Conceptual Vectors
LREC-2002, Las Palmas, May 2002
Mathieu Lafourcade, LIRMM, Montpellier
  Mathieu.Lafourcade@lirmm.fr
  http://www.lirmm.fr/~lafourca
Christian Boitet, GETA, CLIPS, IMAG, Grenoble
  Christian.Boitet@imag.fr
  http://www-clips.imag.fr/geta
2
Outline
- The problem: disambiguation in UNL-French deconversion
  - Finding the known UW nearest to an unknown UW
  - Finding the best French lemma for a given UW
- Conceptual vectors
  - Nature & example on French (873 dimensions)
  - Building (Dec. 2001: 64,000 terms, 210,000 CVs)
- CVD (CV Disambiguation) running for French
  - Recooking the vectors attached to a document tree
  - Placing each recooked vector in the word sense tree
- Using CVD in UNL-French deconversion: ongoing
3
The UNL-FR deconversion process
[Figure: pipeline diagram. Labels recoverable from the slide:]
- Data structures: UNL-L1 Graph, UNL-FRA Graph (UW), UNL-FRA Graph (French LU), “UNL Tree”, GMA structure, UMA structure, UMC structure, French utterance
- Processing steps: Validation & Localization, Lexical Transfer, Conceptual vectors computations, Graph to tree conversion, Structural transfer, Paraphrase choice, Syntactic generation, Morphological generation
4
The problem: disambiguation in UNL-French deconversion
- Find the known UW nearest to an unknown UW
  - known UWs (in KB context):
      obj(open(icl>occur),door)    a door opens
      obj(open(icl>do),door)       one opens a door
  - input graph:
      obj(open(icl>occur,ins>concrete thing),door)
      ins(open(icl>occur,ins>concrete thing),key…)
      a key opens a door / a door opens with a key
  ==> choose nearest open(icl>occur) for correct result
- Find best French lemma for a UW in a given context
    meeting(icl>event) ==> réunion [ACTION, DURATION…]
                           rencontre [EVENT, MOMENT…]
5
How to solve them?
1. unknown UW ==> best known UW
   1. Accessing the KB in real time is impractical (web server)
   2. The KB is not enough: still many possible candidates
2. known UW ==> best LU
   1. Often no clear symbolic conditions for selection
   2. Possibility to transform the UNL LUfr dictionary into a kind of neural net (cf. MSR MindNet)
3. A possible unifying solution: lexical selection through DCV (Disambiguation using Conceptual Vectors), which works quite well for French in large-scale experiments
6
Conceptual vectors
- CV = vector in concept space (4th level in Larousse)
    V(to tidy up) = CHANGE [0.84], VARIATION [0.83], EVOLUTION [0.82], ORDER [0.77], SITUATION [0.76], STRUCTURE [0.76], RANK [0.76] …
    V(to cut) = GAME [0.8], LIQUID [0.8], CROSS [0.79], PART [0.78], MIXTURE [0.78], FRACTION [0.75], TORTURE [0.75], WOUND [0.75], DRINK [0.74] …
- Global vector of a term = normalized sum of the CVs of its meanings/senses
    V(head) = HEAD [0.83], BEGINNING [0.75], ANTERIORITY [0.74], PERSON [0.74], INTELLIGENCE [0.68], HIERARCHY [0.65], …
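The "normalized sum" on this slide can be illustrated with a minimal sketch. It assumes Euclidean (L2) normalization and an unweighted sum; the slide specifies neither, and the function names and the 3-dimensional toy space are hypothetical.

```python
import math

def normalize(v):
    """Scale v to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0 else [x / norm for x in v]

def global_vector(sense_vectors):
    """Normalized component-wise sum of the conceptual vectors of a term's senses."""
    summed = [sum(components) for components in zip(*sense_vectors)]
    return normalize(summed)

# Toy 3-dimensional concept space (the real space has 873 dimensions).
sense_a = [1.0, 0.0, 0.0]  # hypothetical CV of a first sense
sense_b = [0.0, 1.0, 0.0]  # hypothetical CV of a second sense
term_vector = global_vector([sense_a, sense_b])
```

With two orthogonal senses the global vector lies halfway between them, so a term's vector mixes the themes of all its senses.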
7
Conceptual vectors and sense space
- Conceptual vector model
  - Reminiscent of vector models (Salton et al.) & Sowa
  - Applied on preselected concepts (not terms)
  - Concepts are not independent
- Set of k basic concepts
  - Thesaurus Larousse = 873 concepts (translation of Roget’s)
  - A vector = an 873-tuple of reals in [0..1]
  - Encoding for each dimension: $C = 2^{15}$ values, i.e. integers in [0..32767]
- Sense space = vector space + vector set
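One way to read the 15-bit encoding is as a fixed-point quantization of each real component. The sketch below assumes a linear mapping with rounding to the nearest integer; the slide only gives the range [0..32767], so the mapping and the names `encode` and `decode` are assumptions.

```python
MAX_CODE = 2**15 - 1  # 32767, the largest code on the slide

def encode(x):
    """Map a real component in [0, 1] to an integer code in [0, 32767]."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("component must lie in [0, 1]")
    return round(x * MAX_CODE)

def decode(code):
    """Map an integer code back to an approximate real component."""
    return code / MAX_CODE

code = encode(0.84)      # e.g. the CHANGE component of V(to tidy up)
approx = decode(code)    # close to 0.84, within one quantization step
```

Under this assumption the round-trip error per component is at most half a quantization step, about 1.5e-5.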
8
Thematic relatedness
- Conceptual vector distance
  - Angular distance: $D_A(x, y) = \mathrm{angle}(x, y)$
  - $0 \le D_A(x, y) \le \pi$
- Interpretation
  - if $D_A(x, y) = 0$: x // y (colinear): same idea
  - if $D_A(x, y) = \pi/2$: $x \perp y$ (orthogonal): nothing in common
  - if $D_A(x, y) = \pi$: $D_A(x, y) = D_A(x, -x)$: -x is the anti-idea of x
[Figure: vectors x, x’ and y drawn from a common origin]
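The angular distance can be computed from the dot product as $D_A(x, y) = \arccos\big(x \cdot y / (\|x\|\,\|y\|)\big)$. The sketch below is a plain-Python illustration; the function name and the clamping against rounding error are choices made here, not taken from the slides.

```python
import math

def angular_distance(x, y):
    """Angle in radians between two non-zero vectors, in [0, pi]."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("angular distance is undefined for a zero vector")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    return math.acos(cosine)

same_idea = angular_distance([1.0, 0.0], [2.0, 0.0])        # colinear
nothing_common = angular_distance([1.0, 0.0], [0.0, 1.0])   # orthogonal
anti_idea = angular_distance([1.0, 0.0], [-1.0, 0.0])       # x and -x
```

Since conceptual vectors have components in [0..1], two actual CVs can never be more than π/2 apart; the π case only arises for a constructed opposite such as -x.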
9
Collection process
- Start from a few handcrafted term/meanings/vectors
- Loop (running constantly on Lafourcade’s Mac):
    choose a word at random (with or without a CV)
    find NL definitions of its senses (mainly on the Web)
    for each sense definition SD
        analyze SD into linguistic tree TreeDef
        attach existing or null CVs to lexical nodes of TreeDef
        iterate propagation of CVs in TreeDef (ling. rules used here)
            until CV(root) converges or the cycle limit is reached
        CV(sense) <- CV(root(TreeDef))
    use vector distance to arrange the CVs of senses into a binary « discrimination tree »
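The "iterate until CV(root) converges or the cycle limit is reached" step can be sketched as below. The slides do not give the propagation formulas or the linguistic rules, so both rules here are placeholders chosen only to make the loop concrete: upward, a node takes the normalized sum of its children; downward, a child is reinforced on the components it shares with its parent. All names (`Node`, `cook`, `propagate_up`, `propagate_down`) are hypothetical.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0 else [x / norm for x in v]

def angle(x, y):
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0 or ny == 0:
        return 0.0  # treat a null vector as "no change"
    cos = sum(a * b for a, b in zip(x, y)) / (nx * ny)
    return math.acos(max(-1.0, min(1.0, cos)))

class Node:
    def __init__(self, vector=None, children=()):
        self.vector = vector            # None: no CV known yet
        self.children = list(children)

def propagate_up(node, dim):
    """Placeholder rule: internal node = normalized sum of its children."""
    if node.children:
        for child in node.children:
            propagate_up(child, dim)
        summed = [sum(c) for c in zip(*(ch.vector for ch in node.children))]
        node.vector = normalize(summed)
    elif node.vector is None:
        node.vector = [0.0] * dim       # null CV for an unknown lexical node

def propagate_down(node):
    """Placeholder rule: strengthen components a child shares with its parent."""
    for child in node.children:
        shared = [c * p for c, p in zip(child.vector, node.vector)]
        child.vector = normalize([c + s for c, s in zip(child.vector, shared)])
        propagate_down(child)

def cook(root, dim, max_cycles=10, eps=1e-6):
    """Alternate up/down passes until the root vector stabilizes or the limit is hit."""
    previous = None
    for cycle in range(1, max_cycles + 1):
        propagate_up(root, dim)
        if previous is not None and angle(previous, root.vector) < eps:
            break
        previous = list(root.vector)
        propagate_down(root)
    return root.vector, cycle

tree = Node(children=[Node([1.0, 0.0, 0.0]), Node([0.6, 0.8, 0.0])])
root_vector, cycles = cook(tree, dim=3)
```

The returned root vector plays the role of CV(root(TreeDef)) in the loop above; the actual rules used by the authors may behave quite differently from these placeholders.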
10
An example discrimination tree
11
Status on French CVs
- By Dec. 2001
  - 64,000 terms
  - 210,000 CVs
  - Average of 3.3 senses/term
- Method
  - robot to access web lexicon servers
  - large-coverage French analyzer by J. Chauché in Sigmart
- See more details on http://www.lirmm.fr/~lafourca
12
Disambiguation in French
- Recook the vectors attached to a document tree
  - Take a document
  - Analyze it with the Sigmart analyzer into ONE possibly big tree (30 pages OK as a unit)
  - Use the same process as for processing definitions
  - Final CV(root) usable as thematic classifier of the document
  - Final CV(lexemes) used as « sense in context »
- Place each recooked vector in the discrimination tree
  - Walk down the discrimination tree, using vector distance
  - Stop at nearest node:
    - If leaf node, full disambiguation (relative to available sense set)
    - If internal node, partial disambiguation (subset of senses)
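The walk down the discrimination tree can be sketched as follows. The stopping rule is an assumption: the walk descends to the closer child only while that child is strictly nearer to the recooked vector than the current node. The class `TreeNode`, the function `place`, and the toy vectors are hypothetical.

```python
import math

def angular_distance(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return math.acos(max(-1.0, min(1.0, dot / (nx * ny))))

class TreeNode:
    def __init__(self, vector, sense=None, left=None, right=None):
        self.vector = vector
        self.sense = sense          # set on leaf nodes only
        self.left = left
        self.right = right

    def senses(self):
        """All senses under this node (one for a leaf, several for an internal node)."""
        if self.sense is not None:
            return [self.sense]
        found = []
        for child in (self.left, self.right):
            if child is not None:
                found.extend(child.senses())
        return found

def place(root, query):
    """Walk down from the root and return the node nearest to the query vector."""
    node = root
    while True:
        children = [c for c in (node.left, node.right) if c is not None]
        if not children:
            return node             # leaf: full disambiguation
        best = min(children, key=lambda c: angular_distance(c.vector, query))
        if angular_distance(best.vector, query) >= angular_distance(node.vector, query):
            return node             # internal node: partial disambiguation
        node = best

# Toy two-sense tree in a 2-dimensional concept space.
tree = TreeNode([1.0, 1.0],
                left=TreeNode([1.0, 0.0], sense="sense_1"),
                right=TreeNode([0.0, 1.0], sense="sense_2"))
full = place(tree, [0.9, 0.1]).senses()
partial = place(tree, [1.0, 1.0]).senses()
```

A query close to one leaf yields a single sense, while a query that sits between the two senses stops at the internal node and keeps both as candidates.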
13
Example with some ambiguities
The white ants strike rapidly the trusses of the roof
14
Initialize: attach CVs to lexemes
The white ants strike rapidly the trusses of the roof
15
Up / Down propagation of the CVs
16
Result: sense selection
The white ants strike rapidly the trusses of the roof
17
Disambiguation in UNL-French deconversion
- Our set-up
  - Example input UNL-graph
  - Outline of the process
- Two usages of DCV (disambiguation with CV)
  - Finding the known UW nearest to an unknown UW
  - Finding the best French lemma for a given UW
18
A UNL input graph
“Ronaldo has headed the ball into the left corner of the goal”
19
Corresponding UNL-tree with CVs attached: localization DCV
[Figure: UNL tree. Node labels and attached vectors as recoverable from the slide:]
- score(icl>event,agt>human,fld>sport).@entry.@past.@complete: $V = V_{\mathrm{event}}(\mathrm{score}) + V_{\mathrm{human}}(\mathrm{score}) + V_{\mathrm{sport}}(\mathrm{score})$
- 1- Ronaldo: agt, $V(\mathrm{human})$
- 1- goal(icl>thing): obj, $V_{\mathrm{thing}}(\mathrm{goal})$
- corner: plt, $V_{\mathrm{place}}(\mathrm{corner})$
- left: mod, $V(\mathrm{left})$
- head(pof>body): ins, $V_{\mathrm{body}}(\mathrm{head})$
- 2- Ronaldo: pos, $V(\mathrm{human})$
20
Result of first step: the « best » UWs
The vector contextualization generalizes both kinds of localization (lexical and cultural). On each node, the selected UW is the one in the UNL-French database whose vector is closest to the contextualized vector.
Formulas used for up and down propagation:
21
Second step: select the « best » LUs
Depending on the strategy of the generator, a lexical unit (LU) may be
- a lemma
- a whole derivational family (pay, payment, payable…)
Dictionary: { }
Input:
Output: LU_i with nearest CV_i
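Selecting the LU with the nearest CV amounts to a nearest-neighbour lookup under the angular distance. The sketch below uses the meeting(icl>event) example from the problem slide; the 4-dimensional space [ACTION, DURATION, EVENT, MOMENT] and all numeric values are invented for illustration, and `select_lu` is a hypothetical name.

```python
import math

def angular_distance(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return math.acos(max(-1.0, min(1.0, dot / (nx * ny))))

def select_lu(candidates, context_vector):
    """Return the LU whose CV is nearest to the contextualized vector."""
    return min(candidates,
               key=lambda lu: angular_distance(candidates[lu], context_vector))

# Toy dictionary entry for meeting(icl>event); values are illustrative only.
# Dimensions: [ACTION, DURATION, EVENT, MOMENT]
meeting_lus = {
    "réunion":   [0.9, 0.8, 0.1, 0.1],
    "rencontre": [0.1, 0.1, 0.9, 0.8],
}

chosen_event = select_lu(meeting_lus, [0.1, 0.2, 0.9, 0.7])
chosen_action = select_lu(meeting_lus, [0.9, 0.6, 0.1, 0.0])
```

The same lookup would serve the first step as well, with known UWs and their vectors in place of the LUs.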
22
Conclusion
- Another case of fruitful integration of symbolic & numerical methods
- Further work planned
  - integration into running UNL-FR server
  - work on feed-back (Pr SU’s line of thought)
    - if user corrects the choice of LU for chosen UW
    - or worse, if user chooses a LU corresponding to another UW!
    ==> then recompute vectors by giving more weight to chosen CVs