Comments on Guillaume Pitel: “Using bilingual LSA for FrameNet annotation of French text from generic resources” Gerd Fliedner Computational Linguistics.


1 Comments on Guillaume Pitel: “Using bilingual LSA for FrameNet annotation of French text from generic resources” Gerd Fliedner Computational Linguistics Saarland University

2 Comments/Thoughts  Useful approach, as it can potentially speed up and support annotation and thus the creation of new FrameNets.  Uses only a few resources, therefore extendable to other language pairs (in principle).  First experiments ‘only’.

3 Multilingual FrameNets  Having FrameNet for as many languages as possible would be nice.  There are numerous monolingual and cross-lingual applications.  BUT: Building ‘a FrameNet’ is knowledge- and labour-intensive work and thus expensive; funding may be a problem.

4 Bootstrapping Multilingual FNs  (Re-)Use as much knowledge from existing FrameNets as possible.  Ease the task of annotators by making useful suggestions.  Use automatic methods for knowledge acquisition.  [Figure labels: LSA; Swamp of Language]

5 More than one strand of hair may be needed…  By the way: Change_hair_configuration is not yet in FN.

6 FR.FrameNet  In FR.FrameNet, several methods have been explored that could reduce time and costs of building new FrameNets.  Tasks explored:  Lexical Unit (Frame Evoking Element) transfer  Identify Frame Elements  Disambiguating LU-Frame Assignment

7 Lexical Unit Transfer  Can be seen as the task of finding and disambiguating translation pairs (links to Machine Translation, lexicography).  Extract disambiguated translations from existing ‘cluster-based’ dictionary.  Some manual annotation required, but relatively fast and simple way of acquiring a solid core lexicon.

8 Manual Filtering  Is frame information currently used for disambiguation?  How is the manual annotation done? Sounds like rules of thumb. Guidelines?  How is it evaluated?

9 Resources needed  Lexical unit transfer  English FrameNet  Large-coverage bilingual dictionary (source►target language, optimally sense-disambiguated)   Corpus in target language  (Some) manual annotation (Read: OK,  may be problem for ‘small’ languages,  may be problem for small projects)

10 Lexical Unit Transfer: Other Possibilities  Using ‘human readable’ resources  Use existing dictionaries  Problem: Disambiguation  Using machine readable resources  Use Euro WordNet or similar  Problem again: Disambiguation  Use parallel corpora  Padó&Lapata, AAAI-05

11 Identify Frame Elements  Core idea: The same semantic restrictions/preferences should apply to Frame Elements in source and target language.  How can these semantic preferences be learned?  First step: Learn cross-lingual semantic similarity  Second step: Identify Frame Elements in one language and transfer.

12 Bilingual Infomap/Latent Semantic Analysis (LSA)  Originally used for crosslingual information retrieval.  Use bilingual, parallel ‘core’ corpus.  Parallel documents/paragraphs/… are put together and count as one text.  Build vector space.  Monolingual and cross-lingual similarities will ‘fall out’.
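The construction on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the setup used in the paper: the four sentence pairs are invented toy data, the `en:`/`fr:` prefixes and the function names `build_space` and `cosine` are hypothetical, and raw counts are used without any term weighting. Each aligned pair is merged into one pseudo-document, a term-by-document matrix is built, and a truncated SVD yields one vector per word in a shared space.

```python
import numpy as np

# Toy parallel corpus (invented for illustration): aligned English/French paragraphs.
parallel = [
    ("the minister said the budget", "le ministre a dit le budget"),
    ("the minister cut the budget", "le ministre a coupé le budget"),
    ("the barber cut the hair", "le coiffeur a coupé les cheveux"),
    ("the barber said hello", "le coiffeur a dit bonjour"),
]

def build_space(pairs, k=2):
    """Merge each aligned pair into one pseudo-document and apply a rank-k SVD."""
    # Language prefixes keep homographs such as en:budget and fr:budget apart.
    docs = [[f"en:{w}" for w in en.split()] + [f"fr:{w}" for w in fr.split()]
            for en, fr in pairs]
    vocab = sorted({term for doc in docs for term in doc})
    index = {term: i for i, term in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for term in doc:
            counts[index[term], j] += 1
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    # Row i of U_k * S_k is the reduced vector for term i.
    return {term: u[index[term], :k] * s[:k] for term in vocab}

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

space = build_space(parallel)
same = cosine(space["en:barber"], space["fr:coiffeur"])
other = cosine(space["en:barber"], space["fr:ministre"])
```

Because `en:barber` and `fr:coiffeur` occur in exactly the same merged documents, they end up closer to each other than `en:barber` is to `fr:ministre`; this is the sense in which cross-lingual similarities ‘fall out’ of the space.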

13 Identify and transfer Frame Elements  Use Berkeley FrameNet corpus as training corpus (English): Frame Elements (content words+POS) from annotated examples are used as starting point.  Use semantic space (generated by LSA) to find good (hopefully semantically related) translation candidates for words making up Frame Element.  To identify French Frame Element: Find ‘closest’ vector.  Several good examples, some less good ones.
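The ‘closest vector’ step might look roughly like the following sketch. Everything concrete here is an assumption made for illustration: the three-dimensional vectors, the Frame Element labels, the filler words, and the function name `label_by_nearest` are invented, and cosine similarity is only one plausible reading of ‘closest’.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical vectors for English Frame Element fillers taken from annotated examples.
english_fillers = {
    ("Speaker", "minister"): [0.9, 0.1, 0.0],
    ("Speaker", "spokesman"): [0.8, 0.2, 0.1],
    ("Topic", "budget"): [0.1, 0.9, 0.2],
}

def label_by_nearest(vector, fillers):
    """Return the (Frame Element, English word) whose vector is most similar."""
    (frame_element, word), _ = max(
        fillers.items(), key=lambda item: cosine(vector, item[1])
    )
    return frame_element, word

# A hypothetical vector for a French word in the same semantic space.
french_word_vector = [0.85, 0.15, 0.05]
suggestion = label_by_nearest(french_word_vector, english_fillers)
```

A nearest-neighbour rule always returns some label, even for unrelated words, which is one reason a suggestion of this kind still needs an annotator's check.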

14 Add Clustering  Inspection of data shows: Frame Elements may have semantically different fillers.  Thus, clustering of LSA vectors seems promising.  Identifying French Frame Elements: Instead of finding closest vector, check whether the word vector belongs to one of the clusters.  Problems: Identify optimal number of clusters, sparse data, …
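A small sketch of the cluster-membership check follows. The slides do not say which clustering algorithm was used, so plain k-means with a fixed `k` is an assumption here, as are the two-dimensional toy vectors and the membership rule (a word belongs to a cluster if it lies no farther from the centroid than the farthest training member).

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(points, k, iterations=20):
    """Plain k-means with a deterministic start: the first k points are the seeds."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            groups[nearest].append(p)
        # Keep the old centroid if a cluster happens to be empty.
        centroids = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
    return centroids, groups

def belongs_to_cluster(vector, centroids, groups):
    """True if the vector lies within the radius of at least one cluster."""
    for centroid, group in zip(centroids, groups):
        radius = max((dist(p, centroid) for p in group), default=0.0)
        if dist(vector, centroid) <= radius:
            return True
    return False

# Hypothetical filler vectors for one Frame Element with two kinds of fillers.
fillers = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8], [0.85, 0.1]]
centroids, groups = kmeans(fillers, k=2)
inside = belongs_to_cluster([0.82, 0.15], centroids, groups)
outside = belongs_to_cluster([0.5, 0.5], centroids, groups)
```

The radius rule makes the sparse-data problem visible: a cluster with a single member has radius zero and accepts nothing but an exact match.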

15 Resources Needed  Frame Element identification/transfer  English FrameNet  Parallel corpus source/target language   Additional corpora in both languages  Corpus in target language  (Tagger in source/target language  )  (Not so little) manual annotation  (Read: OK,  may be problem for ‘small’ languages,  may be problem for small projects)

16 Use information from WordNet?  For French:  Use (Euro) WordNet alternatively/in addition:  Use Euro WordNet links (translations)  Use WordNet to expand ‘queries’  Use similarity measures such as Jiang&Conrath 97.  For other languages that do not have WordNet: ???
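For reference, the Jiang & Conrath measure defines the distance between two concepts as IC(c1) + IC(c2) − 2·IC(lcs), where IC(c) = −log p(c) is the information content of a concept and lcs is the lowest common subsumer of c1 and c2. The sketch below computes it on an invented four-node taxonomy with invented counts; it does not use WordNet itself, and all names in it are hypothetical.

```python
import math

# Hypothetical mini-taxonomy (child -> parent) and hypothetical corpus counts.
parent = {"barber": "person", "minister": "person",
          "person": "entity", "budget": "entity"}
own_counts = {"barber": 5, "minister": 20, "person": 25, "budget": 50, "entity": 0}

def ancestors(concept):
    """The concept itself followed by its chain of parents up to the root."""
    chain = [concept]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

def cumulative_counts():
    """A concept's frequency includes the frequencies of all its descendants."""
    totals = {concept: 0 for concept in own_counts}
    for concept, n in own_counts.items():
        for a in ancestors(concept):
            totals[a] += n
    return totals

TOTALS = cumulative_counts()
ROOT_TOTAL = TOTALS["entity"]

def information_content(concept):
    """IC(c) = -log p(c), with p(c) estimated from cumulative counts."""
    return -math.log(TOTALS[concept] / ROOT_TOTAL)

def lowest_common_subsumer(c1, c2):
    """The most specific concept that is an ancestor of both arguments."""
    seen = set(ancestors(c2))
    return next(a for a in ancestors(c1) if a in seen)

def jc_distance(c1, c2):
    """Jiang & Conrath distance: IC(c1) + IC(c2) - 2 * IC(lcs)."""
    lcs = lowest_common_subsumer(c1, c2)
    return (information_content(c1) + information_content(c2)
            - 2 * information_content(lcs))

near = jc_distance("barber", "minister")   # share the subsumer "person"
far = jc_distance("barber", "budget")      # only share the root "entity"
```

With these toy counts `near` is smaller than `far`, since the two person terms share a more informative subsumer than the root.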

17 Syntax  Certain Frame Elements are semantically totally heterogeneous, but syntactically (relatively) easy to identify  For example: Statement.Message (engl.: say that X, fr.: dire que X)  Problem: Semantic transfer can be learned using LSA, syntactic transfer (that≈que) cannot.  Could (partially) parsed parallel corpora be used to learn syntactic transfer? Can ‘syntactic’ and ‘semantic’ Frame Element identification be combined? Alternatively: Can ‘syntactic’ Frame Elements be recognised and left to annotators altogether?

18 Frame Element Preferences  Knowing more about Frame Elements (explicitly) would be very helpful.  Automatic Frame/Frame Element assignment.  Manual annotation/guidelines.  Transfer to other languages.  Encoding preferences as links within FrameNet  Encoding preferences as links with external resources (WordNet? SUMO/MILO?), cf. work by Aljoscha Burchardt  Cf. yesterday’s talk by Michael Ellsworth

19 Conclusions  (Some) more research required.  Optimising the annotation process is probably very important, e.g.:  Use several cycles (start with ‘more certain’ cases, re-train with the additional data, …)  Integrate different strategies, e.g. ‘syntax’ and ‘semantics’.  Which decisions can be made automatically? Can suggestions be made? How good are they? Recall vs. precision optimisations.
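The recall vs. precision question can be made concrete with a confidence threshold on automatic suggestions. The sketch below is an assumption-laden illustration: the scores and correctness flags are invented, and `precision_recall` is a hypothetical helper, not part of any FrameNet tool.

```python
def precision_recall(scored, threshold):
    """Precision and recall when only suggestions scoring >= threshold are accepted.

    `scored` is a list of (confidence, is_correct) pairs.
    """
    accepted = [correct for score, correct in scored if score >= threshold]
    true_positives = sum(accepted)
    total_correct = sum(correct for _, correct in scored)
    precision = true_positives / len(accepted) if accepted else 1.0
    recall = true_positives / total_correct if total_correct else 1.0
    return precision, recall

# Hypothetical suggestions: (system confidence, judged correct by an annotator).
suggestions = [(0.95, True), (0.9, True), (0.8, False),
               (0.7, True), (0.4, False), (0.3, True)]

strict = precision_recall(suggestions, 0.85)   # accept only the 'more certain' cases
lenient = precision_recall(suggestions, 0.25)  # accept everything
```

On this toy data the strict threshold accepts only correct suggestions but finds half of them, while the lenient one finds all of them at lower precision. A cycle-based process could start with the strict setting and relax it as re-training adds data.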

