1
Comments on Guillaume Pitel: “Using bilingual LSA for FrameNet annotation of French text from generic resources” Gerd Fliedner Computational Linguistics Saarland University
2
Comments/Thoughts Useful approach, as it can potentially speed up and support annotation and thus the creation of new FrameNets. Uses only a few resources and is therefore extendable to other language pairs (in principle). First experiments ‘only’.
3
Multilingual FrameNets Having FrameNet for as many languages as possible would be nice. There are numerous monolingual and cross-lingual applications. BUT: Building ‘a FrameNet’ is knowledge- and labour-intensive work and thus expensive; funding may be a problem.
4
Bootstrapping Multilingual FNs (Re-)Use as much knowledge from existing FrameNets as possible. Ease the task of annotators by making useful suggestions. Use automatic methods for knowledge acquisition. (Figure labels: LSA; Swamp of Language)
5
More than one strand of hair may be needed… By the way: Change_hair_configuration is not yet in FN.
6
FR.FrameNet In FR.FrameNet, several methods have been explored that could reduce the time and cost of building new FrameNets. Tasks explored: Lexical Unit (Frame Evoking Element) transfer; identifying Frame Elements; disambiguating LU-Frame assignment.
7
Lexical Unit Transfer Can be seen as the task of finding and disambiguating translation pairs (links to Machine Translation, lexicography). Extract disambiguated translations from existing ‘cluster-based’ dictionary. Some manual annotation required, but relatively fast and simple way of acquiring a solid core lexicon.
8
Manual Filtering Is frame information currently used for disambiguation? How is the manual annotation done? Sounds like rules of thumb. Guidelines? How is it evaluated?
9
Resources needed (Lexical Unit transfer): English FrameNet; large-coverage bilingual dictionary (source►target language, optimally sense-disambiguated); corpus in target language; (some) manual annotation. (Read: OK, may be a problem for ‘small’ languages, may be a problem for small projects)
10
Lexical Unit Transfer: Other Possibilities Using ‘human readable’ resources: use existing dictionaries (problem: disambiguation). Using machine-readable resources: use EuroWordNet or similar (problem again: disambiguation); use parallel corpora (Padó & Lapata, AAAI-05).
11
Identify Frame Elements Core idea: The same semantic restrictions/preferences should apply to Frame Elements in source and target language. How can these semantic preferences be learned? First step: Learn cross-lingual semantic similarity Second step: Identify Frame Elements in one language and transfer.
12
Bilingual Infomap/Latent Semantic Analysis (LSA) Originally used for crosslingual information retrieval. Use bilingual, parallel ‘core’ corpus. Parallel documents/paragraphs/… are put together and count as one text. Build vector space. Monolingual and cross-lingual similarities will ‘fall out’.
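The construction on this slide can be sketched in a few lines. This is only an illustration under assumptions, not the setup used in the talk: the four sentence pairs are invented, a plain term-by-document count matrix with a truncated SVD stands in for whatever weighting and context model the Infomap tool applies, and the names `PARALLEL`, `build_bilingual_space` and `cosine` are hypothetical.

```python
import numpy as np

# Toy aligned units, invented for illustration only.
PARALLEL = [
    ("the dog barks", "le chien aboie"),
    ("the dog sleeps", "le chien dort"),
    ("the cat sleeps", "le chat dort"),
    ("the cat eats", "le chat mange"),
]

def build_bilingual_space(pairs, k=2):
    """Merge each aligned pair into one pseudo-document and run a rank-k SVD."""
    # Language prefixes keep identical strings in the two languages apart.
    docs = [
        [f"en:{w}" for w in en.split()] + [f"fr:{w}" for w in fr.split()]
        for en, fr in pairs
    ]
    vocab = sorted({tok for doc in docs for tok in doc})
    row = {tok: i for i, tok in enumerate(vocab)}
    # Term-by-document count matrix: one row per word, one column per merged text.
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc:
            counts[row[tok], j] += 1.0
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    reduced = u[:, :k] * s[:k]  # word vectors in the k-dimensional latent space
    return {tok: reduced[i] for tok, i in row.items()}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

space = build_bilingual_space(PARALLEL)
print(cosine(space["en:dog"], space["fr:chien"]))  # translation pair
print(cosine(space["en:dog"], space["fr:chat"]))   # unrelated word
```

Because English and French words share the same columns, a translation pair that occurs in the same aligned units ends up with nearly the same vector, which is the sense in which cross-lingual similarities ‘fall out’ of the space.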
13
Identify and transfer Frame Elements Use Berkeley FrameNet corpus as training corpus (English): Frame Elements (content words+POS) from annotated examples are used as starting point. Use semantic space (generated by LSA) to find good (hopefully semantically related) translation candidates for words making up Frame Element. To identify French Frame Element: Find ‘closest’ vector. Several good examples, some less good ones.
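The ‘closest vector’ step can be pictured as a nearest-neighbour search restricted to target-language words. The sketch below assumes a semantic space is already available as a dictionary from language-prefixed words to vectors; the three-dimensional vectors, the `fr:` prefix convention and the function name `closest_candidates` are all made up for illustration and do not come from the talk.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def closest_candidates(filler, space, target_prefix="fr:", top_n=2):
    """Rank target-language words by cosine similarity to a source-language filler."""
    query = space[filler]
    scored = [
        (cosine(query, vec), word)
        for word, vec in space.items()
        if word.startswith(target_prefix)
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [word for _, word in scored[:top_n]]

# Hypothetical vectors; a real space would come from the bilingual LSA step.
toy_space = {
    "en:doctor": (0.9, 0.1, 0.0),
    "fr:docteur": (0.8, 0.2, 0.1),
    "fr:clinique": (0.6, 0.5, 0.1),
    "fr:voiture": (0.0, 0.1, 0.9),
}

print(closest_candidates("en:doctor", toy_space))
```

A ranking like this yields related words as well as translations, which may be one source of the ‘less good’ examples mentioned on the slide.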
14
Add Clustering Inspection of data shows: Frame Elements may have semantically different fillers. Thus, clustering of LSA vectors seems promising. Identifying French Frame Elements: Instead of finding the closest vector, check whether the word vector belongs to one of the clusters. Problems: Identify optimal number of clusters, sparse data, …
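A minimal sketch of the cluster-membership test follows. It assumes plain k-means with Euclidean distance and a fixed membership radius; the talk does not say which clustering algorithm or membership criterion was used, so these choices, the two-dimensional toy vectors, and the names `kmeans` and `matches_frame_element` are assumptions.

```python
import math

def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def mean(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def kmeans(points, k, iterations=10):
    """Plain k-means; the first k points seed the centroids (a simplification)."""
    centroids = list(points[:k])
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: distance(p, centroids[i]))
            groups[nearest].append(p)
        # An empty group keeps its previous centroid.
        centroids = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
    return centroids

def matches_frame_element(vector, centroids, radius):
    """A word counts as a possible filler if it lies within radius of any centroid."""
    return any(distance(vector, c) <= radius for c in centroids)

# Hypothetical filler vectors for one Frame Element with two kinds of fillers.
fillers = [(0.9, 0.1), (0.1, 0.9), (1.0, 0.2), (0.2, 1.0)]
centroids = kmeans(fillers, k=2)
print(matches_frame_element((0.9, 0.2), centroids, radius=0.3))
print(matches_frame_element((0.5, 0.5), centroids, radius=0.3))
```

Both `k` and `radius` have to be chosen somehow, which is the ‘optimal number of clusters’ problem from the slide in miniature.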
15
Resources Needed (Frame Element identification/transfer): English FrameNet; parallel corpus source/target language; additional corpora in both languages; corpus in target language; (tagger in source/target language); (not so little) manual annotation. (Read: OK, may be a problem for ‘small’ languages, may be a problem for small projects)
16
Use information from WordNet? For French: Use (Euro) WordNet alternatively/in addition: Use Euro WordNet links (translations) Use WordNet to expand ‘queries’ Use similarity measures such as Jiang&Conrath 97. For other languages that do not have WordNet: ???
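For reference, the Jiang & Conrath measure is a distance based on information content: dist(c1, c2) = IC(c1) + IC(c2) - 2·IC(lcs(c1, c2)), with IC(c) = -log p(c) and lcs the lowest common subsumer in the taxonomy. The sketch below applies it to a tiny invented is-a hierarchy; the concept names and probabilities are hypothetical and are not taken from WordNet.

```python
import math

# Hypothetical is-a taxonomy (child -> parent) and concept probabilities.
PARENT = {"doctor": "person", "nurse": "person", "car": "vehicle",
          "person": "entity", "vehicle": "entity", "entity": None}
PROB = {"entity": 1.0, "person": 0.5, "vehicle": 0.5,
        "doctor": 0.2, "nurse": 0.2, "car": 0.4}

def information_content(concept):
    return -math.log(PROB[concept])

def ancestors(concept):
    """The concept itself followed by its ancestors up to the root."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path

def lowest_common_subsumer(c1, c2):
    seen = set(ancestors(c2))
    return next(c for c in ancestors(c1) if c in seen)

def jiang_conrath_distance(c1, c2):
    lcs = lowest_common_subsumer(c1, c2)
    return (information_content(c1) + information_content(c2)
            - 2 * information_content(lcs))

print(jiang_conrath_distance("doctor", "nurse"))
print(jiang_conrath_distance("doctor", "car"))
```

A smaller distance means more closely related concepts, so in this toy hierarchy ‘doctor’ comes out closer to ‘nurse’ than to ‘car’.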
17
Syntax Certain Frame Elements are semantically totally heterogeneous, but syntactically (relatively) easy to identify For example: Statement.Message (engl.: say that X, fr.: dire que X) Problem: Semantic transfer can be learned using LSA, syntactic transfer (that≈que) cannot. Could (partially) parsed parallel corpora be used to learn syntactic transfer? Can ‘syntactic’ and ‘semantic’ Frame Element identification be combined? Alternatively: Can ‘syntactic’ Frame Elements be recognised and left to annotators altogether?
18
Frame Element Preferences Knowing more about Frame Elements (explicitly) would be very helpful. Automatic Frame/Frame Element assignment. Manual annotation/guidelines. Transfer to other languages. Encoding preferences as links within FrameNet Encoding preferences as links with external resources (WordNet? SUMO/MILO?), cf. work by Aljoscha Burchardt Cf. yesterday’s talk by Michael Ellsworth
19
Conclusions (Some) more research required. Optimising the annotation process is probably very important, e.g.: Use several cycles (start with ‘more certain’ cases, re-train with the additional data, …) Integrate different strategies, e.g. ‘syntax’ and ‘semantics’. Which decisions can be made automatically? Can suggestions be made? How good are they? Recall vs. precision optimisations
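The recall vs. precision question can be made concrete by scoring the automatic suggestions against a gold annotation at different confidence thresholds. The sketch below is illustrative only: the suggestion strings, confidence values and gold set are invented, and `precision_recall` is a hypothetical helper.

```python
def precision_recall(suggestions, gold, threshold):
    """suggestions: list of (item, confidence); gold: set of correct items."""
    proposed = {item for item, conf in suggestions if conf >= threshold}
    if not proposed:
        return 0.0, 0.0  # convention chosen here: nothing proposed scores zero
    correct = proposed & gold
    return len(correct) / len(proposed), len(correct) / len(gold)

# Hypothetical word/Frame Element suggestions with system confidence.
suggestions = [
    ("ministre/Speaker", 0.9),
    ("journal/Medium", 0.8),
    ("hier/Topic", 0.6),
    ("guerre/Topic", 0.4),
    ("table/Addressee", 0.3),
]
gold = {"ministre/Speaker", "journal/Medium", "guerre/Topic"}

for threshold in (0.7, 0.35):
    precision, recall = precision_recall(suggestions, gold, threshold)
    print(threshold, round(precision, 2), round(recall, 2))
```

A high threshold shows annotators only the ‘more certain’ cases (high precision, lower recall); lowering it in later cycles trades precision for coverage.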