1
Supporting e-learning with automatic glossary extraction Experiments with Portuguese Rosa Del Gaudio, António Branco RANLP, Borovets 2007
2
Presentation Plan ● LT4eL project ● ILIAS ● Corpus ● Tool ● Grammars ● Copula ● Other Verbs ● Punctuation ● Results ● Conclusion
3
LT4eL ● Improve retrieval and accessibility of learning objects (LOs) in learning management systems ● Employ language technology resources and tools for the semi-automatic generation of descriptive metadata ● Develop new functionalities, such as a keyword extractor, a glossary candidate detector, and semantic search, tuned for the various languages addressed in the project (Bulgarian, Czech, Dutch, English, German, Maltese, Polish, Portuguese, Romanian)
4
ILIAS
5
Objective ● Build a glossary automatically to support the e-learning process. In practice, this means extracting definitions from unstructured text (scientific papers, encyclopedias, web pages) ● Better access to information for students ● Accelerate the work of the tutor
6
ILIAS: Glossary Candidate Detector
7
The Corpus ● 274,000 tokens ● Tutorials, PhD theses, scientific papers ● 3 domains evenly represented: e-learning, technology for non-experts, Calimera
8
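XML format ● Example sentence: “Intranet é uma rede desenvolvida para processamento de informações em uma empresa ou organização.” (An intranet is a network developed for processing information in a company or organization.)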
9
LxTransduce ● Input: simple text or XML ● Regular expressions ● Substitution and markup ● Output: the same file with changes ● Match tree using elements ● Quick ● Unicode friendly ● Freeware ● Easy to integrate in other tools (Java)
10
Rules in lxtransduce
11
First development phase ● Less than 50% of the corpus ● Focus on the verb ● Precision: correct automatic / all automatic ● Recall: correct automatic / manually marked ● F2 = 3 * (precision * recall) / (2 * precision + recall)
Grammar  Precision  Recall  F2
Gr 00    0.14       0.44    0.26
Gr 01    0.31       0.20    0.22
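The F2 definition on this slide can be checked with a few lines of Python. This is a minimal sketch: the function name `f2_score` is an illustrative assumption, and the inputs are the rounded precision and recall values reported for Gr 00 and Gr 01, so the recomputed scores can only be expected to agree with the reported ones to about two decimals.

```python
def f2_score(precision: float, recall: float) -> float:
    """F-measure as defined on the slide: 3 * P * R / (2 * P + R)."""
    denominator = 2 * precision + recall
    if denominator == 0:
        # No correct candidates at all: define the score as 0.
        return 0.0
    return 3 * precision * recall / denominator


# Rounded (precision, recall) pairs reported for the first development phase.
reported = {"Gr 00": (0.14, 0.44), "Gr 01": (0.31, 0.20)}

for grammar, (p, r) in reported.items():
    print(grammar, round(f2_score(p, r), 3))
```

With this weighting, recall counts twice as much as precision in the denominator, which fits a setting where a tutor reviews the candidates and a missed definition costs more than a spurious one.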
12
Second development phase ● 75% of the corpus for development ● 25% of the corpus for testing ● Specific grammar/rules for each type
13
Copula baseline grammar Verb “to be” third person singular or plural present indicative
14
Copula baseline results ● Sentence-level results ● Problem with precision
15
Copula Grammar
16
Rules for is_type <query match="tok[@ctag = 'V' and @base='ser' and (@msd[starts-with(.,'fi-3')] or @msd[starts-with(.,'pi-3')])]........
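The attribute test in this rule can be mirrored outside lxtransduce to see which tokens it selects. The sketch below is not the authors' implementation: it uses Python's standard `xml.etree.ElementTree`, whose XPath subset has no `starts-with`, so the prefix check is done in plain Python. The `<s>`/`<tok>` layout of the sample, the function names, and all tag values other than `ctag='V'`, `base='ser'` and the `fi-3`/`pi-3` prefixes are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated sentence. Only the ctag, base and msd attribute
# names come from the rule; the tag values on the other tokens are made up.
SAMPLE = """
<s>
  <tok ctag="PNM" base="Intranet" msd="">Intranet</tok>
  <tok ctag="V" base="ser" msd="pi-3s">é</tok>
  <tok ctag="UM" base="um" msd="fs">uma</tok>
  <tok ctag="CN" base="rede" msd="fs">rede</tok>
</s>
"""


def is_copula(tok: ET.Element) -> bool:
    """Same test as the rule: verb, lemma 'ser', msd starting with fi-3 or pi-3."""
    msd = tok.get("msd", "")
    return (
        tok.get("ctag") == "V"
        and tok.get("base") == "ser"
        and msd.startswith(("fi-3", "pi-3"))
    )


def copula_positions(sentence: ET.Element) -> list:
    """Return the indexes of the tokens in the sentence that pass the test."""
    return [i for i, tok in enumerate(sentence.iter("tok")) if is_copula(tok)]


sentence = ET.fromstring(SAMPLE)
print(copula_positions(sentence))
```

A sentence with at least one matching position would be a candidate for the is_type grammar; the real rules then constrain what may appear around the verb.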
17
Comparing Results ● Include the patterns that were excluded ● Try to gather the syntactic patterns of non-definitions and compare them with the syntactic patterns of definitions
18
Other_Verbs grammar ● Collect verbs in a lexicon ● Three different categories: reflexive, active, passive ● 22 different verbs
19
Results for verb_type ● Analyze each verb separately, as with is_type ● Richer syntactic patterns
20
Punctuation Grammar ● Preliminary work ● Definitions introduced by a colon (most frequent)
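The colon pattern can be illustrated with a regular expression over raw sentences. This is only a rough sketch under stated assumptions: the actual grammar works on annotated tokens in lxtransduce, whereas the pattern, the 60-character limit on the term, and the function name `split_colon_definition` are all hypothetical choices made for the example.

```python
import re
from typing import Optional, Tuple

# Assumed surface pattern: a short term without ':' or '.', then a colon
# followed by whitespace, then the defining text up to the end of the line.
COLON_DEFINITION = re.compile(
    r"^\s*(?P<term>[^:.]{1,60}?)\s*:\s+(?P<definition>\S.*)$"
)


def split_colon_definition(sentence: str) -> Optional[Tuple[str, str]]:
    """Return (term, definition) if the sentence fits the colon pattern, else None."""
    match = COLON_DEFINITION.match(sentence)
    if match is None:
        return None
    return match.group("term"), match.group("definition").strip()


print(split_colon_definition("Intranet: rede desenvolvida para uma empresa."))
print(split_colon_definition("Ver http://example.org para mais detalhes."))
```

Requiring whitespace after the colon keeps URLs and times such as 10:30 from matching, but a surface pattern like this will still over-generate (for example on list introductions), which is one reason the slide calls this work preliminary.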
21
All-in-one ● Combination of the previous grammars ● The type is not taken into account when calculating precision and recall
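Evaluating the combined grammars while ignoring the definition type can be sketched as set operations over sentence identifiers. The code below is an illustration under assumptions, not the evaluation script used in the experiments: the function names, the dictionary representation, and the toy data are all hypothetical; only the type labels is_type, verb_type and punc_type come from the slides.

```python
def combine(*grammar_outputs: dict) -> dict:
    """Merge the outputs of several grammars; the first grammar to flag a sentence wins."""
    combined = {}
    for output in grammar_outputs:
        for sentence_id, definition_type in output.items():
            combined.setdefault(sentence_id, definition_type)
    return combined


def evaluate_ignoring_type(gold: dict, predicted: dict) -> tuple:
    """Sentence-level precision and recall; only the sentence ids are compared."""
    gold_ids, predicted_ids = set(gold), set(predicted)
    correct = gold_ids & predicted_ids
    precision = len(correct) / len(predicted_ids) if predicted_ids else 0.0
    recall = len(correct) / len(gold_ids) if gold_ids else 0.0
    return precision, recall


# Toy data (hypothetical): sentence id -> definition type.
gold = {1: "is_type", 2: "verb_type", 3: "punc_type", 4: "is_type"}
is_out = {1: "is_type", 2: "is_type", 5: "is_type"}
verb_out = {2: "verb_type", 6: "verb_type"}
punc_out = {3: "punc_type"}

precision, recall = evaluate_ignoring_type(gold, combine(is_out, verb_out, punc_out))
print(precision, recall)
```

In the toy data, sentence 2 is flagged as is_type although it is marked as verb_type in the gold standard; because the type is ignored, it still counts as a correct detection.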
22
Conclusions and Future Work ● Overall results: Recall 86%, Precision 14% ● Difference among domains: the style of a document influences the result ● Improve the rules for verb_type and punc_type ● Combine with other techniques such as ML