Supporting e-learning with automatic glossary extraction Experiments with Portuguese Rosa Del Gaudio, António Branco RANLP, Borovets 2007.


1 Supporting e-learning with automatic glossary extraction Experiments with Portuguese Rosa Del Gaudio, António Branco RANLP, Borovets 2007

2 Presentation Plan ● LT4eL project ● ILIAS ● Corpus ● Tool ● Grammars ● Copula ● Other Verbs ● Punctuation ● Results ● Conclusion

3 LT4eL ● Improve retrieval and accessibility of learning objects (LOs) in learning management systems ● Employ language technology resources and tools for the semi-automatic generation of descriptive metadata ● Develop new functionalities, such as a keyword extractor, a glossary candidate detector and semantic search, tuned for the various languages addressed in the project (Bulgarian, Czech, Dutch, English, German, Maltese, Polish, Portuguese, Romanian)

4 ILIAS

5 Objective ● Build a glossary automatically to support the e-learning process. In practice, this means extracting definitions from unstructured text (scientific papers, encyclopedias, web pages) ● Better access to information for students ● Speed up the work of the tutor

6 ILIAS: Glossary Candidate Detector

7 The Corpus ● 274,000 tokens ● Tutorials, PhD theses, scientific papers ● 3 domains, evenly represented: e-learning, technology for non-experts, Calimera

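8 XML format ● Example sentence: “Intranet é uma rede desenvolvida para processamento de informações em uma empresa ou organização.” (“An intranet is a network developed for processing information in a company or organization.”)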

9 LxTransduce ● Input: plain text or XML ● Regular expressions ● Substitution and markup ● Output: the same file with changes ● Match tree using elements ● Quick ● Unicode friendly ● Freeware ● Easy to integrate in other tools (Java)

10 Rules in lxtransduce

11 First development phase ● Less than 50% of the corpus ● Focus on the verb ● Precision: correct automatic / all automatic ● Recall: correct automatic / manually marked ● F2 = 3 * (precision * recall) / (2 * precision + recall)

Grammar   Precision   Recall   F2
Gr 00     0.14        0.44     0.26
Gr 01     0.31        0.20     0.22
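As a rough check of the F2 formula on this slide, the following Python sketch recomputes the score from the precision and recall reported for the two grammars. The function name `f_measure` and the `weight` parameter are illustrative assumptions, not part of the original tooling; the slide inputs are already rounded, so the last digit can differ slightly from the reported F2.

```python
def f_measure(precision, recall, weight=2.0):
    """Weighted F-measure in the form used on the slide.

    With weight=2 this is 3 * P * R / (2 * P + R).
    `weight` is a hypothetical parameter name introduced for this sketch.
    """
    denominator = weight * precision + recall
    if denominator == 0:
        return 0.0  # no correct extractions at all
    return (1 + weight) * precision * recall / denominator


# Precision and recall as reported on the slide (rounded values).
reported = {"Gr 00": (0.14, 0.44), "Gr 01": (0.31, 0.20)}

for grammar, (p, r) in reported.items():
    print(grammar, round(f_measure(p, r), 3))
```

Because the weight multiplies precision in the denominator, a grammar with high recall and low precision (Gr 00) can still score above a more precise one (Gr 01).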

12 Second development phase ● 75% of the corpus for development ● 25% of the corpus for testing ● Specific grammar/rules for each type

13 Copula baseline grammar ● Verb “to be”, third person singular or plural, present indicative

14 Copula baseline results ● Sentence-level results ● Problem with precision

15 Copula Grammar

16 Rules for is_type <query match="tok[@ctag = 'V' and @base='ser' and (@msd[starts-with(., 'fi-3')] or @msd[starts-with(., 'pi-3')])]........
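To make the predicate in this rule easier to follow, here is a minimal Python sketch that applies the same three conditions (`ctag` is `V`, `base` is `ser`, `msd` starts with `fi-3` or `pi-3`) to `tok` elements. It does not use LxTransduce; the sample sentence markup, its attribute values other than those named in the rule, and the helper names `is_copula` and `copula_tokens` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated sentence; tag values other than those in the rule
# (ctag='V', base='ser', msd prefixes) are illustrative only.
SAMPLE = """<s>
  <tok ctag="CN" base="intranet" msd="ms">Intranet</tok>
  <tok ctag="V" base="ser" msd="pi-3s">é</tok>
  <tok ctag="UM" base="um" msd="fs">uma</tok>
  <tok ctag="CN" base="rede" msd="fs">rede</tok>
</s>"""


def is_copula(tok):
    """Mirror the predicate in the rule: verb, lemma 'ser', msd prefix fi-3 or pi-3."""
    return (
        tok.get("ctag") == "V"
        and tok.get("base") == "ser"
        and tok.get("msd", "").startswith(("fi-3", "pi-3"))
    )


def copula_tokens(xml_text):
    """Return the surface forms of all tokens that satisfy the predicate."""
    root = ET.fromstring(xml_text)
    return [tok.text for tok in root.iter("tok") if is_copula(tok)]


print(copula_tokens(SAMPLE))
```

The attribute checks are written in plain Python because `xml.etree.ElementTree` supports only a limited XPath subset that has no `starts-with` function.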

17 Comparing results ● Include the patterns that were excluded ● Try to gather the syntactic patterns of non-definitions and compare them with the syntactic patterns of definitions

18 Other_Verbs grammar ● Collect verbs in a lexicon ● Three different categories: reflexive, active, passive ● 22 different verbs

19 Results for verb_type ● Analyze each verb separately, as with is_type ● Richer syntactic patterns

20 Punctuation Grammar ● Preliminary work ● Definitions introduced by a colon (the most frequent pattern)
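The slide gives no rule for the colon pattern, so the following Python sketch is only one possible reading of it: a short term, a colon, then the defining text. The regular expression, the five-word limit on the term, and the name `colon_candidate` are all assumptions, not the grammar used in the project.

```python
import re

# Hypothetical pattern: a term of at most five words (an assumed limit),
# a colon followed by whitespace, then the definition text.
COLON_DEF = re.compile(
    r"^\s*(?P<term>[\w-]+(?:\s+[\w-]+){0,4})\s*:\s+(?P<definition>\S.*)$"
)


def colon_candidate(sentence):
    """Return (term, definition) if the sentence looks like 'term: definition'."""
    match = COLON_DEF.match(sentence)
    if match is None:
        return None
    return match.group("term"), match.group("definition").strip()


print(colon_candidate("Intranet: a network developed for an organization."))
print(colon_candidate("This sentence has no colon."))
```

Requiring whitespace after the colon keeps strings such as times (`10:30`) from being treated as definitions, which is a design choice of this sketch only.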

21 All-in-one ● Combination of the previous grammars ● The definition type is not taken into account when calculating precision and recall
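One way to read "the type is not taken into account" is that a sentence counts as correct whenever it is marked as a definition in both the manual and the automatic annotation, regardless of which grammar matched it. The sketch below illustrates that reading; the data layout (pairs of sentence id and type label) and the function name `sentence_level_scores` are assumptions, and the example data is invented purely to show the calculation.

```python
def sentence_level_scores(gold, predicted):
    """Sentence-level precision and recall that ignore the definition type.

    `gold` and `predicted` are iterables of (sentence_id, type_label) pairs;
    only the sentence ids are compared.
    """
    gold_ids = {sentence_id for sentence_id, _type in gold}
    predicted_ids = {sentence_id for sentence_id, _type in predicted}
    correct = len(gold_ids & predicted_ids)
    precision = correct / len(predicted_ids) if predicted_ids else 0.0
    recall = correct / len(gold_ids) if gold_ids else 0.0
    return precision, recall


# Invented toy data: sentence 1 is found with the "wrong" type but still counts.
gold = [(1, "is_type"), (2, "verb_type"), (3, "punc_type")]
predicted = [(1, "verb_type"), (2, "verb_type"), (4, "is_type"), (5, "is_type")]

precision, recall = sentence_level_scores(gold, predicted)
print(precision, recall)
```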

22 Conclusions and Future Work ● Overall results: recall 86%, precision 14% ● Differences among domains: the style of a document influences the results ● Improve the rules for verb_type and punc_type ● Combine with other techniques such as ML

