1
Semi-automatic glossary creation from learning objects
Eline Westerhout & Paola Monachesi
2
Overview
The LT4eL project
Detecting definitions
–eLearning domain
–Types of definitory contexts
–Grammar approach
–Machine learning approach
Qualitative evaluation
Conclusions
Future work
3
LT4eL - Language Technology for eLearning
Start date: 1 December 2005
Duration: 30 months
Partners: 12
Languages: 8
WWW: www.lt4el.eu
4
Tasks
Creation of an archive of learning objects
Semi-automatic metadata generation driven by NLP tools:
–Keyword extractor
Support the development of glossaries
–Definition extractor
Enhancing eLearning with semantic knowledge: ontologies
Integration of functionalities in LMS
Validation of new functionalities in LMS
6
Extraction of definitions within eLearning
Definition extraction:
–question answering
–building dictionaries from text
–ontology learning
Challenges within our project:
–corpus
–size of learning objects (LOs)
7
Types
is_def: ‘Gnuplot is a program for drawing graphs’
verb_def: ‘eLearning comprises resources and applications that are available via the internet and provide creative possibilities to improve the learning experience’
punct_def: ‘Passes: plastic cards equipped with a magnetic strip, that [...] gets access to certain facilities.’
pron_def: ‘Dedicated readers. These are special devices, developed with the exclusive goal to make it possible to read e-books.’
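The four types differ mainly in the connector between the defined term and its definition, so a rough surface heuristic can tell the slide's examples apart. The sketch below is only an illustration: the regular expressions and the `VERB_CUES` list are assumptions chosen to fit these four English examples, not the grammars used in the project.

```python
import re
from typing import Optional

# Hypothetical cue verbs; the project's actual verb list is not given on the slides.
VERB_CUES = ("comprises", "means", "refers to", "denotes")


def guess_definition_type(sentence: str) -> Optional[str]:
    """Return a coarse guess at the definition type, or None if no cue matches."""
    text = sentence.strip()
    # pron_def: a pronoun opening a sentence refers back to the term before it.
    if re.search(r"(^|\.\s+)(This|These|That|Those)\s+(is|are)\b", text):
        return "pron_def"
    # punct_def: a short term followed by a colon and the definition.
    if re.match(r"^[^:.]{1,40}:\s+\S", text):
        return "punct_def"
    # is_def: a form of "to be" followed directly by an article.
    if re.search(r"\b(is|are)\s+(a|an|the)\b", text):
        return "is_def"
    # verb_def: any other cue verb linking term and definition.
    if any(re.search(r"\b" + re.escape(cue) + r"\b", text) for cue in VERB_CUES):
        return "verb_def"
    return None
```

The checks run from the most specific cue to the most general one, because a pron_def sentence also contains a form of "to be" and would otherwise be labelled is_def.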
8
Grammar approach
General
Example
Results
9
Identification of definitory contexts
Make use of the linguistic annotation of LOs (part-of-speech tags)
Domain: computing
Use of language-specific grammars
Workflow
–Searching and marking definitory contexts in LOs (manually)
–Drafting local grammars on the basis of these examples
–Applying the grammars to new LOs
10
Een vette letter is een letter die zwarter wordt afgedrukt dan de andere letters. (English: ‘A bold letter is a letter that is printed darker than the other letters.’)
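A local grammar for is-definitions can be pictured as a pattern over part-of-speech tags. The sketch below matches the Dutch sentence "Een vette letter is een letter die zwarter wordt afgedrukt dan de andere letters." against one such pattern. The tag names, the hand-assigned tags, the `COPULAS` set and the pattern itself are assumptions made for illustration; the project's actual grammar formalism and tagset are not shown on the slides.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag); simplified, assumed tagset

COPULAS = {"is", "zijn"}  # Dutch forms of "to be" (assumed connector list)

# Hand-tagged version of the example sentence (tags assigned for illustration).
EXAMPLE: List[Token] = [
    ("Een", "DET"), ("vette", "ADJ"), ("letter", "NOUN"), ("is", "VERB"),
    ("een", "DET"), ("letter", "NOUN"), ("die", "PRON"), ("zwarter", "ADJ"),
    ("wordt", "VERB"), ("afgedrukt", "VERB"), ("dan", "CONJ"), ("de", "DET"),
    ("andere", "ADJ"), ("letters", "NOUN"), (".", "PUNCT"),
]


def match_is_def(tokens: List[Token]) -> Optional[Tuple[str, str]]:
    """Match the pattern: DET? ADJ* NOUN+ copula DET <rest>.

    Returns (defined term, definition) or None when the pattern fails.
    """
    i, n = 0, len(tokens)
    if i < n and tokens[i][1] == "DET":
        i += 1
    while i < n and tokens[i][1] == "ADJ":
        i += 1
    noun_start = i
    while i < n and tokens[i][1] in ("NOUN", "PROPN"):
        i += 1
    if i == noun_start:
        return None  # no head noun, so no defined term
    term_end = i
    # The connector must be a copula.
    if i >= n or tokens[i][1] != "VERB" or tokens[i][0].lower() not in COPULAS:
        return None
    i += 1
    # The definition must open with a determiner.
    if i >= n or tokens[i][1] != "DET":
        return None
    term = " ".join(word for word, _ in tokens[:term_end])
    definition = " ".join(word for word, tag in tokens[i:] if tag != "PUNCT")
    return term, definition
```

Calling `match_is_def(EXAMPLE)` splits the sentence into the term "Een vette letter" and the definition that follows the copula. A pattern this loose also accepts many non-definitions.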
12
Results (grammar)
13
Machine learning
General
Features & Configurations
Results
14
General information
Naive Bayes classifier
Weka
Data set:
–is-definitions: 77 / 274
–punct-definitions: 45 / 454
10-fold cross-validation
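The slides name Weka as the toolkit. The following is not Weka code but a minimal standard-library sketch of the same two ideas: a multinomial Naive Bayes classifier over word counts and k-fold cross-validation. The toy sentences and the labels `def` / `nondef` are invented for illustration and are not taken from the project's data set.

```python
import math
import random
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Collect class frequencies and per-class word counts."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in zip(docs, labels):
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def predict_nb(model, words):
    """Return the class with the highest log-probability (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in words:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(docs, labels, k=10):
    """Train on k-1 folds, test on the held-out fold, return overall accuracy."""
    correct = 0
    for test_idx in k_fold_indices(len(docs), k):
        held_out = set(test_idx)
        train_docs = [d for i, d in enumerate(docs) if i not in held_out]
        train_labels = [l for i, l in enumerate(labels) if i not in held_out]
        model = train_nb(train_docs, train_labels)
        correct += sum(predict_nb(model, docs[i]) == labels[i] for i in test_idx)
    return correct / len(docs)


# Invented toy data, for illustration only.
TOY = [
    ("gnuplot is a program for drawing graphs", "def"),
    ("a browser is a program for viewing pages", "def"),
    ("a font is a set of characters", "def"),
    ("a modem is a device for connecting computers", "def"),
    ("a folder is a container for files", "def"),
    ("a cursor is a marker on the screen", "def"),
    ("click the button to save your file", "nondef"),
    ("open the menu and choose print", "nondef"),
    ("restart the computer after installing", "nondef"),
    ("type your password in the field", "nondef"),
    ("drag the icon to the desktop", "nondef"),
    ("select the text you want to copy", "nondef"),
]
DOCS = [text.split() for text, _ in TOY]
LABELS = [label for _, label in TOY]
```

`cross_validate(DOCS, LABELS, k=10)` returns the fraction of held-out sentences classified correctly. With only twelve toy sentences the number itself means little; the point is the train/test split.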
15
Features
Text properties: bag-of-words, bigrams, and bigram preceding the definition
Syntactic properties: type of determiner within the defined term (definite, indefinite, no determiner)
Proper nouns: presence of a proper noun in the defined term
cf. Fahmi & Bouma, 2006
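The three feature groups could be turned into a feature dictionary roughly as follows. This is a sketch under assumptions: the tag names `DET` and `PROPN`, the English article lists, and the reading of "bigram preceding the definition" as the last two tokens before the candidate sentence are all illustrative choices, not details confirmed by the slides.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, POS tag); the tag names are assumed

DEFINITE = {"the"}        # English articles, used here only for illustration
INDEFINITE = {"a", "an"}


def extract_features(sentence: List[Token], term: List[Token],
                     preceding: List[str]) -> Dict[str, object]:
    """Build text, determiner and proper-noun features for one candidate."""
    feats: Dict[str, object] = {}
    words = [w.lower() for w, _ in sentence]
    # Text properties: bag-of-words and bigrams of the candidate sentence.
    for w in words:
        feats["bow=" + w] = True
    for a, b in zip(words, words[1:]):
        feats["bigram=" + a + "_" + b] = True
    # Bigram preceding the definition (assumed: last two tokens before it).
    if len(preceding) >= 2:
        feats["prev_bigram=" + "_".join(p.lower() for p in preceding[-2:])] = True
    # Syntactic property: type of determiner within the defined term.
    det_words = [w.lower() for w, tag in term if tag == "DET"]
    if not det_words:
        feats["determiner"] = "none"
    elif det_words[0] in DEFINITE:
        feats["determiner"] = "definite"
    elif det_words[0] in INDEFINITE:
        feats["determiner"] = "indefinite"
    else:
        feats["determiner"] = "other"  # fallback not listed on the slide
    # Proper nouns: is there one inside the defined term?
    feats["has_proper_noun"] = any(tag == "PROPN" for _, tag in term)
    return feats
```

For the is_def example "Gnuplot is a program ...", with "Gnuplot" tagged as a proper noun and passed as the defined term, this yields `determiner = "none"` and `has_proper_noun = True`.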
16
Configurations
17
Results – is_def (final)
18
Results – punct_def (final)
19
Final results
precision: + (50% and 40%)
recall: - (20% and 30%)
F-score: + (30% and 25%)
20
Qualitative evaluation
Scenario-based evaluation, 1st cycle
Tested tutors (3) and students (6)
Results:
–usefulness glossary: useful (tutors: 2/3, students: 7/7)
–usefulness search method 'Definitions': useful (students: 7/7)
–performance: opinions differed (tutors: 1 ok, 1 not ok, 1 neutral)
22
Conclusions
Recall: ok with pattern-based grammar
Precision: can be improved with machine learning approach
Trade-off between recall and precision
–Only pattern-based (PB): good recall, bad precision
–PB & machine learning (ML): better precision, lower recall
For our purpose: recall might be more important than precision
Size of objects also important
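The trade-off can be made concrete with the standard definitions of precision, recall and F-score. The counts used below are hypothetical and chosen only to show the effect of filtering pattern matches with a classifier; they are not the project's results.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall and their harmonic mean (F1) from counts.

    tp: extracted sentences that are definitions
    fp: extracted sentences that are not definitions
    fn: definitions that were missed
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for 100 true definitions in a corpus.
pattern_only = precision_recall_f1(tp=80, fp=120, fn=20)   # many matches, many false
pattern_plus_ml = precision_recall_f1(tp=60, fp=30, fn=40)  # fewer matches, cleaner
```

With these made-up counts the classifier step discards most false matches, which raises precision, but it also discards some true definitions, which lowers recall. Which setting is preferable depends on whether a user would rather review extra candidates or miss definitions.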
23
Future work
Machine learning:
–try different features
–use other classifiers
–extend to all types of definitions
Qualitative evaluation:
–2nd cycle scenario-based evaluation => focus more on user preferences
24
Results – is_def (ML)
25
Results – punct_def (ML)