Semi-automatic glossary creation from learning objects Eline Westerhout & Paola Monachesi.

Presentation transcript:

Semi-automatic glossary creation from learning objects Eline Westerhout & Paola Monachesi

Overview
The LT4eL project
Detecting definitions
–eLearning domain
–Types of definitory contexts
–Grammar approach
–Machine learning approach
Qualitative evaluation
Conclusions
Future work

LT4eL - Language Technology for eLearning
Start date: 1 December 2005
Duration: 30 months
Partners: 12
Languages: 8
WWW:

Tasks
Creation of an archive of learning objects
Semi-automatic metadata generation driven by NLP tools:
–Keyword extractor
Support the development of glossaries
–Definition extractor
Enhancing eLearning with semantic knowledge: ontologies
Integration of functionalities in LMS
Validation of new functionalities in LMS

Extraction of definitions within eLearning
Definition extraction:
–question answering
–building dictionaries from text
–ontology learning
Challenges within our project:
–corpus
–size of LOs

Types
is_def: ‘Gnuplot is a program for drawing graphs’
verb_def: ‘eLearning comprises resources and applications that are available via the internet and provide creative possibilities to improve the learning experience’
punct_def: ‘Passes: plastic cards equipped with a magnetic strip, that [...] gets access to certain facilities.’
pron_def: ‘Dedicated readers. These are special devices, developed with the exclusive goal to make it possible to read e-books.’

Grammar approach
General
Example
Results

Identification of definitory contexts
Make use of the linguistic annotation of LOs (part-of-speech tags)
Domain: computing
Use of language specific grammars
Workflow
–Searching and marking definitory contexts in LOs (manually)
–Drafting local grammars on the basis of these examples
–Apply the grammars to new LOs

Example (Dutch): ‘Een vette letter is een letter die zwarter wordt afgedrukt dan de andere letters.’ (‘A bold letter is a letter that is printed darker than the other letters.’)
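The grammar workflow (mark examples, draft a local grammar, apply it to new LOs) can be sketched in Python as a pattern over part-of-speech tags. This is only an illustration of the idea, not the LT4eL grammar: the coarse tagset (DET, ADJ, NOUN, ...), the copula list and the pattern itself are assumptions made for this example.

```python
import re

# Assumed copula forms; the real grammars are language specific and not shown in the slides.
COPULAS = {"is", "zijn"}

def tag_string(tokens):
    """Encode a tagged sentence as a space-separated string of tag symbols.
    Copula verbs get their own symbol (COP) so the pattern can anchor on them."""
    symbols = []
    for word, tag in tokens:
        is_copula = tag == "VERB" and word.lower() in COPULAS
        symbols.append("COP" if is_copula else tag)
    return " ".join(symbols)

# Hypothetical is_def pattern: a defined term (optional determiner, any adjectives,
# one or more nouns), then a copula, then a determiner and the rest of the sentence.
IS_DEF = re.compile(r"^(DET )?(ADJ )*(NOUN )+COP DET( \w+)+$")

def is_definition_candidate(tokens):
    """Return True if the tag sequence matches the hypothetical is_def pattern."""
    return IS_DEF.match(tag_string(tokens)) is not None

# The Dutch example sentence, tagged by hand with the assumed tagset.
example = [
    ("Een", "DET"), ("vette", "ADJ"), ("letter", "NOUN"), ("is", "VERB"),
    ("een", "DET"), ("letter", "NOUN"), ("die", "PRON"), ("zwarter", "ADJ"),
    ("wordt", "VERB"), ("afgedrukt", "VERB"), ("dan", "ADP"), ("de", "DET"),
    ("andere", "ADJ"), ("letters", "NOUN"), (".", "PUNCT"),
]
# A sentence without a copula, which the pattern should reject.
non_definition = [
    ("De", "DET"), ("letter", "NOUN"), ("wordt", "VERB"),
    ("afgedrukt", "VERB"), (".", "PUNCT"),
]

print(is_definition_candidate(example))
print(is_definition_candidate(non_definition))
```

A pattern this loose accepts any copula sentence of the right shape, which is in line with the good recall and weak precision reported for the grammar-only approach.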


Results (grammar)

Machine learning
General
Features & Configurations
Results

General information
Naive Bayes classifier
Weka
Data set:
–is-definitions: 77 / 274
–punct-definitions: 45 / [...]
[...] fold cross validation
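To make the classifier concrete, here is a minimal multinomial Naive Bayes in plain Python. It is a sketch of the general technique, not the Weka implementation used in the project. The class name, the smoothing parameter `alpha` and the four toy training examples are all invented for illustration.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        # Count how often each class and each (class, feature) pair occurs.
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for feats, label in zip(docs, labels):
            self.feature_counts[label].update(feats)
        self.vocab = {f for counts in self.feature_counts.values() for f in counts}
        return self

    def predict(self, feats):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n / total)  # log prior
            for f in feats:
                if f in self.vocab:  # features never seen in training are ignored
                    score += math.log((counts[f] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented toy data: token lists labelled as definition / non-definition.
train_docs = [
    ["is", "a", "program"],
    ["is", "a", "device"],
    ["is", "very", "slow"],
    ["was", "not", "installed"],
]
train_labels = ["def", "def", "nodef", "nodef"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["is", "a", "tool"]))
print(model.predict(["was", "not", "slow"]))
```

In the setup described on the slide, a classifier like this would be trained and tested with cross validation on the candidate sentences found by the grammar.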

Features
Text properties: bag-of-words, bigrams, and bigram preceding the definition
Syntactic properties: type of determiner within the defined term (definite, indefinite, no determiner)
Proper nouns: presence of a proper noun in the defined term
cf. Fahmi & Bouma, 2006
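The three feature groups could be computed roughly as follows. This is a hedged sketch. The function names, the feature-key format, the English determiner lists and the tag names (DET, PROPN) are assumptions. "Bigram preceding the definition" is read here as the two words immediately before the sentence, which the slide does not spell out.

```python
# Assumed English determiner lists; a real system would use language-specific ones.
DEFINITE = {"the"}
INDEFINITE = {"a", "an"}

def determiner_type(term_tokens):
    """Classify the determiner of the defined term: definite, indefinite, or none."""
    for word, tag in term_tokens:
        if tag == "DET":
            if word.lower() in DEFINITE:
                return "definite"
            if word.lower() in INDEFINITE:
                return "indefinite"
    # No determiner found (or one outside the two assumed lists).
    return "none"

def extract_features(sentence, term_len, preceding=()):
    """Build a feature dict for one candidate sentence.

    sentence:  list of (word, tag) pairs
    term_len:  number of leading tokens that form the defined term
    preceding: the two words before the sentence, if available
    """
    words = [w.lower() for w, _ in sentence]
    term = sentence[:term_len]
    feats = {}
    for w in words:                                  # bag-of-words
        feats["bow=" + w] = True
    for a, b in zip(words, words[1:]):               # bigrams
        feats["bigram=" + a + "_" + b] = True
    if len(preceding) == 2:                          # bigram preceding the definition
        feats["prev_bigram=" + "_".join(p.lower() for p in preceding)] = True
    feats["det_type"] = determiner_type(term)        # syntactic property
    feats["proper_noun"] = any(tag == "PROPN" for _, tag in term)
    return feats

# The is_def example from the slides, tagged by hand with the assumed tagset.
gnuplot = [
    ("Gnuplot", "PROPN"), ("is", "VERB"), ("a", "DET"), ("program", "NOUN"),
    ("for", "ADP"), ("drawing", "VERB"), ("graphs", "NOUN"),
]
features = extract_features(gnuplot, term_len=1)
print(features["det_type"], features["proper_noun"])
```

For the Gnuplot sentence the defined term is a bare proper noun, so the determiner feature is "none" and the proper-noun feature is set.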

Configurations

Results – is_def (final)

Results – punct_def (final)

Final results
precision: + (50 % and 40 %)
recall: - (20 % and 30 %)
F-score: + (30 % and 25 %)
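For reference, precision, recall and the balanced F-score are computed from counts of true positives, false positives and false negatives. The helper below is a generic sketch. The function name and the example counts are invented and are not taken from the project's data.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F1) from raw counts.

    tp: extracted sentences that are real definitions
    fp: extracted sentences that are not definitions
    fn: real definitions that were missed
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Invented counts, chosen only to show the arithmetic.
p, r, f = precision_recall_f(tp=10, fp=10, fn=40)
print(round(p, 2), round(r, 2), round(f, 2))
```

Because the F-score is a harmonic mean, it stays close to the lower of the two values, so a gain in precision bought with a large loss in recall moves it only a little.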

Qualitative evaluation
Scenario-based evaluation, 1st cycle
Tested tutors (3) and students (6)
Results:
–usefulness of glossary: useful (tutors: 2/3, students: 7/7)
–usefulness of search method 'Definitions': useful (students: 7/7)
–performance: different (tutors: 1 ok, 1 not ok, 1 neutral)

Conclusions
Recall: ok with pattern-based (PB) grammar
Precision: can be improved with machine learning (ML) approach
Trade-off between recall and precision
–Only PB: good recall, bad precision
–PB & ML: better precision, lower recall
For our purpose: recall might be more important than precision
Size of objects also important
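The two configurations compared in the conclusions (PB only versus PB followed by ML) amount to an optional filtering stage. A minimal sketch, in which the function name and both toy predicates are hypothetical stand-ins for the real grammar and classifier:

```python
def extract_definitions(sentences, grammar_matches, classifier_accepts=None):
    """Two-stage extraction.

    Stage 1 (PB): keep every sentence the pattern-based grammar matches.
    Stage 2 (ML, optional): keep only the candidates the classifier accepts.
    """
    candidates = [s for s in sentences if grammar_matches(s)]
    if classifier_accepts is None:
        return candidates
    return [s for s in candidates if classifier_accepts(s)]

# Toy stand-ins, invented for illustration only.
def toy_grammar(sentence):
    return " is " in sentence

def toy_classifier(sentence):
    return " is a " in sentence or " is an " in sentence

sentences = [
    "Gnuplot is a program for drawing graphs",
    "The server is down today",
    "Install the package first",
]

pb_only = extract_definitions(sentences, toy_grammar)
pb_and_ml = extract_definitions(sentences, toy_grammar, toy_classifier)
print(len(pb_only), len(pb_and_ml))
```

The second stage can only remove candidates, never add them, which is why adding ML can raise precision but cannot raise recall above that of the grammar alone.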

Future work
Machine learning:
–try different features
–use other classifiers
–extend to all types of definitions
Qualitative evaluation:
–2nd cycle of scenario-based evaluation => focus more on user preferences

Results – is_def (ML)

Results – punct_def (ML)