Metadata generation and glossary creation in eLearning
Lothar Lemnitzer
Review meeting, Zürich, 25 January 2008

Outline
- Demonstration of the functionalities
- Where we stand
- Evaluation of the tools
- Consequences for the development of the tools in the final phase

Demo
We simulate a tutor who adds a learning object and then generates and edits the additional data.

Where we stand (1)
Achievements of the first year of the project:
- Annotated corpora of learning objects
- Stand-alone prototype of the keyword extractor (KWE)
- Stand-alone prototype of the glossary candidate detector (GCD)

Where we stand (2)
Achievements of the second year of the project:
- Quantitative evaluation of the corpora and tools
- Validation of the tools in user-centred usage scenarios for all languages
- Further development of the tools in response to the results of the evaluation

Evaluation - rationale
Quantitative evaluation is needed to:
- inform the further development of the tools (formative)
- find the optimal settings / parameters for each language (summative)

Evaluation (1)
Evaluation is applied to:
- the corpora of learning objects
- the keyword extractor
- the glossary candidate detector
In the following, I will focus on the tool evaluation.

Evaluation (2)
Evaluation of the tools comprises:
1. measuring recall and precision against the manual annotation
2. measuring agreement on each task between different annotators
3. measuring acceptance of keywords / definitions (rated on a scale)

KWE Evaluation – step 1
- A human annotator marked n keywords in document d
- The first n choices of the KWE for document d are extracted
- Measure the overlap between both sets; partial matches are also counted
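
As a rough illustration of this step, here is a minimal Python sketch of such a set-overlap scorer. The partial-match rule (token subset) and its weight are assumptions for illustration; the deck does not specify the exact rule used in the project.

```python
def kwe_overlap(gold, extracted, partial_weight=0.5):
    """Precision/recall/F1 between gold and extracted keyword sets.

    Exact matches count 1.0; partial matches (one keyword's tokens are a
    subset of the other's) count `partial_weight`. Both the partial-match
    rule and the weight are illustrative assumptions.
    """
    gold = [g.lower() for g in gold]
    extracted = [e.lower() for e in extracted]

    def score(kw, pool):
        if kw in pool:
            return 1.0
        toks = set(kw.split())
        for other in pool:
            otoks = set(other.split())
            if toks <= otoks or otoks <= toks:
                return partial_weight
        return 0.0

    credit_p = sum(score(e, gold) for e in extracted)   # credit per extracted kw
    credit_r = sum(score(g, extracted) for g in gold)   # credit per gold kw
    precision = credit_p / len(extracted) if extracted else 0.0
    recall = credit_r / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example: two exact and one partial match among the first n = 4 choices
print(kwe_overlap(["metadata", "glossary creation", "elearning", "corpus"],
                  ["metadata", "glossary", "elearning", "annotation"]))
```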

Language     Best method     F-measure
Bulgarian    TFIDF/ADRIDF    0.25
Czech        TFIDF/ADRIDF    0.18
Dutch        TFIDF           0.29
English      ADRIDF          0.33
German       TFIDF           0.16
Polish       ADRIDF          0.26
Portuguese   TFIDF           0.22
Romanian     TFIDF/ADRIDF    0.15
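
For reference, a minimal sketch of the plain TFIDF baseline named in the table; the project's ADRIDF weighting is not reproduced here, and the toy corpus below is invented.

```python
import math
from collections import Counter

def tfidf_keywords(docs, doc_index, top_n=10):
    """Rank the terms of docs[doc_index] by TF*IDF.

    docs: list of token lists (assumed already lemmatized / filtered
    upstream). A minimal baseline, not the project's exact pipeline.
    """
    n_docs = len(docs)
    df = Counter()                       # document frequency per term
    for doc in docs:
        df.update(set(doc))
    tf = Counter(docs[doc_index])        # term frequency in the target doc
    scores = {
        t: (freq / len(docs[doc_index])) * math.log(n_docs / df[t])
        for t, freq in tf.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

docs = [["metadata", "keyword", "keyword", "extraction"],
        ["glossary", "definition", "extraction"],
        ["corpus", "annotation", "keyword"]]
print(tfidf_keywords(docs, 0, top_n=3))
```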

KWE Evaluation – step 2
- Measure inter-annotator agreement (IAA)
- Participants read a text (Calimera "Multimedia")
- Participants assign keywords to that text (ideally not more than 15)
- The KWE produces keywords for the same text

KWE Evaluation – step 2
1. Agreement is measured between the human annotators
2. Agreement is measured between the KWE and the human annotators
We have tested two measures / approaches (see the sketch below):
- kappa, following Bruce / Wiebe
- AC1, an alternative agreement weighting suggested by Debra Haley at the OU, based on Gwet
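
A minimal sketch of the two chance-corrected measures for two raters making binary keyword / non-keyword decisions over a shared candidate list. This uses the standard two-rater kappa and Gwet's AC1 chance correction, which may differ in detail from the Bruce / Wiebe and Haley formulations used in the project.

```python
def kappa_ac1(r1, r2):
    """Cohen-style kappa and Gwet's AC1 for two raters, binary labels (0/1).

    Each item is e.g. a candidate term, labelled 1 if chosen as a keyword.
    """
    assert len(r1) == len(r2) and r1
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n   # observed agreement
    p1 = sum(r1) / n                                # rater 1's "yes" rate
    p2 = sum(r2) / n                                # rater 2's "yes" rate
    pe_kappa = p1 * p2 + (1 - p1) * (1 - p2)        # chance under independence
    q = (p1 + p2) / 2                               # mean prevalence of "yes"
    pe_ac1 = 2 * q * (1 - q)                        # Gwet's chance estimate
    kappa = (p_o - pe_kappa) / (1 - pe_kappa)
    ac1 = (p_o - pe_ac1) / (1 - pe_ac1)
    return kappa, ac1

# 10 candidate terms, two annotators
print(kappa_ac1([1, 1, 0, 0, 1, 0, 0, 0, 1, 0],
                [1, 0, 0, 0, 1, 0, 0, 0, 1, 1]))
```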

[Table: IAA among the human annotators vs. IAA of the KWE with its best settings, per language (Bulgarian, Czech, Dutch, English, German, Polish, Portuguese, Romanian); the values were not preserved in the transcript.]

KWE Evaluation – step 3
- Humans judge the adequacy of the keywords
- Participants read a text (Calimera "Multimedia")
- Participants see 20 keywords generated by the KWE and rate them
- Scale: 1-4 (excellent to not acceptable); 5 = not sure

[Table: average keyword ratings per language (Bulgarian, Czech, Dutch, English, German, Polish, Portuguese, Romanian) over all 20 keywords, the first 5 keywords, and the first 10 keywords; the values were not preserved in the transcript.]

GCD Evaluation - step 1
- A human annotator marked definitions in document d
- The GCD extracts defining contexts from the same document d
- Measure the overlap between both sets
- Overlap is measured at the sentence level; partial overlap counts
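
One possible reading of this scoring scheme as code, assuming gold definitions and extracted contexts are represented as sets of sentence indices; the project's actual scoring script may differ.

```python
def gcd_overlap(gold_spans, extracted_spans):
    """Sentence-level recall/precision for defining contexts.

    Spans are sets of sentence indices; an extracted context counts as
    correct if it shares at least one sentence with some gold definition,
    i.e. partial overlap counts.
    """
    def matched(span, pool):
        return any(span & other for other in pool)

    recall = (sum(matched(g, extracted_spans) for g in gold_spans)
              / len(gold_spans))
    precision = (sum(matched(e, gold_spans) for e in extracted_spans)
                 / len(extracted_spans))
    return recall, precision

gold = [{3, 4}, {10}]                 # two gold definitions (sentence ids)
extracted = [{4}, {10, 11}, {20}]     # three extracted contexts
print(gcd_overlap(gold, extracted))   # recall 1.0, precision 2/3
```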

[Table: recall and precision of the GCD on is-definitions, per language (Bulgarian, Czech, Dutch, English, German, Polish, Portuguese, Romanian); the values were not preserved in the transcript.]

GCD Evaluation – step 2
- Measure inter-annotator agreement
- Experiments were run for Polish and Dutch
- A prevalence-adjusted version of kappa was used as the measure
- Polish: 0.42; Dutch: 0.44
- IAA is rather low for this task
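
If the prevalence-adjusted measure is the standard PABAK (an assumption; the deck does not name the exact variant), it reduces for two raters and binary labels to a simple function of the observed agreement p_o:

```latex
% Prevalence- and bias-adjusted kappa: the chance term is fixed at 1/2, so
\kappa_{\mathrm{PABAK}} = \frac{p_o - \tfrac{1}{2}}{1 - \tfrac{1}{2}} = 2\,p_o - 1
```

On that reading, the Polish score of 0.42 corresponds to raw sentence-level agreement of about 0.71, and the Dutch 0.44 to about 0.72.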

GCD Evaluation – step 3
- Humans judge the quality of the extracted definitions
- Participants read a text
- Participants see the definitions extracted by the GCD for that text and rate their quality
- Scale: 1-4 (excellent to not acceptable); 5 = not sure

Language     # defin.   # testers   Av. value
German       5          5           2.1
Romanian     9          7           3.0
(the rows for Bulgarian, Czech, Dutch, English, Polish and Portuguese were not preserved in the transcript)

GCD Evaluation – step 3: further findings
- Relatively high variance in the ratings (many '1' and '4' scores)
- Disagreement between users about the quality of individual definitions

Individual user feedback - KWE
- The quality of the generated keywords remains an issue
- There is variance in the responses from the different language groups
- We suspect a correlation between the users' language and their satisfaction
- The performance of the KWE depends on the language-specific settings; we have to investigate these further

Individual user feedback – GCD
- Not all the suggested definitions are real definitions
- The terms are OK, but the definitions cited are often not what one would expect
- Some terms proposed in the glossary did not make any sense
- The ability to see the context where a definition was found is useful

Consequences - KWE
- Use non-distributional information to rank keywords (layout, lexical chains)
- Present the first 10 keywords to the user; show more keywords on demand
- For keyphrases, present the most frequent attested form
- Users can add their own keywords

Consequences - GCD
- Split definitions into types and tackle the most important types
- Use machine learning alongside the local grammars (see the sketch below)
- Improve the part of the grammars which extracts the defined term
- Users can add their own definitions
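
To make the local-grammar idea concrete, here is a deliberately naive sketch for English is-a definitions. The project's grammars are token/POS-based and language-specific; this single regex is only an illustrative stand-in, and its false positive below shows why a machine-learned filter on top of such patterns is attractive.

```python
import re

# Toy "local grammar": a capitalized term phrase, a copula, an optional
# article, then the defining phrase.
IS_DEF = re.compile(
    r"^(?P<term>[A-Z][\w -]*?) (?:is|are) (?:a |an |the )?(?P<definition>.+)$"
)

def detect_definitions(sentences):
    """Return (term, definition) pairs for sentences matching the pattern."""
    return [
        (m.group("term"), m.group("definition"))
        for m in (IS_DEF.match(s.strip()) for s in sentences)
        if m
    ]

print(detect_definitions([
    "A glossary is a list of terms with their definitions.",
    "This tool is quite slow.",                     # false positive
    "We evaluated the tools on eight languages.",   # correctly rejected
]))
```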

Plans for the final phase
- KWE: work with lexical chains
- GCD: extend the machine learning experiments
- Finalize the documentation of the tools

Validation
User scenarios with the NLP tools embedded:
1. A content provider adds keywords and a glossary for a new learning object
2. A student uses keywords and definitions extracted from a learning object to prepare a presentation of the content of that learning object

Validation (continued)
3. Students use keywords and definitions extracted from a learning object to prepare a quiz / exam about the content of that learning object

Validation
We want to get feedback about:
- the users' general attitude towards the tools
- the users' satisfaction with the results obtained by the tools in the particular situation of use (scenario)

User feedback
- Participants appreciate the option to add their own data
- Participants found the functions easy to use

Plans for the next phase
Improve the precision of the extraction results:
- KWE: implement a lexical chainer
- GCD: use machine learning in combination with the local grammars, or as a substitute for them
Finalize the documentation of the tools.

Corpus statistics – full corpus
- Measuring the size of the corpora (# of documents, # of tokens)
- Measuring the token / type ratio
- Measuring the type / lemma ratio
(see the sketch below)
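
A minimal sketch of how these statistics can be computed from a tokenized corpus. The lowercasing "lemmatizer" is a stand-in assumption for the per-language lemmatizers actually used.

```python
def corpus_stats(docs, lemmatize=str.lower):
    """Size, token/type and type/lemma ratios for a tokenized corpus.

    docs: list of token lists. `lemmatize` stands in for a real
    per-language lemmatizer (here just lowercasing).
    """
    tokens = [t for doc in docs for t in doc]
    types = set(tokens)
    lemmas = {lemmatize(t) for t in types}
    return {
        "documents": len(docs),
        "tokens": len(tokens),
        "token/type": len(tokens) / len(types),
        "type/lemma": len(types) / len(lemmas),
    }

print(corpus_stats([["The", "tool", "extracts", "keywords"],
                    ["the", "tools", "extract", "keywords"]]))
```

A high token/type ratio indicates many repeated word forms (as in English); a type/lemma ratio well above 1 indicates that many distinct forms collapse onto the same lemma, i.e. rich inflection.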

[Table: # of documents and # of tokens per language corpus (Bulgarian, Czech, Dutch, English, German, Polish, Portuguese, Romanian); the values were not preserved in the transcript.]

[Table: token / type and type / lemma ratios per language (Bulgarian, Czech, Dutch, English (tbc), German, Polish, Portuguese, Romanian); the values were not preserved in the transcript.]

Corpus statistics – full corpus
- The Bulgarian, German and Polish corpora have a very low number of tokens per type (probably a data sparseness problem)
- English has by far the highest ratio
- Czech, Dutch, Portuguese and Romanian are in between
- The type / lemma ratio reflects the richness of the inflectional paradigms

To do
- Please check / verify these numbers
- Report, for the M24 deliverable, on improvements / re-analyses of the corpora (I am aware of such activities for Bulgarian, German, and English)

Corpus statistics – annotated subcorpus
- Measuring the lengths of the annotated documents
- Measuring the distribution of manually marked keywords over the documents
- Measuring the share of keyphrases

Language     # of annotated documents   Average length (# of tokens)
Romanian     41                         3375
(the rows for the other languages were not preserved in the transcript)

Language     # of keywords   Average # of keywords per doc.
Portuguese   997             34
Romanian     2555            62
(the rows for the other languages were not preserved in the transcript)

Keyphrases

Language     Share of keyphrases
Bulgarian    43 %
Czech        27 %
Dutch        25 %
English      62 %
German       10 %
Polish       67 %
Portuguese   14 %
Romanian     30 %