Supporting e-learning with automatic glossary extraction: Experiments with Portuguese
Rosa Del Gaudio, António Branco
RANLP, Borovets 2007

Presentation Plan
● LT4eL project
● ILIAS
● Corpus
● Tool
● Grammars
● Copula
● Other Verbs
● Punctuation
● Results
● Conclusion

LT4eL
● Improve the retrieval and accessibility of learning objects (LOs) in learning management systems
● Employ language technology resources and tools for the semi-automatic generation of descriptive metadata
● Develop new functionalities, such as a keyword extractor, a glossary candidate detector and semantic search, tuned for the languages addressed in the project (Bulgarian, Czech, Dutch, English, German, Maltese, Polish, Portuguese, Romanian)

ILIAS

Objective
● Build a glossary automatically to support the e-learning process. In practice, this means extracting definitions from unstructured text (scientific papers, encyclopedias, web pages)
● Better access to information for students
● Accelerate the tutor's work

ILIAS: Glossary Candidate Detector

The Corpus
● Size: … tokens
● Document types: tutorials, PhD theses, scientific papers
● 3 domains, evenly represented: e-learning, technology for non-experts, Calimera

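XML format
Example definition from the corpus: "Intranet é uma rede desenvolvida para processamento de informações em uma empresa ou organização." ("An intranet is a network developed for processing information in a company or organization.")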
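The slide's XML markup did not survive extraction. As a minimal sketch, assume a hypothetical schema in which a sentence is an `<s>` element and each token a `<tok>` element with `lemma` and `pos` attributes (these names and the tag values are illustrative assumptions, not the project's actual format). Python's standard `xml.etree.ElementTree` is enough to turn such a sentence into (form, lemma, POS) triples, the kind of token-level information that definition-extraction rules can then inspect:

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical markup: element names, attribute names and tag values are
# assumptions for illustration, not the actual LT4eL corpus schema.
SAMPLE = """
<s id="s1">
  <tok lemma="intranet" pos="N">Intranet</tok>
  <tok lemma="ser" pos="V">é</tok>
  <tok lemma="um" pos="ART">uma</tok>
  <tok lemma="rede" pos="N">rede</tok>
  <tok lemma="desenvolver" pos="V">desenvolvida</tok>
</s>
"""

Token = Tuple[str, str, str]  # (word form, lemma, POS tag)


def read_sentence(xml_text: str) -> List[Token]:
    """Return the (form, lemma, pos) triple of every <tok> in one <s> element."""
    sentence = ET.fromstring(xml_text.strip())
    return [
        (tok.text or "", tok.get("lemma", ""), tok.get("pos", ""))
        for tok in sentence.iter("tok")
    ]


if __name__ == "__main__":
    for form, lemma, pos in read_sentence(SAMPLE):
        print(f"{form}\t{lemma}\t{pos}")
```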

LxTransduce
● Input: plain text or XML
● Regular expressions
● Substitution and markup
● Output: the same file with the changes applied
● Matches over the document tree, using elements as units
● Fast, Unicode-friendly, freeware
● Easy to integrate into other tools (Java)

Rules in lxtransduce
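The rules themselves are written in lxtransduce's own XML rule language and are not reproduced in this transcript. To illustrate the underlying idea (a regular expression over a sequence of token elements, with each match rewritten as markup), here is a small Python sketch; `mark_pattern`, the POS tags and the `<definition>` label are hypothetical and are not part of lxtransduce:

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (word form, POS tag); the tagset is hypothetical


def mark_pattern(tokens: List[Token], tag_regex: str, label: str) -> str:
    """Wrap each token span whose POS sequence matches tag_regex in <label> tags.

    Tags are written as '<TAG>' so a pattern such as '<N><V>' can only
    match whole tags; match offsets are mapped back to token positions.
    """
    tag_string = "".join(f"<{pos}>" for _, pos in tokens)
    # Map each tag's start offset (plus the final end offset) to a token index.
    boundary = {}
    offset = 0
    for i, (_, pos) in enumerate(tokens):
        boundary[offset] = i
        offset += len(pos) + 2
    boundary[offset] = len(tokens)

    out, cursor = [], 0
    for m in re.finditer(tag_regex, tag_string):
        first, last = boundary.get(m.start()), boundary.get(m.end())
        if first is None or last is None or first == last:
            continue  # empty match, or not aligned with token boundaries
        out.extend(form for form, _ in tokens[cursor:first])
        span = " ".join(form for form, _ in tokens[first:last])
        out.append(f"<{label}>{span}</{label}>")
        cursor = last
    out.extend(form for form, _ in tokens[cursor:])
    return " ".join(out)


if __name__ == "__main__":
    sentence = [("Intranet", "N"), ("é", "V"), ("uma", "ART"),
                ("rede", "N"), ("desenvolvida", "ADJ")]
    print(mark_pattern(sentence, r"<N><V><ART><N>", "definition"))
```

Encoding tags with delimiters keeps the regex working on whole tags rather than on arbitrary characters, which mirrors matching on elements instead of raw text.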

First development phase
● Less than 50% of the corpus
● Focus on the verb
● Precision: correctly extracted definitions (those also manually marked) / all automatically extracted
● Recall: correctly extracted definitions / all manually marked
● F2 = 3 · (precision · recall) / (2 · precision + recall)
[Table: Precision, Recall and F2 per grammar]
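A minimal sketch of this sentence-level evaluation, assuming (hypothetically) that both the grammar output and the manual annotation are available as sets of sentence IDs. Note that this F2 puts the weight 2 directly on precision (van Rijsbergen's weighting with β² = 2), so it favours recall but differs from the common F2 = 5PR/(4P + R):

```python
from typing import Set, Tuple


def evaluate(extracted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Sentence-level precision, recall and F2 = 3PR / (2P + R).

    extracted: IDs of sentences the grammar marked as definitions (hypothetical input).
    gold: IDs of sentences manually marked as definitions.
    """
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    denom = 2 * precision + recall
    f2 = 3 * precision * recall / denom if denom else 0.0
    return precision, recall, f2


if __name__ == "__main__":
    print(evaluate({"s1", "s2", "s3", "s4"}, {"s1", "s5"}))
```

For example, 4 extracted sentences of which 1 is correct, against 2 manually marked ones, give P = 0.25, R = 0.5 and F2 = 0.375.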

Second development phase
● 75% of the corpus for development
● 25% of the corpus for testing
● Specific grammar/rules for each definition type
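The slide does not say whether the 75/25 split was made by document or by sentence, nor how the parts were chosen. Assuming a document-level split with a reproducible shuffle (both assumptions), it could be sketched as follows; `dev_test_split` is a hypothetical helper:

```python
import random
from typing import List, Sequence, Tuple


def dev_test_split(doc_ids: Sequence[str], dev_share: float = 0.75,
                   seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle document IDs with a fixed seed and cut them into dev/test parts."""
    ids = list(doc_ids)
    random.Random(seed).shuffle(ids)  # same seed, same split on every run
    cut = round(len(ids) * dev_share)
    return ids[:cut], ids[cut:]


if __name__ == "__main__":
    dev, test = dev_test_split([f"doc{i}" for i in range(20)])
    print(len(dev), len(test))
```

Splitting by document keeps sentences from one text out of both parts, so test results are less inflated by document-specific style.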

Copula baseline grammar
● Verb "to be" (Portuguese "é" / "são"), third person singular or plural, present indicative
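A minimal sketch of the baseline, assuming each token is a dictionary of morphological features with hypothetical keys (`pos`, `lemma`, `person`, `tense`, `mood`). Number is not checked, so singular and plural forms both match. Since every sentence containing such a verb is flagged, poor precision is to be expected from a rule this broad:

```python
from typing import Dict, List

Token = Dict[str, str]  # hypothetical feature dict, e.g. {"form": "é", "lemma": "ser", ...}


def is_copula(tok: Token) -> bool:
    """Verb 'ser' ('to be'), third person, present indicative (any number)."""
    return (
        tok.get("pos") == "V"
        and tok.get("lemma") == "ser"
        and tok.get("person") == "3"
        and tok.get("tense") == "pres"
        and tok.get("mood") == "ind"
    )


def baseline_candidates(sentences: List[List[Token]]) -> List[int]:
    """Indices of sentences that contain at least one copula token."""
    return [i for i, sent in enumerate(sentences) if any(is_copula(t) for t in sent)]
```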

Copula baseline results
● Results at sentence level
● Problem with precision

Copula Grammar

Rules for is_type
● Query on verb tokens (tag 'V') in the third person ('3')
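The actual is_type rules are not recoverable here. Purely as a hypothetical illustration of how a rule can be narrower than the baseline, the sketch below only accepts the copula when it follows at least one token and is immediately followed by an article and a noun, as in the corpus example "Intranet é uma rede ...". This constraint is an assumption, not the authors' grammar, and the tags are hypothetical:

```python
from typing import List, Tuple

Token = Tuple[str, str, str]  # (form, lemma, POS); the tagset is hypothetical


def is_type_candidate(tokens: List[Token]) -> bool:
    """Hypothetical rule: 'ser' after the first token, followed by article + noun."""
    for i in range(1, len(tokens) - 2):
        _, lemma, pos = tokens[i]
        if (pos == "V" and lemma == "ser"
                and tokens[i + 1][2] == "ART" and tokens[i + 2][2] == "N"):
            return True
    return False
```

A sentence like "O sistema é rápido" (copula followed by an adjective) is rejected, which is how such a constraint trades some recall for precision.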

Comparing Results
● Include the patterns that were excluded
● Gather the syntactic patterns of non-definitions and compare them with the syntactic patterns of definitions
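One simple way to carry out this comparison, offered as an assumption rather than the authors' procedure, is to count in how many definition and non-definition sentences each POS n-gram occurs and rank the n-grams by the difference between the two shares. Input is a list of POS-tag sequences per sentence (hypothetical tagset):

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def sentence_frequency(sentences: List[Sequence[str]], n: int) -> Counter:
    """Count, for each POS n-gram, the number of sentences containing it."""
    counts: Counter = Counter()
    for tags in sentences:
        counts.update({tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)})
    return counts


def contrast_patterns(definitions: List[Sequence[str]],
                      non_definitions: List[Sequence[str]],
                      n: int = 3) -> List[Tuple[Pattern, float]]:
    """Rank POS n-grams by (share of definitions containing them) minus
    (share of non-definitions containing them), highest first."""
    d = sentence_frequency(definitions, n)
    nd = sentence_frequency(non_definitions, n)
    n_def, n_non = max(len(definitions), 1), max(len(non_definitions), 1)
    scores: Dict[Pattern, float] = {
        g: d[g] / n_def - nd[g] / n_non for g in set(d) | set(nd)
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Patterns at the top are candidates for new rules; those at the bottom suggest contexts a rule should exclude.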

Other_Verbs grammar
● Verbs collected in a lexicon
● Three categories: reflexive, active, passive
● 22 different verbs
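A minimal sketch of a categorised verb lexicon. The entries below are illustrative Portuguese definitional verbs chosen for this example; they are not the project's list of 22 verbs, and matching on lowercased word forms is a simplifying assumption (a real grammar would use lemmas and handle gender/number agreement of participles):

```python
from typing import Dict, List, Optional, Tuple

# Illustrative entries only; the project's 22-verb lexicon is not reproduced here.
LEXICON: Dict[str, List[Tuple[str, ...]]] = {
    "reflexive": [("designa-se",), ("chama-se",), ("denomina-se",)],
    "active": [("significa",), ("representa",), ("consiste",)],
    "passive": [("é", "definido"), ("é", "designado"), ("é", "chamado")],
}


def verb_type(tokens: List[str]) -> Optional[str]:
    """Return the category of the first lexicon entry found in the sentence."""
    words = [t.lower() for t in tokens]
    for category, patterns in LEXICON.items():
        for pattern in patterns:
            n = len(pattern)
            if any(tuple(words[i:i + n]) == pattern
                   for i in range(len(words) - n + 1)):
                return category
    return None
```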

Results for verb_type
● Each verb analyzed separately, as with is_type
● Richer syntactic patterns

Punctuation Grammar
● Preliminary work
● Definitions introduced by a colon (the most frequent case)
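A minimal sketch of colon-introduced definitions, assuming (hypothetically) that a candidate has a short term before the colon and the defining text after it. The 60-character and 5-word limits on the term are illustrative thresholds meant to reject ordinary sentences that merely contain a colon:

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: a short term without ':', '.' or ';', then a colon,
# then the defining text up to the end of the sentence.
COLON_DEF = re.compile(r"^\s*(?P<term>[^:.;]{1,60}?)\s*:\s*(?P<definition>\S.*)$")


def colon_definition(sentence: str, max_term_words: int = 5) -> Optional[Tuple[str, str]]:
    """Return (term, definition) if the sentence looks like 'Term: definition'."""
    m = COLON_DEF.match(sentence)
    if not m:
        return None
    term = m.group("term")
    if len(term.split()) > max_term_words:
        return None  # too long to be a glossary term
    return term, m.group("definition").strip()
```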

All-in-one
● Combination of the previous grammars
● The definition type is not taken into account when calculating precision and recall
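A minimal sketch of the combination step, assuming each grammar can be wrapped as a hypothetical predicate over a sentence. A sentence counts as extracted if any grammar fires, and it counts as correct if it was manually marked as a definition, whatever type either side assigned:

```python
from typing import Callable, Dict, Set, Tuple

Grammar = Callable[[str], bool]  # hypothetical: True if the grammar matches the sentence


def all_in_one(sentences: Dict[str, str], grammars: Dict[str, Grammar]) -> Dict[str, Set[str]]:
    """Run every grammar; map each matched sentence ID to the types that fired."""
    hits: Dict[str, Set[str]] = {}
    for sid, text in sentences.items():
        for gtype, grammar in grammars.items():
            if grammar(text):
                hits.setdefault(sid, set()).add(gtype)
    return hits


def untyped_precision_recall(hits: Dict[str, Set[str]], gold_ids: Set[str]) -> Tuple[float, float]:
    """Precision and recall that ignore the definition type."""
    extracted = set(hits)
    correct = len(extracted & gold_ids)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold_ids) if gold_ids else 0.0
    return precision, recall


if __name__ == "__main__":
    toy_grammars = {"is_type": lambda s: " é " in s, "punc_type": lambda s: ":" in s}
    data = {"s1": "Intranet é uma rede.", "s2": "Intranet: rede interna."}
    print(all_in_one(data, toy_grammars))
```

The two lambdas are toy stand-ins for the real grammars, included only to make the example runnable.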

Conclusions and Future Work
● Overall results: recall 86%, precision 14%
● Differences among domains: the style of a document influences the results
● Improve the rules for verb_type and punc_type
● Combine with other techniques, such as machine learning (ML)
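As a worked example, plugging the overall figures (P = 0.14, R = 0.86) into the F2 formula used for evaluation shows how its recall weighting lifts the score relative to the balanced F1. These values are computed here from the rounded percentages, so they are only approximate:

```latex
F_2 = \frac{3PR}{2P + R} = \frac{3 \times 0.14 \times 0.86}{2 \times 0.14 + 0.86}
    = \frac{0.3612}{1.14} \approx 0.32,
\qquad
F_1 = \frac{2PR}{P + R} = \frac{0.2408}{1.00} \approx 0.24
```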