LT4eL - WP1: Setting the scene. WP leader: UAIC, Univ. Al. I. Cuza of Iasi, Faculty of Computer Science. Dan Cristea, Corina Forăscu, Dan Tufiş, Ionuţ Pistol, Diana Trandabăţ, Adrian Iftene

LT4eL - WP1: Setting the scene
WP leader: UAIC, Univ. Al. I. Cuza of Iasi, Faculty of Computer Science
Dan Cristea, Corina Forăscu, Dan Tufiş, Ionuţ Pistol, Diana Trandabăţ, Adrian Iftene
Contact:
Utrecht Review Meeting, February 1, 2007

Objectives
1. Inventory and classification of existing tools needed to develop the relevant functionalities (i.e. keyword extractor, glossary candidate detector);
2. Collection and normalization of learning material related to the use of computers in education (Humanities, Social Sciences);
3. Investigation of IPR issues;
4. Adoption of relevant standards for the linguistic annotation of learning objects;
5. Dissemination of the results through a Web portal.

Partners in WP1
– Utrecht University (UU), The Netherlands
– University of Hamburg (UHH), Germany
– University of Lisbon (FFCUL), Portugal
– Charles University Prague (CUP), Czech Republic
– Institute for Parallel Processing, Bulgarian Academy of Sciences (IPP-BAS), Bulgaria
– University of Tübingen (UTU), Germany
– Institute of Computer Science, Polish Academy of Sciences (ICS-PAS), Poland
– Zürich University of Applied Sciences Winterthur (ZHW), Switzerland
– University of Malta (UOM), Malta

Figure: LT4eL system architecture. Components shown: user documents (PDF, DOC, HTML, SCORM, XML); Convertor 1 (documents → SCORM pseudo-structured basic XML); linguistic processor (lemmatizer, POS tagger, partial parser); Convertor 2 (documents → HTML); repository of SCORM pseudo-structured documents with metadata (keywords), linguistic annotation (XML) and glossary; ontology; lexicons for BG, CZ, DT, EN, GE, MT, PL, PT, RO; crosslingual retrieval; LMS with user profile.

The Portal
A working space:
– Repository for resources, tools, deliverables
– Exchange of information among participants
– Statistics
Hosted by UAIC:
– January 2007: 1.15 GB (without realTimeStat, searchForm, upload/updateForm)
Address:
– Username: guestLt4eL
– Passwd: elearning
Demo version on CD

O1. Collection of language resources and tools (1)
Inventory and classification of existing tools relevant to:
– the integration of language technology resources in eLearning (WP2)
– the integration of semantic knowledge (WP3)

O1. Collection of language resources and tools (2)
Inventory and classification of existing language resources:
– corpora and frequency lists:
– lexica:

O2. Collection of LOs: the portal
Uploads, updates & real-time statistics at
Criteria (→ attributes):
– Subdomains relevant for beginners in IST & e-learning → Domain
– Multilingualism → Language
– Medium-sized documents → Number of words
– IPR clear → IPR
– Uniformity in topics → keywords selected initially
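The criteria above map each learning object to a small set of attributes. The Python record below is a minimal sketch of that mapping. The field names are illustrative assumptions. The word-count bounds for "medium sized" are hypothetical, because the slide gives no numbers.

```python
from dataclasses import dataclass, field

# Hypothetical bounds for "medium-sized" documents; the slide gives no numbers.
MIN_WORDS = 1_000
MAX_WORDS = 20_000


@dataclass
class LearningObject:
    """One LO record; attributes mirror the collection criteria (names are illustrative)."""
    title: str
    domain: str        # subdomain relevant for beginners in IST & e-learning
    language: str      # e.g. "ro", "en"
    text: str
    ipr_clear: bool    # True once the IPR status has been clarified
    keywords: list[str] = field(default_factory=list)  # keywords selected initially

    @property
    def number_of_words(self) -> int:
        # Whitespace tokenisation: only a rough approximation of a word count.
        return len(self.text.split())


def meets_criteria(lo: LearningObject) -> bool:
    """Accept an LO if it is medium-sized and its IPR status is clear."""
    return MIN_WORDS <= lo.number_of_words <= MAX_WORDS and lo.ipr_clear
```

Keeping the criteria as explicit attributes lets the portal filter or report on each one separately.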

Collection of LOs: domains
1. Use of computers in education, with sub-domains:
1.1 Teaching academic skills, with sub-domains:
– Academic skills
– Relevant computer skills for the above tasks (MS Word, Excel, PowerPoint, LaTeX, Web pages, XML)
– Basic skills (use of computers for beginners) (chats, …, Internet)
1.2 e-Learning, e-Marketing
1.3 The I*Teach document (Leonardo project)
1.4 Impact of use of computers in society
1.5 Studies about use of computers in schools / high schools
1.6 Impact of e-Learning on education
2. Calimera documents (parallel corpus developed in the Calimera FP5 project)

Collection of LOs: domains coverage

The hierarchy of LOs’ formats

Collection of LOs: annotation layers
1. Initial documents: doc, pdf, html, txt → Base-XML
2. Linguistic annotation: tokens, POS, lemma, chunks → WP2 XML format (LT4ELAna.dtd)
3. Keyword, definition and ontology link annotations

Level 1 conversions
Diagram: doc, pdf, latex, html, plain text and other formats → Base-XML; highlighted step: doc → html

Level 1 conversions: doc → html (UTF-8)
1. MS Office: Save As html
2. OpenOffice Writer (SXW/ODT): Save As html
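The OpenOffice route can also be run in batch rather than through the GUI. The sketch below assumes LibreOffice, the successor of OpenOffice, is installed with `soffice` on the PATH. The helper names are hypothetical. Check the output encoding against the UTF-8 requirement, because this sketch does not set it.

```python
import subprocess
from pathlib import Path


def soffice_to_html_command(doc_path: str, out_dir: str) -> list[str]:
    """Build a headless LibreOffice command converting a document to HTML."""
    return [
        "soffice",
        "--headless",            # run without the GUI
        "--convert-to", "html",  # target format
        "--outdir", out_dir,     # directory for the generated .html file
        doc_path,
    ]


def convert_doc_to_html(doc_path: str, out_dir: str) -> Path:
    """Run the conversion (requires soffice on PATH) and return the expected output path."""
    subprocess.run(soffice_to_html_command(doc_path, out_dir), check=True)
    return Path(out_dir) / (Path(doc_path).stem + ".html")
```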

Level 1 conversions
Diagram: doc, pdf, latex, html, plain text and other formats → Base-XML; highlighted step: pdf → html

Level 1 conversions: pdf → html (UTF-8)
1. Adobe on-line conversion tool
2. pdfbox (Windows)
3. pdftohtml (Linux)
4. OpenOffice
5. Adobe Acrobat Professional
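Option 3, pdftohtml, is a command-line tool and can be scripted for a whole batch of PDFs. The sketch below assumes the poppler version of `pdftohtml`:
- `-enc UTF-8` requests UTF-8 output.
- `-noframes` produces a single HTML document.
- `-i` skips images.

The helper names are hypothetical.

```python
import subprocess


def pdftohtml_command(pdf_path: str, out_base: str) -> list[str]:
    """Build a poppler pdftohtml command for a single, UTF-8 encoded HTML file."""
    return [
        "pdftohtml",
        "-enc", "UTF-8",  # encoding of the generated text
        "-noframes",      # single HTML document instead of a frameset
        "-i",             # ignore images; only the text is needed downstream
        pdf_path,
        out_base,         # base name of the generated HTML file
    ]


def convert_pdf_to_html(pdf_path: str, out_base: str) -> None:
    """Run pdftohtml; raises CalledProcessError if the conversion fails."""
    subprocess.run(pdftohtml_command(pdf_path, out_base), check=True)
```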

Level 1 conversions
Diagram: doc, pdf, latex, html, plain text and other formats → Base-XML; highlighted step: Base-XML convertor

Level 1 conversions: html → Base-XML
The UAIC Java converter:
– keeps all potentially useful tags (a fixed set)
– produces a log of all removed tags and data
The CUP html2xml.pl converter:
– keeps tags according to a DTD
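The UAIC converter's behaviour (keep a fixed set of useful tags, log what is removed) can be illustrated with Python's standard `html.parser`. This is not the UAIC Java code. The tag whitelist and the `<doc>` root element are hypothetical, and attributes are simply dropped. Unclosed HTML tags are not repaired, so malformed input can still yield malformed XML.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape

# Hypothetical whitelist; the real converter uses its own fixed tag set.
KEPT_TAGS = {"h1", "h2", "h3", "p", "ul", "ol", "li", "table", "tr", "td", "b", "i"}
DROPPED_WITH_CONTENT = {"script", "style"}  # tags whose text is removed as well


class BaseXmlConverter(HTMLParser):
    """Keep whitelisted tags without attributes; log removed tags and data."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self.removed: list[str] = []  # log of removed tags and data
        self._skip = 0                # > 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in DROPPED_WITH_CONTENT:
            self._skip += 1
        if tag in KEPT_TAGS and not self._skip:
            self.out.append(f"<{tag}>")  # attributes are discarded
        else:
            self.removed.append(f"tag:{tag}")

    def handle_endtag(self, tag):
        if tag in DROPPED_WITH_CONTENT:
            self._skip = max(0, self._skip - 1)
        elif tag in KEPT_TAGS and not self._skip:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self._skip:
            self.removed.append(f"data:{data.strip()}")
        else:
            self.out.append(escape(data))  # re-escape &, <, > for XML


def html_to_base_xml(html: str) -> tuple[str, list[str]]:
    """Return (Base-XML-like string, log of removed items)."""
    parser = BaseXmlConverter()
    parser.feed(html)
    parser.close()
    return "<doc>" + "".join(parser.out) + "</doc>", parser.removed
```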

Collection of LOs: second level
Diagram components: language-specific tools; scripts; tok, pos, lemma, morpho and NP layers; tok-pos-lemma; WP2 XML format
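At the second level, each token carries its POS tag and lemma. The sketch below serialises (token, POS, lemma) triples with `xml.etree.ElementTree`. The element and attribute names are placeholders. The actual structure is defined by LT4ELAna.dtd, which is not reproduced here.

```python
import xml.etree.ElementTree as ET


def to_tpl_xml(sentences: list[list[tuple[str, str, str]]]) -> str:
    """Serialise sentences of (token, POS, lemma) triples as XML.

    The names doc, s, tok, pos and lemma are placeholders, not LT4ELAna.dtd.
    """
    root = ET.Element("doc")
    for i, sentence in enumerate(sentences, start=1):
        s = ET.SubElement(root, "s", id=f"s{i}")  # one element per sentence
        for token, pos, lemma in sentence:
            tok = ET.SubElement(s, "tok", pos=pos, lemma=lemma)
            tok.text = token  # surface form as element text
    return ET.tostring(root, encoding="unicode")
```

In practice, the per-language tools would supply the triples and a conversion script would call a function like this one.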

Collection of LOs: KW extractor
Diagram: Level 2 (WP2 XML format) → KW extractor → Level 3 (Auto KD XML, compared with the manually annotated Man KD XML); KW extractor evaluation
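The evaluation step compares the automatically annotated keywords (Auto KD XML) with the manual ones (Man KD XML). The slides do not name the measures. The sketch assumes a common choice: precision, recall and F1 over the two keyword sets, with case-insensitive matching.

```python
def evaluate_keywords(automatic: set[str], manual: set[str]) -> dict[str, float]:
    """Precision, recall and F1 of automatic keywords against manual ones."""
    # Normalise case so that "XML" and "xml" count as the same keyword.
    auto = {k.lower() for k in automatic}
    gold = {k.lower() for k in manual}
    hits = len(auto & gold)
    precision = hits / len(auto) if auto else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Exact set matching is strict: multi-word keywords that differ only by inflection count as misses unless they are lemmatised first.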

Collection of LOs: third level
– Man KD XML: includes km.xml, dm.xml
– Auto KD XML: includes akw, adef
– def extractor; def extractor evaluation
Legend:
– km.xml: manually annotated keywords
– dm.xml: manually annotated definitions
– akw: automatically annotated keywords
– adef: automatically annotated definitions
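The def extractor looks for definitional contexts. The sketch below is a rough, hypothetical illustration only. It matches two English patterns ("X is a/an/the Y", "X means/refers to Y") on plain sentences. The project's extractor works instead on linguistically annotated WP2 XML with language-specific grammars. The resulting candidates would then be evaluated against dm.xml.

```python
import re

# Hypothetical English patterns; the project grammars work on POS-annotated
# WP2 XML rather than raw text, and differ per language.
DEFINITION_PATTERNS = [
    re.compile(r"^(?P<term>[A-Z][\w\- ]{0,40}?) (?:is|are) (?:a|an|the) (?P<gloss>.+)$"),
    re.compile(r"^(?P<term>[A-Z][\w\- ]{0,40}?) (?:means|refers to) (?P<gloss>.+)$"),
]


def definition_candidates(sentences: list[str]) -> list[tuple[str, str]]:
    """Return (term, gloss) pairs for sentences matching a definitional pattern."""
    found = []
    for sentence in sentences:
        text = sentence.strip().rstrip(".")
        for pattern in DEFINITION_PATTERNS:
            match = pattern.match(text)
            if match:
                found.append((match.group("term"), match.group("gloss")))
                break  # at most one candidate per sentence
    return found
```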

Open issues
Convertors:
– Tables, figures, page layout…
IPRs:
– Clarify the IPR status with respect to authors and to EU and national legislation
– Define IPR categories for LOs by usage (free, restricted, for research...)

WP1 over time
Dates: December 05, February 06, May 06, now
Milestones: beginning of project; initial collection on the portal; D1.1 and official end of WP1; evaluation
Additions to the portal: structure & functionalities; Base-XML convertors; new LOs; Levels 2 & 3 additions; new tools; grammars; guides, docs; ontology, TermLex

Proposal: the hierarchy seen as a processing environment
– Level 1: doc, pdf, latex, html, txt, other, sxml
– Level 2: tok, pos, lemma, morpho, NP, tpl, wp2xml
– Level 3: akw, adef, axml
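Seen as a processing environment, the hierarchy becomes a chain of steps from a source format up to Level 3. The sketch below illustrates that idea. The format names are hypothetical, and the step functions are placeholders that only record which step ran. A real environment would plug in the converters, the linguistic processors and the keyword and definition extractors.

```python
from typing import Callable

Step = Callable[[str], str]


def _tag(label: str) -> Step:
    """Placeholder step: wraps the content in the step's name."""
    return lambda content: f"{label}({content})"


# format -> (next format, step); Level 1 ends at Base-XML, Level 2 at WP2 XML,
# Level 3 at the automatically annotated XML. All names are illustrative.
PIPELINE: dict[str, tuple[str, Step]] = {
    "doc": ("html", _tag("doc2html")),
    "pdf": ("html", _tag("pdf2html")),
    "html": ("basexml", _tag("html2basexml")),
    "basexml": ("wp2xml", _tag("linguistic_annotation")),
    "wp2xml": ("axml", _tag("kw_and_def_extraction")),
}


def run(content: str, fmt: str, target: str = "axml") -> str:
    """Apply steps from the input format until the target format is reached."""
    while fmt != target:
        if fmt not in PIPELINE:
            raise ValueError(f"no processing step from {fmt!r}")
        fmt, step = PIPELINE[fmt]
        content = step(content)
    return content
```

Because each document enters at whatever level it has reached, a PDF passes through all steps while an existing WP2 XML file only goes through extraction.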

Conclusions
– LOs, resources and tools have been collected
– Initially: the portal was seen as a repository
– Now: the portal can potentially be integrated with the LMS as a processing environment