1
LT4eL - WP1: Setting the scene
WP leader: UAIC, "Al. I. Cuza" University of Iaşi, Faculty of Computer Science
Dan Cristea, Corina Forăscu, Dan Tufiş, Ionuţ Pistol, Diana Trandabăţ, Adrian Iftene
Contact: dcristea@info.uaic.ro
Utrecht Review Meeting, February 1, 2007
2
Objectives
1. Inventory and classification of existing tools necessary for the development of the relevant functionalities (e.g. keyword extractor, glossary candidate detector);
2. Collection and normalization of the learning material related to the use of the computer in education (Humanities, Social Sciences);
3. Investigation of IPR issues;
4. Adoption of relevant standards for linguistic annotation of learning objects;
5. Dissemination of the results through a Web portal.
3
Partners in WP1
- Utrecht University (UU), The Netherlands
- University of Hamburg (UHH), Germany
- University of Lisbon (FFCUL), Portugal
- Charles University Prague (CUP), Czech Republic
- Institute for Parallel Processing, Bulgarian Academy of Sciences (IPP-BAS), Bulgaria
- University of Tübingen (UTU), Germany
- Institute of Computer Science, Polish Academy of Sciences (ICS-PAS), Poland
- Zürich University of Applied Sciences Winterthur (ZHW), Switzerland
- University of Malta (UOM), Malta
4
[Diagram: system architecture. Labelled components:
- User documents (PDF, DOC, HTML, SCORM, XML)
- CONVERTOR 1 and CONVERTOR 2 (Documents: HTML, SCORM, Pseudo-Struct., Basic XML)
- LING. PROCESSOR (Lemmatizer, POS, Partial Parser)
- Lexicons: BG, CZ, DT, EN, GE, MT, PL, PT, RO
- REPOSITORY: Documents (SCORM, Pseudo-Struct), Metadata (Keywords), Ling. Annot XML, Ontology, Glossary
- CROSSLINGUAL RETRIEVAL
- LMS, User Profile]
5
The Portal
A working space:
- Repository for resources, tools, deliverables
- Exchange of information among participants
- Statistics
Hosted by UAIC:
- January 2007: 1.15 GB (without realTimeStat, searchForm, upload/updateForm)
Address: http://consilr.info.uaic.ro/uploads_lt4el
- Username: guestLt4eL
- Password: elearning
Demo version on CD
6
O1. Collection of language resources and tools (1)
Inventory and classification of existing tools (http://consilr.info.uaic.ro/uploads_lt4el/tools/all.php) relevant to:
- the integration of language technology resources in eLearning (WP2)
- the integration of semantic knowledge (WP3)
7
O1. Collection of language resources and tools (2)
Inventory and classification of existing language resources:
- corpora and frequency lists: http://consilr.info.uaic.ro/uploads_lt4el/menu/all.php
- lexica: http://www.let.uu.nl/lt4el/wiki/index.php/Lexica_Joint_Table
8
O2. Collection of LOs: the portal
Uploads, updates & real-time statistics at http://consilr.info.uaic.ro/uploads_lt4el/
Criteria (→ attributes):
- Subdomains relevant for beginners in IST & e-learning → Domain
- Multilingualism → Language
- Medium-sized documents → Number of words
- Clear IPR status → IPR
- Uniformity in topics → Keywords (selected initially)
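The mapping from collection criteria to attributes on this slide can be pictured as a small metadata record per learning object. The following is only a sketch: the class name, field names, example values and the word-count thresholds are all assumptions made for illustration, since the slides do not give the portal's actual schema or what counts as "medium sized".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LearningObject:
    """One uploaded LO; field names are illustrative, not the portal's schema."""
    title: str
    domain: str       # criterion: relevant subdomain
    language: str     # criterion: multilingualism
    num_words: int    # criterion: medium-sized documents
    ipr: str          # criterion: clear IPR status
    keywords: List[str] = field(default_factory=list)  # criterion: uniform topics


def is_medium_sized(lo: LearningObject, min_words: int = 1000,
                    max_words: int = 20000) -> bool:
    """Check the size criterion; both thresholds are hypothetical defaults."""
    return min_words <= lo.num_words <= max_words
```

A record would then be created once per upload and filtered by any of the five attributes, for example by language or by IPR category.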
9
Collection of LOs: domains
1. Use of computers in education, with sub-domains:
   1.1 Teaching academic skills, with sub-domains:
       1.1.1 Academic skills
       1.1.2 Relevant computer skills for the above tasks (MS Word, Excel, PowerPoint, LaTeX, Web pages, XML)
       1.1.3 Basic skills (use of computer for beginners) (chats, e-mail, Internet)
   1.2 e-Learning, e-Marketing
   1.3 The I*Teach document (Leonardo project, http://i-teach.fmi.uni-sofia.bg/)
   1.4 Impact of use of computers in society
   1.5 Studies about use of computers in schools / high schools
   1.6 Impact of e-Learning on education
2. Calimera documents (parallel corpus developed in the Calimera FP5 project, http://www.calimera.org/)
10
Collection of LOs: domains coverage
11
The hierarchy of LOs’ formats
12
Collection of LOs: annotation layers
1. Initial documents: doc, pdf, html, txt → Base-XML
2. Linguistic annotation: tokens, POS, lemma, chunks → WP2 XML format (LT4ELAna.dtd)
3. Keywords, definitions and ontology links annotations
13
Level 1 conversions
[Diagram: source formats (doc, pdf, latex, other, html, plain text) leading to Base-XML; highlighted step: doc → html]
14
Level 1 conversions: doc → html (UTF-8)
1. MS Office: Save As html
2. OpenOffice Writer SXW/ODT: Save As html
15
Level 1 conversions
[Diagram: source formats (doc, pdf, latex, other, html, plain text) leading to Base-XML; highlighted step: pdf → html]
16
Level 1 conversions: pdf → html (UTF-8)
1. Adobe on-line conversion tool
2. pdfbox (Windows)
3. pdftohtml (Linux)
4. OpenOffice
5. Adobe Acrobat Professional
17
Level 1 conversions
[Diagram: source formats (doc, pdf, latex, other, html, plain text) leading to Base-XML; highlighted step: Base-XML convertor]
18
Level 1 conversions: html → Base-XML
The UAIC Java converter
- keeps all the tags possibly useful (fixed)
- produces a log of all the removed tags/data
The CUP html2xml.pl converter
- tags kept according to a DTD
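The idea behind the first converter (keep a fixed set of useful tags, log whatever is removed) can be sketched with the Python standard library. This is not the UAIC Java converter or the CUP html2xml.pl script: the set of kept tags, the `<doc>` wrapper element and the log wording are all assumptions chosen for illustration, and the sketch assumes well-nested input HTML.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape

# Hypothetical whitelist; the real converter's fixed tag set is not given.
KEPT_TAGS = {"h1", "h2", "h3", "p", "ul", "ol", "li", "table", "tr", "td"}
# Elements whose text content is dropped as well as the tag itself.
SKIP_CONTENT = {"script", "style"}


class TagFilter(HTMLParser):
    """Keeps whitelisted tags (without attributes) and logs everything removed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of the output document
        self.log = []    # one entry per removed tag or data block
        self._skip = 0   # >0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self._skip += 1
        if tag in KEPT_TAGS and not self._skip:
            self.out.append(f"<{tag}>")
        else:
            self.log.append(f"removed tag <{tag}>")

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT and self._skip:
            self._skip -= 1
        elif tag in KEPT_TAGS and not self._skip:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self._skip:
            self.log.append(f"removed data ({len(data)} chars)")
        else:
            self.out.append(escape(data))  # re-escape &, <, > for XML


def html_to_base_xml(html):
    """Return (xml_string, removal_log) for a well-nested HTML string."""
    parser = TagFilter()
    parser.feed(html)
    parser.close()
    return "<doc>" + "".join(parser.out) + "</doc>", parser.log
```

The text inside removed formatting tags is kept, so only markup and script/style content is lost, and the log makes each loss traceable.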
19
Collection of LOs: second level
[Diagram: annotation layers tok, pos, lemma, morpho, NP (tok-pos-lemma) feeding the WP2 XML format; highlighted: language-specific tools]
20
Collection of LOs: second level
[Diagram: annotation layers tok, pos, lemma, morpho, NP (tok-pos-lemma) feeding the WP2 XML format; highlighted: scripts]
21
Collection of LOs: KW extractor
[Diagram: WP2 XML format (Level 2), Man KD XML and Auto KD XML (Level 3); highlighted: KW extractor]
22
Collection of LOs: KW extractor
[Diagram: WP2 XML format (Level 2), Man KD XML and Auto KD XML (Level 3); highlighted: KW extractor evaluation]
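The slides do not say how the KW extractor is evaluated. One common approach, assumed here, is to compare the automatically extracted keywords with the manually annotated ones and report precision, recall and F1. The function name and the lower-casing normalization are illustrative choices, not part of the project's tooling.

```python
def evaluate_keywords(manual, automatic):
    """Compare automatic keywords against manual ones.

    Returns (precision, recall, f1). Keywords are matched exactly after
    trimming and lower-casing, which is an assumed normalization.
    """
    gold = {k.strip().lower() for k in manual}
    pred = {k.strip().lower() for k in automatic}
    hits = len(gold & pred)                       # keywords found by both
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, with manual keywords `["e-Learning", "LMS", "metadata", "ontology"]` and automatic keywords `["lms", "ontology", "portal"]`, two of the three automatic keywords are correct and two of the four manual ones are found.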
23
Collection of LOs: third level
[Diagram: Man KD XML (incl. km.xml, dm.xml) and Auto KD XML (incl. akw, adef); highlighted: def extractor]
- km.xml: manually annotated keywords
- dm.xml: manually annotated definitions
- akw: automatically annotated keywords
- adef: automatically annotated definitions
24
Collection of LOs: third level
[Diagram: Man KD XML (incl. km.xml, dm.xml) and Auto KD XML (incl. akw, adef); highlighted: def extractor evaluation]
- km.xml: manually annotated keywords
- dm.xml: manually annotated definitions
- akw: automatically annotated keywords
- adef: automatically annotated definitions
25
Open issues
Convertors
- Tables, figures, page look…
IPRs
- Clarify the IPR status: authors, EU and national legislation
- Define IPR categories for LOs: usage (free, restricted, for research...)
26
WP1 over time
[Timeline: December 05, February 06, May 06, Now. Labelled items:
- Beginning of project
- Initial collection on Portal
- Structure & functionalities to the portal
- BaseXML convertors, new LOs
- D1.1
- Official end of WP1
- Levels 2&3 additions: new tools, grammars, guides, docs, ontology, TermLex
- Evaluation]
27
Proposal: the hierarchy seen as a processing environment
[Diagram:
- Level 1: doc, pdf, latex, other, html, txt → sxml
- Level 2: tok, pos, lemma, morpho, NP, tpl → wp2xml
- Level 3: akw, adef → axml]
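Seeing the hierarchy as a processing environment suggests treating formats as nodes and converters as edges, so that the chain of steps needed for a given document can be looked up. The sketch below encodes only the conversions named on the earlier slides (doc → html, pdf → html, html → Base-XML, Base-XML → WP2 XML, WP2 XML → Auto KD XML). The node labels and the function name are illustrative assumptions, not part of the portal.

```python
from collections import deque

# Edges taken from the conversion steps named in the slides;
# the lower-case node labels are illustrative.
CONVERSIONS = {
    "doc": ["html"],
    "pdf": ["html"],
    "html": ["base-xml"],
    "base-xml": ["wp2-xml"],      # Level 2: linguistic annotation
    "wp2-xml": ["auto-kd-xml"],   # Level 3: keyword/definition extraction
}


def conversion_path(source, target, graph=CONVERSIONS):
    """Breadth-first search for the shortest chain of formats, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

Calling `conversion_path("pdf", "wp2-xml")` walks pdf, html, base-xml, wp2-xml in that order; a request that runs against the hierarchy, such as from wp2-xml back to pdf, returns `None`.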
28
Conclusions
- LOs, resources and tools collected
- Initially: portal seen as a repository
- Now: portal potentially integrated with the LMS as a processing environment