LT4eL - WP1: Setting the scene WP leader: UAIC Univ. Al. I. Cuza of Iasi Faculty of Computer Science Dan Cristea, Corina Forăscu, Dan Tufiş, Ionuţ Pistol,


1 LT4eL - WP1: Setting the scene
WP leader: UAIC, Univ. Al. I. Cuza of Iasi, Faculty of Computer Science
Dan Cristea, Corina Forăscu, Dan Tufiş, Ionuţ Pistol, Diana Trandabăţ, Adrian Iftene
Contact: dcristea@info.uaic.ro
Utrecht Review Meeting, February 1, 2007

2 Objectives
1. Inventory and classification of existing tools needed to develop the relevant functionalities (i.e. keyword extractor, glossary candidate detector)
2. Collection and normalization of learning material on the use of computers in education (Humanities, Social Sciences)
3. Investigation of IPR issues
4. Adoption of relevant standards for the linguistic annotation of learning objects
5. Dissemination of the results through a Web portal

3 Partners in WP1
- Utrecht University (UU), The Netherlands
- University of Hamburg (UHH), Germany
- University of Lisbon (FFCUL), Portugal
- Charles University Prague (CUP), Czech Republic
- Institute for Parallel Processing, Bulgarian Academy of Sciences (IPP-BAS), Bulgaria
- University of Tübingen (UTU), Germany
- Institute of Computer Science, Polish Academy of Sciences (ICS-PAS), Poland
- Zürich University of Applied Sciences Winterthur (ZHW), Switzerland
- University of Malta (UOM), Malta

4 [Architecture diagram. Components shown: user documents (PDF, DOC, HTML, SCORM, XML); CONVERTOR 1 (documents: SCORM, pseudo-structured, Basic XML); CONVERTOR 2 (documents: HTML); LING. PROCESSOR (lemmatizer, POS, partial parser); documents with metadata (keywords) and linguistically annotated XML; lexicons for BG, CZ, DT, EN, GE, MT, PL, PT, RO; ontology; glossary; REPOSITORY; CROSSLINGUAL RETRIEVAL; LMS; user profile]

5 The Portal
A working space:
- Repository for resources, tools, deliverables
- Exchange of information among participants
- Statistics
Hosted by UAIC:
- January 2007: 1.15 GB (without realTimeStat, searchForm, upload/updateForm)
Address: http://consilr.info.uaic.ro/uploads_lt4el
- Username: guestLt4eL
- Passwd: elearning
Demo version on CD

6 O1. Collection of language resources and tools (1)
Inventory and classification of existing tools (http://consilr.info.uaic.ro/uploads_lt4el/tools/all.php) relevant to:
- the integration of language technology resources in eLearning (WP2)
- the integration of semantic knowledge (WP3)

7 O1. Collection of language resources and tools (2)
Inventory and classification of existing language resources:
- corpora and frequency lists: http://consilr.info.uaic.ro/uploads_lt4el/menu/all.php
- lexica: http://www.let.uu.nl/lt4el/wiki/index.php/Lexica_Joint_Table

8 O2. Collection of LOs: the portal
Uploads, updates & real-time statistics at http://consilr.info.uaic.ro/uploads_lt4el/
Criteria (→ attributes):
- Subdomains relevant for beginners in IST & e-learning → Domain
- Multilingualism → Language
- Medium-sized documents → Number of words
- Clear IPR status → IPR
- Uniformity in topics → Keywords selected initially

9 Collection of LOs: domains
1. Use of computers in education, with sub-domains:
  1.1 Teaching academic skills, with sub-domains:
    1.1.1 Academic skills
    1.1.2 Relevant computer skills for the above tasks (MS Word, Excel, PowerPoint, LaTeX, Web pages, XML)
    1.1.3 Basic skills (use of computer for beginners) (chats, e-mail, Internet)
  1.2 e-Learning, e-Marketing
  1.3 The I*Teach document (Leonardo project, http://i-teach.fmi.uni-sofia.bg/)
  1.4 Impact of use of computers in society
  1.5 Studies about use of computers in schools / high schools
  1.6 Impact of e-Learning on education
2. Calimera documents (parallel corpus developed in the Calimera FP5 project, http://www.calimera.org/)

10 Collection of LOs: domains coverage

11 The hierarchy of LOs’ formats

12 Collection of LOs: annotation layers
1. Initial documents: doc, pdf, html, txt → Base-XML
2. Linguistic annotation: tokens, POS, lemma, chunks → WP2 XML format (LT4ELAna.dtd)
3. Keywords, definitions and ontology links annotations
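As an illustration of the second layer, the sketch below wraps token, POS and lemma information in XML. The slides do not show the content of LT4ELAna.dtd, so the element name `s`, the element name `tok`, and the attributes `id`, `pos` and `lemma` are assumptions made for this example only, and the tagger output is hard-coded.

```python
import xml.etree.ElementTree as ET


def annotate(tokens):
    """Serialize (form, pos, lemma) triples as one XML sentence.

    The <s>/<tok> layout is hypothetical; the real WP2 format is
    defined by LT4ELAna.dtd, which is not reproduced in the slides.
    """
    sentence = ET.Element("s")
    for i, (form, pos, lemma) in enumerate(tokens, start=1):
        tok = ET.SubElement(
            sentence, "tok", {"id": f"t{i}", "pos": pos, "lemma": lemma}
        )
        tok.text = form
    return ET.tostring(sentence, encoding="unicode")


# Example input as it might come from a language-specific tagger.
xml_text = annotate([("Computers", "NNS", "computer"), ("help", "VBP", "help")])
print(xml_text)
```

Keeping the word form as element text and the analyses as attributes is only one possible design; a DTD could equally place each analysis in its own child element.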

13 Level 1 conversions
[Diagram: source formats doc, pdf, latex, html, plain text, other; target Base-XML. Highlighted step: doc → html]

14 Level 1 conversions: doc → html (UTF-8)
1. MS Office: Save As html
2. OpenOffice Writer SXC/ODT: Save As html

15 Level 1 conversions
[Diagram: source formats doc, pdf, latex, html, plain text, other; target Base-XML. Highlighted step: pdf → html]

16 Level 1 conversions: pdf → html (UTF-8)
1. Adobe on-line conversion tool
2. pdfbox (Windows)
3. pdftohtml (Linux)
4. OpenOffice
5. Adobe Acrobat Professional

17 Level 1 conversions
[Diagram: source formats doc, pdf, latex, html, plain text, other; target Base-XML. Highlighted step: Base-XML convertor]

18 Level 1 conversions: html → Base-XML
The UAIC Java converter:
- keeps all the tags possibly useful (fixed)
- produces a log of all the removed tags/data
The CUP html2xml.pl converter:
- tags kept according to a DTD
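The idea of keeping a fixed set of tags and logging everything removed can be sketched as follows. This is not the UAIC Java converter or the CUP script: it is a minimal Python illustration, and the whitelist `KEPT_TAGS`, the class name `BaseXmlFilter` and the log message format are all assumptions.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape

# Hypothetical whitelist; the tag sets of the real converters are not
# listed in the slides.
KEPT_TAGS = {"html", "body", "h1", "h2", "h3", "p", "ul", "ol", "li"}


class BaseXmlFilter(HTMLParser):
    """Keep whitelisted tags (without attributes) and log the rest."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []  # pieces of the filtered output
        self.log = []  # one entry per removed start tag

    def handle_starttag(self, tag, attrs):
        if tag in KEPT_TAGS:
            self.out.append(f"<{tag}>")
        else:
            line = self.getpos()[0]
            self.log.append(f"removed <{tag}> at line {line}")

    def handle_endtag(self, tag):
        if tag in KEPT_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is always kept, including text inside removed tags.
        self.out.append(escape(data))


def html_to_base_xml(html_text):
    """Return (filtered markup, removal log) for an HTML string."""
    parser = BaseXmlFilter()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out), parser.log


markup, log = html_to_base_xml('<p>Hello <font color="red">world</font></p>')
print(markup)
print(log)
```

The sketch only filters tags; it does not repair malformed HTML, so the output is well-formed XML only when the kept tags were properly nested in the input.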

19 Collection of LOs: second level
[Diagram: annotation components tok, pos, lemma, morpho, NP; tok-pos-lemma; WP2 XML format. Highlighted: language-specific tools]

20 Collection of LOs: second level
[Diagram: annotation components tok, pos, lemma, morpho, NP; tok-pos-lemma; WP2 XML format. Highlighted: scripts]

21 Collection of LOs: KW extractor
[Diagram: WP2 XML format (Level 2); Man KD XML and Auto KD XML (Level 3). Highlighted: KW extractor]

22 Collection of LOs: KW extractor
[Diagram: WP2 XML format (Level 2); Man KD XML and Auto KD XML (Level 3). Highlighted: KW extractor evaluation]
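Comparing manually annotated keywords with automatically extracted ones could be done with precision, recall and F1. The slides do not say which measures the project used, so the metric choice, the function name `evaluate_keywords` and the case-insensitive exact matching are assumptions of this sketch.

```python
def evaluate_keywords(manual, automatic):
    """Compare two keyword lists and return (precision, recall, f1).

    Matching is exact after lowercasing, which is a simplification:
    it ignores lemmatization and multi-word variants.
    """
    gold = {k.lower() for k in manual}
    found = {k.lower() for k in automatic}
    hits = len(gold & found)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Illustrative keyword lists, not project data.
manual_kws = ["e-learning", "ontology", "metadata", "SCORM"]
auto_kws = ["ontology", "scorm", "portal"]
print(evaluate_keywords(manual_kws, auto_kws))
```

With real documents the two lists would be read from the manually and automatically annotated XML files rather than written inline.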

23 Collection of LOs: third level
[Diagram: Man KD XML (incl. km.xml, dm.xml) and Auto KD XML (incl. akw, adef). Highlighted: def extractor]
- kmxml: manually annotated keywords
- dmxml: manually annotated definitions
- akw: automatically annotated keywords
- adef: automatically annotated definitions

24 Collection of LOs: third level
[Diagram: Man KD XML (incl. km.xml, dm.xml) and Auto KD XML (incl. akw, adef). Highlighted: def extractor evaluation]
- kmxml: manually annotated keywords
- dmxml: manually annotated definitions
- akw: automatically annotated keywords
- adef: automatically annotated definitions

25 Open issues
Convertors:
- Tables, figures, page look…
IPRs:
- Clarify the IPR status: authors, EU and national legislation
- Define IPR categories for LOs: usage (free, restricted, for research...)

26 WP1 over time
[Timeline figure. Time markers: December 05, February 06, Now, May 06. Labels: Beginning of project; Initial collection on Portal; Structure & functionalities to the portal; BaseXML convertors; new LOs; Levels 2&3 additions; new tools; grammars; guides, docs; ontology, TermLex; D1.1; Official end of WP1; Evaluation]

27 Proposal: the hierarchy seen as a processing environment
[Diagram with three levels (Level 1, Level 2, Level 3). Nodes: doc, pdf, latex, other, html, txt, sxml; morpho, tok, pos, lemma, NP, wp2xml, tpl; akw, adef, axml]
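Treating the format hierarchy as a processing environment amounts to finding, for each uploaded format, the chain of conversions that leads to the target format. The sketch below encodes only the Level 1 steps named in the slides (doc → html, pdf → html, html → Base-XML); the table `NEXT_STEP`, the function `conversion_path` and the label `base-xml` are hypothetical names chosen for illustration.

```python
# Each format maps to the format produced by its next conversion step.
# Only the steps stated in the slides are listed; latex, plain text and
# "other" are left out because their routes are not described.
NEXT_STEP = {
    "doc": "html",
    "pdf": "html",
    "html": "base-xml",
}


def conversion_path(fmt, target="base-xml"):
    """Return the list of formats visited from fmt to target."""
    path = [fmt]
    while path[-1] != target:
        nxt = NEXT_STEP.get(path[-1])
        if nxt is None:
            raise ValueError(f"no known conversion from {path[-1]!r}")
        path.append(nxt)
    return path


print(conversion_path("pdf"))
```

Level 2 and Level 3 steps could be added to the same table once their inputs and outputs are fixed, so that one lookup drives the whole chain.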

28 Conclusions
- LOs, resources and tools collected
- Initially: portal seen as a repository
- Now: portal potentially integrated with the LMS as a processing environment

