WP1: Plan for the remainder (1)
Ontology
–Finalise ontology and lexicons for the 2nd domain (RTV)
    Changes agreed in Heraklion
–Improvements to existing functionalities, development of new ones in Protégé (RTV)
    Node IDs, import/export, FE schema generation, stereotypes editor, …
–Customisation tools and methodology (using the same ontology + lexicons schema, create through Protégé the new ontology and lexicons and the new FE schema) (RTV)
    Specify measurements for customisation
    Examine in new domains, applying the measurements
–Relevant section in D1.3(b) (2nd domain, improvements, customisation tools and methodology, …) (RTV)
    Draft
    Final
WP1: Plan for the remainder (2)
Corpus formation for the needs of page filtering
–Customisation methodology
    Specify measurements
    Examine in new domains, apply measurements
–Relevant section in D1.3(b) (2nd domain, customisation methodology) (NCSR)
    Draft
    Final
WP1: Plan for the remainder (3)
Web spidering (NEAC)
–Incorporate WebXimmler
    Examine whether there are relevant commercial products
    Examine WebXimmler in both domains
    New version of WebXimmler to handle the remaining 20% (JavaScript, …), problems with charsets
    XML vs XHTML output (is WebXimmler output appropriate for all partners?)
–Incorporate Language Identification Module (LIM)
    EDIN’s LIM improvement, evaluate in both domains
    Examine other LIMs
    Speed performance tests
    Updates in NEAC (apply LIM in each visited page, add a meta-tag when saving the page)
WP1: Plan for the remainder (4)
Web spidering (NEAC)
–Finalise site navigator
    Handle the rest of the navigation cases (JavaScript, search forms)
    Examine in both domains
–Page filtering
    Evaluation for the 2nd domain in all 4 languages
    Customisation methodology (specify measurements, examine in new domains)
–Link scoring
    Evaluation for the 2nd domain in all 4 languages
    Customisation methodology (specify measurements, examine in new domains)
–Relevant section in D1.3(b) (NCSR)
    Draft
    Final
WP1: Plan for the remainder (5)
Focused Crawling Tool
–Evaluation of the integrated tool (EDIN crawler + NEAC-light) in the 4 languages and both domains
–Customisation methodology
    Specify measurements
    Examine in new domains, apply measurements
–Relevant section in D1.3(b) (new version, evaluation in both domains, customisation methodology)
    Draft
    Final
WP1: Plan for the remainder (6)
Other tools for web pages collection
–Cross-merge NCSR, RTV version, documentation
Integrated web pages collection system
–Report modifications (due to agents strategy and support of more than one domain)
–Relevant section in D1.3(b)
WP1: Plan for the remainder (7)
Corpus collection for the needs of NERC and FE
–Report on the corpus collection task for the 2nd domain for D1.3(b)
Web Annotator
–Customisation methodology
–Relevant section in D1.3(b) (final version, customisation methodology)
WP1: Plan for the remainder (8)
User Evaluation
–Experiment with new Web UI for focused crawling, spidering
–Possible improvements
Deliverable D1.3(b)
–Template
–Draft
–Final
WP2: Plan for the remainder (1)
NERC DTD
–Specifying NERC DTDs for new domains
    Guidelines, examine in new domains
    Relevant section in D2.4
Corpus annotation for the needs of NERC
–Final NERC annotation guidelines for the 2nd domain based on the partners' remarks during the annotation task
–Relevant section in D2.4
WP2: Plan for the remainder (2)
NERC v.3 (incorporation of mechanisms for rapid adaptation to new domains)
–Exploit machine learning techniques for each language
    EDIN (max entropy, …), Lingway (induction of rules to support knowledge engineer, …), NCSR (Decision trees, TBEDL-Brill, Combination), RTV (lexicon acquisition to enrich lexicons/gazetteers, …)
    Application and evaluation in both domains for each partner
–Customisation methodology
    Template for relevant section in D2.4
    Specify measurements
    Application in new domains
    Report per partner
    Integrated report on a NERC customisation methodology
–Relevant section in D2.4
WP2: Plan for the remainder (3)
NERC-based demarcator (NCSR)
–Compare the rule-based version with the ML-based one in both domains
–Customisation methodology
    Specify measurements
    Examine in new domains
–Delivery of Demarcator application to the partners
–Relevant section in D2.4
WP2: Plan for the remainder (4)
Deliverable 2.4
–Template
–Draft
–Final
WP3: Plan for the remainder (1)
FE schema
–Final corrections to FE schema for 2nd domain
–Specifying FE schemas for new domains through Protégé
    Guidelines, examine in new domains
    Relevant section in D3.2
Corpus annotation for the needs of FE
–Final Fact annotation guidelines for the 2nd domain based on the partners' remarks during the annotation task
–Relevant section in D3.2
WP3: Plan for the remainder (2)
FE v.2
–WHISK (RTV)
    Provide the WHISK v.1 application to the partners (specify the necessary modules)
    Relevant section in D3.2
Normalisation
–Evaluation results per partner
–Relevant section in D3.2
–Customisation to the 2nd domain
    Evaluation results per partner
    Relevant section in D3.3
–Customisation methodology
WP3: Plan for the remainder (3)
Name matching
–Evaluation results per partner
–Relevant section in D3.2
–Customisation to the 2nd domain
    Evaluation results per partner
    Relevant section in D3.3
–Customisation methodology
WP3: Plan for the remainder (4)
FE v.3
–Application of the 4 techniques (also Lingway’s) to the 2nd domain
    Evaluation results per partner
–Examine the combination of the 3 techniques (meta-learning)
    Decide on the strategy, evaluation results
–Customisation methodology
    Template for relevant section in D3.3
    Specify measurements
    Application in new domains
    Report per partner
    Integrated report on a FE customisation methodology
–Relevant section in D3.3
WP3: Plan for the remainder (5)
Image Segmentation - OCR
–Customisation to the 2nd domain
    Annotation
    Evaluation
    Relevant section in D3.3
WP3: Plan for the remainder (6)
IERI + monolingual IE systems
–Integration of FE v.2 with NERC v.2
    Evaluation of the integrated IE system per language (combination of 3 FE techniques with NERC v.2)
    Relevant section in D3.2
–Integration of FE v.3 with NERC v.3
    Evaluation of the integrated IE system per language (combination of 4 FE techniques with NERC v.3)
    Relevant section in D3.3
–Modifications to monolingual IE systems to handle runs in more than one domain
    Relevant section in D3.2
WP3: Plan for the remainder (7)
Deliverables
–D3.2
    Deliver Final version
–D3.3
    Template
    Draft
    Final
WP4: Plan for the remainder (1)
End-User Interface
–Changes in the UI taking into account the remarks from the evaluation workshops
–Report on the UI of the 2nd prototype in D4.3
–Report on the UI of the Final prototype in D4.4
WP4: Plan for the remainder (2)
System Integration
–Finalise agents
    Focused crawling agent, Spidering agent
    Data Storage agent, Personalisation agent
–2nd integrated prototype
    Installation & User Manual, Documentation
–Final integrated prototype
    Installation & User Manual, Documentation
WP4: Plan for the remainder (3)
Personalisation
–Improvements in the final prototype
–Stereotypes editor through Protégé
–Personalisation methodology
–Relevant section in D4.4
WP4: Plan for the remainder (4)
Evaluation
–Evaluation workshop at RTV
–Evaluation report in D4.3
–Final evaluation
    Finalise evaluation methodology
    Other evaluation workshops: Where? When?
    Evaluation report in D4.4
WP4: Plan for the remainder (5)
Deliverables
–D4.3
    Deliver Final version
–D4.4
    Template
    Draft
    Final
WP5: Plan for the remainder (1)
Management reports
–9th Quarterly Report (deliver end June)
–10th Quarterly Report (deliver mid September)
–5th Semestrial Report, 5th Cost Statements (deliver mid September)
–Final Report (deliver end September)
WP5: Plan for the remainder (3)
Dissemination Plan - I
–1st European Summer School on Ontological Engineering and the Semantic Web (SSSW-2003), July 21-26, Cercedilla, Spain
    Promote CROSSMARC and invite to a web-based evaluation.
–2nd International Workshop on Web Document Analysis (WDA-2003), August 3, Edinburgh
    Promote CROSSMARC and invite to a web-based evaluation.
–Ontologies and Information Extraction International Workshop, held as part of the EUROLAN Summer School, July 28 - August 8, 2003, Bucharest, Romania (http://ic2.epfl.ch/~pallotta/ontoIE/)
WP5: Plan for the remainder (4)
Dissemination Plan - II
–Recent Advances in Natural Language Processing (RANLP-2003), September 2003, Borovets, Bulgaria
–8th National Congress of the Italian Association of Artificial Intelligence (AI*IA 2003), September 23-26, Pisa (aiia2003.di.unipi.it/aiia2003/index-eng.html)
    Promote CROSSMARC and invite to a web-based evaluation (??).
–IST-2003, October 4-7, Milan, Italy
    Plans to demonstrate CROSSMARC technology through a CROSSMARC exhibition (a relevant proposal has already been submitted).
–9th Panhellenic Conference on Informatics, November 21-23, 2003, Thessaloniki
WP5: Plan for the remainder (5)
Dissemination Plan - III
–Multilingual CROSSMARC site
–Multilingual Questionnaire
Technology Implementation Plan
–Draft
–Final
Date and place of final meeting before the review