Cooperation for Arabic Language Resources and Tools – The MEDAR Project Bente Maegaard, Mohamed Attia, Khalid Choukri, Olivier Hamon, Steven Krauwer, Mustafa.

Presentation transcript:

Cooperation for Arabic Language Resources and Tools – The MEDAR Project
Bente Maegaard, Mohamed Attia, Khalid Choukri, Olivier Hamon, Steven Krauwer, Mustafa Yaseen
Presented by: Bente Maegaard, University of Copenhagen, Co-ordinator of MEDAR

2 MEDAR: Background and mission
Mission
- Support the development of language technology, language resources and tools for the Arabic language
- Important for the people, the economy and the culture in the Arab countries
- But current efforts are too small and too fragmented
MEDAR is funded by the European Commission, and focuses on the Mediterranean area, but our scope for collaboration is much broader – all Arab countries, all continents – and we also want to include other Semitic languages in the future.

3 MEDAR partners
- University of Copenhagen, Denmark (coord.)
- ELDA, France
- University of Balamand, Lebanon
- Al-Ahlyya Amman University, Jordan
- Universiteit Utrecht, The Netherlands
- ILSP - Athena, Greece
- RDI, Egypt
- Birzeit University, West Bank and Gaza Strip
- ENSIAS, University of Mohammed V Soussi, Morocco
- CEA, France
- CNRS, France
- The Open University, United Kingdom
- Université Lumière Lyon 2, France
- IBM, Egypt
- Sakhr, Egypt

4 MEDAR Objectives and ‘streams’
1) Technical stream
- Survey of players, projects, products
- BLARK for Arabic
- Focus on multilingual tools, develop MT
2) Roadmap stream
- Cooperation roadmap
- Network creation
3) Dissemination stream

5 Multilingual sub-project
- Focus: Machine Translation English-Arabic
- Into Arabic
- Important to use Open Source
- Education and training

6 MT system, corpora
- MOSES was chosen as the MT system
  - Wide community
  - English-Arabic experiments already exist
  - Previous experience of consortium partners
- Basic MOSES system developed by Balamand
- Enhanced system provided by IBM Cairo and Dublin City University
- Partners collected a parallel corpus and monolingual corpora

7 Evaluation - 1
- Automatic evaluation
  - 10,000 words evaluation corpus
  - In 200,000 words masking corpus
  - Four human translations have been produced, validated
- Human evaluation
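The slide does not name the automatic metric that was used. As an illustration only, the sketch below shows how several human reference translations can feed a BLEU-style modified n-gram precision, which is one common way to exploit multiple references. The function names and the English toy sentences are hypothetical and are not taken from the MEDAR evaluation package.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(hypothesis, references, n):
    """BLEU-style modified n-gram precision against several references.

    Each hypothesis n-gram is credited at most as many times as it occurs
    in the single reference where it is most frequent.
    Whitespace tokenisation is an assumption made for this sketch.
    """
    hyp_counts = ngrams(hypothesis.split(), n)
    if not hyp_counts:
        return 0.0
    # For every n-gram, keep the highest count seen in any one reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref.split(), n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    matched = sum(min(count, max_ref_counts[gram])
                  for gram, count in hyp_counts.items())
    return matched / sum(hyp_counts.values())


# Toy usage with two hypothetical references.
refs = ["the cat sat", "a cat sat on the mat"]
print(clipped_precision("the the the cat", refs, 1))
```

With more references, a correct translation that happens to use different wording is more likely to find matching n-grams, which is the usual motivation for producing several human translations of the same evaluation corpus.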

8 Evaluation - 2
- Second evaluation campaign will take place in June
- External participants have been invited and expressed interest

9 Resources for the community
- MT systems: the baselines developed in the project will be made publicly available according to the original licenses (MOSES, Giza++..)
- Training data: through ELRA, fair conditions
- Evaluation package: through ELRA, fair conditions

10 Cooperation roadmap
- Roadmap concept
  - Set goals
  - Define the steps to get there
  - Define timeline
- The MEDAR roadmap covers 3 periods

11 Elements of the roadmap
- Players and human resources, education
- Technology and R&D
- E-infrastructure: internet penetration, mobile penetration
- Market
A few examples are presented here; please refer to the booklet for the rest.

12 Players and human resources, Education
- Players need a skilled work force - there are not enough HLT experts
- We need HLT enabled professionals
- Typically one could add
  - Linguistics, phonetics, language or speech processing – to engineers’ education
  - Computing, machine learning, language or speech processing – to linguists’ education
- Do this in collaboration with other universities in the region, and with e.g. universities in Europe or the US

13 Players and human resources, Education - 2
- Staff exchange
- Student grants
- Participation of (more) Arabic partners in EU funded projects
- MEDAR has chosen this as an area to investigate further
- Partners will elaborate a cooperation scheme

14 Technology
- BLARK - Basic building blocks: LR and tools
  - Reusable
  - Can be shared with other players
  - Follow standards
- We need more resources and tools for Semitic languages, and they need to be shared. Free or cheap.
- Essential for education, research and first development

15 Technology - 2
- Driving applications
  - Fight illiteracy through HLT – speech enabled software etc
  - Collaborate to make this happen
  - Governments could introduce eGovernment etc.
- Many basic technologies are needed
  - Discussion ongoing with other parties
  - Agree what they are
  - Agree on distribution of tasks, if possible

16 E-infrastructure - Internet users

17 Penetration rates

18 Market
Important factors
- Piracy (38% worldwide, 60% in Middle-East and Africa)
- Fight piracy – this is ongoing
- Provide IT services, not products which can be copied

19 Conclusions
- Long-term goal of MEDAR
  - Create better conditions for the development of language and speech technology for Arabic – in order to support the people, the culture, the economy
  - Through collaboration and networking
- Therefore we welcome all comments and invite broad cooperation, not only for Arabic but also for other Semitic languages, and also with partners outside the EU/Mediterranean Arabic countries

20 MEDAR
Acknowledgement: All MEDAR partners
Mediterranean Arabic Language and Speech Technology
See the full Roadmap report and other information at