1 Egyptian Ministry of Communications and Information Technology Research and Development Centers of Excellence Initiative Data Mining and Computer Modeling.

Presentation transcript:

1 Egyptian Ministry of Communications and Information Technology, Research and Development Centers of Excellence Initiative: Data Mining and Computer Modeling Center of Excellence, Arabic Text Mining Project. Presented by Prof. Mohsen A. A. Rashwan (Cairo University, RDI) and Dr. Mohamed Attia (RDI)

2 Formation EMCIT created the Centers-of-Excellence initiative in an effort to establish lean, focused, responsive, and effective R&D bodies in vital, modern areas of advanced CIT, free of the bureaucracy of bulkier conventional institutions. EMCIT started with the Data Mining & Computer Modeling CoE; other centers, for Mobile Computing, Micro-Electronics, …, are following. The Data Mining CoE is now up and running with 5 major projects: Arabic Text Mining, Basic DM Research, Tourism, e-Health, and Oil & Gas. The staff of the Text Mining project is a select group of (so far) 27 of the brightest professors, graduate researchers, and engineers specialized in Computer Science, Computational Linguistics, and Classical Linguistics. They come from both academia and the private IT sector.

3 Need, Challenge, Edge, and Capability The strategic move towards CIT as a firm basis for a modernized economic infrastructure in Egypt makes clear why Data Mining in general, and Text Mining in particular, have emerged as R&D priorities in Egypt. As mountains of Arabic text documents have accumulated over the years, the knowledge contained in these treasures is badly needed as a basis for sound decision making in virtually all kinds of vital activities. The novelty of the TM paradigm, together with the sophisticated specifics of the Arabic language, which is centuries old and spoken natively by about 6% of the world population, presents the non-trivial challenge of developing effective Arabic Text Mining tools and applications. In addition to the well-chosen human resources devoted to this task, we think we have an edge in this area as native specialists in Arabic NLP with good past experience in similar projects, e.g. the Euro-Med project NEMLAR.

4 Arabic NLP infrastructure, Text Mining tools, and Applications

5 Phenomenon, Challenge, and Solution Phenomenon: Arabic is a highly derivative and inflective language with a tremendous capacity for generating vocabulary. Billions of full-form words are possible! Challenge: Because of this, the various stochastic methods deployed in language-independent Text Mining tools perform more poorly on full-form Arabic text than on less inflective and derivative languages (e.g. English), owing to higher dimensionality and more diluted correlations. Solution: Our approach is to replace the surface text with effective types of text factorization that both reduce dimensionality and concentrate the correlations of the resulting sequences relative to the (original) surface text. Finding and deploying effective language factorization(s) with these two properties strikingly helps whatever statistical machine learning method is used for text mining applications on Arabic text (or similar languages).
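To make the dimensionality argument concrete, here is a toy sketch (not the authors' factorization) that splits Buckwalter-style transliterated words into prefix, stem, and suffix tokens by greedy affix stripping. The affix and stem inventories, the `factorize` helper, and the combinatorial corpus are all hypothetical, and not every generated string is a valid Arabic word. With 3 stems, 3 prefixes, and 3 suffixes, the surface vocabulary has 48 types while the factor vocabulary has only 9, and the stem `ktAb` is observed 16 times as a factor but only once as the bare surface word.

```python
from collections import Counter
from itertools import product

# Hypothetical toy inventories in Buckwalter-style transliteration.
# Illustrative only; this is NOT the lexical model described in the slides.
STEMS = ["ktAb", "mElm", "drs"]
PREFIXES = ["wAl", "Al", "b"]   # ordered longest-first for greedy matching
SUFFIXES = ["At", "hm", "hA"]


def factorize(word: str) -> list[str]:
    """Strip at most one known prefix and one known suffix (greedy toy rule)."""
    prefix = next((p for p in PREFIXES
                   if word.startswith(p) and len(word) - len(p) >= 2), "")
    rest = word[len(prefix):]
    suffix = next((s for s in SUFFIXES
                   if rest.endswith(s) and len(rest) - len(s) >= 2), "")
    stem = rest[:len(rest) - len(suffix)]
    tokens = [prefix + "+"] if prefix else []
    tokens.append(stem)
    if suffix:
        tokens.append("+" + suffix)
    return tokens


# Every prefix+stem+suffix combination, including "no prefix" and "no suffix".
surface = [p + s + x for p, s, x in product([""] + PREFIXES, STEMS, [""] + SUFFIXES)]

surface_counts = Counter(surface)
factor_counts = Counter(tok for w in surface for tok in factorize(w))

print(len(surface_counts), "surface types")
print(len(factor_counts), "factor types")
print(surface_counts["ktAb"], factor_counts["ktAb"])
```

Greedy stripping is the simplest possible rule; it fails as soon as a stem itself begins or ends with an affix string, which is why a lexicon-driven analyzer is needed in practice.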

6 Arabic Language Factorisation We believe that Arabic lexical factorization, Part-of-Speech tagging, and lexical semantic factorization are the kinds of text factorization of special relevance to text mining. A simple, regular, and comprehensive Arabic lexical model with a compact set of morphemes has been designed and proven to cover the lexical sophistication of the Arabic language. An Arabic lexicon, lexical analyzer, and PoS tagger have been built according to this model and deployed in many applications, where they proved effective. A knowledge base that maps the Arabic lexicon to (tokenized) semantic fields has also been built.
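A lexicon-driven analyzer accepts only those splits of a word whose parts are all listed in its lexicons, and a knowledge base can then attach a semantic field to each stem. The following minimal sketch assumes a prefix/stem/suffix decomposition in the style of Buckwalter's morphological analyzer, with hypothetical mini-lexicons and hypothetical semantic-field labels; the slides do not give RDI's actual morpheme set, and the prefix/stem/suffix compatibility checks a real analyzer needs are omitted. Because it returns every admissible analysis, `wrd` yields both wrd ("roses") and w+rd ("and a reply"), which is exactly the kind of ambiguity that must later be resolved.

```python
from typing import NamedTuple


class Analysis(NamedTuple):
    prefix: str
    stem: str
    suffix: str
    pos: str
    field: str


# Hypothetical mini-lexicons in Buckwalter-style transliteration ("" = no affix).
PREFIXES = {"", "w", "b", "Al", "wAl"}
SUFFIXES = {"", "hA", "At"}
# stem -> (part of speech, hypothetical semantic field)
STEMS = {
    "wrd": ("NOUN", "PLANTS"),          # ward, "roses"
    "rd": ("NOUN", "COMMUNICATION"),    # radd, "reply"
    "ktAb": ("NOUN", "DOCUMENTS"),      # kitAb, "book"
}


def analyze(word: str) -> list[Analysis]:
    """Return every prefix+stem+suffix split whose three parts are all in the lexicons."""
    results = []
    for i in range(len(word) + 1):              # i = end of prefix
        for j in range(i + 1, len(word) + 1):   # j = end of stem (stem is non-empty)
            pre, stem, suf = word[:i], word[i:j], word[j:]
            if pre in PREFIXES and stem in STEMS and suf in SUFFIXES:
                pos, field = STEMS[stem]
                results.append(Analysis(pre, stem, suf, pos, field))
    return results


for a in analyze("wrd"):
    print(a)
```

Enumerating all splits is quadratic in word length, which is negligible for single words; mapping stems rather than full-form words to semantic fields keeps the knowledge base small.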

7 Arabic Language Factorisation (cont'd) The standard semantic relations (synonymy, antonymy, etc.) among our set of semantic fields, along with the lexical semantic analyzer based on them, are being refined over the remaining lifetime of the TM project. In fact, this lexical-to-semantic knowledge base maps minimally constrained lexical compounds (not final-form words) to semantic fields, which gives the best chance of a maximum hit ratio as well as the least ambiguous lexical semantic factorization of input Arabic text. In all the aforementioned types of Arabic text factorization, considerable ambiguity arises in different phases of analysis. Disambiguation is done through statistical methods based on supervised stochastic models.
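The slides state only that disambiguation uses statistical methods with supervised training. One common instance, assumed here and not necessarily the authors' model, is a bigram model over candidate analyses, estimated from hand-disambiguated sentences with add-one smoothing and decoded with the Viterbi algorithm. The labels (stem plus semantic field), the tiny training corpus, and all function names are hypothetical.

```python
import math
from collections import Counter


def train_bigrams(tagged_sentences):
    """Count label bigrams and their conditioning (previous-label) totals."""
    unigrams, bigrams = Counter(), Counter()
    for sent in tagged_sentences:
        labels = ["<s>"] + sent
        unigrams.update(labels[:-1])              # labels that have a successor
        bigrams.update(zip(labels, labels[1:]))
    return unigrams, bigrams


def log_prob(prev, cur, unigrams, bigrams, vocab_size, k=1.0):
    """Add-k smoothed log P(cur | prev); k=1 gives add-one (Laplace) smoothing."""
    return math.log((bigrams[(prev, cur)] + k) / (unigrams[prev] + k * vocab_size))


def viterbi(candidates, unigrams, bigrams, vocab_size):
    """Choose one analysis per word, maximizing the total bigram log-probability."""
    best = {"<s>": (0.0, [])}  # label -> (best score, best path ending in that label)
    for options in candidates:
        best = {
            cur: max(
                (score + log_prob(prev, cur, unigrams, bigrams, vocab_size), path + [cur])
                for prev, (score, path) in best.items()
            )
            for cur in options
        }
    return max(best.values())[1]


# Hypothetical hand-disambiguated training data: "stem/FIELD" labels, Buckwalter-style.
training = [
    ["Hdyqp/GARDEN", "wrd/PLANTS"],                 # "a garden of roses"
    ["Hdyqp/GARDEN", "wrd/PLANTS"],
    ["rsAlp/COMMUNICATION", "w+rd/COMMUNICATION"],  # "a letter and a reply"
]
unigrams, bigrams = train_bigrams(training)
vocab_size = len({label for sent in training for label in sent})

ambiguous = ["wrd/PLANTS", "w+rd/COMMUNICATION"]    # candidate analyses of "wrd"
print(viterbi([["Hdyqp/GARDEN"], ambiguous], unigrams, bigrams, vocab_size))
print(viterbi([["rsAlp/COMMUNICATION"], ambiguous], unigrams, bigrams, vocab_size))
```

With this toy data the preceding word decides: after Hdyqp ("garden") the decoder picks wrd/PLANTS, and after rsAlp ("letter") it picks w+rd/COMMUNICATION. Working in log space avoids underflow on long sentences.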

8 Thanks for your attention. To probe further …