Published by Stewart Palmer. Modified over 9 years ago.
1
Arabic NLP: Challenges & Opportunities
Dr. Samir Tartir
Scientific Day, Faculty of Information, Philadelphia University
May 15th, 2013
2
ثمن
3
علم
4
قِ
5
General Information
History
– (Classical) Arabic has remained unchanged, intelligible and functional for more than fifteen centuries.
Strategically important
– 330 million speakers living in an important region with huge oil reserves and sacred sites.
– 1.4 billion Muslims use it in their prayers.
Cultural and literary heritage
– Closely associated with Islam
6
Distribution
7
Versions
Classical
Modern
Dialects
8
Arabic Language Characteristics
Highly structured
Highly derivational language
– Morphology
Free word order
Modern written Arabic usually omits diacritics (short vowels)
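To make the "highly derivational" point concrete, here is a minimal sketch of root-and-pattern derivation. It assumes a triliteral root and uses the traditional placeholder letters ف ع ل to mark where the root consonants go in a pattern. The function name `apply_pattern` is hypothetical, and real morphological analyzers handle far more (weak roots, affixes, diacritics) than this toy does.

```python
# Toy root-and-pattern derivation (illustrative assumption, not a real analyzer).
# The placeholder letters ف ع ل stand for the 1st, 2nd and 3rd root consonants.
PLACEHOLDERS = ("ف", "ع", "ل")


def apply_pattern(root: str, pattern: str) -> str:
    """Substitute the three root consonants into a pattern template."""
    if len(root) != 3:
        raise ValueError("this sketch handles triliteral roots only")
    mapping = dict(zip(PLACEHOLDERS, root))
    # Letters that are not placeholders (e.g. ا, م, و) are copied unchanged.
    # Limitation: a pattern that uses ف, ع or ل as an affix letter would be
    # substituted incorrectly by this simple approach.
    return "".join(mapping.get(ch, ch) for ch in pattern)


if __name__ == "__main__":
    root = "كتب"  # k-t-b, the root associated with writing
    for pattern in ("فاعل", "مفعول", "مفعلة"):
        print(pattern, "->", apply_pattern(root, pattern))
```

With the root كتب, the three patterns give كاتب (writer), مكتوب (written) and مكتبة (library), which shows how many surface words can share a single root.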
9
Example*
*Microsoft Arabic NLP Toolkit (ATK) For Academia in the Arab World Presentation, 11/2012
10
Arabic Language Characteristics
Synonymy and confusion of non-standardized terms
– Thermometer: محر، محرار، مقياس حرارة، ميزان حرارة، ترمومتر
Technical translation
– Hydrometer: جهاز قياس كثافة السوائل (literally, "device for measuring the density of liquids")
Uncle, parent…
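One common way to cope with non-standardized terms in search is to expand a query with all known variants of a concept. The sketch below assumes a small hand-built synonym table seeded with the thermometer terms from this slide. The names `SYNONYMS` and `expand_query` are hypothetical, and a real system would draw variants from a lexical resource rather than a hard-coded dictionary.

```python
# Toy synonym table (assumption: hand-built, using the terms from the slide).
SYNONYMS = {
    "thermometer": ["محر", "محرار", "مقياس حرارة", "ميزان حرارة", "ترمومتر"],
}


def expand_query(term: str) -> list[str]:
    """Return every known variant of the concept that `term` belongs to."""
    for variants in SYNONYMS.values():
        if term in variants:
            return list(variants)
    # Unknown terms are passed through unchanged.
    return [term]


if __name__ == "__main__":
    print(expand_query("ترمومتر"))
```

A search for any one of the five terms would then also match documents that use the other four.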
11
Letters
One letter, one sound
Letters change shape
Hamza
No capital letters
Can use normalization
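As an illustration of what normalization can look like, the sketch below applies a few commonly used rules: it strips diacritics and tatweel, folds the hamza-carrying forms of alef into a bare alef, and folds alef maqsura and ta marbuta into related letters. The exact rule set is an assumption. Different systems choose different rules, and some of these (especially the ta marbuta rule) discard information.

```python
import re

# Short-vowel and related marks (U+064B..U+0652) plus superscript alef (U+0670).
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
TATWEEL = "\u0640"  # elongation character, purely typographic
ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")  # آ أ إ


def normalize_arabic(text: str) -> str:
    """Apply a small, lossy set of normalization rules (illustrative only)."""
    text = DIACRITICS.sub("", text)           # drop diacritics
    text = text.replace(TATWEEL, "")          # drop tatweel
    text = ALEF_VARIANTS.sub("\u0627", text)  # آ أ إ -> ا
    text = text.replace("\u0649", "\u064A")   # ى -> ي
    text = text.replace("\u0629", "\u0647")   # ة -> ه
    return text


if __name__ == "__main__":
    for word in ("أحمد", "إبراهيم", "عِلْم"):
        print(word, "->", normalize_arabic(word))
```

After normalization, spellings such as أحمد and احمد map to the same string, which helps matching but also merges words that were distinct before.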
12
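Ambiguity
Homographs
– قدم (without diacritics this can be read as, for example, "foot", "he presented" or "antiquity")
Internal word structure ambiguity
– بعقوبة (the city name Baqubah, or the preposition بـ plus عقوبة, "with a punishment")
Syntactic ambiguity
– قابلت مدير البنك الجديد ("I met the new bank manager" or "I met the manager of the new bank")
Semantic ambiguity
– يحب علي احمد اكثر من ابراهيم ("Ali likes Ahmad more than Ibrahim does" or "Ali likes Ahmad more than he likes Ibrahim")
Anaphoric ambiguity
– قابل الصحفي الوزير الذي انتقده ("The journalist met the minister who criticized him" or "the minister whom he criticized")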
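The internal word structure case can be sketched in a few lines. The example below assumes a toy lexicon and a short list of single-letter proclitics, and it enumerates every way the word بعقوبة can be split into an optional proclitic plus a known stem. The names `PROCLITICS`, `LEXICON` and `candidate_analyses` are hypothetical. A real analyzer would use a full lexicon and handle suffixes and stacked clitics as well.

```python
# Single-letter proclitics considered by this sketch (assumption: not exhaustive).
PROCLITICS = ["و", "ف", "ب", "ل", "ك"]

# Toy lexicon: the city name Baqubah and the noun "punishment".
LEXICON = {"بعقوبة", "عقوبة"}


def candidate_analyses(word: str, lexicon: set[str] = LEXICON) -> list[tuple[str, str]]:
    """Return (proclitic, stem) pairs whose stem is in the lexicon."""
    analyses = []
    if word in lexicon:
        analyses.append(("", word))  # whole word is a known entry
    for clitic in PROCLITICS:
        stem = word[len(clitic):]
        if word.startswith(clitic) and stem in lexicon:
            analyses.append((clitic, stem))
    return analyses


if __name__ == "__main__":
    for clitic, stem in candidate_analyses("بعقوبة"):
        print(repr(clitic), stem)
```

With this lexicon the word receives two analyses, one as a single token and one as ب plus عقوبة. Choosing between them needs context, which is why this kind of ambiguity is hard to resolve at the word level.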
13
NLP
Automatic summarization
Machine translation
Named entity recognition (NER)
Natural language generation
Natural language understanding
Optical character recognition (OCR)
Question answering
Sentiment analysis
Speech recognition
Word sense disambiguation
Information retrieval (IR)
Speech processing
Text-to-speech
Natural language search
Automated essay scoring
etc.
14
Question Answering**
**Hammo et al. QARAB: A Question Answering System to Support the Arabic Language. Workshop on Computational Approaches to Semitic Languages, ACL 2002.
15
Arabic NLP Issues
Lack of tools
Lack of linguistic references
Lack of training data
16
Available Tools
Arabic Treebank
Arabic WordNet
– MySQL database
– SUMO Ontology
– Java
Microsoft Arabic Toolkit (ATK)
17
Summary
Arabic is difficult to deal with
Progress has been made
More work is being done on different parts
Any progress is valuable
– Business
– Personal
– Governmental
18
Thank you