Data Mining and Text Analytics: Quranic Arabic Corpus
By Saima Rahna & Anees Mohammad



Summary
● The Quranic Arabic Corpus enables further analysis of the Quran
● It provides linguistic annotation (e.g. morphology and syntax) for each word and verse of the Quran
● Automated algorithms were applied to the text of the Quran

Introduction
● Islam was born in Arabia (1400 years ago)
● The key sacred texts are in Arabic
● Only a minority of Muslims can speak and understand Arabic
● A larger percentage of Muslims know English as a second or even first language
● Web and book resources use English in parallel with Arabic

Data Mining
● Uses tools and techniques to extract information from data
● Different aspects of a single topic in the Quran can reappear in many chapters
● Frequent patterns can therefore be used to construct a subject index in which all verses on a single topic can be found easily

Text Analytics
● Also referred to as information extraction
● The Quranic Arabic Corpus is an advantage to those who don't understand Arabic
● Can give English readers a better insight into the source
● Translation is provided at a detailed level of text analysis

Resources & Techniques
Statistical techniques
● Statistical techniques such as keyword extraction are implemented
● Can explore semiotic relationships between sound and meaning in the Quran
● Recognise recurring patterns with a high level of accuracy
Linguistic resources
● Arabic grammar and syntax are annotated for each word in the Quran
● An online comment-based system lets visitors discuss and correct the data
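The slides do not say which keyword-extraction method was used. As a hedged illustration only, the sketch below ranks words by how over-represented they are in a target text compared with a reference text. The function name `keywords`, the add-one smoothing and the toy English word lists are all assumptions made for this example, not part of the Quranic Arabic Corpus.

```python
from collections import Counter

def keywords(target_words, reference_words, top_n=3):
    """Rank words by their relative frequency in the target versus the reference."""
    target = Counter(target_words)
    reference = Counter(reference_words)
    t_total = len(target_words)
    r_total = len(reference_words)
    scores = {}
    for word, count in target.items():
        # Add-one smoothing (an assumption) so that words missing from the
        # reference text do not cause a division by zero.
        ref_rate = (reference[word] + 1) / (r_total + 1)
        scores[word] = (count / t_total) / ref_rate
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Toy data, purely for illustration
target = "the garden of the garden".split()
reference = "the book of the people of the day".split()
top = keywords(target, reference, top_n=1)
```

Here `top` contains the word that is most characteristic of the target text. Common function words score low because they are just as frequent in the reference.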

Algorithms
● The Quranic Arabic Corpus uses Java to implement its algorithms
● Search feature (searching for concepts and keywords in the Holy Quran)
● Finding multi-word repetitions
● Mining frequent patterns into a graph
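Finding multi-word repetitions can be sketched by counting word n-grams and keeping those that occur more than once. This is a minimal sketch under stated assumptions: the slides say the corpus itself is implemented in Java, and Python is used here only for brevity. The function `repeated_ngrams`, the whitespace tokenisation and the toy English verses are hypothetical.

```python
from collections import Counter

def repeated_ngrams(verses, n=2, min_count=2):
    """Return word n-grams that occur at least `min_count` times across all verses."""
    counts = Counter()
    for text in verses:
        words = text.split()  # whitespace tokenisation is an assumption
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}

# Toy data, purely for illustration
verses = [
    "the most gracious the most merciful",
    "praise be to god the most gracious",
]
repeats = repeated_ngrams(verses, n=2)
```

With this toy input, `repeats` keeps only the two-word phrases that appear at least twice. Raising `n` looks for longer repeated phrases.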

Algorithm for indexing the Quran
When a word is encountered for the first time, it is added to the index; if it already exists there, the new location is added to its list.

For each verse V
    parse word list -> list(W)
    For each word W
        If INDEX does not contain W
            add W and W.location to INDEX
        Else
            fetch W in INDEX
            add new location to W
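The indexing pseudocode can be sketched in a few lines of Python. This is an illustration rather than the corpus's own Java code: `build_index`, the `verse_id -> text` input format and the toy English verses are assumptions. A `defaultdict` folds the "first time" and "already exists" branches into a single append.

```python
from collections import defaultdict

def build_index(verses):
    """Map each word to the list of (verse_id, position) locations where it occurs."""
    index = defaultdict(list)
    for verse_id, text in verses.items():
        # Positions are 1-based word offsets within the verse (an assumption).
        for position, word in enumerate(text.split(), start=1):
            # A new word gets an empty list automatically; a known word
            # simply gains one more location.
            index[word].append((verse_id, position))
    return dict(index)

# Toy data, purely for illustration
verses = {
    "1:1": "in the name of god",
    "1:2": "praise be to god lord of the worlds",
}
index = build_index(verses)
```

Looking up `index["god"]` then returns every location of that word, which is the basis for a keyword search feature.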

Filtering algorithm
● The Quranic 'quote filtering' algorithm
● The text of the Quran uses Arabic diacritics (symbols)
● The filtering algorithm applies 3 filtering stages to the input text

Algorithm: Sub-path Mining
● This is used to generate frequent patterns within the Quran corpus
● The process starts by scanning the transaction database and calculating the count for each vertex in the graph
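The first pass of sub-path mining, counting each vertex while scanning the transaction database, might look like the sketch below. Several things here are assumptions not stated in the slides: each transaction is modelled as a list of vertex labels, a vertex is counted once per transaction, and the names `vertex_counts`, `frequent_vertices` and `min_support` are hypothetical.

```python
from collections import Counter

def vertex_counts(transactions):
    """Count, for each vertex, the number of transactions that contain it."""
    counts = Counter()
    for path in transactions:
        counts.update(set(path))  # count each vertex once per transaction
    return counts

def frequent_vertices(transactions, min_support):
    """Keep only the vertices that appear in at least `min_support` transactions."""
    counts = vertex_counts(transactions)
    return {v for v, c in counts.items() if c >= min_support}

# Toy data, purely for illustration
transactions = [
    ["mercy", "forgiveness", "prayer"],
    ["mercy", "prayer"],
    ["charity", "prayer", "mercy"],
    ["charity", "patience"],
]
counts = vertex_counts(transactions)
frequent = frequent_vertices(transactions, min_support=2)
```

Vertices below the support threshold can be discarded before any longer sub-paths are examined, which keeps the later passes small.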

Conclusion
● Algorithms used
● Resources and techniques used for implementation of the Quranic Arabic Corpus
● How data mining is applied
● How text analytics has also been applied

Thank you :-)