Accessing an Information System by Chatting Bayan Abu Shawar and Eric Atwell School of Computing University.


1 Accessing an Information System by Chatting
Bayan Abu Shawar and Eric Atwell
bshawar@comp.leeds.ac.uk, eric@comp.leeds.ac.uk
School of Computing, University of Leeds

2 Presentation Outline
• Introduction.
• Chatbot and corpus definitions.
• ALICE chatbot system.
• What has been done so far.
• System architecture of the Qur’an chatbot.
• Results and Evaluation.

3 Introduction
Methods of accessing an information system:
• Information Retrieval (IR): retrieves a relevant subset of documents from a large set.
• Information Extraction (IE): extracts specific pieces of data from documents to fill a list of slots in predefined templates.
We present another way to access an information system: using a chatbot tool.

4 Definitions
• A chatbot is a computer program designed to simulate human conversation.
• The user chats with the bot using textual or spoken natural language.
• The chatbot must have access to knowledge (e.g., a set of input/output rules) so that it can accept input and match it against the rules to generate replies in the conversation.
• We developed a machine learning approach that automatically generates chatting rules from machine-readable texts (corpora) and converts them to the ALICE chatbot format.

5 ALICE System
ALICE: the Artificial Linguistic Internet Computer Entity, a software robot that you can chat with using natural language.
ALICE is composed of two parts:
1. The chatbot engine
2. The language model
The ALICE language model is stored in AIML files.
AIML: the Artificial Intelligence Markup Language.

6 The AIML Format
<category> <pattern>PATTERN</pattern> <template>Template</template> </category> ..

7 Implementing a Java Program
The primary goal of chatbots is to mimic real human conversations. We developed a Java program to read from ‘real’ human dialogues and generate conversational rules for the ALICE chatbot.
• The program reads a dialogue corpus.
• It converts the dialogue transcript to AIML format.
• The output AIML is used to retrain ALICE.
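The three steps on this slide can be sketched in a few lines of Java. This is only an illustration, not the authors' program: the class name `DialogueToAiml`, its methods, and the rule that each turn becomes the pattern for the turn that follows it are all assumptions made for the example. It uses the standard AIML `<category>`, `<pattern>` and `<template>` elements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: turn a list of dialogue turns into AIML categories.
final class DialogueToAiml {

    // AIML patterns are conventionally upper-case with punctuation removed.
    static String normalisePattern(String utterance) {
        return utterance.toUpperCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{N}\\s]", " ")
                .trim()
                .replaceAll("\\s+", " ");
    }

    // Escape the characters that would break the XML of the template.
    static String escapeXml(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // One category: the input utterance is the pattern, the reply is the template.
    static String toCategory(String input, String reply) {
        return "<category><pattern>" + normalisePattern(input) + "</pattern>"
                + "<template>" + escapeXml(reply.trim()) + "</template></category>";
    }

    // Assumption: every turn is paired with the turn that follows it.
    static List<String> convert(List<String> turns) {
        List<String> categories = new ArrayList<>();
        for (int i = 0; i + 1 < turns.size(); i++) {
            categories.add(toCategory(turns.get(i), turns.get(i + 1)));
        }
        return categories;
    }

    public static void main(String[] args) {
        List<String> turns = List.of("Hello, how are you?", "Fine, thanks.", "Good to hear.");
        System.out.println("<aiml version=\"1.0\">");
        convert(turns).forEach(c -> System.out.println("  " + c));
        System.out.println("</aiml>");
    }
}
```

A real converter would also need to handle speaker labels, overlapping turns and other transcript markup, which this sketch ignores.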

8 The Aim of the Automatic Process
• Saving the time and effort of encoding the knowledge manually.
• Generating different versions of the chatbot that are not restricted to a specific language and/or domain.
• Creating versions that simulate ‘real’ human conversation.
Machine Learning Approach
• Using the most-significant-word approach, based on the observation that people usually respond according to the most significant word in what they hear.
• A frequency list is obtained from each corpus and then used to find the least frequent word in each utterance, which is treated as its most significant word.
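As a rough illustration of the most-significant-word idea, the sketch below builds a frequency list from a small corpus and picks the least frequent word of an utterance. The class and method names (`LeastFrequentWord`, `frequencyList`, `leastFrequent`), the tokenisation, and the tie-breaking rule (first word wins) are assumptions for the example; the slides do not say how the authors handled these details.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the "least frequent word = most significant word" heuristic.
final class LeastFrequentWord {

    // Lower-case the text, drop punctuation, and split on whitespace.
    static String[] tokenise(String text) {
        String cleaned = text.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{N}\\s]", " ")
                .trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split("\\s+");
    }

    // Frequency list: how often each word occurs across the whole corpus.
    static Map<String, Integer> frequencyList(List<String> corpus) {
        Map<String, Integer> counts = new HashMap<>();
        for (String utterance : corpus) {
            for (String word : tokenise(utterance)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // The word of the utterance with the lowest corpus frequency
    // (first such word on ties; null for an empty utterance).
    static String leastFrequent(String utterance, Map<String, Integer> counts) {
        String best = null;
        int bestCount = Integer.MAX_VALUE;
        for (String word : tokenise(utterance)) {
            int count = counts.getOrDefault(word, 0);
            if (count < bestCount) {
                best = word;
                bestCount = count;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> corpus = List.of(
                "the weather is nice", "the train is late", "is the weather bad");
        Map<String, Integer> counts = frequencyList(corpus);
        System.out.println(leastFrequent("is the weather bad", counts));
    }
}
```

The selected word could then be used as the key term of an AIML pattern, so that an input containing that word triggers the stored reply.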

9 The Dialogue Corpora Used so Far
• Minnesota: French dialogue corpus.
• Spoken Afrikaans: Afrikaans dialogue corpus.
• British National Corpus (BNC): spoken transcripts.

10 The Holy Book of Islam (Qur’an)
• The Qur’an is written in classical Arabic.
• The Qur’an consists of 114 sooras (chapters), which are grouped into 30 parts.
• Each soora consists of sequential verses (sections).

11 The Original English Text Format of the Qur’an
Sample:
THE DAYBREAK, DAWN, CHAPTER NO. 113
With the Name of Allah, the Merciful Benefactor, The Merciful Redeemer
113.1 Say: I seek refuge with the Lord of the Dawn
113.2 From the mischief of created things;
113.3 From the mischief of Darkness as it overspreads;
113.4 From the mischief of those who practise secret arts;
113.5 And from the mischief of the envious one as he practises envy.
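Given the sample format above, where each verse begins with a `chapter.verse` number, the text could be split into verse records roughly as follows. This is a sketch under assumptions: it expects one verse per line, it skips header lines that do not start with a verse number, and the names `VerseParser` and `Verse` are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: parse lines such as "113.1 Say: I seek refuge ..." into verses.
final class VerseParser {

    static final class Verse {
        final int chapter;
        final int number;
        final String text;

        Verse(int chapter, int number, String text) {
            this.chapter = chapter;
            this.number = number;
            this.text = text;
        }
    }

    // "<chapter>.<verse> <text>", e.g. "113.2 From the mischief of created things;"
    private static final Pattern VERSE_LINE = Pattern.compile("^(\\d+)\\.(\\d+)\\s+(.+)$");

    // Lines that do not match (chapter titles, the opening invocation) are skipped.
    static List<Verse> parse(List<String> lines) {
        List<Verse> verses = new ArrayList<>();
        for (String line : lines) {
            Matcher m = VERSE_LINE.matcher(line.trim());
            if (m.matches()) {
                verses.add(new Verse(
                        Integer.parseInt(m.group(1)),
                        Integer.parseInt(m.group(2)),
                        m.group(3)));
            }
        }
        return verses;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "THE DAYBREAK, DAWN, CHAPTER NO. 113",
                "113.1 Say: I seek refuge with the Lord of the Dawn",
                "113.2 From the mischief of created things;");
        for (Verse v : parse(sample)) {
            System.out.println(v.chapter + ":" + v.number + " -> " + v.text);
        }
    }
}
```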

12 Using the Qur’an as a Trainable Corpus
We selected the Qur’an to investigate:
1. Whether we could access an information source via chatting.
2. How to convert a written text to the AIML format.
3. How to adapt ALICE to learn from a text that is not a dialogue transcript.
4. How to adapt the ALICE interpreter to recognise Arabic characters.

13 The Qur’an Chatbot
• In this chatbot we used a parallel English/Arabic corpus.
• Input: a statement, question or verse in English.
• Output: verse(s) extracted from the Qur’an in both English and Arabic.
Problems raised:
1. How to divide a non-conversational text into utterance-like chunks?
2. How to enable the ALICE interpreter to recognise Arabic characters?
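One possible way to approach both problems is sketched below. It treats each verse as an utterance whose reply is the following verse in English and Arabic, and it writes the AIML explicitly as UTF-8 so that the Arabic text survives. The pairing rule is inferred from the sample chat in these slides rather than stated by the authors, and the class `BilingualCategories`, its methods, and the `<aiml version="1.0">` header are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: English verse as pattern, next verse (English + Arabic) as template.
final class BilingualCategories {

    // Upper-case the verse and strip punctuation so it can serve as an AIML pattern.
    static String pattern(String englishVerse) {
        return englishVerse.toUpperCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{N}\\s]", " ")
                .trim()
                .replaceAll("\\s+", " ");
    }

    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Assumption: a verse typed by the user is answered with the verse that follows it.
    static String category(String verse, String nextEnglish, String nextArabic) {
        return "<category><pattern>" + pattern(verse) + "</pattern><template>"
                + escape(nextEnglish) + " " + escape(nextArabic)
                + "</template></category>";
    }

    // Encode the document as UTF-8 bytes and declare that encoding in the XML header.
    static byte[] toUtf8Aiml(List<String> categories) {
        StringBuilder sb = new StringBuilder(
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<aiml version=\"1.0\">\n");
        for (String c : categories) {
            sb.append("  ").append(c).append('\n');
        }
        sb.append("</aiml>\n");
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String c = category(
                "He begetteth not, nor is He begotten;",
                "And there is none like unto Him.",
                "وَلَمْ يَكُن لَّهُ كُفُوًا أَحَدٌ");
        byte[] bytes = toUtf8Aiml(List.of(c));
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

Whether the interpreter itself then matches and displays Arabic correctly depends on how it reads its AIML files, which this sketch does not address.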

14 System Architecture
[Figure: system architecture diagram. Labels: Dialogue corpora; Reading files; Filtration; Text pre-processing Phase; Reiteration; Preparation; Restructuring Phase; Investigation Patterns Equalities; Investigation Templates Equalities; String Vector; AIML FILE]

15 Results
Two versions of ALICE have been generated:
Qur’an0-13: http://www.pandorabots.com/pandora/talk?botid=832e8a088e348e83
Qur’an14-114: http://www.pandorabots.com/pandora/talk?botid=b60ec43cbe3481fe

16 Sample of Chatting Dialogue
Human: Is it right that our protector is Allah?
Qur'an14-114: {Muhammad (Muhammad)} 11 That is because Allah is the Protector of those who believe, but those who reject Allah have no protector.
ذَلِكَ بِأَنَّ اللَّهَ مَوْلَى الَّذِينَ آمَنُوا وَأَنَّ الْكَافِرِينَ لَا مَوْلَى لَهُمْ ( سورة محمد )11
{The Morning Star (At-Tariq)} 4 There is no soul but has a protector over it.
إِن كُلُّ نَفْسٍ لَّمَّا عَلَيْهَا حَافِظٌ ( سورة الطارق )4
Human: Allah, the eternal, Absolute
Qur'an14-114: {Sincerity (Al-Iklas)} 3 He begetteth not, nor is He begotten;
لَمْ يَلِدْ وَلَمْ يُولَدْ ( سورة الإخلاص )3
Human: He begetteth not, nor is He begotten
Qur'an14-114: {Sincerity (Al-Iklas)} 4 And there is none like unto Him.
وَلَمْ يَكُن لَّهُ كُفُوًا أَحَدٌ ( سورة الإخلاص )4

17 Evaluation
Problems of evaluation:
1. It is not easy to evaluate this kind of general information access.
2. The source is not organised in terms of specific questions.
3. We cannot count the number of hits, so precision and recall scores cannot be computed.

18 Evaluation (cont.)
We evaluate the system in terms of:
1. Achieving the proposed objectives.
2. Finding possible uses for the system.
3. User satisfaction.

19 Evaluation (cont.)
1. We achieved our goals: using a text that is not conversational in nature, and handling the Arabic language.
2. Feedback from users was as follows:
• Some users found the tool unsatisfactory because it does not provide answers to questions.
• Others found it interesting as a way to:
a. Learn more about the Qur’an.
b. Find out which soora a certain verse comes from.

20 Conclusions
1. We presented a novel way of accessing information from an online source by having an informal chat.
2. The system may be used as a search tool for verses that contain the same words but have different connotations.
3. It may be useful for finding the soora name of a certain verse.
4. Students could use it as a new method to recite the Qur’an.

21 Thank YOU ?

