Final Presentation: Industrial project 234313, Automatic tagging tool for Hebrew Wiki pages


1 Final Presentation
Industrial project 234313
Automatic tagging tool for Hebrew Wiki pages
Supervisors: Dr. Miri Rabinovitz, Dr. Haim Mizrahi
Academic coordinator: Prof. Michael Elad
Students: Eyal Sharabi Horwitz, Shiran Cohen

2 Project Objectives
• This project is part of the overall development of an organizational Wiki meant for sharing information within the organization.
• Our project’s objective is to serve as an automatic tagging tool for key phrases, based on an organizational taxonomy. The project is composed of two separate modules: a service module and a GUI module.
• The objectives of the service module:
  - Identify key phrases that relate to an organizational taxonomy in an unstructured text.
  - Develop and implement algorithms to identify and extract new key phrases from a given document.

3 Project Objectives – cont.
• The objectives of the service module (cont.):
  - Present the findings in an Excel file to allow future analysis of the key phrases found by the automatic tagging tool.
• The objectives of the GUI module:
  - Design an interface that enables the user to analyze the key phrases found by the automatic tagging tool:
    - Insert a new key phrase into the taxonomy.
    - Delete a key phrase suggested by the automatic tagging tool.
    - Edit the text of a key phrase suggested by the automatic tagging tool before adding it to the taxonomy.
  - Present the rationale that led the service module to find a key phrase.
  - Allow the user to add new key phrases to the taxonomy.

4 Methodology
• In-depth understanding of the morphologically analyzed documents and of the taxonomy, and use of this information in the different tagging algorithms.
• A literature survey was used to develop algorithms that present new key phrases to the user from a given document:
  - Frequency-based tagging algorithm: checks how frequently a key phrase appears in a given document and in the whole corpus.
  - Location-based tagging algorithm: scores a key phrase based on its distance from the beginning and end of the document and on its life span in the document.
  - Noun tagging algorithm: gives a higher score to key phrases with multiple nouns.
• Microsoft’s .NET WinForms API was used to create the GUI.
• An Access DB was used to save the information about the key phrases used by the different algorithms, and to save the updated taxonomy.
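The slides do not give the scoring formulas, so the following is only a minimal Python sketch of how the three scores could be computed. The function names, the TF-IDF-style frequency formula, the equal weighting of edge distance and life span, and the "NOUN" tag label are all assumptions made for illustration, not the project's actual implementation.

```python
import math


def count_occurrences(phrase, tokens):
    """Count how many times the token sequence `phrase` occurs in `tokens`."""
    n = len(phrase)
    target = tuple(phrase)
    return sum(1 for i in range(len(tokens) - n + 1)
               if tuple(tokens[i:i + n]) == target)


def frequency_score(phrase, doc_tokens, corpus_docs):
    """Assumed TF-IDF-style score: frequent in the document, rare in the corpus."""
    tf = count_occurrences(phrase, doc_tokens) / max(len(doc_tokens), 1)
    docs_with_phrase = sum(1 for d in corpus_docs
                           if count_occurrences(phrase, d) > 0)
    idf = math.log((1 + len(corpus_docs)) / (1 + docs_with_phrase)) + 1
    return tf * idf


def location_score(positions, doc_len):
    """Assumed rule: phrases near either edge and with a long life span score higher.

    `positions` are the token indices at which the phrase starts.
    """
    if not positions or doc_len <= 1:
        return 0.0
    first, last = min(positions), max(positions)
    edge_distance = min(first, doc_len - 1 - last) / (doc_len - 1)
    life_span = (last - first) / (doc_len - 1)
    return 0.5 * (1 - edge_distance) + 0.5 * life_span


def noun_score(pos_tags):
    """Assumed rule: 0 for no nouns, 0.5 for one noun, 1.0 for two or more."""
    nouns = sum(1 for tag in pos_tags if tag == "NOUN")
    return min(nouns, 2) / 2
```

In a real pipeline the tokens and part-of-speech tags would come from the morphology analyzer's output rather than from plain whitespace splitting.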

5 Achievements
• The Service Module
  - Implemented an algorithm for identifying key phrases from the taxonomy in a given text, using an advanced screening process for similar key phrases.
  - Implemented several tagging algorithms used to suggest new key phrases to the user:
    - Frequency, location and noun based tagging (presented in the methodology section)
    - Foreign language tagging: tagging the foreign-language phrases in the text
  - Flexibility:
    - GUI-process separation to allow portability and usage with various systems
    - Expansion of the taxonomy to effectively unlimited size
    - New tagging algorithms can be added easily to the process
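As an illustration of the foreign language tagging mentioned above, here is a small Python sketch. It assumes that "foreign" means runs of Latin-script words inside Hebrew text; the slides do not say how the tool actually detects foreign phrases, and the function name `tag_foreign_phrases` is hypothetical.

```python
import re

# Assumption: a foreign phrase is one or more Latin-script words,
# joined by a single space, dot or hyphen.
FOREIGN_PHRASE = re.compile(r"[A-Za-z][A-Za-z0-9]*(?:[ .\-][A-Za-z][A-Za-z0-9]*)*")


def tag_foreign_phrases(text):
    """Return (phrase, start_offset) pairs for Latin-script runs in `text`."""
    return [(m.group(0), m.start()) for m in FOREIGN_PHRASE.finditer(text)]


if __name__ == "__main__":
    sample = "המערכת נבנתה בעזרת Microsoft Access ו-WinForms"
    for phrase, start in tag_foreign_phrases(sample):
        print(start, phrase)
```

Offsets are logical string indices, so they stay valid regardless of how a right-to-left display renders the mixed text.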

6 Achievements – cont.
• The GUI
  - An interface was created to enable the user to analyze the key phrases found by the automatic tagging tool:
    - Insert a new key phrase into the taxonomy, either under an existing main subject and secondary subject in the taxonomy hierarchy or under newly added ones.
    - Delete a key phrase.
    - Edit the text of a key phrase suggested by the automatic tagging tool before adding it to the taxonomy.
  - Present the rationale that led the service module to find a key phrase.
  - Allow the user to add new key phrases to the taxonomy by marking the desired text in the document.

7 Achievements – cont.
• The GUI (cont.)
  - Algorithm selection window:
    - Used to select which algorithms are run to find new key phrases in a given text.
    - Lets the user manage the parameters of the different algorithms, assign a different weight to each algorithm, and assign different weights to phrases of different sizes so that the tagging process prefers phrases of a certain size.
  - Saving the findings for future use and analysis:
    - Enables the user to save the current taxonomy into the DB for future use with other documents.
    - Enables the user to save the current taxonomy and the new key phrases found by the automatic tagging tool to an Excel file for future analysis.
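The weighting described for the algorithm selection window could work roughly as in the Python sketch below. The slides do not specify how the weights are combined, so the weighted average, the multiplicative phrase-size weight, the threshold, and all names (`combined_score`, `suggest`) are assumptions for illustration only.

```python
def combined_score(scores, algo_weights, phrase_len, size_weights):
    """Weighted average of per-algorithm scores, scaled by a phrase-size weight.

    `scores` maps algorithm name -> score; algorithms missing from
    `algo_weights` are treated as not selected (weight 0).
    """
    total_weight = sum(algo_weights.get(name, 0.0) for name in scores)
    if total_weight == 0:
        return 0.0
    weighted = sum(algo_weights.get(name, 0.0) * score
                   for name, score in scores.items()) / total_weight
    return weighted * size_weights.get(phrase_len, 1.0)


def suggest(candidates, algo_weights, size_weights, threshold):
    """Return (phrase, score) pairs at or above `threshold`, best first.

    `candidates` maps a phrase (tuple of tokens) -> its per-algorithm scores.
    """
    scored = [(phrase, combined_score(s, algo_weights, len(phrase), size_weights))
              for phrase, s in candidates.items()]
    kept = [(phrase, score) for phrase, score in scored if score >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

With weights `{"frequency": 3, "location": 1}`, a phrase scoring 0.8 on frequency and 0.4 on location gets a combined score of about 0.7 before the phrase-size weight is applied.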

8 Documentation provided
• User’s manual
• Developers’ guide
• Inline documentation of the code

9 Example of the tagging process: a new document is loaded into the automatic tagging tool.

10 Pressing the “Initiate Tagging” button starts the tagging process. Presented here are the results of the taxonomy-based tagging algorithms.

11 The user can press on the phrases that were found and see their location in the document and their location in the hierarchy of the organizational taxonomy
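A taxonomy lookup that reports both the location in the document and the place in the hierarchy might be sketched in Python as follows. The nested-dictionary taxonomy layout (main subject, secondary subject, key phrases), the longest-match rule used to screen overlapping phrases, and the function names are assumptions; the sketch also matches exact surface tokens, whereas Hebrew text would first need normalization by the morphology analyzer.

```python
def index_taxonomy(taxonomy):
    """Flatten {main: {secondary: [phrases]}} into {phrase_tokens: (main, secondary)}."""
    index = {}
    for main, secondaries in taxonomy.items():
        for secondary, phrases in secondaries.items():
            for phrase in phrases:
                index[tuple(phrase.split())] = (main, secondary)
    return index


def find_taxonomy_phrases(tokens, index):
    """Scan left to right, preferring the longest taxonomy phrase at each position.

    Returns (token_position, phrase_text, (main_subject, secondary_subject)) triples.
    """
    max_len = max((len(p) for p in index), default=0)
    hits = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in index:
                hits.append((i, " ".join(candidate), index[candidate]))
                i += n  # skip past the match so shorter overlaps are screened out
                break
        else:
            i += 1
    return hits
```

For example, with the phrases "access control" and "access control list" both in the taxonomy, the text "the access control list is empty" yields a single hit for the longer phrase at token position 1.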

12 The user can choose which of the implemented tagging algorithms to run, and the weight each one carries in deciding whether a phrase found in the document is presented as a new suggested key phrase.

13 The new key phrases found by the automatic tagging tool are presented to the user, who can choose whether to approve or delete each suggested key phrase.

14 If the user chooses to approve a key phrase, an editing window opens in which the user decides where the new key phrase belongs in the taxonomy hierarchy.

15 The user can save all the new findings and the new taxonomy to an Excel file or to the DB for future use and analysis.

16 Conclusions
• When developing a system, large or small, one must take the time to plan and create a high-level design rather than rush to implement the system.
• A considerable amount of time should be dedicated to fully understanding the morphology analyzer’s output.
• To optimize the system’s output, it should be tested on a large document corpus.
• This course taught us a great deal about working with different software tools, developing a large system and working in a team.

17 Points for improvement
• Choose a particular appearance of a key phrase in the text based on a high number of key phrases surrounding it.
• Integrate algorithms that use advanced natural language processing tools for better understanding of the text.
• Add machine learning abilities that enable the system to adjust the parameters of the different algorithms as it analyzes more documents.
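The first improvement above was not implemented in the project; one possible reading of it is sketched here in Python. The window size, the counting rule and the name `best_occurrence` are assumptions for illustration.

```python
def best_occurrence(candidate_positions, keyphrase_positions, window=20):
    """Pick the occurrence with the most other key phrases within `window` tokens.

    `candidate_positions` are token positions of one phrase's appearances;
    `keyphrase_positions` are token positions of all key phrases found in the text.
    Returns None when there are no candidates; ties go to the earliest candidate.
    """
    if not candidate_positions:
        return None

    def density(pos):
        return sum(1 for k in keyphrase_positions
                   if k != pos and abs(k - pos) <= window)

    return max(candidate_positions, key=density)
```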

