Ontology Based Annotation of Text Segments Presented by Ahmed Rafea Samhaa R. El-Beltagy Maryam Hazman.

Presentation transcript:

Ontology Based Annotation of Text Segments Presented by Ahmed Rafea Samhaa R. El-Beltagy Maryam Hazman

Agenda
- Objective
- Problems related to AGROVOC
- Requirements of the Proposed Annotation System
- The Architecture of the Proposed Annotation System
- Evaluation
- Conclusion and Future Work

Objective
To annotate segments of organizational electronic publications and web pages with domain metadata, using the segment headings and an ontology, in order to enhance the quality and focus of the information retrieved when these publications are searched.

Problems related to AGROVOC
- Some Arabic terms are missing although they are available in the English version, e.g. Mosaic viruses, Physiological disorders.
- Arabic agricultural terminology differs from one country to another, e.g. wheat is حنطة in AGROVOC but قمح in Egypt.
- Some agricultural entries are very specific to a country (for example, country-specific crop varieties).
- Some important concepts are missing from AGROVOC altogether, e.g. all instances of viral diseases (the entry exists in AGROVOC but has no narrower terms).
- Inaccurate Arabic terms: e.g. the term "cultivation" is translated as فلاحة الأرض, which narrows the actual meaning of the term. A more accurate translation could have been ممارسات زراعية.
- Missing relationships: e.g. the terms ‘Viral diseases’ and ‘Bacterial diseases’ are not listed as narrower terms of ‘Plant Diseases’.
- Inaccurate relationships: the concept نباتات ضارة (Noxious plants) is listed not as a narrower term (NT) of نباتات (Plants) but as a related term to it.

Requirements of the Annotation System
It is required to build a system that is capable of:
- Extending or customizing an existing ontology (automatically / semi-automatically).
- Identifying multiple possible descriptors associated with any single segment.
- Annotating segments with as specific concepts as possible.
- Normalizing input text and the ontology through stemming to facilitate matching.
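The last three requirements can be illustrated with a minimal sketch. This is not the presenters' implementation. The tiny `BROADER` ontology, the suffix-stripping `stem` function (a stand-in for a real Arabic or English stemmer), and every function name here are assumptions made for illustration only.

```python
import re

# Hypothetical toy ontology: concept -> broader (parent) concept, None for roots.
BROADER = {
    "plants": None,
    "plant diseases": "plants",
    "viral diseases": "plant diseases",
    "bacterial diseases": "plant diseases",
    "irrigation": None,
}

def stem(word):
    """Toy suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def normalize(text):
    """Lowercase, keep runs of letters, and stem each token."""
    return tuple(stem(w) for w in re.findall(r"[^\W\d_]+", text.lower()))

def contains(tokens, phrase):
    """True if phrase occurs as a contiguous run of tokens."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

def ancestors(concept):
    """All broader concepts of a concept, nearest first."""
    out = []
    parent = BROADER.get(concept)
    while parent is not None:
        out.append(parent)
        parent = BROADER.get(parent)
    return out

def annotate(heading):
    """Return all matched concepts, dropping any that is broader than another match."""
    tokens = normalize(heading)
    matched = [c for c in BROADER if contains(tokens, normalize(c))]
    covered = {a for c in matched for a in ancestors(c)}
    return sorted(c for c in matched if c not in covered)

print(annotate("Viral diseases of plants and irrigation"))
```

In this sketch both the heading and the ontology labels pass through the same `normalize` step, a heading may receive several descriptors, and a broader concept such as "plants" is dropped when a narrower matched concept such as "viral diseases" already implies it.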

The Architecture of the Proposed Annotation System
[Figure: architecture diagram. Labeled components: HTML Doc, Segmentor, Segment 1, ..., Segment n, Annotator, Ontology, Ontology Extender, Annotated Segments, Annotated Segment Repository, user.]
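As a rough illustration of the Segmentor stage, the sketch below splits an HTML document into (heading, body) pairs. It assumes that segments are delimited by HTML heading tags (`h1` to `h6`), which the slides do not state. The class and function names are hypothetical, and only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

class Segmentor(HTMLParser):
    """Collect [heading, body text] pairs, starting a new pair at each heading tag."""

    def __init__(self):
        super().__init__()
        self.segments = []
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            # Assumption: every heading opens a new segment.
            self.segments.append(["", ""])
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        if not self.segments:
            return  # text before the first heading belongs to no segment
        index = 0 if self._in_heading else 1
        self.segments[-1][index] += data

def segment_html(html):
    """Return a list of (heading, body) tuples with whitespace collapsed."""
    parser = Segmentor()
    parser.feed(html)
    parser.close()
    return [(" ".join(h.split()), " ".join(b.split())) for h, b in parser.segments]

doc = "<h1>Wheat</h1><p>Sow in  November.</p><h2>Viral diseases</h2><p>Remove infected plants.</p>"
print(segment_html(doc))
```

Each heading produced this way is what an annotator would match against the ontology. The body text is carried along so that the annotation can be stored with its segment.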

Evaluation
An expert was asked to annotate 4088 segments. The implemented system was run on the headings of these segments, with the following results:
- The number of terms added to the ontology was 395 (which is equivalent to 95.6% of the 412 terms added by the expert).
- Precision was 97%, recall was 91%, and F-score was 94%.
When the system was run on another 359 segment headings, without allowing any ontology extension, precision was 96%, recall was 86%, and F-score was 91%.
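The reported F-scores can be checked from the reported precision and recall, assuming the slides use the balanced F-score (the harmonic mean of precision and recall), which they do not say explicitly. The sketch below also recomputes the share of expert-added terms. The variable names are illustrative.

```python
def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

with_extension = f_score(0.97, 0.91)     # first run, ontology extension allowed
without_extension = f_score(0.96, 0.86)  # second run, no ontology extension
term_coverage = 395 / 412                # system-added terms vs. expert-added terms

print(f"{with_extension:.1%}")     # 93.9%
print(f"{without_extension:.1%}")  # 90.7%
print(f"{term_coverage:.1%}")      # 95.9%
```

Both F-scores round to the reported 94% and 91%. The ratio 395/412 comes out near 95.9% rather than the 95.6% shown on the slide. The slides do not explain the difference.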

Conclusion
- The results of the experiments carried out to evaluate this work show that it can be used to annotate document segments with a high degree of accuracy.
- The problems that reduced recall were analyzed, and we are currently trying to improve the accuracy of the results. Some of these problems are due to the ontology and others are due to processing Arabic text.
- We plan to investigate the use of the generated annotated segments to build classifiers that assign labels to segments that have no headings.
- We are also exploring ontology extraction from information-rich documents, so that our approach can be applied when no initial ontology exists.