Semantic Web Course - Semantic Annotation

Presentation transcript:

Semantic Web Course - Semantic Annotation. Sadegh Aliakbary, Mohammad Amin Badiezadegan, Mahdy Khayyamian. Spring 2007.

Presentation Outline: Semantic annotation overview; KIM as a semantic annotation tool; A semantic annotation paper review.

Need for Semantic Annotation: Gartner reported in 2002 that 95% of human-to-computer information input involves textual language, and that taxonomic and hierarchical knowledge mapping and indexing will be prevalent in almost all information-rich applications by 2012. There is therefore a large gap between these two forms of information representation, which should be bridged by automatic semantic annotation.

Need for Semantic Annotation (cont.): The semantic web aims to add a machine-readable layer to complement the existing web. To realize this vision, the creation of semantic annotations (the linking of web pages to ontologies) must become an automatic or semi-automatic process.

Definitions: Semantic annotation is the process of tying semantic models and natural language together. It assigns to the entities and relations in a text links to their semantic descriptions in ontologies.

Information Extraction (IE): The semantic annotation process involves Information Extraction. Information Extraction is a technology based on analyzing natural language in order to extract snippets of information. The process takes text as input and produces fixed-format, unambiguous data as output. This data may be displayed directly to users, stored in a database, or used for indexing in information retrieval systems such as internet search engines.

IE vs. IR: An IR system finds relevant texts and presents them to the user; an IE application analyses texts and presents only the specific information in them that the user is interested in. IE systems are more difficult and knowledge-intensive to build. IE systems are domain dependent, but IR systems are not. IE is more computationally intensive than IR. IE is more efficient than IR when a large volume of text is available, because it reduces the amount of time people need to spend reading. IE is more suitable than IR where results need to be presented in a structured, unambiguous format.

Information types in IE: Entities: things in the text, for example people, places, organizations, amounts of money, dates, etc. Mentions: all the places where particular entities are referred to in the text. Descriptions of the entities present. Relations between entities. Events involving the entities.

IE Example: Consider the text: ‘Ryanair announced yesterday that it will make Shannon its next European base, expanding its route network to 14 in an investment worth around 180m. The airline says it will deliver 1.3 million passengers in the first year of the agreement, rising to two million by the fifth year.’

IE Example (cont.): IE will discover ‘Shannon’ and ‘Ryanair’ as entities. IE will discover that ‘it’ and ‘its’ in the first sentence refer to Ryanair, via a process of reference resolution. IE will discover descriptive information such as ‘Shannon is a European base’. IE will discover relations such as ‘Shannon will be a base of Ryanair’. IE will discover events such as ‘Ryanair will invest 180 million euro in Shannon’.
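As a rough illustration of the fixed-format output described above, the sketch below writes the expected result for the Ryanair text by hand as Python data structures. The class names, field names, and entity kinds ("Organization", "Location") are assumptions made for this example, not the output format of any particular IE system, and nothing here performs actual extraction.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    kind: str                                  # assumed label, e.g. "Organization"
    mentions: list = field(default_factory=list)
    descriptions: list = field(default_factory=list)


@dataclass
class Relation:
    subject: str
    predicate: str
    obj: str


@dataclass
class Event:
    agent: str
    action: str
    details: dict


# Hand-written result for the Ryanair example; a real system would derive it.
ryanair = Entity("Ryanair", "Organization", mentions=["Ryanair", "it", "its"])
shannon = Entity("Shannon", "Location", mentions=["Shannon"],
                 descriptions=["European base"])

relations = [Relation("Shannon", "base_of", "Ryanair")]
events = [Event("Ryanair", "invest",
                {"amount": "180 million euro", "location": "Shannon"})]

for item in (ryanair, shannon, *relations, *events):
    print(item)
```

Each record type corresponds to one of the information types listed earlier: entities with their mentions and descriptions, relations between entities, and events involving them.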

Entity extraction: Entity recognition is the simplest and most reliable IE technology. It can be performed at up to around 95% accuracy. Human annotators do not reach 100% either, so entity recognition functions at human performance levels.

Finding the mentions of entities: Co-reference resolution (CO) is used to identify identity relations between entities in texts. These entities are both those identified by entity recognition and anaphoric references to those entities. This process is less relevant to end users than other IE tasks, but it is a basis for other IE tasks such as relation and event extraction. It breaks down into two sub-problems: anaphoric resolution (e.g., ‘I’ with Ali) and proper-noun resolution (e.g., ‘IBM’, ‘IBM Europe’, ‘International Business Machines Ltd’). CO resolution is an imprecise process (about 50-60%), particularly when applied to anaphoric reference.

Description Extraction: Builds on entity recognition and co-reference resolution, associating descriptive information with the entities. For example, in a news article the ‘Bush administration’ can also be referred to as ‘government official’. Good scores for description extraction systems are around 80%; on similar tasks humans can achieve results in the mid 90s. It is weakly domain independent.

Relation Extraction: Requires the identification of a small number of possible relations between the elements. Extraction of relations among entities is a central feature of almost any information extraction task. In general, Relation Extraction (TR) systems score around 75%. Relation extraction is weakly domain dependent.

Event Extraction: Represents information relating to events. It is the prototypical output of IE systems, being the original task for which the term was coined. It is a difficult IE task: the best systems score around 60%, and the human score is around 80%. It is possible to increase precision at the expense of recall.
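The precision/recall trade-off mentioned above can be made concrete with a small calculation. The sketch below uses the standard definitions of precision, recall, and F1; the event identifiers and the two hypothetical extractors ("cautious" and "eager") are invented purely for illustration.

```python
def precision_recall_f1(predicted, gold):
    """Compare a set of predicted items against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented data: five true events and two hypothetical extractors.
gold_events = {"e1", "e2", "e3", "e4", "e5"}
cautious = {"e1", "e2"}                                      # few, safe guesses
eager = {"e1", "e2", "e3", "e4", "x1", "x2", "x3", "x4"}     # many guesses

print(precision_recall_f1(cautious, gold_events))
print(precision_recall_f1(eager, gold_events))
```

The cautious extractor reports only events it is sure of, so every prediction is correct but most true events are missed; the eager one finds more true events at the cost of more false ones.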

Event Extraction example: In a news article, description extraction may have identified Mr. Smith and Mr. Jones as person entities, together with a company. Relation extraction would identify that these people work for the company. Event extraction identifies facts such as that they signed a contract on behalf of the company with another supplier company.

Realizing the semantic web vision: Formally annotate and hyperlink (references to) entities and relations. Index and retrieve documents with respect to entities and relations.

Applications: Highlighting; semantic search; categorization; generation of more advanced metadata; smooth traversal between unstructured text and formal knowledge.

Ontology-based Information Extraction (OBIE): Uses a formal ontology as one of the system’s resources. It may involve reasoning. Each recognized entity is linked to its semantic description in the instance base (through a URI mechanism), which allows entity tracking and description enrichment throughout the IE process.

OBIE subtasks: Identification of instances from the ontology; automatic ontology population.

Presentation Outline: Semantic annotation overview; KIM as a semantic annotation tool; A semantic annotation paper review.

Annotation Tools: Categorized as traditional Information Extraction (IE) and ontology-based IE (OBIE). Differences: OBIE uses ontologies as a resource; OBIE may also involve reasoning; OBIE assigns each term its semantics through a hyperlink.

Annotation Tools: Traditional IE: AeroDAML, Amilcare, MnM, S-Cream. Ontology-based IE: Magpie, Pankow, SemTag, KIM.

KIM Introduction: The Knowledge and Information Management system. A software platform for: automatic semantic annotation, indexing, and retrieval of unstructured and semi-structured content; query and exploration of formal knowledge; co-occurrence tracking and ranking of entities; entity popularity timeline analysis. Applications: generation of metadata for the Semantic Web; knowledge management.

KIM Platform: Based on GATE, Sesame, OWLIM, and Lucene. The KIM Platform includes: the KIM Ontology (KIMO); the KIM World KB; the KIM Server, with an API for remote access and integration; clients: the KIM Web UI and a plug-in for Internet Explorer.

World Knowledge: Common knowledge is based on social, cultural, historical, and educational context: common sense, common culture, events, famous people, films, companies, and so on. KIM tries to provide common knowledge for the most popular entities, such as those that appear in the news. KIM knows locations, organizations, and specific people.

KIM Features: Annotations are kept separate from the content. An API is used for management. The knowledge base is populated with 200,000 frequently used entities, mainly locations with their aliases, geographic coordinates, and co-positioning relations.

KIM Annotating Process: KIM analyzes texts and recognizes references to entities (such as persons, organizations, locations, and dates). It matches each reference with a known entity that has a unique URI and description. Alternatively, a new URI and description are generated automatically. Finally, the reference in the document is annotated with the URI of the entity. For each term, KIM identifies its class and aliases.
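The matching step described above can be sketched in a few lines. This is not KIM's API: the function name, the dictionary-based instance base, the example.org URIs, and the assumption that a recognizer has already supplied the references with character offsets and classes are all hypothetical, chosen only to show the lookup-or-create logic and the separation of annotations from content.

```python
def link_references(references, instance_base):
    """Attach a URI to each recognized reference.

    references: list of (start, end, surface_text, entity_class) tuples,
    assumed to come from an earlier recognition step.
    instance_base: dict mapping an alias to an entity record.
    """
    annotations = []
    for start, end, surface, entity_class in references:
        record = instance_base.get(surface)
        if record is None:
            # Unknown entity: generate a new URI and a minimal description.
            uri = "http://example.org/entity/" + surface.replace(" ", "_")
            record = {"uri": uri, "class": entity_class, "aliases": [surface]}
            instance_base[surface] = record
        # Annotations are stored apart from the text, as offsets plus URI.
        annotations.append({"start": start, "end": end, "uri": record["uri"]})
    return annotations


text = "Ryanair will make Shannon its next European base."
instance_base = {
    "Ryanair": {"uri": "http://example.org/entity/Ryanair",
                "class": "Organization", "aliases": ["Ryanair"]},
}
references = [(0, 7, "Ryanair", "Organization"),
              (18, 25, "Shannon", "Location")]

annotations = link_references(references, instance_base)
for a in annotations:
    print(text[a["start"]:a["end"]], "->", a["uri"])
```

Here "Ryanair" is matched against an existing record, while "Shannon" is unknown to the toy instance base and receives a newly generated URI.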

KIM Clients: KIM plug-in for Internet Explorer: used for semantic annotation; highlights instances in colors. KIM Web UI: a powerful semantic search interface. Address: http://www.ontotext.com/kim/

KIM IE Plug-in

Annotated Page

Entity Description

KIM Web UI: http://62.213.161.156/KIM/screen/KWUIMain.jsp

Entity Pattern Search

Presentation Outline: Semantic annotation overview; KIM as a semantic annotation tool; A semantic annotation paper review.

Annotation Methods: Manual; rule-based; machine learning based.

Two-dimensional Content: Annotation methods typically convert the web page into a sequence of ‘objects’ and then apply techniques to identify the subsequence to be annotated. However, information on a web page is usually laid out in two dimensions and should not be described simply as a sequence.

Original Content

One-dimensional Context

Two-dimensional Context

Horizontal and vertical context: By context, we mean the information surrounding the targeted instance. By horizontal context, we mean the information to the left and right of the targeted instance, e.g., the previous tokens and the next tokens. By vertical context, we mean the information above and below the targeted instance, e.g., the previous lines and the next lines.
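The two kinds of context can be read off a page once it is represented as a list of lines, each a list of tokens. The following is a minimal sketch under that assumption; the function names, the window sizes, and the sample page content are invented for illustration and are not taken from the reviewed paper.

```python
def horizontal_context(lines, line_idx, start, end, before=2, after=2):
    """Tokens to the left and right of the span [start, end) on its own line."""
    tokens = lines[line_idx]
    left = tokens[max(0, start - before):start]
    right = tokens[end:end + after]
    return left, right


def vertical_context(lines, line_idx, above=1, below=1):
    """Lines directly above and below the line holding the target."""
    previous_lines = lines[max(0, line_idx - above):line_idx]
    next_lines = lines[line_idx + 1:line_idx + 1 + below]
    return previous_lines, next_lines


# Invented sample page, already split into lines and tokens.
page = [
    ["Introduction", "to", "Company"],
    ["Company", "Chinese", "Name", ":", "Example", "Corp"],
    ["Legal", "Representative", ":", "Li", "Wei"],
    ["Office", "Address", ":", "1", "Example", "Road"],
]

# Target instance: "Li Wei" on line 2, tokens 3..4.
print(horizontal_context(page, 2, 3, 5))
print(vertical_context(page, 2))
```

A purely sequential representation would only expose the first of these two views; keeping the line structure makes the lines above and below available as well.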

The Annotation Process: Annotation is done in two stages: block detection, then text annotation.

Block Detection: A block is a specific informative unit in a document. It can be defined at different granularities, e.g. text line, section, or paragraph. We assign one or more labels to each block, where each label corresponds to a concept in the ontology. A block can also have no label.

Block Detection (cont.): We define a block as a text line, because in our experiments statistics show that 99.6% of the targeted instances lie within a single text line. In block detection, we detect the label of each block using a classification model. An SVM is used for this purpose.

Company Annual Reports: Each report has fourteen sections, including “Introduction to Company”, “Company Financial Report”, etc. We describe only the annotation of the first section, “Introduction to Company”, which contains company information such as Company-Chinese-Name, Legal-Representative, and Office-Address.

Learning to Detect Blocks: We view block detection as classification. For each concept, we train an SVM model to detect whether a block contains one or more instances of that concept.

Block Detection Features: We define features at the token level and the line level. The main features in block detection are: positive word features, negative word features, special pattern features, the line position feature, and the number of words feature.
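The slides name these feature types without defining them, so the sketch below is one plausible reading rather than the paper's definitions: positive and negative words are counted from hand-supplied word lists, the special pattern is assumed to be "contains a digit or a colon", and line position is the line's relative position in the document. All names and sample values are hypothetical. The resulting dictionary is the kind of input a per-concept classifier such as an SVM would be trained on.

```python
import re

# Assumed special pattern: a digit or a colon somewhere in the line.
SPECIAL_PATTERN = re.compile(r"\d|:")


def block_features(line, line_index, total_lines, positive_words, negative_words):
    """Build a feature dictionary for one text line (one block)."""
    tokens = re.findall(r"\w+", line)
    lowered = [t.lower() for t in tokens]
    return {
        "positive_words": sum(t in positive_words for t in lowered),
        "negative_words": sum(t in negative_words for t in lowered),
        "has_special_pattern": bool(SPECIAL_PATTERN.search(line)),
        "line_position": line_index / max(1, total_lines - 1),
        "num_words": len(tokens),
    }


# Invented example for a hypothetical Office-Address concept.
features = block_features(
    "Office Address: 12 Example Road",
    line_index=3,
    total_lines=5,
    positive_words={"office", "address"},
    negative_words={"report"},
)
print(features)
```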

Text Annotation: An identified block contains at least one instance. We then try to identify the start position and the end position of the targeted instance. Two SVMs are employed for this purpose.

Text Annotation Features: Token features: the tokens in the previous four positions, the current position, and the next two positions. The window is asymmetric because the preceding tokens seem more important in our annotation tasks. Special pattern features.
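The asymmetric token window (four tokens before, the current token, two after) can be sketched as follows. The function name, the padding symbol, the feature key format, and the sample line are assumptions for illustration; the slides do not say how positions beyond the line boundary are handled.

```python
def token_window(tokens, position, before=4, after=2, pad="<PAD>"):
    """Collect the tokens around `position` as a feature dictionary.

    Positions that fall outside the line are filled with a padding symbol.
    """
    features = {}
    for offset in range(-before, after + 1):
        index = position + offset
        token = tokens[index] if 0 <= index < len(tokens) else pad
        features[f"token[{offset:+d}]"] = token
    return features


# Invented sample line; the candidate position is the token "Li".
line_tokens = ["Legal", "Representative", ":", "Li", "Wei"]
window = token_window(line_tokens, position=3)
print(window)
```

One such feature dictionary would be built for every token in a detected block and passed to the start-position and end-position classifiers.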


References:
John Davies, Rudi Studer, Paul Warren. Semantic Web Technologies: Trends and Research in Ontology-based Systems. John Wiley & Sons, Ltd, 2006.
http://annotation.semanticweb.org
http://www.ontotext.com/kim/
Mingcai Hong, Jie Tang, and Juanzi Li. Semantic Annotation using Horizontal and Vertical Contexts.
Joachims T. Making Large-Scale SVM Learning Practical. In Schölkopf B., Burges C., and Smola A. (eds.), Advances in Kernel Methods - Support Vector Learning. MIT Press, 1999.

Thanks. Any questions?