Semantic Web Course - Semantic Annotation

Name: Semantic Web Course - Semantic Annotation
Uploaded: 2017-07-30T23:40:17+00:00
Duration: PTM24S29
Channel: Clinton Griffin Bradford
Description: Semantic Web Course - Semantic Annotation

Semantic Web Course - Semantic Annotation
Sadegh Aliakbary, Mohammad Amin Badiezadegan, Mahdy Khayyamian
Spring 2007 Semantic Web Course - Semantic Annotation

Presentation Outline
Semantic annotation overview
KIM as a semantic annotation tool
A semantic annotation paper review

Need for Semantic Annotation
Gartner reported in 2002:
95% of human-to-computer information input involves textual language.
Taxonomic and hierarchical knowledge mapping and indexing will be prevalent in almost all information-rich applications by 2012.
There is therefore a large gap between these two forms of information representation, which automatic semantic annotation should bridge.

Need for Semantic Annotation (cont.)
The semantic web aims to add a machine-readable layer to complement the existing web. To realize this vision, the creation of semantic annotations, that is, the linking of web pages to ontologies, must become an automatic or semi-automatic process.

Definitions
Semantic annotation is the process of tying semantic models and natural language together. It assigns to the entities and relations in a text links to their semantic descriptions in ontologies.

Information Extraction (IE)
The semantic annotation process involves information extraction. Information extraction is a technology based on analyzing natural language in order to extract snippets of information. The process takes text as input and produces fixed-format, unambiguous data as output. The data may be displayed directly to users, stored in a database, or used for indexing in information retrieval systems such as internet search engines.

IE vs. IR
An IR system finds relevant texts and presents them to the user; an IE application analyses texts and presents only the specific information in them that the user is interested in.
IE systems are more difficult and knowledge-intensive to build.
IE systems are domain dependent, whereas IR systems are not.
IE is more computationally intensive than IR.
IE is more efficient than IR when large volumes of text are involved, because it reduces the amount of time people need to spend reading.
IE is more suitable than IR where results need to be presented in a structured, unambiguous format.

Information types in IE
Entities: things in the text, for example people, places, organizations, amounts of money, dates, etc.
Mentions: all the places where particular entities are referred to in the text.
Descriptions of the entities present.
Relations between entities.
Events involving the entities.

IE Example
Consider the text: ‘Ryanair announced yesterday that it will make Shannon its next European base, expanding its route network to 14 in an investment worth around €180m. The airline says it will deliver 1.3 million passengers in the first year of the agreement, rising to two million by the fifth year’.

IE Example (cont.)
IE will discover ‘Shannon’ and ‘Ryanair’ as entities.
IE will discover that ‘it’ and ‘its’ in the first sentence refer to Ryanair, via a process of reference resolution.
IE will discover descriptive information such as ‘Shannon is a European base’.
IE will discover relations such as ‘Shannon will be a base of Ryanair’.
IE will discover events such as ‘Ryanair will invest 180 million euro in Shannon’.
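The findings listed above can be pictured as the kind of fixed-format output an IE system produces. The following is only an illustrative sketch: the record types `Entity`, `Relation`, and `Event`, their field names, and the class labels "Organization" and "Location" are assumptions made for this example, and the values are filled in by hand rather than produced by an extractor.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    kind: str  # assumed class label, not taken from the slides
    mentions: list = field(default_factory=list)
    descriptions: list = field(default_factory=list)


@dataclass
class Relation:
    subject: str
    predicate: str
    obj: str


@dataclass
class Event:
    kind: str
    agent: str
    target: str
    amount: str


# Entities, with the mentions found by reference resolution
ryanair = Entity("Ryanair", "Organization", mentions=["Ryanair", "it", "its"])

# Descriptive information attached to an entity
shannon = Entity("Shannon", "Location", mentions=["Shannon"],
                 descriptions=["European base"])

# A relation between two entities
base_of = Relation(subject="Shannon", predicate="will be a base of", obj="Ryanair")

# An event involving the entities
investment = Event(kind="investment", agent="Ryanair", target="Shannon",
                   amount="180 million euro")
```

Records of this shape can be shown to a user, stored in a database, or used as index keys, which are the three uses of IE output mentioned earlier.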

Entity extraction
Entity recognition is the simplest and most reliable IE technology. It can be performed at up to around 95% accuracy. Since human annotators do not perform at the 100% level either, entity recognition functions at human performance levels.

Finding the mentions of entities
Co-reference resolution (CO) is used to identify identity relations between entities in texts. These entities are both those identified by entity recognition and anaphoric references to those entities.
This process is less relevant to end users than other IE tasks, but it is a basis for other IE tasks such as relation and event extraction.
It breaks down into two sub-problems: anaphoric resolution (e.g., ‘I’ with Ali) and proper-noun resolution (e.g., ‘IBM’, ‘IBM Europe’, ‘International Business Machines Ltd’).
CO resolution is an imprecise process (about 50-60%), particularly when applied to anaphoric reference.

Description Extraction
Description extraction builds on entity recognition and co-reference resolution, associating descriptive information with the entities. For example, in a news article the ‘Bush administration’ can also be referred to as ‘government official’.
Good scores for description extraction systems are around 80%; on similar tasks humans can achieve results in the mid 90s.
It is weakly domain dependent.

Relation Extraction
Relation extraction requires the identification of a small number of possible relations between the elements. Extraction of relations among entities is a central feature of almost any information extraction task.
In general, relation extraction (TR) systems score around 75%.
Relation extraction is weakly domain dependent.

Event Extraction
Event extraction represents information relating to events. It is the prototypical output of IE systems, being the original task for which the term was coined.
It is a difficult IE task: the best systems score around 60%, and the human score is around 80%.
It is possible to increase precision at the expense of recall.
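The trade-off between precision and recall can be made concrete with a small sketch. It assumes a hypothetical extractor that attaches a confidence score to each extracted event and keeps only those above a threshold; the event identifiers, scores, and function names below are invented for illustration.

```python
def precision_recall(predicted, gold):
    """Precision = correct / extracted; recall = correct / should-have-been-extracted."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Events a human annotator marked as correct (made-up identifiers)
gold = {"e1", "e2", "e3", "e4"}

# Hypothetical extractor output: (event, confidence score)
scored = [("e1", 0.9), ("e2", 0.8), ("x1", 0.6), ("e3", 0.5), ("x2", 0.3)]


def extract(threshold):
    """Keep only the events whose confidence reaches the threshold."""
    return [event for event, score in scored if score >= threshold]


loose = precision_recall(extract(0.2), gold)   # keeps all five candidates
strict = precision_recall(extract(0.7), gold)  # keeps only the two most confident
```

With the loose threshold, three of five extracted events are correct and three of four gold events are found. With the strict threshold, both extracted events are correct but only two of four gold events are found: precision rises while recall falls.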

Event Extraction example
Description extraction may have identified Mr. Smith and Mr. Jones as person entities, and a company, in a news article.
Relation extraction would identify that these people work for the company.
Event extraction identifies facts such as that they signed a contract on behalf of the company with another supplier company.

Realizing the semantic web vision
Formally annotate and hyperlink (references to) entities and relations.
Index and retrieve documents with respect to entities and relations.

Applications
Highlighting
Semantic search
Categorization
Generation of more advanced metadata
Smooth traversal between unstructured text and formal knowledge

Ontology based Information Extraction (OBIE)
OBIE uses a formal ontology as one of the system’s resources. It may involve reasoning. Each extracted entity is linked to its semantic description in the instance base (through a URI mechanism), which allows entity tracking and description enrichment through the IE process.

OBIE subtasks
Identification of instances from the ontology
Automatic ontology population

Annotation Tools
Annotation tools are categorized as traditional information extraction (IE) or ontology-based IE (OBIE).
Differences:
OBIE uses ontologies as a resource.
OBIE may also involve reasoning.
OBIE assigns each term its semantics using a hyperlink.

Annotation Tools
Traditional IE: AeroDAML, Amilcare, MnM, S-Cream
Ontology-based IE: Magpie, Pankow, SemTag, KIM

KIM Introduction
KIM is the Knowledge and Information Management system, a software platform for:
Automatic semantic annotation, indexing, and retrieval of unstructured and semi-structured content
Query and exploration of formal knowledge
Co-occurrence tracking and ranking of entities
Entity popularity timeline analysis
Applications: generation of metadata for the Semantic Web, and knowledge management.

KIM Platform
The platform is based on GATE, Sesame, OWLIM, and Lucene. It includes:
KIM Ontology (KIMO)
KIM World KB
KIM Server, with an API for remote access and integration
Clients: KIM Web UI and a plug-in for Internet Explorer

World Knowledge
Common knowledge is based on social, cultural, historical, and educational context: common sense, common culture, events, famous people, films, companies, ...
KIM tries to provide common knowledge about the most popular entities, such as those that appear in the news. KIM knows locations, organizations, and specific people.

KIM Features
Annotations are kept separate from the content.
An API is used for management.
The knowledge base is populated with 200,000 frequently used entities, mainly locations with their aliases, geographic coordinates, and co-positioning relations.

KIM Annotating Process
KIM analyzes texts and recognizes references to entities (such as persons, organizations, locations, and dates).
It matches each reference with a known entity that has a unique URI and description. Alternatively, a new URI and description are generated automatically.
Finally, the reference in the document is annotated with the URI of the entity.
For each term, KIM identifies its class and alias.
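The steps above can be sketched in a few lines. This is not KIM's API: the `annotate` function, the dictionary-based knowledge base, and the `http://example.org/entity/` URI prefix are all hypothetical, and entity recognition itself is assumed to have already produced a list of (surface string, class) pairs.

```python
import re


def annotate(text, references, knowledge_base):
    """Attach an entity URI to each recognized reference.

    references: (surface string, class) pairs from an assumed recognizer.
    knowledge_base: maps a lower-cased alias to an entity record.
    """
    annotations = []
    for surface, cls in references:
        entry = knowledge_base.get(surface.lower())
        if entry is None:
            # Unknown entity: generate a new URI and a minimal description
            uri = "http://example.org/entity/" + re.sub(r"\W+", "_", surface)
            entry = {"uri": uri, "class": cls, "aliases": [surface]}
            knowledge_base[surface.lower()] = entry
        start = text.find(surface)
        if start >= 0:
            # Stand-off annotation: offsets plus URI, kept apart from the text
            annotations.append({"start": start, "end": start + len(surface),
                                "uri": entry["uri"], "class": entry["class"]})
    return annotations


kb = {"ryanair": {"uri": "http://example.org/entity/Ryanair",
                  "class": "Organization", "aliases": ["Ryanair"]}}
text = "Ryanair will make Shannon its next European base."
refs = [("Ryanair", "Organization"), ("Shannon", "Location")]
result = annotate(text, refs, kb)
```

Here "Ryanair" is matched against an existing entry, while "Shannon" is unknown and receives a newly generated URI that is also added to the knowledge base. Returning offsets instead of modifying the text mirrors the point that annotations are kept separate from the content.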

KIM Clients
KIM plug-in for Internet Explorer: used for semantic annotation; highlights instances in colors.
KIM Web UI: a powerful semantic search interface.
Address:

KIM IE Plug-in

Annotated Page

Entity Description

KIM Web UI

Entity Pattern Search

Annotation Methods
Manual
Rule-based
Machine learning based

Two-dimensional Content
Annotation methods typically convert the web page into an ‘object’ sequence and then use techniques to identify the subsequence to annotate. However, information on a web page is usually laid out in two dimensions and should not be described simply as a sequence.

Original Content

One-dimensional Context
Two-dimensional Context

Horizontal and vertical context
By context, we mean the information surrounding the targeted instance.
Horizontal context is the information to the left and right of the targeted instance, e.g., the previous tokens and the next tokens.
Vertical context is the information above and below the targeted instance, e.g., the previous lines and the next lines.
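The two kinds of context can be illustrated with a minimal sketch. It assumes the page has already been split into lines and each line into tokens; the function name, the `width` and `height` parameters, and the sample page are invented for illustration (only the field names echo those mentioned later for company reports).

```python
def two_dimensional_context(lines, row, col, width=2, height=1):
    """Return the horizontal and vertical context of the token at lines[row][col]."""
    line = lines[row]
    return {
        # Horizontal context: neighbouring tokens on the same line
        "left": line[max(0, col - width):col],
        "right": line[col + 1:col + 1 + width],
        # Vertical context: neighbouring lines
        "above": lines[max(0, row - height):row],
        "below": lines[row + 1:row + 1 + height],
    }


# A made-up page fragment, already tokenized line by line
page = [
    ["Company", "Name", ":", "Example", "Corp"],
    ["Legal", "Representative", ":", "Jane", "Doe"],
    ["Office", "Address", ":", "1", "Main", "Street"],
]

context = two_dimensional_context(page, row=1, col=3)  # target token "Jane"
```

A purely sequential view would only expose the `left` and `right` parts; the `above` and `below` parts are what the two-dimensional view adds.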

The Annotation Process
The annotation process has two stages:
Block detection
Text annotation

Block Detection
A block is a specific informative unit in a document. It can be defined at different granularities, e.g., text line, section, or paragraph.
One or more labels are assigned to each block, and each label corresponds to a concept in the ontology. A block can also have no label.

Block Detection (cont.)
We define a block as a text line, because statistics from our experiments show that 99.6% of the targeted instances lie within a single text line.
In block detection, we detect the label of each block using a classification model. An SVM is used for this purpose.

Company Annual Reports
A company annual report has fourteen sections, including “Introduction to Company”, “Company Financial Report”, etc.
We describe only the annotation of the first part (i.e., the section “Introduction to Company”).
The section “Introduction to Company” contains company information such as Company-Chinese-Name, Legal-Representative, and Office-Address.

Learning to Detect Blocks
We view block detection as classification. For each concept, we train an SVM model to detect whether a block contains one or more instances of that concept.

Block Detection Features
We define features at the token level and the line level. The main features in block detection are:
Positive word features
Negative word features
Special pattern features
Line position feature
Number of words feature
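As a rough sketch of how one text line might be turned into these five kinds of features, consider the function below. The slides do not define the features precisely, so the interpretation here is an assumption: positive and negative words are taken to be cue words whose presence in the line is recorded, and special patterns are regular expressions. The function name, word lists, and pattern are hypothetical.

```python
import re


def block_features(line, line_index, positive_words, negative_words, patterns):
    """Build a feature dictionary for one text line (one candidate block)."""
    tokens = re.findall(r"\w+", line.lower())
    features = {}
    for word in positive_words:      # cue words assumed to suggest the concept
        features["pos:" + word] = int(word in tokens)
    for word in negative_words:      # cue words assumed to suggest other content
        features["neg:" + word] = int(word in tokens)
    for name, pattern in patterns.items():   # special patterns, e.g. digits
        features["pattern:" + name] = int(re.search(pattern, line) is not None)
    features["line_position"] = line_index   # where the line sits in the document
    features["num_words"] = len(tokens)
    return features


features = block_features(
    "Office Address: 1 Main Street",
    line_index=2,
    positive_words=["office", "address"],
    negative_words=["report"],
    patterns={"has_digit": r"\d"},
)
```

A dictionary like this would then be converted to a numeric vector and passed to the per-concept classifier; that step is left out here because it depends on the SVM library used.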

Text Annotation
An identified block contains at least one instance. We then identify the start position and the end position of the targeted instance. Two SVMs are employed for this purpose.

Text Annotation Features
Token features: the tokens in the previous four positions, the current position, and the next two positions. More previous tokens are used because they seem more important in our annotation tasks.
Special pattern features
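The asymmetric token window (four tokens before, the current token, two tokens after) can be sketched as follows. The function name, the feature key format, and the `<PAD>` placeholder for positions that fall outside the line are assumptions for illustration.

```python
def token_window(tokens, i, before=4, after=2, pad="<PAD>"):
    """Collect the tokens around position i as position-keyed features."""
    features = {}
    for offset in range(-before, after + 1):
        j = i + offset
        # Positions outside the line are filled with a padding symbol
        features[f"tok[{offset}]"] = tokens[j] if 0 <= j < len(tokens) else pad
    return features


tokens = ["Legal", "Representative", ":", "Jane", "Doe"]
window = token_window(tokens, 3)  # features for the token "Jane"
```

One such feature set would be built for every token in a detected block, so that the start-position and end-position classifiers can each decide whether that token is a boundary.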

References
John Davies, Rudi Studer, and Paul Warren, Semantic Web Technologies: Trends and Research in Ontology-based Systems. John Wiley & Sons, Ltd, 2006.
Mingcai Hong, Jie Tang, and Juanzi Li, Semantic Annotation using Horizontal and Vertical Contexts.
T. Joachims, Making Large-Scale SVM Learning Practical. In B. Schölkopf, C. Burges, and A. Smola (eds.), Advances in Kernel Methods - Support Vector Learning. MIT Press, 1999.

Thanks
Any questions?
