Semantic Web Course - Semantic Annotation

1 Semantic Web Course - Semantic Annotation
Sadegh Aliakbary Mohammad Amin Badiezadegan Mahdy Khayyamian Spring 2007 Semantic Web Course - Semantic Annotation

2 Presentation Outline
- Semantic annotation overview
- KIM as a semantic annotation tool
- A semantic annotation paper review

3 Need for Semantic Annotation
Gartner reported in 2002:
- 95% of human-to-computer information input involves textual language.
- Taxonomic and hierarchical knowledge mapping and indexing will be prevalent in almost all information-rich applications by 2012.
So there is a wide gap between these two forms of information representation, and automatic semantic annotation should bridge it.

4 Need for Semantic Annotation (cont.)
The Semantic Web aims to add a machine-readable layer that complements the existing web.
To realize this vision, the creation of semantic annotations (the linking of web pages to ontologies) must become an automatic or semi-automatic process.

5 Definitions
Semantic annotation is the process of tying semantic models and natural language together.
It assigns to the entities and relations in a text links to their semantic descriptions in ontologies.

6 Information Extraction (IE)
The semantic annotation process involves Information Extraction (IE).
IE is a technology that analyzes natural language in order to extract snippets of information.
The process takes text as input and produces fixed-format, unambiguous data as output.
The data may be:
- displayed directly to users
- stored in a database
- used for indexing in information retrieval systems such as internet search engines

7 IE vs. IR
An IR system finds relevant texts and presents them to the user; an IE application analyzes texts and presents only the specific information that the user is interested in.
- IE systems are more difficult and knowledge-intensive to build.
- IE systems are domain dependent; IR systems are not.
- IE is more computationally intensive than IR.
- IE is more efficient than IR when large volumes of text are involved, because it reduces the amount of time people need to spend reading.
- IE is more suitable than IR where results need to be presented in a structured, unambiguous format.

8 Information types in IE
- Entities: things in the text, for example people, places, organizations, amounts of money, dates, etc.
- Mentions: all the places where particular entities are referred to in the text.
- Descriptions of the entities present.
- Relations between entities.
- Events involving the entities.

9 IE Example
Consider the text:
‘Ryanair announced yesterday that it will make Shannon its next European base, expanding its route network to 14 in an investment worth around €180m. The airline says it will deliver 1.3 million passengers in the first year of the agreement, rising to two million by the fifth year’.

10 IE Example (cont.)
- IE will identify ‘Shannon’ and ‘Ryanair’ as entities.
- IE will discover that ‘it’ and ‘its’ in the first sentence refer to Ryanair, via a process of reference resolution.
- IE will discover descriptive information such as ‘Shannon is a European base’.
- IE will discover relations such as ‘Shannon will be a base of Ryanair’.
- IE will discover events such as ‘Ryanair will invest 180 million euro in Shannon’.
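As an illustration of the "fixed format unambiguous data" that IE produces, the five kinds of output for the Ryanair text could be held in a structure like the one below. This is a hand-filled sketch, not the output of a real extractor; the class name `ExtractionResult`, the field layout, the type labels (`Organization`, `Location`) and the relation name `base_of` are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractionResult:
    """Hypothetical container for the five IE information types."""
    entities: dict = field(default_factory=dict)      # entity name -> assumed type
    mentions: dict = field(default_factory=dict)      # entity name -> surface forms
    descriptions: list = field(default_factory=list)  # (entity, description)
    relations: list = field(default_factory=list)     # (subject, relation, object)
    events: list = field(default_factory=list)        # one dict per event


# Filled in by hand from the statements on the slide.
result = ExtractionResult(
    entities={"Ryanair": "Organization", "Shannon": "Location"},
    mentions={"Ryanair": ["Ryanair", "it", "its"], "Shannon": ["Shannon"]},
    descriptions=[("Shannon", "European base")],
    relations=[("Shannon", "base_of", "Ryanair")],
    events=[{
        "type": "investment",
        "agent": "Ryanair",
        "location": "Shannon",
        "amount": "180 million euro",
    }],
)
```

Data in this shape can be displayed, stored in a database, or indexed, which are the uses listed for IE output earlier in the deck.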

11 Entity extraction
Entity recognition is the simplest and most reliable IE technology.
It can be performed at up to around 95% accuracy.
Human annotators do not perform at the 100% level either, so entity recognition functions at roughly human performance levels.

12 Finding the mentions of entities
Coreference resolution (CO) identifies identity relations between entities in texts.
These entities include both those identified by entity recognition and anaphoric references to those entities.
This process is less relevant to end users than other IE tasks, but it is a basis for other IE tasks such as relation and event extraction.
It breaks down into two sub-problems:
- anaphoric resolution (e.g., linking ‘I’ to ‘Ali’)
- proper-noun resolution (e.g., ‘IBM’, ‘IBM Europe’, ‘International Business Machines Ltd’)
CO resolution is an imprecise process (around 50-60%), particularly when applied to anaphoric references.
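To make the proper-noun sub-problem concrete, here is a minimal sketch that groups organization names under a shared key. The heuristic (treat an all-capitals first word as an acronym, otherwise build an acronym from word initials) is an assumption chosen for this example; it is not the method of any particular IE system and would over-merge on real data. The names `SUFFIXES`, `name_key` and `group_mentions` are hypothetical.

```python
# Company suffixes ignored when building the key (illustrative list only).
SUFFIXES = {"ltd", "inc", "corp", "co", "plc"}


def name_key(name):
    """Return a crude grouping key for an organization name."""
    words = [w for w in name.replace(".", "").split() if w.lower() not in SUFFIXES]
    if not words:
        return name.upper()
    first = words[0]
    if first.isupper() and len(first) > 1:
        return first                      # already an acronym, e.g. "IBM"
    if len(words) == 1:
        return first.upper()              # single-word name, e.g. "Ryanair"
    return "".join(w[0].upper() for w in words)  # initials of a long form


def group_mentions(mentions):
    """Group mentions that share the same key, preserving input order."""
    groups = {}
    for mention in mentions:
        groups.setdefault(name_key(mention), []).append(mention)
    return groups


groups = group_mentions(
    ["IBM", "Ryanair", "IBM Europe", "International Business Machines Ltd"]
)
```

Here the three IBM variants from the slide end up in one group and ‘Ryanair’ in another. Anaphoric resolution (pronouns) needs context and is not covered by this sketch.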

13 Description Extraction
Description extraction builds on entity recognition and coreference resolution by associating descriptive information with the entities.
For example, in a news article the ‘Bush administration’ may also be referred to as ‘government official’.
Good scores for description extraction systems are around 80%; on similar tasks humans achieve results in the mid 90s.
It is weakly domain dependent.

14 Relation Extraction
Relation extraction requires the identification of a small number of possible relations between the elements.
Extracting relations among entities is a central feature of almost any information extraction task.
In general, relation extraction (TR) systems score around 75%.
Relation extraction is weakly domain dependent.

15 Event Extraction
Event extraction represents information relating to events.
It is the prototypical output of IE systems, being the original task for which the term was coined.
It is a difficult IE task: the best systems score around 60%, and the human score is around 80%.
It is possible to increase precision at the expense of recall.
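The precision/recall trade-off can be seen in a small worked example. The sketch below assumes an extractor that attaches a confidence score to each candidate event and keeps those above a threshold; the candidate ids, scores and gold set are invented for illustration.

```python
def precision_recall(predicted, gold):
    """Precision = correct / predicted; recall = correct / gold."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical candidate events with confidence scores.
candidates = {"e1": 0.9, "e2": 0.8, "e3": 0.6, "e4": 0.4, "e5": 0.3}
# Hypothetical gold-standard events; "e6" is never proposed by the extractor.
gold = {"e1", "e2", "e4", "e6"}


def extract(threshold):
    """Keep only the candidates whose score reaches the threshold."""
    return {event for event, score in candidates.items() if score >= threshold}


loose = precision_recall(extract(0.3), gold)   # keeps all five candidates
strict = precision_recall(extract(0.8), gold)  # keeps only e1 and e2
```

With the loose threshold, 3 of 5 predictions are correct (precision 0.6) and 3 of 4 gold events are found (recall 0.75). Raising the threshold lifts precision to 1.0 but drops recall to 0.5.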

16 Event Extraction example
Description extraction may have identified Mr. Smith and Mr. Jones as person entities, and a company, in a news article.
Relation extraction would identify that these people work for the company.
Event extraction identifies facts such as that they signed a contract on behalf of the company with another supplier company.

17 Realizing the semantic web vision
Formally annotate and hyperlink (references to) entities and relations.
Index and retrieve documents with respect to entities and relations.

18 Applications
- Highlighting
- Semantic search
- Categorization
- Generation of more advanced metadata
- Smooth traversal between unstructured text and formal knowledge

19 Ontology based Information Extraction (OBIE)
OBIE uses a formal ontology as one of the system’s resources.
It may involve reasoning.
It links each recognized entity to its semantic description in the instance base (through a URI mechanism), which allows entity tracking and description enrichment through the IE process.

20 OBIE subtasks
- Identification of instances from the ontology
- Automatic ontology population

21 Presentation Outline
- Semantic annotation overview
- KIM as a semantic annotation tool
- A semantic annotation paper review

22 Annotation Tools
Categorized as:
- Traditional Information Extraction (IE)
- Ontology-based IE (OBIE)
Differences:
- OBIE uses ontologies as a resource.
- OBIE may also involve reasoning.
- OBIE assigns each term its semantics through a hyperlink.

23 Annotation Tools
Traditional IE:
- AeroDAML
- Amilcare
- MnM
- S-Cream
Ontology-based IE:
- Magpie
- Pankow
- SemTag
- KIM

24 KIM Introduction
KIM: the Knowledge and Information Management system.
A software platform for:
- Automatic semantic annotation, indexing, and retrieval of unstructured and semi-structured content
- Query and exploration of formal knowledge
- Co-occurrence tracking and ranking of entities
- Entity popularity timeline analysis
Applications:
- Generation of metadata for the Semantic Web
- Knowledge management

25 KIM Platform
Based on GATE, Sesame, OWLIM, and Lucene.
The KIM Platform includes:
- KIM Ontology (KIMO)
- KIM World KB
- KIM Server, with an API for remote access and integration
- Clients: KIM Web UI, plug-in for Internet Explorer

26 World Knowledge
Common knowledge based on social, cultural, historical, and educational context:
- Common sense
- Common culture
- Events, famous people, films, companies, …
KIM tries to provide common knowledge about the most popular entities, such as those that appear in the news.
KIM knows locations, organizations, and specific people.

27 KIM Features
- Annotations are kept separate from the content.
- An API is used for management.
- The knowledge base is populated with 200,000 frequently used entities, mainly locations with their aliases, geographic coordinates, and co-positioning relations.

28 KIM Annotating Process
KIM analyzes texts and recognizes references to entities (such as persons, organizations, locations, and dates).
It matches each reference with a known entity that has a unique URI and description.
Alternatively, a new URI and description are generated automatically.
Finally, the reference in the document is annotated with the URI of the entity.
For each term, KIM identifies:
- Class
- Alias
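The steps above can be sketched in a few lines. This is not KIM's API: the classes `Entity`, `Annotation` and `KnowledgeBase`, the `example.org` URIs, and the assumption that a separate recognizer has already produced `(reference, class)` pairs are all hypothetical. The sketch also stores annotations as character offsets outside the text, in the spirit of keeping annotations separate from content.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    uri: str
    entity_class: str
    aliases: set


@dataclass
class Annotation:
    start: int   # character offset where the reference begins
    end: int     # character offset just past the reference
    uri: str     # URI of the entity the reference is linked to


class KnowledgeBase:
    """Toy alias index standing in for a populated knowledge base."""

    def __init__(self, entities):
        self.by_alias = {a.lower(): e for e in entities for a in e.aliases}
        self.generated = 0

    def resolve(self, reference, entity_class):
        """Return the known entity, or create one with a new URI."""
        entity = self.by_alias.get(reference.lower())
        if entity is None:
            self.generated += 1
            uri = f"http://example.org/generated#{entity_class}.{self.generated}"
            entity = Entity(uri, entity_class, {reference})
            self.by_alias[reference.lower()] = entity
        return entity


def annotate(text, references, kb):
    """Annotate the first occurrence of each recognized reference."""
    annotations = []
    for reference, entity_class in references:
        start = text.find(reference)
        if start == -1:
            continue  # reference not present in this text
        entity = kb.resolve(reference, entity_class)
        annotations.append(Annotation(start, start + len(reference), entity.uri))
    return annotations


kb = KnowledgeBase([Entity("http://example.org/kb#Ryanair", "Organization", {"Ryanair"})])
text = "Ryanair will make Shannon its next European base."
annotations = annotate(text, [("Ryanair", "Organization"), ("Shannon", "Location")], kb)
```

In this run ‘Ryanair’ is matched to the existing entity, while ‘Shannon’ is unknown to the toy knowledge base and receives a generated URI.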

29 KIM Clients
KIM plug-in for Internet Explorer:
- Used for semantic annotation
- Highlights instances in colors
KIM Web UI:
- Powerful semantic search interface
- Address:

30 KIM IE Plug-in

31 Annotated Page

32 Entity Description

33 KIM Web UI

34 Entity Pattern Search

35 Presentation Outline
- Semantic annotation overview
- KIM as a semantic annotation tool
- A semantic annotation paper review

36 Annotation Methods
- Manual
- Rule-based
- Machine-learning-based

37 Two-dimensional Content
Annotation methods typically convert the web page into an ‘object’ sequence and then apply techniques to identify the subsequence to be annotated.
However, information on a web page is usually laid out in two dimensions and should not be described simply as a sequence.

38 Original Content

39 One-dimensional Context

40 Two-dimensional Context

41 Horizontal and vertical context
By context, we mean the information surrounding the targeted instance.
By horizontal context, we mean the information to the left and right of the targeted instance, e.g., the previous tokens and the next tokens.
By vertical context, we mean the information above and below the targeted instance, e.g., the previous lines and the next lines.
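A minimal sketch of the two kinds of context, assuming the page has already been split into lines and each line into tokens. The function names, the window sizes, and the sample page (including the person name) are assumptions for illustration, not details taken from the reviewed paper.

```python
def horizontal_context(lines, row, start, end, width=2):
    """Tokens to the left and right of the span lines[row][start:end]."""
    tokens = lines[row]
    left = tokens[max(0, start - width):start]
    right = tokens[end:end + width]
    return left, right


def vertical_context(lines, row, height=1):
    """Lines above and below the line that holds the targeted instance."""
    above = lines[max(0, row - height):row]
    below = lines[row + 1:row + 1 + height]
    return above, below


# Hypothetical page: one token list per text line.
page = [
    ["Introduction", "to", "Company"],
    ["Legal", "Representative", ":", "Li", "Ming"],
    ["Office", "Address", ":", "Beijing"],
]

# Targeted instance: the tokens "Li Ming" on line 1, positions 3..4.
left, right = horizontal_context(page, row=1, start=3, end=5)
above, below = vertical_context(page, row=1)
```

A purely sequential view would only see `left` and `right`; the vertical context adds the section heading above and the neighbouring field below.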

42 The Annotation Process
Annotation is done in two stages:
- Block detection
- Text annotation

43 Block Detection
A block is a specific informative unit in a document.
It can be defined at different granularities, e.g., text line, section, or paragraph.
We also assign one or more labels to each block; each label corresponds to a concept in the ontology.
A block can also have no label.

44 Block Detection (cont.)
We define a block as a text line, because statistics from our experiments show that 99.6% of the targeted instances lie within a single text line.
In block detection, we detect the label of each block using a classification model.
An SVM is used for this purpose.

45 Company Annual Reports
A company annual report has fourteen sections, including “Introduction to Company”, “Company Financial Report”, etc.
We will only describe the annotation of the first section (“Introduction to Company”).
This section contains company information such as Company-Chinese-Name, Legal-Representative, and Office-Address.

46 Learning to Detect Blocks
We view block detection as classification.
For each concept, we train an SVM model to detect whether a block contains instance(s) of that concept.
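The one-classifier-per-concept setup can be sketched as follows. A trained linear SVM decides by the sign of a weighted sum plus a bias; here the weights are set by hand purely as stand-ins for trained models, and the feature names are invented. Because each concept is judged independently, a block may receive several labels or none.

```python
def decision_value(weights, bias, features):
    """Linear decision function: w . x + b."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items()) + bias


def detect_labels(features, models):
    """Return every concept whose binary model fires on this block."""
    return {
        concept
        for concept, (weights, bias) in models.items()
        if decision_value(weights, bias, features) > 0
    }


# Hand-set (weights, bias) pairs standing in for trained per-concept SVMs.
models = {
    "Office-Address": ({"has:address": 2.0, "has:telephone": -2.0}, -1.0),
    "Legal-Representative": ({"has:representative": 2.0}, -1.0),
}

address_line = {"has:address": 1, "has:telephone": 0, "has:representative": 0}
heading_line = {"has:address": 0, "has:telephone": 0, "has:representative": 0}

address_labels = detect_labels(address_line, models)
heading_labels = detect_labels(heading_line, models)
```

The address line is labeled `Office-Address`, while the heading line gets no label at all.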

47 Block Detection Features
We define features at the token level and the line level.
The main features in block detection are:
- Positive word features
- Negative word features
- Special pattern features
- Line position feature
- Number of words feature
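The slide only names the feature families, so the sketch below fills in plausible definitions as assumptions: positive and negative word features are 0/1 indicators for listed words, special patterns are regular expressions, line position is the line index, and the word count comes from a simple `\w+` tokenization. The function name, word lists and sample lines are hypothetical.

```python
import re


def block_features(lines, index, positive_words, negative_words, patterns):
    """Build a feature dictionary for the text line lines[index]."""
    line = lines[index]
    tokens = re.findall(r"\w+", line.lower())
    features = {}
    for word in positive_words:
        features[f"positive:{word}"] = int(word in tokens)
    for word in negative_words:
        features[f"negative:{word}"] = int(word in tokens)
    for name, pattern in patterns.items():
        features[f"pattern:{name}"] = int(bool(pattern.search(line)))
    features["line_position"] = index
    features["num_words"] = len(tokens)
    return features


# Hypothetical lines from an "Introduction to Company" section.
lines = [
    "Introduction to Company",
    "Office Address: 12 Example Road",
    "Telephone: 010-5550100",
]

features = block_features(
    lines,
    index=1,
    positive_words=["office", "address"],
    negative_words=["telephone"],
    patterns={"has_digit": re.compile(r"\d")},
)
```

A dictionary like this would then be turned into a numeric vector and passed to the per-concept classifier.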

48 Text Annotation
An identified block contains at least one instance.
We then try to identify the start position and the end position of the targeted instance.
Two SVMs are employed for this purpose.
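Once one classifier has marked candidate start tokens and the other candidate end tokens, the two sets of predictions have to be combined into spans. The slide does not say how this is done; the sketch below assumes a simple left-to-right rule that pairs each open start with the next predicted end. The function name and sample flags are hypothetical.

```python
def pair_boundaries(start_flags, end_flags):
    """Pair start and end predictions into (start, end) spans, end exclusive."""
    spans = []
    open_start = None
    for i, (is_start, is_end) in enumerate(zip(start_flags, end_flags)):
        if is_start and open_start is None:
            open_start = i                     # remember the first open start
        if is_end and open_start is not None:
            spans.append((open_start, i + 1))  # close the span at this token
            open_start = None
    return spans


tokens = ["Legal", "Representative", ":", "Li", "Ming"]
# Hypothetical outputs of the start classifier and the end classifier.
start_flags = [False, False, False, True, False]
end_flags = [False, False, False, False, True]

spans = pair_boundaries(start_flags, end_flags)
instances = [tokens[start:end] for start, end in spans]
```

An end prediction with no open start is ignored, and a token flagged as both start and end yields a one-token span.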

49 Text Annotation Features
- Token features: the tokens in the previous four positions, the current position, and the next two positions. The window extends further back because the previous tokens seem more important in our annotation tasks.
- Special pattern features
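The asymmetric token window can be written down directly. The sketch below collects the four previous tokens, the current token and the next two, padding positions that fall outside the line; the padding symbol, the feature-name format and the sample tokens are assumptions.

```python
def token_window_features(tokens, i, before=4, after=2, pad="<PAD>"):
    """Features for position i: tokens at offsets -before .. +after."""
    features = {}
    for offset in range(-before, after + 1):
        j = i + offset
        inside = 0 <= j < len(tokens)
        features[f"token[{offset:+d}]"] = tokens[j] if inside else pad
    return features


tokens = ["Legal", "Representative", ":", "Li", "Ming"]
window = token_window_features(tokens, i=3)
```

For the token ‘Li’ the window reaches back to ‘Legal’ and pads one position on each side, giving seven features in total.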

51 References
- John Davies, Rudi Studer, and Paul Warren, Semantic Web Technologies: Trends and Research in Ontology-based Systems. John Wiley & Sons, Ltd, 2006.
- Mingcai Hong, Jie Tang, and Juanzi Li, Semantic Annotation using Horizontal and Vertical Contexts.
- T. Joachims, Making Large-Scale SVM Learning Practical. In: B. Schölkopf, C. Burges, and A. Smola (eds.), Advances in Kernel Methods - Support Vector Learning. MIT Press, 1999.

52 Thanks
Any questions?

