Published by Janel Conley. Modified over 9 years ago.
1
Encyclopaedic Annotation of Text
2
Entity-level difficulty
▪ Some entities in a document may lie outside the reader’s knowledge space
Lexical difficulty
▪ Arises from words that are difficult relative to the reader’s level
▪ Collaborate: work together; biennially: every two years
Syntactic difficulty
▪ Arises from syntactic constructs that are complex relative to the reader’s level
3
Annotate text with encyclopaedic references
▪ Find important concepts and entities
▪ Link the identified entities and concepts to their respective knowledge sources
Entity Linking (EL) problem
▪ The task of linking name mentions in text to their referent entities in a knowledge base
Entity Disambiguation (ED) problem
▪ A name mention may have many possible referents
4
Word Sense Disambiguation
▪ Predicting the sense of a word in a sentence, when the word may have multiple senses
▪ Mapping of word to sense
EL/ED
▪ Models an encyclopaedia where important words, phrases or entities in a page are linked to their respective informative pages
5
Given a text, generate Wikipedia-like annotation automatically
Resource: Wikify! Linking Documents to Encyclopedic Knowledge, Rada Mihalcea and Andras Csomai
6
Wikipedia: a collaborative encyclopaedia
A Wikipedia article
▪ defines and describes an entity or an event
▪ consists of a hypertext document with hyperlinks to other pages within or outside Wikipedia
▪ is uniquely referenced by an identifier, e.g. the counter for drinks is “bar (counter)”
Hyperlink
▪ Unique identifier + anchor text
▪ “Henry Barnard, [[United States|American]] [[educationalist]], was born in [[Hartford, Connecticut]]”
Disambiguation page
▪ consists of links to articles defining the different meanings of the entity
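The hyperlink format on this slide (unique identifier plus optional anchor text) can be pulled apart with a regular expression. The following is a minimal sketch, not the parser Wikipedia itself uses; the function name `extract_links` is hypothetical, and the pattern only covers the plain `[[target]]` and `[[target|anchor]]` forms shown in the example sentence.

```python
import re

# Matches [[target]] or [[target|anchor text]] in wiki markup.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")

def extract_links(wikitext):
    """Return (target, anchor) pairs; the anchor defaults to the target."""
    links = []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        anchor = (match.group(2) or target).strip()
        links.append((target, anchor))
    return links

sentence = ("Henry Barnard, [[United States|American]] [[educationalist]], "
            "was born in [[Hartford, Connecticut]]")
for target, anchor in extract_links(sentence):
    print(f"{anchor!r} -> {target!r}")
```

Each pair can be read as a sense annotation: the anchor text is the surface form in the sentence, and the target is the article it refers to.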
8
Keyword extraction follows the Wikipedia manual
▪ Link to articles that provide a deeper understanding of the topic, such as technical terms, names, places, etc.
▪ Avoid linking terms that are unrelated to the main topic or that have no article to explain them
▪ Avoid too many links
9
Supervised or unsupervised
Candidate keywords should be limited to those that have a valid corresponding Wikipedia article
▪ Use a keyword vocabulary that contains only the Wikipedia article titles
▪ Augment the list with different morphological forms
▪ e.g. “dissecting” or “dissections” can be linked to the same article, “dissection”
10
Unsupervised keyword extraction from a document
Candidate extraction
▪ From the input document, extract all possible n-grams that are also present in the controlled vocabulary
Keyword ranking
▪ Assign each candidate a score reflecting its likelihood of being a valuable keyphrase
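The two steps on this slide can be sketched as follows. This is an illustrative sketch under simplifying assumptions: tokens are split on whitespace only, the vocabulary is a toy set rather than real Wikipedia titles, and raw frequency stands in for the ranking score, which the slide does not specify. The names `extract_candidates` and `rank_keywords` are hypothetical.

```python
from collections import Counter

def extract_candidates(text, vocabulary, max_n=3):
    """Count every n-gram (n <= max_n) of the text that is in the vocabulary."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            ngram = " ".join(tokens[i:i + n])
            if ngram in vocabulary:
                counts[ngram] += 1
    return counts

def rank_keywords(counts):
    """Order candidates by score; raw frequency is only a placeholder score."""
    return sorted(counts, key=lambda k: (-counts[k], k))

# Toy controlled vocabulary (assumed; a real one would hold article titles).
vocabulary = {"entity linking", "knowledge base", "wikipedia"}
text = ("entity linking maps mentions to a knowledge base "
        "and entity linking often uses wikipedia")
counts = extract_candidates(text, vocabulary)
ranked = rank_keywords(counts)
print(ranked)
```

Restricting candidates to the vocabulary guarantees that every extracted keyword has an article to link to; any better ranking score can be swapped into `rank_keywords` without touching the extraction step.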
14
Links can be treated as sense annotations
▪ Wikipedia data has larger coverage of sense annotations for entities (nouns)
▪ Contains a huge number of named entities
▪ Covers multi-word expressions (e.g., mother church)
15
Knowledge-driven methods
▪ Lesk algorithm: picks the most likely meaning of a word in a given context, based on a measure of contextual overlap between the dictionary definitions of the ambiguous word and the context
▪ Modelling Wikification as WSD
▪ Dictionary definition: the Wikipedia page
▪ Context: the paragraph in which the word occurs
Data-driven methods
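A simplified Lesk-style overlap, with Wikipedia pages playing the role of dictionary definitions, might look like the sketch below. It is a toy illustration rather than the method of any cited system: the page texts are invented one-line stand-ins, the stopword list is ad hoc, and the function name `lesk` is hypothetical.

```python
def lesk(context, candidate_pages, stopwords=frozenset()):
    """Pick the candidate whose page text shares the most words with the context."""
    context_words = set(context.lower().split()) - stopwords
    best_title, best_overlap = None, -1
    for title, page_text in candidate_pages.items():
        page_words = set(page_text.lower().split()) - stopwords
        overlap = len(context_words & page_words)
        if overlap > best_overlap:
            best_title, best_overlap = title, overlap
    return best_title

# Invented stand-ins for the text of two candidate pages.
candidate_pages = {
    "bar (counter)": "a counter where drinks are served in a pub",
    "bar (law)": "the legal profession of lawyers admitted to practise in court",
}
stopwords = frozenset({"a", "the", "at", "in", "of", "to", "where", "are", "were"})
chosen = lesk("drinks were served at the bar", candidate_pages, stopwords)
print(chosen)
```

The context shares "drinks" and "served" with the first page and nothing with the second, so the counter sense is selected.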
16
Local approaches
▪ Disambiguate each mention in a document separately
▪ Utilize clues such as the textual similarity between the document and each candidate disambiguation’s Wikipedia page
Figure labels: document mentions, candidate labels, mention-to-label compatibility
17
Michael Jeffrey Jordan (born February 17, 1963), also known by his initials, MJ, is an American former professional basketball player. Jordan joined the NBA's Chicago Bulls in 1984. Michael Jordan fuelled the success of Nike's Air Jordan sneakers. He also starred in the 1996 feature film Space Jam as himself.
18
It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”. Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997. Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.
Resource: Local and Global Algorithms for Disambiguation to Wikipedia, Ratinov et al.
21
Relations shown in the figure for the Chicago example text: Used_In, Is_a, Succeeded, Released
23
Collective Entity Linking
Figure labels: document mentions, candidate labels, mention-to-label compatibility, inter-label topical coherence
25
Figure: Text Document(s) (News, Blogs, …) and Wikipedia Articles
28
Many-to-one matching in a bipartite graph between mentions in the text document(s) and Wikipedia articles
29
Γ is a solution to the problem
▪ A set of pairs (m, t)
▪ m: a mention in the document
▪ t: the matched Wikipedia title
30
Local term: the score of matching each mention m to its title t, for every pair (m, t) in Γ
31
A “global” term: evaluates how good the structure of the solution is
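One way to read the local and global terms together is as a single objective: a solution Γ is scored by the sum of its local mention-to-title scores plus a coherence score over the chosen titles. The sketch below assumes the global term is a sum of pairwise coherence scores and finds the best Γ by exhaustive search. The scores are invented toy numbers, the name `best_assignment` is hypothetical, and exhaustive search grows exponentially with the number of mentions, so it is only viable for tiny inputs.

```python
from itertools import combinations, product

def best_assignment(candidates, local_score, coherence):
    """Exhaustively search for the assignment maximising local + global score.

    candidates: dict mapping each mention to a list of candidate titles.
    local_score(m, t): compatibility of mention m with title t.
    coherence(t1, t2): topical relatedness of two chosen titles.
    """
    mentions = list(candidates)
    best, best_score = None, float("-inf")
    for titles in product(*(candidates[m] for m in mentions)):
        score = sum(local_score(m, t) for m, t in zip(mentions, titles))
        score += sum(coherence(a, b) for a, b in combinations(titles, 2))
        if score > best_score:
            best, best_score = dict(zip(mentions, titles)), score
    return best, best_score

# Toy scores (assumed for illustration only).
candidates = {
    "Chicago": ["Chicago (city)", "Chicago (typeface)"],
    "MacOS": ["Mac OS"],
}
local = {("Chicago", "Chicago (city)"): 0.6,
         ("Chicago", "Chicago (typeface)"): 0.4,
         ("MacOS", "Mac OS"): 1.0}
related = {frozenset(("Chicago (typeface)", "Mac OS")): 0.9,
           frozenset(("Chicago (city)", "Mac OS")): 0.1}

solution, score = best_assignment(
    candidates,
    lambda m, t: local[(m, t)],
    lambda a, b: related.get(frozenset((a, b)), 0.0),
)
print(solution)
```

With these numbers the local score alone prefers the city, but the coherence with "Mac OS" tips the joint solution towards the typeface, which is the behaviour the global term is meant to capture.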
35
Pipeline stages: Augment Mention List, Construct Disambiguation Candidates, Ranker, Linker
40
Text(t): TF-IDF summary of the Wikipedia page for title t
Context(t): TF-IDF summary of the context within which t is hyperlinked in Wikipedia
Text(m): TF-IDF summary of the document d containing m
Context(m): TF-IDF summary of the context window of m
Local features
▪ cosine-sim(Text(t), Text(m))
▪ cosine-sim(Text(t), Context(m))
▪ cosine-sim(Context(t), Text(m))
▪ cosine-sim(Context(t), Context(m))
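The four local features can be computed with a few lines of code. This is a minimal sketch under stated assumptions: the TF-IDF weighting uses a caller-supplied IDF table (defaulting to 1.0 for unseen words), the texts are invented toy strings, and `tfidf`, `cosine` and `local_features` are hypothetical helper names.

```python
import math
from collections import Counter

def tfidf(text, idf):
    """Sparse TF-IDF vector: term frequency times a supplied IDF weight."""
    counts = Counter(text.lower().split())
    return {w: c * idf.get(w, 1.0) for w, c in counts.items()}

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v[w] for w, weight in u.items() if w in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def local_features(text_t, context_t, text_m, context_m, idf):
    """Return the four cosine features in the order listed on the slide."""
    title_side = [tfidf(text_t, idf), tfidf(context_t, idf)]
    mention_side = [tfidf(text_m, idf), tfidf(context_m, idf)]
    return [cosine(a, b) for a in title_side for b in mention_side]

idf = {}  # toy setting: every word gets the default weight 1.0
text_m = "chicago was used by default for mac menus"
context_m = "standard classic macintosh menu font"
font = local_features("chicago is a typeface font designed for the macintosh",
                      "menu font", text_m, context_m, idf)
city = local_features("chicago is a city in illinois",
                      "city in illinois", text_m, context_m, idf)
print(font, city)
```

In this toy case the typeface candidate scores higher than the city on the features that involve Context(m), because only its texts share words such as "font" and "macintosh" with the mention's context window.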
42
Wikipedia relatedness measures
▪ Normalized Google Distance
▪ Pointwise Mutual Information
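Both measures can be computed from link overlap. The sketch below assumes each title is represented by the set of pages that link to it, which the slide does not state; it uses the usual Normalized Google Distance formula applied to link sets and the standard logarithmic form of PMI. The function names and the toy page IDs are hypothetical, and `total_pages` is assumed to be larger than either link set.

```python
import math

def ngd(links_a, links_b, total_pages):
    """Normalized Google Distance over link sets (lower means more related)."""
    common = len(links_a & links_b)
    if common == 0:
        return math.inf
    big = max(len(links_a), len(links_b))
    small = min(len(links_a), len(links_b))
    return ((math.log(big) - math.log(common))
            / (math.log(total_pages) - math.log(small)))

def pmi(links_a, links_b, total_pages):
    """Pointwise mutual information over link sets (higher means more related)."""
    common = len(links_a & links_b)
    if common == 0:
        return -math.inf
    return math.log(common * total_pages / (len(links_a) * len(links_b)))

# Toy incoming-link sets, identified by made-up page IDs.
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
print(ngd(a, b, total_pages=100), pmi(a, b, total_pages=100))
```

Note the opposite orientations: NGD is a distance, so a collective linker would typically negate or invert it before using it as a coherence score, while PMI can be used as a similarity directly.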