Published by Curtis Perkins. Modified over 9 years ago.
1
Wikipedia Knowledge Extraction
2
Pronoun Resolution module
Infobox extraction
SRL parsing
Improved refinement
Clustering
Hadoop compatibility
3
“His mother wanted him to get a good education so she sent him to live with his grandparents in Honolulu, HI” (Barack Obama)
4
Current solution: replace pronouns with the article title (very primitive)
Target solution:
◦ Pronoun resolution is still an unsolved problem in general
◦ Use an existing system that is usually correct?
◦ Simple rules for common patterns?
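A minimal sketch of the "replace pronouns with article title" baseline, to show why it is primitive. The pronoun lists and the function name `replace_pronouns` are assumptions made for illustration, not the project's actual code.

```python
import re

# Hypothetical pronoun lists for the naive baseline.
# "her" is ambiguous (object or possessive); here it is treated as non-possessive.
POSSESSIVE = {"his", "its", "their"}
OTHER = {"he", "she", "him", "her", "it", "they", "them"}
PATTERN = re.compile(
    r"\b(" + "|".join(sorted(POSSESSIVE | OTHER)) + r")\b", re.IGNORECASE
)

def replace_pronouns(sentence: str, title: str) -> str:
    """Replace every pronoun with the article title, regardless of its referent."""
    def substitute(match):
        word = match.group(0).lower()
        return title + "'s" if word in POSSESSIVE else title
    return PATTERN.sub(substitute, sentence)

example = ("His mother wanted him to get a good education so she sent him "
           "to live with his grandparents in Honolulu, HI")
print(replace_pronouns(example, "Barack Obama"))
```

On the example sentence from the Barack Obama article, this baseline also rewrites "she" as "Barack Obama", although "she" refers to his mother. That is the kind of error a real resolution step would need to avoid.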
5
Convert infobox information into simple sentences:
◦ Joe Biden is Barack Obama’s Vice President
◦ Barack Obama is preceded by George W. Bush
Use the type of phrase (Noun Phrase, Verb Phrase) to determine which sentence to form.
Read papers from the Turing Center (University of Washington)
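One way the phrase-type rule could look, as a sketch. It assumes each infobox field arrives already labeled as a noun phrase ("NP") or verb phrase ("VP"); in practice that label would come from a parser or tagger. The field names and the function `infobox_to_sentences` are hypothetical.

```python
def infobox_to_sentences(title, fields):
    """Turn (key, phrase_type, value) infobox fields into simple sentences.

    phrase_type is assumed to be supplied by an upstream parser:
    "NP" keys name a role, "VP" keys name a relation.
    """
    sentences = []
    for key, phrase_type, value in fields:
        if phrase_type == "NP":
            # Noun-phrase key: the value fills a role belonging to the title.
            sentences.append(f"{value} is {title}'s {key}")
        elif phrase_type == "VP":
            # Verb-phrase key: the title is the subject of the relation.
            sentences.append(f"{title} is {key} {value}")
    return sentences

fields = [
    ("Vice President", "NP", "Joe Biden"),
    ("preceded by", "VP", "George W. Bush"),
]
for sentence in infobox_to_sentences("Barack Obama", fields):
    print(sentence)
```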
6
Semantic role labeling (SRL) performs a deep analysis of each sentence.
E.g. “Yoshi has a long tongue which he uses to grab enemies and eat them.”
◦ has (A0: Yoshi, A1: long tongue)
◦ use (A0: Yoshi, A1: long tongue, A2: grab enemies and eat them)
Use SRL parsing to improve the quality and representation of knowledge.
Problem: speed and complexity
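A sketch of how SRL output like the frames above might be stored and reduced to subject/verb/object triples. The frames are written by hand here, not produced by a parser, and the `Frame` class and `frame_to_triple` helper are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    """One predicate with its labeled arguments (A0 = agent, A1 = patient, ...)."""
    predicate: str
    args: dict

def frame_to_triple(frame):
    """Return (A0, predicate, A1) when both core arguments are present, else None."""
    if "A0" in frame.args and "A1" in frame.args:
        return (frame.args["A0"], frame.predicate, frame.args["A1"])
    return None

# Hand-written frames matching the Yoshi example.
frames = [
    Frame("has", {"A0": "Yoshi", "A1": "long tongue"}),
    Frame("use", {"A0": "Yoshi", "A1": "long tongue",
                  "A2": "grab enemies and eat them"}),
]
triples = [frame_to_triple(f) for f in frames]
print(triples)
```

Note that the pronoun "he" has already been resolved to "Yoshi" in the second frame, which is part of what makes the SRL representation richer than plain tuples.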
7
Current system has Subject, Object, Verb tuples
Problem: hard to define which words to include in each phrase
E.g. ‘The dog ( Canis lupus familiaris )’ ‘is’ ‘a mammal from the family Canidae’
◦ The dog? dog? The dog ( Canis lupus familiaris )?
◦ a mammal? a mammal from the family Canidae?
Possible solutions:
◦ Different levels of information?
◦ Simple rules based on part-of-speech tags?
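The two proposed solutions can be combined in a small sketch: apply simple part-of-speech rules to produce several levels of the same phrase. The tokens below are tagged by hand with Penn Treebank style tags (an assumption; a tagger would supply them), and `phrase_levels` is a hypothetical helper.

```python
def phrase_levels(tagged):
    """Return the phrase at four levels of detail, from fullest to barest.

    tagged is a list of (token, tag) pairs, with tags assumed to follow
    Penn Treebank conventions (DT = determiner, IN = preposition).
    """
    full = " ".join(tok for tok, _ in tagged)

    # Rule 1: drop parenthesized material.
    no_paren, depth = [], 0
    for tok, tag in tagged:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth = max(depth - 1, 0)
        elif depth == 0:
            no_paren.append((tok, tag))

    # Rule 2: cut the phrase at the first preposition.
    core = []
    for tok, tag in no_paren:
        if tag == "IN":
            break
        core.append((tok, tag))

    # Rule 3: drop determiners.
    bare = [tok for tok, tag in core if tag != "DT"]

    return [
        full,
        " ".join(tok for tok, _ in no_paren),
        " ".join(tok for tok, _ in core),
        " ".join(bare),
    ]

subject = [("The", "DT"), ("dog", "NN"), ("(", "-LRB-"), ("Canis", "NNP"),
           ("lupus", "NNP"), ("familiaris", "NNP"), (")", "-RRB-")]
obj = [("a", "DT"), ("mammal", "NN"), ("from", "IN"),
       ("the", "DT"), ("family", "NN"), ("Canidae", "NNP")]
print(phrase_levels(subject))
print(phrase_levels(obj))
```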
8
Idea: determine whether two separate mentions point to the same concept
◦ ‘The dog’, ‘a dog’, ‘dogs’
◦ ‘Cats’, ‘C.A.T.S’, ‘CAT Scan’
◦ ‘President Obama’, ‘President Barack Obama’
Possible solutions:
◦ Feature-based classification
◦ Self-organizing map
◦ Terms associated
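As a rough illustration of the feature-based option, the sketch below normalizes each mention and compares token overlap (Jaccard similarity). This is a stand-in for a real classifier; the normalization rules, the 0.6 threshold, and the function names are all assumptions chosen for the example.

```python
DETERMINERS = {"the", "a", "an"}

def normalize(mention):
    """Lowercase, drop determiners, and crudely strip a plural 's'."""
    tokens = set()
    for tok in mention.lower().split():
        if tok in DETERMINERS:
            continue
        if len(tok) > 3 and tok.endswith("s"):
            tok = tok[:-1]  # naive plural stripping, wrong for many words
        tokens.add(tok)
    return tokens

def same_concept(a, b, threshold=0.6):
    """Guess whether two mentions refer to the same concept via token overlap."""
    ta, tb = normalize(a), normalize(b)
    if not ta or not tb:
        return False
    jaccard = len(ta & tb) / len(ta | tb)
    return jaccard >= threshold

print(same_concept("The dog", "dogs"))
print(same_concept("President Obama", "President Barack Obama"))
```

Surface overlap alone cannot settle cases like ‘Cats’ versus ‘CAT Scan’, which is why context-based features such as associated terms are listed as an option.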
9
Need to ensure scaling is possible for the move to regular Wikipedia
Hadoop is an open-source implementation of the MapReduce programming model
MapReduce parallelizes a computation by splitting its input across several machines (map) and then combining the partial results (reduce)
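The map, shuffle, and reduce steps can be imitated in a single Python process to show the shape of the computation. This is not the Hadoop API; the function names are hypothetical, and on a cluster each `map_phase` call would run on a different machine.

```python
from collections import defaultdict

def map_phase(article_text):
    """Emit a (word, 1) pair for each word; runs independently per article."""
    for word in article_text.lower().split():
        yield (word, 1)

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values emitted for one key."""
    return (key, sum(values))

def run(articles):
    pairs = [pair for article in articles for pair in map_phase(article)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())

print(run(["the dog", "the cat"]))
```

Because each article is mapped independently, extraction work written in this shape can be spread over as many machines as there are input splits.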