Published by Kevin Church. Modified over 9 years ago.
1
Distant Supervision for Knowledge Base Population
Mihai Surdeanu, David McClosky, John Bauer, Julie Tibshirani, Angel Chang, Valentin Spitkovsky, Christopher Manning
2
Definition and Approach
We took part in TAC KBP 2010 (both tasks)
Slot filling task: learn a pre-defined set of relations and attributes for target entities from the documents in a collection
– “Warren Buffett began studying at the Wharton School of Finance at the University of Pennsylvania, but transferred to the University of Nebraska where he graduated.”
  (per:schools_attended, Warren Buffett, University of Pennsylvania)
  (per:schools_attended, Warren Buffett, University of Nebraska)
Distant supervision approach: generate training data automatically from Wikipedia infoboxes
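The distant supervision idea can be illustrated with a minimal sketch. It assumes the simplest possible matching rule (a sentence becomes a positive example for a slot when it mentions both the entity and the slot value as plain substrings); the slides do not describe the actual matching, so `Fact` and `label_sentences` are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """One (entity, slot, value) tuple, e.g. taken from an infobox."""
    entity: str
    slot: str
    value: str


def label_sentences(facts, sentences):
    """Return (sentence, entity, value, slot) examples.

    Assumption: a sentence is labeled with a slot when it contains both
    the entity name and the slot value (case-insensitive substring match).
    """
    examples = []
    for sentence in sentences:
        lowered = sentence.lower()
        for fact in facts:
            if fact.entity.lower() in lowered and fact.value.lower() in lowered:
                examples.append((sentence, fact.entity, fact.value, fact.slot))
    return examples


facts = [
    Fact("Warren Buffett", "per:schools_attended", "University of Pennsylvania"),
    Fact("Warren Buffett", "per:schools_attended", "University of Nebraska"),
]
sentences = [
    "Warren Buffett began studying at the University of Pennsylvania, "
    "but transferred to the University of Nebraska where he graduated.",
    "The University of Nebraska is located in Lincoln.",
]
examples = label_sentences(facts, sentences)
for _, entity, value, slot in examples:
    print(slot, entity, value)
```

The second sentence yields no example because it does not mention the entity. Substring matching of this kind is also one source of the label noise that distant supervision is known for.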
3
Training
– Infobox KB
– Map infobox fields to KBP slots (one-to-many mapping)
– IR: find relevant sentences (query: entity name + slot value)
– Extract +/- slot candidates
– Train multiclass classifier

Evaluation
– Map KBP slots to fine-grained NE labels
– KBP query: entity name
– IR: find relevant sentences (query: entity name + trigger words)
– Extract slot candidates
– Classify candidates
– Inference (greedy, local)
– Extracted slots
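The slide does not say what "greedy, local" inference does in detail. The sketch below shows one plausible reading, under loud assumptions: each candidate carries a classifier score, candidates labeled UNRELATED are dropped, candidates are visited in descending score order, and a slot marked as single-valued keeps only its first (best) value. The function name, the `single_valued` set, and all scores are hypothetical.

```python
def greedy_local_inference(candidates, single_valued):
    """Pick slot values for one entity from scored candidates.

    candidates: iterable of (slot, value, score) tuples.
    single_valued: set of slot names allowed to keep only one value.
    This is an illustrative reading of "greedy, local", not the
    authors' documented procedure.
    """
    # Keep the best score seen for each distinct (slot, value) pair.
    best = {}
    for slot, value, score in candidates:
        if slot == "UNRELATED":
            continue
        key = (slot, value)
        if score > best.get(key, float("-inf")):
            best[key] = score

    # Greedy pass: highest score first, no look-ahead across slots.
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    filled, output = set(), []
    for (slot, value), _score in ranked:
        if slot in single_valued and slot in filled:
            continue
        filled.add(slot)
        output.append((slot, value))
    return output


candidates = [
    ("per:date_of_birth", "1930", 0.9),
    ("per:date_of_birth", "1931", 0.4),
    ("per:schools_attended", "University of Nebraska", 0.8),
    ("per:schools_attended", "University of Pennsylvania", 0.7),
    ("UNRELATED", "Omaha", 0.99),
]
slots = greedy_local_inference(candidates, single_valued={"per:date_of_birth"})
```

"Local" here means each decision looks only at one candidate's score, with no joint reasoning over the entity's other slots.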
4
Results

Label                          Correct  Predict   Actual     P     R    F1
UNRELATED                       268085   289135   295590  92.7  90.7  91.7
org:city_of_headquarters          5835     9040     7514  64.5  77.7  70.5
org:country_of_headquarters       2851     4638     3725  61.5  76.5  68.2
org:founded                       3896     8199     6662  47.5  58.5  52.4
org:parents                       1158     2292     2525  50.5  45.9  48.1
org:top_members/employees         1282     3067     3596  41.8  35.7  38.5
per:city_of_birth                 1799     3920     3252  45.9  55.3  50.2
per:country_of_birth              1984     4122     3204  48.1  61.9  54.2
per:date_of_birth                 3938     5427     4362  72.6  90.3  80.5
per:member_of                     1771     3018     2887  58.7  61.3  60
per:title                         1714     3364     3054  51    56.1  53.4
Total                            37169    68822    62367  54    59.6  56.7

The slot rows are the top 10 most common slots; the Total row covers all slots.
Training on 2/3 of infoboxes, evaluating on 1/3
Evaluating only on sentences that contain at least one valid slot
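The reported scores are consistent with the usual definitions: precision is Correct / Predict, recall is Correct / Actual, and F1 is their harmonic mean. A short sketch that recomputes the per:date_of_birth row (Correct 3938, Predict 5427, Actual 4362); the function name is an illustrative choice.

```python
def precision_recall_f1(correct, predicted, actual):
    """Compute precision, recall and F1 (as fractions) from raw counts."""
    precision = correct / predicted if predicted else 0.0
    recall = correct / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# per:date_of_birth row from the results slide
p, r, f1 = precision_recall_f1(3938, 5427, 4362)
print(f"{100 * p:.1f} {100 * r:.1f} {100 * f1:.1f}")  # prints "72.6 90.3 80.5"
```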
5
Challenges
Improve quality of data generated through distant supervision
Improve IR recall
– Use relation-specific trigger words (or n-grams, dependency paths, etc.) to boost sentences likely to contain answers to the top
– How to acquire these automatically?
Better classifiers for noisy text (e.g., web snippets)
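The trigger-word idea can be sketched as a simple re-ranking step. This is an assumption-laden illustration: the score is just the number of trigger words in a sentence, and the trigger list for the schools_attended relation is invented for the example. How to obtain such lists automatically is left open, as on the slide.

```python
def rank_sentences(sentences, trigger_words):
    """Order sentences so those with more trigger words come first.

    Assumption: the boost is the raw count of trigger-word tokens.
    Ties keep their original (retrieval) order because sorted() is stable.
    """
    triggers = {word.lower() for word in trigger_words}

    def score(sentence):
        # Crude tokenization: split on whitespace, strip punctuation.
        tokens = [tok.strip(".,;:!?()\"'").lower() for tok in sentence.split()]
        return sum(1 for tok in tokens if tok in triggers)

    return sorted(sentences, key=score, reverse=True)


retrieved = [
    "Buffett lives in Omaha.",
    "Buffett graduated from the University of Nebraska.",
    "He studied and graduated there.",
]
# Hypothetical trigger words for per:schools_attended
ranked = rank_sentences(retrieved, ["graduated", "studied"])
```

The same scoring function could count n-grams or dependency paths instead of single words, which the slide lists as alternatives.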