Intelligent Database Systems Lab
Automatic Keyphrase Extraction by Bridging Vocabulary Gap
Presenter: WU, MIN-CONG
Authors: Zhiyuan Liu, Xinxiong Chen, Yabin Zheng, Maosong Sun
2011, FCCNLL
Outline
Motivation
Objectives
Methodology
Experiments
Conclusions
Comments
Motivation
Most methods extract keyphrases according to their statistical properties in the given document. As a result, they suffer from a large vocabulary gap between a document and its keyphrases.
Approach      Property
TFIDF         relies on statistical frequencies
TextRank      tends toward statistical frequencies
ExpandRank    topic drift
LDA           suggests general words
Objectives
We use word alignment models (WAM) from statistical machine translation to learn translation probabilities between the words in documents and the words in keyphrases.
Methodology - Bridging Vocabulary Gap Using WAM
Methodology - Preparing Translation Pairs
Methodology - Title-based Pairs
Methodology - Summary-based Pairs
Approach           Property
Sampling method    loses the word order
Split method       longer training time of WAM
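The two ways of building translation pairs can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `sampling_pairs` and `split_pairs`, the weight dictionary, and the toy data are all assumptions. It assumes the sampling method draws document words in proportion to a word weight (which is why word order is lost), and that the split method pairs every sentence with the title or summary (which produces more pairs and presumably explains the longer WAM training time).

```python
import random

def sampling_pairs(weights, target_words, n_samples, sample_len, seed=0):
    """Build pairs by sampling document words in proportion to their weights.

    weights: hypothetical dict mapping each document word to a positive weight
    (for example a TFIDF score). The sampled side is a bag of words, so the
    original word order of the document is not preserved.
    """
    rng = random.Random(seed)
    vocab = list(weights)
    probs = [weights[w] for w in vocab]
    return [
        (rng.choices(vocab, weights=probs, k=sample_len), list(target_words))
        for _ in range(n_samples)
    ]

def split_pairs(sentences, target_words):
    """Build one pair per non-empty sentence, keeping word order within it."""
    return [(list(s), list(target_words)) for s in sentences if s]

# Toy data (illustrative only).
doc_weights = {"iphone": 3.0, "sales": 2.0, "quarter": 1.0}
doc_sentences = [["iphone", "sales", "rise"], ["strong", "quarter"], []]
title = ["apple", "revenue"]

sampled = sampling_pairs(doc_weights, title, n_samples=4, sample_len=2)
split = split_pairs(doc_sentences, title)
```

Either list of pairs can then be passed to a word alignment trainer; the trade-off between the two is the one listed in the table above.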
Methodology - Training Translation Models
Figure labels: translation pair, connection
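As a rough illustration of how translation probabilities can be learned from translation pairs, here is a minimal sketch of IBM Model-1 trained with EM. The function name `train_ibm_model1`, the toy pairs, and the iteration count are assumptions made for this example; the sketch also omits the NULL word and any smoothing that a full word alignment toolkit would include.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """Estimate t[(target_word, source_word)] = Pr(target_word | source_word).

    pairs: list of (source_words, target_words). In this sketch the source
    side is the document side and the target side is the title/summary side.
    """
    # Start from a uniform table: every co-occurring pair gets the same value.
    t = defaultdict(lambda: 1.0)
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts for (target, source)
        total = defaultdict(float)   # expected counts per source word
        # E-step: distribute each target word over the source words ("connections").
        for src, tgt in pairs:
            for k in tgt:
                norm = sum(t[(k, s)] for s in src)
                for s in src:
                    frac = t[(k, s)] / norm
                    count[(k, s)] += frac
                    total[s] += frac
        # M-step: renormalise so that probabilities sum to 1 for each source word.
        t = {(k, s): c / total[s] for (k, s), c in count.items()}
    return t

# Toy translation pairs (illustrative only): document words -> title words.
pairs = [
    (["iphone", "launch"], ["apple", "event"]),
    (["iphone", "sales"], ["apple", "revenue"]),
    (["samsung", "sales"], ["galaxy", "revenue"]),
]
t = train_ibm_model1(pairs)
```

On this toy data, "sales" co-occurs with "revenue" in two pairs, so the learned table prefers translating "sales" to "revenue" over translating "iphone" to "revenue".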
Methodology - Keyphrase Extraction
Key terms: noun phrases, normalized TFIDF scores
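One way to combine the translation probabilities with normalized TFIDF scores is sketched below. It assumes that each candidate word k is scored as the sum over document words w of Pr(k | w) times the normalized weight of w, and that a candidate noun phrase is scored by summing its word scores. Both the combination rule and the names `normalize` and `rank_candidates` are assumptions of this sketch, and the probabilities in the toy table are made up.

```python
def normalize(raw_scores):
    """Scale raw word scores (e.g. TFIDF) so that they sum to 1."""
    total = sum(raw_scores.values())
    return {w: s / total for w, s in raw_scores.items()}

def rank_candidates(candidates, doc_weights, t):
    """Rank candidate phrases by translation-weighted relevance to a document.

    candidates: list of phrases, each a tuple of words (e.g. noun phrases).
    doc_weights: normalized weight of each document word.
    t: dict mapping (candidate_word, document_word) to Pr(candidate_word | document_word).
    """
    scores = {}
    for phrase in candidates:
        scores[phrase] = sum(
            t.get((k, w), 0.0) * weight
            for k in phrase
            for w, weight in doc_weights.items()
        )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy inputs (illustrative only).
doc_weights = normalize({"iphone": 3.0, "sales": 1.0})
t = {
    ("apple", "iphone"): 0.6,
    ("apple", "sales"): 0.1,
    ("revenue", "sales"): 0.7,
}
ranked = rank_candidates([("revenue",), ("apple",)], doc_weights, t)
```

Note that "apple" ranks first here even though it never appears in the toy document, which is the sense in which translation probabilities bridge the vocabulary gap.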
Experiment
Dataset: 13702 Chinese news articles, with keyphrases from website editors
Statistics reported: average lengths of documents, titles, and summaries
Evaluation: cross validation
Experiment - Evaluation on Keyphrase Extraction
Performance Comparison and Analysis
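The slide does not state which metrics are used. Assuming the common choice of precision, recall, and F-measure against the editor-assigned keyphrases, a per-document evaluation could be sketched as follows; the function name and the toy phrases are illustrative.

```python
def precision_recall_f1(extracted, gold):
    """Compare extracted keyphrases with gold keyphrases for one document."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy example: 2 of 4 extracted phrases match the 3 gold phrases.
p, r, f = precision_recall_f1(
    ["apple", "iphone", "sales", "launch"],
    ["apple", "iphone", "revenue"],
)
```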
Experiment - Influences of Parameters to TPR
Influence of Parameters When Titles/Summaries Are Unavailable
Experiment - Beyond Extraction: Keyphrase Generation
Conclusions
A document and its keyphrases are treated as two languages. We use IBM Model-1 to bridge the vocabulary gap between these two languages for keyphrase generation.
Comments
Advantages: Our method can capture the semantic relations between words in documents and keyphrases.
Applications: Keyphrase extraction.