Wiki3C: Exploiting Wikipedia for Context-aware Concept Categorization


Wiki3C: Exploiting Wikipedia for Context-aware Concept Categorization
Date: 2014/03/04
Authors: Peng Jiang, Huiman Hou, Lijiang Chen, Shimin Chen, Conglei Yao, Chengkai Li, Min Wang
Source: WSDM’13
Advisor: Jia-Ling Koh
Speaker: Yi-Hsuan Yeh

Outline: (1) Introduction, (2) Wiki3C: Context-aware Concept Categorization, (3) Experiments, (4) Conclusion

Introduction
Wikipedia is a rich human-generated knowledge base containing over 21 million articles organized into millions of categories. A concept in Wikipedia usually belongs to many categories, and these categories differ in importance depending on the context.

Introduction (cont.)
Goal: rank the categories of a concept to determine which ones describe it best with respect to a particular textual context.

Wiki3C: Context-aware Concept Categorization
Framework; Category Ranking; Child Article Selection; Split Article Selection; Compute Relatedness (Basic Model, Probabilistic Model)

Framework
[Figure: framework relating categories, child articles, and split articles]

Category Ranking
[Figure: contextual concepts, target concept, child articles, split articles]
Notation: $t_i$: target concept; $t'$: contextual concept; $c_{ij}$: a category

Child Article Selection
Represent a category by the K child articles that have the highest relatedness to the other articles in that category.

Child Article Selection (cont.)
Category: “Film characters”. Assume K = 2 and link threshold = 10.
Number of links per article: Blade (comic) 15; Ghost Rider (Johnny Blaze) 8; Captain America 23; Banshee (comic) 18
Pairwise relatedness values among the articles: 0.5, 0.8, 0.2
Summed relatedness: Blade (comic) 1.3; Captain America 1.0; Banshee (comic) 0.7
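The selection in this example can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The function name `select_child_articles` is hypothetical, the filter is assumed to keep articles with at least `link_threshold` links, and the assignment of the three pairwise values to article pairs is inferred as the only one that reproduces the slide's sums (1.3, 1.0, 0.7).

```python
def select_child_articles(link_counts, pair_relatedness, k, link_threshold):
    """Pick the k child articles most related to the other kept articles.

    link_counts: article -> number of links (from the slide's example).
    pair_relatedness: (article_a, article_b) -> relatedness score.
    """
    # Step 1: drop articles with too few links (">=" is an assumption).
    kept = [a for a, n in link_counts.items() if n >= link_threshold]

    # Step 2: sum each kept article's relatedness to the other kept articles.
    totals = {a: 0.0 for a in kept}
    for (a, b), score in pair_relatedness.items():
        if a in totals and b in totals:
            totals[a] += score
            totals[b] += score

    # Step 3: keep the k articles with the highest summed relatedness.
    ranked = sorted(kept, key=lambda a: totals[a], reverse=True)
    return ranked[:k], totals


link_counts = {
    "Blade (comic)": 15,
    "Ghost Rider (Johnny Blaze)": 8,
    "Captain America": 23,
    "Banshee (comic)": 18,
}
# Pair assignment inferred from the sums on the slide (assumption).
pair_relatedness = {
    ("Blade (comic)", "Captain America"): 0.8,
    ("Blade (comic)", "Banshee (comic)"): 0.5,
    ("Captain America", "Banshee (comic)"): 0.2,
}

selected, totals = select_child_articles(
    link_counts, pair_relatedness, k=2, link_threshold=10
)
print(selected)
```

Under these assumptions, Ghost Rider (Johnny Blaze) is filtered out for having only 8 links, and the two representatives are the articles with the largest sums.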

Child Article Selection (cont.)
Relatedness between a concept and the filtered child articles, with a special case when there are fewer than K filtered child articles.
Notation: $t$: a concept; $t_i$: a filtered child article; $n'$: the number of filtered child articles; $r(t, t')$: relatedness function

Split Article Selection
Represent the category by the split article that has the maximum relatedness to the concept.
Notation: $t$: a concept; $t_i$: a split article; $r(t, t')$: relatedness function
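The max rule stated on this slide can be illustrated as follows. This is a sketch only: `split_article_relatedness` is a hypothetical name, the relatedness function `r` is passed in as a parameter, and the toy scores and article names are invented for illustration.

```python
def split_article_relatedness(concept, split_articles, r):
    """Relatedness of a concept to a category represented by its split articles.

    Returns the maximum r(concept, article) over the split articles,
    or 0.0 when the category has no split articles (an assumption).
    """
    return max((r(concept, article) for article in split_articles), default=0.0)


# Toy relatedness scores, invented for illustration only.
toy_scores = {
    ("t", "split_1"): 0.2,
    ("t", "split_2"): 0.7,
    ("t", "split_3"): 0.4,
}


def toy_r(a, b):
    return toy_scores[(a, b)]


best = split_article_relatedness("t", ["split_1", "split_2", "split_3"], toy_r)
print(best)
```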

Compute Relatedness - Basic Model
The relatedness measure is based on comparing the link structures of Wikipedia articles: inlinks (incoming links) and outlinks (outgoing links).

Compute Relatedness (cont.) - Basic Model
A concept’s Wikipedia article $t_i$ has inlink and outlink articles $\{a_1, a_3, a_4, a_8\}$.
A filtered child article $t_j$ has inlink and outlink articles $\{a_2, a_3, a_5, a_6, a_8, a_9\}$.
$r(t_i, t_j) = \frac{2}{8}$
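The value 2/8 in this example matches a set-overlap (Jaccard-style) ratio: the two articles share 2 linked articles out of 8 distinct ones. The slide does not state the formula, so the sketch below is an assumption that merely reproduces the example. The name `basic_relatedness` is hypothetical.

```python
def basic_relatedness(links_i, links_j):
    """Overlap of two link sets: |intersection| / |union| (assumed form)."""
    links_i, links_j = set(links_i), set(links_j)
    union = links_i | links_j
    if not union:
        return 0.0  # no links at all: treat as unrelated (assumption)
    return len(links_i & links_j) / len(union)


t_i_links = {"a1", "a3", "a4", "a8"}
t_j_links = {"a2", "a3", "a5", "a6", "a8", "a9"}

score = basic_relatedness(t_i_links, t_j_links)
print(score)  # 2 shared links over 8 distinct links
```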

Compute Relatedness (cont.) - Probabilistic Model
Represent a concept $t$ as a probability distribution over links.
Notation: $t$: a concept; $|t|$: the number of links in $t$; $n(link; t)$: the number of times $link$ appears in $t$’s article; $\mu$: a parameter; $C$: the set of categories that $t$ belongs to

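Compute Relatedness (cont.) - Probabilistic Model
Example for a concept $t$. Assume link = “US” and $\mu = 1000$. Counts are given as (number of all links / number of “US” links).
Concept $t$: 10 / 2, so $n(US; t) = 2$ and $|t| = 10$
Categories of $t$ (the set $C$): $c_1$: 100 / 20; $c_2$: 120 / 30; $c_3$: 80 / 10
$P(US \mid C) = \frac{20 + 30 + 10}{100 + 120 + 80} = \frac{1}{5}$
$P(US \mid \theta_t) = \frac{2 + 1000 \cdot \frac{1}{5}}{10 + 1000} = \frac{1}{5}$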
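The arithmetic in this example can be checked with exact fractions. The sketch below assumes the smoothed estimate has the form (n(link; t) + μ · P(link | C)) / (|t| + μ), which is what the numbers on the slide follow. The function names are hypothetical.

```python
from fractions import Fraction


def category_link_probability(link_counts, total_counts):
    """P(link | C): occurrences of the link over all links, pooled across categories."""
    return Fraction(sum(link_counts), sum(total_counts))


def smoothed_link_probability(n_link_t, size_t, prior, mu):
    """P(link | theta_t) = (n(link; t) + mu * P(link | C)) / (|t| + mu)."""
    return (n_link_t + mu * prior) / (size_t + mu)


# Categories c1, c2, c3 from the slide: (all links / "US" links).
p_us_given_c = category_link_probability([20, 30, 10], [100, 120, 80])

# Concept t: 10 links in total, 2 of them "US"; mu = 1000.
p_us_given_t = smoothed_link_probability(2, 10, p_us_given_c, 1000)

print(p_us_given_c, p_us_given_t)
```

With a large μ the estimate is dominated by the category-level probability, which is why both values coincide in this example.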

Compute Relatedness (cont.) - Probabilistic Model
KL-divergence: if $t_i$ and $t_j$ are the same concept, $D(\theta_i \| \theta_j)$ equals 0. The negative KL-divergence is used as the relatedness measure.
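A small sketch of this measure, using the standard definition $D(\theta_i \| \theta_j) = \sum_{link} P(link \mid \theta_i) \log \frac{P(link \mid \theta_i)}{P(link \mid \theta_j)}$. The name `negative_kl_relatedness` and the toy distributions are hypothetical. The code assumes every link with non-zero probability under the first distribution also has non-zero probability under the second, which smoothing would normally guarantee.

```python
import math


def negative_kl_relatedness(theta_i, theta_j):
    """Return -D(theta_i || theta_j) for two link distributions (dicts link -> prob)."""
    kl = sum(
        p * math.log(p / theta_j[link])
        for link, p in theta_i.items()
        if p > 0  # terms with zero probability contribute nothing
    )
    return -kl


# Toy distributions over two links, invented for illustration.
theta_a = {"US": 0.5, "UK": 0.5}
theta_b = {"US": 0.9, "UK": 0.1}

same = negative_kl_relatedness(theta_a, theta_a)
different = negative_kl_relatedness(theta_a, theta_b)
print(same, different)
```

Identical distributions give the maximum relatedness of 0, and any other pair gives a negative value.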

Experiments: (1) Data Set, (2) Experimental Results

Data Set and Evaluation
Data set: 3,072 concepts; 39,044 categories; 7,780 relevant categories (human selected)
Evaluation measures: MAP, R-precision, bpref

Experimental Results
[Result figures not preserved; the only surviving labels are the values 0.7 and 8]

Conclusion
This paper proposes Wiki3C, an unsupervised solution to the task of context-aware concept categorization. Two article selection strategies, child article selection and split article selection, are used to represent a category. A probabilistic model computes the semantic relatedness between concepts. Experimental results demonstrate the effectiveness of Wiki3C.

bpref
Example ranking: 1: A R; 2: C I; 3: B; 4: X; 5: K; 6: D; 7: N; 8: F; 9: G; 10: E
$bpref = \frac{1}{10}\left[\left(1-\frac{0}{4}\right) + \left(1-\frac{1}{4}\right) + \left(1-\frac{1}{4}\right) + \left(1-\frac{1}{4}\right) + \left(1-\frac{1}{4}\right) + \left(1-\frac{2}{4}\right) + \left(1-\frac{2}{4}\right) + \left(1-\frac{3}{4}\right) + \left(1-\frac{4}{4}\right) + \left(1-\frac{4}{4}\right)\right]$
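For reference, one common formulation of bpref averages, over the relevant documents, one minus the fraction of judged non-relevant documents ranked above each of them, with that count capped at min(R, N). The sketch below follows that formulation. The judgment list is hypothetical, constructed only so that the per-document terms match the pattern in the sum above (10 relevant documents, 4 judged non-relevant). It is not the ranking shown on the slide.

```python
def bpref(judgments):
    """bpref for a ranked list of judged documents.

    judgments: list of booleans in rank order,
               True = relevant, False = judged non-relevant.
    """
    num_rel = sum(judgments)
    num_nonrel = len(judgments) - num_rel
    if num_rel == 0:
        return 0.0
    bound = min(num_rel, num_nonrel)

    total = 0.0
    nonrel_above = 0
    for is_relevant in judgments:
        if is_relevant:
            if nonrel_above == 0:
                total += 1.0
            else:
                total += 1.0 - min(nonrel_above, bound) / bound
        else:
            nonrel_above += 1
    return total / num_rel


R, N = True, False
# Hypothetical ranking: non-relevant counts above the relevant documents
# are 0, 1, 1, 1, 1, 2, 2, 3, 4, 4.
hypothetical_ranking = [R, N, R, R, R, R, N, R, R, N, R, N, R, R]

score = bpref(hypothetical_ranking)
print(score)
```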