Understanding User’s Query Intent with Wikipedia G200949016 여 승 후.

Slides:



Advertisements
Similar presentations
Large-Scale Entity-Based Online Social Network Profile Linkage.
Advertisements

Engeniy Gabrilovich and Shaul Markovitch American Association for Artificial Intelligence 2006 Prepared by Qi Li.
Learning to Cluster Web Search Results SIGIR 04. ABSTRACT Organizing Web search results into clusters facilitates users quick browsing through search.
Web Mining Research: A Survey Authors: Raymond Kosala & Hendrik Blockeel Presenter: Ryan Patterson April 23rd 2014 CS332 Data Mining pg 01.
Bring Order to Your Photos: Event-Driven Classification of Flickr Images Based on Social Knowledge Date: 2011/11/21 Source: Claudiu S. Firan (CIKM’10)
Query Dependent Pseudo-Relevance Feedback based on Wikipedia SIGIR ‘09 Advisor: Dr. Koh Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/01/24 1.
GENERATING AUTOMATIC SEMANTIC ANNOTATIONS FOR RESEARCH DATASETS AYUSH SINGHAL AND JAIDEEP SRIVASTAVA CS DEPT., UNIVERSITY OF MINNESOTA, MN, USA.
Mining Query Subtopics from Search Log Data Date : 2012/12/06 Resource : SIGIR’12 Advisor : Dr. Jia-Ling Koh Speaker : I-Chih Chiu.
1 Entity Ranking Using Wikipedia as a Pivot (CIKM 10’) Rianne Kaptein, Pavel Serdyukov, Arjen de Vries, Jaap Kamps 2010/12/14 Yu-wen,Hsu.
6/16/20151 Recent Results in Automatic Web Resource Discovery Soumen Chakrabartiv Presentation by Cui Tao.
A Flexible Workbench for Document Analysis and Text Mining NLDB’2004, Salford, June Gulla, Brasethvik and Kaada A Flexible Workbench for Document.
ReQuest (Validating Semantic Searches) Norman Piedade de Noronha 16 th July, 2004.
Presented by Zeehasham Rasheed
Topic-Sensitive PageRank Taher H. Haveliwala. PageRank Importance is propagated A global ranking vector is pre-computed.
Scalable Text Mining with Sparse Generative Models
Statistical Natural Language Processing. What is NLP?  Natural Language Processing (NLP), or Computational Linguistics, is concerned with theoretical.
Semantic Web Technologies Lecture # 2 Faculty of Computer Science, IBA.
Advisor: Hsin-Hsi Chen Reporter: Chi-Hsin Yu Date:
Erasmus University Rotterdam Introduction With the vast amount of information available on the Web, there is an increasing need to structure Web data in.
Language Identification of Search Engine Queries Hakan Ceylan Yookyung Kim Department of Computer Science Yahoo! Inc. University of North Texas 2821 Mission.
Research paper: Web Mining Research: A survey SIGKDD Explorations, June Volume 2, Issue 1 Author: R. Kosala and H. Blockeel.
Active Learning for Class Imbalance Problem
Classifying Tags Using Open Content Resources Simon Overell, Borkur Sigurbjornsson & Roelof van Zwol WSDM ‘09.
Exploiting Wikipedia as External Knowledge for Document Clustering Sakyasingha Dasgupta, Pradeep Ghosh Data Mining and Exploration-Presentation School.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
When Experts Agree: Using Non-Affiliated Experts To Rank Popular Topics Meital Aizen.
CROSSMARC Web Pages Collection: Crawling and Spidering Components Vangelis Karkaletsis Institute of Informatics & Telecommunications NCSR “Demokritos”
Page 1 Ming Ji Department of Computer Science University of Illinois at Urbana-Champaign.
WEB SEARCH PERSONALIZATION WITH ONTOLOGICAL USER PROFILES Data Mining Lab XUAN MAN.
Presenter: Lung-Hao Lee ( 李龍豪 ) January 7, 309.
Mining fuzzy domain ontology based on concept Vector from wikipedia category network.
Contextual Ranking of Keywords Using Click Data Utku Irmak, Vadim von Brzeski, Reiner Kraft Yahoo! Inc ICDE 09’ Datamining session Summarized.
Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.
Indirect Supervision Protocols for Learning in Natural Language Processing II. Learning by Inventing Binary Labels This work is supported by DARPA funding.
A Scalable Machine Learning Approach for Semi-Structured Named Entity Recognition Utku Irmak(Yahoo! Labs) Reiner Kraft(Yahoo! Inc.) WWW 2010(Information.
Date : 2013/03/18 Author : Jeffrey Pound, Alexander K. Hudek, Ihab F. Ilyas, Grant Weddell Source : CIKM’12 Speaker : Er-Gang Liu Advisor : Prof. Jia-Ling.
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Externally growing self-organizing maps and its application to database visualization and exploration.
Digital libraries and web- based information systems Mohsen Kamyar.
Probabilistic Latent Query Analysis for Combining Multiple Retrieval Sources Rong Yan Alexander G. Hauptmann School of Computer Science Carnegie Mellon.
Learning to Share Meaning in a Multi-Agent System (Part I) Ganesh Padmanabhan.
Of 33 lecture 1: introduction. of 33 the semantic web vision today’s web (1) web content – for human consumption (no structural information) people search.
Named Entity Disambiguation on an Ontology Enriched by Wikipedia Hien Thanh Nguyen 1, Tru Hoang Cao 2 1 Ton Duc Thang University, Vietnam 2 Ho Chi Minh.
Semantic web Bootstrapping & Annotation Hassan Sayyadi Semantic web research laboratory Computer department Sharif university of.
Web Information Retrieval Prof. Alessandro Agostini 1 Context in Web Search Steve Lawrence Speaker: Antonella Delmestri IEEE Data Engineering Bulletin.
Acquisition of Categorized Named Entities for Web Search Marius Pasca Google Inc. from Conference on Information and Knowledge Management (CIKM) ’04.
26/01/20161Gianluca Demartini Ranking Categories for Faceted Search Gianluca Demartini L3S Research Seminars Hannover, 09 June 2006.
Supervised Random Walks: Predicting and Recommending Links in Social Networks Lars Backstrom (Facebook) & Jure Leskovec (Stanford) Proc. of WSDM 2011 Present.
SEMANTIC VERIFICATION IN AN ONLINE FACT SEEKING ENVIRONMENT DMITRI ROUSSINOV, OZGUR TURETKEN Speaker: Li, HueiJyun Advisor: Koh, JiaLing Date: 2008/5/1.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
MMM2005The Chinese University of Hong Kong MMM2005 The Chinese University of Hong Kong 1 Video Summarization Using Mutual Reinforcement Principle and Shot.
Learning to Rank: From Pairwise Approach to Listwise Approach Authors: Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li Presenter: Davidson Date:
 Effective Multi-Label Active Learning for Text Classification Bishan yang, Juan-Tao Sun, Tengjiao Wang, Zheng Chen KDD’ 09 Supervisor: Koh Jia-Ling Presenter:
Predicting Short-Term Interests Using Activity-Based Search Context CIKM’10 Advisor: Jia Ling, Koh Speaker: Yu Cheng, Hsieh.
Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,
CiteData: A New Multi-Faceted Dataset for Evaluating Personalized Search Performance CIKM’10 Advisor : Jia-Ling, Koh Speaker : Po-Hsien, Shih.
September 2003, 7 th EDG Conference, Heidelberg – Roberta Faggian, CERN/IT CERN – European Organization for Nuclear Research The GRACE Project GRid enabled.
Linear Models & Clustering Presented by Kwak, Nam-ju 1.
© NCSR, Frascati, July 18-19, 2002 CROSSMARC big picture Domain-specific Web sites Domain-specific Spidering Domain Ontology XHTML pages WEB Focused Crawling.
Andy Nguyen Christopher Piech Jonathan Huang Leonidas Guibas. Stanford University.
2016/9/301 Exploiting Wikipedia as External Knowledge for Document Clustering Xiaohua Hu, Xiaodan Zhang, Caimei Lu, E. K. Park, and Xiaohua Zhou Proceeding.
Exploring Social Tagging Graph for Web Object Classification
Exploiting Wikipedia as External Knowledge for Document Clustering
Erasmus University Rotterdam

Presentation 王睿.
Presented by: Prof. Ali Jaoua
Introduction to Linkify and its Key Technologies
Outline Background Motivation Proposed Model Experimental Results
Information Retrieval and Web Design
Tantan Liu, Fan Wang, Gagan Agrawal The Ohio State University
Presentation transcript:

Understanding User’s Query Intent with Wikipedia G 여 승 후

content Abstract Introduction Wikipedia Methodology Experiments Conclusion

Abstract Understanding the intent behind a user’s query can help search engine to automatically route three major challenges to the query intent classification problem –Domain coverage – Intent representation –Semantic interpretation the statistical machine learning approaches is difficult and often requires many human efforts Propose a general methodology to the problem of query intent classification with Wikipedia

Introduction Intent representation challenge -how to define a semantic representation that can precisely understand and distinguish the intent of the input query Domain coverage challenge –how to clarify the semantic boundary of the intent Semantic interpretation challenge –how to correctly understand the semantic meaning of the input query The motivation of this work is to meet these challenges in query intent classification. Wikipedia, to help understand a user’s query intent.

Wikipedia - introduce the structure of Wikipedia Wikipedia ? multilingual, web-based, free content encyclopedia written collaboratively by more than 75,000 regular editing contributors 1.Article Links - Wikipedia is structured as an interconnected network of articles. contributor can insert a hyperlink between a word or phrase that occurs in the article and corresponding 2. Category Links - both articles and categories can belong to more than one category ex> “Puma” belongs to two categories: “Cat stubs” and “Felines”. 3.Redirect Links - redirect page exists for each alternative name

Wikipedia - introduce the structure of Wikipedia

Methodology - present a new methodology for understanding a user’s query intent using Wikipedia knowledge 1. select a few queries (should be representative queries of the specific domain ) 2.map these queries to Wikipedia concepts 3.browse the categories to which they belong (their sibling concepts and the concepts they link to in Wikipedia ) Result - easily collect a large amount of representative Wikipedia concepts

Experiments - Define and Propose three kinds of query intent applications and then show the results of several experiments 1.Travel Intent Identification 2. Personal Name Intent Identification 3. Job Intent Identification

Experiments Algorithms 1. Supervised learning with seed concepts. using logistic regression based on seed concepts selected from Wikipedia denote this method as “LR”. 2. Supervised learning with concepts expanded with Wikipedia. using logistic regression on selected concepts denote this method as “LRE” 3. Our method the intent predictor part and the random walk runs for 100 iterations denote this method as “WIKI” 4. Our method without the random walk step. denote this method as “WIKI-R”

Random walk algorithm –propagate intent from the seed examples into the Wikipedia ontology – assign an intent score to each Wikipedia concept – obtain an intent probability for each concept in Wikipedia

our method reaches the best performance in terms of F1 threshold of probability based on the tuning set Compared and Results

Conclusion the world’s largest knowledge resource, Wikipedia, to help understand a user’s query intent to minimize the human effort required to investigate the features of a specified domain and understand the users’ intent behind their input queries achieve by mining the structure of Wikipedia and propagating a small number of intent seeds through the Wikipedia structure achieves much better classification accuracy than other approaches.

Q & A