Improving Search Engines Using Human Computation Games (CIKM09)
Date: 2011/3/28
Advisor: Dr. Koh, Jia-Ling
Speaker: Chiang, Guang-ting

Preview
– Introduction
– Page Hunt design
– Experiment
– Data analysis and discussion
– Conclusions

Introduction
– Web search engines have become an integral part of our everyday lives.
– Evaluation methods are usually based on measuring the relevance of documents to queries.

Introduction
– One approach to evaluation is to use large hand-annotated evaluation sets.
– Another is to use implicit measures of relevance, e.g. clickthrough data.

Introduction
王建民 (Wang Chien-Ming) DATA

Introduction
– We seek to develop an efficient and effective human computation model for improving web search.
– Page Hunt: a single-player game.

Page Hunt design
Questions to consider:
– Was the game fun?
– Is the data we get from Page Hunt comparable to data we get from other sources?
– Can we define a process that extracts useful information from the data?

Page Hunt design
1. The player gets a random page.
2. The player views the page and types in a word as a query.
3. Is the match successful?
– Yes (the page matches one of the top 5 results): the player scores points by rank (1: …, 2: 90, 3: 80, …).
– No (the page is not in the top 5 results): the player edits or adds to the query, and the match is checked again. This depends on bitext matching; if the change is successful, the player gets a bonus.
4. It's a loop: the steps repeat.
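The round described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `rank_points`, `play_round`, `toy_search` and `TOY_INDEX` are hypothetical names, the toy index stands in for a real search engine, and the linear point scale (100 for rank 1, minus 10 per rank) is an assumption chosen to agree with the surviving values 2: 90 and 3: 80.

```python
TOP_K = 5  # the slide checks whether the page is in the top 5 results


def rank_points(rank, top_k=TOP_K):
    """Points for a page found at 1-based `rank`; 0 if it is not in the top_k."""
    if rank is None or rank > top_k:
        return 0
    # Assumed linear scale, consistent with the slide's 2: 90 and 3: 80.
    return 100 - 10 * (rank - 1)


def play_round(target_url, search, queries, top_k=TOP_K):
    """Try the player's queries in order until the target page is in the top_k."""
    for attempt, query in enumerate(queries, start=1):
        results = search(query)[:top_k]
        if target_url in results:
            rank = results.index(target_url) + 1
            return {"query": query, "rank": rank,
                    "points": rank_points(rank, top_k), "attempts": attempt}
    # No query brought the page into the top_k.
    return {"query": None, "rank": None, "points": 0, "attempts": len(queries)}


# Hypothetical toy index standing in for a real search engine.
TOY_INDEX = {
    "labor": ["a.example", "b.example", "c.example",
              "d.example", "e.example", "labor.example"],
    "new york labor": ["x.example", "labor.example", "y.example"],
}


def toy_search(query):
    """Return the ranked result list for a query (empty if unknown)."""
    return TOY_INDEX.get(query, [])


result = play_round("labor.example", toy_search, ["labor", "new york labor"])
print(result)
```

In this toy run the first query leaves the page at rank 6, so the player edits the query and the second attempt finds it at rank 2.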

Page Hunt design
Making Page Hunt fun:
– Timed response: 3 minutes
– Score keeping and a leader board
– Frequent queries: players get bonus points if they avoid using these frequent terms.
– Randomness: random web pages and bonuses (10% double, 5% triple)
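One possible reading of the random bonuses is that a page carries a triple-points bonus 5% of the time and a double-points bonus 10% of the time. The sketch below assumes that reading; `bonus_multiplier` and `draw_bonus` are hypothetical helper names, not part of the game as described.

```python
import random


def bonus_multiplier(u):
    """Map a uniform draw u in [0, 1) to a score multiplier.

    Assumed reading of the slide: 5% of pages triple the points,
    10% double them, and the remaining 85% carry no bonus.
    """
    if u < 0.05:
        return 3
    if u < 0.15:
        return 2
    return 1


def draw_bonus(rng=random):
    """Draw a random multiplier for a newly served page."""
    return bonus_multiplier(rng.random())


# Example: apply a drawn bonus to a base score of 90 points.
rng = random.Random(0)
print(90 * draw_bonus(rng))
```

Keeping the mapping separate from the random draw makes the thresholds easy to check by hand.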

Page Race and Page Match
– Page Race: a two-player competitive game that is a natural extension of Page Hunt.
– Page Match: a collaborative game in which two players are randomly paired up.

[Figure: Page Race, Page Hunt, Page Match]

Experiment
– Over a period of ten days, 10,000 people played the game.
– They generated over 123,000 non-null labels on the 698 URLs in the system.
– On average, every player contributed over 12 labels, and every URL has over 170 labels.
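The two averages follow directly from the reported totals. A quick check, treating the rounded figures on the slide as exact inputs (they are lower bounds, since the slide says "over"):

```python
# Totals as reported on the slide (rounded lower bounds).
labels = 123_000
players = 10_000
urls = 698

labels_per_player = labels / players   # 12.3
labels_per_url = labels / urls         # roughly 176

print(f"labels per player: {labels_per_player:.1f}")
print(f"labels per URL:    {labels_per_url:.1f}")
```

Both values are consistent with the "over 12" and "over 170" stated above.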

Data analysis and discussion
The analysis addresses:
– Was the game fun?
– Is the data we get from Page Hunt comparable to data we get from other sources?
– Can we define a process that extracts useful information from the data?

Was the game fun?
Players sent comments like:
– “addictive”
– “I love the app. It has a sticky, viral nature to it”
– “Fun game!”
– “…It’s a great game. Passing it along …”
The players who were at the top of the leader board kept playing to keep their names on the leader board.
Judging from these numbers, and from the comments we got, this game seems to be fun.

Is our data better than others'?
– Nature of queries elicited
– URL findability

Nature of queries elicited
– Shongwe Nonhlanhla
– Shongwe Nonhlanhla super beauty
– Shon, gwe, Nonh, lanhla

Nature of queries elicited
– A random sample of 100 successful pairs was taken.
– Two of the authors analyzed these 100 pairs and categorized the queries as OK, over-specified, or under-specified.
– 78% of the queries were OK, 15% were over-specified, and 7% were under-specified.

URL Findability
Findability levels:
– 10.0% of the 698 URLs have >70% findability.
– 11.3% have <10% findability.
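The slide does not define findability. The sketch below assumes it is the fraction of attempts on a URL that brought the page into the top 5 results, and shows how the two shares above could be computed from game logs. The function names and the toy log are hypothetical.

```python
from collections import defaultdict


def findability(attempts):
    """Map each URL to the fraction of its attempts that succeeded.

    `attempts` is an iterable of (url, success) pairs. Treating
    findability as a per-URL success rate is an assumption.
    """
    shown = defaultdict(int)
    found = defaultdict(int)
    for url, success in attempts:
        shown[url] += 1
        found[url] += int(success)
    return {url: found[url] / shown[url] for url in shown}


def share(levels, predicate):
    """Fraction of URLs whose findability satisfies `predicate`."""
    return sum(1 for value in levels.values() if predicate(value)) / len(levels)


# Hypothetical toy log: three URLs with ten attempts each.
toy_log = (
    [("a.example", True)] * 8 + [("a.example", False)] * 2
    + [("b.example", False)] * 10
    + [("c.example", True)] * 5 + [("c.example", False)] * 5
)

levels = findability(toy_log)
high = share(levels, lambda f: f > 0.70)   # URLs with >70% findability
low = share(levels, lambda f: f < 0.10)    # URLs with <10% findability
print(levels, high, low)
```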

URL Findability
[Figure: findability as a function of URL length]

What applications do we gain from the data?
– Providing metadata for pages.
– Providing query alterations for use in query refinement.
– Identifying ranking issues.

Query Alterations from Game Data
– Bitext Matching
– Results from Bitext Matching
Query alterations: altering the user’s query using other terms which may have been intended.

Bitext Matching
– Train an HMM model on bitext (A, B) to learn the hidden word alignment between the source side and the target side.
– Flip the assignment and train the models again on bitext (B, A).
– Outputs: 1. the most likely alignment; 2. phrase pairs.
[Figure: alignment candidates with probabilities 0.5, 0.2, 0.1, …]

Bitext Matching
Example bitext: “new york department labor” / “new york state department of labor”
Assume all the words are aligned except for “state” and “of”. Extract the phrase pairs:
– “new york / new york state”
– “department labor / department of labor”
– “department / department of”
A phrase pair is not extracted if some words inside the phrase on one side are aligned to words outside the phrase on the other side.
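The extraction rule can be illustrated with a brute-force sketch of alignment-consistent phrase extraction, a standard technique from phrase-based machine translation and not necessarily the authors' exact procedure. The function name `extract_phrase_pairs` and the explicit link list are assumptions made for this example.

```python
from itertools import combinations_with_replacement


def extract_phrase_pairs(source, target, links):
    """Return all phrase pairs consistent with the word alignment.

    A (source span, target span) pair is kept when it contains at least
    one alignment link and no link connects a word inside the span on
    one side to a word outside the span on the other side.
    """
    pairs = set()
    source_spans = list(combinations_with_replacement(range(len(source)), 2))
    target_spans = list(combinations_with_replacement(range(len(target)), 2))
    for s_lo, s_hi in source_spans:
        for t_lo, t_hi in target_spans:
            links_inside = 0
            consistent = True
            for s, t in links:
                s_in = s_lo <= s <= s_hi
                t_in = t_lo <= t <= t_hi
                if s_in != t_in:        # link crosses the phrase boundary
                    consistent = False
                    break
                if s_in:
                    links_inside += 1
            if consistent and links_inside > 0:
                pairs.add((" ".join(source[s_lo:s_hi + 1]),
                           " ".join(target[t_lo:t_hi + 1])))
    return pairs


source = "new york department labor".split()
target = "new york state department of labor".split()
# Assumed alignment: every word is linked except "state" and "of".
links = [(0, 0), (1, 1), (2, 3), (3, 5)]

pairs = extract_phrase_pairs(source, target, links)
for pair in sorted(pairs):
    print(" / ".join(pair))
```

Trying every span pair is quadratic in each sentence length, which is acceptable for short queries. A candidate such as "york department / york state" is rejected because "department" is aligned to a word outside the target phrase.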

Bitext Matching
We can write the probability of a target sequence and word alignment in terms of:
– s1 to sm, the source tokens
– t1 to tn, the target tokens
– a1 to an, the hidden alignment, where ai is a value ranging between 0 and m.
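For reference, the standard HMM word-alignment factorization, written with the symbols defined above, is shown below. It is given as an assumption about the general form of such models, not as the exact formula from the slide; the convention that $a_i = 0$ marks a target token aligned to no source token is likewise assumed.

```latex
P(t_1^n, a_1^n \mid s_1^m) \;=\; \prod_{i=1}^{n} P(a_i \mid a_{i-1}, m)\, P(t_i \mid s_{a_i})
```

The first factor is the transition probability between consecutive alignment positions, and the second is the probability of emitting target token $t_i$ from the source token it is aligned to.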

Results from Bitext Matching
– Spelling or punctuation alterations: JC Penny for JC Penney, or J C Penny.
– Sitename to site alterations: acid reflux and acidrefluxconnection.com
– Acronym-expansion alterations: iht is the International Herald Tribune; sls refers to the Society of Laparoendoscopic Surgeons, etc.
– Conceptual alterations: capital city airport is a valid alteration for Kentucky airport; jlo is a valid alteration for Jennifer Lopez, etc.

Conclusions
– We present a page-centric approach to improving web search using human computation games.
– The query data obtained from the Page Hunt game is not very different from queries used on search engines (good findability).
– We describe a process to extract query alterations from game data, using the idea of bitext matching.

Thanks for listening!