Information Retrieval and Extraction 2009 Term Project – Modern Web Search. Advisor: 陳信希; TA: 蔡銘峰 & 許名宏.

Overview (in English)
The goal
–Use advanced approaches to improve the performance of basic IR models
Group
–1–3 persons per group; send the member list to the TA
Approach
–No limitations; any resource on the Web may be used
Date of system demo and report submission
–6/18, Thursday (provisional)
Criteria for the grade
–Originality and reasonableness of your approach
–Implementation effort per person
–Retrieval performance (training & testing)
–Completeness of the report, division of the work, and analysis of the retrieval results

Overview (in Chinese)
Project goal
–Use advanced IR techniques to improve the effectiveness of basic retrieval models
Groups
–1–3 people per group; the group leader should send the member list (student IDs and names) to the TA
Approach
–No restrictions; any toolkit or resource on the Web may be used
Demo and report submission
–Provisionally 6/18, Thursday
Grading criteria
–Originality and reasonableness of the approach
–Implementation effort per person
–Retrieval performance (training & testing)
–Completeness of the report, division of the work, and analysis of the retrieval results

Content in the Report
Detailed description of your approach
Parameter settings (if parametric)
System performance on the training topics
–The baseline performance
–The performance of your approach
Division of the work
What you have learned
Others (optional)

Basic IR Models
Vector space model
–Lucene
Probabilistic model
–Okapi BM25 (see the scoring sketch below)
Language model
–Indri (Lemur toolkit)
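
For reference, a minimal sketch of the Okapi BM25 scoring function, using a common non-negative IDF variant; the class and method names are ours for illustration, not part of the Lemur or Lucene APIs:

public class Bm25Scorer {
    private final double k1 = 1.2;   // term-frequency saturation parameter
    private final double b  = 0.75;  // document-length normalization parameter
    private final long docCount;     // N: number of documents in the collection
    private final double avgDocLen;  // average document length

    public Bm25Scorer(long docCount, double avgDocLen) {
        this.docCount = docCount;
        this.avgDocLen = avgDocLen;
    }

    // IDF of a term that occurs in df documents.
    private double idf(long df) {
        return Math.log((docCount - df + 0.5) / (df + 0.5) + 1.0);
    }

    // Contribution of one query term with frequency tf in a document of length docLen;
    // the document's score is the sum of this value over all query terms.
    public double termScore(long df, double tf, double docLen) {
        double norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * docLen / avgDocLen));
        return idf(df) * norm;
    }
}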

Possible Approaches
Pseudo relevance feedback (PRF)
–Supported by the Lemur API
–Simple and effective, but offers no originality (see the sketch after this list)
Query expansion
–Using external resources, e.g., WordNet, Wikipedia, query logs, etc.
Word sense disambiguation in documents/queries
Combining results from two or more IR systems
Learning to rank
–What are the useful features?
Others
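
A minimal PRF sketch under the usual assumptions: run the query once, treat the top-k documents as relevant, and add their most frequent non-query terms to the query. This is generic term-frequency PRF, not the Lemur relevance-model API:

import java.util.*;

public class PseudoRelevanceFeedback {
    // queryTerms: the original query; topDocTokens: tokenized top-k documents
    // from the initial retrieval run; numExpansionTerms: how many terms to add.
    public static List<String> expandQuery(List<String> queryTerms,
                                           List<List<String>> topDocTokens,
                                           int numExpansionTerms) {
        Set<String> original = new HashSet<>(queryTerms);
        Map<String, Integer> freq = new HashMap<>();
        for (List<String> doc : topDocTokens)
            for (String term : doc)
                if (!original.contains(term))      // never re-add a query term
                    freq.merge(term, 1, Integer::sum);

        List<String> expanded = new ArrayList<>(queryTerms);
        freq.entrySet().stream()
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
            .limit(numExpansionTerms)
            .forEach(e -> expanded.add(e.getKey()));
        return expanded;                           // re-run retrieval with this query
    }
}

In practice, stopword removal and a weighting scheme (e.g., TF-IDF over the feedback documents) work better than raw counts.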

Experimental Dataset
A partial collection of TREC WT10g
–Link information is provided
30 topics for system development
Another 30 topics for the demo

Topic Example <top> Number: 476 Number: 476 Jennifer Aniston Jennifer Aniston Description: Description: Find documents that identify movies and/or television programs that Jennifer Aniston has appeared in. Narrative: Narrative: Relevant documents include movies and/or television programs that Jennifer Aniston has appeared in. </top>

Document Example <DOC><DOCNO>WTX010-B01-2</DOCNO><DOCOLDNO>IA B </DOCOLDNO><DOCHDR> text/html 264 HTTP/ OK Date: Sunday, 16-Feb-97 18:19:32 GMT Server: NCSA/SMI-1.0 MIME-version: 1.0 Content-type: text/html Last-modified: Friday, 02-Feb-96 19:51:15 GMT Content-length: 82 </DOCHDR> 1 Mr. Delleney did not participate in deliberation of this candidate. 1 Mr. Delleney did not participate in deliberation of this candidate.</DOC>

Link Information
In-links
–A line "A B C" means that B and C contain links to A
–ex: WTX010-B WTX010-B WTX010-B
Out-links
–A line "A B C" means that A contains links pointing to B or C
–ex: WTX010-B WTX010-B01-89 WTX010-B01-119
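
Both files can be loaded the same way: one whitespace-separated line per document, with the first docno as the key. A sketch (file names as described below; the class name is ours):

import java.io.*;
import java.util.*;

public class LinkLoader {
    // Maps the first docno on each line to the remaining docnos on that line.
    // Works for both in_links.txt and out_links.txt.
    public static Map<String, List<String>> load(String path) throws IOException {
        Map<String, List<String>> links = new HashMap<>();
        try (BufferedReader in = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] ids = line.trim().split("\\s+");
                if (ids.length < 2) continue;  // document with no links
                links.put(ids[0], Arrays.asList(ids).subList(1, ids.length));
            }
        }
        return links;
    }
}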

Evaluation
Evaluate the top 100 retrieved documents
Evaluation metric
–Mean average precision (MAP; see the sketch below)
Use the program "ireval" to evaluate system performance
–Usage of ireval
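
MAP is the mean, over all topics, of average precision: the precision at each rank where a relevant document appears, averaged over the total number of relevant documents. ireval computes this for you; the sketch below only illustrates the metric, and all names are ours:

import java.util.*;

public class MapMetric {
    // Average precision of one ranked list against a set of relevant docnos.
    public static double averagePrecision(List<String> ranked, Set<String> relevant) {
        double sum = 0.0;
        int hits = 0;
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
                sum += (double) hits / (i + 1);  // precision at this rank
            }
        }
        return relevant.isEmpty() ? 0.0 : sum / relevant.size();
    }

    // MAP: average precision averaged over all topics in the run.
    public static double meanAveragePrecision(Map<String, List<String>> runs,
                                              Map<String, Set<String>> qrels) {
        double total = 0.0;
        for (Map.Entry<String, List<String>> e : runs.entrySet())
            total += averagePrecision(e.getValue(), qrels.getOrDefault(e.getKey(), Set.of()));
        return runs.isEmpty() ? 0.0 : total / runs.size();
    }
}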

Example Result for Evaluation 465Q0WTX017-B test 465 Q0WTX017-B test 465Q0WTX017-B test 465 Q0WTX017-B test 465 Q0WTX017-B test 465 Q0WTX018-B test 465 Q0WTX018-B test 465 Q0WTX012-B test 465 Q0WTX019-B test 465 Q0WTX019-B test 474 Q0WTX012-B test 474 Q0WTX017-B test 474 Q0WTX018-B test 474 Q0WTX013-B test 474 Q0WTX018-B test 474 Q0WTX015-B test 474 Q0WTX019-B test 474 Q0WTX014-B test 474 Q0WTX018-B test 474 Q0WTX018-B test

Dataset Description (1/2)
"training_topics.txt" (file)
–30 topics for system development
"qrels_training_topics.txt" (file)
–Relevance judgments for the training topics
"documents" (directory)
–Contains 10 .rar files of raw documents
"in_links.txt" (file)
–In-link information
"out_links.txt" (file)
–Out-link information

Dataset Description (2/2)
"ireval.jar" (file)
–A Java program for evaluation
"irevalGUI.jar" (file)
–A GUI for ireval.jar